Removing items that were previously recommended, that are already in the training data, or that the user has marked as "Don't show" is better handled in the presentation layer, together with the rest of the business logic.
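As a rough illustration of what handling this in the presentation layer could look like, here is a minimal Python sketch. It is not part of Mahout; every name in it (UserContext, filter_recommendations, repurchasable, genre_of) is hypothetical, and it assumes the recommender simply hands back a ranked list of (item, score) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class UserContext:
    """Business-side facts about one user (all field names are hypothetical)."""
    purchased: set = field(default_factory=set)
    previously_shown: set = field(default_factory=set)
    hidden_items: set = field(default_factory=set)
    hidden_genres: set = field(default_factory=set)


def filter_recommendations(recs, ctx, genre_of, repurchasable,
                           suppress_previously_shown=False, max_results=10):
    """Apply business rules to raw recommender output.

    recs: list of (item_id, score) pairs, best first, as produced upstream.
    genre_of: dict mapping item_id -> genre.
    repurchasable: set of item_ids that are fine to recommend after purchase.
    """
    kept = []
    for item, score in recs:
        if item in ctx.hidden_items:
            continue  # the user hid this one item
        if genre_of.get(item) in ctx.hidden_genres:
            continue  # the user put a hold on the whole genre
        if item in ctx.purchased and item not in repurchasable:
            continue  # already bought, and not a consumable
        if suppress_previously_shown and item in ctx.previously_shown:
            continue  # policy choice: do not repeat earlier recommendations
        kept.append((item, score))
        if len(kept) == max_results:
            break
    return kept


# Example: razor blades survive a previous purchase, a recently bought book does not.
recs = [("razor_blades", 0.9), ("novel_a", 0.8), ("horror_b", 0.7), ("novel_c", 0.6)]
genre_of = {"novel_a": "fiction", "horror_b": "horror", "novel_c": "fiction"}
ctx = UserContext(purchased={"razor_blades", "novel_a"}, hidden_genres={"horror"})
result = filter_recommendations(recs, ctx, genre_of, repurchasable={"razor_blades"})
```

Each rule is an ordinary policy decision that can change per product line or per user without touching the recommender or its training data.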
The rationale is that none of these cases has a single correct answer. Recommending razor blades to somebody who has already bought them is a good thing. Recommending the same book to somebody who bought it recently, not so much. The same goes for items that were previously recommended. Hidden recommendations also have to be handled case by case: is there a hold on an entire genre of things, or on just one item? All of this is easy to do at the business level, and doing it at the recommender level just distorts the results.

On Wed, Jul 31, 2013 at 2:46 PM, Sebastian Schelter wrote:

> Ideally, you would file a bug and see whether it still happens with trunk.
> I think the problem comes from the fact that we only use a certain number
> of preferences from the user for the final recommendation phase. Therefore
> we can hit an item as recommendation whose preference we neglected.
>
> Best,
> Sebastian
>
> 2013/7/31 Rafal Lukawiecki
>
> > Dear Sebastian,
> >
> > It looks like setting --maxPrefsPerUser 10000 has resolved the issue in
> > our case; it seems that the most preferences a user had was just about 5000,
> > so I doubled it just in case, but when I operationalise this model, I will
> > make sure to calculate the actual max number of preferences and set the
> > parameter accordingly. I will double-check the result set to make sure the
> > issue is really gone, as I have only checked the few cases where we have
> > spotted a recommendation of a previously preferred item.
> >
> > Would you like me to file a bug, and would you like me to test it on 0.8
> > or another version? I am using 0.7.
> >
> > Thanks for your kind support.
> >
> > Rafal
> > --
> > Rafal Lukawiecki
> > Strategic Consultant and Director
> > Project Botticelli Ltd
> >
> > On 31 Jul 2013, at 06:22, Sebastian Schelter wrote:
> >
> > Hi Rafal,
> >
> > can you try to set the option --maxPrefsPerUser to the maximum number of
> > interactions per user and see if you still get the error?
> >
> > Best,
> > Sebastian
> >
> > On 30.07.2013 19:29, Rafal Lukawiecki wrote:
> > > Thank you Sebastian. The data set is not that large, as we are running
> > > tests on a subset. It is about 24k users, 40k items, the preference file
> > > has 65k preferences as triples. This was using Similarity Cooccurrence.
> > >
> > > I can see if I could anonymise the data set to share if that would be
> > > helpful.
> > >
> > > Thanks for your kind help.
> > >
> > > Rafal
> > > --
> > > Rafal Lukawiecki
> > > Pardon my brevity, sent from a telephone.
> > >
> > > On 30 Jul 2013, at 18:18, "Sebastian Schelter" wrote:
> > >
> > >> Hi Rafal,
> > >>
> > >> can you issue a ticket for this problem at
> > >> https://issues.apache.org/jira/browse/MAHOUT ? We have unit-tests that
> > >> check whether this happens and currently they work fine. I can only imagine
> > >> that the problem occurs in larger datasets where we sample the data in some
> > >> places. Can you describe a scenario/dataset where this happens?
> > >>
> > >> Best,
> > >> Sebastian
> > >>
> > >> 2013/7/30 Rafal Lukawiecki
> > >>
> > >>> I'm new here, just registered. Many thanks to everyone for working on an
> > >>> amazing piece of software, thank you for building Mahout and for your
> > >>> support. My apologies if this is not the right place to ask the question. I
> > >>> have searched for the issue, and I can see this problem has been reported
> > >>> here:
> > >>>
> > >>> http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items
> > >>>
> > >>> Unfortunately, the trail leads to the newsgroups, and I have not found a
> > >>> way, yet, to get an answer from them, without asking you.
> > >>>
> > >>> Essentially, I am running a Hadoop RecommenderJob from Mahout 0.7, and I
> > >>> am finding that it is recommending items that the user has already
> > >>> expressed a preference for in their input file. I understand that this
> > >>> should not be happening, and I am not sure if there is a known fix or if I
> > >>> should be looking for a workaround (such as using the entire input as the
> > >>> filterFile).
> > >>>
> > >>> I will double-check that there is no error on my side, but so far it does
> > >>> not seem that way.
> > >>>
> > >>> Many thanks and my regards from Ireland,
> > >>> Rafal Lukawiecki
> > >>>
> > >>> --
> > >>> Rafal Lukawiecki
> > >>> Strategic Consultant and Director
> > >>> Project Botticelli Ltd
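
For the --maxPrefsPerUser workaround discussed in the quoted thread, the largest number of preferences held by any one user can be counted directly from the preference file. The following is only a sketch: it assumes comma-separated userID,itemID,preference triples, and the function name and sample data are made up for illustration.

```python
import csv
import io
from collections import Counter


def max_prefs_per_user(lines):
    """Return the largest number of preference rows any single user has.

    lines: iterable of text lines, each assumed to be 'userID,itemID,preference'.
    """
    counts = Counter()
    for row in csv.reader(lines):
        if row:  # csv.reader yields an empty list for blank lines
            counts[row[0]] += 1
    return max(counts.values(), default=0)


# Tiny in-memory stand-in for the real preference file.
sample = io.StringIO("1,101,5.0\n1,102,3.0\n2,101,4.0\n1,103,1.0\n")
limit = max_prefs_per_user(sample)
print(f"--maxPrefsPerUser {limit}")
```

With a real file, the same function can be called on an open file handle, and the result passed to the job instead of a guessed value such as 10000.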
