Hi Ted,

That's true in general, but for use cases such as generating recommendations in batch for personalized newsletters, it's a nice-to-have feature.
I also have the impression that most users expect not to see items they have already interacted with. Ideally, we would have a flag that lets users choose which behavior they want. I think one of the big benefits of ItemSimilarityJob and RecommenderJob is that they are "CSV file in" and "CSV file out" jobs that are super easy to use, and we should keep it that way.

Best,
Sebastian

2013/7/31 Ted Dunning <[email protected]>

> Removing previously recommended items, items already in the training data, or
> items already marked as "Don't show" should all be better handled in the
> presentation layer along with other business logic.
>
> The rationale is that there is no single correct answer for any of these.
> Recommending razor blades to somebody who already bought them is a good
> thing. Recommending the same book to somebody who bought one recently, not
> so much. Likewise with stuff previously recommended.
>
> Hidden recommendations also have to be handled idiosyncratically. Is there
> a hold on entire genres of things? On just one?
>
> All of this can be done easily at the business level, and doing it at the
> recommender level just distorts everything.
>
> On Wed, Jul 31, 2013 at 2:46 PM, Sebastian Schelter <[email protected]>
> wrote:
>
> > Ideally, you would file a bug and see whether it still happens with trunk.
> > I think the problem comes from the fact that we only use a certain number
> > of preferences from the user for the final recommendation phase. Therefore,
> > we can hit an item as a recommendation whose preference we neglected.
> >
> > Best,
> > Sebastian
> >
> > 2013/7/31 Rafal Lukawiecki <[email protected]>
> >
> > > Dear Sebastian,
> > >
> > > It looks like setting --maxPrefsPerUser 10000 has resolved the issue in
> > > our case. It seems that the most preferences a user had was just about 5000,
> > > so I doubled it just in case, but when I operationalise this model, I will
> > > make sure to calculate the actual maximum number of preferences and set the
> > > parameter accordingly. I will double-check the result set to make sure the
> > > issue is really gone, as I have only checked the few cases where we have
> > > spotted a recommendation of a previously preferred item.
> > >
> > > Would you like me to file a bug, and would you like me to test it on 0.8
> > > or another version? I am using 0.7.
> > >
> > > Thanks for your kind support.
> > > Rafal
> > > --
> > > Rafal Lukawiecki
> > > Strategic Consultant and Director
> > > Project Botticelli Ltd
> > >
> > > On 31 Jul 2013, at 06:22, Sebastian Schelter <[email protected]>
> > > wrote:
> > >
> > > Hi Rafal,
> > >
> > > can you try to set the option --maxPrefsPerUser to the maximum number of
> > > interactions per user and see if you still get the error?
> > >
> > > Best,
> > > Sebastian
> > >
> > > On 30.07.2013 19:29, Rafal Lukawiecki wrote:
> > > > Thank you, Sebastian. The data set is not that large, as we are running
> > > > tests on a subset. It is about 24k users and 40k items, and the preference
> > > > file has 65k preferences as triples. This was using Similarity Cooccurrence.
> > > >
> > > > I could see whether I can anonymise the data set to share, if that would be
> > > > helpful.
> > > >
> > > > Thanks for your kind help.
> > > >
> > > > Rafal
> > > > --
> > > > Rafal Lukawiecki
> > > > Pardon my brevity, sent from a telephone.
> > > >
> > > > On 30 Jul 2013, at 18:18, "Sebastian Schelter" <[email protected]> wrote:
> > > >
> > > >> Hi Rafal,
> > > >>
> > > >> can you issue a ticket for this problem at
> > > >> https://issues.apache.org/jira/browse/MAHOUT ? We have unit tests that
> > > >> check whether this happens, and currently they pass. I can only imagine
> > > >> that the problem occurs in larger datasets where we sample the data in some
> > > >> places. Can you describe a scenario/dataset where this happens?
> > > >>
> > > >> Best,
> > > >> Sebastian
> > > >>
> > > >> 2013/7/30 Rafal Lukawiecki <[email protected]>
> > > >>
> > > >>> I'm new here, just registered. Many thanks to everyone for working on an
> > > >>> amazing piece of software; thank you for building Mahout and for your
> > > >>> support. My apologies if this is not the right place to ask the question.
> > > >>> I have searched for the issue, and I can see this problem has been
> > > >>> reported here:
> > > >>>
> > > >>> http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items
> > > >>>
> > > >>> Unfortunately, the trail leads to the newsgroups, and I have not yet found
> > > >>> a way to get an answer from them without asking you.
> > > >>>
> > > >>> Essentially, I am running a Hadoop RecommenderJob from Mahout 0.7, and I
> > > >>> am finding that it is recommending items that the user has already
> > > >>> expressed a preference for in their input file. I understand that this
> > > >>> should not be happening, and I am not sure if there is a known fix or if I
> > > >>> should be looking for a workaround (such as using the entire input as the
> > > >>> filterFile).
> > > >>>
> > > >>> I will double-check that there is no error on my side, but so far it does
> > > >>> not seem that way.
> > > >>>
> > > >>> Many thanks and my regards from Ireland,
> > > >>> Rafal Lukawiecki
> > > >>>
> > > >>> --
> > > >>> Rafal Lukawiecki
> > > >>> Strategic Consultant and Director
> > > >>> Project Botticelli Ltd
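Sebastian explains the failure this way: the final recommendation phase only sees a limited number of a user's preferences (--maxPrefsPerUser). An item whose preference was dropped is therefore unknown at that point and can come back as a recommendation. The toy model below illustrates that reasoning. It is not Mahout's code. The function name is hypothetical, and the random sampling is an assumption, since the real job may choose which preferences to keep differently.

```python
import random
from typing import Dict, List, Set


def recommend_from_sample(
    user_prefs: List[str],
    similar_items: Dict[str, List[str]],
    max_prefs_per_user: int,
    rng: random.Random,
) -> Set[str]:
    """Toy model (hypothetical, not Mahout's implementation) of the failure mode.

    Candidates are generated from, and filtered against, only a sample of the
    user's preferences, so preferred items that were not sampled can leak back.
    """
    if len(user_prefs) > max_prefs_per_user:
        # Assumption: preferences are sampled at random when there are too many.
        sampled = rng.sample(user_prefs, max_prefs_per_user)
    else:
        sampled = list(user_prefs)
    # Gather items similar to the sampled preferences.
    candidates = {n for item in sampled for n in similar_items.get(item, [])}
    # Only the sampled preferences are known here, so only they are removed.
    return candidates - set(sampled)
```

When max_prefs_per_user is at least the user's preference count, every known item is filtered out. This matches Rafal's report that raising --maxPrefsPerUser above the largest per-user count made the problem go away.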
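Before setting --maxPrefsPerUser, Rafal plans to calculate the actual maximum number of preferences per user. Below is a minimal sketch of that calculation. It assumes the preference file holds CSV lines of the form userID,itemID[,preference], as the "triples" in the thread suggest, and the function name is hypothetical.

```python
import csv
from typing import Dict, Iterable, Set


def max_prefs_per_user(lines: Iterable[str]) -> int:
    """Return the largest number of distinct items any single user has a preference for.

    Assumes CSV lines of the form userID,itemID[,preference].
    """
    items_by_user: Dict[str, Set[str]] = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        user_id, item_id = row[0].strip(), row[1].strip()
        # Count each (user, item) pair once, even if it is repeated in the file.
        items_by_user.setdefault(user_id, set()).add(item_id)
    return max((len(items) for items in items_by_user.values()), default=0)
```

For example, you could pass an open handle to the preference file (the path is whatever the job actually reads) to max_prefs_per_user and use the result, possibly with some headroom as Rafal did, as the value for --maxPrefsPerUser.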

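Ted argues that removing already-seen items belongs in the presentation layer, where business rules can make exceptions (re-recommending razor blades, but not the same book). Below is a minimal sketch of such a filter. It assumes the job's output has already been parsed into per-user lists of (itemID, score) pairs. filter_seen and the allow_repeat rule are hypothetical names, not part of Mahout.

```python
from typing import Callable, Dict, List, Set, Tuple

Recommendation = Tuple[str, float]  # (itemID, estimated score)


def filter_seen(
    recommendations: Dict[str, List[Recommendation]],
    seen: Dict[str, Set[str]],
    allow_repeat: Callable[[str], bool] = lambda item_id: False,
) -> Dict[str, List[Recommendation]]:
    """Drop items a user already interacted with, unless a business rule allows repeats.

    The allow_repeat callable is a hypothetical hook for rules such as
    "consumables may be recommended again".
    """
    filtered: Dict[str, List[Recommendation]] = {}
    for user_id, recs in recommendations.items():
        already_seen = seen.get(user_id, set())
        # Preserve the original ranking order of the surviving items.
        filtered[user_id] = [
            (item_id, score)
            for item_id, score in recs
            if item_id not in already_seen or allow_repeat(item_id)
        ]
    return filtered
```

Keeping this step outside the recommender leaves the job's "CSV file in, CSV file out" behavior unchanged. Each newsletter or UI can then apply its own rules.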