Hi Sebastian,

I've rechecked the results and, I'm afraid, the issue has not gone away, contrary to my enthusiastic response yesterday. Using 0.8, I have retested with and without the --maxPrefsPerUser 9000 parameter (no user has more than 5000 prefs). I have also supplied the prefs file without the preference value, that is, as user,item (one pair per line), as a --filterFile, both with and without --maxPrefsPerUser, and I am afraid that there, too, we are seeing recommendations for items the user has expressed a prior preference for.
I suppose I need to file a bug report.

Rafal
--
Rafal Lukawiecki
Pardon my brevity, sent from a telephone.

On 31 Jul 2013, at 22:35, "Rafal Lukawiecki" <[email protected]> wrote:

> Dear Sebastian,
>
> It looks like setting --maxPrefsPerUser 10000 has resolved the issue in our
> case; it seems that the most preferences a user had was just about 5000, so I
> doubled it just in case, but when I operationalise this model, I will make
> sure to calculate the actual max number of preferences and set the parameter
> accordingly. I will double-check the result set to make sure the issue is
> really gone, as I have only checked the few cases where we have spotted a
> recommendation of a previously preferred item.
>
> Would you like me to file a bug, and would you like me to test it on 0.8 or
> another version? I am using 0.7.
>
> Thanks for your kind support.
> Rafal
> --
> Rafal Lukawiecki
> Strategic Consultant and Director
> Project Botticelli Ltd
>
> On 31 Jul 2013, at 06:22, Sebastian Schelter <[email protected]>
> wrote:
>
> Hi Rafal,
>
> can you try to set the option --maxPrefsPerUser to the maximum number of
> interactions per user and see if you still get the error?
>
> Best,
> Sebastian
>
> On 30.07.2013 19:29, Rafal Lukawiecki wrote:
>> Thank you Sebastian. The data set is not that large, as we are running tests
>> on a subset. It is about 24k users, 40k items, the preference file has 65k
>> preferences as triples. This was using Similarity Cooccurrence.
>>
>> I can see if I could anonymise the data set to share if that would be
>> helpful.
>>
>> Thanks for your kind help.
>>
>> Rafal
>> --
>> Rafal Lukawiecki
>> Pardon my brevity, sent from a telephone.
>>
>> On 30 Jul 2013, at 18:18, "Sebastian Schelter" <[email protected]> wrote:
>>
>>> Hi Rafal,
>>>
>>> can you issue a ticket for this problem at
>>> https://issues.apache.org/jira/browse/MAHOUT ? We have unit-tests that
>>> check whether this happens and currently they work fine. I can only imagine
>>> that the problem occurs in larger datasets where we sample the data in some
>>> places. Can you describe a scenario/dataset where this happens?
>>>
>>> Best,
>>> Sebastian
>>>
>>> 2013/7/30 Rafal Lukawiecki <[email protected]>
>>>
>>>> I'm new here, just registered. Many thanks to everyone for working on an
>>>> amazing piece of software, thank you for building Mahout and for your
>>>> support. My apologies if this is not the right place to ask the question; I
>>>> have searched for the issue, and I can see this problem has been reported
>>>> here:
>>>> http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items
>>>>
>>>> Unfortunately, the trail leads to the newsgroups, and I have not found a
>>>> way, yet, to get an answer from them, without asking you.
>>>>
>>>> Essentially, I am running a Hadoop RecommenderJob from Mahout 0.7, and I
>>>> am finding that it is recommending items that the user has already
>>>> expressed a preference for in their input file. I understand that this
>>>> should not be happening, and I am not sure if there is a known fix or if I
>>>> should be looking for a workaround (such as using the entire input as the
>>>> filterFile).
>>>>
>>>> I will double-check that there is no error on my side, but so far it does
>>>> not seem that way.
>>>>
>>>> Many thanks and my regards from Ireland,
>>>> Rafal Lukawiecki
>>>>
>>>> --
>>>>
>>>> Rafal Lukawiecki
>>>>
>>>> Strategic Consultant and Director
>>>>
>>>> Project Botticelli Ltd
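
The two checks discussed in this thread (finding the real maximum number of preferences per user, and confirming whether any recommended item was already preferred) could be scripted roughly as follows. This is only a sketch under assumptions: the preference lines are taken to be user,item[,value] as described above, and the job output is assumed to be one line per user in the form userID<TAB>[itemID:score,itemID:score,...]. The function names are hypothetical helpers, not part of Mahout.

```python
from collections import defaultdict


def load_prefs(lines):
    """Parse 'user,item[,value]' lines into {user: set of items}."""
    prefs = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, item = line.split(",")[:2]
        prefs[user].add(item)
    return prefs


def max_prefs_per_user(prefs):
    """Largest number of distinct items any single user has a preference for."""
    return max((len(items) for items in prefs.values()), default=0)


def load_recommendations(lines):
    """Parse lines assumed to look like 'user<TAB>[item:score,item:score]'."""
    recs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, _, rest = line.partition("\t")
        pairs = rest.strip("[]").split(",")
        recs[user] = [pair.split(":")[0] for pair in pairs if pair]
    return recs


def already_preferred(prefs, recs):
    """Return {user: [recommended items the user had already expressed a preference for]}."""
    offenders = {}
    for user, items in recs.items():
        seen = prefs.get(user, set())
        hits = [item for item in items if item in seen]
        if hits:
            offenders[user] = hits
    return offenders


if __name__ == "__main__":
    # Tiny made-up sample; in practice these would be read from the
    # preference file and the job's output part files.
    prefs = load_prefs(["1,10,5.0", "1,11,3.0", "2,10,4.0"])
    recs = load_recommendations(["1\t[11:4.5,12:4.0]", "2\t[13:3.9]"])
    print("max prefs per user:", max_prefs_per_user(prefs))
    print("already-preferred recommendations:", already_preferred(prefs, recs))
```

An empty result from already_preferred over the whole output would suggest the issue is gone for that run; any non-empty entry gives a concrete user/item pair to attach to a bug report.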
