Hi Rafal,

This sounds really strange: the bug should not have anything to do with the version of Hadoop that you are running. Because the preferences are randomly sampled, the bug may simply not show up on every run.
--sebastian

On 07.08.2013 13:53, Rafal Lukawiecki wrote:
> Sebastian,
>
> I've been doing a little more digging regarding the issue of preferences
> being calculated for already preferred items. I re-ran the jobs using the
> same data and the same parameters on a different installation of Hadoop,
> and the problem seems to have gone away. For now it looks like the issue
> arises when I run it under Mahout 0.7 and 0.8 using HDP (Hortonworks Data
> Platform) for Windows 1.1.0, with Hadoop 1.1.0. The problem has not yet
> shown up in my tests under Hadoop 1.2.1 compiled for OS X. I will work a
> little more to confirm my results, but if they stand up, should I still
> report it as a Mahout issue?
>
> Rafal
> --
> Rafal Lukawiecki
> Strategic Consultant and Director
> Project Botticelli Ltd
>
> On 1 Aug 2013, at 17:31, Sebastian Schelter <[email protected]> wrote:
>
> Setting it to the maximum number should be enough. Would be great if you
> can share your dataset and tests.
>
> 2013/8/1 Rafal Lukawiecki <[email protected]>
>
>> Should I have set that parameter to a value much much larger than the
>> maximum number of actually expressed preferences by a user?
>>
>> I'm working on an anonymised data set. If it works as an error test case,
>> I'd be happy to share it for your re-test. I am still hoping it is my
>> error, not Mahout's.
>>
>> Rafal
>> --
>> Rafal Lukawiecki
>> Pardon brevity, mobile device.
>>
>> On 1 Aug 2013, at 17:19, "Sebastian Schelter" <[email protected]> wrote:
>>
>>> Ok, please file a bug report detailing what you've tested and what
>>> results you got.
>>>
>>> Just to clarify, setting maxPrefsPerUser to a high number still does not
>>> help? That surprises me.
>>>
>>>
>>> 2013/8/1 Rafal Lukawiecki <[email protected]>
>>>
>>>> Hi Sebastian,
>>>>
>>>> I've rechecked the results, and I'm afraid that the issue has not gone
>>>> away, contrary to my enthusiastic response yesterday. Using 0.8 I have
>>>> retested with and without the --maxPrefsPerUser 9000 parameter (no user
>>>> has more than 5000 prefs). I have also supplied the prefs file, without
>>>> the preference value, that is as: user,item (one per line) as a
>>>> --filterFile, with and without the --maxPrefsPerUser, and I am afraid we
>>>> are also seeing recommendations for items the user has expressed a
>>>> prior preference for.
>>>>
>>>> I suppose I need to file a bug report.
>>>>
>>>> Rafal
>>>> --
>>>> Rafal Lukawiecki
>>>> Pardon my brevity, sent from a telephone.
>>>>
>>>> On 31 Jul 2013, at 22:35, "Rafal Lukawiecki" <[email protected]> wrote:
>>>>
>>>>> Dear Sebastian,
>>>>>
>>>>> It looks like setting --maxPrefsPerUser 10000 has resolved the issue
>>>>> in our case. It seems that the most preferences a user had was just
>>>>> about 5000, so I doubled it just in case, but when I operationalise
>>>>> this model, I will make sure to calculate the actual max number of
>>>>> preferences and set the parameter accordingly. I will double-check the
>>>>> resultset to make sure the issue is really gone, as I have only
>>>>> checked the few cases where we have spotted a recommendation of a
>>>>> previously preferred item.
>>>>>
>>>>> Would you like me to file a bug, and would you like me to test it on
>>>>> 0.8 or another version? I am using 0.7.
>>>>>
>>>>> Thanks for your kind support.
>>>>> Rafal
>>>>> --
>>>>> Rafal Lukawiecki
>>>>> Strategic Consultant and Director
>>>>> Project Botticelli Ltd
>>>>>
>>>>> On 31 Jul 2013, at 06:22, Sebastian Schelter <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi Rafal,
>>>>>
>>>>> can you try to set the option --maxPrefsPerUser to the maximum number
>>>>> of interactions per user and see if you still get the error?
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> On 30.07.2013 19:29, Rafal Lukawiecki wrote:
>>>>>> Thank you Sebastian. The data set is not that large, as we are
>>>>>> running tests on a subset. It is about 24k users, 40k items, the
>>>>>> preference file has 65k preferences as triples. This was using
>>>>>> Similarity Cooccurrence.
>>>>>>
>>>>>> I can see if I could anonymise the data set to share if that would be
>>>>>> helpful.
>>>>>>
>>>>>> Thanks for your kind help.
>>>>>>
>>>>>> Rafal
>>>>>> --
>>>>>> Rafal Lukawiecki
>>>>>> Pardon my brevity, sent from a telephone.
>>>>>>
>>>>>> On 30 Jul 2013, at 18:18, "Sebastian Schelter" <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Rafal,
>>>>>>>
>>>>>>> can you issue a ticket for this problem at
>>>>>>> https://issues.apache.org/jira/browse/MAHOUT ? We have unit-tests
>>>>>>> that check whether this happens and currently they work fine. I can
>>>>>>> only imagine that the problem occurs in larger datasets where we
>>>>>>> sample the data in some places. Can you describe a scenario/dataset
>>>>>>> where this happens?
>>>>>>>
>>>>>>> Best,
>>>>>>> Sebastian
>>>>>>>
>>>>>>> 2013/7/30 Rafal Lukawiecki <[email protected]>
>>>>>>>
>>>>>>>> I'm new here, just registered. Many thanks to everyone for working
>>>>>>>> on an amazing piece of software, thank you for building Mahout and
>>>>>>>> for your support. My apologies if this is not the right place to
>>>>>>>> ask the question. I have searched for the issue, and I can see this
>>>>>>>> problem has been reported here:
>>>>>>>> http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items
>>>>>>>>
>>>>>>>> Unfortunately, the trail leads to the newsgroups, and I have not
>>>>>>>> found a way, yet, to get an answer from them, without asking you.
>>>>>>>>
>>>>>>>> Essentially, I am running a Hadoop RecommenderJob from Mahout 0.7,
>>>>>>>> and I am finding that it is recommending items that the user has
>>>>>>>> already expressed a preference for in their input file. I
>>>>>>>> understand that this should not be happening, and I am not sure if
>>>>>>>> there is a known fix or if I should be looking for a workaround
>>>>>>>> (such as using the entire input as the filterFile).
>>>>>>>>
>>>>>>>> I will double-check that there is no error on my side, but so far
>>>>>>>> it does not seem that way.
>>>>>>>>
>>>>>>>> Many thanks and my regards from Ireland,
>>>>>>>> Rafal Lukawiecki
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Rafal Lukawiecki
>>>>>>>>
>>>>>>>> Strategic Consultant and Director
>>>>>>>>
>>>>>>>> Project Botticelli Ltd
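For anyone hitting the same symptom, the thread discusses two checks: finding the real maximum number of preferences per user (to choose a value for --maxPrefsPerUser) and making sure no already-preferred item ends up in the output. Below is a minimal Python sketch of both, run as a post-processing step outside Hadoop. It is a workaround and a diagnostic aid, not a fix for the underlying issue. The function names are hypothetical, the preference input is assumed to be `user,item[,value]` lines as described above, and the recommendation output is assumed to be one `userID<TAB>[itemID:score,...]` line per user; please verify that format against your own job output.

```python
from collections import defaultdict


def load_seen_items(pref_lines):
    """Map each user ID to the set of item IDs they already expressed a preference for.

    Assumes one "user,item" or "user,item,value" record per line.
    """
    seen = defaultdict(set)
    for line in pref_lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(",")
        seen[fields[0]].add(fields[1])
    return seen


def max_prefs_per_user(seen):
    """Largest number of distinct preferred items held by any single user."""
    return max((len(items) for items in seen.values()), default=0)


def filter_recommendations(rec_lines, seen):
    """Yield (user_id, [(item_id, score), ...]) with already-preferred items removed.

    Assumes each line looks like "userID<TAB>[item:score,item:score]".
    """
    for line in rec_lines:
        line = line.strip()
        if not line:
            continue
        user_id, _, payload = line.partition("\t")
        payload = payload.strip().strip("[]")
        already = seen.get(user_id, ())
        kept = []
        for entry in payload.split(","):
            if not entry:
                continue
            item_id, _, score = entry.partition(":")
            if item_id not in already:
                kept.append((item_id, float(score)))
        yield user_id, kept
```

Applied to the preference file and the job output, `max_prefs_per_user` gives the number to compare against the --maxPrefsPerUser setting, and any line where `filter_recommendations` drops an entry is a concrete case worth attaching to the bug report.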
