If you also set --maxPrefsPerUserInItemSimilarity to a number higher than
the maximum number of preferences per user, no sampling should occur. This
might slow down the job, however.
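
A minimal Python sketch for choosing that number from the input file. It
assumes the comma-separated `userID,itemID[,value]` layout described later in
this thread, counts distinct items per user (an assumption: duplicate
user/item lines are treated as one preference), and adds one so the value is
strictly higher than the observed maximum. The helper names
`max_prefs_per_user` and `sampling_free_limit` are hypothetical, not part of
Mahout.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def max_prefs_per_user(lines: Iterable[str]) -> int:
    """Return the largest number of distinct items any single user has a preference for.

    Assumes "userID,itemID" or "userID,itemID,value" lines; blank lines are skipped.
    """
    items_by_user: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(lines):
        if not row:  # csv.reader yields [] for a blank line
            continue
        items_by_user[row[0].strip()].add(row[1].strip())
    return max((len(items) for items in items_by_user.values()), default=0)


def sampling_free_limit(lines: Iterable[str]) -> int:
    # Use a value strictly higher than the observed maximum, as suggested above.
    return max_prefs_per_user(lines) + 1


if __name__ == "__main__":
    sample = ["1,10,5.0", "1,11,3.0", "1,12,4.0", "2,10,1.0", ""]
    limit = sampling_free_limit(sample)
    # User 1 has 3 distinct items, so this prints a limit of 4 for both options.
    print(f"--maxPrefsPerUser {limit} --maxPrefsPerUserInItemSimilarity {limit}")
```

For the data set described in this thread (about 65k preferences), reading the
whole file into memory like this should be unproblematic.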

2013/8/7 Rafal Lukawiecki

> Is there a set of parameters which I could pass to RecommenderJob to avoid
> that random sampling, in order to create a test case for the issue I have
> experienced? Would setting --maxSimilaritiesPerItem and/or
> --maxPrefsPerUserInItemSimilarity help? Many thanks.
>
> On 7 Aug 2013, at 16:12, Sebastian Schelter wrote:
>
> It could affect the results even in this case, as we also sample the
> preferences when computing similar items.
>
> On 07.08.2013 17:07, Rafal Lukawiecki wrote:
> > Thank you, Sebastian. Would the random sampling affect the results of
> > RecommenderJob in any case? I am setting --maxPrefsPerUser to exceed the
> > actual maximum number of preferences expressed by any user.
> >
> > Rafal
> >
> > On 7 Aug 2013, at 15:48, Sebastian Schelter wrote:
> >
> > The code in trunk allows you to specify a randomSeed; unfortunately, the
> > older versions don't.
> >
> > On 07.08.2013 16:35, Rafal Lukawiecki wrote:
> >> Hi Sebastian,
> >>
> >> The quantity of returned "duplicates" is much too large to be caused
> >> just by sampling's randomness. I wonder if this could be related to
> >> something platform-specific, such as the Windows vs. *nix representation
> >> of input files, data types, etc.
> >>
> >> For argument's sake, is it possible to fix the seed of the random
> >> aspect of the sampling, so I could feed the same input through two
> >> platforms and compare the results?
> >>
> >> Rafal
> >>
> >> On 7 Aug 2013, at 15:20, Sebastian Schelter wrote:
> >>
> >> Hi Rafal,
> >>
> >> this sounds really strange, the bug should not have anything to do with
> >> the version of Hadoop that you are running. You could sometimes not see
> >> it due to the random sampling of the preferences.
> >>
> >> --sebastian
> >>
> >> On 07.08.2013 13:53, Rafal Lukawiecki wrote:
> >>> Sebastian,
> >>>
> >>> I've been doing a little more digging regarding the issue of
> >>> preferences being calculated for already preferred items. I re-ran the
> >>> jobs using the same data and the same parameters on a different
> >>> installation of Hadoop, and the problem seems to have gone away. For now
> >>> it looks like the issue arises when I run it under Mahout 0.7 and 0.8
> >>> using HDP (Hortonworks Data Platform) for Windows 1.1.0, with Hadoop
> >>> 1.1.0. So far in my tests, this problem does not show up under Hadoop
> >>> 1.2.1 compiled for OS X. I will work a little more to confirm my results,
> >>> but if they hold up, should I still report it as a Mahout issue?
> >>>
> >>> Rafal
> >>> --
> >>> Rafal Lukawiecki
> >>> Strategic Consultant and Director
> >>> Project Botticelli Ltd
> >>>
> >>> On 1 Aug 2013, at 17:31, Sebastian Schelter wrote:
> >>>
> >>> Setting it to the maximum number should be enough. It would be great if
> >>> you could share your dataset and tests.
> >>>
> >>> 2013/8/1 Rafal Lukawiecki
> >>>
> >>>> Should I have set that parameter to a value much, much larger than the
> >>>> maximum number of preferences actually expressed by a user?
> >>>>
> >>>> I'm working on an anonymised data set. If it works as an error test
> >>>> case, I'd be happy to share it for your re-test. I am still hoping it is
> >>>> my error, not Mahout's.
> >>>>
> >>>> Rafal
> >>>> --
> >>>> Rafal Lukawiecki
> >>>> Pardon brevity, mobile device.
> >>>>
> >>>> On 1 Aug 2013, at 17:19, "Sebastian Schelter" wrote:
> >>>>
> >>>>> OK, please file a bug report detailing what you've tested and what
> >>>>> results you got.
> >>>>>
> >>>>> Just to clarify, setting maxPrefsPerUser to a high number still does
> >>>>> not help? That surprises me.
> >>>>>
> >>>>>
> >>>>> 2013/8/1 Rafal Lukawiecki
> >>>>>
> >>>>>> Hi Sebastian,
> >>>>>>
> >>>>>> I've rechecked the results, and I'm afraid that the issue has not gone
> >>>>>> away, contrary to my enthusiastic response yesterday. Using 0.8, I
> >>>>>> have retested with and without the --maxPrefsPerUser 9000 parameter
> >>>>>> (no user has more than 5000 prefs). I have also supplied the prefs
> >>>>>> file without the preference value, that is, as user,item (one per
> >>>>>> line), as a --filterFile, with and without --maxPrefsPerUser, and I am
> >>>>>> afraid we are still seeing recommendations for items the user has
> >>>>>> expressed a prior preference for.
> >>>>>>
> >>>>>> I suppose I need to file a bug report.
> >>>>>>
> >>>>>> Rafal
> >>>>>> --
> >>>>>> Rafal Lukawiecki
> >>>>>> Pardon my brevity, sent from a telephone.
> >>>>>>
> >>>>>> On 31 Jul 2013, at 22:35, "Rafal Lukawiecki" wrote:
> >>>>>>
> >>>>>>> Dear Sebastian,
> >>>>>>>
> >>>>>>> It looks like setting --maxPrefsPerUser 10000 has resolved the issue
> >>>>>>> in our case. The most preferences any user had was about 5000, so I
> >>>>>>> doubled it just in case, but when I operationalise this model, I will
> >>>>>>> make sure to calculate the actual maximum number of preferences and
> >>>>>>> set the parameter accordingly. I will double-check the result set to
> >>>>>>> make sure the issue is really gone, as I have only checked the few
> >>>>>>> cases where we spotted a recommendation of a previously preferred
> >>>>>>> item.
> >>>>>>>
> >>>>>>> Would you like me to file a bug, and would you like me to test it on
> >>>>>>> 0.8 or another version? I am using 0.7.
> >>>>>>>
> >>>>>>> Thanks for your kind support.
> >>>>>>> Rafal
> >>>>>>>
> >>>>>>> On 31 Jul 2013, at 06:22, Sebastian Schelter wrote:
> >>>>>>>
> >>>>>>> Hi Rafal,
> >>>>>>>
> >>>>>>> can you try to set the option --maxPrefsPerUser to the maximum number
> >>>>>>> of interactions per user and see if you still get the error?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Sebastian
> >>>>>>>
> >>>>>>> On 30.07.2013 19:29, Rafal Lukawiecki wrote:
> >>>>>>>> Thank you, Sebastian. The data set is not that large, as we are
> >>>>>>>> running tests on a subset. It has about 24k users and 40k items; the
> >>>>>>>> preference file has 65k preferences as triples. This was using
> >>>>>>>> SIMILARITY_COOCCURRENCE.
> >>>>>>>>
> >>>>>>>> I can see if I could anonymise the data set to share, if that would
> >>>>>>>> be helpful.
> >>>>>>>>
> >>>>>>>> Thanks for your kind help.
> >>>>>>>>
> >>>>>>>> Rafal
> >>>>>>>>
> >>>>>>>> On 30 Jul 2013, at 18:18, "Sebastian Schelter" wrote:
> >>>>>>>>
> >>>>>>>>> Hi Rafal,
> >>>>>>>>>
> >>>>>>>>> can you file a ticket for this problem at
> >>>>>>>>> https://issues.apache.org/jira/browse/MAHOUT ? We have unit tests
> >>>>>>>>> that check whether this happens, and currently they pass. I can only
> >>>>>>>>> imagine that the problem occurs in larger datasets, where we sample
> >>>>>>>>> the data in some places. Can you describe a scenario/dataset where
> >>>>>>>>> this happens?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Sebastian
> >>>>>>>>>
> >>>>>>>>> 2013/7/30 Rafal Lukawiecki
> >>>>>>>>>
> >>>>>>>>>> I'm new here, just registered. Many thanks to everyone for working
> >>>>>>>>>> on an amazing piece of software; thank you for building Mahout and
> >>>>>>>>>> for your support. My apologies if this is not the right place to ask
> >>>>>>>>>> the question. I have searched for the issue, and I can see this
> >>>>>>>>>> problem has been reported here:
> >>>>>>>>>> http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items
> >>>>>>>>>>
> >>>>>>>>>> Unfortunately, the trail leads to the newsgroups, and I have not yet
> >>>>>>>>>> found a way to get an answer from them without asking you.
> >>>>>>>>>>
> >>>>>>>>>> Essentially, I am running a Hadoop RecommenderJob from Mahout 0.7,
> >>>>>>>>>> and I am finding that it recommends items that the user has already
> >>>>>>>>>> expressed a preference for in their input file. I understand that
> >>>>>>>>>> this should not be happening, and I am not sure if there is a known
> >>>>>>>>>> fix or if I should be looking for a workaround (such as using the
> >>>>>>>>>> entire input as the filterFile).
> >>>>>>>>>>
> >>>>>>>>>> I will double-check that there is no error on my side, but so far it
> >>>>>>>>>> does not seem that way.
> >>>>>>>>>>
> >>>>>>>>>> Many thanks and my regards from Ireland,
> >>>>>>>>>> Rafal Lukawiecki
>
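
For the test case discussed in this thread, one option is to compare the job's
output with the input preferences and list every recommended item the user had
already expressed a preference for. The Python sketch below assumes the input
is `userID,itemID[,value]` CSV and that each output line has the form
`userID<TAB>[itemID:score,...]`; both are assumptions to check against your
own files. The function names are hypothetical, not part of Mahout.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def load_prefs(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each userID to the itemIDs it already has a preference for.

    Assumes "userID,itemID" or "userID,itemID,value" lines; blank lines are skipped.
    """
    prefs: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(lines):
        if row:
            prefs[row[0].strip()].add(row[1].strip())
    return prefs


def parse_recommendations(line: str) -> Tuple[str, List[str]]:
    """Split an output line assumed to look like "userID<TAB>[item:score,item:score]"."""
    user_id, _, rest = line.rstrip("\n").partition("\t")
    body = rest.strip().strip("[]")
    # Keep only the itemID part of each "item:score" entry.
    items = [entry.split(":", 1)[0].strip() for entry in body.split(",") if entry.strip()]
    return user_id.strip(), items


def already_preferred(pref_lines: Iterable[str], rec_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (userID, itemID) pairs recommended despite an existing preference."""
    prefs = load_prefs(pref_lines)
    hits: List[Tuple[str, str]] = []
    for line in rec_lines:
        if not line.strip():
            continue
        user_id, items = parse_recommendations(line)
        known = prefs.get(user_id, set())  # .get avoids adding unknown users
        hits.extend((user_id, item) for item in items if item in known)
    return hits


if __name__ == "__main__":
    prefs = ["1,10,5.0", "1,11,3.0", "2,10,1.0"]
    recs = ["1\t[11:4.5,12:3.9]", "2\t[12:2.0]"]
    print(already_preferred(prefs, recs))  # [('1', '11')]
```

An empty result on the same input across two platforms would suggest the
filtering works there; a non-empty one gives concrete user/item pairs to attach
to a bug report.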
