Is there a set of parameters I could pass to RecommenderJob to avoid that 
random sampling, so that I can create a test case for the issue I have 
experienced? Would setting --maxSimilaritiesPerItem and/or 
--maxPrefsPerUserInItemSimilarity help? Many thanks.
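
For reference, a minimal sketch of the kind of invocation in question. The
paths and concrete values are placeholders, the option names are the ones
discussed in this thread for Mahout 0.7, and whether raising these caps above
the dataset's actual maxima is enough to avoid the sampling is exactly the
open question here:

    # hypothetical RecommenderJob run (bin/mahout shortcut for
    # org.apache.mahout.cf.taste.hadoop.item.RecommenderJob) with all
    # sampling-related caps raised above the dataset's actual maxima
    mahout recommenditembased \
      --input /data/prefs.csv \
      --output /data/recs \
      --similarityClassname SIMILARITY_COOCCURRENCE \
      --numRecommendations 10 \
      --maxPrefsPerUser 10000 \
      --maxPrefsPerUserInItemSimilarity 10000 \
      --maxSimilaritiesPerItem 100000 \
      --filterFile /data/filter.csv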

On 7 Aug 2013, at 16:12, Sebastian Schelter <[email protected]>
 wrote:

It could affect the results even in this case, as we also sample the
preferences when computing similar items.

On 07.08.2013 17:07, Rafal Lukawiecki wrote:
> Thank you, Sebastian. Would the random sampling affect the results of 
> RecommenderJob in any case? I am setting --maxPrefsPerUser to exceed the 
> actual maximum number of preferences expressed by any user.
> 
> Rafal
> 
> On 7 Aug 2013, at 15:48, Sebastian Schelter <[email protected]>
> wrote:
> 
> The code in trunk allows you to specify a randomSeed; the older
> versions unfortunately don't.
> 
> On 07.08.2013 16:35, Rafal Lukawiecki wrote:
>> Hi Sebastian,
>> 
>> The quantity of returned "duplicates" is much too large to be caused just 
>> by the randomness of the sampling. I wonder if this could be related to 
>> something platform-specific, such as Windows vs. *nix representation of 
>> input files, data types, etc.
>> 
>> For argument's sake, is it possible to fix the seed of the random aspect of 
>> the sampling so I could feed the same input through two platforms and 
>> compare the results?
>> 
>> Rafal
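
On the question of comparing the two platforms' results: short of fixing the
seed, one rough way is to normalise and diff the text output of both runs. A
minimal sketch, assuming the part-r-* files from each run have been copied
locally and contain the usual "userID<TAB>[item:score,...]" lines (the
directory names are placeholders):

    # gather, sort and compare the recommendation output of both runs
    cat run_hdp/part-r-* | sort > hdp.txt
    cat run_osx/part-r-* | sort > osx.txt
    diff hdp.txt osx.txt | head

Because of the unfixed random sampling and possible floating-point
differences, some noise is expected; comparing only the recommended item IDs
per user, ignoring the scores, would be more robust.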
>> 
>> On 7 Aug 2013, at 15:20, Sebastian Schelter <[email protected]>
>> wrote:
>> 
>> Hi Rafal,
>> 
>> this sounds really strange; the bug should not have anything to do with
>> the version of Hadoop that you are running. You might sometimes not see it
>> due to the random sampling of the preferences.
>> 
>> --sebastian
>> 
>> On 07.08.2013 13:53, Rafal Lukawiecki wrote:
>>> Sebastian,
>>> 
>>> I've been doing a little more digging regarding the issue of recommendations 
>>> being generated for already-preferred items. I re-ran the jobs using the 
>>> same data and the same parameters on a different installation of Hadoop, 
>>> and the problem seems to have gone away. For now it looks like the issue 
>>> arises when I run it under Mahout 0.7 and 0.8 using HDP (Hortonworks Data 
>>> Platform) for Windows 1.1.0, with Hadoop 1.1.0. The problem has not shown 
>>> up yet, in my tests, under Hadoop 1.2.1 compiled for OS X. I will work a 
>>> little more to verify my results, but if they stand up, should I still 
>>> report it as a Mahout issue?
>>> 
>>> Rafal  
>>> --
>>> Rafal Lukawiecki
>>> Strategic Consultant and Director 
>>> Project Botticelli Ltd
>>> 
>>> On 1 Aug 2013, at 17:31, Sebastian Schelter <[email protected]> wrote:
>>> 
>>> Setting it to the maximum number should be enough. It would be great if
>>> you could share your dataset and tests.
>>> 
>>> 2013/8/1 Rafal Lukawiecki <[email protected]>
>>> 
>>>> Should I have set that parameter to a value much larger than the maximum
>>>> number of preferences actually expressed by a user?
>>>> 
>>>> I'm working on an anonymised data set. If it works as an error test case,
>>>> I'd be happy to share it for your re-test. I am still hoping it is my
>>>> error, not Mahout's.
>>>> 
>>>> Rafal
>>>> --
>>>> Rafal Lukawiecki
>>>> Pardon brevity, mobile device.
>>>> 
>>>> On 1 Aug 2013, at 17:19, "Sebastian Schelter" <[email protected]> wrote:
>>>> 
>>>>> Ok, please file a bug report detailing what you've tested and what
>>>>> results you got.
>>>>> 
>>>>> Just to clarify, setting maxPrefsPerUser to a high number still does not
>>>>> help? That surprises me.
>>>>> 
>>>>> 
>>>>> 2013/8/1 Rafal Lukawiecki <[email protected]>
>>>>> 
>>>>>> Hi Sebastian,
>>>>>> 
>>>>>> I've rechecked the results, and I'm afraid that the issue has not gone
>>>>>> away, contrary to yesterday's enthusiastic response. Using 0.8, I have
>>>>>> retested with and without the --maxPrefsPerUser 9000 parameter (no user
>>>>>> has more than 5000 prefs). I have also supplied the prefs file without
>>>>>> the preference value, that is, as user,item (one per line), as a
>>>>>> --filterFile, with and without --maxPrefsPerUser, and I am afraid we are
>>>>>> also seeing recommendations for items the user has expressed a prior
>>>>>> preference for.
>>>>>> 
>>>>>> I suppose I need to file a bug report.
>>>>>> 
>>>>>> Rafal
>>>>>> --
>>>>>> Rafal Lukawiecki
>>>>>> Pardon my brevity, sent from a telephone.
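
Regarding the filter file mentioned above: a quick way to derive a user,item
filter from a comma-separated user,item,preference file is simply to drop the
third column. A minimal sketch (the filenames are placeholders):

    # keep only the user and item columns; --filterFile then excludes these
    # user,item pairs from that user's recommendations
    cut -d',' -f1,2 prefs.csv > filter.csv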
>>>>>> 
>>>>>> On 31 Jul 2013, at 22:35, "Rafal Lukawiecki" <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Dear Sebastian,
>>>>>>> 
>>>>>>> It looks like setting --maxPrefsPerUser 10000 has resolved the issue in
>>>>>>> our case: it seems that the most preferences a user had was just about
>>>>>>> 5000, so I doubled it just in case. When I operationalise this model, I
>>>>>>> will make sure to calculate the actual maximum number of preferences and
>>>>>>> set the parameter accordingly. I will double-check the resultset to make
>>>>>>> sure the issue is really gone, as I have only checked the few cases where
>>>>>>> we have spotted a recommendation of a previously preferred item.
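
To calculate that actual maximum number of preferences per user, a one-line
sketch (assuming a comma-separated user,item,preference file; the filename is
a placeholder):

    # count preferences per user (first column) and print the largest count
    cut -d',' -f1 prefs.csv | sort | uniq -c | sort -rn | head -1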
>>>>>>> 
>>>>>>> Would you like me to file a bug, and would you like me to test it on
>>>>>>> 0.8 or another version? I am using 0.7.
>>>>>>> 
>>>>>>> Thanks for your kind support.
>>>>>>> Rafal
>>>>>>> --
>>>>>>> Rafal Lukawiecki
>>>>>>> Strategic Consultant and Director
>>>>>>> Project Botticelli Ltd
>>>>>>> 
>>>>>>> On 31 Jul 2013, at 06:22, Sebastian Schelter <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi Rafal,
>>>>>>> 
>>>>>>> can you try to set the option --maxPrefsPerUser to the maximum number
>>>>>>> of interactions per user and see if you still get the error?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Sebastian
>>>>>>> 
>>>>>>> On 30.07.2013 19:29, Rafal Lukawiecki wrote:
>>>>>>>> Thank you, Sebastian. The data set is not that large, as we are running
>>>>>>>> tests on a subset. It is about 24k users, 40k items, and the preference
>>>>>>>> file has 65k preferences as triples. This was using Similarity
>>>>>>>> Cooccurrence.
>>>>>>>> 
>>>>>>>> I can see if I could anonymise the data set to share, if that would be
>>>>>>>> helpful.
>>>>>>>> 
>>>>>>>> Thanks for your kind help.
>>>>>>>> 
>>>>>>>> Rafal
>>>>>>>> --
>>>>>>>> Rafal Lukawiecki
>>>>>>>> Pardon my brevity, sent from a telephone.
>>>>>>>> 
>>>>>>>> On 30 Jul 2013, at 18:18, "Sebastian Schelter" <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hi Rafal,
>>>>>>>>> 
>>>>>>>>> can you issue a ticket for this problem at
>>>>>>>>> https://issues.apache.org/jira/browse/MAHOUT ? We have unit tests that
>>>>>>>>> check whether this happens and currently they work fine. I can only
>>>>>>>>> imagine that the problem occurs in larger datasets where we sample the
>>>>>>>>> data in some places. Can you describe a scenario/dataset where this
>>>>>>>>> happens?
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Sebastian
>>>>>>>>> 
>>>>>>>>> 2013/7/30 Rafal Lukawiecki <[email protected]>
>>>>>>>>> 
>>>>>>>>>> I'm new here, just registered. Many thanks to everyone for working on
>>>>>>>>>> an amazing piece of software; thank you for building Mahout and for
>>>>>>>>>> your support. My apologies if this is not the right place to ask the
>>>>>>>>>> question. I have searched for the issue, and I can see this problem
>>>>>>>>>> has been reported here:
>>>>>>>>>> http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items
>>>>>>>>>> 
>>>>>>>>>> Unfortunately, the trail leads to the newsgroups, and I have not yet
>>>>>>>>>> found a way to get an answer from them without asking you.
>>>>>>>>>> 
>>>>>>>>>> Essentially, I am running a Hadoop RecommenderJob from Mahout 0.7, and
>>>>>>>>>> I am finding that it is recommending items that the user has already
>>>>>>>>>> expressed a preference for in their input file. I understand that this
>>>>>>>>>> should not be happening, and I am not sure if there is a known fix or
>>>>>>>>>> if I should be looking for a workaround (such as using the entire
>>>>>>>>>> input as the filterFile).
>>>>>>>>>> 
>>>>>>>>>> I will double-check that there is no error on my side, but so far it
>>>>>>>>>> does not seem that way.
>>>>>>>>>> 
>>>>>>>>>> Many thanks and my regards from Ireland,
>>>>>>>>>> Rafal Lukawiecki
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> 
>>>>>>>>>> Rafal Lukawiecki
>>>>>>>>>> 
>>>>>>>>>> Strategic Consultant and Director
>>>>>>>>>> 
>>>>>>>>>> Project Botticelli Ltd
>>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 
> 
> 
> 


