Hi Brian & Miliauskas,
I am a data mining engineer from the Taobao recommendation team. Over the
past month, I have read all the code of Mahout itemCF, so maybe I can
answer this question.
We can consider the input of itemCF for one user to be an item vector, like
this (the notation is borrowed from the JSON object model):
<userid, [ {item1, pref(u, i1)}, {item2, pref(u, i2)}, ..., {itemN,
pref(u, iN)} ]>
So maxPrefsPerUser is the maximum length of the item vector. If a user has
preferences for more items than this number, sampling is applied to
enforce the limit.
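To make the idea of capping the item vector concrete, here is a minimal sketch in Java. It is not Mahout's code: the class name PrefSamplingSketch and the method capPrefs are hypothetical, and I assume plain uniform random sampling, which may differ from the strategy Mahout actually uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not taken from the Mahout source.
public class PrefSamplingSketch {

    /**
     * Returns a copy of prefs holding at most maxPrefsPerUser entries.
     * Assumption: when the vector is too long, entries are kept
     * uniformly at random (Mahout's real sampling may differ).
     */
    static <T> List<T> capPrefs(List<T> prefs, int maxPrefsPerUser, Random rng) {
        if (prefs.size() <= maxPrefsPerUser) {
            return new ArrayList<>(prefs); // already within the limit
        }
        List<T> shuffled = new ArrayList<>(prefs);
        Collections.shuffle(shuffled, rng); // random order
        return new ArrayList<>(shuffled.subList(0, maxPrefsPerUser));
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            items.add("item" + i);
        }
        List<String> capped = capPrefs(items, 5, new Random(42));
        System.out.println(capped.size() + " of " + items.size() + " kept: " + capped);
    }
}
```

The point of the sketch is that a lower cap means less input per user and so less work, at the cost of discarding some of that user's preferences.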
We can also consider the output of itemCF for one item to be a similarity
vector, like this:
<item1, [ {item2, sim(2,1)}, {item3, sim(3,1)}, ..., {itemK, sim(K,1)} ]>
So maxSimilaritiesPerItem is the maximum length of the similarity vector. If
item1 has more similar items than this number, Mahout outputs only the top
'maxSimilaritiesPerItem' items.
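The top-K truncation of the similarity vector can be sketched in Java as follows. Again this is only an illustration, not Mahout's implementation: the names TopKSimilaritiesSketch, SimilarItem and topK are hypothetical, and I assume "top" means highest similarity value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not taken from the Mahout source.
public class TopKSimilaritiesSketch {

    /** One entry of the similarity vector: {itemK, sim(K,1)}. */
    static final class SimilarItem {
        final long itemId;
        final double similarity;

        SimilarItem(long itemId, double similarity) {
            this.itemId = itemId;
            this.similarity = similarity;
        }
    }

    /** Keeps the maxSimilaritiesPerItem most similar items, best first. */
    static List<SimilarItem> topK(List<SimilarItem> candidates, int maxSimilaritiesPerItem) {
        if (maxSimilaritiesPerItem <= 0) {
            return new ArrayList<>();
        }
        Comparator<SimilarItem> bySimilarity =
                Comparator.comparingDouble((SimilarItem s) -> s.similarity);
        // Min-heap: the weakest of the kept items sits at the head.
        PriorityQueue<SimilarItem> heap = new PriorityQueue<>(bySimilarity);
        for (SimilarItem candidate : candidates) {
            heap.offer(candidate);
            if (heap.size() > maxSimilaritiesPerItem) {
                heap.poll(); // drop the current weakest
            }
        }
        List<SimilarItem> result = new ArrayList<>(heap);
        result.sort(bySimilarity.reversed()); // highest similarity first
        return result;
    }

    public static void main(String[] args) {
        List<SimilarItem> sims = new ArrayList<>();
        sims.add(new SimilarItem(2, 0.10));
        sims.add(new SimilarItem(3, 0.90));
        sims.add(new SimilarItem(4, 0.50));
        sims.add(new SimilarItem(5, 0.70));
        for (SimilarItem s : topK(sims, 2)) {
            System.out.println(s.itemId + " -> " + s.similarity);
        }
    }
}
```

With a small limit, weaker neighbours such as item 2 and item 4 in this example never reach the output, which is how a low maxSimilaritiesPerItem can cause some recommendations to be missed.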
As for the parameter 'maxPrefsPerUserItemSimilarity', I haven't found it.
Can you give me a link to where it appears?
Thanks
2013/9/12 Darius Miliauskas <[email protected]>
> Hi, Brian,
>
> this question is also relevant for me. Perhaps somebody will give more
> details, because I am still learning myself. But I guess you can try
> changing the parameters, check the performance, and write about it here so
> that everybody gains more knowledge!
>
> In general, if these values are lower, the job should run faster because
> Mahout runs these algorithms as Hadoop jobs, which then process less data.
> I think it would help to try the algorithm on several pieces of data and
> check whether you are missing some important recommendations. Say you
> choose "maxSimilaritiesPerItem" as 4 and you miss some recommendations;
> then you should increase the value. It is a balance between performance and
> better results, and you should find that balance. I hope you will share
> more details about what you find out, because I have noticed that here (on
> the Mahout mailing list) everybody is asking but only a few are replying
> and sharing.
>
>
> Thanks,
>
> Darius
>
>
> 2013/9/12 Brian Arnold <[email protected]>
>
> > Hi,
> >
> > I am currently trying to run the distributed Item Based Collaborative
> > filtering algorithm on our Hadoop cluster, and I have a few questions
> > regarding tweaking the various properties of the algorithm. For the
> > maxPrefsPerUser, maxSimilaritiesPerItem, and maxPrefsPerUserItemSimilarity
> > properties I was wondering if I could get a more detailed explanation of
> > what these properties control. I saw the description in the code, but I am
> > just wondering how changing these values will affect the results of the
> > algorithm, and will increasing them result in a better recommendation.
> >
> > Thanks
> >
>