Hi,

Thank you for the response! What you said makes sense. Here is a link to the other property:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.6/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java#RecommenderJob.0DEFAULT_MAX_SIMILARITIES_PER_ITEM

Supposing I have a sufficiently large cluster to process the data, would increasing the values necessarily give me better recommendations? Which one do you think would have the largest impact on the quality of the recommendations?

Brian

On Thu, Sep 12, 2013 at 7:05 AM, 林伟 wrote:
> Hi Brian & Miliauskas,
>
> I am a data mining engineer on the Taobao recommendation team. Over the past month I have read all of the Mahout itemCF code, so maybe I can answer this question.
>
> We can think of the input to itemCF for one user as an item vector, like this (the notation is from the JSON object model):
> <userid, [ {item1, pref(u, i1)}, {item2, pref(u, i2)}, ..... {itemN, pref(u, iN)} ]>
> So maxPrefsPerUser is the maximum length of this item vector. If a user has preferences for more items than this number, the vector is sampled down to stay within the limit.
>
> We can think of the output of itemCF for one item as a similarity vector, like this:
> <item1, [ {item2, sim(2,1)}, {item3, sim(3,1)}, .... {itemK, sim(K,1)} ]>
> So maxSimilaritiesPerItem is the maximum length of this similarity vector: if item1 has more similar items than this number, Mahout outputs only the top 'maxSimilaritiesPerItem' items.
>
> I couldn't find the parameter 'maxPrefsPerUserItemSimilarity'. Can you give me a link to it?
>
> Thanks
>
> 2013/9/12 Darius Miliauskas
>
> > Hi Brian,
> >
> > This question is also relevant for me. Perhaps somebody will give more details, because I am still learning myself. But I guess you can try changing the parameters, check the performance, and write about it here so that everybody gains more knowledge!
> >
> > In general, lower values should make the job run faster, because Mahout runs these algorithms on Hadoop and smaller limits mean less data to process. I think it could help to try the algorithms on several samples of data and check whether you are missing some important recommendations.
> > Let's say you choose maxSimilaritiesPerItem as 4 and you miss some recommendations; then you should increase the value. It is a balance between performance and better results, and you have to find that balance. I hope you will share more details about what you find out, because I have noticed that here (on the Mahout mailing list) everybody is asking but only a few are replying and sharing.
> >
> > Thanks,
> >
> > Darius
> >
> > 2013/9/12 Brian Arnold
> >
> > > Hi,
> > >
> > > I am currently trying to run the distributed item-based collaborative filtering algorithm on our Hadoop cluster, and I have a few questions about tweaking its various properties. For the maxPrefsPerUser, maxSimilaritiesPerItem, and maxPrefsPerUserItemSimilarity properties, I was wondering if I could get a more detailed explanation of what they control. I saw the descriptions in the code, but I am wondering how changing these values will affect the results of the algorithm, and whether increasing them will result in better recommendations.
> > >
> > > Thanks
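To make the two caps 林伟 describes in the thread a little more concrete, here is a minimal, simplified Java sketch. It is not Mahout's implementation: the class ItemCfLimitsSketch, its methods, the uniform random sampling, the similarity-weighted-average estimate, and the toy data are all hypothetical assumptions. It shows (1) sampling one user's item vector down to a maximum length, (2) keeping only the top-k entries of one item's similarity vector, and (3) how the estimate for a single candidate item moves as k changes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical, simplified sketch of the two caps discussed in the thread.
// This is NOT Mahout's implementation; names, the sampling strategy, the
// scoring formula and the toy data are illustrative assumptions.
public class ItemCfLimitsSketch {

    // Cap on the per-user item vector: if the user has more than maxPrefs
    // preferences, keep a uniformly random subset of size maxPrefs.
    static Map<String, Double> sampleUserPrefs(Map<String, Double> prefs, int maxPrefs, Random rnd) {
        if (prefs.size() <= maxPrefs) {
            return new LinkedHashMap<>(prefs);
        }
        List<String> items = new ArrayList<>(prefs.keySet());
        Collections.shuffle(items, rnd);
        Map<String, Double> sampled = new LinkedHashMap<>();
        for (String item : items.subList(0, maxPrefs)) {
            sampled.put(item, prefs.get(item));
        }
        return sampled;
    }

    // Cap on the per-item similarity vector: keep only the maxSimilarities
    // most similar items, ordered from most to least similar.
    static Map<String, Double> topSimilarities(Map<String, Double> sims, int maxSimilarities) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(sims.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        Map<String, Double> top = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries.subList(0, Math.min(maxSimilarities, entries.size()))) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    // Assumed scoring rule (a common item-based estimate, not necessarily
    // Mahout's): similarity-weighted average of the user's preferences for
    // the retained neighbours. Returns NaN if no neighbour was rated.
    static double estimate(Map<String, Double> userPrefs, Map<String, Double> neighbours) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (Map.Entry<String, Double> e : neighbours.entrySet()) {
            Double pref = userPrefs.get(e.getKey());
            if (pref != null) {
                weightedSum += e.getValue() * pref;
                weightTotal += Math.abs(e.getValue());
            }
        }
        return weightTotal == 0.0 ? Double.NaN : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        // Made-up preferences of one user.
        Map<String, Double> userPrefs = new LinkedHashMap<>();
        userPrefs.put("item1", 5.0);
        userPrefs.put("item2", 3.0);
        userPrefs.put("item3", 4.0);
        userPrefs.put("item4", 1.0);
        System.out.println("sampled to 3: " + sampleUserPrefs(userPrefs, 3, new Random(42)));

        // Made-up similarities between a candidate item9 and the user's items.
        Map<String, Double> simsOfItem9 = new LinkedHashMap<>();
        simsOfItem9.put("item1", 0.9);
        simsOfItem9.put("item2", 0.2);
        simsOfItem9.put("item3", 0.7);
        simsOfItem9.put("item4", 0.4);

        // The estimate for item9 shifts as the similarity cap k changes.
        for (int k : new int[] {1, 2, 4}) {
            Map<String, Double> top = topSimilarities(simsOfItem9, k);
            System.out.printf("k=%d neighbours=%s estimate=%.3f%n", k, top, estimate(userPrefs, top));
        }
    }
}
```

With these toy numbers the estimate for item9 is about 5.0 at k=1, 4.56 at k=2 and 3.77 at k=4. A larger cap changes which neighbours contribute, but on its own that does not say whether the result is better; checking the recommendations on your own data, as Darius suggests, is the way to tell.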
