Hi,

Thank you for the response!  What you said makes sense.  Here is a link to
the other property:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.6/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java#RecommenderJob.0DEFAULT_MAX_SIMILARITIES_PER_ITEM

Supposing I have a sufficiently large cluster to process the data, would
increasing the values necessarily give me better recommendations?  Which
do you feel would have the largest impact on the quality of the
recommendations?

Brian


On Thu, Sep 12, 2013 at 7:05 AM, 林伟 wrote:

> Hi Brian & Miliauskas,
>
> I am a data mining engineer on the Taobao recommendation team. Over the
> past month I have read all of the Mahout itemCF code, so maybe I can answer
> this question.
>
> Think of the input to itemCF for one user as an item vector, like this
> (the notation is borrowed from the JSON object model):
> <userid,  [ {item1, pref(u, i1)}, {item2, pref(u, i2)}, ..... {itemN,
> pref(u, iN)} ]>
> So maxPrefsPerUser is the maximum length of this item vector. If a user
> has preferences for more items than this number, a sample is taken to
> enforce the limit.
>
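A minimal sketch of the capping described above, assuming a user's vector is a plain dict of item ID to preference and that the limit is enforced by uniform random sampling as described. The helper cap_user_vector and the sample data are hypothetical, and Mahout's own selection logic may differ.

```python
import random


def cap_user_vector(prefs, max_prefs_per_user, seed=None):
    """Return at most max_prefs_per_user entries from one user's vector.

    prefs: dict of item id -> preference value (hypothetical representation,
    not Mahout's Vector class). If the user has more preferences than the cap,
    a uniform random sample without replacement is kept (an assumption about
    the sampling strategy); otherwise the vector is returned unchanged.
    """
    if len(prefs) <= max_prefs_per_user:
        return dict(prefs)
    rng = random.Random(seed)  # seeded so the sample is reproducible
    kept = rng.sample(sorted(prefs), max_prefs_per_user)
    return {item: prefs[item] for item in kept}


# Hypothetical user with 5 preferences, capped at 3.
user_vector = {101: 5.0, 102: 3.0, 103: 4.5, 104: 1.0, 105: 2.0}
capped = cap_user_vector(user_vector, max_prefs_per_user=3, seed=42)
print(len(capped))  # 3
```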
> Similarly, think of the output of itemCF for one item as a similarity
> vector, like this:
> <item1,  [ {item2, sim(2,1)}, {item3, sim(3,1)}, .... {itemK, sim(K,1)} ]>
> So maxSimilaritiesPerItem is the maximum length of this similarity vector.
> If item1 has more similar items than this number, Mahout outputs only the
> top 'maxSimilaritiesPerItem' items.
>
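The top-K truncation can be sketched in the same spirit. top_similar_items is a hypothetical helper that assumes one item's similarity vector is a dict of other item ID to similarity score; it is not Mahout's actual data structure or code.

```python
import heapq


def top_similar_items(similarities, max_similarities_per_item):
    """Keep only the K most similar items for one item's similarity vector.

    similarities: dict of other item id -> similarity score (hypothetical).
    Returns (item, score) pairs sorted by descending score.
    """
    return heapq.nlargest(max_similarities_per_item,
                          similarities.items(),
                          key=lambda pair: pair[1])


# Hypothetical similarities for item1, truncated to the top 3.
sims_for_item1 = {2: 0.91, 3: 0.15, 4: 0.78, 5: 0.42, 6: 0.66}
print(top_similar_items(sims_for_item1, 3))
# [(2, 0.91), (4, 0.78), (6, 0.66)]
```

heapq.nlargest avoids sorting the whole vector when K is much smaller than the number of candidate items.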
> As for the parameter 'maxPrefsPerUserItemSimilarity', I haven't found it.
> Can you give me a link to it?
>
> Thanks
>
>
>
> 2013/9/12 Darius Miliauskas
>
> > Hi Brian,
> >
> > This question is also relevant for me. Perhaps somebody will give more
> > details, because I am still learning myself. But I guess you could try
> > changing the parameters, checking the performance, and writing about it
> > here so that everybody learns something!
> >
> > In general, lower values should make the job run faster, because the
> > underlying Hadoop jobs have less data to process. I think it could help
> > to try the algorithms on several samples of your data and check whether
> > you are missing some important recommendations. Say you set
> > maxSimilaritiesPerItem to 4 and you miss some recommendations; then you
> > should increase the value. It is a balance between performance and
> > better results, and you have to find that balance. I hope you will share
> > more details about what you find out, because I have noticed that here
> > (on the Mahout mailing list) everybody asks but only a few reply and
> > share.
> >
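One rough way to check whether a smaller value is "missing some important recommendations" is to compare, per user, the recommendations from a run with a large value against a run with a smaller one. The helper recommendation_overlap and the two lists are hypothetical, not Mahout output or a Mahout feature.

```python
def recommendation_overlap(baseline, candidate):
    """Fraction of the baseline recommendations that the cheaper run still returns.

    baseline: recommended item ids from a run with larger parameter values.
    candidate: recommended item ids from a run with smaller parameter values.
    Both are hypothetical per-user lists, e.g. parsed from two job outputs.
    """
    base = set(baseline)
    if not base:
        return 1.0  # nothing to lose if the baseline is empty
    return len(base & set(candidate)) / len(base)


# Hypothetical top-5 lists for one user from two runs.
with_100 = [11, 42, 7, 19, 3]
with_4 = [11, 42, 8, 19, 27]
print(recommendation_overlap(with_100, with_4))  # 0.6
```

An overlap close to 1.0 across most users suggests the smaller setting loses little; this is only a proxy and says nothing about whether the baseline recommendations are themselves good.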
> >
> > Thanks,
> >
> > Darius
> >
> >
> > 2013/9/12 Brian Arnold
> >
> > > Hi,
> > >
> > > I am currently trying to run the distributed item-based collaborative
> > > filtering algorithm on our Hadoop cluster, and I have a few questions
> > > about tuning its various properties.  For the maxPrefsPerUser,
> > > maxSimilaritiesPerItem, and maxPrefsPerUserItemSimilarity properties, I
> > > was wondering if I could get a more detailed explanation of what they
> > > control.  I saw the description in the code, but I am wondering how
> > > changing these values will affect the results of the algorithm, and
> > > whether increasing them will result in better recommendations.
> > >
> > > Thanks
> > >
> >
>
