Thanks Sean. I am taking a look at this paper: www2.research.att.com/~yifanhu/PUB/cf.pdf
And it seems like they use very high values for lambda, between 150 and 500. Am I missing anything? I was wondering whether the algorithm implemented in Mahout should be run with a low lambda for implicit feedback without strength.

Thanks a lot,
Royi

On Mon, Dec 10, 2012 at 6:53 PM, Sean Owen <[email protected]> wrote:
> The versions of this algorithm where the value is 1 (no strength,
> implicit only) will have a cost function where the squared-error terms
> are relatively smaller -- because the errors are otherwise weighted by
> the cu = 1 + alpha * ru term, which is largeish. So the regularization
> term is relatively larger, all else equal. This value of lambda is
> fairly low and looks like the kind of value used in the original paper
> cited here (without strengths). So it's fine.
>
> I find you need something larger when using the second version, with
> strengths, since a lambda of this size will make the regularization
> term orders of magnitude smaller than the other terms. I actually use
> lambda * alpha instead, since it should scale with alpha the way the
> squared-error term's weights do.
>
> On Mon, Dec 10, 2012 at 4:41 PM, Sebastian Schelter <[email protected]> wrote:
> > The usage seems to be OK; I'm not sure whether the regularization value
> > (lambda) works well for the implicit variant of the algorithm, though.
> >
> > The algorithm should work with binary data, but was originally designed
> > to incorporate the strength of the implicit interaction (like the
> > number of views, etc.).
> >
> > /s
> >
> > On 10.12.2012 17:27, ronen.royi wrote:
> >> Thanks! Could you confirm the correctness of usage?
> >>
> >> Sent from Samsung Mobile
> >>
> >> Sebastian Schelter <[email protected]> wrote:
> >> Hi Royi,
> >>
> >> If you specify implicitFeedback=true, then another variant of ALS is
> >> used that is described in this paper:
> >>
> >> Collaborative Filtering for Implicit Feedback Datasets
> >> www2.research.att.com/~yifanhu/PUB/cf.pdf
> >>
> >> /s
> >>
> >> On 10.12.2012 17:07, Danny Bickson wrote:
> >>> As far as I know, the ALS algorithm is described in this paper:
> >>>
> >>> Yunhong Zhou, Dennis Wilkinson, Robert Schreiber and Rong Pan.
> >>> Large-Scale Parallel Collaborative Filtering for the Netflix Prize.
> >>> Proceedings of the 4th International Conference on Algorithmic Aspects
> >>> in Information and Management. Shanghai, China, pp. 337-348, 2008.
> >>>
> >>> Best,
> >>>
> >>> Dr. Danny Bickson
> >>> Project Scientist, Machine Learning Dept.
> >>> Carnegie Mellon University
> >>>
> >>> On Mon, Dec 10, 2012 at 5:59 PM, Royi Ronen <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I am looking for confirmation regarding my usage of Mahout matrix
> >>>> factorization with implicit feedback.
> >>>> The input file is of the form <user,item,1>, as advised in one of the
> >>>> Mahout forums.
> >>>> All my usage points are positive (i.e., the user watched the movie).
> >>>>
> >>>> I changed the MovieLens example:
> >>>>
> >>>> $MAHOUT parallelALS --input /tmp/mahout-work-cloudera/input.txt \
> >>>>   --output ${WORK_DIR}/als/out \
> >>>>   --tempDir ${WORK_DIR}/als/tmp --numFeatures 20 --numIterations 40 \
> >>>>   --lambda 0.065 --implicitFeedback true
> >>>>
> >>>> # compute recommendations
> >>>> $MAHOUT recommendfactorized --input ${WORK_DIR}/als/out/userRatings/ \
> >>>>   --output ${WORK_DIR}/recommendations/ \
> >>>>   --userFeatures ${WORK_DIR}/als/out/U/ \
> >>>>   --itemFeatures ${WORK_DIR}/als/out/M/ \
> >>>>   --numRecommendations 10 --maxRating 5
> >>>>
> >>>> This runs OK and gives recommendations that sometimes seem to be
> >>>> biased towards popular items.
> >>>> I would like to verify that this is the right way to run it.
> >>>>
> >>>> Also, does anyone know which algorithm is used to factorize?
> >>>>
> >>>> Thanks very much :)
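To make Sean's point about lambda scaling concrete: with binary input and confidence cu = 1 + alpha * ru, the weighted squared-error terms can dwarf an unscaled regularization term. Here is a quick back-of-the-envelope sketch (the factor values and alpha = 40 are hypothetical illustration choices, not Mahout's internals):

```python
import numpy as np

alpha = 40.0   # confidence scaling (illustrative value; the paper suggests this range)
lam = 0.065    # the lambda passed to parallelALS above
k = 20         # numFeatures, as in the command above

x = np.full(k, 0.1)  # hypothetical user factor vector
y = np.full(k, 0.1)  # hypothetical item factor vector

pred = x @ y                        # predicted preference: 20 * 0.01 = 0.2
c = 1.0 + alpha * 1.0               # confidence for an observed (ru = 1) cell: 41
error_term = c * (1.0 - pred) ** 2  # 41 * 0.64 = 26.24
reg_term = lam * (x @ x + y @ y)    # 0.065 * (0.2 + 0.2) = 0.026
print(error_term / reg_term)        # roughly three orders of magnitude apart
```

This is why, with strengths, Sean rescales to lambda * alpha: the regularization term then grows with alpha just as the error weights do.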

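For anyone following along, the implicit-feedback ALS of the Hu/Koren paper linked above can be sketched in a few lines of NumPy. This is a toy dense illustration (alpha and lambda values are assumptions, and a real implementation like Mahout's works on sparse distributed data), but it shows the confidence-weighted least-squares updates:

```python
import numpy as np

def implicit_als(R, k=2, alpha=40.0, lam=0.1, iters=10, seed=0):
    """Toy dense implicit-feedback ALS in the style of the paper.
    R: binary user-by-item matrix. Returns factor matrices X, Y."""
    m, n = R.shape
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.standard_normal((m, k))
    Y = 0.1 * rng.standard_normal((n, k))
    C = 1.0 + alpha * R               # confidence: cu = 1 + alpha * ru
    P = (R > 0).astype(float)         # binary preference
    I = lam * np.eye(k)               # regularization
    for _ in range(iters):
        for u in range(m):            # solve the weighted least squares per user
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n):            # symmetric solve per item
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

# A tiny <user,item,1>-style binary matrix, as in the question above:
R = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])
X, Y = implicit_als(R)
print(np.round(X @ Y.T, 2))  # observed cells reconstruct much higher than unobserved
```

Note the popularity bias mentioned above is a known tendency of this formulation: unobserved cells get low confidence, so frequently interacted-with items dominate the reconstruction.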