There are different lambdas here, really. In the paper Danny
mentioned, lambda is nearer to 0.01, but it is multiplied by the
number of items each user rated (and the number of users who rated
each item).
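
For reference, that weighted regularization term looks roughly like
this (my sketch, writing x_u / y_i for the user / item factor vectors
and n_u / n_i for the rating counts):

  lambda * ( sum_u n_u * ||x_u||^2  +  sum_i n_i * ||y_i||^2 )

so the effective penalty on each factor vector scales with how much
data backs it, and a nominal lambda of 0.01 ends up considerably
larger in practice.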

In the first example, where it's 500, it's "competing" with much
larger squared-error terms, since that first test predicts number of
views rather than 0/1 values. In the second, I suppose 150 does seem
unintuitively high given how the squared-error term has changed, but
maybe that's just how it works out.

In the full model, the squared-error terms get weights that grow
with "alpha", which is given as 40, so a large lambda may again be
useful. Personally, I use a variant where the second paper's "lambda"
is replaced by "lambda * alpha", since it doesn't make sense to me
not to couple those constants. My starting default is lower,
something like lambda * alpha = 1 to 10.
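
To make the scales concrete, the cost I have in mind is roughly the
one from that paper (a sketch in its notation, with r_ui the raw
strength, p_ui the 0/1 preference, and c_ui the confidence weight):

  sum_{u,i} c_ui * (p_ui - x_u . y_i)^2
    + lambda * ( sum_u ||x_u||^2 + sum_i ||y_i||^2 )
  where c_ui = 1 + alpha * r_ui

With alpha = 40 the squared-error sum is inflated by roughly that
factor, which is why replacing lambda with lambda * alpha keeps the
two terms on a comparable scale.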

I haven't double-checked the implementation to see whether lambda is
used any differently from the above, but that's what I'd look for.
Try a few values at different orders of magnitude, as in the sketch
below; that will quickly show where the right size lies. In the end
it depends on the data.
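
For example, something along these lines (just a sketch reusing the
flags from your command below; the per-lambda output and temp dirs
are my own naming, and you'd still evaluate each run with whatever
metric you trust):

  # hypothetical sweep over orders of magnitude of lambda
  for LAMBDA in 0.01 0.1 1 10 100; do
    $MAHOUT parallelALS --input /tmp/mahout-work-cloudera/input.txt \
      --output ${WORK_DIR}/als/out-lambda-${LAMBDA} \
      --tempDir ${WORK_DIR}/als/tmp-lambda-${LAMBDA} \
      --numFeatures 20 --numIterations 40 \
      --lambda ${LAMBDA} --implicitFeedback true
  done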


On Tue, Dec 11, 2012 at 8:44 AM, Royi Ronen <[email protected]> wrote:
> Thanks Sean.
> I am taking a look at this paper:
> www2.research.att.com/~yifanhu/PUB/cf.pdf
>
> And it seems like they use very high values for lambda, between 150 and 500.
> Am I missing anything?
> I was wondering whether the algorithm implemented in Mahout should be
> run with a low lambda (for implicit feedback without strength).
>
> Thanks a lot,
> Royi
>
> On Mon, Dec 10, 2012 at 6:53 PM, Sean Owen <[email protected]> wrote:
>
>> The versions of this algorithm where the value is 1 (no strength,
>> implicit only) will have a cost function where the squared-error
>> terms are relatively smaller -- because the errors are otherwise
>> weighted by that cu = 1 + alpha * ru term, which is largish. So the
>> regularization term is relatively larger, all else being equal.
>> This value of lambda is fairly low and looks like the kind of value
>> used in the original paper cited here (without strengths). So it's
>> fine.
>>
>> I find you need something larger when using the second version,
>> with strengths, since a lambda of this size will make the
>> regularization term orders of magnitude smaller than the other
>> terms. I actually use lambda * alpha instead, since it should scale
>> with alpha the way the squared-error terms' weights do.
>>
>> On Mon, Dec 10, 2012 at 4:41 PM, Sebastian Schelter <[email protected]> wrote:
>> > The usage seems to be OK; I'm not sure whether the regularization
>> > value (lambda) works well for the implicit variant of the
>> > algorithm, though.
>> >
>> > The algorithm should work with binary data, but it was originally
>> > designed to incorporate the strength of the implicit interaction
>> > (like number of views, etc.).
>> >
>> > /s
>> >
>> > On 10.12.2012 17:27, ronen.royi wrote:
>> >>
>> >> Thanks! Could you confirm the correctness of usage?
>> >>
>> >> Sent from Samsung Mobile
>> >>
>> >> Sebastian Schelter <[email protected]> wrote:
>> >>
>> >> Hi Royi,
>> >>
>> >> If you specify implicitFeedback=true, then another variant of ALS is
>> >> used that is described in this paper:
>> >>
>> >> Collaborative Filtering for Implicit Feedback Datasets
>> >> www2.research.att.com/~yifanhu/PUB/cf.pdf
>> >>
>> >> /s
>> >>
>> >> On 10.12.2012 17:07, Danny Bickson wrote:
>> >>> As far as I know the ALS algorithm is described in the paper:
>> >>>
>> >>>
>> >>> Yunhong Zhou, Dennis Wilkinson, Robert Schreiber and Rong Pan.
>> >>> Large-Scale Parallel Collaborative Filtering for the Netflix Prize.
>> >>> Proceedings of the 4th International Conference on Algorithmic
>> >>> Aspects in Information and Management, Shanghai, China, pp. 337-348, 2008.
>> >>>
>> >>> Best,
>> >>>
>> >>> Dr. Danny Bickson
>> >>> Project Scientist, Machine Learning Dept.
>> >>> Carnegie Mellon University
>> >>>
>> >>>
>> >>>
>> >>> On Mon, Dec 10, 2012 at 5:59 PM, Royi Ronen <[email protected]> wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I am looking for confirmation regarding my usage of Mahout matrix
>> >>>> factorization with implicit feedback.
>> >>>> The input file is of the form <user,item,1>, as advised in one of the
>> >>>> Mahout forums.
>> >>>> All my usage points are positive (i.e., the user watched the movie).
>> >>>>
>> >>>> I changed the MovieLens Example:
>> >>>>
>> >>>> $MAHOUT parallelALS --input /tmp/mahout-work-cloudera/input.txt \
>> >>>>      --output ${WORK_DIR}/als/out --tempDir ${WORK_DIR}/als/tmp \
>> >>>>      --numFeatures 20 --numIterations 40 --lambda 0.065 --implicitFeedback true
>> >>>>
>> >>>> # compute recommendations
>> >>>> $MAHOUT recommendfactorized --input ${WORK_DIR}/als/out/userRatings/ \
>> >>>>      --output ${WORK_DIR}/recommendations/ --userFeatures ${WORK_DIR}/als/out/U/ \
>> >>>>      --itemFeatures ${WORK_DIR}/als/out/M/ --numRecommendations 10 --maxRating 5
>> >>>>
>> >>>>
>> >>>> This runs OK and gives recommendations that sometimes seem to be
>> >>>> biased towards popular items.
>> >>>> I would like to verify that this is the right way to run it.
>> >>>>
>> >>>> Also - does anyone know which algorithm is used to factorize?
>> >>>>
>> >>>> Thanks very much :)
>> >>>>
>> >>>
>> >>
>> >
>>
