This isn't specific to Spark; it's from the original paper.

alpha doesn't do a whole lot, and it is a global hyperparam. It
controls the relative weight of observed versus unobserved
user-product interactions in the factorization. A higher alpha means
it's much more important to faithfully reproduce the interactions that
*did* happen as a "1" than to reproduce the interactions that *didn't*
happen as a "0".
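
Concretely, in the paper's cost function each observed user-product
cell gets a confidence weight c = 1 + alpha * r (with r the raw
interaction value), while unobserved cells keep a weight of 1, so alpha
just sets how fast confidence grows with the raw counts. A tiny
illustration in plain Python (just the formula, not Spark code):

def confidence(r, alpha):
    # observed cells: c = 1 + alpha * r; unobserved cells stay at c = 1
    return 1.0 + alpha * r

print(confidence(0, 40.0))   # 1.0   -> a cell that didn't happen keeps weight 1
print(confidence(5, 1.0))    # 6.0
print(confidence(5, 40.0))   # 201.0 -> observed cells dominate as alpha grows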

I don't think there's a good rule of thumb about what value to pick;
it can't be less than 0 (less than 1 doesn't make much sense either),
and you might just try values between 1 and 100 to see what gives the
best result.
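
For example, in the Python API you could just sweep a handful of alpha
values with ALS.trainImplicit and compare the resulting models on
held-out data. A rough sketch (the toy data, rank, and iteration counts
below are made up, and for implicit feedback you'd normally compare
models with a ranking metric rather than by eyeballing predicted
scores):

from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="alpha-sweep")

# toy implicit "ratings": counts of user-product interactions
data = sc.parallelize([
    Rating(0, 0, 4.0), Rating(0, 1, 1.0),
    Rating(1, 1, 2.0), Rating(1, 2, 5.0),
    Rating(2, 0, 1.0), Rating(2, 2, 3.0),
])

for alpha in [1.0, 10.0, 40.0, 100.0]:
    model = ALS.trainImplicit(data, rank=10, iterations=10,
                              lambda_=0.01, alpha=alpha)
    # predicted scores for the observed pairs; higher means the model is
    # more confident that the interaction would happen
    preds = model.predictAll(data.map(lambda r: (r.user, r.product))).collect()
    print(alpha, sorted((p.user, p.product, round(p.rating, 3)) for p in preds))

sc.stop()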

I think that, in general, sparser input needs a higher alpha. Maybe
someone will tell me that alpha really should be a function of the
sparsity, but I've never seen that done.



On Thu, Feb 25, 2016 at 6:33 AM, Hiroyuki Yamada <[email protected]> wrote:
> Hi, I've been doing some POC for CF in MLlib.
> In my environment, ratings are all implicit, so I am trying to use the
> trainImplicit method (in Python).
>
> The trainImplicit method takes alpha as one of the arguments to specify a
> confidence for the ratings as described in
> <http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html>,
> but the alpha value is global for all the ratings, so I am not sure why we
> need it.
> (If it were per rating, it would make sense to me, though.)
>
> What difference does it make to set different alpha values for exactly the
> same data set?
>
> I would very much appreciate it if someone could give me a reasonable
> explanation for this.
>
> Best regards,
> Hiro
