This isn't specific to Spark; it's from the original implicit-feedback ALS paper (Hu, Koren, and Volinsky, "Collaborative Filtering for Implicit Feedback Datasets"). alpha doesn't do a whole lot, and it is a global hyperparam. It controls the relative weight of observed versus unobserved user-product interactions in the factorization: each interaction's confidence is 1 + alpha * r, where r is the observed strength (e.g. a count). Higher alpha means it's much more important to faithfully reproduce the interactions that *did* happen as a "1" than to reproduce the interactions that *didn't* happen as a "0".
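A minimal sketch of that weighting, using the c = 1 + alpha * r confidence scheme from the paper (the counts below are made up for illustration):

```python
# Sketch: how a single global alpha turns implicit feedback (e.g. play
# counts) into per-interaction confidence weights. Unobserved entries
# (r = 0) always get confidence 1; observed entries are upweighted in
# proportion to alpha, which is what skews the loss toward reproducing
# the interactions that did happen.

def confidence(r, alpha):
    """Confidence weight for an interaction with observed count r."""
    return 1.0 + alpha * r

# One user's counts for three products; 0 = never interacted.
counts = [5, 1, 0]

for alpha in (1.0, 40.0):
    weights = [confidence(r, alpha) for r in counts]
    print(alpha, weights)
# → 1.0 [6.0, 2.0, 1.0]
# → 40.0 [201.0, 41.0, 1.0]
```

Note the unobserved entry keeps weight 1.0 regardless of alpha; only the ratio between observed and unobserved weights changes, which is why a single global value still matters.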
I don't think there's a good rule of thumb for what value to pick; it can't be less than 0 (and less than 1 doesn't make much sense either), so you might just try values between 1 and 100 and see what gives the best result. My sense is that sparser input generally needs a higher alpha, and maybe someone will tell me that alpha should really be a function of the sparsity, but I've never seen that done.

On Thu, Feb 25, 2016 at 6:33 AM, Hiroyuki Yamada <[email protected]> wrote:

> Hi, I've been doing some POC for CF in MLlib.
> In my environment, ratings are all implicit, so I try to use it with the
> trainImplicit method (in Python).
>
> The trainImplicit method takes alpha as one of its arguments to specify a
> confidence for the ratings, as described in
> <http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html>,
> but the alpha value is global for all the ratings, so I am not sure why we
> need it.
> (If it were per rating, it would make sense to me.)
>
> What is the difference between setting different alpha values for exactly
> the same data set?
>
> I would appreciate it if someone could give me a reasonable explanation
> for this.
>
> Best regards,
> Hiro
