Lisen, did you use all m-by-n (user, item) pairs during training? The implicit model penalizes unobserved ratings, while the explicit model doesn't.

-Xiangrui
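For reference, the distinction can be read off the two objectives as they are usually written in the papers cited further down the thread (Hu, Koren and Volinsky, ICDM 2008, for implicit feedback; Zhou et al., 2008, for the Netflix Prize explicit case). This is a sketch from the papers, not from the MLlib source, and the symbols are assumptions introduced here: \Omega is the set of observed (user, item) pairs, n_u and n_i are the number of observed ratings for user u and item i, and x_u, y_i are the factor vectors.

```latex
% Implicit feedback: the sum runs over ALL m-by-n (u, i) pairs, observed or not.
\min_{X, Y} \sum_{u=1}^{m} \sum_{i=1}^{n} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_{u} \lVert x_u \rVert^2 + \sum_{i} \lVert y_i \rVert^2 \right),
\qquad c_{ui} = 1 + \alpha r_{ui}, \quad p_{ui} = \mathbb{1}[r_{ui} > 0]

% Explicit feedback with weighted regularization: the sum runs over observed pairs only.
\min_{X, Y} \sum_{(u, i) \in \Omega} \left( r_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_{u} n_u \lVert x_u \rVert^2 + \sum_{i} n_i \lVert y_i \rVert^2 \right)
```

With \alpha = 0 every c_{ui} equals 1, so each unobserved pair still contributes (0 - x_u^{\top} y_i)^2 to the implicit objective, whereas it contributes nothing to the explicit one. Whether MLlib applies the weighted regularization term to both code paths is a separate question, which Sean's quoted reply addresses.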
On Feb 26, 2015 6:26 AM, "Sean Owen" <so...@cloudera.com> wrote:
>
> +user
>
> On Thu, Feb 26, 2015 at 2:26 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> I think I may have it backwards, and that you are correct to keep the 0 elements in train() in order to try to reproduce the same result.
>>
>> The second formulation is called 'weighted regularization' and is used for both implicit and explicit feedback, as far as I can see in the code.
>>
>> Hm, I'm actually not clear why these would produce different results. Different code paths are used to be sure, but I'm not yet sure why they would give different results.
>>
>> In general you wouldn't use train() for data like this though, and would never set alpha=0.
>>
>> On Thu, Feb 26, 2015 at 2:15 PM, lisendong <lisend...@163.com> wrote:
>>>
>>> I want to confirm the loss function you use (sorry, I’m not so familiar with Scala code, so I did not understand the MLlib source code).
>>>
>>> According to the papers:
>>>
>>> in your implicit feedback ALS, the loss function is (ICDM 2008):
>>>
>>> in the explicit feedback ALS, the loss function is (Netflix 2008):
>>>
>>> Note that besides the difference in the confidence parameter Cui, the regularization is also different. Does your code also have this difference?
>>>
>>> Best Regards,
>>> Sendong Li
>>>
>>>> On Feb 26, 2015, at 9:42 PM, lisendong <lisend...@163.com> wrote:
>>>>
>>>> Hi meng, fotero, sowen:
>>>>
>>>> I’m using ALS with Spark 1.0.0; the code should be:
>>>> https://github.com/apache/spark/blob/branch-1.0/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
>>>>
>>>> I think the following two methods should produce the same (or nearly the same) result:
>>>>
>>>> MatrixFactorizationModel model = ALS.train(ratings.rdd(), 30, 30, 0.01, -1, 1);
>>>>
>>>> MatrixFactorizationModel model = ALS.trainImplicit(ratings.rdd(), 30, 30, 0.01, -1, 0, 1);
>>>>
>>>> The data I used is a display log; the format of the log is as follows:
>>>>
>>>> user item if-click
>>>>
>>>> I use 1.0 as the score for a click pair and 0 as the score for a non-click pair.
>>>>
>>>> In the second method, alpha is set to zero, so the confidence for positive and negative pairs is 1.0 in both cases (right?)
>>>>
>>>> I think the two methods should produce similar results, but the second method’s result is very bad (the AUC of the first result is 0.7, but the AUC of the second result is only 0.61).
>>>>
>>>> I could not understand why. Could you help me?
>>>>
>>>> Thank you very much!
>>>>
>>>> Best Regards,
>>>> Sendong Li
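The expectation in the original message (that train() and trainImplicit() with alpha = 0 should agree) can be checked on a toy example. The sketch below is not MLlib code: the class `AlsObjectives`, its method names, and the 2-by-2 data are hypothetical, the confidence is taken as 1 + alpha * r as in the ICDM 2008 paper, and only the data-fit term is computed (regularization is left out). It compares the two squared-error terms when all pairs are supplied and when the non-click pairs are missing from the input.

```java
// Hypothetical helper, not part of Spark MLlib: data-fit terms only, no regularization.
final class AlsObjectives {

    // Inner product of a user factor vector and an item factor vector.
    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int k = 0; k < a.length; k++) {
            s += a[k] * b[k];
        }
        return s;
    }

    // Explicit feedback: squared error summed over observed (user, item) pairs only.
    static double explicitError(double[][] r, boolean[][] observed,
                                double[][] x, double[][] y) {
        double err = 0.0;
        for (int u = 0; u < r.length; u++) {
            for (int i = 0; i < r[u].length; i++) {
                if (observed[u][i]) {
                    double d = r[u][i] - dot(x[u], y[i]);
                    err += d * d;
                }
            }
        }
        return err;
    }

    // Implicit feedback: every pair contributes. An unobserved pair is treated
    // as rating 0, i.e. preference 0 with confidence 1.
    static double implicitError(double[][] r, boolean[][] observed, double alpha,
                                double[][] x, double[][] y) {
        double err = 0.0;
        for (int u = 0; u < r.length; u++) {
            for (int i = 0; i < r[u].length; i++) {
                double rating = observed[u][i] ? r[u][i] : 0.0;
                double p = rating > 0.0 ? 1.0 : 0.0;   // binarized preference
                double c = 1.0 + alpha * rating;       // confidence (assumed form)
                double d = p - dot(x[u], y[i]);
                err += c * d * d;
            }
        }
        return err;
    }

    public static void main(String[] args) {
        double[][] r = {{1.0, 0.0}, {0.0, 1.0}};   // 1.0 = click, 0.0 = non-click
        double[][] x = {{1.0}, {0.5}};             // rank-1 user factors (made up)
        double[][] y = {{1.0}, {2.0}};             // rank-1 item factors (made up)
        boolean[][] allPairs = {{true, true}, {true, true}};
        boolean[][] clicksOnly = {{true, false}, {false, true}};

        // All pairs supplied: both terms are 4.25.
        System.out.println(explicitError(r, allPairs, x, y));
        System.out.println(implicitError(r, allPairs, 0.0, x, y));

        // Non-click pairs left out: explicit drops to 0.0, implicit stays at 4.25.
        System.out.println(explicitError(r, clicksOnly, x, y));
        System.out.println(implicitError(r, clicksOnly, 0.0, x, y));
    }
}
```

On this toy data the two terms coincide at alpha = 0 only when every (user, item) pair is present in the input. That does not by itself explain the AUC gap reported above, since the regularization and the actual MLlib code paths are not modelled here.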