Re: possible bug in Spark's ALS implementation...
Hi Nick,

I don't have my Spark clone in front of me, but off the top of my head the major differences are/were:

1. Oryx multiplies lambda by alpha.

2. Oryx uses a different matrix inverse algorithm. It maintains a certain symmetry which the Spark algo does not; however, I don't think this difference has a real impact on the results.

3. Oryx supports the specification of a convergence threshold for termination of the algorithm, based on delta RMSE on a subset of the training set, if I recall correctly. I've been using that as the termination criterion instead of a fixed number of iterations.

4. Oryx uses the weighted regularization scheme you alluded to below, multiplying lambda by the number of ratings.

I've patched the Spark impl to support (4) but haven't pushed it to my clone on github. I think it would be a valuable feature to support officially. I'd also like to work on (3) but don't have time now.

I've only been using Oryx the past couple of weeks.

Cheers,

Michael

On Tue, 1 Apr 2014, Nick Pentreath [via Apache Spark User List] wrote:
> Hi Michael
> Would you mind setting out exactly what differences you did find between the
> Spark and Oryx implementations? Would be good to be clear on them, and also
> see if there are further tricks/enhancements from the Oryx one that can be
> ported (such as the lambda * numRatings adjustment).
>
> N
>
> On Sat, Mar 15, 2014 at 2:52 AM, Michael Allman <[hidden email]> wrote:
> [...]
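For concreteness, a rough sketch of how differences (1) and (4) change the per-user solve; all names and the way the two scalings are combined are illustrative assumptions, not the actual MLlib or Oryx code:

    import org.jblas.{DoubleMatrix, Solve}

    def solveUserFactor(
        YtCuY: DoubleMatrix,       // k x k left-hand side, Y^T C_u Y
        YtCuPu: DoubleMatrix,      // k x 1 right-hand side, Y^T C_u p(u)
        numRatings: Int,           // n_u, observations for this user
        lambda: Double,
        alpha: Double,
        scaleByAlpha: Boolean,     // difference (1): lambda * alpha
        weightByRatings: Boolean   // difference (4): lambda * n_u
      ): DoubleMatrix = {
      val reg = lambda *
        (if (scaleByAlpha) alpha else 1.0) *
        (if (weightByRatings) numRatings.toDouble else 1.0)
      // x_u = (Y^T C_u Y + reg * I)^-1 Y^T C_u p(u)
      val lhs = YtCuY.add(DoubleMatrix.eye(YtCuY.rows).muli(reg))
      Solve.solvePositive(lhs, YtCuPu)
    }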
Re: possible bug in Spark's ALS implementation...
Hi Michael

Would you mind setting out exactly what differences you did find between the Spark and Oryx implementations? Would be good to be clear on them, and also see if there are further tricks/enhancements from the Oryx one that can be ported (such as the lambda * numRatings adjustment).

N

On Sat, Mar 15, 2014 at 2:52 AM, Michael Allman wrote:
> I've been thoroughly investigating this issue over the past couple of days
> and have discovered quite a bit. For one thing, there is definitely (at
> least) one issue/bug in the Spark implementation that leads to incorrect
> results for models generated with rank > 1 or a large number of iterations.
> [...]
Re: possible bug in Spark's ALS implementation...
Great work Xiangrui, thanks for the enhancement!

— Sent from Mailbox for iPhone

On Wed, Mar 19, 2014 at 12:08 AM, Xiangrui Meng wrote:
> Glad to hear the speed-up. Hope we can improve the implementation
> further in the future. -Xiangrui
>
> On Tue, Mar 18, 2014 at 1:55 PM, Michael Allman wrote:
>> I just ran a runtime performance comparison between 0.9.0-incubating and your
>> als branch. I saw a 1.5x improvement in performance.
Re: possible bug in Spark's ALS implementation...
Glad to hear the speed-up. Hope we can improve the implementation further in the future. -Xiangrui

On Tue, Mar 18, 2014 at 1:55 PM, Michael Allman wrote:
> I just ran a runtime performance comparison between 0.9.0-incubating and your
> als branch. I saw a 1.5x improvement in performance.
Re: possible bug in Spark's ALS implementation...
I just ran a runtime performance comparison between 0.9.0-incubating and your als branch. I saw a 1.5x improvement in performance.
Re: possible bug in Spark's ALS implementation...
Sorry, the link was wrong. Should be https://github.com/apache/spark/pull/131 -Xiangrui

On Tue, Mar 18, 2014 at 10:20 AM, Michael Allman wrote:
> Hi Xiangrui,
>
> I don't see how https://github.com/apache/spark/pull/161 relates to ALS. Can
> you explain?
>
> Also, thanks for addressing the issue with factor matrix persistence in PR
> 165. I was probably not going to get to that for a while.
>
> I will try to test your changes today for speed improvements.
>
> Cheers,
>
> Michael
Re: possible bug in Spark's ALS implementation...
Hi Xiangrui,

I don't see how https://github.com/apache/spark/pull/161 relates to ALS. Can you explain?

Also, thanks for addressing the issue with factor matrix persistence in PR 165. I was probably not going to get to that for a while.

I will try to test your changes today for speed improvements.

Cheers,

Michael
Re: possible bug in Spark's ALS implementation...
Hi Michael,

I made a couple of changes to implicit ALS. One gives faster construction of YtY (https://github.com/apache/spark/pull/161), which was merged into master. The other caches intermediate matrix factors properly (https://github.com/apache/spark/pull/165). They should give you the same result as before, but faster (~2x in my local tests). If you have time to try the improved version, please let me know the speed-up on your data. Thanks!

Best,
Xiangrui

On Mon, Mar 17, 2014 at 5:07 PM, Michael Allman wrote:
> I've created https://spark-project.atlassian.net/browse/SPARK-1263 to address
> the issue of the factor matrix recomputation. I'm planning to submit a
> related pull request shortly.
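For readers following along, a rough sketch of the "construct YtY once per iteration" idea; the names are illustrative and this is not the code in the pull requests above:

    import org.apache.spark.rdd.RDD
    import org.jblas.DoubleMatrix

    // Sum of outer products y_i * y_i^T over all item factors gives Y^T Y.
    // The k x k result is small, so it can be computed once per iteration,
    // collected to the driver, and shipped to the tasks that solve each
    // user's normal equations.
    def computeYtY(itemFactors: RDD[(Int, Array[Double])]): DoubleMatrix = {
      itemFactors
        .map { case (_, y) =>
          val v = new DoubleMatrix(y)
          v.mmul(v.transpose())        // k x k outer product y * y^T
        }
        .reduce(_.addi(_))             // sum over all items = Y^T Y
    }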
Re: possible bug in Spark's ALS implementation...
I've created https://spark-project.atlassian.net/browse/SPARK-1263 to address the issue of the factor matrix recomputation. I'm planning to submit a related pull request shortly.
Re: possible bug in Spark's ALS implementation...
You are correct: in the long run it doesn't matter which matrix you begin the iterative process with. I was thinking in terms of doing a side-by-side comparison to Oryx.

I've posted a bug report as SPARK-1262. I described the problem I found and the mitigation strategy I've used. I think this problem has many possible solutions, so I'm omitting a patch to let the community hash out the best approach. However, I will suggest we move to a pure Java implementation of a linear system solver to provide better assurances of correctness across platforms (differences in java.lang.Math notwithstanding) and to make the implementation more transparent. It is not clear exactly what native code JBlas is linked to and using for its solver. I suggested the QR decomposition-based solvers provided by Colt and Commons Math as candidate replacements.

Cheers.
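As a concrete illustration of that suggestion, a minimal sketch of solving the per-factor normal equations with Commons Math's QR decomposition; the names and shapes are assumptions for illustration, not a proposed patch:

    import org.apache.commons.math3.linear.{Array2DRowRealMatrix, ArrayRealVector, QRDecomposition}

    // Solve A x = b, where A is the k x k normal-equation matrix
    // (e.g. Y^T C_u Y + reg * I) and b is the k x 1 right-hand side.
    def solveWithQR(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
      val matrix = new Array2DRowRealMatrix(a, false)
      val rhs = new ArrayRealVector(b, false)
      new QRDecomposition(matrix).getSolver.solve(rhs).toArray
    }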
Re: possible bug in Spark's ALS implementation...
The factor matrix Y is used twice in the implicit ALS computation: once to compute the global Y^T Y, and once to compute the local Y_i^T C_i Y_i. -Xiangrui

On Sun, Mar 16, 2014 at 1:18 PM, Matei Zaharia wrote:
> On Mar 14, 2014, at 5:52 PM, Michael Allman wrote:
>> I also found that the product and user RDDs were being rebuilt many times
>> over in my tests, even for tiny data sets. By persisting the RDD returned
>> from updateFeatures() I was able to avoid a raft of duplicate computations.
>> Is there a reason not to do this?
>
> This sounds like a good thing to add, though I'd like to understand why
> these are being recomputed (it seemed that the code would only use each one
> once). Do you have any sense why that is?
>
> Matei
Re: possible bug in Spark's ALS implementation...
On Mar 14, 2014, at 5:52 PM, Michael Allman wrote: > I also found that the product and user RDDs were being rebuilt many times > over in my tests, even for tiny data sets. By persisting the RDD returned > from updateFeatures() I was able to avoid a raft of duplicate computations. > Is there a reason not to do this? This sounds like a good thing to add, though I’d like to understand why these are being recomputed (it seemed that the code would only use each one once). Do you have any sense why that is? Matei
Re: possible bug in Spark's ALS implementation...
Hi Michael,

Thanks for looking into the details! Computing X first and computing Y first can deliver different results, because the initial objective values could differ by a lot. But the algorithm should converge after a few iterations. It is hard to tell which should go first; after all, the definitions of "user" and "product" are arbitrary. One trick we can do is to rescale the columns of X and Y after each iteration such that they have the same column norms. For the comparison, you should compute some metrics to verify the convergence.

I don't think initializing Y is necessary if we start with X. However, since the RDDs are evaluated lazily, if Y_0 is never used its data is not actually generated, so the overhead should be small.

Best,
Xiangrui

On Fri, Mar 14, 2014 at 5:52 PM, Michael Allman wrote:
> I've been thoroughly investigating this issue over the past couple of days
> and have discovered quite a bit. For one thing, there is definitely (at
> least) one issue/bug in the Spark implementation that leads to incorrect
> results for models generated with rank > 1 or a large number of iterations.
> [...]
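A small local sketch of the column-rescaling trick described above (illustrative only; the real factors live in RDDs and these names are made up): scaling column j of X by s and the same column of Y by 1/s leaves the product X * Y^T unchanged while equalizing the two column norms.

    // x and y are row-major factor matrices: one Array[Double] of length k per user/item.
    def rescaleColumns(x: Array[Array[Double]], y: Array[Array[Double]]): Unit = {
      val k = x(0).length
      for (j <- 0 until k) {
        val nx = math.sqrt(x.map(r => r(j) * r(j)).sum)   // norm of column j of X
        val ny = math.sqrt(y.map(r => r(j) * r(j)).sum)   // norm of column j of Y
        if (nx > 0 && ny > 0) {
          val s = math.sqrt(ny / nx)       // both columns end up with norm sqrt(nx * ny)
          x.foreach(r => r(j) *= s)
          y.foreach(r => r(j) /= s)
        }
      }
    }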
Re: possible bug in Spark's ALS implementation...
I've been thoroughly investigating this issue over the past couple of days and have discovered quite a bit. For one thing, there is definitely (at least) one issue/bug in the Spark implementation that leads to incorrect results for models generated with rank > 1 or a large number of iterations. I will post a bug report with a thorough explanation this weekend or on Monday.

I believe I've been able to track down every difference between the Spark and Oryx implementations that leads to different results. I made some adjustments to the Spark implementation so that, given the same initial product/item vectors, the resulting model is identical to the one produced by Oryx within a small numerical tolerance. I've verified this for small data sets and am working on verifying this with some large data sets.

Aside from those already identified in this thread, another significant difference in the Spark implementation is that it begins the factorization process by computing the product matrix (Y) from the initial user matrix (X). Both of the papers on ALS referred to in this thread begin the process by computing the user matrix. I haven't done any testing comparing the models generated starting from Y or X, but they are very different. Is there a reason Spark begins the iteration by computing Y?

Initializing both X and Y as is done in the Spark implementation seems unnecessary unless I'm overlooking some desired side-effect. Only the factor matrix which generates the other in the first iteration needs to be initialized.

I also found that the product and user RDDs were being rebuilt many times over in my tests, even for tiny data sets. By persisting the RDD returned from updateFeatures() I was able to avoid a raft of duplicate computations. Is there a reason not to do this?

Thanks.
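To make the last point concrete, a minimal sketch of the mitigation described above; updateFeatures here is passed in as a stand-in for the factor-update step, and all names are illustrative rather than the actual MLlib internals:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel

    // Persist each factor RDD as it is produced so that later iterations and
    // the final model do not re-run the whole lineage that created it.
    def run(
        updateFeatures: RDD[(Int, Array[Double])] => RDD[(Int, Array[Double])],
        initialUsers: RDD[(Int, Array[Double])],
        initialProducts: RDD[(Int, Array[Double])],
        numIterations: Int): (RDD[(Int, Array[Double])], RDD[(Int, Array[Double])]) = {
      var users = initialUsers
      var products = initialProducts
      for (_ <- 1 to numIterations) {
        products = updateFeatures(users).persist(StorageLevel.MEMORY_AND_DISK)
        users = updateFeatures(products).persist(StorageLevel.MEMORY_AND_DISK)
      }
      (users, products)
    }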
Re: possible bug in Spark's ALS implementation...
Ah, thank you, I had actually forgotten about this, and this is indeed probably a difference. This is from the other paper I cited: http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf

It's the "WR" in "ALS-WR" -- weighted regularization. I suppose the intuition is that you penalize complex explanations of prolific users and items proportionally more. The paper claims it helps and I also found it did. That could be the difference.

--
Sean Owen | Director, Data Science | London

On Thu, Mar 13, 2014 at 2:30 AM, Michael Allman wrote:
> Hi Sean,
>
> Digging deeper I've found another difference between Oryx's implementation
> and Spark's. Why do you adjust lambda here?
>
> https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java#L491
>
> Cheers,
>
> Michael
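For reference, the weighted-lambda objective from that paper, written in the plain-text notation already used in this thread (n_u and n_i are the number of ratings by user u and for item i):

    min_{X,Y}  \sum_{(u,i) observed} (r_ui - x_u^T y_i)^2
             + lambda * ( \sum_u n_u ||x_u||^2 + \sum_i n_i ||y_i||^2 )

Each factor's penalty is scaled by how many observations it participates in, which is the lambda * numRatings adjustment discussed elsewhere in this thread.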
Re: possible bug in Spark's ALS implementation...
Hi Sean,

Digging deeper I've found another difference between Oryx's implementation and Spark's. Why do you adjust lambda here?

https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java#L491

Cheers,

Michael
Re: possible bug in Spark's ALS implementation...
Thank you everyone for your feedback. It's been very helpful, and though I still haven't found the cause of the difference between Spark and Oryx, I feel I'm making progress.

Xiangrui asked me to create a ticket for this issue. The reason I didn't do this originally is that it's not yet clear to me whether this is a bug or a mistake on my part. I'd like to see where this conversation goes and then file a more clear-cut issue if applicable.

Sean pointed out that Oryx differs in its use of the regularization parameter lambda. I'm aware of this and have been compensating for this difference from the start. Also, the handling of negative values is indeed irrelevant, as I have none in my data. After reviewing Sean's analysis and running some calculations in the console, I agree that the Spark code does compute YtCuY correctly.

Regarding testing, I'm computing EPR on a test set as outlined in the paper. I'm training on three weeks of data and testing on the following week. I recently updated my data sets and rebuilt and tested the new models. The results were inconclusive in that both models scored about the same.

I'm continuing to investigate the source of the wide difference in recommendations between implementations. I will reply with my findings when I have something more definitive.

Cheers and thanks again.
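For readers unfamiliar with the metric, a minimal local sketch of the expected percentile rank (the EPR, or rank-bar, measure from the Hu, Koren & Volinsky paper) follows; the input shapes and names are assumptions for illustration, not code from this thread:

    // recs: for each user, items ordered from most to least recommended
    // test: held-out (user, item) -> observed "rating" (e.g. relative viewing time)
    def expectedPercentileRank(
        recs: Map[Int, Seq[Int]],
        test: Map[(Int, Int), Double]): Double = {
      var num = 0.0
      var den = 0.0
      for (((user, item), r) <- test; ranked <- recs.get(user)) {
        val idx = ranked.indexOf(item)
        if (idx >= 0 && ranked.size > 1) {
          val rank = idx.toDouble / (ranked.size - 1)   // 0.0 = top of the list, 1.0 = bottom
          num += r * rank
          den += r
        }
      }
      if (den > 0) num / den else Double.NaN            // lower is better; ~0.5 is random
    }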
Re: possible bug in Spark's ALS implementation...
On Wed, Mar 12, 2014 at 7:36 AM, Nick Pentreath wrote:
> @Sean, would it be a good idea to look at changing the regularization in
> Spark's ALS to alpha * lambda? What is the thinking behind this? If I
> recall, the Mahout version added something like (# ratings * lambda) as
> regularization in each factor update (for explicit), but implicit it was
> just lambda (I may be wrong here).

I also used a different default alpha than the one suggested in the paper: 1, instead of 40. But so does MLlib. And if alpha = 1, the variation I mention here has no effect.

The idea was that alpha "is supposed to" control how much more weight a known user-item value gets in the factorization. The weight is "1 + alpha*r" for nonzero r, and of course "1" otherwise, and alpha can make the difference larger. But large alpha has the side-effect of making the regularization terms relatively smaller in the cost function. This dual effect seemed undesirable. So: multiply the regularization term by alpha too to disconnect these effects.

Other ALS papers like http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf again use a different definition of lambda by stuffing something else into it. So the absolute value of lambda is already different in different contexts. So, depending on Michael's settings, this could be a red herring, but worth checking.

The only other variation was in choosing the random initial state, but that too is the same now in both implementations (at least in HEAD). The initial state really shouldn't matter so much.

I can't think of other variations. Michael, what was your eval metric?
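To spell out the coupling Sean describes, this is the implicit-feedback cost function from the Hu, Koren & Volinsky paper, in the thread's plain-text notation (p_ui = 1 if r_ui > 0 and 0 otherwise; c_ui = 1 + alpha * r_ui):

    min_{X,Y}  \sum_{u,i} c_ui * (p_ui - x_u^T y_i)^2
             + lambda * ( \sum_u ||x_u||^2 + \sum_i ||y_i||^2 )

Raising alpha inflates the first term for observed items while the lambda term stays fixed; the Oryx variation uses lambda * alpha in the second term so that the relative weight of the regularization does not shrink as alpha grows.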
Re: possible bug in Spark's ALS implementation...
The Mahout implementation is just a straightforward port of the paper. No changes have been made.

On 03/12/2014 08:36 AM, Nick Pentreath wrote:
> It would be helpful to know what parameter inputs you are using. If the
> regularization schemes are different (by a factor of alpha, which can often
> be quite high) this will mean that the same parameter settings could give
> very different results. A higher lambda would be required with Spark's
> version to be comparable.
> [...]
Re: possible bug in Spark's ALS implementation...
It would be helpful to know what parameter inputs you are using. If the regularization schemes are different (by a factor of alpha, which can often be quite high) this will mean that the same parameter settings could give very different results. A higher lambda would be required with Spark's version to be comparable.

When I submitted the PR for this, I verified (on ml-100k, ml-1m and ml-10m data) that this version gives the same RMSE as Mahout's implicit model, as well as a separate Spark version that I wrote that was a from-scratch port of the Mahout algorithm (though I didn't compare vs Myrrix/Oryx). I'm fairly confident things are correct, but if there is a bug let's definitely find and fix it!

@Sean, would it be a good idea to look at changing the regularization in Spark's ALS to alpha * lambda? What is the thinking behind this? If I recall, the Mahout version added something like (# ratings * lambda) as regularization in each factor update (for explicit), but implicit it was just lambda (I may be wrong here).

On Wed, Mar 12, 2014 at 4:57 AM, Xiangrui Meng wrote:
> Line 376 should be correct as it is computing \sum_i (c_i - 1) x_i x_i^T
> = \sum_i (alpha * r_i) x_i x_i^T. Are you computing some metrics to tell
> which recommendation is better? -Xiangrui
>
> On Tue, Mar 11, 2014 at 6:38 PM, Xiangrui Meng wrote:
>> Hi Michael,
>>
>> I can help check the current implementation. Would you please go to
>> https://spark-project.atlassian.net/browse/SPARK and create a ticket
>> about this issue with component "MLlib"? Thanks!
>> [...]
Re: possible bug in Spark's ALS implementation...
Line 376 should be correct as it is computing \sum_i (c_i - 1) x_i x_i^T = \sum_i (alpha * r_i) x_i x_i^T. Are you computing some metrics to tell which recommendation is better? -Xiangrui

On Tue, Mar 11, 2014 at 6:38 PM, Xiangrui Meng wrote:
> Hi Michael,
>
> I can help check the current implementation. Would you please go to
> https://spark-project.atlassian.net/browse/SPARK and create a ticket
> about this issue with component "MLlib"? Thanks!
>
> Best,
> Xiangrui
>
> On Tue, Mar 11, 2014 at 3:18 PM, Michael Allman wrote:
>> Hi,
>>
>> I'm implementing a recommender based on the algorithm described in
>> http://www2.research.att.com/~yifanhu/PUB/cf.pdf. This algorithm forms the
>> basis for Spark's ALS implementation for data sets with implicit features.
>> [...]
Re: possible bug in Spark's ALS implementation...
On Tue, Mar 11, 2014 at 10:18 PM, Michael Allman wrote:
> I'm seeing counterintuitive, sometimes nonsensical recommendations. For
> comparison, I've run the training data through Oryx's in-VM implementation
> of implicit ALS with the same parameters. Oryx uses the same algorithm.
> (Source in this file:
> https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java)

On this note, I should say that Oryx varies from that paper in a couple of small ways. In particular, the regularization parameter that is used in the end is not just lambda, but lambda * alpha. (There are decent reasons for this.) So the difference with the "same" parameters could be down to this. What param values are you using? It might be the difference.

(There is another difference in the handling of negative values, but that is probably irrelevant to you? It is in Spark now too though. It was not in 0.9.0 but is in HEAD.)

> However, it looks like this code is in fact computing YtY + YtY(Cu - I),
> which is the same as YtYCu. If so, that's a bug. Can someone familiar with
> this code evaluate my claim?

I too can't be 100% certain I'm not missing something, but from a look at that line, I don't think it is computing YtY(Cu-I). It is indeed trying to accumulate the value Yt(Cu-I)Y by building it up from pieces, from rows of Y. For one row of Y that piece is, excusing my notation, Y(i)t (Cu(i)-1) Y(i). The middle term is just a scalar so it's fine to multiply it at the end as you see in that line.

You may wish to follow HEAD, which is a bit different: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L390

The computation is actually the same as before (for positive input), expressed a little differently.

Happy to help on this, given that I know this code a little and the code you are comparing it to a lot.
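A short sketch of the accumulation Sean describes, with illustrative names only (this is not the MLlib code at the link above): starting from the precomputed Y^T Y and adding (c_i - 1) * y_i * y_i^T for each of the user's rated items yields Y^T C_u Y.

    import org.jblas.DoubleMatrix

    def userGram(
        YtY: DoubleMatrix,                          // precomputed k x k Y^T Y
        ratedFactors: Seq[(Array[Double], Double)], // (y_i, r_i) for this user's items
        alpha: Double): DoubleMatrix = {
      val acc = YtY.dup()                           // start from the global Y^T Y
      for ((y, r) <- ratedFactors) {
        val yi = new DoubleMatrix(y)
        // (c_i - 1) = alpha * r_i is a scalar, so it can be applied after the outer product
        acc.addi(yi.mmul(yi.transpose()).muli(alpha * r))
      }
      acc                                           // = Y^T Y + Y^T (C_u - I) Y = Y^T C_u Y
    }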
Re: possible bug in Spark's ALS implementation...
Hi Michael, I can help check the current implementation. Would you please go to https://spark-project.atlassian.net/browse/SPARK and create a ticket about this issue with component "MLlib"? Thanks! Best, Xiangrui On Tue, Mar 11, 2014 at 3:18 PM, Michael Allman wrote: > Hi, > > I'm implementing a recommender based on the algorithm described in > http://www2.research.att.com/~yifanhu/PUB/cf.pdf. This algorithm forms the > basis for Spark's ALS implementation for data sets with implicit features. > The data set I'm working with is proprietary and I cannot share it, however > I can say that it's based on the same kind of data in the paper---relative > viewing time of videos. (Specifically, the "rating" for each video is > defined as total viewing time across all visitors divided by video > duration). > > I'm seeing counterintuitive, sometimes nonsensical recommendations. For > comparison, I've run the training data through Oryx's in-VM implementation > of implicit ALS with the same parameters. Oryx uses the same algorithm. > (Source in this file: > https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java) > > The recommendations made by each system compared to one other are very > different---moreso than I think could be explained by differences in initial > state. The recommendations made by the Oryx models look much better, > especially as I increase the number of latent factors and the iterations. > The Spark models' recommendations don't improve with increases in either > latent factors or iterations. Sometimes, they get worse. > > Because of the (understandably) highly-optimized and terse style of Spark's > ALS implementation, I've had a very hard time following it well enough to > debug the issue definitively. However, I have found a section of code that > looks incorrect. As described in the paper, part of the implicit ALS > algorithm involves computing a matrix product YtCuY (equation 4 in the > paper). To optimize this computation, this expression is rewritten as YtY + > Yt(Cu - I)Y. I believe that's what should be happening here: > > https://github.com/apache/incubator-spark/blob/v0.9.0-incubating/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L376 > > However, it looks like this code is in fact computing YtY + YtY(Cu - I), > which is the same as YtYCu. If so, that's a bug. Can someone familiar with > this code evaluate my claim? > > Cheers, > > Michael