Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0
In the implicit feedback model, the coefficients are already penalized (pulled towards zero) by the number of unobserved ratings. So I think it is fair to keep the 1.3.0 weighting (by the total number of users/items). Again, I don't think we have a clear answer. It would be nice to run some experiments and see which works better. -Xiangrui
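For concreteness, here is a minimal Scala sketch of the three weightings debated in this thread: the 1.2 weighting by observed ratings, the 1.3.0 weighting by total users/items, and the sum_i c_ij proposal in the message below. This is illustrative only, not Spark's implementation; all names are made up, and it assumes the standard implicit-feedback confidence c_ij = 1 + alpha * r_ij.

// Candidate values of X in the penalty term lambda * X * \|v_j\|_2^2
// of one item's implicit-feedback sub-problem. Illustrative sketch only.
object LambdaWeight {
  // 1.2 scaling: the number of observed ratings for item j.
  def byObservedRatings(numRatingsForItem: Long): Double =
    numRatingsForItem.toDouble

  // 1.3.0 scaling: the total number of users, since every user vector
  // appears in the implicit sub-problem whether or not it rated item j.
  def byTotalUsers(numUsers: Long): Double =
    numUsers.toDouble

  // Proposed scaling: sum_i c_ij. With c_ij = 1 + alpha * r_ij (and
  // r_ij = 0 when unobserved) this equals numUsers + alpha * sum(r_ij).
  def bySumOfConfidences(numUsers: Long, alpha: Double, sumRatingsForItem: Double): Double =
    numUsers + alpha * sumRatingsForItem
}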
Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0
After thinking about it more, I do think weighting lambda by sum_i c_ij is the equivalent of the ALS-WR paper's approach for the implicit case. This provides scale-invariance for varying products/users and for varying ratings, and should behave well for all alphas. What do you guys think?
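A short note on why sum_i c_ij interpolates between the two existing scalings, assuming the standard implicit-feedback confidence c_ij = 1 + alpha * r_ij with r_ij = 0 for unobserved pairs:

sum_i c_ij = sum_i (1 + alpha * r_ij) = #users + alpha * (sum of observed ratings of item j)

For small alpha the #users term dominates, roughly recovering the 1.3.0 weighting; for large alpha the observed-ratings term dominates, behaving more like the 1.2 weighting. That is the alpha trade-off Ravi describes in the message below.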
Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0
Whoops, I just saw this thread; it got caught in my spam filter. Thanks for looking into this, Xiangrui and Sean.

The implicit situation does seem fairly complicated to me. The cost function (not including the regularization term) is affected both by the number of ratings and by the number of users/products. As we increase alpha, the contribution to the cost function from the number of users/products diminishes compared to the contribution from the number of ratings. So large alphas seem to favor the weighted-lambda approach, even though it's not a perfect match. Smaller alphas favor Xiangrui's 1.3.0 approach, but again it's not a perfect match.

I believe low alphas won't work well with regularization because both terms in the cost function will just push everything to zero. Some of my experiments confirm this. This leads me to think that weighted-lambda would work better in practice, but I have no evidence of this. It may make sense to weight lambda by sum_i c_ij instead?
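To make the two contributions explicit, with c_ij = 1 + alpha * r_ij and binary preferences p_ij, the unregularized implicit cost splits as:

\sum_{i,j} c_ij (p_ij - u_i^T v_j)^2 = \sum_{unobserved (i,j)} (u_i^T v_j)^2 + \sum_{observed (i,j)} (1 + alpha * r_ij) (1 - u_i^T v_j)^2

The first term grows with #users * #items and the second with the number (and magnitude) of observed ratings, so increasing alpha shifts the cost toward the observed-ratings term, exactly the effect described above.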
Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0
Ravi, we just merged https://issues.apache.org/jira/browse/SPARK-6642 and used the same lambda scaling as in 1.2. The change will be included in Spark 1.3.1, which will be released soon. Thanks for reporting this issue! -Xiangrui
Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0
I created a JIRA for this: https://issues.apache.org/jira/browse/SPARK-6637. Since we don't have a clear answer about how the scaling should be handled, maybe the best solution for now is to switch back to the 1.2 scaling. -Xiangrui
Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0
Ah yeah, I take your point. The squared error term is over the whole user-item matrix, technically, in the implicit case. I suppose I am used to assuming that the 0 terms in this matrix are weighted so much less (because alpha is usually large-ish) that they're almost not there, but they are. So I had just used the explicit formulation.

I suppose the result is kind of scale invariant, but not exactly. I had not prioritized this property, since I had generally built models on the full data set and not a sample, and had assumed that lambda would need to be retuned over time as the input grew anyway.

So, basically I don't know anything more than you do, sorry!
Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0
Hey Sean,

That is true for the explicit model, but not for the implicit one. The ALS-WR paper doesn't cover the implicit model. In the implicit formulation, a sub-problem (for v_j) is:

min_{v_j} \sum_i c_{ij} (p_{ij} - u_i^T v_j)^2 + lambda * X * \|v_j\|_2^2

This is a sum over all i, not just the users who rated item j. In this case, if we set X = m_j, the number of observed ratings for item j, it is not really scale invariant: we have #users user vectors in the least squares problem but only penalize by lambda * #ratings. I was suggesting using lambda * m directly for the implicit model to match the number of vectors in the least squares problem. Well, this is my theory; I haven't found any public work about it.

Best,
Xiangrui
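To see where the disputed lambda * X lands in the algebra, here is a small self-contained Scala sketch of one item's sub-problem solved via the normal equations. It uses Breeze and a dense confidence matrix purely for readability; it is not Spark's optimized solver, and all names are illustrative.

import breeze.linalg.{DenseMatrix, DenseVector, diag}

// Solve min_{v_j} \sum_i c_{ij} (p_{ij} - u_i^T v_j)^2 + lambda * x * \|v_j\|_2^2
// via the normal equations (U^T C U + lambda * x * I) v_j = U^T C p.
// The choice of x is exactly what this thread is debating.
def solveItemFactor(
    u: DenseMatrix[Double],  // numUsers x rank matrix of user factors
    c: DenseVector[Double],  // confidences c_ij over all users i
    p: DenseVector[Double],  // preferences p_ij (0 or 1)
    lambda: Double,
    x: Double                // m_j (1.2), numUsers (1.3.0), or sum_i c_ij
): DenseVector[Double] = {
  val rank = u.cols
  val cMat = diag(c)  // dense only for clarity
  val a = u.t * cMat * u + DenseMatrix.eye[Double](rank) * (lambda * x)
  val b = u.t * (cMat * p)
  a \ b  // solve the rank x rank linear system for v_j
}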
Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0
I had always understood the formulation to be the first option you describe. Lambda is scaled by the number of items the user has rated / interacted with. I think the goal is to avoid fitting the tastes of prolific users disproportionately just because they have many ratings to fit. This is what's described in the ALS-WR paper we link to on the Spark web site, in equation 5 (http://www.grappa.univ-lille3.fr/~mary/cours/stats/centrale/reco/paper/MatrixFactorizationALS.pdf).

I think this also gets you the scale-invariance? For every additional rating from user i to product j, you add one new term to the squared-error sum, (r_ij - u_i . m_j)^2, but also, you'd increase the regularization term by lambda * (|u_i|^2 + |m_j|^2). They are at least both increasing about linearly as ratings increase. If instead the regularization term is multiplied by the total number of users and products in the model, then it stays fixed as ratings are added.

I might misunderstand you and/or be speaking about something slightly different when it comes to invariance. But FWIW I had always understood the regularization to be multiplied by the number of explicit ratings.
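For reference, equation 5 of that paper (weighted-lambda regularization, explicit case), with n_{u_i} the number of ratings by user i and n_{m_j} the number of ratings of item j, is:

f(U, M) = \sum_{(i,j) observed} (r_ij - u_i^T m_j)^2 + lambda * ( \sum_i n_{u_i} \|u_i\|^2 + \sum_j n_{m_j} \|m_j\|^2 )

Each new rating (i, j) adds one squared-error term and increments both n_{u_i} and n_{m_j} by one, which is the roughly linear growth of both terms described above.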
Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0
Okay, I didn't realize that I changed the behavior of lambda in 1.3 to make it "scale-invariant", but it is worth discussing whether this is a good change. In 1.2, we multiply lambda by the number of ratings in each sub-problem. This makes it "scale-invariant" for explicit feedback. However, in the implicit feedback model, a user's sub-problem contains all item factors. Then the question is whether we should multiply lambda by the number of explicit ratings from this user or by the total number of items. We used the former in 1.2 but changed to the latter in 1.3. So you should try a smaller lambda to get a similar result in 1.3.

Sean and Shuo, which approach do you prefer? Do you know any existing work discussing this?

Best,
Xiangrui
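A back-of-the-envelope illustration of the size of that change (hypothetical numbers, not from Ravi's data): with 100,000 items in the catalog, a user with 20 ratings has their vector penalized by lambda * 20 under the 1.2 scaling but by lambda * 100,000 under the 1.3.0 scaling, i.e. 5,000 times more strongly for the same lambda. That is consistent with factors collapsing by orders of magnitude when lambda is left unchanged.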
Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0
This sounds like a bug ... Did you try a different lambda? It would be great if you could share your dataset or reproduce this issue on a public dataset. Thanks! -Xiangrui

On Thu, Mar 26, 2015 at 7:56 AM, Ravi Mody wrote:
> After upgrading to 1.3.0, ALS.trainImplicit() has been returning vastly
> smaller factors (and hence scores). For example, the first few factor
> values of one product in 1.2.0 are (0.04821, -0.00674, -0.0325). In 1.3.0,
> the first few factor values are (2.535456E-8, 1.690301E-8, 6.99245E-8).
> This difference of several orders of magnitude is consistent throughout
> both the user and product factors. The recommendations from 1.2.0 are
> subjectively much better than in 1.3.0. 1.3.0 trains significantly faster
> than 1.2.0, and uses less memory.
>
> My first thought was that there is too much regularization in the 1.3.0
> results, but I'm using the same lambda parameter value. This is a snippet
> of my Scala code:
>
> val rank = 75
> val numIterations = 15
> val alpha = 10
> val lambda = 0.01
> val model = ALS.trainImplicit(train_data, rank, numIterations,
>   lambda = lambda, alpha = alpha)
>
> The code and input data are identical across both versions. Did anything
> change between the two versions that I'm not aware of? I'd appreciate any
> help!
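For anyone stuck on 1.3.0 before the revert lands in 1.3.1, a rough workaround suggested by the discussion above is to shrink lambda by the ratio of the two weights. This is a heuristic, not an exact equivalence (the 1.2 weight varies per user and item, while a single scalar cannot), and numUsers and numItems here are assumed to be computed elsewhere:

// Heuristic only: pick lambda' so that lambda' * numItems is comparable,
// on average, to lambda * (ratings per user) under the 1.2 scaling.
val numRatings = train_data.count()
val avgRatingsPerUser = numRatings.toDouble / numUsers
val lambda13 = lambda * avgRatingsPerUser / numItems  // much smaller lambda
val model13 = ALS.trainImplicit(train_data, rank, numIterations,
  lambda = lambda13, alpha = alpha)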