Re: Movie Recommendation tutorial
It's something like the average error in the ratings, but a bit different: it's the square root of the average squared error. Still, if you think of the ratings as 'stars', you could roughly read 0.86 as 'generally off by 0.86 stars', and that would be somewhat right. Whether that's good depends on the range of the input. On a 1-5 scale it's OK; on a 1-100 scale it would be fantastic.

To give you a point of comparison, when Netflix launched the Netflix Prize, their recommender had an RMSE of about 0.95. The winning solution was at about 0.85. Their data set was a larger, harder problem than the MovieLens data set, though. So: reasonably good.

On Tue, Feb 24, 2015 at 8:19 PM, Krishna Sankar wrote:
> Question: What is the intuition behind an RMSE of 0.86 vs. 1.3? I know smaller is better, but is it that much better? And what is a good number for a recommendation engine?

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
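To make the "a bit different" concrete, here is a minimal pure-Python sketch (not Spark's own evaluation code) that computes MSE, RMSE, and mean absolute error (MAE) for the same predictions. The ratings are made up for illustration. Because errors are squared before averaging, one large miss pulls RMSE above MAE, which is why "off by 0.86 stars" is only approximately right.

```python
import math


def mse(predicted, actual):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(predicted, actual))


def mae(predicted, actual):
    """Mean absolute error: the literal 'average number of stars off'."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions vs. true 1-5 star ratings; the errors are 1, 0, 1, -2.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [3.0, 3.0, 4.0, 4.0]

print(mae(predicted, actual))   # 1.0
print(mse(predicted, actual))   # 1.5
print(rmse(predicted, actual))  # about 1.2247, larger than the MAE
```

The same relation, RMSE = sqrt(MSE), is handy when one run reports a "Mean Squared Error" and another reports an RMSE.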
Re: Movie Recommendation tutorial
Yep, much better with 0.1.

"The best model was trained with rank = 12 and lambda = 0.1, and numIter = 20, and its RMSE on the test set is 0.869092" (Spark 1.3.0)

Question: What is the intuition behind an RMSE of 0.86 vs. 1.3? I know smaller is better, but is it that much better? And what is a good number for a recommendation engine?

Cheers

On Tue, Feb 24, 2015 at 1:03 AM, Guillaume Charhon <guilla...@databerries.com> wrote:
> Xiangrui, I made some additional tests with lambda set to 0.1 and I am getting a much better RMSE
Re: Movie Recommendation tutorial
I am using Spark 1.2.1.

Thank you Krishna, I am getting almost the same results as you, so it must be an error in the tutorial. Xiangrui, I made some additional tests with lambda set to 0.1 and I am getting a much better RMSE:

rank  lambda  numIter  RMSE (validation)
   8     0.1       10  0.868981
   8     0.1       20  0.869628
   8     1.0       10  1.361321
   8     1.0       20  1.361321
   8    10.0       10  3.755870
   8    10.0       20  3.755870
  12     0.1       10  0.866605
  12     0.1       20  0.867498
  12     1.0       10  1.361321
  12     1.0       20  1.361321
  12    10.0       10  3.755870
  12    10.0       20  3.755870

The best model was trained with rank = 12 and lambda = 0.1, and numIter = 10, and its RMSE on the test set is 0.865407.

On Tue, Feb 24, 2015 at 7:23 AM, Xiangrui Meng wrote:
> Try to set lambda to 0.1. -Xiangrui
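The model selection behind these numbers is a loop over every combination of rank, lambda, and numIter that keeps the combination with the lowest validation RMSE. The sketch below shows that pattern with itertools.product. The `evaluate` callable is a hypothetical stand-in for training an ALS model (for example with something like `pyspark.mllib.recommendation.ALS.train(training, rank, iterations=num_iter, lambda_=lam)`) and scoring it on the validation set; to keep the sketch runnable without Spark, it simply looks up the validation RMSEs reported in this message.

```python
import itertools

# Validation RMSEs reported in this thread (Spark 1.2.1), keyed by (rank, lambda, numIter).
REPORTED_RMSE = {
    (8, 0.1, 10): 0.868981,
    (8, 0.1, 20): 0.869628,
    (8, 1.0, 10): 1.361321,
    (8, 1.0, 20): 1.361321,
    (8, 10.0, 10): 3.755870,
    (8, 10.0, 20): 3.755870,
    (12, 0.1, 10): 0.866605,
    (12, 0.1, 20): 0.867498,
    (12, 1.0, 10): 1.361321,
    (12, 1.0, 20): 1.361321,
    (12, 10.0, 10): 3.755870,
    (12, 10.0, 20): 3.755870,
}


def grid_search(ranks, lambdas, num_iters, evaluate):
    """Return (best_params, best_rmse) over the Cartesian product of the three grids.

    `evaluate(rank, lam, num_iter)` is a caller-supplied (hypothetical) function that
    trains a model with those settings and returns its validation RMSE.
    """
    best_params, best_rmse = None, float("inf")
    for rank, lam, num_iter in itertools.product(ranks, lambdas, num_iters):
        score = evaluate(rank, lam, num_iter)
        if score < best_rmse:  # keep the lowest validation RMSE seen so far
            best_params, best_rmse = (rank, lam, num_iter), score
    return best_params, best_rmse


best, best_rmse = grid_search(
    [8, 12], [0.1, 1.0, 10.0], [10, 20],
    lambda rank, lam, num_iter: REPORTED_RMSE[(rank, lam, num_iter)],
)
print(best, best_rmse)  # (12, 0.1, 10) 0.866605
```

The winning combination is chosen on the validation set only; the test set is then scored once for the chosen model, which is how the 0.865407 figure is reported separately.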
Re: Movie Recommendation tutorial
Try to set lambda to 0.1. -Xiangrui

On Mon, Feb 23, 2015 at 3:06 PM, Krishna Sankar wrote:
> The RMSE varies a little bit between the versions.
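Why lambda matters so much: in ALS, each user (or item) factor is a ridge-regression solution with the other side held fixed, and lambda is the ridge penalty. The sketch below is a rank-1 illustration, not MLlib's solver. It assumes one user with fixed, made-up item factors y_i and ratings r_i, and assumes the penalty is scaled by the user's number of ratings n (the "ALS-WR" scaling described in the MLlib collaborative-filtering guide). Setting the derivative of sum_i (r_i - x*y_i)^2 + lambda*n*x^2 to zero gives x = sum_i(y_i*r_i) / (sum_i y_i^2 + lambda*n), so a larger lambda pulls the factor, and every prediction x*y_i, toward zero.

```python
def user_factor(item_factors, ratings, lam):
    """Closed-form rank-1 ridge solution for one user's factor x, item factors held fixed.

    Minimizes sum_i (r_i - x * y_i)**2 + lam * n * x**2, where n is the number of
    ratings (an assumed ALS-WR-style scaling of the penalty).
    """
    n = len(ratings)
    numerator = sum(y * r for y, r in zip(item_factors, ratings))
    denominator = sum(y * y for y in item_factors) + lam * n
    return numerator / denominator


def rmse_for_user(item_factors, ratings, lam):
    """Training RMSE of this user's predictions x * y_i for a given lambda."""
    x = user_factor(item_factors, ratings, lam)
    squared = [(r - x * y) ** 2 for y, r in zip(item_factors, ratings)]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical data: four rated items with fixed factors and 1-5 star ratings.
item_factors = [1.0, 1.2, 0.8, 1.1]
ratings = [4.0, 5.0, 3.0, 4.0]

for lam in (0.1, 1.0, 10.0):
    x = user_factor(item_factors, ratings, lam)
    print(lam, round(x, 3), round(rmse_for_user(item_factors, ratings, lam), 3))
```

The sketch only illustrates the direction of the effect (more shrinkage, higher error once lambda is too large); it does not reproduce the numbers in this thread.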
Re: Movie Recommendation tutorial
1. The RMSE varies a little between versions.
2. I partitioned the training, validation, and test sets like so:
   - training = ratings_rdd_01.filter(lambda x: (x[3] % 10) < 6)
   - validation = ratings_rdd_01.filter(lambda x: 6 <= (x[3] % 10) < 8)
   - test = ratings_rdd_01.filter(lambda x: (x[3] % 10) >= 8)
   - Validation MSE:
     - # 1.3.0 Mean Squared Error = 0.871456869392
     - # 1.2.1 Mean Squared Error = 0.877305629074
3. Itertools results:
   - 1.3.0 - RMSE = 1.354839 (rank = 8, lambda = 1.0, numIter = 20)
   - 1.1.1 - RMSE = 1.335831 (rank = 8, lambda = 1.0, numIter = 10)

Cheers

On Mon, Feb 23, 2015 at 12:37 PM, Xiangrui Meng wrote:
> Which Spark version did you use?
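The three filters in this message bucket each rating by the last digit of its timestamp (x[3], assuming (user, movie, rating, timestamp) records): digits 0-5 go to training, 6-7 to validation, and 8-9 to test, which gives roughly a 60/20/20 split when last digits are evenly spread. A pure-Python sketch of the same rule on plain tuples instead of an RDD; the helper name is hypothetical:

```python
def split_by_timestamp(ratings):
    """Split (user, movie, rating, timestamp) tuples by timestamp % 10.

    Buckets 0-5 -> training, 6-7 -> validation, 8-9 -> test,
    the same boundaries as the RDD filters in this message.
    """
    training, validation, test = [], [], []
    for record in ratings:
        bucket = record[3] % 10
        if bucket < 6:
            training.append(record)
        elif bucket < 8:
            validation.append(record)
        else:
            test.append(record)
    return training, validation, test


# Hypothetical ratings with timestamps 0..99, so each last digit appears 10 times.
sample = [(1, 100 + ts, 4.0, ts) for ts in range(100)]
training, validation, test = split_by_timestamp(sample)
print(len(training), len(validation), len(test))  # 60 20 20
```

Because the rule depends only on the timestamp, the split is deterministic across runs, and every rating lands in exactly one of the three sets.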
Re: Movie Recommendation tutorial
Which Spark version did you use? Btw, there are three datasets from MovieLens. The tutorial used the medium one (1 million ratings). -Xiangrui

On Mon, Feb 23, 2015 at 8:36 AM, poiuytrez wrote:
> What do you mean?
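For reference, the 1M release stores its ratings in ratings.dat, one per line as UserID::MovieID::Rating::Timestamp. A minimal parser sketch under that assumption follows; other MovieLens releases use different files and separators (the 100K set's u.data is tab-separated), hence the hypothetical `sep` parameter. The sample lines are only illustrations of each format.

```python
def parse_rating(line, sep="::"):
    """Parse one MovieLens ratings line into (user_id, movie_id, rating, timestamp).

    Assumes the four-field layout UserID<sep>MovieID<sep>Rating<sep>Timestamp;
    sep defaults to "::" as in the 1M ratings.dat file.
    """
    user_id, movie_id, rating, timestamp = line.strip().split(sep)
    return int(user_id), int(movie_id), float(rating), int(timestamp)


print(parse_rating("1::1193::5::978300760"))             # (1, 1193, 5.0, 978300760)
print(parse_rating("196\t242\t3\t881250949", sep="\t"))  # (196, 242, 3.0, 881250949)
```

With this field order the timestamp is at index 3, which matches the x[3] % 10 filters used for the training/validation/test split earlier in the thread.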
Re: Movie Recommendation tutorial
What do you mean?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Movie-Recommendation-tutorial-tp21769p21771.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.