Re: Movie Recommendation tutorial

2015-02-24 Thread Sean Owen
It's something like the average error in rating, but a bit different:
it's the square root of the average squared error. But if you think of
the ratings as 'stars', you could roughly read 0.86 as 'generally off
by 0.86 stars', and that would be somewhat right.

Whether that's good depends on the range of the input ratings. For 1-5
that's OK; for 1-100 it would be fantastic.
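One common way to make "depends on the range" concrete is to divide RMSE by the width of the rating scale (often called normalized RMSE; this convention is an assumption here, not something the tutorial uses). The sketch applies it to the 0.86 figure on a 1-5 scale and on a hypothetical 1-100 scale.

```python
def normalized_rmse(rmse_value, low, high):
    """Express an RMSE as a fraction of the rating scale's width (high - low)."""
    return rmse_value / (high - low)

# 0.86 stars on a 1-5 scale vs. 0.86 points on a hypothetical 1-100 scale.
print(round(normalized_rmse(0.86, 1, 5), 4))    # 0.215
print(round(normalized_rmse(0.86, 1, 100), 4))  # 0.0087
```

The same absolute error is about a fifth of the 1-5 scale but under 1% of the 1-100 scale.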

To give you a point of comparison, when Netflix launched the Netflix
Prize, their recommender had an RMSE of 0.95 or so. The winning
solution was at about 0.85. Their data set was larger and a harder
problem than the MovieLens data set, though.
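The size of such a gap is easier to compare as a relative reduction in RMSE. The sketch uses the approximate Netflix figures above (0.95 and 0.85) and, for contrast, two validation RMSEs Guillaume reports in this thread for rank = 12 and numIter = 10 (1.361321 with lambda = 1.0, 0.866605 with lambda = 0.1). The Netflix inputs are rounded, so treat the percentages as rough.

```python
def relative_improvement(baseline_rmse, new_rmse):
    """Fractional reduction in RMSE relative to the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Approximate Netflix Prize figures quoted above.
netflix = relative_improvement(0.95, 0.85)
# Validation RMSEs reported in this thread (rank = 12, numIter = 10).
thread = relative_improvement(1.361321, 0.866605)

print(f"Netflix: {netflix:.1%}")      # Netflix: 10.5%
print(f"This thread: {thread:.1%}")   # This thread: 36.3%
```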

So: reasonably good.
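A further reference point you can compute for any data set is a trivial model that predicts the global mean rating for every user and movie. Its RMSE equals the population standard deviation of the ratings, so a trained recommender only adds value if it beats that number. The ratings below are made up for illustration; with real data you would use the training-set mean and score it on the validation or test set.

```python
import math
import statistics

def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical 1-5 star ratings.
ratings = [5.0, 4.0, 4.0, 3.0, 2.0, 4.0, 5.0, 3.0]

mean_rating = statistics.mean(ratings)
baseline = rmse(ratings, [mean_rating] * len(ratings))

# Predicting the mean everywhere gives an RMSE equal to the population std dev.
print(round(baseline, 4), round(statistics.pstdev(ratings), 4))
```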

On Tue, Feb 24, 2015 at 8:19 PM, Krishna Sankar  wrote:
> Yep, much better with 0.1.
>
> "The best model was trained with rank = 12 and lambda = 0.1, and numIter =
> 20, and its RMSE on the test set is 0.869092" (Spark 1.3.0)
>
> Question: What is the intuition behind an RMSE of 0.86 vs. 1.3? I know
> smaller is better, but is it that much better? And what is a good number
> for a recommendation engine?
>
> Cheers
> 

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Movie Recommendation tutorial

2015-02-24 Thread Krishna Sankar
Yep, much better with 0.1.

"The best model was trained with rank = 12 and lambda = 0.1, and numIter =
20, and its RMSE on the test set is 0.869092" (Spark 1.3.0)

Question: What is the intuition behind an RMSE of 0.86 vs. 1.3? I know
smaller is better, but is it that much better? And what is a good number
for a recommendation engine?

Cheers


On Tue, Feb 24, 2015 at 1:03 AM, Guillaume Charhon <guilla...@databerries.com> wrote:

> I am using Spark 1.2.1.
>
> Thank you, Krishna. I am getting almost the same results as you, so it must
> be an error in the tutorial. Xiangrui, I ran some additional tests with
> lambda set to 0.1 and I am getting a much better RMSE:
>
> RMSE (validation) = 0.868981 for the model trained with rank = 8, lambda = 0.1, and numIter = 10.
> RMSE (validation) = 0.869628 for the model trained with rank = 8, lambda = 0.1, and numIter = 20.
> RMSE (validation) = 1.361321 for the model trained with rank = 8, lambda = 1.0, and numIter = 10.
> RMSE (validation) = 1.361321 for the model trained with rank = 8, lambda = 1.0, and numIter = 20.
> RMSE (validation) = 3.755870 for the model trained with rank = 8, lambda = 10.0, and numIter = 10.
> RMSE (validation) = 3.755870 for the model trained with rank = 8, lambda = 10.0, and numIter = 20.
> RMSE (validation) = 0.866605 for the model trained with rank = 12, lambda = 0.1, and numIter = 10.
> RMSE (validation) = 0.867498 for the model trained with rank = 12, lambda = 0.1, and numIter = 20.
> RMSE (validation) = 1.361321 for the model trained with rank = 12, lambda = 1.0, and numIter = 10.
> RMSE (validation) = 1.361321 for the model trained with rank = 12, lambda = 1.0, and numIter = 20.
> RMSE (validation) = 3.755870 for the model trained with rank = 12, lambda = 10.0, and numIter = 10.
> RMSE (validation) = 3.755870 for the model trained with rank = 12, lambda = 10.0, and numIter = 20.
>
> The best model was trained with rank = 12 and lambda = 0.1, and numIter = 10, and its RMSE on the test set is 0.865407.


Re: Movie Recommendation tutorial

2015-02-24 Thread Guillaume Charhon
I am using Spark 1.2.1.

Thank you, Krishna. I am getting almost the same results as you, so it must
be an error in the tutorial. Xiangrui, I ran some additional tests with
lambda set to 0.1 and I am getting a much better RMSE:

RMSE (validation) = 0.868981 for the model trained with rank = 8, lambda = 0.1, and numIter = 10.
RMSE (validation) = 0.869628 for the model trained with rank = 8, lambda = 0.1, and numIter = 20.
RMSE (validation) = 1.361321 for the model trained with rank = 8, lambda = 1.0, and numIter = 10.
RMSE (validation) = 1.361321 for the model trained with rank = 8, lambda = 1.0, and numIter = 20.
RMSE (validation) = 3.755870 for the model trained with rank = 8, lambda = 10.0, and numIter = 10.
RMSE (validation) = 3.755870 for the model trained with rank = 8, lambda = 10.0, and numIter = 20.
RMSE (validation) = 0.866605 for the model trained with rank = 12, lambda = 0.1, and numIter = 10.
RMSE (validation) = 0.867498 for the model trained with rank = 12, lambda = 0.1, and numIter = 20.
RMSE (validation) = 1.361321 for the model trained with rank = 12, lambda = 1.0, and numIter = 10.
RMSE (validation) = 1.361321 for the model trained with rank = 12, lambda = 1.0, and numIter = 20.
RMSE (validation) = 3.755870 for the model trained with rank = 12, lambda = 10.0, and numIter = 10.
RMSE (validation) = 3.755870 for the model trained with rank = 12, lambda = 10.0, and numIter = 20.

The best model was trained with rank = 12 and lambda = 0.1, and numIter = 10, and its RMSE on the test set is 0.865407.
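The "best model" line above is the result of a grid search: try every combination of rank, lambda and numIter, and keep the one with the lowest validation RMSE. Krishna's "Itertools results" suggest the script enumerates the combinations with `itertools.product`. The sketch below replays only the selection step over the validation RMSEs reported here; the dictionary stands in for actually training ALS models, so no Spark is needed. The `lambdas` list reflects the values tested in this message, not necessarily the tutorial's original list.

```python
import itertools

# Validation RMSEs reported above, keyed by (rank, lambda, numIter).
reported = {
    (8, 0.1, 10): 0.868981, (8, 0.1, 20): 0.869628,
    (8, 1.0, 10): 1.361321, (8, 1.0, 20): 1.361321,
    (8, 10.0, 10): 3.755870, (8, 10.0, 20): 3.755870,
    (12, 0.1, 10): 0.866605, (12, 0.1, 20): 0.867498,
    (12, 1.0, 10): 1.361321, (12, 1.0, 20): 1.361321,
    (12, 10.0, 10): 3.755870, (12, 10.0, 20): 3.755870,
}

ranks = [8, 12]
lambdas = [0.1, 1.0, 10.0]
num_iters = [10, 20]

best_params, best_rmse = None, float("inf")
for rank, lmbda, num_iter in itertools.product(ranks, lambdas, num_iters):
    # In a real run, this lookup would be replaced by training an ALS model
    # on the training set and computing its RMSE on the validation set.
    validation_rmse = reported[(rank, lmbda, num_iter)]
    if validation_rmse < best_rmse:
        best_params, best_rmse = (rank, lmbda, num_iter), validation_rmse

print(best_params, best_rmse)  # (12, 0.1, 10) 0.866605
```

The selected configuration matches the one reported above; its test-set RMSE (0.865407) is then measured once on data that played no part in the selection.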


On Tue, Feb 24, 2015 at 7:23 AM, Xiangrui Meng  wrote:

> Try to set lambda to 0.1. -Xiangrui


Re: Movie Recommendation tutorial

2015-02-23 Thread Xiangrui Meng
Try to set lambda to 0.1. -Xiangrui
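Why lambda matters so much: in ALS, each user's (and each item's) factor vector is the solution of a small ridge-regression problem with the other side's factors held fixed, and lambda is the ridge penalty. The rank-1, pure-Python sketch below uses made-up numbers and assumes, as the MLlib documentation describes for its "ALS-WR" scaling, that the penalty for a user is multiplied by that user's number of ratings. As lambda grows, the user factor shrinks toward zero, so predictions collapse toward 0 and RMSE approaches that of predicting 0 for everything. That would be one plausible explanation for the lambda = 10.0 runs in this thread giving identical RMSEs for both ranks and both iteration counts, but it is a guess, not something checked here.

```python
import math

def user_factor(item_factors, ratings, lam):
    """Rank-1 ALS-WR user update with item factors held fixed.

    Minimizes sum((r - x * y) ** 2) + lam * n * x ** 2, where n is the number
    of ratings this user has; the closed-form ridge solution is
    sum(y * r) / (sum(y * y) + lam * n).
    """
    n = len(ratings)
    numerator = sum(y * r for y, r in zip(item_factors, ratings))
    denominator = sum(y * y for y in item_factors) + lam * n
    return numerator / denominator

def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical item factors (held fixed) and one user's star ratings.
item_factors = [2.0, 1.5, 1.8, 1.2]
ratings = [5.0, 3.0, 4.0, 2.0]

for lam in [0.1, 1.0, 10.0, 1000.0]:
    x = user_factor(item_factors, ratings, lam)
    print(f"lambda = {lam}: user factor = {x:.4f}")

# As lambda grows without bound, the factor goes to 0, every prediction x * y
# goes to 0, and RMSE approaches that of predicting 0 for every rating.
print(f"RMSE of all-zero predictions = {rmse(ratings, [0.0] * len(ratings)):.4f}")
```

This only illustrates the shrinkage; on held-out data a moderate lambda usually beats lambda = 0 because it limits overfitting, which is why the value still has to be tuned on a validation set.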

On Mon, Feb 23, 2015 at 3:06 PM, Krishna Sankar  wrote:
> The RMSE varies a little between versions.
> Partitioned the data into training, validation, and test sets like so:
>
> training = ratings_rdd_01.filter(lambda x: (x[3] % 10) < 6)
> validation = ratings_rdd_01.filter(lambda x: (x[3] % 10) >= 6 and (x[3] % 10) < 8)
> test = ratings_rdd_01.filter(lambda x: (x[3] % 10) >= 8)
> Validation MSE:
>
> # 1.3.0 Mean Squared Error = 0.871456869392
> # 1.2.1 Mean Squared Error = 0.877305629074
>
> Itertools results:
>
> 1.3.0 - RMSE = 1.354839 (rank = 8 and lambda = 1.0, and numIter = 20)
> 1.1.1 - RMSE = 1.335831 (rank = 8 and lambda = 1.0, and numIter = 10)
>
> Cheers
> 



Re: Movie Recommendation tutorial

2015-02-23 Thread Krishna Sankar
   1. The RMSE varies a little between versions.
   2. Partitioned the data into training, validation, and test sets like so:
      - training = ratings_rdd_01.filter(lambda x: (x[3] % 10) < 6)
      - validation = ratings_rdd_01.filter(lambda x: (x[3] % 10) >= 6 and (x[3] % 10) < 8)
      - test = ratings_rdd_01.filter(lambda x: (x[3] % 10) >= 8)
      - Validation MSE:
         - 1.3.0: Mean Squared Error = 0.871456869392
         - 1.2.1: Mean Squared Error = 0.877305629074
   3. Itertools results:
      - 1.3.0 - RMSE = 1.354839 (rank = 8 and lambda = 1.0, and numIter = 20)
      - 1.1.1 - RMSE = 1.335831 (rank = 8 and lambda = 1.0, and numIter = 10)
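The three filters above split on the last digit of the timestamp, which gives a deterministic split (the same on every run), and roughly 60/20/20, assuming those last digits are spread evenly. The plain-Python sketch below applies the same predicates to a list instead of an RDD; it assumes each record is a (userId, movieId, rating, timestamp) tuple, i.e. that `x[3]` is the timestamp, as in the MovieLens ratings file layout. `split_by_timestamp` is a hypothetical helper name. The same lambdas can be passed unchanged to `RDD.filter` in PySpark.

```python
def split_by_timestamp(ratings):
    """Split (user, movie, rating, timestamp) tuples into training/validation/test
    sets by the last digit of the timestamp, mirroring the three filter() calls."""
    training = [r for r in ratings if r[3] % 10 < 6]
    validation = [r for r in ratings if 6 <= r[3] % 10 < 8]
    test = [r for r in ratings if r[3] % 10 >= 8]
    return training, validation, test

# Hypothetical ratings whose timestamps end in each digit 0-9 exactly once.
ratings = [(1, 100 + d, 4.0, 978300000 + d) for d in range(10)]
training, validation, test = split_by_timestamp(ratings)
print(len(training), len(validation), len(test))  # 6 2 2
```

Because the three predicates are disjoint and together cover the digits 0-9, every rating lands in exactly one set.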

Cheers


On Mon, Feb 23, 2015 at 12:37 PM, Xiangrui Meng  wrote:

> Which Spark version did you use? Btw, there are three datasets from
> MovieLens. The tutorial used the medium one (1 million). -Xiangrui


Re: Movie Recommendation tutorial

2015-02-23 Thread Xiangrui Meng
Which Spark version did you use? Btw, MovieLens offers three datasets; the
tutorial used the medium one (1 million ratings). -Xiangrui

On Mon, Feb 23, 2015 at 8:36 AM, poiuytrez  wrote:
> What do you mean?



Re: Movie Recommendation tutorial

2015-02-23 Thread poiuytrez
What do you mean? 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Movie-Recommendation-tutorial-tp21769p21771.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Movie Recommendation tutorial

2015-02-23 Thread poiuytrez
Hello,

I am following the Movie Recommendation with MLlib tutorial
(https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html).
However, the RMSE I get is much larger than the one given in step 7. I get:

The best model was trained with rank = 8 and lambda = 1.0, and numIter = 10, and its RMSE on the test set is 1.357078.

instead of:

The best model was trained using rank 8 and lambda 10.0, and its RMSE on test is 0.8808492431998702.

Is it a mistake in the tutorial, or am I doing something wrong?

Thank you



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Movie-Recommendation-tutorial-tp21769.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
