Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

Nick Pentreath Sun, 24 Jul 2016 12:51:09 -0700

It seems likely that you're running into
https://issues.apache.org/jira/browse/SPARK-14489 - this occurs when the
test dataset in the train/test split contains users or items that were not
in the training set. Hence the model doesn't have computed factors for
those ids, and ALS 'transform' currently returns NaN for those ids. This in
turn results in NaN for the evaluator result.
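
To make the mechanism concrete, here is a minimal plain-Python sketch (not Spark code). The factor values, ids and ratings are made up for illustration, and `predict` and `rmse` are hypothetical helpers. It shows how a single unseen id turns the whole RMSE into NaN, and how scoring only the rows with a real prediction avoids that. In Spark the analogous workaround would presumably be to drop the NaN rows from the predictions DataFrame (for example with `predictions.na.drop()`) before calling the evaluator.

```python
import math

# Hypothetical learned factors: only ids seen during training get an entry.
user_factors = {1: [0.9, 0.2], 2: [0.1, 0.8]}
item_factors = {10: [1.0, 0.5], 11: [0.3, 1.2]}

def predict(user, item):
    """Dot product of the two factor vectors, or NaN if either id is unseen."""
    if user not in user_factors or item not in item_factors:
        return float("nan")
    return sum(a * b for a, b in zip(user_factors[user], item_factors[item]))

def rmse(pairs):
    """Root-mean-square error over (rating, prediction) pairs."""
    return math.sqrt(sum((y - p) ** 2 for y, p in pairs) / len(pairs))

# Test rows as (user, item, rating). User 3 never appeared in training.
test = [(1, 10, 1.0), (2, 11, 1.0), (3, 10, 4.0)]
scored = [(r, predict(u, i)) for u, i, r in test]

# One NaN prediction is enough to make the aggregate NaN.
rmse_all = rmse(scored)

# Keep only rows where the model produced a real prediction.
known = [(r, p) for r, p in scored if not math.isnan(p)]
rmse_known = rmse(known)

print(rmse_all, rmse_known)
```

Note that dropping rows changes what is being measured: the error then covers only users and items the model has seen.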


I have a PR open on that issue that will hopefully address this soon.


On Sun, 24 Jul 2016 at 17:49 VG <vlin...@gmail.com> wrote:

> Ping. Does anyone have any suggestions or advice for me?
> It would be really helpful.
>
> VG
> On Sun, Jul 24, 2016 at 12:19 AM, VG <vlin...@gmail.com> wrote:
>
>> Sean,
>>
>> I did this just to test the model. When I split my data into 80%
>> training and 20% test,
>>
>> I get a Root-mean-square error = NaN
>>
>> So I am wondering where I might be going wrong.
>>
>> Regards,
>> VG
>>
>> On Sun, Jul 24, 2016 at 12:12 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> No, that's certainly not to be expected. ALS works by computing a much
>>> lower-rank representation of the input. It would not reproduce the
>>> input exactly, and you don't want it to -- this would be seriously
>>> overfit. This is why in general you don't evaluate a model on the
>>> training set.
>>>
>>> On Sat, Jul 23, 2016 at 7:37 PM, VG <vlin...@gmail.com> wrote:
>>> > I am trying to run ml.ALS to compute some recommendations.
>>> >
>>> > Just to test, I am using the same dataset both for training the
>>> > ALSModel and for predicting results from the model.
>>> >
>>> > When I evaluate the result using RegressionEvaluator I get a
>>> > Root-mean-square error = 1.5544064263236066
>>> >
>>> > I think this should be 0. Any suggestions as to what might be going wrong?
>>> >
>>> > Regards,
>>> > Vipul
>>>
>>
>>
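
Sean's point in the quoted message, that a low-rank model does not reproduce its own training input, can be illustrated with a toy rank-1 alternating least squares loop in plain Python. This is a simplified sketch, not Spark's implementation: the 3x3 rating matrix is made up and fully observed, and the helper `als_rank1` and its `reg` and `iters` parameters are assumptions chosen for illustration.

```python
import math

# Toy, fully observed rating matrix (made-up numbers, not the poster's data).
R = [
    [5.0, 1.0, 4.0],
    [1.0, 5.0, 2.0],
    [4.0, 2.0, 5.0],
]

def als_rank1(R, reg=0.1, iters=50):
    """Rank-1 ALS with plain L2 regularization (a simplification)."""
    n_users, n_items = len(R), len(R[0])
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iters):
        # Hold item factors fixed; each user factor is a 1-D ridge solution.
        vv = sum(x * x for x in v) + reg
        u = [sum(R[i][j] * v[j] for j in range(n_items)) / vv
             for i in range(n_users)]
        # Hold user factors fixed; solve for each item factor the same way.
        uu = sum(x * x for x in u) + reg
        v = [sum(R[i][j] * u[i] for i in range(n_users)) / uu
             for j in range(n_items)]
    return u, v

def rmse(R, u, v):
    """RMSE between R and its rank-1 reconstruction u[i] * v[j]."""
    errs = [(R[i][j] - u[i] * v[j]) ** 2
            for i in range(len(R)) for j in range(len(R[0]))]
    return math.sqrt(sum(errs) / len(errs))

u, v = als_rank1(R)
train_rmse = rmse(R, u, v)
print(train_rmse)
```

Because the matrix has full rank and the model has only one factor per user and item, the error on the training data itself stays well above zero, which is the behaviour the original poster observed.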
