bo song created SPARK-17987:
-------------------------------

             Summary: ML Evaluator fails to handle null values in the dataset
                 Key: SPARK-17987
                 URL: https://issues.apache.org/jira/browse/SPARK-17987
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.0.1, 1.6.2
            Reporter: bo song


Take RegressionEvaluator as an example: when the predictionCol is null in a 
row, a "scala.MatchError" exception is thrown. A null prediction is a common 
case. For example, when a predictor is missing or its value is out of bounds, 
most machine learning models cannot produce a correct prediction, so a null 
prediction is returned. Evaluators should handle null values instead of 
throwing an exception; the common way to handle missing values is to ignore 
them. Besides null values, NaN values need to be handled correctly too. 

The three evaluators RegressionEvaluator, BinaryClassificationEvaluator and 
MulticlassClassificationEvaluator all have the same problem.
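
To make the requested behavior concrete, here is a minimal sketch in plain 
Python of an RMSE computation that ignores rows whose prediction or label is 
null or NaN. It is not Spark code and not the evaluators' implementation; the 
function name rmse_ignoring_missing, the (prediction, label) row layout, and 
the choice to return NaN when no valid rows remain are all assumptions made 
for illustration.

```python
import math
from typing import Iterable, Optional, Tuple

# Hypothetical row layout: (prediction, label), where either may be None.
Row = Tuple[Optional[float], Optional[float]]


def rmse_ignoring_missing(rows: Iterable[Row]) -> float:
    """Root mean squared error over rows, skipping null and NaN values."""
    squared_error = 0.0
    count = 0
    for prediction, label in rows:
        # Skip rows with a null prediction or label instead of failing.
        if prediction is None or label is None:
            continue
        # NaN is a separate case from null and must be checked explicitly.
        if math.isnan(prediction) or math.isnan(label):
            continue
        squared_error += (prediction - label) ** 2
        count += 1
    # Assumption: with no valid rows the metric is undefined, so return NaN.
    if count == 0:
        return float("nan")
    return math.sqrt(squared_error / count)


rows = [(1.0, 1.0), (None, 2.0), (float("nan"), 3.0), (5.0, 2.0)]
print(rmse_ignoring_missing(rows))
```

Only the first and last rows contribute to the result here; the null and NaN 
predictions are dropped rather than raising an error.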



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

