Robert Dodier created SPARK-6332:
------------------------------------

             Summary: compute calibration curve for binary classifiers
                 Key: SPARK-6332
                 URL: https://issues.apache.org/jira/browse/SPARK-6332
             Project: Spark
          Issue Type: New Feature
          Components: MLlib
            Reporter: Robert Dodier
            Priority: Minor


For binary classifiers, calibration measures how well classifier scores agree 
with the observed proportion of positive examples. If the classifier is 
well-calibrated, then among the examples that receive a score near s, the 
proportion of positive examples is approximately s. This matters when the 
scores are used as probabilities for making decisions via expected cost. Even 
when they are not, the calibration curve is still informative: at the very 
least, the proportion of positive examples should be a monotonic 
(nondecreasing) function of the score.
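To make the definition above concrete, here is a minimal sketch of one common way to estimate a calibration curve: bin the scores into equal-width intervals over [0, 1] and, in each non-empty bin, compare the mean score with the fraction of positive examples. It is plain Python rather than Spark code. The function names `calibration_curve` and `is_monotonic`, the equal-width binning, and the assumption that scores lie in [0, 1] with 0/1 labels are all illustrative choices, not an existing BinaryClassificationMetrics API.

```python
from typing import List, Sequence, Tuple


def calibration_curve(
    scores: Sequence[float],
    labels: Sequence[int],
    num_bins: int = 10,
) -> List[Tuple[float, float, int]]:
    """Return (mean score, fraction positive, count) for each non-empty bin.

    Hypothetical helper. Assumes scores lie in [0, 1] and labels are 0 or 1.
    Bins are equal-width over [0, 1]; a score of exactly 1.0 goes into the
    last bin.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if num_bins < 1:
        raise ValueError("num_bins must be positive")

    score_sums = [0.0] * num_bins
    positive_counts = [0] * num_bins
    counts = [0] * num_bins
    for s, y in zip(scores, labels):
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score {s} outside [0, 1]")
        # Equal-width bin index; clamp s == 1.0 into the last bin.
        b = min(int(s * num_bins), num_bins - 1)
        score_sums[b] += s
        positive_counts[b] += y
        counts[b] += 1

    # Skip empty bins so every reported fraction is well defined.
    return [
        (score_sums[b] / counts[b], positive_counts[b] / counts[b], counts[b])
        for b in range(num_bins)
        if counts[b] > 0
    ]


def is_monotonic(curve: List[Tuple[float, float, int]]) -> bool:
    """True if the fraction of positives never decreases as the score increases."""
    fractions = [frac for _, frac, _ in curve]
    return all(a <= b for a, b in zip(fractions, fractions[1:]))


if __name__ == "__main__":
    scores = [0.1, 0.15, 0.4, 0.45, 0.8, 0.85, 0.9, 0.95]
    labels = [0, 0, 0, 1, 1, 1, 0, 1]
    curve = calibration_curve(scores, labels, num_bins=2)
    for mean_score, frac_pos, n in curve:
        print(f"mean score {mean_score:.3f}  fraction positive {frac_pos:.3f}  n={n}")
    print("monotonic:", is_monotonic(curve))
```

For a well-calibrated classifier the mean score and the fraction positive in each bin should be close; `is_monotonic` checks only the weaker property mentioned above. Equal-width bins are the simplest choice; equal-frequency (quantile) bins are a common alternative when scores are concentrated in a narrow range.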

I propose that a new method for calibration be added to the class 
BinaryClassificationMetrics, since calibration seems to fit in with the ROC 
curve and other classifier assessments. 

For more about calibration, see: 
http://en.wikipedia.org/wiki/Calibration_%28statistics%29#In_classification

References:

Mahdi Pakdaman Naeini, Gregory F. Cooper, Milos Hauskrecht. "Binary Classifier 
Calibration: Non-parametric approach." 
http://arxiv.org/abs/1401.3390

Alexandru Niculescu-Mizil, Rich Caruana. "Predicting Good Probabilities With 
Supervised Learning." In Proceedings of the 22nd International Conference on 
Machine Learning, Bonn, Germany, 2005. 
http://www.cs.cornell.edu/~alexn/papers/calibration.icml05.crc.rev3.pdf

Ira Cohen, Moises Goldszmidt. "Properties and benefits of calibrated 
classifiers." Technical report HPL-2004-22R1, HP Laboratories. 
http://www.hpl.hp.com/techreports/2004/HPL-2004-22R1.pdf
