GitHub user robert-dodier opened a pull request:

    https://github.com/apache/spark/pull/5025

    [SPARK-6332] [MLlib] compute calibration curve for binary classifier

    This PR contains an implementation of a calibration method in the class
    BinaryClassificationMetrics. The code was adapted from the method for ROC
    curve construction. Tests on small data sets have been added to
    BinaryClassificationMetricsSuite, and the current version of the code
    passes those tests.

    In this implementation, the new method returns an
    RDD[((Double, Double), (Double, Long))] with one element per bin. The
    first pair describes the bin itself: the least and greatest scores in the
    bin. The second pair describes the bin's contents: the proportion of
    positive examples in the bin and the number of examples in the bin. I
    chose this representation to keep as much information as possible.
    However, a simpler representation might be better; let's talk about that
    if anyone is interested.
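
As a rough illustration of the bin representation described above, the following Scala sketch computes the same ((least score, greatest score), (fraction positive, count)) tuples over a plain local collection instead of an RDD. It is not the code from this PR: the names CalibrationSketch and calibrationBins, the fixed binSize parameter, and the assumption that labels are 1.0 (positive) or 0.0 (negative) are all hypothetical.

```scala
// Hypothetical sketch: a local Seq stands in for the RDD used in the PR.
object CalibrationSketch {
  /** Sorts (score, label) pairs by score, splits them into consecutive bins
    * of at most `binSize` examples, and summarizes each bin as
    * ((leastScore, greatestScore), (fractionPositive, count)).
    * Assumes labels are 1.0 for positive and 0.0 for negative examples.
    */
  def calibrationBins(
      scoreAndLabels: Seq[(Double, Double)],
      binSize: Int): Seq[((Double, Double), (Double, Long))] = {
    require(binSize > 0, "binSize must be positive")
    scoreAndLabels
      .sortBy(_._1)        // ascending by score
      .grouped(binSize)    // consecutive, non-empty bins; the last may be smaller
      .map { bin =>
        // Scores are sorted, so the first and last entries are the extremes.
        val least = bin.head._1
        val greatest = bin.last._1
        val positives = bin.count(_._2 == 1.0)
        val count = bin.size.toLong
        ((least, greatest), (positives.toDouble / count, count))
      }
      .toVector
  }
}
```

For example, with six examples and `binSize = 2`, the scores 0.1, 0.2, 0.35, 0.4, 0.8, 0.9 with labels 0, 0, 1, 0, 1, 1 yield three bins with positive fractions 0.0, 0.5 and 1.0. A well-calibrated classifier should show the fraction of positives in each bin rising roughly in line with the scores in that bin.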

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robert-dodier/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5025
    
----
commit 1450509ef32eae77484cb1a85e55c0d3943e415f
Author: Robert Dodier <[email protected]>
Date:   2015-03-13T17:53:39Z

    Change command line arguments in order to get tests to run.

commit d1aacfb084bcccb68e9c04ac7ec835c9710f5e36
Author: Robert Dodier <[email protected]>
Date:   2015-03-13T17:58:04Z

    Initial, incomplete work towards calibration for binary classifiers.
    
     o ProbabilisticClassifier.scala:
        mention calibration in comments
    
     o BinaryClassificationMetrics.scala:
        adapting code for ROC to calibration; incomplete and commented
        out for now
    
     o BinaryClassificationMetricsSuite.scala:
        tests for calibration

commit 87382e5e6fe238824ee9ddb50c6a4c2e9a858cec
Author: Robert Dodier <[email protected]>
Date:   2015-03-13T21:08:14Z

    Initial attempt to implement calibration; compiles, not tested yet.

commit d3c81cd61c685a132c29cae86f48e05c37569d3c
Author: Robert Dodier <[email protected]>
Date:   2015-03-13T21:26:32Z

    Change (..., (Double, Int)) to (..., (Double, Long)) to match
    types to what calibration actually returns.

commit 951817f94921f42cf3031a15a97f787d032dcd92
Author: Robert Dodier <[email protected]>
Date:   2015-03-14T00:48:56Z

    Adjust JVM command line to get tests to run.

commit 07e807a8176b76ec5b26f93bb826b6e7b696306b
Author: Robert Dodier <[email protected]>
Date:   2015-03-14T00:50:11Z

    Adjust bin size to prevent final bin from being very small compared to
    others.

commit c526891c2eaa372f0dbf9835f3420819a967085c
Author: Robert Dodier <[email protected]>
Date:   2015-03-14T02:14:30Z

    Revert local changes to pom.xml.

----
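
Commit 07e807a adjusts the bin size so that the final bin is not very small compared to the others. The patch itself is not reproduced in this message, so the Scala sketch below shows one common way to get that effect and is not necessarily what the author did. A fixed size of n / numBins leaves a remainder that spills into an extra, possibly tiny, final bin. Spreading the remainder over the first bins keeps every bin size within one of the others. The names BinSizing, naiveBinSizes and balancedBinSizes are hypothetical.

```scala
// Hypothetical sketch: compares two ways of sizing bins for n sorted examples.
object BinSizing {
  /** Naive approach: fixed size n / numBins, so any remainder becomes an
    * extra final bin that can be much smaller than the rest. */
  def naiveBinSizes(n: Int, numBins: Int): Seq[Int] = {
    require(numBins > 0 && n >= numBins, "need at least one example per bin")
    val size = n / numBins
    (0 until n).grouped(size).map(_.size).toVector
  }

  /** Balanced approach: exactly numBins bins whose sizes differ by at most
    * one; the first n % numBins bins each take one extra example. */
  def balancedBinSizes(n: Int, numBins: Int): Seq[Int] = {
    require(numBins > 0 && n >= numBins, "need at least one example per bin")
    val base = n / numBins
    val extra = n % numBins
    (0 until numBins).map(i => if (i < extra) base + 1 else base)
  }
}
```

With n = 10 and numBins = 3, the naive approach produces bins of sizes 3, 3, 3 and 1, while the balanced approach produces 4, 3 and 3.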

