GitHub user robert-dodier opened a pull request:
https://github.com/apache/spark/pull/5025
[SPARK-6332] [MLlib] compute calibration curve for binary classifier
This PR adds a method to the class BinaryClassificationMetrics that computes a
calibration curve for a binary classifier. The code was adapted from the
existing method for ROC curve construction. Tests on small data sets have been
added to BinaryClassificationMetricsSuite, and the current version of the code
passes those tests.
In this implementation, the new method returns an
RDD[((Double, Double), (Double, Long))] with one element per bin. The first
pair describes the bin itself: its two values are the least and greatest
scores in the bin. The second pair describes the bin's contents: its two
values are the proportion of positive examples in the bin and the number of
examples in the bin. I chose this representation in order to keep as much
information as possible. However, a simpler representation might be better;
let's talk about that if anyone is interested.
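To make the shape of the return value concrete, here is a minimal sketch of how such a result could be built. It is not the code in this PR: it uses a local Seq instead of an RDD, the object and method names (CalibrationSketch, calibration) and the numBins parameter are hypothetical, and it assumes equal-count bins over the sorted scores with labels encoded as 0.0 or 1.0.

```scala
// Illustrative sketch only; not the PR's implementation.
// Assumptions: local Seq instead of RDD, equal-count binning,
// labels encoded as 0.0 (negative) or 1.0 (positive).
object CalibrationSketch {

  /** Returns one ((minScore, maxScore), (positiveFraction, count)) entry per bin. */
  def calibration(
      scoreAndLabels: Seq[(Double, Double)],
      numBins: Int): Seq[((Double, Double), (Double, Long))] = {
    require(numBins > 0, "numBins must be positive")
    if (scoreAndLabels.isEmpty) {
      Seq.empty
    } else {
      // Sort by score so that each bin covers a contiguous score range.
      val sorted = scoreAndLabels.sortBy(_._1)
      // Ceiling division, so at most numBins bins are produced.
      val binSize = math.max(1, (sorted.size + numBins - 1) / numBins)
      sorted.grouped(binSize).map { bin =>
        val positives = bin.count(_._2 > 0.5)
        val range = (bin.head._1, bin.last._1)
        val content = (positives.toDouble / bin.size, bin.size.toLong)
        (range, content)
      }.toList
    }
  }
}
```

For example, calling CalibrationSketch.calibration on the four pairs (0.1, 0.0), (0.2, 0.0), (0.3, 1.0), (0.4, 1.0) with numBins = 2 gives two bins: one spanning scores 0.1 to 0.2 with no positives, and one spanning 0.3 to 0.4 with all positives, each holding two examples. Note that with this simple ceiling-division scheme the last bin can still come out smaller than the others.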
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/robert-dodier/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5025.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5025
----
commit 1450509ef32eae77484cb1a85e55c0d3943e415f
Author: Robert Dodier <[email protected]>
Date: 2015-03-13T17:53:39Z
Change command line arguments in order to get tests to run.
commit d1aacfb084bcccb68e9c04ac7ec835c9710f5e36
Author: Robert Dodier <[email protected]>
Date: 2015-03-13T17:58:04Z
Initial, incomplete work towards calibration for binary classifiers.
o ProbabilisticClassifier.scala:
  mention calibration in comments
o BinaryClassificationMetrics.scala:
  adapting code for ROC to calibration; incomplete and commented
  out for now
o BinaryClassificationMetricsSuite.scala:
  tests for calibration
commit 87382e5e6fe238824ee9ddb50c6a4c2e9a858cec
Author: Robert Dodier <[email protected]>
Date: 2015-03-13T21:08:14Z
Initial attempt to implement calibration; compiles, not tested yet.
commit d3c81cd61c685a132c29cae86f48e05c37569d3c
Author: Robert Dodier <[email protected]>
Date: 2015-03-13T21:26:32Z
Change (..., (Double, Int)) to (..., (Double, Long)) to match
types to what calibration actually returns.
commit 951817f94921f42cf3031a15a97f787d032dcd92
Author: Robert Dodier <[email protected]>
Date: 2015-03-14T00:48:56Z
Adjust JVM command line to get tests to run.
commit 07e807a8176b76ec5b26f93bb826b6e7b696306b
Author: Robert Dodier <[email protected]>
Date: 2015-03-14T00:50:11Z
Adjust bin size to prevent final bin from being very small compared to
others.
commit c526891c2eaa372f0dbf9835f3420819a967085c
Author: Robert Dodier <[email protected]>
Date: 2015-03-14T02:14:30Z
Revert local changes to pom.xml.
----