Dear sklearn mailing list,

I love all the wonderful ways scikit-learn has made good practices in ML more 
accessible to so many! Thanks for all of that!

I’m wondering if there is a design reason the default behavior for ROC 
generation 
(https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html)
 doesn’t return the convex hull of the ROC?

With the default ROC computation, the returned curve generally does not lie 
on its convex hull 
(https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.ConvexHull.html)
 even though every point on the hull is achievable performance, for example 
by randomly mixing the two thresholds at the ends of a hull segment. So the 
default ROCs returned are suboptimal. That’s a point made in Tom Fawcett’s ROC 
101 paper (https://www.math.ucdavis.edu/~saito/data/roc/fawcett-roc.pdf), 
which is cited in the sklearn docs.

He writes: “More generally, a classifier is potentially optimal if and only if 
it lies on the convex hull of the set of points in ROC space. The convex hull 
of the set of points in ROC space is called the ROC convex hull (ROCCH) of the 
corresponding set of classifiers.”

Apologies if this is already answered somewhere else… I searched and could only 
find this apparently abandoned repo: https://github.com/tfawcett/pycost

I’ve implemented an ROC convex hull myself and have found significant 
improvements in my performance estimates just from using the convex hull, so 
I’m wondering whether there is some reason this isn’t the default.

Thanks,
-johnk-

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
