I think we could keep the existing simple score / loss functions for
day-to-day manual validation of analysis output (in an interactive
session, for instance), while introducing a richer object-oriented API
for model selection tools such as cross-validation and grid search.
For instance we could have:
```
class ROCAreaUnderCurveScore(object):

    higher_is_better = True

    def from_estimator(self, clf, X, y_expected):
        if hasattr(clf, 'decision_function'):
            y_predicted_thresholds = clf.decision_function(X)
        elif hasattr(clf, 'predict_proba'):
            # use the probability of the positive class as the threshold
            y_predicted_thresholds = clf.predict_proba(X)[:, 1]
        else:
            raise TypeError(
                "%r does not support thresholded predictions" % clf)
        # TODO: check binary classification shape or raise ValueError
        return self.from_decision_thresholds(y_expected,
                                             y_predicted_thresholds)

    def from_decision_thresholds(self, expected, predicted_thresholds):
        return auc_score(expected, predicted_thresholds)


class FScore(object):

    higher_is_better = True

    def __init__(self, beta):
        self.beta = beta

    def from_estimator(self, clf, X, y_expected):
        # TODO: check input to provide a meaningful ValueError or
        # TypeError to the caller
        return fbeta_score(y_expected, clf.predict(X), beta=self.beta)

    def from_multiclass_prediction(self, y_expected, y_predicted):
        return fbeta_score(y_expected, y_predicted, beta=self.beta)


class RMSELoss(object):

    higher_is_better = False

    def from_estimator(self, clf, X, y_expected):
        return self.from_regression_prediction(y_expected, clf.predict(X))

    def from_regression_prediction(self, y_expected, y_predicted):
        # root mean squared error of the predictions
        squared = sum((e - p) ** 2
                      for e, p in zip(y_expected, y_predicted))
        return (squared / float(len(y_expected))) ** 0.5


# Then later, to address common use cases in a flat manner:
COMMON_SCORES = {
    'roc_auc': ROCAreaUnderCurveScore(),
    'f1': FScore(1.0),
    'pr_auc': PRAreaUnderCurveScore(beta=1.0),
    'rmse': RMSELoss(),
}
```
Then in GridSearchCV we can have a flat and convenient API for common
use cases such as:
>>> GridSearchCV(clf, score='roc_auc').fit(X, y)
while preserving a flexible yet homogeneous API to handle custom use cases:
>>> class MyCustomScore(object):
...     def __init__(self, some_param=1.0):
...         self.some_param = some_param
...     def from_decision_thresholds(self, expected, predicted_threshold):
...         # do something with self.some_param, expected and
...         # predicted_threshold
...         return score_value
...
>>> the_forty_two_custom_score = MyCustomScore(some_param=42.)
>>> GridSearchCV(clf, score=the_forty_two_custom_score).fit(X, y)
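Internally, GridSearchCV could resolve the score parameter with plain duck typing, e.g. (a rough sketch; resolve_score is a made-up helper name, and the empty dict stands in for the COMMON_SCORES registry above):

```
# Hypothetical helper: strings are looked up in the registry of
# common scores, anything else is assumed to already quack like a
# score object and is passed through unchanged.
COMMON_SCORES = {}  # stand-in for the registry sketched above


def resolve_score(score, registry=COMMON_SCORES):
    if isinstance(score, str):
        try:
            return registry[score]
        except KeyError:
            raise ValueError("unknown score name: %r" % (score,))
    return score
```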
This way we still have a flat API for 99% of the common use cases
while allowing richer semantics to be expressed when needed (for
instance, plugging in a domain-specific evaluation metric to
reproduce a domain-specific benchmark in your own research or to
compete in a Kaggle challenge).
Furthermore, we use plain vanilla duck typing instead of
framework-specific helpers (decorators / DSLs) to handle the complex
cases.
We can also provide a bunch of scorer mixin types / ABCs to factor out
redundant code.
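For instance (a hypothetical sketch, with all names made up), a small mixin could derive from_estimator from a prediction-based scoring method implemented by the subclass:

```
class PredictionScoreMixin(object):
    """Hypothetical mixin: subclasses only implement
    from_multiclass_prediction, the estimator plumbing is shared."""

    higher_is_better = True

    def from_estimator(self, clf, X, y_expected):
        return self.from_multiclass_prediction(y_expected, clf.predict(X))


class ZeroOneScore(PredictionScoreMixin):
    # example concrete score: fraction of correctly predicted labels

    def from_multiclass_prediction(self, y_expected, y_predicted):
        n_correct = sum(1 for e, p in zip(y_expected, y_predicted)
                        if e == p)
        return n_correct / float(len(y_expected))
```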
Finally, we might later want to extend such a Scoring API to wrap a
validation set / OOB samples so as to do early stopping on a
configurable score for various scikit-learn models such as
SGDClassifier, GBRT... I have not really thought about that part yet,
but having score objects rather than simple funcs / callables should
probably make that much easier.
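For example, here is a rough sketch of what early stopping driven by a score object could look like for a model with a partial_fit-style incremental API (all names are hypothetical except higher_is_better and from_estimator, which come from the protocol above):

```
def fit_with_early_stopping(clf, X_train, y_train, X_val, y_val,
                            score, max_epochs=100, patience=5):
    """Hypothetical sketch: stop training once ``score`` on the
    validation set has not improved for ``patience`` epochs.

    Assumes ``clf`` exposes a ``partial_fit``-style incremental API
    and ``score`` follows the object protocol proposed above.
    """
    # flip the sign for losses so that "bigger is better" always holds
    sign = 1.0 if score.higher_is_better else -1.0
    best, stale = None, 0
    for epoch in range(max_epochs):
        clf.partial_fit(X_train, y_train)
        current = sign * score.from_estimator(clf, X_val, y_val)
        if best is None or current > best:
            best, stale = current, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return clf
```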
WDYT?
--
Olivier
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general