Hi.
Can you give a bit more detail on options 3 and 4?
And can you give an example use case?
When do you need both scorers and out-of-bag samples? Scorers are used in GridSearchCV and cross_val_score, but out-of-bag samples basically replace cross-validation,
so I don't quite understand how the two would work together.

I think it would be great if you could give a use-case and some (pseudo) code on how it would look with your favourite solution.

Cheers,
Andy

On 10/26/2014 10:33 PM, Aaron Staple wrote:
Greetings sklearn developers,

I’m a new sklearn contributor, and I’ve been working on a small project to allow customization of the scoring metric used when scoring out of bag data for random forests (see https://github.com/scikit-learn/scikit-learn/pull/3723). In this PR, @mblondel and I have been discussing an architectural issue that we would like others to weigh in on.

While working on my implementation, I’ve run into a bit of difficulty using the scorer implementation as it exists today - in particular, with the interface expressed in _BaseScorer. The current _BaseScorer interface is callable, accepting an estimator (used as a Predictor) along with data points X and the true targets y, and returning a score. The various _BaseScorer implementations compute a score by calling estimator.predict(X), estimator.predict_proba(X), or estimator.decision_function(X) as needed, possibly applying some transformations to the results, and then applying a score function.
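
For reference, the current calling convention looks roughly like this (toy data and estimator, just to show the shape of the API):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score, make_scorer

    X, y = make_classification(random_state=0)
    clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

    # A scorer is a callable taking (estimator, X, y); the predictions
    # (predict / predict_proba / decision_function) are computed inside it.
    scorer = make_scorer(f1_score)
    print(scorer(clf, X, y))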

The issue I’ve run into is that predicting out of bag samples is a rather specialized procedure, because the model used differs for each training point depending on how that point was used during fitting. Computing these predictions doesn’t map naturally onto the Predictor interface. In addition, in the PR we’ve been discussing the idea that a random forest estimator will make its out of bag predictions available as attributes, allowing a user of the estimator to score these provided predictions afterwards. Also, @mblondel mentioned that for his work on multiple-metric grid search, he is interested in scoring predictions he computes outside of a Predictor.

The difficulty is that the current scorers take an estimator and data points, and compute predictions internally. They don’t accept externally computed predictions.
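
To make the mismatch concrete, here is a small example using the oob_decision_function_ attribute that RandomForestClassifier already exposes (toy data again):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(random_state=0)

    # With oob_score=True the forest already exposes out of bag predictions:
    clf = RandomForestClassifier(n_estimators=50, oob_score=True,
                                 random_state=0).fit(X, y)
    oob_proba = clf.oob_decision_function_  # shape (n_samples, n_classes)

    # But a scorer can only be called as scorer(estimator, X, y) and will
    # re-predict internally; there is no supported way to hand it oob_proba.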

I’ve written up a series of different generalized options for implementing a system of scoring externally computed predictions (some are likely undesirable but are provided as points of comparison):

1) Add a new implementation that’s completely separate from the existing _BaseScorer class.

2) Use the existing _BaseScorer without changes. This means abusing the Predictor interface by creating something like a dummy predictor that ignores X and simply returns predictions that were computed externally for a known X (see the sketch after this list).

3) Add a private api to _BaseScorer for scoring externally computed predictions. The private api can be called by a public helper function in scorer.py.

4) Change the public api of _BaseScorer to make scoring of externally computed predictions a public operation along with the existing functionality. Also possibly rename _BaseScorer => BaseScorer.

5) Change the public api of _BaseScorer so that it only handles externally computed predictions. The existing functionality would be implemented by the caller (as a callback, since the required type of prediction data is not known by the caller).
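
To illustrate option 2, a dummy predictor might look something like this (the class name is made up, purely for illustration):

    class PrecomputedPredictor:
        """Dummy 'predictor' that ignores X and returns predictions that
        were computed elsewhere (e.g. out of bag). Illustrative only."""

        def __init__(self, y_pred):
            self.y_pred = y_pred

        def fit(self, X, y=None):
            return self

        def predict(self, X):
            # X is ignored; y_pred was computed externally for a known X.
            return self.y_pred

    # An existing scorer could then be fed precomputed predictions:
    #     scorer(PrecomputedPredictor(oob_pred), X, y)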

So far in the PR we’ve been looking at options 2, 3, and 4, with 4 seeming like a good candidate. Once we decide on one of these options, I’d like to follow up with stakeholders on the specifics of what the new interface will look like.
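
To give a feel for option 4, here is a rough sketch; the method name score_predictions and the attribute names are only illustrative, not a final proposal:

    class BaseScorer:
        """Sketch only: keep the current callable behaviour and add a
        public method for scoring externally computed predictions."""

        def __init__(self, score_func, sign=1, **kwargs):
            self._score_func = score_func
            self._sign = sign
            self._kwargs = kwargs

        def __call__(self, estimator, X, y):
            # Existing behaviour: compute predictions internally, then score.
            y_pred = estimator.predict(X)
            return self.score_predictions(y, y_pred)

        def score_predictions(self, y_true, y_pred):
            # New public entry point: score predictions computed elsewhere,
            # e.g. the out of bag predictions exposed by a random forest.
            return self._sign * self._score_func(y_true, y_pred, **self._kwargs)

GridSearchCV and cross_val_score would keep calling the scorer as they do now, while out of bag scoring (and multiple-metric grid search) could call score_predictions on precomputed predictions directly.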

Thanks,
Aaron Staple

