Hi.
Can you give a bit more detail on options 3 and 4?
And can you give an example use case?
When do you need both scorers and out-of-bag samples? The scorers are used in
GridSearchCV and cross_val_score, but the out-of-bag samples basically
replace cross-validation, so I don't quite understand how these would work
together.
I think it would be great if you could give a use-case and some (pseudo)
code on how it would look with your favourite solution.
Cheers,
Andy
On 10/26/2014 10:33 PM, Aaron Staple wrote:
Greetings sklearn developers,
I’m a new sklearn contributor, and I’ve been working on a small
project to allow customization of the scoring metric used when scoring
out of bag data for random forests (see
https://github.com/scikit-learn/scikit-learn/pull/3723). In this PR,
@mblondel and I have been discussing an architectural issue that we
would like others to weigh in on.
While working on my implementation, I’ve run into a bit of difficulty
using the scorer implementation as it exists today - in particular,
with the interface expressed in _BaseScorer. The current _BaseScorer
interface is callable, accepting an estimator (utilized as a
Predictor), along with some prediction data points X and their true
targets y, and returning a score. The various _BaseScorer
implementations compute a score by
calling estimator.predict(X), estimator.predict_proba(X), or
estimator.decision_function(X) as needed, possibly applying some
transformations to the results, and then applying a score function.
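To make the interface described above concrete, here is a minimal pure-Python sketch of a predict-based scorer. It is a simplified illustration, not scikit-learn's actual code: the class name PredictScorer, the accuracy helper, and the toy ThresholdClassifier are all assumptions made for this example.

```python
class PredictScorer:
    """Simplified stand-in for a predict-based scorer (illustrative only)."""

    def __init__(self, score_func, sign=1):
        self._score_func = score_func
        self._sign = sign  # -1 would turn a loss into a "greater is better" score

    def __call__(self, estimator, X, y_true):
        # The scorer, not the caller, decides which prediction method to call.
        y_pred = estimator.predict(X)
        return self._sign * self._score_func(y_true, y_pred)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


class ThresholdClassifier:
    """Toy estimator: predicts 1 when the single feature exceeds 0.5."""

    def predict(self, X):
        return [1 if row[0] > 0.5 else 0 for row in X]


scorer = PredictScorer(accuracy)
X = [[0.1], [0.9], [0.7], [0.2]]
y = [0, 1, 0, 0]
score = scorer(ThresholdClassifier(), X, y)  # predictions are computed inside the scorer
```

The point to notice is that the predictions never leave the scorer: the caller hands over an estimator and data, and only the final score comes back.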
The issue I’ve run into is that predicting out of bag samples is a
rather specialized procedure because the model used differs for each
training point, based on how that point was used during fitting.
Computing these predictions is not particularly suited for
implementation as a Predictor. In addition, in the PR we’ve been
discussing the idea that a random forest estimator will make its out
of bag predictions available as attributes, allowing a user of the
estimator to subsequently score these provided predictions. Also,
@mblondel mentioned that for his work on multiple-metric grid search,
he is interested in scoring predictions he computes outside of a
Predictor.
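The following toy sketch illustrates why out of bag prediction does not fit the usual predict(X) shape: the set of ensemble members consulted depends on the index of the training point, not only on its feature values. Everything here (fit_mean_model, oob_predict, the constant "mean" models, the hand-picked bags) is a hypothetical simplification and not the random forest implementation.

```python
def fit_mean_model(y, in_bag):
    """Toy 'tree': always predicts the mean target of its in-bag samples."""
    mean = sum(y[i] for i in in_bag) / len(in_bag)
    return lambda x: mean


def oob_predict(X, members):
    """Average, for each training point, only the members that never saw it."""
    preds = []
    for i, x in enumerate(X):
        votes = [model(x) for in_bag, model in members if i not in in_bag]
        # A point that was in every bag has no out of bag prediction.
        preds.append(sum(votes) / len(votes) if votes else None)
    return preds


X = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 4.0]
bags = [{0, 1}, {1, 2}, {2, 3}]  # hand-picked instead of random, for clarity
members = [(bag, fit_mean_model(y, bag)) for bag in bags]

oob = oob_predict(X, members)
```

Because oob_predict needs to know which training index each row corresponds to, it cannot be expressed as a plain predict(X) call on arbitrary data, which is the mismatch with the scorer interface.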
The difficulty is that the current scorers take an estimator and data
points, and compute predictions internally. They don’t accept
externally computed predictions.
I’ve written up a series of different generalized options for
implementing a system of scoring externally computed predictions (some
are likely undesirable but are provided as points of comparison):
1) Add a new implementation that’s completely separate from the
existing _BaseScorer class.
2) Use the existing _BaseScorer without changes. This means abusing
the Predictor interface and creating something like a dummy predictor
that ignores X and returns the externally computed predictions -
predictions not inherently based on the X variable, but which were
externally computed based on a known X value.
3) Add a private api to _BaseScorer for scoring externally computed
predictions. The private api can be called by a public helper function
in scorer.py.
4) Change the public api of _BaseScorer to make scoring of externally
computed predictions a public operation along with the existing
functionality. Also possibly rename _BaseScorer => BaseScorer.
5) Change the public api of _BaseScorer so that it only handles
externally computed predictions. The existing functionality would be
implemented by the caller (as a callback, since the required type of
prediction data is not known by the caller).
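As a rough point of comparison, options 2 and 4 could look something like the sketch below. All names here (PredictScorer, score_predictions, PrecomputedPredictor) are hypothetical and chosen only for illustration; none of them is an existing scikit-learn API, and the real interface would be decided in the PR.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


class PredictScorer:
    """Simplified scorer with a hypothetical public method for option 4."""

    def __init__(self, score_func, sign=1):
        self._score_func = score_func
        self._sign = sign

    def score_predictions(self, y_true, y_pred):
        # Option 4 (hypothetical): score predictions computed elsewhere.
        return self._sign * self._score_func(y_true, y_pred)

    def __call__(self, estimator, X, y_true):
        # Existing behaviour: compute predictions internally, then score them.
        return self.score_predictions(y_true, estimator.predict(X))


class PrecomputedPredictor:
    """Option 2 (hypothetical): dummy predictor that ignores X."""

    def __init__(self, y_pred):
        self._y_pred = y_pred

    def predict(self, X):
        return self._y_pred  # X is deliberately unused


y_true = [0, 1, 1, 0]
oob_pred = [0, 1, 0, 0]  # stands in for externally computed predictions
scorer = PredictScorer(accuracy)

option2_score = scorer(PrecomputedPredictor(oob_pred), None, y_true)
option4_score = scorer.score_predictions(y_true, oob_pred)
```

Both calls give the same score; the difference is that option 2 routes the predictions through a fake estimator, while option 4 exposes the scoring step directly.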
So far in the PR we’ve been looking at options 2, 3, and 4, with 4
seeming like a good candidate. Once we decide on one of these options,
I’d like to follow up with stakeholders on the specifics of what the
new interface will look like.
Thanks,
Aaron Staple
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general