Following up on Andy’s questions:

The scorer implementation provides a registry of named scorers, and these
scorers may implement specialized logic such as choosing which predictor
method to call or munging the output of predict_proba. My task was to make
out-of-bag (oob) scoring support the same set of named scoring metrics as
cross-validation, so my inclination was to reuse the existing scorers rather
than start from scratch. (Writing a separate implementation would be option
#1 in my list above.)
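
For reference, the existing named scorers are used roughly like this today (a
minimal sketch; the toy data, estimator, and metric name are just
placeholders):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

# Toy data and estimator, only to illustrate the existing call path.
X, y = make_classification(random_state=0)
clf = LogisticRegression().fit(X, y)

# The registry resolves a metric name to a scorer object; the scorer itself
# decides whether to call predict, predict_proba, or decision_function on
# the estimator.
scorer = get_scorer("roc_auc")
print(scorer(clf, X, y))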

I’ve also written up some examples below (copying details from @mblondel’s
earlier example).

For #3, the interface might look something like:

class _BaseScorer(…):

    @abstractmethod
    def __call__(self, estimator, X, y_true, sample_weight=None):
        pass

    @abstractmethod
    def _score(self, y_true, y_prediction=None, y_prediction_proba=None,
               y_decision_function=None):
        pass


class _ProbaScorer(_BaseScorer):

    …

    def _score(self, y, y_pred=None, y_proba=None, y_decision=None,
               sample_weight=None):
        if y_proba is None:
            raise ValueError("This scorer needs y_proba.")
        if sample_weight is not None:
            return self._sign * self._score_func(y, y_proba,
                                                 sample_weight=sample_weight,
                                                 **self._kwargs)
        else:
            return self._sign * self._score_func(y, y_proba, **self._kwargs)


And then there would be a function

def getScore(scoring, y_true, y_prediction=None, y_prediction_proba=None,
             y_decision_function=None):
    return lookup(scoring)._score(y_true, y_prediction, y_prediction_proba,
                                  y_decision_function)

(A possible variation of this, with more detail, is at
https://github.com/staple/scikit-learn/blob/3455/sklearn/metrics/scorer.py,
where the __call__ and _score methods share an implementation.)
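
To make the oob use case concrete, with #3 the scoring path might end up
looking roughly like this (a hypothetical sketch; getScore is the proposed
helper above, and "log_loss" is just one example of a proba-based metric
name):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(random_state=0)
forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X, y)

# Out-of-bag class probability estimates, shape (n_samples, n_classes).
oob_proba = forest.oob_decision_function_

# Proposed helper (not in scikit-learn today): look up the named scorer and
# hand it the externally computed oob probabilities.
score = getScore("log_loss", y, y_prediction_proba=oob_proba)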

For #4,

class _BaseScorer(…):

    @abstractmethod
    def __call__(self, estimator, X, y_true, sample_weight=None):
        pass

    @abstractmethod
    def get_score(self, y_true, y_prediction=None, y_prediction_proba=None,
                  y_decision_function=None):
        pass


class _ProbaScorer(_BaseScorer):

    …

    def get_score(self, y, y_pred=None, y_proba=None, y_decision=None,
                  sample_weight=None):
        if y_proba is None:
            raise ValueError("This scorer needs y_proba.")
        if sample_weight is not None:
            return self._sign * self._score_func(y, y_proba,
                                                 sample_weight=sample_weight,
                                                 **self._kwargs)
        else:
            return self._sign * self._score_func(y, y_proba, **self._kwargs)
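
A caller with externally computed predictions could then use #4 roughly as
follows (again a hypothetical sketch; get_score is the proposed public
method, and the forest attribute and metric name are only illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import get_scorer

X, y = make_classification(random_state=0)
forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X, y)
oob_proba = forest.oob_decision_function_

scorer = get_scorer("log_loss")                  # existing registry lookup
score = scorer.get_score(y, y_proba=oob_proba)   # proposed public method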


On Tue, Oct 28, 2014 at 7:10 PM, Mathieu Blondel <math...@mblondel.org>
wrote:

> Different metrics require different inputs (results of predict,
> decision_function, predict_proba). To avoid branching in the grid search
> and cross-validation, we thus introduced the scorer API. A scorer knows
> what kind of input it needs and calls predict, decision_function,
> predict_proba as needed. We would like to reuse the scorer logic for
> out-of-bag scores as well, in order to avoid branching. The problem is that
> the scorer API is not suitable if the predictions are already available.
> RidgeCV works around this by creating a constant predictor but this is in
> my opinion an ugly hack. The get_score method I proposed would avoid
> branching, although it would require computing y_pred, y_decision and
> y_proba.
>
> In the classification case, another idea would be to compute out-of-bag
> probabilities. Then a score would be obtained by calling a
> get_score_from_proba method. This method would be implemented as follows:
>
> class _PredictScorer(_BaseScorer):
>     def get_score_from_proba(self, y, y_proba, classes):
>         y_pred = classes[np.argmax(y_proba, axis=1)]
>         return self._sign * self._score_func(y, y_pred, **self._kwargs)
>
> class _ProbaScorer(_BaseScorer):
>     def get_score_from_proba(self, y, y_proba, classes):
>         return self._sign * self._score_func(y, y_proba, **self._kwargs)
>
> The nice thing about predict_proba is that it consistently returns an
> array of shape (n_samples, n_classes). decision_function is more
> problematic because it doesn't return an array of shape (n_samples, 2) in
> the binary case. There was a discussion a long time ago about adding a
> predict_score method that would be more consistent in this regard, but I
> don't remember the outcome of that discussion.
>
> I don't agree that RidgeCV is an exception. If your labels are binary, it
> is perfectly valid to train a regressor on them and want to compute ranking
> metrics like AUC or Average Precision. And there is RidgeClassifierCV too.
>
> Mathieu
>
> On Wed, Oct 29, 2014 at 3:21 AM, Andy <t3k...@gmail.com> wrote:
>
>>  Hi.
>> Can you give a bit more detail on 3 and 4?
>> And can you give an example use case?
>> When do you need scorers and out of bag samples? The scorers are used in
>> GridSearchCV and cross_val_score, but the out of bag samples basically
>> replace cross validation,
>> so I don't quite understand how these would work together.
>>
>> I think it would be great if you could give a use-case and some (pseudo)
>> code on how it would look with your favourite solution.
>>
>> Cheers,
>> Andy
>>
>>
>> On 10/26/2014 10:33 PM, Aaron Staple wrote:
>>
>>  Greetings sklearn developers,
>>
>>  I’m a new sklearn contributor, and I’ve been working on a small project
>> to allow customization of the scoring metric used when scoring out of bag
>> data for random forests (see
>> https://github.com/scikit-learn/scikit-learn/pull/3723). In this PR,
>> @mblondel and I have been discussing an architectural issue that we would
>> like others to weigh in on.
>>
>>  While working on my implementation, I’ve run into a bit of difficulty
>> using the scorer implementation as it exists today - in particular, with
>> the interface expressed in _BaseScorer. The current _BaseScorer interface
>> is callable, accepting an estimator (utilized as a Predictor), along with
>> some prediction data points X, and returning a score. The various
>> _BaseScorer implementations compute a score by calling
>> estimator.predict(X), estimator.predict_proba(X), or
>> estimator.decision_function(X) as needed, possibly applying some
>> transformations to the results, and then applying a score function.
>>
>>  The issue I’ve run into is that predicting out of bag samples is a
>> rather specialized procedure because the model used differs for each
>> training point, based on how that point was used during fitting. Computing
>> these predictions is not particularly suited for implementation as a
>> Predictor. In addition, in the PR we’ve been discussing the idea that a
>> random forest estimator will make its out of bag predictions available as
>> attributes, allowing a user of the estimator to subsequently score these
>> provided predictions. Also, @mblondel mentioned that for his work on
>> multiple-metric grid search, he is interested in scoring predictions he
>> computes outside of a Predictor.
>>
>>  The difficulty is that the current scorers take an estimator and data
>> points, and compute predictions internally. They don’t accept externally
>> computed predictions.
>>
>>  I’ve written up a series of different generalized options for
>> implementing a system of scoring externally computed predictions (some are
>> likely undesirable but are provided as points of comparison):
>>
>>  1) Add a new implementation that’s completely separate from the
>> existing _BaseScorer class.
>>
>>  2) Use the existing _BaseScorer without changes. This means abusing the
>> Predictor interface and creating something like a dummy predictor that
>> ignores X and returns the externally computed predictions - predictions that
>> were computed outside the scorer for a known X rather than derived from the
>> X argument passed to the scorer.
>>
>>  3) Add a private api to _BaseScorer for scoring externally computed
>> predictions. The private api can be called by a public helper function in
>> scorer.py.
>>
>>  4) Change the public api of _BaseScorer to make scoring of externally
>> computed predictions a public operation along with the existing
>> functionality. Also possibly rename _BaseScorer => BaseScorer.
>>
>>  5) Change the public api of _BaseScorer so that it only handles
>> externally computed predictions. The existing functionality would be
>> implemented by the caller (as a callback, since the required type of
>> prediction data is not known by the caller).
>>
>>  So far in the PR we’ve been looking at options 2, 3, and 4, with 4
>> seeming like a good candidate. Once we decide on one of these options, I’d
>> like to follow up with stakeholders on the specifics of what the new
>> interface will look like.
>>
>>  Thanks,
>> Aaron Staple