Re: [Scikit-learn-general] design of scorer interface

Michael Eickenberg Tue, 28 Oct 2014 11:44:52 -0700

It is true that RidgeGCV only does LOO predictions and thus would need a
scorer that makes sense on a single (ytrue, ypred) pair, such as MSE. (The
way it is implemented now for arbitrary scorers is not correct.) So point
taken: RidgeGCV is an exception.
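
To make the per-sample requirement concrete, here is a toy sketch in plain
Python (not the RidgeCV code; the helper names and numbers are made up for
illustration). Squared error is defined on a single (ytrue, ypred) pair, so
averaging it over the LOO predictions gives the usual MSE, whereas a score
like R^2 is not defined on one sample because the total sum of squares is
zero:

```python
def squared_error(y_true_i, y_pred_i):
    # Defined for a single (ytrue, ypred) pair.
    return (y_true_i - y_pred_i) ** 2


def mse(y_true, y_pred):
    # Mean of the per-sample squared errors.
    errors = [squared_error(t, p) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


def r2(y_true, y_pred):
    # Needs the spread of y_true, so it is not a per-sample quantity.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 4.0]
y_loo = [1.5, 1.0, 3.0]  # made-up LOO predictions, one per held-out sample

# Score each held-out pair on its own, then average.
per_sample = [mse([t], [p]) for t, p in zip(y_true, y_loo)]
loo_mse = sum(per_sample) / len(per_sample)
```

Here loo_mse matches mse(y_true, y_loo), while r2([1.0], [1.5]) fails with a
ZeroDivisionError, which is the kind of scorer that cannot be applied pair by
pair.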


michael

On Tuesday, October 28, 2014, Andy <t3k...@gmail.com> wrote:

>  As for the oob scores, I don't currently see how you would use loo
> scores with a scorer.
> Is that for generalized cross-validation in RidgeCV?
>
>
> On 10/27/2014 09:41 AM, Mathieu Blondel wrote:
>
>   In addition to out-of-bag scores and multi-metric grid search, there are
> also LOO scores in the ridge regression module, as pointed out by Michael.
>
> Option 4 seems like the best option to me.
>
> We keep __call__(self, estimator, X, y) for backward compatibility and
> because it is sometimes more convenient. But we also add a new method
> get_score(self, y, y_pred, y_proba, y_decision) for computing scores from
> pre-computed predictions. For example, this is how we would implement it
> in _ProbaScorer:
>
> def get_score(self, y, y_pred=None, y_proba=None, y_decision=None,
>               sample_weight=None):
>     if y_proba is None:
>         raise ValueError("This scorer needs y_proba.")
>
>     if sample_weight is not None:
>         return self._sign * self._score_func(y, y_proba,
>                                              sample_weight=sample_weight,
>                                              **self._kwargs)
>     else:
>         return self._sign * self._score_func(y, y_proba, **self._kwargs)
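
A minimal, self-contained sketch of how option 4 could fit together,
assuming a hypothetical ToyProbaScorer class, a hand-written toy_log_loss
and a ConstantEstimator stub (none of these are scikit-learn names).
__call__ computes the predictions and delegates to get_score, so both entry
points should give the same score:

```python
import math


class ToyProbaScorer:
    """Hypothetical stand-in for _ProbaScorer; not scikit-learn code."""

    def __init__(self, score_func, sign=1, **kwargs):
        self._score_func = score_func
        self._sign = sign
        self._kwargs = kwargs

    def get_score(self, y, y_pred=None, y_proba=None, y_decision=None,
                  sample_weight=None):
        # Score pre-computed predictions; this scorer only uses y_proba.
        if y_proba is None:
            raise ValueError("This scorer needs y_proba.")
        if sample_weight is not None:
            return self._sign * self._score_func(
                y, y_proba, sample_weight=sample_weight, **self._kwargs)
        return self._sign * self._score_func(y, y_proba, **self._kwargs)

    def __call__(self, estimator, X, y, sample_weight=None):
        # Existing entry point: predict, then delegate to get_score.
        return self.get_score(y, y_proba=estimator.predict_proba(X),
                              sample_weight=sample_weight)


def toy_log_loss(y, y_proba, sample_weight=None):
    # Weighted mean negative log-probability of the true class index.
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    total = sum(-w * math.log(p[label])
                for label, p, w in zip(y, y_proba, sample_weight))
    return total / sum(sample_weight)


class ConstantEstimator:
    # Stub that predicts 50/50 for every sample.
    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


scorer = ToyProbaScorer(toy_log_loss, sign=-1)
X = [[0.0], [1.0]]
y = [0, 1]
from_estimator = scorer(ConstantEstimator(), X, y)
from_predictions = scorer.get_score(y, y_proba=[[0.5, 0.5], [0.5, 0.5]])
```

With this layout, callers that already hold predictions (OOB, LOO, or a grid
search evaluating several metrics) would call get_score directly and skip the
estimator entirely.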
>
>
> M.
>
>
> On Mon, Oct 27, 2014 at 11:33 AM, Aaron Staple <aaron.sta...@gmail.com>
> wrote:
>
>>  Greetings sklearn developers,
>>
>>  I’m a new sklearn contributor, and I’ve been working on a small project
>> to allow customization of the scoring metric used when scoring out of bag
>> data for random forests (see
>> https://github.com/scikit-learn/scikit-learn/pull/3723). In this PR,
>> @mblondel and I have been discussing an architectural issue that we would
>> like others to weigh in on.
>>
>>  While working on my implementation, I’ve run into a bit of difficulty
>> using the scorer implementation as it exists today - in particular, with
>> the interface expressed in _BaseScorer. The current _BaseScorer interface
>> is callable, accepting an estimator (utilized as a Predictor), along with
>> some prediction data points X, and returning a score. The various
>> _BaseScorer implementations compute a score by calling
>> estimator.predict(X), estimator.predict_proba(X), or
>> estimator.decision_function(X) as needed, possibly applying some
>> transformations to the results, and then applying a score function.
>>
>>  The issue I’ve run into is that predicting out of bag samples is a
>> rather specialized procedure because the model used differs for each
>> training point, based on how that point was used during fitting. Computing
>> these predictions is not particularly suited for implementation as a
>> Predictor. In addition, in the PR we’ve been discussing the idea that a
>> random forest estimator will make its out of bag predictions available as
>> attributes, allowing a user of the estimator to subsequently score these
>> provided predictions. Also, @mblondel mentioned that for his work on
>> multiple-metric grid search, he is interested in scoring predictions he
>> computes outside of a Predictor.
>>
>>  The difficulty is that the current scorers take an estimator and data
>> points, and compute predictions internally. They don’t accept externally
>> computed predictions.
>>
>>  I’ve written up a series of different generalized options for
>> implementing a system of scoring externally computed predictions (some are
>> likely undesirable but are provided as points of comparison):
>>
>>  1) Add a new implementation that’s completely separate from the
>> existing _BaseScorer class.
>>
>>  2) Use the existing _BaseScorer without changes. This means abusing the
>> Predictor interface and creating something like a dummy predictor that
>> ignores X and returns the externally computed predictions - predictions not
>> inherently based on the X variable, but which were externally computed
>> based on a known X value.
>>
>>  3) Add a private api to _BaseScorer for scoring externally computed
>> predictions. The private api can be called by a public helper function in
>> scorer.py.
>>
>>  4) Change the public api of _BaseScorer to make scoring of externally
>> computed predictions a public operation along with the existing
>> functionality. Also possibly rename _BaseScorer => BaseScorer.
>>
>>  5) Change the public api of _BaseScorer so that it only handles
>> externally computed predictions. The existing functionality would be
>> implemented by the caller (as a callback, since the required type of
>> prediction data is not known by the caller).
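
For comparison, option 2 could look roughly like the following.
PrecomputedPredictor and toy_accuracy_scorer are hypothetical names used only
for this sketch; the scorer stands in for any callable with the
(estimator, X, y) signature, and the predictions are made-up values:

```python
class PrecomputedPredictor:
    """Hypothetical dummy predictor: ignores X, returns stored predictions."""

    def __init__(self, y_pred):
        self._y_pred = list(y_pred)

    def predict(self, X):
        # X is only used as a sanity check on the number of samples.
        if len(X) != len(self._y_pred):
            raise ValueError("X does not match the stored predictions.")
        return self._y_pred


def toy_accuracy_scorer(estimator, X, y):
    # Stand-in for an existing scorer that predicts internally.
    y_pred = estimator.predict(X)
    return sum(t == p for t, p in zip(y, y_pred)) / len(y)


X = [[0.1], [0.2], [0.3], [0.4]]
y = [1, 0, 0, 1]
oob_pred = [1, 0, 1, 1]  # externally computed, e.g. out-of-bag predictions

score = toy_accuracy_scorer(PrecomputedPredictor(oob_pred), X, y)
```

The scorer is untouched, but the predictor silently depends on being called
with the same X the predictions were computed for, which is the abuse of the
Predictor interface described in option 2.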
>>
>>  So far in the PR we’ve been looking at options 2, 3, and 4, with 4
>> seeming like a good candidate. Once we decide on one of these options, I’d
>> like to follow up with stakeholders on the specifics of what the new
>> interface will look like.
>>
>>  Thanks,
>> Aaron Staple
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>

