Re: [Scikit-learn-general] design of scorer interface

Aaron Staple Fri, 28 Nov 2014 00:15:26 -0800

Hi Again Folks,

After discussion with Andreas, we decided to move to the PR stage with
option #4 (adding a get_score method to the scorer interface). Andreas
advised me that this PR should include fixing _RidgeGCV.fit so that it
calls the new get_score method.
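For readers who missed the earlier thread, option #4 can be sketched roughly as below. This is a hypothetical illustration and not the code in the POC commit: the class name PredictScorer, its constructor arguments, and the toy leave-one-out predictions are all assumptions. The idea is that the existing scorer(estimator, X, y) call keeps working, while code that already has predictions in hand (such as the leave-one-out predictions _RidgeGCV.fit computes) can call get_score(y_true, y_pred) directly.

```python
class PredictScorer:
    """Hypothetical scorer exposing a get_score method (option #4 sketch)."""

    def __init__(self, score_func, sign=1):
        self._score_func = score_func
        self._sign = sign  # +1 if greater is better, -1 for losses

    def get_score(self, y_true, y_pred):
        # Score predictions that were computed elsewhere.
        return self._sign * self._score_func(y_true, y_pred)

    def __call__(self, estimator, X, y_true):
        # Existing interface: predict, then delegate to get_score.
        return self.get_score(y_true, estimator.predict(X))


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


scorer = PredictScorer(mean_squared_error, sign=-1)

y = [0.0, 1.0, 1.0]
# Made-up leave-one-out predictions for two candidate alphas.
loo_predictions = {0.1: [0.2, 0.9, 0.7], 1.0: [0.5, 0.6, 0.6]}
# Note the argument order: true values first, predictions second.
scores = {alpha: scorer.get_score(y, pred)
          for alpha, pred in loo_predictions.items()}
best_alpha = max(scores, key=scores.get)
```

Passing y_true before y_pred explicitly, rather than routing predictions through a fake estimator, also makes it harder to swap the two arguments by accident.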

In the above thread there was some discussion regarding whether or not
ridge cv is a case where the scorer interface should be used at all, and in
particular whether categorical scoring functions are valid for ridge cv. In
the final comment on this topic Mathieu suggested that the scorer interface
should be used, and that ideally categorical scoring functions would be
supported for RidgeCV on the 0-1 prediction domain and for
RidgeClassifierCV.

However, when I ran a couple of test cases with 0-1 labels for RidgeCV and
with classification for RidgeClassifierCV, both failed with errors. One
reason appears to be that LinearModel._center_data can convert the y values
to non-integers. In addition, it appears that in the case of multiclass
classification the scorer is applied to the ravel()'ed array of one-vs-all
predictions rather than to the actual class predictions. Am I right in
thinking that this can affect the classification score for some scorers?
For example, consider a simple accuracy scorer and a single sample. It is
possible for some of the one-vs-all outputs to be correct while the overall
class prediction is wrong, so the accuracy over the one-vs-all outputs would
be nonzero while the overall classification accuracy is zero. (In addition,
if I am reading the code correctly, I believe the y_true and y_predicted
values may currently be passed to the scorer in the wrong order, i.e.
swapped with each other.)
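To make the accuracy example concrete, here is a minimal plain-Python sketch assuming a single sample, three classes, and +1/-1 one-vs-all targets. The decision values are made up for illustration and do not come from an actual RidgeClassifierCV fit.

```python
# One sample, three classes; the true class is 0.
# One-vs-all targets: +1 for "this class", -1 for "not this class".
y_true_ovr = [1, -1, -1]
# Made-up decision values from the three one-vs-all regressors.
decision = [-0.5, -0.8, 0.3]

# Scoring the ravel()'ed one-vs-all outputs: threshold each at 0.
y_pred_ovr = [1 if d > 0 else -1 for d in decision]
ovr_accuracy = sum(t == p for t, p in zip(y_true_ovr, y_pred_ovr)) / len(y_true_ovr)

# Scoring the actual class prediction: argmax over the decision values.
true_class = y_true_ovr.index(1)
pred_class = max(range(len(decision)), key=lambda k: decision[k])
class_accuracy = float(true_class == pred_class)
```

Here the middle one-vs-all output agrees with its target, so ovr_accuracy is 1/3, but the argmax picks class 2 instead of class 0, so class_accuracy is 0.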

Given these observations, I wanted to double check 1) that we want to
support classification scorers, and not just regression scorers, at this
particular point in _RidgeGCV.fit, and 2) that I should switch this code to
get_score now, given that I believe at least some additional work will be
needed to support classification scorers.

Thanks,
Aaron

PS: Here are my simple test cases:

>>> import numpy as np
>>> from sklearn.linear_model import RidgeClassifierCV, RidgeCV
>>>
>>> clf = RidgeCV(scoring='roc_auc')
>>> y = np.array([0, 1, 1])
>>> X = np.array([[0, 0], [0, 1], [2, 3]])
>>> clf.fit(X,y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sklearn/linear_model/ridge.py", line 858, in fit
    estimator.fit(X, y, sample_weight=sample_weight)
  File "sklearn/linear_model/ridge.py", line 801, in fit
    for i in range(len(self.alphas))]
  File "sklearn/metrics/scorer.py", line 157, in __call__
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: continuous format is not supported
>>>
>>> clf = RidgeClassifierCV(scoring='roc_auc')
>>> y = np.array([0, 1, 1])
>>> X = np.array([[0, 0], [0, 1], [2, 3]])
>>> clf.fit(X,y)
/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2499:
VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute
or function instead. To find the rank of a matrix see
`numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sklearn/linear_model/ridge.py", line 1069, in fit
    _BaseRidgeCV.fit(self, X, Y, sample_weight=sample_weight)
  File "sklearn/linear_model/ridge.py", line 858, in fit
    estimator.fit(X, y, sample_weight=sample_weight)
  File "sklearn/linear_model/ridge.py", line 801, in fit
    for i in range(len(self.alphas))]
  File "sklearn/metrics/scorer.py", line 157, in __call__
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: continuous format is not supported
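Both tracebacks end in the same "continuous format is not supported" error. The sketch below illustrates the centering effect mentioned above, assuming plain mean-centering of y (the real LinearModel._center_data may also apply sample weights). The centered labels are no longer integers, so a classification scorer that inspects the target type appears to see continuous values. In scikit-learn that inspection is done by sklearn.utils.multiclass.type_of_target; the looks_continuous helper here is a hypothetical, much cruder stand-in. Swapped y_true/y_pred arguments would have the same effect, since the predictions are continuous too.

```python
y = [0, 1, 1]

# Assumed simplification of the preprocessing: subtract the mean of y.
y_mean = sum(y) / len(y)
y_centered = [v - y_mean for v in y]  # approximately [-0.667, 0.333, 0.333]


def looks_continuous(values):
    # Hypothetical stand-in for a target-type check: any non-integral
    # value means the targets are treated as continuous.
    return any(float(v) != int(v) for v in values)


original_is_continuous = looks_continuous(y)
centered_is_continuous = looks_continuous(y_centered)
```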

On Mon, Nov 3, 2014 at 8:27 AM, Andy <t3k...@gmail.com> wrote:

> Cool, I hope I have time to review it the next couple of days.
>
> On 11/02/2014 07:44 PM, Aaron Staple wrote:
> > Hi folks,
> >
> > I went ahead and made a POC for a more complete implementation of
> > option #4:
> >
> >
> > https://github.com/staple/scikit-learn/commit/e76fa8887cd35ad7a249ee157067cd12c89bdefb
> >
> > Aaron
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
