Re: [Scikit-learn-general] Custom Scoring Functions for Grid Search

Michael Eickenberg Wed, 20 Aug 2014 04:23:59 -0700

Aah yes -- thanks for this rather important piece of information! -- so
GridSearchCV's score method uses the scoring of the underlying estimator
in preference to the scorer passed to GridSearchCV itself. Since we can't
really change the internal default scorer of ExtraTreesClassifier, except
by doing something horrible like
etr.score = lambda X, y, sample_weight=None: different_scorer(etr, X, y,
sample_weight=sample_weight) (which I haven't tested), this does put us in
a fix.
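
For illustration only, here is a minimal sketch of a less hacky variant of
that idea. It swaps the lambda for a subclass that overrides score, which is
not what the message above proposes. The class name MCCExtraTrees is
hypothetical, the data is synthetic (make_classification), and the
sample_weight argument assumes a scikit-learn version whose
matthews_corrcoef accepts it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import matthews_corrcoef


class MCCExtraTrees(ExtraTreesClassifier):
    """Hypothetical subclass: score() returns MCC instead of accuracy."""

    def score(self, X, y, sample_weight=None):
        # Replace the default accuracy-based score with the Matthews
        # correlation coefficient computed on the predicted labels.
        return matthews_corrcoef(y, self.predict(X), sample_weight=sample_weight)


# Synthetic, mildly imbalanced data (an assumption, not the original dataset).
X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)
etr = MCCExtraTrees(n_estimators=20, random_state=0).fit(X, y)

mcc_from_score = etr.score(X, y)
mcc_direct = matthews_corrcoef(y, etr.predict(X))
print(mcc_from_score, mcc_direct)
```

Anything that falls back on the estimator's own score method would then
report MCC, at the cost of maintaining a subclass.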


Federico: So to be on the safe side, you should always call the scorer
explicitly, as you do in your second print statement:
score_func(meta_clf, X_test, y_test). (There is no need for
best_estimator_; predict/predict_proba/decision_function calls on the
fitted search object get handed to it automatically.)
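
A minimal sketch of that advice, under some assumptions: a recent
scikit-learn where GridSearchCV lives in sklearn.model_selection (at the
time of this thread it was in sklearn.grid_search), synthetic data from
make_classification standing in for Federico's dataset, and a deliberately
small parameter grid so it runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, mildly imbalanced data (an assumption, not the original dataset).
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

score_func = make_scorer(matthews_corrcoef)
meta_clf = GridSearchCV(
    ExtraTreesClassifier(random_state=0),
    {"n_estimators": [25, 50], "min_samples_split": [2, 4]},
    cv=3,
    scoring=score_func,
)
meta_clf.fit(X_train, y_train)

# Call the scorer explicitly on the fitted search object; its predict()
# is delegated to the refitted best estimator.
mcc_via_scorer = score_func(meta_clf, X_test, y_test)
mcc_direct = matthews_corrcoef(y_test, meta_clf.predict(X_test))
print(mcc_via_scorer, mcc_direct)
```

Both numbers should agree, and neither depends on what meta_clf.score
happens to return in a given scikit-learn version.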

Michael



On Wed, Aug 20, 2014 at 12:55 PM, Joel Nothman <[email protected]>
wrote:

> It's actually simpler than that issue, Michael. GridSearchCV (and
> RandomizedSearchCV) has a score method that is unintuitive. It will
> generally not use the metric passed to `scoring`. But yes, in `fit`, it has
> used the correct scoring metric.
>
> IMO, it should be changed. But it's been this way since version 0.5, in
> which grid search appeared:
> https://github.com/scikit-learn/scikit-learn/commit/e790866c81d6fee330e143558baf7e6b945d3180
>
> Joel
>
>
> On 20 August 2014 20:32, Michael Eickenberg <[email protected]>
> wrote:
>
>> Hi Federico,
>>
>> I recall an issue at the beginning of this year stating that internally
>> GridSearchCV sometimes defaulted to accuracy scoring even though a
>> different scorer was passed. I am not sure though if this is what you have
>> encountered.
>>
>> There is some code in
>> https://github.com/scikit-learn/scikit-learn/issues/2853 showing how it
>> happens. Maybe you can do a similar check for your case?
>>
>> And referenced within that issue is
>> https://github.com/scikit-learn/scikit-learn/pull/2019
>>
>> Michael
>>
>>
>>
>> On Wed, Aug 20, 2014 at 12:19 PM, federico vaggi <
>> [email protected]> wrote:
>>
>>> Hi everyone,
>>>
>>> I'm working on a classification task with ExtraTreesClassifier that
>>> deals with somewhat imbalanced datasets, so instead of using accuracy as a
>>> metric, I'm using MCC.
>>>
>>> However - there's some behaviour which doesn't make perfect sense to me,
>>> for example - after doing this:
>>>
>>> score_func = make_scorer(matthews_corrcoef)
>>> tuned_parameters = [{'n_estimators': [250, 500, 1000, 1500],
>>>                      'min_samples_split': [1, 4, 8]}]
>>> clf = ExtraTreesClassifier()
>>> meta_clf = GridSearchCV(clf, tuned_parameters, cv=5, n_jobs = 2,
>>>                         scoring = score_func)
>>> meta_clf.fit(X_train, y_train)
>>>
>>> I look at the performance of the classifier, and see:
>>>
>>> y_pred = meta_clf.best_estimator_.predict(X_test)
>>> print matthews_corrcoef(y_test, y_pred)
>>> print score_func(meta_clf.best_estimator_, X_test, y_test)
>>> print meta_clf.score(X_test, y_test)
>>> print meta_clf.best_estimator_.score(X_test, y_test)
>>>
>>> 0.399796217794 # MCC, correct
>>> 0.399796217794 # MCC, correct
>>> 0.736672629696 # Accuracy
>>> 0.736672629696 # Accuracy
>>>
>>>
>>> Is that reasonable?  I would have expected meta_clf.score to use the
>>> MCC.  Does it use the MCC internally when optimizing the hyper-parameters
>>> at least?
>>>
>>> Federico
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Slashdot TV.
>>> Video for Nerds.  Stuff that matters.
>>> http://tv.slashdot.org/
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>
>

