It's actually simpler than that issue, Michael. GridSearchCV (and
RandomizedSearchCV) has a `score` method that is unintuitive: it will
generally not use the metric passed to `scoring`. But yes, within `fit`, it
does use the correct scoring metric.
IMO, it should be changed. But it's been this way since version 0.5, in
which grid search appeared:
https://github.com/scikit-learn/scikit-learn/commit/e790866c81d6fee330e143558baf7e6b945d3180
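[Editor's note: to make the distinction above concrete, a safe way to sidestep the ambiguity is to call the scorer object directly rather than relying on any `.score` method. A minimal sketch on synthetic data; the dataset shape, class weights, and estimator parameters here are illustrative, not from the thread:]

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef

# Small, somewhat imbalanced synthetic dataset.
X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)
clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)

score_func = make_scorer(matthews_corrcoef)

# Calling the scorer object (or the metric function) directly always
# applies the intended metric, regardless of what estimator.score or
# GridSearchCV.score happens to compute.
direct = matthews_corrcoef(y, clf.predict(X))
via_scorer = score_func(clf, X, y)
assert abs(direct - via_scorer) < 1e-12
```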
Joel
On 20 August 2014 20:32, Michael Eickenberg <[email protected]>
wrote:
> Hi Federico,
>
> I recall an issue at the beginning of this year stating that internally
> GridSearchCV sometimes defaulted to accuracy scoring even though a
> different scorer was passed. I am not sure though if this is what you have
> encountered.
>
> There is some code in
> https://github.com/scikit-learn/scikit-learn/issues/2853 showing how it
> happens. Maybe you can do a similar check for your case?
>
> And referenced within that issue is
> https://github.com/scikit-learn/scikit-learn/pull/2019
>
> Michael
>
>
>
> On Wed, Aug 20, 2014 at 12:19 PM, federico vaggi <[email protected]> wrote:
>
>> Hi everyone,
>>
>> I'm working on a classification task with ExtraTreesClassifier that deals
>> with somewhat imbalanced datasets, so instead of using accuracy as a
>> metric, I'm using MCC.
>>
>> However, there's some behaviour which doesn't quite make sense to me.
>> For example, after doing this:
>>
>> from sklearn.ensemble import ExtraTreesClassifier
>> from sklearn.grid_search import GridSearchCV
>> from sklearn.metrics import make_scorer, matthews_corrcoef
>>
>> score_func = make_scorer(matthews_corrcoef)
>> tuned_parameters = [{'n_estimators': [250, 500, 1000, 1500],
>>                      'min_samples_split': [1, 4, 8]}]
>> clf = ExtraTreesClassifier()
>> meta_clf = GridSearchCV(clf, tuned_parameters, cv=5, n_jobs=2,
>>                         scoring=score_func)
>> meta_clf.fit(X_train, y_train)
>>
>> I look at the performance of the classifier, and see:
>>
>> y_pred = meta_clf.best_estimator_.predict(X_test)
>> print(matthews_corrcoef(y_test, y_pred))
>> print(score_func(meta_clf.best_estimator_, X_test, y_test))
>> print(meta_clf.score(X_test, y_test))
>> print(meta_clf.best_estimator_.score(X_test, y_test))
>>
>> 0.399796217794 # MCC, correct
>> 0.399796217794 # MCC, correct
>> 0.736672629696 # Accuracy
>> 0.736672629696 # Accuracy
>>
>>
>> Is that reasonable? I would have expected meta_clf.score to use the MCC.
>> Does it use the MCC internally when optimizing the hyper-parameters at
>> least?
>>
>> Federico
>>
>>
>> ------------------------------------------------------------------------------
>> Slashdot TV.
>> Video for Nerds. Stuff that matters.
>> http://tv.slashdot.org/
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>