Re: [Scikit-learn-general] Custom Scoring Functions for Grid Search

federico vaggi Wed, 20 Aug 2014 04:38:44 -0700

OK, now it makes sense. As long as it uses the correct metric internally
during `fit`, that's what matters.


Are there any reasons at all for keeping the `score` method in its current form?




On Wed, Aug 20, 2014 at 12:55 PM, Joel Nothman <[email protected]>
wrote:

> It's actually simpler than that issue, Michael. GridSearchCV (and
> RandomizedSearchCV) has an unintuitive `score` method: it generally does
> not use the metric passed to `scoring`. But yes, during `fit` the correct
> scoring metric was used.
>
> IMO, it should be changed. But it's been this way since version 0.5, in
> which grid search appeared:
> https://github.com/scikit-learn/scikit-learn/commit/e790866c81d6fee330e143558baf7e6b945d3180
>
> Joel
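A minimal sketch of the distinction Joel describes, assuming a recent scikit-learn (`GridSearchCV` imported from `sklearn.model_selection`) and synthetic imbalanced data from `make_classification` in place of Federico's dataset; the grid values are illustrative only. A classifier's own `score` always reports accuracy, while the scorer passed as `scoring`, which the fitted search keeps as `scorer_`, reports the MCC. In recent releases `GridSearchCV.score` itself goes through `scorer_`, so the third print line may show the MCC rather than the accuracy reported in this 2014 thread.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced binary problem (roughly 85% / 15%), standing in for real data.
X, y = make_classification(n_samples=600, weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

mcc_scorer = make_scorer(matthews_corrcoef)
search = GridSearchCV(
    ExtraTreesClassifier(random_state=0),
    {"n_estimators": [50, 100], "min_samples_split": [2, 8]},  # illustrative grid
    cv=5,
    scoring=mcc_scorer,  # used to rank the candidates during fit
)
search.fit(X_train, y_train)

best = search.best_estimator_
y_pred = best.predict(X_test)

# ClassifierMixin.score always reports accuracy, whatever `scoring` was.
estimator_score = best.score(X_test, y_test)
# The scorer stored on the fitted search computes the MCC.
scorer_score = search.scorer_(best, X_test, y_test)

print("best_estimator_.score (accuracy):", estimator_score)
print("scorer_ (MCC):                   ", scorer_score)
print("search.score:                    ", search.score(X_test, y_test))
```

The practical workaround in either version is to call the scorer yourself, e.g. `score_func(meta_clf.best_estimator_, X_test, y_test)`, rather than relying on which metric `score` happens to use.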
>
>
> On 20 August 2014 20:32, Michael Eickenberg <[email protected]>
> wrote:
>
>> Hi Federico,
>>
>> I recall an issue at the beginning of this year stating that internally
>> GridSearchCV sometimes defaulted to accuracy scoring even though a
>> different scorer was passed. I am not sure, though, whether this is what
>> you encountered.
>>
>> There is some code in
>> https://github.com/scikit-learn/scikit-learn/issues/2853 showing how it
>> happens. Maybe you can do a similar check for your case?
>>
>> That issue also references
>> https://github.com/scikit-learn/scikit-learn/pull/2019
>>
>> Michael
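One way to do a similar check, sketched under the same assumptions (recent scikit-learn, synthetic data, an illustrative grid): re-score every candidate with `cross_val_score` using the same folds and the same MCC scorer, then compare against `cv_results_["mean_test_score"]` and `best_score_`. Fixing `random_state` on the trees and using unshuffled `StratifiedKFold` folds keeps both runs deterministic, so the numbers should agree if `fit` really ranked the candidates by MCC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic, imbalanced data standing in for a real dataset.
X, y = make_classification(n_samples=400, weights=[0.85, 0.15], random_state=1)

mcc_scorer = make_scorer(matthews_corrcoef)
cv = StratifiedKFold(n_splits=5)  # fixed, unshuffled folds shared by both runs

search = GridSearchCV(
    ExtraTreesClassifier(n_estimators=50, random_state=0),
    {"min_samples_split": [2, 4, 8]},  # illustrative grid
    cv=cv,
    scoring=mcc_scorer,
)
search.fit(X, y)

# Re-score every candidate by hand with the same folds and the same scorer.
manual = []
for params in search.cv_results_["params"]:
    est = ExtraTreesClassifier(n_estimators=50, random_state=0, **params)
    manual.append(cross_val_score(est, X, y, cv=cv, scoring=mcc_scorer).mean())
manual = np.array(manual)

# If fit had silently fallen back to accuracy, these would not match.
print(np.allclose(manual, search.cv_results_["mean_test_score"]))
print(search.best_score_, manual.max())
```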
>>
>>
>>
>> On Wed, Aug 20, 2014 at 12:19 PM, federico vaggi <
>> [email protected]> wrote:
>>
>>> Hi everyone,
>>>
>>> I'm working on a classification task with ExtraTreesClassifier on
>>> somewhat imbalanced datasets, so instead of accuracy I'm using the
>>> Matthews correlation coefficient (MCC) as the metric.
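For context on why accuracy misleads on imbalanced data, a small illustration with hypothetical labels (90 negatives, 10 positives): a predictor that always outputs the majority class reaches 0.9 accuracy but an MCC of 0, because MCC only rewards predictions that correlate with the true labels. scikit-learn's `matthews_corrcoef` returns 0.0 when the predictions are constant.

```python
import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical labels: 90 negatives, 10 positives.
y_true = np.array([0] * 90 + [1] * 10)
# A degenerate "classifier" that always predicts the majority class.
y_majority = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_majority)
mcc = matthews_corrcoef(y_true, y_majority)

print(acc)  # 0.9
print(mcc)  # 0.0
```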
>>>
>>> However, there's some behaviour that doesn't quite make sense to me.
>>> For example, after doing this:
>>>
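>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.grid_search import GridSearchCV  # sklearn.model_selection from 0.18 on
>>> from sklearn.metrics import make_scorer, matthews_corrcoef
>>>
>>> score_func = make_scorer(matthews_corrcoef)
>>> tuned_parameters = [{'n_estimators': [250, 500, 1000, 1500],
>>>                      'min_samples_split': [1, 4, 8]}]
>>> clf = ExtraTreesClassifier()
>>> meta_clf = GridSearchCV(clf, tuned_parameters, cv=5, n_jobs=2,
>>>                         scoring=score_func)
>>> meta_clf.fit(X_train, y_train)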
>>>
>>> I then look at the performance of the classifier on the test set and see:
>>>
>>> y_pred = meta_clf.best_estimator_.predict(X_test)
>>> print(matthews_corrcoef(y_test, y_pred))
>>> print(score_func(meta_clf.best_estimator_, X_test, y_test))
>>> print(meta_clf.score(X_test, y_test))
>>> print(meta_clf.best_estimator_.score(X_test, y_test))
>>>
>>> 0.399796217794 # MCC, correct
>>> 0.399796217794 # MCC, correct
>>> 0.736672629696 # Accuracy
>>> 0.736672629696 # Accuracy
>>>
>>>
>>> Is that reasonable? I would have expected meta_clf.score to use the
>>> MCC. Does it at least use the MCC internally when optimizing the
>>> hyperparameters?
>>>
>>> Federico
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Slashdot TV.
>>> Video for Nerds.  Stuff that matters.
>>> http://tv.slashdot.org/
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>>
>>
>
>
>
>
