Hi everyone,

I'm working on a classification task with ExtraTreesClassifier on a
somewhat imbalanced dataset, so I'm using the Matthews correlation
coefficient (MCC) as my metric instead of accuracy.

However, there's some behaviour that doesn't quite make sense to me.
For example, after doing this:

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import make_scorer, matthews_corrcoef

score_func = make_scorer(matthews_corrcoef)
tuned_parameters = [{'n_estimators': [250, 500, 1000, 1500],
                     'min_samples_split': [1, 4, 8]}]
clf = ExtraTreesClassifier()
meta_clf = GridSearchCV(clf, tuned_parameters, cv=5, n_jobs=2,
                        scoring=score_func)
meta_clf.fit(X_train, y_train)

I look at the performance of the classifier, and see:

y_pred = meta_clf.best_estimator_.predict(X_test)
print(matthews_corrcoef(y_test, y_pred))
print(score_func(meta_clf.best_estimator_, X_test, y_test))
print(meta_clf.score(X_test, y_test))
print(meta_clf.best_estimator_.score(X_test, y_test))

0.399796217794 # MCC, correct
0.399796217794 # MCC, correct
0.736672629696 # Accuracy
0.736672629696 # Accuracy
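
(For context, a minimal self-contained sketch of the scorer convention,
on synthetic data rather than my actual dataset: make_scorer wraps a
metric taking (y_true, y_pred) into a callable taking (estimator, X, y),
which is why the first two calls above agree exactly.)

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef

# Synthetic stand-in for the real data.
X, y = make_classification(n_samples=200, random_state=0)
clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)

scorer = make_scorer(matthews_corrcoef)
# Equivalent by construction: the scorer predicts internally,
# then applies the wrapped metric to (y_true, y_pred).
a = matthews_corrcoef(y, clf.predict(X))
b = scorer(clf, X, y)
assert a == b
```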


Is that expected? I would have thought meta_clf.score would use the MCC.
Does it at least use the MCC internally when optimizing the
hyper-parameters?
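
(To make the second question concrete, here is a sketch of the check I
have in mind, on synthetic data and with recent scikit-learn import
paths, not my actual setup: best_score_ is the mean cross-validated
score of the winning parameters, so if the search really optimizes MCC
it should match a manual cross_val_score run with the same scorer.)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic imbalanced data as a stand-in for the real dataset.
X, y = make_classification(n_samples=300, weights=[0.8, 0.2],
                           random_state=0)
mcc = make_scorer(matthews_corrcoef)

grid = GridSearchCV(ExtraTreesClassifier(random_state=0),
                    {'n_estimators': [20, 40]}, cv=5, scoring=mcc)
grid.fit(X, y)

# Re-run cross-validation manually with the best parameters and the
# same scorer; both use StratifiedKFold(5) on a classifier, so the
# splits match.
best = ExtraTreesClassifier(random_state=0, **grid.best_params_)
manual = cross_val_score(best, X, y, cv=5, scoring=mcc).mean()

# If the search optimized MCC internally, the two values agree.
assert np.isclose(grid.best_score_, manual)
```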

Federico
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general