Dear scikit-learn Community,

I'd like to create a custom scorer to be used with GroupKFold and GridSearchCV. The issue is that I also need to use the grouping information "inside" the custom scorer in order to compute the desired metric. How can I do that?
Here is a simplified example that explains the issue in detail. Given this basic and common scenario:
---
X = <feature values>
y = <labels>
groups = np.array([0,0,1,1,1,1,2,2,3,3,3,...])
parameters = {'n_estimators': [10,100,1000], 'max_depth': [5,10,15]}
gkf = GroupKFold(n_splits=3)
clf = GridSearchCV(RandomForestClassifier(), parameters, scoring=my_scorer, cv=gkf)
---
how can I create my_scorer so that it computes, let's say, the "average accuracy across groups"? That is, my_scorer should know not only y_true and y_pred but also their grouping structure. In principle, it should look something like the following snippet, which needs the group information "for the specific slice of data evaluated" (which I call y_groups below). This is the piece of information that I don't know how to propagate there:
---
def my_score(y_true, y_pred, y_groups):
    result = []
    for group in np.unique(y_groups):
        idx = y_groups == group
        result.append((y_true[idx] == y_pred[idx]).mean())
    return np.mean(result)

my_scorer = make_scorer(my_score)
---
How can I make a custom scorer that uses, internally, the group information for the specific predictions being scored?

Thanks in advance for your help,
Emanuele
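P.S. To make the target metric concrete, here is a minimal, self-contained sketch of the "average accuracy across groups" on toy data. The function name group_averaged_accuracy and the toy arrays are made up for illustration. The sketch only shows the metric itself, assuming y_groups is already available and aligned with y_true and y_pred; how to get that slice of groups into the scorer within GridSearchCV is exactly the open question.

```python
import numpy as np


def group_averaged_accuracy(y_true, y_pred, y_groups):
    """Mean of the per-group accuracies (hypothetical helper name).

    Every group counts equally, regardless of how many samples it has.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    y_groups = np.asarray(y_groups)

    per_group = []
    for group in np.unique(y_groups):
        idx = y_groups == group  # boolean mask selecting this group's samples
        per_group.append(np.mean(y_true[idx] == y_pred[idx]))
    return float(np.mean(per_group))


# Toy data (made up): group 0 has 2 samples, group 1 has 4 samples.
y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0])
y_groups = np.array([0, 0, 1, 1, 1, 1])

score = group_averaged_accuracy(y_true, y_pred, y_groups)
plain = float(np.mean(y_true == y_pred))  # ordinary accuracy, for comparison

print(score)  # 0.625  (group 0: 0.5, group 1: 0.75)
print(plain)  # 4 correct out of 6
```

Note that the group-averaged value (0.625) differs from the plain accuracy (4/6), which is why the scorer cannot be computed from y_true and y_pred alone.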