ahha - thanks Andy! that works...

On Tue, Nov 1, 2016 at 7:05 AM, Andy <[email protected]> wrote:
> Hi.
> If you want to pass a custom scorer, you need to pass the scorer, not a
> string with the scorer name.
> Andy
>
> On 10/31/2016 04:28 PM, Sumeet Sandhu wrote:
>
> Hi,
>
> I've been staring at various doc pages for a while to create a custom
> scorer that uses predict_proba output of a multi-class SGDClassifier :
> http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score
> http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
> http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html#sklearn.metrics.make_scorer
>
> I got the impression I could customize the "scoring" parameter in
> cross_val_score directly, but that didn't work.
> Then I tried customizing the "score_func" parameter in make_scorer, but
> that didn't work either. Both errors are ValueErrors :
>
> Traceback (most recent call last):
>   File "<pyshell#96>", line 3, in <module>
>     accuracy = mean(cross_val_score(LRclassifier, trainPatentVecs, trainLabelVecs, cv=10, scoring = 'topNscorer'))
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1425, in cross_val_score
>     scorer = check_scoring(estimator, scoring=scoring)
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/metrics/scorer.py", line 238, in check_scoring
>     return get_scorer(scoring)
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/metrics/scorer.py", line 197, in get_scorer
>     % (scoring, sorted(SCORERS.keys())))
> ValueError: 'topNscorer' is not a valid scoring value. Valid options are
> ['accuracy', 'adjusted_rand_score', 'average_precision', 'f1', 'f1_macro',
> 'f1_micro', 'f1_samples', 'f1_weighted', 'log_loss', 'mean_absolute_error',
> 'mean_squared_error', 'median_absolute_error', 'precision',
> 'precision_macro', 'precision_micro', 'precision_samples',
> 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro',
> 'recall_samples', 'recall_weighted', 'roc_auc']
>
> At a high level, I want to find out if the true label was found in the top
> N multi-class labels coming out of an SGD classifier. Built-in scores like
> "accuracy" only look at N=1.
>
> Here is the code using make_scorer :
> LRclassifier = SGDClassifier(loss='log')
> topNscorer = make_scorer(topNscoring, greater_is_better=True, needs_proba=True)
> accuracyN = mean(cross_val_score(LRclassifier, Data, Labels, scoring = 'topNscorer'))
>
> Here is the code for the custom scoring function :
> def topNscoring(y, yp):
>     ## Inputs y = true label per sample, yp = predict_proba probabilities of all labels per sample
>     N = 5
>     foundN = []
>     for ii in xrange(0,shape(yp)[0]):
>         indN = [ w[0] for w in sorted(enumerate(list(yp[ii,:])),key=lambda w:w[1],reverse=True)[0:N] ]
>         if y[ii] in indN: foundN.append(1)
>         else: foundN.append(0)
>     return mean(foundN)
>
> Any help will be greatly appreciated.
>
> best regards,
> Sumeet
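
For anyone who finds this thread in the archive, here is a minimal sketch of the fix Andy describes above: hand cross_val_score the scorer object itself rather than a quoted name. It is a sketch under assumptions, not the exact code from my session. The names top_n_hits and make_top_n_scorer are made up for illustration, the iris data and LogisticRegression are stand-ins for my data and the SGDClassifier, and the scorer is written as a plain callable with the scorer(estimator, X, y) signature instead of going through make_scorer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def top_n_hits(y_true, proba, classes, n):
    """Fraction of samples whose true label is among the n most probable classes."""
    # Column indices of the n largest probabilities in each row.
    top_idx = np.argsort(proba, axis=1)[:, -n:]
    # Translate column indices into class labels before comparing.
    top_labels = np.asarray(classes)[top_idx]
    y_true = np.asarray(y_true)
    return float(np.mean((top_labels == y_true[:, None]).any(axis=1)))


def make_top_n_scorer(n):
    """Build a callable with the scorer(estimator, X, y) signature (hypothetical helper)."""
    def scorer(estimator, X, y):
        proba = estimator.predict_proba(X)
        return top_n_hits(y, proba, estimator.classes_, n)
    return scorer


# Stand-in data and classifier (assumptions, not the data from this thread).
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)
top2_scorer = make_top_n_scorer(2)

# Pass the scorer object itself; a string is only looked up among built-in names.
scores = cross_val_score(clf, X, y, cv=5, scoring=top2_scorer)
print(scores.mean())
```

The object returned by make_scorer in my original post is passed the same way, i.e. scoring=topNscorer without the quotes. Going through estimator.classes_ also means the comparison is on class labels rather than on column positions.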
_______________________________________________ scikit-learn mailing list [email protected] https://mail.python.org/mailman/listinfo/scikit-learn
