Re: [scikit-learn] creating a custom scoring function for cross-validation of classification

Andy Tue, 01 Nov 2016 07:08:07 -0700

Hi.

If you want to pass a custom scorer, you need to pass the scorer object itself, not a string with the scorer name.
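A minimal sketch of one way to do this, assuming the callable form of the scoring parameter, i.e. a function with the signature scorer(estimator, X, y). The names make_top_n_scorer and top_n_scorer are hypothetical, chosen only for illustration; predict_proba and classes_ are the estimator's own attributes. Unlike the function in the quoted message, this version looks up the label for each probability column through classes_, so it does not assume the labels are the integers 0..K-1.

```python
# Hypothetical helper names; only predict_proba and classes_ come from scikit-learn.
def make_top_n_scorer(n=5):
    """Build a scorer with the (estimator, X, y) signature."""
    def top_n_scorer(estimator, X, y):
        proba = estimator.predict_proba(X)   # one row of class probabilities per sample
        classes = list(estimator.classes_)   # column j of proba belongs to classes[j]
        hits = 0
        for row, true_label in zip(proba, y):
            # column indices of the n largest probabilities in this row
            top = sorted(range(len(classes)), key=lambda j: row[j], reverse=True)[:n]
            if any(classes[j] == true_label for j in top):
                hits += 1
        # fraction of samples whose true label is among the top n predictions
        return hits / float(len(y))
    return top_n_scorer
```

The resulting function would then be passed as an object, with no quotes, for example cross_val_score(LRclassifier, Data, Labels, cv=10, scoring=make_top_n_scorer(5)). The same applies to an object returned by make_scorer: scoring=topNscorer rather than scoring='topNscorer'.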

Andy


On 10/31/2016 04:28 PM, Sumeet Sandhu wrote:

Hi,
I've been staring at various doc pages for a while to create a custom scorer that uses the predict_proba output of a multi-class SGDClassifier:
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score
http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html#sklearn.metrics.make_scorer
I got the impression I could customize the "scoring" parameter in cross_val_score directly, but that didn't work. Then I tried customizing the "score_func" parameter in make_scorer, but that didn't work either. Both errors are ValueErrors:
Traceback (most recent call last):
  File "<pyshell#96>", line 3, in <module>
    accuracy = mean(cross_val_score(LRclassifier, trainPatentVecs, trainLabelVecs, cv=10, scoring = 'topNscorer'))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1425, in cross_val_score
    scorer = check_scoring(estimator, scoring=scoring)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/metrics/scorer.py", line 238, in check_scoring
    return get_scorer(scoring)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/metrics/scorer.py", line 197, in get_scorer
    % (scoring, sorted(SCORERS.keys())))
ValueError: 'topNscorer' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_rand_score', 'average_precision', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'log_loss', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc']
At a high level, I want to find out if the true label was found in the top N multi-class labels coming out of an SGD classifier. Built-in scores like "accuracy" only look at N=1.
Here is the code using make_scorer :
    LRclassifier = SGDClassifier(loss='log')
    topNscorer = make_scorer(topNscoring, greater_is_better=True, needs_proba=True)
    accuracyN = mean(cross_val_score(LRclassifier, Data, Labels, scoring = 'topNscorer'))
Here is the code for the custom scoring function :
def topNscoring(y, yp):
    ## Inputs: y = true label per sample, yp = predict_proba probabilities of all labels per sample
    N = 5
    foundN = []
    for ii in xrange(0, shape(yp)[0]):
        indN = [ w[0] for w in sorted(enumerate(list(yp[ii,:])), key=lambda w:w[1], reverse=True)[0:N] ]
        if y[ii] in indN: foundN.append(1)
        else:             foundN.append(0)
    return mean(foundN)

Any help will be greatly appreciated.

best regards,
Sumeet



_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn

