The 0.13.X version of scikit-learn doesn't support grid search
with an auxiliary score function. In the master branch, this is possible
thanks to Andreas (see https://github.com/scikit-learn/scikit-learn/pull/1381).
However, there is still work in progress on this subject;
see https://github.com/scikit-learn/scikit-learn/pull/2123
The strategy that you describe is referred to as pooling (concatenating the
predictions), as opposed to averaging (averaging the score over the folds).
For more information, see for instance:
- Brian J. Parker, Simon Gunter, and Justin Bedo. Stratification bias in low
signal microarray studies. BMC Bioinformatics, 8:326, 2007.
- Airola et al. A comparison of AUC estimators in small-sample studies.
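As a rough sketch of the pooling strategy (written against the current sklearn API rather than the 0.13 one the traceback below comes from; make_classification just generates toy data here), you can collect the decision value for each held-out sample across all LOO folds and compute the AUC once on the concatenated predictions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

# toy binary classification data
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

scores = np.empty(len(y))
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = SVC(C=1).fit(X[train_idx], y[train_idx])
    # store the decision value for the single held-out sample
    scores[test_idx] = clf.decision_function(X[test_idx])

# pooling: concatenate the per-fold predictions, then compute AUC once
pooled_auc = roc_auc_score(y, scores)
print(pooled_auc)
```

Each fold contributes one prediction, so no per-fold AUC is ever computed and the "at least 2 points" problem never arises.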
Hope it helps,
Arnaud
On 08 Jul 2013, at 20:50, Josh Wasserstein <ribonucle...@gmail.com> wrote:
> The following call runs into an error
>
> clf = GridSearchCV(SVC(C=1), tuned_parameters,
>                    score_func=sklearn.metrics.auc_score, verbose=2, n_jobs=1, cv=loo)
> clf.fit(X, y)
>
> with:
> /opt/python/virtualenvs/work/lib/python2.7/site-packages/sklearn/metrics/metrics.pyc in auc(x, y, reorder)
>      64     # XXX: Consider using ``scipy.integrate`` instead, or moving to
>      65     # ``utils.extmath``
> ---> 66     x, y = check_arrays(x, y)
>      67     if x.shape[0] < 2:
>      68         raise ValueError('At least 2 points are needed to compute'
>
> even though X and y hold more than 100 examples with 20+ positives.
>
> It looks like sklearn cannot compute AUC scores with LOO, since AUC requires
> at least two points (and probably a mix of positives and negatives), while in
> LOO each fold only has one point.
>
> However, one way to circumvent this limitation could be to concatenate the
> prediction of each fold in LOO (concatenate all predictions), and only then
> measure AUC.
>
> In fact, this is a whole different way of evaluating the performance of a
> model with cross validation. Rather than averaging the scores across folds,
> one could always concatenate the prediction results and measure the
> performance. This way score functions can always be measured directly on the
> prediction of the full dataset.
>
> This also raises an interesting ML question, since mean(scores)
> != score(concatenation)
>
> Is there anything wrong with this approach?
>
> Thanks,
>
> Josh
>
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general