On Sun, Nov 06, 2011 at 07:51:12PM -0500, Satrajit Ghosh wrote:
> yes in certain settings and depending on the score func, they will be
> identical. i'm using avg_f1_score (based on andreas' email - not yet
> pushed) and the results are relatively close between the two methods,
> but not identical.
Ha! With avg_f1_score, I don't think that they will be identical.

> > I don't really see how a different averaging strategy across folds
> > would improve the variance in these settings, but maybe I am missing
> > something?

> the variance statement was only in relation to selection of 'k' and how
> that relates test set size relative to the training set (all numbers are
> small over here, so variance fluctuates quite a bit). i didn't mean it
> in relation to the averaging across folds.

That's why I would use a ShuffleSplit rather than a KFold: it lets you
increase the number of folds while keeping the number of test samples
from getting too small. The rule of thumb that is often given is that
n_test should be around .2 n_train for optimal performance.

G
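PS: a minimal sketch of what the ShuffleSplit approach could look like,
assuming the current sklearn.model_selection API (module and parameter
names differ in older releases) and using iris / a linear SVC purely as
placeholders:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import ShuffleSplit, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Many random splits, each holding out ~20% of the samples for testing,
    # instead of tying the number of folds to the test-set size as KFold does.
    cv = ShuffleSplit(n_splits=20, test_size=0.2, random_state=0)

    scores = cross_val_score(SVC(kernel='linear'), X, y,
                             cv=cv, scoring='f1_macro')
    print(scores.mean(), scores.std())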
