On Sun, Nov 06, 2011 at 07:51:12PM -0500, Satrajit Ghosh wrote:
> yes in certain settings and depending on the score func, they will be
> identical. i'm using avg_f1_score (based on andreas' email - not yet
> pushed) and the results are relatively close between the two methods,
> but not identical.
Ha! With avg_f1_score, I don't think that they will be identical.

> > I don't really see how a different averaging strategy across folds
> > would improve the variance in these settings, but maybe I am missing
> > something?

> the variance statement was only in relation to selection of 'k' and how
> that relates test set size relative to the training set (all numbers are
> small over here, so variance fluctuates quite a bit). i didn't mean it
> in relation to the averaging across folds.

That's why I would use a ShuffleSplit rather than a KFold: it lets you
increase the number of folds while keeping the number of test samples
from getting too small. The rule of thumb that is often given is that
n_test should be around .2 n_train for optimal performance.

G
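PS: a minimal sketch of what the ShuffleSplit approach could look like,
assuming the current sklearn.model_selection API (module and parameter
names differ in older releases) and using iris / a linear SVC purely as
placeholders:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import ShuffleSplit, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Many random splits, each holding out ~20% of the samples for testing,
    # instead of tying the number of folds to the test-set size as KFold does.
    cv = ShuffleSplit(n_splits=20, test_size=0.2, random_state=0)

    scores = cross_val_score(SVC(kernel='linear'), X, y,
                             cv=cv, scoring='f1_macro')
    print(scores.mean(), scores.std())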
