With so few data points, there is huge uncertainty in the estimate of prediction accuracy obtained with cross-validation. This isn't a problem of the method; it is a basic limitation of the small amount of data. I've written a paper on this problem in the specific context of neuroimaging: https://www.sciencedirect.com/science/article/pii/S1053811917305311 (preprint: https://hal.inria.fr/hal-01545002/).
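
To give a rough sense of the size of this uncertainty, here is a minimal sketch. It is not taken from the paper. It assumes that each of the 48 held-out predictions (2 classes x 24 exemplars) is an independent coin flip with a fixed success probability, which is a simplification: cross-validation folds share training data, so they are not truly independent. The function name `binomial_accuracy_interval` is made up for this illustration.

```python
import math


def binomial_accuracy_interval(n_test, true_accuracy=0.5, z=1.96):
    """Normal-approximation interval for an accuracy measured on n_test
    predictions, assuming (hypothetically) that they are independent."""
    # Standard error of a proportion estimated from n_test Bernoulli trials.
    std_error = math.sqrt(true_accuracy * (1 - true_accuracy) / n_test)
    return true_accuracy - z * std_error, true_accuracy + z * std_error


# 24 exemplars per class, 2 classes: each exemplar is tested exactly once.
n_samples = 2 * 24
low, high = binomial_accuracy_interval(n_samples)
print(f"{n_samples} test predictions at chance: "
      f"95% interval [{low:.2f}, {high:.2f}]")
```

Under these assumptions, a classifier performing at chance can plausibly score anywhere from about 36% to 64% for a single subject, so a gap between 50% and 60% is within the noise at that level. Averaging over several subjects narrows the interval, but the sketch does not model that.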
I expect that what you are seeing is sampling noise: the result has confidence intervals larger than 10%.

Gaël

On Tue, Dec 19, 2017 at 04:27:53PM -0500, Taylor, Johnmark wrote:
> Hello,
> I am a researcher in fMRI and am using SVMs to analyze brain data. I am doing
> decoding between two classes, each of which has 24 exemplars per class. I am
> comparing two different methods of cross-validation for my data: in one, I am
> training on 23 exemplars from each class, and testing on the remaining example
> from each class, and in the other, I am training on 22 exemplars from each
> class, and testing on the remaining two from each class (in case it matters,
> the data is structured into different neuroimaging "runs", with each "run"
> containing several "blocks"; the first cross-validation method is leaving out
> one block at a time, the second is leaving out one run at a time).
> Now, I would've thought that these two CV methods would be very similar, since
> the vast majority of the training data is the same; the only difference is in
> adding two additional points. However, they are yielding very different
> results: training on 23 per class is yielding 60% decoding accuracy (averaged
> across several subjects, and statistically significantly greater than chance),
> training on 22 per class is yielding chance (50%) decoding. Leaving aside the
> particulars of fMRI in this case: is it unusual for single points (amounting
> to less than 5% of the data) to have such a big influence on SVM decoding? I am
> using a cost parameter of C=1. I must say it is counterintuitive to me that
> just a couple points out of two dozen could make such a big difference.
> Thank you very much, and cheers,
> JohnMark
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Gael Varoquaux
Senior Researcher, INRIA Parietal
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info    http://twitter.com/GaelVaroquaux