Hi, you could try cross-validation with a k-fold partition, so that you can evaluate every training/test split (commonly 90%/10% or 80%/20%). If the results differ widely across folds, you are probably overfitting.
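A minimal sketch of that check, using synthetic stand-in data (not the fMRI data from the thread) with scikit-learn's `StratifiedKFold` and `cross_val_score`, looking at the per-fold scores rather than only their mean:

```python
# Sketch: inspect per-fold accuracy to spot unstable (possibly overfit) models.
# The data here is a synthetic stand-in; 48 samples mimics 24 exemplars/class.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=48, n_features=20, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(SVC(C=1.0), X, y, cv=cv)

print("per-fold accuracy:", scores)
print("mean +/- std: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

A large spread across folds (a high standard deviation relative to the mean) is the "very different results" symptom mentioned above.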
Sent from my iPhone

> On 19 Dec 2017, at 22:37, Jacob Vanderplas <jake...@cs.washington.edu> wrote:
>
> Hi JohnMark,
> SVMs, by design, are quite sensitive to the addition of single data points – but only if those data points happen to lie near the margin. I wrote about some of those types of details here:
> https://jakevdp.github.io/PythonDataScienceHandbook/05.07-support-vector-machines.html
>
> Hope that helps,
> Jake
>
> Jake VanderPlas
> Senior Data Science Fellow
> Director of Open Software
> University of Washington eScience Institute
>
>> On Tue, Dec 19, 2017 at 1:27 PM, Taylor, Johnmark <johnmarktay...@g.harvard.edu> wrote:
>> Hello,
>>
>> I am a researcher in fMRI and am using SVMs to analyze brain data. I am doing decoding between two classes, each of which has 24 exemplars per class. I am comparing two different methods of cross-validation for my data: in one, I am training on 23 exemplars from each class and testing on the remaining exemplar from each class; in the other, I am training on 22 exemplars from each class and testing on the remaining two from each class. (In case it matters, the data is structured into different neuroimaging "runs", with each "run" containing several "blocks"; the first cross-validation method leaves out one block at a time, the second leaves out one run at a time.)
>>
>> Now, I would have thought that these two CV methods would give very similar results, since the vast majority of the training data is the same; the only difference is the addition of two data points. However, they are yielding very different results: training on 23 per class yields 60% decoding accuracy (averaged across several subjects, and statistically significantly greater than chance), while training on 22 per class yields chance (50%) decoding.
>> Leaving aside the particulars of fMRI in this case: is it unusual for single points (amounting to less than 5% of the data) to have such a big influence on SVM decoding? I am using a cost parameter of C=1. I must say it is counterintuitive to me that just a couple of points out of two dozen could make such a big difference.
>>
>> Thank you very much, and cheers,
>>
>> JohnMark
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
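To make the margin-sensitivity point from Jake's reply concrete: with a linear SVC, the fitted solution is determined entirely by the support vectors, so refitting without a non-support point leaves the decision boundary essentially unchanged, while dropping a support vector can shift it. A small sketch on synthetic blob data (not the fMRI data; the helper `refit_without` is illustrative):

```python
# Sketch: compare the effect of removing a support vector vs a non-support
# point on a linear SVM's weight vector (synthetic data, C=1 as in the thread).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=1.2)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
sv = set(clf.support_)  # indices of the support vectors

def refit_without(i):
    """Refit the same SVM with sample i removed (hypothetical helper)."""
    mask = np.ones(len(X), dtype=bool)
    mask[i] = False
    return SVC(kernel="linear", C=1.0).fit(X[mask], y[mask])

i_sv = clf.support_[0]                               # a support vector
i_non = next(i for i in range(len(X)) if i not in sv)  # a non-support point

w = clf.coef_.ravel()
w_sv = refit_without(i_sv).coef_.ravel()
w_non = refit_without(i_non).coef_.ravel()

print("weight shift after dropping a support vector:", np.linalg.norm(w - w_sv))
print("weight shift after dropping a non-support point:", np.linalg.norm(w - w_non))
```

The non-support removal should change the weights only up to solver tolerance, which is why a point or two near the margin can move decoding accuracy so much while most points do nothing.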