Hello, I am a researcher in fMRI and am using SVMs to analyze brain data. I am decoding between two classes, each with 24 exemplars. I am comparing two different cross-validation methods on my data: in one, I train on 23 exemplars from each class and test on the remaining one from each class; in the other, I train on 22 exemplars from each class and test on the remaining two from each class. (In case it matters, the data is structured into different neuroimaging "runs", with each "run" containing several "blocks"; the first cross-validation method leaves out one block at a time, the second leaves out one run at a time.)
Now, I would have thought that these two CV methods would give very similar results, since the vast majority of the training data is the same; the only difference is two additional training points. However, they are yielding very different results: training on 23 per class yields 60% decoding accuracy (averaged across several subjects, and statistically significantly above chance), while training on 22 per class yields chance-level (50%) decoding. Leaving aside the particulars of fMRI in this case: is it unusual for a couple of points (amounting to less than 5% of the data) to have such a big influence on SVM decoding? I am using a cost parameter of C=1. I must say it is counterintuitive to me that just a couple of points out of two dozen could make such a difference. Thank you very much, and cheers, JohnMark
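For reference, here is a minimal sketch of how the two schemes could be set up in scikit-learn with LeaveOneGroupOut; the names X, y, blocks, and runs, and the random data, are placeholders standing in for the actual patterns, labels, and block/run assignments:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(48, 500)          # 48 exemplars (24 per class) x 500 voxels (placeholder data)
y = np.tile([0, 1], 24)         # alternating class labels
blocks = np.arange(48) // 2     # 24 blocks, one exemplar per class in each block
runs = np.arange(48) // 4       # 12 runs, two exemplars per class in each run

clf = SVC(kernel='linear', C=1)
logo = LeaveOneGroupOut()

# Scheme 1: leave one block out -> train on 23 per class, test on 1 per class
block_scores = cross_val_score(clf, X, y, groups=blocks, cv=logo)
# Scheme 2: leave one run out -> train on 22 per class, test on 2 per class
run_scores = cross_val_score(clf, X, y, groups=runs, cv=logo)

print("leave-one-block-out accuracy:", block_scores.mean())
print("leave-one-run-out accuracy:  ", run_scores.mean())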