Hi guys, I'm currently developing a web interface and a programmatic REST API for scikit-learn: https://github.com/jeff1evesque/machine-learning. At the moment SVM and SVR are available, with a few parameters such as C and gamma exposed; a rough sketch of the underlying scikit-learn calls is just below.
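(A self-contained sketch of the scikit-learn estimators whose C and gamma the interface exposes; the parameter values shown are placeholders, not project defaults.)

from sklearn.datasets import make_classification, make_regression
from sklearn.svm import SVC, SVR

# Classification (SVM) with the two exposed parameters, C and gamma.
Xc, yc = make_classification(n_samples=100, n_features=4, random_state=0)
clf = SVC(C=1.0, gamma=0.1, kernel='rbf').fit(Xc, yc)

# Regression (SVR) with the same two parameters exposed.
Xr, yr = make_regression(n_samples=100, n_features=4, random_state=0)
reg = SVR(C=10.0, gamma=0.01, kernel='rbf').fit(Xr, yr)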
I'm working on improving the web interface at the moment. Since you're working with SVMs, maybe you'd have time to try my project and give me some feedback? I hope to expand the toolset to things like ensemble learning and, as a long shot, neural networks, but that may take some time.

Thank you,

Jeff Levesque
https://github.com/jeff1evesque

> On Dec 19, 2017, at 4:37 PM, Jacob Vanderplas <jake...@cs.washington.edu> wrote:
>
> Hi JohnMark,
> SVMs, by design, are quite sensitive to the addition of single data points – but only if those data points happen to lie near the margin. I wrote about some of those details here:
> https://jakevdp.github.io/PythonDataScienceHandbook/05.07-support-vector-machines.html
>
> Hope that helps,
> Jake
>
> Jake VanderPlas
> Senior Data Science Fellow
> Director of Open Software
> University of Washington eScience Institute
>
>> On Tue, Dec 19, 2017 at 1:27 PM, Taylor, Johnmark <johnmarktay...@g.harvard.edu> wrote:
>> Hello,
>>
>> I am a researcher in fMRI and am using SVMs to analyze brain data. I am decoding between two classes, each of which has 24 exemplars. I am comparing two different methods of cross-validation for my data: in one, I train on 23 exemplars from each class and test on the remaining exemplar from each class; in the other, I train on 22 exemplars from each class and test on the remaining two from each class. (In case it matters, the data is structured into different neuroimaging "runs", with each "run" containing several "blocks"; the first cross-validation method leaves out one block at a time, the second leaves out one run at a time.)
>>
>> Now, I would have thought that these two CV methods would give very similar results, since the vast majority of the training data is the same; the only difference is two additional training points per fold. However, they are yielding very different results: training on 23 per class yields 60% decoding accuracy (averaged across several subjects, and statistically significantly greater than chance), while training on 22 per class yields chance (50%) decoding. Leaving aside the particulars of fMRI: is it unusual for single points (amounting to less than 5% of the data) to have such a big influence on SVM decoding? I am using a cost parameter of C=1. I must say it is counterintuitive to me that just a couple of points out of two dozen could make such a big difference.
>>
>> Thank you very much, and cheers,
>>
>> JohnMark
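To make Jake's point about margin sensitivity concrete, here is a small synthetic sketch (the two-blob data, linear kernel, and C=1 are illustrative choices, not JohnMark's setup): a point added far from the boundary never becomes a support vector and leaves the fitted hyperplane essentially unchanged, while a point added inside the margin shifts it.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated classes; the separating hyperplane sits near x = 0.
X, y = make_blobs(n_samples=60, centers=[[-2, 0], [2, 0]],
                  cluster_std=0.6, random_state=0)
base = SVC(kernel='linear', C=1).fit(X, y)

# One extra class-1 point far from the boundary: it stays outside the margin,
# never becomes a support vector, and the solution is effectively unchanged.
far = SVC(kernel='linear', C=1).fit(np.vstack([X, [[6.0, 0.0]]]), np.append(y, 1))

# One extra class-1 point on the old boundary: it violates the margin,
# becomes a support vector, and the hyperplane moves.
near = SVC(kernel='linear', C=1).fit(np.vstack([X, [[0.0, 0.0]]]), np.append(y, 1))

print("baseline   w:", base.coef_[0])
print("far point  w:", far.coef_[0])   # nearly identical to baseline
print("near point w:", near.coef_[0])  # noticeably different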
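And for the two cross-validation schemes JohnMark describes, a hedged sketch of how they could be set up with scikit-learn's LeaveOneGroupOut is below. The voxel patterns and the block/run labels are placeholders standing in for the fMRI design: 24 blocks with one exemplar per class each (so leaving out a block trains on 23 per class), grouped into 12 runs of two blocks (so leaving out a run trains on 22 per class).

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.RandomState(0)
n_per_class = 24
X = rng.randn(2 * n_per_class, 100)      # placeholder "voxel" patterns
y = np.repeat([0, 1], n_per_class)

# Placeholder structure: block b holds one exemplar of each class; runs pair blocks.
blocks = np.tile(np.arange(24), 2)
runs = blocks // 2

clf = SVC(kernel='linear', C=1)
logo = LeaveOneGroupOut()
acc_block = cross_val_score(clf, X, y, groups=blocks, cv=logo).mean()  # 23/class in training
acc_run = cross_val_score(clf, X, y, groups=runs, cv=logo).mean()      # 22/class in training
print(f"leave-one-block-out: {acc_block:.2f}, leave-one-run-out: {acc_run:.2f}")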
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn