Hi all,
After reading some papers on (approximate) polynomial kernels for NLP
applications, I got curious and decided to do some quick experiments.
I modified the 20 newsgroups example to benchmark vanilla SVC instead
of LinearSVC with linear, quadratic and cubic kernels. I was quite
surprised at the results.
For reference, LinearSVC(C=1000, loss='l2', penalty='l2') obtains an
F1-score of 0.896 on the default set of four document classes.
I replaced this with

    params = {'C': [.01, .1, 1, 10, 100, 1000]}
    GridSearchCV(SVC(kernel='linear'), params, score_func=metrics.f1_score)

and got an F1-score of only 0.131, and exactly the same for

    params = {'degree': [2, 3], 'C': [.01, .1, 1, 10, 100, 1000]}
    GridSearchCV(SVC(kernel='poly'), params, cv=10, score_func=metrics.f1_score)
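
For anyone who wants to reproduce the setup without the newsgroups
download, here is a minimal sketch of the polynomial-kernel grid search on
synthetic data. The data set, the import paths and the scoring='f1_weighted'
argument are assumptions of this sketch: newer scikit-learn releases take a
scoring string instead of score_func, and I am assuming the weighted
average matches what metrics.f1_score computed here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic four-class stand-in for the vectorized newsgroups documents
# (assumption: the real features come from the text vectorizer instead).
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)

params = {'degree': [2, 3], 'C': [.01, .1, 1, 10, 100, 1000]}

# scoring='f1_weighted' stands in for score_func=metrics.f1_score.
search = GridSearchCV(SVC(kernel='poly'), params, cv=10,
                      scoring='f1_weighted')
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

The scores on this toy data say nothing about the newsgroups result; the
sketch only pins down the grid and the cross-validation settings.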
The 0.131 figure is exactly what you get from a zero-rule baseline
that just predicts the most common class. I could improve this to
0.287 for kernel='linear' by using higher values of C, but not for
kernel='poly'.
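
The majority-class figure can be checked by hand. The sketch below
computes the support-weighted F1 of a classifier that always predicts the
largest class. The function name is made up for illustration, and the
class counts are my assumption about the test split of the four default
categories, not numbers taken from the run above.

```python
def majority_baseline_weighted_f1(class_counts):
    """Support-weighted F1 of a classifier that always predicts the largest class."""
    total = sum(class_counts)
    majority = max(class_counts)
    # Every document is assigned to the majority class, so precision for
    # that class equals its share of the data and recall is 1.
    precision = majority / total
    recall = 1.0
    f1_majority = 2 * precision * recall / (precision + recall)
    # All other classes are never predicted, so their F1 is 0 and only
    # the majority term survives in the weighted average.
    return (majority / total) * f1_majority


# Assumed test-set sizes for the four default categories (hypothetical input).
counts = [319, 389, 394, 251]
print(round(majority_baseline_weighted_f1(counts), 3))
```

With these assumed counts the value comes out at about 0.131, which is
consistent with the classifier collapsing to a single predicted class.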
Then, I tried using chi² to select the k=100 best features and
expanded the C grid axis to include [1e4, 1e5, 1e6, 1e7, 1e8, 1e9]. I
got 0.772 F1 for kernel='linear', slightly better than LinearSVC at
0.771, but still 0.131 for the polynomial kernel!
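
A minimal sketch of the chi² selection step combined with the grid
search, again on synthetic data. The Pipeline wiring, the Poisson "term
counts" and the shortened C grid are assumptions made to keep the example
small and fast; they are not the exact code I ran.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Synthetic non-negative "term count" matrix; chi2 requires features >= 0.
y = rng.randint(0, 4, size=120)
X = rng.poisson(1.0, size=(120, 500)).astype(float)
# Make the first 20 columns depend on the class label so that there is
# something for the selector to find.
X[:, :20] += y[:, None] * rng.poisson(1.0, size=(120, 20))

pipeline = Pipeline([
    ('select', SelectKBest(chi2, k=100)),   # keep the 100 best features
    ('svc', SVC(kernel='linear')),
])

# Shortened version of the expanded C axis (assumption, to limit run time).
params = {'svc__C': [1e2, 1e4, 1e6]}
search = GridSearchCV(pipeline, params, cv=3, scoring='f1_weighted')
search.fit(X, y)

selected = search.best_estimator_.named_steps['select'].get_support()
print(selected.sum(), search.best_params_)
```

Putting the selector inside the pipeline means it is refit on each
training fold, so the held-out fold does not influence which features
are kept.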
I verified that the features coming from text.Vectorizer are
normalized; they're all in the range [-1, 1].
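
One thing the normalization makes easy to check is the scale of the
kernel values themselves. libsvm's polynomial kernel has the form
(gamma * <x, z> + coef0) ** degree. The sketch below evaluates it in plain
Python for two hypothetical unit-length rows. Treating gamma = 1/n_features
and coef0 = 0 as the defaults is an assumption about the version in use, so
the SVC docstring is the place to confirm it.

```python
import math


def poly_kernel(x, z, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial kernel (gamma * <x, z> + coef0) ** degree on plain lists."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


n_features = 100

# Two hypothetical L2-normalized rows: one dense, one with a single term.
x = [1.0 / math.sqrt(n_features)] * n_features
z = [1.0] + [0.0] * (n_features - 1)

for gamma in (1.0 / n_features, 1.0):
    k_xx = poly_kernel(x, x, degree=2, gamma=gamma)
    k_xz = poly_kernel(x, z, degree=2, gamma=gamma)
    print(gamma, k_xx, k_xz)
```

With unit-length rows the dot products are at most 1, so a gamma of
1/n_features pushes every kernel value towards zero, while gamma = 1
keeps them on the scale of the dot products. Whether that explains the
0.131 result is something I have not verified.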
I'm sure I did something wrong, but can anyone tell me what? Is my C
value still not high enough? Am I missing the magic parameter that
will make this work?
TIA,
Lars
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam