Hi all,

After reading some papers on (approximate) polynomial kernels for NLP
applications, I got curious and decided to do some quick experiments.
I modified the 20 newsgroups example to benchmark vanilla SVC instead
of LinearSVC with linear, quadratic and cubic kernels. I was quite
surprised at the results.

For reference, LinearSVC(C=1000, loss='l2', penalty='l2') obtains an
F1-score of 0.896 on the default set of four document classes.

I replaced this with

    params = {'C': [.01, .1, 1, 10, 100, 1000]}
    GridSearchCV(SVC(kernel='linear'), params, score_func=metrics.f1_score)

and got an F1-score of only 0.131, and exactly the same for

    params = {'degree': [2, 3], 'C': [.01, .1, 1, 10, 100, 1000]}
    GridSearchCV(SVC(kernel='poly'), params, cv=10, score_func=metrics.f1_score)
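
For anyone rerunning this on a recent scikit-learn, here is a rough
sketch of the same search. Assumptions, none of which come from the
original script: the scoring= argument stands in for score_func,
'f1_weighted' is my guess at what metrics.f1_score averaged to for
multiclass, cv=2 is only there because the data is tiny, and the
eight-document corpus is a made-up stand-in for 20 newsgroups.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical toy corpus (two classes), NOT the 20 newsgroups data
docs = [
    "the rocket reached orbit",
    "nasa launched a new satellite",
    "the shuttle docked with the station",
    "astronauts trained for the moon mission",
    "the gpu renders polygons quickly",
    "shaders compute pixel colors",
    "the image was rendered with ray tracing",
    "opengl draws textured triangles",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Vectorize up front, as in the experiment described above
X = TfidfVectorizer().fit_transform(docs)

# Same grid as in the post: polynomial degree and C
params = {"degree": [2, 3], "C": [0.01, 0.1, 1, 10, 100, 1000]}
search = GridSearchCV(SVC(kernel="poly"), params, cv=2, scoring="f1_weighted")
search.fit(X, labels)

print(search.best_params_, search.best_score_)
```

The scores on a toy corpus like this mean nothing; the point is only
the shape of the call.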

The 0.131 figure is exactly what you get from a trivial baseline
classifier that always predicts the most common class. I could improve
this to 0.287 for kernel='linear' by using higher values of C, but not
for kernel='poly'.
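
To make the baseline comparison concrete, here is a small sketch of
how such a majority-class score can be computed by hand. It assumes
the F1-score is averaged over classes weighted by support, which is
an assumption about the metric and not something checked against the
benchmark script. The function name and the class counts are purely
illustrative.

```python
def majority_baseline_weighted_f1(class_counts):
    """Support-weighted F1 of a classifier that always predicts the largest class."""
    total = sum(class_counts)
    majority = max(class_counts)
    precision = majority / total  # every prediction carries the majority label
    recall = 1.0                  # every majority-class sample is recovered
    f1_majority = 2 * precision * recall / (precision + recall)
    # All other classes are never predicted, so their F1 is 0 and only the
    # majority class contributes, weighted by its share of the samples.
    return (majority / total) * f1_majority


# Hypothetical, perfectly balanced four-class test set: gives roughly 0.1
print(majority_baseline_weighted_f1([25, 25, 25, 25]))
```

With a largest class that holds somewhat more than a quarter of the
test documents, the same formula gives a value a little above 0.1.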

Then, I tried using chi² to select the k=100 best features and
expanded the C grid axis to include [1e4, 1e5, 1e6, 1e7, 1e8, 1e9]. I
got 0.772 F1 for kernel='linear', slightly better than LinearSVC at
0.771, but still 0.131 for the polynomial kernel!
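
The feature selection step could be wired up roughly as follows. This
is a sketch against the current scikit-learn API, not the code that
produced the numbers above: the toy corpus is invented, k=5 replaces
k=100 because the toy vocabulary is tiny, and C=1e4 is just one value
from the extended grid.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus (two classes), NOT the 20 newsgroups data
docs = [
    "the rocket reached orbit",
    "nasa launched a new satellite",
    "the shuttle docked with the station",
    "astronauts trained for the moon mission",
    "the gpu renders polygons quickly",
    "shaders compute pixel colors",
    "the image was rendered with ray tracing",
    "opengl draws textured triangles",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),         # non-negative features, as chi2 requires
    ("chi2", SelectKBest(chi2, k=5)),     # k=100 in the experiment; 5 for the toy data
    ("svc", SVC(kernel="linear", C=1e4)), # one point from the extended C axis
])
pipeline.fit(docs, labels)

predictions = pipeline.predict(docs)
n_selected = int(pipeline.named_steps["chi2"].get_support().sum())
print(n_selected, predictions)
```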

I verified that the features coming from text.Vectorizer are
normalized; they're all in the range [-1, 1].
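
A check along these lines can be sketched as below. It assumes
TfidfVectorizer as the present-day counterpart of text.Vectorizer and
uses an invented three-document corpus; converting to a dense array
is only reasonable at this toy size.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, NOT the 20 newsgroups data
docs = [
    "the rocket reached orbit",
    "shaders compute pixel colors",
    "opengl draws textured triangles",
]

X = TfidfVectorizer().fit_transform(docs)
dense = X.toarray()  # fine for three documents, not for a real corpus

lowest, highest = dense.min(), dense.max()
row_norms = np.linalg.norm(dense, axis=1)  # Euclidean length of each document vector

print(lowest, highest, row_norms)
```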

I'm sure I did something wrong, but can anyone tell me what? Is my C
value still not high enough? Am I missing the magic parameter that
will make this work?

TIA,
Lars

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
