2012/12/4 Philipp Singer <kill...@gmail.com>:
>> It's probably better to train a linear classifier on the text features
>> alone and a second (potentially non linear classifier such as GBRT or
>> ExtraTrees) on the predict_proba outcome of the text classifier + your
>> additional low dim features.
>> This is some kind of stacking method (a sort of ensemble method). It
>> should make the text features not overwhelm the final classifier if
>> the other features are informative.
> Hey Olivier!
> Thanks for the hints. I just tried it, but unfortunately the results are
> much worse than just using my textual features alone.
> just to be sure if I am doing it right:
> At first I create my textual features using a vectorizer. Then I fit a
> linear SVC on these features (training data ofc) and use predict_proba
> for my training samples again resulting in a probability distribution of
> dimension 7 (I have 7 classes).
> Then I append my additional features (those are 15) and fit another
> classifier on the new data. (I tried several scaling/normalizing ideas
> without improvement)
> I do the same procedure for test data. (Btw I do cross val)
> While I get 0.85 f1 score for just using textual data the combined
> approach results in only 0.4.

Have you scaled your additional features to the [0, 1] range, like the
probability features from the text classifier?
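
For reference, here is a minimal sketch of the stacking setup with that
scaling step included. It is an illustration under stated assumptions, not
your exact pipeline: the function names are hypothetical, LogisticRegression
stands in for the linear text model because LinearSVC has no predict_proba,
and the training-set probabilities are computed out-of-fold with
cross_val_predict (a common stacking precaution, so the second classifier
does not learn from over-confident probabilities on samples the text model
has already seen).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def fit_stacked(texts, extra, y, cv=3):
    """Fit a text model, then a second model on [text probabilities | scaled extras].

    Hypothetical helper: `texts` is a list of strings, `extra` is an
    (n_samples, n_extra) array of the additional low-dimensional features.
    """
    # Assumption: LogisticRegression replaces the linear SVC so that
    # predict_proba is available.
    text_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

    # Out-of-fold probabilities: each training row is scored by a model
    # that was not fitted on that row.
    proba = cross_val_predict(text_clf, texts, y, cv=cv, method="predict_proba")

    # Refit the text model on all training data for use at test time.
    text_clf.fit(texts, y)

    # Scale the extra features to [0, 1] using training data only.
    scaler = MinMaxScaler().fit(extra)
    X_stack = np.hstack([proba, scaler.transform(extra)])

    final_clf = SVC().fit(X_stack, y)
    return text_clf, scaler, final_clf


def predict_stacked(models, texts, extra):
    """Apply the fitted text model, scaler and final classifier to new data."""
    text_clf, scaler, final_clf = models
    # The scaler reuses the training min/max, so test values may fall
    # slightly outside [0, 1].
    X_stack = np.hstack([text_clf.predict_proba(texts), scaler.transform(extra)])
    return final_clf.predict(X_stack)
```

Inside cross-validation, both functions would be called once per fold, so
that the vectorizer, the scaler and both classifiers only ever see that
fold's training part.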

If you do a full grid search of the SVC hyperparameters (e.g. kernel
linear or RBF, with C for both and gamma for RBF only), there is no
reason the stacked model should be worse than the original text
classifier (unless you have very few samples and the additional features are pure
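
A grid search of that shape could be sketched as follows. The value ranges,
the macro-averaged F1 scoring and the name tune_stacked_svc are assumptions
made for illustration; X_stack stands for the stacked feature matrix (text
class probabilities plus the scaled additional features).

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Two sub-grids: gamma is only searched for the RBF kernel, since the
# linear kernel ignores it. The candidate values are illustrative.
PARAM_GRID = [
    {"kernel": ["linear"], "C": [0.1, 1, 10, 100]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]},
]


def tune_stacked_svc(X_stack, y, cv=3):
    """Cross-validated search over kernel, C and gamma for the second-stage SVC."""
    # Assumption: macro-averaged F1 as the selection criterion.
    search = GridSearchCV(SVC(), PARAM_GRID, scoring="f1_macro", cv=cv)
    return search.fit(X_stack, y)
```

After fitting, search.best_params_ and search.best_score_ show which
kernel and settings were selected and the cross-validated score they reached.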

http://twitter.com/ogrisel - http://github.com/ogrisel

Scikit-learn-general mailing list
