On 04.12.2012 11:45, Philipp Singer wrote:
>> It's probably better to train a linear classifier on the text features
>> alone and a second (potentially non linear classifier such as GBRT or
>> ExtraTrees) on the predict_proba outcome of the text classifier + your
>> additional low dim features.
>>
>> This is some kind of stacking method (a sort of ensemble method). It
>> should make the text features not overwhelm the final classifier if
>> the other features are informative.
> Hey Olivier!
>
> Thanks for the hints. I just tried it, but unfortunately the results are
> much worse than just using my textual features alone.
>
> Just to check that I am doing it right:
>
> First I create my textual features using a vectorizer. Then I fit a
> linear SVC on these features (training data, of course) and call
> predict_proba on my training samples again, which gives a probability
> distribution of dimension 7 (I have 7 classes).
>
> Then I append my additional features (those are 15) and fit another
> classifier on the new data. (I tried several scaling/normalizing ideas
> without improvement)
>
> I do the same procedure for test data. (Btw I do cross val)
>
> While I get 0.85 f1 score for just using textual data the combined
> approach results in only 0.4.
>
I would guess this is overfitting. Can you confirm that? (Evaluate on the
training set to see.)

How many samples do you have?
Could you do "leave one sample out" to get the prediction probabilities?
I.e. train one classifier per example: leave that example out of the
training set, compute the probabilities for that single example, then
retrain for the next one, and so on. That way the second classifier is
never fit on probabilities that the first classifier produced for samples
it had already seen.
See The Elements of Statistical Learning, the chapter on stacking.
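
A rough sketch of what I mean, not a drop-in solution. Assumptions on my
side: synthetic data stands in for your text and extra features,
LogisticRegression stands in for the text classifier because it provides
predict_proba, and cross_val_predict comes from sklearn.model_selection in
recent scikit-learn releases. I use 5 folds to keep it cheap; passing
cv=LeaveOneOut() would give the exact leave-one-out variant.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (StratifiedKFold, cross_val_predict,
                                     train_test_split)

# Synthetic stand-in data (assumption): 7 classes, 50 "text" columns
# and 15 additional low-dimensional columns.
X, y = make_classification(n_samples=700, n_features=65, n_informative=20,
                           n_classes=7, n_clusters_per_class=1,
                           random_state=0)
X_text, X_extra = X[:, :50], X[:, 50:]

idx_train, idx_test = train_test_split(np.arange(len(y)), test_size=0.3,
                                       stratify=y, random_state=0)

text_clf = LogisticRegression(max_iter=1000)

# Out-of-fold probabilities: each training sample is scored by a model
# that did not see it during fitting.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
P_train = cross_val_predict(text_clf, X_text[idx_train], y[idx_train],
                            cv=cv, method="predict_proba")

# For the test samples, refit the text classifier on all training data.
text_clf.fit(X_text[idx_train], y[idx_train])
P_test = text_clf.predict_proba(X_text[idx_test])

# Second stage: class probabilities plus the additional features.
Z_train = np.hstack([P_train, X_extra[idx_train]])
Z_test = np.hstack([P_test, X_extra[idx_test]])

final_clf = ExtraTreesClassifier(n_estimators=200, random_state=0)
final_clf.fit(Z_train, y[idx_train])
score = final_clf.score(Z_test, y[idx_test])
print("stacked accuracy on held-out data:", score)
```

The important part is that P_train is never produced by a model that was
fit on the same samples it is scoring. If you call predict_proba on the
training samples with a classifier fit on those same samples, the
probabilities look far more confident than they will on test data, and the
second classifier learns to trust them too much.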

Best,
Andy

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general