2012/12/4 Philipp Singer <kill...@gmail.com>:
> Thanks to Andreas I got it working now using a custom estimator for the
> pipeline.
>
> I am still struggling a bit to combine textual features (e.g., tfidf)
> with other features that work well on their own.
>
> At the moment, I am just concatenating them --> enlarging the vector.
> The problem now is that the few added features do not seem to have any
> impact on the classifier, as the accuracy is exactly the same as if I
> used only textual features. They just seem to be overwhelmed by the
> huge number of textual features.
>
> Is there now some clever way of combining both feature types? Like
> probably using composite/multiple kernels?
>
> Maybe someone has an idea about that. I have been struggling with this
> for a while now and still haven't found a clever way of solving it.

It's probably better to train a linear classifier on the text features
alone, and then a second, potentially non-linear classifier (such as
GBRT or ExtraTrees) on the predict_proba output of the text classifier
plus your additional low-dimensional features.

This is a form of stacking (a kind of ensemble method). The final
classifier sees the text only as a handful of class probabilities, so
the text features should no longer overwhelm the other features,
provided those are informative.
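To make the two-stage idea concrete, here is a minimal sketch, assuming
a recent scikit-learn release. The toy texts, the two dense features and
all variable names are made up for illustration. Using out-of-fold
probabilities (cross_val_predict) to train the second stage is an
assumption on my side, chosen so that the second classifier does not
see probabilities the text model produced on its own training samples.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Toy data, invented for this sketch: 40 short texts, 2 classes.
spam = ["buy cheap pills now", "win money fast",
        "cheap offer click now", "free money offer"]
ham = ["meeting agenda attached", "see you at lunch",
       "project report attached", "lunch meeting tomorrow"]
texts = (spam + ham) * 5
y = np.array(([1] * 4 + [0] * 4) * 5)

# Two hypothetical low-dimensional features per document.
rng = np.random.RandomState(0)
extra = np.column_stack([y + rng.normal(scale=0.5, size=y.size),
                         rng.normal(size=y.size)])

# Stage 1: linear classifier on the text features alone.
text_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())

# Out-of-fold class probabilities for every training document.
text_proba = cross_val_predict(text_clf, texts, y, cv=5,
                               method="predict_proba")

# Stage 2: non-linear classifier on [text probabilities + extra features].
X_stack = np.hstack([text_proba, extra])
final_clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
final_clf.fit(X_stack, y)

# For prediction, refit the text model on all data and chain the stages.
text_clf.fit(texts, y)
new_texts = ["free pills offer", "agenda for lunch"]
new_extra = np.array([[1.1, 0.0], [0.0, 0.3]])
new_stack = np.hstack([text_clf.predict_proba(new_texts), new_extra])
pred = final_clf.predict(new_stack)
```

The second stage only ever sees four columns here (two probabilities
and two extra features), which is why the additional features get a
real chance to influence the final decision.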

Alternatively, you could bin the other features (e.g. 3 or 5 bins for
continuous features), compute a hashed cross-product of the text
features with the binned features (treated as boolean features), and
fit a linear model on the result.

We don't have any helper for computing binned features or hashed
cross-products yet, but you can use the feature hasher as a starting
point:

http://scikit-learn.org/dev/modules/feature_extraction.html
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/hashing.py
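
As a rough illustration of binning plus a hashed cross-product, here is
a hand-rolled sketch using only the standard library. It is not the
scikit-learn feature hasher: it hashes with CRC32 and keeps unsigned
counts, whereas FeatureHasher uses a different hash function. The
function names, feature names and bin edges are all hypothetical.

```python
import bisect
import zlib

N_BUCKETS = 2 ** 18  # size of the hashed feature space (arbitrary choice)


def bin_index(value, edges):
    """Return the bin number of `value` given sorted bin edges."""
    return bisect.bisect_right(edges, value)


def hashed_cross_features(tokens, dense, edges_per_feature,
                          n_buckets=N_BUCKETS):
    """Map one sample to a sparse {column: count} dict.

    Emits the plain tokens, one boolean feature per binned dense
    feature, and one crossed feature per (token, bin) pair. Every
    feature name is hashed into one of `n_buckets` columns.
    """
    bins = ["%s_bin%d" % (name, bin_index(dense[name], edges))
            for name, edges in sorted(edges_per_feature.items())]
    names = list(tokens) + bins
    names += ["%s&%s" % (tok, b) for tok in tokens for b in bins]

    row = {}
    for name in names:
        col = zlib.crc32(name.encode("utf-8")) % n_buckets
        row[col] = row.get(col, 0) + 1  # colliding names add up
    return row


# Hypothetical sample: three tokens and two dense features.
tokens = "cheap pills now".split()
dense = {"n_links": 7, "msg_len": 120.0}
edges = {"n_links": [1, 5], "msg_len": [50, 200, 1000]}
row = hashed_cross_features(tokens, dense, edges)
```

Each row is a sparse dictionary of column counts; the rows would still
have to be assembled into a sparse matrix before fitting a linear
model. The crossed feature names could also be passed as strings to
FeatureHasher with input_type="string" instead of the CRC32 step.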

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
