2012/12/4 Philipp Singer <kill...@gmail.com>:
> Thanks to Andreas I got it working now using a custom estimator for the
> pipeline.
>
> I am still struggling a bit to combine textual features (e.g., tfidf)
> with other features that work well on their own.
>
> At the moment, I am just concatenating them --> enlarging the vector.
> The problem now is that the few added features do not seem to have any
> impact on the classifier, as the accuracy is exactly the same as if I
> would use only textual features. They just seem to be overwhelmed by the
> huge amount of textual features.
>
> Is there now some clever way of combining both feature types? Like
> probably using composite/multiple kernels?
>
> Maybe someone has an idea about that. This is actually a thing I have
> been struggling with for a bit now and still haven't found a clever way
> of solving it.

It's probably better to train a linear classifier on the text features
alone, then train a second classifier (potentially non-linear, such as
GBRT or ExtraTrees) on the predict_proba output of the text classifier
plus your additional low-dimensional features. This is a form of
stacking (a kind of ensemble method). It should keep the text features
from overwhelming the final classifier, provided the other features are
informative.

Alternatively, you could bin the other features (e.g. 3 or 5 bins for
continuous features), compute a hashed cross-product of the text
features with the binned features treated as boolean features, and fit
a linear model on the result.

We don't have any helper to compute binned features or hashed
cross-products yet, but you can use the feature hasher as a starting
point:

http://scikit-learn.org/dev/modules/feature_extraction.html
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/hashing.py

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
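
For what it's worth, the stacking suggestion above could look roughly like the sketch below. It is only an illustration under some assumptions: it targets a recent scikit-learn (sklearn.model_selection did not exist in 2012), the helper names fit_stacked and predict_stacked and the toy data are made up, and feeding the second stage out-of-fold probabilities via cross_val_predict is an added precaution against leakage, not something spelled out in the message.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline


def fit_stacked(texts, extra, y, n_folds=3, seed=0):
    """Fit a linear text model, then a tree ensemble on its
    predict_proba output stacked with the low-dimensional features.
    (Hypothetical helper, not part of scikit-learn.)"""
    extra = np.asarray(extra, dtype=float)
    text_clf = make_pipeline(TfidfVectorizer(),
                             LogisticRegression(max_iter=1000))
    # Out-of-fold probabilities: each row is predicted by a text model
    # that never saw that document during training (an assumption of
    # this sketch, to avoid over-trusting the text model).
    oof_proba = cross_val_predict(text_clf, texts, y, cv=n_folds,
                                  method="predict_proba")
    # Refit the text model on all documents for use at prediction time.
    text_clf.fit(texts, y)
    final_clf = ExtraTreesClassifier(n_estimators=100, random_state=seed)
    final_clf.fit(np.hstack([oof_proba, extra]), y)
    return text_clf, final_clf


def predict_stacked(text_clf, final_clf, texts, extra):
    """Predict with the two-stage model returned by fit_stacked."""
    proba = text_clf.predict_proba(texts)
    return final_clf.predict(np.hstack([proba, np.asarray(extra, dtype=float)]))


# Toy data, invented for illustration: six documents per class and one
# extra numeric feature per document.
texts = [
    "cheap pills buy now", "win money fast now", "free prize click here",
    "buy cheap watches online", "click to win free money", "free pills online",
    "meeting moved to noon", "see attached project report",
    "lunch tomorrow with the team", "please review the patch",
    "notes from the project meeting", "report for the team review",
]
extra = np.array([[5], [4], [6], [5], [7], [4],
                  [0], [1], [0], [1], [0], [1]])
y = np.array([1] * 6 + [0] * 6)

text_clf, final_clf = fit_stacked(texts, extra, y)
print(predict_stacked(text_clf, final_clf,
                      ["free money now", "team meeting report"],
                      [[6], [0]]))
```

The second stage only ever sees the class-probability columns plus the extra columns, so the extra features are no longer competing with thousands of tf-idf dimensions.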