On 06/01/2013 08:30 PM, Christian Jauvin wrote:
> Hi,
>
> I asked a (perhaps too vague?) question about the use of Random
> Forests with a mix of categorical and lexical features on two ML
> forums (stats.SE and MetaOp), but since it has received no attention,
> I figured that it might work better on this list (I'm using sklearn's
> RF of course):
>
> "I'm working on a binary classification problem for which the dataset
> is mostly composed of categorical features, but also a few lexical
> ones (i.e. article titles and abstracts). I'm experimenting with
> Random Forests, and my current strategy is to build the training set
> by appending the k best lexical features (chosen with univariate
> feature selection, and weighted with tf-idf) to the full set of
> categorical features. This works reasonably well, but as I cannot find
> explicit references to such a strategy of using hybrid features for
> RF, I have doubts about my approach: does it make sense? Am I
> "diluting" the power of the RF by doing so, and should I rather try to
> combine two classifiers specializing on both types of features?"
>
I think it is OK, though I think people rarely use RF on bag-of-words
features. Btw, you do encode the categorical variables using one-hot,
right? The sklearn trees don't really support categorical variables.

An alternative approach would be to run a linear classifier on all
tf-idf features and feed its output, together with the other variables,
to the RF.
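A minimal sketch of that alternative, assuming a reasonably recent scikit-learn. The toy data and the names `titles`, `cats`, and `text_score` are made up for illustration. Using out-of-fold predictions (via `cross_val_predict`) for the linear model's output is an added assumption, not something stated above; it is there so the forest is not trained on scores the linear model produced for rows it had already seen.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): one text field, two categorical fields, binary label.
titles = [
    "random forests for text classification",
    "gene expression in yeast cells",
    "support vector machines and kernels",
    "protein folding in bacterial cells",
    "boosting decision trees for ranking",
    "gene regulation in plant cells",
    "feature selection for text classification",
    "protein expression in yeast",
    "kernels for structured classification",
    "bacterial gene transfer in soil",
    "decision trees and random forests",
    "plant protein regulation",
]
cats = np.array(
    [
        ["jmlr", "article"], ["nature", "article"], ["jmlr", "review"],
        ["plos", "article"], ["jmlr", "article"], ["plos", "review"],
        ["jmlr", "article"], ["nature", "article"], ["jmlr", "review"],
        ["plos", "article"], ["jmlr", "article"], ["nature", "review"],
    ],
    dtype=object,
)
y = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# One-hot encode the categorical columns (sklearn trees expect numeric input).
encoder = OneHotEncoder(handle_unknown="ignore")
X_cat = encoder.fit_transform(cats)  # sparse matrix, one column per category value

# Linear classifier on all tf-idf features; keep only its out-of-fold
# probability for the positive class as a single summary feature.
text_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
text_score = cross_val_predict(
    text_model, titles, y, cv=3, method="predict_proba"
)[:, 1]

# Feed the linear model's output, together with the other variables, to the RF.
X_stacked = hstack([X_cat, csr_matrix(text_score.reshape(-1, 1))]).tocsr()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_stacked, y)

# For new data, refit the text model on all training rows before scoring.
text_model.fit(titles, y)
```

The forest then sees one dense, informative text feature instead of thousands of sparse tf-idf columns, which is the point of the suggestion. Whether it beats appending the k best tf-idf features directly is something to check by cross-validation on the real data.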
Hth,
Andy

ps: try stackoverflow with scikit-learn tag next time.

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general