Hi,

I asked a (perhaps too vague?) question about using Random Forests with a mix of categorical and lexical features on two ML forums (stats.SE and MetaOp). Since it has received no attention there, I figured it might fare better on this list (I'm using sklearn's RF, of course):
"I'm working on a binary classification problem for which the dataset is mostly composed of categorical features, but also a few lexical ones (i.e. article titles and abstracts). I'm experimenting with Random Forests, and my current strategy is to build the training set by appending the k best lexical features (weighted with tf-idf and chosen with univariate feature selection) to the full set of categorical features. This works reasonably well, but as I cannot find explicit references to such a strategy of using hybrid features for RF, I have doubts about my approach: does it make sense? Am I "diluting" the power of the RF by doing so, and should I rather try to combine two classifiers, each specializing in one type of feature?"

http://stats.stackexchange.com/questions/60162/random-forest-with-a-mix-of-categorical-and-lexical-features

Thanks,
Christian

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
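
For concreteness, the feature construction described in the question could be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual code: the toy titles, the two categorical columns, the chi-squared scoring function and k=5 are all made up for the example.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one title and two categorical fields per article.
titles = [
    "random forests for text classification",
    "deep learning for image recognition",
    "feature selection in text mining",
    "convolutional networks for object detection",
    "tf-idf weighting for document retrieval",
    "image segmentation with neural networks",
    "univariate selection of lexical features",
    "visual recognition with deep networks",
]
categories = np.array([
    ["journal", "en"], ["conference", "en"],
    ["journal", "fr"], ["conference", "en"],
    ["journal", "en"], ["conference", "fr"],
    ["journal", "en"], ["conference", "en"],
])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])

# Lexical part: tf-idf weights, then keep the k best terms
# according to a univariate test (chi-squared is an assumption here).
k = 5
X_text = TfidfVectorizer().fit_transform(titles)
X_text_k = SelectKBest(chi2, k=k).fit_transform(X_text, y)

# Categorical part: one-hot encode the full set of categorical features.
X_cat = OneHotEncoder(handle_unknown="ignore").fit_transform(categories)

# Hybrid training set: categorical columns followed by the k lexical columns.
X = hstack([X_cat, X_text_k]).tocsr()

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
predictions = clf.predict(X)
```

In a real experiment the tf-idf and selection steps would need to be fitted on the training folds only (for example inside a Pipeline), otherwise the selected terms leak label information into the evaluation.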