Hi,

I asked a (perhaps too vague?) question about the use of Random
Forests with a mix of categorical and lexical features on two ML
forums (stats.SE and MetaOp), but since it has received no attention,
I figured that it might work better on this list (I'm using sklearn's
RF of course):

"I'm working on a binary classification problem for which the dataset
is mostly composed of categorical features, but also a few lexical
ones (i.e. article titles and abstracts). I'm experimenting with
Random Forests, and my current strategy is to build the training set
by appending the k best lexical features (chosen with univariate
feature selection, and weighted with tf-idf) to the full set of
categorical features. This works reasonably well, but as I cannot find
explicit references to such a strategy of using hybrid features for
RF, I have doubts about my approach: does it make sense? Am I
"diluting" the power of the RF by doing so, and should I instead
combine two classifiers, each specializing in one type of feature?"

http://stats.stackexchange.com/questions/60162/random-forest-with-a-mix-of-categorical-and-lexical-features
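
To make the setup concrete, here is a minimal sketch of the two options
described above. It is not my actual code: the toy titles, the
categorical field names ("journal", "type"), k=5, and the use of chi2 as
the univariate score are all assumptions made up for illustration. For
the "two classifiers" variant, pairing a LogisticRegression on the text
with an RF on the categorical features and averaging their predicted
probabilities is just one possible reading of that idea.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

# Toy data (hypothetical): one title and one dict of categorical values per sample.
titles = [
    "random forests for text classification",
    "deep learning for image recognition",
    "text mining with decision trees",
    "convolutional networks for vision",
    "feature selection for text data",
    "image segmentation with neural networks",
]
cats = [
    {"journal": "A", "type": "article"},
    {"journal": "B", "type": "review"},
    {"journal": "A", "type": "article"},
    {"journal": "B", "type": "article"},
    {"journal": "A", "type": "review"},
    {"journal": "B", "type": "review"},
]
y = np.array([1, 0, 1, 0, 1, 0])

# Lexical features: tf-idf weights, then keep the k best by a univariate test.
# chi2 is an assumed choice of score; it requires non-negative inputs, which tf-idf gives.
k = 5
X_text = TfidfVectorizer().fit_transform(titles)
X_text_k = SelectKBest(chi2, k=k).fit_transform(X_text, y)

# Categorical features: DictVectorizer one-hot encodes string values.
X_cat = DictVectorizer().fit_transform(cats)

# Option 1 (hybrid): append the k lexical columns to the categorical ones
# and train a single forest. Converted to dense only because the toy data is tiny.
X = hstack([X_cat, X_text_k]).toarray()
hybrid_rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Option 2 (two specialists): one model per feature type,
# combined here by averaging class probabilities (an assumed combination rule).
text_clf = LogisticRegression().fit(X_text, y)
cat_rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_cat, y)
avg_proba = (text_clf.predict_proba(X_text) + cat_rf.predict_proba(X_cat)) / 2.0
combined_pred = text_clf.classes_[avg_proba.argmax(axis=1)]
```

On real data both options would of course have to be compared with
cross-validation, with the feature selection fitted inside each fold
rather than on the full dataset as in this sketch.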

Thanks,

Christian

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general