On 06/01/2013 08:30 PM, Christian Jauvin wrote:
> Hi,
>
> I asked a (perhaps too vague?) question about the use of Random
> Forests with a mix of categorical and lexical features on two ML
> forums (stats.SE and MetaOp), but since it has received no attention,
> I figured that it might work better on this list (I'm using sklearn's
> RF of course):
>
> "I'm working on a binary classification problem for which the dataset
> is mostly composed of categorical features, but also a few lexical
> ones (i.e. article titles and abstracts). I'm experimenting with
> Random Forests, and my current strategy is to build the training set
> by appending the k best lexical features (chosen with univariate
> feature selection, and weighted with tf-idf) to the full set of
> categorical features. This works reasonably well, but as I cannot find
> explicit references to such a strategy of using hybrid features for
> RF, I have doubts about my approach: does it make sense? Am I
> "diluting" the power of the RF by doing so, and should I rather try to
> combine two classifiers specializing on both types of features?"
>
I think it is OK, though people rarely use RF on bag-of-words features.
Btw, you do encode the categorical variables using one-hot encoding, right?
The sklearn trees don't really support categorical variables.
An alternative approach would be to run a linear classifier on all the
tf-idf features and feed its output, together with the other variables,
to the RF.
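
A rough sketch of that alternative, in case it helps. The toy titles,
category columns, and labels are made up for illustration, and the choice of
LogisticRegression as the linear classifier is only an assumption. Using
cross_val_predict to get out-of-fold scores is also my own addition, so that
the RF does not see text scores that were fit on the same rows' labels:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import OneHotEncoder

# Toy data, invented purely for illustration.
titles = [
    "random forests for text classification",
    "deep learning for image recognition",
    "tf-idf weighting in text retrieval",
    "convolutional networks for image segmentation",
    "bag of words models for text mining",
    "image denoising with neural networks",
    "text categorization with linear models",
    "object detection in image data",
]
categories = np.array([
    ["journal", "en"], ["conference", "en"],
    ["journal", "fr"], ["conference", "en"],
    ["journal", "en"], ["conference", "fr"],
    ["journal", "en"], ["conference", "en"],
])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])

# 1. One-hot encode the categorical columns, since the trees treat
#    every input column as a plain number.
encoder = OneHotEncoder(handle_unknown="ignore")
X_cat = encoder.fit_transform(categories)

# 2. Weight the full vocabulary with tf-idf (no feature selection here).
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(titles)

# 3. Run a linear classifier on all tf-idf features and keep its
#    out-of-fold probability for the positive class as one new feature.
text_clf = LogisticRegression()
text_score = cross_val_predict(
    text_clf, X_text, y, cv=2, method="predict_proba"
)[:, 1]

# 4. Feed that single score, together with the one-hot columns, to the RF.
X_stacked = hstack([X_cat, csr_matrix(text_score.reshape(-1, 1))]).tocsr()
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_stacked, y)
```

At prediction time you would refit the linear classifier on all the training
text and use its predict_proba output as the extra column for new rows.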

Hth,
Andy

PS: try Stack Overflow with the scikit-learn tag next time.

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
