Re: [Scikit-learn-general] Random Forest with a mix of categorical and lexical features

Andreas Mueller Tue, 04 Jun 2013 03:00:50 -0700

On 06/04/2013 05:55 AM, Christian Jauvin wrote:
> Many thanks to all for your help and detailed answers, I really appreciate it.
>
> So I wanted to test the discussion's takeaway, namely, what Peter
> suggested: one-hot encode the categorical features with small
> cardinality, and leave the others in their ordinal form.
>
> So from the same dataset I mentioned earlier, I picked another subset
> of 5 features, this time all with small cardinality (5, 5, 6, 11 and
> 12), and all purely categorical (i.e. clearly not ordered). The
> one-hot encoding should clearly help with such a configuration.
>
> But again, what I observe when I pit the fully one-hot encoded RF
> (21000 x 39) against the ordinal-encoded one (21000 x 5) is that
> they're behaving almost the same, in terms of accuracy and AUC, with
> 10-fold cross-validation. In fact, the ordinal version even seems to
> perform very slightly better, although I don't think it's significant.
>
> I really believe in your expertise more than in my results, so what
> could I be doing wrong?
>
>
Did you grid-search the parameters again, separately for each encoding?
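A minimal sketch of that comparison follows, assuming a recent scikit-learn (`sklearn.model_selection`; in 2013 the equivalents lived in `sklearn.cross_validation` and `sklearn.grid_search`). Christian's dataset isn't available, so the data here is hypothetical: random integer-coded categories with the same cardinalities (5, 5, 6, 11, 12) and a made-up, non-ordinal target. One plausible reason to re-tune is that one-hot encoding changes the number of columns from 5 to 39, so a fixed setting such as `max_features="sqrt"` considers 2 candidate columns per split in one case and 6 in the other. The grid values below are illustrative assumptions, not recommendations. Each encoding gets its own nested search (inner 3-fold grid search, outer 5-fold AUC).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.preprocessing import OneHotEncoder

rng = np.random.RandomState(0)
n_samples = 1000  # hypothetical; far fewer than the 21000 rows in the thread
cardinalities = [5, 5, 6, 11, 12]

# Hypothetical data: integer-coded, purely categorical (unordered) columns.
X_ordinal = np.column_stack(
    [rng.randint(0, k, size=n_samples) for k in cardinalities])

# Hypothetical target: a non-monotonic function of two category codes,
# with 10% of the labels flipped as noise.
y = ((X_ordinal[:, 0] % 2 == 0) ^ (X_ordinal[:, 3] % 3 == 0)).astype(int)
flip = rng.rand(n_samples) < 0.1
y = np.where(flip, 1 - y, y)

# One-hot encode with the categories fixed up front: 5+5+6+11+12 = 39 columns.
encoder = OneHotEncoder(categories=[list(range(k)) for k in cardinalities])
X_onehot = encoder.fit_transform(X_ordinal).toarray()

# Illustrative grid; float max_features is a fraction of the column count,
# so it adapts to 5 vs 39 columns.
param_grid = {"max_features": ["sqrt", 0.5, 1.0], "min_samples_leaf": [1, 5]}

results = {}
for name, X in [("ordinal", X_ordinal), ("one-hot", X_onehot)]:
    # Inner loop: tune the forest separately for this encoding.
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=50, random_state=0),
        param_grid=param_grid, cv=3, scoring="roc_auc")
    # Outer loop: estimate the AUC of the whole tune-then-fit procedure.
    scores = cross_val_score(search, X, y, cv=5, scoring="roc_auc")
    results[name] = scores
    print(f"{name:8s} shape={X.shape} AUC={scores.mean():.3f} +/- {scores.std():.3f}")
```

Comparing the two rows only after each encoding has been tuned on its own avoids judging one-hot encoding with parameters that were chosen for the 5-column version.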


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
