Many thanks to all for your help and detailed answers; I really appreciate it.

So I wanted to test the discussion's takeaway, namely, what Peter
suggested: one-hot encode the categorical features with small
cardinality, and leave the others in their ordinal form.

So from the same dataset I mentioned earlier, I picked another subset
of 5 features, this time all with small cardinality (5, 5, 6, 11 and
12), and all purely categorical (i.e. clearly not ordered). The
one-hot encoding should clearly help with such a configuration.
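
To make the two encodings concrete, here is a minimal sketch in plain Python. The helpers `ordinal_encode` and `one_hot_encode` and the toy `colors` column are hypothetical and only for illustration; they are not scikit-learn functions and not the code used in this experiment. The last lines check that five features with the cardinalities above expand to 39 one-hot columns.

```python
# Hypothetical toy helpers (not scikit-learn API): they only illustrate
# the two encodings discussed in this thread.

def ordinal_encode(column):
    """Assign each distinct category an integer (order is arbitrary)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [mapping[value] for value in column], mapping


def one_hot_encode(column):
    """Turn each value into a 0/1 indicator row, one slot per category."""
    codes, mapping = ordinal_encode(column)
    width = len(mapping)
    return [[1 if j == code else 0 for j in range(width)] for code in codes]


# Made-up example column, not taken from the dataset in this thread.
colors = ["red", "green", "blue", "green"]
codes, mapping = ordinal_encode(colors)   # one integer column
onehot = one_hot_encode(colors)           # one column per category

# Number of columns after one-hot encoding all five features.
cardinalities = [5, 5, 6, 11, 12]
n_onehot_columns = sum(cardinalities)
```

The ordinal form keeps one column per feature (5 in total), while the one-hot form has one column per category, which is where the 39 comes from.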

But again, when I pit the fully one-hot encoded RF (21000 x 39)
against the ordinal-encoded one (21000 x 5), the two behave almost
identically in terms of accuracy and AUC under 10-fold
cross-validation. In fact, the ordinal version even seems to perform
very slightly better, although I don't think the difference is
significant.
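
One rough way to check whether such a difference is significant is a paired t-statistic over the per-fold scores, since both models are evaluated on the same 10 folds. The sketch below uses only the standard library; the function name `paired_t_statistic` is hypothetical, and it assumes the two score lists come from identical fold splits. Fold scores from cross-validation are not independent, so this test tends to be optimistic and should be read as a rough indication only.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t-statistic for two lists of per-fold scores.

    Assumes scores_a[i] and scores_b[i] were measured on the same fold.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))
```

With 10 folds, the statistic has 9 degrees of freedom, so an absolute value below roughly 2.26 would not be significant at the two-sided 5% level.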

I really believe in your expertise more than in my results, so what
could I be doing wrong?



On 3 June 2013 04:56, Andreas Mueller <amuel...@ais.uni-bonn.de> wrote:
> On 06/03/2013 09:15 AM, Peter Prettenhofer wrote:
>> Our decision tree implementation only supports numerical splits; i.e.
>> it tests val < threshold.
>>
>> Categorical features need to be encoded properly. I recommend one-hot
>> encoding for features with small cardinality (e.g. < 50) and ordinal
>> encoding (simply assign each category an integer value) for features
>> with large cardinality.
> This seems to be the opposite of what the Kaggle tutorial suggests,
> right? They suggest ordinal encoding for small cardinality, but don't
> suggest any other way.
>
> Your and Gilles' feedback makes me think we should tell the Kaggle
> people to change their tutorial...
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
