Many thanks to all for your help and detailed answers; I really appreciate it.
So I wanted to test the takeaway of the discussion, namely Peter's suggestion: one-hot encode the categorical features with small cardinality, and leave the others in their ordinal form. From the same dataset I mentioned earlier, I picked another subset of 5 features, this time all with small cardinality (5, 5, 6, 11 and 12 categories) and all purely categorical (i.e. with no natural ordering). One-hot encoding should clearly help in such a configuration.

But again, when I pit the fully one-hot encoded RF (21000 x 39, i.e. 5 + 5 + 6 + 11 + 12 binary columns) against the ordinal-encoded one (21000 x 5), I observe that they behave almost identically in terms of accuracy and AUC under 10-fold cross-validation. In fact, the ordinal version even seems to perform very slightly better, although I don't think the difference is significant.

I trust your expertise more than my own results, so what could I be doing wrong?

On 3 June 2013 04:56, Andreas Mueller <amuel...@ais.uni-bonn.de> wrote:
> On 06/03/2013 09:15 AM, Peter Prettenhofer wrote:
>> Our decision tree implementation only supports numerical splits; i.e.
>> if tests val < threshold.
>>
>> Categorical features need to be encoded properly. I recommend one-hot
>> encoding for features with small cardinality (e.g. < 50) and ordinal
>> encoding (simply assign each category an integer value) for features
>> with large cardinality.
> This seems to be the opposite of what the kaggle tutorial suggests,
> right? They suggest ordinal encoding for small cardinality, but don't
> suggest any other way.
>
> Your and Gilles' feedback make me think we should tell the kaggle people
> to change their tutorial....
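For anyone who wants to reproduce this kind of comparison, here is a minimal sketch written against the current scikit-learn API (OrdinalEncoder, OneHotEncoder, cross_validate), which postdates this thread. The data is synthetic and hypothetical: only the sample count (21000) and the cardinalities (5, 5, 6, 11, 12) come from this message. make_synthetic_data and compare_encodings are illustrative helpers, not part of scikit-learn, and n_estimators=100 is an arbitrary choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Cardinalities and sample count from the message; the data itself is hypothetical.
CARDINALITIES = [5, 5, 6, 11, 12]
N_SAMPLES = 21000


def make_synthetic_data(n_samples=N_SAMPLES, seed=0):
    """Hypothetical stand-in for the real dataset: integer-coded, unordered
    categories whose effect on a binary target is assigned at random."""
    rng = np.random.RandomState(seed)
    X = np.column_stack([rng.randint(k, size=n_samples) for k in CARDINALITIES])
    # Each category of each feature gets a random effect, so the integer
    # codes carry no meaningful order.
    effects = [rng.normal(size=k) for k in CARDINALITIES]
    logit = sum(effects[j][X[:, j]] for j in range(len(CARDINALITIES)))
    y = (logit + rng.normal(scale=1.0, size=n_samples) > 0).astype(int)
    return X, y


def compare_encodings(X, y, n_splits=10, seed=0):
    """Mean CV accuracy and AUC of a random forest on ordinal vs one-hot inputs."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        # One integer column per feature (n_samples x 5).
        "ordinal": make_pipeline(
            OrdinalEncoder(),
            RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=seed),
        ),
        # One binary column per category (n_samples x 39); sparse output is
        # accepted directly by RandomForestClassifier.
        "one-hot": make_pipeline(
            OneHotEncoder(handle_unknown="ignore"),
            RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=seed),
        ),
    }
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "roc_auc"])
        results[name] = (scores["test_accuracy"].mean(), scores["test_roc_auc"].mean())
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, (acc, auc) in compare_encodings(X, y).items():
        print(f"{name:8s} accuracy={acc:.4f} AUC={auc:.4f}")
```

Because the target here is generated from random per-category effects, the printed scores say nothing about the original dataset; to test real data, pass your own integer-coded X and binary y to compare_encodings. Using the same folds (a fixed StratifiedKFold) for both pipelines keeps the two sets of scores directly comparable.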
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general