On 06/04/2013 05:55 AM, Christian Jauvin wrote:
> Many thanks to all for your help and detailed answers, I really appreciate it.
>
> So I wanted to test the discussion's takeaway, namely, what Peter
> suggested: one-hot encode the categorical features with small
> cardinality, and leave the others in their ordinal form.
>
> So from the same dataset I mentioned earlier, I picked another subset
> of 5 features, this time all with small cardinality (5, 5, 6, 11 and
> 12), and all purely categorical (i.e. clearly not ordered). The
> one-hot encoding should clearly help with such a configuration.
>
> But again, what I observe when I pit the fully one-hot encoded RF
> (21000 x 39) against the ordinal-encoded one (21000 x 5) is that
> they're behaving almost the same, in terms of accuracy and AUC, with
> 10-fold cross-validation. In fact, the ordinal version even seems to
> perform very slightly better, although I don't think it's significant.
>
> I really believe in your expertise more than in my results, so what
> could I be doing wrong?
>

Did you grid-search parameters again?
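To make the question concrete: the best forest settings may differ between a 5-column ordinal matrix and a 39-column one-hot matrix, so each encoding probably deserves its own search before the two are compared. Below is a rough sketch of that, not your actual setup. It assumes a recent scikit-learn (where `GridSearchCV` lives in `sklearn.model_selection`), and the data, the target, the parameter grid and the helper `tuned_cv_score` are all made up for illustration; only the cardinalities (5, 5, 6, 11, 12) and the 10-fold AUC comparison come from your message.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data (assumption): 5 integer-coded categorical features
# with the cardinalities quoted in the thread, far fewer rows than 21000.
rng = np.random.RandomState(0)
cardinalities = [5, 5, 6, 11, 12]
n_samples = 600
X = np.column_stack([rng.randint(0, k, size=n_samples) for k in cardinalities])

# Hypothetical target that depends on category membership, not on code order.
y = ((X[:, 3] % 3 == 0) ^ (X[:, 4] % 2 == 0)).astype(int)

# Illustrative grid; fractions of max_features work for both 5 and 39 columns.
param_grid = {
    "rf__max_features": [0.2, 0.5, 1.0],
    "rf__min_samples_leaf": [1, 5],
}


def tuned_cv_score(preprocessing_steps):
    """Grid-search the forest for one encoding; return best CV AUC and params."""
    forest = RandomForestClassifier(n_estimators=50, random_state=0)
    pipe = Pipeline(preprocessing_steps + [("rf", forest)])
    search = GridSearchCV(pipe, param_grid, cv=10, scoring="roc_auc")
    search.fit(X, y)
    return search.best_score_, search.best_params_


encodings = {
    "ordinal": [],  # feed the integer codes to the forest unchanged
    "one-hot": [("onehot", OneHotEncoder(handle_unknown="ignore"))],
}

# Tune each encoding separately, then compare the tuned scores.
results = {name: tuned_cv_score(steps) for name, steps in encodings.items()}
for name, (score, params) in results.items():
    print(f"{name}: best 10-fold AUC = {score:.3f} with {params}")
```

Putting the encoder inside the pipeline means it is refit on each training fold, so the search sees exactly the matrix the forest would see for that encoding. Nothing here says which encoding should win on your data; it only shows how to make the comparison after tuning both.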
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general