Hi Christian,
I believe more in my results than in my expertise - and so should you :-) **
I think you misunderstood me: I did not claim that one-hot encoded
categorical features give better results than ordinal encoded ones - I just
claimed that ordinal encoding works as well as one-hot encoding,
given that you have deep enough trees. But I have to warn you: I cannot
support my claim with (sufficient) data. So at the end of the day, it's
always best to run an experiment and test it on your problem at hand.
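
To make the "deep enough trees" argument concrete, here is a minimal
sketch in plain Python. It is not scikit-learn's tree code; the helper
names are made up for illustration. It only shows that two nested
"val < threshold" tests can single out one integer-coded category, where
a one-hot column needs a single test:

```python
def isolate_with_thresholds(code, target):
    """Two nested 'val < threshold' tests pick out one integer category code."""
    if code < target + 0.5:        # first split: discard codes above target
        if code < target - 0.5:    # second split: discard codes below target
            return False
        return True
    return False


def isolate_with_onehot(onehot_row, target):
    """On a one-hot column a single threshold test is enough."""
    return not (onehot_row[target] < 0.5)


n_categories = 6  # arbitrary small cardinality, chosen for illustration
for target in range(n_categories):
    for code in range(n_categories):
        onehot_row = [1 if i == code else 0 for i in range(n_categories)]
        # Both encodings separate exactly the same category.
        assert isolate_with_thresholds(code, target) == (code == target)
        assert isolate_with_onehot(onehot_row, target) == (code == target)
```

So the ordinal encoding costs roughly one extra level of depth per
category that has to be isolated, which is why a depth limit can matter.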
Anyway, I cannot really see your problem (or what you did "wrong"):
according to your description it seems that the specific encoding (one-hot
vs. ordinal) has no influence on the effectiveness of the model (no
significant difference)? This is in line with observations by others.
Andy raised a very important point though: if you optimized your
hyperparameters (tree depth, min split size, ...) on the ordinal encoding
and then tested those hyperparameters on a one-hot encoding, you are giving
an advantage to the ordinal encoding.
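
One way to avoid that bias is to tune the hyperparameters separately for
each encoding, inside the cross-validation loop. A rough sketch, assuming
a recent scikit-learn (with sklearn.model_selection) and NumPy; the data
is a synthetic stand-in with the cardinalities mentioned in the thread,
and the target rule and grid values are arbitrary assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.preprocessing import OneHotEncoder

rng = np.random.RandomState(0)

# Synthetic stand-in: 5 unordered categorical features with cardinalities
# 5, 5, 6, 11 and 12, stored as integer codes (the ordinal encoding).
cardinalities = [5, 5, 6, 11, 12]
n_samples = 600
X_ordinal = np.column_stack(
    [rng.randint(0, k, size=n_samples) for k in cardinalities]
)

# Hypothetical target that depends on non-monotone subsets of categories.
y = (
    (X_ordinal[:, 0] % 2 == 0) ^ np.isin(X_ordinal[:, 3], [1, 4, 7, 10])
).astype(int)

# One-hot version of the same data: 5 + 5 + 6 + 11 + 12 = 39 columns.
encoder = OneHotEncoder(categories=[np.arange(k) for k in cardinalities])
X_onehot = encoder.fit_transform(X_ordinal).toarray()

# Small illustrative grid; real experiments would search more values.
param_grid = {"max_depth": [3, None], "min_samples_split": [2, 10]}


def tuned_cv_score(X, y):
    """Nested CV: tune the forest on this encoding only, then score it."""
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=25, random_state=0),
        param_grid,
        cv=3,
        scoring="roc_auc",
    )
    return cross_val_score(search, X, y, cv=5, scoring="roc_auc").mean()


score_ordinal = tuned_cv_score(X_ordinal, y)
score_onehot = tuned_cv_score(X_onehot, y)
print("ordinal AUC: %.3f" % score_ordinal)
print("one-hot AUC: %.3f" % score_onehot)
```

Because each encoding gets its own inner grid search, neither one
inherits hyperparameters that were chosen for the other.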
HTH,
Peter
** That being said, I'm still quite skeptical when it comes to my results.
2013/6/4 Christian Jauvin <cjau...@gmail.com>
> Many thanks to all for your help and detailed answers, I really appreciate
> it.
>
> So I wanted to test the discussion's takeaway, namely, what Peter
> suggested: one-hot encode the categorical features with small
> cardinality, and leave the others in their ordinal form.
>
> So from the same dataset I mentioned earlier, I picked another subset
> of 5 features, this time all with small cardinality (5, 5, 6, 11 and
> 12), and all purely categorical (i.e. clearly not ordered). The
> one-hot encoding should clearly help with such a configuration.
>
> But again, what I observe when I pit the fully one-hot encoded RF
> (21000 x 39) against the ordinal-encoded one (21000 x 5) is that
> they're behaving almost the same, in terms of accuracy and AUC, with
> 10-fold cross-validation. In fact, the ordinal version even seems to
> perform very slightly better, although I don't think it's significant.
>
> I really believe in your expertise more than in my results, so what
> could I be doing wrong?
>
>
>
> On 3 June 2013 04:56, Andreas Mueller <amuel...@ais.uni-bonn.de> wrote:
> > On 06/03/2013 09:15 AM, Peter Prettenhofer wrote:
> >> Our decision tree implementation only supports numerical splits; i.e.
> >> tests of the form val < threshold.
> >>
> >> Categorical features need to be encoded properly. I recommend one-hot
> >> encoding for features with small cardinality (e.g. < 50) and ordinal
> >> encoding (simply assign each category an integer value) for features
> >> with large cardinality.
> > This seems to be the opposite of what the kaggle tutorial suggests,
> > right? They suggest ordinal encoding for small cardinality, but don't
> > suggest any other way.
> >
> > Your and Gilles' feedback makes me think we should tell the kaggle people
> > to change their tutorial...
> >
> >
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
>
--
Peter Prettenhofer