On 06/03/2013 09:15 AM, Peter Prettenhofer wrote:
> Our decision tree implementation only supports numerical splits, i.e. 
> tests of the form val < threshold.
>
> Categorical features need to be encoded properly. I recommend one-hot 
> encoding for features with small cardinality (e.g. < 50) and ordinal 
> encoding (simply assign each category an integer value) for features 
> with large cardinality.
This seems to be the opposite of what the Kaggle tutorial suggests, 
right? They suggest ordinal encoding for small cardinality, but don't 
suggest any alternative.

Your and Gilles' feedback makes me think we should tell the Kaggle people 
to change their tutorial.
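
For concreteness, here is a rough sketch of the two encodings being compared, 
in plain Python rather than any scikit-learn API. The function names are made 
up for illustration, and the cutoff of 50 is only Peter's rule of thumb from 
the quoted message, not a fixed rule.

```python
def ordinal_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        # A category seen for the first time gets the next unused integer.
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Turn each category into a 0/1 indicator row, one column per category."""
    codes, mapping = ordinal_encode(values)
    width = len(mapping)
    rows = [[1 if j == code else 0 for j in range(width)] for code in codes]
    return rows, mapping


def encode_for_tree(values, max_one_hot_cardinality=50):
    """Hypothetical helper: pick an encoding from the number of categories.

    The default of 50 is an assumption taken from the rule of thumb above.
    """
    n_categories = len(set(values))
    if n_categories < max_one_hot_cardinality:
        return "one-hot", one_hot_encode(values)[0]
    return "ordinal", ordinal_encode(values)[0]


colors = ["red", "green", "blue", "green"]
kind, encoded = encode_for_tree(colors)
print(kind, encoded)
```

With one-hot columns, a single val < threshold test can isolate one category. 
With ordinal codes, the tree can only cut the arbitrary integer ordering in 
two, so it may need several splits to single out a category.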

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
