Hi Michael,
 Thank you for your comment. I currently use one-hot encoding, but I
don't find it satisfactory.
I do hope the scikit-learn developers can improve this, because it is a
significant issue for decision tree methods.
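
For concreteness, here is a minimal sketch of the two workarounds discussed
in the quoted message below: integer codes and one-hot columns. It is plain
Python rather than scikit-learn's own API; the toy `colors` column and the
helper names `integer_encode`, `one_hot_encode`, and `splits_to_isolate` are
made up for illustration. The last helper only counts how many threshold
tests of the form x <= t are needed to single out one integer code, which is
one way to read the remark that trees separate integer-coded categories only
from a certain depth onwards.

```python
# Toy categorical column (hypothetical data, for illustration only).
colors = ["red", "green", "blue", "green", "red", "yellow"]


def integer_encode(values):
    """Map each category to an integer code; the sorted order is arbitrary."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Expand one categorical column into one 0/1 column per category."""
    codes, categories = integer_encode(values)
    rows = [[1 if code == j else 0 for j in range(len(categories))]
            for code in codes]
    return rows, categories


def splits_to_isolate(code, n_categories):
    """Count the threshold tests (x <= t) needed to isolate one integer code.

    The smallest and largest codes need a single threshold; any code in
    between needs two, one on each side.
    """
    if n_categories <= 1:
        return 0
    return 1 if code in (0, n_categories - 1) else 2


codes, categories = integer_encode(colors)
rows, _ = one_hot_encode(colors)

print(categories)  # one column per category after one-hot encoding
print(codes)       # a single integer column
print(rows[0])     # one-hot row for the first sample
print([splits_to_isolate(c, len(categories)) for c in range(len(categories))])
```

With one-hot columns, a single split on the matching column isolates any
category, at the cost of one extra column per category. With integer codes
the feature stays a single column, but an interior category needs two
successive splits before it sits alone in a node.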

On Wed, Oct 29, 2014 at 12:18 PM, Michael Eickenberg <
[email protected]> wrote:

> Hi Xin,
>
> as far as I know, the only ways of working around this problem right now
> are one-hot encoding or using integers to represent your categories.
> The former enlarges your feature space and can cause biases if different
> categorical features take different numbers of values (a feature with more
> values gets more columns, so it is selected disproportionately often).
> The latter avoids that problem, but since each split is a binary
> threshold, a tree can only separate individual integer-coded categories
> after several splits, i.e. from a certain depth onwards.
>
> I cannot comment on future developments, but I have the feeling that
> better treatment of categorical features may be planned :)
>
> Michael
>
> On Wed, Oct 29, 2014 at 5:09 PM, Xin Shuai <[email protected]> wrote:
>
>> Hi,
>>  I'm a fan of scikit-learn; it is my favorite ML package.
>>  However, I found that it does not handle categorical variables natively
>> in its tree-based methods, so I need to convert each categorical variable
>> into dummy variables before I can use a tree method. This runs counter to
>> the original decision tree method, which can split on categories directly.
>> Is any improvement planned?
>> --
>> Xin(David) Shuai
>> PhD of Complex System in School of Informatics & Computing
>> Indiana University Bloomington
>> 812-606-8969
>>
>> The way to success is to do as much as important things, and as less as
>> unimportant things, as you can...
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


-- 
Xin(David) Shuai
PhD of Complex System in School of Informatics & Computing
Indiana University Bloomington
812-606-8969

The way to success is to do as much as important things, and as less as
unimportant things, as you can...