Hi Michael,

Thank you for your comment. I am in fact using the one-hot encoding strategy, but I don't find it satisfactory. I do hope the scikit-learn developers can improve this, because it is a significant issue for decision-tree methods.
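To make the trade-off concrete, here is a minimal sketch of the two workarounds described in the quoted message below. It deliberately uses plain Python rather than any scikit-learn API, and the toy data and the helper names (`one_hot`, `integer_codes`, `single_split_isolates`) are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: plain Python, no scikit-learn calls.
# The toy data and all helper names below are hypothetical.

def one_hot(values):
    """Return the sorted categories and one 0/1 indicator row per sample."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def integer_codes(values):
    """Return the sorted categories and one arbitrary integer code per sample."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return categories, [index[v] for v in values]


def single_split_isolates(codes, target):
    """True if one binary split `code <= t` puts exactly the target samples on one side."""
    want = {i for i, c in enumerate(codes) if c == target}
    uniq = sorted(set(codes))
    for lo, hi in zip(uniq, uniq[1:]):
        t = (lo + hi) / 2  # candidate threshold between two adjacent codes
        left = {i for i, c in enumerate(codes) if c <= t}
        right = set(range(len(codes))) - left
        if want in (left, right):
            return True
    return False


color = ["red", "green", "blue", "green", "red"]   # 3 distinct values
city = ["NYC", "LA", "SF", "Boston", "Austin"]     # 5 distinct values

# One-hot: a feature with more categories contributes more columns,
# so it offers the tree more candidate splits.
color_cats, color_rows = one_hot(color)
city_cats, city_rows = one_hot(city)
print("one-hot columns:", len(color_cats), "vs", len(city_cats))

# Integer codes: one column per feature, but a category whose code lies
# between two others cannot be separated by a single threshold split.
code_cats, codes = integer_codes(color)
for code, name in enumerate(code_cats):
    print(name, "isolated by one split:", single_split_isolates(codes, code))
```

With one-hot encoding, the five-valued feature ends up with more columns than the three-valued one, which is the source of the selection bias. With integer codes, the category in the middle of the ordering needs at least two successive splits before the tree can single it out.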
On Wed, Oct 29, 2014 at 12:18 PM, Michael Eickenberg <[email protected]> wrote:

> Hi Xin,
>
> as far as I know the only ways of working around this problem right now
> are one-hot encoding or using integer numbers to represent your classes.
> The former augments your feature space but can cause biases if different
> categorical features can take different numbers of values (leading to more
> columns for one feature, leading to it being selected disproportionately
> often). The latter avoids the problem of the former, but since decisions
> are binary, the trees can only distinguish integer features from a certain
> depth onwards.
>
> I cannot comment on future developments, but I have the feeling that
> better treatment of categorical features may be on the plan :)
>
> Michael
>
> On Wed, Oct 29, 2014 at 5:09 PM, Xin Shuai <[email protected]> wrote:
>
>> Hi,:
>> I'm a fan of Scikit-learn and it is my favorite ML package.
>> However, I found this package DOES NOT deal with categorical variable
>> for tree-based method. So I need to convert categorical variable into dummy
>> variable before I can use tree method. Actually, this is counterintuitive
>> to the original decision tree method.
>> Any improvement on that?
>> --
>> Xin(David) Shuai
>> PhD of Complex System in School of Informatics & Computing
>> Indiana University Bloomington
>> 812-606-8969
>>
>> The way to success is to do as much as important things, and as less as
>> unimportant things, as you can...
--
Xin(David) Shuai
PhD of Complex System in School of Informatics & Computing
Indiana University Bloomington
812-606-8969

The way to success is to do as much as important things, and as less as unimportant things, as you can...
------------------------------------------------------------------------------
_______________________________________________ Scikit-learn-general mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
