Hi, if you integer-encode the categorical feature "Car" from your example, it would effectively become something like

BMW=0
Toyota=1
Audi=2

Sure, the algorithm will execute just fine on the feature column with values in {0, 1, 2}. However, the problem is that it will come up with binary rules like x_i >= 0.5 and x_i >= 1.5. I.e., it will treat it as a continuous variable, which implies an ordering (BMW < Toyota < Audi) that the categories don't actually have.

What you can do is to encode this feature via one-hot encoding -- basically extend it into 2 (or 3) binary variables. This has its own problems (if you have a feature with many possible values, you will end up with a large number of binary variables, and they may dominate in the resulting tree over other feature variables).

In any case, I guess this is what

> "scikit-learn implementation does not support categorical variables for now".

means ;).

Best,
Sebastian

> On Sep 13, 2019, at 9:38 PM, C W <tmrs...@gmail.com> wrote:
>
> Hello all,
> I'm very confused. Can the decision tree module handle both continuous and
> categorical features in the dataset? In this case, it's just CART
> (Classification and Regression Trees).
>
> For example,
> Gender  Age  Income  Car     Attendance
> Male    30   10000   BMW     Yes
> Female  35   9000    Toyota  No
> Male    50   12000   Audi    Yes
>
> According to the documentation
> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
> it can not!
>
> It says: "scikit-learn implementation does not support categorical variables
> for now".
>
> Is this true? If not, can someone point me to an example? If yes, what do
> people do?
>
> Thank you very much!
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
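
To make the encoding issue described in the reply concrete, here is a minimal pure-Python sketch. It is not scikit-learn's actual splitter; it only assumes that a CART-style tree considers thresholds halfway between consecutive distinct values of a feature. The helper names candidate_thresholds and one_hot are made up for this illustration.

```python
# Toy "Car" column from the example in this thread.
cars = ["BMW", "Toyota", "Audi"]

# Integer encoding in order of first appearance: BMW=0, Toyota=1, Audi=2.
categories = list(dict.fromkeys(cars))
codes = [categories.index(c) for c in cars]


def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values.

    Assumption for this sketch: these are the split points a CART-style
    tree could choose when it treats the feature as numeric.
    """
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def one_hot(values, categories):
    """Return one 0/1 indicator column per category."""
    return {cat: [int(v == cat) for v in values] for cat in categories}


# Integer-encoded column: the splits impose an order on the car brands.
print(codes)                        # [0, 1, 2]
print(candidate_thresholds(codes))  # [0.5, 1.5]

# One-hot columns: each column only allows the split "is this brand or not".
encoded = one_hot(cars, categories)
for name, column in encoded.items():
    print(name, column, candidate_thresholds(column))
```

With the integer codes, a rule like x_i >= 0.5 groups Toyota and Audi together against BMW purely because of the arbitrary numbering. With the indicator columns, every column has the single threshold 0.5, so each split asks about one brand at a time. In practice one would probably not hand-roll this and would reach for sklearn.preprocessing.OneHotEncoder or pandas.get_dummies instead.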