Hi,
if you encode the categorical "Car" feature from your example as integers, you
effectively get something like
BMW=0
Toyota=1
Audi=2
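Sure, the algorithm will run just fine on a feature column with values in
{0, 1, 2}. However, it will come up with binary threshold rules like
x_i <= 0.5 and x_i <= 1.5 (the midpoints between adjacent codes), i.e., it will
treat the feature as a continuous variable. This imposes an artificial order
BMW < Toyota < Audi, so, for example, no single split can separate {BMW, Audi}
from {Toyota}.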
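A minimal sketch of this behavior, assuming scikit-learn and the toy "Car"/"Attendance" columns from the question. The explicit `categories` list is only there so the codes match BMW=0, Toyota=1, Audi=2 (by default OrdinalEncoder sorts categories alphabetically). Because Toyota sits "between" BMW and Audi, the tree needs two threshold splits to isolate it:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data from the question: the "Car" feature and the "Attendance" target.
cars = np.array([["BMW"], ["Toyota"], ["Audi"]])
attendance = np.array(["Yes", "No", "Yes"])

# Fix the category order so the codes are BMW=0, Toyota=1, Audi=2.
encoder = OrdinalEncoder(categories=[["BMW", "Toyota", "Audi"]])
X = encoder.fit_transform(cars)  # [[0.], [1.], [2.]]

tree = DecisionTreeClassifier(random_state=0).fit(X, attendance)

# Split nodes have a feature index >= 0; leaves are marked with a negative value.
is_split = tree.tree_.feature >= 0
split_thresholds = tree.tree_.threshold[is_split]

print(export_text(tree, feature_names=["car"]))
print(sorted(split_thresholds.tolist()))  # [0.5, 1.5]
```

The splits are plain `car <= threshold` comparisons on the integer codes, exactly as for a continuous feature.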
What you can do instead is to encode this feature via one-hot encoding, i.e.,
expand it into 3 binary variables (or 2, if you drop one redundant column).
This has its own problems: if a feature has many possible values, you end up
with a large number of binary variables, and they may dominate the resulting
tree over the other features.
In any case, I guess this is what
> "scikit-learn implementation does not support categorical variables for now".
means ;).
Best,
Sebastian
> On Sep 13, 2019, at 9:38 PM, C W <[email protected]> wrote:
>
> Hello all,
> I'm very confused. Can the decision tree module handle both continuous and
> categorical features in the dataset? In this case, it's just CART
> (Classification and Regression Trees).
>
> For example,
> Gender  Age  Income  Car     Attendance
> Male    30   10000   BMW     Yes
> Female  35   9000    Toyota  No
> Male    50   12000   Audi    Yes
>
> According to the documentation
> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
> it can not!
>
> It says: "scikit-learn implementation does not support categorical variables
> for now".
>
> Is this true? If not, can someone point me to an example? If yes, what do
> people do?
>
> Thank you very much!
>
>
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn