[scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

C W Thu, 30 Apr 2020 12:57:13 -0700

Hello everyone,

I am frustrated with the one-hot-encoding requirement for categorical
features. Why is it required?


I've used R and Stata, and neither needs such a transformation. They have a
data type called "factors", which is distinct from "numeric".

My problem with OHE:
One-hot-encoding results in a large number of features. This blows up
quickly, and I have to fight the curse of dimensionality with PCA reduction.
That's not cool!
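
To make the blow-up concrete, here is a minimal sketch with made-up data. The
"zip_code" column and its 1000 levels are invented for illustration and are not
from a real dataset; the point is only that one categorical column turns into
one output column per distinct level.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up data: a single categorical column (hypothetical "zip_code")
# with 1000 distinct levels.
rng = np.random.default_rng(0)
levels = np.array([f"zip_{i:04d}" for i in range(1000)])

# Include every level once so all 1000 are observed, then add random draws.
zip_code = np.concatenate([levels, rng.choice(levels, size=4000)])
X = zip_code.reshape(-1, 1)  # shape (5000, 1): one feature

encoder = OneHotEncoder(handle_unknown="ignore")
X_ohe = encoder.fit_transform(X)  # SciPy sparse matrix by default

# One input column becomes one column per distinct level.
print(X.shape, "->", X_ohe.shape)
```

The result is sparse, so memory stays manageable, but the estimator still sees
1000 features where there used to be one.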

Can sklearn have a "factor" data type in the future? It would make life so
much easier.

Thanks a lot!

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
