Hello everyone,

I am frustrated by the one-hot encoding requirement for categorical
features. Why is it needed?

I've used R and Stata, and neither needs such a transformation. They have a
data type called "factors", which is distinct from "numeric".

My problem with OHE:
One-hot encoding produces a large number of features, and the count blows up
quickly: every distinct category becomes its own column. I then have to fight
the curse of dimensionality with PCA. That's not cool!
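To make the blow-up concrete, here is a minimal stdlib-only sketch of one-hot encoding. It is not sklearn's implementation; it only mirrors the resulting shape of sklearn's OneHotEncoder with default settings (one 0/1 column per distinct category per feature). The function name one_hot, the "x0_blue"-style column names, and the toy data are all hypothetical, for illustration only.

```python
from typing import Hashable, List, Sequence, Tuple


def one_hot(rows: Sequence[Sequence[Hashable]]) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical helper: expand each categorical column into one
    0/1 indicator column per distinct value seen in that column."""
    n_features = len(rows[0])
    # Distinct values per column, sorted so the column order is deterministic.
    categories = [sorted({row[j] for row in rows}, key=str) for j in range(n_features)]
    # Illustrative column names of the form "x<feature index>_<category>".
    names = [f"x{j}_{c}" for j, cats in enumerate(categories) for c in cats]
    encoded = []
    for row in rows:
        out: List[int] = []
        for j, cats in enumerate(categories):
            # Exactly one indicator per original feature is set to 1.
            out.extend(1 if row[j] == c else 0 for c in cats)
        encoded.append(out)
    return names, encoded


# Toy data (made up): color, size, city.
data = [
    ["red", "S", "NY"],
    ["green", "M", "LA"],
    ["blue", "L", "SF"],
    ["red", "XL", "NY"],
]
names, X = one_hot(data)
print(len(data[0]), "->", len(names))  # 3 -> 10

# A single high-cardinality feature (e.g. 1000 distinct IDs) becomes 1000 columns.
ids = [[f"id{i}"] for i in range(1000)]
print(len(one_hot(ids)[0]))  # 1000
```

Three categorical features with 3, 4 and 3 levels already turn into 10 columns, and the width grows with the number of distinct levels rather than the number of original features.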

Can sklearn have a "factor" data type in the future? It would make life so
much easier.

Thanks a lot!
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
