Hello everyone, I am frustrated with the one-hot encoding requirement for categorical features. Why?
I've used R and Stata, and neither needs such a transformation. They have a data type called "factor", which is distinct from "numeric". My problem with OHE: one-hot encoding produces a large number of features, one new column per category level, so the feature count blows up quickly when a variable has many levels. Then I have to fight the curse of dimensionality with PCA. That's not cool! Could sklearn have a "factor" data type in the future? It would make life so much easier. Thanks a lot!
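
P.S. To make the blow-up concrete, here is a minimal sketch in plain Python. It is a hand-rolled encoder, not sklearn's, and the `one_hot` helper, the feature names, and the values are all made up for illustration. It only shows that the encoded width equals the total number of distinct levels across the categorical features.

```python
# Hand-rolled one-hot encoding to illustrate how the column count grows.
# The helper name, feature names, and values are hypothetical toy data.

def one_hot(rows, columns):
    """Return (header, encoded_rows) with one 0/1 column per (feature, level)."""
    # Collect the distinct levels of each categorical feature, in sorted order.
    levels = {c: sorted({row[c] for row in rows}) for c in columns}
    # One output column for every (feature, level) pair.
    header = [f"{c}={v}" for c in columns for v in levels[c]]
    encoded = [
        [1 if row[c] == v else 0 for c in columns for v in levels[c]]
        for row in rows
    ]
    return header, encoded


rows = [
    {"city": "Paris", "color": "red"},
    {"city": "Tokyo", "color": "blue"},
    {"city": "Lima", "color": "red"},
    {"city": "Paris", "color": "green"},
]
header, encoded = one_hot(rows, ["city", "color"])

print(len(header))  # 2 original features become 3 + 3 = 6 columns
print(header)
print(encoded[0])   # the first row, encoded
```

With only two features of three levels each the width already triples; a feature with thousands of levels would add thousands of columns on its own.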
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn