On Thu, Apr 30, 2020 at 03:55:00PM -0400, C W wrote:
> I've used R and Stata software, none needs such transformation. They have a
> data type called "factors", which is different from "numeric".
> My problem with OHE:
> One-hot-encoding results in large number of features. This really blows up
> quickly. And I have to fight curse of dimensionality with PCA reduction.
> That's not cool!

Most statistical models still do one-hot encoding under the hood. So R and
Stata do it too.

Typically, tree-based models can be adapted to work directly on categorical
data. Ours don't yet. It's a work in progress.

G
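
To make "under the hood" concrete, here is a minimal sketch in plain Python of the expansion a factor goes through before a typical statistical model sees it. The `one_hot` helper and the toy `color` column are made up for this illustration and are not part of any library. Real packages differ in the details (R's default treatment contrasts, for example, drop one reference level), so treat this as a sketch of the idea rather than any package's actual implementation.

```python
def one_hot(values):
    """Expand one categorical column into 0/1 indicator columns."""
    # Distinct categories in a fixed order, playing the role of factor levels.
    levels = sorted(set(values))
    # One row per observation, one column per level.
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical toy column standing in for an R/Stata "factor".
color = ["red", "green", "blue", "green"]

levels, encoded = one_hot(color)
print(levels)
for row in encoded:
    print(row)
```

A single column with k distinct levels becomes k indicator columns, which is why the feature count grows quickly for high-cardinality variables whether the encoding is done explicitly or hidden behind a factor type.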