Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-05-06 Thread Joel Nothman
When it comes to trees, the API for handling categoricals is simpler than the implementation. Traditionally, tree-based models' handling of categorical variables differs from both ordinal and one-hot encoding, though both of those encodings will work reasonably well for many problems. We are working on
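
A minimal sketch of the two encodings mentioned above, each feeding a scikit-learn decision tree (the toy data and the "city" column are illustrative, not from the thread):

    # Sketch: comparing ordinal and one-hot encoding for a tree model.
    import pandas as pd
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder
    from sklearn.tree import DecisionTreeClassifier

    X = pd.DataFrame({"city": ["Paris", "Tokyo", "Lima", "Paris", "Lima", "Tokyo"]})
    y = [0, 1, 0, 0, 1, 1]

    # Ordinal encoding: a single integer column, but it imposes an arbitrary order.
    ordinal_tree = make_pipeline(OrdinalEncoder(), DecisionTreeClassifier())
    ordinal_tree.fit(X, y)

    # One-hot encoding: one binary column per category, no order implied.
    onehot_tree = make_pipeline(OneHotEncoder(), DecisionTreeClassifier())
    onehot_tree.fit(X, y)

Neither of these reproduces the native categorical splits that some tree implementations offer; they are the two workarounds the message contrasts.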

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-05-06 Thread Fernando Marcos Wittmann

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-05-01 Thread C W

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread C W

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread Hermes Morales
On Thu, Apr 30, 2020 at 03:55:00PM -0400, C W wrote: > I've used R and Stata software, neither needs such a transformation. They have a > data type called "factors", which is different from "numeric".

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread Gael Varoquaux
On Thu, Apr 30, 2020 at 03:55:00PM -0400, C W wrote: > I've used R and Stata software, neither needs such a transformation. They have a > data type called "factors", which is different from "numeric". > My problem with OHE: > One-hot-encoding results in a large number of features. This really blows up
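
To illustrate the blow-up being quoted here, a small sketch (the synthetic high-cardinality "user_id" column is hypothetical, not from the thread):

    # Sketch: one-hot encoding a high-cardinality column inflates the feature count.
    import numpy as np
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.default_rng(0)
    user_ids = rng.integers(0, 5000, size=10000).astype(str).reshape(-1, 1)

    encoded = OneHotEncoder().fit_transform(user_ids)
    print(encoded.shape)  # (10000, ~5000): one new column per distinct category

The output stays sparse by default, which keeps memory in check, but the column count still grows with the number of distinct categories.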

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread Michael Eickenberg
Hi, I think there are many reasons that have led to the current situation. One is that scikit-learn is based on numpy arrays, which do not offer categorical data types (yet: ideas are being discussed at https://numpy.org/neps/nep-0041-improved-dtype-support.html). Pandas already has a categorical data
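
For reference, a small sketch of the pandas categorical dtype mentioned above and the integer codes behind it, roughly analogous to an R factor (the example values are made up):

    # Sketch: pandas' categorical dtype, the closest analogue of an R "factor".
    import pandas as pd

    s = pd.Series(["low", "high", "medium", "low"], dtype="category")
    print(s.cat.categories)         # Index(['high', 'low', 'medium'], dtype='object')
    print(s.cat.codes.tolist())     # integer codes backing the values: [1, 0, 2, 1]

scikit-learn estimators still receive the numeric codes (or an encoding of them), since numpy arrays have no such dtype yet.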

[scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread C W
Hello everyone, I am frustrated with the one-hot-encoding requirement for categorical features. Why? I've used R and Stata software; neither needs such a transformation. They have a data type called "factors", which is different from "numeric". My problem with OHE: One-hot-encoding results in a large
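
The scikit-learn workflow being questioned here looks roughly like the following sketch (the column names, toy data, and choice of model are illustrative only):

    # Sketch: the standard scikit-learn approach, one-hot encoding the
    # categorical column and passing the numeric column through untouched.
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder

    X = pd.DataFrame({
        "color": ["red", "blue", "red", "green"],
        "size": [1.0, 2.5, 0.7, 1.8],
    })
    y = [0, 1, 0, 1]

    preprocess = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), ["color"])],
        remainder="passthrough",
    )
    model = make_pipeline(preprocess, LogisticRegression())
    model.fit(X, y)

In R or Stata the string column could be declared a factor and passed to the model directly; in scikit-learn the explicit encoding step above is what the thread asks about avoiding.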