When it comes to trees, the API for handling categoricals is simpler than
the implementation. Traditionally, tree-based models handle categorical
variables differently from both ordinal and one-hot encoding, although
either encoding works reasonably well for many problems. We are working on
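To make the ordinal-versus-one-hot comparison concrete, here is a minimal sketch using scikit-learn's `OrdinalEncoder`, `OneHotEncoder` and `DecisionTreeClassifier`. The dataset is invented for illustration: one categorical column with 10 hypothetical levels, where the label depends on an arbitrary subset of those levels. This is not the native categorical handling for trees mentioned above; it only shows that a tree can learn the same rule from either encoding.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
n_samples = 500

# Hypothetical data: one categorical column with 10 string levels.
levels = np.array([f"level_{i}" for i in range(10)])
X = levels[rng.randint(0, 10, size=n_samples)].reshape(-1, 1)

# The label is 1 for an arbitrary subset of levels, so there is no natural
# ordering that separates the classes with a single threshold.
positive = {"level_1", "level_4", "level_7"}
y = np.array([value in positive for value in X[:, 0]], dtype=int)

# Same tree, two encodings of the same categorical column.
models = {
    "ordinal": make_pipeline(OrdinalEncoder(), DecisionTreeClassifier(random_state=0)),
    "one-hot": make_pipeline(OneHotEncoder(), DecisionTreeClassifier(random_state=0)),
}

scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {scores[name]:.3f}")
```

On this toy problem both pipelines should reach (near) perfect accuracy. The difference is in the shape of the tree: with the ordinal codes it needs several thresholds to carve out levels 1, 4 and 7, whereas with one-hot columns it can test each positive level with one split.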
> *From:* scikit-learn on behalf of Gael Varoquaux <gael.varoqu...@normalesup.org>
> *Sent:* Thursday, April 30, 2020 5:12:06 PM
> *To:* Scikit-learn mailing list
> *Subject:* Re: [scikit-learn] Why does sklearn require one-hot-encoding for
> categorical features? Can we have a "factor" data type?
On Thu, Apr 30, 2020 at 03:55:00PM -0400, C W wrote:
> I've used R and Stata software, and neither needs such a transformation. They have a
> data type called "factors", which is different from "numeric".
> My problem with OHE:
> One-hot-encoding results in a large number of features. This really blows up
>
Hi,
I think there are many reasons that have led to the current situation.
One is that scikit-learn is based on numpy arrays, which do not offer
categorical data types (yet: ideas are being discussed in
https://numpy.org/neps/nep-0041-improved-dtype-support.html). Pandas already
has a categorical data
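The point above, that numpy arrays have no categorical dtype while pandas does, can be illustrated with a small sketch. The column values are made up; only documented pandas attributes (`dtype="category"`, `.cat.categories`, `.cat.codes`) and `np.asarray` are used.

```python
import numpy as np
import pandas as pd

# pandas stores a categorical column as small integer codes plus an index of
# the distinct categories (these values are invented for illustration).
colors = pd.Series(["red", "green", "red", "blue"], dtype="category")
print(colors.dtype)                  # category
print(list(colors.cat.categories))  # ['blue', 'green', 'red'] (sorted)
print(colors.cat.codes.tolist())    # [2, 1, 2, 0]

# Converting to a plain numpy array drops the categorical dtype: numpy only
# holds generic Python objects, so the category metadata is lost.
as_numpy = np.asarray(colors)
print(as_numpy.dtype)  # object
```

Since scikit-learn estimators typically work on numpy arrays, a column like this reaches them either as object strings, which most estimators cannot use directly, or, if you pass `cat.codes`, as integers that a model would treat as ordered numbers.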

Hello everyone,
I am frustrated with the one-hot-encoding requirement for categorical
features. Why?
I've used R and Stata software, and neither needs such a transformation. They have a
data type called "factors", which is different from "numeric".
My problem with OHE:
One-hot-encoding results in a large
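To put numbers on the blow-up described in this message, here is a minimal sketch using scikit-learn's `OneHotEncoder` and `OrdinalEncoder` on made-up data: three categorical columns with 5, 50 and 1000 levels. The cardinalities are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

rng = np.random.RandomState(0)
n_samples = 10_000
n_levels = [5, 50, 1000]  # hypothetical cardinalities, one per column

# Each column cycles through all of its levels and is then shuffled, so every
# level is guaranteed to appear at least once.
X = np.column_stack(
    [rng.permutation(np.arange(n_samples) % k) for k in n_levels]
).astype(str)

X_ord = OrdinalEncoder().fit_transform(X)  # one integer column per feature
X_ohe = OneHotEncoder().fit_transform(X)   # one column per observed level (sparse by default)

print("original columns:", X.shape[1])     # 3
print("ordinal columns:", X_ord.shape[1])  # 3
print("one-hot columns:", X_ohe.shape[1])  # 1055
print("stored non-zeros:", X_ohe.nnz)      # 30000 (one per row and feature)
```

Because the one-hot output is sparse, memory grows with the stored non-zeros rather than with the full 10,000 x 1055 grid; models that convert their input to a dense array, or that scan every column when fitting, still pay for the extra width.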