I will just add that if you have heterogeneous types, you might want to
look at the ColumnTransformer:
https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html
You might want to apply some scaling (it would not be relevant for trees,
though) and encode the categories.
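
For illustration, here is a minimal sketch of what that could look like. The
column names ("age", "gender"), the toy values, and the choice of
LogisticRegression are assumptions made up for this example, not taken from
the original question; only ColumnTransformer, StandardScaler, and
OneHotEncoder are the pieces suggested above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numerical column and one categorical column.
X = pd.DataFrame({
    "age": [22.0, 35.0, 58.0, 41.0, 30.0, 64.0],
    "gender": ["F", "M", "F", "M", "F", "M"],
})
y = [0, 0, 1, 1, 0, 1]

# Scale the numerical column and one-hot encode the categorical one.
preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender"]),
    ]
)

# Chain preprocessing and an (assumed) linear classifier in one estimator.
model = Pipeline(steps=[
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])
model.fit(X, y)

# Inspect the transformed features: 1 scaled column + 2 one-hot columns.
Xt = model.named_steps["preprocess"].transform(X)
predictions = model.predict(X)
```

With a tree-based model you could probably drop the scaler and keep only the
encoding step.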
I am +1 for this change.
I agree that users will accommodate the syntax sooner or later.
On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, <
jeremie.du-boisberran...@inria.fr> wrote:
> I don't know what is the policy about a sklearn 1.0 w.r.t api changes.
>
> If it's meant to be a special
If you have datasets with many categorical features, and perhaps many
categories, the tools in sklearn are quite limited,
but there are alternative implementations of boosted trees that are
designed with categorical features in mind. Take a look
at catboost [1], which has an sklearn-compatible API.
Sayak Paul | sayak.dev
-- Forwarded message -
From:
Date: Fri, Sep 13, 2019 at 10:46 AM
Subject: scikit-learn Digest, Vol 42, Issue 15
To:
Thanks, Guillaume.
ColumnTransformer looks pretty neat. I've also heard, though, that this
pipeline can be tedious to set up? Specifying what you want for every
feature is a pain.
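
One possible way to reduce that tedium, sketched here under assumptions: in
recent scikit-learn releases, make_column_selector lets you pick columns by
dtype instead of listing each one. The column names and values below are
hypothetical, and treating every non-numeric column as categorical is a
simplification that may not suit real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with two numerical and two categorical columns.
X = pd.DataFrame({
    "age": [22.0, 35.0, 58.0, 41.0],
    "income": [30.0, 52.0, 75.0, 61.0],
    "gender": ["F", "M", "F", "M"],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
})

# Select columns by dtype rather than naming every feature.
by_dtype = ColumnTransformer(
    transformers=[
        # All numerical columns get scaled.
        ("num", StandardScaler(),
         make_column_selector(dtype_include=np.number)),
        # Everything else is assumed categorical and one-hot encoded.
        ("cat", OneHotEncoder(handle_unknown="ignore"),
         make_column_selector(dtype_exclude=np.number)),
    ]
)

# 2 scaled columns + 2 gender categories + 3 city categories.
Xt_by_dtype = by_dtype.fit_transform(X)
```

New columns of either kind would then be picked up without touching the
transformer definition.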
Javier,
Actually, you guessed right. My real data has only one numerical
variable and looks more like this:
Gender Date
+1 from me
On Sat, Sep 14, 2019 at 8:12 AM Joel Nothman wrote:
> I am +1 for this change.
>
> I agree that users will accommodate the syntax sooner or later.
>
> On Fri., 13 Sep. 2019, 7:54 pm Jeremie du Boisberranger, <
> jeremie.du-boisberran...@inria.fr> wrote:
>
>> I don't know what is the p