Re: [scikit-learn] One-hot encoding

Sarah Wait Zaranek Sun, 04 Feb 2018 21:28:39 -0800

Hi Joel -

Conceptually, that makes sense.  But when I set n_values explicitly, I
can't get the output to match what I get when I leave it unspecified.  See
below.  I used the number of unique levels per column.


>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder(sparse=False)
>>> test = enc.fit_transform([[7, 0, 3], [1, 2, 0], [0, 2, 1], [1, 0, 2]])
>>> test
array([[0., 0., 1., 1., 0., 0., 0., 0., 1.],
       [0., 1., 0., 0., 1., 1., 0., 0., 0.],
       [1., 0., 0., 0., 1., 0., 1., 0., 0.],
       [0., 1., 0., 1., 0., 0., 0., 1., 0.]])
>>> enc = OneHotEncoder(sparse=False, n_values=[3, 2, 4])
>>> test = enc.fit_transform([[7, 0, 3], [1, 2, 0], [0, 2, 1], [1, 0, 2]])
>>> test
array([[0., 0., 0., 1., 0., 0., 0., 1., 1.],
       [0., 1., 0., 0., 0., 2., 0., 0., 0.],
       [1., 0., 0., 0., 0., 1., 1., 0., 0.],
       [0., 1., 0., 1., 0., 0., 0., 1., 0.]])
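
For comparison, here is a small NumPy-only sketch of the two ways of counting levels on the same data. It does not call OneHotEncoder at all, the variable names are purely illustrative, and the hand-rolled encoding at the end is only an approximation of what the default setting appears to do (one output column per observed level), not the library's actual implementation.

```python
import numpy as np

# Same toy data as in the session above.
X = np.array([[7, 0, 3], [1, 2, 0], [0, 2, 1], [1, 0, 2]])

# Number of distinct levels per column (what was passed as n_values above).
n_unique = [len(np.unique(col)) for col in X.T]

# Highest value + 1 per column (the rule from the quoted message below).
n_max_plus_one = (X.max(axis=0) + 1).tolist()

print(n_unique)        # [3, 2, 4]
print(n_max_plus_one)  # [8, 3, 4]

# Hand-rolled one-hot encoding over the observed levels only:
# one output column per distinct value seen in each input column.
blocks = []
for col in X.T:
    levels, codes = np.unique(col, return_inverse=True)
    blocks.append(np.eye(len(levels))[codes])
manual = np.hstack(blocks)

print(manual.shape)    # (4, 9)
```

The two counts differ for the first two columns because the values there are not contiguous from 0 (column 0 holds 0, 1 and 7), which may be why n_values=[3,2,4] does not reproduce the default output.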

Cheers,
Sarah


On Mon, Feb 5, 2018 at 12:02 AM, Joel Nothman <[email protected]>
wrote:

> If each input column is encoded as a value from 0 to the (number of
> possible values for that column - 1) then n_values for that column should
> be the highest value + 1, which is also the number of levels per column.
> Does that make sense?
>
> Actually, I've realised there's a somewhat slow and unnecessary bit of
> code in the one-hot encoder: where the COO matrix is converted to CSR. I
> suspect this was done because most of our ML algorithms perform better on
> CSR, or else to maintain backwards compatibility with an earlier
> implementation.
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>

