Re: [scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-21 Thread Andreas Mueller
Yeah the input format is a bit odd, usually it should be n_samples x 
n_features, so something like

[['A'], ['C'], ['T'], ['G']]

Though this is currently also hard to do :(
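[Editor's note: for readers finding this thread later, scikit-learn 0.20 and
newer accept string categories directly, so the n_samples x n_features
layout above works out of the box. A minimal sketch, assuming a recent
release:]

```python
from sklearn.preprocessing import OneHotEncoder

# four samples of one categorical feature (n_samples x n_features)
X = [['A'], ['C'], ['T'], ['G']]
enc = OneHotEncoder()
onehot = enc.fit_transform(X).toarray()
# categories are sorted: ['A', 'C', 'G', 'T'], so each row is one-hot
```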

On 09/20/2016 05:50 AM, Lee Zamparo wrote:

Hi Joel,

Yea, seems that the one-hot encoding of the transpose solves the 
issue.  As you say, and as I mentioned to Sebastian, it seems a bit 
off-usage for OneHotEncoder.


Thanks for the solution all the same though.

--
Lee Zamparo

On September 19, 2016 at 7:48:15 PM, Joel Nothman 
(joel.noth...@gmail.com ) wrote:



OneHotEncoder has issues, but I think all you want here is

ohe.fit_transform(np.transpose(le.fit_transform([c for c in myguide])))

Still, this seems like it is far from the intended use of 
OneHotEncoder (which should not really be stacked with LabelEncoder), 
so it's not surprising it's tricky.


On 20 September 2016 at 08:07, Sebastian Raschka (se.rasc...@gmail.com)
wrote:


Hi, Lee,

maybe set `n_values=4`; this seems to do the job. I think the
problem you encountered is due to the fact that the one-hot
encoder infers the number of values for each feature (column)
from the dataset. In your case, each column had only 1 unique
value in your example

> array([[0, 1, 2, 3],
>        [0, 1, 2, 3],
>        [0, 1, 2, 3]])

If you had an array like

> array([[0],
>        [1],
>        [2],
>        [3]])

it should work though. Alternatively, set n_values to 4:


> >>> from sklearn.preprocessing import OneHotEncoder
> >>> import numpy as np
>
> >>> enc = OneHotEncoder(n_values=4)
> >>> X = np.array([[0, 1, 2, 3]])
> >>> enc.fit_transform(X).toarray()


array([[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
         0.,  0.,  1.]])

and

> X2 = np.array([[0, 1, 2, 3],
>                [0, 1, 2, 3],
>                [0, 1, 2, 3]])
>
> enc.transform(X2).toarray()



array([[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
         0.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
         0.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
         0.,  0.,  1.]])


Best,
Sebastian


> On Sep 19, 2016, at 5:45 PM, Lee Zamparo (zamp...@gmail.com) wrote:
>
> Hi sklearners,
>
> A lab-mate came to me with a problem about encoding DNA
sequences using preprocessing.OneHotEncoder, and I find it to
produce confusing results.
>
> Suppose I have a DNA string:  myguide = ‘ACGT’
>
> He’d like to use OneHotEncoder to transform DNA strings, character
by character, into a one hot encoded representation like this:
[[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]].  The use-case seems
to be solved in pandas using the dubiously named get_dummies
method
(http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.get_dummies.html).
I thought that it would be trivial to do with OneHotEncoder, but
it seems strangely difficult:
>
> In [23]: myarray = le.fit_transform([c for c in myguide])
>
> In [24]: myarray
> Out[24]: array([0, 1, 2, 3])
>
> In [27]: myarray = le.transform([[c for c in myguide],[c for c
in myguide],[c for c in myguide]])
>
> In [28]: myarray
> Out[28]:
> array([[0, 1, 2, 3],
>        [0, 1, 2, 3],
>        [0, 1, 2, 3]])
>
> In [29]: ohe.fit_transform(myarray)
> Out[29]:
> array([[ 1.,  1.,  1.,  1.],
>        [ 1.,  1.,  1.,  1.],
>        [ 1.,  1.,  1.,  1.]])   <— 
>
> So this is not at all what I expected.  I read the
documentation for OneHotEncoder
(http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder),
but did not find it clear how it worked (also I found the example
using integers confusing).  Neither FeatureHasher nor
DictVectorizer seem to be more appropriate for transforming
strings into positional OneHot encoded arrays.  Am I missing
something, or is this operation not supported in sklearn?
>
> Thanks,
>
> --
> Lee Zamparo
> ___
> scikit-learn mailing list
> scikit-learn@python.org 
> https://mail.python.org/mailman/listinfo/scikit-learn



Re: [scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-19 Thread Lee Zamparo
Hi Joel,

Yea, seems that the one-hot encoding of the transpose solves the issue.  As
you say, and as I mentioned to Sebastian, it seems a bit off-usage for
OneHotEncoder.

Thanks for the solution all the same though.

-- 
Lee Zamparo

On September 19, 2016 at 7:48:15 PM, Joel Nothman (joel.noth...@gmail.com)
wrote:

OneHotEncoder has issues, but I think all you want here is

ohe.fit_transform(np.transpose(le.fit_transform([c for c in myguide])))

Still, this seems like it is far from the intended use of OneHotEncoder
(which should not really be stacked with LabelEncoder), so it's not
surprising it's tricky.

On 20 September 2016 at 08:07, Sebastian Raschka 
wrote:

> Hi, Lee,
>
> maybe set `n_values=4`; this seems to do the job. I think the problem you
> encountered is due to the fact that the one-hot encoder infers the number
> of values for each feature (column) from the dataset. In your case, each
> column had only 1 unique value in your example
>
> > array([[0, 1, 2, 3],
> >        [0, 1, 2, 3],
> >        [0, 1, 2, 3]])
>
> If you had an array like
>
> > array([[0],
> >        [1],
> >        [2],
> >        [3]])
>
> it should work though. Alternatively, set n_values to 4:
>
>
> > >>> from sklearn.preprocessing import OneHotEncoder
> > >>> import numpy as np
> >
> > >>> enc = OneHotEncoder(n_values=4)
> > >>> X = np.array([[0, 1, 2, 3]])
> > >>> enc.fit_transform(X).toarray()
>
>
> array([[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
>  0.,  0.,  1.]])
>
> and
>
> > X2 = np.array([[0, 1, 2, 3],
> >                [0, 1, 2, 3],
> >                [0, 1, 2, 3]])
> >
> > enc.transform(X2).toarray()
>
>
>
> array([[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
>  0.,  0.,  1.],
>[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
>  0.,  0.,  1.],
>[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
>  0.,  0.,  1.]])
>
>
> Best,
> Sebastian
>
>
> > On Sep 19, 2016, at 5:45 PM, Lee Zamparo  wrote:
> >
> > Hi sklearners,
> >
> > A lab-mate came to me with a problem about encoding DNA sequences using
> preprocessing.OneHotEncoder, and I find it to produce confusing results.
> >
> > Suppose I have a DNA string:  myguide = ‘ACGT’
> >
> > He’d like to use OneHotEncoder to transform DNA strings, character by
> character, into a one hot encoded representation like this: [[1,0,0,0],
> [0,1,0,0], [0,0,1,0], [0,0,0,1]].  The use-case seems to be solved in
> pandas using the dubiously named get_dummies method (
> http://pandas.pydata.org/pandas-docs/version/0.13.1/
> generated/pandas.get_dummies.html).  I thought that it would be trivial
> to do with OneHotEncoder, but it seems strangely difficult:
> >
> > In [23]: myarray = le.fit_transform([c for c in myguide])
> >
> > In [24]: myarray
> > Out[24]: array([0, 1, 2, 3])
> >
> > In [27]: myarray = le.transform([[c for c in myguide],[c for c in
> myguide],[c for c in myguide]])
> >
> > In [28]: myarray
> > Out[28]:
> > array([[0, 1, 2, 3],
> >        [0, 1, 2, 3],
> >        [0, 1, 2, 3]])
> >
> > In [29]: ohe.fit_transform(myarray)
> > Out[29]:
> > array([[ 1.,  1.,  1.,  1.],
> >        [ 1.,  1.,  1.,  1.],
> >        [ 1.,  1.,  1.,  1.]])   <— 
> >
> > So this is not at all what I expected.  I read the documentation for
> OneHotEncoder (http://scikit-learn.org/stable/modules/generated/
> sklearn.preprocessing.OneHotEncoder.html#sklearn.
> preprocessing.OneHotEncoder), but did not find it clear how it worked
> (also I found the example using integers confusing).  Neither FeatureHasher
> nor DictVectorizer seem to be more appropriate for transforming strings
> into positional OneHot encoded arrays.  Am I missing something, or is this
> operation not supported in sklearn?
> >
> > Thanks,
> >
> > --
> > Lee Zamparo



Re: [scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-19 Thread Lee Zamparo
Hi Sebastian,

Great, thanks!

The docstring doesn’t make it very clear that the default `n_values='auto'`
infers the number of different values column-wise; maybe I could do a quick
PR to update it?  Or, maybe I could make your example into a, well, example
for the documentation online?

Alternatively, if you think this case is too off-usage for OneHotEncoder,
maybe doing nothing is the best course?

Thanks,

-- 
Lee Zamparo

On September 19, 2016 at 6:08:15 PM, Sebastian Raschka (se.rasc...@gmail.com)
wrote:

Hi, Lee,

maybe set `n_values=4`; this seems to do the job. I think the problem you
encountered is due to the fact that the one-hot encoder infers the number
of values for each feature (column) from the dataset. In your case, each
column had only 1 unique value in your example

> array([[0, 1, 2, 3],
> [0, 1, 2, 3],
> [0, 1, 2, 3]])

If you had an array like

> array([[0],
> [1],
> [2],
> [3]])

it should work though. Alternatively, set n_values to 4:


> >>> from sklearn.preprocessing import OneHotEncoder
> >>> import numpy as np
>
> >>> enc = OneHotEncoder(n_values=4)
> >>> X = np.array([[0, 1, 2, 3]])
> >>> enc.fit_transform(X).toarray()


array([[ 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0.,
0., 0., 1.]])

and

> X2 = np.array([[0, 1, 2, 3],
> [0, 1, 2, 3],
> [0, 1, 2, 3]])
>
> enc.transform(X2).toarray()



array([[ 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0.,
0., 0., 1.],
[ 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0.,
0., 0., 1.],
[ 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0.,
0., 0., 1.]])


Best,
Sebastian


> On Sep 19, 2016, at 5:45 PM, Lee Zamparo  wrote:
>
> Hi sklearners,
>
> A lab-mate came to me with a problem about encoding DNA sequences using
preprocessing.OneHotEncoder, and I find it to produce confusing results.
>
> Suppose I have a DNA string: myguide = ‘ACGT’
>
> He’d like to use OneHotEncoder to transform DNA strings, character by
character, into a one hot encoded representation like this: [[1,0,0,0],
[0,1,0,0], [0,0,1,0], [0,0,0,1]]. The use-case seems to be solved in pandas
using the dubiously named get_dummies method (
http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.get_dummies.html).
I thought that it would be trivial to do with OneHotEncoder, but it seems
strangely difficult:
>
> In [23]: myarray = le.fit_transform([c for c in myguide])
>
> In [24]: myarray
> Out[24]: array([0, 1, 2, 3])
>
> In [27]: myarray = le.transform([[c for c in myguide],[c for c in
myguide],[c for c in myguide]])
>
> In [28]: myarray
> Out[28]:
> array([[0, 1, 2, 3],
> [0, 1, 2, 3],
> [0, 1, 2, 3]])
>
> In [29]: ohe.fit_transform(myarray)
> Out[29]:
> array([[ 1., 1., 1., 1.],
> [ 1., 1., 1., 1.],
> [ 1., 1., 1., 1.]]) <— 
>
> So this is not at all what I expected. I read the documentation for
OneHotEncoder (
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder),
but did not find it clear how it worked (also I found the example using
integers confusing). Neither FeatureHasher nor DictVectorizer seem to be
more appropriate for transforming strings into positional OneHot encoded
arrays. Am I missing something, or is this operation not supported in
sklearn?
>
> Thanks,
>
> --
> Lee Zamparo



Re: [scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-19 Thread Joel Nothman
OneHotEncoder has issues, but I think all you want here is

ohe.fit_transform(np.transpose(le.fit_transform([c for c in myguide])))

Still, this seems like it is far from the intended use of OneHotEncoder
(which should not really be stacked with LabelEncoder), so it's not
surprising it's tricky.
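[Editor's note: `np.transpose` is a no-op on the 1-D array that
`le.fit_transform` returns, so reshaping to a column vector is likely what
is intended in the one-liner above. A sketch of the stacked
LabelEncoder/OneHotEncoder approach, using current-release defaults:]

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

myguide = 'ACGT'
le = LabelEncoder()
codes = le.fit_transform(list(myguide))  # array([0, 1, 2, 3]), shape (4,)

# codes.T would change nothing on a 1-D array; make it a column instead
ohe = OneHotEncoder()
onehot = ohe.fit_transform(codes.reshape(-1, 1)).toarray()
# one row per character, one column per base: the 4x4 identity here
```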

On 20 September 2016 at 08:07, Sebastian Raschka 
wrote:

> Hi, Lee,
>
> maybe set `n_values=4`; this seems to do the job. I think the problem you
> encountered is due to the fact that the one-hot encoder infers the number
> of values for each feature (column) from the dataset. In your case, each
> column had only 1 unique value in your example
>
> > array([[0, 1, 2, 3],
> >        [0, 1, 2, 3],
> >        [0, 1, 2, 3]])
>
> If you had an array like
>
> > array([[0],
> >        [1],
> >        [2],
> >        [3]])
>
> it should work though. Alternatively, set n_values to 4:
>
>
> > >>> from sklearn.preprocessing import OneHotEncoder
> > >>> import numpy as np
> >
> > >>> enc = OneHotEncoder(n_values=4)
> > >>> X = np.array([[0, 1, 2, 3]])
> > >>> enc.fit_transform(X).toarray()
>
>
> array([[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
>  0.,  0.,  1.]])
>
> and
>
> > X2 = np.array([[0, 1, 2, 3],
> >                [0, 1, 2, 3],
> >                [0, 1, 2, 3]])
> >
> > enc.transform(X2).toarray()
>
>
>
> array([[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
>  0.,  0.,  1.],
>[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
>  0.,  0.,  1.],
>[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
>  0.,  0.,  1.]])
>
>
> Best,
> Sebastian
>
>
> > On Sep 19, 2016, at 5:45 PM, Lee Zamparo  wrote:
> >
> > Hi sklearners,
> >
> > A lab-mate came to me with a problem about encoding DNA sequences using
> preprocessing.OneHotEncoder, and I find it to produce confusing results.
> >
> > Suppose I have a DNA string:  myguide = ‘ACGT’
> >
> > He’d like to use OneHotEncoder to transform DNA strings, character by
> character, into a one hot encoded representation like this: [[1,0,0,0],
> [0,1,0,0], [0,0,1,0], [0,0,0,1]].  The use-case seems to be solved in
> pandas using the dubiously named get_dummies method (
> http://pandas.pydata.org/pandas-docs/version/0.13.1/
> generated/pandas.get_dummies.html).  I thought that it would be trivial
> to do with OneHotEncoder, but it seems strangely difficult:
> >
> > In [23]: myarray = le.fit_transform([c for c in myguide])
> >
> > In [24]: myarray
> > Out[24]: array([0, 1, 2, 3])
> >
> > In [27]: myarray = le.transform([[c for c in myguide],[c for c in
> myguide],[c for c in myguide]])
> >
> > In [28]: myarray
> > Out[28]:
> > array([[0, 1, 2, 3],
> >        [0, 1, 2, 3],
> >        [0, 1, 2, 3]])
> >
> > In [29]: ohe.fit_transform(myarray)
> > Out[29]:
> > array([[ 1.,  1.,  1.,  1.],
> >        [ 1.,  1.,  1.,  1.],
> >        [ 1.,  1.,  1.,  1.]])   <— 
> >
> > So this is not at all what I expected.  I read the documentation for
> OneHotEncoder (http://scikit-learn.org/stable/modules/generated/
> sklearn.preprocessing.OneHotEncoder.html#sklearn.
> preprocessing.OneHotEncoder), but did not find it clear how it worked
> (also I found the example using integers confusing).  Neither FeatureHasher
> nor DictVectorizer seem to be more appropriate for transforming strings
> into positional OneHot encoded arrays.  Am I missing something, or is this
> operation not supported in sklearn?
> >
> > Thanks,
> >
> > --
> > Lee Zamparo


Re: [scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-19 Thread Sebastian Raschka
Hi, Lee,

maybe set `n_values=4`; this seems to do the job. I think the problem you
encountered is due to the fact that the one-hot encoder infers the number of
values for each feature (column) from the dataset. In your case, each column
had only 1 unique value in your example

> array([[0, 1, 2, 3],
>        [0, 1, 2, 3],
>        [0, 1, 2, 3]])

If you had an array like

> array([[0],
>        [1],
>        [2],
>        [3]])

it should work though. Alternatively, set n_values to 4:


> >>> from sklearn.preprocessing import OneHotEncoder
> >>> import numpy as np
> 
> >>> enc = OneHotEncoder(n_values=4)
> >>> X = np.array([[0, 1, 2, 3]])
> >>> enc.fit_transform(X).toarray()


array([[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
 0.,  0.,  1.]])

and 

> X2 = np.array([[0, 1, 2, 3],
>                [0, 1, 2, 3],
>                [0, 1, 2, 3]])
> 
> enc.transform(X2).toarray()



array([[ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
         0.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
         0.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,
         0.,  0.,  1.]])
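[Editor's note: the single-column case above can be sanity-checked directly.
A sketch using current defaults, where categories are inferred per column
rather than via the old `n_values` parameter:]

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
col = np.array([[0], [1], [2], [3]])  # one feature, four distinct values
onehot = enc.fit_transform(col).toarray()
# four inferred categories in one column -> the 4x4 identity
```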


Best,
Sebastian


> On Sep 19, 2016, at 5:45 PM, Lee Zamparo  wrote:
> 
> Hi sklearners,
> 
> A lab-mate came to me with a problem about encoding DNA sequences using 
> preprocessing.OneHotEncoder, and I find it to produce confusing results.
> 
> Suppose I have a DNA string:  myguide = ‘ACGT’
> 
> He’d like to use OneHotEncoder to transform DNA strings, character by character, 
> into a one hot encoded representation like this: [[1,0,0,0], [0,1,0,0], 
> [0,0,1,0], [0,0,0,1]].  The use-case seems to be solved in pandas using the 
> dubiously named get_dummies method 
> (http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.get_dummies.html).
>   I thought that it would be trivial to do with OneHotEncoder, but it seems 
> strangely difficult:
> 
> In [23]: myarray = le.fit_transform([c for c in myguide])
> 
> In [24]: myarray
> Out[24]: array([0, 1, 2, 3])
> 
> In [27]: myarray = le.transform([[c for c in myguide],[c for c in myguide],[c 
> for c in myguide]])
> 
> In [28]: myarray
> Out[28]:
> array([[0, 1, 2, 3],
>        [0, 1, 2, 3],
>        [0, 1, 2, 3]])
> 
> In [29]: ohe.fit_transform(myarray)
> Out[29]:
> array([[ 1.,  1.,  1.,  1.],
>        [ 1.,  1.,  1.,  1.],
>        [ 1.,  1.,  1.,  1.]])   <— 
> 
> So this is not at all what I expected.  I read the documentation for 
> OneHotEncoder 
> (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder),
>  but did not find it clear how it worked (also I found the example using 
> integers confusing).  Neither FeatureHasher nor DictVectorizer seem to be 
> more appropriate for transforming strings into positional OneHot encoded 
> arrays.  Am I missing something, or is this operation not supported in 
> sklearn?
> 
> Thanks,
> 
> -- 
> Lee Zamparo



[scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-19 Thread Lee Zamparo
Hi sklearners,

A lab-mate came to me with a problem about encoding DNA sequences using
preprocessing.OneHotEncoder, and I find it to produce confusing results.

Suppose I have a DNA string:  myguide = ‘ACGT’

He’d like to use OneHotEncoder to transform DNA strings, character by
character, into a one hot encoded representation like this: [[1,0,0,0],
[0,1,0,0], [0,0,1,0], [0,0,0,1]].  The use-case seems to be solved in
pandas using the dubiously named get_dummies method (
http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.get_dummies.html).
I thought that it would be trivial to do with OneHotEncoder, but it seems
strangely difficult:

In [23]: myarray = le.fit_transform([c for c in myguide])

In [24]: myarray
Out[24]: array([0, 1, 2, 3])

In [27]: myarray = le.transform([[c for c in myguide],[c for c in
myguide],[c for c in myguide]])

In [28]: myarray
Out[28]:
array([[0, 1, 2, 3],
       [0, 1, 2, 3],
       [0, 1, 2, 3]])

In [29]: ohe.fit_transform(myarray)
Out[29]:
array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])   <— 

So this is not at all what I expected.  I read the documentation for
OneHotEncoder (
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder),
but did not find it clear how it worked (also I found the example using
integers confusing).  Neither FeatureHasher nor DictVectorizer seem to be
more appropriate for transforming strings into positional OneHot encoded
arrays.  Am I missing something, or is this operation not supported in
sklearn?
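[Editor's note: the all-ones output is reproducible because each column of
the 3x4 array holds a single distinct value, so every inferred indicator
column is always hot; pandas' get_dummies instead treats the characters as
one categorical sequence. A sketch, assuming recent scikit-learn and pandas
releases:]

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X = np.array([[0, 1, 2, 3]] * 3)
# per-column inference finds one category per column: one always-on
# indicator each, hence a (3, 4) array of ones
ones = OneHotEncoder().fit_transform(X).toarray()

# get_dummies encodes the sequence of characters directly
dummies = pd.get_dummies(list('ACGT'))  # columns A, C, G, T
```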

Thanks,

-- 
Lee Zamparo