Hi,
When I fit a OneHotEncoder, it sometimes encodes different values to the
same output vector, depending on whether n_values is explicit or 'auto'.
I wrote a brief script to demonstrate the issue, below.
Note that [[0, 1]] is encoded either identically to [[0, 0]] or differently
from it, depending on whether n_values is specified directly or left as
'auto'.
In both cases the fitted encoder reports n_values_ as [2, 3], yet transform()
returns 3 columns with 'auto' and 5 columns with an explicit n_values.
Is this a bug, or am I misunderstanding n_values?
Thanks, and please find my code and corresponding output below.
~SA
<code>
import numpy as np
import sklearn
from sklearn.preprocessing import OneHotEncoder as ohe
print "Scitkit-learn Version:", sklearn.__version__
data = np.array([[1, 2],
                 [0, 2]])
enc1 = ohe('auto')
print "data:\n", data
enc1.fit(data)
t1 = np.array([[0, 1]])
t2 = np.array([[0, 0]])
print "enc1.n_values_:", enc1.n_values_
print t1, "->", enc1.transform(t1).toarray()
print t2, "->", enc1.transform(t2).toarray()
enc2 = ohe([2, 3])
enc2.fit(data)
print "Now with specifying n_values directly:"
print "enc2.n_values_:", enc2.n_values_
print t1, "->", enc2.transform(t1).toarray()
print t2, "->", enc2.transform(t2).toarray()
</code>
<output>
Scitkit-learn Version: 0.13.1
data:
[[1 2]
 [0 2]]
enc1.n_values_: [2 3]
[[0 1]] -> [[ 1. 0. 0.]]
[[0 0]] -> [[ 1. 0. 0.]]
Now with specifying n_values directly:
enc2.n_values_: [2 3]
[[0 1]] -> [[ 1. 0. 0. 1. 0.]]
[[0 0]] -> [[ 1. 0. 1. 0. 0.]]
</output>
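For reference, here is a minimal NumPy sketch of the encoding I expected in
both cases: one block of columns per feature, sized by n_values_, with no
columns dropped. It only illustrates my expectation and is not scikit-learn's
implementation; manual_one_hot is a hypothetical helper made up for this
example.

```python
import numpy as np

def manual_one_hot(rows, n_values):
    """Hypothetical helper: full one-hot encoding, one column block per feature."""
    rows = np.asarray(rows)
    n_values = np.asarray(n_values)
    # Column at which each feature's block starts, e.g. [0, 2] for n_values [2, 3].
    offsets = np.concatenate(([0], np.cumsum(n_values)[:-1]))
    out = np.zeros((rows.shape[0], int(n_values.sum())))
    row_idx = np.arange(rows.shape[0])
    for j in range(rows.shape[1]):
        # Set the column for feature j's value in every row.
        out[row_idx, offsets[j] + rows[:, j]] = 1.0
    return out

encoded_t1 = manual_one_hot([[0, 1]], [2, 3])
encoded_t2 = manual_one_hot([[0, 0]], [2, 3])
```

With n_values [2, 3], encoded_t1 and encoded_t2 are the same 5-column vectors
that enc2 produces above, so [[0, 1]] and [[0, 0]] stay distinct.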
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general