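See the fitted encoder's active_features_ attribute (enc1.active_features_
in your script). I agree it's not clearly documented (PR welcome), but
'auto' encodes precisely those feature values seen in training (still
requiring non-negative integers), not merely their range. Values 0 and 1 of
the second feature never occur in your training data, so with 'auto' they
get no output column, which is why [[0, 1]] and [[0, 0]] come out the same.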
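
To make that concrete, here is a rough pure-Python sketch of the behaviour
as I understand it. It is not the scikit-learn implementation: fit_auto and
transform are made-up names for illustration, and it only assumes that
'auto' first lays out one column per value in each feature's range and then
keeps just the columns that were hit during fit.

```python
def fit_auto(rows):
    """Hypothetical helper: learn the column layout from training rows."""
    n_features = len(rows[0])
    # Size of each feature's value range: largest value seen + 1.
    n_values = [max(row[j] for row in rows) + 1 for j in range(n_features)]
    # Starting column of each feature's block in the full encoding.
    offsets = [0]
    for n in n_values:
        offsets.append(offsets[-1] + n)
    # Columns actually hit by a training value (the "active" ones).
    active = sorted({offsets[j] + row[j]
                     for row in rows for j in range(n_features)})
    return n_values, offsets, active


def transform(rows, n_values, offsets, active=None):
    """Hypothetical helper: one-hot encode rows.

    With active=None every column in the range is kept (explicit n_values);
    otherwise only the active columns are kept ('auto').
    """
    out = []
    for row in rows:
        full = [0.0] * offsets[-1]
        for j, v in enumerate(row):
            if not 0 <= v < n_values[j]:
                raise ValueError("feature value out of range")
            full[offsets[j] + v] = 1.0
        out.append(full if active is None else [full[c] for c in active])
    return out


data = [[1, 2], [0, 2]]
n_values, offsets, active = fit_auto(data)
print("n_values:", n_values)
print("active columns:", active)
for t in ([[0, 1]], [[0, 0]]):
    print(t, "auto ->", transform(t, n_values, offsets, active))
    print(t, "explicit ->", transform(t, n_values, offsets))
```

Under those assumptions the sketch gives the same vectors as your output:
both test rows collapse to three columns with 'auto', and stay distinct
with all five columns when n_values is given explicitly.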
On Wed, Jul 24, 2013 at 4:01 AM, Scott Alfeld <alf...@cs.utah.edu> wrote:
> Hi,
> When I fit a OneHotEncoder, it sometimes encodes different values to the
> same new vector, depending on whether n_values is explicit or 'auto'.
>
> I wrote a brief script to demonstrate the issue, below.
> Note that [[0, 1]] is either getting encoded as the same thing as [[0,
> 0]], or as a different thing, depending on whether or not n_values is
> specified directly or as 'auto'.
> Either way (using 'auto' or specifying n_values), it says that n_values_
> is [2, 3], but I get different behavior when encoding.
>
> Is this a bug, or am I misunderstanding n_values?
> Thanks, and please find my code and corresponding output below.
> ~SA
>
> <code>
> import numpy as np
> import sklearn
> from sklearn.preprocessing import OneHotEncoder as ohe
> print "Scitkit-learn Version:", sklearn.__version__
>
> data = np.array([[1, 2],
>                  [0, 2]])
> enc1 = ohe('auto')
> print "data:\n", data
> enc1.fit(data)
> t1 = np.array([[0, 1]])
> t2 = np.array([[0, 0]])
> print "enc1.n_values_:", enc1.n_values_
> print t1, "->", enc1.transform(t1).toarray()
> print t2, "->", enc1.transform(t2).toarray()
>
> enc2 = ohe([2, 3])
> enc2.fit(data)
> print "Now with specifying n_values directly:"
> print "enc2.n_values_:", enc2.n_values_
> print t1, "->", enc2.transform(t1).toarray()
> print t2, "->", enc2.transform(t2).toarray()
> </code>
>
>
> <output>
> Scitkit-learn Version: 0.13.1
> data:
> [[1 2]
>  [0 2]]
> enc1.n_values_: [2 3]
> [[0 1]] -> [[ 1. 0. 0.]]
> [[0 0]] -> [[ 1. 0. 0.]]
> Now with specifying n_values directly:
> enc2.n_values_: [2 3]
> [[0 1]] -> [[ 1. 0. 0. 1. 0.]]
> [[0 0]] -> [[ 1. 0. 1. 0. 0.]]
> </output>
>
>
> ------------------------------------------------------------------------------
> See everything from the browser to the database with AppDynamics
> Get end-to-end visibility with application monitoring from AppDynamics
> Isolate bottlenecks and diagnose root cause in seconds.
> Start your free trial of AppDynamics Pro today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>