If I use the n+1 approach (n_values = highest value + 1 for each column), then I get the correct matrix, except with extra columns of zeros:
>>> test
array([[0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 1.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0.]])

On Mon, Feb 5, 2018 at 12:25 AM, Sarah Wait Zaranek <sarah.zara...@gmail.com> wrote:

> Hi Joel -
>
> Conceptually, that makes sense. But when I assign n_values, I can't make
> it match the result when you don't specify them. See below. I used the
> number of unique levels per column.
>
> >>> enc = OneHotEncoder(sparse=False)
> >>> test = enc.fit_transform([[7, 0, 3], [1, 2, 0], [0, 2, 1], [1, 0, 2]])
> >>> test
> array([[0., 0., 1., 1., 0., 0., 0., 0., 1.],
>        [0., 1., 0., 0., 1., 1., 0., 0., 0.],
>        [1., 0., 0., 0., 1., 0., 1., 0., 0.],
>        [0., 1., 0., 1., 0., 0., 0., 1., 0.]])
> >>> enc = OneHotEncoder(sparse=False,n_values=[3,2,4])
> >>> test = enc.fit_transform([[7, 0, 3], [1, 2, 0], [0, 2, 1], [1, 0, 2]])
> >>> test
> array([[0., 0., 0., 1., 0., 0., 0., 1., 1.],
>        [0., 1., 0., 0., 0., 2., 0., 0., 0.],
>        [1., 0., 0., 0., 0., 1., 1., 0., 0.],
>        [0., 1., 0., 1., 0., 0., 0., 1., 0.]])
>
> Cheers,
> Sarah
>
> On Mon, Feb 5, 2018 at 12:02 AM, Joel Nothman <joel.noth...@gmail.com>
> wrote:
>
>> If each input column is encoded as a value from 0 to the (number of
>> possible values for that column - 1) then n_values for that column should
>> be the highest value + 1, which is also the number of levels per column.
>> Does that make sense?
>>
>> Actually, I've realised there's a somewhat slow and unnecessary bit of
>> code in the one-hot encoder: where the COO matrix is converted to CSR. I
>> suspect this was done because most of our ML algorithms perform better on
>> CSR, or else to maintain backwards compatibility with an earlier
>> implementation.
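
The arithmetic discussed in this thread can be sketched in plain Python. This is a minimal illustration, not scikit-learn's actual code: the helper one_hot and the variable names are hypothetical, and it assumes each input column gets a block of n_values[j] output columns placed at a cumulative offset. Under that assumption it reproduces all three outputs quoted above: the 15-column matrix for n_values = highest value + 1, the 9-column default result once the all-zero columns are dropped, and the odd matrix containing a 2 for n_values=[3,2,4].

```python
def one_hot(rows, n_values):
    """Dense one-hot encoding of integer rows (hypothetical helper).

    Input column j is given n_values[j] output columns, starting at the
    cumulative offset of the preceding blocks. No range checking is done,
    so a value >= n_values[j] spills into a neighbouring block.
    """
    offsets = [0]
    for n in n_values[:-1]:
        offsets.append(offsets[-1] + n)
    width = sum(n_values)

    encoded_rows = []
    for row in rows:
        encoded = [0.0] * width
        for value, offset in zip(row, offsets):
            # += rather than = so that colliding indices add up
            encoded[offset + value] += 1.0
        encoded_rows.append(encoded)
    return encoded_rows


X = [[7, 0, 3], [1, 2, 0], [0, 2, 1], [1, 0, 2]]

# "highest value + 1" per column: 8 + 3 + 4 = 15 output columns
n_values = [max(col) + 1 for col in zip(*X)]
full = one_hot(X, n_values)

# Keep only columns that are non-zero somewhere; 9 columns remain
used = [j for j in range(len(full[0])) if any(r[j] for r in full)]
compact = [[r[j] for j in used] for r in full]

# Number of unique levels per column is too small for the value 7 in
# column 0, so indices from different blocks overlap and collide
too_small = one_hot(X, [3, 2, 4])
```

Here n_values comes out as [8, 3, 4], and the columns that never fire (values 2 to 6 in the first input column, value 1 in the second) are exactly the zero columns in the 15-column result. With [3, 2, 4], the second row maps both its second and third entries to output index 5, which would explain the 2 in that matrix.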
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn