Re: [scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-21 Thread Andreas Mueller
Yeah, the input format is a bit odd; usually it should be n_samples x n_features, so something like [['A'], ['C'], ['T'], ['G']]. Though this is currently also hard to do :( On 09/20/2016 05:50 AM, Lee Zamparo wrote: …
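[Editor's note: a minimal runnable sketch of the samples-by-features layout suggested above, assuming scikit-learn >= 0.20, where OneHotEncoder accepts string categories directly (it did not at the time of this thread):]

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder

    # One feature (the nucleotide) with four samples:
    # shape (n_samples, n_features) = (4, 1)
    X = np.array([['A'], ['C'], ['T'], ['G']])

    enc = OneHotEncoder()                  # string input is fine in >= 0.20
    print(enc.fit_transform(X).toarray())  # 4 x 4 indicator matrix
    print(enc.categories_)                 # the learned categories per feature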

Re: [scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-19 Thread Lee Zamparo
Hi Joel, yeah, it seems that one-hot encoding the transpose solves the issue. As you say, and as I mentioned to Sebastian, this seems a bit outside the intended usage of OneHotEncoder. Thanks for the solution all the same. -- Lee Zamparo On September 19, 2016 at 7:48:15 PM, Joel Nothman (joel.noth…

Re: [scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-19 Thread Lee Zamparo
Hi Sebastian, Great, thanks! The docstring doesn't make it very clear that the default n_values='auto' infers the number of distinct values column-wise; maybe I could do a quick PR to update it? Or maybe I could turn your example into a, well, example for the online documentation? Alte…
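[Editor's note: a minimal sketch of what the 'auto' inference means, written against the pre-0.20 API with n_values that this thread discusses (the parameter was later removed):]

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder

    X = np.array([[0], [1], [2], [3]])     # one integer feature, four samples

    auto = OneHotEncoder(n_values='auto')  # infers 4 values from this column
    fixed = OneHotEncoder(n_values=4)      # number of values stated explicitly
    print(auto.fit_transform(X).toarray())   # both give the same 4 x 4 result here
    print(fixed.fit_transform(X).toarray())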

Re: [scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-19 Thread Joel Nothman
OneHotEncoder has issues, but I think all you want here is ohe.fit_transform(np.transpose(le.fit_transform([c for c in myguide]))). Still, this seems far from the intended use of OneHotEncoder (which should not really be stacked with LabelEncoder), so it's not surprising that it's tricky. On…
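[Editor's note: a self-contained sketch of that one-liner; the value of `myguide` is hypothetical and stands in for the poster's sequence, and reshape(-1, 1) plays the role of the transpose, turning the label-encoded sequence into a samples-by-one-feature column:]

    import numpy as np
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder

    myguide = "ACTGA"                        # hypothetical input sequence
    le = LabelEncoder()
    codes = le.fit_transform(list(myguide))  # array([0, 1, 3, 2, 0]) for this input

    ohe = OneHotEncoder()
    onehot = ohe.fit_transform(codes.reshape(-1, 1))
    print(onehot.toarray())                  # one row per position, one column per base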

Re: [scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-19 Thread Sebastian Raschka
Hi Lee, maybe set `n_values=4`; this seems to do the job. I think the problem you encountered is that the one-hot encoder infers the number of values for each feature (column) from the dataset. In your case, each column had only 1 unique value in your example > array([[0, 1,…
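[Editor's note: a sketch of both behaviours under the pre-0.20 API that this thread targets (n_values has since been removed). With n_values='auto', only values that actually occur in a column get an indicator, so a single-row input collapses to one column per feature; n_values=4 reserves all four indicator columns per feature:]

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder

    row = np.array([[0, 1, 3, 2]])           # one sample, four features

    print(OneHotEncoder(n_values='auto').fit_transform(row).toarray())  # shape (1, 4)
    print(OneHotEncoder(n_values=4).fit_transform(row).toarray())       # shape (1, 16)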