It's determined by the sorted order of the unique values in that column (you can 
inspect it via ohe.categories_ after fitting), not by the order in which they 
occur in the training set. E.g.,

from sklearn.preprocessing import OneHotEncoder
import numpy as np

x = np.array([['b'],
              ['a'], 
              ['b']])
ohe = OneHotEncoder()
xt = ohe.fit_transform(x)
xt.todense()

matrix([[0., 1.],
        [1., 0.],
        [0., 1.]])


and

x = np.array([['a'],
              ['b'], 
              ['a']])
ohe = OneHotEncoder()
xt = ohe.fit_transform(x)
xt.todense()

matrix([[1., 0.],
        [0., 1.],
        [1., 0.]])

I'm not sure how you applied the OHE, but you also want to make sure you only 
use it on columns that are indeed categorical; otherwise numeric columns get 
one-hot encoded as well, e.g., note the following behavior:

x = np.array([['a', 1.1],
              ['b', 1.2], 
              ['a', 1.3]])
ohe = OneHotEncoder()
xt = ohe.fit_transform(x)
xt.todense()

matrix([[1., 0., 1., 0., 0.],
        [0., 1., 0., 1., 0.],
        [1., 0., 0., 0., 1.]])


Best,
Sebastian

> On Jan 8, 2019, at 9:33 AM, pisymbol <pisym...@gmail.com> wrote:
> 
> Also Sebastian, I have binary classes but they are strings:
> 
> clf.classes_:
> array(['American', 'Southwest'], dtype=object)
> 
> 
> 
> On Tue, Jan 8, 2019 at 9:51 AM pisymbol <pisym...@gmail.com> wrote:
> If that is the case, what order are the coefficients in then?
> 
> -aps
> 
> On Tue, Jan 8, 2019 at 12:48 AM Sebastian Raschka <m...@sebastianraschka.com> 
> wrote:
> E.g., if you have a feature with values 'a', 'b', 'c', then applying the one 
> hot encoder will transform this into 3 features.
> 
> Best,
> Sebastian
> 
> > On Jan 7, 2019, at 11:02 PM, pisymbol <pisym...@gmail.com> wrote:
> > 
> > 
> > 
> > On Mon, Jan 7, 2019 at 11:50 PM pisymbol <pisym...@gmail.com> wrote:
> > According to the doc (0.20.2) the coef_ variables are supposed to be shape 
> > (1, n_features) for binary classification. Well I created a Pipeline and 
> > performed a GridSearchCV to create a LogisticRegresion model that does 
> > fairly well. However, when I want to rank feature importance I noticed that 
> > my coefs_ for my best_estimator_ has 24 entries while my training data has 
> > 22.
> > 
> > What am I missing? How could coef_ > n_features?
> > 
> > 
> > Just a follow-up, I am using a OneHotEncoder to encode two categoricals as 
> > part of my pipeline (I am also using an imputer/standard scaler too but I 
> > don't see how that could add features).
> > 
> > Could my pipeline actually add two more features during fitting?
> > 
> > -aps
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn@python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> 
