I thought you just wanted to mask some features, but I guess that was
not what you intended.
You could make your code robust to future changes by using the
feature_indices_ attribute,
while assuming that the result has all categorical features first, followed
by all numerical ones.
Btw, you might have an easier
Well, after a bit of tinkering, it seems that OneHotEncoder has simple rules
for assigning columns to the output:
1) first the categorical features, in the order given by the argument, creating
one column per distinct value
2) then the numerical features
So a piece of code like this seems to work:
fn = []
fc =
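
To illustrate the two rules listed above, here is a minimal pure-Python sketch. It is not the poster's code and it does not call OneHotEncoder; it only mimics the column layout as described in this thread. The helper name expanded_column_names, the "name=value" label format, and the sample data are made up for the example, and ordering the values of each categorical feature in ascending order is an assumption.

```python
def expanded_column_names(rows, col_names, categorical):
    """Build output column labels following the two rules described above.

    rows        -- list of rows, each a list of values (hypothetical sample data)
    col_names   -- one name per input column
    categorical -- indices of the categorical columns, in the order given
    """
    names = []
    # Rule 1: categorical features first, in the order given by the argument,
    # with one output column per distinct value (ascending order is assumed).
    for j in categorical:
        values = sorted({row[j] for row in rows})
        names.extend("%s=%s" % (col_names[j], v) for v in values)
    # Rule 2: then the remaining (numerical) features, in their original order.
    cat = set(categorical)
    names.extend(col_names[j] for j in range(len(col_names)) if j not in cat)
    return names


rows = [
    [3, 0.5, 10],
    [1, 1.5, 20],
    [3, 2.5, 10],
]
col_names = ["user_id", "score", "item_id"]
names = expanded_column_names(rows, col_names, categorical=[2, 0])
print(names)
```

A list built this way can be lined up with the coefficients of the fitted model, provided the encoder really lays out its columns as described; that is worth checking against the actual transformed array.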
2015-03-05 16:57 GMT+01:00 Andy t3k...@gmail.com:
Well, the columns after the OneHotEncoder correspond to feature values,
not feature names, right?
Well, for the categorical ones this is right, except that not all my
features are categorical (hence the categorical_features=...) and they are
Hi list,
I have an X (np.array) with some columns containing ids. I also have a list
of column names. Then I want to transform the relevant columns to be used
by a logistic regression model using OneHotEncoder:
X = np.loadtxt(...) # from a CSV
col_names = ... # from CSV header
e =