I thought you just wanted to mask some features, but I guess that was
not what you intended.
You could make your code robust to future changes by using the
feature_indices_ attribute,
while assuming that the result first has all the categorical values and
then all the numerical ones.
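To make that bookkeeping concrete, here is a rough sketch in plain Python.
It is not something shipped with scikit-learn: the helper name
encoded_column_names and the toy values are made up for illustration. In
real code the two integer lists would come from e.feature_indices_ and,
with the default n_values='auto', from e.active_features_. It assumes the
layout described above (one-hot columns first, numerical columns after)
and that the ranges in feature_indices_ cover only the categorical columns.

```python
from bisect import bisect_right


def encoded_column_names(col_names, categorical_idx, feature_indices,
                         active_features):
    """Name each column of the encoded matrix (hypothetical helper).

    Assumes one-hot columns for the categorical features come first,
    followed by the numerical columns copied over in their original order.
    """
    categorical_idx = sorted(categorical_idx)
    cat_names = [col_names[i] for i in categorical_idx]
    names = []
    for a in active_features:
        # Find the categorical feature whose range [start, end) contains a.
        k = bisect_right(feature_indices, a) - 1
        value = a - feature_indices[k]
        names.append("%s=%d" % (cat_names[k], value))
    # Numerical columns are assumed to be stacked to the right, unchanged.
    names.extend(n for i, n in enumerate(col_names)
                 if i not in categorical_idx)
    return names


# Toy values standing in for e.feature_indices_ and e.active_features_.
col_names = ["user_id", "price", "country_id"]
names = encoded_column_names(
    col_names,
    categorical_idx=[0, 2],
    feature_indices=[0, 3, 5],       # user_id -> [0, 3), country_id -> [3, 5)
    active_features=[0, 2, 3, 4],    # user_id value 1 never occurs
)
print(names)
```

If the encoder's layout ever changes, only this one helper needs to be
revisited.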
Btw, you might have an easier time using pandas dummy variables instead
of using the one hot encoder.
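A minimal sketch of the pandas route, assuming the data fits in a
DataFrame. The column names and values here are invented for illustration.
pandas.get_dummies names the new columns itself, so no index bookkeeping is
needed. Note that it puts the untouched columns first and the dummy columns
after them.

```python
import pandas as pd

# Toy data: two id-like columns and one numerical column.
df = pd.DataFrame({
    "user_id": [0, 2, 0],
    "price": [9.5, 3.0, 7.25],
    "country_id": [1, 0, 1],
})

# Only the listed columns are expanded; "price" is passed through as-is.
encoded = pd.get_dummies(df, columns=["user_id", "country_id"])
print(list(encoded.columns))
```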
On 03/06/2015 03:01 AM, Eustache DIEMERT wrote:
2015-03-05 16:57 GMT+01:00 Andy <t3k...@gmail.com
<mailto:t3k...@gmail.com>>:
Well, the columns after the OneHotEncoder correspond to feature
values, not feature names, right?
Well, for the categorical ones this is right, except that not all my
features are categorical (hence the categorical_features=...) and they
are intertwined.
So my problem is more to keep track of which categorical features got
projected into which columns (1->N) and which numerical ones have been
just copied and where (1->1).
Re-reading your answer, I'm wondering whether you are suggesting that I
simply separate the input columns by feature type and apply the encoder
to the categorical ones only?
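For what it's worth, here is a small numpy-only sketch of that
split-then-encode idea. It does not use OneHotEncoder at all: the one-hot
step is done by hand with np.unique, and the function name
split_and_encode and the toy data are made up for illustration. The point
is only that names can be collected at the moment each block is built.

```python
import numpy as np


def split_and_encode(X, col_names, categorical_idx):
    """One-hot encode the categorical columns, pass the rest through.

    Hypothetical helper: returns the encoded matrix (categorical blocks
    first, numerical columns last) and a matching list of column names.
    """
    categorical_idx = sorted(categorical_idx)
    numerical_idx = [i for i in range(X.shape[1])
                     if i not in categorical_idx]
    blocks, names = [], []
    for i in categorical_idx:
        # codes[r] is the position of X[r, i] among the sorted unique values.
        values, codes = np.unique(X[:, i], return_inverse=True)
        blocks.append(np.eye(len(values))[codes])
        names.extend("%s=%g" % (col_names[i], v) for v in values)
    blocks.append(X[:, numerical_idx])
    names.extend(col_names[i] for i in numerical_idx)
    return np.hstack(blocks), names


X = np.array([[0, 9.5, 1],
              [2, 3.0, 0],
              [0, 7.25, 1]])
col_names = ["user_id", "price", "country_id"]
Xenc, names = split_and_encode(X, col_names, categorical_idx=[0, 2])
print(names)
```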
There is ``feature_indices_`` which maps each feature to a range
of features in the encoded matrix.
The features in the input matrix don't really have names in
scikit-learn, as they are represented only as numpy matrices.
So you need to keep track of the indices of each feature. That
shouldn't be too hard, though.
Why don't you select the features before the encoding? Or do you
want to exclude some values?
On 03/05/2015 05:55 AM, Eustache DIEMERT wrote:
Hi list,
I have an X (np.array) with some columns containing ids. I also
have a list of column names. I want to transform the relevant
columns with OneHotEncoder so that they can be used by a logistic
regression model:
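>>> import numpy as np
>>> from sklearn.preprocessing import OneHotEncoder
>>> X = np.loadtxt(...) # from a CSV
>>> col_names = ... # from CSV header
>>> id_columns = ... # indices of the id columns in X
>>> e = OneHotEncoder(categorical_features=id_columns)
>>> Xprime = e.fit_transform(X)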
But then I don't know how to deduce the names of the columns in
the new matrix :(
Ideally I would want the same as DictVectorizer which has a
feature_names_ member.
Has anyone already had this problem?
Eustache
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
<mailto:Scikit-learn-general@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general