I thought you just wanted to mask some features, but I guess that was not your intent. You could make your code robust to future changes by using the feature_indices_ attribute, while assuming that the output contains all the categorical columns first and then all the numerical ones. Btw, you might have an easier time using pandas dummy variables instead of the OneHotEncoder.
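For the pandas route, here is a minimal sketch. It assumes the CSV has been loaded into a DataFrame, and the column names and values are hypothetical. `pd.get_dummies` names each generated column `<column>_<value>` and passes the columns it does not expand through unchanged, placing them first, so the names come for free:

```python
import pandas as pd

# Hypothetical data standing in for the CSV: two id columns and one numeric column.
df = pd.DataFrame({
    "city": ["paris", "london", "paris"],
    "age": [31, 45, 27],
    "user_id": [7, 3, 7],
})

# Only the listed columns are expanded; "age" is passed through unchanged.
dummies = pd.get_dummies(df, columns=["city", "user_id"])

# Each output column name records its source feature and value.
print(list(dummies.columns))
# ['age', 'city_london', 'city_paris', 'user_id_3', 'user_id_7']

# Plain float matrix that can be fed to a scikit-learn estimator.
X_encoded = dummies.to_numpy(dtype=float)
```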

On 03/06/2015 03:01 AM, Eustache DIEMERT wrote:

2015-03-05 16:57 GMT+01:00 Andy <t3k...@gmail.com>:

    Well, the columns after the OneHotEncoder correspond to feature
    values, not feature names, right?


Well, for the categorical ones this is right, except that not all my features are categorical (hence the categorical_features=...) and they are intertwined.

So my problem is more about keeping track of which categorical features got projected into which columns (1->N), and which numerical ones were just copied, and where (1->1).

Re-reading your answer, I'm wondering whether you are suggesting that I just separate the input columns by feature type and apply the encoder to the categorical ones only?
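If separating the columns by type is the way to go, the split and the encoding can also be done by hand in NumPy, keeping a name for every output column along the way. Below is a minimal sketch. `encode_with_names`, the column names, and the data are hypothetical. The output layout (categorical blocks first, then the copied numerical columns) is chosen to match the layout described at the top of this thread, and the `name=value` naming is borrowed from DictVectorizer:

```python
import numpy as np


def encode_with_names(X, col_names, categorical):
    """One-hot encode the `categorical` column indices of X; copy the rest.

    Hypothetical helper. Returns the encoded matrix and one name per output
    column: "name=value" for the expanded (1 -> N) columns and the original
    name for the copied (1 -> 1) ones. Categorical blocks come first.
    """
    categorical = list(categorical)
    numerical = [j for j in range(X.shape[1]) if j not in categorical]
    blocks, names = [], []
    for j in categorical:
        values = np.unique(X[:, j])  # sorted distinct values of column j
        # One indicator column per distinct value (1 -> N).
        blocks.append((X[:, j][:, None] == values[None, :]).astype(float))
        names.extend(f"{col_names[j]}={v:g}" for v in values)
    # Numerical columns are copied unchanged (1 -> 1), after the categorical blocks.
    blocks.append(X[:, numerical].astype(float))
    names.extend(col_names[j] for j in numerical)
    return np.hstack(blocks), names


# Hypothetical data: ids in columns 0 and 2 (intertwined), a numeric value in column 1.
X = np.array([[7.0, 31.0, 1.0],
              [3.0, 45.0, 2.0],
              [7.0, 27.0, 1.0]])
Xprime, names = encode_with_names(X, ["user_id", "age", "shop_id"], categorical=[0, 2])
print(names)  # ['user_id=3', 'user_id=7', 'shop_id=1', 'shop_id=2', 'age']
```

Because the names are built at the same time as the columns, they cannot drift out of sync with the matrix, even when categorical and numerical columns are interleaved in the input.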


    There is ``feature_indices_`` which maps each feature to a range
    of features in the encoded matrix.
    The features in the input matrix don't really have names in
    scikit-learn, as they are represented only as numpy matrices.
    So you need to keep track of the indices of each feature. That
    shouldn't be too hard, though.
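Here is a minimal sketch of using those ranges. It works on an array laid out like `feature_indices_`, but the offsets and feature names below are made up. Feature `i` occupies encoded columns `feature_indices[i]` up to, but not including, `feature_indices[i + 1]`, so a binary search recovers the source feature of any encoded column. The sketch assumes no encoded columns were dropped afterwards. With `n_values='auto'`, the encoder also masked the output down to `active_features_`, and that mask would have to be mapped through first:

```python
import numpy as np

# Made-up stand-in for encoder.feature_indices_: 3 categorical inputs with
# 3, 2 and 4 possible values, so 9 encoded columns in total.
feature_indices = np.array([0, 3, 5, 9])
cat_names = ["user_id", "shop_id", "country"]  # hypothetical input names


def source_of(encoded_col):
    """Map an encoded column to (input feature name, offset within its range)."""
    # Index of the last range start that is <= encoded_col.
    i = int(np.searchsorted(feature_indices, encoded_col, side="right")) - 1
    return cat_names[i], encoded_col - int(feature_indices[i])


names = [f"{name}={offset}"
         for name, offset in map(source_of, range(int(feature_indices[-1])))]
print(names[:4])  # ['user_id=0', 'user_id=1', 'user_id=2', 'shop_id=0']
```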

    Why don't you select the features before the encoding? Or do you
    want to exclude some values?



    On 03/05/2015 05:55 AM, Eustache DIEMERT wrote:
    Hi list,

    I have an X (np.array) with some columns containing ids. I also
    have a list of column names. I then want to transform the
    relevant columns with OneHotEncoder so that they can be used by a
    logistic regression model:

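    >>> import numpy as np
    >>> from sklearn.preprocessing import OneHotEncoder
    >>> X = np.loadtxt(...)  # from a CSV
    >>> col_names = ...  # from the CSV header
    >>> id_columns = ...  # indices of the id columns
    >>> e = OneHotEncoder(categorical_features=id_columns)
    >>> Xprime = e.fit_transform(X)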

    But then I don't know how to deduce the names of the columns in
    the new matrix :(

    Ideally I would want the same as DictVectorizer which has a
    feature_names_ member.
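For comparison, here is a minimal sketch of the DictVectorizer route itself. It assumes each row is first turned into a dict keyed by the CSV header names, with the id values converted to strings. DictVectorizer one-hot encodes string values as `name=value` and passes numeric values through unchanged. The data and `id_columns` below are hypothetical stand-ins:

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer

# Hypothetical stand-ins for the CSV contents.
X = np.array([[7.0, 31.0],
              [3.0, 45.0],
              [7.0, 27.0]])
col_names = ["user_id", "age"]
id_columns = {0}  # indices of the id columns

# One dict per row. Ids become strings so they are one-hot encoded;
# numeric values stay numbers and are passed through.
rows = [
    {name: (f"{value:g}" if j in id_columns else value)
     for j, (name, value) in enumerate(zip(col_names, row))}
    for row in X
]

vec = DictVectorizer(sparse=False)
Xprime = vec.fit_transform(rows)
print(vec.feature_names_)  # ['age', 'user_id=3', 'user_id=7']
```

By default the feature names are sorted, so the output column order follows `feature_names_` rather than the CSV column order.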

    Has anyone already had this problem?

    Eustache


    
    ------------------------------------------------------------------------------
    Dive into the World of Parallel Programming The Go Parallel Website, sponsored
    by Intel and developed in partnership with Slashdot Media, is your hub for all
    things parallel software development, from weekly thought leadership blogs to
    news, videos, case studies, tutorials and more. Take a look and join the
    conversation now. http://goparallel.sourceforge.net/

    _______________________________________________
    Scikit-learn-general mailing list
    Scikit-learn-general@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
