2015-03-05 16:57 GMT+01:00 Andy <t3k...@gmail.com>:

>  Well, the columns after the OneHotEncoder correspond to feature values,
> not feature names, right?
>

Well, for the categorical ones this is right, except that not all of my
features are categorical (hence the categorical_features=...) and the two
kinds are interleaved in the input.

So my problem is more about keeping track of which categorical features got
projected into which columns (1->N) and which numerical ones were just
copied, and where (1->1).

Re-reading your answer, I wonder whether you are suggesting that I simply
separate the input columns by feature type and apply the encoder to the
categorical ones only?
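
A minimal sketch of that split-then-encode idea, to make the bookkeeping
concrete. It assumes a recent scikit-learn release in which the fitted
encoder exposes a `categories_` attribute (the version discussed in this
thread may not have it). The toy data and the column names are hypothetical
and only serve the illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): columns 0 and 2 hold integer ids,
# columns 1 and 3 are plain numerical features.
X = np.array([[0, 1.5, 2, 10.0],
              [1, 2.5, 0, 20.0],
              [0, 3.5, 1, 30.0]])
col_names = ["user_id", "age", "item_id", "price"]  # hypothetical names
id_columns = [0, 2]
num_columns = [i for i in range(X.shape[1]) if i not in id_columns]

# Encode only the categorical block; the default output is sparse,
# so convert it to a dense array for stacking.
enc = OneHotEncoder()
X_cat = enc.fit_transform(X[:, id_columns].astype(int)).toarray()

# 1 -> N: one output name per (feature, value) pair, in encoder order.
# Assumes the fitted encoder exposes `categories_`.
cat_names = ["%s=%s" % (col_names[i], v)
             for i, values in zip(id_columns, enc.categories_)
             for v in values]

# 1 -> 1: numerical columns are copied through unchanged.
num_names = [col_names[i] for i in num_columns]

# Stack by hand so the final column order is known by construction.
Xprime = np.hstack([X_cat, X[:, num_columns]])
new_names = cat_names + num_names
```

Because the two blocks are stacked by hand, `new_names[j]` describes column
`j` of `Xprime` without having to reverse-engineer the encoder's layout.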



> There is ``feature_indices_`` which maps each feature to a range of
> features in the encoded matrix.
> The features in the input matrix don't really have names in scikit-learn,
> as they are represented only as numpy matrices.
> So you need to keep track of the indices of each feature. That shouldn't
> be too hard, though.
>
> Why don't you select the features before the encoding? Or do you want to
> exclude some values?
>
>
>
> On 03/05/2015 05:55 AM, Eustache DIEMERT wrote:
>
> Hi list,
>
>  I have an X (np.array) with some columns containing ids. I also have a
> list of column names. I want to transform the relevant columns so they can
> be used by a logistic regression model, using OneHotEncoder:
>
> >>> X = np.loadtxt(...)  # from a CSV
> >>> col_names = ...  # from the CSV header
> >>> e = OneHotEncoder(categorical_features=id_columns)
> >>> Xprime = e.fit_transform(X)
>
>  But then I don't know how to deduce the names of the columns in the new
> matrix :(
>
>  Ideally I would want the same as DictVectorizer which has a
> feature_names_ member.
>
>  Has anyone already had this problem?
>
>  Eustache
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website, sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for all
> things parallel software development, from weekly thought leadership blogs to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general