I don't consider LabelBinarizer the best workaround. Given a Pandas dataframe df, I'd use:
    DictVectorizer().fit_transform(df.to_dict(orient='records'))

which one-hot encodes string-valued columns and passes numerical features through as single columns. Or:

    from sklearn.feature_extraction import DictVectorizer

    class PandasVectorizer(DictVectorizer):
        # Convert the DataFrame to a list of row dicts before delegating.
        def fit(self, x, y=None):
            return super(PandasVectorizer, self).fit(x.to_dict('records'))

        def fit_transform(self, x, y=None):
            return super(PandasVectorizer, self).fit_transform(x.to_dict('records'))

        def transform(self, x):
            return super(PandasVectorizer, self).transform(x.to_dict('records'))

On 18 August 2017 at 01:11, Andreas Mueller <t3k...@gmail.com> wrote:

> Hi Georg.
> Unfortunately this is not entirely trivial right now, but will be fixed by
> https://github.com/scikit-learn/scikit-learn/pull/9151
> and
> https://github.com/scikit-learn/scikit-learn/pull/9012
> which will be in the next release (0.20).
>
> LabelBinarizer is probably the best work-around for now, and selecting
> columns can be done (awkwardly) like in this example:
> http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py
>
> Best,
> Andy
>
> On 08/17/2017 07:50 AM, Georg Heiler wrote:
>
> > Hi,
> >
> > how can I properly handle categorical values in scikit-learn?
> > https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
> >
> > goals
> >
> > - scikit-learn style fit/transform methods to encode labels of
> >   categorical features of X
> > - should handle unseen labels
> > - should be faster than running a label encoder manually for each fold
> >   and manually checking if the label already was seen in the training
> >   data, i.e. what I currently do
> >   (https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
> >   which links to https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce)
> > - only some columns are categorical, and only these should be converted
> >
> > Regards,
> > Georg
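
For what it's worth, here is a minimal sketch of the DictVectorizer approach from the top of this message, checked against the "unseen labels" goal. The column names and values are made up for illustration, and I pass sparse=False only so the result is easy to inspect. The key behaviour is that a category not seen during fit is silently ignored at transform time, so the row just gets zeros in all the one-hot columns for that feature.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Hypothetical training frame: one string column, one numeric column.
train = pd.DataFrame({"city": ["Vienna", "Graz", "Vienna"], "rooms": [2, 3, 1]})
# Hypothetical new data containing a level ("Linz") never seen during fit.
new = pd.DataFrame({"city": ["Graz", "Linz"], "rooms": [4, 5]})

# sparse=False returns a dense array; the default is a scipy sparse matrix.
vec = DictVectorizer(sparse=False)
X_train = vec.fit_transform(train.to_dict(orient="records"))
X_new = vec.transform(new.to_dict(orient="records"))

# String values become "<column>=<value>" indicator features,
# while the numeric column keeps its name and value.
feature_names = vec.feature_names_
print(feature_names)
print(X_train)
print(X_new)  # the "Linz" row has zeros in both city=... columns
```

Note that this only one-hot encodes columns whose values are strings. A categorical column stored as integer codes would be treated as numeric, so it would need to be cast to str first.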
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn