gist at https://gist.github.com/jnothman/a75bac778c1eb9661017555249e50379
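
For reference, here is a minimal sketch of the DictVectorizer approach quoted below, assuming a recent scikit-learn and pandas. The toy frames and the column names (color, size) are made up for illustration and are not taken from the gist. It also shows what happens to a category that first appears at transform time: DictVectorizer silently ignores it, so that row gets zeros in all of the one-hot columns for that feature.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy data: one string (categorical) column, one numeric column.
train = pd.DataFrame({"color": ["red", "green", "red"], "size": [1.0, 2.0, 3.0]})
# "blue" never appears in the training data.
test = pd.DataFrame({"color": ["green", "blue"], "size": [4.0, 5.0]})

vec = DictVectorizer(sparse=False)

# String values become one-hot "column=value" features; numbers pass through.
X_train = vec.fit_transform(train.to_dict(orient="records"))

# Features not seen during fit are ignored rather than raising an error.
X_test = vec.transform(test.to_dict(orient="records"))

print(vec.feature_names_)
print(X_train)
print(X_test)
```

Only string-valued columns are one-hot encoded this way, so integer-coded categoricals would need to be cast to str first if they should be treated as categories.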
On 18 August 2017 at 01:26, Joel Nothman <joel.noth...@gmail.com> wrote:

> I don't consider LabelBinarizer the best workaround.
>
> Given a Pandas dataframe df, I'd use:
>
> DictVectorizer().fit_transform(df.to_dict(orient='records'))
>
> which will handle encoding strings with one-hot and numerical features as
> column vectors. Or:
>
> class PandasVectorizer(DictVectorizer):
>     def fit(self, x, y=None):
>         return super(PandasVectorizer, self).fit(x.to_dict('records'))
>
>     def fit_transform(self, x, y=None):
>         return super(PandasVectorizer, self).fit_transform(
>             x.to_dict('records'))
>
>     def transform(self, x):
>         return super(PandasVectorizer, self).transform(x.to_dict('records'))
>
>
> On 18 August 2017 at 01:11, Andreas Mueller <t3k...@gmail.com> wrote:
>
>> Hi Georg.
>> Unfortunately this is not entirely trivial right now, but will be fixed by
>> https://github.com/scikit-learn/scikit-learn/pull/9151
>> and
>> https://github.com/scikit-learn/scikit-learn/pull/9012
>> which will be in the next release (0.20).
>>
>> LabelBinarizer is probably the best work-around for now, and selecting
>> columns can be done (awkwardly) like in this example:
>> http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py
>>
>> Best,
>> Andy
>>
>>
>> On 08/17/2017 07:50 AM, Georg Heiler wrote:
>>
>> Hi,
>>
>> how can I properly handle categorical values in scikit-learn?
>> https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
>>
>> goals
>>
>> - scikit-learn style fit/transform methods to encode labels of
>>   categorical features of X
>> - should handle unseen labels
>> - should be faster than running a label encoder manually for each
>>   fold and manually checking if the label already was seen in the
>>   training data, i.e. what I currently do
>>   (https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
>>   which links to
>>   https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce)
>> - only some columns are categorical, and only these should be
>>   converted
>>
>>
>> Regards,
>> Georg
>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn