Hi Georg.
Unfortunately this is not entirely trivial right now, but it will be fixed by
https://github.com/scikit-learn/scikit-learn/pull/9151
and
https://github.com/scikit-learn/scikit-learn/pull/9012
which will be in the next release (0.20).
LabelBinarizer is probably the best work-around for now, and selecting
columns can be done (awkwardly), as in this example:
http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py
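A rough sketch of that work-around, in case it helps. It assumes the
data arrives as a 2-D array-like with the categorical columns identified
by index; ColumnBinarizer is a made-up helper name, not something that
ships with scikit-learn, and the toy data is invented for illustration.
It fits one LabelBinarizer per selected column and passes the remaining
(numeric) columns through unchanged:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelBinarizer


class ColumnBinarizer(BaseEstimator, TransformerMixin):
    """Hypothetical helper (not part of scikit-learn): one-hot encode the
    columns listed in `columns` with one LabelBinarizer each, and pass
    all other columns through as floats."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=object)
        # Learn the set of categories per selected column from the
        # training data only.
        self.binarizers_ = {
            c: LabelBinarizer().fit(X[:, c].astype(str)) for c in self.columns
        }
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=object)
        rest = [c for c in range(X.shape[1]) if c not in self.columns]
        # Numeric columns first, then one indicator block per
        # categorical column.
        blocks = [X[:, rest].astype(float)]
        for c in self.columns:
            blocks.append(self.binarizers_[c].transform(X[:, c].astype(str)))
        return np.hstack(blocks)


# Toy data (invented): column 0 is numeric, column 1 is categorical.
train = [[1.0, "red"], [2.0, "green"], [3.0, "blue"]]
test = [[4.0, "green"], [5.0, "purple"]]  # "purple" is unseen in training

encoder = ColumnBinarizer(columns=[1]).fit(train)
out = encoder.transform(test)
```

With three or more training categories, a label that was not seen during
fit should come out as an all-zero indicator row rather than raising an
error. A column with exactly two training categories is a caveat:
LabelBinarizer then emits a single 0/1 column, so an unseen label would
look the same as the first class.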
Best,
Andy
On 08/17/2017 07:50 AM, Georg Heiler wrote:
Hi,
how can I properly handle categorical values in scikit-learn?
https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
Goals:
* scikit-learn style fit/transform methods to encode labels of
  categorical features of X
* should handle unseen labels
* should be faster than running a label encoder manually for each
  fold and manually checking whether the label was already seen in the
  training data, i.e. what I currently do
  (https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
  which links to
  https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce)
* only some columns are categorical, and only these should be converted
Regards,
Georg
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn