Hi Georg.
Unfortunately this is not entirely trivial right now, but will be fixed by
https://github.com/scikit-learn/scikit-learn/pull/9151
and
https://github.com/scikit-learn/scikit-learn/pull/9012
which will be in the next release (0.20).

LabelBinarizer is probably the best work-around for now, and selecting columns can be done (awkwardly) like in this example: http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py
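
A rough sketch of that work-around, in case it helps. This is not code from the linked example: ColumnSelector and ColumnBinarizer are hypothetical helper names, and the toy data is made up. The wrapper is needed because LabelBinarizer.fit takes a single argument, so it cannot be dropped into a Pipeline as is. As far as I know, LabelBinarizer encodes a value it did not see during fit as an all-zero row, but please check that against your installed version.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelBinarizer


class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: select columns of a 2-D array by position."""

    def __init__(self, columns, dtype=object):
        self.columns = columns
        self.dtype = dtype

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=object)
        return X[:, self.columns].astype(self.dtype)


class ColumnBinarizer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper: one LabelBinarizer per incoming column.

    Adapts LabelBinarizer's fit(y) signature to the fit(X, y) signature
    that Pipeline and FeatureUnion expect.
    """

    def fit(self, X, y=None):
        X = np.asarray(X).astype(str)
        self.binarizers_ = [
            LabelBinarizer().fit(X[:, j]) for j in range(X.shape[1])
        ]
        return self

    def transform(self, X):
        X = np.asarray(X).astype(str)
        blocks = [b.transform(X[:, j]) for j, b in enumerate(self.binarizers_)]
        return np.hstack(blocks)


# Made-up data: column 0 is numeric, column 1 is categorical.
X_train = np.array([[1.0, "red"], [2.0, "green"], [3.0, "blue"]], dtype=object)
# "purple" does not occur in the training data.
X_test = np.array([[4.0, "green"], [5.0, "purple"]], dtype=object)

union = FeatureUnion([
    # Numeric column is passed through unchanged.
    ("numeric", ColumnSelector(columns=[0], dtype=float)),
    # Categorical column is selected, then binarized.
    ("categorical", Pipeline([
        ("select", ColumnSelector(columns=[1])),
        ("binarize", ColumnBinarizer()),
    ])),
])

train_features = union.fit_transform(X_train)
test_features = union.transform(X_test)
print(train_features)
print(test_features)
```

Only the categorical column goes through the binarizer. Because the fitted union is reused for every fold's held-out data, there is no need to check manually which labels were already seen.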

Best,
Andy

On 08/17/2017 07:50 AM, Georg Heiler wrote:
Hi,

how can I properly handle categorical values in scikit-learn?
https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934

goals

  * scikit-learn style fit/transform methods to encode labels of
    categorical features of X
  * should handle unseen labels
  * should be faster than running a label encoder manually for each
    fold and manually checking whether the label was already seen in
    the training data, i.e. what I currently do
    (https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
    which links to
    https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce)
  * only some columns are categorical, and only these should be converted


Regards,
Georg


_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn

