Re: [scikit-learn] Categorical handling

Andreas Mueller Thu, 17 Aug 2017 08:14:01 -0700

Hi Georg.
Unfortunately this is not entirely trivial right now, but will be fixed by
https://github.com/scikit-learn/scikit-learn/pull/9151
and
https://github.com/scikit-learn/scikit-learn/pull/9012
which will be in the next release (0.20).

LabelBinarizer is probably the best workaround for now, and selecting
columns can be done (awkwardly) like in this example:
http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py
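
A minimal sketch of that workaround, assuming the data is a 2-D object
array with one categorical and one numeric column. ColumnSelector and
ColumnBinarizer are hypothetical helpers written for this illustration,
not scikit-learn classes. The wrapper is needed because LabelBinarizer
is designed for a 1-D target and does not accept the (X, y) arguments
that a Pipeline passes to its steps.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelBinarizer


class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: select columns of a 2-D array by position."""

    def __init__(self, columns, dtype=None):
        self.columns = columns
        self.dtype = dtype

    def fit(self, X, y=None):
        # Nothing to learn; only record the input width.
        self.n_columns_seen_ = np.asarray(X).shape[1]
        return self

    def transform(self, X):
        selected = np.asarray(X)[:, self.columns]
        return selected if self.dtype is None else selected.astype(self.dtype)


class ColumnBinarizer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper so LabelBinarizer can encode one feature column."""

    def fit(self, X, y=None):
        self.binarizer_ = LabelBinarizer().fit(np.asarray(X).ravel())
        return self

    def transform(self, X):
        return self.binarizer_.transform(np.asarray(X).ravel())


# Toy data (made up for illustration): column 0 is categorical, column 1 numeric.
X_train = np.array([["red", 1.0], ["green", 2.0], ["blue", 3.0]], dtype=object)
X_test = np.array([["green", 4.0], ["purple", 5.0]], dtype=object)

union = FeatureUnion([
    ("color", Pipeline([
        ("select", ColumnSelector(0)),      # 1-D column of labels
        ("binarize", ColumnBinarizer()),    # indicator columns
    ])),
    ("size", ColumnSelector([1], dtype=float)),  # numeric column passed through
])

encoded_train = union.fit_transform(X_train)
encoded_test = union.transform(X_test)
print(encoded_test)
```

In this setup a label that was not present during fit ("purple" here)
should come out as an all-zero indicator row, since LabelBinarizer
ignores unknown labels at transform time. Note that with exactly two
training categories LabelBinarizer produces a single column instead of
two.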


Best,
Andy

On 08/17/2017 07:50 AM, Georg Heiler wrote:

Hi,

How can I properly handle categorical values in scikit-learn?

https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934


goals

  * scikit-learn style fit/transform methods to encode labels of
    categorical features of X
  * should handle unseen labels
  * should be faster than running a label encoder manually for each
    fold and manually checking whether the label was already seen in
    the training data, i.e. what I currently do
    (https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
    which links to
    https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce)
  * only some columns are categorical, and only these should be converted
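
The goals above could be sketched roughly as follows, using only the
standard library. SafeLabelEncoder, its columns parameter, and the -1
code for unseen labels are all assumptions made for this illustration;
this is neither the code from the linked gist nor a scikit-learn API.

```python
class SafeLabelEncoder:
    """Hypothetical encoder: integer codes per column, UNSEEN for new labels."""

    UNSEEN = -1  # assumed placeholder code for labels not seen in fit

    def __init__(self, columns):
        # Positions of the categorical columns; all others are left untouched.
        self.columns = list(columns)

    def fit(self, rows):
        # Learn one label -> code mapping per categorical column.
        self.mappings_ = {}
        for col in self.columns:
            labels = sorted({row[col] for row in rows})
            self.mappings_[col] = {label: code for code, label in enumerate(labels)}
        return self

    def transform(self, rows):
        encoded = []
        for row in rows:
            new_row = list(row)
            for col in self.columns:
                # A dict lookup with a default replaces the manual
                # "was this label seen in training?" check.
                new_row[col] = self.mappings_[col].get(row[col], self.UNSEEN)
            encoded.append(new_row)
        return encoded

    def fit_transform(self, rows):
        return self.fit(rows).transform(rows)


# Toy data (made up): columns 0 and 2 are categorical, column 1 is numeric.
train = [["red", 1.0, "S"], ["green", 2.0, "M"], ["red", 3.0, "L"]]
test = [["green", 4.0, "XL"], ["blue", 5.0, "S"]]

encoder = SafeLabelEncoder(columns=[0, 2])
train_encoded = encoder.fit_transform(train)
test_encoded = encoder.transform(test)
print(test_encoded)  # [[0, 4.0, -1], [-1, 5.0, 2]]
```

Fitting once per training fold and reusing the mappings for the
held-out fold avoids re-checking every label by hand. Whether -1 is a
sensible code for unseen labels depends on the downstream estimator.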


Regards,
Georg


_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
