Re: [scikit-learn] transform categorical data to numerical representation

2017-08-06 Thread Georg Heiler
To my understanding, pandas.factorize only works for the static case where no unseen categories can occur.

Georg Heiler wrote on Mon, 7 Aug 2017 at 08:40:
> I will need to look into factorize. Here is the result from profiling the
> transform method on a single new observation
> https://coderevi
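A minimal sketch of handling the dynamic case, assuming a single string column: keep the `uniques` that `pandas.factorize` returns for the training data, then look new values up with `Index.get_indexer`, which returns -1 for values it has not seen. The helper names `fit_codes` and `transform_codes` are hypothetical, and mapping unseen values to -1 is a design choice, not something pandas decides for you.

```python
import pandas as pd


def fit_codes(train_values):
    """Learn a value -> integer code mapping from training data (hypothetical helper)."""
    # factorize assigns codes in order of first appearance (sort=False by default)
    _, uniques = pd.factorize(pd.Series(train_values))
    return pd.Index(uniques)


def transform_codes(categories, new_values):
    """Encode new values with the training mapping; unseen values become -1."""
    return categories.get_indexer(pd.Series(new_values))


categories = fit_codes(["red", "green", "red", "blue"])
print(transform_codes(categories, ["blue", "purple", "red"]))  # [ 2 -1  0]
```

The downstream model then has to cope with -1 as an "unknown" code, which is only meaningful if it appears (or is otherwise accounted for) during training.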

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-06 Thread Georg Heiler
I will need to look into factorize. Here is the result from profiling the transform method on a single new observation: https://codereview.stackexchange.com/q/171622/132999

Best,
Georg

Sebastian Raschka wrote on Sun, 6 Aug 2017 at 20:39:
> > performance of prediction is pretty lame when there

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-06 Thread Sebastian Raschka
> performance of prediction is pretty lame when there are around 100-150
> columns used as the input.

You are talking about computational performance when you are calling the "transform" method? Have you done some profiling to find out where your bottleneck (in the for loop) is? Just one a ver
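One way to do that profiling is the standard-library `cProfile` module, wrapped around a single-observation `transform` call. The encoder below is a hypothetical stand-in for the per-column loop (one fitted `LabelEncoder` per column, 120 columns of illustrative data), not the code from the gist; substitute the real transform function to get meaningful numbers.

```python
import cProfile
import io
import pstats

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in: 120 categorical columns, one LabelEncoder each
train = pd.DataFrame({f"c{i}": ["a", "b", "c", "a"] for i in range(120)})
encoders = {col: LabelEncoder().fit(train[col]) for col in train.columns}


def transform(df):
    """Encode every column with its own LabelEncoder (one call per column)."""
    out = df.copy()
    for col, enc in encoders.items():
        out[col] = enc.transform(df[col])
    return out


single_row = train.iloc[[0]]  # a one-row DataFrame, like a single new observation

profiler = cProfile.Profile()
profiler.enable()
transform(single_row)
profiler.disable()

# Print the 10 entries with the highest cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

With many columns and one row, the report usually makes it visible whether time goes into the encoding itself or into fixed per-call overhead repeated once per column.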

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-06 Thread Georg Heiler
@sebastian: thanks. Indeed, I am aware of this problem. I developed something here: https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce but realized that the performance of prediction is pretty lame when there are around 100-150 columns used as the input. Do you have some ideas how to
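If the cost is mostly per-call overhead when transforming a single observation, one option is to precompute plain Python dicts per column at fit time and encode one record (a `dict` of column to value) with dictionary lookups, skipping DataFrame construction entirely. This is a sketch under that assumption; `fit_mappings` and `encode_record` are hypothetical names, and sending unseen values to -1 is an arbitrary choice.

```python
import pandas as pd


def fit_mappings(df):
    """Build one {value: code} dict per categorical column from training data."""
    return {
        col: {value: code for code, value in enumerate(pd.unique(df[col]))}
        for col in df.columns
    }


def encode_record(mappings, record, unknown=-1):
    """Encode one observation given as {column: value}; unseen values get `unknown`."""
    return [mappings[col].get(record[col], unknown) for col in mappings]


train = pd.DataFrame({"color": ["red", "green", "red"], "size": ["S", "M", "L"]})
mappings = fit_mappings(train)
print(encode_record(mappings, {"color": "green", "size": "XL"}))  # [1, -1]
```

For 100-150 columns this is one dict lookup per column, so the per-observation cost stays small compared with calling a pandas or scikit-learn method per column.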

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-05 Thread Joel Nothman
We are working on CategoricalEncoder in https://github.com/scikit-learn/scikit-learn/pull/9151 to help users more with this kind of thing. Feedback and testing are welcome.

On 6 August 2017 at 02:13, Sebastian Raschka wrote:
> Hi, Georg,
>
> I bring this up every time here on the mailing list :),
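In released scikit-learn (0.20 and later), this kind of functionality is available through `OneHotEncoder`, which accepts string columns directly, and a separate `OrdinalEncoder`. A minimal sketch for nominal columns where unseen values may show up at prediction time, using `handle_unknown="ignore"` so that an unseen value encodes as all zeros for its column (the data is illustrative):

```python
from sklearn.preprocessing import OneHotEncoder

# Two nominal columns: color and size
X_train = [["red", "S"], ["green", "M"], ["blue", "L"]]

# handle_unknown="ignore": unseen categories at transform time become all zeros
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(X_train)

# Categories are learned per column and sorted:
# color -> [blue, green, red], size -> [L, M, S]
X_new = [["red", "XL"]]  # "XL" was never seen during fit
print(encoder.transform(X_new).toarray())  # [[0. 0. 1. 0. 0. 0.]]
```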

Re: [scikit-learn] transform categorical data to numerical representation

2017-08-05 Thread Sebastian Raschka
Hi, Georg, I bring this up every time here on the mailing list :), and you are probably aware of this issue, but it makes a difference whether your categorical data is nominal or ordinal. For instance, if you have an ordinal variable with values like {small, medium, large}, you probably want to
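For an ordinal variable, one common option is an integer encoding that preserves the order. A sketch using `OrdinalEncoder` (scikit-learn 0.20 and later) with the order passed explicitly through `categories`, because the default would sort the strings alphabetically (large, medium, small) and lose the intended ranking:

```python
from sklearn.preprocessing import OrdinalEncoder

# One list per column, giving the intended order of the categories
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])

X = [["small"], ["large"], ["medium"]]
# small -> 0, medium -> 1, large -> 2 (returned as floats)
print(encoder.fit_transform(X))
```

For a nominal variable such as color, there is no order to preserve, so a one-hot encoding is usually the safer choice.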

[scikit-learn] transform categorical data to numerical representation

2017-08-05 Thread Georg Heiler
Hi, the LabelEncoder is only meant for a single column, i.e. the target variable. Is the DictVectorizer or a manual chaining of multiple LabelEncoders (one per categorical column) the desired way to get values which can be fed into a subsequent classifier? Is there some way I have overlooked which w
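For reference, `DictVectorizer` takes one dict per sample and one-hot encodes string-valued features as `name=value` columns, so it covers several categorical columns in one step; features not seen during `fit` are silently ignored by `transform`. A minimal sketch with illustrative data:

```python
from sklearn.feature_extraction import DictVectorizer

train = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "L"},
]

vec = DictVectorizer(sparse=False)
X_train = vec.fit_transform(train)
# Feature names are sorted by default:
# color=blue -> 0, color=red -> 1, size=L -> 2, size=S -> 3

# "color=green" was not seen during fit, so only size=S is set
print(vec.transform([{"color": "green", "size": "S"}]))  # [[0. 0. 0. 1.]]
```

Unlike a chain of LabelEncoders, this yields one-hot columns rather than arbitrary integer codes, which avoids imposing a fake order on nominal features.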