To my understanding, pandas.factorize only works for the static case where
no unseen values can occur.
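
As a rough illustration of the point about unseen values, the sketch below factorizes a small training column and then reuses the learned categories for new data through pd.Categorical. The sample values and the names train, new, and new_codes are made up for this example, and reusing the categories this way is only one possible workaround, not something proposed in the thread.

```python
import pandas as pd

# Hypothetical training column.
train = pd.Series(["a", "b", "a", "c"])

# factorize learns the codes from the data it is given, in order of appearance.
codes, uniques = pd.factorize(train)

# Hypothetical new observations; "d" never occurred in the training column.
new = pd.Series(["b", "d"])

# Reuse the categories learned above instead of calling factorize again.
# Values outside the known categories get the sentinel code -1.
new_codes = pd.Categorical(new, categories=uniques).codes
```

Calling pd.factorize again on the new data alone would assign codes from scratch, so the same value could map to a different integer than it did during training.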
Georg Heiler wrote on Mon, 7 Aug 2017 at 08:40:
> I will need to look into factorize. Here is the result from profiling the
> transform method on a single new observation
> https://coderevi
I will need to look into factorize. Here is the result from profiling the
transform method on a single new observation
https://codereview.stackexchange.com/q/171622/132999
Best Georg
Sebastian Raschka wrote on Sun, 6 Aug 2017 at 20:39:
> performance of prediction is pretty lame when there are around 100-150
> columns used as the input.
you are talking about computational performance when you are calling the
"transform" method? Have you done some profiling to find out where your
bottleneck (in the for loop) is? Just one a ver
@sebastian: thanks. Indeed, I am aware of this problem.
I developed something here:
https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce but
realized that the performance of prediction is pretty lame when there are
around 100-150 columns used as the input.
Do you have some ideas how to
We are working on CategoricalEncoder in
https://github.com/scikit-learn/scikit-learn/pull/9151 to help users more
with this kind of thing. Feedback and testing are welcome.
On 6 August 2017 at 02:13, Sebastian Raschka wrote:
> Hi, Georg,
>
> I bring this up every time here on the mailing list :),
Hi, Georg,
I bring this up every time here on the mailing list :), and you are probably
aware of this issue, but it makes a difference whether your categorical data
is nominal or ordinal. For instance, if you have an ordinal variable with
values like {small, medium, large} you probably want to
Hi,
the LabelEncoder is only meant for a single column, i.e. the target variable.
Is the DictVectorizer or a manual chaining of multiple LabelEncoders (one
per categorical column) the desired way to get values which can be fed into
a subsequent classifier?
Is there some way I have overlooked which w
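
For reference, the two options named in the question above can be sketched as follows. The rows, the column names, and the variables encoders, encoded_columns, and one_hot are invented for this illustration; this is a minimal sketch of the two approaches, not a recommendation from the thread.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: one dict per observation, two categorical columns.
rows = [
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "large"},
    {"color": "red", "size": "medium"},
]
columns = ["color", "size"]

# Option 1: one LabelEncoder per categorical column.
# Each encoder maps the sorted distinct values of its column to 0..n-1.
encoders = {
    col: LabelEncoder().fit([row[col] for row in rows]) for col in columns
}
# Transform a whole column at once rather than value by value.
encoded_columns = {
    col: encoders[col].transform([row[col] for row in rows]) for col in columns
}

# Option 2: DictVectorizer one-hot encodes string-valued features
# into columns named "<feature>=<value>".
vectorizer = DictVectorizer(sparse=False)
one_hot = vectorizer.fit_transform(rows)

# A value not seen during fit ("green") is ignored by DictVectorizer.transform,
# whereas LabelEncoder.transform raises a ValueError for it.
unseen = vectorizer.transform([{"color": "green", "size": "small"}])
```

Note that the integer codes from LabelEncoder follow alphabetical order, so they carry no meaningful ranking for a downstream classifier.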