20 million categories, or 20 million categorical variables? OneHotEncoder is pretty efficient if you specify n_values.
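
A rough sketch of why knowing the number of values up front helps, assuming the features are already integer-coded as 0 .. n-1. This is an illustration only, not scikit-learn's actual code: `one_hot_coo` is a hypothetical helper name, and it returns plain Python lists of the row and column positions of the ones rather than a real sparse matrix. Once the per-feature counts are known, the output column for each entry is just an offset plus the value, so nothing has to scan the data for unique values first.

```python
from itertools import accumulate


def one_hot_coo(rows, n_values):
    """Return (row_idx, col_idx, n_columns) for a one-hot encoding.

    Hypothetical helper for illustration, not scikit-learn's implementation.
    rows is a list of integer-coded samples; n_values[j] is the number of
    distinct codes (0 .. n_values[j] - 1) that feature j can take.
    """
    # Starting output column of each feature's block: [0, n0, n0 + n1, ...]
    offsets = [0] + list(accumulate(n_values))
    row_idx, col_idx = [], []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not 0 <= value < n_values[j]:
                raise ValueError(f"value {value} out of range for feature {j}")
            # One arithmetic step per entry; no lookup of unique values needed.
            row_idx.append(i)
            col_idx.append(offsets[j] + value)
    return row_idx, col_idx, offsets[-1]


# Two features: the first takes 2 values, the second takes 3.
rows = [[0, 2], [1, 0], [0, 1]]
row_idx, col_idx, n_columns = one_hot_coo(rows, n_values=[2, 3])
```

The (row, column) pairs are what a sparse COO-style matrix would be built from, with every stored value equal to 1. Whether this matches your case depends on the answer to the question above: 20 million columns in the output is a very different problem from 20 million input features.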
On 5 February 2018 at 15:10, Sarah Wait Zaranek <sarah.zara...@gmail.com> wrote:
> Hello -
>
> I was just wondering if there was a way to improve performance on the
> one-hot encoder. Or, are there any plans to do so in the future? I am
> working with a matrix that will ultimately have 20 million categorical
> variables, and my bottleneck is the one-hot encoder.
>
> Let me know if this isn't the place to inquire. My code is very simple
> when using the encoder, but I cut and pasted it here for completeness.
>
> enc = OneHotEncoder(sparse=True)
> Xtrain = enc.fit_transform(tiledata)
>
> Thanks,
> Sarah
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn