Re: [scikit-learn] One-hot encoding

2018-08-03 Thread Fernando Marcos Wittmann
Hi Sarah, I have a few questions to think about. You don't need to answer all of them :) How many categories (approximately) do you have in each of those 20M categorical variables? How many samples do you have? You might consider a different encoding strategy, such as binary encoding. Also, this

Re: [scikit-learn] One-hot encoding

2018-08-03 Thread Sarah Wait Zaranek
Hi all - I can't use binary encoding because I need to trace each feature back to the exact categorical variable, and I believe that is difficult with binary encoding. The number of categories varies by variable, but the average is about 10. I return a sparse matrix from the encoder. Regardless of the enc
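
To illustrate the traceability point raised in this thread, here is a minimal pure-Python sketch, not scikit-learn's implementation. It assumes a plain base-2 encoding of the category index for the "binary" case, and the variable names, categories, and helper functions (`one_hot_columns`, `binary_columns`, `one_hot_row`) are hypothetical. With one-hot encoding, every output column corresponds to exactly one (variable, category) pair; with binary encoding, a column is only one bit of a code shared by several categories.

```python
def one_hot_columns(categories):
    """Map each output column to the single (variable, category) pair it represents."""
    return [(var, cat) for var, cats in categories.items() for cat in cats]


def binary_columns(categories):
    """Map each output column to a (variable, bit position) pair.

    Assumes category indices 0..n-1 written in base 2, so a variable
    with n categories needs (n - 1).bit_length() columns (at least 1).
    """
    cols = []
    for var, cats in categories.items():
        n_bits = max(1, (len(cats) - 1).bit_length())
        cols.extend((var, bit) for bit in range(n_bits))
    return cols


def one_hot_row(row, columns):
    """Sparse form of one sample: the indices of its non-zero columns."""
    index = {pair: i for i, pair in enumerate(columns)}
    return [index[(var, value)] for var, value in row.items()]


# Hypothetical toy data: two categorical variables.
categories = {
    "color": ["red", "green", "blue"],
    "size": ["S", "M", "L", "XL"],
}

oh_cols = one_hot_columns(categories)   # 3 + 4 = 7 columns
bin_cols = binary_columns(categories)   # 2 + 2 = 4 columns

# Only the active column indices are stored, as in a sparse matrix row.
active = one_hot_row({"color": "blue", "size": "M"}, oh_cols)

# Each active index maps straight back to one variable and one category.
traced = [oh_cols[i] for i in active]
print(active, traced)
print(bin_cols)
```

The one-hot layout costs more columns (7 versus 4 here), but a sparse row stores only one index per variable, and each index identifies its category directly. In the binary layout a column such as `("size", 0)` is set for several categories, so a single column cannot be traced back to one category on its own.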