Re: [scikit-learn] Categorical Encoding of high cardinality variables

2019-04-23, Sole Galli
Hello everyone, I am Sole. I started the conversation on feature engine, a package I created for feature engineering. Regarding the grouping of *rare / infrequent* categories into an umbrella term like "Rare", "Other", etc., which Federico raised recently, …

[scikit-learn] Categorical Encoding of high cardinality variables

2019-04-19, federico vaggi
Hi everyone, I wanted to use the scikit-learn transformer API to clean up some messy data as input to a neural network. One of the steps involves converting categorical variables (of very high cardinality) into integers for use in an embedding layer. Unfortunately, I cannot quite use LabelEncode…
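For the embedding use case, a minimal sketch of an integer encoder, assuming index 0 is reserved for categories not seen during fit (a common convention for embedding lookups, but an assumption here) and that the labels are mutually sortable. `EmbeddingIndexEncoder`, `UNKNOWN` and `n_indices_` are hypothetical names; the class follows scikit-learn's fit/transform convention without depending on scikit-learn.

```python
class EmbeddingIndexEncoder:
    """Hypothetical encoder: known categories -> 1..n, unseen categories -> 0."""

    UNKNOWN = 0  # index reserved for categories not seen during fit

    def fit(self, X, y=None):
        # Sorting gives a deterministic mapping; assumes the labels are mutually comparable.
        categories = sorted(set(X))
        # Known categories get 1..n, leaving 0 free for unseen values.
        self.mapping_ = {cat: i for i, cat in enumerate(categories, start=1)}
        # Number of rows an embedding table would need: n known categories + 1 unknown slot.
        self.n_indices_ = len(self.mapping_) + 1
        return self

    def transform(self, X):
        # Unlike a strict lookup, unseen labels fall back to UNKNOWN instead of raising.
        return [self.mapping_.get(x, self.UNKNOWN) for x in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Usage: sorted categories are LA=1, NYC=2, SF=3; "Boston" was never seen.
enc = EmbeddingIndexEncoder()
print(enc.fit_transform(["NYC", "LA", "SF", "LA"]))  # → [2, 1, 3, 1]
print(enc.transform(["SF", "Boston"]))               # → [3, 0]
print(enc.n_indices_)                                # → 4
```

`n_indices_` is the input dimension an embedding layer would need. In scikit-learn 0.24 and later, `OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)` is a built-in alternative for mapping unseen categories to a fixed value, which you would then shift by one to obtain non-negative embedding indices.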