Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-05-06 Thread Joel Nothman
When it comes to trees, the API for handling categoricals is simpler than the implementation. Traditionally, tree-based models handle categorical variables differently from both ordinal and one-hot encoding, although both of those work reasonably well for many problems. We are working on …
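To make the contrast concrete, here is a small pure-Python sketch (the helper names are hypothetical, not scikit-learn code) that enumerates which left-child category sets a *single* split can produce under each scheme. A threshold on ordinal codes only yields contiguous prefixes of the arbitrary code order, a threshold on one one-hot column isolates a single category, while a tree with native categorical support can in principle send any subset of categories left.

```python
from itertools import combinations


def ordinal_splits(categories):
    """Left-child sets reachable with one threshold on ordinal codes 0..k-1.

    A split "code <= t" sends a contiguous prefix of the (arbitrary)
    code order to the left child.
    """
    return [frozenset(categories[:i]) for i in range(1, len(categories))]


def one_hot_splits(categories):
    """Left-child sets reachable with one threshold on a single one-hot column:
    one category versus all the others."""
    return [frozenset([c]) for c in categories]


def native_splits(categories):
    """Left-child sets a tree with native categorical support may consider:
    any non-empty proper subset of the categories (2**k - 2 of them)."""
    k = len(categories)
    return [frozenset(s) for r in range(1, k) for s in combinations(categories, r)]


cats = ["red", "green", "blue", "yellow"]  # code order: red=0, green=1, ...
wanted = frozenset({"red", "blue"})        # a grouping the target might favour

print(len(ordinal_splits(cats)), len(one_hot_splits(cats)), len(native_splits(cats)))
print(wanted in ordinal_splits(cats), wanted in one_hot_splits(cats), wanted in native_splits(cats))
```

With k categories that is k - 1, k and 2^k - 2 candidates respectively. Deeper trees can still reach any partition under ordinal or one-hot encoding by stacking several splits, which is one reason those encodings often work reasonably well. Implementations with native support usually avoid the exponential search; LightGBM, for example, orders categories by accumulated gradient statistics and then scans that ordering.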

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-05-06 Thread Fernando Marcos Wittmann
That's an excellent discussion! I've always wondered whether other tools like R handle categorical variables natively. LightGBM has a scikit-learn API that handles categorical features when you pass their column names (or indexes):
```
import lightgbm
lgb = lightgbm.LGBMClassifier()
```
…

[scikit-learn] ANN: scikit-learn 0.23 RC1

2020-05-06 Thread Adrin
Thanks to all our 200+ contributors, we are announcing a release candidate for the upcoming release. On top of a few exciting features, we're also deprecating positional arguments in many places where the constructor/method accepts many arguments. For example, SVC(.5, "poly") will need to be …
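The mechanism behind such a deprecation can be sketched in plain Python: parameters are declared keyword-only (after a bare `*`), and a decorator temporarily accepts them positionally while emitting a `FutureWarning`. The decorator and the `ToySVC` class below are hypothetical stand-ins, not scikit-learn's internal helper or its `SVC`. In keyword form the call from the announcement reads `SVC(C=.5, kernel="poly")`, since `C` and `kernel` are `SVC`'s first two parameters.

```python
import functools
import inspect
import warnings


def deprecate_positional_args(func):
    """Hypothetical helper: accept keyword-only params positionally, with a warning."""
    params = inspect.signature(func).parameters.values()
    positional = [p.name for p in params
                  if p.kind == inspect.Parameter.POSITIONAL_OR_KEYWORD]
    kw_only = [p.name for p in params
               if p.kind == inspect.Parameter.KEYWORD_ONLY]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        extra = len(args) - len(positional)
        # Only remap when the surplus fits into the keyword-only slots;
        # otherwise let Python raise its usual TypeError.
        if 0 < extra <= len(kw_only):
            names, moved = kw_only[:extra], args[-extra:]
            for name in names:
                if name in kwargs:
                    raise TypeError(f"got multiple values for argument {name!r}")
            warnings.warn(
                "Pass " + ", ".join(f"{n}={v!r}" for n, v in zip(names, moved))
                + " as keyword args; positional use is deprecated.",
                FutureWarning,
            )
            kwargs.update(zip(names, moved))
            args = args[:-extra]
        return func(*args, **kwargs)

    return wrapper


class ToySVC:
    """Hypothetical estimator with keyword-only constructor arguments."""

    @deprecate_positional_args
    def __init__(self, *, C=1.0, kernel="rbf"):
        self.C = C
        self.kernel = kernel


old_style = ToySVC(.5, "poly")           # works, but warns (FutureWarning)
new_style = ToySVC(C=.5, kernel="poly")  # preferred, no warning
```

Warning first and erroring later gives users a release cycle to switch to keywords, after which the decorator can be removed and the bare `*` alone enforces keyword-only calls.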