[scikit-learn] benchmarking TargetEncoder Was: ANN Dirty_cat: learning on dirty categories

Gael Varoquaux Fri, 23 Nov 2018 00:49:17 -0800

On Wed, Nov 21, 2018 at 11:35:11AM -0500, Andreas Mueller wrote:
> The question for this particular issue for me is also "what are good
> benchmark datasets".
> In dirty cat you used dirty categories, which is a subset of all
> high-cardinality categorical variables.
> Whether "clean" high cardinality variables like zip-codes or dirty ones are
> the better benchmark is a bit unclear to me, and I'm not aware of a wealth
> of datasets for either :-/


Fair point. We'll have a look to see what we can find. We're open to
suggestions, from you or from anyone else.

G
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
