Dear scikit-learn community,

I would like to announce a new release of dirty-cat, which strives to
facilitate machine learning on non-curated categories, robust to
morphological variants such as typos.

The big new feature, which I think will interest many, is the
"SuperVectorizer", which strives to readily vectorize a pandas dataframe:
https://dirty-cat.github.io/stable/auto_examples/01_dirty_categories.html#example-super-vectorizer

Of course, such an object is full of heuristics. We have tuned them
empirically, but we expect more progress in the long term as we build a
bigger database of dataframes that are difficult to vectorize. We'd love
for people to join the adventure; it's been fun so far.

Cheers,

Gaël

-- 
    Gael Varoquaux
    Research Director, INRIA
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
