Hi scikit-learn'ers
We just released skrub 0.2.0: https://skrub-data.org. This release markedly
simplifies learning on complex dataframes.
# `model = tabular_learner('classifier')`
The highlight of the release is the `tabular_learner` function, which
facilitates creating pipelines that readily perform machine learning on
dataframes, adding preprocessing to a scikit-learn compatible learner. The
function packs defaults and heuristics to transform all kinds of
dataframes into a representation that is well suited to the learner, and it
adapts these transformations to the learner:
`tabular_learner(HistGradientBoostingClassifier())` encodes categories
differently than `tabular_learner(LogisticRegression())`.
The heuristics are tuned based on extensive benchmarking, and experience shows
that they give good tradeoffs. The default `tabular_learner('classifier')` is
often a strong baseline.
# `transformer = TableVectorizer()`
Under the hood, the work is done by the `skrub.TableVectorizer()`, a
scikit-learn compatible transformer that facilitates combining multiple
transformations on the different columns of a dataframe. The TableVectorizer is
not new in the 0.2.0 release, but we have completely revamped its internals to
cover edge cases really well. Indeed, one challenge is to make sure that
nothing different or strange happens at test time. Enforcing
consistency between train-time and test-time transformations is the real value
of skrub compared to using pandas or polars directly to do the transformations.
# Increasing support of polars
We have implemented a new mechanism for supporting both pandas and polars. It
has not yet been applied to the whole codebase, so the support is still
imperfect. However, polars support in skrub is steadily improving, and our goal
in the short term is to provide rock-solid polars support.
Try skrub out! It's still young, but in my opinion, it provides a lot of
value to tabular learning.
Cheers,
Gaël