Hi scikit-learn'ers

We just released skrub 0.2.0: https://skrub-data.org. This release markedly 
simplifies learning on complex dataframes.

# `model = tabular_learner('classifier')`

The highlight of the release is the `tabular_learner` function, which makes it
easy to create pipelines that readily perform machine learning on dataframes,
by adding preprocessing in front of a scikit-learn compatible learner. The
function essentially packs defaults and heuristics to transform all forms of
dataframes into a representation that is well suited to the learner, and it
adapts these transformations to that learner:
`tabular_learner(HistGradientBoostingClassifier())` encodes categories
differently than `tabular_learner(LogisticRegression())`.

The heuristics were tuned through extensive benchmarking, and experience shows
that they give good trade-offs. The default `tabular_learner('classifier')` is
often a strong baseline.


# `transformer = TableVectorizer()`

Under the hood, the work is done by `skrub.TableVectorizer`, a scikit-learn
compatible transformer that makes it easy to combine multiple transformations
on the different columns of a dataframe. The TableVectorizer is not new in the
0.2.0 release, but we have completely revamped its internals to handle edge
cases really well. Indeed, one challenge is to make sure that nothing
different or strange happens at test time. In fact, enforcing consistency
between train-time and test-time transformations is the real value of skrub
compared to doing the transformations directly with pandas or polars.

# Increasing support of polars

We have implemented a new mechanism for supporting both pandas and polars. It
has not yet been applied to the whole codebase, hence polars support is still
imperfect. However, polars support in skrub keeps increasing, and our
short-term goal is to provide rock-solid polars support.
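To give an idea of what supporting two dataframe libraries can involve, here is
a hypothetical sketch of dispatching a helper on the dataframe's library. It is
not skrub's internal API: the names `dispatch`, `specialize` and `column_names`
are made up for illustration. The design choice it illustrates is to identify
the library from the module that defines the dataframe's type, so that neither
pandas nor polars has to be imported, or even installed, for dispatch to work.

```python
from typing import Any, Callable, Dict


def _library_of(df: Any) -> str:
    """Return "pandas" or "polars" based on the module that defines type(df)."""
    library = type(df).__module__.split(".")[0]
    if library not in ("pandas", "polars"):
        raise TypeError(f"unsupported dataframe type: {type(df)!r}")
    return library


class dispatch:
    """Hypothetical decorator: one generic function, one implementation per library."""

    def __init__(self, generic: Callable) -> None:
        self.name = generic.__name__
        self.registry: Dict[str, Callable] = {}

    def specialize(self, library: str) -> Callable:
        # Register `impl` as the implementation used for `library`
        def decorator(impl: Callable) -> Callable:
            self.registry[library] = impl
            return impl

        return decorator

    def __call__(self, df: Any, *args: Any, **kwargs: Any) -> Any:
        library = _library_of(df)
        if library not in self.registry:
            raise NotImplementedError(f"{self.name} is not implemented for {library}")
        return self.registry[library](df, *args, **kwargs)


@dispatch
def column_names(df):
    """Return the column names of a pandas or polars dataframe as strings."""


@column_names.specialize("pandas")
def _column_names_pandas(df):
    # pandas column labels can be any hashable, so normalize them to str
    return [str(col) for col in df.columns]


@column_names.specialize("polars")
def _column_names_polars(df):
    # polars column names are already strings
    return list(df.columns)
```

With pandas installed, `column_names(pd.DataFrame({"a": [1]}))` returns `["a"]`,
while a polars dataframe goes through the second implementation. Adding a
library-specific function then only requires registering one implementation
per library.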

Try skrub out! It's still young, but in my opinion, it provides a lot of
value for tabular learning.

Cheers,

Gaël
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
