I haven't looked at the implementation of `sklearn_xarray.dataarray.wrap` yet, but a simple test on `dask_ml.preprocessing.StandardScaler` failed with the (probably expected) `TypeError: 'int' object is not iterable` when dask-ml attempts an `X.mean(0)`.
I'd be interested to hear what changes dask-ml would need to make to get things working on dask-backed xarray datasets, without reading everything into memory at once.

The code:

    import sklearn_xarray.dataarray as da
    from sklearn_xarray.data import load_dummy_dataarray
    from dask_ml.preprocessing import StandardScaler

    X = load_dummy_dataarray()
    Xt = da.wrap(StandardScaler()).fit_transform(X)

Tom

On Mon, Dec 4, 2017 at 9:03 AM, Olivier Grisel <olivier.gri...@ensta.org> wrote:

> Interesting project!
>
> BTW, do you know about dask-ml [1]?
>
> It might be interesting to think about generalizing the input validation
> of fit and predict / transform as a private method of the BaseEstimator
> class instead of directly calling into sklearn.utils.validation functions,
> so as to make it easier for third party projects such as sklearn-xarray
> and dask-ml to subclass and override those methods to allow for specific
> input data structures without converting everything to a numpy array.
>
> [1] https://github.com/dask/dask-ml
>
> 2017-12-04 15:21 GMT+01:00 Peter Hausamann <peter.hausam...@tum.de>:
>
>> Hi all,
>>
>> I'd like to announce *sklearn-xarray*, a new package that provides a
>> scikit-learn interface for xarray users. For those not familiar with
>> xarray (http://xarray.pydata.org), it is a "pandas-like and
>> pandas-compatible toolkit for analytics on multi-dimensional arrays".
>>
>> The package makes it possible to apply sklearn estimators to xarray
>> DataArrays and Datasets while keeping the labels (called coordinates in
>> xarray) intact wherever possible.
>>
>> You can install the package via pip:
>>
>>     pip install sklearn-xarray
>>
>> To get started, you can:
>>
>> - read the documentation: https://phausamann.github.io/sklearn-xarray
>>   and
>> - check out the repository:
>>   https://github.com/phausamann/sklearn-xarray
>>
>> Note that the package is still in a very early development stage and
>> there will probably be some major API changes in upcoming releases. Most
>> notably, I'd like to replicate the complete sklearn module structure at
>> some point by decorating all available estimators with the necessary
>> wrappers.
>>
>> Feedback of any kind is appreciated.
>>
>> Peter
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
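
To make Olivier's suggestion in the quoted message concrete, here is a rough pure-Python sketch of what an overridable validation hook could look like. Everything in it is hypothetical and for illustration only: `PlainTable`, `LabeledTable`, `Centerer`, `LabelAwareCenterer` and the `_validate_input` method name are made-up stand-ins, not existing scikit-learn, dask-ml or sklearn-xarray APIs. The only point is that the estimator logic calls one private method for validation, so a third-party subclass can replace that method alone.

```python
class PlainTable:
    """Hypothetical stand-in for a plain array: rows of floats, no labels."""

    def __init__(self, rows):
        self.rows = [[float(v) for v in row] for row in rows]

    def column_means(self):
        return [sum(col) / len(col) for col in zip(*self.rows)]

    def _shifted(self, offsets):
        return [[v - o for v, o in zip(row, offsets)] for row in self.rows]

    def minus(self, offsets):
        return PlainTable(self._shifted(offsets))


class LabeledTable(PlainTable):
    """Hypothetical stand-in for a labeled container such as a DataArray."""

    def __init__(self, rows, columns):
        super().__init__(rows)
        self.columns = list(columns)

    def minus(self, offsets):
        # Same arithmetic as the parent, but the column labels are carried over.
        return LabeledTable(self._shifted(offsets), self.columns)


class Centerer:
    """Toy estimator whose input validation is an overridable private method."""

    def _validate_input(self, X):
        # Default: coerce everything to the plain container, dropping labels
        # (the analogue of converting every input to a numpy array).
        rows = X.rows if isinstance(X, PlainTable) else X
        return PlainTable(rows)

    def fit(self, X):
        self.means_ = self._validate_input(X).column_means()
        return self

    def transform(self, X):
        return self._validate_input(X).minus(self.means_)

    def fit_transform(self, X):
        return self.fit(X).transform(X)


class LabelAwareCenterer(Centerer):
    """Third-party subclass that overrides only the validation hook."""

    def _validate_input(self, X):
        if isinstance(X, LabeledTable):
            return X  # pass the labeled container through untouched
        return super()._validate_input(X)


X = LabeledTable([[1.0, 10.0], [3.0, 30.0]], columns=["a", "b"])
plain = Centerer().fit_transform(X)              # labels are lost
labeled = LabelAwareCenterer().fit_transform(X)  # labels survive
```

In this toy version `fit` and `transform` only rely on two methods of the container, so anything that provides them can pass through the overridden hook. A real implementation would also have to agree on which array operations estimators may call on their inputs, which is where the `X.mean(0)` failure above comes in.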