Thanks everyone for your feedback. The reason you're getting the error is that the first positional argument of DataArray.mean() is the named dimension 'dim', not 'axis'. Calling X.mean(axis=0) would probably solve the problem, but it might be easier (and more robust) to fix this on my end by always converting the data to a numpy array before passing it to the wrapped estimator.
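To make the failure and the proposed fix concrete, here is a rough sketch that depends only on numpy. All three classes are hypothetical stand-ins and not the actual xarray, dask-ml or sklearn-xarray code: DimFirstArray imitates a labeled array whose mean() takes 'dim' as its first argument, CenteringEstimator imitates an estimator that calls X.mean(0), and NumpyWrapper shows the "convert first, then delegate" idea. It assumes the labeled array can be converted with np.asarray(), which is the case for a DataArray.

```python
import numpy as np


class DimFirstArray:
    """Hypothetical stand-in for a labeled array whose mean() takes `dim` first."""

    def __init__(self, values, dims):
        self.values = np.asarray(values, dtype=float)
        self.dims = tuple(dims)

    def mean(self, dim=None, axis=None):
        if dim is not None:
            # Dimension names are expected here, so an int fails on list(dim).
            names = [dim] if isinstance(dim, str) else list(dim)
            axis = tuple(self.dims.index(d) for d in names)
        return self.values.mean(axis=axis)

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray() drop the labels and return the raw values.
        return self.values if dtype is None else self.values.astype(dtype)


class CenteringEstimator:
    """Hypothetical estimator that calls X.mean(0) on its input."""

    def fit_transform(self, X):
        return X - X.mean(0)


class NumpyWrapper:
    """Hypothetical wrapper that converts the input before delegating."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit_transform(self, X):
        # Convert to a plain ndarray so that positional axis arguments work.
        return self.estimator.fit_transform(np.asarray(X))


X = DimFirstArray([[1.0, 2.0], [3.0, 4.0]], dims=("sample", "feature"))

try:
    # Without conversion, the 0 is interpreted as `dim` and the call fails.
    CenteringEstimator().fit_transform(X)
except TypeError as err:
    print("unwrapped:", err)

# With conversion, the estimator only ever sees a numpy array.
Xt = NumpyWrapper(CenteringEstimator()).fit_transform(X)
print(Xt)
```

Note that the conversion drops the labels, so a real wrapper would still have to reattach the coordinates to the result afterwards; that part is left out of the sketch.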
Regarding the question of how to avoid loading data into memory: I'm honestly not familiar enough with this subject to give you an answer just yet, but supporting too-big-for-memory datasets is definitely a feature that would be very important to me.

Cheers
Peter

On Mon, Dec 4, 2017 at 17:00, Tom Augspurger <tom.augspurge...@gmail.com> wrote:

> I haven't looked at the implementation of `sklearn_xarray.dataarray.wrap`
> yet, but a simple test on `dask_ml.preprocessing.StandardScaler` failed
> with the (probably expected) `TypeError: 'int' object is not iterable`
> when dask-ml attempts an `X.mean(0)`.
>
> I'd be interested to hear what changes dask-ml would need to make to get
> things working on dask-backed xarray datasets, without reading everything
> into memory at once.
>
> The code:
>
>     import sklearn_xarray.dataarray as da
>     from sklearn_xarray.data import load_dummy_dataarray
>     from dask_ml.preprocessing import StandardScaler
>
>     X = load_dummy_dataarray()
>     Xt = da.wrap(StandardScaler()).fit_transform(X)
>
> Tom
>
> On Mon, Dec 4, 2017 at 9:03 AM, Olivier Grisel <olivier.gri...@ensta.org>
> wrote:
>
>> Interesting project!
>>
>> BTW, do you know about dask-ml [1]?
>>
>> It might be interesting to think about generalizing the input validation
>> of fit and predict / transform as a private method of the BaseEstimator
>> class instead of directly calling into sklearn.utils.validation functions,
>> so as to make it easier for third-party projects such as sklearn-xarray
>> and dask-ml to subclass and override those methods to allow for specific
>> input data structures without converting everything to a numpy array.
>>
>> [1] https://github.com/dask/dask-ml
>>
>> 2017-12-04 15:21 GMT+01:00 Peter Hausamann <peter.hausam...@tum.de>:
>>
>>> Hi all,
>>>
>>> I'd like to announce *sklearn-xarray*, a new package that provides a
>>> scikit-learn interface for xarray users. For those not familiar with
>>> xarray (http://xarray.pydata.org), it is a "pandas-like and
>>> pandas-compatible toolkit for analytics on multi-dimensional arrays".
>>>
>>> The package makes it possible to apply sklearn estimators to xarray
>>> DataArrays and Datasets while keeping the labels (called coordinates in
>>> xarray) intact wherever possible.
>>>
>>> You can install the package via pip:
>>>
>>>     pip install sklearn-xarray
>>>
>>> To get started, you can:
>>>
>>> - read the documentation: https://phausamann.github.io/sklearn-xarray
>>>   and
>>> - check out the repository:
>>>   https://github.com/phausamann/sklearn-xarray
>>>
>>> Note that the package is still in a very early development stage and
>>> there will probably be some major API changes in upcoming releases. Most
>>> notably, I'd like to replicate the complete sklearn module structure at
>>> some point by decorating all available estimators with the necessary
>>> wrappers.
>>>
>>> Feedback of any kind is appreciated.
>>>
>>> Peter
>>
>> --
>> Olivier
>> http://twitter.com/ogrisel - http://github.com/ogrisel
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn