PS: Obviously, forcing a conversion to NumPy is not what we would want; it would be better to pass the DataArray's underlying array to the wrapped estimator.
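To make the PS concrete, here is a minimal sketch of what "passing the underlying array" could look like. It uses a small made-up DataArray (the dimension names `sample` and `feature` are my assumptions, not taken from sklearn-xarray) and xarray's `.data` attribute, which returns the wrapped array as-is rather than coercing it to NumPy. It is only an illustration, not how sklearn-xarray is implemented.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the real data; dimension names are illustrative only.
X = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("sample", "feature"),
)

# .data hands back the wrapped array without conversion. Here that is a
# numpy.ndarray; for a dask-backed DataArray it would be the dask array.
underlying = X.data

# On the raw array, a positional axis works, as in dask-ml's X.mean(0).
col_means = underlying.mean(0)

# On the DataArray itself, the reduction has to name the axis or dimension.
via_axis = X.mean(axis=0)
via_dim = X.mean(dim="sample")
```

Both `via_axis` and `via_dim` reduce over the first dimension and keep the `feature` dimension, so they agree with `col_means`.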
Peter Hausamann <peter.hausam...@tum.de> wrote on Mon, 4 Dec 2017 at 17:25:

> Thanks everyone for your feedback.
>
> The reason you're getting the error is that the first argument of
> DataArray.mean() is the named dimension 'dim' and not 'axis'. So calling
> X.mean(axis=0) would probably solve the problem... but it might be easier
> (and more robust) to fix this on my end by always converting the data to a
> numpy array before passing it to the wrapped estimator.
>
> Regarding the question on how to avoid data being loaded into memory: I'm
> honestly not familiar enough with this subject to give you an answer just
> yet, but supporting too-big-for-memory datasets is definitely a feature
> that would be very important to me.
>
> Cheers
>
> Peter
>
> Tom Augspurger <tom.augspurge...@gmail.com> wrote on Mon, 4 Dec 2017
> at 17:00:
>
>> I haven't looked at the implementation of `sklearn_xarray.dataarray.wrap`
>> yet, but a simple test on `dask_ml.preprocessing.StandardScaler` failed
>> with the (probably expected) `TypeError: 'int' object is not iterable`
>> when dask-ml attempts an `X.mean(0)`.
>>
>> I'd be interested to hear what changes dask-ml would need to make to get
>> things working on dask-backed xarray datasets, without reading everything
>> into memory at once.
>>
>> The code:
>>
>>     import sklearn_xarray.dataarray as da
>>     from sklearn_xarray.data import load_dummy_dataarray
>>     from dask_ml.preprocessing import StandardScaler
>>
>>     X = load_dummy_dataarray()
>>     Xt = da.wrap(StandardScaler()).fit_transform(X)
>>
>> Tom
>>
>> On Mon, Dec 4, 2017 at 9:03 AM, Olivier Grisel <olivier.gri...@ensta.org>
>> wrote:
>>
>>> Interesting project!
>>>
>>> BTW, do you know about dask-ml [1]?
>>>
>>> It might be interesting to think about generalizing the input validation
>>> of fit and predict / transform as a private method of the BaseEstimator
>>> class instead of directly calling into sklearn.utils.validation functions,
>>> so as to make it easier for third-party projects such as sklearn-xarray
>>> and dask-ml to subclass and override those methods to allow for specific
>>> input data structures without converting everything to a numpy array.
>>>
>>> [1] https://github.com/dask/dask-ml
>>>
>>> 2017-12-04 15:21 GMT+01:00 Peter Hausamann <peter.hausam...@tum.de>:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to announce *sklearn-xarray*, a new package that provides a
>>>> scikit-learn interface for xarray users. For those not familiar with
>>>> xarray (http://xarray.pydata.org), it is a "pandas-like and
>>>> pandas-compatible toolkit for analytics on multi-dimensional arrays".
>>>>
>>>> The package makes it possible to apply sklearn estimators to xarray
>>>> DataArrays and Datasets while keeping the labels (called coordinates in
>>>> xarray) intact wherever possible.
>>>>
>>>> You can install the package via pip:
>>>>
>>>>     pip install sklearn-xarray
>>>>
>>>> To get started, you can:
>>>>
>>>> - read the documentation:
>>>>   https://phausamann.github.io/sklearn-xarray and
>>>> - check out the repository:
>>>>   https://github.com/phausamann/sklearn-xarray
>>>>
>>>> Note that the package is still in a very early development stage and
>>>> there will probably be some major API changes in upcoming releases. Most
>>>> notably, I'd like to replicate the complete sklearn module structure at
>>>> some point by decorating all available estimators with the necessary
>>>> wrappers.
>>>>
>>>> Feedback of any kind is appreciated.
>>>>
>>>> Peter
>>>
>>> --
>>> Olivier
>>> http://twitter.com/ogrisel - http://github.com/ogrisel
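On Olivier's suggestion in the quoted thread (moving input validation into a private method that third-party projects can override), a rough sketch of the idea might look like the following. Everything here is hypothetical: `EstimatorSketch`, `LabeledData`, `LabelAwareEstimator` and `_validate_input` are names I made up for illustration, and none of this is scikit-learn's actual API. Plain nested lists stand in for arrays so that the example has no dependencies.

```python
class EstimatorSketch:
    """Hypothetical base class; NOT scikit-learn's BaseEstimator."""

    def _validate_input(self, X):
        # Default hook: coerce to a plain nested list of floats, standing
        # in for "convert everything to a numpy array".
        return [[float(v) for v in row] for row in X]

    def fit(self, X):
        # fit() only talks to the hook, never to a validation function
        # directly, so subclasses can change what is accepted.
        X = self._validate_input(X)
        n_rows = len(X)
        self.mean_ = [sum(col) / n_rows for col in zip(*X)]
        return self


class LabeledData:
    """Hypothetical labeled container, loosely analogous to a DataArray."""

    def __init__(self, rows, feature_names):
        self.rows = rows
        self.feature_names = feature_names


class LabelAwareEstimator(EstimatorSketch):
    """Overrides the hook to accept LabeledData without coercing it."""

    def _validate_input(self, X):
        if isinstance(X, LabeledData):
            # Remember the labels and pass the underlying rows through.
            self.feature_names_ = list(X.feature_names)
            return X.rows
        return super()._validate_input(X)


est = LabelAwareEstimator().fit(LabeledData([[1, 2], [3, 4]], ["a", "b"]))
plain = EstimatorSketch().fit([[1, 2], [3, 4]])
```

The point of the design is that `fit` stays unchanged in the subclass; only the validation hook differs, which is where a project like sklearn-xarray or dask-ml could plug in its own data structures.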
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn