PS: Obviously, forcing conversion to numpy is not what we would want; rather,
we would pass the underlying array of the DataArray.
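In xarray, `DataArray.values` always returns a NumPy array, computing dask-backed data along the way. `DataArray.data` instead returns the underlying array, which may still be a lazy dask array. The sketch below shows that difference with pure-Python stand-ins, so it runs without xarray or dask installed. Every class and function in it (`LazyArray`, `ToyDataArray`, `unwrap`, `RecordingEstimator`) is hypothetical and is not part of xarray, dask, scikit-learn or sklearn-xarray.

```python
# Minimal sketch: "pass the underlying array" versus "force conversion".
# All names below are hypothetical stand-ins written for this example.


class LazyArray:
    """Stand-in for a dask array: remembers whether it was ever loaded."""

    def __init__(self, rows):
        self._rows = rows
        self.materialized = False

    def compute(self):
        # Loading the data into memory is what we want to avoid.
        self.materialized = True
        return [list(row) for row in self._rows]


class ToyDataArray:
    """Stand-in for xarray.DataArray: dimension labels plus a backing array."""

    def __init__(self, data, dims):
        self.data = data  # underlying array, kept as-is
        self.dims = tuple(dims)

    @property
    def values(self):
        # Mimics DataArray.values: always returns in-memory data.
        return self.data.compute() if hasattr(self.data, "compute") else self.data


def unwrap(X):
    """Return the backing array of a labeled container without converting it.

    Checking for ``dims`` as well as ``data`` avoids unwrapping plain NumPy
    arrays, whose ``.data`` attribute is a raw memory buffer.
    """
    if hasattr(X, "dims") and hasattr(X, "data"):
        return X.data
    return X


class RecordingEstimator:
    """Toy estimator that only records which type fit() received."""

    def fit(self, X):
        self.input_type_ = type(X).__name__
        return self


lazy = LazyArray([[1.0, 2.0], [3.0, 4.0]])
X = ToyDataArray(lazy, dims=("sample", "feature"))

# Passing the underlying array: the estimator sees the lazy object, nothing is loaded.
est = RecordingEstimator().fit(unwrap(X))
print(est.input_type_, lazy.materialized)  # LazyArray False

# Forcing conversion: everything is pulled into memory first.
est = RecordingEstimator().fit(X.values)
print(est.input_type_, lazy.materialized)  # list True
```

Whether a real estimator can work on the unwrapped object depends on the estimator. dask-ml estimators are designed to accept dask arrays, while most plain scikit-learn estimators will convert their input to NumPy anyway.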

Peter Hausamann <peter.hausam...@tum.de> wrote on Mon, 4 Dec 2017 at
17:25:

> Thanks everyone for your feedback.
>
> You're getting the error because the first argument of DataArray.mean()
> is the named dimension 'dim', not 'axis'. So calling X.mean(axis=0) would
> probably solve the problem... but it might be easier (and more robust) to
> fix this on my end by always converting the data to a numpy array before
> passing it to the wrapped estimator.
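Here is a toy sketch of the mismatch. In xarray, the first positional parameter of `DataArray.mean` is `dim`, so a call written for NumPy, such as `X.mean(0)`, passes `0` as a dimension name rather than an axis number. `ToyLabeledArray` is a hypothetical pure-Python stand-in that only mimics that calling convention. The internals and error text of real xarray may differ.

```python
# Toy illustration of why X.mean(0) fails on an xarray-style object.
# ToyLabeledArray is hypothetical; it mirrors only the
# DataArray.mean(dim=None, axis=None) calling convention, not xarray itself.


class ToyLabeledArray:
    def __init__(self, rows, dims):
        self.rows = rows
        self.dims = tuple(dims)

    def mean(self, dim=None, axis=None):
        if dim is not None and axis is not None:
            raise ValueError("cannot supply both 'axis' and 'dim'")
        if dim is not None:
            # A bare string names one dimension; anything else is treated as a
            # collection of names, so an int such as 0 fails right here.
            names = [dim] if isinstance(dim, str) else list(dim)
            axis = self.dims.index(names[0])  # toy: only the first name is used
        if axis != 0:
            raise NotImplementedError("toy version only reduces over the first dimension")
        n = len(self.rows)
        # Column-wise mean over the first dimension.
        return [sum(col) / n for col in zip(*self.rows)]


X = ToyLabeledArray([[1.0, 2.0], [3.0, 4.0]], dims=("sample", "feature"))

print(X.mean(axis=0))        # [2.0, 3.0]: axis passed by keyword
print(X.mean(dim="sample"))  # [2.0, 3.0]: reduce over a named dimension
try:
    X.mean(0)                # 0 binds to `dim`, not `axis`
except TypeError as err:
    print(err)               # 'int' object is not iterable
```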
>
> Regarding the question on how to avoid data being loaded into memory: I'm
> honestly not familiar enough with this subject to give you an answer just
> yet, but supporting too-big-for-memory datasets is definitely a feature
> that would be very important to me.
>
> Cheers
>
> Peter
>
>
> Tom Augspurger <tom.augspurge...@gmail.com> wrote on Mon, 4 Dec 2017
> at 17:00:
>
>> I haven't looked at the implementation of `sklearn_xarray.dataarray.wrap`
>> yet, but a simple test with `dask_ml.preprocessing.StandardScaler` failed
>> with the (probably expected) `TypeError: 'int' object is not iterable`
>> when dask-ml attempts an `X.mean(0)`.
>>
>> I'd be interested to hear what changes dask-ml would need to make to get
>> things working on dask-backed xarray datasets, without reading everything
>> into memory at once.
>>
>> The code:
>>
>>
>> import sklearn_xarray.dataarray as da
>> from sklearn_xarray.data import load_dummy_dataarray
>> from dask_ml.preprocessing import StandardScaler
>>
>> X = load_dummy_dataarray()
>> Xt = da.wrap(StandardScaler()).fit_transform(X)
>>
>>
>> Tom
>>
>> On Mon, Dec 4, 2017 at 9:03 AM, Olivier Grisel <olivier.gri...@ensta.org>
>> wrote:
>>
>>> Interesting project!
>>>
>>> BTW, do you know about dask-ml [1]?
>>>
>>> It might be interesting to think about generalizing the input validation
>>> of fit and predict / transform as a private method of the BaseEstimator
>>> class, instead of directly calling into sklearn.utils.validation functions,
>>> so as to make it easier for third-party projects such as sklearn-xarray
>>> and dask-ml to subclass and override those methods to allow for specific
>>> input data structures without converting everything to a numpy array.
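The suggestion above can be sketched as a base class whose `fit` and `transform` route all input checking through a single private hook, which a third-party subclass then overrides. This is a hypothetical, pure-Python illustration. `_check_input`, `ValidatingEstimator` and `PassThroughEstimator` are invented names, not existing scikit-learn methods, and the list-of-rows conversion stands in for converting to a numpy array.

```python
# Sketch of the proposal: route all input checking through one overridable hook.
# ValidatingEstimator, PassThroughEstimator and _check_input are hypothetical
# names for illustration; this is not scikit-learn's actual BaseEstimator API.


class ValidatingEstimator:
    """Toy mean-centering transformer whose input handling lives in one hook."""

    def _check_input(self, X):
        # Default policy: materialize everything as an in-memory list of rows
        # (playing the role of "convert to a numpy array").
        return [list(row) for row in X]

    def fit(self, X):
        X = self._check_input(X)
        n = len(X)
        self.mean_ = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X):
        X = self._check_input(X)
        return [[x - m for x, m in zip(row, self.mean_)] for row in X]


class PassThroughEstimator(ValidatingEstimator):
    """Third-party subclass that keeps its own container type instead."""

    def _check_input(self, X):
        # No conversion: record what arrived and hand it on unchanged.
        self.seen_type_ = type(X).__name__
        return X


data = ((1.0, 2.0), (3.0, 4.0))

base = ValidatingEstimator().fit(data)
print(base.transform(data))  # [[-1.0, -1.0], [1.0, 1.0]]

sub = PassThroughEstimator().fit(data)
print(sub.seen_type_)        # tuple
print(sub.transform(data))   # [[-1.0, -1.0], [1.0, 1.0]]
```

`fit` and `transform` never call the validation helpers directly. As a result, changing the input policy only requires overriding one method, not reimplementing the estimator.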
>>>
>>> [1] https://github.com/dask/dask-ml
>>>
>>>
>>>
>>> 2017-12-04 15:21 GMT+01:00 Peter Hausamann <peter.hausam...@tum.de>:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to announce *sklearn-xarray*, a new package that provides a
>>>> scikit-learn interface for xarray users. For those not familiar with xarray
>>>> (http://xarray.pydata.org), it is a "pandas-like and pandas-compatible
>>>> toolkit for analytics on multi-dimensional arrays".
>>>>
>>>> The package makes it possible to apply sklearn estimators to xarray
>>>> DataArrays and Datasets while keeping the labels (called coordinates in
>>>> xarray) intact wherever possible.
>>>>
>>>> You can install the package via pip:
>>>>
>>>> pip install sklearn-xarray
>>>>
>>>> To get started, you can:
>>>>
>>>>    - read the documentation:
>>>>    https://phausamann.github.io/sklearn-xarray  and
>>>>    - check out the repository:
>>>>    https://github.com/phausamann/sklearn-xarray
>>>>
>>>> Note that the package is still in a very early development stage and
>>>> there will probably be some major API changes in upcoming releases. Most
>>>> notably, I'd like to replicate the complete sklearn module structure at
>>>> some point by decorating all available estimators with the necessary
>>>> wrappers.
>>>>
>>>> Feedback of any kind is appreciated.
>>>>
>>>> Peter
>>>>
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn@python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>>>
>>>
>>>
>>> --
>>> Olivier
>>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>>
>>
>