Re: [scikit-learn] Announcing sklearn-xarray

Peter Hausamann Mon, 04 Dec 2017 08:27:47 -0800

Thanks everyone for your feedback.

The error occurs because the first positional argument of
DataArray.mean() is the named dimension 'dim', not 'axis', so X.mean(0)
treats 0 as a dimension name. Calling X.mean(axis=0) would probably solve
the problem, but it might be easier (and more robust) to fix this on my end
by always converting the data to a numpy array before passing it to the
wrapped estimator.
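For illustration, here is a small sketch of the difference. It assumes a recent xarray and NumPy; the dimension names `sample`/`feature` and the toy values are made up.

```python
import numpy as np
import xarray as xr

# Toy labelled data: 3 samples x 2 features (names and values are made up).
X = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),
    dims=("sample", "feature"),
    coords={"sample": [10, 11, 12], "feature": ["a", "b"]},
)

# DataArray.mean's first positional parameter is `dim`, so X.mean(0) treats 0
# as a dimension *name*. NumPy-style code has to pass `axis` by keyword...
by_axis = X.mean(axis=0)

# ...or use the xarray idiom and name the dimension explicitly.
by_dim = X.mean(dim="sample")

# Converting to a plain NumPy array first restores NumPy semantics, where
# mean(0) reduces over axis 0. The labels are lost in the process.
as_numpy = np.asarray(X).mean(0)

print(by_axis.values)  # [3. 4.]
print(by_dim.dims)     # ('feature',)
print(type(as_numpy).__name__, as_numpy)
```

Keep in mind that `np.asarray` on a dask-backed DataArray computes the whole array in memory. That matters for the question about datasets that do not fit in memory.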


Regarding the question of how to avoid loading the data into memory: I'm
honestly not familiar enough with this subject to give you an answer just
yet, but supporting datasets that are too big for memory is definitely a
feature that would be very important to me.

Cheers

Peter

Tom Augspurger <tom.augspurge...@gmail.com> wrote on Mon, 4 Dec 2017 at
17:00:

> I haven't looked at the implementation of `sklearn_xarray.dataarray.wrap`
> yet, but a simple test on `dask_ml.preprocessing.StandardScaler` failed
> with the (probably expected) `TypeError: 'int' object is not iterable`
> when dask-ml attempts an `X.mean(0)`.
>
> I'd be interested to hear what changes dask-ml would need to make to get
> things working on dask-backed xarray datasets without reading everything
> into memory at once.
>
> The code:
>
>
> import sklearn_xarray.dataarray as da
> from sklearn_xarray.data import load_dummy_dataarray
> from dask_ml.preprocessing import StandardScaler
>
> X = load_dummy_dataarray()
> Xt = da.wrap(StandardScaler()).fit_transform(X)
>
>
> Tom
>
> On Mon, Dec 4, 2017 at 9:03 AM, Olivier Grisel <olivier.gri...@ensta.org>
> wrote:
>
>> Interesting project!
>>
>> BTW, do you know about dask-ml [1]?
>>
>> It might be interesting to think about generalizing the input validation
>> of fit and predict / transform as a private method of the BaseEstimator
>> class instead of directly calling into sklearn.utils.validation functions,
>> so as to make it easier for third-party projects such as sklearn-xarray
>> and dask-ml to subclass and override those methods to allow for specific
>> input data structures without converting everything to a numpy array.
>>
>> [1] https://github.com/dask/dask-ml
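A minimal sketch of that idea, under stated assumptions: `ValidatingEstimator`, `_validate_X`, `MeanCenterer` and `PassThroughCenterer` are hypothetical names, not scikit-learn API. A NumPy masked array stands in for a third-party container such as a dask array, because the default `np.asarray` path turns it into a plain ndarray (just as it would pull a dask array into memory).

```python
import numpy as np


class ValidatingEstimator:
    """Hypothetical base class: all input validation goes through one hook."""

    def _validate_X(self, X):
        # Default: coerce to a 2-D float ndarray, roughly what
        # sklearn.utils.validation.check_array does.
        X = np.asarray(X, dtype=float)
        if X.ndim != 2:
            raise ValueError(f"Expected 2-D input, got {X.ndim}-D")
        return X


class MeanCenterer(ValidatingEstimator):
    """Toy transformer that subtracts the per-column mean."""

    def fit(self, X):
        X = self._validate_X(X)
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        return self._validate_X(X) - self.mean_

    def fit_transform(self, X):
        return self.fit(X).transform(X)


class PassThroughCenterer(MeanCenterer):
    """What a third-party package could do: override only the hook."""

    def _validate_X(self, X):
        # Keep 2-D array-likes that already support axis-based reductions
        # instead of converting them to ndarray; fall back otherwise.
        if getattr(X, "ndim", None) == 2 and hasattr(X, "mean"):
            return X
        return super()._validate_X(X)


masked = np.ma.masked_array(
    [[1.0, 2.0], [3.0, 4.0]], mask=[[False, True], [False, False]]
)
print(type(MeanCenterer().fit_transform(masked)).__name__)         # ndarray
print(type(PassThroughCenterer().fit_transform(masked)).__name__)  # MaskedArray
```

The estimator logic in `fit`/`transform` is written once; only the validation hook differs, so the subclass keeps the mask (and would keep laziness, for a lazy container) while plain lists still go through the default conversion.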
>>
>>
>>
>> On 2017-12-04 15:21 GMT+01:00, Peter Hausamann <peter.hausam...@tum.de> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to announce *sklearn-xarray*, a new package that provides a
>>> scikit-learn interface for xarray users. For those not familiar with xarray
>>> (http://xarray.pydata.org), it is a "pandas-like and pandas-compatible
>>> toolkit for analytics on multi-dimensional arrays".
>>>
>>> The package makes it possible to apply sklearn estimators to xarray
>>> DataArrays and Datasets while keeping the labels (called coordinates in
>>> xarray) intact wherever possible.
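One way such label preservation could work, sketched under assumptions: `fit_transform_keep_labels` is a hypothetical helper, not part of the sklearn-xarray API. It hands the estimator a plain NumPy array, then re-attaches dims and coordinates when the output shape matches the input, and keeps only the sample labels otherwise. The output dimension name `component` is arbitrary.

```python
import numpy as np
import xarray as xr
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def fit_transform_keep_labels(estimator, X):
    """Hypothetical helper (not the sklearn-xarray API).

    Runs a scikit-learn transformer on a 2-D DataArray of shape
    (sample, feature) and re-attaches as many labels as still apply.
    """
    # The estimator only ever sees a plain NumPy array.
    Xt = estimator.fit_transform(np.asarray(X))
    if Xt.shape == X.shape:
        # Shape unchanged: assume every dim and coordinate still applies.
        return xr.DataArray(Xt, dims=X.dims, coords=X.coords, attrs=X.attrs)
    # Feature axis changed (e.g. PCA): only the sample labels survive.
    sample_dim = X.dims[0]
    return xr.DataArray(
        Xt,
        dims=(sample_dim, "component"),  # "component" is an arbitrary name
        coords={sample_dim: X[sample_dim].values},
    )


X = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 5.0], [5.0, 9.0]]),
    dims=("sample", "feature"),
    coords={"sample": [10, 11, 12], "feature": ["a", "b"]},
)
scaled = fit_transform_keep_labels(StandardScaler(), X)
reduced = fit_transform_keep_labels(PCA(n_components=1), X)
print(scaled.dims, reduced.dims)
```

The shape check is a crude heuristic: a transformer that keeps the shape but reorders or replaces features would end up with misleading feature labels, so a real wrapper would need per-estimator knowledge of what each output axis means.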
>>>
>>> You can install the package via pip:
>>>
>>> pip install sklearn-xarray
>>>
>>> To get started, you can:
>>>
>>>    - read the documentation: https://phausamann.github.io/sklearn-xarray
>>>    and
>>>    - check out the repository:
>>>    https://github.com/phausamann/sklearn-xarray
>>>
>>> Note that the package is still in a very early development stage and
>>> there will probably be some major API changes in upcoming releases. Most
>>> notably, I'd like to replicate the complete sklearn module structure at
>>> some point by decorating all available estimators with the necessary
>>> wrappers.
>>>
>>> Feedback of any kind is appreciated.
>>>
>>> Peter
>>>
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn@python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>>
>>
>>
>> --
>> Olivier
>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>