In addition to the missing y=None in the fit signature, KernelDensity
doesn't have a transform or predict method, and I don't think Pipeline
supports score or score_samples. Maybe someone can comment on this, but I
don't think KDE is typically used in a pipeline.

In this particular case the code *seems* reasonable (and I am surprised it
doesn't work!), but I don't know much about the KDE stuff. Maybe a bug?
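
For what it's worth, here is a minimal sketch of the y=None workaround
discussed below, done in user code rather than in the library. The subclass
name KernelDensityY and the helper fit_scaled_kde are hypothetical, and the
last line assumes the installed Pipeline forwards score() to its final step,
which is worth checking given my doubt above.

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class KernelDensityY(KernelDensity):
    """Hypothetical wrapper: same estimator, but fit() tolerates a y argument."""

    def fit(self, X, y=None):
        # y is accepted only so that Pipeline.fit can pass it along;
        # density estimation is unsupervised, so it is ignored.
        return super().fit(X)


def fit_scaled_kde(X, bandwidth=0.5):
    """Scale X, then fit the wrapped KDE as the last step of a pipeline."""
    pipe = make_pipeline(StandardScaler(), KernelDensityY(bandwidth=bandwidth))
    pipe.fit(X)  # the final step is called as fit(Xt, y), with y=None here
    return pipe


# Synthetic stand-in data; the original feats1 array is not available.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 3))
pipe_demo = fit_scaled_kde(X_demo)
total_log_likelihood = pipe_demo.score(X_demo)
```

If the signature is fixed in the library itself, the subclass becomes
unnecessary and plain KernelDensity can be used in its place.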

On Wed, Nov 5, 2014 at 7:44 AM, Michael Eickenberg <
michael.eickenb...@gmail.com> wrote:

> Hi José,
>
> Yes, there seems to be an inconsistency: KernelDensity.fit has the
> signature (self, X) and not (self, X, y=None), as is usually the case
> even if y is never used; see
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/kde.py#L113
>
> I think the generally accepted way of remedying this is to just add y=None
> in the signature of that function, as was done e.g. for PCA, see
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py#L206
>
> But maybe I am missing something crucial. Happy to make the PR if I am
> right about this.
>
> Michael
>
> On Wed, Nov 5, 2014 at 1:35 PM, José Guilherme Camargo de Souza <
> jose.camargo.so...@gmail.com> wrote:
>
>> Hi all,
>>
>> Is the KernelDensity estimator compatible with pipelines? When I try
>> to use it inside one
>>
>>     pipe1 = make_pipeline(StandardScaler(with_mean=True, with_std=True),
>>                           KernelDensity(algorithm="auto",
>>                                         kernel="gaussian",
>>                                         metric="euclidean"))
>>     params = dict(kerneldensity__bandwidth=np.logspace(-10, 1, 100))
>>     search = GridSearchCV(pipe1, param_grid=params, verbose=1, n_jobs=8,
>>                           cv=5)
>>     search.fit(feats1)
>>     search.best_estimator_
>>
>> I get a TypeError as follows:
>>
>> /home/desouza/anaconda/lib/python2.7/site-packages/sklearn/pipeline.pyc
>> in fit(self=Pipeline(steps=[('standardscaler',
>> StandardScale...euclidean',
>>        metric_params=None, rtol=0))]), X=array([[  5.701     ,
>> 73.6443    ,  61.7018    ...2.7188    ,
>>           0.18243243,   0.21621622]]), y=None, **fit_params={})
>>     125     def fit(self, X, y=None, **fit_params):
>>     126         """Fit all the transforms one after the other and transform the
>>     127         data, then fit the transformed data using the final estimator.
>>     128         """
>>     129         Xt, fit_params = self._pre_transform(X, y, **fit_params)
>> --> 130         self.steps[-1][-1].fit(Xt, y, **fit_params)
>>     131         return self
>>     132
>>     133     def fit_transform(self, X, y=None, **fit_params):
>>     134         """Fit all the transforms one after the other and transform the
>>
>> TypeError: fit() takes exactly 2 arguments (3 given)
>>
>> Is this a bug, or is KernelDensity not supposed to be compatible with
>> pipelines? A quick search of the mailing list and Stack Overflow did
>> not return anything about this.
>>
>> Thanks,
>> José
>>
>>
>> On Tue, Oct 21, 2014 at 3:03 PM, Jacob Vanderplas
>> <jake...@cs.washington.edu> wrote:
>> > Hi Jose,
>> > The KDE implementation does work on multivariate data, and will in general
>> > work for multimodal data as well. There are two caveats to that:
>> >
>> > 1. In the sklearn implementation, the bandwidth must be the same across each
>> > dimension. If this poses a problem for your data, the data can be scaled
>> > before the fit (using StandardScaler or something similar).
>> > 2. The results will depend strongly on the choice of bandwidth: it's
>> > important to cross-validate to determine the optimal bandwidth, as is done
>> > in
>> > http://scikit-learn.org/stable/auto_examples/neighbors/plot_digits_kde_sampling.html
>> >
>> > Good luck!
>> >   Jake
>> >
>> >
>> >  Jake VanderPlas
>> >  Director of Research – Physical Sciences
>> >  eScience Institute, University of Washington
>> >  http://www.vanderplas.com
>> >
>> > On Tue, Oct 21, 2014 at 2:09 AM, José Guilherme Camargo de Souza
>> > <jose.camargo.so...@gmail.com> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I would like to ask if the density estimation implementation of scikit
>> >> works with multivariate multimodal data. In the digits example [1] it
>> >> is clear that it supports multivariate datasets and in the guide
>> >> description [2] a 1-D bimodal distribution is used.
>> >>
>> >> Is it possible to use the same implementation on multivariate
>> >> gaussian-shaped data with more than 2 modes? If so, are there any
>> >> shortcomings or useful tips when doing that?
>> >>
>> >> Thanks in advance,
>> >> José
>> >>
>> >> [1] http://scikit-learn.org/stable/auto_examples/neighbors/plot_digits_kde_sampling.html#example-neighbors-plot-digits-kde-sampling-py
>> >> [2] http://scikit-learn.org/stable/modules/density.html#kernel-density-estimation
>> >> José Guilherme
>> >>
>> >>
>> >>
>> ------------------------------------------------------------------------------
>> >> Comprehensive Server Monitoring with Site24x7.
>> >> Monitor 10 servers for $9/Month.
>> >> Get alerted through email, SMS, voice calls or mobile push
>> notifications.
>> >> Take corrective actions from your mobile device.
>> >> http://p.sf.net/sfu/Zoho
>> >> _______________________________________________
>> >> Scikit-learn-general mailing list
>> >> Scikit-learn-general@lists.sourceforge.net
>> >> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >
>> >
>>
>>
>
>
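
On Jake's two caveats quoted above (one shared bandwidth, so scale first;
cross-validate the bandwidth), a rough sketch on made-up data with three
modes might look like the following. The data, the helper names
make_three_mode_data and select_bandwidth, and the bandwidth grid are all
assumptions for illustration, not anything from José's setup.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity
from sklearn.preprocessing import StandardScaler


def make_three_mode_data(n_per_mode=100, seed=0):
    """Synthetic 2-D data with three Gaussian modes on very different scales."""
    rng = np.random.RandomState(seed)
    centers = np.array([[0.0, 0.0], [5.0, 500.0], [10.0, 0.0]])
    spread = np.array([1.0, 100.0])  # second feature is about 100x wider
    X = np.vstack([c + rng.normal(size=(n_per_mode, 2)) * spread
                   for c in centers])
    return rng.permutation(X)  # shuffle rows so CV folds mix the modes


def select_bandwidth(X, bandwidths, cv=5):
    """Standardize X, then pick the bandwidth by cross-validated log-likelihood."""
    scaler = StandardScaler().fit(X)
    Xs = scaler.transform(X)
    # With no scoring argument, GridSearchCV falls back to KernelDensity.score,
    # i.e. the total log-likelihood of each held-out fold.
    search = GridSearchCV(KernelDensity(kernel="gaussian"),
                          {"bandwidth": bandwidths}, cv=cv)
    search.fit(Xs)
    return scaler, search


X_modes = make_three_mode_data()
grid = np.logspace(-1.5, 0.5, 15)
scaler, search = select_bandwidth(X_modes, grid)
best_bandwidth = search.best_params_["bandwidth"]
log_density = search.best_estimator_.score_samples(scaler.transform(X_modes))
```

Note that the scaler here is fit once on all the data rather than inside
each fold. That sidesteps the pipeline problem from this thread at the cost
of a small information leak between folds.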