Hi José,

yes, there seems to be an inconsistency: KernelDensity.fit has the signature
(self, X), not (self, X, y=None) as is usual for estimators even when y is
never used; see
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/kde.py#L113
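One quick way to check whether an estimator follows the fit(X, y=None)
convention is to try binding two positional arguments to the signature of
its fit method, without actually calling it. This is a minimal sketch:
fit_accepts_y is a hypothetical helper, not a scikit-learn API, and the two
classes are toy stand-ins for the old and the conventional signature.

```python
import inspect


class DensityWithoutY:
    """Hypothetical toy estimator whose fit is fit(self, X), like the old KernelDensity."""

    def fit(self, X):
        self.n_samples_ = len(X)
        return self


class DensityWithY:
    """Hypothetical toy estimator following the fit(self, X, y=None) convention."""

    def fit(self, X, y=None):
        self.n_samples_ = len(X)
        return self


def fit_accepts_y(estimator):
    """Return True if estimator.fit(X, y) would accept a second positional argument."""
    try:
        # Bind a dummy X and y to the bound method's signature; fit is not called.
        inspect.signature(estimator.fit).bind([[0.0]], None)
    except TypeError:
        return False
    return True


print(fit_accepts_y(DensityWithoutY()))
print(fit_accepts_y(DensityWithY()))
```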

I think the generally accepted remedy is simply to add y=None to the
signature of that function, as was done e.g. for PCA; see
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py#L206
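The failure and the fix can be reproduced without scikit-learn. In José's
traceback, Pipeline.fit ends with self.steps[-1][-1].fit(Xt, y, **fit_params),
so the final step always receives y, even when y is None. The sketch below
mimics that call with a simplified, hypothetical pipeline_fit function and toy
steps (CenterColumns, OldDensity, FixedDensity are illustrative names only):
a final step with fit(self, X) raises TypeError, while one that accepts and
ignores y=None works.

```python
class CenterColumns:
    """Toy transformer; like scikit-learn transformers, fit accepts y=None."""

    def fit(self, X, y=None):
        n = len(X)
        self.mean_ = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.mean_)] for row in X]


class OldDensity:
    def fit(self, X):  # no y parameter: breaks the pipeline convention
        self.fitted_ = True
        return self


class FixedDensity:
    def fit(self, X, y=None):  # y is accepted and ignored
        self.fitted_ = True
        return self


def pipeline_fit(steps, X, y=None):
    """Simplified sketch of Pipeline.fit: fit and transform every
    intermediate step, then fit the final step, always passing y."""
    Xt = X
    for step in steps[:-1]:
        Xt = step.fit(Xt, y).transform(Xt)
    steps[-1].fit(Xt, y)
    return steps[-1]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
fixed = pipeline_fit([CenterColumns(), FixedDensity()], X)  # works; y stays None

try:
    pipeline_fit([CenterColumns(), OldDensity()], X)
    old_error = None
except TypeError as exc:  # same kind of failure as in José's traceback
    old_error = exc
print(type(old_error).__name__, old_error)
```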

But maybe I am missing something crucial. Happy to make the PR if I am
right about this.

Michael

On Wed, Nov 5, 2014 at 1:35 PM, José Guilherme Camargo de Souza <jose.camargo.so...@gmail.com> wrote:

> Hi all,
>
> Is the KernelDensity estimator compatible with pipelines? When I try
> to use it inside one:
>
>     pipe1 = make_pipeline(StandardScaler(with_mean=True, with_std=True),
>                           KernelDensity(algorithm="auto",
>                                         kernel="gaussian",
>                                         metric="euclidean"))
>     params = dict(kerneldensity__bandwidth=np.logspace(-10, 1, 100))
>     search = GridSearchCV(pipe1, param_grid=params, verbose=1, n_jobs=8,
>                           cv=5)
>     search.fit(feats1)
>     search.best_estimator_
>
> I get a TypeError as follows:
>
> /home/desouza/anaconda/lib/python2.7/site-packages/sklearn/pipeline.pyc in fit(self=Pipeline(steps=[('standardscaler', StandardScale...euclidean',
>        metric_params=None, rtol=0))]), X=array([[  5.701     ,  73.6443    ,  61.7018    ...2.7188    ,
>           0.18243243,   0.21621622]]), y=None, **fit_params={})
>     125     def fit(self, X, y=None, **fit_params):
>     126         """Fit all the transforms one after the other and transform the
>     127         data, then fit the transformed data using the final estimator.
>     128         """
>     129         Xt, fit_params = self._pre_transform(X, y, **fit_params)
> --> 130         self.steps[-1][-1].fit(Xt, y, **fit_params)
>     131         return self
>     132
>     133     def fit_transform(self, X, y=None, **fit_params):
>     134         """Fit all the transforms one after the other and transform the
>
> TypeError: fit() takes exactly 2 arguments (3 given)
>
> Is this a bug, or is KernelDensity not supposed to be compatible with
> pipelines? A quick search of the mailing list and Stack Overflow did not
> turn up anything about this.
>
> Thanks,
> José
>
>
> On Tue, Oct 21, 2014 at 3:03 PM, Jacob Vanderplas
> <jake...@cs.washington.edu> wrote:
> > Hi Jose,
> > The KDE implementation does work on multivariate data, and will in general
> > work for multimodal data as well. There are two caveats:
> >
> > 1. In the sklearn implementation, the bandwidth must be the same across all
> >    dimensions. If this poses a problem for your data, the data can be scaled
> >    before the fit (using StandardScaler or something similar).
> > 2. The results will depend strongly on the choice of bandwidth: it's
> >    important to cross-validate to determine the optimal bandwidth, as is
> >    done in
> >    http://scikit-learn.org/stable/auto_examples/neighbors/plot_digits_kde_sampling.html
> >
> > Good luck!
> >   Jake
> >
> >
> >  Jake VanderPlas
> >  Director of Research – Physical Sciences
> >  eScience Institute, University of Washington
> >  http://www.vanderplas.com
> >
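Jake's two caveats can be illustrated in plain NumPy: a Gaussian KDE with one
scalar bandwidth shared by all dimensions, applied after standardizing each
feature, with the bandwidth chosen by k-fold cross-validation of the held-out
log-likelihood. This is a minimal sketch, not how scikit-learn computes
densities internally; the function names (standardize, gaussian_kde_logpdf,
cv_bandwidth) and the synthetic three-mode data are made up for illustration,
and a hand-rolled k-fold loop stands in for GridSearchCV. It evaluates all
pairwise distances, so it only suits small data sets.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance (like StandardScaler)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - mean) / std


def gaussian_kde_logpdf(train, query, bandwidth):
    """Log-density of query points under an isotropic Gaussian KDE fit on train.

    A single scalar bandwidth is shared by every dimension.
    """
    n, d = train.shape
    # Squared Euclidean distance between every query point and every training point.
    sq_dist = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    log_kernel = -0.5 * sq_dist / bandwidth**2
    log_norm = -0.5 * d * np.log(2 * np.pi * bandwidth**2) - np.log(n)
    # Log-sum-exp over training points for numerical stability.
    m = log_kernel.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1)) + log_norm


def cv_bandwidth(X, bandwidths, n_folds=5, seed=0):
    """Return the bandwidth with the highest mean held-out log-likelihood."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    scores = []
    for h in bandwidths:
        total = 0.0
        for k in range(n_folds):
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            total += gaussian_kde_logpdf(X[train_idx], X[folds[k]], h).sum()
        scores.append(total / len(X))
    return bandwidths[int(np.argmax(scores))], scores


# Synthetic 2-D data with three modes; the second feature is on a much
# larger scale, which a single shared bandwidth handles poorly.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])
X = X * np.array([1.0, 100.0])

Xs = standardize(X)
bandwidths = np.logspace(-2, 1, 20)
best_h, scores = cv_bandwidth(Xs, bandwidths)
print("selected bandwidth:", best_h)
```

Standardizing first means one bandwidth is a sensible length scale for every
feature; without it, the bandwidth would be dominated by the large-scale
second column.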
> > On Tue, Oct 21, 2014 at 2:09 AM, José Guilherme Camargo de Souza
> > <jose.camargo.so...@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> I would like to ask whether the density estimation implementation in
> >> scikit-learn works with multivariate multimodal data. The digits example
> >> [1] makes clear that it supports multivariate datasets, and the user guide
> >> [2] uses a 1-D bimodal distribution.
> >>
> >> Is it possible to use the same implementation on multivariate
> >> gaussian-shaped data with more than 2 modes? If so, are there any
> >> shortcomings or useful tips when doing that?
> >>
> >> Thanks in advance,
> >> José
> >>
> >> [1] http://scikit-learn.org/stable/auto_examples/neighbors/plot_digits_kde_sampling.html#example-neighbors-plot-digits-kde-sampling-py
> >> [2] http://scikit-learn.org/stable/modules/density.html#kernel-density-estimation
> >> José Guilherme
> >>
> >>
> >>
> ------------------------------------------------------------------------------
> >> Comprehensive Server Monitoring with Site24x7.
> >> Monitor 10 servers for $9/Month.
> >> Get alerted through email, SMS, voice calls or mobile push
> notifications.
> >> Take corrective actions from your mobile device.
> >> http://p.sf.net/sfu/Zoho
> >> _______________________________________________
> >> Scikit-learn-general mailing list
> >> Scikit-learn-general@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general