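Hi José,

Yes, there seems to be an inconsistency: KernelDensity.fit has the signature (self, X) rather than (self, X, y=None). The latter is the usual convention even when y is never used, because Pipeline.fit always forwards y to the final step (self.steps[-1][-1].fit(Xt, y, **fit_params) in your traceback), even when y is None. See https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/kde.py#L113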
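The failure can be reproduced without scikit-learn. The sketch below uses hypothetical toy classes (ToyPipeline, Centerer, DensityNoY, DensityWithY). They only mimic what the traceback quoted further down shows Pipeline.fit doing, namely always calling the final step as fit(Xt, y). They are not scikit-learn's implementation.

```python
# Hypothetical toy classes, NOT scikit-learn's: they only mimic how a pipeline
# forwards y to its final step, as in final.fit(Xt, y, **fit_params).


class ToyPipeline:
    """Minimal pipeline: transform X with each step, then fit the last one."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        Xt = X
        for step in self.steps[:-1]:
            Xt = step.fit_transform(Xt)
        # The final estimator always receives y, even when y is None.
        self.steps[-1].fit(Xt, y)
        return self


class Centerer:
    """Unsupervised transformer: subtract the mean of each column."""

    def fit_transform(self, X, y=None):
        n = len(X)
        means = [sum(col) / n for col in zip(*X)]
        return [[v - m for v, m in zip(row, means)] for row in X]


class DensityNoY:
    """Unsupervised estimator with fit(self, X), like KernelDensity here."""

    def fit(self, X):
        self.n_samples_ = len(X)
        return self


class DensityWithY:
    """Same estimator following the fit(self, X, y=None) convention."""

    def fit(self, X, y=None):  # y is accepted and ignored
        self.n_samples_ = len(X)
        return self


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]

try:
    ToyPipeline([Centerer(), DensityNoY()]).fit(X)
    broken_ok = True
except TypeError as exc:  # fit() receives one positional argument too many
    broken_ok = False
    print("TypeError:", exc)

fixed = ToyPipeline([Centerer(), DensityWithY()]).fit(X)
print(fixed.steps[-1].n_samples_)  # 3
```

Accepting and ignoring y costs nothing for an unsupervised estimator, and it lets the same object sit at the end of any pipeline.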
I think the generally accepted remedy is simply to add y=None to the signature of that method, as was done e.g. for PCA; see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py#L206

But maybe I am missing something crucial. Happy to make the PR if I am right about this.

Michael

On Wed, Nov 5, 2014 at 1:35 PM, José Guilherme Camargo de Souza <
jose.camargo.so...@gmail.com> wrote:

> Hi all,
>
> Is the KernelDensity estimator compatible with pipelines? When I try
> to use it inside one
>
> pipe1 = make_pipeline(StandardScaler(with_mean=True, with_std=True),
>                       KernelDensity(algorithm="auto",
>                                     kernel="gaussian", metric="euclidean"))
> params = dict(kerneldensity__bandwidth=np.logspace(-10, 1, 100))
> search = GridSearchCV(pipe1, param_grid=params, verbose=1, n_jobs=8,
>                       cv=5)
> search.fit(feats1)
> search.best_estimator_
>
> I get a TypeError as follows:
>
> /home/desouza/anaconda/lib/python2.7/site-packages/sklearn/pipeline.pyc
> in fit(self=Pipeline(steps=[('standardscaler', StandardScale...euclidean',
> metric_params=None, rtol=0))]), X=array([[ 5.701 , 73.6443 , 61.7018 ...2.7188 ,
> 0.18243243, 0.21621622]]), y=None, **fit_params={})
>     125     def fit(self, X, y=None, **fit_params):
>     126         """Fit all the transforms one after the other and transform the
>     127         data, then fit the transformed data using the final estimator.
>     128         """
>     129         Xt, fit_params = self._pre_transform(X, y, **fit_params)
> --> 130         self.steps[-1][-1].fit(Xt, y, **fit_params)
>     131         return self
>     132
>     133     def fit_transform(self, X, y=None, **fit_params):
>     134         """Fit all the transforms one after the other and transform the
>
> TypeError: fit() takes exactly 2 arguments (3 given)
>
> Is this a bug, or is it not supposed to be compatible? A quick
> search of the mailing list and Stack Overflow did not return any
> entries about this.
>
> Thanks,
> José
>
>
> On Tue, Oct 21, 2014 at 3:03 PM, Jacob Vanderplas
> <jake...@cs.washington.edu> wrote:
> > Hi Jose,
> > The KDE implementation does work on multivariate data, and will in general
> > work for multimodal data as well. There are two caveats to that:
> >
> > 1. In the sklearn implementation, the bandwidth must be the same across each
> > dimension. If this poses a problem for your data, the data can be scaled
> > before the fit (using StandardScaler or something similar).
> > 2. The results will depend strongly on the choice of bandwidth: it's
> > important to cross-validate to determine the optimal bandwidth, as is done
> > in
> > http://scikit-learn.org/stable/auto_examples/neighbors/plot_digits_kde_sampling.html
> >
> > Good luck!
> > Jake
> >
> > Jake VanderPlas
> > Director of Research – Physical Sciences
> > eScience Institute, University of Washington
> > http://www.vanderplas.com
> >
> > On Tue, Oct 21, 2014 at 2:09 AM, José Guilherme Camargo de Souza
> > <jose.camargo.so...@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> I would like to ask whether the density estimation implementation in scikit-learn
> >> works with multivariate multimodal data. The digits example [1] makes it
> >> clear that it supports multivariate datasets, while the guide
> >> description [2] uses a 1-D bimodal distribution.
> >>
> >> Is it possible to use the same implementation on multivariate
> >> Gaussian-shaped data with more than 2 modes? If so, are there any
> >> shortcomings or useful tips when doing that?
> >>
> >> Thanks in advance,
> >> José
> >>
> >> [1]
> >> http://scikit-learn.org/stable/auto_examples/neighbors/plot_digits_kde_sampling.html#example-neighbors-plot-digits-kde-sampling-py
> >> [2]
> >> http://scikit-learn.org/stable/modules/density.html#kernel-density-estimation
> >> José Guilherme
> >>
> >> _______________________________________________
> >> Scikit-learn-general mailing list
> >> Scikit-learn-general@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
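Jake's two caveats are (1) a single bandwidth shared by all dimensions, so standardize first, and (2) choosing that bandwidth by cross-validation. Together they amount to likelihood cross-validation, sketched below in pure Python. In scikit-learn, GridSearchCV does this by default with KernelDensity, because KernelDensity.score returns the total log-likelihood of the held-out data. The helper names (standardize, kde_log_density, cv_log_likelihood, select_bandwidth), the three-cluster synthetic data and the bandwidth grid are illustrative assumptions only, not scikit-learn's API.

```python
import math
import random


def standardize(X):
    """Scale each column to zero mean and unit variance (like StandardScaler)."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def kde_log_density(x, train, h):
    """Log density at x of a Gaussian KDE using ONE bandwidth h for all dimensions."""
    d = len(x)
    log_norm = -0.5 * d * math.log(2 * math.pi) - d * math.log(h) - math.log(len(train))
    exps = [-sum((a - b) ** 2 for a, b in zip(x, t)) / (2 * h * h) for t in train]
    m = max(exps)  # log-sum-exp trick avoids underflow for small h
    return log_norm + m + math.log(sum(math.exp(e - m) for e in exps))


def cv_log_likelihood(X, h, k=5, seed=0):
    """Total held-out log-likelihood over k folds (higher is better)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [X[i] for i in idx if i not in held]
        total += sum(kde_log_density(X[i], train, h) for i in fold)
    return total


def select_bandwidth(X, bandwidths, k=5):
    """Standardize X, then return the bandwidth with the best CV score."""
    Xs = standardize(X)
    scores = {h: cv_log_likelihood(Xs, h, k) for h in bandwidths}
    return max(scores, key=scores.get), scores


# Synthetic 2-D data with three modes (made up for illustration).
rng = random.Random(42)
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
data = [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)] for cx, cy in centers for _ in range(30)]
grid = [10 ** (e / 4) for e in range(-8, 5)]  # log-spaced, 0.01 .. 10
best, scores = select_bandwidth(data, grid)
print(best)
```

Very small bandwidths score poorly because held-out points fall between the kernels of the training points, and very large ones blur the three modes into one. The cross-validated optimum therefore lands in between.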