Re: [Scikit-learn-general] Question about KernelDensity implementation

José Guilherme Camargo de Souza Wed, 05 Nov 2014 04:38:01 -0800

Hi all,

Is the KernelDensity estimator compatible with pipelines? When I try
to use it inside one


    pipe1 = make_pipeline(StandardScaler(with_mean=True, with_std=True),
                          KernelDensity(algorithm="auto", kernel="gaussian",
                                        metric="euclidean"))
    params = dict(kerneldensity__bandwidth=np.logspace(-10, 1, 100))
    search = GridSearchCV(pipe1, param_grid=params, verbose=1, n_jobs=8, cv=5)
    search.fit(feats1)
    search.best_estimator_

I get a TypeError as follows:

/home/desouza/anaconda/lib/python2.7/site-packages/sklearn/pipeline.pyc in fit(self=Pipeline(steps=[('standardscaler', StandardScale...euclidean',
       metric_params=None, rtol=0))]), X=array([[  5.701     , 73.6443    ,  61.7018    ...2.7188    ,
          0.18243243,   0.21621622]]), y=None, **fit_params={})
    125     def fit(self, X, y=None, **fit_params):
    126         """Fit all the transforms one after the other and transform the
    127         data, then fit the transformed data using the final estimator.
    128         """
    129         Xt, fit_params = self._pre_transform(X, y, **fit_params)
--> 130         self.steps[-1][-1].fit(Xt, y, **fit_params)
    131         return self
    132
    133     def fit_transform(self, X, y=None, **fit_params):
    134         """Fit all the transforms one after the other and transform the

TypeError: fit() takes exactly 2 arguments (3 given)

Is this a bug, or is KernelDensity not meant to be compatible with
pipelines? A quick search of the mailing list and Stack Overflow did not
turn up anything about this.
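
Reading the traceback, Pipeline.fit calls fit(Xt, y, **fit_params) on the
final step, while KernelDensity.fit in this version appears to accept only X.
A possible workaround, as a minimal sketch only: a thin subclass that accepts
and ignores y. The class name KernelDensityXY and the random data are made up
for illustration, and the code is written for Python 3 and a recent
scikit-learn rather than the Python 2.7 install shown above. As far as I know,
later scikit-learn releases accept y in KernelDensity.fit directly, in which
case the subclass is unnecessary.

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class KernelDensityXY(KernelDensity):
    """Hypothetical wrapper: accept and ignore y so a Pipeline can call it."""

    def fit(self, X, y=None):
        # Density estimation is unsupervised, so y is simply discarded.
        return super().fit(X)

    def score(self, X, y=None):
        # Total log-likelihood of X under the fitted density; y is ignored.
        return super().score(X)


rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))  # made-up stand-in for feats1

pipe = make_pipeline(StandardScaler(),
                     KernelDensityXY(kernel="gaussian", bandwidth=0.5))
pipe.fit(X)                    # Pipeline passes y=None to the final step
total_log_likelihood = pipe.score(X)
```

Note that make_pipeline names each step after the lowercased class name, so
the grid-search key would become kerneldensityxy__bandwidth with this wrapper.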

Thanks,
José


On Tue, Oct 21, 2014 at 3:03 PM, Jacob Vanderplas
<jake...@cs.washington.edu> wrote:
> Hi Jose,
> The KDE implementation does work on multivariate data, and will in general
> work for multimodal data as well. There are two caveats to that:
>
> 1. In the sklearn implementation, the bandwidth must be the same across each
> dimension. If this poses a problem for your data, the data can be scaled
> before the fit (using StandardScaler or something similar).
> 2. The results will depend strongly on the choice of bandwidth: it's
> important to cross-validate to determine the optimal bandwidth, as is done
> in
> http://scikit-learn.org/stable/auto_examples/neighbors/plot_digits_kde_sampling.html
>
> Good luck!
>   Jake
>
>
>  Jake VanderPlas
>  Director of Research – Physical Sciences
>  eScience Institute, University of Washington
>  http://www.vanderplas.com
>
> On Tue, Oct 21, 2014 at 2:09 AM, José Guilherme Camargo de Souza
> <jose.camargo.so...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I would like to ask if the density estimation implementation of scikit
>> works with multivariate multimodal data. In the digits example [1] it
>> is clear that it supports multivariate datasets and in the guide
>> description [2] a 1-D bimodal distribution is used.
>>
>> Is it possible to use the same implementation on multivariate
>> gaussian-shaped data with more than 2 modes? If so, are there any
>> shortcomings or useful tips when doing that?
>>
>> Thanks in advance,
>> José
>>
>> [1]
>> http://scikit-learn.org/stable/auto_examples/neighbors/plot_digits_kde_sampling.html#example-neighbors-plot-digits-kde-sampling-py
>> [2]
>> http://scikit-learn.org/stable/modules/density.html#kernel-density-estimation
>> José Guilherme
>>
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
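
The two caveats in Jake's reply quoted above (rescale the features so that a
single bandwidth is reasonable, then cross-validate the bandwidth) can be
sketched roughly as follows. This is only an illustration under assumptions:
the three-mode 2-D data, the bandwidth grid and the shuffled 5-fold split are
all made up, and the imports assume a recent scikit-learn where GridSearchCV
lives in sklearn.model_selection. With no scoring argument, GridSearchCV falls
back to KernelDensity.score, i.e. the total log-likelihood of the held-out
fold.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KernelDensity
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)

# Made-up data: three Gaussian-shaped modes, columns on very different scales.
centers = np.array([[0.0, 0.0], [5.0, 500.0], [10.0, 0.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) * [1.0, 100.0] for c in centers])

# Caveat 1: one bandwidth is shared by all dimensions, so rescale first.
X_scaled = StandardScaler().fit_transform(X)

# Caveat 2: choose the bandwidth by cross-validated log-likelihood.
search = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": np.logspace(-1, 1, 20)},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_scaled)

best_bandwidth = search.best_params_["bandwidth"]
log_density = search.best_estimator_.score_samples(X_scaled)
```

Scaling outside the cross-validation loop keeps the sketch short; putting the
scaler and the estimator in one pipeline would keep the held-out folds out of
the scaling statistics as well.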
