On Wed, Nov 5, 2014 at 1:52 PM, Kyle Kastner <kastnerk...@gmail.com> wrote:
> In addition to the y=None thing, KDE doesn't have a transform or predict
> method - and I don't think Pipeline supports score or score_samples.
>

That may have been the crucial thing I have missed :) -- Indeed KDE would
have to be at the end of the pipeline, because it doesn't do any
transforming - one can imagine it preceded by a Scaler as in José's example
or e.g. PCA.
Pipeline does implement a direct scoring in
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/pipeline.py#L193
which passes through all the preceding transformations and then calls score
on the last one, so that should be OK.

> Maybe someone can comment on this, but I don't think KDE is typically used
> in a pipeline.
>
> In this particular case the code *seems* reasonable (and I am surprised it
> doesn't work!), but I don't know much about the KDE stuff. Maybe a bug?
>
> On Wed, Nov 5, 2014 at 7:44 AM, Michael Eickenberg <
> michael.eickenb...@gmail.com> wrote:
>
>> Hi José,
>>
>> yes, there seems to be an inconsistency, KernelDensity.fit has signature
>> (self, X) and not (self, X, y=None) as is usually the case even if y is
>> never used, see
>> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/kde.py#L113
>>
>> I think the generally accepted way of remedying this is to just add
>> y=None in the signature of that function, as was done e.g. for PCA, see
>> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py#L206
>>
>> But maybe I am missing something crucial. Happy to make the PR if I am
>> right about this.
>>
>> Michael
>>
>> On Wed, Nov 5, 2014 at 1:35 PM, José Guilherme Camargo de Souza <
>> jose.camargo.so...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Is the KernelDensity estimator compatible with pipelines?
>>> When I try to use it inside one
>>>
>>> pipe1 = make_pipeline(StandardScaler(with_mean=True, with_std=True),
>>>                       KernelDensity(algorithm="auto",
>>>                                     kernel="gaussian", metric="euclidean"))
>>> params = dict(kerneldensity__bandwidth=np.logspace(-10, 1, 100))
>>> search = GridSearchCV(pipe1, param_grid=params, verbose=1, n_jobs=8,
>>>                       cv=5)
>>> search.fit(feats1)
>>> search.best_estimator_
>>>
>>> I get a TypeError as follows:
>>>
>>> /home/desouza/anaconda/lib/python2.7/site-packages/sklearn/pipeline.pyc
>>> in fit(self=Pipeline(steps=[('standardscaler',
>>> StandardScale...euclidean',
>>>        metric_params=None, rtol=0))]), X=array([[ 5.701 ,
>>> 73.6443 , 61.7018 ...2.7188 ,
>>>        0.18243243, 0.21621622]]), y=None, **fit_params={})
>>>     125     def fit(self, X, y=None, **fit_params):
>>>     126         """Fit all the transforms one after the other and
>>> transform the
>>>     127         data, then fit the transformed data using the final
>>> estimator.
>>>     128         """
>>>     129         Xt, fit_params = self._pre_transform(X, y, **fit_params)
>>> --> 130         self.steps[-1][-1].fit(Xt, y, **fit_params)
>>>     131         return self
>>>     132
>>>     133     def fit_transform(self, X, y=None, **fit_params):
>>>     134         """Fit all the transforms one after the other and
>>> transform the
>>>
>>> TypeError: fit() takes exactly 2 arguments (3 given)
>>>
>>> Is this an issue or it is supposed not to be compatible? A quick
>>> search in the mailing list and on stackoverflow did not return any
>>> entry about this.
>>>
>>> Thanks,
>>> José
>>>
>>>
>>> On Tue, Oct 21, 2014 at 3:03 PM, Jacob Vanderplas
>>> <jake...@cs.washington.edu> wrote:
>>> > Hi Jose,
>>> > The KDE implementation does work on multivariate data, and will in general
>>> > work for multimodal data as well. There are two caveats to that:
>>> >
>>> > 1. In the sklearn implementation, the bandwidth must be the same across each
>>> > dimension.
>>> > If this poses a problem for your data, the data can be scaled
>>> > before the fit (Using StandardScaler or something similar).
>>> > 2. The results will depend strongly on the choice of bandwidth: it's
>>> > important to cross-validate to determine the optimal bandwidth, as is done
>>> > in
>>> > http://scikit-learn.org/stable/auto_examples/neighbors/plot_digits_kde_sampling.html
>>> >
>>> > Good luck!
>>> > Jake
>>> >
>>> >
>>> > Jake VanderPlas
>>> > Director of Research – Physical Sciences
>>> > eScience Institute, University of Washington
>>> > http://www.vanderplas.com
>>> >
>>> > On Tue, Oct 21, 2014 at 2:09 AM, José Guilherme Camargo de Souza
>>> > <jose.camargo.so...@gmail.com> wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> I would like to ask if the density estimation implementation of scikit
>>> >> works with multivariate multimodal data. In the digits example [1] it
>>> >> is clear that it supports multivariate datasets and in the guide
>>> >> description [2] a 1-D bimodal distribution is used.
>>> >>
>>> >> Is it possible to use the same implementation on multivariate
>>> >> gaussian-shaped data with more than 2 modes? If so, are there any
>>> >> shortcomings or useful tips when doing that?
>>> >>
>>> >> Thanks in advance,
>>> >> José
>>> >>
>>> >> [1]
>>> >> http://scikit-learn.org/stable/auto_examples/neighbors/plot_digits_kde_sampling.html#example-neighbors-plot-digits-kde-sampling-py
>>> >> [2]
>>> >> http://scikit-learn.org/stable/modules/density.html#kernel-density-estimation
>>> >> José Guilherme
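
To make the signature mismatch from the traceback concrete, here is a minimal sketch in plain Python. It does not import scikit-learn; `TinyPipeline`, `StandardizeStep`, `DensityNoY` and `DensityWithY` are hypothetical stand-ins I made up for illustration, not the library's classes. The only assumption carried over from the traceback is that the pipeline always calls the final step as fit(Xt, y), so a final step declared as fit(self, X) fails while one declared as fit(self, X, y=None) works.

```python
class StandardizeStep:
    """Hypothetical stand-in for a transformer such as StandardScaler (1-D data)."""

    def fit(self, X, y=None):
        n = len(X)
        self.mean_ = sum(X) / n
        variance = sum((x - self.mean_) ** 2 for x in X) / n
        # Fall back to 1.0 so constant data does not divide by zero.
        self.scale_ = variance ** 0.5 or 1.0
        return self

    def transform(self, X):
        return [(x - self.mean_) / self.scale_ for x in X]


class DensityNoY:
    """Hypothetical final step whose fit has the signature (self, X)."""

    def fit(self, X):
        self.n_samples_ = len(X)
        return self


class DensityWithY(DensityNoY):
    """Same step, but fit accepts and ignores y, i.e. (self, X, y=None)."""

    def fit(self, X, y=None):
        return super().fit(X)


class TinyPipeline:
    """Hypothetical pipeline: fit/transform every step but the last, then fit the last."""

    def __init__(self, *steps):
        self.steps = list(steps)

    def fit(self, X, y=None):
        Xt = X
        for step in self.steps[:-1]:
            Xt = step.fit(Xt, y).transform(Xt)
        # y is always passed positionally, even when it is None.
        self.steps[-1].fit(Xt, y)
        return self


def try_fit(final_step):
    """Return "ok" if the pipeline fits, or "TypeError" if the signature clashes."""
    try:
        TinyPipeline(StandardizeStep(), final_step).fit([1.0, 2.0, 4.0])
    except TypeError:
        return "TypeError"
    return "ok"


if __name__ == "__main__":
    print("fit(self, X):         ", try_fit(DensityNoY()))
    print("fit(self, X, y=None): ", try_fit(DensityWithY()))
```

The subclass only changes the signature and forwards X unchanged, which is the same kind of one-argument change proposed for KernelDensity.fit in the thread.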
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general