----- Original Message -----
> From: "Skipper Seabold" <jsseab...@gmail.com>
> To: scikit-learn-general@lists.sourceforge.net
> Sent: Monday, July 8, 2013 19:40:36
> Subject: Re: [Scikit-learn-general] Defining a Density Estimation Interface
>
> On Mon, Jul 8, 2013 at 1:20 PM, Bertrand Thirion
> <bertrand.thir...@inria.fr> wrote:
> >
> > From: "Jacob Vanderplas" <jake...@cs.washington.edu>
> > To: scikit-learn-general@lists.sourceforge.net
> > Sent: Sunday, July 7, 2013 19:10:38
> > Subject: [Scikit-learn-general] Defining a Density Estimation Interface
> >
> > Hi,
> > I've been working on a big rewrite of the Ball Tree and KD Tree in
> > sklearn.neighbors [0], and one of the enhancements is a fast Kernel
> > Density estimation routine. As part of the PR, I've created a
> > KernelDensity class to wrap this functionality. For the initial pass
> > at the interface, I've used the same method names used in
> > sklearn.mixture.GMM, which (I believe) is the only other density
> > estimation routine we currently have. In particular, I've defined
> > these methods:
> >
> > - fit(X) -- fit the model
> > - eval(X) -- compute the log-probability (i.e. normalized density)
> >   under the model at positions X
> > - score(X) -- compute the log-likelihood of a set of data X under
> >   the model
> > - sample(n_samples) -- draw random samples from the underlying
> >   density model
> >
> > Olivier suggested that perhaps ``eval`` is too generic a name, and
> > should instead be something more specific (logprobability?
> > loglikelihood? predict_loglikelihood? something else?)
> >
> > Sounds good to me. As a matter of taste, I like `log_likelihood`,
> > which would be a synonym of `eval` in that case (as a second choice,
> > log_density rather than log_probability)?
>
> Why not conform to the already existing distributions interface in
> scipy.stats? That's what we did with statsmodels. These are mostly
> univariate distributions in scipy, but I think it generalizes ok to
> the multivariate density estimators and kernel regression models we
> now have.
>
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous
> https://github.com/statsmodels/statsmodels/tree/master/statsmodels/nonparametric
> http://statsmodels.sourceforge.net/devel/nonparametric.html
>
> Then you'd have pdf, logpdf, cdf, logcdf, sf, rvs (not wild about
> this one, and I think we use sample in places), etc.
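For comparison, the scipy.stats naming Skipper refers to looks like this in practice; this is standard usage of an existing univariate distribution, shown only to make the names concrete:

    from scipy import stats

    # Frozen scipy.stats distributions expose pdf/logpdf/cdf/logcdf/sf/rvs.
    dist = stats.norm(loc=0.0, scale=1.0)

    dist.logpdf([0.0, 1.0])   # log of the density (the proposed eval / log_likelihood)
    dist.cdf(0.0)             # cumulative distribution function
    dist.rvs(size=5)          # random draws (what the sklearn proposal calls sample)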
While I see value in sticking to previously existing standards, I prefer
more explicit naming. In any case, we would not implement cdf, logcdf,
sf, or isf, since we will mostly consider multi-dimensional settings;
in practice, only logpdf and rvs are really useful in scikit-learn.

Bertrand

> Would it break the Pipeline interface in scikit-learn too much? If
> not, I'd rather call things what they are. In any event, I agree that
> eval is too generic, and I'd add that the score function of a
> distribution already has a specific meaning for parameterized
> distributions.
>
> fwiw,
>
> Skipper
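To make the two naming schemes concrete, here is a minimal, self-contained sketch of a Gaussian kernel density estimator exposing the method names proposed in the thread: fit, an explicit log-density method in place of the generic eval, score, and sample. The class name, the bandwidth handling, and the brute-force implementation are hypothetical placeholders for illustration, not the code from the PR:

    import numpy as np

    class KernelDensitySketch:
        """Toy Gaussian KDE illustrating the proposed naming (hypothetical)."""

        def __init__(self, bandwidth=1.0):
            self.bandwidth = bandwidth

        def fit(self, X):
            # Store the training points; the real estimator would build a
            # Ball Tree / KD Tree here instead.
            self.X_ = np.asarray(X, dtype=float)
            return self

        def log_density(self, X):
            # Log of the normalized density at each query point -- the explicit
            # replacement for the generic ``eval``; scipy.stats calls this logpdf.
            X = np.asarray(X, dtype=float)
            n, d = self.X_.shape
            diff = X[:, None, :] - self.X_[None, :, :]            # (m, n, d)
            sq_dist = (diff ** 2).sum(axis=-1) / self.bandwidth ** 2
            log_norm = -0.5 * d * np.log(2 * np.pi * self.bandwidth ** 2)
            # log-mean-exp over the n kernels, for numerical stability
            return np.logaddexp.reduce(-0.5 * sq_dist + log_norm, axis=1) - np.log(n)

        def score(self, X):
            # Total log-likelihood of a data set under the fitted model.
            return self.log_density(X).sum()

        def sample(self, n_samples=1, random_state=None):
            # Draw from the kernel mixture: pick a training point at random and
            # add Gaussian noise; scipy.stats calls this rvs.
            rng = np.random.RandomState(random_state)
            idx = rng.randint(len(self.X_), size=n_samples)
            noise = rng.normal(scale=self.bandwidth,
                               size=(n_samples, self.X_.shape[1]))
            return self.X_[idx] + noise

    # Example usage:
    kde = KernelDensitySketch(bandwidth=0.5).fit(np.random.randn(200, 2))
    print(kde.score(np.zeros((1, 2))), kde.sample(3, random_state=0).shape)

Renaming log_density to logpdf and sample to rvs would give the scipy.stats-style interface; the open question in the thread is which convention fits the scikit-learn estimator API (fit/score/Pipeline) better.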