----- Original Message -----
> From: "Skipper Seabold" <jsseab...@gmail.com>
> To: scikit-learn-general@lists.sourceforge.net
> Sent: Monday, July 8, 2013 19:40:36
> Subject: Re: [Scikit-learn-general] Defining a Density Estimation Interface
>
> On Mon, Jul 8, 2013 at 1:20 PM, Bertrand Thirion
> <bertrand.thir...@inria.fr> wrote:
> >
> > From: "Jacob Vanderplas" <jake...@cs.washington.edu>
> > To: scikit-learn-general@lists.sourceforge.net
> > Sent: Sunday, July 7, 2013 19:10:38
> > Subject: [Scikit-learn-general] Defining a Density Estimation Interface
> >
> > Hi,
> > I've been working on a big rewrite of the Ball Tree and KD Tree in
> > sklearn.neighbors [0], and one of the enhancements is a fast Kernel
> > Density estimation routine. As part of the PR, I've created a
> > KernelDensity class to wrap this functionality. For the initial pass
> > at the interface, I've used the same method names used in
> > sklearn.mixture.GMM, which (I believe) is the only other density
> > estimation routine we currently have. In particular, I've defined
> > these methods:
> >
> > - fit(X) -- fit the model
> > - eval(X) -- compute the log-probability (i.e. normalized density)
> >   under the model at positions X
> > - score(X) -- compute the log-likelihood of a set of data X under
> >   the model
> > - sample(n_samples) -- draw random samples from the underlying
> >   density model
> >
> > Olivier suggested that perhaps ``eval`` is too generic a name, and
> > should instead be something more specific (logprobability?
> > loglikelihood? predict_loglikelihood? something else?)
> >
> > Sounds good to me. As a matter of taste, I like `log_likelihood`,
> > which would be a synonym of `eval` in that case (as a second choice,
> > log_density rather than log_probability)?
>
> Why not conform to the already existing distributions interface in
> scipy.stats? That's what we did with statsmodels. These are mostly
> univariate distributions in scipy, but I think it generalizes ok to
> the multivariate density estimators and kernel regression models we
> now have.
>
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous
> https://github.com/statsmodels/statsmodels/tree/master/statsmodels/nonparametric
> http://statsmodels.sourceforge.net/devel/nonparametric.html
>
> Then you'd have pdf, logpdf, cdf, logcdf, sf, rvs (not wild about
> this one, and I think we use sample in places), etc.
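For comparison, the scipy.stats naming Skipper refers to looks like this in practice; this is standard usage of an existing univariate distribution, shown only to make the names concrete:

    from scipy import stats

    # Frozen scipy.stats distributions expose pdf/logpdf/cdf/logcdf/sf/rvs.
    dist = stats.norm(loc=0.0, scale=1.0)

    dist.logpdf([0.0, 1.0])   # log of the density (the proposed eval / log_likelihood)
    dist.cdf(0.0)             # cumulative distribution function
    dist.rvs(size=5)          # random draws (what the sklearn proposal calls sample)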
While I see value in sticking to previously existing standards, I prefer
more explicit naming. In any case, we would not implement cdf, logcdf,
sf, or isf, since we will mostly consider multi-dimensional settings;
in practice, only logpdf and rvs are really useful in scikit-learn.

Bertrand

> Would it break the Pipeline interface in scikit-learn too much? If
> not, I'd rather call things what they are. In any event, I agree that
> eval is too generic, and I'd add that the score function of a
> distribution already has a specific meaning for parameterized
> distributions.
>
> fwiw,
>
> Skipper
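To make the two naming schemes concrete, here is a minimal, self-contained sketch of a Gaussian kernel density estimator exposing the method names proposed in the thread: fit, an explicit log-density method in place of the generic eval, score, and sample. The class name, the bandwidth handling, and the brute-force implementation are hypothetical placeholders for illustration, not the code from the PR:

    import numpy as np

    class KernelDensitySketch:
        """Toy Gaussian KDE illustrating the proposed naming (hypothetical)."""

        def __init__(self, bandwidth=1.0):
            self.bandwidth = bandwidth

        def fit(self, X):
            # Store the training points; the real estimator would build a
            # Ball Tree / KD Tree here instead.
            self.X_ = np.asarray(X, dtype=float)
            return self

        def log_density(self, X):
            # Log of the normalized density at each query point -- the explicit
            # replacement for the generic ``eval``; scipy.stats calls this logpdf.
            X = np.asarray(X, dtype=float)
            n, d = self.X_.shape
            diff = X[:, None, :] - self.X_[None, :, :]            # (m, n, d)
            sq_dist = (diff ** 2).sum(axis=-1) / self.bandwidth ** 2
            log_norm = -0.5 * d * np.log(2 * np.pi * self.bandwidth ** 2)
            # log-mean-exp over the n kernels, for numerical stability
            return np.logaddexp.reduce(-0.5 * sq_dist + log_norm, axis=1) - np.log(n)

        def score(self, X):
            # Total log-likelihood of a data set under the fitted model.
            return self.log_density(X).sum()

        def sample(self, n_samples=1, random_state=None):
            # Draw from the kernel mixture: pick a training point at random and
            # add Gaussian noise; scipy.stats calls this rvs.
            rng = np.random.RandomState(random_state)
            idx = rng.randint(len(self.X_), size=n_samples)
            noise = rng.normal(scale=self.bandwidth,
                               size=(n_samples, self.X_.shape[1]))
            return self.X_[idx] + noise

    # Example usage:
    kde = KernelDensitySketch(bandwidth=0.5).fit(np.random.randn(200, 2))
    print(kde.score(np.zeros((1, 2))), kde.sample(3, random_state=0).shape)

Renaming log_density to logpdf and sample to rvs would give the scipy.stats-style interface; the open question in the thread is which convention fits the scikit-learn estimator API (fit/score/Pipeline) better.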