Hi,
I've been working on a big rewrite of the Ball Tree and KD Tree in
sklearn.neighbors [0], and one of the enhancements is a fast Kernel Density
estimation routine. As part of the PR, I've created a KernelDensity class
to wrap this functionality. For the initial pass at the interface, I've
adopted the method names from sklearn.mixture.GMM, which (I believe)
is the only other density estimation routine we currently have. In
particular, I've defined these methods:
- fit(X) -- fit the model
- eval(X) -- compute the log-probability (i.e. normalized density) under
the model at positions X
- score(X) -- compute the log-likelihood of a set of data X under the model
- sample(n_samples) -- draw random samples from the underlying density model
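To make the proposed interface concrete, here is a minimal NumPy sketch of a
Gaussian-kernel estimator exposing those four methods. This is illustrative
only; the actual PR implements the density computation with fast tree-based
algorithms, and the class name here is made up:

```python
import numpy as np
from scipy.special import logsumexp

class KDESketch:
    """Toy Gaussian kernel density estimator illustrating the proposed
    fit/eval/score/sample interface (not the tree-based PR code)."""

    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth

    def fit(self, X):
        # store the training points; the real code builds a KD/Ball tree here
        self.X_ = np.atleast_2d(np.asarray(X, dtype=float))
        return self

    def eval(self, X):
        # log of the normalized density at each query point
        X = np.atleast_2d(np.asarray(X, dtype=float))
        n, d = self.X_.shape
        h = self.bandwidth
        # squared distances between each query point and each training point
        sq = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=-1)
        log_norm = np.log(n) + 0.5 * d * np.log(2 * np.pi * h ** 2)
        return logsumexp(-0.5 * sq / h ** 2, axis=1) - log_norm

    def score(self, X):
        # total log-likelihood of a data set under the model
        return self.eval(X).sum()

    def sample(self, n_samples=1, random_state=None):
        # pick a training point at random, then jitter it by the kernel
        rng = np.random.default_rng(random_state)
        idx = rng.integers(len(self.X_), size=n_samples)
        noise = rng.normal(scale=self.bandwidth,
                           size=(n_samples, self.X_.shape[1]))
        return self.X_[idx] + noise
```

So e.g. ``KDESketch(bandwidth=0.5).fit(X).eval(X_query)`` returns one
log-density per query point, and ``score`` is just its sum -- which is the
distinction the naming question below is about.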
Olivier suggested that ``eval`` may be too generic a name, and should
instead be something more specific (logprobability? loglikelihood?
predict_loglikelihood? something else?).
I think this would be a good time to discuss what we'd like for a general
interface for density estimators within scikit-learn. A common interface
would have the advantage that several density estimators could be used
together within a general Bayesian generative classification routine (I've
created a proof-of-concept of this estimator at [1]). Note that any change
to the above method names would require the current GMM interface to be
modified.
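For reference, the gist at [1] is the actual proof of concept; the following
is just an independent sketch of the general idea, assuming density
estimators expose ``fit(X)`` and ``eval(X)`` as above. The class names and
the ``density_factory`` parameter are hypothetical. With a per-feature
Gaussian density plugged in, the classifier reduces to Gaussian naive Bayes:

```python
import numpy as np

class GaussianDensity:
    """Per-feature Gaussian density exposing fit/eval -- an illustrative
    stand-in for any estimator following the interface discussed above."""

    def fit(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        self.mean_ = X.mean(axis=0)
        self.var_ = X.var(axis=0) + 1e-9  # avoid division by zero
        return self

    def eval(self, X):
        # sum of independent 1-d Gaussian log-densities per feature
        X = np.atleast_2d(np.asarray(X, dtype=float))
        return (-0.5 * (np.log(2 * np.pi * self.var_)
                        + (X - self.mean_) ** 2 / self.var_)).sum(axis=1)

class GenerativeBayes:
    """Generative Bayesian classifier: fit one density model per class,
    predict by maximum posterior (hypothetical helper, not the gist)."""

    def __init__(self, density_factory=GaussianDensity):
        self.density_factory = density_factory

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_priors_ = np.log([(y == c).mean() for c in self.classes_])
        self.models_ = [self.density_factory().fit(X[y == c])
                        for c in self.classes_]
        return self

    def predict(self, X):
        # log-posterior (up to a constant) = log prior + log p(x | class)
        log_post = np.column_stack([m.eval(X) + lp for m, lp
                                    in zip(self.models_, self.log_priors_)])
        return self.classes_[np.argmax(log_post, axis=1)]
```

The point is that ``GenerativeBayes`` never looks inside the density model:
any estimator with a common ``fit``/``eval`` contract (GMM, KDE, ...) could
be dropped in, which is exactly why agreeing on the method names matters.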
Let me know if you have thoughts on this,
Jake
[0] https://github.com/scikit-learn/scikit-learn/pull/1732
[1] https://gist.github.com/jakevdp/5891921
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general