That's true, I wasn't aware that score_samples is already used in the
context of density estimation. score_samples would be okay then, in my
opinion.
Jan
On 29.07.2015 18:46, Andreas Mueller wrote:
Hm, I'm not entirely sure how score_samples is currently used, but I
think it is the probability under a density model.
While the Gaussian distribution has a PDF, the Poisson distribution has a
PMF. From Wikipedia (https://en.wikipedia.org/wiki/Probability_mass_function):
A probability mass function differs from a probability density function
(pdf) in that the latter is associated with continuous rather than
discrete random variables.
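The distinction quoted above can be seen directly in scipy.stats (a small illustrative sketch, not from the original thread): continuous distributions expose a pdf method, discrete ones a pmf.

```python
from scipy.stats import norm, poisson

# Continuous: probability *density* at a point (can exceed 1)
density = norm(loc=0.0, scale=1.0).pdf(0.5)

# Discrete: probability *mass* of an outcome (a true probability)
mass = poisson(mu=3.0).pmf(2)
print(density, mass)
```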
On Thu, Jul 30, 2015 at 11:38 PM, Andreas Mueller t3k...@gmail.com wrote:
I am mostly concerned about API explosion.
I take your point of PDF vs PMF.
Maybe predict_proba(X, y) is better.
Would you also support predict_proba(X, y) for classifiers (which would be
I support the inclusion of Poisson loss, although a quick note on
predict_proba_at:
The output of Poisson regression is a posterior distribution over the rate
parameter in the form of a Gamma distribution. If we assume no uncertainty
at all in the prediction, the posterior predictive distribution reduces to
a Poisson at the predicted rate.
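A sketch of what such a method could return for Poisson regression when the rate is treated as a point estimate (the coefficients and data below are purely illustrative, and predict_proba_at itself is only a proposal in this thread):

```python
import numpy as np
from scipy.stats import poisson

coef_ = np.array([0.2, -0.1])            # illustrative fitted coefficients
X = np.array([[1.0, 2.0], [3.0, 0.5]])   # two sample rows
y = np.array([1, 2])                     # counts to score

rate = np.exp(X @ coef_)      # log link: rate = exp(x . beta)
proba = poisson.pmf(y, rate)  # p(y | x) under the fitted model
print(proba)
```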
Hm, I'm not entirely sure how score_samples is currently used, but I
think it is the probability
under a density model.
It would only change the meaning in so far as it is a conditional
distribution over y given x and not x.
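For reference, this is how score_samples is used on the density-estimation side of scikit-learn (KernelDensity here is just one example; it returns log-densities):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
X = rng.randn(200, 1)                   # samples from a 1-D density

kde = KernelDensity(bandwidth=0.5).fit(X)
log_density = kde.score_samples(X[:5])  # log p(x) for each sample
print(np.exp(log_density))              # densities, one per sample
```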
I'm not totally opposed to adding a new method, though I'm not sure I
like the proposed name.
I am not sure about the name; score_samples would sound a bit strange
for a conditional probability, in my opinion. And "likelihood" is also
misleading, since it is actually a conditional probability and not a
conditional likelihood (the quantities on the right-hand side of the
conditioning are fixed).
Such a predict_proba_at() method would also make sense for Gaussian
process regression. Currently, computing probability densities for GPs
requires predicting the mean and standard deviation (via MSE) at X and
using scipy.stats.norm.pdf to compute probability densities for y at
the predicted mean and standard deviation.
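The workflow described above can be sketched as follows (using the modern GaussianProcessRegressor API for illustration; the thread predates it and used the older GP class with eval_MSE):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(30, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(30)

gpr = GaussianProcessRegressor(alpha=1e-2).fit(X, y)
mean, std = gpr.predict(X[:3], return_std=True)

# density of the observed y under the predictive Gaussian: p(y | x)
density = norm.pdf(y[:3], loc=mean, scale=std)
print(density)
```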
Shouldn't that be score_samples?
Well, it is a conditional likelihood p(y|x), not p(x) or p(x, y).
But it is the likelihood of some data given the model.
On 07/29/2015 02:58 AM, Jan Hendrik Metzen wrote:
Such a predict_proba_at() method would also make sense for Gaussian
process regression.
Regarding predictions, I don't really see what the problem is. Using GLMs as
an example, you just need to do
def predict(self, X):
    if self.loss == "poisson":
        return np.exp(np.dot(X, self.coef_))
    else:
        return np.dot(X, self.coef_)
A nice thing about Poisson regression is that
Just a comment from the statistics sidelines
taking log of target and fitting a linear or other model doesn't make it
into a Poisson model.
But maybe Poisson loss in machine learning is unrelated to the Poisson
distribution or a Poisson model with E(y|x) = exp(x beta)?
Josef
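Josef's point can be checked numerically (a small illustrative simulation, not from the thread): for Poisson data, the log of the sample mean recovers the log-rate, while averaging log(y) does not (and zeros must be dodged).

```python
import numpy as np

rng = np.random.RandomState(0)
true_log_rate = 1.0
y = rng.poisson(np.exp(true_log_rate), size=200000)

log_of_mean = np.log(y.mean())         # what a Poisson model targets
mean_of_log = np.log(y[y > 0]).mean()  # log-transform-then-average

print(log_of_mean)   # close to 1.0
print(mean_of_log)   # noticeably different (Jensen's inequality)
```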
On Tue, Jul
I'd be happy with adding Poisson loss to more models, though I think it
would be more natural to first add it to GLM before GBM ;)
If the addition is straight-forward, I think it would be a nice
contribution nevertheless.
1) For the user to do np.exp(gbmpoisson.predict(X)) is not acceptable.
I was expecting the actual Poisson loss to be implemented in the
class, not just a log transform.
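A minimal sketch of what "the actual Poisson loss in the class" could look like for gradient boosting (the function names are hypothetical; the model's raw prediction F(x) is taken to model log(rate), so predict would exponentiate internally rather than leaving it to the user):

```python
import numpy as np

def poisson_loss(y, raw_pred):
    """Negative Poisson log-likelihood, up to terms constant in raw_pred.

    raw_pred models log(rate), so rate = exp(raw_pred).
    """
    return np.mean(np.exp(raw_pred) - y * raw_pred)

def negative_gradient(y, raw_pred):
    """Residual each boosting stage fits its next tree to: y - exp(F)."""
    return y - np.exp(raw_pred)

y = np.array([0.0, 2.0, 5.0])
raw = np.zeros(3)
print(poisson_loss(y, raw), negative_gradient(y, raw))
```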
On 07/28/2015 02:03 PM, josef.p...@gmail.com wrote:
Just a comment from the statistics sidelines
taking log of target and fitting a linear or other model doesn't make
it into a Poisson
Hello sklearn developers,
I'd like the GBM implementation in sklearn to support Poisson loss, and I'm
comfortable in writing the code (I have modified my local sklearn source
already and am using Poisson loss GBM's).
The sklearn site says to get in touch via this list before making a
contribution.