Such a predict_proba_at() method would also make sense for Gaussian process regression. Currently, computing probability densities for GPs requires predicting the mean and standard deviation (via "MSE") at X and then using scipy.stats.norm.pdf to evaluate the density of y under the normal distribution with that predicted mean and standard deviation. I think it would be nice to allow this directly via the API. Thus +1 for adding a method like predict_proba_at().
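To make that workaround concrete, a minimal sketch with the current GaussianProcess estimator and its eval_MSE option (X_train, y_train, X_test, and y_query are placeholders, not data from this thread):

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcess

    gp = GaussianProcess().fit(X_train, y_train)

    # Predict the mean and the MSE (predictive variance) at X_test, then
    # evaluate the Gaussian density of a query target value y_query.
    y_mean, y_mse = gp.predict(X_test, eval_MSE=True)
    density = norm.pdf(y_query, loc=y_mean, scale=np.sqrt(y_mse))

A predict_proba_at(X, y=...) method would essentially wrap these last two lines.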
Jan

On 29.07.2015 06:42, Mathieu Blondel wrote:

Regarding predictions, I don't really see what the problem is. Using GLMs as an example, you just need to do

    def predict(self, X):
        if self.loss == "poisson":
            return np.exp(np.dot(X, self.coef_))
        else:
            return np.dot(X, self.coef_)

A nice thing about Poisson regression is that we can query the probability p(y|x) for a specific integer y:
https://en.wikipedia.org/wiki/Poisson_regression

We need to decide on an API for that (so far we have used predict_proba for classification, so the output was always n_samples x n_classes). How about predict_proba(X, at_y=some_integer)?

However, this would also mean that we can't use predict_proba to detect classifiers anymore... Another solution would be to introduce a new method predict_proba_at(X, y=some_integer)...

Mathieu
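For a Poisson GLM with the canonical log link, such a method could be sketched in a few lines (a sketch only: predict_proba_at is the proposed name, not an existing scikit-learn API, and self.coef_ is assumed to hold the fitted coefficients):

    import numpy as np
    from scipy.stats import poisson

    def predict_proba_at(self, X, y):
        # Proposed method, not an existing API: return the Poisson
        # probability mass P(y | x) with mean mu = exp(x . coef_).
        mu = np.exp(np.dot(X, self.coef_))
        return poisson.pmf(y, mu)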
On Wed, Jul 29, 2015 at 4:19 AM, Andreas Mueller <t3k...@gmail.com> wrote:

I was expecting there to be the actual Poisson loss implemented in the class, not just a log transform.

On 07/28/2015 02:03 PM, josef.p...@gmail.com wrote:

Just a comment from the statistics sidelines:

Taking the log of the target and fitting a linear or other model doesn't make it into a Poisson model.

But maybe "Poisson loss" in machine learning is unrelated to the Poisson distribution, or to a Poisson model with E(y|x) = exp(x beta)?

Josef

On Tue, Jul 28, 2015 at 2:46 PM, Andreas Mueller <t3k...@gmail.com> wrote:

I'd be happy with adding Poisson loss to more models, though I think it would be more natural to first add it to GLM before GBM ;)
If the addition is straightforward, I think it would be a nice contribution nevertheless.

1) For the user to do np.exp(gbmpoisson.predict(X)) is not acceptable. This needs to be automatic. It would be best if this could be done in a minimally intrusive way.

2) I'm not sure, maybe Peter can comment?

3) I would rather you contribute sooner, but others might think differently. Silently ignoring sample weights is not an option, but you can raise an error if they are provided.

Hth,
Andy

On 07/23/2015 08:52 PM, Peter Rickwood wrote:

Hello sklearn developers,

I'd like the GBM implementation in sklearn to support Poisson loss, and I'm comfortable writing the code (I have already modified my local sklearn source and am using Poisson-loss GBMs).

The sklearn site says to get in touch via this list before making a contribution, so is it worth me submitting something along these lines?

If the answer is yes, some quick questions:

1) The simplest implementation of Poisson-loss GBMs is to work in log space (i.e., the GBM predicts log(target) rather than target) and require the user to then take the exponential of those predictions. So you would need to do something like:

    gbmpoisson = sklearn.ensemble.GradientBoostingRegressor(...)
    gbmpoisson.fit(X, y)
    preds = np.exp(gbmpoisson.predict(X))

I am comfortable making changes to the source for this to work, but I'm not comfortable changing any of the higher-level interface to deal automatically with the transform. In other words, other developers would need to either be OK with the GBM returning transformed predictions when "poisson" loss is chosen, or would need to change code in the 'predict' function to do the transformation automatically if Poisson loss was specified. Is this OK?

2) If I do contribute, can you advise what the best tests are to validate GBM loss functions before they are considered to 'work'?

3) Allowing for weighted samples is in theory easy enough to implement, but it is not something I have implemented yet. Is it better to contribute code sooner that ignores sample weights, or later code that handles them?

Cheers, and thanks for all your work on sklearn. Fantastic tool/library,

Peter
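As a concrete reference for the log-space approach above, a minimal sketch of the loss and the pseudo-residuals a GBM stage would fit its trees to (raw_pred denotes the model's log-scale prediction; note this is the Poisson negative log-likelihood, not least squares on log(y)):

    import numpy as np

    def poisson_loss(y, raw_pred):
        # Poisson negative log-likelihood with log link, mu = exp(raw_pred),
        # dropping the log(y!) term, which is constant in raw_pred.
        return np.mean(np.exp(raw_pred) - y * raw_pred)

    def negative_gradient(y, raw_pred):
        # Pseudo-residuals the regression trees are fit to at each
        # boosting stage: -dL/d(raw_pred) = y - exp(raw_pred).
        return y - np.exp(raw_pred)

Predictions on the original scale are then np.exp(raw_pred), which is exactly the transform discussed above.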