That's true, I wasn't aware that score_samples is already used in the context of density estimation. score_samples would be okay then, in my opinion.
Jan

On 29.07.2015 18:46, Andreas Mueller wrote:
> Hm, I'm not entirely sure how score_samples is currently used, but I
> think it is the probability under a density model.
> It would "only" change the meaning insofar as it is a conditional
> distribution over y given x, and not over x.
>
> I'm not totally opposed to adding a new method, though I'm not sure I
> like ``predict_proba_at``.
>
> On 07/29/2015 12:29 PM, Jan Hendrik Metzen wrote:
>> I am not sure about the name; score_samples would sound a bit strange
>> for a conditional probability, in my opinion. And "likelihood" is also
>> misleading, since it is actually a conditional probability and not a
>> conditional likelihood (the quantities on the right-hand side of the
>> conditioning are fixed, and integrating over all y would give 1).
>>
>> On 29.07.2015 16:16, Andreas Mueller wrote:
>>> Shouldn't that be "score_samples"?
>>> Well, it is a conditional likelihood p(y|x), not p(x) or p(x, y).
>>> But it is the likelihood of some data given the model.
>>>
>>> On 07/29/2015 02:58 AM, Jan Hendrik Metzen wrote:
>>>> Such a predict_proba_at() method would also make sense for Gaussian
>>>> process regression. Currently, computing probability densities for
>>>> GPs requires predicting the mean and standard deviation (via "MSE")
>>>> at X and using scipy.stats.norm.pdf to compute probability densities
>>>> for y from the predicted mean and standard deviation. I think it
>>>> would be nice to allow this directly via the API. Thus +1 for adding
>>>> a method like predict_proba_at().
>>>>
>>>> Jan
>>>>
>>>> On 29.07.2015 06:42, Mathieu Blondel wrote:
>>>>> Regarding predictions, I don't really see what the problem is.
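[Jan's GP workflow quoted above — predict mean and standard deviation at X, then evaluate scipy.stats.norm.pdf at y — can be written out as a short sketch. `gp_density_at` and the toy mean/std arrays are hypothetical stand-ins for a fitted GP's output, not existing API:]

```python
import numpy as np
from scipy.stats import norm

# Jan's current workflow, written out as code. Given the predictive
# mean and standard deviation of a GP at some inputs, the conditional
# density p(y|x) is just a normal pdf; a predict_proba_at() method
# would wrap exactly this.
def gp_density_at(mean, std, y):
    """p(y | x) for a Gaussian predictive distribution (sketch)."""
    return norm.pdf(y, loc=mean, scale=std)

mean = np.array([0.0, 1.0])   # toy predictive means
std = np.array([1.0, 0.5])    # toy predictive standard deviations
print(gp_density_at(mean, std, y=np.array([0.0, 1.0])))
```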
>>>>> Using GLMs as an example, you just need to do
>>>>>
>>>>>     def predict(self, X):
>>>>>         if self.loss == "poisson":
>>>>>             return np.exp(np.dot(X, self.coef_))
>>>>>         else:
>>>>>             return np.dot(X, self.coef_)
>>>>>
>>>>> A nice thing about Poisson regression is that we can query the
>>>>> probability p(y|x) for a specific integer y.
>>>>> https://en.wikipedia.org/wiki/Poisson_regression
>>>>>
>>>>> We need to decide on an API for that (so far we have used
>>>>> predict_proba for classification, so the output was always
>>>>> n_samples x n_classes).
>>>>> How about predict_proba(X, at_y=some_integer)?
>>>>>
>>>>> However, this also means that we couldn't use predict_proba to
>>>>> detect classifiers anymore...
>>>>> Another solution would be to introduce a new method
>>>>> predict_proba_at(X, y=some_integer)...
>>>>>
>>>>> Mathieu
>>>>>
>>>>> On Wed, Jul 29, 2015 at 4:19 AM, Andreas Mueller
>>>>> <t3k...@gmail.com> wrote:
>>>>>
>>>>> I was expecting there to be the actual Poisson loss implemented in
>>>>> the class, not just a log transform.
>>>>>
>>>>> On 07/28/2015 02:03 PM, josef.p...@gmail.com wrote:
>>>>>> Just a comment from the statistics sidelines:
>>>>>>
>>>>>> Taking the log of the target and fitting a linear or other model
>>>>>> doesn't make it into a Poisson model.
>>>>>>
>>>>>> But maybe "Poisson loss" in machine learning is unrelated to the
>>>>>> Poisson distribution, or to a Poisson model with E(y|x) = exp(x
>>>>>> beta)?
>>>>>>
>>>>>> Josef
>>>>>>
>>>>>> On Tue, Jul 28, 2015 at 2:46 PM, Andreas Mueller
>>>>>> <t3k...@gmail.com> wrote:
>>>>>>
>>>>>> I'd be happy with adding Poisson loss to more models, though
>>>>>> I think it would be more natural to first add it to GLM
>>>>>> before GBM ;)
>>>>>> If the addition is straightforward, I think it would be a
>>>>>> nice contribution nevertheless.
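[Mathieu's proposed `predict_proba_at(X, y=some_integer)` quoted above could be sketched as follows: with rate mu = exp(X @ coef_), p(y|x) for Poisson regression is the Poisson pmf at the integer y. The free function and the `coef` argument are hypothetical stand-ins for a fitted model's `coef_`:]

```python
import numpy as np
from scipy.stats import poisson

# Sketch of the proposed predict_proba_at() for Poisson regression.
def predict_proba_at(coef, X, y):
    """p(y | x) at integer y, with rate mu = exp(X @ coef)."""
    mu = np.exp(np.dot(X, coef))
    return poisson.pmf(y, mu)

coef = np.zeros(2)              # toy coefficients, so mu == 1 everywhere
X = np.array([[1.0, 2.0]])
print(predict_proba_at(coef, X, y=0))   # pmf of y=0 at rate 1 is exp(-1)
```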
>>>>>> 1) For the user to do np.exp(gbmpoisson.predict(X)) is not
>>>>>> acceptable. This needs to be automatic. It would be best if
>>>>>> this could be done in a minimally intrusive way.
>>>>>>
>>>>>> 2) I'm not sure; maybe Peter can comment?
>>>>>>
>>>>>> 3) I would rather you contribute sooner, but others might think
>>>>>> differently. Silently ignoring sample weights is not an
>>>>>> option, but you can error if they are provided.
>>>>>>
>>>>>> Hth,
>>>>>> Andy
>>>>>>
>>>>>> On 07/23/2015 08:52 PM, Peter Rickwood wrote:
>>>>>>> Hello sklearn developers,
>>>>>>>
>>>>>>> I'd like the GBM implementation in sklearn to support Poisson
>>>>>>> loss, and I'm comfortable writing the code (I have already
>>>>>>> modified my local sklearn source and am using Poisson-loss GBMs).
>>>>>>>
>>>>>>> The sklearn site says to get in touch via this list before making
>>>>>>> a contribution, so is it worth submitting something along these
>>>>>>> lines?
>>>>>>>
>>>>>>> If the answer is yes, some quick questions:
>>>>>>>
>>>>>>> 1) The simplest implementation of Poisson-loss GBMs is to work in
>>>>>>> log-space (i.e. the GBM predicts log(target) rather than target)
>>>>>>> and to require the user to then take the exponential of those
>>>>>>> predictions. So you would need to do something like:
>>>>>>>
>>>>>>>     gbmpoisson = sklearn.ensemble.GradientBoostingRegressor(...)
>>>>>>>     gbmpoisson.fit(X, y)
>>>>>>>     preds = np.exp(gbmpoisson.predict(X))
>>>>>>>
>>>>>>> I am comfortable making changes to the source for this to work,
>>>>>>> but I'm not comfortable changing any of the higher-level
>>>>>>> interface to deal automatically with the transform. In other
>>>>>>> words, other developers would need either to be OK with the GBM
>>>>>>> returning transformed predictions when "poisson" loss is chosen,
>>>>>>> or to change code in the 'predict' function to do the
>>>>>>> transformation automatically if Poisson loss was specified. Is
>>>>>>> this OK?
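[Andy's point (1) above — the exponential must happen inside predict() — could be done along these lines. This is a minimal sketch only; the class name, the `loss` attribute, and `_raw_predict` are placeholders, not the real GBM internals:]

```python
import numpy as np

class ToyPoissonGBM:
    """Sketch: apply the inverse log-link inside predict() so users
    never call np.exp themselves. ``_raw_predict`` stands in for the
    real tree ensemble, which works in log-space."""

    loss = "poisson"

    def _raw_predict(self, X):
        # Stand-in for the ensemble's log-space output.
        return np.zeros(len(X))

    def predict(self, X):
        raw = self._raw_predict(X)
        if self.loss == "poisson":
            return np.exp(raw)   # inverse link applied automatically
        return raw

print(ToyPoissonGBM().predict(np.ones((3, 1))))   # exp(0) == 1 per row
```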
>>>>>>> 2) If I do contribute, can you advise on the best tests to
>>>>>>> validate GBM loss functions before they are considered to 'work'?
>>>>>>>
>>>>>>> 3) Allowing for weighted samples is in theory easy enough to
>>>>>>> implement, but it is not something I have implemented yet. Is it
>>>>>>> better to contribute code sooner that doesn't handle weighting
>>>>>>> (i.e. just ignores sample weights), or later code that does?
>>>>>>>
>>>>>>> Cheers, and thanks for all your work on sklearn. Fantastic
>>>>>>> tool/library.
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> _______________________________________________
>>>>>>> Scikit-learn-general mailing list
>>>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
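[One common answer to Peter's question (2): a standard unit test for a new boosting loss is to check its analytic negative gradient against a finite-difference approximation. A sketch with a Poisson loss written on the log-scale score; the function names are hypothetical, not the GBM's internal API:]

```python
import numpy as np

def poisson_deviance(y, raw):
    """Poisson loss on the log-scale score (mu = exp(raw)), dropping
    terms constant in raw: mean(exp(raw) - y * raw)."""
    return np.mean(np.exp(raw) - y * raw)

def negative_gradient(y, raw):
    """Analytic negative gradient of the mean loss w.r.t. raw,
    per sample (times n): y - exp(raw)."""
    return y - np.exp(raw)

# The kind of check question 2 asks about: compare the analytic
# gradient with central finite differences of the loss.
rng = np.random.RandomState(0)
y = rng.poisson(3.0, size=5).astype(float)
raw = rng.normal(size=5)
eps = 1e-6
n = len(y)
for i in range(n):
    up, down = raw.copy(), raw.copy()
    up[i] += eps
    down[i] -= eps
    fd = (poisson_deviance(y, up) - poisson_deviance(y, down)) / (2 * eps)
    assert np.isclose(-fd * n, negative_gradient(y, raw)[i], atol=1e-4)
print("gradient check passed")
```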
--
Jan Hendrik Metzen, Dr.rer.nat.
Team Leader of Team "Sustained Learning"
Universität Bremen und DFKI GmbH, Robotics Innovation Center
FB 3 - Mathematik und Informatik, AG Robotik
Robert-Hooke-Straße 1
28359 Bremen, Germany
Tel.: +49 421 178 45-4123
Zentrale: +49 421 178 45-6611
Fax: +49 421 178 45-4150
E-Mail: j...@informatik.uni-bremen.de
Homepage: http://www.informatik.uni-bremen.de/~jhm/
Weitere Informationen: http://www.informatik.uni-bremen.de/robotik