I am not sure about the name; score_samples would sound a bit strange 
for a conditional probability, in my opinion. And "likelihood" is also 
misleading, since it is actually a conditional probability and not a 
conditional likelihood (the quantities on the right-hand side of the 
conditioning bar are fixed, and integrating over all y gives 1).
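To make the distinction concrete, here is a minimal sketch (illustrative numbers, not taken from any estimator) showing that a conditional density p(y|x) integrates to 1 over y for a fixed x, which is what makes it a proper probability density in y rather than a likelihood in the parameters:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Hypothetical predictive distribution at a fixed x: p(y | x) = N(y; mu(x), sigma(x)^2)
mu, sigma = 1.3, 0.7  # illustrative conditional mean/std at some x

# The density of a single observed y under the model -- the quantity under discussion
density_at_y = norm.pdf(2.0, loc=mu, scale=sigma)

# Integrating the conditional density over all y gives 1 (for fixed x)
total, _ = quad(lambda y: norm.pdf(y, loc=mu, scale=sigma), -np.inf, np.inf)
print(round(total, 6))  # → 1.0
```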

On 29.07.2015 16:16, Andreas Mueller wrote:
> Shouldn't that be "score_samples"?
> Well, it is a conditional likelihood p(y|x), not p(x) or p(x, y).
> But it is the likelihood of some data given the model.
>
>
> On 07/29/2015 02:58 AM, Jan Hendrik Metzen wrote:
>> Such a predict_proba_at() method would also make sense for Gaussian
>> process regression. Currently, computing probability densities for GPs
>> requires predicting the mean and standard deviation (via "MSE") at X and
>> using scipy.stats.norm.pdf to compute probability densities for y from
>> the predicted mean and standard deviation. I think it would be nice to
>> allow this directly via the API. Thus +1 for adding a method like
>> predict_proba_at().
>>
>> Jan
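The GP recipe described above can be sketched as follows; `predict_proba_at` is a hypothetical helper name, and the predictive means/stds below are illustrative stand-ins for what a GP's predict step would return per query point:

```python
import numpy as np
from scipy.stats import norm

def predict_proba_at(mean, std, y):
    """Hypothetical helper: Gaussian predictive density p(y | x).

    `mean` and `std` are the per-sample predictive moments a GP would
    return; `y` is the target value at which to evaluate the density.
    """
    return norm.pdf(y, loc=np.asarray(mean), scale=np.asarray(std))

# Illustrative predictive moments for three query points
means = np.array([0.0, 1.0, 2.0])
stds = np.array([1.0, 0.5, 2.0])
densities = predict_proba_at(means, stds, y=1.0)  # one density per query point
```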
>>
>> On 29.07.2015 06:42, Mathieu Blondel wrote:
>>> Regarding predictions, I don't really see what the problem is. Using
>>> GLMs as an example, you just need to do
>>>
>>> def predict(self, X):
>>>     if self.loss == "poisson":
>>>         return np.exp(np.dot(X, self.coef_))
>>>     else:
>>>         return np.dot(X, self.coef_)
>>>
>>> A nice thing about Poisson regression is that we can query the
>>> probability p(y|x) for a specific integer y.
>>> https://en.wikipedia.org/wiki/Poisson_regression
>>>
>>> We need to decide on an API for that (so far we have used predict_proba
>>> for classification, so the output was always n_samples x n_classes).
>>> How about predict_proba(X, at_y=some_integer)?
>>>
>>> However, this also means that we can't use predict_proba to detect
>>> classifiers anymore...
>>> Another solution would be to introduce a new method
>>> predict_proba_at(X, y=some_integer)...
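As a sketch of what such a query could look like for a Poisson GLM, with mu(x) = exp(x . coef) and p(y|x) evaluated via scipy's pmf (the coefficients and the `predict_proba_at` name here are illustrative, not an existing sklearn API):

```python
import numpy as np
from scipy.stats import poisson

# Illustrative fitted coefficients and query points for a Poisson GLM
coef = np.array([0.2, -0.1])
X = np.array([[1.0, 2.0], [3.0, 0.5]])

def predict_proba_at(X, y):
    """Sketch of the proposed predict_proba_at(X, y=some_integer) API."""
    mu = np.exp(X @ coef)      # predicted conditional means E(y | x)
    return poisson.pmf(y, mu)  # p(y | x) = exp(-mu) * mu**y / y!

probs = predict_proba_at(X, y=2)  # probability of observing y=2 at each row of X
```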
>>>
>>> Mathieu
>>>
>>>
>>> On Wed, Jul 29, 2015 at 4:19 AM, Andreas Mueller <t3k...@gmail.com> wrote:
>>>
>>>       I was expecting there to be the actual Poisson loss implemented in
>>>       the class, not just a log transform.
>>>
>>>
>>>
>>>       On 07/28/2015 02:03 PM, josef.p...@gmail.com wrote:
>>>>       Just a comment from the statistics sidelines
>>>>
>>>>       Taking the log of the target and fitting a linear or other model
>>>>       doesn't make it into a Poisson model.
>>>>
>>>>       But maybe "Poisson loss" in machine learning is unrelated to the
>>>>       Poisson distribution, or to a Poisson model with E(y|x) = exp(x beta)?
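This point can be checked numerically: least squares on the log-transformed target and an actual Poisson maximum-likelihood fit generally give different coefficients, because they target different models. A small sketch with synthetic data and a hand-rolled likelihood (not sklearn code):

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic Poisson data with E(y | x) = exp(0.5 + 0.8 * x)
rng = np.random.RandomState(0)
X = np.column_stack([np.ones(200), rng.uniform(0, 2, 200)])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))

# (a) "log transform" approach: OLS on log(y + 1) (shifted to keep the log finite)
beta_log, *_ = np.linalg.lstsq(X, np.log(y + 1.0), rcond=None)

# (b) actual Poisson MLE: minimise the negative log-likelihood
#     (dropping the beta-independent log(y!) term)
def nll(beta):
    mu = np.exp(X @ beta)
    return np.sum(mu - y * (X @ beta))

beta_pois = minimize(nll, np.zeros(2), method="Nelder-Mead").x
# beta_log and beta_pois disagree: log-transformed least squares is not Poisson regression
```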
>>>>
>>>>       Josef
>>>>
>>>>
>>>>       On Tue, Jul 28, 2015 at 2:46 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>>>>
>>>>           I'd be happy with adding Poisson loss to more models, though
>>>>           I think it would be more natural to first add it to GLM
>>>>           before GBM ;)
>>>>           If the addition is straightforward, I think it would be a
>>>>           nice contribution nevertheless.
>>>>           1) For the user to have to call np.exp(gbmpoisson.predict(X))
>>>>           is not acceptable. This needs to be automatic. It would be
>>>>           best if this could be done in a minimally intrusive way.
>>>>
>>>>           2) I'm not sure, maybe Peter can comment?
>>>>
>>>>           3) I would rather contribute sooner, but others might think
>>>>           differently. Silently ignoring sample weights is not an
>>>>           option, but you can raise an error if they are provided.
>>>>
>>>>           Hth,
>>>>           Andy
>>>>
>>>>
>>>>           On 07/23/2015 08:52 PM, Peter Rickwood wrote:
>>>>>           Hello sklearn developers,
>>>>>
>>>>>           I'd like the GBM implementation in sklearn to support
>>>>>           Poisson loss, and I'm comfortable writing the code (I
>>>>>           have already modified my local sklearn source and am
>>>>>           using Poisson-loss GBMs).
>>>>>
>>>>>           The sklearn site says to get in touch via this list before
>>>>>           making a contribution, so is it worthwhile for me to submit
>>>>>           something along these lines?
>>>>>
>>>>>           If the answer is yes, some quick questions:
>>>>>
>>>>>           1) The simplest implementation of Poisson-loss GBMs is to
>>>>>           work in log space (i.e. the GBM predicts log(target) rather
>>>>>           than target), and require the user to then take the
>>>>>           exponential of those predictions. So, you would need to do
>>>>>           something like:
>>>>>
>>>>>               gbmpoisson = sklearn.ensemble.GradientBoostingRegressor(...)
>>>>>               gbmpoisson.fit(X, y)
>>>>>               preds = np.exp(gbmpoisson.predict(X))
>>>>>           I am comfortable making changes to the source for this to
>>>>>           work, but I'm not comfortable changing any of the
>>>>>           higher-level interface to deal with the transform
>>>>>           automatically. In other words, other developers would need
>>>>>           either to be OK with the GBM returning transformed
>>>>>           predictions in the case where "poisson" loss is chosen, or
>>>>>           to change code in the 'predict' function to do the
>>>>>           transformation automatically if Poisson loss was specified.
>>>>>           Is this OK?
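One minimally intrusive way to make the transform automatic would be to let each loss carry its own inverse link and have predict apply it. A sketch with hypothetical class names (not sklearn's internal loss API):

```python
import numpy as np

class PoissonLoss:
    """Hypothetical loss object: raw predictions live in log space."""
    def inverse_link(self, raw):
        return np.exp(raw)  # undo the log-space fit

class SquaredLoss:
    """Identity link: raw predictions are already on the target scale."""
    def inverse_link(self, raw):
        return raw

def predict(raw_prediction, loss):
    # predict() always returns values on the original target scale,
    # so users never have to call np.exp themselves.
    return loss.inverse_link(raw_prediction)

preds = predict(np.array([0.0, 1.0]), PoissonLoss())  # exp applied automatically
```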
>>>>>           2) If I do contribute, can you advise what the best tests
>>>>>           are to test/validate GBM loss functions before they are
>>>>>           considered to 'work'?
>>>>>
>>>>>           3) Allowing for weighted samples is in theory easy enough to
>>>>>           implement, but is not something I have implemented yet. Is
>>>>>           it better to contribute code sooner that doesn't handle
>>>>>           weighting (i.e. just ignores sample weights), or code
>>>>>           later that does?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           Cheers, and thanks for all your work on sklearn. Fantastic
>>>>>           tool/library,
>>>>>
>>>>>
>>>>>
>>>>>           Peter
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> _______________________________________________
>>>>> Scikit-learn-general mailing list
>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


-- 
  Jan Hendrik Metzen,  Dr.rer.nat.
  Team Leader of Team "Sustained Learning"

  Universität Bremen und DFKI GmbH, Robotics Innovation Center
  FB 3 - Mathematik und Informatik
  AG Robotik
  Robert-Hooke-Straße 1
  28359 Bremen, Germany


  Tel.:     +49 421 178 45-4123
  Zentrale: +49 421 178 45-6611
  Fax:      +49 421 178 45-4150
  E-Mail:   j...@informatik.uni-bremen.de
  Homepage: http://www.informatik.uni-bremen.de/~jhm/

  Weitere Informationen: http://www.informatik.uni-bremen.de/robotik

