Such a predict_proba_at() method would also make sense for Gaussian process regression. Currently, computing probability densities for GPs requires predicting the mean and standard deviation (via "MSE") at X and then using scipy.stats.norm.pdf to evaluate the density of y under the normal distribution with that predicted mean and standard deviation. I think it would be nice to allow this directly via the API. Thus +1 for adding a method like predict_proba_at().
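To make that workaround concrete, a minimal sketch with the current GaussianProcess estimator and its eval_MSE option (X_train, y_train, X_test, and y_query are placeholders, not data from this thread):

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcess

    gp = GaussianProcess().fit(X_train, y_train)

    # Predict the mean and the MSE (predictive variance) at X_test, then
    # evaluate the Gaussian density of a query target value y_query.
    y_mean, y_mse = gp.predict(X_test, eval_MSE=True)
    density = norm.pdf(y_query, loc=y_mean, scale=np.sqrt(y_mse))

A predict_proba_at(X, y=...) method would essentially wrap these last two lines.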
Jan

On 29.07.2015 06:42, Mathieu Blondel wrote:

Regarding predictions, I don't really see what the problem is. Using GLMs as an example, you just need to do

    def predict(self, X):
        if self.loss == "poisson":
            return np.exp(np.dot(X, self.coef_))
        else:
            return np.dot(X, self.coef_)

A nice thing about Poisson regression is that we can query the probability p(y|x) for a specific integer y:
https://en.wikipedia.org/wiki/Poisson_regression

We need to decide on an API for that (so far we have used predict_proba for classification, so the output was always n_samples x n_classes). How about predict_proba(X, at_y=some_integer)?

However, this would also mean that we can't use predict_proba to detect classifiers anymore... Another solution would be to introduce a new method predict_proba_at(X, y=some_integer)...

Mathieu
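For a Poisson GLM with the canonical log link, such a method could be sketched in a few lines (a sketch only: predict_proba_at is the proposed name, not an existing scikit-learn API, and self.coef_ is assumed to hold the fitted coefficients):

    import numpy as np
    from scipy.stats import poisson

    def predict_proba_at(self, X, y):
        # Proposed method, not an existing API: return the Poisson
        # probability mass P(y | x) with mean mu = exp(x . coef_).
        mu = np.exp(np.dot(X, self.coef_))
        return poisson.pmf(y, mu)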
On Wed, Jul 29, 2015 at 4:19 AM, Andreas Mueller <t3k...@gmail.com> wrote:

I was expecting there to be the actual Poisson loss implemented in the class, not just a log transform.

On 07/28/2015 02:03 PM, josef.p...@gmail.com wrote:

Just a comment from the statistics sidelines:

Taking the log of the target and fitting a linear or other model doesn't make it into a Poisson model.

But maybe "Poisson loss" in machine learning is unrelated to the Poisson distribution, or to a Poisson model with E(y|x) = exp(x beta)?

Josef

On Tue, Jul 28, 2015 at 2:46 PM, Andreas Mueller <t3k...@gmail.com> wrote:

I'd be happy with adding Poisson loss to more models, though I think it would be more natural to first add it to GLM before GBM ;)
If the addition is straightforward, I think it would be a nice contribution nevertheless.

1) For the user to do np.exp(gbmpoisson.predict(X)) is not acceptable. This needs to be automatic. It would be best if this could be done in a minimally intrusive way.

2) I'm not sure, maybe Peter can comment?

3) I would rather you contribute sooner, but others might think differently. Silently ignoring sample weights is not an option, but you can raise an error if they are provided.

Hth,
Andy

On 07/23/2015 08:52 PM, Peter Rickwood wrote:

Hello sklearn developers,

I'd like the GBM implementation in sklearn to support Poisson loss, and I'm comfortable writing the code (I have already modified my local sklearn source and am using Poisson-loss GBMs).

The sklearn site says to get in touch via this list before making a contribution, so is it worth me submitting something along these lines?

If the answer is yes, some quick questions:

1) The simplest implementation of Poisson-loss GBMs is to work in log space (i.e., the GBM predicts log(target) rather than target) and require the user to then take the exponential of those predictions. So you would need to do something like:

    gbmpoisson = sklearn.ensemble.GradientBoostingRegressor(...)
    gbmpoisson.fit(X, y)
    preds = np.exp(gbmpoisson.predict(X))

I am comfortable making changes to the source for this to work, but I'm not comfortable changing any of the higher-level interface to deal automatically with the transform. In other words, other developers would need to either be OK with the GBM returning transformed predictions when "poisson" loss is chosen, or would need to change code in the 'predict' function to do the transformation automatically if Poisson loss was specified. Is this OK?

2) If I do contribute, can you advise what the best tests are to validate GBM loss functions before they are considered to 'work'?

3) Allowing for weighted samples is in theory easy enough to implement, but it is not something I have implemented yet. Is it better to contribute code sooner that ignores sample weights, or later code that handles them?

Cheers, and thanks for all your work on sklearn. Fantastic tool/library,

Peter
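As a concrete reference for the log-space approach above, a minimal sketch of the loss and the pseudo-residuals a GBM stage would fit its trees to (raw_pred denotes the model's log-scale prediction; note this is the Poisson negative log-likelihood, not least squares on log(y)):

    import numpy as np

    def poisson_loss(y, raw_pred):
        # Poisson negative log-likelihood with log link, mu = exp(raw_pred),
        # dropping the log(y!) term, which is constant in raw_pred.
        return np.mean(np.exp(raw_pred) - y * raw_pred)

    def negative_gradient(y, raw_pred):
        # Pseudo-residuals the regression trees are fit to at each
        # boosting stage: -dL/d(raw_pred) = y - exp(raw_pred).
        return y - np.exp(raw_pred)

Predictions on the original scale are then np.exp(raw_pred), which is exactly the transform discussed above.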