That's true, I wasn't aware that score_samples is already used in the context of density estimation. score_samples would be okay then, in my opinion.
Jan

On 29.07.2015 18:46, Andreas Mueller wrote:
> Hm, I'm not entirely sure how score_samples is currently used, but I
> think it is the probability under a density model.
> It would "only" change the meaning insofar as it is a conditional
> distribution over y given x, and not over x.
>
> I'm not totally opposed to adding a new method, though I'm not sure I
> like ``predict_proba_at``.
>
> On 07/29/2015 12:29 PM, Jan Hendrik Metzen wrote:
>> I am not sure about the name; score_samples would sound a bit strange
>> for a conditional probability, in my opinion. And "likelihood" is also
>> misleading, since it is actually a conditional probability and not a
>> conditional likelihood (the quantities on the right-hand side of the
>> conditioning are fixed, and integrating over all y would give 1).
>>
>> On 29.07.2015 16:16, Andreas Mueller wrote:
>>> Shouldn't that be "score_samples"?
>>> Well, it is a conditional likelihood p(y|x), not p(x) or p(x, y).
>>> But it is the likelihood of some data given the model.
>>>
>>> On 07/29/2015 02:58 AM, Jan Hendrik Metzen wrote:
>>>> Such a predict_proba_at() method would also make sense for Gaussian
>>>> process regression. Currently, computing probability densities for
>>>> GPs requires predicting the mean and standard deviation (via "MSE")
>>>> at X and using scipy.stats.norm.pdf to compute probability densities
>>>> for y from the predicted mean and standard deviation. I think it
>>>> would be nice to allow this directly via the API. Thus +1 for adding
>>>> a method like predict_proba_at().
>>>>
>>>> Jan
>>>>
>>>> On 29.07.2015 06:42, Mathieu Blondel wrote:
>>>>> Regarding predictions, I don't really see what the problem is.
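[Jan's GP workflow quoted above — predict mean and standard deviation at X, then evaluate scipy.stats.norm.pdf at y — can be written out as a short sketch. `gp_density_at` and the toy mean/std arrays are hypothetical stand-ins for a fitted GP's output, not existing API:]

```python
import numpy as np
from scipy.stats import norm

# Jan's current workflow, written out as code. Given the predictive
# mean and standard deviation of a GP at some inputs, the conditional
# density p(y|x) is just a normal pdf; a predict_proba_at() method
# would wrap exactly this.
def gp_density_at(mean, std, y):
    """p(y | x) for a Gaussian predictive distribution (sketch)."""
    return norm.pdf(y, loc=mean, scale=std)

mean = np.array([0.0, 1.0])   # toy predictive means
std = np.array([1.0, 0.5])    # toy predictive standard deviations
print(gp_density_at(mean, std, y=np.array([0.0, 1.0])))
```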
>>>>> Using GLMs as an example, you just need to do
>>>>>
>>>>>     def predict(self, X):
>>>>>         if self.loss == "poisson":
>>>>>             return np.exp(np.dot(X, self.coef_))
>>>>>         else:
>>>>>             return np.dot(X, self.coef_)
>>>>>
>>>>> A nice thing about Poisson regression is that we can query the
>>>>> probability p(y|x) for a specific integer y.
>>>>> https://en.wikipedia.org/wiki/Poisson_regression
>>>>>
>>>>> We need to decide on an API for that (so far we have used
>>>>> predict_proba for classification, so the output was always
>>>>> n_samples x n_classes).
>>>>> How about predict_proba(X, at_y=some_integer)?
>>>>>
>>>>> However, this also means that we couldn't use predict_proba to
>>>>> detect classifiers anymore...
>>>>> Another solution would be to introduce a new method
>>>>> predict_proba_at(X, y=some_integer)...
>>>>>
>>>>> Mathieu
>>>>>
>>>>> On Wed, Jul 29, 2015 at 4:19 AM, Andreas Mueller
>>>>> <t3k...@gmail.com> wrote:
>>>>>
>>>>> I was expecting there to be the actual Poisson loss implemented in
>>>>> the class, not just a log transform.
>>>>>
>>>>> On 07/28/2015 02:03 PM, josef.p...@gmail.com wrote:
>>>>>> Just a comment from the statistics sidelines:
>>>>>>
>>>>>> Taking the log of the target and fitting a linear or other model
>>>>>> doesn't make it into a Poisson model.
>>>>>>
>>>>>> But maybe "Poisson loss" in machine learning is unrelated to the
>>>>>> Poisson distribution, or to a Poisson model with E(y|x) = exp(x
>>>>>> beta)?
>>>>>>
>>>>>> Josef
>>>>>>
>>>>>> On Tue, Jul 28, 2015 at 2:46 PM, Andreas Mueller
>>>>>> <t3k...@gmail.com> wrote:
>>>>>>
>>>>>> I'd be happy with adding Poisson loss to more models, though
>>>>>> I think it would be more natural to first add it to GLM
>>>>>> before GBM ;)
>>>>>> If the addition is straightforward, I think it would be a
>>>>>> nice contribution nevertheless.
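[Mathieu's proposed `predict_proba_at(X, y=some_integer)` quoted above could be sketched as follows: with rate mu = exp(X @ coef_), p(y|x) for Poisson regression is the Poisson pmf at the integer y. The free function and the `coef` argument are hypothetical stand-ins for a fitted model's `coef_`:]

```python
import numpy as np
from scipy.stats import poisson

# Sketch of the proposed predict_proba_at() for Poisson regression.
def predict_proba_at(coef, X, y):
    """p(y | x) at integer y, with rate mu = exp(X @ coef)."""
    mu = np.exp(np.dot(X, coef))
    return poisson.pmf(y, mu)

coef = np.zeros(2)              # toy coefficients, so mu == 1 everywhere
X = np.array([[1.0, 2.0]])
print(predict_proba_at(coef, X, y=0))   # pmf of y=0 at rate 1 is exp(-1)
```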
>>>>>> 1) For the user to do np.exp(gbmpoisson.predict(X)) is not
>>>>>> acceptable. This needs to be automatic. It would be best if
>>>>>> this could be done in a minimally intrusive way.
>>>>>>
>>>>>> 2) I'm not sure; maybe Peter can comment?
>>>>>>
>>>>>> 3) I would rather you contribute sooner, but others might think
>>>>>> differently. Silently ignoring sample weights is not an
>>>>>> option, but you can error if they are provided.
>>>>>>
>>>>>> Hth,
>>>>>> Andy
>>>>>>
>>>>>> On 07/23/2015 08:52 PM, Peter Rickwood wrote:
>>>>>>> Hello sklearn developers,
>>>>>>>
>>>>>>> I'd like the GBM implementation in sklearn to support Poisson
>>>>>>> loss, and I'm comfortable writing the code (I have already
>>>>>>> modified my local sklearn source and am using Poisson-loss GBMs).
>>>>>>>
>>>>>>> The sklearn site says to get in touch via this list before making
>>>>>>> a contribution, so is it worth submitting something along these
>>>>>>> lines?
>>>>>>>
>>>>>>> If the answer is yes, some quick questions:
>>>>>>>
>>>>>>> 1) The simplest implementation of Poisson-loss GBMs is to work in
>>>>>>> log-space (i.e. the GBM predicts log(target) rather than target)
>>>>>>> and to require the user to then take the exponential of those
>>>>>>> predictions. So you would need to do something like:
>>>>>>>
>>>>>>>     gbmpoisson = sklearn.ensemble.GradientBoostingRegressor(...)
>>>>>>>     gbmpoisson.fit(X, y)
>>>>>>>     preds = np.exp(gbmpoisson.predict(X))
>>>>>>>
>>>>>>> I am comfortable making changes to the source for this to work,
>>>>>>> but I'm not comfortable changing any of the higher-level
>>>>>>> interface to deal automatically with the transform. In other
>>>>>>> words, other developers would need either to be OK with the GBM
>>>>>>> returning transformed predictions when "poisson" loss is chosen,
>>>>>>> or to change code in the 'predict' function to do the
>>>>>>> transformation automatically if Poisson loss was specified. Is
>>>>>>> this OK?
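[Andy's point (1) above — the exponential must happen inside predict() — could be done along these lines. This is a minimal sketch only; the class name, the `loss` attribute, and `_raw_predict` are placeholders, not the real GBM internals:]

```python
import numpy as np

class ToyPoissonGBM:
    """Sketch: apply the inverse log-link inside predict() so users
    never call np.exp themselves. ``_raw_predict`` stands in for the
    real tree ensemble, which works in log-space."""

    loss = "poisson"

    def _raw_predict(self, X):
        # Stand-in for the ensemble's log-space output.
        return np.zeros(len(X))

    def predict(self, X):
        raw = self._raw_predict(X)
        if self.loss == "poisson":
            return np.exp(raw)   # inverse link applied automatically
        return raw

print(ToyPoissonGBM().predict(np.ones((3, 1))))   # exp(0) == 1 per row
```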
>>>>>>> 2) If I do contribute, can you advise on the best tests to
>>>>>>> validate GBM loss functions before they are considered to 'work'?
>>>>>>>
>>>>>>> 3) Allowing for weighted samples is in theory easy enough to
>>>>>>> implement, but it is not something I have implemented yet. Is it
>>>>>>> better to contribute code sooner that doesn't handle weighting
>>>>>>> (i.e. just ignores sample weights), or later code that does?
>>>>>>>
>>>>>>> Cheers, and thanks for all your work on sklearn. Fantastic
>>>>>>> tool/library.
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> _______________________________________________
>>>>>>> Scikit-learn-general mailing list
>>>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
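[One common answer to Peter's question (2): a standard unit test for a new boosting loss is to check its analytic negative gradient against a finite-difference approximation. A sketch with a Poisson loss written on the log-scale score; the function names are hypothetical, not the GBM's internal API:]

```python
import numpy as np

def poisson_deviance(y, raw):
    """Poisson loss on the log-scale score (mu = exp(raw)), dropping
    terms constant in raw: mean(exp(raw) - y * raw)."""
    return np.mean(np.exp(raw) - y * raw)

def negative_gradient(y, raw):
    """Analytic negative gradient of the mean loss w.r.t. raw,
    per sample (times n): y - exp(raw)."""
    return y - np.exp(raw)

# The kind of check question 2 asks about: compare the analytic
# gradient with central finite differences of the loss.
rng = np.random.RandomState(0)
y = rng.poisson(3.0, size=5).astype(float)
raw = rng.normal(size=5)
eps = 1e-6
n = len(y)
for i in range(n):
    up, down = raw.copy(), raw.copy()
    up[i] += eps
    down[i] -= eps
    fd = (poisson_deviance(y, up) - poisson_deviance(y, down)) / (2 * eps)
    assert np.isclose(-fd * n, negative_gradient(y, raw)[i], atol=1e-4)
print("gradient check passed")
```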
--
Jan Hendrik Metzen, Dr.rer.nat.
Team Leader of Team "Sustained Learning"
Universität Bremen und DFKI GmbH, Robotics Innovation Center
FB 3 - Mathematik und Informatik, AG Robotik
Robert-Hooke-Straße 1
28359 Bremen, Germany
Tel.: +49 421 178 45-4123
Zentrale: +49 421 178 45-6611
Fax: +49 421 178 45-4150
E-Mail: j...@informatik.uni-bremen.de
Homepage: http://www.informatik.uni-bremen.de/~jhm/
Weitere Informationen: http://www.informatik.uni-bremen.de/robotik