Shouldn't that be "score_samples"?
Well, it is a conditional likelihood p(y|x), not p(x) or p(x, y).
But it is the likelihood of some data given the model.


On 07/29/2015 02:58 AM, Jan Hendrik Metzen wrote:
> Such a predict_proba_at() method would also make sense for Gaussian
> process regression. Currently, computing probability densities for GPs
> requires predicting mean and standard deviation (via "MSE") at X and
> using scipy.stats.norm.pdf to compute probability densities for y for
> the predicted mean and standard deviation. I think it would be nice to
> allow this directly via the API. Thus +1 for adding a method like
> predict_proba_at().
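As a sketch of the manual route Jan describes (my illustration, not an existing sklearn API): given the GP's predictive mean mu and standard deviation sigma at a point x, the density of a candidate y is just the normal pdf, which is exactly what scipy.stats.norm.pdf(y, loc=mu, scale=sigma) computes.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y -- equivalent to
    scipy.stats.norm.pdf(y, loc=mu, scale=sigma)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical predictive mean/std that a fitted GP might return at some x:
mu, sigma = 1.5, 0.5
density = normal_pdf(2.0, mu, sigma)
```

A predict_proba_at(X, y) method would just do this evaluation internally, using the per-sample mean and standard deviation.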
>
> Jan
>
> On 29.07.2015 06:42, Mathieu Blondel wrote:
>> Regarding predictions, I don't really see what's the problem. Using
>> GLMs as an example, you just need to do
>>
>> def predict(self, X):
>>     # Poisson regression uses a log link, E[y|x] = exp(x . coef_),
>>     # so invert the link at predict time.
>>     if self.loss == "poisson":
>>         return np.exp(np.dot(X, self.coef_))
>>     else:
>>         return np.dot(X, self.coef_)
>>
>> A nice thing about Poisson regression is that we can query the
>> probability p(y|x) for a specific integer y.
>> https://en.wikipedia.org/wiki/Poisson_regression
>>
>> We need to decide on an API for that (so far we have used predict_proba
>> for classification so the output was always n_samples x n_classes).
>> How about predict_proba(X, at_y=some_integer)?
>>
>> However, this also means that we can't use predict_proba to detect
>> classifiers anymore...
>> Another solution would be to introduce a new method
>> predict_proba_at(X, y=some_integer)...
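For concreteness, a hedged sketch of what such a predict_proba_at could evaluate for Poisson regression (names here are hypothetical): the model's rate at a sample x is lam = exp(x . coef_), and the probability of a specific integer y is the Poisson pmf at that rate.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam)."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

# A hypothetical predict_proba_at(X, y=k) would evaluate this pmf at
# lam = exp(np.dot(x, coef_)) for each row x of X.
lam = math.exp(0.7)  # rate implied by some x . coef_ = 0.7
probs = [poisson_pmf(k, lam) for k in range(4)]
```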
>>
>> Mathieu
>>
>>
>> On Wed, Jul 29, 2015 at 4:19 AM, Andreas Mueller <t3k...@gmail.com> wrote:
>>
>>      I was expecting there to be the actual poisson loss implemented in
>>      the class, not just a log transform.
>>
>>
>>
>>      On 07/28/2015 02:03 PM, josef.p...@gmail.com wrote:
>>>      Just a comment from the statistics sidelines
>>>
>>>      taking the log of the target and fitting a linear or other model
>>>      doesn't make it into a Poisson model.
>>>
>>>      But maybe "Poisson loss" in machine learning is unrelated to the
>>>      Poisson distribution or a Poisson model with E(y | x) = exp(x beta)?
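A small numeric check of Josef's point (my illustration): because log is concave, the mean of a log-transformed Poisson target sits strictly below the log of its mean (Jensen's inequality), so least squares on log(y) does not estimate log E(y|x) the way a Poisson model does. For Y ~ Poisson(5), using log(Y + 1) to sidestep log(0):

```python
import math

lam = 5.0
pmf = lambda k: lam ** k * math.exp(-lam) / math.factorial(k)

# E[log(Y + 1)] under Poisson(5); mass beyond k = 60 is negligible.
e_log = sum(math.log(k + 1) * pmf(k) for k in range(60))

# Jensen's inequality: E[log(Y + 1)] < log(E[Y] + 1) = log(6)
gap = math.log(lam + 1) - e_log
```

The gap is the systematic bias a fit-the-log approach carries relative to modeling E(y|x) = exp(x beta) directly.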
>>>
>>>      Josef
>>>
>>>
>>>      On Tue, Jul 28, 2015 at 2:46 PM, Andreas Mueller
>>>      <t3k...@gmail.com> wrote:
>>>
>>>          I'd be happy with adding Poisson loss to more models, though
>>>          I think it would be more natural to first add it to GLM
>>>          before GBM ;)
>>>          If the addition is straightforward, I think it would be a
>>>          nice contribution nevertheless.
>>>          1) Requiring the user to do np.exp(gbmpoisson.predict(X)) is not
>>>          acceptable. This needs to be automatic. It would be best if
>>>          this could be done in a minimally intrusive way.
>>>
>>>          2) I'm not sure, maybe Peter can comment?
>>>
>>>          3) I would rather contribute sooner, but others might think
>>>          differently. Silently ignoring sample weights is not an
>>>          option, but you can error if they are provided.
>>>
>>>          Hth,
>>>          Andy
>>>
>>>
>>>          On 07/23/2015 08:52 PM, Peter Rickwood wrote:
>>>>          Hello sklearn developers,
>>>>
>>>>          I'd like the GBM implementation in sklearn to support
>>>>          Poisson loss, and I'm comfortable in writing the code (I
>>>>          have modified my local sklearn source already and am using
>>>>          Poisson loss GBM's).
>>>>
>>>>          The sklearn site says to get in touch via this list before
>>>>          making a contribution, so is it worth submitting
>>>>          something along these lines?
>>>>
>>>>          If the answer is yes, some quick questions:
>>>>
>>>>          1) The simplest implementation of poisson loss GBMs is to
>>>>          work in log-space (i.e. the GBM predicts log(target) rather
>>>>          than target), and require the user to then take the
>>>>          exponential of those predictions. So, you would need to do
>>>>          something like:
>>>>
>>>>              gbmpoisson = sklearn.ensemble.GradientBoostingRegressor(...)
>>>>              gbmpoisson.fit(X, y)
>>>>              preds = np.exp(gbmpoisson.predict(X))
>>>>
>>>>          I am comfortable making changes to the source for this to
>>>>          work, but I'm not comfortable changing any of the
>>>>          higher-level interface to deal automatically with the
>>>>          transform. In other words, other developers would need to
>>>>          either be OK with the GBM returning transformed predictions
>>>>          in the case where "poisson" loss is chosen, or would need to
>>>>          change code in the 'predict' function to automatically do
>>>>          the transformation if poisson loss was specified. Is this OK?
>>>>          2) If I do contribute, can you advise on the best tests
>>>>          to validate GBM loss functions before they are
>>>>          considered to 'work'?
>>>>
>>>>          3) Allowing for weighted samples is in theory easy enough to
>>>>          implement, but is not something I have implemented yet. Is
>>>>          it better to contribute code sooner that doesn't handle
>>>>          weighting (i.e. just ignores sample weights), or later that
>>>>          does?
>>>>
>>>>
>>>>
>>>>
>>>>          Cheers, and thanks for all your work on sklearn. Fantastic
>>>>          tool/library,
>>>>
>>>>
>>>>
>>>>          Peter
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>>          _______________________________________________
>>>>          Scikit-learn-general mailing list
>>>>          Scikit-learn-general@lists.sourceforge.net
>>>>          https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>
>


