Hm, I'm not entirely sure how score_samples is currently used, but I
think it is the probability under a density model.
It would "only" change the meaning insofar as it is a conditional
distribution over y given x, not over x.

I'm not totally opposed to adding a new method, though I'm not sure I 
like ``predict_proba_at``

On 07/29/2015 12:29 PM, Jan Hendrik Metzen wrote:
> I am not sure about the name; score_samples would sound a bit strange
> for a conditional probability, in my opinion. And "likelihood" is also
> misleading, since it's actually a conditional probability and not a
> conditional likelihood (the quantities on the right-hand side of the
> conditioning are fixed, and integrating over all y would give 1).
>
> On 29.07.2015 16:16, Andreas Mueller wrote:
>> Shouldn't that be "score_samples"?
>> Well, it is a conditional likelihood p(y|x), not p(x) or p(x, y).
>> But it is the likelihood of some data given the model.
>>
>>
>> On 07/29/2015 02:58 AM, Jan Hendrik Metzen wrote:
>>> Such a predict_proba_at() method would also make sense for Gaussian
>>> process regression. Currently, computing probability densities for GPs
>>> requires predicting mean and standard deviation (via "MSE") at X and
>>> using scipy.stats.norm.pdf to compute probability densities for y given
>>> the predicted mean and standard deviation. I think it would be nice to
>>> allow this directly via the API. Thus +1 for adding a method like
>>> predict_proba_at().
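The GP workflow described above, evaluating p(y|x) from a predicted mean and standard deviation, can be sketched as follows; `mu`, `sigma`, and `y` are made-up stand-ins for a fitted GP's output at one x, not values from any real model:

```python
# Sketch of the workflow described above: a GP's predictive distribution
# at a point x is Gaussian, so p(y | x) is a normal density evaluated at y.
# mu and sigma are made-up stand-ins for a fitted GP's predicted mean and
# standard deviation at one x.
import numpy as np
from scipy.stats import norm

mu, sigma = 2.0, 0.5   # hypothetical GP prediction at some x
y = 2.3                # query point

density = norm.pdf(y, loc=mu, scale=sigma)
```

A predict_proba_at(X, y) method would essentially bundle these two steps.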
>>>
>>> Jan
>>>
>>> On 29.07.2015 06:42, Mathieu Blondel wrote:
>>>> Regarding predictions, I don't really see what the problem is. Using
>>>> GLMs as an example, you just need to do
>>>>
>>>> def predict(self, X):
>>>>        if self.loss == "poisson":
>>>>            return np.exp(np.dot(X, self.coef_))
>>>>        else:
>>>>            return np.dot(X, self.coef_)
>>>>
>>>> A nice thing about Poisson regression is that we can query the
>>>> probability p(y|x) for a specific integer y.
>>>> https://en.wikipedia.org/wiki/Poisson_regression
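Querying p(y|x) for a specific integer y, as described above, might look like this; the coefficients and input are illustrative stand-ins, not a fitted model:

```python
# Sketch: p(y | x) under Poisson regression for a specific integer y.
# coef and x are illustrative stand-ins, not fitted values.
import numpy as np
from scipy.stats import poisson

coef = np.array([0.2, -0.1])
x = np.array([1.0, 2.0])

mu = np.exp(np.dot(x, coef))   # Poisson mean, E[y | x] = exp(x . coef)
p = poisson.pmf(3, mu)         # probability of observing the count y = 3
```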
>>>>
>>>> We need to decide on an API for that (so far we have used predict_proba
>>>> for classification, so the output was always n_samples x n_classes).
>>>> How about predict_proba(X, at_y=some_integer)?
>>>>
>>>> However, this also means that we can't use predict_proba to detect
>>>> classifiers anymore...
>>>> Another solution would be to introduce a new method
>>>> predict_proba_at(X, y=some_integer)...
>>>>
>>>> Mathieu
>>>>
>>>>
>>>> On Wed, Jul 29, 2015 at 4:19 AM, Andreas Mueller <t3k...@gmail.com
>>>> <mailto:t3k...@gmail.com>> wrote:
>>>>
>>>>        I was expecting there to be the actual poisson loss implemented in
>>>>        the class, not just a log transform.
>>>>
>>>>
>>>>
>>>>        On 07/28/2015 02:03 PM, josef.p...@gmail.com
>>>>        <mailto:josef.p...@gmail.com> wrote:
>>>>>        Just a comment from the statistics sidelines
>>>>>
>>>>>        Taking the log of the target and fitting a linear (or other)
>>>>>        model doesn't make it a Poisson model.
>>>>>
>>>>>        But maybe "Poisson loss" in machine learning is unrelated to the
>>>>>        Poisson distribution, or to a Poisson model with
>>>>>        E(y|x) = exp(x beta)?
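Josef's distinction can be seen even in the intercept-only case: the Poisson maximum-likelihood estimate of E(y) is the arithmetic mean, while least squares on log(y), transformed back, gives the geometric mean. The counts below are made up for illustration:

```python
import numpy as np

# Made-up positive counts (the log transform requires y > 0, which is
# itself a limitation: Poisson data can contain zeros)
y = np.array([1.0, 2.0, 3.0, 8.0, 1.0, 5.0])

poisson_mle = y.mean()              # Poisson MLE of E(y): arithmetic mean
log_ls = np.exp(np.log(y).mean())   # least squares on log(y), back-transformed

# The two estimates disagree (Jensen's inequality: geometric mean <
# arithmetic mean), so fitting log(y) is not the same as a Poisson model.
```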
>>>>>
>>>>>        Josef
>>>>>
>>>>>
>>>>>        On Tue, Jul 28, 2015 at 2:46 PM, Andreas Mueller
>>>>>        <t3k...@gmail.com <mailto:t3k...@gmail.com>> wrote:
>>>>>
>>>>>            I'd be happy with adding Poisson loss to more models, though
>>>>>            I think it would be more natural to first add it to GLMs
>>>>>            before GBMs ;)
>>>>>            If the addition is straightforward, I think it would be a
>>>>>            nice contribution nevertheless.
>>>>>            1) For the user to do np.exp(gbmpoisson.predict(X)) is not
>>>>>            acceptable. This needs to be automatic. It would be best if
>>>>>            this could be done in a minimally intrusive way.
>>>>>
>>>>>            2) I'm not sure, maybe Peter can comment?
>>>>>
>>>>>            3) I would rather contribute sooner, but others might think
>>>>>            differently. Silently ignoring sample weights is not an
>>>>>            option, but you can raise an error if they are provided.
>>>>>
>>>>>            Hth,
>>>>>            Andy
>>>>>
>>>>>
>>>>>            On 07/23/2015 08:52 PM, Peter Rickwood wrote:
>>>>>>            Hello sklearn developers,
>>>>>>
>>>>>>            I'd like the GBM implementation in sklearn to support
>>>>>>            Poisson loss, and I'm comfortable in writing the code (I
>>>>>>            have modified my local sklearn source already and am using
>>>>>>            Poisson loss GBM's).
>>>>>>
>>>>>>            The sklearn site says to get in touch via this list before
>>>>>>            making a contribution, so is it worth submitting something
>>>>>>            along these lines?
>>>>>>
>>>>>>            If the answer is yes, some quick questions:
>>>>>>
>>>>>>            1) The simplest implementation of Poisson-loss GBMs is to
>>>>>>            work in log-space (i.e. the GBM predicts log(target) rather
>>>>>>            than target), and require the user to then take the
>>>>>>            exponential of those predictions. So, you would need to do
>>>>>>            something like:
>>>>>>
>>>>>>                gbmpoisson = sklearn.ensemble.GradientBoostingRegressor(...)
>>>>>>                gbmpoisson.fit(X, y)
>>>>>>                preds = np.exp(gbmpoisson.predict(X))
>>>>>>
>>>>>>            I am comfortable making changes to the source for this to
>>>>>>            work, but I'm not comfortable changing any of the
>>>>>>            higher-level interface to deal automatically with the
>>>>>>            transform. In other words, other developers would need to
>>>>>>            either be OK with the GBM returning transformed predictions
>>>>>>            in the case where "poisson" loss is chosen, or would need to
>>>>>>            change code in the 'predict' function to automatically do
>>>>>>            the transformation if Poisson loss was specified. Is this OK?
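Working in log-space as described, the loss the GBM would minimize can be sketched as the Poisson negative log-likelihood with log-link predictions, dropping the constant log(y!) term. The function names and sample values below are illustrative, not sklearn internals:

```python
import numpy as np

def poisson_loss(y, f):
    # Negative Poisson log-likelihood with log-link predictions f = log(mu),
    # up to the constant log(y!) term: mean(exp(f) - y * f)
    return np.mean(np.exp(f) - y * f)

def negative_gradient(y, f):
    # Pseudo-residuals a gradient-boosting stage would be fit to
    return y - np.exp(f)

# Illustrative targets and current log-scale predictions
y = np.array([0.0, 2.0, 5.0])
f = np.log(np.array([1.0, 2.0, 4.0]))
```

Note that the negative gradient y - exp(f) lives on the original count scale even though f is in log-space, which is what makes the log-space formulation convenient for boosting.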
>>>>>>            2) If I do contribute, can you advise what the best tests
>>>>>>            are to test/validate GBM loss functions before they are
>>>>>>            considered to 'work'?
>>>>>>
>>>>>>            3) Allowing for weighted samples is in theory easy enough to
>>>>>>            implement, but is not something I have implemented yet. Is
>>>>>>            it better to contribute code sooner that doesn't handle
>>>>>>            weighting (i.e. just ignores sample weights), or later that
>>>>>>            does?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>            Cheers, and thanks for all your work on sklearn. Fantastic
>>>>>>            tool/library,
>>>>>>
>>>>>>
>>>>>>
>>>>>>            Peter
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> _______________________________________________
>>>>>> Scikit-learn-general mailing list
>>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general