Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

Michael Eickenberg Sat, 03 Oct 2015 23:13:06 -0700

Hi George,

completely agreed that np.unique on continuous targets is messy - I have
run into the same problem.


If I remember correctly, you can work around this by using sample_weight to
inject the continuous target into the cross entropy loss:

If p_i are the targets, duplicate each sample: give the original label 1
with sample weight p_i, and give the duplicate label 0 with sample weight
1 - p_i.
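
A minimal sketch of that workaround, in case it helps. The helper names
(expand_soft_targets, soft_cross_entropy, weighted_log_loss,
fit_on_probabilities) are made up for illustration and are not part of
scikit-learn; the only scikit-learn calls assumed are LogisticRegression
and the sample_weight argument of its fit method. The two loss functions
are only there to show why the trick works: the weighted log loss on the
duplicated data equals the cross entropy against the soft targets.

```python
import math


def expand_soft_targets(X, p):
    """Duplicate every row: one copy labelled 1 with weight p_i,
    one copy labelled 0 with weight 1 - p_i."""
    rows = [list(row) for row in X]
    probs = [float(p_i) for p_i in p]
    if any(p_i < 0.0 or p_i > 1.0 for p_i in probs):
        raise ValueError("targets must lie in [0, 1]")
    n = len(rows)
    X_dup = rows + rows
    y_dup = [1] * n + [0] * n
    w_dup = probs + [1.0 - p_i for p_i in probs]
    return X_dup, y_dup, w_dup


def soft_cross_entropy(p, q):
    """Cross entropy between soft targets p and predicted probabilities q."""
    return -sum(p_i * math.log(q_i) + (1.0 - p_i) * math.log(1.0 - q_i)
                for p_i, q_i in zip(p, q))


def weighted_log_loss(y, w, q):
    """Sample-weighted log loss for hard 0/1 labels y."""
    return -sum(w_i * (math.log(q_i) if y_i == 1 else math.log(1.0 - q_i))
                for y_i, w_i, q_i in zip(y, w, q))


def fit_on_probabilities(X, p, **kwargs):
    """Fit LogisticRegression on the expanded data (hypothetical helper)."""
    # Imported lazily so the helpers above work without scikit-learn.
    from sklearn.linear_model import LogisticRegression
    X_dup, y_dup, w_dup = expand_soft_targets(X, p)
    clf = LogisticRegression(**kwargs)
    clf.fit(X_dup, y_dup, sample_weight=w_dup)
    return clf
```

Usage would be something like clf = fit_on_probabilities(X, p, C=1.0),
followed by clf.predict_proba(X_new)[:, 1] to get the predicted
probability for each row. Since the two weights of each original sample
sum to 1, the total weight equals the original number of samples, so the
regularization strength C should keep roughly its usual meaning.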

There is a Stack Overflow comment or answer by larsmans about this,
but I can't find it right now.

Hope this helps!
Michael

On Sunday, October 4, 2015, <josef.p...@gmail.com> wrote:

>
>
> On Sat, Oct 3, 2015 at 11:54 PM, George Bezerra <gbeze...@gmail.com> wrote:
>
>> Thanks a lot Josef. I guess it is possible to do what I wanted, though
>> maybe not in scikit. Does the statsmodels version allow l1 or l2
>> regularization? I'm planning to use a lot of features and let the model
>> decide what is good.
>>
>>
> statsmodels has had L1 regularization for discrete models including Logit
> for a while. But I don't have much experience with it, and it uses an
> interior point algorithm.
> Elastic net for maximum likelihood models using coordinate descent, and
> other penalized maximum likelihood methods like SCAD and structured L2, are
> in PRs and will be merged over the next months.
>
> statsmodels, in contrast to scikit-learn, doesn't have much support for
> large sparse features.
>
> Josef
>
>
>
>> Thanks again.
>>
>> On Sat, Oct 3, 2015 at 11:20 PM, <josef.p...@gmail.com> wrote:
>>
>>> Just to come in here as an econometrician and statsmodels maintainer.
>>>
>>> statsmodels intentionally doesn't enforce binary data for Logit or
>>> similar models, any data between 0 and 1 is fine.
>>>
>>> Logistic Regression/Logit or similar Binomial/Bernoulli models can
>>> consistently estimate the expected value (predicted mean) for a continuous
>>> variable that is between 0 and 1 like a proportion. (Binomial belongs to
>>> the exponential family where quasi-maximum likelihood method works well.)
>>> Inference has to be adjusted because a logit model cannot be "true" if
>>> the data is not binary.
>>>
>>> I have references and examples for this use case somewhere.
>>>
>>> statsmodels doesn't do "classification", i.e. hard thresholding, users
>>> can do it themselves if they need to.
>>> Which means we leave classification to scikit-learn and only do
>>> regression, even for funny data, and statsmodels doesn't have methods that
>>> take advantage of the classification structure of a model.
>>>
>>> Josef
>>>
>>>
>>> On Sat, Oct 3, 2015 at 10:50 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>>>
>>>> Hi, George,
>>>> logistic regression is a binary classifier by nature (class labels 0
>>>> and 1). Scikit-learn supports multi-class classification via One-vs-One or
>>>> One-vs-All though; and there is a generalization (softmax) that gives you
>>>> meaningful probabilities for multiple classes (i.e., class probabilities
>>>> sum up to 1). In any case, logistic regression works with nominal class
>>>> labels - categorical class labels with no order implied.
>>>>
>>>> To keep a long story short: Logistic regression is a classifier, not a
>>>> regressor — the name is misleading, I agree. I think you may want to look
>>>> into regression analysis for your continuous target variable.
>>>>
>>>> Best,
>>>> Sebastian
>>>>
>>>> > On Oct 3, 2015, at 9:58 PM, George Bezerra <gbeze...@gmail.com> wrote:
>>>> >
>>>> > Hi there,
>>>> >
>>>> > I would like to train a logistic regression model on a continuous
>>>> > (i.e., not categorical) target variable. The target is a probability,
>>>> > which is why I am using a logistic regression for this problem. However,
>>>> > the sklearn function tries to find the class labels by running a
>>>> > unique() on the target values, which is disastrous if y is continuous.
>>>> >
>>>> > Is there a way to train logistic regression on a continuous target
>>>> > variable in sklearn?
>>>> >
>>>> > Any help is highly appreciated.
>>>> >
>>>> > Best,
>>>> >
>>>> > George.
>>>> >
>>>> > --
>>>> > George Bezerra
>>>> >
>>>> ------------------------------------------------------------------------------
>>>> > _______________________________________________
>>>> > Scikit-learn-general mailing list
>>>> > Scikit-learn-general@lists.sourceforge.net
>>>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> George Bezerra
>>
>>
>>
>>
>

