Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

Mathieu Blondel Sun, 04 Oct 2015 01:21:51 -0700

I've seen logistic regression used in a regression setting in a few papers
as well. A nice thing is that the predictions are mapped to [0, 1].


The correct way to add this to scikit-learn would be to add a regression
class `LogisticRegressor` and rename the existing class to
`LogisticClassifier`. The np.unique check would be only in the classifier.

We can also add it to SGDRegressor.
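
For illustration only, here is a minimal sketch of what such a regressor could look like. Nothing below exists in scikit-learn: the class name follows the proposal above, while the `alpha` parameter, the L2 penalty, and the use of `scipy.optimize.minimize` are assumptions made for the sketch. It minimizes the cross-entropy directly on continuous targets in [0, 1], so no np.unique check is involved.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


class LogisticRegressor:
    """Hypothetical regressor: sigmoid mean, cross-entropy loss, L2 penalty."""

    def __init__(self, alpha=1e-3):
        self.alpha = alpha  # L2 penalty strength (assumed parameter name)

    def _loss_and_grad(self, w, X, y):
        z = X @ w[:-1] + w[-1]  # last entry of w is the intercept
        # Cross-entropy with soft targets: mean(log(1 + exp(z)) - y * z)
        loss = np.mean(np.logaddexp(0.0, z) - y * z)
        loss += 0.5 * self.alpha * np.dot(w[:-1], w[:-1])
        resid = expit(z) - y  # derivative of the loss with respect to z
        grad = np.append(X.T @ resid / len(y) + self.alpha * w[:-1], resid.mean())
        return loss, grad

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        if y.min() < 0 or y.max() > 1:
            raise ValueError("targets must lie in [0, 1]")
        w0 = np.zeros(X.shape[1] + 1)
        res = minimize(self._loss_and_grad, w0, args=(X, y), jac=True,
                       method="L-BFGS-B")
        self.coef_, self.intercept_ = res.x[:-1], res.x[-1]
        return self

    def predict(self, X):
        return expit(np.asarray(X, dtype=float) @ self.coef_ + self.intercept_)


# Synthetic, noise-free continuous targets in (0, 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = expit(X @ true_w + 0.3)

model = LogisticRegressor(alpha=1e-6).fit(X, y)
pred = model.predict(X)
```

The predictions come out of the sigmoid, so they stay within the unit interval by construction.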

Mathieu

On Sun, Oct 4, 2015 at 3:11 PM, Michael Eickenberg <
michael.eickenb...@gmail.com> wrote:

> Hi George,
>
> completely agreed that np.unique on continuous targets is messy - I have
> run into the same problem.
>
> If I remember correctly, you can work around this by using sample_weight
> to inject the continuous target into the cross entropy loss:
>
> If p_i are the targets, duplicate each sample: give the original label 1
> with sample weight p_i, and give the duplicate label 0 with sample weight
> 1 - p_i.
>
> There is a Stack Overflow comment or answer by larsmans about this, but I
> can't find it right now.
>
> Hope this helps!
> Michael
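
The sample_weight workaround described just above can be sketched as follows. This is a rough illustration, not a tested recipe: the synthetic data, the variable names, and the choice of `C=1e4` (to make the default L2 penalty nearly negligible) are all assumptions, and it relies on a scikit-learn version whose `LogisticRegression.fit` accepts `sample_weight`. The weighted log loss over the duplicated samples equals the cross-entropy against the continuous targets p_i.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: p holds continuous targets in (0, 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
p = 1.0 / (1.0 + np.exp(-(X @ np.array([2.0, -1.0]) + 0.5)))

# Stack every sample twice: once labelled 1 (weight p_i),
# once labelled 0 (weight 1 - p_i).
X_dup = np.vstack([X, X])
y_dup = np.concatenate([np.ones(len(p)), np.zeros(len(p))])
w_dup = np.concatenate([p, 1.0 - p])

# Large C means a weak L2 penalty, so the fit is close to unpenalized.
clf = LogisticRegression(C=1e4, max_iter=1000)
clf.fit(X_dup, y_dup, sample_weight=w_dup)

# Column 1 of predict_proba is the probability of label 1.
p_hat = clf.predict_proba(X)[:, 1]
```

Because the labels passed to the classifier are only 0 and 1, the np.unique check sees two classes regardless of how many distinct values p takes.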
>
>
> On Sunday, October 4, 2015, <josef.p...@gmail.com> wrote:
>
>>
>>
>> On Sat, Oct 3, 2015 at 11:54 PM, George Bezerra <gbeze...@gmail.com>
>> wrote:
>>
>>> Thanks a lot Josef. I guess it is possible to do what I wanted, though
>>> maybe not in scikit. Does the statsmodels version allow l1 or l2
>>> regularization? I'm planning to use a lot of features and let the model
>>> decide what is good.
>>>
>>>
>> statsmodels has had L1 regularization for discrete models including Logit
>> for a while. But I don't have much experience with it, and it uses an
>> interior point algorithm.
>> Elastic net for maximum likelihood models using coordinate descent, and
>> other penalized maximum likelihood methods such as SCAD and structured L2,
>> are in PRs and will be merged over the next few months.
>>
>> statsmodels, in contrast to scikit-learn, doesn't have much support for
>> large sparse features.
>>
>> Josef
>>
>>
>>
>>> Thanks again.
>>>
>>> On Sat, Oct 3, 2015 at 11:20 PM, <josef.p...@gmail.com> wrote:
>>>
>>>> Just to come in here as an econometrician and statsmodels maintainer.
>>>>
>>>> statsmodels intentionally doesn't enforce binary data for Logit or
>>>> similar models, any data between 0 and 1 is fine.
>>>>
>>>> Logistic Regression/Logit or similar Binomial/Bernoulli models can
>>>> consistently estimate the expected value (predicted mean) for a continuous
>>>> variable that is between 0 and 1 like a proportion. (Binomial belongs to
>>>> the exponential family where quasi-maximum likelihood method works well.)
>>>> Inference has to be adjusted because a logit model cannot be "true" if
>>>> the data is not binary.
>>>>
>>>> I have references and examples for this use case somewhere.
>>>>
>>>> statsmodels doesn't do "classification", i.e. hard thresholding; users
>>>> can do that themselves if they need to.
>>>> That means we leave classification to scikit-learn and only do
>>>> regression, even for funny data, and statsmodels doesn't have methods that
>>>> take advantage of the classification structure of a model.
>>>>
>>>> Josef
>>>>
>>>>
>>>> On Sat, Oct 3, 2015 at 10:50 PM, Sebastian Raschka <
>>>> se.rasc...@gmail.com> wrote:
>>>>
>>>>> Hi, George,
>>>>> logistic regression is a binary classifier by nature (class labels 0
>>>>> and 1). Scikit-learn supports multi-class classification via One-vs-One or
>>>>> One-vs-All though; and there is a generalization (softmax) that gives you
>>>>> meaningful probabilities for multiple classes (i.e., class probabilities
>>>>> sum up to 1). In any case, logistic regression works with nominal class
>>>>> labels - categorical class labels with no order implied.
>>>>>
>>>>> To keep a long story short: Logistic regression is a classifier, not a
>>>>> regressor — the name is misleading, I agree. I think you may want to look
>>>>> into regression analysis for your continuous target variable.
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> > On Oct 3, 2015, at 9:58 PM, George Bezerra <gbeze...@gmail.com> wrote:
>>>>> >
>>>>> > Hi there,
>>>>> >
>>>>> > I would like to train a logistic regression model on a continuous
>>>>> > (i.e., not categorical) target variable. The target is a probability,
>>>>> > which is why I am using a logistic regression for this problem. However,
>>>>> > the sklearn function tries to find the class labels by running a unique()
>>>>> > on the target values, which is disastrous if y is continuous.
>>>>> >
>>>>> > Is there a way to train logistic regression on a continuous target
>>>>> > variable in sklearn?
>>>>> >
>>>>> > Any help is highly appreciated.
>>>>> >
>>>>> > Best,
>>>>> >
>>>>> > George.
>>>>> >
>>>>> > --
>>>>> > George Bezerra
>>>>> >
>>>>> ------------------------------------------------------------------------------
>>>>> > _______________________________________________
>>>>> > Scikit-learn-general mailing list
>>>>> > Scikit-learn-general@lists.sourceforge.net
>>>>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>
>>>>
>>>
>>>
>>> --
>>> George Bezerra
>>>
>>
>

