Just chiming in here as an econometrician and statsmodels maintainer.

statsmodels intentionally doesn't enforce binary data for Logit or similar
models; any data between 0 and 1 is fine.

Logistic regression (Logit) and similar Binomial/Bernoulli models can
consistently estimate the expected value (predicted mean) of a continuous
variable that lies between 0 and 1, such as a proportion, as long as the
mean function is correctly specified. (Binomial belongs to the exponential
family, where the quasi-maximum likelihood method works well.)
Inference has to be adjusted, for example with robust (sandwich) standard
errors, because a logit model cannot be "true" if the data is not binary.

I have references and examples for this use case somewhere.

statsmodels doesn't do "classification", i.e. hard thresholding; users can
do that themselves if they need to.
In other words, we leave classification to scikit-learn and only do
regression, even for funny data, and statsmodels doesn't have methods that
take advantage of the classification structure of a model.
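
Hard thresholding by hand is short. A sketch, assuming `predicted_mean`
holds the output of some fitted model's predict call (the values here are
made up) and that 0.5 is an acceptable cutoff:

```python
import numpy as np

# Hypothetical predicted means, e.g. from a fitted model's predict().
predicted_mean = np.array([0.12, 0.50, 0.73, 0.49, 0.91])

threshold = 0.5  # assumed cutoff; pick one that suits the application
labels = (predicted_mean >= threshold).astype(int)
print(labels)    # [0 1 1 0 1]
```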

Josef


On Sat, Oct 3, 2015 at 10:50 PM, Sebastian Raschka <se.rasc...@gmail.com>
wrote:

> Hi, George,
> logistic regression is a binary classifier by nature (class labels 0 and
> 1). Scikit-learn supports multi-class classification via One-vs-One or
> One-vs-All though; and there is a generalization (softmax) that gives you
> meaningful probabilities for multiple classes (i.e., class probabilities
> sum up to 1). In any case, logistic regression works with nominal class
> labels - categorical class labels with no order implied.
>
> To keep a long story short: Logistic regression is a classifier, not a
> regressor — the name is misleading, I agree. I think you may want to look
> into regression analysis for your continuous target variable.
>
> Best,
> Sebastian
>
> > On Oct 3, 2015, at 9:58 PM, George Bezerra <gbeze...@gmail.com> wrote:
> >
> > Hi there,
> >
> > I would like to train a logistic regression model on a continuous
> > (i.e., not categorical) target variable. The target is a probability,
> > which is why I am using a logistic regression for this problem.
> > However, the sklearn function tries to find the class labels by
> > running a unique() on the target values, which is disastrous if y is
> > continuous.
> >
> > Is there a way to train logistic regression on a continuous target
> > variable in sklearn?
> >
> > Any help is highly appreciated.
> >
> > Best,
> >
> > George.
> >
> > --
> > George Bezerra
> >
> ------------------------------------------------------------------------------
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general