Just to chime in here as an econometrician and statsmodels maintainer: statsmodels intentionally doesn't enforce binary data for Logit or similar models; any data between 0 and 1 is fine.
Logistic regression (Logit) and similar Binomial/Bernoulli models can consistently estimate the expected value (predicted mean) of a continuous variable that lies between 0 and 1, such as a proportion. (Binomial belongs to the exponential family, where the quasi-maximum likelihood method works well.) Inference has to be adjusted (for example, by using robust standard errors) because a logit model cannot be "true" if the data is not binary.

I have references and examples for this use case somewhere.

statsmodels doesn't do "classification", i.e. hard thresholding; users can do that themselves if they need to. That means we leave classification to scikit-learn and only do regression, even for funny data, and statsmodels doesn't have methods that take advantage of the classification structure of a model.

Josef

On Sat, Oct 3, 2015 at 10:50 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:

> Hi, George,
> logistic regression is a binary classifier by nature (class labels 0 and
> 1). Scikit-learn supports multi-class classification via One-vs-One or
> One-vs-All though; and there is a generalization (softmax) that gives you
> meaningful probabilities for multiple classes (i.e., class probabilities
> sum up to 1). In any case, logistic regression works with nominal class
> labels - categorical class labels with no order implied.
>
> To keep a long story short: Logistic regression is a classifier, not a
> regressor; the name is misleading, I agree. I think you may want to look
> into regression analysis for your continuous target variable.
>
> Best,
> Sebastian
>
> > On Oct 3, 2015, at 9:58 PM, George Bezerra <gbeze...@gmail.com> wrote:
> >
> > Hi there,
> >
> > I would like to train a logistic regression model on a continuous (i.e.,
> > not categorical) target variable. The target is a probability, which is
> > why I am using a logistic regression for this problem. However, the
> > sklearn function tries to find the class labels by running a unique() on
> > the target values, which is disastrous if y is continuous.
> >
> > Is there a way to train logistic regression on a continuous target
> > variable in sklearn?
> >
> > Any help is highly appreciated.
> >
> > Best,
> >
> > George.
> >
> > --
> > George Bezerra
> > ------------------------------------------------------------------------------
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general