Thanks a lot, Josef. I guess it is possible to do what I wanted, though maybe not in scikit-learn. Does the statsmodels version allow L1 or L2 regularization? I'm planning to use a lot of features and let the model decide which ones are useful.
Thanks again.

On Sat, Oct 3, 2015 at 11:20 PM, <josef.p...@gmail.com> wrote:

> Just to come in here as an econometrician and statsmodels maintainer.
>
> statsmodels intentionally doesn't enforce binary data for Logit or similar
> models, any data between 0 and 1 is fine.
>
> Logistic Regression/Logit or similar Binomial/Bernoulli models can
> consistently estimate the expected value (predicted mean) for a continuous
> variable that is between 0 and 1 like a proportion. (Binomial belongs to
> the exponential family where quasi-maximum likelihood method works well.)
> Inference has to be adjusted because a logit model cannot be "true" if the
> data is not binary.
>
> I have somewhere references and examples for this usecase.
>
> statsmodels doesn't do "classification", i.e. hard thresholding, users can
> do it themselves if they need to.
> Which means we leave classification to scikit-learn and only do
> regression, even for funny data, and statsmodels doesn't have methods that
> take advantage of the classification structure of a model.
>
> Josef
>
> On Sat, Oct 3, 2015 at 10:50 PM, Sebastian Raschka <se.rasc...@gmail.com>
> wrote:
>
>> Hi, George,
>> logistic regression is a binary classifier by nature (class labels 0 and
>> 1). Scikit-learn supports multi-class classification via One-vs-One or
>> One-vs-All though; and there is a generalization (softmax) that gives you
>> meaningful probabilities for multiple classes (i.e., class probabilities
>> sum up to 1). In any case, logistic regression works with nominal class
>> labels - categorical class labels with no order implied.
>>
>> To keep a long story short: Logistic regression is a classifier, not a
>> regressor - the name is misleading, I agree. I think you may want to look
>> into regression analysis for your continuous target variable.
>>
>> Best,
>> Sebastian
>>
>> > On Oct 3, 2015, at 9:58 PM, George Bezerra <gbeze...@gmail.com> wrote:
>> >
>> > Hi there,
>> >
>> > I would like to train a logistic regression model on a continuous
>> > (i.e., not categorical) target variable. The target is a probability,
>> > which is why I am using a logistic regression for this problem. However,
>> > the sklearn function tries to find the class labels by running a
>> > unique() on the target values, which is disastrous if y is continuous.
>> >
>> > Is there a way to train logistic regression on a continuous target
>> > variable in sklearn?
>> >
>> > Any help is highly appreciated.
>> >
>> > Best,
>> >
>> > George.
>> >
>> > --
>> > George Bezerra
>> >
>> > ------------------------------------------------------------------------------
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > Scikit-learn-general@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

--
George Bezerra
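
For what it's worth, here is a minimal sketch of the idea discussed in this thread: the Bernoulli log-likelihood is still well defined when the target is a proportion in [0, 1], so it can be maximized directly, and an L2 penalty can be added to the same objective. This is plain Python written only for illustration; it is not the statsmodels or scikit-learn implementation, and the names `fit_fractional_logit`, `predict`, `l2`, `lr` and `n_iter` as well as the toy data are assumptions made up for this example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_fractional_logit(X, y, l2=0.0, lr=0.5, n_iter=5000):
    """Fit a logit mean model to targets in [0, 1] by gradient descent.

    Minimizes mean cross-entropy + (l2 / 2) * sum(w_j ** 2).
    X is a list of feature rows (without an intercept column), y a list
    of proportions. The intercept is not penalized.
    Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual of the predicted mean; same form as for binary y.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return b, w


def predict(X, b, w):
    # Predicted means (no hard thresholding into classes).
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


# Toy data (hypothetical): noise-free proportions from a known logistic curve.
X = [[x / 10.0] for x in range(-20, 21)]
y = [sigmoid(0.5 + 1.5 * xi[0]) for xi in X]

b, w = fit_fractional_logit(X, y)               # unpenalized fit
b_pen, w_pen = fit_fractional_logit(X, y, l2=0.1)  # L2-penalized fit

print("unpenalized:", b, w)
print("penalized:  ", b_pen, w_pen)
```

On this toy data the unpenalized fit should recover the generating coefficients, and the penalized slope should be shrunk toward zero. The sketch only produces point estimates; as noted in the thread, inference (standard errors) has to be adjusted when the data is not binary, and that is not shown here. For the regularization question, statsmodels has a `fit_regularized` method on `Logit`, which as far as I know supports an L1 penalty, but please check the statsmodels documentation for the details.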