I've seen logistic regression used in a regression setting in a few papers as well. A nice thing is that the predictions are mapped to [0, 1].
The correct way to add this to scikit-learn would be to add a regression class `LogisticRegressor` and rename the existing class to `LogisticClassifier`. The np.unique check would then live only in the classifier. We can also add it to SGDRegressor.

Mathieu

On Sun, Oct 4, 2015 at 3:11 PM, Michael Eickenberg <michael.eickenb...@gmail.com> wrote:

> Hi George,
>
> completely agreed that np.unique on continuous targets is messy - I have
> run into the same problem.
>
> If I remember correctly, you can work around this by using sample_weight
> to inject the continuous target into the cross entropy loss:
>
> If p_i are the targets, then duplicate each sample, give it label 1 and
> p_i as sample weight and in the duplicate give it label 0 and 1-p_i as
> sample weight.
>
> There is a stackoverflow comment or answer by larsmans pertaining to this,
> but I can't find it right now.
>
> Hope this helps!
> Michael
>
> On Sunday, October 4, 2015, <josef.p...@gmail.com> wrote:
>
>> On Sat, Oct 3, 2015 at 11:54 PM, George Bezerra <gbeze...@gmail.com>
>> wrote:
>>
>>> Thanks a lot Josef. I guess it is possible to do what I wanted, though
>>> maybe not in scikit. Does the statsmodels version allow l1 or l2
>>> regularization? I'm planning to use a lot of features and let the model
>>> decide what is good.
>>
>> statsmodels has had L1 regularization for discrete models including Logit
>> for a while. But I don't have much experience with it, and it uses an
>> interior point algorithm.
>> Elastic net for maximum likelihood models using coordinate descent and
>> other penalized maximum likelihood methods like SCAD and structured L2 are
>> in PRs and will be merged over the next months.
>>
>> statsmodels, in contrast to scikit-learn, doesn't have much support for
>> large sparse features.
>>
>> Josef
>>
>>> Thanks again.
>>>
>>> On Sat, Oct 3, 2015 at 11:20 PM, <josef.p...@gmail.com> wrote:
>>>
>>>> Just to come in here as an econometrician and statsmodels maintainer.
>>>>
>>>> statsmodels intentionally doesn't enforce binary data for Logit or
>>>> similar models, any data between 0 and 1 is fine.
>>>>
>>>> Logistic Regression/Logit or similar Binomial/Bernoulli models can
>>>> consistently estimate the expected value (predicted mean) for a continuous
>>>> variable that is between 0 and 1 like a proportion. (Binomial belongs to
>>>> the exponential family where quasi-maximum likelihood method works well.)
>>>> Inference has to be adjusted because a logit model cannot be "true" if
>>>> the data is not binary.
>>>>
>>>> I have references and examples for this use case somewhere.
>>>>
>>>> statsmodels doesn't do "classification", i.e. hard thresholding, users
>>>> can do it themselves if they need to.
>>>> Which means we leave classification to scikit-learn and only do
>>>> regression, even for funny data, and statsmodels doesn't have methods that
>>>> take advantage of the classification structure of a model.
>>>>
>>>> Josef
>>>>
>>>> On Sat, Oct 3, 2015 at 10:50 PM, Sebastian Raschka <
>>>> se.rasc...@gmail.com> wrote:
>>>>
>>>>> Hi, George,
>>>>> logistic regression is a binary classifier by nature (class labels 0
>>>>> and 1). Scikit-learn supports multi-class classification via One-vs-One or
>>>>> One-vs-All though; and there is a generalization (softmax) that gives you
>>>>> meaningful probabilities for multiple classes (i.e., class probabilities
>>>>> sum up to 1). In any case, logistic regression works with nominal class
>>>>> labels - categorical class labels with no order implied.
>>>>>
>>>>> To keep a long story short: Logistic regression is a classifier, not a
>>>>> regressor; the name is misleading, I agree. I think you may want to look
>>>>> into regression analysis for your continuous target variable.
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> > On Oct 3, 2015, at 9:58 PM, George Bezerra <gbeze...@gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> > Hi there,
>>>>> >
>>>>> > I would like to train a logistic regression model on a continuous
>>>>> > (i.e., not categorical) target variable. The target is a probability,
>>>>> > which is why I am using a logistic regression for this problem.
>>>>> > However, the sklearn function tries to find the class labels by
>>>>> > running a unique() on the target values, which is disastrous if y is
>>>>> > continuous.
>>>>> >
>>>>> > Is there a way to train logistic regression on a continuous target
>>>>> > variable in sklearn?
>>>>> >
>>>>> > Any help is highly appreciated.
>>>>> >
>>>>> > Best,
>>>>> >
>>>>> > George.
>>>>> >
>>>>> > --
>>>>> > George Bezerra
>>>
>>> --
>>> George Bezerra
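
For reference, the sample_weight workaround Michael describes in the quoted thread could look roughly like the sketch below. This is only an illustration, not an existing scikit-learn feature: the helper name `expand_soft_targets` and the synthetic data are made up here, and a large `C` is assumed so that the default L2 penalty has little effect.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def expand_soft_targets(X, p):
    """Hypothetical helper: duplicate every sample, giving one copy
    label 1 with weight p_i and the other label 0 with weight 1 - p_i."""
    X = np.asarray(X, dtype=float)
    p = np.asarray(p, dtype=float)
    n = len(p)
    X_dup = np.vstack([X, X])
    y_dup = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    w_dup = np.concatenate([p, 1.0 - p])
    return X_dup, y_dup, w_dup


# Synthetic data, made up for illustration: targets are probabilities in (0, 1).
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
true_coef = np.array([1.5, -2.0, 0.5])
p = 1.0 / (1.0 + np.exp(-(X @ true_coef)))

X_dup, y_dup, w_dup = expand_soft_targets(X, p)

# The weighted log loss on the duplicated data equals the cross entropy
# between the continuous targets p_i and the predicted probabilities.
clf = LogisticRegression(C=1e6, max_iter=1000)
clf.fit(X_dup, y_dup, sample_weight=w_dup)

# Column 1 of predict_proba is P(y = 1 | x), i.e. the continuous prediction.
p_hat = clf.predict_proba(X)[:, 1]
```

The np.unique check only ever sees the labels 0 and 1 here, so the classifier accepts the data, while the continuous targets enter through the weights. The price is a training set twice as large.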
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general