Hi George, completely agreed that np.unique on continuous targets is messy - I have run into the same problem.
If I remember correctly, you can work around this by using sample_weight to inject the continuous target into the cross-entropy loss. If p_i are the targets, duplicate each sample: give one copy label 1 with sample weight p_i, and give the other copy label 0 with sample weight 1 - p_i. The weighted log loss of each pair is then the cross entropy against p_i.

There is a Stack Overflow comment or answer by larsmans pertaining to this, but I can't find it right now.

Hope this helps!
Michael

On Sunday, October 4, 2015, <josef.p...@gmail.com> wrote:

> On Sat, Oct 3, 2015 at 11:54 PM, George Bezerra <gbeze...@gmail.com> wrote:
>
>> Thanks a lot Josef. I guess it is possible to do what I wanted, though
>> maybe not in scikit. Does the statsmodels version allow l1 or l2
>> regularization? I'm planning to use a lot of features and let the model
>> decide what is good.
>
> statsmodels has had L1 regularization for discrete models including Logit
> for a while. But I don't have much experience with it, and it uses an
> interior point algorithm.
> Elastic net for maximum likelihood models using coordinate descent and
> other penalized maximum likelihood methods like SCAD and structured L2 are
> in PRs and will be merged over the next months.
>
> statsmodels, in contrast to scikit-learn, doesn't have much support for
> large sparse features.
>
> Josef
>
>> Thanks again.
>>
>> On Sat, Oct 3, 2015 at 11:20 PM, <josef.p...@gmail.com> wrote:
>>
>>> Just to come in here as an econometrician and statsmodels maintainer.
>>>
>>> statsmodels intentionally doesn't enforce binary data for Logit or
>>> similar models, any data between 0 and 1 is fine.
>>>
>>> Logistic Regression/Logit or similar Binomial/Bernoulli models can
>>> consistently estimate the expected value (predicted mean) for a continuous
>>> variable that is between 0 and 1 like a proportion. (Binomial belongs to
>>> the exponential family where quasi-maximum likelihood method works well.)
>>> Inference has to be adjusted because a logit model cannot be "true" if
>>> the data is not binary.
>>>
>>> I have somewhere references and examples for this use case.
>>>
>>> statsmodels doesn't do "classification", i.e. hard thresholding, users
>>> can do it themselves if they need to.
>>> Which means we leave classification to scikit-learn and only do
>>> regression, even for funny data, and statsmodels doesn't have methods that
>>> take advantage of the classification structure of a model.
>>>
>>> Josef
>>>
>>> On Sat, Oct 3, 2015 at 10:50 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>>>
>>>> Hi, George,
>>>> logistic regression is a binary classifier by nature (class labels 0
>>>> and 1). Scikit-learn supports multi-class classification via One-vs-One or
>>>> One-vs-All though; and there is a generalization (softmax) that gives you
>>>> meaningful probabilities for multiple classes (i.e., class probabilities
>>>> sum up to 1). In any case, logistic regression works with nominal class
>>>> labels - categorical class labels with no order implied.
>>>>
>>>> To keep a long story short: Logistic regression is a classifier, not a
>>>> regressor; the name is misleading, I agree. I think you may want to look
>>>> into regression analysis for your continuous target variable.
>>>>
>>>> Best,
>>>> Sebastian
>>>>
>>>> > On Oct 3, 2015, at 9:58 PM, George Bezerra <gbeze...@gmail.com> wrote:
>>>> >
>>>> > Hi there,
>>>> >
>>>> > I would like to train a logistic regression model on a continuous
>>>> > (i.e., not categorical) target variable. The target is a probability, which
>>>> > is why I am using a logistic regression for this problem. However, the
>>>> > sklearn function tries to find the class labels by running a unique() on
>>>> > the target values, which is disastrous if y is continuous.
>>>> >
>>>> > Is there a way to train logistic regression on a continuous target
>>>> > variable in sklearn?
>>>> >
>>>> > Any help is highly appreciated.
>>>> >
>>>> > Best,
>>>> >
>>>> > George.
>>>> >
>>>> > --
>>>> > George Bezerra
>>
>> --
>> George Bezerra
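
P.S. A minimal sketch of the sample_weight workaround described at the top of this message, in case it is useful. It assumes scikit-learn's LogisticRegression, whose fit method accepts a sample_weight argument. The helper name fit_logistic_on_probabilities and the synthetic data are made up for illustration, and I have not checked this against the Stack Overflow answer mentioned above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_logistic_on_probabilities(X, p, **kwargs):
    """Fit LogisticRegression to targets p in [0, 1] via sample weights.

    Hypothetical helper: not part of scikit-learn.
    """
    X = np.asarray(X, dtype=float)
    p = np.asarray(p, dtype=float)
    n = X.shape[0]
    # Stack two copies of every sample: the first copy is labelled 1,
    # the second copy is labelled 0.
    X_dup = np.vstack([X, X])
    y_dup = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    # Weight the positive copy by p_i and the negative copy by 1 - p_i, so the
    # weighted log loss matches the cross entropy against the soft target p_i.
    w_dup = np.concatenate([p, 1.0 - p])
    clf = LogisticRegression(**kwargs)
    clf.fit(X_dup, y_dup, sample_weight=w_dup)
    return clf


# Synthetic data, made up for illustration: probabilities from a known logit.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 3))
true_coef = np.array([1.5, -2.0, 0.5])
p = 1.0 / (1.0 + np.exp(-(X @ true_coef + 0.3)))

# Large C means a very weak L2 penalty, so the fit should track p closely.
clf = fit_logistic_on_probabilities(X, p, C=1e4, max_iter=1000)
p_hat = clf.predict_proba(X)[:, 1]  # column 1 is P(label == 1)
max_abs_err = np.max(np.abs(p_hat - p))
```

Extra keyword arguments go straight to LogisticRegression, so a smaller C gives the stronger regularization George asked about. Only 0/1 labels ever reach the estimator, so the unique() call on the targets is no longer a problem.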
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general