Thanks a lot, Josef. I guess it is possible to do what I wanted, though
maybe not in scikit-learn. Does the statsmodels version allow L1 or L2
regularization? I'm planning to use a lot of features and let the model
decide which ones are useful.

Thanks again.

On Sat, Oct 3, 2015 at 11:20 PM, <josef.p...@gmail.com> wrote:

> Just to come in here as an econometrician and statsmodels maintainer.
>
> statsmodels intentionally doesn't enforce binary data for Logit or similar
> models; any data between 0 and 1 is fine.
>
> Logistic Regression/Logit or similar Binomial/Bernoulli models can
> consistently estimate the expected value (predicted mean) for a continuous
> variable that lies between 0 and 1, such as a proportion. (The binomial
> distribution belongs to the exponential family, for which the
> quasi-maximum likelihood method works well.) Inference has to be adjusted,
> for example with robust standard errors, because a logit model cannot be
> "true" if the data is not binary.
>
> I have references and examples for this use case somewhere.
>
> statsmodels doesn't do "classification", i.e. hard thresholding; users can
> do it themselves if they need to.
> That means we leave classification to scikit-learn and only do
> regression, even for funny data, and statsmodels doesn't have methods that
> take advantage of the classification structure of a model.
>
> Josef
>
>
> On Sat, Oct 3, 2015 at 10:50 PM, Sebastian Raschka <se.rasc...@gmail.com>
> wrote:
>
>> Hi, George,
>> logistic regression is a binary classifier by nature (class labels 0 and
>> 1). Scikit-learn supports multi-class classification via One-vs-One or
>> One-vs-All though; and there is a generalization (softmax) that gives you
>> meaningful probabilities for multiple classes (i.e., class probabilities
>> sum up to 1). In any case, logistic regression works with nominal class
>> labels - categorical class labels with no order implied.
>>
>> To make a long story short: logistic regression is a classifier, not a
>> regressor; the name is misleading, I agree. I think you may want to look
>> into regression analysis for your continuous target variable.
>>
>> Best,
>> Sebastian
>>
>> > On Oct 3, 2015, at 9:58 PM, George Bezerra <gbeze...@gmail.com> wrote:
>> >
>> > Hi there,
>> >
>> > I would like to train a logistic regression model on a continuous
>> > (i.e., not categorical) target variable. The target is a probability,
>> > which is why I am using a logistic regression for this problem. However,
>> > the sklearn function tries to find the class labels by running a unique()
>> > on the target values, which is disastrous if y is continuous.
>> >
>> > Is there a way to train logistic regression on a continuous target
>> > variable in sklearn?
>> >
>> > Any help is highly appreciated.
>> >
>> > Best,
>> >
>> > George.
>> >
>> > --
>> > George Bezerra
>> >
>> ------------------------------------------------------------------------------
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > Scikit-learn-general@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>

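On the scikit-learn side of the original question, one workaround that is
sometimes used (it is not proposed anywhere in this thread) is to turn each
fractional target into two weighted binary rows. The sketch below assumes
noise-free synthetic data and a near-zero penalty; `true_coef`, the intercept
of 0.25, and the value of `C` are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data (assumption for illustration): the target p is an exact
# logistic function of three features, so it lies strictly inside (0, 1).
X = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -1.0, 0.5])
p = 1.0 / (1.0 + np.exp(-(X @ true_coef + 0.25)))

# Stack every row twice: once labelled 1 with weight p, once labelled 0
# with weight 1 - p. The weighted log-loss then equals the cross-entropy
# against the fractional target.
X_stacked = np.vstack([X, X])
y_stacked = np.r_[np.ones(len(X), dtype=int), np.zeros(len(X), dtype=int)]
weights = np.r_[p, 1.0 - p]

# A large C makes the default L2 penalty negligible; lower C to regularize.
clf = LogisticRegression(C=1e6, tol=1e-6, max_iter=1000)
clf.fit(X_stacked, y_stacked, sample_weight=weights)

# Column 1 of predict_proba is the estimated mean of the fractional target.
p_hat = clf.predict_proba(X)[:, 1]
print(np.abs(p_hat - p).max())
```

Because the labels passed to `fit` are plain 0/1 integers, the `unique()`
step on the target only ever sees two classes.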

-- 
George Bezerra
