Hi Andy,

Thanks -- I'll give statsmodels another go. I remember I had some fitting speed issues with it in the past, and also some issues related to their models keeping references to the data (a disaster for serialization and multiprocessing) -- although that was a long time ago.

- Stuart
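P.S. Here's roughly what I'll try first -- a minimal sketch assuming statsmodels' GLM API (names as of roughly 0.8, toy data made up): GLM with a Binomial family accepts a continuous 0-1 target, much like the R glm call quoted below, and I believe the results object has a remove_data() method that may help with the pickling issue I mentioned.

    import numpy as np
    import statsmodels.api as sm

    # Toy data: 3 continuous features and a fractional 0-1 target.
    rng = np.random.RandomState(0)
    X = rng.rand(200, 3)
    p = np.clip(X @ [0.5, -0.2, 0.1] + 0.3, 0.01, 0.99)

    exog = sm.add_constant(X)                               # statsmodels wants an explicit intercept
    model = sm.GLM(p, exog, family=sm.families.Binomial())  # logit link is the Binomial default
    result = model.fit()
    print(result.params)
    print(result.predict(exog)[:5])                         # predictions come back on the 0-1 scale
    result.remove_data()                                    # drops stored data refs before pickling (I think)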
On Wed, Oct 4, 2017 at 1:09 PM, Andreas Mueller <t3k...@gmail.com> wrote:
> Hi Stuart.
> There is no interface to do this in scikit-learn (and maybe we should add
> this to the FAQ). Yes, in principle this would be possible with several
> of the models.
>
> I think statsmodels can do that, and I think I saw another GLM package
> for Python that does that?
>
> It's certainly a legitimate use-case but would require substantial
> changes to the code. I think so far we decided not to support this in
> scikit-learn. Basically we don't have a concept of a link function, and
> it's a concept that only applies to a subset of models. We try to have a
> consistent interface for all our estimators, and this doesn't really fit
> well within that interface.
>
> HTH,
> Andy
>
> On 10/04/2017 03:58 PM, Stuart Reynolds wrote:
>>
>> I'd like to fit a model that maps a matrix of continuous inputs to a
>> target that's between 0 and 1 (a probability).
>>
>> In principle, I'd expect logistic regression to work out of the box with
>> no modification (although it's often posed as being strictly for
>> classification, its loss function allows fitting targets in the range
>> 0 to 1, not strictly zero or one).
>>
>> However, scikit's LogisticRegression and LogisticRegressionCV reject
>> target arrays that are continuous. Other LR implementations allow a
>> matrix of probability estimates. Looking at:
>>
>> http://scikit-learn-general.narkive.com/4dSCktaM/using-logistic-regression-on-a-continuous-target-variable
>> and the fix here:
>> https://github.com/scikit-learn/scikit-learn/pull/5084, which disables
>> continuous inputs, it looks like there was some reason for this. So ...
>> I'm looking for alternatives.
>>
>> SGDClassifier allows log loss and (if I understood the docs correctly)
>> adds a logistic link function, but also rejects continuous targets.
>> Oddly, SGDRegressor only allows ‘squared_loss’, ‘huber’,
>> ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’, and doesn't
>> seem to offer a logistic link function.
>>
>> In principle, GLMs allow this, but scikit's docs say the GLM models
>> only allow strictly linear functions of their input, and don't allow a
>> logistic link function. The docs direct people to the
>> LogisticRegression class for this case.
>>
>> In R, there is:
>>
>> glm(Total_Service_Points_Won/Total_Service_Points_Played ~ ... ,
>>     family = binomial(link=logit), weights = Total_Service_Points_Played)
>>
>> which would be ideal.
>>
>> Is something similar available in scikit? (Or any continuous model that
>> takes a 0 to 1 target and outputs a 0 to 1 prediction?)
>>
>> I was surprised to see that the implementation of
>> CalibratedClassifierCV(method="sigmoid") uses an internal implementation
>> of logistic regression to do its logistic regression -- which I can
>> use, although I'd prefer to use a user-facing library.
>>
>> Thanks,
>> - Stuart
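And in case anyone else lands on this thread: one scikit-learn-only workaround sketch (not an official API for this use case, and the helper name is just mine). The log loss against a fractional target p is identical to the weighted loss of a y=1 copy of the row weighted by p plus a y=0 copy weighted by 1 - p, so you can duplicate each row and fit LogisticRegression with sample_weight. The counts argument below plays the role of R's weights = Total_Service_Points_Played.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_fractional_logit(X, p, counts=None, **lr_kwargs):
        """Fit LogisticRegression against fractional targets p in [0, 1].

        Each row is duplicated: a y=1 copy weighted by p (times counts, if
        given) and a y=0 copy weighted by 1 - p, which reproduces the log
        loss of the original fractional targets.
        """
        p = np.asarray(p, dtype=float)
        base = np.ones_like(p) if counts is None else np.asarray(counts, dtype=float)
        X2 = np.vstack([X, X])                                  # every row appears twice
        y2 = np.concatenate([np.ones_like(p), np.zeros_like(p)])
        w2 = np.concatenate([base * p, base * (1.0 - p)])
        clf = LogisticRegression(**lr_kwargs)
        clf.fit(X2, y2, sample_weight=w2)
        return clf

    # Usage, e.g.:
    #   clf = fit_fractional_logit(X, won / played, counts=played, C=1e6)
    #   clf.predict_proba(X)[:, 1]   # fitted 0-1 probabilities
    # (a large C approximates the unregularized fit that R's glm gives)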