On Thu, Oct 5, 2017 at 2:52 PM, Stuart Reynolds <stu...@stuartreynolds.net> wrote:
> Turns out sm.Logit does allow setting the tolerance.
> Some quick-and-dirty time profiling of different methods on a 100k * 30
> features dataset, with different solvers and losses:
>
> sklearn.LogisticRegression: l1    1.13864398003  (seconds)
> sklearn.LogisticRegression: l2    0.0538778305054
> sm.Logit l1                       0.0922629833221  # although it didn't converge
> sm.Logit l1_cvxopt_cp             0.958268165588
> sm.Logit newton                   0.133476018906
> sm.Logit nm                       0.369864940643
> sm.Logit bfgs                     0.105798006058
> sm.Logit lbfgs                    0.06241106987
> sm.Logit powell                   1.64219808578
> sm.Logit cg                       0.2184278965
> sm.Logit ncg                      0.216138124466
> sm.Logit basinhopping             8.82164621353
> sm.GLM.fit IRLS                   0.544688940048
> sm.GLM L2:                        1.29778695107
>
> I've been getting good results from sm.Logit.fit (although unregularized).
> statsmodels GLM seems a little slow. Not sure why.
>
> My benchmark may be a little apples-to-oranges, since the stopping
> criteria probably aren't comparable.

I think that's a problem with GLM IRLS. AFAIK (never fully tested), the
objective function is proportional to the number of observations, so the
convergence criterion effectively becomes tighter as nobs increases. I
can't find the issue or PR discussion anymore, but one of our contributors
fixed maxiter at 15 or so for IRLS, with around 4 to 5 million observations
and mostly categorical explanatory variables in his application.

Unfortunately (there was no upfront design and no decisions across models):
https://github.com/statsmodels/statsmodels/issues/2825

Josef

> For tiny models, which I'm also building: 100 samples, 5 features
>
> sklearn.LogisticRegression: l1    0.00137376785278
> sklearn.LogisticRegression: l2    0.00167894363403
> sm.Logit l1                       0.0198900699615
> sm.Logit l1_cvxopt_cp             0.162448167801
> sm.Logit newton                   0.00689911842346
> sm.Logit nm                       0.0754928588867
> sm.Logit bfgs                     0.0210938453674
> sm.Logit lbfgs                    0.0156588554382
> sm.Logit powell                   0.0161390304565
> sm.Logit cg                       0.00759506225586
> sm.Logit ncg                      0.00541186332703
> sm.Logit basinhopping             0.3076171875
> sm.GLM.fit IRLS                   0.00902199745178
> sm.GLM L2:                        0.0208361148834
>
> I couldn't get sm.GLM.fit to work with non-"IRLS" solvers (it hits a
> division by zero).
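A minimal sketch of the maxiter workaround described above, on simulated
data at roughly the 100k * 30 size being benchmarked; the cap of 15 is just
the value mentioned in that issue discussion, not a recommendation:

    import numpy as np
    import statsmodels.api as sm

    # Simulated logistic data, roughly the size discussed above (illustrative only).
    rng = np.random.RandomState(0)
    n_obs, n_feat = 100000, 30
    X = sm.add_constant(rng.standard_normal((n_obs, n_feat)))
    beta = rng.uniform(-0.5, 0.5, size=n_feat + 1)
    y = (rng.uniform(size=n_obs) < 1.0 / (1.0 + np.exp(-X.dot(beta)))).astype(float)

    model = sm.GLM(y, X, family=sm.families.Binomial())

    # Default IRLS fit: runs until the convergence tolerance is met, which
    # (per the discussion above) effectively tightens as nobs grows.
    res_default = model.fit()

    # Workaround from the issue discussion: cap the number of IRLS iterations.
    res_capped = model.fit(maxiter=15)

    # The capped fit is typically very close to the fully converged one.
    print(np.max(np.abs(res_default.params - res_capped.params)))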
>
> ----
>
> import sklearn.datasets
> from sklearn.preprocessing import StandardScaler
>
> X, y = sklearn.datasets.make_classification(n_samples=10000,
>                                              n_features=30,
>                                              random_state=123)
> X = StandardScaler(copy=True, with_mean=True, with_std=True).fit_transform(X)
>
> import time
> tol = 0.0001
> maxiter = 100
> DISP = 0
>
>
> if 1:  # sk.LogisticRegression
>     import sklearn
>     from sklearn.linear_model import LogisticRegression
>
>     for method in ["l1", "l2"]:  # TODO: add solvers
>         t = time.time()
>         model = LogisticRegression(C=1, tol=tol, max_iter=maxiter,
>                                    penalty=method)
>         model.fit(X, y)
>         print "sklearn.LogisticRegression:", method, time.time() - t
>
>
> if 1:  # sm.Logit.fit_regularized
>     from statsmodels.discrete.discrete_model import Logit
>
>     for method in ["l1", "l1_cvxopt_cp"]:
>         t = time.time()
>         model = Logit(y, X)
>         result = model.fit_regularized(method=method, maxiter=maxiter,
>                                        alpha=1.,
>                                        abstol=tol,
>                                        acc=tol,
>                                        tol=tol, gtol=tol, pgtol=tol,
>                                        disp=DISP)
>         print "sm.Logit", method, time.time() - t
>
>
> if 1:  # sm.Logit.fit
>     from statsmodels.discrete.discrete_model import Logit
>
>     SOLVERS = ["newton", "nm", "bfgs", "lbfgs", "powell", "cg", "ncg",
>                "basinhopping"]
>     for method in SOLVERS:
>         t = time.time()
>         model = Logit(y, X)
>         result = model.fit(method=method, maxiter=maxiter,
>                            niter=maxiter,
>                            ftol=tol,
>                            tol=tol, gtol=tol, pgtol=tol,  # Hmmm... needs to be reviewed.
>                            disp=DISP)
>         print "sm.Logit", method, time.time() - t
>
>
> if 1:  # sm.GLM.fit
>     from statsmodels.genmod.generalized_linear_model import GLM
>     from statsmodels.genmod.generalized_linear_model import families
>
>     for method in ["IRLS"]:
>         t = time.time()
>         model = GLM(y, X, family=families.Binomial(link=families.links.logit))
>         result = model.fit(method=method, cnvrg_tol=tol,
>                            maxiter=maxiter, full_output=False, disp=DISP)
>         print "sm.GLM.fit", method, time.time() - t
>
>
> if 1:  # GLM.fit_regularized
>     from statsmodels.genmod.generalized_linear_model import GLM
>     from statsmodels.genmod.generalized_linear_model import families
>
>     t = time.time()
>     model = GLM(y, X, family=families.Binomial(link=families.links.logit))
>     result = model.fit_regularized(method='elastic_net', alpha=1.0,
>                                    L1_wt=0.0, cnvrg_tol=tol, maxiter=maxiter)
>     print "sm.GLM L2:", time.time() - t
>
>
> if 0:  # GLM.fit with the scipy optimizers
>     # Hits division by zero.
>     from statsmodels.genmod.generalized_linear_model import GLM
>     from statsmodels.genmod.generalized_linear_model import families
>
>     SOLVERS = ["bfgs", "lbfgs", "newton", "nm", "powell", "cg", "ncg",
>                "basinhopping"]
>     for method in SOLVERS:
>         t = time.time()
>         model = GLM(y, X, family=families.Binomial(link=families.links.logit))
>         result = model.fit(method=method,
>                            # scale="X2",
>                            # alpha=1.,
>                            # abstol=tol,
>                            # acc=tol,
>                            # tol=tol, gtol=tol, pgtol=tol,
>                            # maxiter=maxiter,
>                            # full_output=False,
>                            disp=DISP)
>         print "sm.GLM.fit", method, time.time() - t
>
>
> On Thu, Oct 5, 2017 at 10:32 AM, Sean Violante <sean.viola...@gmail.com>
> wrote:
> > Stuart
> > have you tried glmnet (in R)? there is a python version
> > https://web.stanford.edu/~hastie/glmnet_python/ ....
> >
> >
> > On Thu, Oct 5, 2017 at 6:34 PM, Stuart Reynolds <stu...@stuartreynolds.net>
> > wrote:
> >>
> >> Thanks Josef. Was very useful.
> >>
> >> result.remove_data() reduces a 5-parameter Logit result object from
> >> megabytes to 5Kb (compared to a minimum uncompressed size for the
> >> parameters of ~320 bytes). It's a big improvement.
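For anyone who wants to reproduce that size comparison, a rough sketch that
pickles a results object before and after remove_data(); the data and the
exact byte counts are illustrative only:

    import pickle
    import numpy as np
    import statsmodels.api as sm

    # Small simulated binary dataset (illustrative only).
    rng = np.random.RandomState(0)
    X = sm.add_constant(rng.standard_normal((1000, 5)))
    y = (rng.uniform(size=1000) < 1.0 / (1.0 + np.exp(-X[:, 1]))).astype(float)

    result = sm.Logit(y, X).fit(disp=0)

    size_with_data = len(pickle.dumps(result))
    result.remove_data()  # drop references to the nobs-sized arrays
    size_without_data = len(pickle.dumps(result))

    print("with data:    %d bytes" % size_with_data)
    print("without data: %d bytes" % size_without_data)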
> >> I'll experiment with what you suggest -- since this is still >10x
> >> larger than possible. I think the difference is mostly attribute names.
> >>
> >> I don't mind the lack of multinomial support. I've often had better
> >> results mixing independent models for each class.
> >>
> >> I'll experiment with the different solvers. I tried the Logit model in
> >> the past -- its fit function only exposed a maxiter, and not a
> >> tolerance -- meaning I had to set maxiter very high. The newer
> >> statsmodels GLM module looks great and seems to solve this.
> >>
> >> For others who come this way, I think the magic for ridge regression is:
> >>
> >> from statsmodels.genmod.generalized_linear_model import GLM
> >> from statsmodels.genmod.generalized_linear_model import families
> >> from statsmodels.genmod.families import links
> >>
> >> model = GLM(y, Xtrain, family=families.Binomial(link=links.Logit))
> >> result = model.fit_regularized(method='elastic_net',
> >>                                alpha=l2weight, L1_wt=0.0, tol=...)
> >> result.remove_data()
> >> result.predict(Xtest)
> >>
> >> One last thing -- it's clear that it should be possible to do something
> >> like scikit's LogisticRegressionCV in order to quickly optimize a
> >> single parameter by re-using past coefficients.
> >> Are there any wrappers in statsmodels for doing this, or should I roll
> >> my own?
> >>
> >> - Stu
> >>
> >>
> >> On Wed, Oct 4, 2017 at 3:43 PM, <josef.p...@gmail.com> wrote:
> >> >
> >> > On Wed, Oct 4, 2017 at 4:26 PM, Stuart Reynolds
> >> > <stu...@stuartreynolds.net> wrote:
> >> >>
> >> >> Hi Andy,
> >> >> Thanks -- I'll give statsmodels another go.
> >> >> I remember I had some fitting speed issues with it in the past, and
> >> >> also some issues related to their models keeping references to the
> >> >> data (= disaster for serialization and multiprocessing) -- although
> >> >> that was a long time ago.
> >> >
> >> > The second has not changed and will not change, but there is a
> >> > remove_data method that deletes all references to full, data-sized
> >> > arrays. However, once the data are removed, it is no longer possible
> >> > to compute any new result statistics, which are almost all computed
> >> > lazily.
> >> > The fitting speed depends a lot on the optimizer, the convergence
> >> > criteria, the difficulty of the problem, and the availability of good
> >> > starting parameters. Almost all nonlinear estimation problems use the
> >> > scipy optimizers; all unconstrained optimizers can be used. There are
> >> > no optimized special methods for cases with a very large number of
> >> > features.
> >> >
> >> > Multinomial/multiclass models don't support a continuous response
> >> > (yet); all other GLM and discrete models allow for continuous data in
> >> > the interval extension of the domain.
> >> >
> >> > Josef
> >> >
> >> >>
> >> >> - Stuart
> >> >>
> >> >> On Wed, Oct 4, 2017 at 1:09 PM, Andreas Mueller <t3k...@gmail.com>
> >> >> wrote:
> >> >> > Hi Stuart.
> >> >> > There is no interface to do this in scikit-learn (and maybe we
> >> >> > should add this to the FAQ).
> >> >> > Yes, in principle this would be possible with several of the
> >> >> > models.
> >> >> >
> >> >> > I think statsmodels can do that, and I think I saw another GLM
> >> >> > package for Python that does that?
> >> >> >
> >> >> > It's certainly a legitimate use-case but would require substantial
> >> >> > changes to the code.
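On the LogisticRegressionCV-style question further up, one option is to
roll a regularization path by hand, re-using each solution as the starting
values for the next fit. A rough sketch: it assumes fit_regularized accepts
start_params (check the docs for your statsmodels version), and the alpha
grid and data are purely illustrative:

    import numpy as np
    import statsmodels.api as sm

    # Simulated data (illustrative only).
    rng = np.random.RandomState(0)
    X = sm.add_constant(rng.standard_normal((2000, 10)))
    y = (rng.uniform(size=2000) < 1.0 / (1.0 + np.exp(-X[:, 1]))).astype(float)

    alphas = [10.0, 3.0, 1.0, 0.3, 0.1]  # hypothetical grid, strong to weak penalty
    start = None
    results = {}
    for alpha in alphas:
        model = sm.GLM(y, X, family=sm.families.Binomial())
        # Warm start: begin each fit from the previous solution.
        res = model.fit_regularized(method='elastic_net', alpha=alpha,
                                    L1_wt=0.0, start_params=start)
        start = res.params
        results[alpha] = res
    # Choosing the best alpha (e.g. by out-of-sample log loss) is omitted here.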
> >> >> > I think so far we decided not to support this in scikit-learn.
> >> >> > Basically we don't have a concept of a link function, and it's a
> >> >> > concept that only applies to a subset of models.
> >> >> > We try to have a consistent interface for all our estimators, and
> >> >> > this doesn't really fit well within that interface.
> >> >> >
> >> >> > Hth,
> >> >> > Andy
> >> >> >
> >> >> >
> >> >> > On 10/04/2017 03:58 PM, Stuart Reynolds wrote:
> >> >> >>
> >> >> >> I'd like to fit a model that maps a matrix of continuous inputs
> >> >> >> to a target that's between 0 and 1 (a probability).
> >> >> >>
> >> >> >> In principle, I'd expect logistic regression to work out of the
> >> >> >> box with no modification (although it's often posed as being
> >> >> >> strictly for classification, its loss function allows for fitting
> >> >> >> targets in the range 0 to 1, and not strictly zero or one).
> >> >> >>
> >> >> >> However, scikit's LogisticRegression and LogisticRegressionCV
> >> >> >> reject target arrays that are continuous. Other LR implementations
> >> >> >> allow a matrix of probability estimates. Looking at:
> >> >> >>
> >> >> >> http://scikit-learn-general.narkive.com/4dSCktaM/using-logistic-regression-on-a-continuous-target-variable
> >> >> >>
> >> >> >> and the fix here:
> >> >> >> https://github.com/scikit-learn/scikit-learn/pull/5084, which
> >> >> >> disables continuous inputs, it looks like there was some reason
> >> >> >> for this. So ... I'm looking for alternatives.
> >> >> >>
> >> >> >> SGDClassifier allows log loss and (if I understood the docs
> >> >> >> correctly) adds a logistic link function, but also rejects
> >> >> >> continuous targets. Oddly, SGDRegressor only allows ‘squared_loss’,
> >> >> >> ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’,
> >> >> >> and doesn't seem to provide a logistic link function.
> >> >> >>
> >> >> >> In principle, GLMs allow this, but scikit's docs say the GLM
> >> >> >> models only allow strictly linear functions of their input, and
> >> >> >> don't allow a logistic link function. The docs direct people to
> >> >> >> the LogisticRegression class for this case.
> >> >> >>
> >> >> >> In R, there is:
> >> >> >>
> >> >> >>   glm(Total_Service_Points_Won/Total_Service_Points_Played ~ ...,
> >> >> >>       family = binomial(link=logit),
> >> >> >>       weights = Total_Service_Points_Played)
> >> >> >>
> >> >> >> which would be ideal.
> >> >> >>
> >> >> >> Is something similar available in scikit? (Or any continuous model
> >> >> >> that takes a 0 to 1 target and outputs a 0 to 1 target?)
> >> >> >>
> >> >> >> I was surprised to see that the implementation of
> >> >> >> CalibratedClassifierCV(method="sigmoid") uses an internal
> >> >> >> implementation of logistic regression to do its logistic
> >> >> >> regression -- which I can use, although I'd prefer to use a
> >> >> >> user-facing library.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> - Stuart
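To connect this back to the R glm call quoted above, a rough statsmodels
(not scikit-learn) equivalent is a binomial GLM fit on (successes,
failures) counts; the data and names below are made up for illustration:

    import numpy as np
    import statsmodels.api as sm

    # Made-up example: service points won out of points played.
    played = np.array([80.0, 95.0, 60.0, 120.0, 70.0, 105.0])
    won = np.array([45.0, 60.0, 25.0, 80.0, 30.0, 70.0])
    X = sm.add_constant(np.array([[1.2], [0.7], [2.1], [0.3], [1.8], [0.9]]))

    # Binomial GLM on (successes, failures); the logit link is the default.
    # This plays the role of R's
    #   glm(won/played ~ x, family=binomial(link=logit), weights=played)
    endog = np.column_stack([won, played - won])
    result = sm.GLM(endog, X, family=sm.families.Binomial()).fit()

    print(result.predict(X))  # fitted probabilities in [0, 1]

As noted earlier in the thread, GLM and the discrete models also accept a
continuous 0-1 target directly as endog.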
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn