Good to know -- thank you.
On Fri, Oct 6, 2017 at 5:25 AM, <josef.p...@gmail.com> wrote:
>
> On Thu, Oct 5, 2017 at 3:27 PM, <josef.p...@gmail.com> wrote:
>>
>> On Thu, Oct 5, 2017 at 2:52 PM, Stuart Reynolds
>> <stu...@stuartreynolds.net> wrote:
>>>
>>> Turns out sm.Logit does allow setting the tolerance.
>>> Some quick and dirty time profiling of different methods on a
>>> 100k * 30 features dataset, with different solvers and losses:
>>>
>>> sklearn.LogisticRegression: l1    1.13864398003 (seconds)
>>> sklearn.LogisticRegression: l2    0.0538778305054
>>> sm.Logit l1                       0.0922629833221  # although it didn't converge
>>> sm.Logit l1_cvxopt_cp             0.958268165588
>>> sm.Logit newton                   0.133476018906
>>> sm.Logit nm                       0.369864940643
>>> sm.Logit bfgs                     0.105798006058
>>> sm.Logit lbfgs                    0.06241106987
>>> sm.Logit powell                   1.64219808578
>>> sm.Logit cg                       0.2184278965
>>> sm.Logit ncg                      0.216138124466
>>> sm.Logit basinhopping             8.82164621353
>>> sm.GLM.fit IRLS                   0.544688940048
>>> sm.GLM L2:                        1.29778695107
>>>
>>> I've been getting good results from sm.Logit.fit (although unregularized).
>>> statsmodels GLM seems a little slow. Not sure why.
>>>
>>> My benchmark may be a little apples-to-oranges, since the stopping
>>> criteria probably aren't comparable.
>>
>> I think that's a problem with GLM IRLS.
>> AFAIK (never fully tested), the objective function is proportional to the
>> number of observations, so the convergence criterion becomes tighter as
>> nobs increases.
>> I can't find the issue or PR discussion anymore, but one of our
>> contributors fixed maxiter at 15 or so for IRLS with around 4 to 5
>> million observations and mostly categorical explanatory variables in his
>> application.
>>
>> unfortunately (no upfront design and decisions across models)
>> https://github.com/statsmodels/statsmodels/issues/2825
>
> Interesting timing exercise; I tried a bit more yesterday.
>
> GLM IRLS is not slow because of the convergence criterion, but it seems to
> take much longer when the design matrix is not well conditioned.
> The random dataset generated by sklearn has singular values in the range
> of 1e-14 or 1e-15.
> This doesn't affect the other estimators much, and lbfgs is almost always
> the fastest, with bfgs close behind.
>
> When I add some noise to the feature matrix so it's not so close to
> singular, IRLS is roughly in the same neighborhood as the faster scipy
> optimizers.
> With n_samples=1000000, n_features=50, Logit takes around 5 or 6 seconds
> (for lbfgs, bfgs and newton), slightly faster than regularized sklearn
> LogisticRegression, but GLM is about 4 times slower at 17 to 20 seconds.
> GLM L2 is much slower in this case because of the current non-optimized
> implementation of coordinate descent.
>
> aside: In master and the next release of statsmodels there is an interface
> to scipy.minimize, which allows all the new optimizers to be used, e.g.
> dogleg and the other new trust-region Newton methods will be better
> optimizers for many cases.
>
> Josef
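For anyone reproducing this, here is a minimal sketch (not from the thread) of checking the near-singularity Josef describes for the make_classification design, using only numpy and the same sklearn call as the benchmark script further below. The 0.01 noise scale is an arbitrary illustration.

    import numpy as np
    import sklearn.datasets

    X, y = sklearn.datasets.make_classification(n_samples=10000, n_features=30,
                                                random_state=123)
    # The smallest singular values show how close the design is to singular
    # (the thread reports values around 1e-14/1e-15 for this dataset).
    print(np.linalg.svd(X, compute_uv=False)[-5:])
    print(np.linalg.cond(X))

    # Adding a little noise is the kind of perturbation described above;
    # the 0.01 scale is an arbitrary choice for illustration.
    X_noisy = X + 0.01 * np.random.RandomState(0).randn(*X.shape)
    print(np.linalg.cond(X_noisy))
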
>>
>> Josef
>>
>>>
>>> For tiny models, which I'm also building: 100 samples, 5 features
>>>
>>> sklearn.LogisticRegression: l1    0.00137376785278
>>> sklearn.LogisticRegression: l2    0.00167894363403
>>> sm.Logit l1                       0.0198900699615
>>> sm.Logit l1_cvxopt_cp             0.162448167801
>>> sm.Logit newton                   0.00689911842346
>>> sm.Logit nm                       0.0754928588867
>>> sm.Logit bfgs                     0.0210938453674
>>> sm.Logit lbfgs                    0.0156588554382
>>> sm.Logit powell                   0.0161390304565
>>> sm.Logit cg                       0.00759506225586
>>> sm.Logit ncg                      0.00541186332703
>>> sm.Logit basinhopping             0.3076171875
>>> sm.GLM.fit IRLS                   0.00902199745178
>>> sm.GLM L2:                        0.0208361148834
>>>
>>> I couldn't get sm.GLM.fit to work with non-"IRLS" solvers (it hits a
>>> division by zero).
>>>
>>> ----
>>>
>>> import sklearn.datasets
>>> from sklearn.preprocessing import StandardScaler
>>> X, y = sklearn.datasets.make_classification(n_samples=10000,
>>>                                             n_features=30, random_state=123)
>>> X = StandardScaler(copy=True, with_mean=True,
>>>                    with_std=True).fit_transform(X)
>>>
>>> import time
>>> tol = 0.0001
>>> maxiter = 100
>>> DISP = 0
>>>
>>> if 1:  # sk.LogisticRegression
>>>     import sklearn
>>>     from sklearn.linear_model import LogisticRegression
>>>
>>>     for method in ["l1", "l2"]:  # TODO, add solvers:
>>>         t = time.time()
>>>         model = LogisticRegression(C=1, tol=tol, max_iter=maxiter,
>>>                                    penalty=method)
>>>         model.fit(X, y)
>>>         print "sklearn.LogisticRegression:", method, time.time() - t
>>>
>>> if 1:  # sm.Logit.fit_regularized
>>>     from statsmodels.discrete.discrete_model import Logit
>>>     for method in ["l1", "l1_cvxopt_cp"]:
>>>         t = time.time()
>>>         model = Logit(y, X)
>>>         result = model.fit_regularized(method=method, maxiter=maxiter,
>>>                                        alpha=1.,
>>>                                        abstol=tol,
>>>                                        acc=tol,
>>>                                        tol=tol, gtol=tol, pgtol=tol,
>>>                                        disp=DISP)
>>>         print "sm.Logit", method, time.time() - t
>>>
>>> if 1:  # sm.Logit.fit
>>>     from statsmodels.discrete.discrete_model import Logit
>>>
>>>     SOLVERS = ["newton", "nm", "bfgs", "lbfgs", "powell", "cg", "ncg",
>>>                "basinhopping"]
>>>     for method in SOLVERS:
>>>         t = time.time()
>>>         model = Logit(y, X)
>>>         result = model.fit(method=method, maxiter=maxiter,
>>>                            niter=maxiter,
>>>                            ftol=tol,
>>>                            tol=tol, gtol=tol, pgtol=tol,  # Hmmm.. needs to be reviewed.
>>>                            disp=DISP)
>>>         print "sm.Logit", method, time.time() - t
>>>
>>> if 1:  # sm.GLM.fit
>>>     from statsmodels.genmod.generalized_linear_model import GLM
>>>     from statsmodels.genmod.generalized_linear_model import families
>>>     for method in ["IRLS"]:
>>>         t = time.time()
>>>         model = GLM(y, X,
>>>                     family=families.Binomial(link=families.links.logit))
>>>         result = model.fit(method=method, cnvrg_tol=tol,
>>>                            maxiter=maxiter, full_output=False, disp=DISP)
>>>         print "sm.GLM.fit", method, time.time() - t
>>>
>>> if 1:  # GLM.fit_regularized
>>>     from statsmodels.genmod.generalized_linear_model import GLM
>>>     from statsmodels.genmod.generalized_linear_model import families
>>>     t = time.time()
>>>     model = GLM(y, X,
>>>                 family=families.Binomial(link=families.links.logit))
>>>     result = model.fit_regularized(method='elastic_net', alpha=1.0,
>>>                                    L1_wt=0.0, cnvrg_tol=tol, maxiter=maxiter)
>>>     print "sm.GLM L2:", time.time() - t
>>>
>>> if 0:  # GLM.fit
>>>     # Hits division by zero.
>>>     SOLVERS = ["bfgs", "lbfgs", "newton", "nm", "powell", "cg", "ncg",
>>>                "basinhopping"]
>>>     from statsmodels.genmod.generalized_linear_model import GLM
>>>     from statsmodels.genmod.generalized_linear_model import families
>>>     for method in SOLVERS:
>>>         t = time.time()
>>>         model = GLM(y, X,
>>>                     family=families.Binomial(link=families.links.logit))
>>>         result = model.fit(method=method,
>>>                            # scale="X2",
>>>                            # alpha=1.,
>>>                            # abstol=tol,
>>>                            # acc=tol,
>>>                            # tol=tol, gtol=tol, pgtol=tol,
>>>                            # maxiter=maxiter,
>>>                            # full_output=False,
>>>                            disp=DISP)
>>>         print "sm.GLM.fit", method, time.time() - t
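On the division-by-zero in the disabled GLM.fit block above: one unverified guess is that the scipy-based solvers start from a point where the Binomial log-likelihood evaluates log(0), in which case supplying explicit start_params is worth trying. The sketch below is mine, not from the thread; it reuses the y, X and families names from the script above, and whether it actually avoids the error has not been checked.

    # Unverified workaround sketch: start the scipy-based GLM solvers from an
    # explicit parameter vector. All zeros gives fitted probabilities of 0.5,
    # which keeps log(mu) and log(1 - mu) finite on the first iteration.
    import numpy as np
    from statsmodels.genmod.generalized_linear_model import GLM, families

    model = GLM(y, X, family=families.Binomial(link=families.links.logit))
    start = np.zeros(X.shape[1])
    result = model.fit(method="bfgs", start_params=start, maxiter=100, disp=0)
    print("first params: %s" % result.params[:3])
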
>>>
>>> On Thu, Oct 5, 2017 at 10:32 AM, Sean Violante <sean.viola...@gmail.com>
>>> wrote:
>>> > Stuart,
>>> > have you tried glmnet (in R)? There is a python version:
>>> > https://web.stanford.edu/~hastie/glmnet_python/ ....
>>> >
>>> > On Thu, Oct 5, 2017 at 6:34 PM, Stuart Reynolds
>>> > <stu...@stuartreynolds.net> wrote:
>>> >>
>>> >> Thanks Josef. Was very useful.
>>> >>
>>> >> result.remove_data() reduces a 5-parameter Logit result object from
>>> >> megabytes to 5Kb (compared to a minimum uncompressed size of ~320
>>> >> bytes for the parameters themselves). That's a big improvement. I'll
>>> >> experiment with what you suggest -- since this is still >10x larger
>>> >> than possible. I think the difference is mostly attribute names.
>>> >> I don't mind the lack of multinomial support. I've often had better
>>> >> results mixing independent models for each class.
>>> >>
>>> >> I'll experiment with the different solvers. I tried the Logit model
>>> >> in the past -- its fit function only exposed a maxiter, and not a
>>> >> tolerance -- meaning I had to set maxiter very high. The newer
>>> >> statsmodels GLM module looks great and seems to solve this.
>>> >>
>>> >> For others who come this way, I think the magic for ridge regression is:
>>> >>
>>> >>     from statsmodels.genmod.generalized_linear_model import GLM
>>> >>     from statsmodels.genmod.generalized_linear_model import families
>>> >>
>>> >>     model = GLM(y, Xtrain,
>>> >>                 family=families.Binomial(link=families.links.logit))
>>> >>     result = model.fit_regularized(method='elastic_net',
>>> >>                                    alpha=l2weight, L1_wt=0.0,
>>> >>                                    cnvrg_tol=...)
>>> >>     result.remove_data()
>>> >>     result.predict(Xtest)
>>> >>
>>> >> One last thing -- it's clear that it should be possible to do
>>> >> something like scikit's LogisticRegressionCV in order to quickly
>>> >> optimize a single parameter by re-using past coefficients.
>>> >> Are there any wrappers in statsmodels for doing this, or should I
>>> >> roll my own?
>>> >>
>>> >> - Stu
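That last question doesn't get a direct answer in the thread. A minimal roll-your-own sketch (mine, not from the thread) is below: it warm-starts each penalized fit from the previous solution. It assumes GLM.fit_regularized accepts start_params in your statsmodels version (worth checking), and the function name, alpha grid, and the Xtrain/ytrain/Xval/yval arrays are illustrative.

    # Sweep the ridge penalty, warm-starting each fit from the previous one.
    # start_params support in GLM.fit_regularized is an assumption here.
    import numpy as np
    from statsmodels.genmod.generalized_linear_model import GLM, families

    def ridge_path(ytrain, Xtrain, yval, Xval, alphas):
        family = families.Binomial(link=families.links.logit)
        start, scores = None, []
        for alpha in alphas:                    # e.g. np.logspace(2, -4, 13)
            model = GLM(ytrain, Xtrain, family=family)
            res = model.fit_regularized(method='elastic_net', alpha=alpha,
                                        L1_wt=0.0, start_params=start)
            start = res.params                  # warm start for the next alpha
            p = res.predict(Xval)
            eps = 1e-12                         # validation log loss, smaller is better
            ll = -np.mean(yval * np.log(p + eps) +
                          (1 - yval) * np.log(1 - p + eps))
            scores.append((alpha, ll, res.params))
        return min(scores, key=lambda t: t[1])
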
>>> >>
>>> >> On Wed, Oct 4, 2017 at 3:43 PM, <josef.p...@gmail.com> wrote:
>>> >> >
>>> >> > On Wed, Oct 4, 2017 at 4:26 PM, Stuart Reynolds
>>> >> > <stu...@stuartreynolds.net> wrote:
>>> >> >>
>>> >> >> Hi Andy,
>>> >> >> Thanks -- I'll give statsmodels another go.
>>> >> >> I remember I had some fitting speed issues with it in the past,
>>> >> >> and also some issues with their models keeping references to the
>>> >> >> data (= disaster for serialization and multiprocessing) --
>>> >> >> although that was a long time ago.
>>> >> >
>>> >> > The second has not changed and will not change, but there is a
>>> >> > remove_data method that deletes all references to full, data-sized
>>> >> > arrays. However, once the data is removed, it is no longer possible
>>> >> > to compute any new result statistics, which are almost all lazily
>>> >> > computed.
>>> >> > The fitting speed depends a lot on the optimizer, convergence
>>> >> > criteria, difficulty of the problem, and availability of good
>>> >> > starting parameters. Almost all nonlinear estimation problems use
>>> >> > the scipy optimizers; all unconstrained optimizers can be used.
>>> >> > There are no optimized special methods for cases with a very large
>>> >> > number of features.
>>> >> >
>>> >> > Multinomial/multiclass models don't support continuous response
>>> >> > (yet); all other GLM and discrete models allow for continuous data
>>> >> > in the interval extension of the domain.
>>> >> >
>>> >> > Josef
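As a concrete illustration of the remove_data point, a small sketch (mine, not from the thread) that assumes the y, X arrays from the benchmark script above:

    # Shrink a fitted results object before serializing it, e.g. for
    # multiprocessing or for storing many small models.
    import pickle
    from statsmodels.discrete.discrete_model import Logit

    result = Logit(y, X).fit(method="lbfgs", disp=0)
    result.remove_data()          # drops references to the full data arrays
    blob = pickle.dumps(result)   # now mostly parameters/metadata get pickled
    print(len(blob))

    restored = pickle.loads(blob)
    print(restored.params)        # parameters still available, but new
                                  # data-dependent statistics cannot be
                                  # computed, as Josef notes.
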
>>> >> >>
>>> >> >> - Stuart
>>> >> >>
>>> >> >> On Wed, Oct 4, 2017 at 1:09 PM, Andreas Mueller <t3k...@gmail.com>
>>> >> >> wrote:
>>> >> >> > Hi Stuart.
>>> >> >> > There is no interface to do this in scikit-learn (and maybe we
>>> >> >> > should add this to the FAQ).
>>> >> >> > Yes, in principle this would be possible with several of the
>>> >> >> > models.
>>> >> >> >
>>> >> >> > I think statsmodels can do that, and I think I saw another glm
>>> >> >> > package for Python that does that?
>>> >> >> >
>>> >> >> > It's certainly a legitimate use-case but would require
>>> >> >> > substantial changes to the code. I think so far we decided not
>>> >> >> > to support this in scikit-learn. Basically we don't have a
>>> >> >> > concept of a link function, and it's a concept that only applies
>>> >> >> > to a subset of models. We try to have a consistent interface for
>>> >> >> > all our estimators, and this doesn't really fit well within that
>>> >> >> > interface.
>>> >> >> >
>>> >> >> > Hth,
>>> >> >> > Andy
>>> >> >> >
>>> >> >> > On 10/04/2017 03:58 PM, Stuart Reynolds wrote:
>>> >> >> >>
>>> >> >> >> I'd like to fit a model that maps a matrix of continuous inputs
>>> >> >> >> to a target that's between 0 and 1 (a probability).
>>> >> >> >>
>>> >> >> >> In principle, I'd expect logistic regression to work out of the
>>> >> >> >> box with no modification (although it's often posed as being
>>> >> >> >> strictly for classification, its loss function allows for
>>> >> >> >> fitting targets in the range 0 to 1, and not strictly zero or
>>> >> >> >> one).
>>> >> >> >>
>>> >> >> >> However, scikit's LogisticRegression and LogisticRegressionCV
>>> >> >> >> reject target arrays that are continuous. Other LR
>>> >> >> >> implementations allow a matrix of probability estimates.
>>> >> >> >> Looking at:
>>> >> >> >> http://scikit-learn-general.narkive.com/4dSCktaM/using-logistic-regression-on-a-continuous-target-variable
>>> >> >> >> and the fix here:
>>> >> >> >> https://github.com/scikit-learn/scikit-learn/pull/5084, which
>>> >> >> >> disables continuous inputs, it looks like there was some reason
>>> >> >> >> for this. So ... I'm looking for alternatives.
>>> >> >> >>
>>> >> >> >> SGDClassifier allows log loss and (if I understood the docs
>>> >> >> >> correctly) adds a logistic link function, but also rejects
>>> >> >> >> continuous targets. Oddly, SGDRegressor only allows
>>> >> >> >> 'squared_loss', 'huber', 'epsilon_insensitive', or
>>> >> >> >> 'squared_epsilon_insensitive', and doesn't seem to give a
>>> >> >> >> logistic function.
>>> >> >> >>
>>> >> >> >> In principle, GLMs allow this, but scikit's docs say its GLM
>>> >> >> >> models only allow strictly linear functions of their inputs and
>>> >> >> >> don't allow a logistic link function. The docs direct people to
>>> >> >> >> the LogisticRegression class for this case.
>>> >> >> >>
>>> >> >> >> In R, there is:
>>> >> >> >>
>>> >> >> >>   glm(Total_Service_Points_Won/Total_Service_Points_Played ~ ...,
>>> >> >> >>       family = binomial(link=logit),
>>> >> >> >>       weights = Total_Service_Points_Played)
>>> >> >> >>
>>> >> >> >> which would be ideal.
>>> >> >> >>
>>> >> >> >> Is something similar available in scikit? (Or any continuous
>>> >> >> >> model that takes a 0 to 1 target and outputs a 0 to 1 target?)
>>> >> >> >>
>>> >> >> >> I was surprised to see that the implementation of
>>> >> >> >> CalibratedClassifierCV(method="sigmoid") uses an internal
>>> >> >> >> implementation of logistic regression to do its logistic
>>> >> >> >> regression -- which I can use, although I'd prefer to use a
>>> >> >> >> user-facing library.
>>> >> >> >>
>>> >> >> >> Thanks,
>>> >> >> >> - Stuart
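For readers landing here from the original question: statsmodels does cover the R glm(successes/trials ~ ..., weights=trials) pattern, since its Binomial family accepts a two-column endog of (successes, failures). A minimal sketch (mine, not from the thread); won, played and X are illustrative stand-ins for the count and feature arrays.

    # Rough statsmodels equivalent of the R call above: binomial GLM with
    # counts of successes and failures per row instead of proportion + weights.
    import numpy as np
    from statsmodels.genmod.generalized_linear_model import GLM, families

    endog = np.column_stack([won, played - won])   # successes, failures
    model = GLM(endog, X,
                family=families.Binomial(link=families.links.logit))
    result = model.fit()
    print(result.summary())
    prob = result.predict(X)                       # fitted probabilities in [0, 1]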