On Wed, 28 Jul 2004, Mayeul KAUFFMANN wrote:

> > If you can get the conditional independence (martingaleness) then, yes,
> > BIC is fine.
> >
> > One way to check might be to see how similar the standard errors are with
> > and without the cluster(id) term.
>
> (Thank you "again !", Thomas.)
>
> At first look, the values seemed very similar (see below, case 2).
> However, to check this without being too subjective, and without a
> specific test, I needed other values to assess the size of the
> differences: what is similar, what is not?
>

I think the econometricians have theory for this (comparing the whole
covariance matrices).

	-thomas

> ===============================================================================
> CASE 1
> I first estimated the model without modeling dependence:
>
> Call:
> coxph(formula = Surv(start, stop, status) ~ cluster(ccode) +
>     pop + pib + pib2 + crois + instab.x1 + instab.autres, data = xstep)
>
>
>                  coef exp(coef) se(coef) robust se     z       p
> pop            0.3606     1.434   0.0978    0.1182  3.05 2.3e-03
> pib           -0.5947     0.552   0.1952    0.1828 -3.25 1.1e-03
> pib2          -0.4104     0.663   0.1452    0.1270 -3.23 1.2e-03
> crois         -0.0592     0.943   0.0245    0.0240 -2.46 1.4e-02
> instab.x1      2.2059     9.079   0.4692    0.4097  5.38 7.3e-08
> instab.autres  0.9550     2.599   0.4700    0.4936  1.93 5.3e-02
>
> Likelihood ratio test=74  on 6 df, p=6.2e-14  n= 7286
>
> There seems to be a strong linear relationship between the naive standard
> errors (se) and the robust se.
>
> > summary(lm(sqrt(diag(cox1$var))~ sqrt(diag(cox1$naive.var)) -1))
> Coefficients:
>                            Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var))  0.96103    0.04064   23.65 2.52e-06 ***
> Multiple R-Squared: 0.9911,     Adjusted R-squared: 0.9894
>
>
> ===============================================================================
> CASE 2
>
> Then I added a variable (pxcw) measuring the proximity of the previous
> event (0 < pxcw < 1)
>
> n= 7286
>                  coef exp(coef) se(coef) robust se     z       p
> pxcw           0.9063     2.475   0.4267    0.4349  2.08 3.7e-02
> pop            0.3001     1.350   0.1041    0.1295  2.32 2.0e-02
> pib           -0.5485     0.578   0.2014    0.1799 -3.05 2.3e-03
> pib2          -0.4033     0.668   0.1450    0.1152 -3.50 4.6e-04
> crois         -0.0541     0.947   0.0236    0.0227 -2.38 1.7e-02
> instab.x1      1.9649     7.134   0.4839    0.4753  4.13 3.6e-05
> instab.autres  0.8498     2.339   0.4693    0.4594  1.85 6.4e-02
>
> Likelihood ratio test=78.3  on 7 df, p=3.04e-14  n= 7286
>
>
>                            Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var))  0.98397    0.02199   44.74 8.35e-09 ***
> Multiple R-Squared: 0.997,      Adjusted R-squared: 0.9965
>
> The naive standard errors (se) seem closer to the robust se than they were
> when dependence was not modeled: the slope 0.98397 is very close to one,
> R^2 grew, etc.
> The dependence is high (risk is multiplied by 2.475 the day after an
> event), but conditional independence (given covariates) seems hard to
> reject.
>
>
> ===============================================================================
> CASE 3
> Finally, I compared these results with those without repeated events
> (which gives a smaller dataset). A country is removed as soon as we
> observe its first event.
> (robust se is still computed, even if naive se should in fact be used here
> to compute the p-value)
>
> coxph(formula = Surv(start, stop, status) ~ cluster(ccode) +
>     pop + pib + pib2 + crois + instab.x1 + instab.autres,
>     data = xstep[no.previous.event, ])
>
>                  coef exp(coef) se(coef) robust se     z       p
> pop            0.4236     1.528   0.1030    0.1157  3.66 2.5e-04
> pib           -0.7821     0.457   0.2072    0.1931 -4.05 5.1e-05
> pib2          -0.3069     0.736   0.1477    0.1254 -2.45 1.4e-02
> crois         -0.0432     0.958   0.0281    0.0258 -1.67 9.5e-02
> instab.x1      1.9925     7.334   0.5321    0.3578  5.57 2.6e-08
> instab.autres  1.3571     3.885   0.5428    0.5623  2.41 1.6e-02
>
> Likelihood ratio test=66.7  on 6 df, p=1.99e-12  n=5971 (2466 observations
> deleted due to missing)
>
>
> > summary(lm(sqrt(diag(cox1$var))~ sqrt(diag(cox1$naive.var)) -1))
>                            Estimate Std. Error t value Pr(>|t|)
> sqrt(diag(cox1$naive.var))  0.86682    0.07826   11.08 0.000104 ***
> Residual standard error: 0.06328 on 5 degrees of freedom
> Multiple R-Squared: 0.9608,     Adjusted R-squared: 0.953
>
>
> There seems to be no evidence that the robust se differs more from the
> naive se in case 2 than in case 3 (or case 1).
> If anything, the two are closer in case 2.
>
> I conclude that conditional independence (martingaleness) cannot be
> rejected in CASE 2, when modeling the dependence between events with a
> covariate.
>
> Mayeul KAUFFMANN
> Univ. Pierre Mendes France
> Grenoble - France
>
>
> > > Then, there is still another option. In fact, I already modelled
> > > explicitly the influence of past events with a "proximity of last event"
> > > covariate, assuming the dependence on the last event decreases at a
> > > constant rate (for instance, the proximity covariate varies from 1 to 0.5
> > > in the first 10 years after an event, then from 0.5 to 0.25 in the next
> > > ten years, etc).
> > >
> > > With a well chosen model of the dependence effect, the events
> > > become conditionally independent, I do not need a +cluster(id) term, and I
> > > can use fit$loglik to make a covariate selection based on BIC, right?
> >
> > If you can get the conditional independence (martingaleness) then, yes,
> > BIC is fine.
> >
> > One way to check might be to see how similar the standard errors are with
> > and without the cluster(id) term.
>
> ______________________________________________
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

Thomas Lumley			Assoc. Professor, Biostatistics
[EMAIL PROTECTED]	University of Washington, Seattle
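
A minimal sketch, in base R, of the naive-versus-robust comparison discussed in this thread. It is not the original analysis: the standard errors are simply typed in from the printed coxph tables (so they are rounded to four decimals), and the names `se`, `se_slope`, `slopes` and `ratios` are hypothetical, chosen for illustration. The no-intercept regression mirrors the `lm(... - 1)` call in the message, and the slopes should come out close to the reported 0.961, 0.984 and 0.867.

```r
# Standard errors copied from the printed tables in the message
# (assumption: rounded values, so results match only approximately).
se <- list(
  case1 = data.frame(
    naive  = c(0.0978, 0.1952, 0.1452, 0.0245, 0.4692, 0.4700),
    robust = c(0.1182, 0.1828, 0.1270, 0.0240, 0.4097, 0.4936)
  ),
  case2 = data.frame(
    naive  = c(0.4267, 0.1041, 0.2014, 0.1450, 0.0236, 0.4839, 0.4693),
    robust = c(0.4349, 0.1295, 0.1799, 0.1152, 0.0227, 0.4753, 0.4594)
  ),
  case3 = data.frame(
    naive  = c(0.1030, 0.2072, 0.1477, 0.0281, 0.5321, 0.5428),
    robust = c(0.1157, 0.1931, 0.1254, 0.0258, 0.3578, 0.5623)
  )
)

# Hypothetical helper: slope of robust se regressed on naive se,
# with no intercept (same model form as the lm(... - 1) call above).
se_slope <- function(d) unname(coef(lm(robust ~ naive - 1, data = d)))

slopes <- sapply(se, se_slope)
print(round(slopes, 3))

# Per-coefficient ratio robust/naive; values far from 1 point to the
# coefficients for which the cluster correction matters most.
ratios <- lapply(se, function(d) round(d$robust / d$naive, 2))
print(ratios)
```

The per-coefficient ratios are a rough complement to the single slope: in case 3, for example, the ratio for instab.x1 is about 0.67, which seems to account for much of the lower slope there. With a fitted model at hand, the same ratios could presumably be taken from `sqrt(diag(fit$var))` and `sqrt(diag(fit$naive.var))` directly.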