Frank E Harrell Jr Professor and ChairmanSchool of Medicine
Department of Biostatistics Vanderbilt University
On Mon, 30 Aug 2010, Ravi Varadhan wrote:
Hi,
I fit a Cox PH model to estimate the cause-specific hazards (in a competing
risks setting). Then , I compute the survival estimates for all the
individuals in my data set using the `survfit' function. I am currently
playing with a data set that has about 6000 observations and 12 covariates. I
am finding that the survfit function is very slow.
Here is a simple simulation example (modified from Frank Harrell's example for
`cph') that illustrates the problem:
#n <- 500
set.seed(4321)
age <- 50 + 12*rnorm(n)
sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4)))
cens <- 5 * runif(n)
h <- 0.02 * exp(0.04 * (age-50) + 0.8 * (sex=='Female'))
dt <- -log(runif(n))/h
e <- ifelse(dt <= cens, 1, 0)
dt <- pmin(dt, cens)
Srv <- Surv(dt, e)
f <- coxph(Srv ~ age + sex, x=TRUE, y=TRUE)
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
When I run the above code with sample sizes, n, taking on values of 500, 1000,
2000, and 4000, the time it takes for survfit to run are as follows:
# n <- 500
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
0.160.000.15
# n <- 1000
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
1.450.001.48
# n <- 2000
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
10.190.00 10.25
# n <- 4000
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
72.870.05 74.87
I eventually want to use `survfit' on a data set with roughly 50K observations,
which I am afraid is going to be painfully slow. I would much appreciate
hints/suggestions on how to make `survfit' faster or any other faster
alternatives.
Ravi,
If you don't need standard errors/confidence limits, the rms package's
survest and related functions can speed things up greatly if you fit
the model using cph(, surv=TRUE). [cph calls coxph, and calls
survfit once to estimate the underlying survival curve].
Frank
Thanks.
Best,
Ravi.
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University
Ph. (410) 502-2619
email: rvarad...@jhmi.edu
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.