Re: [R] R Kaplan-Meier plotting quirks?

Andrews, Chris Wed, 17 Oct 2012 05:47:30 -0700

Mike,

My guess is that you have censored observations in the middle.
When using the minimum time, the events happen before most of the censorings.
The risk set is then still large at each event, so each event lowers the curve
only slightly (an event time with d events among n at risk multiplies the
estimate by 1 - d/n).
When using the maximum time, the events happen after the censorings.
The risk set is then small, so each event lowers the curve sharply.


For example, moving the first event from time 1 to time 5 causes the final 
survival estimate to be lower when using max time (.375) than min time (.533):

library(survival)
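df <- data.frame(mintime = c(1,2,3,4,6), maxtime = c(5,2,3,4,6),
                 Delta = c(1,0,1,0,0))
# conf.int is spelled out; the abbreviation "conf" may be ambiguous in newer
# versions of survival, where plot.survfit has several conf.* arguments
plot(survfit(Surv(mintime,Delta)~1,data=df), conf.int=FALSE, xlim=c(0,7))
lines(survfit(Surv(maxtime,Delta)~1,data=df), col=2)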

> summary(survfit(Surv(mintime,Delta)~1,data=df))
Call: survfit(formula = Surv(mintime, Delta) ~ 1, data = df)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1      5       1    0.800   0.179        0.516            1
    3      3       1    0.533   0.248        0.214            1
> summary(survfit(Surv(maxtime,Delta)~1,data=df))
Call: survfit(formula = Surv(maxtime, Delta) ~ 1, data = df)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    3      4       1    0.750   0.217       0.4259            1
    5      2       1    0.375   0.286       0.0839            1
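
The n.risk and survival columns in the two tables can be checked by hand with
the product-limit formula. The following is only a minimal sketch: km_by_hand
is a hypothetical helper written for this illustration, not a function in the
survival package, and it reproduces only the point estimates (no standard
errors or confidence limits).

```r
# Hand computation of the Kaplan-Meier (product-limit) estimate.
# km_by_hand is a hypothetical helper, not part of the survival package.
km_by_hand <- function(time, status) {
  # distinct times at which at least one event (status == 1) occurs
  event_times <- sort(unique(time[status == 1]))
  # number still under observation just before each event time
  n_risk  <- sapply(event_times, function(t) sum(time >= t))
  # number of events at each event time
  n_event <- sapply(event_times, function(t) sum(time == t & status == 1))
  data.frame(time = event_times, n.risk = n_risk, n.event = n_event,
             survival = cumprod(1 - n_event / n_risk))
}

delta  <- c(1, 0, 1, 0, 0)
km_min <- km_by_hand(c(1, 2, 3, 4, 6), delta)  # first event at time 1
km_max <- km_by_hand(c(5, 2, 3, 4, 6), delta)  # first event moved to time 5
print(km_min)
print(km_max)
```

With the minimum times the two events are divided by risk sets of 5 and 3;
with the maximum times the same two events are divided by risk sets of 4 and
2, which is why the curve ends lower.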

Given that you have interval-censored data, you could consider fitting the 
survival curve with interval-censoring techniques.  For example, survreg fits a 
parametric curve.
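
A rough sketch of what that might look like on the toy data above, not a
recommendation for your real data. It assumes that a lost tag (Delta = 1) was
lost somewhere between mintime and maxtime, and that a retained tag (Delta = 0)
is right censored at mintime. The columns left and right and the choice of a
Weibull distribution (the survreg default) are assumptions made only for this
illustration.

```r
library(survival)

df <- data.frame(mintime = c(1,2,3,4,6), maxtime = c(5,2,3,4,6),
                 Delta = c(1,0,1,0,0))

# "interval2" coding: (left, right) brackets the event time;
# right = NA means right censored, left == right means an exact time.
df$left  <- df$mintime
df$right <- ifelse(df$Delta == 1, df$maxtime, NA)

# Intercept-only parametric fit (Weibull assumed here for illustration)
fit <- survreg(Surv(left, right, type = "interval2") ~ 1, data = df,
               dist = "weibull")

# Fitted survival curve: survreg's Weibull has shape = 1/scale and
# scale = exp(intercept) in the pweibull parameterisation
tt   <- seq(0, 7, by = 0.1)
surv <- pweibull(tt, shape = 1 / fit$scale, scale = exp(unname(coef(fit))),
                 lower.tail = FALSE)
```

lines(tt, surv, lty = 2) would overlay this curve on the Kaplan-Meier plot
drawn earlier. With only five observations the fit is purely illustrative.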

Chris

-----Original Message-----
From: Michael Rentz [mailto:rent0...@umn.edu] 
Sent: Tuesday, October 16, 2012 12:36 PM
To: r-help@r-project.org
Subject: [R] R Kaplan-Meier plotting quirks?

Hello. I apologize in advance for the VERY lengthy e-mail. I endeavor to 
include enough detail.

I have a question about survival curves I have been battling off and on for a 
few months. No one local seems to be able to help, so I turn here. The issue 
seems to either be how R calculates Kaplan-Meier Plots, or something with the 
underlying statistic itself that I am misunderstanding. Basically, longer 
survival times are yielding steeper drops in survival than a set of shorter 
survival times but with the same number of loss and retention events.

As a minor part of my research I have been comparing tag survival in marked 
wild rodents. I am comparing a standard ear tag with a relatively new 
technique. The newer tag clearly “wins” using survival tests, but the resultant 
Kaplan-Meier plot does not seem to make sense. Since I am dealing with a wild 
animal and only trapped a few days out of a month the data is fairly messy, 
with gaps in capture history that require assumptions of tag survival. An 
animal that is tagged and recaptured 2 days later with a tag and 30 days later 
without one could have an assumed tag retention of 2 days (minimum confirmed) 
or 30 days (maximum possible).

Both are significant with a survtest, but the K-M plots differ. A plot of 
minimum confirmed (overall harsher data, lots of 0 days and 1 or 2 days) yields 
a curve with a steep initial drop in “survival”, but then a leveling off and 
straight line thereafter at about 80% survival. Plotting the maximum possible 
dates (same number of losses/retention, but retention times are longer, the 
length to the next capture without a tag, typically 25-30 days or more) does 
not show as steep a drop in the first few days, but at about the point where 
the minimum estimate levels off, this one begins dropping steeply. At 400 days 
the plot with minimum confirmed estimates has tag survival of about 80%, 
whereas the plot with the same number of losses but longer assumed survival 
times shows only about 20% survival. Complicating this, of course, is the fact 
that the great majority of the animals die before the tag is lost; survival of 
the rodents is on the order of months.

I really am not sure what is going on, unless somehow the high number of events 
in the first few days followed by few events thereafter leads to the assumption 
that after the initial few days survival of the tag is high. The plotting of 
maximum lengths has a more even distribution of events, rather than a clumping 
in the first few days, so I guess the model assumes relatively constant 
hazards? As an aside, a plot of the mean between the minimum and maximum almost 
mirrors the maximum plot. Adding five days to the minimum when the minimum plus 
5 is less than the maximum returns a plot with a steeper initial drop, but then 
constant thereafter, mimicking the minimum plot, but at a lower final survival 
rate.

Basically, I am at a loss why surviving longer would *decrease* the survival 
rate???

My co-author wants to drop the K-M graph given the confusion, but I think it 
would be odd to publish a survival paper without one. I am not sure which graph 
to use. They say very different things, while the actual statistics do not 
differ that greatly.

I am more than happy to provide the data and code for anyone who would like to 
help if the above is not explanation enough. Thank you in advance.

Mike.


--
Michael S. Rentz
PhD Candidate, Conservation Biology
University of Minnesota
5122 Idlewild Street
Duluth, MN 55804
(218) 525-3299
rent0...@umn.edu


