Question: When survfit() is called on a coxph object, the 'n' it returns (n=6) 
is vastly smaller than the number of distinct loans in the dataset.

I am trying to estimate a Cox proportional hazards model for a set of loans 
(over 6000) using time-varying covariates. For these 6000+ loans, I have 
some 62,000 records, each representing a loan over a different period of 
time. I did the following:

resultsOpt <- coxph(Surv(Start,Stop,PrepayDate)~ closingCoupon + loanPurposeId, 
data=latest)

which returned:

Call:
coxph(formula = Surv(Start, Stop, PrepayDate) ~ closingCoupon + 
    loanPurposeId, data = latest)


               coef exp(coef) se(coef)    z       p
closingCoupon 0.101      1.11   0.0271 3.73 1.9e-04
loanPurposeId 0.434      1.54   0.0624 6.96 3.3e-12

Likelihood ratio test=50.3  on 2 df, p=1.18e-11  n= 62297 


which seems fair.


However, when I do:

> survfit(resultsOpt)
Call: survfit.coxph(object = resultsOpt)

      n  events  median 0.95LCL 0.95UCL 
      6     489     Inf     Inf     Inf 

the reported n is 6, whereas the number of distinct loans in the dataset is more like 6554.

My dataset looks like the following when I call it from within R:

> latest[1:5, 1:5]
  Start Stop PrepayDate modBalance closingCoupon
1     6    7          0   811.2769          8.35
2     7    8          0   811.2769          8.35
3     8    9          1   811.2769          8.35
4     4    5          0  2226.0825          8.70
5     5    6          0  2226.0825          8.70


where the first 3 rows represent one loan, and the next 2 rows a new one. Am I 
putting the data in an incorrect format, and if so, how should I correct it? 
Thanks much.
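
To make the layout concrete, here is a minimal sketch that rebuilds the five rows printed above as a toy data frame. The loanId column is hypothetical (the excerpt does not show an identifier column) and is added only to mark which rows belong to which loan. The sketch counts rows, distinct loans, and events separately, since those are the three quantities being compared in this question. It does not attempt to explain what survfit() reports as n.

```r
library(survival)

# Toy data copied from the excerpt above. loanId is a hypothetical column
# added for illustration; it is not shown in the original data.
toy <- data.frame(
  loanId        = c(1, 1, 1, 2, 2),
  Start         = c(6, 7, 8, 4, 5),
  Stop          = c(7, 8, 9, 5, 6),
  PrepayDate    = c(0, 0, 1, 0, 0),
  closingCoupon = c(8.35, 8.35, 8.35, 8.70, 8.70)
)

# Three-argument Surv() builds a counting-process response, one entry per row.
s <- with(toy, Surv(Start, Stop, PrepayDate))

n_rows   <- nrow(toy)                    # number of (Start, Stop] intervals
n_loans  <- length(unique(toy$loanId))   # number of distinct loans
n_events <- sum(toy$PrepayDate)          # number of prepayment events

print(c(rows = n_rows, loans = n_loans, events = n_events))
```

On the full dataset the same three counts would be taken on latest, with whatever column actually identifies a loan in place of loanId.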

Pan

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
