Hello R-users. I believe that the way basehaz (in the survival package) computes the baseline hazard function is incorrect.
I came to question this function when it gave me hazard probabilities greater than 1. Looking at the code, I think I have located the error. The hazard is computed as:

    H <- -log(surv)

but it seems to me that the hazard probability is rather an instantaneous rate, i.e. the probability of failing at step i given survival up to step i-1, which could be computed this way:

    H[i] <- 1 - surv[i] / surv[i-1]

Using this rule I obtain satisfactory results with the following two functions:

    # Convert a survival curve into per-step hazard probabilities.
    surv2haz <- function(surv) {
      haz <- surv
      haz[1] <- 1 - surv[1]
      # seq_along(surv)[-1] is empty when length(surv) == 1,
      # whereas 2:length(surv) would run backwards as c(2, 1).
      for (i in seq_along(surv)[-1]) {
        haz[i] <- 1 - surv[i] / surv[i - 1]
      }
      return(haz)
    }

    # Inverse operation: rebuild the survival curve from the hazards.
    haz2surv <- function(haz) {
      surv <- haz
      surv[1] <- 1 - haz[1]
      for (i in seq_along(haz)[-1]) {
        surv[i] <- (1 - haz[i]) * surv[i - 1]
      }
      return(surv)
    }

If I'm right, wouldn't it be a good idea to change the basehaz function, to avoid misleading the overconfident user (as I happen to be)?

I hope this helps contribute to a wonderful tool that speeds up my understanding of statistical analysis and my research.

David

--
David Mas
ERMES-FRE 2887-CNRS
Université Pantheon-Assas Paris II
12, place du Pantheon
F-75230 Paris Cedex 05
Tel: +33 (0)1 44 41 89 91
Mob: +33 (0)6 84 15 77 67
Fax: +33 (0)1 40 51 81 30
http://www.u-paris2.fr/ermes/
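
For comparison, here is a minimal sketch of how the two quantities discussed above relate. It uses made-up survival values rather than output from a fitted model. It assumes that -log(surv) is read as a cumulative hazard, which is not a probability and so is not bounded by 1, while 1 - surv[i] / surv[i-1] is a per-step conditional probability. All variable names are illustrative only.

```r
# Hypothetical survival probabilities at successive event times (made-up numbers).
surv <- c(0.9, 0.6, 0.3, 0.1)

# The quantity H <- -log(surv), read here as a cumulative hazard.
cumhaz <- -log(surv)

# Per-step conditional event probability: 1 - S[i] / S[i-1], taking S[0] = 1.
prev <- c(1, head(surv, -1))
disc_haz <- 1 - surv / prev

# Per-step increments of the cumulative hazard, for comparison.
cumhaz_incr <- diff(c(0, cumhaz))

# Both parameterisations lead back to the same survival curve.
surv_from_disc <- cumprod(1 - disc_haz)
surv_from_cum <- exp(-cumsum(cumhaz_incr))

print(round(cbind(surv, cumhaz, cumhaz_incr, disc_haz), 3))
```

In this sketch the per-step probabilities stay within [0, 1] while the cumulative values grow past 1 as survival falls. The two are linked by disc_haz = 1 - exp(-cumhaz_incr), so either one can be recovered from the other.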