> First, I do not know how to specify such a link function in R.
> Second, if I can specify such a link, I could use (in place of d*j) the
> smooth baseline estimated after doing a Cox regression. But I don't know
> how to fit (for instance) a piecewise constant baseline hazard with a
> Poisson glm.
I found a possible answer to one of my own questions, sorry for the noise.
Simply adding +factor(year) to the Poisson model will fit a piecewise
constant baseline hazard (one step every year), or
+cut(year,br=seq(1961,1997,by=2)) to have one step every two years.
(Hope the rest is correct...)

Mayeul KAUFFMANN
Univ. Pierre Mendes France
Grenoble - France

----- Original Message -----
From: "Prof Brian Ripley" <[EMAIL PROTECTED]>
To: "Mayeul KAUFFMANN" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, August 13, 2004 8:40 AM
Subject: Re: [R] How to use the whole dataset (including between events) in Cox model (time-varying covariates)?

> This is a consequence of the use of partial likelihood in the Cox model.
> You should read the literature on this point (for example, have you read
> Cox, 1972 and all its discussion, or Andersen, Borgan, Gill & Keiding?).
>
> It is not an R question. You need to make more assumptions, such as a
> smooth baseline hazard, and you can always use parametric models and a
> full likelihood (but you may have to program them yourself).
>
> On Fri, 13 Aug 2004, Mayeul KAUFFMANN wrote:
>
> > Hello,
> >
> > coxph does not use any information that is in the dataset between event
> > times (or "death times"), since computation only occurs at event times.
> >
> > For instance, removing observations when there is no event at that time
> > in the whole dataset does not change the results:
> >
> > > set.seed(1)
> > > data <- as.data.frame(cbind(
> > +   start  = c(1:5, 1:5, 1:4),
> > +   stop   = c(2:6, 2:6, 2:5),
> > +   status = c(rep(0, 7), 1, rep(0, 5), 1),
> > +   id     = c(rep(1, 5), rep(2, 5), rep(3, 4)),
> > +   x1     = rnorm(14)))
> > > data
> >    start stop status id         x1
> > 1      1    2      0  1 -0.6264538
> > 2      2    3      0  1  0.1836433
> > 3      3    4      0  1 -0.8356286
> > 4      4    5      0  1  1.5952808
> > 5      5    6      0  1  0.3295078
> > 6      1    2      0  2 -0.8204684
> > 7      2    3      0  2  0.4874291
> > 8      3    4      1  2  0.7383247
> > 9      4    5      0  2  0.5757814
> > 10     5    6      0  2 -0.3053884
> > 11     1    2      0  3  1.5117812
> > 12     2    3      0  3  0.3898432
> > 13     3    4      0  3 -0.6212406
> > 14     4    5      1  3 -2.2146999
> > > coxph(Surv(start, stop, status) ~ cluster(id) + x1, data = data, robust = T)
> > > coxph(Surv(start, stop, status) ~ cluster(id) + x1,
> > +       data = subset(data, stop %in% 4:5), robust = T)  # the same!!! (except n)
> >
> > First, some data is lost.
> > Second, this loss could be an important problem when there is a
> > time-varying covariate that changes more quickly than events occur.
> > Specifically, I have a covariate which has low values most of the time.
> > It sometimes jumps to high values, and that is hypothesized as greatly
> > increasing the risk of an event.
> > With rare events, the effect of this covariate will only be measured at
> > event times. Chances are that the only time such a covariate is recorded
> > at a high level, the individual for which it is measured is having an
> > event. This may bias the estimated coefficient.
> >
> > Here is my question: how can I fully use the dataset?
> >
> > (That is: how can I have truly _time-varying_ covariates (even if they
> > change step by step, not continuously), not covariates whose changes are
> > measured only at event times?)
> >
> > Ideally, the full dataset would be used to estimate the parameters, or
> > at least to estimate the standard errors of the estimated parameters.
> > Any ideas?
> >
> > A second best (which might require less work) would be to use the whole
> > dataset to assess the predictive power of the model.
> >
> > Maybe by using the expected number of events for an individual over the
> > time interval during which they were observed to be at risk:
> > > predict(coxfit, type = "expected")
> > and comparing it with the observed number of events.
> > (Does it use all the data and take into account the whole baseline
> > hazard, even between events?)
> >
> > Or, if not, following Brian D. Ripley's suggestion about the baseline
> > hazard: "As an approximation you can smooth the fitted
> > baseline cumulative hazard (e.g. by package pspline) and ask for its
> > derivative" (https://stat.ethz.ch/pipermail/r-help/2004-July/052376.html),
> > the following code could be used to estimate (and plot) a smooth
> > baseline hazard:
> > > t <- seq(min(data$start), max(data$stop), length = 100)
> > > lines(t,
> > +   predict(sm.spline(x = basehaz(coxfit)[, 2], y = basehaz(coxfit)[, 1],
> > +                     norder = 2), t, 1))
> > # There is a problem with this code: one should add the constraint that
> > # the baseline hazard cannot be negative.
> >
> > The following computes the parametric part of the Cox model:
> > > risk <- predict(coxfit, type = 'risk')  # gives exp(X'b)
> >
> > Something like
> > > baseline.hazard * risk
> > would give the true risk at any time (but it would probably be much
> > harder to compute), which could help assess the predictive power of
> > the model. (Still a lot of work.)
> >
> > Thanks in advance for any help or comment.
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
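[Editor's note: a runnable sketch of the +factor(year) / Poisson-glm
suggestion from the reply above. The data here are simulated and all
names (dat, x1, the year range, the true hazard values) are illustrative;
each row is assumed to cover exactly one year of exposure, so no
log-exposure offset is needed.]

```r
## Piecewise-constant baseline hazard via a Poisson glm ("Poisson trick").
## One row per subject per year of follow-up (simulated, illustrative).
set.seed(1)
n   <- 200
dat <- data.frame(id   = rep(1:n, each = 5),
                  year = rep(1961:1965, times = n))
dat$x1 <- rnorm(nrow(dat))

## True hazard: piecewise constant in year, log-linear in x1 (illustrative).
base      <- c(0.05, 0.05, 0.10, 0.10, 0.20)[dat$year - 1960]
dat$event <- rbinom(nrow(dat), 1, pmin(1, base * exp(0.5 * dat$x1)))

## One hazard step per year:
fit1 <- glm(event ~ factor(year) + x1, family = poisson, data = dat)

## One step every two years, via cut() as in the message above:
fit2 <- glm(event ~ cut(year, breaks = seq(1960, 1966, by = 2)) + x1,
            family = poisson, data = dat)

coef(fit1)["x1"]  # estimated log hazard ratio for x1
```

If the person-period rows covered intervals of unequal length, one would
add offset(log(interval.length)) to the formula so that the rate, not the
count, is modelled.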
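[Editor's note: and a sketch of the smoothed-baseline-hazard idea,
substituting stats::smooth.spline for pspline::sm.spline (the idea is the
same: smooth the cumulative baseline hazard, then take its derivative),
with the nonnegativity constraint noted in the original message handled
crudely by truncating at zero. The simulated data are illustrative.]

```r
## Smooth baseline hazard from a Cox fit, via the derivative of a
## smoothing spline fitted to the cumulative baseline hazard.
library(survival)
set.seed(3)
n      <- 300
x1     <- rnorm(n)
time   <- rexp(n, rate = 0.1 * exp(0.5 * x1))  # illustrative true model
status <- as.integer(time < 10)
time   <- pmin(time, 10)
coxfit <- coxph(Surv(time, status) ~ x1)

bh <- basehaz(coxfit, centered = FALSE)  # step function: H0(t) vs. time
sm <- smooth.spline(bh$time, bh$hazard)  # smooth the cumulative hazard

tt <- seq(min(bh$time), max(bh$time), length.out = 100)
h0 <- pmax(predict(sm, tt, deriv = 1)$y, 0)  # hazard h0(t), truncated at 0

## The "true risk at any time" from the original message is then roughly
## h0(t) * exp(X'b), with the second factor given by:
risk <- predict(coxfit, type = "risk")
```

Truncation at zero is the simplest fix; a log-scale or monotone
(e.g. I-spline) fit of H0(t) would enforce the constraint more cleanly.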
