Re: [R] prediction based on conditional logistic regression, clogit
On 17 Jun 2014, at 16:38 , Therneau, Terry M., Ph.D. thern...@mayo.edu wrote: As Peter D said, the clogit function simply sets up a special data set and then calls coxph, and is based on an identity that the likelihood for the conditional logistic is identical to the likelihood of a Cox model for a specially structured data set. I vacillate on whether this identity is just a mathematical accident or the mark of some deep truth. My take on this is that it is the latter, but in reverse: The Cox likelihood works by splitting the event history into a series of conditional experiments: Given a risk set and that one member must die, work out the likelihood that it is the member that is observed to die. These experiments are clearly not independent since the risk set of one experiment depends on the outcome of the previous ones. However, arguably, it is still sensible to cumulate the information represented by the log-likelihood contributions. I.e. you can do condtional logistic regression with a Cox likelihood because the Cox likelihood _is_ a conditional logistic regression likelihood. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prediction based on conditional logistic regression, clogit
I ususally scan the digest for surv, so missed your question on the first round. You caught predict with a case that I never thought of; I'll look into making it smarter. As Peter D said, the clogit function simply sets up a special data set and then calls coxph, and is based on an identity that the likelihood for the conditional logistic is identical to the likelihood of a Cox model for a specially structured data set. I vacillate on whether this identity is just a mathematical accident or the mark of some deep truth. However it did allow us to use existing code rather than write new routine from scratch. In this special data set all of the follow-up times are set to 1 (any constant value would do). For normal Cox regression a predicted expected number of events depends on the length of follow-up for the subject, so the predict routine expects to find a follow-up time variable in your newdata. In my code, rep(1, 150L) is being mistaken for the name of the time variable, and failure of the confused routine ensues in due course. For the interim: in a conditional logistic the expected number of events is just exp(eta)/(1 + exp(eta)) where eta is the linear predictor. There is no follow-up time to worry about and predict.coxph would have done way too much work, even if it hadn't gotton confused. Use risk - predict(fit, newdata=dat.test, type='risk') risk / (1+risk) Terry Therneau Hi, I am using clogit() from survival package to do conditional logistic regression. I also need to make prediction on an independent dataset to calculate predicted probability. Here is an example: dat - data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), x1=rnorm(150,5,1), x2=rnorm(150,7,1.5)) dat.test - data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), x1=rnorm(90,5,1), x2=rnorm(90,7,1.5)) fit-clogit(status~x1+x2+strata(set),dat) predict(fit,newdata=dat.test,type='expected') Error in Surv(rep(1, 150L), status) : ? Time and status are different lengths Can anyone suggest what's wrong here? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prediction based on conditional logistic regression clogit
On 16 Jun 2014, at 05:22 , array chip arrayprof...@yahoo.com wrote: Hi, I am using clogit() from survival package to do conditional logistic regression. I also need to make prediction on an independent dataset to calculate predicted probability. Here is an example: dat - data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), x1=rnorm(150,5,1), x2=rnorm(150,7,1.5)) dat.test - data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), x1=rnorm(90,5,1), x2=rnorm(90,7,1.5)) fit-clogit(status~x1+x2+strata(set),dat) predict(fit,newdata=dat.test,type='expected') Error in Surv(rep(1, 150L), status) : Time and status are different lengths Can anyone suggest what's wrong here? The direct cause is that clogit() works by using the fact that the likelihood is equivalent to a coxph() likelihood with stratification and all observation lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), status) and that goes belly-up if you apply the same formula to a data set of different length. However, notice that there is no such thing as predict.clogit(), so you are attempting predict.coxph() on a purely formal Cox model. It is unclear to what extent predicted values, in the sense of coxph() are compatible with predictions in conditional logit models. I'm rusty on this, but I think what you want is something like m - model.matrix(~ x1 + x2 - 1, data=dat.test) pp - exp(m %*% coef(fit)) pps - ave(pp, dat.test$set, FUN=sum) pp/pps i.e. the conditional probability that an observation is a case given covariates and that there is on case in each set (in the data given, you have sets of three with one being a case, so all predicted probabilities are close to 0.33). For more general matched sets, I'm not really sure what one wants. Real experts are welcome to chime in. -pd Thanks! John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prediction based on conditional logistic regression clogit
Thank you Peter. Any other suggestions are absolutely welcome!! John From: peter dalgaard pda...@gmail.com Cc: r-help@r-project.org r-help@r-project.org Sent: Monday, June 16, 2014 2:22 AM Subject: Re: [R] prediction based on conditional logistic regression clogit Hi, I am using clogit() from survival package to do conditional logistic regression. I also need to make prediction on an independent dataset to calculate predicted probability. Here is an example: dat - data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), x1=rnorm(150,5,1), x2=rnorm(150,7,1.5)) dat.test - data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), x1=rnorm(90,5,1), x2=rnorm(90,7,1.5)) fit-clogit(status~x1+x2+strata(set),dat) predict(fit,newdata=dat.test,type='expected') Error in Surv(rep(1, 150L), status) : Time and status are different lengths Can anyone suggest what's wrong here? The direct cause is that clogit() works by using the fact that the likelihood is equivalent to a coxph() likelihood with stratification and all observation lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), status) and that goes belly-up if you apply the same formula to a data set of different length. However, notice that there is no such thing as predict.clogit(), so you are attempting predict.coxph() on a purely formal Cox model. It is unclear to what extent predicted values, in the sense of coxph() are compatible with predictions in conditional logit models. I'm rusty on this, but I think what you want is something like m - model.matrix(~ x1 + x2 - 1, data=dat.test) pp - exp(m %*% coef(fit)) pps - ave(pp, dat.test$set, FUN=sum) pp/pps i.e. the conditional probability that an observation is a case given covariates and that there is on case in each set (in the data given, you have sets of three with one being a case, so all predicted probabilities are close to 0.33). For more general matched sets, I'm not really sure what one wants. Real experts are welcome to chime in. -pd Thanks! John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prediction based on conditional logistic regression clogit
peter dalgaard pdalgd at gmail.com writes: On 16 Jun 2014, at 05:22 , array chip arrayprofile at yahoo.com wrote: Hi, I am using clogit() from survival package to do conditional logistic regression. I also need to make prediction on an independent dataset to calculate predicted probability. Here is an example: [snip] Can anyone suggest what's wrong here? The direct cause is that clogit() works by using the fact that the likelihood is equivalent to a coxph() likelihood with stratification and all observation lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), status) and that goes belly-up if you apply the same formula to a data set of different length. However, notice that there is no such thing as predict.clogit(), so you are attempting predict.coxph() on a purely formal Cox model. It is unclear to what extent predicted values, in the sense of coxph() are compatible with predictions in conditional logit models. I'm rusty on this, but I think what you want is something like m - model.matrix(~ x1 + x2 - 1, data=dat.test) pp - exp(m %*% coef(fit)) pps - ave(pp, dat.test$set, FUN=sum) pp/pps i.e. the conditional probability that an observation is a case given covariates and that there is on case in each set (in the data given, you have sets of three with one being a case, so all predicted probabilities are close to 0.33). For more general matched sets, I'm not really sure what one wants. Real experts are welcome to chime in. For the general situation of n cases in a stratum of size N, you want the probability that the unit in question is one of n units drawn from a stratum of size N without replacement with unequal probabilities of selection over the units. I am *not* an expert on that, but there is plenty written on it. Horvitz, Daniel G., and Donovan J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47.260 (1952): 663-685. is a place to start. The probability in question is a sum over the factorial(n)*choose(N-1,n-1)) elements corresponding to the number of samples (and orders) that include a chosen element. Of course, for n=1 there is just the one element, pp/pps. HTH, Chuck __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.