Re: [R] prediction based on conditional logistic regression, clogit

2014-06-18 Thread peter dalgaard

On 17 Jun 2014, at 16:38 , Therneau, Terry M., Ph.D. thern...@mayo.edu wrote:

 As Peter D said, the clogit function simply sets up a special data set and 
 then calls coxph, and is based on an identity that the likelihood for the 
 conditional logistic is identical to the likelihood of a Cox model for a 
 specially structured data set.  I vacillate on whether this identity is just 
 a mathematical accident or the mark of some deep truth.


My take on this is that it is the latter, but in reverse: The Cox likelihood 
works by splitting the event history into a series of conditional experiments: 
Given a risk set and that one member must die, work out the likelihood that it 
is the member that is observed to die. These experiments are clearly not 
independent since the risk set of one experiment depends on the outcome of the 
previous ones. However, arguably, it is still sensible to cumulate the 
information represented by the log-likelihood contributions. 

I.e. you can do condtional logistic regression with a Cox likelihood because 
the Cox likelihood _is_ a conditional logistic regression likelihood.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prediction based on conditional logistic regression, clogit

2014-06-17 Thread Therneau, Terry M., Ph.D.

I ususally scan the digest for surv, so missed your question on the first 
round.
You caught predict with a case that I never thought of; I'll look into making 
it smarter.

As Peter D said, the clogit function simply sets up a special data set and then calls 
coxph, and is based on an identity that the likelihood for the conditional logistic is 
identical to the likelihood of a Cox model for a specially structured data set.  I 
vacillate on whether this identity is just a mathematical accident or the mark of some 
deep truth.  However it did allow us to use existing code rather than write new routine 
from scratch.


In this special data set all of the follow-up times are set to 1 (any constant value would 
do).  For normal Cox regression a predicted expected number of events depends on the 
length of follow-up for the subject, so the predict routine expects to find a follow-up 
time variable in your newdata.  In my code, rep(1, 150L) is being mistaken for the name 
of the time variable, and failure of the confused routine ensues in due course.


For the interim: in a conditional logistic the expected number of events is just 
exp(eta)/(1 + exp(eta)) where eta is the linear predictor.  There is no follow-up time to 
worry about and predict.coxph would have done way too much work, even if it hadn't gotton 
confused.  Use


risk - predict(fit, newdata=dat.test, type='risk')
risk / (1+risk)

Terry Therneau




Hi, I am using clogit() from survival package to do conditional logistic 
regression. I also need to make prediction on an independent dataset to calculate 
predicted probability. Here is an example:



dat - data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), 
x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
dat.test - data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), 
x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
fit-clogit(status~x1+x2+strata(set),dat)
predict(fit,newdata=dat.test,type='expected')

Error in Surv(rep(1, 150L), status) :
?  Time and status are different lengths

Can anyone suggest what's wrong here?



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prediction based on conditional logistic regression clogit

2014-06-16 Thread peter dalgaard

On 16 Jun 2014, at 05:22 , array chip arrayprof...@yahoo.com wrote:

 Hi, I am using clogit() from survival package to do conditional logistic 
 regression. I also need to make prediction on an independent dataset to 
 calculate predicted probability. Here is an example:
 
 
 dat - data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), 
 x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
 dat.test - data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), 
 x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
 fit-clogit(status~x1+x2+strata(set),dat)
 predict(fit,newdata=dat.test,type='expected')
 Error in Surv(rep(1, 150L), status) : 
   Time and status are different lengths
 
 Can anyone suggest what's wrong here?
 


The direct cause is that clogit() works by using the fact that the likelihood 
is equivalent to a coxph() likelihood with stratification and all observation 
lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), 
status) and that goes belly-up if you apply the same formula to a data set of 
different length. 

However, notice that there is no such thing as predict.clogit(), so you are 
attempting predict.coxph() on a purely formal Cox model. It is unclear to what 
extent predicted values, in the sense of coxph() are compatible with 
predictions in conditional logit models.

I'm rusty on this, but I think what you want is something like

m - model.matrix(~ x1 + x2 - 1, data=dat.test)
pp - exp(m %*% coef(fit))
pps - ave(pp, dat.test$set, FUN=sum)
pp/pps

i.e. the conditional probability that an observation is a case given covariates 
and that there is on case in each set (in the data given, you have sets of 
three with one being a case, so all predicted probabilities are close to 0.33). 
For more general matched sets, I'm not really sure what one wants. Real experts 
are welcome to chime in.

-pd

 Thanks!
 
 John
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prediction based on conditional logistic regression clogit

2014-06-16 Thread array chip
Thank you Peter. Any other suggestions are absolutely welcome!!

John




 From: peter dalgaard pda...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Monday, June 16, 2014 2:22 AM
Subject: Re: [R] prediction based on conditional logistic regression clogit





 Hi, I am using clogit() from survival package to do conditional logistic 
 regression. I also need to make prediction on an independent dataset to 
 calculate predicted probability. Here is an example:
 
 
 dat - data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), 
 x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
 dat.test - data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), 
 x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
 fit-clogit(status~x1+x2+strata(set),dat)
 predict(fit,newdata=dat.test,type='expected')
 Error in Surv(rep(1, 150L), status) : 
   Time and status are different lengths
 
 Can anyone suggest what's wrong here?
 


The direct cause is that clogit() works by using the fact that the likelihood 
is equivalent to a coxph() likelihood with stratification and all observation 
lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), 
status) and that goes belly-up if you apply the same formula to a data set of 
different length. 

However, notice that there is no such thing as predict.clogit(), so you are 
attempting predict.coxph() on a purely formal Cox model. It is unclear to what 
extent predicted values, in the sense of coxph() are compatible with 
predictions in conditional logit models.

I'm rusty on this, but I think what you want is something like

m - model.matrix(~ x1 + x2 - 1, data=dat.test)
pp - exp(m %*% coef(fit))
pps - ave(pp, dat.test$set, FUN=sum)
pp/pps

i.e. the conditional probability that an observation is a case given covariates 
and that there is on case in each set (in the data given, you have sets of 
three with one being a case, so all predicted probabilities are close to 0.33). 
For more general matched sets, I'm not really sure what one wants. Real experts 
are welcome to chime in.

-pd




 Thanks!
 
 John
     [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prediction based on conditional logistic regression clogit

2014-06-16 Thread Charles Berry
peter dalgaard pdalgd at gmail.com writes:

 
 
 On 16 Jun 2014, at 05:22 , array chip arrayprofile at yahoo.com wrote:
 

  Hi, I am using clogit() from survival package to do conditional
  logistic regression. I also need to make prediction on an
  independent dataset to calculate predicted probability. Here is an
  example:

[snip]

  Can anyone suggest what's wrong here?
  

  The direct cause is that clogit() works by using the fact that the
 likelihood is equivalent to a coxph() likelihood with stratification
 and all observation lengths set to 1. Therefore the analysis is
 formally on Surv(rep(1, 150L), status) and that goes belly-up if you
 apply the same formula to a data set of different length.

  However, notice that there is no such thing as predict.clogit(), so
 you are attempting predict.coxph() on a purely formal Cox model. It
 is unclear to what extent predicted values, in the sense of coxph()
 are compatible with predictions in conditional logit models.

 
 I'm rusty on this, but I think what you want is something like
 
 m - model.matrix(~ x1 + x2 - 1, data=dat.test)
 pp - exp(m %*% coef(fit))
 pps - ave(pp, dat.test$set, FUN=sum)
 pp/pps
 

 i.e. the conditional probability that an observation is a case given
 covariates and that there is on case in each set (in the data given,
 you have sets of three with one being a case, so all predicted
 probabilities are close to 0.33). For more general matched sets, I'm
 not really sure what one wants. Real experts are welcome to chime
 in.

For the general situation of n cases in a stratum of size N, you want the
probability that the unit in question is one of n units drawn from a
stratum of size N without replacement with unequal probabilities of
selection over the units.

I am *not* an expert on that, but there is plenty written on it.

 Horvitz, Daniel G., and Donovan J. Thompson. A generalization of
 sampling without replacement from a finite universe. Journal of the
 American Statistical Association 47.260 (1952): 663-685.

is a place to start.

The probability in question is a sum over the

factorial(n)*choose(N-1,n-1)) 

elements corresponding to the number of samples (and orders) that
include a chosen element.

Of course, for n=1 there is just the one element, pp/pps.

HTH,

Chuck

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.