Re: [R] Trouble Computing Type III SS in a Cox Regression
Sigh. Message: 50 Date: Fri, 26 Apr 2013 10:13:52 +1200 From: Rolf Turner rolf.tur...@xtra.co.nz To: Terry Therneau thern...@mayo.edu Cc: r-help@r-project.org, Achim Zeileis achim.zeil...@uibk.ac.at Subject: Re: [R] Trouble Computing Type III SS in a Cox Regression Message-ID: 5179aaa0.8060...@xtra.co.nz Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 26/04/13 03:40, Terry Therneau wrote: (In response to a question about computing type III sums of squares in a Cox regression): SNIP If you have customers who think that the earth is flat, global warming is a conspiracy, or that type III has special meaning this is a re-education issue, and I can't much help with that. Fortune nomination! cheers, Rolf --- On Thu, 4/25/13, Terry Therneau thern...@mayo.edu wrote: From: Terry Therneau thern...@mayo.edu Subject: Re: Trouble Computing Type III SS in a Cox Regression To: Paul Miller pjmiller...@yahoo.com, r-help@R-project.org Received: Thursday, April 25, 2013, 10:40 AM You've missed the point of my earlier post, which is that type III is not an answerable question. 1. There are lots of ways to compare Cox models, LRT is normally considered the most reliable by serious authors. There is usually not much difference between score, Wald, and LRT tests though, and the other two are more convenient in many situations. 2. Type III is a question that can't be addressed. SAS prints something out with that label, but since they don't document what it is, and people with in-depth knowlegde of Cox models (like me) cannot figure out what a sensible definition could actually be, there is nowhere to go. How to do this in R can't be answered. (It has nothing to do with interactions.) 3. If you have customers who think that the earth is flat, global warming is a conspiracy, or that type III has special meaning this is a re-education issue, and I can't much help with that. Terry T. On 04/25/2013 07:59 AM, Paul Miller wrote Hi Dr. Therneau, Thanks for your reply to my question. I'm aware that many on the list do not like type III SS. I'm not particularly attached to the idea of using them but often produce output for others who see value in type III SS. You mention the problems with type III SS when testing interactions. I don't think we'll be doing that here though. So my type III SS could just as easily be called type II SS I think. If the SS I'm calculating are essentially type II SS, is that still problematic for a Cox model? People using type III SS generally want a measure of whether or not a variable is contributing something to their model or if it could just as easily be discarded. Is there a better way of addressing this question than by using type III (or perhaps type II) SS? A series of model comparisons using a LRT might be the answer. If it is, is there an efficient way of implementing this approach when there are many predictors? Another approach might be to run models through step or stepAIC in order to determine which predictors are useful and to discard the rest. Is that likely to be any good? Thanks, Paul __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trouble Computing Type III SS in a Cox Regression
Seconded John Kane Kingston ON Canada -Original Message- From: rolf.tur...@xtra.co.nz Sent: Fri, 26 Apr 2013 10:13:52 +1200 To: thern...@mayo.edu Subject: Re: [R] Trouble Computing Type III SS in a Cox Regression On 26/04/13 03:40, Terry Therneau wrote: (In response to a question about computing type III sums of squares in a Cox regression): SNIP If you have customers who think that the earth is flat, global warming is a conspiracy, or that type III has special meaning this is a re-education issue, and I can't much help with that. Fortune nomination! cheers, Rolf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. FREE ONLINE PHOTOSHARING - Share your photos online with your friends and family! Visit http://www.inbox.com/photosharing to find out more! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trouble Computing Type III SS in a Cox Regression
Hi Dr. Therneau, Thanks for your reply to my question. I'm aware that many on the list do not like type III SS. I'm not particularly attached to the idea of using them but often produce output for others who see value in type III SS. You mention the problems with type III SS when testing interactions. I don't think we'll be doing that here though. So my type III SS could just as easily be called type II SS I think. If the SS I'm calculating are essentially type II SS, is that still problematic for a Cox model? People using type III SS generally want a measure of whether or not a variable is contributing something to their model or if it could just as easily be discarded. Is there a better way of addressing this question than by using type III (or perhaps type II) SS? A series of model comparisons using a LRT might be the answer. If it is, is there an efficient way of implementing this approach when there are many predictors? Another approach might be to run models through step or stepAIC in order to determine which predictors are useful and to discard the rest. Is that likely to be any good? Thanks, Paul --- On Wed, 4/24/13, Terry Therneau thern...@mayo.edu wrote: From: Terry Therneau thern...@mayo.edu Subject: Re: Trouble Computing Type III SS in a Cox Regression To: r-help@r-project.org, Paul Miller pjmiller...@yahoo.com Received: Wednesday, April 24, 2013, 5:55 PM I should hope that there is trouble, since type III is an undefined concept for a Cox model. Since SAS Inc fostered the cult of type III they have recently added it as an option for phreg, but I am not able to find any hints in the phreg documentation of what exactly they are doing when you invoke it. If you can unearth this information, then I will be happy to tell you whether a. using the test (whatever it is) makes any sense at all for your data set b. if a is true, how to get it out of R I use the word cult on purpose -- an entire generation of users who believe in the efficacy of this incantation without having any idea what it actually does. In many particular instances the SAS type III corresponds to a survey sampling question, i.e., reweight the data so that it is balanced wrt factor A and then test factor B in the new sample. The three biggest problems with type III are that 1: the particular test has been hyped as better when in fact it sometimes is sensible and sometimes not, 2: SAS implemented it as a computational algorithm which unfortunately often works even when the underlying rationale does not hold and 3: they explain it using a notation that completely obscures the actual question. This last leads to the nonsense phrase test for main effects in the presence of interactions. There is a survey reweighted approach for Cox models, very closely related to the work on causal inference (marginal structural models), but I'd bet dollars to donuts that this is not what SAS is doing. (Per 2 -- type III was a particular order of operations of the sweep algorithm for linear models, and for backwards compatability that remains the core definition even as computational algorthims have left sweep behind. But Cox models can't be computed using the sweep algorithm). Terry Therneau On 04/24/2013 12:41 PM, r-help-requ...@r-project.org wrote: Hello All, Am having some trouble computing Type III SS in a Cox Regression using either drop1 or Anova from the car package. Am hoping that people will take a look to see if they can tell what's going on. Here is my R code: cox3grp- subset(survData, Treatment %in% c(DC, DA, DO), c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2)) cox3grp- droplevels(cox3grp) str(cox3grp) coxCV- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp, method = efron) coxCV drop1(coxCV, test=Chisq) require(car) Anova(coxCV, type=III) And here are my results: cox3grp- subset(survData, + Treatment %in% c(DC, DA, DO), + c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2)) cox3grp- droplevels(cox3grp) str(cox3grp) 'data.frame': 227 obs. of 6 variables: $ PTNO : int 1195997 104625 106646 1277507 220506 525343 789119 817160 824224 82632 ... $ Treatment : Factor w/ 3 levels DC,DA,DO: 1 1 1 1 1 1 1 1 1 1 ... $ PFS_CENSORED: int 1 1 1 0 1 1 1 1 0 1 ... $ PFS_MONTHS : num 1.12 8.16 6.08 1.35 9.54 ... $ AGE : num 72 71 80 65 72 60 63 61 71 70 ... $ PS2 : Ord.factor w/ 2 levels YesNo: 2 2 2 2 2 2 2 2 2 2 ... coxCV- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp, method = efron) coxCV Call: coxph(formula = Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data = cox3grp, method = efron) coef exp(coef) se(coef) z p AGE 0.00492 1.005 0.00789 0.624 0.530 PS2.L -0.34523
Re: [R] Trouble Computing Type III SS in a Cox Regression
Please take this discussion offlist. It is **not** about R. -- Bert On Thu, Apr 25, 2013 at 5:59 AM, Paul Miller pjmiller...@yahoo.com wrote: Hi Dr. Therneau, Thanks for your reply to my question. I'm aware that many on the list do not like type III SS. I'm not particularly attached to the idea of using them but often produce output for others who see value in type III SS. You mention the problems with type III SS when testing interactions. I don't think we'll be doing that here though. So my type III SS could just as easily be called type II SS I think. If the SS I'm calculating are essentially type II SS, is that still problematic for a Cox model? People using type III SS generally want a measure of whether or not a variable is contributing something to their model or if it could just as easily be discarded. Is there a better way of addressing this question than by using type III (or perhaps type II) SS? A series of model comparisons using a LRT might be the answer. If it is, is there an efficient way of implementing this approach when there are many predictors? Another approach might be to run models through step or stepAIC in order to determine which predictors are useful and to discard the rest. Is that likely to be any good? Thanks, Paul --- On Wed, 4/24/13, Terry Therneau thern...@mayo.edu wrote: From: Terry Therneau thern...@mayo.edu Subject: Re: Trouble Computing Type III SS in a Cox Regression To: r-help@r-project.org, Paul Miller pjmiller...@yahoo.com Received: Wednesday, April 24, 2013, 5:55 PM I should hope that there is trouble, since type III is an undefined concept for a Cox model. Since SAS Inc fostered the cult of type III they have recently added it as an option for phreg, but I am not able to find any hints in the phreg documentation of what exactly they are doing when you invoke it. If you can unearth this information, then I will be happy to tell you whether a. using the test (whatever it is) makes any sense at all for your data set b. if a is true, how to get it out of R I use the word cult on purpose -- an entire generation of users who believe in the efficacy of this incantation without having any idea what it actually does. In many particular instances the SAS type III corresponds to a survey sampling question, i.e., reweight the data so that it is balanced wrt factor A and then test factor B in the new sample. The three biggest problems with type III are that 1: the particular test has been hyped as better when in fact it sometimes is sensible and sometimes not, 2: SAS implemented it as a computational algorithm which unfortunately often works even when the underlying rationale does not hold and 3: they explain it using a notation that completely obscures the actual question. This last leads to the nonsense phrase test for main effects in the presence of interactions. There is a survey reweighted approach for Cox models, very closely related to the work on causal inference (marginal structural models), but I'd bet dollars to donuts that this is not what SAS is doing. (Per 2 -- type III was a particular order of operations of the sweep algorithm for linear models, and for backwards compatability that remains the core definition even as computational algorthims have left sweep behind. But Cox models can't be computed using the sweep algorithm). Terry Therneau On 04/24/2013 12:41 PM, r-help-requ...@r-project.org wrote: Hello All, Am having some trouble computing Type III SS in a Cox Regression using either drop1 or Anova from the car package. Am hoping that people will take a look to see if they can tell what's going on. Here is my R code: cox3grp- subset(survData, Treatment %in% c(DC, DA, DO), c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2)) cox3grp- droplevels(cox3grp) str(cox3grp) coxCV- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp, method = efron) coxCV drop1(coxCV, test=Chisq) require(car) Anova(coxCV, type=III) And here are my results: cox3grp- subset(survData, + Treatment %in% c(DC, DA, DO), + c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2)) cox3grp- droplevels(cox3grp) str(cox3grp) 'data.frame':227 obs. of 6 variables: $ PTNO: int 1195997 104625 106646 1277507 220506 525343 789119 817160 824224 82632 ... $ Treatment : Factor w/ 3 levels DC,DA,DO: 1 1 1 1 1 1 1 1 1 1 ... $ PFS_CENSORED: int 1 1 1 0 1 1 1 1 0 1 ... $ PFS_MONTHS : num 1.12 8.16 6.08 1.35 9.54 ... $ AGE : num 72 71 80 65 72 60 63 61 71 70 ... $ PS2 : Ord.factor w/ 2 levels YesNo: 2 2 2 2 2 2 2 2 2 2 ... coxCV- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp, method = efron) coxCV Call: coxph(formula = Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data = cox3grp, method = efron
Re: [R] Trouble Computing Type III SS in a Cox Regression
You've missed the point of my earlier post, which is that type III is not an answerable question. 1. There are lots of ways to compare Cox models, LRT is normally considered the most reliable by serious authors. There is usually not much difference between score, Wald, and LRT tests though, and the other two are more convenient in many situations. 2. Type III is a question that can't be addressed. SAS prints something out with that label, but since they don't document what it is, and people with in-depth knowlegde of Cox models (like me) cannot figure out what a sensible definition could actually be, there is nowhere to go. How to do this in R can't be answered. (It has nothing to do with interactions.) 3. If you have customers who think that the earth is flat, global warming is a conspiracy, or that type III has special meaning this is a re-education issue, and I can't much help with that. Terry T. On 04/25/2013 07:59 AM, Paul Miller wrote Hi Dr. Therneau, Thanks for your reply to my question. I'm aware that many on the list do not like type III SS. I'm not particularly attached to the idea of using them but often produce output for others who see value in type III SS. You mention the problems with type III SS when testing interactions. I don't think we'll be doing that here though. So my type III SS could just as easily be called type II SS I think. If the SS I'm calculating are essentially type II SS, is that still problematic for a Cox model? People using type III SS generally want a measure of whether or not a variable is contributing something to their model or if it could just as easily be discarded. Is there a better way of addressing this question than by using type III (or perhaps type II) SS? A series of model comparisons using a LRT might be the answer. If it is, is there an efficient way of implementing this approach when there are many predictors? Another approach might be to run models through step or stepAIC in order to determine which predictors are useful and to discard the rest. Is that likely to be any good? Thanks, Paul __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trouble Computing Type III SS in a Cox Regression
On 26/04/13 03:40, Terry Therneau wrote: (In response to a question about computing type III sums of squares in a Cox regression): SNIP If you have customers who think that the earth is flat, global warming is a conspiracy, or that type III has special meaning this is a re-education issue, and I can't much help with that. Fortune nomination! cheers, Rolf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trouble Computing Type III SS in a Cox Regression using drop1 and Anova
Dear Paul, -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Paul Miller Sent: Wednesday, April 24, 2013 1:18 PM To: r-help@r-project.org Subject: [R] Trouble Computing Type III SS in a Cox Regression using drop1 and Anova Hello All, Am having some trouble computing Type III SS in a Cox Regression using either drop1 or Anova from the car package. Am hoping that people will take a look to see if they can tell what's going on. Here is my R code: . . . Both drop1 and Anova give me a different p-value than I get from coxph for both my two-level ps2 variable and for age. This is not what I would expect based on experience using SAS to conduct similar analyses. Indeed SAS consistently produces the same p-values. Namely the ones I get from coxph. My sense is that I'm probably misusing R in some way but I'm not sure what I'm likely to be doing wrong. SAS prodcues Wald Chi-Square results for its type III tests. Maybe that has something to do with it. Ideally, I'd like to get type III values that match those from coxph. If anyone could help me understand better, that would be greatly appreciated. You've answered your own question: The summary() output gives you Wald tests, and drop1() and Anova() give you LR tests. From ?Anova: test.statistic: ... for a Cox model, whether to calculate LR (partial-likelihood ratio) or Wald tests; ... Thus, if you want the Wald test, you can ask for it (though it escapes me why you prefer it to the LR test). I hope this helps, John --- John Fox Senator McMaster Professor of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trouble Computing Type III SS in a Cox Regression
I should hope that there is trouble, since type III is an undefined concept for a Cox model. Since SAS Inc fostered the cult of type III they have recently added it as an option for phreg, but I am not able to find any hints in the phreg documentation of what exactly they are doing when you invoke it. If you can unearth this information, then I will be happy to tell you whether a. using the test (whatever it is) makes any sense at all for your data set b. if a is true, how to get it out of R I use the word cult on purpose -- an entire generation of users who believe in the efficacy of this incantation without having any idea what it actually does. In many particular instances the SAS type III corresponds to a survey sampling question, i.e., reweight the data so that it is balanced wrt factor A and then test factor B in the new sample. The three biggest problems with type III are that 1: the particular test has been hyped as better when in fact it sometimes is sensible and sometimes not, 2: SAS implemented it as a computational algorithm which unfortunately often works even when the underlying rationale does not hold and 3: they explain it using a notation that completely obscures the actual question. This last leads to the nonsense phrase test for main effects in the presence of interactions. There is a survey reweighted approach for Cox models, very closely related to the work on causal inference (marginal structural models), but I'd bet dollars to donuts that this is not what SAS is doing. (Per 2 -- type III was a particular order of operations of the sweep algorithm for linear models, and for backwards compatability that remains the core definition even as computational algorthims have left sweep behind. But Cox models can't be computed using the sweep algorithm). Terry Therneau On 04/24/2013 12:41 PM, r-help-requ...@r-project.org wrote: Hello All, Am having some trouble computing Type III SS in a Cox Regression using either drop1 or Anova from the car package. Am hoping that people will take a look to see if they can tell what's going on. Here is my R code: cox3grp- subset(survData, Treatment %in% c(DC, DA, DO), c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2)) cox3grp- droplevels(cox3grp) str(cox3grp) coxCV- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp, method = efron) coxCV drop1(coxCV, test=Chisq) require(car) Anova(coxCV, type=III) And here are my results: cox3grp- subset(survData, + Treatment %in% c(DC, DA, DO), + c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2)) cox3grp- droplevels(cox3grp) str(cox3grp) 'data.frame': 227 obs. of 6 variables: $ PTNO: int 1195997 104625 106646 1277507 220506 525343 789119 817160 824224 82632 ... $ Treatment : Factor w/ 3 levels DC,DA,DO: 1 1 1 1 1 1 1 1 1 1 ... $ PFS_CENSORED: int 1 1 1 0 1 1 1 1 0 1 ... $ PFS_MONTHS : num 1.12 8.16 6.08 1.35 9.54 ... $ AGE : num 72 71 80 65 72 60 63 61 71 70 ... $ PS2 : Ord.factor w/ 2 levels YesNo: 2 2 2 2 2 2 2 2 2 2 ... coxCV- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp, method = efron) coxCV Call: coxph(formula = Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data = cox3grp, method = efron) coef exp(coef) se(coef) z p AGE0.00492 1.005 0.00789 0.624 0.530 PS2.L -0.34523 0.708 0.14315 -2.412 0.016 Likelihood ratio test=5.66 on 2 df, p=0.0591 n= 227, number of events= 198 drop1(coxCV, test=Chisq) Single term deletions Model: Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2 DfAICLRT Pr(Chi) none 1755.2 AGE 1 1753.6 0.3915 0.53151 PS2 1 1758.4 5.2364 0.02212 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 require(car) Anova(coxCV, type=III) Analysis of Deviance Table (Type III tests) LR Chisq Df Pr(Chisq) AGE 0.3915 10.53151 PS2 5.2364 10.02212 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Both drop1 and Anova give me a different p-value than I get from coxph for both my two-level ps2 variable and for age. This is not what I would expect based on experience using SAS to conduct similar analyses. Indeed SAS consistently produces the same p-values. Namely the ones I get from coxph. My sense is that I'm probably misusing R in some way but I'm not sure what I'm likely to be doing wrong. SAS prodcues Wald Chi-Square results for its type III tests. Maybe that has something to do with it. Ideally, I'd like to get type III values that match those from coxph. If anyone could help me understand better, that would be greatly appreciated. Thanks, Paul __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide