Re: [Rd] Bugs? when dealing with contrasts
Gabor Grothendieck wrote: On Wed, Apr 21, 2010 at 4:26 PM, Peter Dalgaard pda...@gmail.com wrote: ... I.e., that R reverts to using indicator variables when the intercept is absent. Is there any nice way of getting contr.sum coding for the interaction as opposed to the ugly code in my post that I used to force it? i.e. cbind(1, model.matrix(~ fac)[,2:3] * scores) I think not. In general, an interaction like ~fac:scores indicates three lines with a common intercept and three different slopes, and changing the parametrization is not supposed to change the model, whereas your model inserts a restriction that the slopes sum to zero (if I understand correctly). So if you want to fit ugly models, you get to do a little ugly footwork. (A similar, simpler, issue arises if you want to have a 2x2 design with no effect in one column and/or one row (think clinical trial, placebo vs. active, baseline vs. treated. You can only do this us explicit dummy variables, not with the two classifications represented as factors.) -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Bugs? when dealing with contrasts
On Thu, Apr 22, 2010 at 2:32 AM, Peter Dalgaard pda...@gmail.com wrote: Gabor Grothendieck wrote: On Wed, Apr 21, 2010 at 4:26 PM, Peter Dalgaard pda...@gmail.com wrote: ... I.e., that R reverts to using indicator variables when the intercept is absent. Is there any nice way of getting contr.sum coding for the interaction as opposed to the ugly code in my post that I used to force it? i.e. cbind(1, model.matrix(~ fac)[,2:3] * scores) I think not. In general, an interaction like ~fac:scores indicates three lines with a common intercept and three different slopes, and changing the parametrization is not supposed to change the model, whereas your model inserts a restriction that the slopes sum to zero (if I understand correctly). So if you want to fit ugly models, you get to do a little ugly footwork. OK. Thanks. I guess that's fair. (A similar, simpler, issue arises if you want to have a 2x2 design with no effect in one column and/or one row (think clinical trial, placebo vs. active, baseline vs. treated. You can only do this us explicit dummy variables, not with the two classifications represented as factors.) -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Bugs? when dealing with contrasts
R version 2.10.1 (2009-12-14) Copyright (C) 2009 The R Foundation for Statistical Computing ISBN 3-900051-07-0 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Below are two cases where I don't seem to be getting contr.sum contrasts even though they were specified. Are these bugs? The first case is an interaction between continuous and factor variables. The second case contrasts= was specified as an arg to lm. The second works ok if we set the contrasts through options but not if we set it through an lm argument. # 1. In this case I don't seem to be getting contr.sum contrasts: options(contrasts = c(contr.sum, contr.poly)) getOption(contrasts) [1] contr.sum contr.poly scores - rep(seq(-2, 2), 3); scores [1] -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 fac - gl(3, 5); fac [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 Levels: 1 2 3 # I get this: model.matrix(~ scores:fac) (Intercept) scores:fac1 scores:fac2 scores:fac3 11 -2 0 0 21 -1 0 0 31 0 0 0 41 1 0 0 51 2 0 0 61 0 -2 0 71 0 -1 0 81 0 0 0 91 0 1 0 10 1 0 2 0 11 1 0 0 -2 12 1 0 0 -1 13 1 0 0 0 14 1 0 0 1 15 1 0 0 2 attr(,assign) [1] 0 1 1 1 attr(,contrasts) attr(,contrasts)$fac [1] contr.sum # But I was expecting this since I am using contr.sum cbind(1, model.matrix(~ fac)[,2:3] * scores) fac1 fac2 1 1 -20 2 1 -10 3 100 4 110 5 120 6 10 -2 7 10 -1 8 100 9 101 10 102 11 122 12 111 13 100 14 1 -1 -1 15 1 -2 -2 # 2. # here I don't get contr.sum but rather get contr.treatment options(contrasts = c(contr.treatment, contr.poly)) getOption(contrasts) [1] contr.treatment contr.poly model.matrix(lm(seq(15) ~ fac, contrasts = c(contr.sum, contr.poly))) (Intercept) fac2 fac3 1100 2100 3100 4100 5100 6110 7110 8110 9110 10 110 11 101 12 101 13 101 14 101 15 101 attr(,assign) [1] 0 1 1 attr(,contrasts) attr(,contrasts)$fac [1] contr.treatment model.matrix(lm(seq(15) ~ fac)) # same (Intercept) fac2 fac3 1100 2100 3100 4100 5100 6110 7110 8110 9110 10 110 11 101 12 101 13 101 14 101 15 101 attr(,assign) [1] 0 1 1 attr(,contrasts) attr(,contrasts)$fac [1] contr.treatment # I was expecting the first one to give me this options(contrasts = c(contr.sum, contr.poly)) model.matrix(lm(seq(15) ~ fac)) (Intercept) fac1 fac2 1110 2110 3110 4110 5110 6101 7101 8101 9101 10 101 11 1 -1 -1 12 1 -1 -1 13 1 -1 -1 14 1 -1 -1 15 1 -1 -1 attr(,assign) [1] 0 1 1 attr(,contrasts) attr(,contrasts)$fac [1] contr.sum R.version.string [1] R version 2.10.1 (2009-12-14) win.version() [1] Windows Vista (build 6002) Service Pack 2 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Bugs? when dealing with contrasts
On Wed, Apr 21, 2010 at 4:26 PM, Peter Dalgaard pda...@gmail.com wrote: As for case #1, the rules are tricky in cases where interactions are present without main effects, but AFAICS, what you observe is essentially the same effect as model.matrix(~fac-1, contrasts=list(fac=contr.sum)) fac1 fac2 fac3 1 1 0 0 2 1 0 0 3 1 0 0 4 1 0 0 5 1 0 0 6 0 1 0 7 0 1 0 8 0 1 0 9 0 1 0 10 0 1 0 11 0 0 1 12 0 0 1 13 0 0 1 14 0 0 1 15 0 0 1 attr(,assign) [1] 1 1 1 attr(,contrasts) attr(,contrasts)$fac [1] contr.sum I.e., that R reverts to using indicator variables when the intercept is absent. Is there any nice way of getting contr.sum coding for the interaction as opposed to the ugly code in my post that I used to force it? i.e. cbind(1, model.matrix(~ fac)[,2:3] * scores) __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Bugs? when dealing with contrasts
G'day Gabor, On Wed, 21 Apr 2010 14:00:33 -0400 Gabor Grothendieck ggrothendi...@gmail.com wrote: Below are two cases where I don't seem to be getting contr.sum contrasts even though they were specified. Are these bugs? Short answer: no. :) The first case is an interaction between continuous and factor variables. The second case contrasts= was specified as an arg to lm. The second works ok if we set the contrasts through options but not if we set it through an lm argument. # 1. In this case I don't seem to be getting contr.sum contrasts: options(contrasts = c(contr.sum, contr.poly)) getOption(contrasts) [1] contr.sum contr.poly scores - rep(seq(-2, 2), 3); scores [1] -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 fac - gl(3, 5); fac [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 Levels: 1 2 3 # I get this: model.matrix(~ scores:fac) (Intercept) scores:fac1 scores:fac2 scores:fac3 11 -2 0 0 21 -1 0 0 31 0 0 0 41 1 0 0 51 2 0 0 61 0 -2 0 71 0 -1 0 81 0 0 0 91 0 1 0 10 1 0 2 0 11 1 0 0 -2 12 1 0 0 -1 13 1 0 0 0 14 1 0 0 1 15 1 0 0 2 attr(,assign) [1] 0 1 1 1 attr(,contrasts) attr(,contrasts)$fac [1] contr.sum When creating the model matrix, the various levels of a factor are first coded as indicator variables (for creation of the main effects and for creation of interactions). The contrast matrix is then applied to remove redundant columns from the design matrix; that is, columns that are known to be redundant due to the *design* (i.e. model formula as a whole), depending on whether data is available for all levels of a factor (or combinations of levels if several factors are in the model) you can still end up with a design matrix that does not have full column rank. In this case, since there is no main effect fac, the three columns that are put into the design matrix due to the score:fac term do not create a singularity, hence the contrast matrix is not applied to these three columns. The above description is probably a bit rough and lacking in detail (if not even wrong in some details). IMHO, the best explanation of how R goes from the model formula to the actual design matrix (for linear models) can still be found in MASS; though, if I am not mistaken, current versions of R seem to proceed slightly different to that description in more complicated models (i.e. several factors and continuous explanatory variables). # But I was expecting this since I am using contr.sum cbind(1, model.matrix(~ fac)[,2:3] * scores) fac1 fac2 1 1 -20 2 1 -10 3 100 4 110 5 120 6 10 -2 7 10 -1 8 100 9 101 10 102 11 122 12 111 13 100 14 1 -1 -1 15 1 -2 -2 If you wish to have the design matrix that contains the interaction terms as if the model had the main effects too but without the columns corresponding to the main effects, then just instruct R that you want to have such a matrix: R model.matrix(~ scores*fac)[,-(2:4)] (Intercept) scores:fac1 scores:fac2 11 -2 0 21 -1 0 31 0 0 41 1 0 51 2 0 61 0 -2 71 0 -1 81 0 0 91 0 1 10 1 0 2 11 1 2 2 12 1 1 1 13 1 0 0 14 1 -1 -1 15 1 -2 -2 Though, I am not sure why one wants to fit such a model. # 2. # here I don't get contr.sum but rather get contr.treatment options(contrasts = c(contr.treatment, contr.poly)) getOption(contrasts) [1] contr.treatment contr.poly model.matrix(lm(seq(15) ~ fac, contrasts = c(contr.sum, contr.poly))) (Intercept) fac2 fac3 1100 2100 3100 4100 5100 6110 7110 8110 9110 10 110 11 101 12 101 13 101 14 101 15
Re: [Rd] Bugs? when dealing with contrasts
On Wed, Apr 21, 2010 at 11:38 PM, Berwin A Turlach ber...@maths.uwa.edu.au wrote: # But I was expecting this since I am using contr.sum cbind(1, model.matrix(~ fac)[,2:3] * scores) fac1 fac2 1 1 -2 0 2 1 -1 0 3 1 0 0 4 1 1 0 5 1 2 0 6 1 0 -2 7 1 0 -1 8 1 0 0 9 1 0 1 10 1 0 2 11 1 2 2 12 1 1 1 13 1 0 0 14 1 -1 -1 15 1 -2 -2 If you wish to have the design matrix that contains the interaction terms as if the model had the main effects too but without the columns corresponding to the main effects, then just instruct R that you want to have such a matrix: R model.matrix(~ scores*fac)[,-(2:4)] (Intercept) scores:fac1 scores:fac2 1 1 -2 0 2 1 -1 0 3 1 0 0 4 1 1 0 5 1 2 0 6 1 0 -2 7 1 0 -1 8 1 0 0 9 1 0 1 10 1 0 2 11 1 2 2 12 1 1 1 13 1 0 0 14 1 -1 -1 15 1 -2 -2 Though, I am not sure why one wants to fit such a model. To save on degrees of freedom. I had reduced it to the minimum needed to illustrate but in fact its closer to this (except the coding is not the one needed): options(contrasts = c(contr.sum, contr.poly)) tab - as.table(matrix(1:21, 7)) dimnames(tab) = list(X = letters[1:7], Y = LETTERS[1:3]) rr - factor(row(tab)) cc - factor(col(tab)) scores - rep(seq(-3,3), 3) model.matrix( ~ rr + cc + scores:cc) so the main effects are rr and cc but scores takes the place of rr in the interaction. Your description of the process seems right since it would predict that the following gives the required coding and it does: model.matrix(~ scores*cc + rr)[,-2] Thanks. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel