The following is the code for the model.matrix. But it still doesn't answer why A:B is interpreted differently in Y~A+B+A:B and Y~A:B. By 'why', I mean how R internally does it and what is the rational behind the way of doing it?
And it didn't answer why in the model.matrix of Y~A, there are a-1 terms from A plus the intercept, but in the model.matrix of Y~A:B, there are a*b terms (rather than a*b-1 terms) plus the intercept. Since the one coefficient of the lm of Y~A:B is going to be NA, why bother to include the corresponding term in the model matrix? ############code below set.seed(0) a=3 b=4 AB_effect=data.frame( name=paste( unlist( do.call( rbind , rep(list(paste('A', letters[1:a],sep='')), b) ) ) , unlist( do.call( cbind , rep(list(paste('B', letters[1:b],sep='')), a) ) ) , sep=':' ) , value=rnorm(a*b) , stringsAsFactors=F ) max_n=10 n=sample.int(max_n, a*b, replace=T) AB=mapply(function(name, n){rep(name,n)}, AB_effect$name, n) Y=AB_effect$value[match(unlist(AB), AB_effect$name)] Y=Y+a*b*rnorm(length(Y)) sub_fr=as.data.frame(do.call(rbind, strsplit(unlist(AB), ':'))) rownames(sub_fr)=NULL colnames(sub_fr)=c('A', 'B') fr=data.frame(Y=Y,sub_fr) my_subset=function(amm) { coding=apply( amm ,1 , function(x) { paste(x, collapse='') } ) amm[match(unique(coding), coding),] } my_subset(model.matrix(Y ~ A*B,fr)) my_subset(model.matrix(Y ~ (A+B)^2,fr)) my_subset(model.matrix(Y ~ A + B + A:B,fr)) my_subset(model.matrix(Y ~ A:B - 1,fr)) my_subset(model.matrix(Y ~ A:B,fr)) On Fri, Mar 5, 2010 at 8:45 AM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote: > The way to understand this is to look at the output of model.matrix: > > model.matrix(fo, fr) > > for each formula you tried. If your data is large you will have to > use a subset not to be overwhelmed with output. > > On Fri, Mar 5, 2010 at 9:08 AM, blue sky <bluesky...@gmail.com> wrote: >> Suppose, 'fr' is data.frame with columns 'Y', 'A' and 'B'. 'A' has levels >> 'Aa' >> 'Ab' and 'Ac', and 'B' has levels 'Ba', 'Bb', 'Bc' and 'Bd'. 'Y' >> columns are numbers. >> >> I tried the following three sets of commands. I understand that A*B is >> equivalent to A+B+A:B. However, A:B in A+B+A:B is different from A:B >> just by itself (see the 3rd and 4th set of commands). Would you please >> help me understand why the meanings of A:B are different in different >> contexts? >> >> I also see the coefficient of AAc:BBd is NA (the last set of >> commands). I'm wondering why this coefficient is not removed from the >> 'coefficients' vector. Since lm(Y~A) has coefficients for (intercept), >> Ab, Ac (there are no NA's), I think that it is reasonable to make sure >> that there are no NA's when there are interaction terms, namely, A:B >> in this case. >> >> Thank you for answering my questions! >> >>> alm=lm(Y ~ A*B,fr) >>> alm$coefficients >> (Intercept) AAb AAc BBb BBc BBd >> -3.548176 -2.086586 7.003857 4.367800 11.887356 -3.470840 >> AAb:BBb AAc:BBb AAb:BBc AAc:BBc AAb:BBd AAc:BBd >> 5.160865 -11.858425 -12.853116 -20.289611 6.727401 -2.327173 >>> >>> alm=lm(Y ~ A + B + A:B,fr) >>> alm$coefficients >> (Intercept) AAb AAc BBb BBc BBd >> -3.548176 -2.086586 7.003857 4.367800 11.887356 -3.470840 >> AAb:BBb AAc:BBb AAb:BBc AAc:BBc AAb:BBd AAc:BBd >> 5.160865 -11.858425 -12.853116 -20.289611 6.727401 -2.327173 >>> >>> alm=lm(Y ~ A:B - 1,fr) >>> alm$coefficients >> AAa:BBa AAb:BBa AAc:BBa AAa:BBb AAb:BBb AAc:BBb AAa:BBc >> -3.5481765 -5.6347625 3.4556808 0.8196231 3.8939016 -4.0349449 8.3391795 >> AAb:BBc AAc:BBc AAa:BBd AAb:BBd AAc:BBd >> -6.6005222 -4.9465744 -7.0190168 -2.3782017 -2.3423322 >>> >>> alm=lm(Y ~ A:B,fr) >>> alm$coefficients >> (Intercept) AAa:BBa AAb:BBa AAc:BBa AAa:BBb AAb:BBb >> -2.34233221 -1.20584424 -3.29243033 5.79801305 3.16195534 6.23623377 >> AAc:BBb AAa:BBc AAb:BBc AAc:BBc AAa:BBd AAb:BBd >> -1.69261273 10.68151168 -4.25819000 -2.60424217 -4.67668454 -0.03586951 >> AAc:BBd >> NA >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.