[R] Factors attribute?
I noticed that when I fit a linear model using 'lm' there is an attribute called "factors" that is added to the terms. It doesn't seem to appear for 'model.matrix', just 'lm'. I have been unable to find where it gets constructed or what it means. It looks like a two-dimensional array that I may be able to use, so I would just like to get some 'official' statement regarding what it is and how it is constructed. I would rather not go on my assumptions.

An example:

    l <- lm(prestige ~ income + education, data=Duncan)
    attr(l$terms, "factors")
              income education
    prestige       0         0
    income         1         0
    education      0         1

Thank you.

Kevin Burton
rkevinbur...@charter.net

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Factors attribute?
See ?terms

On Mon, Mar 22, 2010 at 2:08 PM, rkevinbur...@charter.net wrote:
> I noticed that when I fit a linear model using 'lm' there is an attribute called "factors" that is added to the terms.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] Factors attribute?
I am sorry but I didn't see factors mentioned in this documentation.

Kevin

Henrique Dallazuanna www...@gmail.com wrote:
> See ?terms
Re: [R] Factors attribute?
Kevin,

See ?terms.object, which is indicated in the Value section of ?terms and listed in the See Also of ?terms.

HTH,

Marc Schwartz

On Mar 22, 2010, at 1:16 PM, rkevinbur...@charter.net wrote:
> I am sorry but I didn't see factors mentioned in this documentation.
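For readers following along, here is a minimal sketch of where the attribute lives. It assumes the built-in mtcars data as a stand-in for Duncan (which comes from the car package), so the variable names mpg, wt and hp are choices made for this illustration and are not part of the original question. It suggests that the matrix is a property of the terms object rather than of 'lm' itself, and that the result of model.matrix() carries an "assign" attribute instead.

```r
# Stand-in for the Duncan example: mtcars ships with R (datasets package).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The "factors" attribute sits on the terms object stored in the fit.
f_fit <- attr(terms(fit), "factors")
print(f_fit)

# terms() on the bare formula builds the same matrix without fitting anything.
f_formula <- attr(terms(mpg ~ wt + hp), "factors")
print(all.equal(f_fit, f_formula))

# model.matrix() returns the design matrix itself; it has an "assign"
# attribute mapping columns to terms, but no "factors" attribute.
mm <- model.matrix(fit)
print(names(attributes(mm)))
```

Rows of the matrix are the variables in the formula (response included) and columns are the terms, as ?terms.object describes.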
[R] Factors attribute format
Thanks to Marc Schultz I found the documentation on the factors attribute under ?terms.object. It states:

    factors: A matrix of variables by terms showing which variables appear in which terms. The entries are 0 if the variable does not occur in the term, 1 if it does occur and should be coded by contrasts, and 2 if it occurs and should be coded via dummy variables for all levels (as when an intercept or lower-order term is missing). If there are no terms other than an intercept and offsets, this is ‘numeric(0)’.

So now this brings up another question. It seems that the attribute is a two-dimensional array. When I print it out in R after fitting the formula prestige ~ income + education I get:

              income education
    prestige       0         0
    income         1         0
    education      0         1

This matrix says to me that 'income' occurs in the term 'income', etc. So it seems that this matrix will always be a diagonal matrix with an added row of zeros for the response. If the formula is such that the response is a function of one or more of the dependent variables then of course it will be something other than a row of zeros. So far OK?

My problem in understanding comes with using a formula that contains R factors. I am using the following (from the TSA package) as an example:

    l <- lm(tempdub ~ season(tempdub))
    attr(l$terms, "factors")
                    season(tempdub)
    tempdub                       0
    season(tempdub)               1

The function 'season' produces a factor (in this case with 12 levels, one for each month). But the factors attribute still has a '1' and not a '2' indicating that the variable should be coded as a dummy variable (factor). Please help my misunderstanding.

Thank you.

Kevin Burton
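The observation can be reproduced without the TSA package. The sketch below is a hypothetical stand-in: the month factor and the random temp values are made up for illustration and only mimic the shape of tempdub and season(tempdub), a 12-level factor used as the single predictor.

```r
# Hypothetical stand-in for tempdub and season(): three years of made-up
# monthly values with a 12-level month factor.
set.seed(1)
month <- factor(rep(month.abb, times = 3), levels = month.abb)
temp  <- rnorm(length(month))

fit <- lm(temp ~ month)

# The entry for the factor in its own term is 1, as in the TSA example.
print(attr(terms(fit), "factors"))

# The factor has 12 levels, yet the model matrix holds an intercept plus
# 11 indicator columns under the default treatment contrasts.
print(nlevels(month))
print(colnames(model.matrix(fit)))
```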
Re: [R] Factors attribute format
On Mar 22, 2010, at 2:00 PM, rkevinbur...@charter.net wrote:

> Thanks to Marc Schultz I found the documentation on the factors attribute under ?terms.object. It states:

cough ;-)

> factors: A matrix of variables by terms showing which variables appear in which terms. The entries are 0 if the variable does not occur in the term, 1 if it does occur and should be coded by contrasts, and 2 if it occurs and should be coded via dummy variables for all levels (as when an intercept or lower-order term is missing). If there are no terms other than an intercept and offsets, this is ‘numeric(0)’.

The key is 'dummy variables for *all* levels'. In other words, your example below of 12 months would be represented by 12 individual binary (0/1) encodings rather than, for example using default treatment contrasts, 11 individual binary (0/1) encodings, where the base or reference level is not included in the resultant model matrix.

I have not spent a lot of time on this internal R/S model design point, but in rather simple cases, as an example, a '2' will appear in the presence of interaction terms lacking the main effects term for the second factor:

    attr(terms(y ~ x + z), "factors")
      x z
    y 0 0
    x 1 0
    z 0 1

    attr(terms(y ~ x + x:z), "factors")
      x x:z
    y 0   0
    x 1   2
    z 0   1

Compare the second example above with the more common:

    attr(terms(y ~ x * z), "factors")
      x z x:z
    y 0 0   0
    x 1 0   1
    z 0 1   1

which is of course equivalent to:

    attr(terms(y ~ x + z + x:z), "factors")
      x z x:z
    y 0 0   0
    x 1 0   1
    z 0 1   1

The difference in the encodings will be reflected in the model matrix. See ?model.matrix and play around with the examples there, including adding interaction terms. For example, model.matrix( ~ a + a:b, dd), etc.

This discussion leads into the complex issue of the internal representation of R (and S) models. If you really want to dig deeper, then you should get a copy of "Statistical Models in S" by Chambers and Hastie, 1993 (aka "The White Book") and specifically note the rule described on the bottom of page 38 therein, perhaps pre-reading the entire chapter leading up to that particular point.

HTH,

Marc

> My problem in understanding comes with using a formula that contains R factors. I am using the following (from the TSA package) as an example:
>
>     l <- lm(tempdub ~ season(tempdub))
>     attr(l$terms, "factors")
>                     season(tempdub)
>     tempdub                       0
>     season(tempdub)               1
>
> The function 'season' produces a factor (in this case with 12 levels, one for each month). But the factors attribute still has a '1' and not a '2' indicating that the variable should be coded as a dummy variable (factor).
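The remark that the difference in encodings shows up in the model matrix can be checked with a small sketch. The data frame dd below is made up for illustration (a 2-level factor x crossed with a 3-level factor z); it is not the dd from the ?model.matrix examples. Under the default treatment contrasts, the columns for x + x:z should include an interaction column for every level of x, while x * z should not.

```r
# Made-up balanced design: x has 2 levels, z has 3 levels.
dd <- expand.grid(x = factor(c("a", "b")), z = factor(c("p", "q", "r")))

# Coding matrices for the two formulas (no response, so rows are x and z).
f_nested <- attr(terms(~ x + x:z), "factors")
f_full   <- attr(terms(~ x * z), "factors")
print(f_nested)  # x is coded 2 within x:z
print(f_full)    # x and z are both coded 1 within x:z

# The corresponding design matrices.
m_nested <- model.matrix(~ x + x:z, dd)
m_full   <- model.matrix(~ x * z, dd)

# With code 2, every level of x (including the reference level "a") gets
# its own interaction columns; with code 1, only the non-reference level does.
print(colnames(m_nested))
print(colnames(m_full))
```

Both matrices have the same number of columns in this example; what changes is which indicator columns are used to build the interaction.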