[R] Factors attribute?

2010-03-22 Thread rkevinburton
I noticed that when I fit a linear model using 'lm' there is an attribute 
called 'factors' attached to the terms component. It doesn't seem to appear for 
'model.matrix', just 'lm'. I have been unable to find where it gets constructed 
or what it means. It looks like a two-dimensional array that I may be able to 
use, so I would like some 'official' statement of what it is and how it is 
constructed rather than relying on my own assumptions. An example:

library(car)   # Duncan data set (car / carData package)
l <- lm(prestige ~ income + education, data = Duncan)
attr(l$terms, "factors")

          income education
prestige       0         0
income         1         0
education      0         1

Thank you.

Kevin Burton
rkevinbur...@charter.net

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Factors attribute?

2010-03-22 Thread Henrique Dallazuanna
See ?terms
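The 'factors' attribute lives on the terms object that 'terms()' builds from a formula; 'lm()' goes through the same machinery when it builds its model frame, which is why it appears in 'l$terms'. A minimal sketch, assuming the built-in 'mtcars' data as a stand-in for 'Duncan' (which needs the car/carData package), comparing the attribute from a bare formula, from a fit, and from 'model.matrix()':

```r
# mtcars is only a stand-in for the Duncan data used in the question
fit <- lm(mpg ~ wt + hp, data = mtcars)

# terms() on the bare formula already carries the factors attribute;
# no data and no model fit are needed
tt <- terms(mpg ~ wt + hp)
fac_formula <- attr(tt, "factors")
fac_fit     <- attr(terms(fit), "factors")   # the one stored in the fit
print(fac_formula)

# model.matrix() returns the numeric design matrix instead; it records
# which term each column came from in "assign", not in "factors"
mm <- model.matrix(fit)
print(attr(mm, "assign"))            # 0 = intercept, 1 = wt, 2 = hp
print(is.null(attr(mm, "factors")))  # no factors attribute here
```

So the place to look is the terms object, reachable as 'terms(fit)' or 'attr(model.frame(fit), "terms")'.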

On Mon, Mar 22, 2010 at 2:08 PM,  rkevinbur...@charter.net wrote:
 [...]




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Factors attribute?

2010-03-22 Thread rkevinburton

I am sorry but I didn't see factors mentioned in this documentation.

Kevin

 Henrique Dallazuanna www...@gmail.com wrote: 
 See ?terms
 
 [...]



Re: [R] Factors attribute?

2010-03-22 Thread Marc Schwartz
Kevin,

See ?terms.object, which is indicated in the Value section of ?terms and listed 
in the See Also of ?terms.

HTH,

Marc Schwartz

On Mar 22, 2010, at 1:16 PM, rkevinbur...@charter.net wrote:

 
 I am sorry but I didn't see factors mentioned in this documentation.
 
 Kevin
 
  Henrique Dallazuanna www...@gmail.com wrote: 
 See ?terms
 
 [...]



[R] Factors attribute format

2010-03-22 Thread rkevinburton
Thanks to Marc Schultz I found the documentation on the factors attribute 
under ?terms.object. It states:

factors: A matrix of variables by terms showing which variables appear
  in which terms.  The entries are 0 if the variable does not
  occur in the term, 1 if it does occur and should be coded by
  contrasts, and 2 if it occurs and should be coded via dummy
  variables for all levels (as when an intercept or lower-order
  term is missing).  If there are no terms other than an
  intercept and offsets, this is ‘numeric(0)’.
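Read column by column, the matrix says how each variable enters each term. As a sketch, a small hypothetical helper 'describe_coding()' (not part of R) can spell the three codes out. 'terms()' computes the matrix from the formula alone (no data is needed), so as far as I can tell a 1 or 2 only matters for variables that later turn out to be factors when 'model.matrix()' builds the columns.

```r
# Hypothetical helper (not part of R): for every term, list the variables
# it contains and how each one is to be coded, according to 'factors'.
describe_coding <- function(tt) {
  fac <- attr(tt, "factors")
  # intercept-only (or offset-only) models have an empty factors attribute
  if (length(fac) == 0) return(character(0))
  labels <- c("1" = "contrasts", "2" = "dummies for all levels")
  out <- character(0)
  for (term in colnames(fac)) {
    codes <- fac[, term]
    used  <- codes[codes > 0]   # drop variables that do not occur in the term
    out[term] <- paste0(names(used), " (", labels[as.character(used)], ")",
                        collapse = ", ")
  }
  out
}

describe_coding(terms(y ~ x * z))     # every variable coded by contrasts
describe_coding(terms(y ~ x + x:z))   # x gets code 2 inside x:z
describe_coding(terms(y ~ 1))         # empty: intercept only
```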

So now this brings up another question. It seems that the attribute is a 
two-dimensional array. When I print it out in R after fitting the formula 
prestige ~ income + education, I get:

          income education
prestige       0         0
income         1         0
education      0         1

This matrix says to me that 'income' occurs in the term 'income', etc. So it 
seems that this matrix will always be a diagonal matrix with an added row of 
zeros for the response. If the formula is such that the response is a function 
of one or more of the independent variables, then of course that row will be 
something other than zeros. So far OK?
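The reading above can be checked directly: rows are the variables, with the response first (flagged by the 'response' attribute), and columns match the 'term.labels' attribute. A minimal sketch using symbolic formulas only; the second formula, with an interaction, shows where the 'always diagonal' pattern stops holding.

```r
# Main effects only: a row of zeros for the response, identity below it
tt  <- terms(prestige ~ income + education)
fac <- attr(tt, "factors")

rownames(fac)          # variables, response first
colnames(fac)          # same as attr(tt, "term.labels")
attr(tt, "response")   # 1: the first variable is the response

main_block  <- fac[-attr(tt, "response"), , drop = FALSE]
is_identity <- all(main_block == diag(ncol(main_block)))

# With an interaction, a column can contain more than one nonzero entry
fac_int <- attr(terms(prestige ~ income * education), "factors")
colSums(fac_int > 0)   # income:education involves two variables
```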

My problem in understanding comes with using a formula that contains R factors. 
I am using the following (from the TSA package) as an example:

library(TSA)   # provides season() and the tempdub series
data(tempdub)
l <- lm(tempdub ~ season(tempdub))
attr(l$terms, "factors")

                season(tempdub)
tempdub                       0
season(tempdub)               1

The function 'season' produces a factor (in this case with 12 levels, one for 
each month). But the factors attribute still has a '1', not the '2' that would 
indicate that the variable should be coded as dummy variables (a factor).
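A sketch of the same situation without the TSA package, assuming base R's monthly 'nottem' series (Nottingham air temperatures) as a stand-in for 'tempdub' and an explicit month factor in place of 'season()'. The 1 means 'code by contrasts', and with an intercept in the model the 12-level factor contributes 11 columns, not 12:

```r
# nottem (base R datasets) stands in for TSA's tempdub;
# the month factor plays the role of season(tempdub)
temp  <- as.numeric(nottem)
month <- factor(as.integer(cycle(nottem)), labels = month.abb)

fit <- lm(temp ~ month)

# terms() marks the month factor with 1: "code by contrasts"
attr(terms(fit), "factors")

# with the intercept present: intercept + 11 contrast columns
mm <- model.matrix(fit)
ncol(mm)

# dropping the intercept lets model.matrix() use all 12 levels
ncol(model.matrix(lm(temp ~ month - 1)))
```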

Please help my misunderstanding.

Thank you.

Kevin Burton



Re: [R] Factors attribute format

2010-03-22 Thread Marc Schwartz
On Mar 22, 2010, at 2:00 PM, rkevinbur...@charter.net wrote:

 Thanks to Marc Schultz I found the documentation on the factors attribute 
 under ?terms.object. It states:

cough   ;-)

 factors: A matrix of variables by terms showing which variables appear
  in which terms.  The entries are 0 if the variable does not
  occur in the term, 1 if it does occur and should be coded by
  contrasts, and 2 if it occurs and should be coded via dummy
  variables for all levels (as when an intercept or lower-order
  term is missing).  If there are no terms other than an
  intercept and offsets, this is ‘numeric(0)’.


The key is 'dummy variables for *all* levels'. In other words, with a '2' your 
example below of 12 months would be represented by 12 individual binary (0/1) 
encodings, rather than, for example with the default treatment contrasts, 11 
individual binary (0/1) encodings, where the base or reference level is not 
included in the resulting model matrix.
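The difference shows up in the contrast matrices themselves. A minimal sketch for a 12-level factor, using 'contr.treatment()' (R's default for unordered factors) and its 'contrasts = FALSE' form, which gives one indicator column per level:

```r
# Default treatment contrasts for a 12-level factor: level 1 is the
# reference, so only 11 indicator columns are produced
ct <- contr.treatment(12)
dim(ct)

# "Dummy variables for all levels": one indicator column per level
full <- contr.treatment(12, contrasts = FALSE)
dim(full)

# the reference level is the all-zero row of the contrast matrix
which(rowSums(ct) == 0)
```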

I have not spent a lot of time on this internal R/S model design point, but as 
a rather simple example, a '2' will appear when an interaction term is present 
without the main-effect term for the second factor:

 attr(terms(y ~ x + z), "factors")
  x z
y 0 0
x 1 0
z 0 1

 attr(terms(y ~ x + x:z), "factors")
  x x:z
y 0   0
x 1   2
z 0   1


Compare the second example above with the more common:

 attr(terms(y ~ x * z), "factors")
  x z x:z
y 0 0   0
x 1 0   1
z 0 1   1

which is of course equivalent to:

 attr(terms(y ~ x + z + x:z), "factors")
  x z x:z
y 0 0   0
x 1 0   1
z 0 1   1


The difference in the encodings will be reflected in the model matrix. See 
?model.matrix and play around with the examples there, including adding 
interaction terms. For example, model.matrix( ~ a + a:b, dd), etc.
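Following the suggestion to experiment with 'model.matrix()', a sketch on a hypothetical toy data frame 'dd' (built here, not the one from the ?model.matrix examples) shows what the '2' does to the columns. Both formulas give six columns spanning the same space, but the interaction is parametrized differently:

```r
# hypothetical toy data: a has 2 levels, b has 3, every combination once
dd <- data.frame(a = gl(2, 3), b = gl(3, 1, 6))

# a:b without the b main effect: a gets code 2 in the a:b column
attr(terms(~ a + a:b), "factors")

mm_nested <- model.matrix(~ a + a:b, dd)
mm_cross  <- model.matrix(~ a * b, dd)

colnames(mm_nested)   # a enters a:b with all of its levels (a1:..., a2:...)
colnames(mm_cross)    # a enters a:b through its contrast only (a2:...)
```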

This discussion leads into the complex issue of the internal representation of 
R (and S) models. If you really want to dig deeper, then you should get a copy 
of Statistical Models in S by Chambers and Hastie 1993 (aka The White Book) 
and specifically note the rule described on the bottom of page 38 therein, 
perhaps pre-reading the entire chapter leading up to that particular point.

HTH,

Marc

