Re: [R] model.matrix for a factor effect with no intercept

2005-02-23 Thread Prof Brian Ripley
MASS4 p.150
White Book p.38
Those are the only two reasonably comprehensive accounts that I am aware 
of (and they have only partial overlap).

The underlying motivation is to span the _additional_ vector space covered 
by the term, the complement to what has gone before.  Put another way, as 
each term is added, only enough columns are added to the model matrix to 
span the same space as if dummy coding had been used for that term and its 
predecessors.  So think of this as a way to produce a parsimonious 
(usually full-rank) basis for the model space.

On Wed, 23 Feb 2005, David Firth wrote:
I was surprised by this (in R 2.0.1):
a - ordered(-1:1)
a
[1] -1 0  1
Levels: -1  0  1
model.matrix(~ a)
 (Intercept)   a.La.Q
1   1 -7.071068e-01  0.4082483
2   1 -9.073800e-17 -0.8164966
3   1  7.071068e-01  0.4082483
attr(,assign)
[1] 0 1 1
attr(,contrasts)
attr(,contrasts)$a
[1] contr.poly
model.matrix(~ -1 + a)
 a-1 a0 a1
1   1  0  0
2   0  1  0
3   0  0  1
attr(,assign)
[1] 1 1 1
attr(,contrasts)
attr(,contrasts)$a
[1] contr.poly
Without the intercept, treatment contrasts seem to have been used (this 
despite the contr.poly in the contrasts attribute).

It's not restricted to ordered factors.  For example, if Helmert contrasts 
are used for nominal factors, the same sort of thing happens.

I suppose it is a deliberate feature (perhaps to protect the user from 
accidentally fitting models that make no sense?  or maybe some better 
reason?) -- is it explained somewhere?
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] model.matrix for a factor effect with no intercept

2005-02-23 Thread David Firth
Brian, many thanks for these helpful pointers:
On 23 Feb 2005, at 15:45, Prof Brian Ripley wrote:
MASS4 p.150
White Book p.38
Those are the only two reasonably comprehensive accounts that I am 
aware of (and they have only partial overlap).

The underlying motivation is to span the _additional_ vector space 
covered by the term, the complement to what has gone before.  Put 
another way, as each term is added, only enough columns are added to 
the model matrix to span the same space as if dummy coding had been 
used for that term and its predecessors.  So think of this as a way to 
produce a parsimonious (usually full-rank) basis for the model space.
Yes, indeed.  My surprise was that *this* particular basis (dummy 
coding) was the one used here.  I should have got a clue from what 
contrasts() does, and is documented to do,

 options(contrasts)
$contrasts
[1] contr.treatment contr.poly
 contrasts(a)
.L .Q
[1,] -7.071068e-01  0.4082483
[2,] -9.073800e-17 -0.8164966
[3,]  7.071068e-01  0.4082483
but when there's no intercept in the model the contrasts used appear to 
be

 contrasts(a, contrasts = FALSE)
   -1 0 1
-1  1 0 0
0   0 1 0
1   0 0 1
which are not the same as
 contr.poly(a, contrasts = FALSE)
 ^0^1 ^2
[1,]  1 -7.071068e-01  0.4082483
[2,]  1 -9.073800e-17 -0.8164966
[3,]  1  7.071068e-01  0.4082483
-- which is what I had naively expected to get in my model matrix. 

This is of course all a matter of convention.  The present convention 
does seem a touch confusing though: the basis for the space spanned by 
a factor is determined by options(contrasts) or by the contrasts 
attribute of the factor or by the contrasts argument in the call, 
*except* when there's no intercept or other factor earlier in the model 
in which case all such settings are ignored (well, not *quite* ignored: 
they do get put in the contrasts attribute of the resultant model 
matrix).

On the other hand, one good reason to use dummy coding for the 
first-encountered factor when there's no intercept is that the 
associated parameters are then often interpretable as group-specific 
intercepts.

Would it be an improvement, though, if the contrasts attribute of the 
resultant model matrix contained contr.treatment in such cases 
instead of the name of a contrast function that was not actually used?

Best wishes,
David

On Wed, 23 Feb 2005, David Firth wrote:
I was surprised by this (in R 2.0.1):
a - ordered(-1:1)
a
[1] -1 0  1
Levels: -1  0  1
model.matrix(~ a)
 (Intercept)   a.La.Q
1   1 -7.071068e-01  0.4082483
2   1 -9.073800e-17 -0.8164966
3   1  7.071068e-01  0.4082483
attr(,assign)
[1] 0 1 1
attr(,contrasts)
attr(,contrasts)$a
[1] contr.poly
model.matrix(~ -1 + a)
 a-1 a0 a1
1   1  0  0
2   0  1  0
3   0  0  1
attr(,assign)
[1] 1 1 1
attr(,contrasts)
attr(,contrasts)$a
[1] contr.poly
Without the intercept, treatment contrasts seem to have been used 
(this despite the contr.poly in the contrasts attribute).

It's not restricted to ordered factors.  For example, if Helmert 
contrasts are used for nominal factors, the same sort of thing 
happens.

I suppose it is a deliberate feature (perhaps to protect the user 
from accidentally fitting models that make no sense?  or maybe some 
better reason?) -- is it explained somewhere?
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] model.matrix for a factor effect with no intercept

2005-02-23 Thread Prof Brian Ripley
David,
I am sorry, I did not read your question as `why dummy coding not poly 
coding', but I still seemed to have answered it.  BTW,

contr.poly(a, contrasts = FALSE)
should be an indicator matrix according to its help page, so I would not 
have expected what you expected, and if it had been, you might not have 
been surprised. [More on that below.]

I would say the algorithm was `code by the specified contrasts only if 
dummy coding is not parsimonious'.

Your final point isn't the whole story. If I have 0+a+b+a:b, R uses the 
contrasts for a in the interaction but not the main term.  So the 
attribute contrasts for 'a' represents

  `the contrast used for those terms (possibly none) in which it needed
   coding'.
Now, if there were none it might be better to say nothing, but that is not 
actually calculated explicitly.  I've just checked, and we do not document 
on the help page what it means.  I have just drafted

  If there are any factors in terms in the model, there is an attribute
  \code{contrasts}, a named list with an entry for each factor. This
  specifies the contrasts that would be used in terms in which the
  factor is coded by contrasts (in some terms dummy coding may be used),
  either as a character vector naming a function or as a numeric matrix.
As far as I recall the contrasts attribute is returned to be tacked on a 
fit and used later to reconstruct the model matrix, or perhaps a model 
matrix for another formula and the same data.  So I think it does need to 
be the contrasts as specified at the time of the model.matrix call.

It is probably helpful to note that it is not specifically treatment 
contrasts that are used but dummy coding: contr.sum and contr.helmert give 
the same matrix.

It is a bug that contr.poly(x, contrasts=FALSE) does not behave as 
documented (a bug shared with S, it seems).  I do dimly recall this having 
been raised in the past (probably by me), but cannot find any record as to 
why the discrepancy remains.  We should change the code or the 
documentation, probably the latter.

Brian
On Wed, 23 Feb 2005, David Firth wrote:
Brian, many thanks for these helpful pointers:
On 23 Feb 2005, at 15:45, Prof Brian Ripley wrote:
MASS4 p.150
White Book p.38
Those are the only two reasonably comprehensive accounts that I am aware of 
(and they have only partial overlap).

The underlying motivation is to span the _additional_ vector space covered 
by the term, the complement to what has gone before.  Put another way, as 
each term is added, only enough columns are added to the model matrix to 
span the same space as if dummy coding had been used for that term and its 
predecessors.  So think of this as a way to produce a parsimonious (usually 
full-rank) basis for the model space.
Yes, indeed.  My surprise was that *this* particular basis (dummy coding) was 
the one used here.  I should have got a clue from what contrasts() does, and 
is documented to do,

options(contrasts)
$contrasts
[1] contr.treatment contr.poly
contrasts(a)
   .L .Q
[1,] -7.071068e-01  0.4082483
[2,] -9.073800e-17 -0.8164966
[3,]  7.071068e-01  0.4082483
but when there's no intercept in the model the contrasts used appear to be
contrasts(a, contrasts = FALSE)
  -1 0 1
-1  1 0 0
0   0 1 0
1   0 0 1
which are not the same as
contr.poly(a, contrasts = FALSE)
^0^1 ^2
[1,]  1 -7.071068e-01  0.4082483
[2,]  1 -9.073800e-17 -0.8164966
[3,]  1  7.071068e-01  0.4082483
-- which is what I had naively expected to get in my model matrix. 
This is of course all a matter of convention.  The present convention does 
seem a touch confusing though: the basis for the space spanned by a factor is 
determined by options(contrasts) or by the contrasts attribute of the 
factor or by the contrasts argument in the call, *except* when there's no 
intercept or other factor earlier in the model in which case all such 
settings are ignored (well, not *quite* ignored: they do get put in the 
contrasts attribute of the resultant model matrix).

On the other hand, one good reason to use dummy coding for the 
first-encountered factor when there's no intercept is that the associated 
parameters are then often interpretable as group-specific intercepts.

Would it be an improvement, though, if the contrasts attribute of the 
resultant model matrix contained contr.treatment in such cases instead of 
the name of a contrast function that was not actually used?

Best wishes,
David

On Wed, 23 Feb 2005, David Firth wrote:
I was surprised by this (in R 2.0.1):
a - ordered(-1:1)
a
[1] -1 0  1
Levels: -1  0  1
model.matrix(~ a)
 (Intercept)   a.La.Q
1   1 -7.071068e-01  0.4082483
2   1 -9.073800e-17 -0.8164966
3   1  7.071068e-01  0.4082483
attr(,assign)
[1] 0 1 1
attr(,contrasts)
attr(,contrasts)$a
[1] contr.poly
model.matrix(~ -1 + a)
 a-1 a0 a1
1   1  0  0
2   0  1  0
3   0  0  1
attr(,assign)
[1] 1 1 1
attr(,contrasts)
attr(,contrasts)$a
[1]