Hadley,

The S modeling language was designed with Wilkinson and
Rogers in mind.  The notation was changed from their paper to
retain consistency with the parsing rules for ordinary algebra in
S.  I think of ":" as an indicator of an indexing system into the
dummy variables.  It is not an indicator of degrees of freedom.

For simplicity in notation, let A be a factor with a levels and B
be a factor with b levels.  Then A:B implies a set of dummy
variables with at most ab columns indexed by an A level and a B
level.  The degrees of freedom associated with A:B depend on the
linear dependencies of the associated dummy variables with the
dummy variables of other terms in the model.  The excess columns
can be suppressed when the dummy variables are generated or they
can be pivoted out during the analysis.  When we have the special
case A:A, there is only one factor mentioned, so the indexing
scheme is based on just the one factor.  You could generate the
full set of a^2 columns, and then you would discover that only a
of them are linearly independent; with indicator coding, the
product of the dummies for two different levels of A is
identically zero.
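The indexing view of ":" can be checked numerically.  The Python sketch below is not R's model.matrix().  It uses a hypothetical balanced layout with a = 2, b = 3, and plain 0/1 indicator coding.  Hypothetical helpers indicators(), interaction(), and rank() build the columns and count how many are linearly independent.  It shows three things.  A:B indexes a*b columns.  A:A indexes a^2 columns, but only a are independent.  Once the intercept, A, and B are in the model, A:B adds only (a-1)(b-1) degrees of freedom.

```python
from fractions import Fraction
from itertools import product

def indicators(levels, data):
    """Full dummy coding: one 0/1 column per level, no contrasts applied."""
    return {lev: [1 if x == lev else 0 for x in data] for lev in levels}

def interaction(cols1, cols2):
    """Treat ':' as an index: one column per (level1, level2) pair, built as
    the elementwise product of the two indicator columns."""
    return {(l1, l2): [u * v for u, v in zip(c1, c2)]
            for (l1, c1), (l2, c2) in product(cols1.items(), cols2.items())}

def rank(columns):
    """Number of linearly independent columns, via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in zip(*columns)]  # n x p
    r = 0
    for c in range(len(columns)):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # column c depends on the pivot columns found so far
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical balanced layout: a = 2, b = 3, two observations per cell.
cells = list(product(["a1", "a2"], ["b1", "b2", "b3"])) * 2
A = indicators(["a1", "a2"], [a for a, _ in cells])
B = indicators(["b1", "b2", "b3"], [b for _, b in cells])

AB = interaction(A, B)  # a*b = 6 columns
AA = interaction(A, A)  # a^2 = 4 columns; products of different levels are all zero

ones = [1] * len(cells)
main = [ones] + list(A.values()) + list(B.values())
full = main + list(AB.values())

print(len(AB), rank(list(AB.values())))   # 6 6
print(len(AA), rank(list(AA.values())))   # 4 2
print(rank(full) - rank(main))            # 2, i.e. (a-1)(b-1) df for A:B
```

The last line shows that the redundant columns are pivoted out.  Here the full set has 12 columns, but only rank 6, of which 4 are already accounted for by the intercept and main effects.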

For example, with a = 2 and b = 3, the six columns of A:B can be
labeled either
a1b1 a1b2 a1b3 a2b1 a2b2 a2b3
or
a1b1 a2b1 a1b2 a2b2 a1b3 a2b3
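The two labelings differ only in which index varies fastest.  The small Python sketch below makes this concrete; the level names are just the ones used above.

```python
from itertools import product

A_levels = ["a1", "a2"]
B_levels = ["b1", "b2", "b3"]

# B index varies fastest: hold the A level fixed and cycle through B
b_fastest = [a + b for a, b in product(A_levels, B_levels)]

# A index varies fastest: hold the B level fixed and cycle through A
a_fastest = [a + b for b, a in product(B_levels, A_levels)]

print(" ".join(b_fastest))  # a1b1 a1b2 a1b3 a2b1 a2b2 a2b3
print(" ".join(a_fastest))  # a1b1 a2b1 a1b2 a2b2 a1b3 a2b3
```

Both lists index the same set of ab columns, so the choice is a labeling convention and does not change the model.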

If there is crossing, we would report a single sum of squares
and degrees of freedom for the interaction.  If there is nesting,
say a/b, then it might make sense to group the dummy variables,
for example into (a1b1 a1b2 a1b3) and (a2b1 a2b2 a2b3), and report
simple-effects sums of squares and degrees of freedom for each of
the groups.
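The grouped report for the nested case is sketched below in Python.  The responses in y are hypothetical, made up only to exercise the arithmetic, and simple_effects() is a hypothetical helper.  For each level a_i, the sum of squares for B within a_i is sum_j n_ij (ybar_ij - ybar_i.)^2, with b - 1 degrees of freedom.  Summing over the groups gives a(b-1) df.  This should match the a:b line of a sequential fit of y ~ a/b.

```python
from fractions import Fraction

# Hypothetical responses, two per (A level, B level) cell; illustrative only.
y = {
    ("a1", "b1"): [4, 6], ("a1", "b2"): [7, 9], ("a1", "b3"): [5, 5],
    ("a2", "b1"): [3, 5], ("a2", "b2"): [4, 4], ("a2", "b3"): [8, 10],
}

def mean(xs):
    return Fraction(sum(xs), len(xs))

def simple_effects(y):
    """For each level a_i of A, the sum of squares and df for B within a_i,
    i.e. for the group of columns (a_i b1, ..., a_i bb)."""
    out = {}
    for a in sorted({a for a, _ in y}):
        cells = [obs for (ai, _), obs in y.items() if ai == a]
        group_mean = mean([v for obs in cells for v in obs])
        ss = sum(len(obs) * (mean(obs) - group_mean) ** 2 for obs in cells)
        out[a] = (ss, len(cells) - 1)
    return out

effects = simple_effects(y)
for a, (ss, df) in effects.items():
    print(a, ss, df)
# a1 12 2
# a2 100/3 2

pooled_ss = sum(ss for ss, _ in effects.values())
pooled_df = sum(df for _, df in effects.values())
print(pooled_ss, pooled_df)  # 136/3 4
```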
The structure of the individual columns depends on the set of
contrasts used for the A and B factors.
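The dependence on contrasts is shown in the Python sketch below.  The contrast matrices mirror R's contr.treatment and contr.sum for 2 and 3 levels, and interaction_rows() is a hypothetical helper.  It lists, for each cell, the values of the A:B columns formed as products of the A and B contrast columns.  Both codings give (a-1)(b-1) = 2 columns, but the entries differ.

```python
# Contrast matrices: keys are factor levels, values are contrast-column entries.
# These mirror R's contr.treatment and contr.sum for 2 and 3 levels.
treatment_A = {"a1": [0], "a2": [1]}
treatment_B = {"b1": [0, 0], "b2": [1, 0], "b3": [0, 1]}
sum_A = {"a1": [1], "a2": [-1]}
sum_B = {"b1": [1, 0], "b2": [0, 1], "b3": [-1, -1]}

def interaction_rows(cA, cB):
    """Row of the A:B columns for each cell: all pairwise products of the
    A contrast columns with the B contrast columns."""
    return {(a, b): [u * v for u in cA[a] for v in cB[b]]
            for a in cA for b in cB}

treat_coded = interaction_rows(treatment_A, treatment_B)
sum_coded = interaction_rows(sum_A, sum_B)
for cell in treat_coded:
    print(cell, treat_coded[cell], sum_coded[cell])
```

Under treatment contrasts every cell with a1 or b1 gets zeros.  Under sum contrasts every cell gets a nonzero entry in at least one interaction column.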

Rich

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.