[R] Wrong contrast matrix for nested factors in lm(), rlm(), and lmRob()

2010-12-13 Thread Saul Sternberg

This message also reports wrong estimates produced by lmRob.fit.compute()
for nested factors when using the correct contrast matrix.

And in these respects, I have found that S-Plus behaves the same way as R.

Using the three available contrast types (sum, treatment, helmert)
with lm() or lm.fit(), but just contr.sum with rlm() and lmRob(),
and small examples, I generated contrast matrices for four models
involving nested factors with fixed effects.  For three of the models
the matrices were incorrect.

-
Details
-
For lm() and rlm() I used two data frames:

In "same.df" the nested factor, D, has the same number of levels for
each level of the nesting factor, G:
   G  D  T1
1 g1 d1 -60
2 g1 d2 -50
3 g1 d3 -40
4 g2 d1  30
5 g2 d2  50
6 g2 d3  70

In "diff.df" the nested factor, D, has a different number of levels for
the two levels of the nesting factor, G:
   G  D  T2
1 g1 d1 -60
2 g1 d2 -50
3 g1 d3 -40
4 g2 d1  20
5 g2 d2  40
6 g2 d3  60
7 g2 d4  80

(G, D are factors; T1, T2 are numeric)

For lmRob() I expanded the data frames to 600 or 700 rows by replicating
them 100 times and adding error to the observations.

For lm() the models and commands were
(1) model.matrix(lm(T1 ~ G + D%in%G, same.df))
(2) model.matrix(lm(T1 ~ D%in%G, same.df))
(3) model.matrix(lm(T2 ~ G + D%in%G, diff.df))
(4) model.matrix(lm(T2 ~ D%in%G, diff.df))

Using (1), all three types of contrast matrix were correctly generated
Using (2), the same incorrect contrast matrix was generated for all
three contrast types.
Using (3), an incorrect contrast matrix was generated for each of the
three contrast types.  For contr.treatment the error was an extra
column of zeros; for the others the error was more serious.
Using (4), the same incorrect contrast matrix was generated for all
three contrast types.

I used only contr.sum with rlm() and lmRob().  For model (1) the programs
worked correctly, but for models (2), (3), and (4) with the formulas
above, rlm() and lmRob() both reported that x was singular.

When x was the correct contrast matrix and T was the observation vector,
rlm(x,T) worked correctly for (2), (3), and (4), as did lm.fit(x,T).
However, whereas lmRob.fit.compute(x2=NULL,y=T,x1=x) worked correctly for
(3), the estimates it produced for (2) and (4) were radically wrong
(and were the same for different random seeds and initial algorithms).

---
Questions:
-

(1) If there is a way to use lm(), rlm(), and lmRob() in such cases
so that they generate the correct contrast matrices (and the desired
parameter estimates), what is it?

(2) If there is no way to do this, is the best alternative for the user to
create the desired model matrices "by hand" and provide them as arguments
to lim.fit(), rlm(), and lmRob.fit.compute()?  This would also require
that lmRob.fit.compute() generate the correct estimates.

(3) If one uses lm.fit() and lmRob.fit.compute() directly in this way,
then, given that one is warned against doing so, what are the dangers?

(4) According to cran.r-project.org/web/views/Robust.html, lmRob()
"makes use of the M-S algorithm of Maronna and Yohai (2000),
automatically when there are factors among the predictors (where
S-estimators (and hence MM-estimators) based on resampling typically
badly fail)."  Is there an alternative program that uses the M-S
algorithm, if lmRob() or lmRob.fit.compute() cannot be made to work? 


R%%>sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-redhat-linux-gnu (64-bit)
----
I'll be very grateful for any help.
Saul Sternberg, Psychology
University of Pennsylvania

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] model.matrix() and lm() for nested factors

2010-11-21 Thread Saul Sternberg

To R-help list:

I would like to use lm() and lmRob() to estimate the parameters of a
fixed-effects model that includes nested factors with unequal numbers
of levels, in some cases without also including the nesting factor in
the model.

When I specify options(contrasts=c("contr.sum","contr.poly")), the model
matrix generated by model.matrix() can be incorrect.  I realize that I can
create the desired matrix myself, and use lm.fit(), or lmRob.fit.compute()
but there are two problems with this: First, we are warned against
using those functions directly, and, second, whereas the models I want to
apply are the same for about 100 different data frames, each with about
500 observations, a different model matrix would have to be constructed
for each data frame, increasing the chance of error.

---
Here is a toy example:

Dataframe diff.df:
   G  D  T2  
1 g1 d1 -60 
2 g1 d2 -50 
3 g1 d3 -40 
4 g2 d1  20  
5 g2 d2  40  
6 g2 d3  60  
7 g2 d4  80  

(G, D factors; T2 numeric)

After
options(contrasts=c("contr.sum","contr.poly"))

neither of the following produces the desired model matrix:

model.matrix(T2 ~ G + D%in%G, diff.df)
model.matrix(T2 ~  D%in%G, diff.df)

For example, omitting row numbers, the second command produces:

Icpt   Dd1:Gg1 Dd2:Gg1 Dd3:Gg1 Dd4:Gg1 Dd1:Gg2 Dd2:Gg2 Dd3:Gg2 Dd4:Gg2
1   1   0   0   0   0   0   0   0   
1   0   1   0   0   0   0   0   0   
1   0   0   1   0   0   0   0   0   
1   0   0   0   0   1   0   0   0   
1   0   0   0   0   0   1   0   0   
1   0   0   0   0   0   0   1   0   
1   0   0   0   0   0   0   0   1   

whereas the correct matrix is:

Icpt Gg1:D1 Gg1:D2 Gg2:D1 Gg2:D2 Gg2:D3
1  1  0  0  0  0
1  0  1  0  0  0
1 -1 -1  0  0  0
1  0  0  1  0  0
1  0  0  0  1  0
1  0  0  0  0  1
1  0  0 -1 -1 -1

---
Other behavior of model.matrix() with nested factors in tiny examples:

When the nested factors had unequal numbers of levels and the nesting
factor was not included in the model, as above, the matrix
generated for contr.treatment was also wrong.

When the nested factors had unequal numbers of levels and the nesting
factor was included in the model, the matrix for contr.sum was
wrong, while the matrix for contr.treatment was wrong only in
having an extra column of zeros.

When the nested factors had equal numbers of levels and the nesting factor
was not included in the model, the correct matrix was generated
for neither contr.sum nor contr.treatment.

When the nested factors had equal numbers of levels and the nesting factor
was included in the model, the correct matrix was generated for
both contr.sum and contr.treatment.
---
My questions:

(1) If there is a way to use lm() and lmRob() in such cases so that they
generate the correct contrast matrices (and hence the desired parameter
estimates), what is it?

(2) If there is no way to do this, is the best alternative for the user to
create the desired model matrices "by hand" and provide them as arguments
to lim.fit() and lmRob.fit.compute()?

(3) If one uses lm.fit() and lmRob.fit.compute() directly in this way,
then, given that one is warned against doing so, what are the dangers?

Many thanks.

Saul Sternberg 
Psychology
University of Pennsylvania


R%%>sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C  
[3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8   
[7] LC_PAPER=en_US.UTF-8   LC_NAME=C 
[9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base 

loaded via a namespace (and not attached):
[1] tools_2.12.0

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.