[R] Dominant factors in aov?

2004-12-02 Thread Rene Eschen
Hi all,

I'm using R 2.0.1 for Windows to analyze the influence of the following factors
on response Y:

A (four levels)
B (three levels)
C (two levels)
D (29 levels) with
E (four replicates)

The dataset looks like this:
A   B   C   D   E   Y
0   1   1   1   1   491.9
0   1   1   1   2   618.7
0   1   1   1   3   448.2
0   1   1   1   4   632.9
250 1   1   1   1   92.4
250 1   1   1   2   117
250 1   1   1   3   35.5
250 1   1   1   4   102.7
500 1   1   1   1   47
500 1   1   1   2   57.4
500 1   1   1   3   6.5
500 1   1   1   4   50.9
1000 1   1   1   1   0.7
1000 1   1   1   2   6.2
1000 1   1   1   3   0.5
1000 1   1   1   4   1.1
0   2   2   2   1   6
0   2   2   2   2   4.2
0   2   2   2   3   20.3
0   2   2   2   4   3.5
250 2   2   2   1   8.4
250 2   2   2   2   2.8

etc.

If I ask the following: summary(aov(Y~A+B+C+D+E))

R gives me this answer:

            Df  Sum Sq Mean Sq  F value Pr(>F)
A            3 135.602  45.201 310.2166 <2e-16 ***
B            2   0.553   0.276   1.8976 0.1512
C            1   0.281   0.281   1.9264 0.1659
D           25  92.848   3.714  25.4890 <2e-16 ***
E            3   0.231   0.077   0.5279 0.6634
Residuals  411  59.885   0.146

Can someone explain to me why factor D has only 25 Df (instead of the 28 I
expected), and why this number changes when I leave out factors B or C (but
not A)? Why do factors B and C (but again: not A) not show up in the
calculation if they appear later in the formula than D?

When I ask summary.lm(aov(Y~A+B+C+D+E)), R tells me that three levels of D
were not defined because of singularities (what does this word mean?).
After checking and playing around with the dataset, I find no logical reason
for which levels are not defined. Even if I construct a perfect dataset
(balanced, no missing values) I never get the correct number of Df. 

My other, similar datasets are analyzed as expected using similar function
calls. Am I doing something wrong here?

Many thanks,

René Eschen.

___
drs. René Eschen
CABI Bioscience Switzerland Centre
1 Rue des Grillons
CH-2800 Delémont
Switzerland
+41 32 421 48 87 (Direct)
+41 32 421 48 70 (Secretary)
+41 32 421 48 71 (Fax)

http://www.unifr.ch/biol/ecology/muellerschaerer/group/eschen/eschen.html

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Dominant factors in aov?

2004-12-02 Thread Jonathan Baron
I'm not a statistician, so take what I say with a grain of salt.

On 12/02/04 06:29, Rene Eschen wrote:
> Can someone explain to me why factor D has only 25 Df (instead of the 28 I
> expected), and why this number changes when I leave out factors B or C (but
> not A)? Why do factors B and C (but again: not A) not show up in the
> calculation if they appear later in the formula than D?
>
> When I ask summary.lm(aov(Y~A+B+C+D+E)), R tells me that three levels of D
> were not defined because of singularities (what does this word mean?).
> After checking and playing around with the dataset, I find no logical reason
> for which levels are not defined. Even if I construct a perfect dataset
> (balanced, no missing values) I never get the correct number of Df.

I would guess that the factors are somewhat predictable from each
other.  That is, there is some redundancy.  Try predicting each
factor from all the others, without the dependent variable.

Jon
-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/
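
One way to check the redundancy Jon describes is to cross-tabulate pairs of factors and see whether each level of one always occurs with the same level of the other. The sketch below uses a small made-up data frame that only mimics the pattern in the posted excerpt; the helper name determines() is hypothetical, not an existing R function.

```r
# Hypothetical toy data mimicking the posted excerpt (not the full dataset)
d <- data.frame(
  B = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  D = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  E = factor(c(1, 2, 3, 4, 1, 2, 3, 4))
)

# Cross-tabulation: a single non-zero cell per row means that knowing D
# already tells you B
table(D = d$D, B = d$B)

# Hypothetical helper: TRUE when every level of x occurs together with
# exactly one level of y
determines <- function(x, y) {
  all(rowSums(table(x, y) > 0) == 1)
}

d_fixes_b <- determines(d$D, d$B)  # D and B change together in the toy data
e_fixes_b <- determines(d$E, d$B)  # each replicate E occurs with both levels of B
```

If determines() returns TRUE for a pair of factors, the second factor carries no information beyond the first, which is the kind of redundancy that makes aov() report fewer Df than expected.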



Re: [R] Dominant factors in aov?

2004-12-02 Thread Christoph Scherber
Dear Rene,
First of all, note that A, B, C, D and E need to be declared as factors at
the beginning, using factor() (but I think you did this already). Also,
make sure that the data are read into R correctly (i.e. with "." as the
decimal separator).
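
As a minimal sketch of that advice, the following reads a few rows copied from the posted excerpt and converts the design columns to factors. The object names txt, d and fac are assumptions for illustration; the real data would come from a file rather than an inline string.

```r
# A few rows taken from the excerpt in the original post
txt <- "A B C D E Y
0 1 1 1 1 491.9
0 1 1 1 2 618.7
250 1 1 1 1 92.4
250 1 1 1 2 117
500 1 1 1 1 47
500 1 1 1 2 57.4
1000 1 1 1 1 0.7
1000 1 1 1 2 6.2"

# dec = "." states the decimal separator explicitly
d <- read.table(text = txt, header = TRUE, dec = ".")

# Straight after reading, A is numeric; left like that, aov() would fit it
# as a single 1-d.f. slope instead of a 4-level factor
a_was_numeric <- is.numeric(d$A)

# Convert all design columns to factors in one step
fac <- c("A", "B", "C", "D", "E")
d[fac] <- lapply(d[fac], factor)

str(d)
levels(d$A)
```

Checking str(d) before fitting is a quick way to confirm that every design variable is a factor and that Y was read as numeric.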

The reason for the singularities is that B, C and D are not
independent (in fact, they're identical in their factor levels, and
hence in their effect on Y).

For this reason, only the effects of A, B and E can be estimated:
            Df Sum Sq Mean Sq F value    Pr(>F)
A            3 302286  100762  7.9887  0.002396 **
B            1 422869  422869 33.5263 4.683e-05 ***
E            3  22281    7427  0.5888  0.632334
Residuals   14 176583   12613

A has 4 levels so there should be 3 d.f. (that's correct in the table)
B has 2 levels so there is only 1 d.f. (that's also correct)
E has 4 levels so there should be 3 d.f. (also O.K.)
In total, there are [(n=22)-(3)-(1)-(3)] - 1 = 14 residual d.f., so
that's OK, too.
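
The d.f. bookkeeping above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the table (n = 22 rows from the excerpt); the variable names are made up for illustration.

```r
n <- 22                                       # rows in the posted excerpt
df_terms <- c(A = 4 - 1, B = 2 - 1, E = 4 - 1)  # levels minus one per factor

# One more d.f. goes to the intercept (grand mean)
df_resid <- n - 1 - sum(df_terms)

# Residual mean square and the F value for A, from the sums of squares above
ms_resid <- 176583 / df_resid
f_A <- (302286 / df_terms[["A"]]) / ms_resid

df_resid
round(f_A, 4)
```

The same rule (residual d.f. = n - 1 - sum of the term d.f.) applies to any additive model without aliased terms.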

Hope this helps,
Christoph

levels(A)
[1] 0    250  500  1000
levels(B)
[1] 1 2
levels(E)
[1] 1 2 3 4




Re: [R] Dominant factors in aov?

2004-12-02 Thread Christoph Scherber
Dear Rene,
At least from the part of the data.frame included in your mail, I
assumed that B, C and D changed in identical ways (but maybe I got this
wrong).

With your combination of factors:
A (four levels)
B (three levels)
C (two levels)
D (29 levels) with
E (four replicates)
and assuming independence of the treatment levels, you should get
3 d.f. for A
2 d.f. for B
1 d.f. for C
28 d.f. for D
3 d.f. for E
? residual d.f. (how big is the total number of Y values?)
The problem arises if parts of treatments B, D and E are applied to the same
subjects, e.g.
B   D   E   Y   
1   1   1   400
2   2   2   300
2   2   3   420
2   2   4   350
(etc)
then you immediately run into problems because treatments B and D (in this
case) change in an identical way, i.e. each level of B always occurs together
with the same level of D, so their effects on Y cannot be separated; this is
what causes the 'singularities'. The terms need to vary independently of one
another, otherwise your analysis becomes order dependent, i.e. the output of
your aov model will change depending on the sequence in which the terms
A, B, C, D and E are entered.
Did I get this right? It would probably help to see the full dataset.
Best wishes
Christoph
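
The order dependence described above can be reproduced with a small simulated design. This is a sketch on made-up data, not the poster's dataset: it assumes that every level of D occurs within only one level of B and one level of C, which the thread suggests but only the full dataset could confirm. The response is pure noise because only the Df column matters here.

```r
set.seed(1)

# Hypothetical design: A crossed with D, two replicates E per cell
d <- expand.grid(A = factor(c(0, 250, 500, 1000)),
                 D = factor(1:6),
                 E = factor(1:2))

# Assumption: B and C are fully determined by D
d$B <- factor(c(1, 1, 2, 2, 3, 3)[as.integer(d$D)])
d$C <- factor(c(1, 2, 1, 2, 1, 2)[as.integer(d$D)])
d$Y <- rnorm(nrow(d))

fit_bcd <- lm(Y ~ A + B + C + D, data = d)  # B and C entered before D
fit_dbc <- lm(Y ~ A + D + B + C, data = d)  # D entered first

anova(fit_bcd)   # D keeps only 5 - 2 - 1 = 2 Df
anova(fit_dbc)   # D takes all 5 Df; B and C no longer appear

# Coefficients that cannot be estimated are reported as NA
n_aliased <- sum(is.na(coef(fit_bcd)))

# alias() lists which model columns are linear combinations of earlier ones
alias(fit_bcd)
```

Under the same assumption, the table in the original post is consistent with this pattern: D has 29 levels (28 Df), of which 2 are already used by B and 1 by C, leaving 28 - 2 - 1 = 25, and three D coefficients are reported as undefined.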



Rene Eschen wrote:
> Dear Christoph,
>
>> The reason for the singularities is that B, C and D are not
>> independent (in fact, they're identical in their factor levels, and
>> hence in their effect on Y).
>
> I do not understand this. You gave the correct levels for A, B and E, but I
> do not see how they are identical. They have different levels and different
> codings, or is it because A has the same number of levels as E, and E shares
> some of the coding with B?
>
> René Eschen.