Hi,
I previously analyzed a dataset in SYSTAT and am now re-analyzing it
in R. I noticed, however, that the same model yields different
results (different sums of squares) in the two programs. I first
thought this might be because the two programs use different types
of sums of squares by default, but the discrepancy persisted even
after I requested type III sums of squares in R. Can anyone help me
by clarifying why the results differ?
The data table is:
host size2 maladapt increase
A yes 35 21
A yes 30 13
A no 73 -6
A yes 22 3
C yes 19 -1
A no 53 1
C no 48 -27
A yes 32 26
A yes 14 1
A no 83 42
A yes 19 -3
A no 66 -7
C no 69 -14
A yes 30 30
C no 69 -22
A yes 10 6
C no 65 -15
A yes 11 4
A yes 15 15
A no 77 30
C yes 11 11
A no 48 -4
C yes 29 -4
A yes 0 0
C no 69 -2
A yes 10 -40
C yes 8 -6
C no 91 -2
C no 65 13
A yes 12 0
C yes 16 -26
C yes 38 -12
A no 43 20
C no 81 -7
A yes 9 9
C no 100 25
A yes 18 12
C yes 27 -6
A yes 11 -3
The dialogue in R is as follows:
> library(car)
> read.table(file="/Users/lukeharmon/Desktop/glmnosil.txt", header=T)->nn
> attach(nn)
> ls(2)
[1] "host" "increase" "maladapt" "size2" "size4"
> lm(maladapt~host*increase*size2)

Call:
lm(formula = maladapt ~ host * increase * size2)

Coefficients:
            (Intercept)                    hostC                 increase                 size2yes
               59.54144                 17.13828                  0.34487                -44.41381
         hostC:increase           hostC:size2yes        increase:size2yes  hostC:increase:size2yes
                0.30449                -12.50558                  0.03766                 -0.90697

> lm(maladapt~host*increase*size2)->fm
> Anova(fm, type="III")
Anova Table (Type III tests)

Response: maladapt
                     Sum Sq Df  F value    Pr(>F)
(Intercept)         18348.5  1 152.9683 1.595e-13 ***
host                  920.9  1   7.6774  0.009366 **
increase              278.4  1   2.3210  0.137773
size2                7447.0  1  62.0841 6.806e-09 ***
host:increase         105.1  1   0.8758  0.356584
host:size2            266.9  1   2.2252  0.145880
increase:size2          2.0  1   0.0171  0.896902
host:increase:size2   332.3  1   2.7703  0.106108
Residuals            3718.4 31
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Contrast this with the results from SYSTAT:

Source                 Sum-of-Squares  df  Mean-Square  F-ratio      P
HOST$                         808.949   1      808.949    6.744  0.014
SIZE2$                      17525.418   1    17525.418  146.106  0.000
INCREASE                      540.579   1      540.579    4.507  0.042
SIZE2$*HOST$                  266.915   1      266.915    2.225  0.146
SIZE2$*INCREASE               279.389   1      279.389    2.329  0.137
HOST$*INCREASE                 35.869   1       35.869    0.299  0.588
SIZE2$*HOST$*INCREASE         332.293   1      332.293    2.770  0.106
Error                        3718.441  31      119.950
I've been searching the documentation for anova() for a default that
differs from what SYSTAT does, but part of the problem is that SYSTAT
is somewhat opaque about its calculations, so it is hard to compare
the two. I would really welcome feedback as to what may cause this
discrepancy.
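
One thing that may be worth checking is the factor coding. Type III tests depend on the contrasts in use, and R defaults to treatment contrasts, while SYSTAT may be using sum-to-zero (effects) coding; that last part is an assumption, not something confirmed here. Below is a minimal sketch, not a diagnosis, that fits the same model under both codings using the data table above. It uses base R's drop1() over the full scope in place of car's Anova() so that it runs without extra packages; the names ss_marginal, fit_trt, fit_sum, ss_trt and ss_sum are made up for illustration.

```r
# Data copied verbatim from the table in this post (39 rows).
nn <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
host size2 maladapt increase
A yes 35 21
A yes 30 13
A no 73 -6
A yes 22 3
C yes 19 -1
A no 53 1
C no 48 -27
A yes 32 26
A yes 14 1
A no 83 42
A yes 19 -3
A no 66 -7
C no 69 -14
A yes 30 30
C no 69 -22
A yes 10 6
C no 65 -15
A yes 11 4
A yes 15 15
A no 77 30
C yes 11 11
A no 48 -4
C yes 29 -4
A yes 0 0
C no 69 -2
A yes 10 -40
C yes 8 -6
C no 91 -2
C no 65 13
A yes 12 0
C yes 16 -26
C yes 38 -12
A no 43 20
C no 81 -7
A yes 9 9
C no 100 25
A yes 18 12
C yes 27 -6
A yes 11 -3
")

# Hypothetical helper: marginal ("type III"-style) sums of squares, obtained
# by dropping each term's model-matrix columns in turn while keeping the rest.
ss_marginal <- function(fit) {
  tab <- drop1(fit, scope = . ~ ., test = "F")
  tab[-1, ]  # remove the "<none>" row, keep one row per model term
}

# R's default coding: treatment contrasts (as in the session above).
fit_trt <- lm(maladapt ~ host * increase * size2, data = nn)

# Assumed SYSTAT-like coding: sum-to-zero contrasts for both factors.
fit_sum <- lm(maladapt ~ host * increase * size2, data = nn,
              contrasts = list(host = "contr.sum", size2 = "contr.sum"))

ss_trt <- ss_marginal(fit_trt)
ss_sum <- ss_marginal(fit_sum)

# Side-by-side sums of squares under the two codings.
print(cbind(treatment = ss_trt[, "Sum of Sq"],
            sum_to_zero = ss_sum[, "Sum of Sq"]))
```

The residual and three-way interaction sums of squares do not depend on the coding, which fits the fact that those rows already agree between the two programs. If the coding assumption holds, the lower-order rows in the sum_to_zero column should move toward the SYSTAT values; if they do not, the cause lies elsewhere.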
Thanks very much for your help,
Dan Bolnick
Section of Integrative Biology
University of Texas at Austin