Re: [R] Type II and III sum of square in Anova (R, car package)

2006-08-29 Thread Bill.Venables
I cannot resist a very brief entry into this old and seemingly
immortal issue, but I will be very brief, I promise!

Amasco Miralisus suggests:

 As I understood from the R FAQ, there is disagreement among statisticians
 about which SS to use

(http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-the-output-from-anova_0028_0029-depend-on-the-order-of-factors-in-the-model_003f).

To let this go is to concede way too much.  The 'disagreement' is
really over whether this is a sensible question to ask in the first
place.  One side of the debate suggests that the real question is what
hypotheses does it make sense to test and within what outer
hypotheses.  Settle that question and no issue on types of sums of
squares arises.

This is often a hard question to get your head around, and the
attraction of offering a variety of 'types of sums of squares' holds
out the false hope that perhaps you don't need to do so.  The bad
news is that for good science and good decision making, you do.

Bill Venables.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Type II and III sum of square in Anova (R, car package)

2006-08-29 Thread John Sorkin
Amasco,
In general it is dangerous to attempt to interpret a main effect that
is included in an interaction, regardless of whether or not the
interaction is significant. If you want to make a valid inference about
a main effect it is safest to do so after dropping any interaction that
contains the main effect. Since you would not want to drop a significant
interaction, you should not try to interpret a main effect in the
presence of a significant interaction that contains the main effect. If
the interaction is not significant drop the interaction, re-run the
model and then look at the main effect.
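In R, this drop-and-refit workflow might look like the following sketch (a minimal illustration using the built-in warpbreaks data; substitute your own response and factors):

```r
# Fit the full two-way model, interaction included
fit_full <- aov(breaks ~ wool * tension, data = warpbreaks)
summary(fit_full)                # inspect the wool:tension interaction first

# If the interaction is judged non-significant, drop it and refit
fit_main <- update(fit_full, . ~ . - wool:tension)
summary(fit_main)                # interpret the main effects in the reduced model
```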
John
 
 
John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
[EMAIL PROTECTED]

 Amasco Miralisus [EMAIL PROTECTED] 8/28/2006 3:20 PM 
Hello,

First of all, I would like to thank everybody who answered my
question. Every post has added something to my knowledge of the topic.
I now know why Type III SS are so questionable.

As I understood from the R FAQ, there is disagreement among statisticians
about which SS to use
(http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-the-output-from-anova_0028_0029-depend-on-the-order-of-factors-in-the-model_003f).
However, most commercial statistical packages use Type III as the
default (with orthogonal contrasts), just like STATISTICA, from which I
am currently trying to migrate to R. This was probably done for the
convenience of end users who are not very experienced in theoretical
statistics.

I am aware that the same result could be produced using the standard
anova() function with Type I sequential SS, supplemented by the drop1()
function, but this approach will look quite complicated to people
without any substantial background in statistics, such as non-math
students. I would prefer an easier, possibly more universal way, though
also probably one more "for dummies" :) If I am not mistaken, the car
package by John Fox, with its nice Anova() function, is a reasonable
alternative for anyone who wishes to perform a quick statistical
analysis without fear of messing something up in the model fitting. Of
course, orthogonal contrasts have to be specified (for example,
contr.sum) in the case of Type III SS.

Therefore, I would like to reformulate my questions, to make it easier
for you to answer:

1. The first question relates to the answer by Professor Brian Ripley:
Did I understand correctly from the recommended paper (Bill Venables'
'exegeses' paper) that there is not much sense in testing main effects
if the interaction is significant?

2. If I understood the post by John Fox correctly, I could safely use
the Anova(..., type="III") function from car for ANOVA analyses in R,
both for balanced and unbalanced designs? Provided, of course, that the
model was fitted with orthogonal contrasts. Something like below:
mod <- aov(response ~ factor1 * factor2, data=mydata,
           contrasts=list(factor1=contr.sum, factor2=contr.sum))
Anova(mod, type="III")

It was also said in most of your posts that the decision of which type
of SS to use has to be made on the basis of the hypothesis we want to
test. Therefore, let's assume that I would like to test the
significance of both factors, and if any of them are significant, I
plan to use post-hoc tests to explore the difference(s) between levels
of the significant factor(s).

Thank you in advance, Amasco

On 8/27/06, John Fox [EMAIL PROTECTED] wrote:
 Dear Amasco,

 A complete explanation of the issues that you raise is awkward in an
 email, so I'll address your questions briefly. Section 8.2 of my text,
 Applied Regression Analysis, Linear Models, and Related Methods (Sage,
 1997) has a detailed discussion.

 (1) In balanced designs, so-called Type I, II, and III sums of squares
 are identical. If the STATA manual says that Type II tests are only
 appropriate in balanced designs, then that doesn't make a whole lot of
 sense (unless one believes that Type-II tests are nonsense, which is
 not the case).

 (2) One should concentrate not directly on different types of sums of
 squares, but on the hypotheses to be tested. Sums of squares and
 F-tests should follow from the hypotheses. Type-II and Type-III tests
 (if the latter are properly formulated) test hypotheses that are
 reasonably construed as tests of main effects and interactions in
 unbalanced designs. In unbalanced designs, Type-I sums of squares
 usually test hypotheses of interest only by accident.

 (3) Type-II sums of squares are constructed obeying the principle of
 marginality, so the kinds of contrasts employed to represent factors
 are irrelevant to the sums of squares produced. 

Re: [R] Type II and III sum of square in Anova (R, car package)

2006-08-28 Thread Amasco Miralisus
Hello,

First of all, I would like to thank everybody who answered my
question. Every post has added something to my knowledge of the topic.
I now know why Type III SS are so questionable.

As I understood from the R FAQ, there is disagreement among statisticians
about which SS to use
(http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-the-output-from-anova_0028_0029-depend-on-the-order-of-factors-in-the-model_003f).
However, most commercial statistical packages use Type III as the
default (with orthogonal contrasts), just like STATISTICA, from which I
am currently trying to migrate to R. This was probably done for the
convenience of end users who are not very experienced in theoretical
statistics.

I am aware that the same result could be produced using the standard
anova() function with Type I sequential SS, supplemented by the drop1()
function, but this approach will look quite complicated to people
without any substantial background in statistics, such as non-math
students. I would prefer an easier, possibly more universal way, though
also probably one more "for dummies" :) If I am not mistaken, the car
package by John Fox, with its nice Anova() function, is a reasonable
alternative for anyone who wishes to perform a quick statistical
analysis without fear of messing something up in the model fitting. Of
course, orthogonal contrasts have to be specified (for example,
contr.sum) in the case of Type III SS.

Therefore, I would like to reformulate my questions, to make it easier
for you to answer:

1. The first question relates to the answer by Professor Brian Ripley:
Did I understand correctly from the recommended paper (Bill Venables'
'exegeses' paper) that there is not much sense in testing main effects
if the interaction is significant?

2. If I understood the post by John Fox correctly, I could safely use
the Anova(..., type="III") function from car for ANOVA analyses in R,
both for balanced and unbalanced designs? Provided, of course, that the
model was fitted with orthogonal contrasts. Something like below:
mod <- aov(response ~ factor1 * factor2, data=mydata,
           contrasts=list(factor1=contr.sum, factor2=contr.sum))
Anova(mod, type="III")

It was also said in most of your posts that the decision of which type
of SS to use has to be made on the basis of the hypothesis we want to
test. Therefore, let's assume that I would like to test the
significance of both factors, and if any of them are significant, I
plan to use post-hoc tests to explore the difference(s) between levels
of the significant factor(s).

Thank you in advance, Amasco

On 8/27/06, John Fox [EMAIL PROTECTED] wrote:
 Dear Amasco,

 A complete explanation of the issues that you raise is awkward in an email,
 so I'll address your questions briefly. Section 8.2 of my text, Applied
 Regression Analysis, Linear Models, and Related Methods (Sage, 1997) has a
 detailed discussion.

 (1) In balanced designs, so-called Type I, II, and III sums of squares
 are identical. If the STATA manual says that Type II tests are only
 appropriate in balanced designs, then that doesn't make a whole lot of sense
 (unless one believes that Type-II tests are nonsense, which is not the
 case).

 (2) One should concentrate not directly on different types of sums of
 squares, but on the hypotheses to be tested. Sums of squares and F-tests
 should follow from the hypotheses. Type-II and Type-III tests (if the latter
 are properly formulated) test hypotheses that are reasonably construed as
 tests of main effects and interactions in unbalanced designs. In unbalanced
 designs, Type-I sums of squares usually test hypotheses of interest only by
 accident.

 (3) Type-II sums of squares are constructed obeying the principle of
 marginality, so the kinds of contrasts employed to represent factors are
 irrelevant to the sums of squares produced. You get the same answer for any
 full set of contrasts for each factor. In general, the hypotheses tested
 assume that terms to which a particular term is marginal are zero. So, for
 example, in a three-way ANOVA with factors A, B, and C, the Type-II test for
 the AB interaction assumes that the ABC interaction is absent, and the test
 for the A main effect assumes that the ABC, AB, and AC interactions are
 absent (but not necessarily the BC interaction, since the A main effect is
 not marginal to this term). A general justification is that we're usually
 not interested, e.g., in a main effect that's marginal to a nonzero
 interaction.

 (4) Type-III tests do not assume that terms higher-order to the term in
 question are zero. For example, in a two-way design with factors A and B,
 the type-III test for the A main effect tests whether the population
 marginal means at the levels of A (i.e., averaged across the levels of B)
 are the same. One can test this hypothesis whether or not A and B interact,
 since the marginal means can be formed whether or not the profiles of means
 for A within levels of B are parallel. Whether the hypothesis is of interest
 in the presence of interaction 

Re: [R] Type II and III sum of square in Anova (R, car package)

2006-08-28 Thread John Fox
Dear Amasco,

Again, I'll answer briefly (since the written source that I previously
mentioned has an extensive discussion):

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Amasco Miralisus
 Sent: Monday, August 28, 2006 2:21 PM
 To: r-help@stat.math.ethz.ch
 Cc: John Fox; Prof Brian Ripley; Mark Lyman
 Subject: Re: [R] Type II and III sum of square in Anova (R, car package)
 
 Hello,
 
 First of all, I would like to thank everybody who answered my 
 question. Every post has added something to my knowledge of the topic.
 I now know why Type III SS are so questionable.
 
 As I understood from the R FAQ, there is disagreement among
 statisticians about which SS to use
 (http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-the-output-from-anova_0028_0029-depend-on-the-order-of-factors-in-the-model_003f).
 However, most commercial statistical packages use Type III as
 the default (with orthogonal contrasts), just like STATISTICA,
 from which I am currently trying to migrate to R. This was
 probably done for the convenience of end users who are not very
 experienced in theoretical statistics.
 

Note that the contrasts are only orthogonal in the row basis of the model
matrix, not, with unbalanced data, in the model matrix itself.

 I am aware that the same result could be produced using the standard
 anova() function with Type I sequential SS, supplemented by the
 drop1() function, but this approach will look quite complicated to
 people without any substantial background in statistics, such as
 non-math students. I would prefer an easier, possibly more universal
 way, though also probably one more "for dummies" :) If I am not
 mistaken, the car package by John Fox, with its nice Anova() function,
 is a reasonable alternative for anyone who wishes to perform a quick
 statistical analysis without fear of messing something up in the
 model fitting. Of course, orthogonal contrasts have to be specified
 (for example, contr.sum) in the case of Type III SS.
 
 Therefore, I would like to reformulate my questions, to make
 it easier for you to answer:

 1. The first question relates to the answer by Professor Brian
 Ripley: Did I understand correctly from the recommended paper
 (Bill Venables' 'exegeses' paper) that there is not much sense
 in testing main effects if the interaction is significant?
 

Many are of this opinion. I would put it a bit differently: Properly
formulated, tests of main effects in the presence of interactions make sense
(i.e., have a straightforward interpretation in terms of population marginal
means) but probably are not of interest.

 2. If I understood the post by John Fox correctly, I could safely use
 the Anova(..., type="III") function from car for ANOVA analyses in
 R, both for balanced and unbalanced designs? Provided, of course,
 that the model was fitted with orthogonal contrasts.
 Something like below:
 mod <- aov(response ~ factor1 * factor2, data=mydata,
            contrasts=list(factor1=contr.sum, factor2=contr.sum))
 Anova(mod, type="III")
 

Yes (or you could reset the contrasts option), but why do you appear to
prefer the type-III tests to the type-II tests?

 It was also said in most of your posts that the decision of
 which type of SS to use has to be made on the basis of the
 hypothesis we want to test. Therefore, let's assume that I
 would like to test the significance of both factors, and if
 any of them are significant, I plan to use post-hoc tests to
 explore the difference(s) between levels of the significant factor(s).
 

Your statement is too vague to imply what kind of tests you should use. I
think that people are almost always interested in main effects when
interactions to which they are marginal are negligible. In this situation,
both type-II and type-III tests are appropriate, and type-II tests
would usually be more powerful.

Regards,
John

 Thank you in advance, Amasco
 
 On 8/27/06, John Fox [EMAIL PROTECTED] wrote:
  Dear Amasco,

  A complete explanation of the issues that you raise is awkward in an
  email, so I'll address your questions briefly. Section 8.2 of my
  text, Applied Regression Analysis, Linear Models, and Related
  Methods (Sage, 1997) has a detailed discussion.

  (1) In balanced designs, so-called Type I, II, and III sums of
  squares are identical. If the STATA manual says that Type II tests
  are only appropriate in balanced designs, then that doesn't make a
  whole lot of sense (unless one believes that Type-II tests are
  nonsense, which is not the case).

  (2) One should concentrate not directly on different types of sums
  of squares, but on the hypotheses to be tested. Sums of squares and
  F-tests should follow from the hypotheses. Type-II and Type-III
  tests (if the latter are properly formulated) test hypotheses that
  are reasonably construed as tests of main effects and interactions
  in unbalanced designs. In unbalanced designs, Type-I sums of squares
  usually test 

Re: [R] Type II and III sum of square in Anova (R, car package)

2006-08-27 Thread Prof Brian Ripley
I think this starts from the position of a batch-oriented package.
In R you can refit models with update(), add1() and drop1(), and 
experienced S/R users almost never use ANOVA tables for unbalanced 
designs.  Rather than fit a pre-specified set of sub-models, why not fit 
those sub-models that appear to make some sense for your problem and data?
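
For instance (a minimal sketch using the built-in warpbreaks data), drop1() offers only those terms whose removal respects marginality:

```r
# Full two-way model on the built-in warpbreaks data
fit <- lm(breaks ~ wool * tension, data = warpbreaks)
# Only wool:tension is offered for removal; the marginal main effects are not
drop1(fit, test = "F")
```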

Since your post lacks a signature and your credentials, we have no idea of 
your background, which makes it very difficult even to know what reading 
to suggest to you.  But Bill Venables' 'exegeses' paper 
(http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf) may be a good start.
It does explain your comment '3.'.

On Sun, 27 Aug 2006, Amasco Miralisus wrote:

 Hello everybody,
 
 I have some questions on ANOVA in general and on ANOVA in R particularly.
 I am not a statistician, so I would very much appreciate it if you could
 answer in a simple way.
 
 1. First of all, a more general question. The standard anova() function
 for lm() or aov() models in R implements Type I (sequential) sums of
 squares, which are not well suited for unbalanced ANOVA. Therefore it is
 better to use the Anova() function from the car package, which was
 written by John Fox to provide Type II and Type III sums of squares. Did
 I get the point?
 
 2. Now a more specific question. Type II sums of squares are also not
 well suited for unbalanced ANOVA designs (as stated in the STATISTICA
 help), so is the general rule of thumb to use Anova() with Type II SS
 only for balanced ANOVA and Anova() with Type III SS for unbalanced
 ANOVA? Is this interpretation correct?
 
 3. I have found a post from John Fox in which he wrote that Type III SS
 could be misleading if one uses certain contrasts. What is this about?
 Could you please advise when it is appropriate to use Type II and when
 Type III SS? I do not use contrasts for comparisons, just a general
 ANOVA with subsequent Tukey post-hoc comparisons.
 
 Thank you in advance,
 Amasco
 
 
 

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] Type II and III sum of square in Anova (R, car package)

2006-08-27 Thread John Fox
Dear Amasco,

A complete explanation of the issues that you raise is awkward in an email,
so I'll address your questions briefly. Section 8.2 of my text, Applied
Regression Analysis, Linear Models, and Related Methods (Sage, 1997) has a
detailed discussion.

(1) In balanced designs, so-called Type I, II, and III sums of squares
are identical. If the STATA manual says that Type II tests are only
appropriate in balanced designs, then that doesn't make a whole lot of sense
(unless one believes that Type-II tests are nonsense, which is not the
case).

(2) One should concentrate not directly on different types of sums of
squares, but on the hypotheses to be tested. Sums of squares and F-tests
should follow from the hypotheses. Type-II and Type-III tests (if the latter
are properly formulated) test hypotheses that are reasonably construed as
tests of main effects and interactions in unbalanced designs. In unbalanced
designs, Type-I sums of squares usually test hypotheses of interest only by
accident. 

(3) Type-II sums of squares are constructed obeying the principle of
marginality, so the kinds of contrasts employed to represent factors are
irrelevant to the sums of squares produced. You get the same answer for any
full set of contrasts for each factor. In general, the hypotheses tested
assume that terms to which a particular term is marginal are zero. So, for
example, in a three-way ANOVA with factors A, B, and C, the Type-II test for
the AB interaction assumes that the ABC interaction is absent, and the test
for the A main effect assumes that the ABC, AB, and AC interactions are
absent (but not necessarily the BC interaction, since the A main effect is
not marginal to this term). A general justification is that we're usually
not interested, e.g., in a main effect that's marginal to a nonzero
interaction.

(4) Type-III tests do not assume that terms higher-order to the term in
question are zero. For example, in a two-way design with factors A and B,
the type-III test for the A main effect tests whether the population
marginal means at the levels of A (i.e., averaged across the levels of B)
are the same. One can test this hypothesis whether or not A and B interact,
since the marginal means can be formed whether or not the profiles of means
for A within levels of B are parallel. Whether the hypothesis is of interest
in the presence of interaction is another matter, however. To compute
Type-III tests using incremental F-tests, one needs contrasts that are
orthogonal in the row-basis of the model matrix. In R, this means, e.g.,
using contr.sum, contr.helmert, or contr.poly (all of which will give you
the same SS), but not contr.treatment. Failing to be careful here will
result in testing hypotheses that are not reasonably construed, e.g., as
hypotheses concerning main effects.
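
To illustrate the contrast sensitivity of Type-III tests (a sketch, assuming the car package is installed; the built-in warpbreaks data are made artificially unbalanced by deleting a few rows):

```r
library(car)                              # provides Anova()
wb <- warpbreaks[-(1:4), ]                # delete rows -> unbalanced design

# With sum-to-zero contrasts, the Type-III tests address the usual hypotheses
m_sum <- lm(breaks ~ wool * tension, data = wb,
            contrasts = list(wool = contr.sum, tension = contr.sum))
Anova(m_sum, type = "III")

# With the default contr.treatment, "Type III" SS would test different,
# generally uninteresting hypotheses.  Type-II SS are contrast-invariant:
m_trt <- lm(breaks ~ wool * tension, data = wb)   # default treatment contrasts
all.equal(Anova(m_sum, type = "II")[["Sum Sq"]],
          Anova(m_trt, type = "II")[["Sum Sq"]])  # invariance: should be TRUE
```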

(5) The same considerations apply to linear models that include quantitative
predictors -- e.g., ANCOVA. Most software will not automatically produce
sensible Type-III tests, however.

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Amasco Miralisus
 Sent: Saturday, August 26, 2006 5:07 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Type II and III sum of square in Anova (R, car package)
 
 Hello everybody,
 
 I have some questions on ANOVA in general and on ANOVA in R
 particularly.
 I am not a statistician, so I would very much appreciate it
 if you could answer in a simple way.
 
 1. First of all, a more general question. The standard anova()
 function for lm() or aov() models in R implements Type I
 (sequential) sums of squares, which are not well suited for
 unbalanced ANOVA. Therefore it is better to use the
 Anova() function from the car package, which was written by
 John Fox to provide Type II and Type III sums of squares. Did I
 get the point?
 
 2. Now a more specific question. Type II sums of squares are
 also not well suited for unbalanced ANOVA designs (as stated
 in the STATISTICA help), so is the general rule of thumb to
 use Anova() with Type II SS only for balanced ANOVA and
 Anova() with Type III SS for unbalanced ANOVA?
 Is this interpretation correct?
 
 3. I have found a post from John Fox in which he wrote that
 Type III SS could be misleading if one uses certain
 contrasts. What is this about?
 Could you please advise when it is appropriate to use Type
 II and when Type III SS? I do not use contrasts for
 comparisons, just a general ANOVA with subsequent Tukey
 post-hoc comparisons.
 
 Thank you in advance,
 Amasco
 
 

Re: [R] Type II and III sum of square in Anova (R, car package)

2006-08-26 Thread Mark Lyman
 1. First of all, a more general question. The standard anova() function
 for lm() or aov() models in R implements Type I (sequential) sums of
 squares, which are not well suited for unbalanced ANOVA. Therefore it is
 better to use the Anova() function from the car package, which was
 written by John Fox to provide Type II and Type III sums of squares. Did
 I get the point?
 
 2. Now a more specific question. Type II sums of squares are also not
 well suited for unbalanced ANOVA designs (as stated in the STATISTICA
 help), so is the general rule of thumb to use Anova() with Type II SS
 only for balanced ANOVA and Anova() with Type III SS for unbalanced
 ANOVA? Is this interpretation correct?
 
 3. I have found a post from John Fox in which he wrote that Type III SS
 could be misleading if one uses certain contrasts. What is this about?
 Could you please advise when it is appropriate to use Type II and when
 Type III SS? I do not use contrasts for comparisons, just a general
 ANOVA with subsequent Tukey post-hoc comparisons.
 
There are many threads on this list that discuss this issue. Not being a
great statistician myself, I would suggest you read through some of these
as a start. As I understand it, the best philosophy with regard to types
of sums of squares is to use the type that tests the hypothesis you want.
They were developed as a convenience, to automatically test many of the
hypotheses a person might want and to put them into a nice, neat little
table. However, with an interactive system like R it is usually even
easier to test a full model against a reduced model. That is, if I want
to test the significance of an interaction, I would use
anova(lm.fit2, lm.fit1), where lm.fit2 contains the interaction and
lm.fit1 does not. The anova() function will return the appropriate
F-test. The danger in worrying about which type of sums of squares to use
is that often we do not think about what hypotheses we are testing and
whether those hypotheses make sense in our situation.
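
As a concrete sketch of this full-versus-reduced comparison (using the built-in warpbreaks data, made unbalanced by dropping a few rows):

```r
wb <- warpbreaks[-(1:4), ]                           # unbalanced toy data
fit_red  <- lm(breaks ~ wool + tension, data = wb)   # reduced: no interaction
fit_full <- lm(breaks ~ wool * tension, data = wb)   # full: with interaction
anova(fit_red, fit_full)     # incremental F-test for the wool:tension term
```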

Mark Lyman
