I promised I would summarize and send out the responses to my SPSS questions. I 
have included most of them in their entirety. I have omitted a couple whose 
advice is included in the longer posts. I have removed all names because I 
wasn't always sure everyone remembered that I had indicated in the original 
email that I would do this. I start with my original query and then responses 
follow. 

Let me thank all tipsters and PsychTeachers for these wonderful replies. You 
are truly a community of generous souls. All of these nice replies came during 
a holiday period for most people, and spring break for many others. 

Lastly, for anyone struggling with SPSS, there are several links and print 
sources below that I think will be helpful for teaching yourself that which 
was not available to be taught when some of us old-timers were in grad school.
=========================================================
Original Query:
1.  First of all, I am running a mixed ANOVA with one repeated measures 
variable with 5 levels and one between-subjects variable with 2 levels. I 
wanted to run planned comparisons but SPSS 12 won't let me. It tells me that I 
need at least 3 groups and that I don't have three groups. Can someone explain 
this to me and tell me how to run my analysis?

2.  Second, SPSS has several (about 12) different planned comparisons I can 
run. I know that some are more conservative and some less conservative, but how 
does one decide between so very many which ones to run?

3.  Third, for planned comparisons, can't I just run t-tests for the 
comparisons of interest?

4.  Then how do I get effect size analyses? Effect size analyses in SPSS seem 
to be tied to post-hoc comparisons. Is it sufficient to say that my confidence 
intervals don't overlap?

5.  Why in the world would I want to do an omnibus post-hoc test when I have a 
hypothesis driving planned comparisons and how does all this work out in SPSS?
=============================================================
Responses:

You may find the following helpful:
http://www.uvm.edu/~dhowell/StatPages/More_Stuff/RepMeasMultComp/RepMeasMultComp.html

[My editorial note: I did find this link to be very good, but note that it was 
wrapped across two lines in the original email, so I had to paste the two 
parts back together to get there.]

It is an interesting read even if not all of it is directly relevant to your 
problem (much of it is).

If your planned comparisons are on the between groups variable, you don't have 
three groups (levels). If the F is significant, there is a significant main 
effect for that factor and a significant difference between those levels and 
there is no need for a post hoc test. 

If you do dependent sample t-tests, the effect size is just (the larger mean 
minus the smaller mean) divided by the standard deviation of the difference 
scores. Confidence intervals are not the same as effect sizes. They are more 
closely related to p-values because they are affected by the sample size (which 
effect sizes are not). 
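The arithmetic described above is simple enough to do by hand; as a quick illustration (a Python sketch with made-up pre/post scores, not SPSS output):

```python
from statistics import mean, stdev

def paired_cohens_d(x, y):
    """Effect size for a dependent-samples t-test: the mean
    difference divided by the standard deviation of the
    difference scores."""
    diffs = [a - b for a, b in zip(x, y)]
    return abs(mean(diffs)) / stdev(diffs)

# Hypothetical pre/post scores for five subjects
pre  = [10, 12, 9, 14, 11]
post = [13, 15, 10, 17, 14]
print(round(paired_cohens_d(pre, post), 2))  # -> 2.91
```

Note that this d, unlike a confidence interval, does not shrink or grow with sample size, which is the point being made above.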

Most all of the rest of your questions are best answered by Howell at the link 
above.

=============================================================

I think the problem is that SPSS only does these tests on main effects. It will 
probably do them on your 5-level factor but won't do the other one (comparing 
two groups being the same as doing the F test). 
I've never been able to get SPSS to do pair-wise tests on the means for an 
interaction. I always end up doing these by hand (not that hard). 
Another solution would be to test the 5x2 interaction as if it were a 1-way 
ANOVA with 10 levels. But this might not be possible in a mixed design. 

Winer and (I think) Kirk have good discussions on the merits of the various 
post-hoc tests. They vary in power and susceptibility to experiment-wise Type I 
errors. Some require specific conditions (e.g., comparing n means to a single 
control). Neither text discusses all of the tests available in SPSS . . . you'd 
have to hunt down the references for these. I recommend that you consider the 
tests commonly used by researchers in your target readership. That is where 
your reviewers will come from. If you decide to use a test that isn't commonly 
used by the research community, you'll have to explain your choice (either up 
front or as a response to a reviewer's question if you use a test he/she isn't 
familiar with). 

Question 3 describes the Bonferroni procedure (which you will find described as 
a strategy for post-hoc testing in either Winer or Kirk or both). 

The post hocs sometimes use values derived from the omnibus F. And it is 
probably easier to write the software in this lockstep way. 

Are you sure you'd rather go back to the days when we cranked these things out 
on Monroe calculators, when computing a simple one-way ANOVA could eat up the 
better half of your Saturday . . . and a 5 x 2 design might take a week? There 
are always tradeoffs. 
I'd much rather fight with SPSS for an hour or so than sweat out punching in 
all those numbers and worrying about my keying skills! 


=============================================================

I run a lot of repeated measures ANOVAs in SPSS and am pretty well familiar 
with the program (and repeated measures in general, because nearly every study 
I do has a combination of within and between subjects variables, because I 
study age differences).  As a caveat though, I use a different (more recent) 
version of SPSS than yours, so my answers may not be exact.   

First of all, if you are intending to run a 2x5 repeated measures ANOVA, I 
assume the 5-level is a within-subjects variable - is that so?  If it is, then 
your analysis should run just fine, but you may be choosing the wrong option 
under the analysis menu -- go to Analyze, then General Linear Model, then 
Repeated Measures.  Define a variable with 5 levels (i.e., give it a name and 
number of levels, then click Define).  Then put the 5 repeated-measures 
variables in, and be sure to put the between-subjects (2-level) variable in 
under the corresponding menu box (below the within-subjects one).  This should 
work just fine.   

The choice of which planned comparisons to run is, in my experience, mostly 
arbitrary or up to the experimenter.  Of course, if you choose a more 
conservative test and still get a reliable effect, you will please even the 
most grumpy reviewer.  However, you can deal with some of the issues you asked 
about by going to the options under Repeated Measures.  So, once you've 
entered everything as I described above, click the "Options" button on that 
screen and you'll see some options for displaying factor and factor 
interactions.  Put those all over into the "Display Means" window, then check 
the box that says "Compare main effects" -- you can apply a confidence 
interval adjustment right there to deal with those planned comparisons, for 
example Bonferroni, which is essentially similar to what you were describing 
in terms of dividing alpha and such.  You can also check the box for 
"Estimates of effect size" to get the effect sizes for the main effects and 
any interaction in your 2x5 design.   

I'm not sure if I am explaining the procedure all that well (probably not), but 
if you go to the repeated measures menu and check out the options, I think you 
will find the easiest way to get at least some of the things you seem to be 
after.  Of course, interpreting the output can be tricky as well - in your 
case, be sure to check out the within-subjects contrasts (not just the 
within-subjects effects), because with a 5-level variable you may have a 
quadratic (or even higher-order, e.g., quartic) interaction, rather than 
something simply linear.  I'd be happy to help interpret the output if you 
sent me the .spo file.  Good luck.

=============================================================

From Garson's amazing course
http://www2.chass.ncsu.edu/garson/pa765/anova.htm

"In mixed designs, sphericity is almost always violated and therefore epsilon 
adjustments to degrees of freedom are routine prior to computing F-test 
significance levels.

Split-plot designs are a form of mixed design, originating in agricultural 
research, where seeds were assigned to different plots of land, each receiving 
a different treatment. In split plot designs, subjects (ex., seeds) are 
randomly assigned to each level (ex., plots) of the between-groups factor (soil 
types), prior to receiving the within-subjects repeated factor treatments (ex., 
applications of different types of fertilizer).

Split-plot repeated measures ANOVA can be used when the same subjects are 
measured more than once. In this design, the between-subjects factor is the 
group (treatment or control) and the repeated measure is, for example, the test 
scores for two trials. The resulting ANOVA table will include a main treatment 
effect (reflecting being in the control or treatment group) and a 
group-by-trials interaction effect (reflecting treatment effect on posttest 
scores, taking pretest scores into account). This partitioning of the treatment 
effect may be more confusing than analysis of difference scores, which gives 
equivalent results and therefore is sometimes recommended.

In a typical split-plot repeated measures design, Subjects will be measured on 
some Score over a number of Trials. Subjects will also be split by some Group 
variable. In SPSS, Analyze, General Linear Model, Univariate; enter Score as 
the dependent; enter Trial and Group as fixed factors; enter Subject as a 
random factor; Press the Model button and choose Custom, asking for the Main 
effects for Group and Trial, and the interaction effect of Trial*Group; then 
click the Paste button and modify
the /DESIGN statement to also include Subject(Group) to get the 
Subject-within-Group effect; then select Run All in the syntax window to 
execute."

--------------------------------------
Much more detail on post-hocs can be found in Garson's course.

You certainly don't need an omnibus test to justify a planned comparison - but 
the comparison would need to use the appropriate error SS and df that includes 
all of the data.  In SPSS, programming the contrast in the syntax for mixed 
designs may not be possible or feasible.  Alternatively, there is the Data > 
"Split File" option in SPSS to conduct analyses separately for different 
groups.  You could split by your between-groups factor and then run a GLM 
Repeated Measures.  Choose a contrast (polynomials may fit your data best), or 
the Repeated post-hoc option could be used to compare measurement phases.  I 
believe that reducing alpha in the way you described would probably satisfy 
most reviewers (including me).

An easier approach would be to calculate difference scores - if you can find 
references for the equivalence of that approach.  As a reviewer, I wouldn't let 
the difference-score analysis fly without a reference or two in support 
(preferably Monte Carlo simulations).  You would most likely also lose some 
power by computing difference scores, but it may yield very similar results, 
especially if you only care about one or two changes in the series.  Let me 
know if someone discusses programmed contrasts in syntax for mixed designs - 
you're the second person to ask me about these situations this week!
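Computationally, the difference-score route collapses the repeated factor into one score per subject, which the groups can then be compared on with an ordinary independent-samples t. A hedged sketch (Python with invented numbers, since the poster was working in SPSS):

```python
from math import sqrt
from statistics import mean, variance

def independent_t(g1, g2):
    """Pooled-variance independent-samples t statistic
    (df = n1 + n2 - 2); look up the p-value in a t table
    or any stats package."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical (time1, time2) pairs; the difference score turns the
# repeated factor into one change score per subject
treatment = [t2 - t1 for t1, t2 in [(10, 15), (11, 14), (9, 13)]]
control   = [t2 - t1 for t1, t2 in [(10, 11), (12, 12), (9, 10)]]
print(round(independent_t(treatment, control), 2))  # -> 5.0
```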

=============================================================

1.     
SPSS has always had a bizarre implementation, IMHO, for the analysis of 
repeated measures/mixed designs, far inferior to the programs of BMDP (RIP).  
SAS has had its own peculiarities but I think that they've improved in recent 
years (I admit to not being a SAS person).  You don't mention which procedure 
you're using in SPSS -- I assume that you're using MANOVA, but I realize that 
you might be using GLM, though I'm not really sure in which version of SPSS 
GLM was made available.  In any event, it's possible that whatever procedure 
you're using, SPSS is balking at doing multiple comparisons with a two-level 
factor since the F ratio is a direct test of this factor (in this case the 
F ratio is equivalent to the squared value of the t-test for the two means 
involved in the main effect).  I suspect, however, that you want to test 
components of the 2x5 interaction and apply multiple comparisons to the main 
effects.  Is this correct?  SPSS may not allow this to be done, though the 
rationale may not be clear.

2.
There are no hard and fast rules for this but one can use the following 
criteria:

(1)  The LSD procedure (i.e., multiple t-tests) is the most powerful multiple 
comparison procedure but it also has the highest overall/familywise Type I 
error rate.  If there are statistically significant results, they may be real 
or Type I errors, nonetheless you'll find the largest number of significant 
results with this procedure.  

(2)  The Scheffe procedure, I believe, is still the most conservative 
procedure; that is, it has the least power, but it allows one to perform all 
possible multiple comparisons, that is, pairwise comparisons and combinations 
of means.  If it's significant by Scheffe, it's likely to be significant by 
all other procedures, but this will provide the fewest significant results.  

(3) I believe that all other procedures will provide different levels of 
liberalism/conservatism of results, that is, intermediate degrees of power and 
control of overall Type I error rates. The choice of one procedure over 
another may be as dependent upon the specific conditions of one's data as upon 
one's experience/attitude/knowledge of different tests. I have a fondness for 
Bonferroni-corrected t-tests, but this is mostly motivated by the simplicity 
of the test and its conceptual basis -- there are other tests which can be 
more powerful depending upon the number of means being compared. 

3.
Planned comparisons assume (a) that a subset of comparisons will be made 
relative to all comparisons and (b) that there is some theoretical/rational 
basis for choosing certain comparisons. In this case, one just does the ANOVA 
to get the appropriate error term. Look at the latest edition of Kirk and his 
chapter on multiple comparisons for guidance. In earlier editions, Kirk 
pointed out that many researchers simply used alpha=.05 for each planned 
comparison, especially if the number of such comparisons was small.  It seems 
to make more sense to use a Bonferroni correction: divide the overall 
alpha=.05 by the number of comparisons being made and use this for each 
individual test (though one could allocate a higher per-comparison alpha for 
more "important" comparisons).  If this practice has changed, I'd like to hear 
about it.
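The Bonferroni arithmetic described above is just a division; as a sketch (Python, with hypothetical numbers):

```python
def bonferroni_alpha(overall_alpha, n_comparisons):
    """Per-comparison criterion under a Bonferroni correction:
    the overall (familywise) alpha divided by the number of
    planned comparisons."""
    return overall_alpha / n_comparisons

# e.g., four planned comparisons at an overall alpha of .05:
# each individual test is then evaluated against .0125
print(bonferroni_alpha(0.05, 4))  # -> 0.0125
```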

4. 
I'm not sure I understand what you're saying here.  The usual effect size 
measure provided by SPSS is partial eta squared, which, if memory serves, is 
somewhat equivalent to a squared semi-partial correlation coefficient.  Your 
statement about confidence intervals suggests that you're focusing on 
something else. Are you talking about standardized differences between means?  
If so, it might be easiest to calculate these by hand, unless I'm missing 
something.

5.
The simple answer, I think, is that once you know what equations you need to 
use for the procedure you're doing, you use SPSS to provide you with the 
components of the test you want to do and do the rest by hand.  That way you're 
assured that the analysis you want done is actually being done (it's not always 
clear what SPSS is doing or why it is doing it).  If one is expert in SPSS 
programming, especially in the use of its matrix manipulation procedure and in 
the use of scripts, I imagine that one can make SPSS jump through these hoops.  
Otherwise, it may make more sense to select a specific test that one can do by 
hand (or program the equation either in SPSS or in another program like Excel) 
and use the components from the SPSS analysis of variance procedure necessary 
for the test (e.g., the mean square error from the ANOVA -- ignoring the 
F-tests, since the planned comparisons imply that one isn't interested in 
these).

>YUCK! Why can't stats be what they were 30 years ago when 
>I was in grad school?

Because we've come a long way since then?  And though programs like SPSS have 
also progressed, it still doesn't seem to be able to do certain analyses (e.g., 
those involving repeated-measures) in reasonable ways.

=============================================================

You would surely find the Keppel & Wickens text helpful. That said, given that 
I provide worked out examples in SPSS, you may want to look at my notes on the 
K&W text for my advanced (undergrad) stats course.

http://www.skidmore.edu/%7Ehfoley/PS318.htm

Given K&W's advice, which involves always using a different error term in any 
repeated measures ANOVA (including post hoc analyses), the only way to 
decompose an interaction involving a repeated factor is to do a series of 
separate RM analyses. I'm fairly certain that I provide some examples in my 
notes, but if not, let me know. Moreover, effect size is easily enough computed 
by hand, once you know which type of effect size measure you want. :-) Again, 
if you look into K&W, they'll give you some sense of the options, along with 
the suggested computations. I should have some worked examples in my notes as 
well.

If my notes don't help, let me know because as I gear up to teach the course 
again this fall, I should make the necessary additions.

=============================================================

Look in the Options dialogue box rather than the post hoc dialogue. You should 
be able to move your within groups variable name from the list at the left to 
the box at the right. Then you should be able to choose among two or three 
options for post hoc comparisons (the default is LSD).
I hope that helps
[short and to the point and helped with getting the main effects worked out!]


=============================================================

(1)  You probably should take a look at the latest edition of Kirk's 
Experimental Design (I think it's in its 3rd edition).  It may help you 
review the issues regarding multiple comparisons and planned comparisons 
(especially whether your comparisons are orthogonal or not and why this might 
be important).

(2)  I dislike SPSS's implementation of the repeated measures ANOVA in both 
MANOVA and GLM.

For reasons that are unclear to me, the versions of SPSS that I'm familiar 
with (v13 & v14) do allow trend analysis for a repeated measures variable, but 
this makes little sense if your within-subject variable does not have ordered 
levels (it makes sense for angle of rotation in a mental rotation experiment, 
but not if the levels are qualitatively different, such as different types of 
stimulus presentations in an RT task).

One should be able to do some form of t-test or comparison among the means for 
the within-subject factor, but from some limited materials I've read from 
SPSS, I think that SPSS avoids providing these because it claims that the 
means for the within-subject factor are not independent (the means for any 
between-subject factor would be independent, which I think SPSS assumes allows 
the standard tests to be done).  It may be possible to specify contrasts 
through the contrast subcommand, but because I've had problems getting this 
to work properly in MANOVA, I've just avoided using it in general.

In any event, doing the ANOVA in GLM should give you the appropriate mean 
squares to allow you to do comparisons between means.

(3) Regarding effect sizes vs. confidence intervals, I understand the concern 
about editors and reviewers but given that opinions about these issues are all 
over the place, there are no hard and fast rules to follow. The best you can do 
is provide a good justification for using one or the other.  I've attached a 
copy of Jack Cohen's Power Primer which provides a basis for emphasizing the 
importance of effect sizes.  The main reason for providing an effect size is 
that this will be useful for future researchers when they conduct a variation 
of your research and need an effect size measure for a power analysis.  If 
previous research doesn't provide effect size estimates, the new researcher is 
left to guess what a reasonable effect size might be for the power analysis for 
the proposed research.

Another reason is that quantitative reviews of research (i.e., meta-analyses) 
require effect size estimates. You can make the meta-analyst's job easier by 
calculating and providing the effect size rather than leaving it up to the 
analyst to calculate it him/herself.

Finally, confidence intervals are just the flip side of significance tests, so 
the preference for one is more a matter of taste (ideology?) than anything 
else. However, in GLM you can request "estimated marginal means" which will 
provide you with the 95% confidence intervals for each mean that you specify. 

Below is the SPSS code for a GLM analysis of the "same" response condition for 
the mental rotation experiment (degrees 0-180), requesting the marginal mean 
RTs for each angle of rotation:

GLM
  ang0time,ang45time,ang90time,ang135time,ang180time
  /WSFACTOR = ang_rot(5)
  /METHOD = SSTYPE(3)
  /CRITERIA = ALPHA(.05)
  /print=descriptive,etasq
  /WSDESIGN = ang_rot
  /emmeans=tables(ang_rot).

The last subcommand requests the marginal means for the within-subject factor 
"ang_rot" and produces a table of the mean reaction times for the five angle 
rotations along with their standard errors and 95% CIs.  In the /print 
subcommand I also request "etasq", that is, the partial eta squared, which can 
be used as an effect size measure, though it might make sense to hand-calculate 
Cohen's d for means that are significantly different.

I hope you find this helpful.

=============================================================

I think you have two options: focused contrasts that you hint at below, with 
appropriate Type I error control (another kettle of fish), or specify a form 
for the longitudinal relationship you expect to find for each group and test 
for group x form interactions:

1) group, linear, group x linear

or

2) group, linear, quadratic, group x linear, group x quadratic

You would create the linear and quadratic terms by creating a variable that 
indexes time, e.g., time 1 = 0, time 2 = 1, time 3 = 2, etc., using it as-is 
for the linear term and squaring it for the quadratic term.

Plotting will help a lot too: look into the interactive graphics submenu. 
You can plot your DV by time (if the data are structured "long") and overlay a 
smoothed or linear regression line to check fit.  If you need to transform your 
data from "wide" to "long", check out the "varstocases" command.

About the Type I error control: I would try to keep it simple.  I find the 
list of choices available in SPSS bewildering and the help menus unhelpful.  
Many do things you aren't interested in: controlling Type I error across all 
possible pairwise differences.  Moreover, even if you did pick one that fit 
your analyses perfectly, I think there's a big gap between the more powerful 
techniques (e.g., simulation-based methods) and what is easily explainable to 
a reader.  If I only had a few comparisons I wanted to make, I might just use 
a Bonferroni correction for those few, realizing that I was being more 
conservative than I would have liked (I may have missed a true difference).

If you want to do a little delving into some more powerful methods to control 
Type I error, check into the "False Discovery Rate": Benjamini & Hochberg 
(1995).  It's in the Journal of the Royal Statistical Society (Series B) and 
so was hard for me to root out, but here is an overview of several methods 
that might be helpful:

http://nitro.biosci.arizona.edu/workshops/Aarhus2006/pdfs/Multiple.pdf

Either the FDR or the Bonferroni correction will take some hand calculation on 
your part, since you will be using multiple instances of SPSS's t-test 
procedure and you'll have to keep track of the p-values.
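The Benjamini-Hochberg step-up procedure is in fact simple enough to hand-compute once you have the p-values collected; a sketch (Python, with made-up p-values):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini & Hochberg (1995) step-up procedure: returns the
    indices of the tests declared significant while controlling
    the false discovery rate at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= (rank / m) * q:
            k = rank
    return sorted(order[:k])  # reject all tests up through rank k

# Hypothetical p-values from five hand-run comparisons
pvals = [0.001, 0.020, 0.049, 0.008, 0.300]
print(benjamini_hochberg(pvals))  # -> [0, 1, 3]
```

Note how the third-smallest p (.020) survives here even though a Bonferroni criterion of .05/5 = .01 would have rejected it; that is the power advantage being alluded to.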

Another approach is to calculate confidence intervals for means (or differences 
between adjacent means) at each time point and plot them. That way you avoid 
the whole NHST problem altogether.

Effect sizes can be Cohen's d for independent means between groups, or even 
within groups.

In general, things are more complicated now because many people find that 
traditional repeated measures ANOVA isn't well equipped to handle the 
dependency among observations.  In the last 30 years considerable effort has 
been expended to create adjustments to the df in a repeated measures ANOVA, 
and people will argue over whether those are too conservative or too liberal.  
More recently, I've read that people are using random effects or hierarchical 
modeling approaches to model the function of time, rather than simply testing 
mean differences.  To some degree, things are in flux.

A great no-nonsense approach to some of these issues that I rely on is 
David Howell's intermediate statistics text:

Howell DC (2002). Statistical Methods for Psychology. 5th ed. Duxbury.

If you're interested in the longitudinal approach, I can't recommend highly 
enough the Singer and Willet text:

Singer JD & Willett JB. (2003). Applied Longitudinal Data Analysis: Modeling 
Change and Event Occurrence. Oxford University Press.

I'd be interested in other ideas you get, if you have time to summarize them. 
Other people on the list might be interested too.

Hope this helps,

=============================================================

1.  From this I infer that what you want to do is compare the two groups 
(between subjects) at each of the five levels of the repeated factor (if you 
wanted to test the repeated factor at each of the two levels of the between 
factor, SPSS should have complied).  In my limited experience with mixed 
designs like this, homogeneity of variance is iffy, so I am inclined to use 
individual rather than pooled error terms. This is easily accomplished by 
asking SPSS to do a good old-fashioned t test at each level of the repeated 
factor.  Suppose that R1, R2, R3, R4, and R5 are the five variables coding the 
repeated effect and G is the grouping variable.  Compare Means, 
Independent-Samples T Test, G as the grouping variable, R1 to R5 as the test 
variables.  OK.  Worried that you might burn in hell if you allow familywise 
error to exceed .05?  Just use a Bonferroni-adjusted criterion of .01, but be 
aware that Satan smiles every time we make a Type II error.  Want an F instead 
of a t?  As Mike suggested, just square the t.  The p will be the same.

 2.  If you have only three groups, use Fisher's procedure.  As Mike pointed 
out, it is more powerful.  What he did not point out is that it does cap alpha 
familywise at the nominal level, so there is no good reason to use a more 
conservative procedure, unless you just really want to make Satan smile again.  
More than three groups?  Use the REGWQ, which will hold the familywise error at 
no more than the nominal level and is more powerful than the Tukey.  For 
special cases there may be better choices.

 3.  Some say it does not matter whether your comparisons are planned or not; 
others say it does.  If you belong to the latter camp, you can just tell 
yourself that you planned to make every possible comparison among means and 
thus you don't have to worry about familywise error.  :-)

 4.  If, as I suspect, you are simply comparing two means (five times), I can 
provide an SPSS script that will compute the value of g (an estimate of 
Cohen's d)
and put a confidence interval on it.  "Percentage of variance explained" 
statistics are commonly misinterpreted, so I avoid them if I can.  Deciding 
between eta-squared (or the similar omega-squared) and partial eta-squared can 
be a challenge -- can you or can you not justify removing from the total 
variance the variance accounted for by the other factor(s)?  With partial 
eta-squared in a factorial design you can end up accounting for over 100% of 
the variance.
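The g statistic mentioned in point 4 (an estimate of Cohen's d with a small-sample bias correction) is easy enough to compute by hand; a sketch in Python with invented group scores (the confidence interval part requires the noncentral t distribution and is omitted here):

```python
from math import sqrt
from statistics import mean, variance

def hedges_g(g1, g2):
    """Bias-corrected standardized mean difference (g, estimating
    Cohen's d) for two independent groups."""
    n1, n2 = len(g1), len(g2)
    # Pooled standard deviation
    sp = sqrt(((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2))
              / (n1 + n2 - 2))
    d = (mean(g1) - mean(g2)) / sp
    # Small-sample correction, approx. 1 - 3 / (4*df - 1)
    return d * (1 - 3 / (4 * (n1 + n2) - 9))

# Hypothetical scores for the two groups at one level of the
# repeated factor
print(round(hedges_g([5, 3, 4], [1, 0, 1]), 2))  # -> 3.27
```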

 5.  To make Satan smile again.  You would probably not have much difficulty 
convincing me that the omnibus test is silly and that a set of focused 
contrasts that address your research questions is the better way to go.

=============================================================
Annette Kujawski Taylor, Ph.D.
Professor of Psychology
University of San Diego
5998 Alcala Park
San Diego, CA 92110
619-260-4006
[EMAIL PROTECTED]
