I promised I would summarize and send out the responses to my SPSS questions. I have included most of them in their entirety. I have omitted a couple whose advice is included in the longer posts. I have removed all names because I wasn't always sure everyone remembered that I had indicated in the original email that I would do this. I start with my original query and then responses follow.
Let me thank all tipsters and PsychTeachers for these wonderful replies. You are truly a community of generous souls. All of these nice replies came during a holiday period for most people, and spring break for many others. Lastly, for anyone struggling with SPSS, there are several links and print sources below that should help with self-teaching what was not available to be taught when some of us old-timers were in grad school.

=========================================================

Original Query:

1. First of all, I am running a mixed ANOVA with one repeated measures variable with 5 levels and one between-subjects variable with 2 levels. I wanted to run planned comparisons but SPSS 12 won't let me. It tells me that I need at least 3 groups and that I don't have three groups. Can someone explain this to me and tell me how to run my analysis?

2. Second, SPSS has several (about 12) different planned comparisons I can run. I know that some are more conservative and some less conservative, but with so very many, how does one decide which ones to run?

3. Third, for planned comparisons, can't I just run t-tests for the comparisons of interest?

4. Then how do I get effect size analyses? Effect size analyses in SPSS seem to be tied to post-hoc comparisons. Is it sufficient to say that my confidence intervals don't overlap?

5. Why in the world would I want to do an omnibus post-hoc test when I have a hypothesis driving planned comparisons, and how does all this work out in SPSS?

=============================================================

Responses:

You may find the following helpful: http://www.uvm.edu/~dhowell/StatPages/More_Stuff/RepMeasMultComp/RepMeasMultComp.html

[My editorial note: I did find this link to be very good, but note that it doesn't fit on one line, so I had to paste in the last part to get there.] It is an interesting read even if not all of it is directly relevant to your problem (much of it is).
If your planned comparisons are on the between-groups variable, you don't have three groups (levels). If the F is significant, there is a significant main effect for that factor and a significant difference between those levels, and there is no need for a post hoc test. If you do dependent-samples t-tests, the effect size is just (the larger mean minus the smaller mean) divided by the standard deviation of the difference scores. Confidence intervals are not the same as effect sizes. They are more closely related to p-values because they are affected by the sample size (which effect sizes are not). Almost all of the rest of your questions are best answered by Howell at the link above.

=============================================================

I think the problem is that SPSS only does these tests on main effects. It will probably do them on your 5-level factor but won't do the other one (comparing two groups being the same as doing the F test). I've never been able to get SPSS to do pair-wise tests on the means for an interaction. I always end up doing these by hand (not that hard). Another solution would be to test the 5x2 interaction as if it were a 1-way ANOVA with 10 levels. But this might not be possible in a mixed design. Winer and (I think) Kirk have good discussions of the merits of the various post-hoc tests. They vary in power and in susceptibility to experiment-wise Type I errors. Some require specific conditions (e.g., comparing n means to a single control). Neither text discusses all of the tests available in SPSS . . . you'd have to hunt down the references for these. I recommend that you consider the tests commonly used by researchers in your target readership. That is where your reviewers will come from. If you decide to use a test that isn't commonly used by the research community, you'll have to explain your choice (either up front, or as a response to a reviewer's question if you use a test he/she isn't familiar with).
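As a rough illustration of the dependent-samples effect size described above (the mean difference divided by the standard deviation of the difference scores), here is a small sketch in Python rather than SPSS; the data values are invented for the example.

```python
# Sketch: Cohen's d for dependent (paired) samples, computed as
# (mean difference) / (standard deviation of the difference scores).
# The pre/post values below are made up for illustration only.
from statistics import mean, stdev

def paired_cohens_d(pre, post):
    """Effect size for a dependent-samples comparison."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / stdev(diffs)  # sample SD (n - 1 denominator)

pre  = [10, 12, 9, 14, 11, 13]
post = [13, 14, 12, 17, 13, 16]
d = paired_cohens_d(pre, post)
```

The same numbers drop straight out of SPSS's paired-samples output (mean and SD of the differences), so this is easy to do by hand from a printout as well.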
Question 3 describes the Bonferroni procedure (which you will find described as a strategy for post-hoc testing in either Winer or Kirk or both). The post hocs sometimes use values derived from the omnibus F, and it is probably easier to write the software in this lockstep way. Are you sure you'd rather go back to the days when we cranked these things out on Monroe calculators, and computing a simple one-way ANOVA could eat up the better part of your Saturday . . . and a 5 x 2 design might take a week? There are always tradeoffs. I'd much rather fight with SPSS for an hour or so than sweat out punching in all those numbers and worrying about my keying skills!

=============================================================

I run a lot of repeated measures ANOVAs in SPSS and am pretty familiar with the program (and with repeated measures in general, because nearly every study I do has a combination of within- and between-subjects variables, because I study age differences). As a caveat, though, I use a different (more recent) version of SPSS than yours, so my answers may not be exact. First of all, if you are intending to run a 2x5 repeated measures ANOVA, I assume the 5-level factor is a within-subjects variable - is that so? If it is, then your analysis should run just fine, but you may be choosing the wrong option under the analysis menu -- go to Analyze, then General Linear Model, then Repeated Measures. Define a variable with 5 levels (i.e., give it a name and number of levels, then click Define). Then put the 5 variables in, and be sure to put the between-subjects (2-level) variable in under the corresponding menu box (below the within-subjects one). This should work just fine. The choice of which planned comparisons to run is, in my experience, mostly arbitrary or up to the experimenter. Of course, if you choose a more conservative test and still get a reliable effect, you will please even the most grumpy reviewer.
However, you can deal with some of the issues you asked about by going to the options under Repeated Measures. So, once you've entered everything as I described above, click the "Options" button on that screen and you'll see some options for displaying factors and factor interactions. Put those all over into the window for "Display Means," then check the box that says "Compare main effects" -- you can use a confidence interval adjustment right there to deal with those planned comparisons, for example, Bonferroni, which is essentially what you were describing in terms of dividing alpha and such. You can also check the box for "Estimates of effect size" to get the effect sizes for the main effects and any interaction in your 2x5 design. I'm not sure if I am explaining the procedure all that well (probably not), but if you go to the Repeated Measures menu and check out the options, I think you will find the easiest way to get at least some of the things you seem to be after. Of course, interpreting the output can be tricky as well - in your case, be sure to check out the within-subjects contrasts (not just the within-subjects effects), because with a 5-level variable you may have a quadratic (or even more complicated, like order 4) interaction, rather than something simply linear. I'd be happy to help interpret output if you sent me the .spo file. Good luck.

=============================================================

From Garson's amazing course http://www2.chass.ncsu.edu/garson/pa765/anova.htm

"In mixed designs, sphericity is almost always violated and therefore epsilon adjustments to degrees of freedom are routine prior to computing F-test significance levels. Split-plot designs are a form of mixed design, originating in agricultural research, where seeds were assigned to different plots of land, each receiving a different treatment.
In split plot designs, subjects (ex., seeds) are randomly assigned to each level (ex., plots) of the between-groups factor (soil types), prior to receiving the within-subjects repeated factor treatments (ex., applications of different types of fertilizer). Split-plot repeated measures ANOVA can be used when the same subjects are measured more than once. In this design, the between-subjects factor is the group (treatment or control) and the repeated measure is, for example, the test scores for two trials. The resulting ANOVA table will include a main treatment effect (reflecting being in the control or treatment group) and a group-by-trials interaction effect (reflecting treatment effect on posttest scores, taking pretest scores into account). This partitioning of the treatment effect may be more confusing than analysis of difference scores, which gives equivalent results and therefore is sometimes recommended.

In a typical split-plot repeated measures design, Subjects will be measured on some Score over a number of Trials. Subjects will also be split by some Group variable. In SPSS, Analyze, General Linear Model, Univariate; enter Score as the dependent; enter Trial and Group as fixed factors; enter Subject as a random factor; Press the Model button and choose Custom, asking for the Main effects for Group and Trial, and the interaction effect of Trial*Group; then click the Paste button and modify the /DESIGN statement to also include Subject(Group) to get the Subject-within-Group effect; then select Run All in the syntax window to execute."

--------------------------------------

Much more detail on post-hocs can be found in Garson's course. You certainly don't need an omnibus test to justify a planned comparison - but the comparison would need to use the appropriate error SS and df that include all of the data. In SPSS, programming the contrast in the syntax for mixed designs may not be possible or feasible.
Alternately, there is the 'Data' "Split File" option in SPSS to conduct analyses separately for different groups. You could split by your between-groups variable, and then run a GLM Repeated Measures. Choose a Contrast (polynomials may fit your data best), or the Repeated post-hoc option could be used to compare measurement phases. I believe that reducing alpha in the way you described would probably satisfy most reviewers (including me). An easier approach would be to calculate difference scores - if you can find references for the equivalence of that approach. As a reviewer, I wouldn't let the difference-score analysis fly without a reference or two in support (preferably Monte Carlo simulations). You would most likely also lose power by computing difference scores, but you may get very similar results, especially if you only care about one or two changes in the series. Let me know if someone discusses programmed contrasts in syntax for mixed designs - you're the second person to ask me about these situations this week!

=============================================================

1. SPSS has always had a bizarre implementation, IMHO, for the analysis of repeated measures/mixed designs, far inferior to the programs of BMDP (RIP). SAS has had its own peculiarities, but I think they've improved in recent years (I admit to not being a SAS person). You don't mention which procedure you're using in SPSS -- I assume that you're using MANOVA, but I realize that you might be using GLM, though I'm not really sure in which version of SPSS GLM was made available. In any event, it's possible that whatever procedure you're using, SPSS is balking at doing multiple comparisons with a two-level factor, since the F ratio is a direct test of this factor (in this case the F ratio is equivalent to the squared value of the t-test for the two means involved in the main effect).
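That last point -- that for a two-level factor the F is just the squared t -- is easy to verify numerically. Here is a minimal sketch in Python with made-up data (not the poster's analysis), computing both statistics by hand:

```python
# Sketch: with only two groups, the one-way ANOVA F equals the square
# of the independent-samples t, so SPSS refusing multiple comparisons
# on a 2-level factor costs nothing. Data below are invented.
from statistics import mean

def pooled_t(g1, g2):
    """Independent-samples t with a pooled variance estimate."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = mean(g1), mean(g2)
    ss1 = sum((x - m1) ** 2 for x in g1)
    ss2 = sum((x - m2) ** 2 for x in g2)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)           # pooled variance
    return (m1 - m2) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

def oneway_f(g1, g2):
    """One-way ANOVA F for two groups (df_between = 1)."""
    n1, n2 = len(g1), len(g2)
    m1, m2, gm = mean(g1), mean(g2), mean(g1 + g2)
    ss_between = n1 * (m1 - gm) ** 2 + n2 * (m2 - gm) ** 2
    ss_within = (sum((x - m1) ** 2 for x in g1)
                 + sum((x - m2) ** 2 for x in g2))
    return ss_between / (ss_within / (n1 + n2 - 2))

a = [4.0, 5.0, 6.0, 5.5]
b = [6.5, 7.0, 8.0, 7.5]
t = pooled_t(a, b)
F = oneway_f(a, b)
# F equals t squared (up to floating point), and the p-values match.
```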
I suspect, however, that you want to test components of the 2x5 interaction and apply multiple comparisons to the main effects. Is this correct? SPSS may not allow this to be done, though the rationale may not be clear.

2. There are no hard and fast rules for this, but one can use the following criteria: (1) The LSD procedure (i.e., multiple t-tests) is the most powerful multiple comparison procedure, but it also has the highest overall/familywise Type I error rate. If there are statistically significant results, they may be real or Type I errors; nonetheless, you'll find the largest number of significant results with this procedure. (2) The Scheffe procedure, I believe, is still the most conservative procedure; that is, it has the least power, but it will allow one to perform all possible multiple comparisons, that is, pairwise comparisons and combinations of means. If it's significant by Scheffe, it's likely to be significant by all other procedures, but this will provide the fewest significant results. (3) I believe that all other procedures provide different levels of liberalism/conservatism of results, that is, intermediate degrees of power and control of overall Type I error rates. The choice of one procedure over another may be as dependent upon the specific conditions of one's data as upon one's experience/attitude/knowledge of different tests. I have a fondness for Bonferroni-corrected t-tests, but this is mostly motivated by the simplicity of the test and its conceptual basis -- there are other tests which can be more powerful depending upon the number of means being compared.

3. Planned comparisons assume (a) that a subset of comparisons will be made relative to all comparisons and (b) that there is some theoretical/rational basis for choosing certain comparisons. In this case, one just does the ANOVA to get the appropriate error term. Look at the latest edition of Kirk and his chapter on multiple comparisons for guidance.
In earlier editions, Kirk pointed out that many researchers simply used alpha = .05 for each planned comparison, especially if the number of such comparisons was small. It seems to make more sense to use a Bonferroni correction: divide the overall alpha = .05 by the number of comparisons being made and use this for each individual test (though one could allocate a higher per-comparison alpha for more "important" comparisons). If this practice has changed, I'd like to hear about it.

4. I'm not sure I understand what you're saying here. The usual effect size measure provided by SPSS is partial eta squared which, if memory serves, is somewhat equivalent to a semi-partial correlation coefficient squared. Your statement about confidence intervals suggests that you're focusing on something else. Are you talking about standardized differences between means? If so, it might be easiest to calculate these by hand, unless I'm missing something.

5. The simple answer, I think, is that once you know what equations you need to use for the procedure you're doing, you use SPSS to provide you with the components of the test you want to do and do the rest by hand. That way you're assured that the analysis you want done is actually being done (it's not always clear what SPSS is doing or why it is doing it). If one is expert in SPSS programming, especially in the use of its matrix manipulation procedure and in the use of scripts, I imagine that one can make SPSS jump through these hoops. Otherwise, it may make more sense to select a specific test that one can do by hand (or program the equation either in SPSS or another program like Excel) and use the components from the SPSS analysis of variance procedure necessary for the test (e.g., the mean square error from the ANOVA -- ignoring the F-tests, since the planned comparisons imply that one isn't interested in these).

> YUCK! Why can't stats be what they were 30 years ago when
> I was in grad school?

Because we've come a long way since then?
And though programs like SPSS have also progressed, they still don't seem to be able to do certain analyses (e.g., those involving repeated measures) in reasonable ways.

=============================================================

You would surely find the Keppel & Wickens text helpful. That said, given that I provide worked-out examples in SPSS, you may want to look at my notes on the K&W text for my advanced (undergrad) stats course: http://www.skidmore.edu/%7Ehfoley/PS318.htm

Given K&W's advice, which involves always using a different error term in any repeated measures ANOVA (including post hoc analyses), the only way to decompose an interaction involving a repeated factor is to do a series of separate RM analyses. I'm fairly certain that I provide some examples in my notes, but if not, let me know. Moreover, effect size is easily enough computed by hand, once you know which type of effect size measure you want. :-) Again, if you look into K&W, they'll give you some sense of the options, along with the suggested computations. I should have some worked examples in my notes as well. If my notes don't help, let me know, because as I gear up to teach the course again this fall, I should make the necessary additions.

=============================================================

Look in the Options dialogue box rather than the post hoc dialogue. You should be able to move your within-subjects variable name from the list at the left to the box at the right. Then you should be able to choose among two or three options for post hoc comparisons (the default is LSD). I hope that helps. [Short and to the point, and helped with getting the main effects worked out!]

=============================================================

(1) You probably should take a look at the latest edition of Kirk's Experimental Design (I think it's in its 3rd edition).
It may help you review the issues regarding multiple comparisons and planned comparisons (especially whether your comparisons are orthogonal or not and why this might be important). (2) I dislike SPSS's implementation of the repeated measures ANOVA in both MANOVA and GLM. For reasons that are unclear to me, the versions of SPSS that I'm familiar with (v13 & v14) do allow trend analysis for a repeated measures variable, but this makes little sense if your within-subjects variable does not have ordered levels (it makes sense for angle of rotation in a mental rotation experiment, but not if the levels are qualitatively different, such as different types of stimulus presentations in an RT task). One should be able to do some form of t-test or comparison among the means for the within-subjects factor, but from some limited materials I've read from SPSS, I think that SPSS avoids providing these because it claims that the means for the within-subjects factor are not independent (the means for any between-subjects factor would be independent, which I think SPSS assumes allows the standard tests to be done). It may be possible to specify contrasts through the CONTRAST subcommand, but because I've had problems getting this to work properly in MANOVA, I've just avoided using it in general. In any event, doing the ANOVA in GLM should give you the appropriate mean squares to allow you to do comparisons between means. (3) Regarding effect sizes vs. confidence intervals, I understand the concern about editors and reviewers, but given that opinions about these issues are all over the place, there are no hard and fast rules to follow. The best you can do is provide a good justification for using one or the other. I've attached a copy of Jack Cohen's Power Primer, which provides a basis for emphasizing the importance of effect sizes.
The main reason for providing an effect size is that it will be useful to future researchers when they conduct a variation of your research and need an effect size measure for a power analysis. If previous research doesn't provide effect size estimates, the new researcher is left to guess what a reasonable effect size might be for the power analysis for the proposed research. Another reason is that quantitative reviews of research (i.e., meta-analyses) require effect size estimates. You can make the meta-analyst's job easier by calculating and providing the effect size rather than leaving it to the analyst to calculate it him/herself. Finally, confidence intervals are just the flip side of significance tests, so the preference for one is more a matter of taste (ideology?) than anything else. However, in GLM you can request "estimated marginal means," which will provide you with the 95% confidence intervals for each mean that you specify. Below is the SPSS code for a GLM analysis of the "same" response condition for the mental rotation experiment (degrees 0-180), requesting the marginal mean RTs for each angle of rotation:

GLM ang0time,ang45time,ang90time,ang135time,ang180time
  /WSFACTOR = ang_rot(5)
  /METHOD = SSTYPE(3)
  /CRITERIA = ALPHA(.05)
  /PRINT = descriptive,etasq
  /WSDESIGN = ang_rot
  /EMMEANS = tables(ang_rot).

The last subcommand requests the marginal means for the within-subjects factor "ang_rot" and produces a table of the mean reaction times for the five angles of rotation, along with their standard errors and 95% CIs. In the /PRINT subcommand I also request "etasq," that is, the partial eta squared, which can be used as an effect size measure, though it might make sense to hand-calculate Cohen's d for means that are significantly different. I hope you find this helpful.
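If you would rather hand-calculate the partial eta squared mentioned above than trust the "etasq" output, the usual textbook formula is just SS_effect / (SS_effect + SS_error) from the ANOVA table. A tiny Python sketch (the sums of squares below are invented numbers, not from any real output):

```python
# Sketch: partial eta squared as hand-computed from an ANOVA table:
# SS_effect / (SS_effect + SS_error). The SS values are invented.
def partial_eta_sq(ss_effect, ss_error):
    return ss_effect / (ss_effect + ss_error)

# e.g., an effect with SS = 30 against an error SS of 120:
eta_p = partial_eta_sq(ss_effect=30.0, ss_error=120.0)  # 0.2
```

Note that each effect in the design gets its own error term in the denominator, which is why partial eta squared values across effects need not sum to anything sensible (a point one response below returns to).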
I think you have two options: focused contrasts that you hint at below, with appropriate Type I error control (another kettle of fish), or specifying a form for the longitudinal relationship you expect to find for each group and testing for group x form interactions:

1) group, linear, group x linear

or

2) group, linear, quadratic, group x linear, group x quadratic

You would create the linear and quadratic terms by creating a variable that indexes time, e.g., time 1 = 0, time 2 = 1, time 3 = 2, etc., using it as is for the linear term and squaring it for the quadratic term. Plotting will help a lot too: look into the interactive graphics submenu. You can plot your DV by time (if the data are structured "long") and overlay a smoothed or linear regression line to check fit. If you need to transform your data from "wide" to "long", check out the "varstocases" command. About the Type I error control: I would try to keep it simple. I find the list of choices available in SPSS bewildering and the help menus unhelpful. Many do things you aren't interested in: controlling Type I error across all possible pairwise differences. Moreover, even if you did pick one that fit your analyses perfectly, I think there's a big gap between what are more powerful techniques to use (e.g., simulation-based methods) and what is easily explainable to a reader. If I only had a few comparisons I wanted to make, I might just use a Bonferroni correction for the few that I did, and accept that I was more conservative than I would have liked (and may have missed a true difference). If you want to do a little delving into some more powerful methods to control Type I error, check into the "False Discovery Rate", Benjamini & Hochberg (1995).
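As a rough sketch of the two corrections just mentioned -- the simple Bonferroni adjustment and the Benjamini-Hochberg (1995) step-up FDR procedure -- here is how the bookkeeping works in Python, with invented p-values standing in for the ones you would collect from SPSS's t-test runs:

```python
# Sketch: Bonferroni vs. Benjamini-Hochberg FDR over a set of p-values.
# The p-values below are invented for illustration.
def bonferroni_reject(pvals, alpha=0.05):
    """Reject H0 only where p <= alpha / m."""
    return [p <= alpha / len(pvals) for p in pvals]

def bh_fdr_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: sort p-values, find the largest
    rank k with p_(k) <= k * alpha / m, reject the k smallest."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

pvals = [0.001, 0.008, 0.012, 0.040, 0.300]
bonf = bonferroni_reject(pvals)   # per-test criterion: 0.05 / 5 = 0.01
fdr  = bh_fdr_reject(pvals)
```

On these invented p-values, Bonferroni rejects only the two smallest, while the FDR procedure rejects four -- which is the power advantage being alluded to.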
The Benjamini & Hochberg paper is in a supplement of the Journal of the Royal Statistical Society, and so was hard for me to root out, but here is an overview of several methods that might be helpful: http://nitro.biosci.arizona.edu/workshops/Aarhus2006/pdfs/Multiple.pdf

Either the FDR or the Bonferroni correction will take some hand calculation on your part, since you will be using multiple instances of SPSS's t-test procedure and you'll have to keep track of the p-values. Another approach is to calculate confidence intervals for means (or differences between adjacent means) at each time point and plot them. That way you avoid the whole NHST problem altogether. Effect sizes can be Cohen's d for independent means between groups, or even within groups. In general, things are more complicated now because many people find that traditional repeated measures ANOVA isn't well equipped to handle the dependency among observations. In the last 30 years there's been considerable effort expended to create adjustments for the df in a repeated measures ANOVA, which people will argue are either too conservative or too liberal. More recently, I've read of people using random effects or hierarchical modeling approaches to model the function of time, rather than simply testing mean differences. To some degree, things are in flux. A great no-nonsense treatment of some of these issues that I rely on is David Howell's intermediate statistics text: Howell, D. C. (2002). Statistical Methods for Psychology (5th ed.). Duxbury. If you're interested in the longitudinal approach, I can't recommend highly enough the Singer and Willett text: Singer, J. D., & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford University Press. I'd be interested in other ideas you get, if you have time to summarize them. Other people on the list might be interested too. Hope this helps.

=============================================================

1.
From this I infer that what you want to do is compare the two groups (between subjects) at each of the five levels of the repeated factor (if you wanted to test the repeated factor at each of the two levels of the between factor, SPSS should have complied). In my limited experience with mixed designs like this, homogeneity of variance is iffy, so I am inclined to use individual rather than pooled error terms. This is easily accomplished by asking SPSS to do a good old-fashioned t test at each level of the repeated factor. Suppose that R1, R2, R3, R4, and R5 are the five variables coding the repeated effect and G is the grouping variable. Compare Means, Independent-Samples T Test, G as the grouping variable, R1 to R5 as the test variables. OK. Worried that you might burn in hell if you allow familywise error to exceed .05? Just use a Bonferroni-adjusted criterion of .01, but be aware that Satan smiles every time we make a Type II error. Want an F instead of a t? As Mike suggested, just square the t. The p will be the same.

2. If you have only three groups, use Fisher's procedure. As Mike pointed out, it is more powerful. What he did not point out is that it does cap alpha familywise at the nominal level, so there is no good reason to use a more conservative procedure, unless you just really want to make Satan smile again. More than three groups? Use the REGWQ, which will hold the familywise error at no more than the nominal level and is more powerful than the Tukey. For special cases there may be better choices.

3. Some say it does not matter whether your comparisons are planned or not; others say it does. If you belong to the latter camp, you can just tell yourself that you planned to make every possible comparison among means and thus you don't have to worry about familywise error. :-)

4. If, as I suspect, you are simply comparing two means (five times), I can provide an SPSS script that will compute the value of g (an estimate of Cohen's d) and put a confidence interval on it.
"Percentage of variance explained" statistics are commonly misinterpreted, so I avoid them if I can. Deciding between eta-squared (or the similar omega-squared) and partial eta-squared can be a challenge -- can you or can you not justify removing from the total variance the variance accounted for by the other factor(s)? With partial eta-squared in a factorial design you can end up accounting for over 100% of the variance.

5. To make Satan smile again. You would probably not have much difficulty convincing me that the omnibus test is silly and that a set of focused contrasts that address your research questions is the better way to go.

=============================================================

Annette Kujawski Taylor, Ph.D.
Professor of Psychology
University of San Diego
5998 Alcala Park
San Diego, CA 92110
619-260-4006
[EMAIL PROTECTED]
