The questions you ask about the interactions in the model not making sense relate, I believe, to a multiple comparisons issue that the stepAIC analysis you did does not adequately address. To understand this, note first that you've got something close to 2^(2^5) possible models: with 5 main effects, there are 2^5 = 32 possible subsets of those variables, and each subset defines a candidate term (the empty set gives the intercept, the singletons the main effects, the larger subsets the interactions). stepAIC tries to select from among models built from all possible subsets of these 32 terms. Just for clarity on this point, suppose we had 2 main effects, A and B, rather than your 5. Then there would be 2^2 = 4 possible terms: 1, A, B, and A:B, and stepAIC would want to select the best model from among the 2^4 = 16 models formed from subsets of those 4 terms. With 5 main effects, the set of possible models balloons to 2^32 = about 4.3e9. Some of them don't make sense by themselves, e.g., a model with A:B but without A and B; however, I don't think that reduces the 4.3e9 by more than an order of magnitude or so -- and maybe only a few percent. If we ignore this reduction, the Bonferroni correction suggests multiplying all your p-values by 4.3e9. If any are still less than, say, 0.05, then you could believe those; otherwise, I'd be skeptical. A better way of handling this, I believe, is Bayesian Model Averaging. The R package BMA is supposed to address that, but I don't know if it will help you.
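If you want to experiment with that, here is a minimal, untested sketch. I am assuming your data frame is lafrance.NoNA with response G1L1WR, as in the model call quoted below, and that the five predictors are numeric:

     library(BMA)

     ## Predictors and response; names taken from the model call in your message.
     x <- lafrance.NoNA[, c("NR", "NS", "PA", "KINDERWR", "WM")]
     y <- lafrance.NoNA$G1L1WR

     ## bicreg() averages over subsets of the columns of x, weighting each
     ## candidate model by its approximate posterior probability (via BIC).
     fit.bma <- bicreg(x, y)
     summary(fit.bma)        # inclusion probabilities, averaged coefficients
     imageplot.bma(fit.bma)  # which variables appear in the best models

Note that this sketch averages over the main effects only; to average over interaction terms you would have to add the product columns to x yourself.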
Other questions: What are your 5 explanatory variables, NR, NS, PA, KINDERWR, and WM? In particular, are they numbers representing at least an ordinal scale, or are they categories? If numbers, you have possibilities for parabolic (quadratic) terms that you haven't even considered. If categories, how many levels does each have? If some of them have large numbers of levels, you may want to treat those factors as random effects; in that case, you need something that does 'mixed-effects' modeling, and for that people use the 'nlme' or 'lme4' packages. (Sketches of both ideas follow.) Have you tried to find a statistician with whom you could consult or even possibly collaborate on this?
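For concreteness, two more minimal, untested sketches. The variable roles here are assumptions on my part -- I'm supposing PA is numeric and pretending KINDERWR is a many-level factor -- since I don't know how your variables are actually coded:

     ## If, say, PA is numeric, a quadratic ("parabolic") term can be
     ## added with poly():
     fit.quad <- lm(G1L1WR ~ poly(PA, 2) + NR + NS + KINDERWR + WM,
                    data = lafrance.NoNA)
     summary(fit.quad)

     ## If, say, KINDERWR were a factor with many levels, it could enter
     ## as a random intercept via lmer() from the lme4 package:
     library(lme4)
     fit.mixed <- lmer(G1L1WR ~ PA + NR + NS + WM + (1 | KINDERWR),
                       data = lafrance.NoNA)
     summary(fit.mixed)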
Hope this helps.
Spencer Graves

Jenifer Larson-Hall wrote:
> I don’t know if this question properly belongs on this list, but I’ll ask it
> here because I’ve been using R to run linear regression models, and it is
> only in using R (after switching from SPSS) that I discovered the process of
> fitting a linear model. However, after reading Crawley (2002), Fox (2002),
> Verzani (2004), and Dalgaard (2002), and of course searching the R-help
> archives, I cannot find an answer to my question.
>
> I have 5 explanatory variables (NR, NS, PA, KINDERWR, WM) and one response
> variable (G1L1WR). A simple main-effects model finds that only PA is
> statistically significant, and an anova comparison between the 5-variable
> main-effects model and the 1-variable main-effects model finds no difference
> between them. So it is possible to simplify the model to just G1L1WR ~ PA.
> This leaves me with a residual standard error of 0.3026 on 35 degrees of
> freedom and an adjusted R2 of 0.552.
>
> I also decided, following Crawley’s (2002) advice, to create a maximal
> model, G1L1WR ~ NR*NS*PA*KINDERWR*WM. This full model is not a good fit, but
> a stepAIC through the model revealed the model with the best fit:
>
> maximal.fit = lm(formula = G1L1WR ~ NR + KINDERWR + NS + WM + PA +
>     NR:KINDERWR + NR:NS + KINDERWR:NS + NR:WM + KINDERWR:WM + NS:WM +
>     NR:PA + KINDERWR:PA + NS:PA + WM:PA + NR:KINDERWR:NS +
>     NR:KINDERWR:WM + NR:NS:WM + KINDERWR:NS:WM + NR:NS:PA +
>     KINDERWR:NS:PA + KINDERWR:WM:PA + NR:KINDERWR:NS:WM,
>     data = lafrance.NoNA)
>
> All of the terms of this model have statistically significant t-tests, the
> residual standard error has gone down to 0.2102, and the adjusted R2 has
> increased to 0.7839. An anova shows a clear difference between the
> simplified model and the maximal-fit model. My question is, should I really
> pick the maximal fit over the simple model when it is really so much harder
> to understand? I guess there’s really no easy answer to that, but if that’s
> so, then my question is: would there be anything wrong with me saying that
> sometimes you might value parsimony and ease of understanding over best fit?
> Because I don’t really know what the maximal-fit model buys you. It seems
> unintelligible to me. All of the terms are involved in interactions to some
> extent, but there are 4-way interactions and 3-way interactions and 2-way
> interactions, and I’m not sure even how to understand it. A nice tree model
> showed that at higher levels of PA, KINDERWR and NS affected scores. That I
> can understand, but that is not reflected in this model.
>
> An auxiliary question, probably easier to answer, is how could I do
> hierarchical linear regression? The authors knew that PA would be the
> largest contributor to the response variable because of previous research,
> and their research question was whether PA would contribute anything AFTER
> the other 4 variables had already eaten their piece of the response-variable
> pie. I know how to do a hierarchical regression in SPSS, and I want to show
> in parallel how to do this in R. I did search the R-help archives and didn’t
> find anything that would just plain tell me how to do hierarchical linear
> regression.
>
> Thanks in advance for any help.
>
> Dr. Jenifer Larson-Hall
> Assistant Professor of Linguistics
> University of North Texas
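P.S. On your hierarchical (sequential) regression question: the usual approach in R, I believe, is to fit the nested models in the order you want the blocks entered and compare them with anova(). A minimal sketch, assuming the variable and data-frame names from your message:

     ## Block 1: the four control variables; Block 2: add PA.
     ## anova() then gives the F test for the change in fit due to PA
     ## after the other 4 variables have taken their share.
     fit.block1 <- lm(G1L1WR ~ NR + NS + KINDERWR + WM, data = lafrance.NoNA)
     fit.block2 <- update(fit.block1, . ~ . + PA)
     anova(fit.block1, fit.block2)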