On Friday 17 October 2003 03:33, Alexander Sirotkin [at Yahoo] wrote:
> > > > > One more (hopefully last one): I've been very
> > > > > surprised when I tried to fit a model (using aov())
> > > > > for a sample of size 200 and 10 variables and
> > > > > their interactions.
> > > >
> > > > That doesn't really say much. How many of these
> > > > variables are factors? How many levels do they have?
> > > > And what is the order of the interaction? (Note that
> > > > for 10 numeric variables, if you allow all
> > > > interactions, then there will be about a thousand
> > > > terms in your model: 2^10 - 1 = 1023, plus the
> > > > intercept. This increases for factors.)
> > > >
> > > > In other words, how big is your model matrix?
> > >
> > > I see...
> > >
> > > Unfortunately, model.matrix() ran out of memory :)
> > > I have 10 variables, 6 of which are factors, 2 of which
> > > have quite a lot of levels (about 40). And I would
> > > like to allow all interactions.
> > >
> > > I understand your point about categorical variables,
> > > but still - this does not seem like too much data
> > > to me.
> >
> > That's one way to look at it. You don't have enough
> > data for the model you are trying to fit. The usual
> > approach under these circumstances is to try
> > 'simpler' models.
> >
> > Please try to understand what you are trying to do
> > (in this case by reading an introductory linear model
> > text) before blindly applying a methodology.
> >
> > Deepayan
>
> I did study ANOVA and I do have enough observations.
> 200 was only a random sample of more than 5000, which I
> think should be enough. However, I'm afraid to even
> think about the amount of RAM I will need with R to fit
> a model for this data.

Let's see. You have 10 variables, 6 of which are factors, 2 of which
have at least 40 levels, and you want all interactions. Let's
conservatively estimate that each of the other four factors has only
2 levels.

    > x1 = gl(40, 1, 1)
    > x2 = gl(40, 1, 1)
    > x3 = gl(2, 1, 1)
    > x4 = gl(2, 1, 1)
    > x5 = gl(2, 1, 1)
    > x6 = gl(2, 1, 1)
    > dim(model.matrix(~ x1 * x2 * x3 * x4 * x5 * x6))
    [1]     1 25600

This was for one data point; adding more would only increase the
number of rows, and the number of columns would stay the same. And of
course, this covers only the six factors and their interactions (up
to 6-way), with the smallest level counts consistent with what you
have told us about your model. Your actual model matrix, with all 10
variables, will have many, many more columns.

I hope you realize that the number of columns in the model matrix is
the number of parameters you are trying to estimate. If your sample
size is less than this number (and 5000 is way less), then there will
be infinitely many solutions to this problem, each of which will fit
your data perfectly. Do you really want such an answer? Assuming that
you find one, what are you going to do with it?

I have no idea what made you choose such a high-order model, but as
Andy has said, you really should try to figure out what exactly your
goals are before proceeding. If you believe that your data really
cannot be modeled reasonably by anything simpler, you probably should
not use a linear model at all.

Hope that helps,

Deepayan

______________________________________________
R-help mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
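
For what it's worth, the column count above can be reproduced without
building the model matrix at all. The following is only a rough sketch
under stated assumptions: the four non-factor variables are taken to be
numeric, R's default treatment contrasts are used, and the names
(levels_per_factor, n_numeric, n_obs and so on) are made up for
illustration. Under those assumptions each factor multiplies the column
count by its number of levels, and each numeric variable doubles it.

```r
# Assumed level counts for the six factors: two with 40 levels and,
# conservatively, two levels for each of the other four.
levels_per_factor <- c(40, 40, 2, 2, 2, 2)
n_numeric <- 4     # assumption: the remaining four variables are numeric
n_obs <- 5000      # sample size mentioned in the thread

# Full-interaction model with an intercept and treatment contrasts:
# columns = product of the level counts, times 2 per numeric variable.
p_factors <- prod(levels_per_factor)    # 25600, matches dim() above
p_all <- p_factors * 2^n_numeric        # 409600

# Size of a dense double-precision model matrix, in GiB.
gib <- n_obs * p_all * 8 / 2^30         # about 15.3

# Sanity check of the counting rule on a case small enough to build.
f1 <- gl(3, 1, 6)
f2 <- gl(2, 3, 6)
x <- as.numeric(1:6)
mm <- model.matrix(~ f1 * f2 * x)
stopifnot(ncol(mm) == 3 * 2 * 2)        # 12 columns

cat("columns:", p_all, " observations:", n_obs,
    " GiB:", round(gib, 1), "\n")
```

With roughly 410 thousand parameters against 5000 observations, the
design is rank deficient whichever observations are drawn, so more RAM
would not rescue the fit.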
