I don't know what the goal of the analysis is, but I have a suspicion that the `gbm' package might be a more fruitful way...
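Something along these lines might be a starting point (an untested sketch: `seqdat', `fitness', and the tuning values are placeholders for whatever the real data frame, response column, and settings are, and I'm assuming fitness is continuous so a gaussian loss is appropriate). Boosted trees avoid building the huge dummy-variable model matrix that lm()/aov() need:

    library(gbm)

    ## fitness modelled on all 405 position factors
    fit <- gbm(fitness ~ .,
               data              = seqdat,
               distribution      = "gaussian",
               n.trees           = 2000,   # boosting iterations
               interaction.depth = 2,      # allows 2-way interactions between positions
               shrinkage         = 0.01,   # learning rate
               cv.folds          = 5)      # cross-validation to choose n.trees

    best <- gbm.perf(fit, method = "cv")   # iteration with lowest CV error
    summary(fit, n.trees = best)           # relative influence of each position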
Cheers,
Andy

From: Lucy Crooks
>
> Thanks for your reply.
>
> Thanks for the info on aov() - I hadn't been able to tell which to use
> from the help pages. There are no random effects, so I will switch to
> lm().
>
> The data are amino acid sequences, with the factor being position and
> the level being which amino acid is present. There are indeed an
> average of around 8 levels per position (from 2 to 20). I don't think
> I can collapse the levels, at least to start with, as I don't know in
> advance which affect fitness (the y variable).
>
> From what you say, R should be able to do the smaller analysis, so I
> have increased the RAM and will try this again.
>
> Lucy Crooks
>
> On Feb 1, 2006, at 3:45 PM, Peter Dalgaard wrote:
>
> > You do not want to use aov() on unbalanced data, and especially not
> > on large data sets if random effects are involved. Rather, you need
> > to look at lmer(), or just lm() if no random effects are present.
> >
> > However, even so, if you really have 29025 parameters to estimate, I
> > think you're out of luck. 8 billion (US) elements is 64G, and R is
> > not able to handle objects of that size - the limit is that the size
> > must fit in a 32-bit integer (about 2 billion elements).
> >
> > A quick calculation suggests that your factors have around 8 levels
> > each. Is that really necessary, or can you perhaps collapse some
> > levels?
> >
> > --
> >    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> >   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> >  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> > ~~~~~~~~~~ - ([EMAIL PROTECTED])                      FAX: (+45) 35327907
> >
> > Lucy Crooks <[EMAIL PROTECTED]> writes:
> >
> >> I want to do an unbalanced anova on 272,992 observations with 405
> >> factors, including 2-way interactions between 1 of these factors
> >> and the other 404. After fitting only 11 factors and their
> >> interactions I get error messages like:
> >>
> >> Error: cannot allocate vector of size 1433066 Kb
> >> R(365,0xa000ed68) malloc: *** vm_allocate(size=1467461632) failed (error code=3)
> >> R(365,0xa000ed68) malloc: *** error: can't allocate region
> >> R(365,0xa000ed68) malloc: *** set a breakpoint in szone_error to debug
> >>
> >> I think that the anova involves a matrix of 272,992 rows by 29025
> >> columns (using dummy variables) = 7,900 million elements. I realise
> >> this is a lot! Could I solve this if I had more RAM, or is it just
> >> too big?
> >>
> >> Another possibility is to do 16 separate analyses on 17,062
> >> observations with 404 factors (although statistically I think the
> >> first approach is preferable). I get similar error messages then:
> >>
> >> Error: cannot allocate vector of size 175685 Kb
> >> R(365,0xa000ed68) malloc: *** vm_allocate(size=179904512) failed (error code=3)
> >>
> >> I think this analysis requires a 31 million element matrix.
> >>
> >> I am using R version 2.2.1 on a Mac G5 with 1 GB RAM running OS X
> >> 10.4.4. Can somebody tell me what the limitations of my machine (or
> >> R) are likely to be? Whether this smaller analysis is feasible, and
> >> if so, how much more memory I might require?
> >>
> >> The data is in R in a data frame of 272,992 rows by 406 columns. I
> >> would really appreciate any helpful input.
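For reference, the sizes discussed above can be checked with a bit of arithmetic in R; only figures quoted in the thread are used here:

    n <- 272992            # observations
    p <- 29025             # dummy-variable columns in the full model matrix
    n * p                  # ~7.9e9 elements, well over the vector length limit
    .Machine$integer.max   # 2147483647 = 2^31 - 1, the maximum length of an R vector
    n * p * 8 / 1e9        # ~63 GB of 8-byte doubles for the model matrix alone

    31e6 * 8 / 1e6         # ~250 MB for the 31-million-element matrix in the split
                           # analysis; R will need some multiple of this while fitting,
                           # so 1 GB of RAM is tight but increasing it should help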
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
