One thing you might consider when working with large data frames: instead of partitioning the data frame into smaller ones, create a list of row indices and use that list to access each subset. This works especially well when using lapply() to work through many segments of a data frame:

> y
      suid month esr
1  1074034    12   6
2  1074034     1   2
3  1074034     2   2
4  1074034     3   2
5  1074034    12   1
6  1074034     1   1
7  1074034     2   1
8  1074034     3   1
9  1074034    12   2
10 1074034     1   2
11 1074034     2   2
12 1074034     3   2
13 1074034    12   9
14 1074034     1   9
15 1074034     2   9
16 1074034     3   9
17 1123003    12   2
18 1123003     1   2
19 1123003     2   2
20 1123003     3   2
> y.ind <- split(seq(nrow(y)), y$month)
> y.ind
$`1`
[1]  2  6 10 14 18

$`2`
[1]  3  7 11 15 19

$`3`
[1]  4  8 12 16 20

$`12`
[1]  1  5  9 13 17

> # a subset
> y[y.ind[['12']],]
      suid month esr
1  1074034    12   6
5  1074034    12   1
9  1074034    12   2
13 1074034    12   9
17 1123003    12   2
>

On Wed, Jun 24, 2009 at 11:34 PM, Stephan Lindner <lindn...@umich.edu> wrote:

> Dear all,
>
> I have a code where I subset a data frame to match entries within
> levels of a factor (actually, the full script uses three different
> factors to do that). I'm very happy with the precision with which I can
> work with R, but since I loop over factor levels, and the data frame is
> big, the process is slow. So I've been trying to speed up the process
> using by(), but I got stuck at the point where I want to stack back
> the sub-data frames, and I was wondering whether someone could help me
> out.
>
> Here is an example:
>
> <--
>
> y <- data.frame(suid = c(rep(1074034,16),rep(1123003,4)),
>                 month = rep(c(12,1,2,3),5),
>                 esr = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))
>
> by(y,y$month,function(x)return(x))
>
> y$month: 1
>       suid month esr
> 2  1074034     1   2
> 6  1074034     1   1
> 10 1074034     1   2
> 14 1074034     1   9
> 18 1123003     1   2
> ------------------------------------------------------------
> y$month: 2
>       suid month esr
> 3  1074034     2   2
> 7  1074034     2   1
> 11 1074034     2   2
> 15 1074034     2   9
> 19 1123003     2   2
> ------------------------------------------------------------
> y$month: 3
>       suid month esr
> 4  1074034     3   2
> 8  1074034     3   1
> 12 1074034     3   2
> 16 1074034     3   9
> 20 1123003     3   2
> ------------------------------------------------------------
> y$month: 12
>       suid month esr
> 1  1074034    12   6
> 5  1074034    12   1
> 9  1074034    12   2
> 13 1074034    12   9
> 17 1123003    12   2
>
> -->
>
> What I would like to do is stack these four data frames back into one
> data frame, which in this simple example would just be y. I tried
> unlist(), unclass() and rbind(), but none of them would work.
>
> Thanks a lot,
>
> Stephan
>
> --
> -----------------------
> Stephan Lindner
> University of Michigan
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
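
For completeness, here is a minimal sketch of how the index list can be combined with lapply() and the pieces then stacked back into one data frame with do.call(rbind, ...). The names pieces, stacked and esr.mean are only illustrative, and the per-segment step (adding a mean of esr) is a placeholder for whatever the real script does within each factor level.

```r
# Rebuild the example data frame from the original post.
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))

# Row indices for each level of month.
y.ind <- split(seq_len(nrow(y)), y$month)

# Work on each segment through its indices. The per-segment step
# (adding the segment's mean esr) is a placeholder, not the real task.
pieces <- lapply(y.ind, function(i) {
    seg <- y[i, ]
    seg$esr.mean <- mean(seg$esr)
    seg
})

# Stack the list of data frames into one data frame.
stacked <- do.call(rbind, pieces)

# rbind() returns the rows grouped by month; unlist(y.ind) holds the
# original row number of each stacked row, so ordering on it restores
# the original row order.
stacked <- stacked[order(unlist(y.ind)), ]
rownames(stacked) <- NULL
```

The same do.call(rbind, ...) call should also work on the result of by(y, y$month, function(x) x), since by() returns a list of data frames. Calling rbind() directly on that list does not stack them, because it then sees a single list argument rather than one argument per data frame.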