I've been doing some consulting with students who seem to come to R from SAS. They are usually preoccupied with do loops, and it is tough to persuade them to trust R lists rather than keeping hundreds of named matrices floating around.
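To make the contrast concrete, here is a minimal sketch of the list habit. The names makeMatrix, matList and colTotals are made up for illustration; they do not come from any package or from the script in this message.

```r
## Hypothetical helper: build one 3 x 2 matrix filled with the value i
makeMatrix <- function(i) matrix(i, nrow = 3, ncol = 2)

## Instead of mat1, mat2, ..., mat100 as separate objects,
## keep all of them in a single list
matList <- lapply(1:100, makeMatrix)

## Pull out one element by position
matList[[42]]

## Apply the same operation to every element in one step
colTotals <- lapply(matList, colSums)
```

Once everything lives in one list, there is no need to construct object names with paste() and look them up again inside a loop.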
Often it happens that there is a list with lots of matrices or data frames in it and we need to "stack those together". I thought it would be a simple thing, but it turns out there are several ways to get it done, and in this case, the most "elegant" way using do.call is not the fastest, but it does appear to be the least prone to programmer error.

I have been staring at ?do.call for quite a while and I have to admit that I need some more explanation in order to interpret it. I can't really see why this works:

do.call("rbind", mylist)

but this does not:

sapply(mylist, rbind)

Anyway, here's the self-contained working example that compares the speed of various approaches. If you send yet more ways to do this, I will add them and then post the result to my Working Example collection.

## stackMerge.R
## Paul Johnson <pauljohn at ku.edu>
## 2010-09-02

## rbind is neat, but how to do it to a lot of
## data frames?

## Here is a test case

df1 <- data.frame(x=rnorm(100),y=rnorm(100))
df2 <- data.frame(x=rnorm(100),y=rnorm(100))
df3 <- data.frame(x=rnorm(100),y=rnorm(100))
df4 <- data.frame(x=rnorm(100),y=rnorm(100))

mylist <- list(df1, df2, df3, df4)

## Usually we have done a stupid
## loop to get this done

resultDF <- mylist[[1]]
for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])

## My intuition was that this should work:
## lapply( mylist, rbind )
## but no! It just makes a new list

## This obliterates the columns
## unlist( mylist )

## I got this idea from code in the
## "complete" function in the "mice" package
## It uses brute force to allocate a big matrix of 0's and
## then it places the individual data frames into that matrix.

m <- 4
nr <- nrow(df1)
nc <- ncol(df1)
dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]

## I searched a long time for an answer that looked better.
## This website is helpful:
## http://stackoverflow.com/questions/tagged/r
## I started to type in the question and 3 plausible answers
## popped up before I could finish.

## The terse answer is:
shortAnswer <- do.call("rbind",mylist)

## That's the right answer, see:
all(shortAnswer == dataComplete)

## But I don't understand why it works.

## More importantly, I don't know if it is fastest, or best.
## It is certainly less error prone than "dataComplete"

## First, make a bigger test case and use system.time to evaluate

phony <- function(i){
  data.frame(w=rnorm(1000), x=rnorm(1000),y=rnorm(1000),z=rnorm(1000))
}
mylist <- lapply(1:1000, phony)

### First, try the terse way
system.time( shortAnswer <- do.call("rbind", mylist) )

### Second, try the complete way:
## The dimensions must come from the new 1000 x 4 data frames;
## df1 is only 100 x 2
m <- 1000
nr <- nrow(mylist[[1]])
nc <- ncol(mylist[[1]])

system.time(
  dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
)

system.time(
  for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
)

## On my Thinkpad T62 dual core, the "shortAnswer" approach takes about
## three times as long:

## > system.time( bestAnswer <- do.call("rbind",mylist) )
##    user  system elapsed
##  14.270   1.170  15.433

## > system.time(
## +   dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
## + )
##    user  system elapsed
##   0.000   0.000   0.006

## > system.time(
## +   for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
## + )
##    user  system elapsed
##   4.940   0.050   4.989

## That makes the do.call way look slow, and I said "hey,
## our stupid for loop at the beginning may not be so bad."
## Wrong. It is a disaster. Check this out:

## > resultDF <- phony(1)
## > system.time(
## +   for (i in 2:1000) resultDF <- rbind(resultDF, mylist[[i]])
## + )
##    user  system elapsed
## 159.740   4.150 163.996

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
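A note on the do.call question raised above. As ?do.call describes it, do.call constructs one function call in which each element of the list becomes a separate argument, so do.call("rbind", mylist) amounts to rbind(mylist[[1]], mylist[[2]], ...). lapply (and sapply, which is built on it) instead calls rbind once per element with a single argument each time, so nothing ever gets stacked. The sketch below illustrates this on a tiny list; smallList, viaDoCall, byHand, viaLapply and viaReduce are names made up for this illustration.

```r
## Small example so the results are easy to inspect
smallList <- list(data.frame(x = 1:2,  y = 3:4),
                  data.frame(x = 5:6,  y = 7:8),
                  data.frame(x = 9:10, y = 11:12))

## do.call passes the list elements as separate arguments,
## so these two calls build the same data frame
viaDoCall <- do.call("rbind", smallList)
byHand <- rbind(smallList[[1]], smallList[[2]], smallList[[3]])
all.equal(viaDoCall, byHand)

## lapply calls rbind once per element, one argument at a time,
## so the result is still a list of three separate pieces
viaLapply <- lapply(smallList, rbind)
length(viaLapply)

## Reduce applies rbind pairwise: rbind(rbind(a, b), c).
## It gives the same result, but grows the data frame step by step,
## much like the for loop in the script
viaReduce <- Reduce(rbind, smallList)
all.equal(viaDoCall, viaReduce)
```

Because Reduce builds the result incrementally, there is no reason to expect it to behave better than the explicit loop on the large test case; it is shown here only to make the contrast with do.call visible.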