> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Stephan Dlugosz
> Sent: Thursday, November 19, 2009 7:03 AM
> To: r-help@r-project.org
> Subject: [R] Efficient cbind of elements from two lists
>
> Hi!
>
> I have a data.frame "data" and have split it:
>
> data <- split(data, data[,1])
>
> This is quite a slow procedure, and I do not want to do it again. So
> unsplitting and "resplitting" is not an option for me.
> But I have to cbind "variables" to the split data from another list
> that consists of vectors with matching sizes, so
>
> for (i in 1:length(data)) {
>   data[[i]] <- cbind(data[[i]], l[[i]])
> }
>
> works well, but very, very slowly.
> The lapply solution:
>
> data <- lapply(1:k, function(i) cbind(data[[i]], l[[i]]))
>
> does not improve the situation, but allows for mclapply from the
> multicore package...
> Is there a more efficient way to combine elements from two lists?

Can you restructure your analysis so you don't need to split
the data.frame itself? I'm assuming the split was slow because
there are a lot of groups. Splitting a data.frame into lots of pieces
is considerably slower than splitting a few numeric or character
columns in it.

   > df <- data.frame(group=rep(1:1e5, each=2), score=1:2e5)
   > system.time(split(df, df$group)) # split entire data.frame into 1e5 parts
      user  system elapsed
    117.32   38.42  154.34
   > system.time(split(df$score, df$group)) # split 2nd column into 1e5 parts
      user  system elapsed
      0.43    0.03    0.46

If R does things the way S+ does, this is because splitting
simple vectors is done in C code but splitting data.frames
invokes the S-language [.data.frame function, which is relatively
slow when selecting rows from a data.frame.

I'd suggest using ave() (or a function from the plyr package),
working on columns from your data.frame and adding ave's
output as a column in your big data.frame. E.g., to compute
the average score in each group:

   > system.time(df$meanScore <- ave(df$score, df$group, FUN=mean))
      user  system elapsed
      3.37    0.00    3.50
   > df[1:6,]
     group score meanScore
   1     1     1       1.5
   2     1     2       1.5
   3     2     3       3.5
   4     2     4       3.5
   5     3     5       5.5
   6     3     6       5.5

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> Thank you very much!
>
> Greetings,
> Stephan
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
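
A possible sketch of the "work on columns" suggestion, for the case
where the extra values already exist as a list with one vector per
group: unsplit() can put them back into original row order so they
can be added as an ordinary column, without splitting the data.frame
at all. The names df, l, extra and viaAve are made up for
illustration, and the sketch assumes the list is ordered the way
split() orders its groups, which the original post does not confirm.

```r
# Illustrative data only; 'df', 'l', and 'extra' are hypothetical names.
df <- data.frame(group = rep(1:3, each = 2), score = 1:6)

# Stand-in for the poster's list 'l': one vector per group, in the same
# order split() would use, each as long as its group.
l <- lapply(split(df$score, df$group), function(x) x - mean(x))

# unsplit() reverses split(): it puts every list element back at the row
# positions of its group, giving one vector in the original row order.
df$extra <- unsplit(l, df$group)

# The same column computed directly with ave(), for comparison.
viaAve <- ave(df$score, df$group, FUN = function(x) x - mean(x))
stopifnot(isTRUE(all.equal(df$extra, viaAve)))
```

If the pieces really are needed as a list later on, splitting
individual columns of the enlarged data.frame should stay on the
fast vector path shown in the timings above.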