I've written a "dataframe" package that replaces existing methods for data frame creation and subscripting with versions that use less memory. For example, calling as.data.frame() on a vector makes 4 copies of the data in R 2.9.2, but only 1 copy with the package. There is also a small speed gain.
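One way to check copy counts like these yourself is base R's tracemem(), which prints a line each time a traced object is duplicated. The helper below, count_copies(), is a hypothetical sketch (not part of the dataframe package or of base R, and not necessarily how the counts in this post were obtained). It assumes an R build with memory profiling enabled, i.e. capabilities("profmem") is TRUE. The numbers it reports will vary between R versions.

```r
# Hypothetical helper (a sketch, not the dataframe package's code):
# count how many times R duplicates 'x' while f(x) runs, using tracemem().
# Assumes R was built with memory profiling (capabilities("profmem") is TRUE).
count_copies <- function(f, x) {
  if (!capabilities("profmem")) {
    warning("tracemem() is unavailable in this R build")
    return(NA_integer_)
  }
  tracemem(x)              # mark x; each duplication prints a "tracemem[...]" line
  on.exit(untracemem(x))   # stop tracing x even if f() fails
  # tracemem reports go to standard output, so capture them and count them;
  # invisible() stops capture.output() from printing the result of f(x)
  out <- capture.output(invisible(f(x)))
  sum(grepl("^tracemem\\[", out))
}

y <- runif(1e6)
count_copies(as.data.frame, y)   # copies made by as.data.frame() on a vector
count_copies(data.frame, y)      # copies made by data.frame()
```

Because R also marks each copy of a traced object for tracing, copies of copies are included in the count.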
I and others have been using it at Google for some years, and it is time to either put it on CRAN or move it into R. R core folks: would you prefer that this be released to CRAN, or would you like to consider merging it directly into R?

I took existing functions and did some hacks to reduce the number of times R copies objects. Some of it is ugly. This could be done more efficiently, and with cleaner code, with some changes or hooks in R's internal code, but I'm not prepared to do that.

I often use lists instead of data frames. In another package I have a 'subscriptRows' function that subscripts a list as if it were a data frame. I could merge that into the dataframe package.

Memory use: number of copies made
#                             R 2.9.2    library(dataframe)
# as.data.frame(y)            4          1
# data.frame(y)               8          3
# data.frame(y, z)            8          3
# as.data.frame(l)            10         3
# data.frame(l)               15         5
# d$z <- z                    3,2        1,1
# d[["z"]] <- z               4,3        2,1
# d[, "z"] <- z               6,4,2      2,2,1
# d["z"] <- z                 6,5,2      2,2,1
# d["z"] <- list(z=z)         6,3,2      2,2,1
# d["z"] <- Z #list(z=z)      6,2,2      2,1,1
# a <- d["y"]                 2          1
# a <- d[, "y", drop=F]       2          1
# y and z are vectors, Z and l are lists, and d is a data frame.
# Where two numbers are given, they refer to:
#   (copies of the old data frame),
#   (copies of the new column).
# A third number is the number of
#   (copies made of an integer vector of row names).

# ------- seconds (multiple repetitions) -------
#                      creation/column subscripting   row subscripting
# R 2.9.2            : 34.2  43.9  43.3               10.6  13.0
# library(dataframe) : 22.5  21.8  21.8                9.7   9.5   9.5

I reported one of the simpler hacks to this list earlier, and it was included in some version of R after 2.9.2, so the current version of R isn't as bad as 2.9.2.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel