Sure. I'll write something up for the gallery, but here's the crude outline.
Here's the C++ code:

  #include <Rcpp.h>
  #include <string>
  using namespace Rcpp;

  // Build a data.frame from a list of columns by setting the row.names,
  // names and class attributes directly, instead of going through
  // as.data.frame() or DataFrame::create().
  // Assumes every element of `a` is a vector of the same length; that is
  // not checked.
  // [[Rcpp::export]]
  List BuildCheapDataFrame(List a) {
    // clone() makes a deep copy, so the attribute changes below do not
    // modify the caller's list.
    List returned_frame = clone(a);
    if (returned_frame.length() == 0) {
      stop("BuildCheapDataFrame: need at least one column");
    }

    // The first column (index 0) determines the number of rows.
    SEXP first_col = returned_frame[0];
    int nrow = Rf_length(first_col);

    // Row names "1", "2", ...; std::to_string means no fixed-size buffer
    // can overflow, however many rows there are.
    StringVector row_names(nrow);
    for (int i = 0; i < nrow; ++i) {
      row_names[i] = std::to_string(i + 1);
    }
    returned_frame.attr("row.names") = row_names;

    // Column names "X.1", "X.2", ...
    int ncol = returned_frame.length();
    StringVector col_names(ncol);
    for (int j = 0; j < ncol; ++j) {
      col_names[j] = "X." + std::to_string(j + 1);
    }
    returned_frame.attr("names") = col_names;

    returned_frame.attr("class") = "data.frame";
    return returned_frame;
  }

There are some subtleties in this code:

* The names are built with std::to_string() rather than with sprintf()
  into fixed-size char buffers. My first version used char name[5] for the
  row names and char name[6] for the column names; those overflow at row
  index 10000 and column index 1000, which is why super-large data frames
  couldn't be sent to it. The overflow has nothing to do with
  Rcpp::export or sourceCpp.

* Notice the invocation of clone() in the first line of the function.
  Without it, setting the attributes side-effects the caller's list, which
  is not what most people would expect.

Here's the timing, as measured on an AWS node:

> sourceCpp('/tmp/test_adf.cc')
> a <- replicate(250, 1:100, simplify=FALSE)
> system.time(replicate( { as.data.frame(a) ; NULL }, n=100))
   user  system elapsed
  3.890   0.000   3.892
> system.time(replicate( { BuildCheapDataFrame(a) ; NULL }, n=100))
   user  system elapsed
  0.020   0.000   0.022

Yes, that really is a speedup of nearly 200x: 3.892 s versus 0.022 s
elapsed, a ratio of about 177.

On Fri, Jan 18, 2013 at 8:16 AM, Paul Johnson <pauljoh...@gmail.com> wrote:
> On Thu, Jan 17, 2013 at 9:54 PM, John Merrill <john.merr...@gmail.com> wrote:
> > As of 2.15.1, data.frame appears to no longer be O(n^2) in the number of
> > columns in the frame. That's certainly an improvement, yes.
> >
> > However, by eliminating calls to data.frame and replacing them with direct
> > class modifications, I can take a routine which takes minutes and reduce it
> > to a routine which takes seconds. So, pragmatically, in Rcpp, I can get a
> > rough factor of sixty, it appears.
> >
>
> Wow.
>
> When you have this written out, will you post links to it? I can
> learn from your examples, I think.
>
> pj
>
> > On Thu, Jan 17, 2013 at 7:46 PM, Paul Johnson <pauljoh...@gmail.com> wrote:
> >>
> >> On Tue, Jan 15, 2013 at 9:20 AM, John Merrill <john.merr...@gmail.com>
> >> wrote:
> >> > It appears that DataFrame::create is a thin layer on top of the R
> >> > data.frame call. That guarantees correctness, but also means the
> >> > performance of an Rcpp routine which returns a large data frame is
> >> > limited by the performance of data.frame -- which is utterly horrible.
> >>
> >> Are you certain that this claim is still true?
> >>
> >> I was shocked/surprised by the package "dataframe" and the commentary
> >> about it. The author said that data.frame was slow because "This
> >> contains versions of standard data frame functions in R, modified to
> >> avoid making extra copies of inputs. This is faster, particularly for
> >> large data."
> >>
> >> It was repeatedly copying some objects, and he provided a substantially
> >> faster approach.
> >>
> >> In the release notes for R-2.15.1, I recall seeing a note that R Core
> >> had responded by integrating several of those changes. But still
> >> data.frame is not fast for you?
> >>
> >> If they didn't make the core data.frame as fast, would you care to
> >> enlighten us by installing the dataframe package and letting us know
> >> if it is still faster?
> >>
> >> Or perhaps you are way ahead of me and you've already imitated
> >> Hesterberg's algorithms in your C++ design?
> >>
> >> pj
> >>
> >> --
> >> Paul E. Johnson
> >> Professor, Political Science      Assoc. Director
> >> 1541 Lilac Lane, Room 504         Center for Research Methods
> >> University of Kansas              University of Kansas
> >> http://pj.freefaculty.org         http://quant.ku.edu