Hello,
I would like to report that creating Rcpp DataFrame objects is slow, and to 
propose a change that speeds it up.
For reference, here is the output of sessionInfo():
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
...
other attached packages:
[1] microbenchmark_1.3-0 Rcpp_0.11.1         

I am using the Rcpp package to port my old functions, written with R's C 
interface, to the more convenient Rcpp style.
While writing code that creates data.frames, I noticed (using the 
microbenchmark package) that the Rcpp-based code ran considerably slower than 
my old implementation: approximately 40(!) times slower for a 50x2 
(rows x columns) data frame.

I have narrowed the speed difference down to the following call:

    return Rcpp::DataFrame::create(Rcpp::Named("xdata")=x,
                                   Rcpp::Named("ydata")=y);

Here x and y are Rcpp::NumericVector objects.
Stepping through my code and Rcpp in a debugger, I noticed that during 
creation Rcpp applies an “as.data.frame” conversion to the list holding the 
x and y vectors and their names “xdata” and “ydata”. My previous code, which 
used the C interface, did not need this step.

In Rcpp/DataFrame.h, on line 87:
       static DataFrame_Impl from_list( Parent obj ){
This in turn calls on line 104:
                return DataFrame_Impl(obj) ;
which ultimately calls set__ on line 78:
        void set__(SEXP x){
            if( ::Rf_inherits( x, "data.frame" )){
                Parent::set__( x ) ;
            } else{
                SEXP y = internal::convert_using_rfunction( x, "as.data.frame" ) ;
                Parent::set__( y ) ;
            }
        }
Since DataFrame::create() has not yet set the class attribute to 
“data.frame” at this point, the “as.data.frame” conversion takes place and 
slows down the creation of the final object.
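
The branching described above can be illustrated with a small standalone 
sketch. This is a toy model in plain C++, not Rcpp code: it uses no R or Rcpp 
headers, and ToyObject, toy_inherits, toy_as_data_frame, toy_set and toy_demo 
are hypothetical names introduced only for illustration. The counter stands in 
for the cost of calling as.data.frame in R.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an R object; only the "class" attribute matters here.
struct ToyObject {
    std::vector<std::string> classes;
};

// Plays the role of Rf_inherits(): true if cls is among the object's classes.
bool toy_inherits(const ToyObject& obj, const std::string& cls) {
    return std::find(obj.classes.begin(), obj.classes.end(), cls)
           != obj.classes.end();
}

// Stand-in for the slow as.data.frame call; counts how often it runs.
ToyObject toy_as_data_frame(ToyObject obj, int& conversions) {
    ++conversions;
    obj.classes.push_back("data.frame");
    return obj;
}

// Same branching as set__: fast path if already tagged, otherwise convert.
ToyObject toy_set(const ToyObject& obj, int& conversions) {
    if (toy_inherits(obj, "data.frame")) {
        return obj;
    }
    return toy_as_data_frame(obj, conversions);
}

// Hypothetical usage: an untagged list converts once, a tagged one never.
void toy_demo() {
    int conversions = 0;

    ToyObject plain_list;                      // no class attribute set
    toy_set(plain_list, conversions);
    assert(conversions == 1);                  // slow path was taken

    ToyObject tagged;
    tagged.classes.push_back("data.frame");    // class set before set__ runs
    toy_set(tagged, conversions);
    assert(conversions == 1);                  // fast path, no new conversion
}
```

In this model, tagging the object before it reaches toy_set is what skips the 
conversion, which is the effect the class attribute has on set__.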
I propose changing line 103 to set the class attribute to “data.frame”, so 
that no further conversion takes place:
            if( use_default_strings_as_factors ) {
                Rf_setAttrib(obj, R_ClassSymbol, Rf_mkString("data.frame"));
                return DataFrame_Impl(obj) ;
            }

I tested this change, and it brought the execution time of the function back 
to about what it was with the plain C API.
Please let me know whether this makes sense, or whether I should be using 
DataFrame::create() differently.

Best,
Dmitry
 
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel