On Jul 19, 2011, at 11:07 AM, Alireza Mahani wrote:

> Further pursuing my curiosity about the efficiency of the R/C++ interface,
> I conducted a simple matrix-vector multiplication test using the .C and
> .Call functions in R. In each case I measured the execution time in R as
> well as inside the C++ function; subtracting the two gives a measure of
> the overhead associated with each call. I assume that this overhead would
> be non-existent if the entire code were implemented in C++. [Please let me
> know if my approach is somehow faulty.] The results of this simple test
> have significant implications for my software development approach, and I
> hope others find this information useful as well. I have attached the code
> files at the bottom. Below I provide a summary of the output and interpret
> the results. [All times in the table below are in microseconds; the
> overhead columns are marked with asterisks.]
>
>    m    n   .Call (R)  .Call (C++)  .Call overhead   .C (R)  .C (C++)  .C overhead
>  100   10       3.23         1.33          *1.90*    16.89      0.96      *15.93*
>  200   15       5.38         3.44          *1.94*    30.77      2.81      *27.96*
>  500   20      12.48         10.4          *2.08*    78.12      9.37      *68.75*
>
Your "overhead" calculations are anything but, since they include a loop,
value replacement, arithmetic, and unnecessary function calls - none of
which have anything to do with .Call. (You don't need as.double() for
.Call; it's more efficient to use coerceVector(), or to throw an error,
depending on what you want - see the first code sketch below.) In addition,
if you really cared about overhead, you would avoid the symbol lookup
millions of times:

foo.c:

#include <Rinternals.h>
SEXP foo() { return R_NilValue; }

> system.time(for(i in 1:1e6) .Call("foo"))
   user  system elapsed
  4.943   0.016   4.960
> foo = getNativeSymbolInfo("foo")$address
> system.time(for(i in 1:1e6) .Call(foo))
   user  system elapsed
  0.363   0.001   0.365

This is very close to just calling a constant closure with the same result:

> f = function() NULL
> system.time(for(i in 1:1e6) f())
   user  system elapsed
  0.238   0.001   0.239

So the .Call overhead is essentially negligible - and not what you were
measuring ;). There is only a tiny overhead in forcing the argument
promise, but it's constant:

> x = rnorm(1e6)
> system.time(for(i in 1:1e6) .Call(foo, x))
   user  system elapsed
  0.455   0.001   0.456
> x = rnorm(1e9)
> system.time(for(i in 1:1e6) .Call(foo, x))
   user  system elapsed
  0.454   0.000   0.455

and again it is no different from a closure:

> f = function(x) NULL
> system.time(for(i in 1:1e6) f())
   user  system elapsed
  0.259   0.001   0.259
> system.time(for(i in 1:1e6) f(x))
   user  system elapsed
  0.329   0.001   0.329

so this is inherent to any call. .Call is the fastest interface you can
get; there is nothing faster (other than hacking your code into the R
internals...).

Cheers,
Simon

> Interpretation:
>
> 1- The .Call overhead holds nearly constant, i.e. it is independent of
> data size. This is expected, since .Call works by passing pointers. (How
> can we explain the slight increase in overhead?)
> 2- The C++ times for .C are somewhat better than for .Call. This is
> likely due to the overhead associated with unpacking the SEXP pointers in
> a .Call function.
> 3- The overhead for .C dominates the execution time. For a 500x20 matrix,
> the overhead is ~90% of the total time. This means that whenever we need
> to make repeated calls to a C/C++ function from R, and performance is
> important to us, .Call is much preferred to .C, even at modest data sizes.
> 4- The overhead for .C scales sub-linearly with data size. I imagine that
> this overhead can be reduced by registering the functions and using the
> style field in R_CMethodDef to optimize data transfer (per Section 5.4 of
> "Writing R Extensions"; see the registration sketch after the code
> listings below), but perhaps not by more than half.
> 5- Even using .Call, the overhead is still a significant percentage of
> the total time, though its contribution decreases with data size (and
> with the compute load of the function). In the context of parallelization,
> however, this overhead puts an unpleasant cap on achievable gains. For
> example, even if the compute time goes down by 100x, if the overhead was
> 10% of the total time, our effective gain will saturate at 10x. In other
> words, effective parallelization can only be achieved by moving big
> chunks of the code to C/C++ so as to avoid repeated calls from R to C.
>
> I look forward to your comments.
>
> -Alireza
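A minimal sketch of the coerceVector() approach mentioned above, using a
hypothetical routine name (matvecMultiply3) that is not part of the
original post: the coercion happens on the C side, so the R caller can pass
A and x directly without wrapping them in as.double(), and the routine
returns the product vector itself rather than a timing value. If appended
to the mvMultiply.cc file quoted below, it would be called as
.Call("matvecMultiply3", A, x).

#include <R.h>
#include <Rinternals.h>

extern "C" {

/* Hypothetical .Call routine: coerces its arguments in C instead of
   relying on as.double() in R, and returns the product vector y. */
SEXP matvecMultiply3(SEXP A, SEXP x) {
  int nprot = 0;
  if (TYPEOF(A) != REALSXP) { PROTECT(A = coerceVector(A, REALSXP)); nprot++; }
  if (TYPEOF(x) != REALSXP) { PROTECT(x = coerceVector(x, REALSXP)); nprot++; }
  int n = length(x);
  int m = length(A) / n;            /* A is m x n, column-major */
  SEXP y = PROTECT(allocVector(REALSXP, m)); nprot++;
  double *rA = REAL(A), *rx = REAL(x), *ry = REAL(y);
  for (int i = 0; i < m; i++) {
    ry[i] = 0.0;
    for (int j = 0; j < n; j++)
      ry[i] += rA[j * m + i] * rx[j];
  }
  UNPROTECT(nprot);
  return y;
}

}

The coercion (and the copy it implies) only happens when an argument is
not already a double vector, so for the benchmark above, where A and x are
already numeric, it adds essentially nothing per call.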
> --------------------------------
> # code file: mvMultiply.r
> # Measures execution time for a matrix-vector multiplication (A %*% x)
> # using the .C and .Call interfaces. Execution time is measured both in R
> # and inside the C++ functions; the difference between the two is used as
> # an indicator of the overhead associated with each call.
>
> rm(list = ls())
> system("R CMD SHLIB mvMultiply.cc")
> dyn.load("mvMultiply.so")
>
> m <- 100      # number of rows in matrix A
> n <- 10       # number of columns in matrix A (= number of rows in vector x)
> N <- 1000000  # number of calls
>
> A <- runif(m * n)
> x <- runif(n)
>
> # measuring .Call
> tCall_c <- 0.0
> t1 <- proc.time()[3]
> for (i in 1:N) {
>   tCall_c <- tCall_c + .Call("matvecMultiply", as.double(A), as.double(x))
> }
> tCall_R <- proc.time()[3] - t1
> cat(".Call - Time measured in R:   ", round(tCall_R, 2), "sec\n")
> cat(".Call - Time measured in C++: ", round(tCall_c, 2), "sec\n")
> cat(".Call - Implied overhead: ", round(tCall_R - tCall_c, 2), "sec",
>     " -> per call: ", 1e6 * (tCall_R - tCall_c) / N, "usec\n")
>
> # measuring .C
> tC_c <- 0.0
> t1 <- proc.time()[3]
> for (i in 1:N) {
>   tC_c <- tC_c + .C("matvecMultiply2", as.double(A), as.double(x),
>                     double(m), as.integer(m), as.integer(n),
>                     t = double(1))$t
> }
> tC_R <- proc.time()[3] - t1
> cat(".C - Time measured in R:   ", round(tC_R, 2), "sec\n")
> cat(".C - Time measured in C++: ", round(tC_c, 2), "sec\n")
> cat(".C - Implied overhead: ", round(tC_R - tC_c, 2), "sec",
>     " -> per call: ", 1e6 * (tC_R - tC_c) / N, "usec\n")
>
> dyn.unload("mvMultiply.so")
> --------------------------------
> // code file: mvMultiply.cc
> #include <Rinternals.h>
> #include <R.h>
> #include <sys/time.h>
>
> extern "C" {
>
> /* .Call version: returns the time spent inside this function (in seconds)
>    as a length-1 numeric vector; the product vector y is discarded. */
> SEXP matvecMultiply(SEXP A, SEXP x) {
>   timeval time1, time2;
>   gettimeofday(&time1, NULL);
>   SEXP execTime_sxp;
>   PROTECT(execTime_sxp = allocVector(REALSXP, 1));
>   double *execTime = REAL(execTime_sxp);
>   double *rA = REAL(A), *rx = REAL(x), *ry;
>   int n = length(x);
>   int m = length(A) / n;
>   SEXP y;
>   PROTECT(y = allocVector(REALSXP, m));
>   ry = REAL(y);
>   for (int i = 0; i < m; i++) {
>     ry[i] = 0.0;
>     for (int j = 0; j < n; j++) {
>       ry[i] += rA[j * m + i] * rx[j];
>     }
>   }
>   UNPROTECT(1);  /* y */
>   gettimeofday(&time2, NULL);
>   *execTime = (time2.tv_sec - time1.tv_sec) +
>               (time2.tv_usec - time1.tv_usec) / 1000000.0;
>   UNPROTECT(1);  /* execTime_sxp */
>   return execTime_sxp;
> }
>
> /* .C version: same computation; the elapsed time is written into the
>    last argument. */
> void matvecMultiply2(double *A, double *x, double *y, int *m_ptr,
>                      int *n_ptr, double *execTime) {
>   timeval time1, time2;
>   gettimeofday(&time1, NULL);
>   int m = *m_ptr;
>   int n = *n_ptr;
>   for (int i = 0; i < m; i++) {
>     y[i] = 0.0;
>     for (int j = 0; j < n; j++) {
>       y[i] += A[j * m + i] * x[j];
>     }
>   }
>   gettimeofday(&time2, NULL);
>   *execTime = (time2.tv_sec - time1.tv_sec) +
>               (time2.tv_usec - time1.tv_usec) / 1000000.0;
> }
>
> }
> --------------------------------
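On the registration question in point 4 above (and the point about repeated
symbol lookup): registering the routines lets R resolve "matvecMultiply"
and "matvecMultiply2" once, when the library is loaded, instead of
searching for the symbols on every call. Below is a minimal sketch of what
that could look like if appended to mvMultiply.cc. The init-function name
R_init_mvMultiply follows from the shared-library name; the declared
argument types are an assumption based on the signatures above, and the
style field mentioned in point 4 is omitted. None of this is part of the
original post.

#include <Rinternals.h>
#include <R_ext/Rdynload.h>

extern "C" {

SEXP matvecMultiply(SEXP A, SEXP x);
void matvecMultiply2(double *A, double *x, double *y, int *m_ptr,
                     int *n_ptr, double *execTime);

/* .Call routine table: name, function pointer, argument count */
static const R_CallMethodDef callMethods[] = {
  {"matvecMultiply", (DL_FUNC) &matvecMultiply, 2},
  {NULL, NULL, 0}
};

/* Declared argument types for the .C routine (assumed, see above) */
static R_NativePrimitiveArgType matvecMultiply2_types[] = {
  REALSXP, REALSXP, REALSXP, INTSXP, INTSXP, REALSXP
};

/* .C routine table: name, function pointer, argument count, types */
static const R_CMethodDef cMethods[] = {
  {"matvecMultiply2", (DL_FUNC) &matvecMultiply2, 6, matvecMultiply2_types},
  {NULL, NULL, 0, NULL}
};

/* Run automatically by dyn.load("mvMultiply.so") */
void R_init_mvMultiply(DllInfo *info) {
  R_registerRoutines(info, cMethods, callMethods, NULL, NULL);
}

}

On the R side one can then also cache the looked-up address once, as Simon
does with getNativeSymbolInfo("foo")$address, so that even the (now
registered) name lookup is paid once rather than N times inside the timing
loop.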