Further pursuing my curiosity to measure the efficiency of R/C++ interface, I conducted a simple matrix-vector multiplication test using .C and .Call functions in R. In each case, I measured the execution time in R, as well as inside the C++ function. Subtracting the two, I came up with a measure of overhead associated with each call. I assume that this overhead would be non-existent of the entire code was implemented in C++. [Please let me know if my approach is somehow faulty.] The results of this simple test have significant implications for my software development approach. I hope others find this information useful as well. I have attached the code files in the bottom. Below I provide a summary output, and interpret the results. [All times in table below are in microseconds]
m/n/time_.Call_R/time_.Call_C++/.Call_overhead/time_.C_R/time_.C_C++/.C_overhead 100/10/3.23/1.33/*1.90*/16.89/0.96/*15.93* 200/15/5.38/3.44/*1.94*/30.77/2.81/*27.96* 500/20/12.48/10.4/*2.08*/78.12/9.37/*68.75* Interpretation: 1- .Call overhead holds nearly constant, i.e. independent of data size. This is expected since .Call works by passing pointers. (How can we explain the slight increase in overhead?) 2- C++ times for .C are somewhat better than .Call. This is likely to be due to the overhead associated with unpacking the SEXP pointers in a .Call function. 3- The overhead for .C dominates the execution time. For a 500x20 matrix, the overhead is ~90% of total time. This means that whenever we need to make repeated calls to a C/C++ function from R, and when performance is important to us, .Call is much preferred to .C, even at modest data sizes. 4- Overhead for .C scales sub-linearly with data size. I imagine that this overhead can be reduced through registering the functions and using the style field in R_CMethodDef to optimize data transfer (per Section 5.4 of "Writing R Extensions"), but perhaps not by more than a half. 5- Even using .Call, the overhead is still a significant percentage of total time, though the contribution decreases with data size (and compute load of function). However, in the context of parallelization benefits, this overhead puts an unpleasant cap on achievable gains. For example, even if the compute time goes down by 100x, if overhead was even 10% of the time, our effective gain will saturate at 10x. In other words, effective parallelization can only be achieved by moving big chunks of the code to C/C++ so as to avoid repeated calls from R to C. I look forward to your comments. -Alireza -------------------------------- #code file: mvMultiply.r #measuring execution time for a matrix-vector multiplication (A%*%x) using .C and .Call functions #execution time is measured both in R and inside the C++ functions; subtracting the two is #used as an indicator of overhead associated with each call rm(list = ls()) system("R CMD SHLIB mvMultiply.cc") dyn.load("mvMultiply.so") m <- 100 #number of rows in matrix A n <- 10 #number of columns in matrix A (= number of rows in vector x) N <- 1000000 A <- runif(m*n) x <- runif(n) # measuring .Call tCall_c <- 0.0 t1 <- proc.time()[3] for (i in 1:N) { tCall_c <- tCall_c + .Call("matvecMultiply", as.double(A), as.double(x)) } tCall_R <- proc.time()[3] - t1 cat(".Call - Time measured in R: ", round(tCall_R,2), "sec\n") cat(".Call - Time measured in C++: ", round(tCall_c,2), "sec\n") cat(".Call - Implied overhead: ", round(tCall_R,2) - round(tCall_c,2), "sec -> per call: ", 1000000*(round(tCall_R,2) - round(tCall_c,2))/N, "usec\n") # measuring .C tC_c <- 0.0 t1 <- proc.time()[3] for (i in 1:N) { tC_c <- tC_c + .C("matvecMultiply2", as.double(A), as.double(x), double(c(m)), as.integer(c(m)), as.integer(c(n)), t = double(1))$t } tC_R <- proc.time()[3] - t1 cat(".C - Time measured in R: ", round(tC_R,2), "sec\n") cat(".C - Time measured in C++: ", round(tC_c,2), "sec\n") cat(".C - Implied overhead: ", round(tC_R,2) - round(tC_c,2), "sec -> per call: ", 1000000*(round(tC_R,2) - round(tC_c,2))/N, "usec\n") dyn.unload("mvMultiply.so") -------------------------------- #code file: myMultiply.cc #include <Rinternals.h> #include <R.h> #include <sys/time.h> extern "C" { SEXP matvecMultiply(SEXP A, SEXP x) { timeval time1, time2; gettimeofday(&time1, NULL); SEXP execTime_sxp; PROTECT(execTime_sxp = allocVector(REALSXP, 1)); double *execTime; execTime = REAL(execTime_sxp); double *rA = REAL(A), *rx = REAL(x), *ry; int n = length(x); int m = length(A) / n; SEXP y; PROTECT(y = allocVector(REALSXP, m)); ry = REAL(y); for (int i = 0; i < m; i++) { ry[i] = 0.0; for (int j = 0; j < n; j++) { ry[i] += rA[j * m + i] * rx[j]; } } UNPROTECT(1); gettimeofday(&time2, NULL); *execTime = (time2.tv_sec - time1.tv_sec) + (time2.tv_usec - time1.tv_usec)/1000000.0; UNPROTECT(1); return execTime_sxp; } void matvecMultiply2(double *A, double *x, double *y, int *m_ptr, int *n_ptr, double *execTime) { timeval time1, time2; gettimeofday(&time1, NULL); int m = *m_ptr; int n = *n_ptr; for (int i = 0; i < m; i++) { y[i] = 0.0; for (int j = 0; j < n; j++) { y[i] += A[j * m + i] * x[j]; } } gettimeofday(&time2, NULL); *execTime = (time2.tv_sec - time1.tv_sec) + (time2.tv_usec - time1.tv_usec)/1000000.0; } } -------------------------------- -- View this message in context: http://r.789695.n4.nabble.com/Measuring-and-comparing-C-and-Call-overhead-tp3678361p3678361.html Sent from the R devel mailing list archive at Nabble.com. ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel