Hi,

Using some of what we know about the structure of an R matrix, we can use this version:

convolve_2d_tricks <- cxxfunction(signature(sampleS = "numeric", kernelS =
"numeric"), plugin = "Rcpp", '
    Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
    int x_s = sample.nrow(), x_k = kernel.nrow();
    int y_s = sample.ncol(), y_k = kernel.ncol();

    Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
    double* output_row_col_j ;
    double sample_row_col = 0.0 ;
    double* kernel_j ;


    for (int row = 0; row < x_s; row++) {
      for (int col = 0; col < y_s; col++) {
        sample_row_col = sample(row,col) ;

        for (int j = 0; j < y_k; j++) {
          output_row_col_j = & output( row, col+j ) ;
          kernel_j = &kernel(0,j) ;
          for (int i = 0; i < x_k; i++) {
            output_row_col_j[i] += sample_row_col * kernel_j[i] ;
          }
        }
      }
    }
    return output;
')

I get quite a speed up on hadley's example:

utilisateur     système      écoulé
      1.555       0.003       1.558
utilisateur     système      écoulé
      0.547       0.004       0.550
[1] TRUE

The idea is that we compute pointers and constants just once, asap, so that we don't have to do it again in inner loops.

Romain

Le 06/01/12 19:43, Hadley Wickham a écrit :
Hi all,

The rcpp-devel list has been very helpful to me so far, so I hope you
don't mind another question!  I'm trying to speed up my 2d convolution
function:



library(inline)
# 2d convolution -------------------------------------------------------------

convolve_2d<- cxxfunction(signature(sampleS = "numeric", kernelS =
"numeric"), plugin = "Rcpp", '
     Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
     int x_s = sample.nrow(), x_k = kernel.nrow();
     int y_s = sample.ncol(), y_k = kernel.ncol();

     Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
     for (int row = 0; row<  x_s; row++) {
       for (int col = 0; col<  y_s; col++) {
         for (int i = 0; i<  x_k; i++) {
           for (int j = 0; j<  y_k; j++) {
             output(row + i, col + j) += sample(row, col) * kernel(i, j);
           }
         }
       }
     }
     return output;
')


x<- diag(1000)
k<- matrix(runif(20* 20), ncol = 20)
system.time(convolve_2d(x, k))
#    user  system elapsed
# 14.759   0.028  15.524

I have a vague idea that to get better performance I need to get
closer to bare pointers, and I need to be careful about the order of
my loops to ensure that I'm working over contiguous chunks of memory
as often as possible, but otherwise I've basically exhausted my
C++/Rcpp knowledge.  Where should I start looking to improve the
performance of this function?

The example data basically matches the real problem - x is not usually
going to be much bigger than 1000 x 1000 and k typically will be much
smaller.  (And hence, based on what I've read, this technique should
be faster than doing it via a discrete fft)

Hadley



--
Romain Francois
Professional R Enthusiast
http://romainfrancois.blog.free.fr

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

Reply via email to