Hi,

Fancy indexing tricks aside, another avenue you might want to look into
is doing the convolution in Fourier space ...
I'll stop there, so I minimize the damage I'm doing as far as spreading
misinformation in a public forum goes. I'm also interested in
investigating that avenue further, as I wrote a 1d convolution for
something else in Rcpp and want to drive the "Fourier trick" via C code
... I just haven't had the time to do the follow-up reading. Anyway, I
"know" it works in the 1d case (that's how the base R convolve works);
I guess there are particulars to sort out in the 2d case ...

And if you *really* want it to go fast, I guess you can go the CUDA
route :-)

http://developer.download.nvidia.com/compute/cuda/2_2/sdk/website/projects/convolutionFFT2D/doc/convolutionFFT2D.pdf

HTH (but I'm sure it didn't :-)

-steve

On Fri, Jan 6, 2012 at 1:43 PM, Hadley Wickham <had...@rice.edu> wrote:
> Hi all,
>
> The rcpp-devel list has been very helpful to me so far, so I hope you
> don't mind another question! I'm trying to speed up my 2d convolution
> function:
>
> library(inline)
>
> # 2d convolution -------------------------------------------------------------
> convolve_2d <- cxxfunction(signature(sampleS = "numeric", kernelS =
> "numeric"), plugin = "Rcpp", '
>   Rcpp::NumericMatrix sample(sampleS), kernel(kernelS);
>   int x_s = sample.nrow(), x_k = kernel.nrow();
>   int y_s = sample.ncol(), y_k = kernel.ncol();
>
>   Rcpp::NumericMatrix output(x_s + x_k - 1, y_s + y_k - 1);
>   for (int row = 0; row < x_s; row++) {
>     for (int col = 0; col < y_s; col++) {
>       for (int i = 0; i < x_k; i++) {
>         for (int j = 0; j < y_k; j++) {
>           output(row + i, col + j) += sample(row, col) * kernel(i, j);
>         }
>       }
>     }
>   }
>   return output;
> ')
>
> x <- diag(1000)
> k <- matrix(runif(20 * 20), ncol = 20)
> system.time(convolve_2d(x, k))
> #    user  system elapsed
> #  14.759   0.028  15.524
>
> I have a vague idea that to get better performance I need to get
> closer to bare pointers, and I need to be careful about the order of
> my loops to ensure that I'm working over contiguous chunks of memory
> as often as possible,
> but otherwise I've basically exhausted my C++/Rcpp knowledge. Where
> should I start looking to improve the performance of this function?
>
> The example data basically matches the real problem - x is not usually
> going to be much bigger than 1000 x 1000, and k typically will be much
> smaller. (And hence, based on what I've read, this technique should be
> faster than doing it via a discrete FFT.)
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel@lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact