Hi everyone, Apologies if this is the wrong place to ask for help with Rcpp/RcppParallel.
I have recently began using Rcpp and RcppParallel to speed up an algorithm that I have written in R ( I still have a lot to learn!). I have had no issues converting my R code into Rcpp code, but I am struggling to get things working in parallel using RcppParallel. I was hoping that someone may be able help improve my understanding of how to translate Rcpp code into code that works with RcppParallel. To make things easier to understand, I have created a simplified example that will hopefully illustrate what I hope to achieve. Firstly, we compute a distance matrix based on xy co-ordinates using the dist() function in R: x <- matrix(1:100 + 0.1, ncol = 2, dimnames = list(NULL, c("x","y"))) x_dist <- dist(x, method = "euclidean") Secondly, we apply a custom algorithm that uses the raw data and distance matrix to compute labels for each point: algo <- function(x, dist = NULL) { # algorithm code to compute a label for each row in x # code is complex and requires both x and distance matrix # but simply assigns an integer to each row labels <- 1:nrow(x) return(labels) } x_labels <- algo(x, dist = x_dist) Now let's say we wanted run the algorithm for all distance methods and store the labels (single thread R code) in a matrix. This is the code I would like to run in parallel as it can take some time to compute the labels using the alogrithm. dist_methods <- c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski") x_labels <- lapply(dist_methods, function(dist_method) { x_dist <- dist(x, method = dist_method) return(algo(x, x_dist)) }) x_labels <- do.call("cbind", x_labels) To speed up the algorithm we can convert these R functions into Rcpp equivalents. I have removed code to make it clearer and simply returned an example matrix/vector. I have also converted the dist_methods to an integer for simplicity (i.e. 1 - euclidean, 2 - maximum etc.). #include <Rcpp.h> using namespace Rcpp; //[[Rcpp::export]] NumericMatrix rcpp_dist(NumericMatrix x, int dist_method = 1) { // code to compute distance matrix depends on dist_method NumericMatrix x_dist(x.nrow(), x.nrow()); return x_dist; } //[[Rcpp::export]] IntegerVector rcpp_algo(NumericMatrix x, NumericMatrix dist) { // code to compute a label for each row in x using distance matrix IntegerVector labels(x.nrow()); return labels; } Finally, we attempt to run the code on multiple threads for each distance method. #include <RcppParallel.h> using namespace RcppParallel; struct RcppParAlgo : public Worker { // source thread safe RMatrix<double> x; RVector<int> dist_method; // output thread safe RMatrix <int> labels; // initialise RcppParAlgo(NumericMatrix x, int dist_method) : x(x), dist_method(dist_method), labels(labels) {} // compute distance matrix based on method and run algo to get labels void operator() (std::size_t begin, std::size_t end) { // call rcpp_dist on x with dist_method // I know this code is wrong but hopefully shows what I am trying to do (this is where I need some help) for(std::size_t i = begin; i < end; i++) { RVector<int> dist_type = dist_method[i]; RMatrix<double> x_dist = rcpp_dist(x, dist_type); RVector<int> x_labels = rcpp_algo(x, x_dist); // push lables into output matrix for(int j = 0; j <= x_labels.size(); j++) { labels(j, i) = x_labels[j]; } } } }; //[[Rcpp::export]] IntegerMatrix rcppParAlgo(NumericMatrix x) { // distance methods IntegerVector dist_methods = {1,2,3,4,5}; // allocate output matrix to store labels IntegerMatrix labels(x.nrow(), dist_methods.size()); // functor RcppParAlgo RcppParAlgo(x, dist_methods); // run in parallel parallelFor(0, dist_methods.length(), RcppParAlgo); return labels; } The main questions I would like to address are: 1. As the code is currently written, rcpp_dist and rcpp_algo do not accept RMatrix/RVector classes which causes problems when I try call these functions in parallel. Is there a way to write generic functions that will work in both cases? I still want to export the definitions of rcpp_dist() and rcpp_algo() so they can be called from R. 2. Is it safe to create new objects on each thread? For example, in my case I need to create the temporary distance matrix which is passed to rcpp_algo() along with x to compute the labels. I haven't seen any examples of this, are there constructors for RMatrix/RVector similar to NumericMatrix/NumericVector? Any help or suggestions would be greatly appreciated. I have placed the Rcpp/RcppParallel code in package hosted on GitHub if you would like to fiddle with the code: https://github.com/DillonHammill/RcppHelp [https://opengraph.githubassets.com/c783348bc98fe97d064ef1da1b860542a1422388bfb389ba366bc8827df916cf/DillonHammill/RcppHelp]<https://github.com/DillonHammill/RcppHelp> GitHub - DillonHammill/RcppHelp: Example RcppParallel Code<https://github.com/DillonHammill/RcppHelp> Example RcppParallel Code. Contribute to DillonHammill/RcppHelp development by creating an account on GitHub. github.com Cheers, Dillon
_______________________________________________ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel