Moving the call outside the main loop would be effective in some scenarios (i.e., those where the data objects do not contain NaNs). However, when NaNs are present we still want to compute a distance from the available values and "correct" for the NaNs in some way, so skipping the entire object is not really an option. Adding a switch between objects with and objects without NaNs is probably worthwhile (as is using more Rcpp sugar).
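To make the idea of such a switch concrete, here is a rough sketch in plain C++ (std::vector instead of NumericMatrix, so it compiles without R). The names hasNaN and distancesWithSwitch are made up for illustration, and the "correction" used here (rescaling the sum by numVars / number of valid entries) is only an assumption about what the correction could look like, not necessarily what the package does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: true if object i (numVars consecutive values) holds a NaN.
bool hasNaN(const std::vector<double>& data, std::size_t i, std::size_t numVars) {
    for (std::size_t k = 0; k < numVars; ++k) {
        if (std::isnan(data[i * numVars + k])) return true;
    }
    return false;
}

// Squared Euclidean distance between every object and every code.
// The result is laid out as dists[i * numCodes + j].
std::vector<double> distancesWithSwitch(const std::vector<double>& data,
                                        const std::vector<double>& codes,
                                        std::size_t numObjects,
                                        std::size_t numCodes,
                                        std::size_t numVars) {
    assert(data.size() == numObjects * numVars);
    assert(codes.size() == numCodes * numVars);
    std::vector<double> dists(numObjects * numCodes, 0.0);
    for (std::size_t i = 0; i < numObjects; ++i) {
        // The NaN scan runs once per object, not once per (object, code) pair.
        const bool withNaN = hasNaN(data, i, numVars);
        for (std::size_t j = 0; j < numCodes; ++j) {
            double dist = 0.0;
            if (!withNaN) {
                // Fast path: no per-element NaN test.
                for (std::size_t k = 0; k < numVars; ++k) {
                    const double tmp = data[i * numVars + k] - codes[j * numVars + k];
                    dist += tmp * tmp;
                }
            } else {
                // Slow path: skip NaNs and count the usable entries.
                std::size_t nValid = 0;
                for (std::size_t k = 0; k < numVars; ++k) {
                    const double x = data[i * numVars + k];
                    if (!std::isnan(x)) {
                        const double tmp = x - codes[j * numVars + k];
                        dist += tmp * tmp;
                        ++nValid;
                    }
                }
                // Assumed correction: rescale to the full number of variables.
                dist = (nValid > 0)
                    ? dist * static_cast<double>(numVars) / static_cast<double>(nValid)
                    : std::numeric_limits<double>::quiet_NaN();
            }
            dists[i * numCodes + j] = dist;
        }
    }
    return dists;
}
```

Objects without NaNs never hit the isnan test in the inner loop, while objects with NaNs still get a distance based on the values that are present.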

Nevertheless, the question remains why the Rcpp isNaN call is so much slower.

On 12/13/2016 2:04 PM, xian at unm.edu (Christian Gunning) wrote:
|    for (i = 0; i < numObjects; i++) {
|      for (j = 0; j < numCodes; j++) {
|        dist = 0;
|        for (k = 0; k < numVars; k++) {
|          if (!ISNAN(data[i * numVars + k])) {
|            tmp = data[i * numVars + k] - codes[j * numVars + k];

Why not drop data and codes and use sData1(i,k) - sData2(j,k)?
Or better yet, just use the original code with NumericMatrix:
sData1[i * numVars + k] does the right thing.
I don't get any timing difference based on this change.

Using Rcpp sugar
(https://cran.r-project.org/package=Rcpp/vignettes/Rcpp-sugar.pdf),
and moving the call outside the loop, appears to do the right thing.
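
For readers without the repository at hand, the "outside the loop" part can be sketched in plain C++ as follows. This is not the code from the linked diff; it only mimics the idea of computing a missing-value mask once (roughly the role an is_na-style sugar call would play) and reusing it inside the loops. The names naMask and maskedDistances are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Flag every NaN in a single pass; 1 marks a missing value.
std::vector<char> naMask(const std::vector<double>& x) {
    std::vector<char> mask(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) {
        mask[n] = std::isnan(x[n]) ? 1 : 0;
    }
    return mask;
}

// Sum of squared differences over the non-missing entries of each object.
// The result is laid out as dists[i * numCodes + j].
std::vector<double> maskedDistances(const std::vector<double>& data,
                                    const std::vector<double>& codes,
                                    std::size_t numObjects,
                                    std::size_t numCodes,
                                    std::size_t numVars) {
    assert(data.size() == numObjects * numVars);
    assert(codes.size() == numCodes * numVars);
    // The NaN test is hoisted: one pass over data instead of one test
    // per (object, code, variable) triple.
    const std::vector<char> missing = naMask(data);
    std::vector<double> dists(numObjects * numCodes, 0.0);
    for (std::size_t i = 0; i < numObjects; ++i) {
        for (std::size_t j = 0; j < numCodes; ++j) {
            double dist = 0.0;
            for (std::size_t k = 0; k < numVars; ++k) {
                const std::size_t idx = i * numVars + k;
                if (!missing[idx]) {
                    const double tmp = data[idx] - codes[j * numVars + k];
                    dist += tmp * tmp;
                }
            }
            dists[i * numCodes + j] = dist;
        }
    }
    return dists;
}
```

The inner loop now reads a precomputed flag rather than calling the NaN test numCodes times for every data value.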

## modified example
## see edits here:
https://github.com/helmingstay/rcpp-timings/blob/master/diff/rcppdist.cpp#L24
git clone https://github.com/helmingstay/rcpp-timings
cd rcpp-timings/diff
R --vanilla < glue.R
_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
