1) Figure out where you are making full copies of your data structures (casting NumericVector to std::vector<double>) 2) Use eg R::rbinom instead of rbinom if you do not want the return as an Rcpp Vector type. 3) Move the outer loop (your replicate call) into C++
The copying of data is what is slowing things down most likely. T. http://www.keittlab.org/ On Tue, Aug 11, 2015 at 10:13 AM, FHRB Toledo <fernandohtol...@gmail.com> wrote: > Dear All, > > It is my first post here at Rcpp-devel List. I'm passionate with the > awesome work and with all perspectives that it can opens for speed up codes > and embedding of C++ in R package seamless. Thus, I wish to thank all > developers. > > After, some reading over the Book, online vignettes and so on. I > translated one of my R scripts using two approaches: taking all advantages > of Rcpp; and using just the "seamless" integration, i.e., casting SEXP > objects, working only with std:: objects and wrapping the return. > > So far so good, however, by my surprise I found huge difference regarding > the performance between both implementations. If it's not enough, even the > raw R code shown better performance compared to the "full Rcpp". > > I made a short example to show you the point. First of all let me explain > the process for who do not have any idea about meiosis: We take a > hypothetical genome (that gives us the SNPs positions, vector Map); then > based on this length we draw a Poisson value (where length is lambda), with > this value (e.g., n) we draw n uniform within the genome length. With this, > after some manipulations (findInterval % 2) we know where Crossing Over > (recombination) happened. Then we can take this prototype and generate a > gamete of some entry genotype (DNA tapes A and B). > > The codes are: > > Raw R script: > > ## Meiosis process through Poisson/Uniform > MeiosisR <- function( Map, TapeA, TapeB ) > { > ## Number of Crossing Overs **Poisson Draw** > nCrossOver <- rpois(n = 1, lambda = ceiling(max(Map))) > ## Assigning Crossing Overs on the Genome **Uniform Draw** > posCrossOver <- sort(runif(n = nCrossOver, min = 0, max = > max(Map))) > ## Building the "Prototype" **mixture between the 2 haplotypes** > Prototype <- findInterval(x = Map, vec = posCrossOver) %% 2 > ## Raffle for the reference haplotype > if ( as.logical( rbinom(n = 1, size = 1, prob = .5) ) ) > { > ## Building the gamete > Gamete <- ifelse(test = as.logical(Prototype), > yes = TapeA, > no = TapeB) > } else { > ## idem with other "guide" > Gamete <- ifelse(test = as.logical(Prototype), > yes = TapeB, > no = TapeA) > } > return( Gamete ) > } > > STL cpp version: > > // [[Rcpp::plugins(cpp11)]] > #include <algorithm> > #include <Rcpp.h> > > using namespace Rcpp; > > // [[Rcpp::export]] > std::vector<int> MeiosisSTL( std::vector<double> Map, > std::vector<int> TapeA, > std::vector<int> TapeB ) { > > // Obtaining Chromossome's parameter: length, number & positions of > SNPs > double L = *std::max_element(Map.begin(), Map.end()); > int Size = Map.size(); > > // RNG process: draw the number & positions of Crossing Overs > double CrossingNumber = as<double>( rpois(1, L) ); > std::vector<double> CrossingPositions = as<std::vector<double> >( runif( > CrossingNumber, 0.0, L ) ); > std::sort(CrossingPositions.begin(), CrossingPositions.end()); > > // Building a Gamete "Prototype" > std::vector<bool> Prototype( Size ); > > // "findInterval" adapted from Wickham's Rcpp Chapter ( > http://adv-r.had.co.nz/Rcpp.html) > std::vector<double>::iterator Iter, Positions; > std::vector<bool>::iterator ProtoIter; > std::vector<double>::iterator MapBegin = Map.begin(); > std::vector<double>::iterator MapEnd = Map.end(); > std::vector<bool>::iterator ProtoBegin = Prototype.begin(); > std::vector<double>::iterator CrossPosBegin = CrossingPositions.begin(); > std::vector<double>::iterator CrossPosEnd = CrossingPositions.end(); > > for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter, > ++ProtoIter) { > Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter); > *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or > even > } > > std::vector<int> Gamete; > Gamete.reserve( Size ); > // raffle: which tape shall be the "guide"? > bool v01 = as<bool>( rbinom(1, 1, .5) ); > > if ( v01 ) { > for ( int i = 0; i < Size; i++ ) { > // Build the Gamete from the Prototype > Gamete.push_back( ((Prototype.at( i )) ? TapeA.at( i ) : TapeB.at( i > )) ); > } > } else { > for ( int i = 0; i < Size; i++ ) { > Gamete.push_back( ((Prototype.at( i )) ? TapeB.at( i ) : TapeA.at( i > )) ); > } > } > > return Gamete ; > } > > Full Rcpp version: > > // [[Rcpp::plugins(cpp11)]] > #include <algorithm> > #include <Rcpp.h> > > using namespace Rcpp; > > // [[Rcpp::export]] > IntegerVector MeiosisRcpp( NumericVector Map, > IntegerVector TapeA, > IntegerVector TapeB ) { > > double L = max( Map ); // sugar :) > int Size = Map.size(); > > double CrossingNumber = as<double>( rpois(1, L) ); > NumericVector CrossingPositions = runif( CrossingNumber, 0.0, L ); > std::sort(CrossingPositions.begin(), CrossingPositions.end()); > > LogicalVector Prototype( Size ); > > NumericVector::iterator Iter, Positions; > LogicalVector::iterator ProtoIter; > NumericVector::iterator MapBegin = Map.begin(); > NumericVector::iterator MapEnd = Map.end(); > LogicalVector::iterator ProtoBegin = Prototype.begin(); > NumericVector::iterator CrossPosBegin = CrossingPositions.begin(); > NumericVector::iterator CrossPosEnd = CrossingPositions.end(); > > for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter, > ++ProtoIter) { > Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter); > *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or > even > } > > IntegerVector Gamete; > bool v01 = as<bool>( rbinom(1, 1, .5) ); > > if ( v01 ) { > for ( int i = 0; i < Size; i++ ) { > Gamete.push_back( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ]) ); > > } > } else { > for ( int i = 0; i < Size; i++ ) { > Gamete.push_back( ((Prototype[ i ]) ? TapeB[ i ] : TapeA[ i ]) ); > } > } > > return Gamete ; > } > > If we take the entry as: > > ## Genome Setup > N <- 1E4 # number of gametes > nM <- 400 # number of Markers (SNPs) > L <- 4 # genome length (in Morgans) > > ## building the genome & full heterozigous genotype (diploid, 2 DNA tapes) > ## map <- seq(from = .1, to = L, length.out = nM) # SNPs positions > **regular** > map <- sort(runif(nM, 0, L)) # idem **random** > tapeA <- rep(1, nM) > tapeB <- rep(2, nM) > > You will found results like that: > > evaluated expressions: > > RawR = replicate(n = N, expr = MeiosisR(Map = map, TapeA = tapeA, TapeB = > tapeB)) > cppSTL = replicate(n = N, expr = MeiosisSTL(Map = map, TapeA = tapeA, > TapeB = tapeB)) > Rcpp = replicate(n = N, expr = MeiosisRcpp(Map = map, TapeA = tapeA, TapeB > = tapeB)) > > - microbenchmark: > > Unit: seconds > expr min lq mean median uq max neval > cld > RawR 1.2169414 1.2655129 1.2978896 1.286561 1.3055376 1.5020700 100 > b > cppSTL 0.1721323 0.1773904 0.1867463 0.180910 0.1863087 0.2835556 100 > a > Rcpp 1.9190921 2.0101288 2.0633424 2.046132 2.1081319 2.2832565 100 > c > > - rbenchmark: > > test replications elapsed user.self sys.self > 2 cppSTL 100 18.048 18.020 0.020 > 1 RawR 100 126.264 124.888 1.324 > 3 Rcpp 100 208.647 208.548 0.012 > > Just to report, my sessionInfo() is as follow: > > sessionInfo() # Useful Infos **Reproducible Research** > R version 3.2.1 (2015-06-18) > Platform: x86_64-pc-linux-gnu (64-bit) > Running under: Debian GNU/Linux 8 (jessie) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=en_US.utf8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] rbenchmark_1.0.0 microbenchmark_1.4-2 Rcpp_0.11.5 > > loaded via a namespace (and not attached): > [1] MASS_7.3-40 colorspace_1.2-6 scales_0.2.4 compiler_3.2.1 > [5] plyr_1.8.2 tools_3.2.1 gtable_0.1.2 reshape2_1.4.1 > [9] ggplot2_1.0.1 grid_3.2.1 stringr_0.6.2 digest_0.6.8 > [13] proto_0.3-10 munsell_0.4.2 > > After that, I wish to request help to evaluate those results. Is "full > Rcpp" not a good choice when using iterators? Let me know if I am not > taking care about something that may change the things. You may notice that > I left all parameters as fixed as possible, then, they should not influence > on that. > > I am open to share the files or give further information. > > Thank you very much indeed regarding any insight on that. > > Cheers, > FH > > _______________________________________________ > Rcpp-devel mailing list > Rcpp-devel@lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel >
_______________________________________________ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel