It also looks like you are not initializing the size of Gamete in the "full Rcpp" version, so it is resizing on every loop iteration...
Best, -- Hao Ye h...@ucsd.edu On Tue, Aug 11, 2015 at 12:06 PM, Tim Keitt <tke...@utexas.edu> wrote: > 1) Figure out where you are making full copies of your data structures > (casting NumericVector to std::vector<double>) > 2) Use eg R::rbinom instead of rbinom if you do not want the return as an > Rcpp Vector type. > 3) Move the outer loop (your replicate call) into C++ > > The copying of data is what is slowing things down most likely. > > T. > > http://www.keittlab.org/ > > On Tue, Aug 11, 2015 at 10:13 AM, FHRB Toledo <fernandohtol...@gmail.com> > wrote: > >> Dear All, >> >> It is my first post here at Rcpp-devel List. I'm passionate with the >> awesome work and with all perspectives that it can opens for speed up codes >> and embedding of C++ in R package seamless. Thus, I wish to thank all >> developers. >> >> After, some reading over the Book, online vignettes and so on. I >> translated one of my R scripts using two approaches: taking all advantages >> of Rcpp; and using just the "seamless" integration, i.e., casting SEXP >> objects, working only with std:: objects and wrapping the return. >> >> So far so good, however, by my surprise I found huge difference regarding >> the performance between both implementations. If it's not enough, even the >> raw R code shown better performance compared to the "full Rcpp". >> >> I made a short example to show you the point. First of all let me explain >> the process for who do not have any idea about meiosis: We take a >> hypothetical genome (that gives us the SNPs positions, vector Map); then >> based on this length we draw a Poisson value (where length is lambda), with >> this value (e.g., n) we draw n uniform within the genome length. With this, >> after some manipulations (findInterval % 2) we know where Crossing Over >> (recombination) happened. Then we can take this prototype and generate a >> gamete of some entry genotype (DNA tapes A and B). >> >> The codes are: >> >> Raw R script: >> >> ## Meiosis process through Poisson/Uniform >> MeiosisR <- function( Map, TapeA, TapeB ) >> { >> ## Number of Crossing Overs **Poisson Draw** >> nCrossOver <- rpois(n = 1, lambda = ceiling(max(Map))) >> ## Assigning Crossing Overs on the Genome **Uniform Draw** >> posCrossOver <- sort(runif(n = nCrossOver, min = 0, max = >> max(Map))) >> ## Building the "Prototype" **mixture between the 2 haplotypes** >> Prototype <- findInterval(x = Map, vec = posCrossOver) %% 2 >> ## Raffle for the reference haplotype >> if ( as.logical( rbinom(n = 1, size = 1, prob = .5) ) ) >> { >> ## Building the gamete >> Gamete <- ifelse(test = as.logical(Prototype), >> yes = TapeA, >> no = TapeB) >> } else { >> ## idem with other "guide" >> Gamete <- ifelse(test = as.logical(Prototype), >> yes = TapeB, >> no = TapeA) >> } >> return( Gamete ) >> } >> >> STL cpp version: >> >> // [[Rcpp::plugins(cpp11)]] >> #include <algorithm> >> #include <Rcpp.h> >> >> using namespace Rcpp; >> >> // [[Rcpp::export]] >> std::vector<int> MeiosisSTL( std::vector<double> Map, >> std::vector<int> TapeA, >> std::vector<int> TapeB ) { >> >> // Obtaining Chromossome's parameter: length, number & positions of >> SNPs >> double L = *std::max_element(Map.begin(), Map.end()); >> int Size = Map.size(); >> >> // RNG process: draw the number & positions of Crossing Overs >> double CrossingNumber = as<double>( rpois(1, L) ); >> std::vector<double> CrossingPositions = as<std::vector<double> >( >> runif( CrossingNumber, 0.0, L ) ); >> std::sort(CrossingPositions.begin(), CrossingPositions.end()); >> >> // Building a Gamete "Prototype" >> std::vector<bool> Prototype( Size ); >> >> // "findInterval" adapted from Wickham's Rcpp Chapter ( >> http://adv-r.had.co.nz/Rcpp.html) >> std::vector<double>::iterator Iter, Positions; >> std::vector<bool>::iterator ProtoIter; >> std::vector<double>::iterator MapBegin = Map.begin(); >> std::vector<double>::iterator MapEnd = Map.end(); >> std::vector<bool>::iterator ProtoBegin = Prototype.begin(); >> std::vector<double>::iterator CrossPosBegin = CrossingPositions.begin(); >> std::vector<double>::iterator CrossPosEnd = CrossingPositions.end(); >> >> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter, >> ++ProtoIter) { >> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter); >> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or >> even >> } >> >> std::vector<int> Gamete; >> Gamete.reserve( Size ); >> // raffle: which tape shall be the "guide"? >> bool v01 = as<bool>( rbinom(1, 1, .5) ); >> >> if ( v01 ) { >> for ( int i = 0; i < Size; i++ ) { >> // Build the Gamete from the Prototype >> Gamete.push_back( ((Prototype.at( i )) ? TapeA.at( i ) : TapeB.at( >> i )) ); >> } >> } else { >> for ( int i = 0; i < Size; i++ ) { >> Gamete.push_back( ((Prototype.at( i )) ? TapeB.at( i ) : TapeA.at( >> i )) ); >> } >> } >> >> return Gamete ; >> } >> >> Full Rcpp version: >> >> // [[Rcpp::plugins(cpp11)]] >> #include <algorithm> >> #include <Rcpp.h> >> >> using namespace Rcpp; >> >> // [[Rcpp::export]] >> IntegerVector MeiosisRcpp( NumericVector Map, >> IntegerVector TapeA, >> IntegerVector TapeB ) { >> >> double L = max( Map ); // sugar :) >> int Size = Map.size(); >> >> double CrossingNumber = as<double>( rpois(1, L) ); >> NumericVector CrossingPositions = runif( CrossingNumber, 0.0, L ); >> std::sort(CrossingPositions.begin(), CrossingPositions.end()); >> >> LogicalVector Prototype( Size ); >> >> NumericVector::iterator Iter, Positions; >> LogicalVector::iterator ProtoIter; >> NumericVector::iterator MapBegin = Map.begin(); >> NumericVector::iterator MapEnd = Map.end(); >> LogicalVector::iterator ProtoBegin = Prototype.begin(); >> NumericVector::iterator CrossPosBegin = CrossingPositions.begin(); >> NumericVector::iterator CrossPosEnd = CrossingPositions.end(); >> >> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter, >> ++ProtoIter) { >> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter); >> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or >> even >> } >> >> IntegerVector Gamete; >> bool v01 = as<bool>( rbinom(1, 1, .5) ); >> >> if ( v01 ) { >> for ( int i = 0; i < Size; i++ ) { >> Gamete.push_back( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ]) ); >> >> } >> } else { >> for ( int i = 0; i < Size; i++ ) { >> Gamete.push_back( ((Prototype[ i ]) ? TapeB[ i ] : TapeA[ i ]) ); >> } >> } >> >> return Gamete ; >> } >> >> If we take the entry as: >> >> ## Genome Setup >> N <- 1E4 # number of gametes >> nM <- 400 # number of Markers (SNPs) >> L <- 4 # genome length (in Morgans) >> >> ## building the genome & full heterozigous genotype (diploid, 2 DNA tapes) >> ## map <- seq(from = .1, to = L, length.out = nM) # SNPs positions >> **regular** >> map <- sort(runif(nM, 0, L)) # idem **random** >> tapeA <- rep(1, nM) >> tapeB <- rep(2, nM) >> >> You will found results like that: >> >> evaluated expressions: >> >> RawR = replicate(n = N, expr = MeiosisR(Map = map, TapeA = tapeA, TapeB = >> tapeB)) >> cppSTL = replicate(n = N, expr = MeiosisSTL(Map = map, TapeA = tapeA, >> TapeB = tapeB)) >> Rcpp = replicate(n = N, expr = MeiosisRcpp(Map = map, TapeA = tapeA, >> TapeB = tapeB)) >> >> - microbenchmark: >> >> Unit: seconds >> expr min lq mean median uq max neval >> cld >> RawR 1.2169414 1.2655129 1.2978896 1.286561 1.3055376 1.5020700 100 >> b >> cppSTL 0.1721323 0.1773904 0.1867463 0.180910 0.1863087 0.2835556 100 >> a >> Rcpp 1.9190921 2.0101288 2.0633424 2.046132 2.1081319 2.2832565 >> 100 c >> >> - rbenchmark: >> >> test replications elapsed user.self sys.self >> 2 cppSTL 100 18.048 18.020 0.020 >> 1 RawR 100 126.264 124.888 1.324 >> 3 Rcpp 100 208.647 208.548 0.012 >> >> Just to report, my sessionInfo() is as follow: >> >> sessionInfo() # Useful Infos **Reproducible Research** >> R version 3.2.1 (2015-06-18) >> Platform: x86_64-pc-linux-gnu (64-bit) >> Running under: Debian GNU/Linux 8 (jessie) >> >> locale: >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >> [7] LC_PAPER=en_US.utf8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] rbenchmark_1.0.0 microbenchmark_1.4-2 Rcpp_0.11.5 >> >> loaded via a namespace (and not attached): >> [1] MASS_7.3-40 colorspace_1.2-6 scales_0.2.4 compiler_3.2.1 >> [5] plyr_1.8.2 tools_3.2.1 gtable_0.1.2 reshape2_1.4.1 >> [9] ggplot2_1.0.1 grid_3.2.1 stringr_0.6.2 digest_0.6.8 >> [13] proto_0.3-10 munsell_0.4.2 >> >> After that, I wish to request help to evaluate those results. Is "full >> Rcpp" not a good choice when using iterators? Let me know if I am not >> taking care about something that may change the things. You may notice that >> I left all parameters as fixed as possible, then, they should not influence >> on that. >> >> I am open to share the files or give further information. >> >> Thank you very much indeed regarding any insight on that. >> >> Cheers, >> FH >> >> _______________________________________________ >> Rcpp-devel mailing list >> Rcpp-devel@lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel >> > > > _______________________________________________ > Rcpp-devel mailing list > Rcpp-devel@lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel >
_______________________________________________ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel