---------- Forwarded message ---------- From: FHRB Toledo <fernandohtol...@gmail.com> Date: 11 August 2015 at 12:07 Subject: Re: [Rcpp-devel] Is Rcpp:: not a good choice for iterators? To: Hao Ye <h...@ucsd.edu>
Hi Hao Ye, Great, , you are right! I tried your suggestion through: NumericVector Gamete = no_init( Size ); And then change the loop such as: for ( int i = 0; i < Size; i++ ) { Gamete[ i ] = ( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ]) ); } With those, the results seem much more reasonable: expr min lq mean median uq max neval cld RawR 1.2258215 1.2904011 1.3287819 1.3133870 1.3466908 1.6158531 100 c cppSTL 0.1741156 0.1791741 0.1933450 0.1835020 0.2083610 0.2628995 100 a Rcpp 0.2019144 0.2086193 0.2217021 0.2154814 0.2234401 0.4010162 100 b and as rebuttal: test replications elapsed user.self sys.self 2 cppSTL 100 18.788 18.700 0.080 3 Rcpp 100 21.451 21.444 0.000 1 RawR 100 133.003 131.604 1.344 Consistently, both C++ versions are faster than raw R. Thank you very much indeed. P.S.: As I mentioned, I also will try the Eddelbuette's comment and asap report the benchmark for complementarity over the thread. Cheers, FH On 11 August 2015 at 11:16, Hao Ye <h...@ucsd.edu> wrote: > It also looks like you are not initializing the size of Gamete in the > "full Rcpp" version, so it is resizing on every loop iteration... > > Best, > -- > Hao Ye > h...@ucsd.edu > > On Tue, Aug 11, 2015 at 12:06 PM, Tim Keitt <tke...@utexas.edu> wrote: > >> 1) Figure out where you are making full copies of your data structures >> (casting NumericVector to std::vector<double>) >> 2) Use eg R::rbinom instead of rbinom if you do not want the return as an >> Rcpp Vector type. >> 3) Move the outer loop (your replicate call) into C++ >> >> The copying of data is what is slowing things down most likely. >> >> T. >> >> http://www.keittlab.org/ >> >> On Tue, Aug 11, 2015 at 10:13 AM, FHRB Toledo <fernandohtol...@gmail.com> >> wrote: >> >>> Dear All, >>> >>> It is my first post here at Rcpp-devel List. I'm passionate with the >>> awesome work and with all perspectives that it can opens for speed up codes >>> and embedding of C++ in R package seamless. Thus, I wish to thank all >>> developers. >>> >>> After, some reading over the Book, online vignettes and so on. I >>> translated one of my R scripts using two approaches: taking all advantages >>> of Rcpp; and using just the "seamless" integration, i.e., casting SEXP >>> objects, working only with std:: objects and wrapping the return. >>> >>> So far so good, however, by my surprise I found huge difference >>> regarding the performance between both implementations. If it's not enough, >>> even the raw R code shown better performance compared to the "full Rcpp". >>> >>> I made a short example to show you the point. First of all let me >>> explain the process for who do not have any idea about meiosis: We take a >>> hypothetical genome (that gives us the SNPs positions, vector Map); then >>> based on this length we draw a Poisson value (where length is lambda), with >>> this value (e.g., n) we draw n uniform within the genome length. With this, >>> after some manipulations (findInterval % 2) we know where Crossing Over >>> (recombination) happened. Then we can take this prototype and generate a >>> gamete of some entry genotype (DNA tapes A and B). >>> >>> The codes are: >>> >>> Raw R script: >>> >>> ## Meiosis process through Poisson/Uniform >>> MeiosisR <- function( Map, TapeA, TapeB ) >>> { >>> ## Number of Crossing Overs **Poisson Draw** >>> nCrossOver <- rpois(n = 1, lambda = ceiling(max(Map))) >>> ## Assigning Crossing Overs on the Genome **Uniform Draw** >>> posCrossOver <- sort(runif(n = nCrossOver, min = 0, max = >>> max(Map))) >>> ## Building the "Prototype" **mixture between the 2 haplotypes** >>> Prototype <- findInterval(x = Map, vec = posCrossOver) %% 2 >>> ## Raffle for the reference haplotype >>> if ( as.logical( rbinom(n = 1, size = 1, prob = .5) ) ) >>> { >>> ## Building the gamete >>> Gamete <- ifelse(test = as.logical(Prototype), >>> yes = TapeA, >>> no = TapeB) >>> } else { >>> ## idem with other "guide" >>> Gamete <- ifelse(test = as.logical(Prototype), >>> yes = TapeB, >>> no = TapeA) >>> } >>> return( Gamete ) >>> } >>> >>> STL cpp version: >>> >>> // [[Rcpp::plugins(cpp11)]] >>> #include <algorithm> >>> #include <Rcpp.h> >>> >>> using namespace Rcpp; >>> >>> // [[Rcpp::export]] >>> std::vector<int> MeiosisSTL( std::vector<double> Map, >>> std::vector<int> TapeA, >>> std::vector<int> TapeB ) { >>> >>> // Obtaining Chromossome's parameter: length, number & positions of >>> SNPs >>> double L = *std::max_element(Map.begin(), Map.end()); >>> int Size = Map.size(); >>> >>> // RNG process: draw the number & positions of Crossing Overs >>> double CrossingNumber = as<double>( rpois(1, L) ); >>> std::vector<double> CrossingPositions = as<std::vector<double> >( >>> runif( CrossingNumber, 0.0, L ) ); >>> std::sort(CrossingPositions.begin(), CrossingPositions.end()); >>> >>> // Building a Gamete "Prototype" >>> std::vector<bool> Prototype( Size ); >>> >>> // "findInterval" adapted from Wickham's Rcpp Chapter ( >>> http://adv-r.had.co.nz/Rcpp.html) >>> std::vector<double>::iterator Iter, Positions; >>> std::vector<bool>::iterator ProtoIter; >>> std::vector<double>::iterator MapBegin = Map.begin(); >>> std::vector<double>::iterator MapEnd = Map.end(); >>> std::vector<bool>::iterator ProtoBegin = Prototype.begin(); >>> std::vector<double>::iterator CrossPosBegin = >>> CrossingPositions.begin(); >>> std::vector<double>::iterator CrossPosEnd = CrossingPositions.end(); >>> >>> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter, >>> ++ProtoIter) { >>> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter); >>> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or >>> even >>> } >>> >>> std::vector<int> Gamete; >>> Gamete.reserve( Size ); >>> // raffle: which tape shall be the "guide"? >>> bool v01 = as<bool>( rbinom(1, 1, .5) ); >>> >>> if ( v01 ) { >>> for ( int i = 0; i < Size; i++ ) { >>> // Build the Gamete from the Prototype >>> Gamete.push_back( ((Prototype.at( i )) ? TapeA.at( i ) : TapeB.at( >>> i )) ); >>> } >>> } else { >>> for ( int i = 0; i < Size; i++ ) { >>> Gamete.push_back( ((Prototype.at( i )) ? TapeB.at( i ) : TapeA.at( >>> i )) ); >>> } >>> } >>> >>> return Gamete ; >>> } >>> >>> Full Rcpp version: >>> >>> // [[Rcpp::plugins(cpp11)]] >>> #include <algorithm> >>> #include <Rcpp.h> >>> >>> using namespace Rcpp; >>> >>> // [[Rcpp::export]] >>> IntegerVector MeiosisRcpp( NumericVector Map, >>> IntegerVector TapeA, >>> IntegerVector TapeB ) { >>> >>> double L = max( Map ); // sugar :) >>> int Size = Map.size(); >>> >>> double CrossingNumber = as<double>( rpois(1, L) ); >>> NumericVector CrossingPositions = runif( CrossingNumber, 0.0, L ); >>> std::sort(CrossingPositions.begin(), CrossingPositions.end()); >>> >>> LogicalVector Prototype( Size ); >>> >>> NumericVector::iterator Iter, Positions; >>> LogicalVector::iterator ProtoIter; >>> NumericVector::iterator MapBegin = Map.begin(); >>> NumericVector::iterator MapEnd = Map.end(); >>> LogicalVector::iterator ProtoBegin = Prototype.begin(); >>> NumericVector::iterator CrossPosBegin = CrossingPositions.begin(); >>> NumericVector::iterator CrossPosEnd = CrossingPositions.end(); >>> >>> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter, >>> ++ProtoIter) { >>> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter); >>> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or >>> even >>> } >>> >>> IntegerVector Gamete; >>> bool v01 = as<bool>( rbinom(1, 1, .5) ); >>> >>> if ( v01 ) { >>> for ( int i = 0; i < Size; i++ ) { >>> Gamete.push_back( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ]) ); >>> >>> } >>> } else { >>> for ( int i = 0; i < Size; i++ ) { >>> Gamete.push_back( ((Prototype[ i ]) ? TapeB[ i ] : TapeA[ i ]) ); >>> } >>> } >>> >>> return Gamete ; >>> } >>> >>> If we take the entry as: >>> >>> ## Genome Setup >>> N <- 1E4 # number of gametes >>> nM <- 400 # number of Markers (SNPs) >>> L <- 4 # genome length (in Morgans) >>> >>> ## building the genome & full heterozigous genotype (diploid, 2 DNA >>> tapes) >>> ## map <- seq(from = .1, to = L, length.out = nM) # SNPs positions >>> **regular** >>> map <- sort(runif(nM, 0, L)) # idem **random** >>> tapeA <- rep(1, nM) >>> tapeB <- rep(2, nM) >>> >>> You will found results like that: >>> >>> evaluated expressions: >>> >>> RawR = replicate(n = N, expr = MeiosisR(Map = map, TapeA = tapeA, TapeB >>> = tapeB)) >>> cppSTL = replicate(n = N, expr = MeiosisSTL(Map = map, TapeA = tapeA, >>> TapeB = tapeB)) >>> Rcpp = replicate(n = N, expr = MeiosisRcpp(Map = map, TapeA = tapeA, >>> TapeB = tapeB)) >>> >>> - microbenchmark: >>> >>> Unit: seconds >>> expr min lq mean median uq max neval >>> cld >>> RawR 1.2169414 1.2655129 1.2978896 1.286561 1.3055376 1.5020700 >>> 100 b >>> cppSTL 0.1721323 0.1773904 0.1867463 0.180910 0.1863087 0.2835556 100 >>> a >>> Rcpp 1.9190921 2.0101288 2.0633424 2.046132 2.1081319 2.2832565 >>> 100 c >>> >>> - rbenchmark: >>> >>> test replications elapsed user.self sys.self >>> 2 cppSTL 100 18.048 18.020 0.020 >>> 1 RawR 100 126.264 124.888 1.324 >>> 3 Rcpp 100 208.647 208.548 0.012 >>> >>> Just to report, my sessionInfo() is as follow: >>> >>> sessionInfo() # Useful Infos **Reproducible Research** >>> R version 3.2.1 (2015-06-18) >>> Platform: x86_64-pc-linux-gnu (64-bit) >>> Running under: Debian GNU/Linux 8 (jessie) >>> >>> locale: >>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >>> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >>> [7] LC_PAPER=en_US.utf8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] rbenchmark_1.0.0 microbenchmark_1.4-2 Rcpp_0.11.5 >>> >>> loaded via a namespace (and not attached): >>> [1] MASS_7.3-40 colorspace_1.2-6 scales_0.2.4 compiler_3.2.1 >>> [5] plyr_1.8.2 tools_3.2.1 gtable_0.1.2 reshape2_1.4.1 >>> [9] ggplot2_1.0.1 grid_3.2.1 stringr_0.6.2 digest_0.6.8 >>> [13] proto_0.3-10 munsell_0.4.2 >>> >>> After that, I wish to request help to evaluate those results. Is "full >>> Rcpp" not a good choice when using iterators? Let me know if I am not >>> taking care about something that may change the things. You may notice that >>> I left all parameters as fixed as possible, then, they should not influence >>> on that. >>> >>> I am open to share the files or give further information. >>> >>> Thank you very much indeed regarding any insight on that. >>> >>> Cheers, >>> FH >>> >>> _______________________________________________ >>> Rcpp-devel mailing list >>> Rcpp-devel@lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel >>> >> >> >> _______________________________________________ >> Rcpp-devel mailing list >> Rcpp-devel@lists.r-forge.r-project.org >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel >> > >
_______________________________________________ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel