I agree with Dirk. Take a look at the arma::mat constructor in fastLm. If you are not modifying M, use the raw memory constructor.
http://gallery.rcpp.org/articles/fast-linear-model-with-armadillo/ I don't see any other way to avoid using Rcpp::wrap in the return statement. Note there may be other places you copy memory, for instance, in Prod1 you write | mat prodM = M ; | prodM = trans(M) * M ; I think the first statement is not necessary. Dale Smith, Ph.D. Senior Financial Quantitative Analyst Risk & Compliance Fiserv Office: 678-375-5315 www.fiserv.com -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Dirk Eddelbuettel Sent: Wednesday, April 17, 2013 9:17 AM To: [email protected] Cc: [email protected] Subject: Re: [Rcpp-devel] Best way to compute M'M if M is triangular using RcppArmadillo On 17 April 2013 at 14:24, F.Tusell wrote: | Not a question strictly about Rcpp but hope this is a right place to | ask. | | I am trying to find out what is the fastest possible way to compute | M'M for M upper triangular. I have coded, | | // [[Rcpp::export]] | SEXP Prod1(SEXP M_) { | const mat M = as<mat>(M_) ; | mat prodM = M ; | prodM = trans(M) * M ; | return(wrap(prodM)) ; | } | | // [[Rcpp::export]] | SEXP Prod2(SEXP M_) { | const mat M = as<mat>(M_) ; | mat prodM = M ; | prodM = trimatu(M).t() * trimatu(M) ; | return(wrap(prodM)) ; | } | | | // [[Rcpp::export]] | SEXP Prod3(SEXP M_) { | mat M = as<mat>(M_) ; | int d = M.n_rows ; | mat prodM = M ; | double * vM = M.memptr() ; | double * vprodM = prodM.memptr() ; | const double one = 1 ; | const double zero = 0 ; | F77_CALL(dsyrk)("L","T",&d,&d,&one,vM,&d,&zero,vprodM,&d) ; | return(wrap(prodM)) ; | } | | and then tested with: | | require(RcppArmadillo) | require(microbenchmark) | sourceCpp("prods.cpp") | d <- 50 | a <- chol(crossprod(matrix(rnorm(d*d),d,d))) | | m <- microbenchmark( | r1 <- Prod1(a), | r2 <- Prod2(a), | r3 <- Prod3(a), | times=100 | ) | | This is what I get: | | > m | Unit: microseconds | expr min lq median uq max neval | r1 <- Prod1(a) 138.749 144.2260 148.4815 159.0830 2456.146 100 | r2 <- Prod2(a) 296.193 320.0275 329.4770 342.4185 2763.041 100 | r3 <- Prod3(a) 132.150 138.2590 140.9270 152.3675 218.719 100 | | Prod3 using BLAS dsyrk is about as fast as Prod1, using Armadillo. | I expected that telling Armadillo that M is upper triangular would | make for a fastest product, and the contrary seems true; Prod2 takes | about twice as much time as Prod1 and Prod3. | | Is there a faster way? You are going about this the right way by starting with empirics. Now, d=50 is not big so your (relative) savings will be rather small. Plus, the way you instantiate the Arma object DOES create extra copies on the arma object instantiation [ see discussion this list just last week ] as well as on exit -- which may dominate the timing. Try d=100, 200, ... 500 for comparison. Lastly, you could write prod3 without going to arma. Dirk -- Dirk Eddelbuettel | [email protected] | http://dirk.eddelbuettel.com _______________________________________________ Rcpp-devel mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel _______________________________________________ Rcpp-devel mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
