El mié, 17-04-2013 a las 14:01 +0000, Smith, Dale (Norcross) escribió: > I agree with Dirk. Take a look at the arma::mat constructor in fastLm. If you > are not modifying M, use the raw memory constructor. > > http://gallery.rcpp.org/articles/fast-linear-model-with-armadillo/ > > I don't see any other way to avoid using Rcpp::wrap in the return statement. > Note there may be other places you copy memory, for instance, in Prod1 you > write > > | mat prodM = M ; > | prodM = trans(M) * M ; >
True. But in Prod3 I needed a pre-existent target array to store the result, so I defined it also in Prod1 and Prod2 to make things alike. Taking a copy of M is indeed a very clumsy way to initialize an array of the right size. > I think the first statement is not necessary. > > Dale Smith, Ph.D. > Senior Financial Quantitative Analyst > Risk & Compliance > Fiserv > Office: 678-375-5315 > www.fiserv.com > > > -----Original Message----- > From: [email protected] > [mailto:[email protected]] On Behalf Of Dirk > Eddelbuettel > Sent: Wednesday, April 17, 2013 9:17 AM > To: [email protected] > Cc: [email protected] > Subject: Re: [Rcpp-devel] Best way to compute M'M if M is triangular using > RcppArmadillo > > > On 17 April 2013 at 14:24, F.Tusell wrote: > | Not a question strictly about Rcpp but hope this is a right place to > | ask. > | > | I am trying to find out what is the fastest possible way to compute > | M'M for M upper triangular. I have coded, > | > | // [[Rcpp::export]] > | SEXP Prod1(SEXP M_) { > | const mat M = as<mat>(M_) ; > | mat prodM = M ; > | prodM = trans(M) * M ; > | return(wrap(prodM)) ; > | } > | > | // [[Rcpp::export]] > | SEXP Prod2(SEXP M_) { > | const mat M = as<mat>(M_) ; > | mat prodM = M ; > | prodM = trimatu(M).t() * trimatu(M) ; > | return(wrap(prodM)) ; > | } > | > | > | // [[Rcpp::export]] > | SEXP Prod3(SEXP M_) { > | mat M = as<mat>(M_) ; > | int d = M.n_rows ; > | mat prodM = M ; > | double * vM = M.memptr() ; > | double * vprodM = prodM.memptr() ; > | const double one = 1 ; > | const double zero = 0 ; > | F77_CALL(dsyrk)("L","T",&d,&d,&one,vM,&d,&zero,vprodM,&d) ; > | return(wrap(prodM)) ; > | } > | > | and then tested with: > | > | require(RcppArmadillo) > | require(microbenchmark) > | sourceCpp("prods.cpp") > | d <- 50 > | a <- chol(crossprod(matrix(rnorm(d*d),d,d))) > | > | m <- microbenchmark( > | r1 <- Prod1(a), > | r2 <- Prod2(a), > | r3 <- Prod3(a), > | times=100 > | ) > | > | This is what I get: > | > | > m > | Unit: microseconds > | expr min lq median uq max neval > | r1 <- Prod1(a) 138.749 144.2260 148.4815 159.0830 2456.146 100 > | r2 <- Prod2(a) 296.193 320.0275 329.4770 342.4185 2763.041 100 > | r3 <- Prod3(a) 132.150 138.2590 140.9270 152.3675 218.719 100 > | > | Prod3 using BLAS dsyrk is about as fast as Prod1, using Armadillo. > | I expected that telling Armadillo that M is upper triangular would > | make for a fastest product, and the contrary seems true; Prod2 takes > | about twice as much time as Prod1 and Prod3. > | > | Is there a faster way? > > You are going about this the right way by starting with empirics. > > Now, d=50 is not big so your (relative) savings will be rather small. Plus, > the way you instantiate the Arma object DOES create extra copies on the arma > object instantiation [ see discussion this list just last week ] as well as > on exit -- which may dominate the timing. Try d=100, 200, ... 500 for > comparison. > > Lastly, you could write prod3 without going to arma. > > Dirk I did try larger d, and to my surprise the relative timings stay about the same. Conrad S provides one piece of information I was missing, that trimatu() and trimatl() only have (as yet) effect with inv() and solve(), hence their use in Prod3 is just a waste of time. This explains the results. For the time being, I will stick with trans(M) * M. A function like R's crossprod() in Armadillo would be great. Thank you all for your help. ft. _______________________________________________ Rcpp-devel mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
