The comment of Doug Bates on this thread motivated me to better examine opportunities for algorithm improvements in my R package c++ code written under Rcpp and RcppArmadillo. I performed a few changes: firstly, i removed all general matrix inverse computations, replacing many with triangular solving from cholesky decompositions (e.g. to conduct posterior sampling from a multivariate gaussian where the precision matrix is readily available). secondly, instead of repeatedly performing high-dimensional matrix computations composing sampled parameter and design objects during a sequential sampling scan, I made the computation once per iteration and then operated on the resulting large object by repeatedly extracting portions, pre-sampling, and inserting back the post-sampled result during a sequential scan. thirdly, while linear algebra compositions of parameters and data objects must be rendered repeatedly across posterior sampling iterations, compositions of design/data objects (with each other) may be performed only once and re-used. I had been sloppy and was repeatedly performing computations purely involving data objects inside the sampling loops. I replaced this approach by slicing and dicing the design objects and performing the matrix operations, once, and then repeatedly inserting the objects during sampling. While the first two steps - particularly the second - produced some runtime speed improvements, I was surprised that the third step produced only a very small improvement where I had expected a large reduction in runtime. I used RcppArmadillo and placed these sliced and diced matrix objects into fields from which I then extracted elements during sampling. So I replaced matrix computations with extractions from field containers. Is it possible that extraction speeds from fields and multiplying a chain of moderate-dimensioned matrices (e.g. 5 x 600) are of equivalent computational intensity? Was the compiler compensating for my hackery?
On Tue, Dec 11, 2012 at 7:07 AM, Honglang Wang <wanghonglang2...@gmail.com>wrote: > I tried out that if the dim of the matrix less than or equal to 4, inv() > and solve() have the same speed. > > > Best wishes! > > Honglang Wang > > Office C402 Wells Hall > Department of Statistics and Probability > Michigan State University > 1579 I Spartan Village, East Lansing, MI 48823 > wangh...@msu.edu > > > > On Mon, Dec 10, 2012 at 8:05 AM, c s <conradsand.a...@gmail.com> wrote: > >> Hi Honglang, >> >> I recommend looking into using the solve() function in Armadillo, instead >> of inv(). Using solve() will be considerably faster. >> >> More details here: >> http://arma.sourceforge.net/docs.html#solve >> >> >> >> On Thursday, December 6, 2012, Honglang Wang <wanghonglang2...@gmail.com> >> wrote: >> > The following is a full example although I don't know whether it's >> minimal or not: >> > >> > library(Rcpp) >> > library(RcppArmadillo) >> > sourceCpp("betahat_mod.cpp") >> > >> > #The following is data generation. >> > n=200 >> > m=20 >> > p=2 >> > t=runif(m*n,min=0, max=1) >> > X1=rnorm(m*n,0,1) >> > X1=as.matrix(1+2*exp(t)+X1) >> > X2=rnorm(m*n,0,1) >> > X2=as.matrix(3-4*t^2+X2) >> > X=cbind(X1,X2) >> > beta1=0.5*sin(t) >> > beta2=beta1 >> > rho=0.2 >> > sig2=1 >> > >> eps=unlist(lapply(1:n,function(x){as.matrix(arima.sim(list(ar=rho),m,sd=sqrt(sig2*(1-rho^2))))})) >> > y=as.matrix(X1*beta1+X2*beta2+eps) >> > >> > #A simple kernel function. >> > ker=function(x,h) >> > { >> > ans=x >> > lo=(abs(x)<h) >> > ans[lo]=(3/4)*(1-(x[lo]/h)^2)/h >> > ans[!lo]=0 >> > return(ans) >> > } >> > >> > >> > h=0.3 >> > >> > #assess the time for evaluating bethat of 100 t's: >> > system.time((betahatt=t(apply(as.matrix(t[1:100]),1,function(x) >> betahat(ker,x,X,y,t,h,m)$betahat)))) >> > >> > And the .cpp file is the following: >> > >> > // [[Rcpp::depends(RcppArmadillo)]] >> > >> > #include <RcppArmadillo.h> >> > >> > using namespace Rcpp; >> > >> > // [[Rcpp::export]] >> > List betahat(Function ker, double t0, NumericMatrix Xr, NumericMatrix >> yr, NumericVector tr, double h, int m) { >> > int n = Xr.nrow(), p = Xr.ncol(); >> > arma::mat X(Xr.begin(), n, p, false); >> > arma::mat y(yr.begin(), n, 1, false); >> > arma::colvec t(tr.begin(), tr.size(), false); >> > arma::mat T = X; >> > T.each_col() %= (t-t0)/h; >> > arma::mat D = arma::join_rows(X,T); >> > arma::vec kk =as<arma::vec>(ker(tr-t0,h)); >> > arma::mat W = (arma::diagmat(kk))/m; >> > arma::mat Inv = arma::trans(D)*W*D; >> > arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y; >> > arma::colvec betahat0(betahat.begin(),betahat.size()/2,false); >> > return List::create(Named("betahat") = betahat0); >> > } >> > >> > Best wishes! >> > >> > Honglang Wang >> > >> > Office C402 Wells Hall >> > Department of Statistics and Probability >> > Michigan State University >> > 1579 I Spartan Village, East Lansing, MI 48823 >> > wangh...@msu.edu >> > >> > >> > On Wed, Dec 5, 2012 at 4:07 AM, Christian Gunning <x...@unm.edu> wrote: >> >> >> >> Can you post a minimal full example? >> >> -Christian >> >> >> >> On Tue, Dec 4, 2012 at 8:39 PM, Honglang Wang >> >> <wanghonglang2...@gmail.com> wrote: >> >> > Yes, the main issue for my coding is the allocation of memory. And >> I have >> >> > fixed one of the biggest memory allocation issue: 4000 by 4000 >> diagonal >> >> > matrix. And since I am not familiar with Rcpp and RcppArmadillo, I >> have no >> >> > idea how to reuse the memory. I hope I can have some materials to >> learn >> >> > this. Thanks. >> >> > >> >> > >> >> >> >> >> >> What exactly do these timings show? A single call you your >> function? >> >> >> How many calls? >> >> >> >> >> > Here I called my function for 100 times. >> >> > >> >> >> >> >> >> Building on Romain's point: -- a portion of your function's runtime >> is >> >> >> in memory allocation >> >> >> (and you have a lot of allocations here). >> >> >> If you're calling your function thousands or millions of times, then >> >> >> it might pay to closely >> >> >> examine your memory allocation strategies and figure out what's >> >> >> temporary, for example. >> >> >> It looks like you're already using copy_aux_mem = false in a number >> >> >> of places, but you're >> >> >> allocating a lot of objects -- of approx what size? >> >> >> >> >> >> For example, wouldn't this work just as well with one less >> allocation? >> >> >> arma::vec kk = t; >> >> >> arma::uvec q1 = arma::find(arma::abs(tp)<h); >> >> >> kk.elem(q1) = ((1-arma::pow(tp.elem(q1)/h,2))/h)*0.75; >> >> >> // done with q1. let's reuse it. >> >> >> q1 = arma::find(arma::abs(tp)>=h); >> >> >> // was q2 >> >> >> kk.elem(q1).zeros(); >> >> >> >> >> >> You could potentially allocate memory for temporary working space in >> >> >> R, grab it with copy_aux_mem = false, write your temp results there, >> >> >> and reuse these objects in subsequent function calls. It doesn't >> make >> >> >> sense to go to this trouble, though, if your core algorithm consumes >> >> >> the bulk of runtime. >> >> >> >> >> >> Have you looked on the armadillo notes r.e. inv? Matrix inversion >> has >> >> >> O(>n^2). You may be aided by pencil-and-paper math here. >> >> >> http://arma.sourceforge.net/docs.html#inv >> >> >> >> >> > Here my matrix for inverse is only 4 by 4, so I think it's ok. >> >> > >> >> >> >> >> >> best, >> >> >> Christian >> >> >> >> >> >> > Dear All, >> >> >> > I have tried out the first example by using RcppArmadillo, but I >> am not >> >> >> > sure whether the code is efficient or not. And I did the >> comparison of >> >> >> > the >> >> >> > computation time. >> >> >> > >> >> >> > 1) R code using for loop in R: 87.22s >> >> >> > 2) R code using apply: 77.86s >> >> >> > 3) RcppArmadillo by using for loop in C++: 53.102s >> >> >> > 4) RcppArmadillo together with apply in R: 47.310s >> >> >> > >> >> >> > It is kind of not so big increase. I am wondering whether I used >> an >> >> >> > inefficient way for the C++ coding: >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama! >> >> > >> >> > >> >> >> >> >> >> >> >> -- >> >> A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama! >> > >> > >> > > > _______________________________________________ > Rcpp-devel mailing list > Rcpp-devel@lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel > -- Thank you, Terrance Savitsky
_______________________________________________ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel