On Thu, Jun 14, 2012 at 4:43 AM, Dirk Eddelbuettel <e...@debian.org> wrote:

> And you should find Eigen to be a little faster. Andreas Alfons went as far
> as building 'robustHD' using RcppArmadillo with a drop-in for RcppEigen
> (in package 'sparseLTSEigen'; both package names from memory and
> I may have mistyped). He reported a performance gain of around 25% for
> his problem sets. On the 'fastLm' benchmark, we find the fast Eigen-based
> decompositions to be much faster than Armadillo.
This is a misconception that needs to be addressed. For equivalent functionality, Armadillo is not necessarily any slower than Eigen, provided that suitable Lapack and/or Blas libraries are used (such as Intel's MKL or AMD's ACML, or even the open-source ATLAS or OpenBlas in many cases). Standard Lapack and Blas are just that: a "better than nothing" baseline implementation in terms of performance.

Armadillo doesn't reimplement Lapack and it doesn't reimplement any decompositions -- it uses Lapack. (*This is a very important point, which I elaborate on below.*) As such, the speed of Armadillo for matrix decompositions is directly dependent on the particular implementation of Lapack that's installed on the user's machine. I've seen some ridiculous speed differences between standard Lapack and MKL. The latter not only has CPU-specific optimisations (e.g. using the latest AVX extensions), but can also do multi-threading.

Simply installing ATLAS (which provides speed-ups for several Lapack functions) on Debian/Ubuntu systems can already make a big difference. (Debian and Ubuntu use a trick to redirect Lapack and Blas calls to ATLAS.) Under Mac OS X, the Accelerate framework provides fast implementations of Lapack and Blas functions (e.g. using multi-threading).

I've taken the modular approach to Armadillo (i.e. using Lapack rather than reimplementing decompositions), as it specifically allows other specialist parties (such as Intel) to provide a Lapack that is highly optimised for particular architectures. I myself would not be able to keep up with the specific optimisations required for each CPU. This also "future-proofs" Armadillo for each new CPU generation.

More importantly, a numerically stable implementation of matrix decompositions/factorisations is notoriously difficult to get right. The core algorithms in Lapack have been evolving for the past 20+ years, being exposed to a bazillion corner cases. Lapack itself is related to Linpack and Eispack, which are even older. I've been exposed to software development long enough to know that in the end only time can shake out all the bugs. As such, using Lapack is far less risky than reimplementing decompositions from scratch. A "home-made" matrix decomposition might be a bit faster on a particular CPU, but you have far less knowledge as to when it's going to blow up in your face. High-performance variants of Lapack, such as MKL, take an existing, proven implementation of a decomposition algorithm and recode parts of it in assembly, and/or parallelise other parts.
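To make the modular-approach point concrete, here is a minimal sketch (not from the original discussion; the matrix sizes and timing harness are purely illustrative). The Armadillo call below is a thin wrapper: for a square system, solve() hands the actual work to Lapack's gesv routine, so exactly the same user code speeds up simply by linking against a faster Lapack/Blas (ATLAS, OpenBlas, MKL, Accelerate), with no source changes:

    #include <iostream>
    #include <armadillo>

    int main()
    {
      // sizes chosen only for illustration
      arma::mat A = arma::randu<arma::mat>(2000, 2000);
      arma::mat B = arma::randu<arma::mat>(2000,  500);

      arma::wall_clock timer;
      timer.tic();

      // solve() doesn't implement its own solver -- for a square A it
      // forwards the work to Lapack (gesv); an optimised Lapack such as
      // MKL can run this multi-threaded with CPU-specific kernels
      arma::mat X = arma::solve(A, B);

      std::cout << "solve() took " << timer.toc() << " seconds" << std::endl;

      return 0;
    }

Compile with something like "g++ example.cpp -o example -O2 -larmadillo" (exact flags depend on the system). Assuming Armadillo is linked against the system's shared Blas/Lapack, rerunning the same unchanged binary after switching the system libraries to ATLAS or MKL (e.g. via the alternatives mechanism on Debian/Ubuntu) is enough to see the kind of speed difference described above.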