On 16/11/2014 00:11, Michael Hannon wrote:
Greetings.  I'd like to get some advice about using OpenBLAS with R, rather
than using the BLAS that comes built in to R.

That was really a topic for the R-devel list: see the posting guide.

I've tried this on my Fedora 20 system (see the appended for details).  I ran
a simple test -- multiplying two large matrices -- and the results were very
impressive, i.e., in favor of OpenBLAS, which is consistent with discussions
I've seen on the web.

If that is all you do, then you should be using an optimized BLAS, and choose the one(s) best for your (unstated) machine(s).

My concern is that maybe this is too good to be true.  I.e., the standard R
configuration is vetted by thousands of people every day.  Can I have the same
degree of confidence with OpenBLAS that I have in the built-in version?

No. And it is 'too good to be true' for most users of R, for whom BLAS operations take a negligible proportion of their CPU time.
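
If you want a concrete check rather than trust alone, one simple spot check (a sketch, not code from this thread) is to compare a matrix product under the new BLAS against a plain-R reference within a tolerance:

## Spot check after switching BLAS: compare %*% against an all-R
## reference accumulated from rank-1 outer products.
set.seed(42)
n <- 200
a <- matrix(rnorm(n * n), n)
b <- matrix(rnorm(n * n), n)

ref <- matrix(0, n, n)
for (k in 1:n) ref <- ref + outer(a[, k], b[k, ])  # sum_k a[,k] %o% b[k,]

all.equal(a %*% b, ref, tolerance = 1e-8)  # expect TRUE (not identical())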

And/or are there other caveats to using OpenBLAS of which I should be aware?

Yes: see the 'R Installation and Administration Manual'. Known issues include:

1) Optimized BLAS trade accuracy for speed. A surprising amount of published R code relies on extended-precision FPU registers being used for intermediate results, which optimized BLAS do much less often than the reference BLAS.

Some packages rely on a particular sign in the solutions to svd or eigen problems: people then report it as a bug when an optimized BLAS gives a different sign from the reference BLAS. (A sign-normalization sketch follows this list.)

2) Fast BLAS normally use multi-threading: that usually helps elapsed time for a single R task at the expense of increased total CPU time. Fine if you have unused CPU cores, but not advantageous in a fully-used multi-core machine, e.g. one that is doing many R sessions in parallel.

3) Many BLAS optimize their use of CPU caches. This works best if the BLAS-using process is the only task running on a particular core (or CPU where CPU cores share cache). (It also means that optimizing on one CPU model and running on another can be disastrous.)
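
For illustration (a sketch, not part of the original reply): eigenvectors and singular vectors are determined only up to sign, so code comparing them across BLAS implementations should normalize the sign rather than expect identical output.

## Normalize eigenvector signs before comparing results obtained with
## different BLAS implementations.
set.seed(1)
A <- crossprod(matrix(rnorm(25), 5))   # a 5 x 5 symmetric matrix
V <- eigen(A)$vectors

fixSign <- function(V) {
  ## flip each column so that its largest-magnitude entry is positive
  s <- apply(V, 2, function(v) sign(v[which.max(abs(v))]))
  sweep(V, 2, s, `*`)
}
fixSign(V)  # reproducible (up to rounding) whichever BLAS is loaded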



Thanks.

-- Mike

#### Here's the version of R, compiled locally with configuration options:
#### ./configure --enable-R-shlib --enable-BLAS-shlib

$ R

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
.
.
.

#### Here's the R source code for this little test:

library(microbenchmark)

mSize <- 10000
set.seed(42)

aMat <- matrix(rnorm(mSize * mSize), nrow=mSize)
bMat <- matrix(rnorm(mSize * mSize), nrow=mSize)

cMat <- aMat %*% bMat  ## do the calculation once to see that it works

traceCMat <- sum(diag(cMat))  ## a mild sanity check on the calculation
traceCMat

microbenchmark(aMat %*% bMat, times=5L)  ## repeat a few more times

-----

#### Here's the output from the code, run under various conditions:

traceCMat ###### Using the built-in BLAS from R
[1] -11367.55
microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
           expr      min       lq     mean   median       uq     max neval
  aMat %*% bMat 675.0064 675.5325 675.4897 675.5857 675.6618 675.662     5

----------

traceCMat  ###### Using libopenblas.so from Fedora
[1] -11367.55
microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
           expr      min       lq     mean   median       uq      max neval
  aMat %*% bMat 70.67843 70.70545 70.76365 70.73026 70.83935 70.86475     5


----------

traceCMat <- sum(diag(cMat))  ###### libopenblas.so from Fedora with
traceCMat                     ###### export OMP_NUM_THREADS=6
[1] -11367.55
microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
           expr      min       lq    mean   median       uq      max neval
  aMat %*% bMat 69.99146 70.02426 70.3466 70.08327 70.39537 71.23866     5


###### Fedora libopenblas.so appears to be single-threaded

----------

traceCMat <- sum(diag(cMat))  ###### libopenblas.so compiled locally
traceCMat                     ###### from source w/OMP_NUM_THREADS=6
[1] -11367.55
microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
           expr      min       lq     mean   median       uq      max neval
  aMat %*% bMat 26.77385 27.10434 27.17862 27.12485 27.16301 27.72705     5


###### Locally-compiled openblas appears to be multi-threaded
###### The microbenchmark appeared to use all 8 processors, even
###### though I asked for only 6.
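
###### One plausible explanation (an assumption, not established in this
###### thread): OMP_NUM_THREADS only affects OpenBLAS builds compiled with
###### OpenMP, while a pthread build reads OPENBLAS_NUM_THREADS instead.

## Minimal sketch for inspecting and capping the BLAS thread count from
## inside R, assuming the RhpcBLASctl package is installed.
library(RhpcBLASctl)
blas_get_num_procs()      # number of threads the loaded BLAS will use
blas_set_num_threads(6)   # cap it at 6 for this session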

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
