Hi Dirk,

sessionInfo() was the right clue. Indeed, the R build on machine B was not linked against OpenBLAS. Switching to a build that is linked against OpenBLAS lets the test code use all cores.
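For anyone hitting the same symptom, here is a minimal sketch of how the BLAS/LAPACK details behind that clue can be collected in one place and compared across machines. The helper name `blas_report` is made up for illustration; the functions it calls are base R (R 3.4.0 or later, as far as I know), and matching on "openblas" in the file name is only a heuristic, not a guarantee.

```r
# Collect what the running R session reports about its BLAS/LAPACK.
# blas_report() is a hypothetical helper name used only in this sketch.
blas_report <- function() {
  si <- utils::sessionInfo()
  list(
    blas_session   = si$BLAS,                          # path printed by sessionInfo()
    lapack_session = si$LAPACK,
    blas_linked    = unname(extSoftVersion()["BLAS"]), # may be "" on some platforms
    lapack_library = La_library(),
    lapack_version = La_version()
  )
}

report <- blas_report()
str(report)

# Heuristic only: the library file name usually hints at the implementation
# (for example libopenblas versus R's own libRblas), but it can be a symlink.
looks_like_openblas <- grepl("openblas", report$blas_linked, ignore.case = TRUE)
print(looks_like_openblas)
```

Running this on both machines and comparing the output side by side should show whether the two R builds point at different libraries.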
A clear way to check which library is linked is to run the following:

    > extSoftVersion()["BLAS"]

Thanks for your help!

On Sat, Feb 24, 2024 at 9:17 AM Dirk Eddelbuettel <e...@debian.org> wrote:

>
> On 24 February 2024 at 11:44, Robin Liu wrote:
> | Thank you Dirk for the response.
> |
> | I called RcppArmadillo::armadillo_get_number_of_omp_threads() on both machines
> | and correctly see that machine A and B have 20 and 40 cores, respectively. I
> | also see that calling the setter changes this value.
> |
> | However, calling the setter does not seem to change the number of cores used on
> | either machine A or B. I have updated my code example as below: the execution
> | uses 20 cores on machine A and 1 core on machine B as before, despite my
> | setting the number of omp threads to 5. Do you have any further hints?
>
> I fear you need to debug that on the machine 'B' in question. It's all open
> source. I do not think either Conrad or myself put code in to constrain you
> to one core on 'B' (and then doesn't as you see on 'A').
>
> You can grep around both the RcppArmadillo wrapper code and the included
> Armadillo code, I suggest making a local copy and peppering in some print
> statements.
>
> Also keep in mind that (Rcpp)Armadillo hands off the computation to the actual
> LAPACK / BLAS implementation on that machine. Lots of things can go wrong
> there: maybe R was compiled with its own embedded BLAS/LAPACK sources
> (preventing a call out to OpenBLAS even when the machine has it). Or maybe R
> was compiled correctly but a single-threaded set of libraries is on the
> machine.
>
> You have not supplied any of that information. Many bug report suggestions
> hint that showing `sessionInfo()` helps -- and it does show the BLAS/LAPACK
> libraries. You are not forced to show us this, but by not showing us you
> prevent us from being more focussed on suggestions. So maybe start at your
> end by glancing at sessionInfo() on A and B?
>
> Dirk
>
>
> | library(RcppArmadillo)
> | library(Rcpp)
> |
> | RcppArmadillo::armadillo_set_number_of_omp_threads(5)
> | print(sprintf("There are %d threads",
> |               RcppArmadillo::armadillo_get_number_of_omp_threads()))
> |
> | src <-
> | r"(#include <RcppArmadillo.h>
> |
> | // [[Rcpp::depends(RcppArmadillo)]]
> |
> | // [[Rcpp::export]]
> | arma::vec getEigenValues(arma::mat M) {
> |   return arma::eig_sym(M);
> | })"
> |
> | size <- 10000
> | m <- matrix(rnorm(size^2), size, size)
> | m <- m * t(m)
> |
> | # This line compiles the above code with the -fopenmp flag.
> | sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)
> | result <- getEigenValues(m)
> | print(result[1:10])
> |
> | On Fri, Feb 23, 2024 at 12:53 PM Dirk Eddelbuettel <e...@debian.org> wrote:
> |
> |
> | On 23 February 2024 at 09:35, Robin Liu wrote:
> | | Hi all,
> | |
> | | Here is an R script that uses Armadillo to decompose a large matrix and print
> | | the first 10 eigenvalues.
> | |
> | | library(RcppArmadillo)
> | | library(Rcpp)
> | |
> | | src <-
> | | r"(#include <RcppArmadillo.h>
> | |
> | | // [[Rcpp::depends(RcppArmadillo)]]
> | |
> | | // [[Rcpp::export]]
> | | arma::vec getEigenValues(arma::mat M) {
> | |   return arma::eig_sym(M);
> | | })"
> | |
> | | size <- 10000
> | | m <- matrix(rnorm(size^2), size, size)
> | | m <- m * t(m)
> | |
> | | # This line compiles the above code with the -fopenmp flag.
> | | sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)
> | | result <- getEigenValues(m)
> | | print(result[1:10])
> | |
> | | When I run this code on server A, I see that arma can implicitly leverage all
> | | available cores by running top -H. However, on server B it can only use one
> | | core despite multiple being available: there is just one process entry in top
> | | -H. Both processes successfully exit and return an answer. The process on
> | | server B is of course much slower.
> |
> | It is documented in the package how this is applied and the policy is to NOT
> | blindly enforce one use case (say all cores, or half, or a magically chosen
> | value of N for whatever value of N) but to follow the local admin setting and
> | respecting standard environment variables.
> |
> | So I suspect that your machine 'B' differs from machine 'A' in this
> | regard.
> |
> | Note that this is a _run-time_ and not _compile-time_ behavior. As it is for
> | multicore-enabled LAPACK and BLAS libraries, the OpenMP library and basically
> | most software of this type.
> |
> | You can override it, see
> |   RcppArmadillo::armadillo_set_number_of_omp_threads
> |   RcppArmadillo::armadillo_get_number_of_omp_threads
> |
> | Can you try and see if these help you?
> |
> | Dirk
> |
> | | Here is the compilation on server A:
> | | /usr/local/lib/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so'
> | | 'file197c21cbec564.cpp'
> | | g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I../inst/include
> | | -fopenmp -I"/usr/local/lib/R/site-library/Rcpp/include"
> | | -I"/usr/local/lib/R/site-library/RcppArmadillo/include"
> | | -I"/tmp/RtmpwhGRi3/sourceCpp-x86_64-pc-linux-gnu-1.0.9"
> | | -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat
> | | -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c
> | | file197c21cbec564.cpp -o file197c21cbec564.o
> | | g++ -std=gnu++11 -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o
> | | sourceCpp_2.so file197c21cbec564.o -fopenmp -llapack -lblas -lgfortran -lm
> | | -lquadmath -L/usr/local/lib/R/lib -lR
> | |
> | | and here it is for server B:
> | | /sw/R/R-4.2.3/lib64/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so'
> | | 'file158165b9c4ae1.cpp'
> | | g++ -std=gnu++11 -I"/sw/R/R-4.2.3/lib64/R/include" -DNDEBUG -I../inst/include
> | | -fopenmp -I"/home/my_username/.R/library/Rcpp/include"
> | | -I"/home/my_username/.R/library/RcppArmadillo/include"
> | | -I"/tmp/RtmpvfPt4l/sourceCpp-x86_64-pc-linux-gnu-1.0.10"
> | | -I/usr/local/include -fpic -g -O2 -c
> | | file158165b9c4ae1.cpp -o file158165b9c4ae1.o
> | | g++ -std=gnu++11 -shared -L/sw/R/R-4.2.3/lib64/R/lib -L/usr/local/lib64 -o
> | | sourceCpp_2.so file158165b9c4ae1.o -fopenmp -llapack -lblas -lgfortran -lm
> | | -lquadmath -L/sw/R/R-4.2.3/lib64/R/lib -lR
> | |
> | | I thought that the -fopenmp flag should let arma implicitly parallelize matrix
> | | computations. Any hints as to why this may not work on server B?
> | |
> | | The actual code I'm running is an R package that includes RcppArmadillo and
> | | RcppEnsmallen. Server B is the login node to an hpc cluster, but the code does
> | | not use all cores on the compute nodes either.
> | |
> | | Best,
> | | Robin
> | | _______________________________________________
> | | Rcpp-devel mailing list
> | | Rcpp-devel@lists.r-forge.r-project.org
> | | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
> |
> | --
> | dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
> |
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>