On Tue, 24 Oct 2023 10:37:48 +0000 "Helske, Jouni" <jouni.hel...@jyu.fi> wrote:

> Examples with CPU time > 2.5 times elapsed time
>           user system elapsed ratio
> exchange 1.196   0.04   0.159 7.774

I've downloaded the archived copy of the package from the CRAN FTP
server, installed it and tried:

    library(bssm)
    Sys.setenv("OMP_THREAD_LIMIT" = 2)

    data("exchange")
    model <- svm(
      exchange, rho = uniform(0.97,-0.999,0.999),
      sd_ar = halfnormal(0.175, 2), mu = normal(-0.87, 0, 2)
    )
    system.time(particle_smoother(model, particles = 500))
    #    user  system elapsed
    #   0.515   0.000   0.073

I set a breakpoint on clone() [*] and got quite a few calls creating
OpenMP threads with the following call stack:

    #0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:52
    <...>
    #4  0x00007ffff7314e0a in GOMP_parallel ()
        from /usr/lib/x86_64-linux-gnu/libgomp.so.1
                                      <-- RcppArmadillo code below
    #5  0x00007ffff38f5f00 in
        arma::eglue_core<arma::eglue_div>::apply<arma::Mat<double>,
        arma::eOp<arma::eOp<arma::Col<double>, arma::eop_exp>,
        arma::eop_scalar_times>, arma::eOp<arma::eOp<arma::Col<double>,
        arma::eop_scalar_div_post>, arma::eop_square> > (outP=..., x=...)
        at .../library/RcppArmadillo/include/armadillo_bits/mp_misc.hpp:69
    #6  0x00007ffff3a31246 in
        arma::Mat<double>::operator=<arma::eOp<arma::eOp<arma::Col<double>,
        arma::eop_exp>, arma::eop_scalar_times>,
        arma::eOp<arma::eOp<arma::Col<double>, arma::eop_scalar_div_post>,
        arma::eop_square>, arma::eglue_div> (X=..., this=0x7fffffff36f0)
        at .../library/RcppArmadillo/include/armadillo_bits/Proxy.hpp:226
    #7  arma::Col<double>::operator=<arma::eGlue<arma::eOp<arma::eOp<arma::Col<double>,
        arma::eop_exp>, arma::eop_scalar_times>,
        arma::eOp<arma::eOp<arma::Col<double>, arma::eop_scalar_div_post>,
        arma::eop_square>, arma::eglue_div> > (
        X=..., this=0x7fffffff36f0)
        at .../library/RcppArmadillo/include/armadillo_bits/Col_meat.hpp:535
                                      <-- bssm code below
    #8  ssm_ung::laplace_iter (this=0x7fffffff15e0, signal=...)
        at model_ssm_ung.cpp:310
    #9  0x00007ffff3a36e9e in ssm_ung::approximate (this=0x7fffffff15e0)
        at .../library/RcppArmadillo/include/armadillo_bits/arrayops_meat.hpp:27
    #10 0x00007ffff3a3b3d3 in ssm_ung::psi_filter
        (this=this@entry=0x7fffffff15e0, nsim=nsim@entry=500, alpha=...,
        weights=..., indices=...) at model_ssm_ung.cpp:517
    #11 0x00007ffff3948cd7 in psi_smoother (model_=...,
        nsim=nsim@entry=500, seed=seed@entry=1092825895,
        model_type=model_type@entry=3) at R_psi.cpp:131

What does arma::eglue_core do?

    (gdb) list
    /* reformatted a bit */
    library/RcppArmadillo/include/armadillo_bits/mp_misc.hpp:64
    int n_threads = (std::min)(
      int(arma_config::mp_threads),
      int((std::max)(int(1), int(omp_get_max_threads())))
    );
    (gdb) p arma_config::mp_threads
    $3 = 8
    (gdb) p (int)omp_get_max_threads()
    $4 = 16
    (gdb) p (char*)getenv("OMP_THREAD_LIMIT")
    $7 = 0x555556576b91 "2"
    (gdb) p /x (int)omp_get_thread_limit()
    $9 = 0x7fffffff

Sorry for misinforming you about the OMP_THREAD_LIMIT environment
variable: the OpenMP specification requires the program to ignore
modifications to the environment variables after the program has
started [**], so it only works if R is started with OMP_THREAD_LIMIT
set. Additionally, the OpenMP thread limit is not supposed to be
adjusted at runtime at all [***].

Unfortunately for our situation, Armadillo is very insistent in setting
its own number of threads from arma_config::mp_threads (which is
constexpr 8 unless you set preprocessor directives while compiling it)
and omp_get_max_threads (which is the upper bound on the number of
threads that cannot be adjusted at runtime).

What I'm about to suggest is a terrible hack, but since Armadillo seems
to lack the option to set the number of threads at runtime, there might
be no other option.

Before you #include an Armadillo header, every time:

1. #include <omp.h> so that the OpenMP functions are declared and the
   #include guard is set

2. Define a static inline function get_number_of_threads returning the
   desired number of threads as an int (e.g. referencing an extern int
   number_of_threads stored elsewhere)

3. #define omp_get_max_threads get_number_of_threads

Now if you provide an API for the R code to get and set this number, it
should be possible to control the number of threads used by OpenMP code
in Armadillo. Basically, a data.table::setDTthreads() for the copy of
Armadillo inlined inside your package.

If you then compile your package with a large #define
ARMA_OPENMP_THREADS, it will both be able to use more than 8 threads
*and* limit itself when needed.

An alternative course of action is compiling your package with
#define ARMA_OPENMP_THREADS 2 and giving up on more OpenMP threads
inside calls to Armadillo.

-- 
Best regards,
Ivan

[*]
https://github.com/tidymodels/textrecipes/pull/251#issuecomment-1775549814

[**]
https://www.openmp.org/spec-html/5.2/openmpch21.html#x432-59000021

[***]
https://www.openmp.org/wp-content/uploads/OpenMPRefCard-5-2-web.pdf#page=15

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
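
P.S. Here is a minimal sketch of the three-step hack described above, not
tested against Armadillo itself. Only omp_get_max_threads is a real name.
The names set_number_of_threads, get_number_of_threads, thread_setting
and threads_armadillo_would_use are made up for illustration. To keep the
example in one file, the setting lives in a function-local static rather
than an extern int. threads_armadillo_would_use merely imitates the
expression quoted from mp_misc.hpp, so the effect of the macro can be
checked without Armadillo installed.

```cpp
#include <algorithm>

// Step 1: declare the real OpenMP functions and set the include guard,
// so that a later #include <omp.h> inside Armadillo is a no-op.
// Guarded because the package must also build without OpenMP.
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical storage for the package-wide setting (default: 2 threads).
// A real package might use an `extern int` defined in one .cpp file.
static inline int& thread_setting() {
  static int n = 2;
  return n;
}

// Hypothetical setter; this is what would be exposed to R code.
static inline void set_number_of_threads(int n) {
  thread_setting() = std::max(1, n);
}

// Step 2: the replacement for omp_get_max_threads().
static inline int get_number_of_threads() {
  return thread_setting();
}

// Step 3: from here on, every textual use of omp_get_max_threads
// (including the one inside Armadillo headers) calls our function.
#define omp_get_max_threads get_number_of_threads

// In a real package the Armadillo header would be included here:
// #include <RcppArmadillo.h>

// Stand-in for the expression in mp_misc.hpp; `mp_threads_cap` plays
// the role of arma_config::mp_threads (ARMA_OPENMP_THREADS).
static inline int threads_armadillo_would_use(int mp_threads_cap) {
  return (std::min)(mp_threads_cap,
                    (std::max)(1, int(omp_get_max_threads())));
}
```

With the default setting, threads_armadillo_would_use(8) gives 2. After
set_number_of_threads(16) it gives 8, which is why a larger
ARMA_OPENMP_THREADS is needed to go beyond 8 threads. In a package,
set_number_of_threads would presumably be exported to R (e.g. via Rcpp)
to play the role of setDTthreads().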