On Thu, 29 Jan 2026 11:36:00 +1100
Nicholas Tierney <[email protected]> wrote:

> I also note that I cannot reproduce the same error with:
> 
> ```
> rcmdcheck::rcmdcheck(args = c("--no-manual", "--as-cran",
> "--no-multiarch"), env =
> "_R_CHECK_VIGNETTE_TIMING_CPU_TO_ELAPSED_THRESHOLD_ = 53")
> ```

This threshold is very large. It makes more sense to lower it, not
raise it, so that the test becomes more sensitive. On a machine with 16
hyper-threads, 

_R_CHECK_VIGNETTE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=1.1 \
 R CMD check --timings greta_0.5.1.tar.gz

...reproduces the NOTE. (The more CPUs on your machine, the more the
chance that something trying to use all of them will cause a noticeable
spike in CPU time. The machines running CRAN checks have even more
logical CPUs.) In particular,

list.files("vignettes", full.names=TRUE, patt = '[.]Rmd$') |>
 setNames(nm = _) |> lapply(\(.)
  tools::buildVignette(., encoding='UTF-8') |> system.time()
 )

shows that it's the get_started.Rmd vignette (62.765 s userspace CPU
time vs. 27.745 s elapsed time).
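
For reference, the ratio that the check compares comes out well above
the lowered threshold (roughly, ignoring the small system-time
component, which I did not record here):

 62.765 / 27.745
 # ~2.26 CPU-seconds per elapsed second, vs. a threshold of 1.1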

What exactly starts these threads? Tracing 'pthread_create' is not the
complete answer, but it usually points to some suspects.

ltrace -w 10 -f -e pthread_create -- \
 env R_LIBS_USER=greta.Rcheck:%U \
 R -f greta.Rcheck/greta/doc/get_started.R

...There are a lot of suspects.

Here is the copy of OpenBLAS bundled with a Python wheel installed by
'greta', starting 16 threads when greta:::check_tf_version() is called:

> knitr::opts_chunk$set(echo = TRUE,
+                       eval = greta:::check_tf_version("message"),
+                       purl = greta:::check_tf_version("message"),
+                       cache = TRUE,
+                       comment = NA,
+                       progress = FALSE)
...
[pid 25013] 
libopenblas64_p-r0-0cf96a72.3.23.dev.so->pthread_create(0x7f36ed7ccd80, 0, 
0x7f36ebb50660, 0) = 0
                        blas_thread_init.part.0 (ip = 0x7f36ebb50a2a)
                        blas_thread_init (ip = 0x7f36ebb50ba9)
                        gotoblas_init (ip = 0x7f36eb91807f)
                        __nptl_change_stack_perm (ip = 0x7f36ef0959ce)
                        __nptl_change_stack_perm (ip = 0x7f36ef095ab4)
                        _dl_catch_exception (ip = 0x7f36eeed9274)
                        _dl_rtld_di_serinfo (ip = 0x7f36ef09c21e)
                        _dl_catch_exception (ip = 0x7f36eeed921a)
                        _dl_find_dso_for_object (ip = 0x7f36ef09c5b8)
                        dlmopen (ip = 0x7f36eee0f4b8)
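
As a quick cross-check, this counts the threads of the current R
process after that chunk has run (Linux-only, and it does not attribute
the threads to any particular library):

 readLines("/proc/self/status") |>
  grep("^Threads:", x = _, value = TRUE)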

By the time the following code block runs and tries to set
OMP_THREAD_LIMIT=2, it's too late. (It's usually too late to set
OpenMP-related variables from inside an R script: the runtime reads
them only once, when the library carrying it is first loaded and
initialised, which here has already happened.)

It might help to set OPENBLAS_NUM_THREADS=2 first and only then load
greta implicitly by calling greta:::check_tf_version(...). Though
judging by the later results, this is probably a red herring anyway.
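
Something along these lines, early in the first chunk, is the ordering
I mean (a sketch; it assumes the bundled OpenBLAS has not yet been
loaded in the session):

 Sys.setenv(OPENBLAS_NUM_THREADS = "2", OMP_THREAD_LIMIT = "2")
 greta:::check_tf_version("message") # this is what loads greta/OpenBLAS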

Here's node.js creating 16 worker threads through the 'V8' package:

> DiagrammeR::export_graph(attr(gr, "dgr_graph"),
+                          file_name = "figures/full_graph.png",
+                          file_type = "png",
+                          width = 958 * 2,
+                          height = 450 * 2)
[pid 24967] libnode.so.108->pthread_create(0x555bfee583c0, 0x7ffde39e7c50, 
0x7fc7a9f49280, 0x555bfee582f0) = 0
                        _ZN2v84base6Thread5StartEv (ip = 0x7fc7a9f4a424)
                        
_ZN2v88platform30DefaultWorkerThreadsTaskRunner12WorkerThreadC1EPS1_ (ip = 
0x7fc7a95fab76)
                        
_ZN2v88platform30DefaultWorkerThreadsTaskRunnerC2EjPFdvE (ip = 0x7fc7a95fadbc)
                        
_ZN2v88platform15DefaultPlatform37EnsureBackgroundTaskRunnerInitializedEv (ip = 
0x7fc7a95f861e)
                        
_ZN2v88platform18NewDefaultPlatformEiNS0_15IdleTaskSupportENS0_21InProcessStackDumpingESt10unique_p
 (ip = 0x7fc7a95f8849)
                        R_init_V8 (ip = 0x7fc7b001d609)
                        AddDLL (ip = 0x555baec0a5f1)
                        do_dynload (ip = 0x555baec0d0d1)
                        bcEval_loop (ip = 0x555baecd7100)
                        bcEval (ip = 0x555baece5477)

Here's librsvg -> pango creating worker threads:

> DiagrammeR::export_graph(attr(gr, "dgr_graph"),
+                          file_name = "figures/coef_graph.png",
+                          file_type = "png",
+                          width = 325 * 2,
+                          height = 123 * 2)
[pid 24967] libglib-2.0.so.0->pthread_create(0x555bff3ad6d0, 0x7ffde39e6c60, 
0x7fc7a0003650, 0x555bff3ad6a0) = 0
                        g_mutex_init (ip = 0x7fc7a0003fb5)
                        g_thread_new (ip = 0x7fc79ffd9fa8)
                        pango_fc_font_map_get_config (ip = 0x7fc79fae816f)
                        pango_glyph_string_x_to_index (ip = 0x7fc7a00b5803)
                        pango_layout_line_get_extents (ip = 0x7fc7a00c583e)
                        pango_layout_index_to_line_x (ip = 0x7fc7a00c7efe)
                        pango_layout_get_size (ip = 0x7fc7a00c8126)
                        rsvg_pixbuf_from_file_at_max_size (ip = 0x7fc7a074e6d2)
                        rsvg_pixbuf_from_file_at_max_size (ip = 0x7fc7a06661c6)
                        rsvg_error_get_type (ip = 0x7fc7a05f869e)

Do they consume a lot of CPU? Probably not.

Here's Tensorflow creating 16 worker threads:

> draws <- mcmc(m, n_samples = 1000)
[pid 24967] libtensorflow_framework.so.2->pthread_create(0x555bff178668, 
0x7ffde39e8ed8, 0x7fc7eb867150, 0x555bfea36d30) = 0
                        
_ZN3tsl12_GLOBAL__N_18PosixEnv11StartThreadERKNS_13ThreadOptionsERKNSt7__cxx1112basic_stringIcSt11c
 (ip = 0x7fc7eb866567)
                        
_ZN3tsl6thread16EigenEnvironment12CreateThreadESt8functionIFvvEE (ip = 
0x7fc7eacd0a39)
                        
_ZN5Eigen15ThreadPoolTemplIN3tsl6thread16EigenEnvironmentEEC2EibS3_ (ip = 
0x7fc7eaccefaa)
                        
_ZN3tsl6thread10ThreadPoolC2EPNS_3EnvERKNS_13ThreadOptionsERKNSt7__cxx1112basic_stringIcSt11char_tr
 (ip = 0x7fc7eacce92c)
                        
_ZN10tensorflow11LocalDevice19EigenThreadPoolInfoC2ERKNS_14SessionOptionsEiPN3tsl9AllocatorE
 (ip = 0x7fc7ebcbacfc)
                        
_ZN10tensorflow11LocalDeviceC2ERKNS_14SessionOptionsERKNS_16DeviceAttributesE 
(ip = 0x7fc7ebcba857)
                        
_ZN10tensorflow16ThreadPoolDeviceC1ERKNS_14SessionOptionsERKNSt7__cxx1112basic_stringIcSt11char_tra
 (ip = 0x7fc7ebc1fc36)
                        
_ZN10tensorflow29GPUCompatibleCPUDeviceFactory13CreateDevicesERKNS_14SessionOptionsERKNSt7__cxx1112
 (ip = 0x7fc7ebd30b64)
                        
_ZN10tensorflow13DeviceFactory13AddCpuDevicesERKNS_14SessionOptionsERKNSt7__cxx1112basic_stringIcSt
 (ip = 0x7fc7eb60eabc)
                        
_ZN10tensorflow13DeviceFactory10AddDevicesERKNS_14SessionOptionsERKNSt7__cxx1112basic_stringIcSt11c
 (ip = 0x7fc7eb60eb92)

It _says_ it runs on "up to 2 CPU cores", but empirically it consumes
up to ~400% CPU (4 of the 16 logical CPUs on this computer) while
mcmc(...) is running. Judging by the profiling data, most of the CPU
time is spent in TensorFlow/Eigen worker threads during the call to
mcmc(), possibly because of the default argument chains = 4?
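
If the extra CPU time does come from here, two knobs that might help
are TensorFlow's own thread-pool settings (they have to be applied
before TensorFlow creates its devices) and the number of chains. A
sketch, assuming the 'tensorflow' R package is attached and that
mcmc() accepts `chains` as documented:

 library(tensorflow)
 tf$config$threading$set_intra_op_parallelism_threads(1L)
 tf$config$threading$set_inter_op_parallelism_threads(1L)
 draws <- mcmc(m, n_samples = 1000, chains = 2)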

-- 
Best regards,
Ivan

