On Jul 27, 2025, at 00:33, Mark Millard <mark...@yahoo.com> wrote:

> On Jul 23, 2025, at 01:42, Mark Millard <mark...@yahoo.com> wrote:
> 
>> In a context with RAM+SWAP = 704 GiBytes (192 GiBytes being RAM,
>> 512 GiBytes being SWAP) doing poudriere bulk -Ca builds at some
>> point ends up with reports like:
>> 
>> swp_pager_getswapspace(22): failed
>> 
>> and:
>> 
>> was killed: failed to reclaim memory
>> 
>> for 12 builders, MAKE_JOBS_NUMBER=3 , TMPFS_BLACKLIST
>> in use, 32 FreeBSD cpus, etc.
>> 
>> For example:
>> 
>> . . .
>> Jul 22 10:17:27 7950X3D-ZFS kernel: pid 62915 (scc_16815), jid 780, uid 0: exited on signal 11 (core dumped)
>> Jul 22 21:38:10 7950X3D-ZFS kernel: ue0: link state changed to DOWN
>> Jul 22 21:38:10 7950X3D-ZFS kernel: ue0: link state changed to UP
>> Jul 22 21:38:29 7950X3D-ZFS kernel: swap_pager: out of swap space
>> Jul 22 21:38:29 7950X3D-ZFS kernel: swp_pager_getswapspace(22): failed
>> Jul 22 21:39:11 7950X3D-ZFS kernel: pid 15059 (dot), jid 780, uid 0, was killed: failed to reclaim memory
>> Jul 22 21:43:38 7950X3D-ZFS kernel: swap_pager: out of swap space
>> Jul 22 21:43:38 7950X3D-ZFS kernel: swp_pager_getswapspace(14): failed
>> Jul 22 21:44:04 7950X3D-ZFS kernel: pid 15049 (dot), jid 780, uid 0, was killed: failed to reclaim memory
>> Jul 22 21:56:39 7950X3D-ZFS kernel: swap_pager: out of swap space
>> Jul 22 21:56:39 7950X3D-ZFS kernel: swp_pager_getswapspace(15): failed
>> Jul 22 21:57:12 7950X3D-ZFS kernel: pid 15045 (dot), jid 780, uid 0, was killed: failed to reclaim memory
>> 
>> I've not figured out a way to track down such messages
>> back to the relevant log file for the builds that were
>> killed. Neither the pid, nor the jid appear in
>> the log files. Similarly, nothing in /var/log/messages
>> identifies the poudriere Job Id or other such.
>> 
>> (I've never happened to be actively monitoring when
>> the issue happened. So I've always ended up looking at
>> it after the fact.)
>> 
>> It would be nice to be able to identify what specific
>> packages to try to rebuild for these --and to investigate
>> why the SWAP usage that had stayed under 2 GiByte ended
>> up reaching 512 GiBytes during that period.
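As a rough aid for the after-the-fact case, the kill notices can at least be pulled out of /var/log/messages with their timestamp, pid, command name and jid, so they can be compared by hand against whatever else was recorded around those times. This is only a sketch: the names KILL_RE and find_oom_kills are made up for illustration, the pattern is inferred solely from the three "was killed" lines quoted above (with the mail-wrapped lines rejoined), and other FreeBSD versions may word the message differently.

```python
import re

# Pattern inferred from the quoted kernel messages only (an assumption,
# not a documented format).
KILL_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"pid\s+(?P<pid>\d+)\s+\((?P<comm>[^)]*)\),\s+jid\s+(?P<jid>\d+),\s+"
    r"uid\s+(?P<uid>\d+),\s+was\s+killed:\s+(?P<reason>.*)$"
)


def find_oom_kills(lines):
    """Return one dict per 'was killed' kernel message found in lines."""
    kills = []
    for line in lines:
        match = KILL_RE.match(line.strip())
        if match is None:
            continue  # link-state, swap_pager and signal lines are skipped
        record = match.groupdict()
        for key in ("pid", "jid", "uid"):
            record[key] = int(record[key])
        kills.append(record)
    return kills


# Sample taken from the messages quoted above, wrapped lines rejoined.
SAMPLE_LOG = """\
Jul 22 10:17:27 7950X3D-ZFS kernel: pid 62915 (scc_16815), jid 780, uid 0: exited on signal 11 (core dumped)
Jul 22 21:38:29 7950X3D-ZFS kernel: swap_pager: out of swap space
Jul 22 21:38:29 7950X3D-ZFS kernel: swp_pager_getswapspace(22): failed
Jul 22 21:39:11 7950X3D-ZFS kernel: pid 15059 (dot), jid 780, uid 0, was killed: failed to reclaim memory
Jul 22 21:44:04 7950X3D-ZFS kernel: pid 15049 (dot), jid 780, uid 0, was killed: failed to reclaim memory
Jul 22 21:57:12 7950X3D-ZFS kernel: pid 15045 (dot), jid 780, uid 0, was killed: failed to reclaim memory
""".splitlines()

kills = find_oom_kills(SAMPLE_LOG)

if __name__ == "__main__":
    for k in kills:
        print(k["when"], k["comm"], "pid", k["pid"], "jid", k["jid"], "-", k["reason"])
```

This does not by itself map a pid or jid to a poudriere builder; it only collects the facts that the kernel did log.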
> 
> A panic from the activity during another bulk -Ca
> test led to a dump that provided enough context to
> track down the package that was being built when
> the issue occurred and what it was running that, in
> turn, had the problematic memory usage:
> 
> [2D:01:22:29] [06] [00:00:00] Building   graphics/sdl2_gpu | sdl2_gpu-0.12.0
> 
> was using:
> 
> UID   PID  PPID  C PRI NI       VSZ      RSS MWCHAN   STAT TT          TIME COMMAND
> . . .
> 0 79229 40923  4  59  0     23524     4148 wait     D     -       0:00.00 [sh]
> 0 79230 79229  5  59  0     14208      172 wait     Ds    -       0:00.01 [make]
> 0 79233 79230  4  59  0     14668      176 wait     D     -       0:00.00 [sh]
> 0 79234 79233  5  59  0     14668      176 wait     D     -       0:00.00 [sh]
> 0 79235 79234 12   0  0     16284      356 select   D     -       0:00.01 [ninja]
> 0 79236 79235 28  59  0    223048     1052 uwait    D     -       0:00.44 [doxygen]
> 0 79272 79236 25  59  0 157589964 41424308 pfault   D     -       3:25.33 [dot]
> 0 79279 79236 31  59  0 157601740 41513520 pfault   D     -       3:23.41 [dot]
> 0 79289 79236 14  59  0 157589964 41361600 pfault   D     -       3:22.72 [dot]
> 0 79301 79236 18  49  0 157667276 41208476 pfault   D     -       3:24.32 [dot]
> . . .
> 
> Part of the context was the /06/ text in:
> . . .
> root     dot        79301    0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev     20 crw-rw-rw-    null  r
> root     dot        79289    0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev     20 crw-rw-rw-    null  r
> . . .
> root     dot        79279    0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev     20 crw-rw-rw-    null  r
> . . .
> root     dot        79272    0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev     20 crw-rw-rw-    null  r
> . . .
> root     doxygen    79236    0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev     20 crw-rw-rw-    null  r
> . . .
> 
> That identifies the [06] builder, and the "Building" notice had made it
> to disk before the panic happened. I could then check the Makefile for
> whether doxygen was used, and it was. graphics/sdl2_gp historical build
> logs suggest problems exist.
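
To put the four dot processes from the ps output above in proportion to the 192 GiBytes of RAM and the 704 GiBytes of RAM+SWAP, their VSZ and RSS columns can be summed. This is just a worked-arithmetic sketch; it assumes the VSZ and RSS figures are in KiBytes (the usual ps unit on FreeBSD), and the names dot_procs and kib_to_gib are only for illustration.

```python
KIB_PER_GIB = 1024 * 1024

# pid -> (VSZ, RSS), copied from the ps output quoted above.
# Assumption: both columns are in KiBytes.
dot_procs = {
    79272: (157589964, 41424308),
    79279: (157601740, 41513520),
    79289: (157589964, 41361600),
    79301: (157667276, 41208476),
}

RAM_GIB = 192
RAM_PLUS_SWAP_GIB = 704


def kib_to_gib(kib):
    """Convert a KiByte count to GiBytes."""
    return kib / KIB_PER_GIB


total_vsz_kib = sum(vsz for vsz, _ in dot_procs.values())
total_rss_kib = sum(rss for _, rss in dot_procs.values())

if __name__ == "__main__":
    print(f"dot RSS total: {kib_to_gib(total_rss_kib):.1f} GiBytes of {RAM_GIB} GiBytes RAM")
    print(f"dot VSZ total: {kib_to_gib(total_vsz_kib):.1f} GiBytes vs. "
          f"{RAM_PLUS_SWAP_GIB} GiBytes RAM+SWAP")
```

Under that unit assumption the four processes together come to roughly 158 GiBytes resident and roughly 601 GiBytes of virtual size at the time of the dump.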

Dumb typo, missing the "u" in "gpu", so: graphics/sdl2_gpu

===
Mark Millard
marklmi at yahoo.com

