Hello George,

Thanks for the suggestion.

On our system vm.max_map_count is currently 64k.

I should also mention that memory overcommit is disabled (vm.overcommit_memory = 2).
If we change this setting to allow memory overcommit (vm.overcommit_memory = 0
or 1), the issue disappears.
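For completeness, this is a sketch of how we flip the setting when testing (runtime-only change, requires root; it is not persisted across reboots):

```shell
# Show the current overcommit policy (2 = strict accounting, no overcommit)
sysctl vm.overcommit_memory

# Temporarily allow overcommit for a test run (0 = heuristic, 1 = always)
sudo sysctl -w vm.overcommit_memory=0

# Restore strict accounting afterwards
sudo sysctl -w vm.overcommit_memory=2
```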

However, it still seems surprising that a simple “Hello World”
application with ~20 MPI processes already triggers this behaviour.
Given that the system has 192 GB of RAM, it is not obvious why startup
would fail to allocate memory at such a small scale.

Ideally we would prefer to keep memory overcommit disabled, since strict
accounting surfaces memory issues in user applications early, at allocation
time, rather than letting them fail later at runtime.

Is there a way to influence this memory-exhausting behaviour with some settings
in Open MPI (or its UCX components)?
So far we have also experimented with adjusting the UCX FIFO sizes, since the
defaults changed in newer UCX releases.
In particular we tried restoring the older values used in UCX 1.15:
UCX_POSIX_FIFO_SIZE=64, UCX_SYSV_FIFO_SIZE=64, UCX_XPMEM_FIFO_SIZE=64.
Unfortunately this did not resolve the issue.
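For reference, a sketch of how we pass these variables through to all ranks,
using Open MPI's mpirun -x option (process count and binary name here are
placeholders, not our exact invocation):

```shell
# Restore the UCX 1.15-era shared-memory FIFO sizes and forward them to all ranks
export UCX_POSIX_FIFO_SIZE=64
export UCX_SYSV_FIFO_SIZE=64
export UCX_XPMEM_FIFO_SIZE=64
mpirun -np 24 \
    -x UCX_POSIX_FIFO_SIZE -x UCX_SYSV_FIFO_SIZE -x UCX_XPMEM_FIFO_SIZE \
    ./hello_world
```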

Are there other UCX parameters (e.g. related to shared-memory transports, 
rcache behaviour, or memtype cache) or 
Open MPI MCA parameters that could reduce the number of memory mappings or the 
amount of virtual memory reserved
during startup?
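In case it helps the diagnosis, here is a small helper we can use (our own
sketch, not an Open MPI or UCX tool) to compare a process's current number of
memory mappings against the kernel's per-process limit:

```shell
# Count a process's virtual memory mappings and compare against the
# per-process limit (vm.max_map_count). Pass a PID, or default to "self".
pid=${1:-self}
maps=$(wc -l < "/proc/$pid/maps")          # one line per mapping
limit=$(cat /proc/sys/vm/max_map_count)    # kernel per-process mapping limit
echo "process $pid: $maps mappings (limit: $limit)"
```

Running this against a hanging or about-to-fail rank would show whether the
mapping count actually approaches the 64k limit.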

Any suggestions for further debugging or configuration options to try would be 
highly appreciated.

Best regards,
Christoph

----- Original Message -----
From: "Open MPI Users" <[email protected]>
To: "Open MPI Users" <[email protected]>
Sent: Wednesday, 4 March, 2026 17:19:00
Subject: Re: [OMPI users] "Cannot allocate memory" / pgtable failure with Open
MPI and UCX 1.16 or newer

It looks like some form of resource exhaustion, possibly exceeding the
number of entries into the mmap table. What is the value of
`vm.max_map_count` on this system ? You can obtain it with `sysctl
vm.max_map_count` or `cat /proc/sys/vm/max_map_count`.

  George


On Wed, Mar 4, 2026 at 4:25 AM Christoph Niethammer <[email protected]>
wrote:

> Dear all,
>
> We are hitting the following error when running a simple Open MPI “Hello
> World” with UCX 1.16 or newer and Open MPI 5.0.x and some 4.1.5+ versions
> on a single node:
>
> rcache.c:248  UCX  ERROR   mmap(size=151552) failed: Cannot allocate memory
> pgtable.c:75   Fatal: Failed to allocate page table directory
> *** Process received signal ***
> Signal: Aborted (6)
> Signal code:  (-6)
>
> This is on CentOS 8.10, kernel 4.18, 192 GB RAM, Intel Xeon Gold 6138
> (dual-socket Skylake, 40 cores). The failure is reproducible only when
> using more than 20-24 MPI ranks; fewer than 20 ranks work fine. Older UCX
> versions on the same system (e.g. 1.12) do not show this issue.
>
> The issue also goes away if we run Open MPI with the ob1 PML (without UCX)
> or disable for the UCX PML some of the TLS with UCX_TLS=^shm or UCX_TLS=^ib.
>
> Has anyone seen similar "mmap failed / Failed to allocate page table
> directory" errors with UCX > 1.15 and Open MPI 4.1.x/5.0.x, or is aware of
> known regressions or configuration pitfalls (e.g. rcache, huge pages,
> memtype cache, or other UCX/Open MPI memory-related settings)? Are there
> specific UCX environment variables or OMPI MCA parameters you would
> recommend trying to diagnose this further?
>
> I can provide full ompi_info, ucx_info, build options, and more complete
> logs if that is helpful.
>
>
> Many thanks in advance for any hints or suggestions.
>
>
> Best regards,
> Christoph Niethammer
>
> --
>
> Dr.-Ing. Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
>
> Tel: ++49(0)711-685-87203
> email: [email protected]
> https://www.hlrs.de/people/christoph-niethammer
>
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
>
>
