The Memory Hot Unplug section on the wiki at https://pve.proxmox.com/wiki/Hotplug_(qemu_disk,nic,cpu,memory)#Memory_Hot_Unplug is somewhat outdated and subtly dangerous.

It still refers to the now-removed CONFIG_MOVABLE_NODE, even though movable_node has been available on at least Debian stable for two years now. More importantly, blindly setting movable_node leads to system instability if the VM has a nontrivial amount of memory.

For example, I had a stock Debian VM running PostgreSQL with 16GB of RAM and "memhp_default_state=online movable_node" added to the kernel command line. The OOM killer got invoked regularly when postgres pulled lots of disk into the filesystem cache all at once for large queries, despite there being a GB or two of available memory. When I chatted with the #mm folks on OFTC, they pointed out that this is generally expected behavior: the result is zero memory in the Normal zone, which forces all kernel allocations aside from pagecache into the bottom 1GiB of RAM, and that can easily run out and lead to OOM kills.
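
For anyone who wants to check whether a guest is in this state, here is a rough sketch (not something from the wiki or from the #mm discussion) that parses /proc/zoneinfo and reports how many managed pages each zone has. The function names are my own, and it assumes the usual "Node N, zone NAME" headers followed by an indented "managed" line.

```python
import re
from pathlib import Path

# Assumed layout of /proc/zoneinfo: a "Node N, zone NAME" header per zone,
# followed by indented counters including a "managed <pages>" line.
ZONE_HEADER = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")
MANAGED = re.compile(r"^\s+managed\s+(\d+)")


def managed_pages_by_zone(zoneinfo_text):
    """Return {(node, zone): managed_pages} parsed from /proc/zoneinfo text."""
    result = {}
    current = None
    for line in zoneinfo_text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            continue
        managed = MANAGED.match(line)
        if managed and current is not None:
            result[current] = int(managed.group(1))
    return result


def kernel_usable_share(pages_by_zone):
    """Fraction of managed pages that sit outside the Movable zone."""
    total = sum(pages_by_zone.values())
    non_movable = sum(
        pages for (_, zone), pages in pages_by_zone.items() if zone != "Movable"
    )
    return non_movable / total if total else 0.0


if __name__ == "__main__":
    zoneinfo = Path("/proc/zoneinfo")
    if zoneinfo.exists():
        zones = managed_pages_by_zone(zoneinfo.read_text())
        for (node, zone), pages in sorted(zones.items()):
            print(f"node {node} zone {zone}: {pages} managed pages")
        print(f"non-movable share: {kernel_usable_share(zones):.1%}")
```

If the Normal zone shows zero managed pages and the non-movable share is only a few percent, the guest is likely in the situation described above.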

In fact, movable_node's documentation even says "This means that the memory of such nodes will be usable only for movable allocations which rules out almost all kernel allocations. Use with caution!"

The hotplug guide at https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html suggests a better option, memory_hotplug.online_policy=auto-movable, which maintains a target ratio between the movable and normal zones and fixes the issue, but sadly Proxmox doesn't handle it with its automagic hotplug. With auto-movable, Linux (at least 6.12.21) doesn't pick the last DIMMs to make movable, but rather (sometimes?) picks ones in the middle of the range (e.g. the VM I'm looking at is making memory zones 32-67/183 movable, with the rest Normal/DMA32). If I go into the qemu monitor and device_del DIMMs in the lower range, however, they get removed fine.
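
To see which memory ended up movable inside a guest, something like the following sketch can help. It reads the per-block `state` and `valid_zones` files under /sys/devices/system/memory and collapses the block indices into ranges per zone. The helper names are made up for illustration, and it assumes that an online block's `valid_zones` starts with the zone the block belongs to.

```python
from pathlib import Path

MEMORY_SYSFS = Path("/sys/devices/system/memory")


def read_block_zones(root=MEMORY_SYSFS):
    """Return {block_index: zone} for every online memory block under root."""
    zones = {}
    for block in root.glob("memory[0-9]*"):
        try:
            if (block / "state").read_text().strip() != "online":
                continue
            fields = (block / "valid_zones").read_text().split()
        except OSError:
            continue  # block vanished or file unreadable; skip it
        if fields:
            # Assumption: the first entry is the zone of an online block.
            zones[int(block.name[len("memory"):])] = fields[0]
    return zones


def contiguous_ranges(indices):
    """Collapse block indices into inclusive (first, last) ranges."""
    ranges = []
    for index in sorted(indices):
        if ranges and index == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], index)
        else:
            ranges.append((index, index))
    return ranges


def ranges_by_zone(zones):
    """Group {block_index: zone} into {zone: [(first, last), ...]}."""
    grouped = {}
    for index, zone in zones.items():
        grouped.setdefault(zone, []).append(index)
    return {zone: contiguous_ranges(ix) for zone, ix in grouped.items()}


if __name__ == "__main__":
    for zone, ranges in sorted(ranges_by_zone(read_block_zones()).items()):
        print(zone, ", ".join(f"{a}-{b}" for a, b in ranges))
```

This only reports memory blocks as the guest kernel sees them. Mapping a block back to the DIMM that one would pass to device_del is not covered here.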

It seems like the wiki should be updated to mention both the drawbacks of `movable_node` and the `online_policy` alternative, and ideally the auto-hotunplug logic should try more than just the highest DIMM.

Thanks,
Matt
