Hi,

In a typical environment, the number of huge pages on the instance can
be set to the value reported by shared_memory_size_in_huge_pages.
Postgres can then be started, and the requested shared memory fits in
the available huge pages.

A similar approach is harder to implement in environments like
Kubernetes. If I want to modify the huge pages on a pod, I need to:
- Modify the host's huge pages
- Restart the host's kubelet so it detects the new amount of huge pages
- Modify the pod's huge page request

Most of those steps are far from practical. An alternative would be to
have a fixed number of huge pages (like 25% of the node's memory), and
to adjust the configuration, like the amount of shared_buffers.
However, adjusting the configuration to fit in a fixed amount of
memory is tricky:
- shared_buffers is used to auto-tune multiple parameters so there's
no easy formula to get the correct amount. The only way I've found is
to basically increase shared_buffers until
shared_memory_size_in_huge_pages matches the desired amount of huge
pages
- changing other parameters like max_connections means shared_buffers
has to be adjusted again

To help with that, the attached patch provides a new option,
huge_pages_autotune_buffers, to automatically use leftover huge pages
as shared_buffers. This requires some changes in the auto-tune logic:
- Subsystems that use shared_buffers for auto-tuning will rely on the
configured shared_buffers, not the auto-tuned shared_buffers, and
should save their auto-tuned value in a GUC. This will be done in
dedicated auto-tune functions.
- Once the auto-tune functions are called, modifying NBuffers won't
change the requested memory except for the shared buffer pool in
BufferManagerShmemSize
- We can get the leftover memory (free huge pages - requested memory),
and estimate how much shared_buffers we can add
- Increasing shared_buffers will also increase the size of the freelist
hashmap, so the auto-tuned shared_buffers needs to be reduced to leave
room for it
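
The leftover computation in the list above can be sketched roughly as
follows. This is an illustration, not code from the patch:
estimate_extra_buffers and all of its parameters are hypothetical
names, and per_buffer_overhead is an assumed lump sum for whatever
per-buffer metadata is allocated next to each block.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate how many extra buffers fit in the huge
 * pages that remain once the already-requested shared memory is
 * accounted for.
 *
 * leftover = free huge pages * huge page size - requested memory
 * extra    = leftover / (block size + per-buffer overhead)
 */
static int64_t
estimate_extra_buffers(int64_t free_huge_pages, int64_t huge_page_size,
                       int64_t requested_bytes, int64_t block_size,
                       int64_t per_buffer_overhead)
{
    int64_t     available;

    assert(free_huge_pages >= 0 && huge_page_size > 0);
    assert(requested_bytes >= 0 && block_size > 0);
    assert(per_buffer_overhead >= 0);

    available = free_huge_pages * huge_page_size;

    /* Nothing is left over if the request already uses every page. */
    if (available <= requested_bytes)
        return 0;

    return (available - requested_bytes) /
        (block_size + per_buffer_overhead);
}
```

For example, with 1024 free 2MB huge pages and 1GB already requested,
1GB is left over. With 8kB blocks and no overhead that is room for
131072 more buffers; any per-buffer overhead lowers the count.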

The patch is split into the following sub-patches:

0001: Extract the current auto-tune logic into dedicated functions,
making the behaviour more consistent across subsystems.

0002: The checkpointer auto-tunes the request size using NBuffers, but
doesn't save the result in a GUC. This adds a new
checkpoint_request_size GUC with the same auto-tune logic.

0003: Extract HugePages_Free value when /proc/meminfo is parsed in
GetHugePageSize.
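
As a rough illustration of the extra field being read, here is a
minimal sketch that pulls HugePages_Free out of meminfo-formatted text.
It is not the code from 0003: parse_hugepages_free is a hypothetical
helper, and it works on an in-memory string rather than on the file
itself so that it can be exercised without /proc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan meminfo-formatted text line by line for a
 * "HugePages_Free: N" entry. Returns 1 and stores N in *free_pages
 * when the entry is found, 0 otherwise.
 */
static int
parse_hugepages_free(const char *meminfo, long *free_pages)
{
    const char *line = meminfo;

    assert(meminfo != NULL && free_pages != NULL);

    while (*line != '\0')
    {
        const char *eol = strchr(line, '\n');

        /* The literal prefix only matches at the start of a line. */
        if (sscanf(line, "HugePages_Free: %ld", free_pages) == 1)
            return 1;

        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return 0;
}
```

Note that HugePages_Free is a count of pages, unlike Hugepagesize,
which is reported in kB.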

0004: Pass NBuffers as a parameter to StrategyShmemSize. This is
necessary to get how much memory will be used by the freelist using
'StrategyShmemSize(candidate_nbuffers) - StrategyShmemSize(NBuffers)'.
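
The way this difference feeds back into the buffer count can be
sketched as below. This is an assumption-laden illustration, not the
logic of the patch: toy_strategy_shmem_size is a made-up linear model
standing in for StrategyShmemSize, the TOY_ constants are invented,
and fit_extra_buffers is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BLCKSZ      8192    /* assumed block size */
#define TOY_ENTRY_SIZE  64      /* invented per-buffer freelist cost */
#define TOY_FIXED_SIZE  128     /* invented fixed freelist cost */

/* Made-up stand-in for StrategyShmemSize(nbuffers). */
static size_t
toy_strategy_shmem_size(size_t nbuffers)
{
    return TOY_FIXED_SIZE + nbuffers * TOY_ENTRY_SIZE;
}

/*
 * Hypothetical helper: start from the optimistic count that ignores
 * the freelist, then shrink it until the buffer pool plus the freelist
 * growth fits in the leftover memory.
 */
static size_t
fit_extra_buffers(size_t nbuffers, size_t leftover_bytes)
{
    size_t      extra = leftover_bytes / TOY_BLCKSZ;

    while (extra > 0)
    {
        size_t      pool = extra * TOY_BLCKSZ;
        size_t      delta = toy_strategy_shmem_size(nbuffers + extra) -
            toy_strategy_shmem_size(nbuffers);

        if (pool + delta <= leftover_bytes)
            break;
        extra--;                /* one buffer fewer, then re-check */
    }
    return extra;
}
```

With 819200 bytes left over, the optimistic count is 100 buffers, but
the freelist growth for 100 buffers does not fit, so the sketch settles
on 99.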

0005: Add BufferManagerAutotune to auto-tune the amount of shared_buffers.

Regards,
Anthonin Bonnefoy

Attachment: v1-0003-Extract-HugePages_Free-value-in-GetHugePageSize.patch
Description: Binary data

Attachment: v1-0004-Pass-NBuffers-as-parameter-to-StrategyShmemSize.patch
Description: Binary data

Attachment: v1-0005-Auto-tune-shared_buffers-to-use-available-huge-pa.patch
Description: Binary data

Attachment: v1-0002-Add-GUC-for-checkpointer-request-queue-size.patch
Description: Binary data

Attachment: v1-0001-Create-dedicated-shmem-Autotune-functions.patch
Description: Binary data
