On 2024-10-22 03:24, Rahila Syed wrote:
Hi,

PostgreSQL provides following capabilities for reporting memory
contexts statistics.
1. pg_get_backend_memory_contexts(); [1]
2. pg_log_backend_memory_contexts(pid); [2]

[1]  provides a view of memory context statistics for a local backend,
while [2] prints the memory context statistics of any backend or
auxiliary
process to the PostgreSQL logs. Although [1] offers detailed
statistics,
it is limited to the local backend, restricting its use to PostgreSQL
client backends only.
On the other hand, [2] provides the statistics for all backends but
logs them in a file,
which may not be convenient for quick access.

I propose enhancing memory context statistics reporting by combining
these
capabilities and offering a view of memory statistics for all
PostgreSQL backends
and auxiliary processes.

Thanks for working on this!

I originally tried to develop something like your proposal in [2], but there were some difficulties and settled down to implement pg_log_backend_memory_contexts().

Attached is a patch that implements this functionality. It introduces
a SQL function
that takes the PID of a backend as an argument, returning a set of
records,
each containing statistics for a single memory context. The underlying
C function
sends a signal to the backend and waits for it to publish its memory
context statistics
 before returning them to the user. The publishing backend copies
these statistics
during the next CHECK_FOR_INTERRUPTS call.

I remember waiting for dumping memory contexts stats could cause trouble considering some erroneous cases.

For example, just after the target process finished dumping stats, pg_get_remote_backend_memory_contexts() caller is terminated before reading the stats, calling pg_get_remote_backend_memory_contexts() has no response any more:

[session1]$ psql
           (40699)=#

$ kill -s SIGSTOP 40699

[session2] psql
(40866)=# select * FROM pg_get_remote_backend_memory_contexts('40699', false); -- waiting

$ kill -s SIGSTOP 40866

$ kill -s SIGCONT 40699

[session3] psql
           (47656) $ select pg_terminate_backend(40866);

$ kill -s SIGCONT 40866 -- session2 terminated

[session3] (47656)=# select * FROM pg_get_remote_backend_memory_contexts('47656', false); -- no response

It seems the reason is memCtxState->in_use is now and memCtxState->proc_id is 40699. We can continue to use pg_get_remote_backend_memory_contexts() after specifying 40699, but it'd be hard to understand for users.

This approach facilitates on-demand publication of memory statistics
for a specific backend, rather than collecting them at regular
intervals.
Since past memory context statistics may no longer be relevant,
there is little value in retaining historical data. Any collected
statistics
can be discarded once read by the client backend.

A fixed-size shared memory block, currently accommodating 30 records,
 is used to store the statistics. This number was chosen arbitrarily,
 as it covers all parent contexts at level 1 (i.e., direct children of
the top memory context)
based on my tests.
Further experiments are needed to determine the optimal number
for summarizing memory statistics.

Any additional statistics that exceed the shared memory capacity
are written to a file per backend in the PG_TEMP_FILES_DIR. The client
backend
 first reads from the shared memory, and if necessary, retrieves the
remaining data from the file,
combining everything into a unified view. The files are cleaned up
automatically
if a backend crashes or during server restarts.

The statistics are reported in a breadth-first search order of the
memory context tree,
 with parent contexts reported before their children. This provides a
cumulative summary
before diving into the details of each child context's consumption.

The rationale behind the shared memory chunk is to ensure that the
majority of contexts which are the direct children of
TopMemoryContext,
fit into memory
This allows a client to request a summary of memory statistics,
which can be served from memory without the overhead of file access,
unless necessary.

A publishing backend signals waiting client backends using a condition

variable when it has finished writing its statistics to memory.
The client backend checks whether the statistics belong to the
requested backend.
If not, it continues waiting on the condition variable, timing out
after 2 minutes.
This timeout is an arbitrary choice, and further work is required to
determine
a more practical value.

All backends use the same memory space to publish their statistics.
Before publishing, a backend checks whether the previous statistics
have been
successfully read by a client using a shared flag, "in_use."
This flag is set by the publishing backend and cleared by the client
backend once the data is read. If a backend cannot publish due to
shared
memory being occupied, it exits the interrupt processing code,
and the client backend times out with a warning.

Please find below an example query to fetch memory contexts from the
backend
 with id '106114'. Second argument -'get_summary' is 'false',
indicating a request for statistics of all the contexts.

postgres=#
select * FROM pg_get_remote_backend_memory_contexts('116292', false)
LIMIT 2;
-[ RECORD 1 ]-+----------------------
name          | TopMemoryContext
ident         |
type          | AllocSet
path          | {0}
total_bytes   | 97696
total_nblocks | 5
free_bytes    | 15376
free_chunks   | 11
used_bytes    | 82320
pid           | 116292
-[ RECORD 2 ]-+----------------------
name          | RowDescriptionContext
ident         |
type          | AllocSet
path          | {0,1}
total_bytes   | 8192
total_nblocks | 1
free_bytes    | 6912
free_chunks   | 0
used_bytes    | 1280
pid           | 116292

32d3ed8165f821f introduced 1-based path to pg_backend_memory_contexts, but pg_get_remote_backend_memory_contexts() seems to have 0-base path.

pg_backend_memory_contexts has "level" column, but pg_get_remote_backend_memory_contexts doesn't.

Are there any reasons for these?


TODO:
1. Determine the behaviour when the statistics don't fit in one file.

[1] PostgreSQL: Re: Creating a function for exposing memory usage of
backend process [1]

[2] PostgreSQL: Re: Get memory contexts of an arbitrary backend
process [2]

Thank you,
Rahila Syed



Links:
------
[1] https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2F0a768ae1-1703-59c7-86cc-7068ff5e318c%2540oss.nttdata.com&data=05%7C02%7Csyedrahila%40microsoft.com%7C3b35e97c29cf4796042408dcee8a4dbb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638647525436604911%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=cbO2DBP6IsgMPTEVFNh%2FKeq4IoK3MZvTpzKkCQzNPMo%3D&reserved=0 [2] https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fbea016ad-d1a7-f01d-a7e8-01106a1de77f%2540oss.nttdata.com&data=05%7C02%7Csyedrahila%40microsoft.com%7C3b35e97c29cf4796042408dcee8a4dbb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638647525436629740%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=UCwkwg6kikVEf0oHf3%2BlliA%2FTUdMG%2F0cOiMta7fjPPk%3D&reserved=0

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.


Reply via email to