On Thu, Jun 30, 2022 at 11:05:28PM +0300, Michael Zhilin wrote: > I would like to submit patch for column wait_event of view pg_stat_activity. > Please find it attached.
Thanks. I had this thread on my list to look at later but then, by chance, I got an erroneous nagios notification about a longrunning transaction, which seems to have been caused by this bug. You can reproduce the problem by running a couple handsful of these: psql -h /tmp postgres -Atc "SELECT * FROM pg_stat_activity WHERE state='active' AND wait_event='ClientRead' LIMIT 1"& > As result backend wait events may be changed for quick queries. Yep. I realize now that I've been seeing this at one of our customers for a pkey lookup query. We don't use high max_connections, but if you have a lot of connections, you might be more likely to hit this. I agree that this is a bug, since it can (and did) cause false positives in a monitoring system. > As well, patch changes way to allocate memory for local structure. Before it > estimates maximum size of required memory and allocate it at once. It could > result into allocation of dozens/hundreds of megabytes for nothing. Now it > allocates memory by chunks to reduce overall amount of allocated memory and > reduce time for allocation. I suggest to present this as two patches: a 0001 patch to fix the bug, and proposed for backpatch, and an 0002 patch for master to improve memory usage. As attached. Actually, once 0001 is resolved, it may be good to start a separate thread for 0002. I plan to add to the next CF. Did you really experience latency before the patch ? It seems like max_connections=1000 would only allocate a bit more than 1MB. For me, max_connections=10000 makes pg_stat_activity 10% slower for me than 100 (11.8s vs 13.2s). With the patch, 10k is only about ~3% slower than 100. So there is an improvement, but people here may not be enthusiastic excited about improving performance for huge values of max_connections. time for a in `seq 1 99`; do psql -h /tmp postgres -Atc "SELECT FROM pg_stat_activity -- WHERE state='active' AND wait_event='ClientRead' LIMIT 1"; done Is there a reason why you made separate allocations for appname, hostname, activity, ssl, gss (I know that's how it's done originally, too) ? Couldn't it be a single allocation ? You could do a single (re)allocation when an active backend is found if the amount of free space is less than 2*NAMEDATALEN+track_activity_query_size+sizeof(PgBackendSSLStatus)+sizeof(PgBackendGSSStatus) I have a patch for that, but I'll share it after 0001 is resolved. Why did you change backendStatus to a pointer ? Is it to minimize the size of the structure to allow for huge values of max_connections ? Note that there were warnings from your 0002: backend_status.c:723:21: warning: ‘localactivity_thr’ may be used uninitialized in this function [-Wmaybe-uninitialized] Thanks, -- Justin
>From 590d9b545805f99b80103bfe900d175ad1d0668c Mon Sep 17 00:00:00 2001 From: Michael Zhilin <m.zhi...@postgrespro.ru> Date: Thu, 10 Mar 2022 17:23:11 +0300 Subject: [PATCH 1/2] fix column wait_event of view pg_stat_activity In case of high amount of connections, we have observed value 'ClientRead' of column wait_event for active connections. Actually backend sets wait_event as 'ClientRead' only in case of idle or idle in transaction state. The reason of discrepancy is different time moments of gathering information from internal structures. At first it gathers state information for all backends. Then it iterates over connections to gather wait information for each backend. Before this patch the columns 'state' and 'wait_event' of view pg_stat_activity may contain contadictive values. The aim of change is to make wait information correct. To achieve it the state and wait information should be gathered at same time. Patch makes no change of system behaviour, but makes pg_stat_activity view more consistent. Tested on CentOS 7.9 / Debian 11 / FreeBSD 14 with 8-32 cores. Co-authored-by: Yura Sokolov <y.soko...@postgrespro.ru> --- src/backend/utils/activity/backend_status.c | 18 +++++++++ src/backend/utils/adt/pgstatfuncs.c | 45 ++++----------------- src/include/utils/backend_status.h | 10 +++++ 3 files changed, 36 insertions(+), 37 deletions(-) diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 415604db322..64567ff5cd7 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -845,6 +845,24 @@ pgstat_read_current_status(void) /* Only valid entries get included into the local array */ if (localentry->backendStatus.st_procpid > 0) { + PGPROC *proc = BackendIdGetProc(i); + if (proc != NULL) + { + PGPROC *leader = proc->lockGroupLeader; + + /* + * Show the leader only for active parallel workers. This + * leaves the field as NULL for the leader of a parallel + * group. + */ + if (leader && leader->pid != localentry->backendStatus.st_procpid) + localentry->leader_pid = leader->pid; + else + localentry->leader_pid = 0; + + localentry->wait_event_info = proc->wait_event_info; + } + BackendIdGetTransactionIds(i, &localentry->backend_xid, &localentry->backend_xmin); diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index db061754c8b..82f169a2d8b 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -555,7 +555,6 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) bool nulls[PG_STAT_GET_ACTIVITY_COLS]; LocalPgBackendStatus *local_beentry; PgBackendStatus *beentry; - PGPROC *proc; const char *wait_event_type = NULL; const char *wait_event = NULL; @@ -652,46 +651,18 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) pfree(clipped_activity); /* leader_pid */ - nulls[28] = true; - - proc = BackendPidGetProc(beentry->st_procpid); - - if (proc == NULL && (beentry->st_backendType != B_BACKEND)) + if (local_beentry->leader_pid) { - /* - * For an auxiliary process, retrieve process info from - * AuxiliaryProcs stored in shared-memory. - */ - proc = AuxiliaryPidGetProc(beentry->st_procpid); + values[28] = Int32GetDatum(local_beentry->leader_pid); + nulls[28] = false; } + else + nulls[28] = true; - /* - * If a PGPROC entry was retrieved, display wait events and lock - * group leader information if any. To avoid extra overhead, no - * extra lock is being held, so there is no guarantee of - * consistency across multiple rows. - */ - if (proc != NULL) + if (local_beentry->wait_event_info) { - uint32 raw_wait_event; - PGPROC *leader; - - raw_wait_event = UINT32_ACCESS_ONCE(proc->wait_event_info); - wait_event_type = pgstat_get_wait_event_type(raw_wait_event); - wait_event = pgstat_get_wait_event(raw_wait_event); - - leader = proc->lockGroupLeader; - - /* - * Show the leader only for active parallel workers. This - * leaves the field as NULL for the leader of a parallel - * group. - */ - if (leader && leader->pid != beentry->st_procpid) - { - values[28] = Int32GetDatum(leader->pid); - nulls[28] = false; - } + wait_event_type = pgstat_get_wait_event_type(local_beentry->wait_event_info); + wait_event = pgstat_get_wait_event(local_beentry->wait_event_info); } if (wait_event_type) diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h index 7403bca25ed..8f890911486 100644 --- a/src/include/utils/backend_status.h +++ b/src/include/utils/backend_status.h @@ -258,6 +258,16 @@ typedef struct LocalPgBackendStatus * not. */ TransactionId backend_xmin; + + /* + * The process wait information + */ + uint32 wait_event_info; + + /* + * The leader process ID + */ + int leader_pid; } LocalPgBackendStatus; -- 2.17.1
>From d122170f942932f9d747f8437a8089990665457a Mon Sep 17 00:00:00 2001 From: Michael Zhilin <m.zhi...@postgrespro.ru> Date: Thu, 10 Mar 2022 17:23:11 +0300 Subject: [PATCH 2/2] wip: pg_stat_activity: allocate memory differently This patch changes the way memory is allocated for temporary (aka local state) structures. In the case of large values of max_connections (over 1000), memory was allocated to accommodate the worst-case number of active backends which may be wasteful and cause increased latency. The patch makes pg_stat_activity view a bit more efficient by allocating memory as needed for active backends. It's still allocated in chunks. Co-authored-by: Yura Sokolov <y.soko...@postgrespro.ru> --- src/backend/utils/activity/backend_status.c | 107 ++++++++++++++------ src/backend/utils/adt/pgstatfuncs.c | 4 +- src/include/utils/backend_status.h | 2 +- 3 files changed, 77 insertions(+), 36 deletions(-) diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 64567ff5cd7..9fbe8a601db 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -25,6 +25,7 @@ #include "utils/guc.h" /* for application_name */ #include "utils/memutils.h" +#define UINT32_ACCESS_ONCE(var) ((uint32)(*((volatile uint32 *)&(var)))) /* ---------- * Total number of backends including auxiliary @@ -716,6 +717,24 @@ pgstat_report_xact_timestamp(TimestampTz tstamp) PGSTAT_END_WRITE_ACTIVITY(beentry); } +#define PGSTAT_LOCALDATA_CHUNK 128 +#ifdef HAVE_TYPEOF +#define PGSTAT_LOCALDATA_ALLOC(x,cnt) \ + if (x == NULL || x > x##_thr) { \ + x = (typeof(x)) \ + MemoryContextAlloc(backendStatusSnapContext, \ + cnt * sizeof(x[0]) * PGSTAT_LOCALDATA_CHUNK); \ + x##_thr = x + cnt * (PGSTAT_LOCALDATA_CHUNK - 1); \ + } +#else +#define PGSTAT_LOCALDATA_ALLOC(x,cnt) \ + if (x == NULL || x > x##_thr) { \ + x = MemoryContextAlloc(backendStatusSnapContext, \ + cnt * sizeof(x[0]) * PGSTAT_LOCALDATA_CHUNK); \ + x##_thr = x + cnt * (PGSTAT_LOCALDATA_CHUNK - 1); \ + } +#endif + /* ---------- * pgstat_read_current_status() - * @@ -729,16 +748,24 @@ pgstat_read_current_status(void) volatile PgBackendStatus *beentry; LocalPgBackendStatus *localtable; LocalPgBackendStatus *localentry; + PgBackendStatus *localbeentry, + *localbeentry_thr; char *localappname, + *localappname_thr, *localclienthostname, - *localactivity; + *localclienthostname_thr, + *localactivity, + *localactivity_thr; #ifdef USE_SSL - PgBackendSSLStatus *localsslstatus; + PgBackendSSLStatus *localsslstatus, + *localsslstatus_thr; #endif #ifdef ENABLE_GSS - PgBackendGSSStatus *localgssstatus; + PgBackendGSSStatus *localgssstatus, + *localgssstatus_thr; #endif int i; + int pid; if (localBackendStatusTable) return; /* already done */ @@ -756,32 +783,39 @@ pgstat_read_current_status(void) localtable = (LocalPgBackendStatus *) MemoryContextAlloc(backendStatusSnapContext, sizeof(LocalPgBackendStatus) * NumBackendStatSlots); - localappname = (char *) - MemoryContextAlloc(backendStatusSnapContext, - NAMEDATALEN * NumBackendStatSlots); - localclienthostname = (char *) - MemoryContextAlloc(backendStatusSnapContext, - NAMEDATALEN * NumBackendStatSlots); - localactivity = (char *) - MemoryContextAllocHuge(backendStatusSnapContext, - pgstat_track_activity_query_size * NumBackendStatSlots); + #ifdef USE_SSL - localsslstatus = (PgBackendSSLStatus *) - MemoryContextAlloc(backendStatusSnapContext, - sizeof(PgBackendSSLStatus) * NumBackendStatSlots); + localsslstatus = NULL; #endif #ifdef ENABLE_GSS - localgssstatus = (PgBackendGSSStatus *) - MemoryContextAlloc(backendStatusSnapContext, - sizeof(PgBackendGSSStatus) * NumBackendStatSlots); + localgssstatus = NULL; #endif localNumBackends = 0; + localbeentry = NULL; + localappname = NULL; + localclienthostname = NULL; + localactivity = NULL; beentry = BackendStatusArray; localentry = localtable; for (i = 1; i <= NumBackendStatSlots; i++) { + /* + * Check available space in buffers for local state data + */ + PGSTAT_LOCALDATA_ALLOC(localbeentry, 1); + PGSTAT_LOCALDATA_ALLOC(localappname, NAMEDATALEN); + PGSTAT_LOCALDATA_ALLOC(localclienthostname, NAMEDATALEN); + PGSTAT_LOCALDATA_ALLOC(localactivity, pgstat_track_activity_query_size); +#ifdef USE_SSL + PGSTAT_LOCALDATA_ALLOC(localsslstatus, 1); +#endif +#ifdef ENABLE_GSS + PGSTAT_LOCALDATA_ALLOC(localgssstatus, 1); +#endif + + localentry->backendStatus = localbeentry; /* * Follow the protocol of retrying if st_changecount changes while we * copy the entry, or if it's odd. (The check for odd is needed to @@ -796,11 +830,11 @@ pgstat_read_current_status(void) pgstat_begin_read_activity(beentry, before_changecount); - localentry->backendStatus.st_procpid = beentry->st_procpid; + localentry->backendStatus->st_procpid = beentry->st_procpid; /* Skip all the data-copying work if entry is not in use */ - if (localentry->backendStatus.st_procpid > 0) + if (localentry->backendStatus->st_procpid > 0) { - memcpy(&localentry->backendStatus, unvolatize(PgBackendStatus *, beentry), sizeof(PgBackendStatus)); + memcpy(localentry->backendStatus, unvolatize(PgBackendStatus *, beentry), sizeof(PgBackendStatus)); /* * For each PgBackendStatus field that is a pointer, copy the @@ -811,23 +845,23 @@ pgstat_read_current_status(void) * because there's always a \0 at the end of the buffer. */ strcpy(localappname, (char *) beentry->st_appname); - localentry->backendStatus.st_appname = localappname; + localentry->backendStatus->st_appname = localappname; strcpy(localclienthostname, (char *) beentry->st_clienthostname); - localentry->backendStatus.st_clienthostname = localclienthostname; + localentry->backendStatus->st_clienthostname = localclienthostname; strcpy(localactivity, (char *) beentry->st_activity_raw); - localentry->backendStatus.st_activity_raw = localactivity; + localentry->backendStatus->st_activity_raw = localactivity; #ifdef USE_SSL if (beentry->st_ssl) { memcpy(localsslstatus, beentry->st_sslstatus, sizeof(PgBackendSSLStatus)); - localentry->backendStatus.st_sslstatus = localsslstatus; + localentry->backendStatus->st_sslstatus = localsslstatus; } #endif #ifdef ENABLE_GSS if (beentry->st_gss) { memcpy(localgssstatus, beentry->st_gssstatus, sizeof(PgBackendGSSStatus)); - localentry->backendStatus.st_gssstatus = localgssstatus; + localentry->backendStatus->st_gssstatus = localgssstatus; } #endif } @@ -842,8 +876,9 @@ pgstat_read_current_status(void) CHECK_FOR_INTERRUPTS(); } + pid = localentry->backendStatus->st_procpid; /* Only valid entries get included into the local array */ - if (localentry->backendStatus.st_procpid > 0) + if (pid > 0) { PGPROC *proc = BackendIdGetProc(i); if (proc != NULL) @@ -855,12 +890,17 @@ pgstat_read_current_status(void) * leaves the field as NULL for the leader of a parallel * group. */ - if (leader && leader->pid != localentry->backendStatus.st_procpid) + if (leader && leader->pid != localentry->backendStatus->st_procpid) localentry->leader_pid = leader->pid; else localentry->leader_pid = 0; - localentry->wait_event_info = proc->wait_event_info; + localentry->wait_event_info = UINT32_ACCESS_ONCE(proc->wait_event_info); + } + else + { + localentry->leader_pid = 0; + localentry->wait_event_info = 0; } BackendIdGetTransactionIds(i, @@ -868,9 +908,10 @@ pgstat_read_current_status(void) &localentry->backend_xmin); localentry++; - localappname += NAMEDATALEN; - localclienthostname += NAMEDATALEN; - localactivity += pgstat_track_activity_query_size; + localbeentry++; + localappname += strlen(localappname) + 1; + localclienthostname += strlen(localclienthostname) + 1; + localactivity += strlen(localactivity) + 1; #ifdef USE_SSL localsslstatus++; #endif @@ -1083,7 +1124,7 @@ pgstat_fetch_stat_beentry(int beid) if (beid < 1 || beid > localNumBackends) return NULL; - return &localBackendStatusTable[beid - 1].backendStatus; + return localBackendStatusTable[beid - 1].backendStatus; } diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 82f169a2d8b..d5f2448912b 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -500,7 +500,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS) if (!local_beentry) continue; - beentry = &local_beentry->backendStatus; + beentry = local_beentry->backendStatus; /* * Report values for only those backends which are running the given @@ -581,7 +581,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) continue; } - beentry = &local_beentry->backendStatus; + beentry = local_beentry->backendStatus; /* If looking for specific PID, ignore all the others */ if (pid != -1 && beentry->st_procpid != pid) diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h index 8f890911486..0b4b426e3aa 100644 --- a/src/include/utils/backend_status.h +++ b/src/include/utils/backend_status.h @@ -245,7 +245,7 @@ typedef struct LocalPgBackendStatus /* * Local version of the backend status entry. */ - PgBackendStatus backendStatus; + PgBackendStatus *backendStatus; /* * The xid of the current transaction if available, InvalidTransactionId -- 2.17.1