Hi,

On 2021-12-03 15:02:24 -0500, Melanie Plageman wrote:
> From e0f7f3dfd60a68fa01f3c023bcdb69305ade3738 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplage...@gmail.com>
> Date: Mon, 11 Oct 2021 16:15:06 -0400
> Subject: [PATCH v17 1/7] Read-only atomic backend write function
> 
> For counters in shared memory which can be read by any backend but only
> written to by one backend, an atomic is still needed to protect against
> torn values, however, pg_atomic_fetch_add_u64() is overkill for
> incrementing the counter. pg_atomic_inc_counter() is a helper function
> which can be used to increment these values safely but without
> unnecessary overhead.
>
> Author: Thomas Munro
> ---
>  src/include/port/atomics.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
> index 856338f161..39ffff24dd 100644
> --- a/src/include/port/atomics.h
> +++ b/src/include/port/atomics.h
> @@ -519,6 +519,17 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, 
> int64 sub_)
>       return pg_atomic_sub_fetch_u64_impl(ptr, sub_);
>  }
>  
> +/*
> + * On modern systems this is really just *counter++.  On some older systems
> + * there might be more to it, due to inability to read and write 64 bit 
> values
> + * atomically.
> + */
> +static inline void
> +pg_atomic_inc_counter(pg_atomic_uint64 *counter)
> +{
> +     pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1);
> +}

I wonder if it's worth putting something in the name indicating that this is
not actual atomic RMW operation. Perhaps adding _unlocked?



> From b0e193cfa08f0b8cf1be929f26fe38f06a39aeae Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplage...@gmail.com>
> Date: Wed, 24 Nov 2021 10:32:56 -0500
> Subject: [PATCH v17 2/7] Add IO operation counters to PgBackendStatus
> 
> Add an array of counters in PgBackendStatus which count the buffers
> allocated, extended, fsynced, and written by a given backend. Each "IO
> Op" (alloc, fsync, extend, write) is counted per "IO Path" (direct,
> local, shared, or strategy). "local" and "shared" IO Path counters count
> operations on local and shared buffers. The "strategy" IO Path counts
> buffers alloc'd/written/read/fsync'd as part of a BufferAccessStrategy.
> The "direct" IO Path counts blocks of IO which are read, written, or
> fsync'd using smgrwrite/extend/immedsync directly (as opposed to through
> [Local]BufferAlloc()).
> 
> With this commit, all backends increment a counter in their
> PgBackendStatus when performing an IO operation. This is in preparation
> for future commits which will persist these stats upon backend exit and
> use the counters to provide observability of database IO operations.
> 
> Note that this commit does not add code to increment the "direct" path.
> A separate proposed patch [1] which would add wrappers for smgrwrite(),
> smgrextend(), and smgrimmedsync() would provide a good location to call
> pgstat_inc_ioop() for unbuffered IO and avoid regressions for future
> users of these functions.
>
> [1] 
> https://www.postgresql.org/message-id/CAAKRu_aw72w70X1P%3Dba20K8iGUvSkyz7Yk03wPPh3f9WgmcJ3g%40mail.gmail.com

On longer thread it's nice for committers to already have Reviewed-By: in the
commit message.

> diff --git a/src/backend/utils/activity/backend_status.c 
> b/src/backend/utils/activity/backend_status.c
> index 7229598822..413cc605f8 100644
> --- a/src/backend/utils/activity/backend_status.c
> +++ b/src/backend/utils/activity/backend_status.c
> @@ -399,6 +399,15 @@ pgstat_bestart(void)
>       lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
>       lbeentry.st_progress_command_target = InvalidOid;
>       lbeentry.st_query_id = UINT64CONST(0);
> +     for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> +     {
> +             IOOps      *io_ops = &lbeentry.io_path_stats[io_path];
> +
> +             pg_atomic_init_u64(&io_ops->allocs, 0);
> +             pg_atomic_init_u64(&io_ops->extends, 0);
> +             pg_atomic_init_u64(&io_ops->fsyncs, 0);
> +             pg_atomic_init_u64(&io_ops->writes, 0);
> +     }
>  
>       /*
>        * we don't zero st_progress_param here to save cycles; nobody should

nit: I think we nearly always have a blank line before loops


> diff --git a/src/backend/utils/init/postinit.c 
> b/src/backend/utils/init/postinit.c
> index 646126edee..93f1b4bcfc 100644
> --- a/src/backend/utils/init/postinit.c
> +++ b/src/backend/utils/init/postinit.c
> @@ -623,6 +623,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char 
> *username,
>               RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, 
> ClientCheckTimeoutHandler);
>       }
>  
> +     pgstat_beinit();
>       /*
>        * Initialize local process's access to XLOG.
>        */

nit: same with multi-line comments.


> @@ -649,6 +650,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char 
> *username,
>                */
>               CreateAuxProcessResourceOwner();
>  
> +             pgstat_bestart();
>               StartupXLOG();
>               /* Release (and warn about) any buffer pins leaked in 
> StartupXLOG */
>               ReleaseAuxProcessResources(true);
> @@ -676,7 +678,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const char 
> *username,
>       EnablePortalManager();
>  
>       /* Initialize status reporting */
> -     pgstat_beinit();

I'd like to see changes like moving this kind of thing around broken around
and committed separately. It's much easier to pinpoint breakage if the CF
breaks after moving just pgstat_beinit() around, rather than when committing
this considerably larger patch. And reordering subsystem initialization has
the habit of causing problems...


> +/* ----------
> + * IO Stats reporting utility types
> + * ----------
> + */
> +
> +typedef enum IOOp
> +{
> +     IOOP_ALLOC,
> +     IOOP_EXTEND,
> +     IOOP_FSYNC,
> +     IOOP_WRITE,
> +} IOOp;
> [...]
> +/*
> + * Structure for counting all types of IOOps for a live backend.
> + */
> +typedef struct IOOps
> +{
> +     pg_atomic_uint64 allocs;
> +     pg_atomic_uint64 extends;
> +     pg_atomic_uint64 fsyncs;
> +     pg_atomic_uint64 writes;
> +} IOOps;

To me IOop and IOOps sound to much alike - even though they're really kind of
separate things. s/IOOps/IOOpCounters/ maybe?


> @@ -3152,6 +3156,14 @@ pgstat_shutdown_hook(int code, Datum arg)
>  {
>       Assert(!pgstat_is_shutdown);
>  
> +     /*
> +      * Only need to send stats on IO Ops for IO Paths when a process exits.
> +      * Users requiring IO Ops for both live and exited backends can read 
> from
> +      * live backends' PgBackendStatus and sum this with totals from exited
> +      * backends persisted by the stats collector.
> +      */
> +     pgstat_send_buffers();

Perhaps something like this comment belongs somewhere at the top of the file,
or in the header, or ...? It's a fairly central design piece, and it's not
obvious one would need to look in the shutdown hook for it?


> +/*
> + * Before exiting, a backend sends its IO op statistics to the collector so
> + * that they may be persisted.
> + */
> +void
> +pgstat_send_buffers(void)
> +{
> +     PgStat_MsgIOPathOps msg;
> +
> +     PgBackendStatus *beentry = MyBEEntry;
> +
> +     /*
> +      * Though some backends with type B_INVALID (such as the single-user 
> mode
> +      * process) do initialize and increment IO operations stats, there is no
> +      * spot in the array of IO operations for backends of type B_INVALID. As
> +      * such, do not send these to the stats collector.
> +      */
> +     if (!beentry || beentry->st_backendType == B_INVALID)
> +             return;

Why does single user mode use B_INVALID? That doesn't seem quite right.


> +     memset(&msg, 0, sizeof(msg));
> +     msg.backend_type = beentry->st_backendType;
> +
> +     pgstat_sum_io_path_ops(msg.iop.io_path_ops,
> +                                                (IOOps *) 
> &beentry->io_path_stats);
> +
> +     pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_IO_PATH_OPS);
> +     pgstat_send(&msg, sizeof(msg));
> +}

It seems worth having a path skipping sending the message if there was no IO?



> +/*
> + * Helper function to sum all live IO Op stats for all IO Paths (e.g. shared,
> + * local) to those in the equivalent stats structure for exited backends. 
> Note
> + * that this adds and doesn't set, so the destination stats structure should 
> be
> + * zeroed out by the caller initially. This would commonly be used to 
> transfer
> + * all IO Op stats for all IO Paths for a particular backend type to the
> + * pgstats structure.
> + */
> +void
> +pgstat_sum_io_path_ops(PgStatIOOps *dest, IOOps *src)
> +{
> +     for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> +     {

Sacriligeous, but I find io_path a harder to understand variable name for the
counter than i (or io_path_off or ...) ;)


> +static void
> +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len)
> +{
> +     PgStatIOOps *src_io_path_ops;
> +     PgStatIOOps *dest_io_path_ops;
> +
> +     /*
> +      * Subtract 1 from message's BackendType to get a valid index into the
> +      * array of IO Ops which does not include an entry for B_INVALID
> +      * BackendType.
> +      */
> +     Assert(msg->backend_type > B_INVALID);

Probably worth also asserting the upper boundary?



> From f972ea87270feaed464a74fb6541ac04b4fc7d98 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplage...@gmail.com>
> Date: Wed, 24 Nov 2021 11:39:48 -0500
> Subject: [PATCH v17 4/7] Add "buffers" to pgstat_reset_shared_counters
> 
> Backends count IO operations for various IO paths in their PgBackendStatus.
> Upon exit, they send these counts to the stats collector. Prior to this 
> commit,
> these IO Ops stats would have been reset when the target was "bgwriter".
> 
> With this commit, target "bgwriter" no longer will cause the IO operations
> stats to be reset, and the IO operations stats can be reset with new target,
> "buffers".
> ---
>  doc/src/sgml/monitoring.sgml                |  2 +-
>  src/backend/postmaster/pgstat.c             | 83 +++++++++++++++++++--
>  src/backend/utils/activity/backend_status.c | 29 +++++++
>  src/include/pgstat.h                        |  8 +-
>  src/include/utils/backend_status.h          |  2 +
>  5 files changed, 117 insertions(+), 7 deletions(-)
> 
> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index 62f2a3332b..bda3eef309 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -3604,7 +3604,7 @@ SELECT pid, wait_event_type, wait_event FROM 
> pg_stat_activity WHERE wait_event i
>         <structfield>stats_reset</structfield> <type>timestamp with time 
> zone</type>
>        </para>
>        <para>
> -       Time at which these statistics were last reset
> +       Time at which these statistics were last reset.
>        </para></entry>
>       </row>
>      </tbody>

Hm?

Shouldn't this new reset target be documented?


> +/*
> + * Helper function to collect and send live backends' current IO operations
> + * stats counters when a stats reset is initiated so that they may be 
> deducted
> + * from future totals.
> + */
> +static void
> +pgstat_send_buffers_reset(PgStat_MsgResetsharedcounter *msg)
> +{
> +     PgStatIOPathOps ops[BACKEND_NUM_TYPES];
> +
> +     memset(ops, 0, sizeof(ops));
> +     pgstat_report_live_backend_io_path_ops(ops);
> +
> +     /*
> +      * Iterate through the array of IO Ops for all IO Paths for each
> +      * BackendType. Because the array does not include a spot for 
> BackendType
> +      * B_INVALID, add 1 to the index when setting backend_type so that 
> there is
> +      * no confusion as to the BackendType with which this reset message
> +      * corresponds.
> +      */
> +     for (int backend_type_idx = 0; backend_type_idx < BACKEND_NUM_TYPES; 
> backend_type_idx++)
> +     {
> +             msg->m_backend_resets.backend_type = backend_type_idx + 1;
> +             memcpy(&msg->m_backend_resets.iop, &ops[backend_type_idx],
> +                             sizeof(msg->m_backend_resets.iop));
> +             pgstat_send(msg, sizeof(PgStat_MsgResetsharedcounter));
> +     }
> +}

Probably worth explaining why multiple messages are sent?


> @@ -5583,10 +5621,45 @@ 
> pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
>  {
>       if (msg->m_resettarget == RESET_BGWRITER)
>       {
> -             /* Reset the global, bgwriter and checkpointer statistics for 
> the cluster. */
> -             memset(&globalStats, 0, sizeof(globalStats));
> +             /*
> +              * Reset the global bgwriter and checkpointer statistics for the
> +              * cluster.
> +              */
> +             memset(&globalStats.checkpointer, 0, 
> sizeof(globalStats.checkpointer));
> +             memset(&globalStats.bgwriter, 0, sizeof(globalStats.bgwriter));
>               globalStats.bgwriter.stat_reset_timestamp = 
> GetCurrentTimestamp();
>       }

Oh, is this a live bug?


> +             /*
> +              * Subtract 1 from the BackendType to arrive at a valid index 
> in the
> +              * array, as it does not contain a spot for B_INVALID 
> BackendType.
> +              */

Instead of repeating a comment about +- 1 in a bunch of places, would it look
better to have two helper inline functions for this purpose?



> +/*
> +* When adding a new column to the pg_stat_buffers view, add a new enum
> +* value here above COLUMN_LENGTH.
> +*/
> +enum
> +{
> +     COLUMN_BACKEND_TYPE,
> +     COLUMN_IO_PATH,
> +     COLUMN_ALLOCS,
> +     COLUMN_EXTENDS,
> +     COLUMN_FSYNCS,
> +     COLUMN_WRITES,
> +     COLUMN_RESET_TIME,
> +     COLUMN_LENGTH,
> +};

COLUMN_LENGTH seems like a fairly generic name...



> From 9f22da9041e1e1fbc0ef003f5f78f4e72274d438 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplage...@gmail.com>
> Date: Wed, 24 Nov 2021 12:20:10 -0500
> Subject: [PATCH v17 6/7] Remove superfluous bgwriter stats
> 
> Remove stats from pg_stat_bgwriter which are now more clearly expressed
> in pg_stat_buffers.
> 
> TODO:
> - make pg_stat_checkpointer view and move relevant stats into it
> - add additional stats to pg_stat_bgwriter

When do you think it makes sense to tackle these wrt committing some of the
patches?


> diff --git a/src/backend/storage/buffer/bufmgr.c 
> b/src/backend/storage/buffer/bufmgr.c
> index 6926fc5742..67447f997a 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -2164,7 +2164,6 @@ BufferSync(int flags)
>                       if (SyncOneBuffer(buf_id, false, &wb_context) & 
> BUF_WRITTEN)
>                       {
>                               TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
> -                             
> PendingCheckpointerStats.m_buf_written_checkpoints++;
>                               num_written++;
>                       }
>               }
> @@ -2273,9 +2272,6 @@ BgBufferSync(WritebackContext *wb_context)
>        */
>       strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
>  
> -     /* Report buffer alloc counts to pgstat */
> -     PendingBgWriterStats.m_buf_alloc += recent_alloc;
> -
>       /*
>        * If we're not running the LRU scan, just stop after doing the stats
>        * stuff.  We mark the saved state invalid so that we can recover sanely
> @@ -2472,8 +2468,6 @@ BgBufferSync(WritebackContext *wb_context)
>                       reusable_buffers++;
>       }
>  
> -     PendingBgWriterStats.m_buf_written_clean += num_written;
> -

Isn't num_written unused now, unless tracepoints are enabled? I'd expect some
compilers to warn... Perhaps we should just remove information from the
tracepoint?


Greetings,

Andres Freund


Reply via email to