Thanks for many your suggestions!
I made the patch to handle the issues.
> 1) What is the motivation to have both prevWalUsage and pgWalUsage,
> instead of just accumulating the stats since the last submission?
> There doesn't seem to be any comment explaining it? Computing
> differences to previous values, and copying to prev*, isn't free. I
> assume this is out of a fear that the stats could get reset before
> they're used for instrument.c purposes - but who knows?
I removed the unnecessary code copying pgWalUsage and just reset the
pgWalUsage after reporting the stats in pgstat_report_wal().
> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the
> former being used by wal writer, the latter by most other processes?
> There again don't seem to be comments explaining this.
I added the comments why two functions are separated.
(But is it better to merge them?)
> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
> just to figure out if there's been any changes isn't all that
> cheap. This is regularly exercised in read-only workloads too. Seems
> adding a boolean WalStats.have_pending = true or such would be
> better.
> 4) For plain backends pgstat_report_wal() is called by
> pgstat_report_stat() - but it is not checked as part of the "Don't
> expend a clock check if nothing to do" check at the top. It's
> probably rare to have pending wal stats without also passing one of
> the other conditions, but ...
I added the logic to check if the stats counters are updated or not in
pgstat_report_stat() using not only generated wal record but also write/sync
counters, and it can skip to call reporting function.
Although I added the condition which the write/sync counters are updated or
not, I haven't understood the following case yet...Sorry. I checked related
code and tested to insert large object, but I couldn't. I'll investigate more
deeply, but if you already know the function which leads the following case,
please let me know.
> Maybe there is the case where a backend generates no WAL records,
> but just writes them because it needs to do write-ahead-logging
> when flush the table data?
> Ugh! I was missing a very large blob.. Ok, we need additional check
> for non-pgWalUsage part. Sorry.
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 237e13361b..095d8d0970 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -19,6 +19,17 @@
BufferUsage pgBufferUsage;
static BufferUsage save_pgBufferUsage;
+
+/*
+ * generated WAL usage counters.
+ *
+ * Be careful that the counters are cleared after reporting them to
+ * the stats collector although you can use WalUsageAccumDiff()
+ * to computing differences to previous values. For backends,
+ * the counters may be reset after a transaction is finished and
+ * pgstat_report_wal() is invoked, so you can compute the difference
+ * in the same transaction only.
+ */
WalUsage pgWalUsage;
static WalUsage save_pgWalUsage;
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 60f45ccc4e..e354a454a9 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -144,14 +144,6 @@ char *pgstat_stat_tmpname = NULL;
PgStat_MsgBgWriter BgWriterStats;
PgStat_MsgWal WalStats;
-/*
- * WAL usage counters saved from pgWALUsage at the previous call to
- * pgstat_report_wal(). This is used to calculate how much WAL usage
- * happens between pgstat_report_wal() calls, by substracting
- * the previous counters from the current ones.
- */
-static WalUsage prevWalUsage;
-
/*
* List of SLRU names that we keep stats for. There is no central registry of
* SLRUs, so we use this fixed list instead. The "other" entry is used for
@@ -879,9 +871,18 @@ pgstat_report_stat(bool disconnect)
TabStatusArray *tsa;
int i;
- /* Don't expend a clock check if nothing to do */
+ /*
+ * Don't expend a clock check if nothing to do.
+ *
+ * Note: Regarding the wal data, it's not enough to check only the wal
+ * records, because there is a case where a backend generates no wal
+ * records, but just writes them because it needs to do
+ * write-ahead-logging when flush the table data.
+ */
if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
+ pgWalUsage.wal_records == 0 && WalStats.m_wal_write == 0 &&
+ WalStats.m_wal_sync == 0 &&
!have_function_stats && !disconnect)
return;
@@ -3117,13 +3118,6 @@ pgstat_initialize(void)
MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
}
- /*
- * Initialize prevWalUsage with pgWalUsage so that pgstat_report_wal() can
- * calculate how much pgWalUsage counters are increased by substracting
- * prevWalUsage from pgWalUsage.
- */
- prevWalUsage = pgWalUsage;
-
/* Set up a process-exit hook to clean up */
on_shmem_exit(pgstat_beshutdown_hook, 0);
}
@@ -4674,28 +4668,40 @@ pgstat_send_bgwriter(void)
/* ----------
* pgstat_report_wal() -
*
- * Calculate how much WAL usage counters are increased and send
- * WAL statistics to the collector.
+ * Send WAL statistics to the collector and clear the counters.
*
* Must be called by processes that generate WAL.
+ *
+ * If needed, the caller should check whether the statistic counters
+ * are updated, or time interval since we sent is enough before calling
+ * this function.
+ *
+ * Especially, for backends, if they execute read-only transaction,
+ * they rarely gererate and write the WAL data. So, to call this
+ * function is no meaning, but it may leads performance degradation.
+ *
+ * OTOH, for the checkpointer, it always generates the WAL record and
+ * doesn't call this function frequently. So, it doesn't need to check
+ * them.
+ *
+ * Note: if you want to know the WAL statistics counters are updated
+ * or not, to check WalStats.m_wal_records, walStats.wal_write and
+ * walStats.wal_sync. If WalStats.m_wal_records == 0, walStats.wal_write == 0
+ * and WalStats.wal_sync == 0, it means the counters are not updated.
+ * It's not enough to check only the wal records, because there is a
+ * case where a backend generates no wal records, but just writes them
+ * because it needs to do write-ahead-logging when flush the table data.
* ----------
*/
void
pgstat_report_wal(void)
{
- WalUsage walusage;
-
/*
- * Calculate how much WAL usage counters are increased by substracting the
- * previous counters from the current ones. Fill the results in WAL stats
- * message.
+ * set the counters related to generated WAL data.
*/
- MemSet(&walusage, 0, sizeof(WalUsage));
- WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
-
- WalStats.m_wal_records = walusage.wal_records;
- WalStats.m_wal_fpi = walusage.wal_fpi;
- WalStats.m_wal_bytes = walusage.wal_bytes;
+ WalStats.m_wal_records = pgWalUsage.wal_records;
+ WalStats.m_wal_fpi = pgWalUsage.wal_fpi;
+ WalStats.m_wal_bytes = pgWalUsage.wal_bytes;
/*
* Send WAL stats message to the collector.
@@ -4704,18 +4710,25 @@ pgstat_report_wal(void)
return;
/*
- * Save the current counters for the subsequent calculation of WAL usage.
+ * Clear out the statistics buffer for generated WAL data, so it can be
+ * re-used.
+ *
+ * It's ok because no one takes difference crossing pgstat_report_stat()
+ * calls although these counters are used in another places, for example
+ * in pg_stat_statements.c.
*/
- prevWalUsage = pgWalUsage;
+ MemSet(&pgWalUsage, 0, sizeof(WalUsage));
}
/* ----------
* pgstat_send_wal() -
*
- * Send WAL statistics to the collector.
+ * Send WAL statistics to the collector.
*
- * If 'force' is not set, WAL stats message is only sent if enough time has
- * passed since last one was sent to reach PGSTAT_STAT_INTERVAL.
+ * If the processes that generate WAL data must call pgstat_report_wal() instead.
+ *
+ * If 'force' is not set, WAL stats message is only sent if the statistics counters
+ * are updated and enough time has passed since last one was sent to reach PGSTAT_STAT_INTERVAL.
*
* Return true if the message is sent, and false otherwise.
* ----------
@@ -4723,26 +4736,28 @@ pgstat_report_wal(void)
bool
pgstat_send_wal(bool force)
{
- /* We assume this initializes to zeroes */
- static const PgStat_MsgWal all_zeroes;
static TimestampTz sendTime = 0;
- /*
- * This function can be called even if nothing at all has happened. In
- * this case, avoid sending a completely empty message to the stats
- * collector.
- */
- if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
- return false;
-
if (!force)
{
- TimestampTz now = GetCurrentTimestamp();
+ TimestampTz now;
+
+ /*
+ * First, to check whether the WAL statistics counters are updated. If
+ * not updated, we can skip getting a timestamp which is expensive. We
+ * don't need to check all counters of the WAL statistics counters
+ * because other counters never be incremented if the counts of
+ * written/synced are zero.
+ */
+ if (walStats.wal_write == 0 && walStats.wal_sync == 0)
+ return false;
/*
* Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
- * msec since we last sent one.
+ * msec since we last sent one to avoid overloading the stats
+ * collector.
*/
+ now = GetCurrentTimestamp();
if (!TimestampDifferenceExceeds(sendTime, now, PGSTAT_STAT_INTERVAL))
return false;
sendTime = now;