On 2021/03/30 17:28, Kyotaro Horiguchi wrote:
> At Tue, 30 Mar 2021 09:41:24 +0900, Masahiro Ikeda <ikeda...@oss.nttdata.com> 
> wrote in 
>> I update the patch since there were my misunderstanding points.
>>
>> On 2021/03/26 16:20, Masahiro Ikeda wrote:
>>> Thanks for many your suggestions!
>>> I made the patch to handle the issues.
>>>
>>>> 1) What is the motivation to have both prevWalUsage and pgWalUsage,
>>>>    instead of just accumulating the stats since the last submission?
>>>>    There doesn't seem to be any comment explaining it? Computing
>>>>    differences to previous values, and copying to prev*, isn't free. I
>>>>    assume this is out of a fear that the stats could get reset before
>>>>    they're used for instrument.c purposes - but who knows?
>>>
>>> I removed the unnecessary code copying pgWalUsage and just reset the
>>> pgWalUsage after reporting the stats in pgstat_report_wal().
>>
>> I didn't change this.
>>
>>>> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the
>>>>    former being used by wal writer, the latter by most other processes?
>>>>    There again don't seem to be comments explaining this.
>>>
>>> I added the comments why two functions are separated.
>>> (But is it better to merge them?)
>>
>> I updated the comments for following reasons.
>>
>>
>>>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
>>>>    just to figure out if there's been any changes isn't all that
>>>>    cheap. This is regularly exercised in read-only workloads too. Seems
>>>>    adding a boolean WalStats.have_pending = true or such would be
>>>>    better.
>>>> 4) For plain backends pgstat_report_wal() is called by
>>>>    pgstat_report_stat() - but it is not checked as part of the "Don't
>>>>    expend a clock check if nothing to do" check at the top.  It's
>>>>    probably rare to have pending wal stats without also passing one of
>>>>    the other conditions, but ...
>>>
>>> I added the logic to check if the stats counters are updated or not in
>>> pgstat_report_stat() using not only generated wal record but also write/sync
>>> counters, and it can skip to call reporting function.
>>
>> I removed the checking code whether the wal stats counters were updated or 
>> not
>> in pgstat_report_stat() since I couldn't understand the valid case yet.
>> pgstat_report_stat() is called by backends when the transaction is finished.
>> This means that the condition seems always pass.
> 
> Doesn't the same holds for all other counters?  If you are saying that
> "wal counters should be zero if all other stats counters are zero", we
> need an assertion to check that and a comment to explain that.
> 
> I personally find it safer to add the WAL-stats condition to the
> fast-return check, rather than addin such assertion.
Thanks for your comments.

OK, I added the condition to the fast-return check. I noticed that I
misunderstood that the purpose is to avoid expanding a clock check using WAL
stats counters. But, the purpose is to make the conditions stricter, right?


> pgstat_send_wal() is called mainly from pgstat_report_wal() which
> consumes pgWalStats counters and WalWriterMain() which
> doesn't. Checking on pgWalStats counters isn't so complex that we need
> to avoid that in wal writer, thus *I* think pgstat_send_wal() and
> pgstat_report_wal() can be consolidated.  Even if you have a strong
> opinion that wal writer should call a separate function, the name
> should be something other than pgstat_send_wal() since it ignores
> pgWalUsage counters, which are supposed to be included in "WAL stats".

OK, I consolidated them.



>> I thought to implement if the WAL stats counters were not updated, skip to
>> send all statistics including the counters for databases and so on. But I
>> think it's not good because it may take more time to be reflected the
>> generated stats by read-only transaction.
> 
> Ur, yep.
> 
>>> Although I added the condition which the write/sync counters are updated or
>>> not, I haven't understood the following case yet...Sorry. I checked related
>>> code and tested to insert large object, but I couldn't. I'll investigate 
>>> more
>>> deeply, but if you already know the function which leads the following case,
>>> please let me know.
>>
>> I understood the above case (Fujii-san, thanks for your advice in person).
>> When to flush buffers, for example, to select buffer replacement victim,
>> it requires a WAL flush if the buffer is dirty. So, to check the WAL stats
>> counters are updated or not, I check the number of generated wal record and
>> the counter of syncing in pgstat_report_wal().
>>
>> The reason why not to check the counter of writing is that if to write is
>> happened, to sync is happened too in the above case. I added the comments in
>> the patch.
> 
> Mmm..  Although I couldn't read you clearly, The fact that a flush may
> happen without a write means the reverse at the same time, a write may
> happen without a flush.  For asynchronous commits, WAL-write happens
> alone unaccompanied by a flush.  And the corresponding flush would
> happen later without writes.

Sorry, I didn't explain it enough.

For processes which may generate WAL records like backends, I thought it's
enough to check (1)the number of generated WAL records and (2)the counters of
syncing(flushing) the WAL. This is checked in pgstat_report_wal(). Sorry for
that I didn't mention (1) in the above thread.

If a backend execute a write transaction, some WAL records must be generated.
So, it's ok to check (1) only regardless of whether asynchronous commit is
enabled or not.

OHOT, if a backend execute a read-only transaction, WAL records won't be
generated (although HOT makes a wal records exactly...). But, WAL-write and
flush may happen when to flush buffers via XLogFlush(). In this case, if
WAL-write happened, flush must be happen later. But, if my understanding is
correct, there is no a case to flush doesn't happen, but to write happen.
So, I thought (2) is needed and it's enough to check the counter of
syncing(flushing).


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 237e13361b..15eb3c8e64 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -19,6 +19,17 @@
 
 BufferUsage pgBufferUsage;
 static BufferUsage save_pgBufferUsage;
+
+/*
+ * generated WAL usage counters.
+ *
+ * Be careful that the counters are cleared after reporting them to
+ * the stats collector although you can use WalUsageAccumDiff()
+ * to computing differences to previous values. For backends,
+ * the counters may be reset after a transaction is finished and
+ * pgstat_send_wal() is invoked, so you can compute the difference
+ * in the same transaction only.
+ */
 WalUsage	pgWalUsage;
 static WalUsage save_pgWalUsage;
 
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index cdd07770a0..75a95f3de7 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -505,7 +505,7 @@ CheckpointerMain(void)
 		pgstat_send_bgwriter();
 
 		/* Send WAL statistics to the stats collector. */
-		pgstat_report_wal();
+		pgstat_send_wal(true);
 
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
@@ -583,7 +583,7 @@ HandleCheckpointerInterrupts(void)
 		BgWriterStats.m_requested_checkpoints++;
 		ShutdownXLOG(0, 0);
 		pgstat_send_bgwriter();
-		pgstat_report_wal();
+		pgstat_send_wal(true);
 
 		/* Normal exit from the checkpointer is here */
 		proc_exit(0);			/* done */
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 60f45ccc4e..7fd03b69b4 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -144,14 +144,6 @@ char	   *pgstat_stat_tmpname = NULL;
 PgStat_MsgBgWriter BgWriterStats;
 PgStat_MsgWal WalStats;
 
-/*
- * WAL usage counters saved from pgWALUsage at the previous call to
- * pgstat_report_wal(). This is used to calculate how much WAL usage
- * happens between pgstat_report_wal() calls, by substracting
- * the previous counters from the current ones.
- */
-static WalUsage prevWalUsage;
-
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
  * SLRUs, so we use this fixed list instead.  The "other" entry is used for
@@ -879,9 +871,19 @@ pgstat_report_stat(bool disconnect)
 	TabStatusArray *tsa;
 	int			i;
 
-	/* Don't expend a clock check if nothing to do */
+	/*
+	 * Don't expend a clock check if nothing to do.
+	 *
+	 * Note: regarding to WAL statistics counters, it's not enough to check
+	 * only whether the wal record is generated or not. If a read transaction
+	 * is executed, wal records aren't nomally generated (although HOT makes
+	 * wal records). But, just writes and syncs the wal data may happen when
+	 * to flush buffers.
+	 */
 	if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
 		pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
+		pgWalUsage.wal_records == 0 && walStats.wal_write == 0 &&
+		walStats.wal_sync == 0 &&
 		!have_function_stats && !disconnect)
 		return;
 
@@ -975,7 +977,7 @@ pgstat_report_stat(bool disconnect)
 	pgstat_send_funcstats();
 
 	/* Send WAL statistics */
-	pgstat_report_wal();
+	pgstat_send_wal(true);
 
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
@@ -3117,13 +3119,6 @@ pgstat_initialize(void)
 		MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
 	}
 
-	/*
-	 * Initialize prevWalUsage with pgWalUsage so that pgstat_report_wal() can
-	 * calculate how much pgWalUsage counters are increased by substracting
-	 * prevWalUsage from pgWalUsage.
-	 */
-	prevWalUsage = pgWalUsage;
-
 	/* Set up a process-exit hook to clean up */
 	on_shmem_exit(pgstat_beshutdown_hook, 0);
 }
@@ -4671,44 +4666,6 @@ pgstat_send_bgwriter(void)
 	MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
-/* ----------
- * pgstat_report_wal() -
- *
- * Calculate how much WAL usage counters are increased and send
- * WAL statistics to the collector.
- *
- * Must be called by processes that generate WAL.
- * ----------
- */
-void
-pgstat_report_wal(void)
-{
-	WalUsage	walusage;
-
-	/*
-	 * Calculate how much WAL usage counters are increased by substracting the
-	 * previous counters from the current ones. Fill the results in WAL stats
-	 * message.
-	 */
-	MemSet(&walusage, 0, sizeof(WalUsage));
-	WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
-
-	WalStats.m_wal_records = walusage.wal_records;
-	WalStats.m_wal_fpi = walusage.wal_fpi;
-	WalStats.m_wal_bytes = walusage.wal_bytes;
-
-	/*
-	 * Send WAL stats message to the collector.
-	 */
-	if (!pgstat_send_wal(true))
-		return;
-
-	/*
-	 * Save the current counters for the subsequent calculation of WAL usage.
-	 */
-	prevWalUsage = pgWalUsage;
-}
-
 /* ----------
  * pgstat_send_wal() -
  *
@@ -4720,20 +4677,28 @@ pgstat_report_wal(void)
  * Return true if the message is sent, and false otherwise.
  * ----------
  */
-bool
+void
 pgstat_send_wal(bool force)
 {
-	/* We assume this initializes to zeroes */
-	static const PgStat_MsgWal all_zeroes;
 	static TimestampTz sendTime = 0;
 
 	/*
-	 * This function can be called even if nothing at all has happened. In
-	 * this case, avoid sending a completely empty message to the stats
-	 * collector.
+	 * First, to check the WAL stats counters were updated.
+	 *
+	 * Even if the 'force' is true, we don't need to send the stats if the
+	 * counters were not updated.
+	 *
+	 * We can know whether the counters were updated or not to check only
+	 * three counters. So, for performance, we don't allocate another memory
+	 * spaces and check the all stats like pgstat_send_slru().
+	 *
+	 * 'wal_records' is zero means that wal records weren't generated. So,
+	 * 'wal_fpi', 'wal_bytes', 'm_wal_buffers_full' is zero too. 'wal_writes'
+	 * and 'wal_sync' are zero means the counter of time spent is zero too.
 	 */
-	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
-		return false;
+	if (pgWalUsage.wal_records == 0 && walStats.wal_write == 0 &&
+		walStats.wal_sync == 0)
+		return;
 
 	if (!force)
 	{
@@ -4741,13 +4706,21 @@ pgstat_send_wal(bool force)
 
 		/*
 		 * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
-		 * msec since we last sent one.
+		 * msec since we last sent one to avoid overloading the stats
+		 * collector.
 		 */
 		if (!TimestampDifferenceExceeds(sendTime, now, PGSTAT_STAT_INTERVAL))
-			return false;
+			return;
 		sendTime = now;
 	}
 
+	/*
+	 * set the counters related to generated WAL data.
+	 */
+	WalStats.m_wal_records = pgWalUsage.wal_records;
+	WalStats.m_wal_fpi = pgWalUsage.wal_fpi;
+	WalStats.m_wal_bytes = pgWalUsage.wal_bytes;
+
 	/*
 	 * Prepare and send the message
 	 */
@@ -4759,7 +4732,14 @@ pgstat_send_wal(bool force)
 	 */
 	MemSet(&WalStats, 0, sizeof(WalStats));
 
-	return true;
+	/*
+	 * Clear out the statistics for generated wal usage too.
+	 *
+	 * Although these counters are used in another places, for example in
+	 * pg_stat_statements.c, it's ok to clear out because no one takes
+	 * difference crossing pgstat_send_wal() calls.
+	 */
+	MemSet(&pgWalUsage, 0, sizeof(WalUsage));
 }
 
 /* ----------
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 87672e6f30..2174a7439c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1601,8 +1601,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
-extern void pgstat_report_wal(void);
-extern bool pgstat_send_wal(bool force);
+extern void pgstat_send_wal(bool force);
 
 /* ----------
  * Support functions for the SQL-callable functions to

Reply via email to