On 2021/03/30 17:28, Kyotaro Horiguchi wrote: > At Tue, 30 Mar 2021 09:41:24 +0900, Masahiro Ikeda <ikeda...@oss.nttdata.com> > wrote in >> I update the patch since there were my misunderstanding points. >> >> On 2021/03/26 16:20, Masahiro Ikeda wrote: >>> Thanks for many your suggestions! >>> I made the patch to handle the issues. >>> >>>> 1) What is the motivation to have both prevWalUsage and pgWalUsage, >>>> instead of just accumulating the stats since the last submission? >>>> There doesn't seem to be any comment explaining it? Computing >>>> differences to previous values, and copying to prev*, isn't free. I >>>> assume this is out of a fear that the stats could get reset before >>>> they're used for instrument.c purposes - but who knows? >>> >>> I removed the unnecessary code copying pgWalUsage and just reset the >>> pgWalUsage after reporting the stats in pgstat_report_wal(). >> >> I didn't change this. >> >>>> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the >>>> former being used by wal writer, the latter by most other processes? >>>> There again don't seem to be comments explaining this. >>> >>> I added the comments why two functions are separated. >>> (But is it better to merge them?) >> >> I updated the comments for following reasons. >> >> >>>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) >>>> just to figure out if there's been any changes isn't all that >>>> cheap. This is regularly exercised in read-only workloads too. Seems >>>> adding a boolean WalStats.have_pending = true or such would be >>>> better. >>>> 4) For plain backends pgstat_report_wal() is called by >>>> pgstat_report_stat() - but it is not checked as part of the "Don't >>>> expend a clock check if nothing to do" check at the top. It's >>>> probably rare to have pending wal stats without also passing one of >>>> the other conditions, but ... >>> >>> I added the logic to check if the stats counters are updated or not in >>> pgstat_report_stat() using not only generated wal record but also write/sync >>> counters, and it can skip to call reporting function. >> >> I removed the checking code whether the wal stats counters were updated or >> not >> in pgstat_report_stat() since I couldn't understand the valid case yet. >> pgstat_report_stat() is called by backends when the transaction is finished. >> This means that the condition seems always pass. > > Doesn't the same holds for all other counters? If you are saying that > "wal counters should be zero if all other stats counters are zero", we > need an assertion to check that and a comment to explain that. > > I personally find it safer to add the WAL-stats condition to the > fast-return check, rather than addin such assertion. Thanks for your comments.
OK, I added the condition to the fast-return check. I noticed that I misunderstood that the purpose is to avoid expanding a clock check using WAL stats counters. But, the purpose is to make the conditions stricter, right? > pgstat_send_wal() is called mainly from pgstat_report_wal() which > consumes pgWalStats counters and WalWriterMain() which > doesn't. Checking on pgWalStats counters isn't so complex that we need > to avoid that in wal writer, thus *I* think pgstat_send_wal() and > pgstat_report_wal() can be consolidated. Even if you have a strong > opinion that wal writer should call a separate function, the name > should be something other than pgstat_send_wal() since it ignores > pgWalUsage counters, which are supposed to be included in "WAL stats". OK, I consolidated them. >> I thought to implement if the WAL stats counters were not updated, skip to >> send all statistics including the counters for databases and so on. But I >> think it's not good because it may take more time to be reflected the >> generated stats by read-only transaction. > > Ur, yep. > >>> Although I added the condition which the write/sync counters are updated or >>> not, I haven't understood the following case yet...Sorry. I checked related >>> code and tested to insert large object, but I couldn't. I'll investigate >>> more >>> deeply, but if you already know the function which leads the following case, >>> please let me know. >> >> I understood the above case (Fujii-san, thanks for your advice in person). >> When to flush buffers, for example, to select buffer replacement victim, >> it requires a WAL flush if the buffer is dirty. So, to check the WAL stats >> counters are updated or not, I check the number of generated wal record and >> the counter of syncing in pgstat_report_wal(). >> >> The reason why not to check the counter of writing is that if to write is >> happened, to sync is happened too in the above case. I added the comments in >> the patch. > > Mmm.. Although I couldn't read you clearly, The fact that a flush may > happen without a write means the reverse at the same time, a write may > happen without a flush. For asynchronous commits, WAL-write happens > alone unaccompanied by a flush. And the corresponding flush would > happen later without writes. Sorry, I didn't explain it enough. For processes which may generate WAL records like backends, I thought it's enough to check (1)the number of generated WAL records and (2)the counters of syncing(flushing) the WAL. This is checked in pgstat_report_wal(). Sorry for that I didn't mention (1) in the above thread. If a backend execute a write transaction, some WAL records must be generated. So, it's ok to check (1) only regardless of whether asynchronous commit is enabled or not. OHOT, if a backend execute a read-only transaction, WAL records won't be generated (although HOT makes a wal records exactly...). But, WAL-write and flush may happen when to flush buffers via XLogFlush(). In this case, if WAL-write happened, flush must be happen later. But, if my understanding is correct, there is no a case to flush doesn't happen, but to write happen. So, I thought (2) is needed and it's enough to check the counter of syncing(flushing). Regards, -- Masahiro Ikeda NTT DATA CORPORATION
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c index 237e13361b..15eb3c8e64 100644 --- a/src/backend/executor/instrument.c +++ b/src/backend/executor/instrument.c @@ -19,6 +19,17 @@ BufferUsage pgBufferUsage; static BufferUsage save_pgBufferUsage; + +/* + * generated WAL usage counters. + * + * Be careful that the counters are cleared after reporting them to + * the stats collector although you can use WalUsageAccumDiff() + * to computing differences to previous values. For backends, + * the counters may be reset after a transaction is finished and + * pgstat_send_wal() is invoked, so you can compute the difference + * in the same transaction only. + */ WalUsage pgWalUsage; static WalUsage save_pgWalUsage; diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index cdd07770a0..75a95f3de7 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -505,7 +505,7 @@ CheckpointerMain(void) pgstat_send_bgwriter(); /* Send WAL statistics to the stats collector. */ - pgstat_report_wal(); + pgstat_send_wal(true); /* * If any checkpoint flags have been set, redo the loop to handle the @@ -583,7 +583,7 @@ HandleCheckpointerInterrupts(void) BgWriterStats.m_requested_checkpoints++; ShutdownXLOG(0, 0); pgstat_send_bgwriter(); - pgstat_report_wal(); + pgstat_send_wal(true); /* Normal exit from the checkpointer is here */ proc_exit(0); /* done */ diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 60f45ccc4e..7fd03b69b4 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -144,14 +144,6 @@ char *pgstat_stat_tmpname = NULL; PgStat_MsgBgWriter BgWriterStats; PgStat_MsgWal WalStats; -/* - * WAL usage counters saved from pgWALUsage at the previous call to - * pgstat_report_wal(). This is used to calculate how much WAL usage - * happens between pgstat_report_wal() calls, by substracting - * the previous counters from the current ones. - */ -static WalUsage prevWalUsage; - /* * List of SLRU names that we keep stats for. There is no central registry of * SLRUs, so we use this fixed list instead. The "other" entry is used for @@ -879,9 +871,19 @@ pgstat_report_stat(bool disconnect) TabStatusArray *tsa; int i; - /* Don't expend a clock check if nothing to do */ + /* + * Don't expend a clock check if nothing to do. + * + * Note: regarding to WAL statistics counters, it's not enough to check + * only whether the wal record is generated or not. If a read transaction + * is executed, wal records aren't nomally generated (although HOT makes + * wal records). But, just writes and syncs the wal data may happen when + * to flush buffers. + */ if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && pgStatXactCommit == 0 && pgStatXactRollback == 0 && + pgWalUsage.wal_records == 0 && walStats.wal_write == 0 && + walStats.wal_sync == 0 && !have_function_stats && !disconnect) return; @@ -975,7 +977,7 @@ pgstat_report_stat(bool disconnect) pgstat_send_funcstats(); /* Send WAL statistics */ - pgstat_report_wal(); + pgstat_send_wal(true); /* Finally send SLRU statistics */ pgstat_send_slru(); @@ -3117,13 +3119,6 @@ pgstat_initialize(void) MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType]; } - /* - * Initialize prevWalUsage with pgWalUsage so that pgstat_report_wal() can - * calculate how much pgWalUsage counters are increased by substracting - * prevWalUsage from pgWalUsage. - */ - prevWalUsage = pgWalUsage; - /* Set up a process-exit hook to clean up */ on_shmem_exit(pgstat_beshutdown_hook, 0); } @@ -4671,44 +4666,6 @@ pgstat_send_bgwriter(void) MemSet(&BgWriterStats, 0, sizeof(BgWriterStats)); } -/* ---------- - * pgstat_report_wal() - - * - * Calculate how much WAL usage counters are increased and send - * WAL statistics to the collector. - * - * Must be called by processes that generate WAL. - * ---------- - */ -void -pgstat_report_wal(void) -{ - WalUsage walusage; - - /* - * Calculate how much WAL usage counters are increased by substracting the - * previous counters from the current ones. Fill the results in WAL stats - * message. - */ - MemSet(&walusage, 0, sizeof(WalUsage)); - WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage); - - WalStats.m_wal_records = walusage.wal_records; - WalStats.m_wal_fpi = walusage.wal_fpi; - WalStats.m_wal_bytes = walusage.wal_bytes; - - /* - * Send WAL stats message to the collector. - */ - if (!pgstat_send_wal(true)) - return; - - /* - * Save the current counters for the subsequent calculation of WAL usage. - */ - prevWalUsage = pgWalUsage; -} - /* ---------- * pgstat_send_wal() - * @@ -4720,20 +4677,28 @@ pgstat_report_wal(void) * Return true if the message is sent, and false otherwise. * ---------- */ -bool +void pgstat_send_wal(bool force) { - /* We assume this initializes to zeroes */ - static const PgStat_MsgWal all_zeroes; static TimestampTz sendTime = 0; /* - * This function can be called even if nothing at all has happened. In - * this case, avoid sending a completely empty message to the stats - * collector. + * First, to check the WAL stats counters were updated. + * + * Even if the 'force' is true, we don't need to send the stats if the + * counters were not updated. + * + * We can know whether the counters were updated or not to check only + * three counters. So, for performance, we don't allocate another memory + * spaces and check the all stats like pgstat_send_slru(). + * + * 'wal_records' is zero means that wal records weren't generated. So, + * 'wal_fpi', 'wal_bytes', 'm_wal_buffers_full' is zero too. 'wal_writes' + * and 'wal_sync' are zero means the counter of time spent is zero too. */ - if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) - return false; + if (pgWalUsage.wal_records == 0 && walStats.wal_write == 0 && + walStats.wal_sync == 0) + return; if (!force) { @@ -4741,13 +4706,21 @@ pgstat_send_wal(bool force) /* * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL - * msec since we last sent one. + * msec since we last sent one to avoid overloading the stats + * collector. */ if (!TimestampDifferenceExceeds(sendTime, now, PGSTAT_STAT_INTERVAL)) - return false; + return; sendTime = now; } + /* + * set the counters related to generated WAL data. + */ + WalStats.m_wal_records = pgWalUsage.wal_records; + WalStats.m_wal_fpi = pgWalUsage.wal_fpi; + WalStats.m_wal_bytes = pgWalUsage.wal_bytes; + /* * Prepare and send the message */ @@ -4759,7 +4732,14 @@ pgstat_send_wal(bool force) */ MemSet(&WalStats, 0, sizeof(WalStats)); - return true; + /* + * Clear out the statistics for generated wal usage too. + * + * Although these counters are used in another places, for example in + * pg_stat_statements.c, it's ok to clear out because no one takes + * difference crossing pgstat_send_wal() calls. + */ + MemSet(&pgWalUsage, 0, sizeof(WalUsage)); } /* ---------- diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 87672e6f30..2174a7439c 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -1601,8 +1601,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info, extern void pgstat_send_archiver(const char *xlog, bool failed); extern void pgstat_send_bgwriter(void); -extern void pgstat_report_wal(void); -extern bool pgstat_send_wal(bool force); +extern void pgstat_send_wal(bool force); /* ---------- * Support functions for the SQL-callable functions to