Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/25 11:50, Masahiro Ikeda wrote: On 2021/03/23 16:10, Fujii Masao wrote: On 2021/03/22 20:25, ikedamsh wrote: Agreed. Users can know whether the stats is for walreceiver or not. The pg_stat_wal view in standby server shows for the walreceiver, and in primary server it shows for the others. So, I updated the document. (v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the docs! There was the discussion about when the stats collector is invoked, at [1]. Currently during archive recovery or standby, the stats collector is invoked when the startup process reaches the consistent state, sends PMSIGNAL_BEGIN_HOT_STANDBY, and then the system is starting accepting read-only connections. But walreceiver can be invoked at earlier stage. This can cause walreceiver to generate and send the statistics about WAL writing even though the stats collector has not been running yet. This might be problematic? If so, maybe we need to ensure that the stats collector is invoked before walreceiver? During recovery, the stats collector is not invoked if hot standby mode is disabled. But walreceiver can be running in this case. So probably we should change walreceiver so that it's invoked even when hot standby is disabled? Otherwise we cannnot collect the statistics about WAL writing by walreceiver in that case. [1] https://postgr.es/m/e5a982a5-8bb4-5a10-cf9a-40dd1921b...@oss.nttdata.com Thanks for comments! I didn't notice that. As I mentioned[1], if my understanding is right, this issue seem to be not for only the wal receiver. Since the shared memory thread already handles these issues, does this patch, which to collect the stats for the wal receiver and make a common function for writing wal files, have to be committed after the patches for share memory stats are committed? Or to handle them in this thread because we don't know when the shared memory stats patches will be committed. I think the former is better because to collect stats in shared memory is very useful feature for users and it make a big change in design. So, I think it's beneficial to make an effort to move the shared memory stats thread forward (by reviewing or testing) instead of handling the issues in this thread. Sounds reasonable. Agreed. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/23 16:10, Fujii Masao wrote: > > > On 2021/03/22 20:25, ikedamsh wrote: >> Agreed. Users can know whether the stats is for walreceiver or not. The >> pg_stat_wal view in standby server shows for the walreceiver, and in primary >> server it shows for the others. So, I updated the document. >> (v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch) > > Thanks for updating the docs! > > There was the discussion about when the stats collector is invoked, at [1]. > Currently during archive recovery or standby, the stats collector is > invoked when the startup process reaches the consistent state, sends > PMSIGNAL_BEGIN_HOT_STANDBY, and then the system is starting accepting > read-only connections. But walreceiver can be invoked at earlier stage. > This can cause walreceiver to generate and send the statistics about WAL > writing even though the stats collector has not been running yet. This might > be problematic? If so, maybe we need to ensure that the stats collector is > invoked before walreceiver? > > During recovery, the stats collector is not invoked if hot standby mode is > disabled. But walreceiver can be running in this case. So probably we should > change walreceiver so that it's invoked even when hot standby is disabled? > Otherwise we cannnot collect the statistics about WAL writing by walreceiver > in that case. > > [1] > https://postgr.es/m/e5a982a5-8bb4-5a10-cf9a-40dd1921b...@oss.nttdata.com Thanks for comments! I didn't notice that. As I mentioned[1], if my understanding is right, this issue seem to be not for only the wal receiver. Since the shared memory thread already handles these issues, does this patch, which to collect the stats for the wal receiver and make a common function for writing wal files, have to be committed after the patches for share memory stats are committed? Or to handle them in this thread because we don't know when the shared memory stats patches will be committed. I think the former is better because to collect stats in shared memory is very useful feature for users and it make a big change in design. So, I think it's beneficial to make an effort to move the shared memory stats thread forward (by reviewing or testing) instead of handling the issues in this thread. [1] https://www.postgresql.org/message-id/9f4e19ad-518d-b91a-e500-25a666471c42%40oss.nttdata.com Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/22 20:25, ikedamsh wrote: Agreed. Users can know whether the stats is for walreceiver or not. The pg_stat_wal view in standby server shows for the walreceiver, and in primary server it shows for the others. So, I updated the document. (v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the docs! There was the discussion about when the stats collector is invoked, at [1]. Currently during archive recovery or standby, the stats collector is invoked when the startup process reaches the consistent state, sends PMSIGNAL_BEGIN_HOT_STANDBY, and then the system is starting accepting read-only connections. But walreceiver can be invoked at earlier stage. This can cause walreceiver to generate and send the statistics about WAL writing even though the stats collector has not been running yet. This might be problematic? If so, maybe we need to ensure that the stats collector is invoked before walreceiver? During recovery, the stats collector is not invoked if hot standby mode is disabled. But walreceiver can be running in this case. So probably we should change walreceiver so that it's invoked even when hot standby is disabled? Otherwise we cannnot collect the statistics about WAL writing by walreceiver in that case. [1] https://postgr.es/m/e5a982a5-8bb4-5a10-cf9a-40dd1921b...@oss.nttdata.com Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/22 16:50, Fujii Masao wrote: > > > On 2021/03/22 9:50, ikedamsh wrote: >> Agreed. I separated the patches. >> >> If only the former is committed, my trivial concern is that there may be >> a disadvantage, but no advantage for the standby server. It may lead to >> performance degradation to the wal receiver by calling >> INSTR_TIME_SET_CURRENT(), but the stats can't visible for users until the >> latter patch is committed. > > Your concern is valid, so let's polish and commit also the 0003 patch to v14. > I'm still thinking that it's better to separate wal_xxx columns into > walreceiver's and the others. But if we count even walreceiver activity on > the existing columns, regarding 0003 patch, we need to update the document? > For example, "Number of times WAL buffers were written out to disk via > XLogWrite request." should be "Number of times WAL buffers were written > out to disk via XLogWrite request and by WAL receiver process."? Maybe > we need to append some descriptions about this into "WAL configuration" > section? Agreed. Users can know whether the stats is for walreceiver or not. The pg_stat_wal view in standby server shows for the walreceiver, and in primary server it shows for the others. So, I updated the document. (v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch) >> I followed the argument of pg_pwrite(). >> But, I think "char *" is better, so fixed it. > > Thanks for updating the patch! > > +extern int XLogWriteFile(int fd, char *buf, > + size_t nbyte, off_t offset, > + TimeLineID timelineid, XLogSegNo segno, > + bool write_all); > > write_all seems not to be necessary. You added this flag for walreceiver, > I guess. But even without the argument, walreceiver seems to work expectedly. > So, what about the attached patch? I applied some cosmetic changes to the > patch. Thanks a lot. Yes, "write_all" is unnecessary. Your patch is looks good to me. Regards, -- Masahiro Ikeda NTT DATA CORPORATION diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index db4b4e460c..281b13b9fa 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3493,7 +3493,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Number of times WAL buffers were written out to disk via - XLogWrite request. + XLogWrite request and WAL data were written + out to disk by the WAL receiver process. See for more information about the internal WAL function XLogWrite. @@ -3521,7 +3522,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Total amount of time spent writing WAL buffers to disk via - XLogWrite request, in milliseconds + XLogWrite request and WAL data to disk + by the WAL receiver process, in milliseconds (if is enabled, otherwise zero). This includes the sync time when wal_sync_method is either diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml index ae4a3c1293..39e7028c96 100644 --- a/doc/src/sgml/wal.sgml +++ b/doc/src/sgml/wal.sgml @@ -769,7 +769,7 @@ - There are two internal functions to write WAL data to disk: + There are two internal functions to write generated WAL data to disk: XLogWrite and issue_xlog_fsync. When is enabled, the total amounts of time XLogWrite writes and @@ -795,7 +795,19 @@ issue_xlog_fsync syncs WAL data to disk are also counted as wal_write and wal_sync in pg_stat_wal, respectively. - + To write replicated WAL data to disk by the WAL receiver is almost the same + as above except for some points. First, there is a dedicated code path for the + WAL receiver to write data although issue_xlog_fsync is + the same for syncing data. + Second, the WAL receiver writes replicated WAL data per bytes from the local + memory although the generated WAL data is written per WAL buffer pages. + The counters of wal_write, wal_sync, + wal_write_time, and wal_sync_time are + common statistics for writing/syncing both generated and replicated WAL data. + But, you can distinguish them because the generated WAL data is written/synced + in the primary server and the replicated WAL data is written/synced in + the standby server. + diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index a7a94d2a83..df028c5039 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -771,6 +771,9 @@ WalRcvDie(int code, Datum arg) /* Ensure that all WAL records received are flushed to disk */ XLogWalRcvFlush(true); + /* Send WAL statistics to the stats collector before terminating */ + pgstat_send_wal(true); + /* Mark ourselves inactive in shared memory */ SpinLockAcquire(>mutex); Assert(walrcv->walRcvState
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/22 9:50, ikedamsh wrote: Agreed. I separated the patches. If only the former is committed, my trivial concern is that there may be a disadvantage, but no advantage for the standby server. It may lead to performance degradation to the wal receiver by calling INSTR_TIME_SET_CURRENT(), but the stats can't visible for users until the latter patch is committed. Your concern is valid, so let's polish and commit also the 0003 patch to v14. I'm still thinking that it's better to separate wal_xxx columns into walreceiver's and the others. But if we count even walreceiver activity on the existing columns, regarding 0003 patch, we need to update the document? For example, "Number of times WAL buffers were written out to disk via XLogWrite request." should be "Number of times WAL buffers were written out to disk via XLogWrite request and by WAL receiver process."? Maybe we need to append some descriptions about this into "WAL configuration" section? I followed the argument of pg_pwrite(). But, I think "char *" is better, so fixed it. Thanks for updating the patch! +extern int XLogWriteFile(int fd, char *buf, + size_t nbyte, off_t offset, + TimeLineID timelineid, XLogSegNo segno, + bool write_all); write_all seems not to be necessary. You added this flag for walreceiver, I guess. But even without the argument, walreceiver seems to work expectedly. So, what about the attached patch? I applied some cosmetic changes to the patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6f8810e149..9d8ea7edca 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2534,63 +2534,15 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible) { char *from; Sizenbytes; - Sizenleft; int written; - instr_time start; /* OK to write the page(s) */ from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ; nbytes = npages * (Size) XLOG_BLCKSZ; - nleft = nbytes; - do - { - errno = 0; - - /* Measure I/O timing to write WAL data */ - if (track_wal_io_timing) - INSTR_TIME_SET_CURRENT(start); - - pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE); - written = pg_pwrite(openLogFile, from, nleft, startoffset); - pgstat_report_wait_end(); - - /* -* Increment the I/O timing and the number of times WAL data -* were written out to disk. -*/ - if (track_wal_io_timing) - { - instr_time duration; - - INSTR_TIME_SET_CURRENT(duration); - INSTR_TIME_SUBTRACT(duration, start); - WalStats.m_wal_write_time += INSTR_TIME_GET_MICROSEC(duration); - } - - WalStats.m_wal_write++; - - if (written <= 0) - { - charxlogfname[MAXFNAMELEN]; - int save_errno; - - if (errno == EINTR) - continue; - - save_errno = errno; - XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo, - wal_segment_size); - errno = save_errno; - ereport(PANIC, - (errcode_for_file_access(), -errmsg("could not write to log file %s " - "at offset %u, length %zu: %m", - xlogfname, startoffset, nleft))); - } - nleft -= written; -
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-03-19 16:30, Fujii Masao wrote: > On 2021/03/15 10:39, Masahiro Ikeda wrote: >> Thanks, I understood get_sync_bit() checks the sync flags and >> the write unit of generated wal data and replicated wal data is >> different. >> (It's interesting optimization whether to use kernel cache or not.) >> >> OK. Although I agree to separate the stats for the walrecever, >> I want to hear opinions from other people too. I didn't change the >> patch. >> >> Please feel to your comments. > > What about applying the patch for common WAL write function like > XLogWriteFile(), separately from the patch for walreceiver's stats? > Seems the former reaches the consensus, so we can commit it firstly. > Also even only the former change is useful because which allows > walreceiver to report WALWrite wait event. Agreed. I separated the patches. If only the former is committed, my trivial concern is that there may be a disadvantage, but no advantage for the standby server. It may lead to performance degradation to the wal receiver by calling INSTR_TIME_SET_CURRENT(), but the stats can't visible for users until the latter patch is committed. I think it's ok because this not happening in the case to disable the "track_wal_io_timing" in the standby server. Although some users may start the standby server using the backup which "track_wal_io_timing" is enabled in the primary server, they will say it's ok since the users already accept the performance degradation in the primary server. >> OK. I agree. >> >> I wonder to change the error check ways depending on who calls this >> function? >> Now, only the walreceiver checks (1)errno==0 and doesn't check >> (2)errno==ENITR. >> Other processes are the opposite. >> >> IIUC, it's appropriate that every process checks (1)(2). >> Please let me know my understanding is wrong. > > I'm thinking the same. Regarding (2), commit 79ce29c734 introduced > that code. According to the following commit log, it seems harmless > to retry on EINTR even walreceiver. > > Also retry on EINTR. All signals used in the backend are flagged > SA_RESTART > nowadays, so it shouldn't happen, but better to be defensive. Thanks, I understood. >>> BTW, currently XLogWrite() increments IO timing even when pg_pwrite() >>> reports an error. But this is useless. Probably IO timing should be >>> incremented after the return code of pg_pwrite() is checked, instead? >> >> Yes, I agree. I fixed it. >> (v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch) > > Thanks for the patch! > > nleft = nbytes; > do > { > - errno = 0; > + written = XLogWriteFile(openLogFile, from, > nleft, (off_t) > startoffset, > + > ThisTimeLineID, openLogSegNo, wal_segment_size); > > Can we merge this do-while loop in XLogWrite() into the loop > in XLogWriteFile()? > If we do that, ISTM that the following codes are not necessary in > XLogWrite(). > > nleft -= written; > from += written; OK, I fixed it. > + * 'segsize' is a segment size of WAL segment file. > > Since segsize is always wal_segment_size, segsize argument seems > not necessary in XLogWriteFile(). Right. I fixed it. > +XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset, > + TimeLineID timelineid, XLogSegNo segno, int segsize) > > Why did you use "const void *" instead of "char *" for *buf? I followed the argument of pg_pwrite(). But, I think "char *" is better, so fixed it. > Regarding 0005 patch, I will review it later. Thanks. Regards, -- Masahiro Ikeda NTT DATA CORPORATION diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index a7a94d2a83..df028c5039 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -771,6 +771,9 @@ WalRcvDie(int code, Datum arg) /* Ensure that all WAL records received are flushed to disk */ XLogWalRcvFlush(true); + /* Send WAL statistics to the stats collector before terminating */ + pgstat_send_wal(true); + /* Mark ourselves inactive in shared memory */ SpinLockAcquire(>mutex); Assert(walrcv->walRcvState == WALRCV_STREAMING || @@ -910,6 +913,12 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr) XLogArchiveForceDone(xlogfname); else XLogArchiveNotify(xlogfname); + +/* + * Send WAL statistics to the stats collector when finishing + * the current WAL segment file to avoid overloading it. + */ +pgstat_send_wal(false); } recvFile = -1; diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index bd5e787e55..4c7d90f1b9 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2536,61 +2536,14 @@
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/15 10:39, Masahiro Ikeda wrote: Thanks, I understood get_sync_bit() checks the sync flags and the write unit of generated wal data and replicated wal data is different. (It's interesting optimization whether to use kernel cache or not.) OK. Although I agree to separate the stats for the walrecever, I want to hear opinions from other people too. I didn't change the patch. Please feel to your comments. What about applying the patch for common WAL write function like XLogWriteFile(), separately from the patch for walreceiver's stats? Seems the former reaches the consensus, so we can commit it firstly. Also even only the former change is useful because which allows walreceiver to report WALWrite wait event. OK. I agree. I wonder to change the error check ways depending on who calls this function? Now, only the walreceiver checks (1)errno==0 and doesn't check (2)errno==ENITR. Other processes are the opposite. IIUC, it's appropriate that every process checks (1)(2). Please let me know my understanding is wrong. I'm thinking the same. Regarding (2), commit 79ce29c734 introduced that code. According to the following commit log, it seems harmless to retry on EINTR even walreceiver. Also retry on EINTR. All signals used in the backend are flagged SA_RESTART nowadays, so it shouldn't happen, but better to be defensive. BTW, currently XLogWrite() increments IO timing even when pg_pwrite() reports an error. But this is useless. Probably IO timing should be incremented after the return code of pg_pwrite() is checked, instead? Yes, I agree. I fixed it. (v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for the patch! nleft = nbytes; do { - errno = 0; + written = XLogWriteFile(openLogFile, from, nleft, (off_t) startoffset, + ThisTimeLineID, openLogSegNo, wal_segment_size); Can we merge this do-while loop in XLogWrite() into the loop in XLogWriteFile()? If we do that, ISTM that the following codes are not necessary in XLogWrite(). nleft -= written; from += written; + * 'segsize' is a segment size of WAL segment file. Since segsize is always wal_segment_size, segsize argument seems not necessary in XLogWriteFile(). +XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset, + TimeLineID timelineid, XLogSegNo segno, int segsize) Why did you use "const void *" instead of "char *" for *buf? Regarding 0005 patch, I will review it later. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/11 21:29, Masahiro Ikeda wrote: On 2021-03-11 11:52, Fujii Masao wrote: On 2021/03/11 9:38, Masahiro Ikeda wrote: On 2021-03-10 17:08, Fujii Masao wrote: IIUC the stats collector basically exits after checkpointer and walwriter exit. But there seems no guarantee that the stats collector processes all the messages that other processes have sent during the shutdown of the server. Thanks, I understood the above postmaster behaviors. PMState manages the status and after checkpointer is exited, the postmaster sends SIGQUIT signal to the stats collector if the shutdown mode is smart or fast. (IIUC, although the postmaster kill the walsender, the archiver and the stats collector at the same time, it's ok because the walsender and the archiver doesn't send stats to the stats collector now.) But, there might be a corner case to lose stats sent by background workers like the checkpointer before they exit (although this is not implemented yet.) For example, 1. checkpointer send the stats before it exit 2. stats collector receive the signal and break before processing the stats message from checkpointer. In this case, 1's message is lost. 3. stats collector writes the stats in the statsfiles and exit Why don't you recheck the coming message is zero just before the 2th procedure? (v17-0004-guarantee-to-collect-last-stats-messages.patch) Yes, I was thinking the same. This is the straight-forward fix for this issue. The stats collector should process all the outstanding messages when normal shutdown is requested, as the patch does. On the other hand, if immediate shutdown is requested or emergency bailout (by postmaster death) is requested, maybe the stats collector should skip those processings and exit immediately. But if we implement that, we would need to teach the stats collector the shutdown type (i.e., normal shutdown or immediate one). Because currently SIGQUIT is sent to the collector whichever shutdown is requested, and so the collector cannot distinguish the shutdown type. I'm afraid that change is a bit overkill for now. BTW, I found that the collector calls pgstat_write_statsfiles() even at emergency bailout case, before exiting. It's not necessary to save the stats to the file in that case because subsequent server startup does crash recovery and clears that stats file. So it's better to make the collector exit immediately without calling pgstat_write_statsfiles() at emergency bailout case? Probably this should be discussed in other thread because it's different topic from the feature we're discussing here, though. IIUC, only the stats collector has another hander for SIGQUIT although other background processes have a common hander for it and just call _exit(2). I thought to guarantee when TerminateChildren(SIGTERM) is invoked, don't make stats collector shutdown before other background processes are shutdown. I will make another thread to discuss that the stats collector should know the shutdown type or not. If it should be, it's better to make the stats collector exit as soon as possible if the shutdown type is an immediate, and avoid losing the remaining stats if it's normal. +1 I researched the past discussion related to writing the stats files when the immediate shutdown is requested. And I found the following thread([1]) although the discussion is stopped on 12/1/2016. IIUC, the thread's consensus are (1) To kill the stats collector soon before writing the stats file is needed in some case because there is a possibility that it takes a long time until the failover happens. The possible reasons are that disk write speed is slow, stats files are big, and so on. (2) It needs to change the behavior from removing all stats files when the startup does crash recovery because autovacuum uses the stats. (3) It's ok that the stats collector exit without calling pgstat_write_statsfiles() if the stats file is written every X minutes (using wal or another mechanism) and startup process can restore the stats with slightly low freshness. (4) It needs to find the way how to handle the (2)'s stats file when deleting on PITR rewind or stats collector crash happens. So, I need to ping the threads. But I don't have any idea to handle (4) yet... [1] https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F5EF25A%40G01JPEXMBYT05 I measured the timing of the above in my linux laptop using v17-measure-timing.patch. I don't have any strong opinion to handle this case since this result shows to receive and processes the messages takes too short time (less than 1ms) although the stats collector receives the shutdown signal in 5msec(099->104) after the checkpointer process exits. Agreed. ``` 1615421204.556 [checkpointer] DEBUG: received shutdown request signal 1615421208.099 [checkpointer] DEBUG: proc_exit(-1): 0 callbacks to make # exit and send the
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-03-12 12:39, Fujii Masao wrote: On 2021/03/11 21:29, Masahiro Ikeda wrote: In addition, I rebased the patch for WAL receiver. (v17-0003-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks! Will review this later. Thanks a lot! I read through the 0003 patch. Here are some comments for that. With the patch, walreceiver's stats are counted as wal_write, wal_sync, wal_write_time and wal_sync_time in pg_stat_wal. But they should be counted as different columns because WAL IO is different between walreceiver and other processes like a backend? For example, open_sync or open_datasync is chosen as wal_sync_method, those other processes use O_DIRECT flag to open WAL files, but walreceiver does not. For example, those other procesess write WAL data in block units, but walreceiver does not. So I'm concerned that mixing different WAL IO stats in the same columns would confuse the users. Thought? I'd like to hear more opinions about how to expose walreceiver's stats to users. Thanks, I understood get_sync_bit() checks the sync flags and the write unit of generated wal data and replicated wal data is different. (It's interesting optimization whether to use kernel cache or not.) OK. Although I agree to separate the stats for the walrecever, I want to hear opinions from other people too. I didn't change the patch. Please feel to your comments. +int +XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset) This common function writes WAL data and measures IO timing. IMO we can refactor the code furthermore by making this function handle the case where pg_write() reports an error. In other words, I think that the function should do what do-while loop block in XLogWrite() does. Thought? OK. I agree. I wonder to change the error check ways depending on who calls this function? Now, only the walreceiver checks (1)errno==0 and doesn't check (2)errno==ENITR. Other processes are the opposite. IIUC, it's appropriate that every process checks (1)(2). Please let me know my understanding is wrong. BTW, currently XLogWrite() increments IO timing even when pg_pwrite() reports an error. But this is useless. Probably IO timing should be incremented after the return code of pg_pwrite() is checked, instead? Yes, I agree. I fixed it. (v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch) BTW, thanks for your comments in person that the bgwriter process will generate wal data. I checked that it generates the WAL to take a snapshot via LogStandySnapshot(). I attached the patch for bgwriter to send the wal stats. (v18-0005-send-stats-for-bgwriter-when-shutdown.patch) This patch includes the following changes. (1) introduce pgstat_send_bgwriter() the mechanism to send the stats if PGSTAT_STAT_INTERVAL msec has passed like pgstat_send_wal() to avoid overloading to stats collector because "bgwriter_delay" can be set for 10msec or more. (2) remove pgstat_report_wal() and integrate with pgstat_send_wal() because bgwriter sends WalStats.m_wal_records and to avoid overloading (see (1)). IIUC, although the pros to separate them is to reduce the calculation cost of WalUsageAccumDiff(), the impact is limited. (3) make a new signal handler for bgwriter to force sending remaining stats during shutdown because of (1) and remove HandleMainLoopInterrupts() because there are no processes to use it. Regards, -- Masahiro Ikeda NTT DATA CORPORATIONdiff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 9b2eb0d10b..c7bda9127f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2536,7 +2536,6 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible) Size nbytes; Size nleft; int written; - instr_time start; /* OK to write the page(s) */ from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ; @@ -2544,49 +2543,9 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible) nleft = nbytes; do { -errno = 0; +written = XLogWriteFile(openLogFile, from, nleft, (off_t) startoffset, + ThisTimeLineID, openLogSegNo, wal_segment_size); -/* Measure I/O timing to write WAL data */ -if (track_wal_io_timing) - INSTR_TIME_SET_CURRENT(start); - -pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE); -written = pg_pwrite(openLogFile, from, nleft, startoffset); -pgstat_report_wait_end(); - -/* - * Increment the I/O timing and the number of times WAL data - * were written out to disk. - */ -if (track_wal_io_timing) -{ - instr_time duration; - - INSTR_TIME_SET_CURRENT(duration); - INSTR_TIME_SUBTRACT(duration, start); - WalStats.m_wal_write_time += INSTR_TIME_GET_MICROSEC(duration); -} - -WalStats.m_wal_write++; - -if (written <= 0) -{ - char xlogfname[MAXFNAMELEN]; - int save_errno; - - if (errno == EINTR) - continue; - - save_errno = errno; -
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/11 23:33, Fujii Masao wrote: On 2021/03/11 21:29, Masahiro Ikeda wrote: On 2021-03-11 11:52, Fujii Masao wrote: On 2021/03/11 9:38, Masahiro Ikeda wrote: On 2021-03-10 17:08, Fujii Masao wrote: On 2021/03/10 14:11, Masahiro Ikeda wrote: On 2021-03-09 17:51, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/11 21:29, Masahiro Ikeda wrote: On 2021-03-11 11:52, Fujii Masao wrote: On 2021/03/11 9:38, Masahiro Ikeda wrote: On 2021-03-10 17:08, Fujii Masao wrote: On 2021/03/10 14:11, Masahiro Ikeda wrote: On 2021-03-09 17:51, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* +
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/11 21:29, Masahiro Ikeda wrote: On 2021-03-11 11:52, Fujii Masao wrote: On 2021/03/11 9:38, Masahiro Ikeda wrote: On 2021-03-10 17:08, Fujii Masao wrote: On 2021/03/10 14:11, Masahiro Ikeda wrote: On 2021-03-09 17:51, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* +
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-03-11 11:52, Fujii Masao wrote: On 2021/03/11 9:38, Masahiro Ikeda wrote: On 2021-03-10 17:08, Fujii Masao wrote: On 2021/03/10 14:11, Masahiro Ikeda wrote: On 2021-03-09 17:51, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* +
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/11 9:38, Masahiro Ikeda wrote: On 2021-03-10 17:08, Fujii Masao wrote: On 2021/03/10 14:11, Masahiro Ikeda wrote: On 2021-03-09 17:51, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL +
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-03-10 17:08, Fujii Masao wrote: On 2021/03/10 14:11, Masahiro Ikeda wrote: On 2021-03-09 17:51, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL +
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/10 14:11, Masahiro Ikeda wrote: On 2021-03-09 17:51, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL + * msec since we last sent one + */ + now =
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-03-09 17:51, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL + * msec since we last sent one + */ + now =
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL + * msec since we last sent one + */ + now = GetCurrentTimestamp(); + if (TimestampDifferenceExceeds(last_report, now,
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/09 4:47, David G. Johnston wrote: On Mon, Mar 8, 2021 at 8:48 AM Fujii Masao mailto:masao.fu...@oss.nttdata.com>> wrote: Thanks for updating the patch! I applied cosmetic changes to that. Patch attached. Barring any objection, I will commit this version. Read over the patch and it looks good. Thanks for the review! I committed the patch. One minor "the" omission (in a couple of places, copy-paste style): + See for more information about + internal WAL function XLogWrite. "about *the* internal WAL function" I added "the" in such two places. Thanks! Also, I'm not sure why you find omitting documentation that the millisecond field has a fractional part out to microseconds to be helpful. If this information should be documented, we should do that for not only wal_write/sync_time but also other several columns, for example, pg_stat_database.blk_write_time? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Mon, Mar 8, 2021 at 8:48 AM Fujii Masao wrote: > > Thanks for updating the patch! I applied cosmetic changes to that. > Patch attached. Barring any objection, I will commit this version. > Read over the patch and it looks good. One minor "the" omission (in a couple of places, copy-paste style): + See for more information about + internal WAL function XLogWrite. "about *the* internal WAL function" Also, I'm not sure why you find omitting documentation that the millisecond field has a fractional part out to microseconds to be helpful. David J.
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/08 19:42, Masahiro Ikeda wrote: On 2021-03-08 13:44, Fujii Masao wrote: On 2021/03/05 19:54, Masahiro Ikeda wrote: On 2021-03-05 12:47, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL +
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-03-08 13:44, Fujii Masao wrote: On 2021/03/05 19:54, Masahiro Ikeda wrote: On 2021-03-05 12:47, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL +
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/05 19:54, Masahiro Ikeda wrote: On 2021-03-05 12:47, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL + * msec since we last sent one + */ + now =
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-03-05 12:47, Fujii Masao wrote: On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL + * msec since we last sent one + */ + now =
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/05 8:38, Masahiro Ikeda wrote: On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL + * msec since we last sent one + */ + now = GetCurrentTimestamp(); + if (TimestampDifferenceExceeds(last_report, now,
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-03-05 01:02, Fujii Masao wrote: On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL + * msec since we last sent one + */ + now = GetCurrentTimestamp(); + if (TimestampDifferenceExceeds(last_report, now,
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/04 16:14, Masahiro Ikeda wrote: On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || + sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize + * the overhead in WAL-writing. + */ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Yeah, if wal_writer_delay is set to very small value, there is a risk that the WAL stats are sent too frequently. I agree that's a problem. Why don't to make another way to check the timestamp? + /* + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL + * msec since we last sent one + */ + now = GetCurrentTimestamp(); + if (TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL)) + { + pgstat_send_wal(); +
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Thu, Mar 4, 2021 at 12:14 PM Masahiro Ikeda wrote: > On 2021-03-03 20:27, Masahiro Ikeda wrote: > > On 2021-03-03 16:30, Fujii Masao wrote: > >> On 2021/03/03 14:33, Masahiro Ikeda wrote: > >>> On 2021-02-24 16:14, Fujii Masao wrote: > On 2021/02/15 11:59, Masahiro Ikeda wrote: > > On 2021-02-10 00:51, David G. Johnston wrote: > >> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda > >> wrote: > >> > >>> I pgindented the patches. > >> > >> ... XLogWrite, which is invoked during an > >> XLogFlush request (see ...). This is also > >> incremented by the WAL receiver during replication. > >> > >> ("which normally called" should be "which is normally called" or > >> "which normally is called" if you want to keep true to the > >> original) > >> You missed the adding the space before an opening parenthesis here > >> and > >> elsewhere (probably copy-paste) > >> > >> is ether -> is either > >> "This parameter is off by default as it will repeatedly query the > >> operating system..." > >> ", because" -> "as" > > > > Thanks, I fixed them. > > > >> wal_write_time and the sync items also need the note: "This is > >> also > >> incremented by the WAL receiver during replication." > > > > I skipped changing it since I separated the stats for the WAL > > receiver > > in pg_stat_wal_receiver. > > > >> "The number of times it happened..." -> " (the tally of this event > >> is > >> reported in wal_buffers_full in) This is undesirable because > >> ..." > > > > Thanks, I fixed it. > > > >> I notice that the patch for WAL receiver doesn't require > >> explicitly > >> computing the sync statistics but does require computing the write > >> statistics. This is because of the presence of issue_xlog_fsync > >> but > >> absence of an equivalent pg_xlog_pwrite. Additionally, I observe > >> that > >> the XLogWrite code path calls pgstat_report_wait_*() while the WAL > >> receiver path does not. It seems technically straight-forward to > >> refactor here to avoid the almost-duplicated logic in the two > >> places, > >> though I suspect there may be a trade-off for not adding another > >> function call to the stack given the importance of WAL processing > >> (though that seems marginalized compared to the cost of actually > >> writing the WAL). Or, as Fujii noted, go the other way and don't > >> have > >> any shared code between the two but instead implement the WAL > >> receiver > >> one to use pg_stat_wal_receiver instead. In either case, this > >> half-and-half implementation seems undesirable. > > > > OK, as Fujii-san mentioned, I separated the WAL receiver stats. > > (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) > > Thanks for updating the patches! > > > > I added the infrastructure code to communicate the WAL receiver > > stats messages between the WAL receiver and the stats collector, > > and > > the stats for WAL receiver is counted in pg_stat_wal_receiver. > > What do you think? > > On second thought, this idea seems not good. Because those stats are > collected between multiple walreceivers, but other values in > pg_stat_wal_receiver is only related to the walreceiver process > running > at that moment. IOW, it seems strange that some values show dynamic > stats and the others show collected stats, even though they are in > the same view pg_stat_wal_receiver. Thought? > >>> > >>> OK, I fixed it. > >>> The stats collected in the WAL receiver is exposed in pg_stat_wal > >>> view in v11 patch. > >> > >> Thanks for updating the patches! I'm now reading 001 patch. > >> > >> +/* Check whether the WAL file was synced to disk right now */ > >> +if (enableFsync && > >> +(sync_method == SYNC_METHOD_FSYNC || > >> + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || > >> + sync_method == SYNC_METHOD_FDATASYNC)) > >> +{ > >> > >> Isn't it better to make issue_xlog_fsync() return immediately > >> if enableFsync is off, sync_method is open_sync or open_data_sync, > >> to simplify the code more? > > > > Thanks for the comments. > > I added the above code in v12 patch. > > > >> > >> +/* > >> + * Send WAL statistics only if WalWriterDelay has elapsed > to > >> minimize > >> + * the overhead in WAL-writing. > >> + */ > >> +if (rc & WL_TIMEOUT) > >> +pgstat_send_wal(); > >> > >> On second thought, this change means that it always takes > >> wal_writer_delay > >> before walwriter's WAL stats is sent after XLogBackgroundFlush() is > >> called. > >> For example, if wal_writer_delay is set to several seconds, some > >> values in > >> pg_stat_wal would be not up-to-date
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-03-03 20:27, Masahiro Ikeda wrote: On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || +sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || +sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize +* the overhead in WAL-writing. +*/ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Why don't to make another way to check the timestamp? + /* +* Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL +* msec since we last sent one +*/ + now = GetCurrentTimestamp(); + if (TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL)) + { + pgstat_send_wal(); + last_report = now; + } + Although I worried that it's better to
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-03-03 16:30, Fujii Masao wrote: On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || +sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || +sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? Thanks for the comments. I added the above code in v12 patch. + /* + * Send WAL statistics only if WalWriterDelay has elapsed to minimize +* the overhead in WAL-writing. +*/ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Thanks, I didn't notice that. Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's default value is 200msec and it may be set shorter time. Why don't to make another way to check the timestamp? + /* +* Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL +* msec since we last sent one +*/ + now = GetCurrentTimestamp(); + if (TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL)) + { + pgstat_send_wal(); + last_report = now; + } + Although I worried that it's better to add the check code in pgstat_send_wal(),
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/03 14:33, Masahiro Ikeda wrote: On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. Thanks for updating the patches! I'm now reading 001 patch. + /* Check whether the WAL file was synced to disk right now */ + if (enableFsync && + (sync_method == SYNC_METHOD_FSYNC || +sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || +sync_method == SYNC_METHOD_FDATASYNC)) + { Isn't it better to make issue_xlog_fsync() return immediately if enableFsync is off, sync_method is open_sync or open_data_sync, to simplify the code more? + /* +* Send WAL statistics only if WalWriterDelay has elapsed to minimize +* the overhead in WAL-writing. +*/ + if (rc & WL_TIMEOUT) + pgstat_send_wal(); On second thought, this change means that it always takes wal_writer_delay before walwriter's WAL stats is sent after XLogBackgroundFlush() is called. For example, if wal_writer_delay is set to several seconds, some values in pg_stat_wal would be not up-to-date meaninglessly for those seconds. So I'm thinking to withdraw my previous comment and it's ok to send the stats every after XLogBackgroundFlush() is called. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-02-24 16:14, Fujii Masao wrote: On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? OK, I fixed it. The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. I refactored the logic to write xlog file to unify collecting the write stats. As David said, although pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE) is not called in the WAL receiver's path, I agreed that the cost to write the WAL is much bigger. Regards, -- Masahiro Ikeda NTT DATA CORPORATIONFrom c97fff03cd2bd51b28f6c6fde56c48792683f44e Mon Sep 17 00:00:00 2001 From: Masahiro Ikeda Date: Fri, 12 Feb 2021 11:19:59 +0900 Subject: [PATCH 1/2] Add statistics related to write/sync wal records. This patch adds following statistics to pg_stat_wal view to track WAL I/O activity by backends and background processes except WAL receiver. - the total number of times writing/syncing WAL data. - the total amount of time spent writing/syncing WAL data. Since to track I/O timing may leads significant overhead, GUC parameter "track_wal_io_timing" is introduced. Only if this is on, the I/O timing is measured. The statistics related to sync are zero when "wal_sync_method" is "open_datasync" or "open_sync", because it doesn't call each sync method. (This requires a catversion bump, as well as an update to
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/02/15 11:59, Masahiro Ikeda wrote: On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) Thanks for updating the patches! I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? On second thought, this idea seems not good. Because those stats are collected between multiple walreceivers, but other values in pg_stat_wal_receiver is only related to the walreceiver process running at that moment. IOW, it seems strange that some values show dynamic stats and the others show collected stats, even though they are in the same view pg_stat_wal_receiver. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-02-10 00:51, David G. Johnston wrote: On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: I pgindented the patches. ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" Thanks, I fixed them. wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." I skipped changing it since I separated the stats for the WAL receiver in pg_stat_wal_receiver. "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." Thanks, I fixed it. I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. OK, as Fujii-san mentioned, I separated the WAL receiver stats. (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch) I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and the stats for WAL receiver is counted in pg_stat_wal_receiver. What do you think? Regards, -- Masahiro Ikeda NTT DATA CORPORATIONFrom 4ecbc467f88a2f923b3bc2eb6fe2c4f7725c02be Mon Sep 17 00:00:00 2001 From: Masahiro Ikeda Date: Fri, 12 Feb 2021 11:19:59 +0900 Subject: [PATCH 1/2] Add statistics related to write/sync wal records. This patch adds following statistics to pg_stat_wal view to track WAL I/O activity by backends and background processes except WAL receiver. - the total number of times writing/syncing WAL data. - the total amount of time spent writing/syncing WAL data. Since to track I/O timing may leads significant overhead, GUC parameter "track_wal_io_timing" is introduced. Only if this is on, the I/O timing is measured. The statistics related to sync are zero when "wal_sync_method" is "open_datasync" or "open_sync", because it doesn't call each sync method. (This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID) Author: Masahiro Ikeda Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.co --- doc/src/sgml/config.sgml | 23 +++- doc/src/sgml/monitoring.sgml | 50 doc/src/sgml/wal.sgml | 12 +++- src/backend/access/transam/xlog.c | 57 +++ src/backend/catalog/system_views.sql | 4 ++ src/backend/postmaster/pgstat.c | 4 ++ src/backend/postmaster/walwriter.c| 16 -- src/backend/utils/adt/pgstatfuncs.c | 24 +++- src/backend/utils/misc/guc.c | 9 +++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/access/xlog.h | 1 + src/include/catalog/pg_proc.dat | 6 +- src/include/pgstat.h | 10 src/test/regress/expected/rules.out | 6 +- 14 files changed, 208 insertions(+), 15 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 5ef1c7ad3c..0bb03d0717 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7404,7 +7404,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; Enables timing of database I/O calls. This parameter is off by -default, because it will repeatedly query the operating system for +default, as it will repeatedly query the operating system for the current time, which may cause significant overhead on some platforms. You can use the tool to measure the overhead of timing on your system. @@ -7418,6 +7418,27 @@ COPY postgres_log FROM
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-02-08 14:26, Fujii Masao wrote: On 2021/02/08 13:01, Fujii Masao wrote: On 2021/02/05 8:45, Masahiro Ikeda wrote: I pgindented the patches. Thanks for updating the patches! + XLogWrite, which nomally called by an + issue_xlog_fsync, which nomally called by an Typo: "nomally" should be "normally"? + XLogFlush request(see linkend="wal-configuration"/>) + XLogFlush request(see linkend="wal-configuration"/>), Isn't it better to add a space character just after "request"? + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_write_time = INSTR_TIME_GET_MICROSEC(duration); If several cycles happen in the do-while loop, m_wal_write_time should be updated with the sum of "duration" in those cycles instead of "duration" in the last cycle? If yes, "+=" should be used instead of "=" when updating m_wal_write_time? + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); Also "=" should be "+=" in the above? + /* Send WAL statistics */ + pgstat_send_wal(); This may cause overhead in WAL-writing by walwriter because it's called every cycles even when walwriter needs to write more WAL next cycle (don't need to sleep on WaitLatch)? If this is right, pgstat_send_wal() should be called only when WaitLatch() returns with WL_TIMEOUT? Thanks, I didn't notice that. I'll fix it. - XLogFlush request(see ) + XLogFlush request(see ), + or WAL data written out to disk by WAL receiver. So regarding walreceiver, only wal_write, wal_write_time, wal_sync, and wal_sync_time are updated even while the other values are not. Isn't this confusing to users? If so, what about reporting those walreceiver stats in pg_stat_wal_receiver? OK, I'll add new infrastructure code to interect with wal receiver and stats collector and show the stats in pg_stat_wal_receiver. if (endofwal) + { + /* Send WAL statistics to the stats collector */ + pgstat_send_wal(); break; You added pgstat_send_wal() so that it's called in some cases where walreceiver exits. But ISTM that there are other walreceiver-exit cases. For example, in the case where SIGTERM is received. Instead, pgstat_send_wal() should be called in WalRcvDie() for those all cases? Thanks, I forgot the case. I'll fix it. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-02-08 13:01, Fujii Masao wrote: On 2021/02/05 8:45, Masahiro Ikeda wrote: I pgindented the patches. Thanks for updating the patches! Thanks for checking the patches. + XLogWrite, which nomally called by an + issue_xlog_fsync, which nomally called by an Typo: "nomally" should be "normally"? Yes, I'll fix it. + XLogFlush request(see ) + XLogFlush request(see ), Isn't it better to add a space character just after "request"? Thanks, I'll fix it. + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_write_time = INSTR_TIME_GET_MICROSEC(duration); If several cycles happen in the do-while loop, m_wal_write_time should be updated with the sum of "duration" in those cycles instead of "duration" in the last cycle? If yes, "+=" should be used instead of "=" when updating m_wal_write_time? + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); Also "=" should be "+=" in the above? Yes, they are my mistake when changing the unit from milliseconds to microseconds. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda wrote: > I pgindented the patches. > > ... XLogWrite, which is invoked during an XLogFlush request (see ...). This is also incremented by the WAL receiver during replication. ("which normally called" should be "which is normally called" or "which normally is called" if you want to keep true to the original) You missed the adding the space before an opening parenthesis here and elsewhere (probably copy-paste) is ether -> is either "This parameter is off by default as it will repeatedly query the operating system..." ", because" -> "as" wal_write_time and the sync items also need the note: "This is also incremented by the WAL receiver during replication." "The number of times it happened..." -> " (the tally of this event is reported in wal_buffers_full in) This is undesirable because ..." I notice that the patch for WAL receiver doesn't require explicitly computing the sync statistics but does require computing the write statistics. This is because of the presence of issue_xlog_fsync but absence of an equivalent pg_xlog_pwrite. Additionally, I observe that the XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver path does not. It seems technically straight-forward to refactor here to avoid the almost-duplicated logic in the two places, though I suspect there may be a trade-off for not adding another function call to the stack given the importance of WAL processing (though that seems marginalized compared to the cost of actually writing the WAL). Or, as Fujii noted, go the other way and don't have any shared code between the two but instead implement the WAL receiver one to use pg_stat_wal_receiver instead. In either case, this half-and-half implementation seems undesirable. David J.
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/02/08 13:01, Fujii Masao wrote: On 2021/02/05 8:45, Masahiro Ikeda wrote: I pgindented the patches. Thanks for updating the patches! + XLogWrite, which nomally called by an + issue_xlog_fsync, which nomally called by an Typo: "nomally" should be "normally"? + XLogFlush request(see ) + XLogFlush request(see ), Isn't it better to add a space character just after "request"? + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_write_time = INSTR_TIME_GET_MICROSEC(duration); If several cycles happen in the do-while loop, m_wal_write_time should be updated with the sum of "duration" in those cycles instead of "duration" in the last cycle? If yes, "+=" should be used instead of "=" when updating m_wal_write_time? + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); Also "=" should be "+=" in the above? + /* Send WAL statistics */ + pgstat_send_wal(); This may cause overhead in WAL-writing by walwriter because it's called every cycles even when walwriter needs to write more WAL next cycle (don't need to sleep on WaitLatch)? If this is right, pgstat_send_wal() should be called only when WaitLatch() returns with WL_TIMEOUT? - XLogFlush request(see ) + XLogFlush request(see ), + or WAL data written out to disk by WAL receiver. So regarding walreceiver, only wal_write, wal_write_time, wal_sync, and wal_sync_time are updated even while the other values are not. Isn't this confusing to users? If so, what about reporting those walreceiver stats in pg_stat_wal_receiver? if (endofwal) + { + /* Send WAL statistics to the stats collector */ + pgstat_send_wal(); break; You added pgstat_send_wal() so that it's called in some cases where walreceiver exits. But ISTM that there are other walreceiver-exit cases. For example, in the case where SIGTERM is received. Instead, pgstat_send_wal() should be called in WalRcvDie() for those all cases? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/02/05 8:45, Masahiro Ikeda wrote: I pgindented the patches. Thanks for updating the patches! + XLogWrite, which nomally called by an + issue_xlog_fsync, which nomally called by an Typo: "nomally" should be "normally"? + XLogFlush request(see ) + XLogFlush request(see ), Isn't it better to add a space character just after "request"? + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_write_time = INSTR_TIME_GET_MICROSEC(duration); If several cycles happen in the do-while loop, m_wal_write_time should be updated with the sum of "duration" in those cycles instead of "duration" in the last cycle? If yes, "+=" should be used instead of "=" when updating m_wal_write_time? + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); Also "=" should be "+=" in the above? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
I pgindented the patches. Regards -- Masahiro Ikeda NTT DATA CORPORATIONFrom 47f436d7e423ece33a25adebf4265eac02e575f3 Mon Sep 17 00:00:00 2001 From: Masahiro Ikeda Date: Fri, 29 Jan 2021 16:41:34 +0900 Subject: [PATCH 1/2] Add statistics related to write/sync wal records. This patch adds following statistics to pg_stat_wal view to track WAL I/O activity. - the total number of times writing/syncing WAL data. - the total amount of time spent writing/syncing WAL data. Since to track I/O timing may leads significant overhead, GUC parameter "track_wal_io_timing" is introduced. Only if this is on, the I/O timing is measured. The statistics related to sync are zero when "wal_sync_method" is "open_datasync" or "open_sync", because it doesn't call each sync method. (This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID) Author: Masahiro Ikeda Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com --- doc/src/sgml/config.sgml | 21 +++ doc/src/sgml/monitoring.sgml | 50 doc/src/sgml/wal.sgml | 12 +++- src/backend/access/transam/xlog.c | 57 +++ src/backend/catalog/system_views.sql | 4 ++ src/backend/postmaster/checkpointer.c | 2 +- src/backend/postmaster/pgstat.c | 4 ++ src/backend/postmaster/walwriter.c| 3 + src/backend/utils/adt/pgstatfuncs.c | 24 +++- src/backend/utils/misc/guc.c | 9 +++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/access/xlog.h | 1 + src/include/catalog/pg_proc.dat | 6 +- src/include/pgstat.h | 10 src/test/regress/expected/rules.out | 6 +- 15 files changed, 199 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 5ef1c7ad3c..4bdc341141 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7418,6 +7418,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; + + track_wal_io_timing (boolean) + + track_wal_io_timing configuration parameter + + + + +Enables timing of WAL I/O calls. This parameter is off by default, +because it will repeatedly query the operating system for +the current time, which may cause significant overhead on some +platforms. You can use the tool to +measure the overhead of timing on your system. +I/O timing information is +displayed in +pg_stat_wal. Only superusers can +change this setting. + + + + track_functions (enum) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index c602ee4427..2435f401db 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3487,6 +3487,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + + wal_write bigint + + + Number of times WAL buffers were written out to disk via + XLogWrite, which nomally called by an + XLogFlush request(see ) + + + + + + wal_write_time double precision + + + Total amount of time spent writing WAL data to disk, excluding sync time unless +is ether open_datasync or + open_sync. Units are in milliseconds with microsecond resolution. + This is zero when is disabled. + + + + + + wal_sync bigint + + + Number of times WAL files were synced to disk via + issue_xlog_fsync, which nomally called by an + XLogFlush request(see ), + while was set to one of the + "sync at commit" options (i.e., fdatasync, + fsync, or fsync_writethrough). + + + + + + wal_sync_time double precision + + + Total amount of time spent syncing WAL files to disk, in milliseconds with microsecond + resolution. This requires setting to one of + the "sync at commit" options (i.e., fdatasync, fsync, + or fsync_writethrough). + This is zero when is disabled. + + + stats_reset timestamp with time zone diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml index 66de1ee2f8..984cb5764c 100644 --- a/doc/src/sgml/wal.sgml +++ b/doc/src/sgml/wal.sgml @@ -663,7 +663,9 @@ the WAL buffers in shared memory. If there is no space for the new record, XLogInsertRecord will have to write (move to kernel cache) a few filled WAL - buffers. This is undesirable because XLogInsertRecord + buffers. The number of times it happend is counted as + wal_buffers_full in . + This is
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-01-27 00:14, David G. Johnston wrote: On Mon, Jan 25, 2021 at 11:56 PM Masahiro Ikeda wrote: (wal_write) The number of times WAL buffers were written out to disk via XLogWrite Thanks. I thought it's better to omit "The" and "XLogWrite" because other views' description omits "The" and there is no description of "XlogWrite" in the documents. What do you think? The documentation for WAL does get into the public API level of detail and doing so here makes what this measures crystal clear. The potential absence of sufficient detail elsewhere should be corrected instead of making this description more vague. Specifically, probably XLogWrite should be added to the WAL overview as part of this update and probably even have the descriptive section of the documentation note that the number of times that said function is executed is exposed as a counter in the wal statistics table - thus closing the loop. Thanks for your comments. I added the descriptions in documents and separated the patch into attached two patches. First is to add wal i/o activity statistics. Second is to make the wal receiver report the wal statistics. Please let me know if you have any comments. Regards, -- Masahiro Ikeda NTT DATA CORPORATIONFrom a93516dd9836345f97a6f4081597a7079dff4932 Mon Sep 17 00:00:00 2001 From: Masahiro Ikeda Date: Fri, 29 Jan 2021 16:41:34 +0900 Subject: [PATCH 1/2] Add statistics related to write/sync wal records. This patch adds following statistics to pg_stat_wal view to track WAL I/O activity. - the total number of times writing/syncing WAL data. - the total amount of time spent writing/syncing WAL data. Since to track I/O timing may leads significant overhead, GUC parameter "track_wal_io_timing" is introduced. Only if this is on, the I/O timing is measured. The statistics related to sync are zero when "wal_sync_method" is "open_datasync" or "open_sync", because it doesn't call each sync method. (This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID) Author: Masahiro Ikeda Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com --- doc/src/sgml/config.sgml | 21 +++ doc/src/sgml/monitoring.sgml | 50 doc/src/sgml/wal.sgml | 12 +++- src/backend/access/transam/xlog.c | 59 ++- src/backend/catalog/system_views.sql | 4 ++ src/backend/postmaster/checkpointer.c | 2 +- src/backend/postmaster/pgstat.c | 4 ++ src/backend/postmaster/walwriter.c| 3 + src/backend/utils/adt/pgstatfuncs.c | 24 +++- src/backend/utils/misc/guc.c | 9 +++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/access/xlog.h | 1 + src/include/catalog/pg_proc.dat | 14 ++--- src/include/pgstat.h | 8 +++ src/test/regress/expected/rules.out | 6 +- 15 files changed, 202 insertions(+), 16 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index f1037df5a9..3f1b3c1715 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7416,6 +7416,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; + + track_wal_io_timing (boolean) + + track_wal_io_timing configuration parameter + + + + +Enables timing of WAL I/O calls. This parameter is off by default, +because it will repeatedly query the operating system for +the current time, which may cause significant overhead on some +platforms. You can use the tool to +measure the overhead of timing on your system. +I/O timing information is +displayed in +pg_stat_wal. Only superusers can +change this setting. + + + + track_functions (enum) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index c602ee4427..2435f401db 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3487,6 +3487,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + + wal_write bigint + + + Number of times WAL buffers were written out to disk via + XLogWrite, which nomally called by an + XLogFlush request(see ) + + + + + + wal_write_time double precision + + + Total amount of time spent writing WAL data to disk, excluding sync time unless +is ether open_datasync or + open_sync. Units are in milliseconds with microsecond resolution. + This is zero when is disabled. + + + + + + wal_sync bigint + + + Number of times WAL files were
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Mon, Jan 25, 2021 at 11:56 PM Masahiro Ikeda wrote: > > > (wal_write) > > The number of times WAL buffers were written out to disk via XLogWrite > > > > Thanks. > > I thought it's better to omit "The" and "XLogWrite" because other views' > description > omits "The" and there is no description of "XlogWrite" in the documents. > What do you think? > > The documentation for WAL does get into the public API level of detail and doing so here makes what this measures crystal clear. The potential absence of sufficient detail elsewhere should be corrected instead of making this description more vague. Specifically, probably XLogWrite should be added to the WAL overview as part of this update and probably even have the descriptive section of the documentation note that the number of times that said function is executed is exposed as a counter in the wal statistics table - thus closing the loop. David J.
Re: About to add WAL write/fsync statistics to pg_stat_wal view
Hi, David. Thanks for your comments. On 2021-01-26 08:48, David G. Johnston wrote: On Mon, Jan 25, 2021 at 8:03 AM Masahiko Sawada wrote: On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda wrote: Hi, thanks for the reviews. I updated the attached patch. Thank you for updating the patch! Your original email with "total number of times" is more correct, removing the "of times" and just writing "total number of WAL" is not good wording. Specifically, this change is strictly worse than the original. - Number of times WAL data was written to disk because WAL buffers became full + Total number of WAL data written to disk because WAL buffers became full Both have the flaw that they leave implied exactly what it means to "write WAL to disk". It is also unclear whether a counter, bytes, or both, would be more useful here. I've incorporated this into my documentation suggestions below: (wal_buffers_full) -- Revert - the original was better, though maybe add more detail similar to the below. I didn't research exactly how this works. OK, I understood. I reverted since this is a counter statistics. (wal_write) The number of times WAL buffers were written out to disk via XLogWrite Thanks. I thought it's better to omit "The" and "XLogWrite" because other views' description omits "The" and there is no description of "XlogWrite" in the documents. What do you think? -- Seems like this should have a bytes version too Do you mean that we need to separate statistics for wal write? (wal_write_time) The amount of time spent writing WAL buffers to disk, excluding sync time unless the wal_sync_method is either open_datasync or open_sync. Units are in milliseconds with microsecond resolution. This is zero when track_wal_io_timing is disabled. Thanks, I'll fix it. (wal_sync) The number of times WAL files were synced to disk while wal_sync_method was set to one of the "sync at commit" options (i.e., fdatasync, fsync, or fsync_writethrough). Thanks, I'll fix it. -- it is not going to be zero just because those settings are presently disabled as they could have been enabled at some point since the last time these statistics were reset. Right, your description is correct. The "track_wal_io_timing" has the same limitation, doesn't it? (wal_sync_time) The amount of time spent syncing WAL files to disk, in milliseconds with microsecond resolution. This requires setting wal_sync_method to one of the "sync at commit" options (i.e., fdatasync, fsync, or fsync_writethrough). Thanks, I'll fix it. I will add the comments related to "track_wal_io_timing". Also, I would suggest extracting the changes to postmaster/pgstat.c and replication/walreceiver.c to a separate patch as you've fundamentally changed how it behaves with regards to that function and how it interacts with the WAL receiver. That seems an entirely separate topic warranting its own patch and discussion. OK, I will separate two patches. On 2021-01-26 08:52, David G. Johnston wrote: On Mon, Jan 25, 2021 at 4:37 PM Masahiro Ikeda wrote: I agree with your comments. I think it should report when reaching the end of WAL too. I add the code to report the stats when finishing the current WAL segment file when timeout in the main loop and when reaching the end of WAL. The following is not an improvement: - /* Send WAL statistics to the stats collector. */+ /* Send WAL statistics to stats collector */ The word "the" there makes it proper English. Your copy-pasting should have kept the existing good wording in the other locations rather than replace the existing location with the newly added incorrect wording. Thanks, I'll fix it. This doesn't make sense: * current WAL segment file to avoid loading stats collector. Maybe "overloading" or "overwhelming"? I see you removed the pgstat_send_wal(force) change. The rest of my comments on the v6 patch still stand I believe. Yes, "overloading" is right. Thanks. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Mon, Jan 25, 2021 at 4:37 PM Masahiro Ikeda wrote: > > I agree with your comments. I think it should report when > reaching the end of WAL too. I add the code to report the stats > when finishing the current WAL segment file when timeout in the > main loop and when reaching the end of WAL. > > The following is not an improvement: - /* Send WAL statistics to the stats collector. */ + /* Send WAL statistics to stats collector */ The word "the" there makes it proper English. Your copy-pasting should have kept the existing good wording in the other locations rather than replace the existing location with the newly added incorrect wording. This doesn't make sense: * current WAL segment file to avoid loading stats collector. Maybe "overloading" or "overwhelming"? I see you removed the pgstat_send_wal(force) change. The rest of my comments on the v6 patch still stand I believe. David J.
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Mon, Jan 25, 2021 at 8:03 AM Masahiko Sawada wrote: > On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda > wrote: > > > > Hi, thanks for the reviews. > > > > I updated the attached patch. > > Thank you for updating the patch! > Your original email with "total number of times" is more correct, removing the "of times" and just writing "total number of WAL" is not good wording. Specifically, this change is strictly worse than the original. - Number of times WAL data was written to disk because WAL buffers became full + Total number of WAL data written to disk because WAL buffers became full Both have the flaw that they leave implied exactly what it means to "write WAL to disk". It is also unclear whether a counter, bytes, or both, would be more useful here. I've incorporated this into my documentation suggestions below: (wal_buffers_full) -- Revert - the original was better, though maybe add more detail similar to the below. I didn't research exactly how this works. (wal_write) The number of times WAL buffers were written out to disk via XLogWrite -- Seems like this should have a bytes version too (wal_write_time) The amount of time spent writing WAL buffers to disk, excluding sync time unless the wal_sync_method is either open_datasync or open_sync. Units are in milliseconds with microsecond resolution. This is zero when track_wal_io_timing is disabled. (wal_sync) The number of times WAL files were synced to disk while wal_sync_method was set to one of the "sync at commit" options (i.e., fdatasync, fsync, or fsync_writethrough). -- it is not going to be zero just because those settings are presently disabled as they could have been enabled at some point since the last time these statistics were reset. (wal_sync_time) The amount of time spent syncing WAL files to disk, in milliseconds with microsecond resolution. This requires setting wal_sync_method to one of the "sync at commit" options (i.e., fdatasync, fsync, or fsync_writethrough). Also, I would suggest extracting the changes to postmaster/pgstat.c and replication/walreceiver.c to a separate patch as you've fundamentally changed how it behaves with regards to that function and how it interacts with the WAL receiver. That seems an entirely separate topic warranting its own patch and discussion. David J.
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-01-26 00:03, Masahiko Sawada wrote: On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda wrote: Hi, thanks for the reviews. I updated the attached patch. Thank you for updating the patch! The summary of the changes is following. 1. fix document I followed another view's comments. 2. refactor issue_xlog_fsync() I removed "sync_called" variables, narrowed the "duration" scope and change the switch statement to if statement. Looking at the code again, I think if we check if an fsync was really called when calculating the I/O time, it's better to check that before starting the measurement. bool issue_fsync = false; if (enableFsync && (sync_method == SYNC_METHOD_FSYNC || sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || sync_method == SYNC_METHOD_FDATASYNC)) { if (track_wal_io_timing) INSTR_TIME_SET_CURRENT(start); issue_fsync = true; } (snip) if (issue_fsync) { if (track_wal_io_timing) { instr_time duration; INSTR_TIME_SET_CURRENT(duration); INSTR_TIME_SUBTRACT(duration, start); WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); } WalStats.m_wal_sync++; } So I prefer either the above which is a modified version of the original approach or my idea that doesn’t introduce a new local variable I proposed before. But I'm not going to insist on that. Thanks for the comments. I change the code to the above. 3. make wal-receiver report WAL statistics I add the code to collect the statistics for a written operation in XLogWalRcvWrite() and to report stats in WalReceiverMain(). Since WalReceiverMain() can loop fast, to avoid loading stats collector, I add "force" argument to the pgstat_send_wal function. If "force" is false, it can skip reporting until at least 500 msec since it last reported. WalReceiverMain() almost calls pgstat_send_wal() with "force" as false. void -pgstat_send_wal(void) +pgstat_send_wal(bool force) { /* We assume this initializes to zeroes */ static const PgStat_MsgWal all_zeroes; + static TimestampTz last_report = 0; + TimestampTz now; WalUsagewalusage; + /* +* Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL +* msec since we last sent one or specified "force". +*/ + now = GetCurrentTimestamp(); + if (!force && + !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL)) + return; + + last_report = now; Hmm, I don’t think it's good to use PGSTAT_STAT_INTERVAL for this purpose since it is used as a minimum time for stats file updates. If we want an interval, I think we should define another one Also, with the patch, pgstat_send_wal() calls GetCurrentTimestamp() every time even if track_wal_io_timing is off, which is not good. On the other hand, I agree that your concern that the wal receiver should not send the stats for whenever receiving wal records. So an idea could be to send the wal stats when finishing the current WAL segment file and when timeout in the main loop. That way we can guarantee that the wal stats on a replica is updated at least every time finishing a WAL segment file when actively receiving WAL records and every NAPTIME_PER_CYCLE in other cases. I agree with your comments. I think it should report when reaching the end of WAL too. I add the code to report the stats when finishing the current WAL segment file when timeout in the main loop and when reaching the end of WAL. Regards, -- Masahiro Ikeda NTT DATA CORPORATION--- v6-0001-Add-statistics-related-to-write-sync-wal-records.patch 2021-01-25 16:27:50.749429666 +0900 +++ v7-0001-Add-statistics-related-to-write-sync-wal-records.patch 2021-01-26 08:19:48.269760642 +0900 @@ -1,6 +1,6 @@ -From e9aad92097c5cff5565b67ce1a8ec6d7b4c8a4d9 Mon Sep 17 00:00:00 2001 +From 02f0888efeb09ae641d9ef905788d995d687c56f Mon Sep 17 00:00:00 2001 From: Masahiro Ikeda -Date: Mon, 25 Jan 2021 16:26:04 +0900 +Date: Tue, 26 Jan 2021 08:18:37 +0900 Subject: [PATCH] Add statistics related to write/sync wal records. This patch adds following statistics to pg_stat_wal view @@ -22,24 +22,24 @@ Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com -(This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID) +(This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID --- - doc/src/sgml/config.sgml | 21 - doc/src/sgml/monitoring.sgml | 48 - - src/backend/access/transam/xlog.c | 51 ++- + doc/src/sgml/config.sgml | 21 +++ + doc/src/sgml/monitoring.sgml | 48 +++- + src/backend/access/transam/xlog.c | 56 ++- src/backend/catalog/system_views.sql | 4
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda wrote: > > Hi, thanks for the reviews. > > I updated the attached patch. Thank you for updating the patch! > The summary of the changes is following. > > 1. fix document > > I followed another view's comments. > > > 2. refactor issue_xlog_fsync() > > I removed "sync_called" variables, narrowed the "duration" scope and > change the switch statement to if statement. Looking at the code again, I think if we check if an fsync was really called when calculating the I/O time, it's better to check that before starting the measurement. bool issue_fsync = false; if (enableFsync && (sync_method == SYNC_METHOD_FSYNC || sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH || sync_method == SYNC_METHOD_FDATASYNC)) { if (track_wal_io_timing) INSTR_TIME_SET_CURRENT(start); issue_fsync = true; } (snip) if (issue_fsync) { if (track_wal_io_timing) { instr_time duration; INSTR_TIME_SET_CURRENT(duration); INSTR_TIME_SUBTRACT(duration, start); WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); } WalStats.m_wal_sync++; } So I prefer either the above which is a modified version of the original approach or my idea that doesn’t introduce a new local variable I proposed before. But I'm not going to insist on that. > > > 3. make wal-receiver report WAL statistics > > I add the code to collect the statistics for a written operation > in XLogWalRcvWrite() and to report stats in WalReceiverMain(). > > Since WalReceiverMain() can loop fast, to avoid loading stats collector, > I add "force" argument to the pgstat_send_wal function. If "force" is > false, it can skip reporting until at least 500 msec since it last > reported. WalReceiverMain() almost calls pgstat_send_wal() with "force" > as false. void -pgstat_send_wal(void) +pgstat_send_wal(bool force) { /* We assume this initializes to zeroes */ static const PgStat_MsgWal all_zeroes; + static TimestampTz last_report = 0; + TimestampTz now; WalUsagewalusage; + /* +* Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL +* msec since we last sent one or specified "force". +*/ + now = GetCurrentTimestamp(); + if (!force && + !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL)) + return; + + last_report = now; Hmm, I don’t think it's good to use PGSTAT_STAT_INTERVAL for this purpose since it is used as a minimum time for stats file updates. If we want an interval, I think we should define another one Also, with the patch, pgstat_send_wal() calls GetCurrentTimestamp() every time even if track_wal_io_timing is off, which is not good. On the other hand, I agree that your concern that the wal receiver should not send the stats for whenever receiving wal records. So an idea could be to send the wal stats when finishing the current WAL segment file and when timeout in the main loop. That way we can guarantee that the wal stats on a replica is updated at least every time finishing a WAL segment file when actively receiving WAL records and every NAPTIME_PER_CYCLE in other cases. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: About to add WAL write/fsync statistics to pg_stat_wal view
Hi, thanks for the reviews. I updated the attached patch. The summary of the changes is following. 1. fix document I followed another view's comments. 2. refactor issue_xlog_fsync() I removed "sync_called" variables, narrowed the "duration" scope and change the switch statement to if statement. 3. make wal-receiver report WAL statistics I add the code to collect the statistics for a written operation in XLogWalRcvWrite() and to report stats in WalReceiverMain(). Since WalReceiverMain() can loop fast, to avoid loading stats collector, I add "force" argument to the pgstat_send_wal function. If "force" is false, it can skip reporting until at least 500 msec since it last reported. WalReceiverMain() almost calls pgstat_send_wal() with "force" as false. Regards, -- Masahiro Ikeda NTT DATA CORPORATION--- v5-0001-Add-statistics-related-to-write-sync-wal-records.patch 2021-01-23 09:26:01.919248712 +0900 +++ v6-0001-Add-statistics-related-to-write-sync-wal-records.patch 2021-01-25 16:27:50.749429666 +0900 @@ -1,6 +1,6 @@ -From ee1b7d17391b9d9619f709afeacdd118973471d6 Mon Sep 17 00:00:00 2001 +From e9aad92097c5cff5565b67ce1a8ec6d7b4c8a4d9 Mon Sep 17 00:00:00 2001 From: Masahiro Ikeda -Date: Fri, 22 Jan 2021 21:38:31 +0900 +Date: Mon, 25 Jan 2021 16:26:04 +0900 Subject: [PATCH] Add statistics related to write/sync wal records. This patch adds following statistics to pg_stat_wal view @@ -24,20 +24,22 @@ (This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID) --- - doc/src/sgml/config.sgml | 21 +++ - doc/src/sgml/monitoring.sgml | 48 ++- - src/backend/access/transam/xlog.c | 59 ++- + doc/src/sgml/config.sgml | 21 + doc/src/sgml/monitoring.sgml | 48 - + src/backend/access/transam/xlog.c | 51 ++- src/backend/catalog/system_views.sql | 4 ++ - src/backend/postmaster/pgstat.c | 4 ++ - src/backend/postmaster/walwriter.c| 3 + - src/backend/utils/adt/pgstatfuncs.c | 24 +++- - src/backend/utils/misc/guc.c | 9 +++ + src/backend/postmaster/checkpointer.c | 2 +- + src/backend/postmaster/pgstat.c | 26 -- + src/backend/postmaster/walwriter.c| 3 ++ + src/backend/replication/walreceiver.c | 30 +++ + src/backend/utils/adt/pgstatfuncs.c | 24 +++-- + src/backend/utils/misc/guc.c | 9 src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/access/xlog.h | 1 + src/include/catalog/pg_proc.dat | 14 ++--- - src/include/pgstat.h | 8 +++ - src/test/regress/expected/rules.out | 6 +- - 13 files changed, 189 insertions(+), 13 deletions(-) + src/include/pgstat.h | 10 +++- + src/test/regress/expected/rules.out | 6 ++- + 15 files changed, 232 insertions(+), 18 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 82864bbb24..43f3fbcaf8 100644 @@ -72,7 +74,7 @@ track_functions (enum) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml -index f05140dd42..36764dc64f 100644 +index f05140dd42..5a8fc4eb0c 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3485,7 +3485,53 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i @@ -80,7 +82,7 @@ - Number of times WAL data was written to disk because WAL buffers became full -+ Total number of times WAL data was written to disk because WAL buffers became full ++ Total number of WAL data written to disk because WAL buffers became full + + + @@ -89,7 +91,7 @@ + wal_write bigint + + -+ Total number of times WAL data was written to disk ++ Total number of WAL data written to disk + + + @@ -109,7 +111,7 @@ + wal_sync bigint + + -+ Total number of times WAL data was synced to disk ++ Total number of WAL data synced to disk + (if is open_datasync or + open_sync, this value is zero because WAL data is synced + when to write it). @@ -131,7 +133,7 @@ diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c -index 470e113b33..b3890f97a2 100644 +index 470e113b33..1c4860bee7 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -110,6 +110,7 @@ int CommitDelay = 0; /* precommit delay in microseconds */ @@ -142,16 +144,15 @@ #ifdef WAL_DEBUG bool XLOG_DEBUG = false; -@@ -2540,6 +2541,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible) +@@ -2540,6 +2541,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible) Size nbytes; Size nleft; int
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Mon, Jan 25, 2021 at 1:28 PM Masahiro Ikeda wrote: > > On 2021-01-25 13:15, Masahiro Ikeda wrote: > > On 2021-01-25 10:36, Masahiko Sawada wrote: > >> On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda > >> wrote: > >>> > >>> On 2021-01-22 14:50, Masahiko Sawada wrote: > >>> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda > >>> > wrote: > >>> >> > >>> >> Hi, > >>> >> > >>> >> I rebased the patch to the master branch. > >>> > > >>> > Thank you for working on this. I've read the latest patch. Here are > >>> > comments: > >>> > > >>> > --- > >>> > + if (track_wal_io_timing) > >>> > + { > >>> > + INSTR_TIME_SET_CURRENT(duration); > >>> > + INSTR_TIME_SUBTRACT(duration, start); > >>> > + WalStats.m_wal_write_time += > >>> > INSTR_TIME_GET_MILLISEC(duration); > >>> > + } > >>> > > >>> > * I think it should add the time in micro sec. > >>> > After running pgbench with track_wal_io_timing = on for 30 sec, > >>> > pg_stat_wal showed the following on my environment: > >>> > > >>> > postgres(1:61569)=# select * from pg_stat_wal; > >>> > -[ RECORD 1 ]+- > >>> > wal_records | 285947 > >>> > wal_fpi | 53285 > >>> > wal_bytes| 442008213 > >>> > wal_buffers_full | 0 > >>> > wal_write| 25516 > >>> > wal_write_time | 0 > >>> > wal_sync | 25437 > >>> > wal_sync_time| 14490 > >>> > stats_reset | 2021-01-22 10:56:13.29464+09 > >>> > > >>> > Since writes can complete less than a millisecond, wal_write_time > >>> > didn't increase. I think sync_time could also have the same problem. > >>> > >>> Thanks for your comments. I didn't notice that. > >>> I changed the unit from milliseconds to microseconds. > >>> > >>> > --- > >>> > + /* > >>> > +* Measure i/o timing to fsync WAL data. > >>> > +* > >>> > +* The wal receiver skip to collect it to avoid performance > >>> > degradation of standy servers. > >>> > +* If sync_method doesn't have its fsync method, to skip too. > >>> > +*/ > >>> > + if (!AmWalReceiverProcess() && track_wal_io_timing && > >>> > fsyncMethodCalled()) > >>> > + INSTR_TIME_SET_CURRENT(start); > >>> > > >>> > * Why does only the wal receiver skip it even if track_wal_io_timinig > >>> > is true? I think the performance degradation is also true for backend > >>> > processes. If there is another reason for that, I think it's better to > >>> > mention in both the doc and comment. > >>> > * How about checking track_wal_io_timing first? > >>> > * s/standy/standby/ > >>> > >>> I fixed it. > >>> As kuroda-san mentioned too, the skip is no need to be considered. > >> > >> I think you also removed the code to have the wal receiver report the > >> stats. So with the latest patch, the wal receiver tracks those > >> statistics but doesn't report. > >> And maybe XLogWalRcvWrite() also needs to track I/O? > > > > Thanks, I forgot to add them. > > I'll fix it. > > > > > >>> > >>> > --- > >>> > + /* increment the i/o timing and the number of times to fsync WAL > >>> > data */ > >>> > + if (fsyncMethodCalled()) > >>> > + { > >>> > + if (!AmWalReceiverProcess() && track_wal_io_timing) > >>> > + { > >>> > + INSTR_TIME_SET_CURRENT(duration); > >>> > + INSTR_TIME_SUBTRACT(duration, start); > >>> > + WalStats.m_wal_sync_time += > >>> > INSTR_TIME_GET_MILLISEC(duration); > >>> > + } > >>> > + > >>> > + WalStats.m_wal_sync++; > >>> > + } > >>> > > >>> > * I'd avoid always calling fsyncMethodCalled() in this path. How about > >>> > incrementing m_wal_sync after each sync operation? > >>> > >>> I think if syncing the disk does not occur, m_wal_sync should not be > >>> incremented. > >>> It depends enableFsync and sync_method. > >>> > >>> enableFsync is checked in each fsync method like > >>> pg_fsync_no_writethrough(), > >>> so if incrementing m_wal_sync after each sync operation, it should be > >>> implemented > >>> in each fsync method. It leads to many duplicated codes. > >> > >> Right. I missed that each fsync function checks enableFsync. > >> > >>> So, why don't you change the function to a flag whether to > >>> sync data to the disk will be occurred or not in issue_xlog_fsync()? > >> > >> Looks better. Since we don't necessarily need to increment m_wal_sync > >> after doing fsync we can write the code without an additional variable > >> as follows: > >> > >> if (enableFsync) > >> { > >> switch (sync_method) > >> { > >> case SYNC_METHOD_FSYNC: > >> #ifdef HAVE_FSYNC_WRITETHROUGH > >> case SYNC_METHOD_FSYNC_WRITETHROUGH: > >> #endif > >> #ifdef HAVE_FDATASYNC > >> case SYNC_METHOD_FDATASYNC: > >> #endif > >> WalStats.m_wal_sync++; > >> if (track_wal_io_timing) > >> INSTR_TIME_SET_CURRENT(start); > >> break; > >> default: > >>
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-01-25 13:15, Masahiro Ikeda wrote: On 2021-01-25 10:36, Masahiko Sawada wrote: On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda wrote: On 2021-01-22 14:50, Masahiko Sawada wrote: > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda > wrote: >> >> Hi, >> >> I rebased the patch to the master branch. > > Thank you for working on this. I've read the latest patch. Here are > comments: > > --- > + if (track_wal_io_timing) > + { > + INSTR_TIME_SET_CURRENT(duration); > + INSTR_TIME_SUBTRACT(duration, start); > + WalStats.m_wal_write_time += > INSTR_TIME_GET_MILLISEC(duration); > + } > > * I think it should add the time in micro sec. > After running pgbench with track_wal_io_timing = on for 30 sec, > pg_stat_wal showed the following on my environment: > > postgres(1:61569)=# select * from pg_stat_wal; > -[ RECORD 1 ]+- > wal_records | 285947 > wal_fpi | 53285 > wal_bytes| 442008213 > wal_buffers_full | 0 > wal_write| 25516 > wal_write_time | 0 > wal_sync | 25437 > wal_sync_time| 14490 > stats_reset | 2021-01-22 10:56:13.29464+09 > > Since writes can complete less than a millisecond, wal_write_time > didn't increase. I think sync_time could also have the same problem. Thanks for your comments. I didn't notice that. I changed the unit from milliseconds to microseconds. > --- > + /* > +* Measure i/o timing to fsync WAL data. > +* > +* The wal receiver skip to collect it to avoid performance > degradation of standy servers. > +* If sync_method doesn't have its fsync method, to skip too. > +*/ > + if (!AmWalReceiverProcess() && track_wal_io_timing && > fsyncMethodCalled()) > + INSTR_TIME_SET_CURRENT(start); > > * Why does only the wal receiver skip it even if track_wal_io_timinig > is true? I think the performance degradation is also true for backend > processes. If there is another reason for that, I think it's better to > mention in both the doc and comment. > * How about checking track_wal_io_timing first? > * s/standy/standby/ I fixed it. As kuroda-san mentioned too, the skip is no need to be considered. I think you also removed the code to have the wal receiver report the stats. So with the latest patch, the wal receiver tracks those statistics but doesn't report. And maybe XLogWalRcvWrite() also needs to track I/O? Thanks, I forgot to add them. I'll fix it. > --- > + /* increment the i/o timing and the number of times to fsync WAL > data */ > + if (fsyncMethodCalled()) > + { > + if (!AmWalReceiverProcess() && track_wal_io_timing) > + { > + INSTR_TIME_SET_CURRENT(duration); > + INSTR_TIME_SUBTRACT(duration, start); > + WalStats.m_wal_sync_time += > INSTR_TIME_GET_MILLISEC(duration); > + } > + > + WalStats.m_wal_sync++; > + } > > * I'd avoid always calling fsyncMethodCalled() in this path. How about > incrementing m_wal_sync after each sync operation? I think if syncing the disk does not occur, m_wal_sync should not be incremented. It depends enableFsync and sync_method. enableFsync is checked in each fsync method like pg_fsync_no_writethrough(), so if incrementing m_wal_sync after each sync operation, it should be implemented in each fsync method. It leads to many duplicated codes. Right. I missed that each fsync function checks enableFsync. So, why don't you change the function to a flag whether to sync data to the disk will be occurred or not in issue_xlog_fsync()? Looks better. Since we don't necessarily need to increment m_wal_sync after doing fsync we can write the code without an additional variable as follows: if (enableFsync) { switch (sync_method) { case SYNC_METHOD_FSYNC: #ifdef HAVE_FSYNC_WRITETHROUGH case SYNC_METHOD_FSYNC_WRITETHROUGH: #endif #ifdef HAVE_FDATASYNC case SYNC_METHOD_FDATASYNC: #endif WalStats.m_wal_sync++; if (track_wal_io_timing) INSTR_TIME_SET_CURRENT(start); break; default: break; } } (do fsync and error handling here) /* increment the i/o timing and the number of times to fsync WAL data */ if (track_wal_io_timing) { INSTR_TIME_SET_CURRENT(duration); INSTR_TIME_SUBTRACT(duration, start); WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); } IIUC, I think we can't handle the following case. When "sync_method" is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC and "track_wal_io_timing" is enabled, "start" doesn't be initialized. My understanding is something wrong, isn't it? I thought the following is better. ``` /* Measure i/o timing to sync WAL data.*/ if (track_wal_io_timing) INSTR_TIME_SET_CURRENT(start); (do fsync and error handling
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-01-25 11:47, japin wrote: On Mon, 25 Jan 2021 at 09:36, Masahiko Sawada wrote: On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda wrote: On 2021-01-22 14:50, Masahiko Sawada wrote: > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda > wrote: >> >> Hi, >> >> I rebased the patch to the master branch. > > Thank you for working on this. I've read the latest patch. Here are > comments: > > --- > + if (track_wal_io_timing) > + { > + INSTR_TIME_SET_CURRENT(duration); > + INSTR_TIME_SUBTRACT(duration, start); > + WalStats.m_wal_write_time += > INSTR_TIME_GET_MILLISEC(duration); > + } > > * I think it should add the time in micro sec. > After running pgbench with track_wal_io_timing = on for 30 sec, > pg_stat_wal showed the following on my environment: > > postgres(1:61569)=# select * from pg_stat_wal; > -[ RECORD 1 ]+- > wal_records | 285947 > wal_fpi | 53285 > wal_bytes| 442008213 > wal_buffers_full | 0 > wal_write| 25516 > wal_write_time | 0 > wal_sync | 25437 > wal_sync_time| 14490 > stats_reset | 2021-01-22 10:56:13.29464+09 > > Since writes can complete less than a millisecond, wal_write_time > didn't increase. I think sync_time could also have the same problem. Thanks for your comments. I didn't notice that. I changed the unit from milliseconds to microseconds. > --- > + /* > +* Measure i/o timing to fsync WAL data. > +* > +* The wal receiver skip to collect it to avoid performance > degradation of standy servers. > +* If sync_method doesn't have its fsync method, to skip too. > +*/ > + if (!AmWalReceiverProcess() && track_wal_io_timing && > fsyncMethodCalled()) > + INSTR_TIME_SET_CURRENT(start); > > * Why does only the wal receiver skip it even if track_wal_io_timinig > is true? I think the performance degradation is also true for backend > processes. If there is another reason for that, I think it's better to > mention in both the doc and comment. > * How about checking track_wal_io_timing first? > * s/standy/standby/ I fixed it. As kuroda-san mentioned too, the skip is no need to be considered. I think you also removed the code to have the wal receiver report the stats. So with the latest patch, the wal receiver tracks those statistics but doesn't report. And maybe XLogWalRcvWrite() also needs to track I/O? > --- > + /* increment the i/o timing and the number of times to fsync WAL > data */ > + if (fsyncMethodCalled()) > + { > + if (!AmWalReceiverProcess() && track_wal_io_timing) > + { > + INSTR_TIME_SET_CURRENT(duration); > + INSTR_TIME_SUBTRACT(duration, start); > + WalStats.m_wal_sync_time += > INSTR_TIME_GET_MILLISEC(duration); > + } > + > + WalStats.m_wal_sync++; > + } > > * I'd avoid always calling fsyncMethodCalled() in this path. How about > incrementing m_wal_sync after each sync operation? I think if syncing the disk does not occur, m_wal_sync should not be incremented. It depends enableFsync and sync_method. enableFsync is checked in each fsync method like pg_fsync_no_writethrough(), so if incrementing m_wal_sync after each sync operation, it should be implemented in each fsync method. It leads to many duplicated codes. Right. I missed that each fsync function checks enableFsync. So, why don't you change the function to a flag whether to sync data to the disk will be occurred or not in issue_xlog_fsync()? Looks better. Since we don't necessarily need to increment m_wal_sync after doing fsync we can write the code without an additional variable as follows: if (enableFsync) { switch (sync_method) { case SYNC_METHOD_FSYNC: #ifdef HAVE_FSYNC_WRITETHROUGH case SYNC_METHOD_FSYNC_WRITETHROUGH: #endif #ifdef HAVE_FDATASYNC case SYNC_METHOD_FDATASYNC: #endif WalStats.m_wal_sync++; if (track_wal_io_timing) INSTR_TIME_SET_CURRENT(start); break; default: break; } } (do fsync and error handling here) /* increment the i/o timing and the number of times to fsync WAL data */ if (track_wal_io_timing) { INSTR_TIME_SET_CURRENT(duration); INSTR_TIME_SUBTRACT(duration, start); WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); } I think we can change the first switch-case to an if statement. +1. We can also narrow the scope of "duration" into "if (track_wal_io_timing)" branch. Thanks, I'll change it. -- Masahiro Ikeda NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-01-25 10:36, Masahiko Sawada wrote: On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda wrote: On 2021-01-22 14:50, Masahiko Sawada wrote: > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda > wrote: >> >> Hi, >> >> I rebased the patch to the master branch. > > Thank you for working on this. I've read the latest patch. Here are > comments: > > --- > + if (track_wal_io_timing) > + { > + INSTR_TIME_SET_CURRENT(duration); > + INSTR_TIME_SUBTRACT(duration, start); > + WalStats.m_wal_write_time += > INSTR_TIME_GET_MILLISEC(duration); > + } > > * I think it should add the time in micro sec. > After running pgbench with track_wal_io_timing = on for 30 sec, > pg_stat_wal showed the following on my environment: > > postgres(1:61569)=# select * from pg_stat_wal; > -[ RECORD 1 ]+- > wal_records | 285947 > wal_fpi | 53285 > wal_bytes| 442008213 > wal_buffers_full | 0 > wal_write| 25516 > wal_write_time | 0 > wal_sync | 25437 > wal_sync_time| 14490 > stats_reset | 2021-01-22 10:56:13.29464+09 > > Since writes can complete less than a millisecond, wal_write_time > didn't increase. I think sync_time could also have the same problem. Thanks for your comments. I didn't notice that. I changed the unit from milliseconds to microseconds. > --- > + /* > +* Measure i/o timing to fsync WAL data. > +* > +* The wal receiver skip to collect it to avoid performance > degradation of standy servers. > +* If sync_method doesn't have its fsync method, to skip too. > +*/ > + if (!AmWalReceiverProcess() && track_wal_io_timing && > fsyncMethodCalled()) > + INSTR_TIME_SET_CURRENT(start); > > * Why does only the wal receiver skip it even if track_wal_io_timinig > is true? I think the performance degradation is also true for backend > processes. If there is another reason for that, I think it's better to > mention in both the doc and comment. > * How about checking track_wal_io_timing first? > * s/standy/standby/ I fixed it. As kuroda-san mentioned too, the skip is no need to be considered. I think you also removed the code to have the wal receiver report the stats. So with the latest patch, the wal receiver tracks those statistics but doesn't report. And maybe XLogWalRcvWrite() also needs to track I/O? Thanks, I forgot to add them. I'll fix it. > --- > + /* increment the i/o timing and the number of times to fsync WAL > data */ > + if (fsyncMethodCalled()) > + { > + if (!AmWalReceiverProcess() && track_wal_io_timing) > + { > + INSTR_TIME_SET_CURRENT(duration); > + INSTR_TIME_SUBTRACT(duration, start); > + WalStats.m_wal_sync_time += > INSTR_TIME_GET_MILLISEC(duration); > + } > + > + WalStats.m_wal_sync++; > + } > > * I'd avoid always calling fsyncMethodCalled() in this path. How about > incrementing m_wal_sync after each sync operation? I think if syncing the disk does not occur, m_wal_sync should not be incremented. It depends enableFsync and sync_method. enableFsync is checked in each fsync method like pg_fsync_no_writethrough(), so if incrementing m_wal_sync after each sync operation, it should be implemented in each fsync method. It leads to many duplicated codes. Right. I missed that each fsync function checks enableFsync. So, why don't you change the function to a flag whether to sync data to the disk will be occurred or not in issue_xlog_fsync()? Looks better. Since we don't necessarily need to increment m_wal_sync after doing fsync we can write the code without an additional variable as follows: if (enableFsync) { switch (sync_method) { case SYNC_METHOD_FSYNC: #ifdef HAVE_FSYNC_WRITETHROUGH case SYNC_METHOD_FSYNC_WRITETHROUGH: #endif #ifdef HAVE_FDATASYNC case SYNC_METHOD_FDATASYNC: #endif WalStats.m_wal_sync++; if (track_wal_io_timing) INSTR_TIME_SET_CURRENT(start); break; default: break; } } (do fsync and error handling here) /* increment the i/o timing and the number of times to fsync WAL data */ if (track_wal_io_timing) { INSTR_TIME_SET_CURRENT(duration); INSTR_TIME_SUBTRACT(duration, start); WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); } IIUC, I think we can't handle the following case. When "sync_method" is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC and "track_wal_io_timing" is enabled, "start" doesn't be initialized. My understanding is something wrong, isn't it? I think we can change the first switch-case to an if statement. Yes, I'll change it. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
RE: About to add WAL write/fsync statistics to pg_stat_wal view
Dear Ikeda-san, > I checked columns' descriptions of other views. > There are "Number of xxx", "Total number of xxx", "Total amount of time > that xxx" and "Total time spent xxx". Right. > Since the "time" is used for showing spending time, not count, > I'll change it to "Total number of WAL data written/synced to disk". > Thought? I misread your patch, sorry. I prefer your suggestion. Please fix like that way with others. Best Regards, Hayato Kuroda FUJITSU LIMITED
RE: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-01-25 10:34, kuroda.hay...@fujitsu.com wrote: Dear Ikeda-san, Thank you for updating the patch. This can be applied to master, and can be used on my RHEL7. wal_write_time and wal_sync_time increase normally :-). ``` postgres=# select * from pg_stat_wal; -[ RECORD 1 ]+-- wal_records | 121781 wal_fpi | 2287 wal_bytes| 36055146 wal_buffers_full | 799 wal_write| 12770 wal_write_time | 4.469 wal_sync | 11962 wal_sync_time| 132.352 stats_reset | 2021-01-25 00:51:40.674412+00 ``` Thanks for checking. I put a further comment: ``` @@ -3485,7 +3485,53 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i wal_buffers_full bigint - Number of times WAL data was written to disk because WAL buffers became full + Total number of times WAL data was written to disk because WAL buffers became full + + + + + role="column_definition"> + wal_write bigint + + + Total number of times WAL data was written to disk + + + + + role="column_definition"> + wal_write_time double precision + + + Total amount of time that has been spent in the portion of + WAL data was written to disk, in milliseconds + (if is enabled, otherwise zero). + + + + + role="column_definition"> + wal_sync bigint + + + Total number of times WAL data was synced to disk + (if is open_datasync or + open_sync, this value is zero because WAL data is synced + when to write it). + + + + + role="column_definition"> + wal_sync_time double precision + + + Total amount of time that has been spent in the portion of + WAL data was synced to disk, in milliseconds + (if is enabled, otherwise zero. + if is open_datasync or + open_sync, this value is zero too because WAL data is synced + when to write it). ``` Maybe "Total amount of time" should be used, not "Total number of time." Other views use "amount." Thanks. I checked columns' descriptions of other views. There are "Number of xxx", "Total number of xxx", "Total amount of time that xxx" and "Total time spent xxx". Since the "time" is used for showing spending time, not count, I'll change it to "Total number of WAL data written/synced to disk". Thought? Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Mon, 25 Jan 2021 at 09:36, Masahiko Sawada wrote: > On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda > wrote: >> >> On 2021-01-22 14:50, Masahiko Sawada wrote: >> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda >> > wrote: >> >> >> >> Hi, >> >> >> >> I rebased the patch to the master branch. >> > >> > Thank you for working on this. I've read the latest patch. Here are >> > comments: >> > >> > --- >> > + if (track_wal_io_timing) >> > + { >> > + INSTR_TIME_SET_CURRENT(duration); >> > + INSTR_TIME_SUBTRACT(duration, start); >> > + WalStats.m_wal_write_time += >> > INSTR_TIME_GET_MILLISEC(duration); >> > + } >> > >> > * I think it should add the time in micro sec. >> > After running pgbench with track_wal_io_timing = on for 30 sec, >> > pg_stat_wal showed the following on my environment: >> > >> > postgres(1:61569)=# select * from pg_stat_wal; >> > -[ RECORD 1 ]+- >> > wal_records | 285947 >> > wal_fpi | 53285 >> > wal_bytes| 442008213 >> > wal_buffers_full | 0 >> > wal_write| 25516 >> > wal_write_time | 0 >> > wal_sync | 25437 >> > wal_sync_time| 14490 >> > stats_reset | 2021-01-22 10:56:13.29464+09 >> > >> > Since writes can complete less than a millisecond, wal_write_time >> > didn't increase. I think sync_time could also have the same problem. >> >> Thanks for your comments. I didn't notice that. >> I changed the unit from milliseconds to microseconds. >> >> > --- >> > + /* >> > +* Measure i/o timing to fsync WAL data. >> > +* >> > +* The wal receiver skip to collect it to avoid performance >> > degradation of standy servers. >> > +* If sync_method doesn't have its fsync method, to skip too. >> > +*/ >> > + if (!AmWalReceiverProcess() && track_wal_io_timing && >> > fsyncMethodCalled()) >> > + INSTR_TIME_SET_CURRENT(start); >> > >> > * Why does only the wal receiver skip it even if track_wal_io_timinig >> > is true? I think the performance degradation is also true for backend >> > processes. If there is another reason for that, I think it's better to >> > mention in both the doc and comment. >> > * How about checking track_wal_io_timing first? >> > * s/standy/standby/ >> >> I fixed it. >> As kuroda-san mentioned too, the skip is no need to be considered. > > I think you also removed the code to have the wal receiver report the > stats. So with the latest patch, the wal receiver tracks those > statistics but doesn't report. > > And maybe XLogWalRcvWrite() also needs to track I/O? > >> >> > --- >> > + /* increment the i/o timing and the number of times to fsync WAL >> > data */ >> > + if (fsyncMethodCalled()) >> > + { >> > + if (!AmWalReceiverProcess() && track_wal_io_timing) >> > + { >> > + INSTR_TIME_SET_CURRENT(duration); >> > + INSTR_TIME_SUBTRACT(duration, start); >> > + WalStats.m_wal_sync_time += >> > INSTR_TIME_GET_MILLISEC(duration); >> > + } >> > + >> > + WalStats.m_wal_sync++; >> > + } >> > >> > * I'd avoid always calling fsyncMethodCalled() in this path. How about >> > incrementing m_wal_sync after each sync operation? >> >> I think if syncing the disk does not occur, m_wal_sync should not be >> incremented. >> It depends enableFsync and sync_method. >> >> enableFsync is checked in each fsync method like >> pg_fsync_no_writethrough(), >> so if incrementing m_wal_sync after each sync operation, it should be >> implemented >> in each fsync method. It leads to many duplicated codes. > > Right. I missed that each fsync function checks enableFsync. > >> So, why don't you change the function to a flag whether to >> sync data to the disk will be occurred or not in issue_xlog_fsync()? > > Looks better. Since we don't necessarily need to increment m_wal_sync > after doing fsync we can write the code without an additional variable > as follows: > > if (enableFsync) > { > switch (sync_method) > { > case SYNC_METHOD_FSYNC: > #ifdef HAVE_FSYNC_WRITETHROUGH > case SYNC_METHOD_FSYNC_WRITETHROUGH: > #endif > #ifdef HAVE_FDATASYNC > case SYNC_METHOD_FDATASYNC: > #endif > WalStats.m_wal_sync++; > if (track_wal_io_timing) > INSTR_TIME_SET_CURRENT(start); > break; > default: > break; > } > } > > (do fsync and error handling here) > >/* increment the i/o timing and the number of times to fsync WAL data */ >if (track_wal_io_timing) >{ >INSTR_TIME_SET_CURRENT(duration); >INSTR_TIME_SUBTRACT(duration, start); >WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); >} > > I think we can change the first switch-case to an if statement. > +1. We can also narrow the scope of "duration" into "if (track_wal_io_timing)" branch. --
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda wrote: > > On 2021-01-22 14:50, Masahiko Sawada wrote: > > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda > > wrote: > >> > >> Hi, > >> > >> I rebased the patch to the master branch. > > > > Thank you for working on this. I've read the latest patch. Here are > > comments: > > > > --- > > + if (track_wal_io_timing) > > + { > > + INSTR_TIME_SET_CURRENT(duration); > > + INSTR_TIME_SUBTRACT(duration, start); > > + WalStats.m_wal_write_time += > > INSTR_TIME_GET_MILLISEC(duration); > > + } > > > > * I think it should add the time in micro sec. > > After running pgbench with track_wal_io_timing = on for 30 sec, > > pg_stat_wal showed the following on my environment: > > > > postgres(1:61569)=# select * from pg_stat_wal; > > -[ RECORD 1 ]+- > > wal_records | 285947 > > wal_fpi | 53285 > > wal_bytes| 442008213 > > wal_buffers_full | 0 > > wal_write| 25516 > > wal_write_time | 0 > > wal_sync | 25437 > > wal_sync_time| 14490 > > stats_reset | 2021-01-22 10:56:13.29464+09 > > > > Since writes can complete less than a millisecond, wal_write_time > > didn't increase. I think sync_time could also have the same problem. > > Thanks for your comments. I didn't notice that. > I changed the unit from milliseconds to microseconds. > > > --- > > + /* > > +* Measure i/o timing to fsync WAL data. > > +* > > +* The wal receiver skip to collect it to avoid performance > > degradation of standy servers. > > +* If sync_method doesn't have its fsync method, to skip too. > > +*/ > > + if (!AmWalReceiverProcess() && track_wal_io_timing && > > fsyncMethodCalled()) > > + INSTR_TIME_SET_CURRENT(start); > > > > * Why does only the wal receiver skip it even if track_wal_io_timinig > > is true? I think the performance degradation is also true for backend > > processes. If there is another reason for that, I think it's better to > > mention in both the doc and comment. > > * How about checking track_wal_io_timing first? > > * s/standy/standby/ > > I fixed it. > As kuroda-san mentioned too, the skip is no need to be considered. I think you also removed the code to have the wal receiver report the stats. So with the latest patch, the wal receiver tracks those statistics but doesn't report. And maybe XLogWalRcvWrite() also needs to track I/O? > > > --- > > + /* increment the i/o timing and the number of times to fsync WAL > > data */ > > + if (fsyncMethodCalled()) > > + { > > + if (!AmWalReceiverProcess() && track_wal_io_timing) > > + { > > + INSTR_TIME_SET_CURRENT(duration); > > + INSTR_TIME_SUBTRACT(duration, start); > > + WalStats.m_wal_sync_time += > > INSTR_TIME_GET_MILLISEC(duration); > > + } > > + > > + WalStats.m_wal_sync++; > > + } > > > > * I'd avoid always calling fsyncMethodCalled() in this path. How about > > incrementing m_wal_sync after each sync operation? > > I think if syncing the disk does not occur, m_wal_sync should not be > incremented. > It depends enableFsync and sync_method. > > enableFsync is checked in each fsync method like > pg_fsync_no_writethrough(), > so if incrementing m_wal_sync after each sync operation, it should be > implemented > in each fsync method. It leads to many duplicated codes. Right. I missed that each fsync function checks enableFsync. > So, why don't you change the function to a flag whether to > sync data to the disk will be occurred or not in issue_xlog_fsync()? Looks better. Since we don't necessarily need to increment m_wal_sync after doing fsync we can write the code without an additional variable as follows: if (enableFsync) { switch (sync_method) { case SYNC_METHOD_FSYNC: #ifdef HAVE_FSYNC_WRITETHROUGH case SYNC_METHOD_FSYNC_WRITETHROUGH: #endif #ifdef HAVE_FDATASYNC case SYNC_METHOD_FDATASYNC: #endif WalStats.m_wal_sync++; if (track_wal_io_timing) INSTR_TIME_SET_CURRENT(start); break; default: break; } } (do fsync and error handling here) /* increment the i/o timing and the number of times to fsync WAL data */ if (track_wal_io_timing) { INSTR_TIME_SET_CURRENT(duration); INSTR_TIME_SUBTRACT(duration, start); WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); } I think we can change the first switch-case to an if statement. > > > > * As far as I read the code, issue_xlog_fsync() seems to do fsync even > > if enableFsync is false. Why does the function return false in that > > case? I might be missing something. > > IIUC, the reason is that I thought that each fsync functions like > pg_fsync_no_writethrough() check enableFsync. > > If this code doesn't
RE: About to add WAL write/fsync statistics to pg_stat_wal view
Dear Ikeda-san, Thank you for updating the patch. This can be applied to master, and can be used on my RHEL7. wal_write_time and wal_sync_time increase normally :-). ``` postgres=# select * from pg_stat_wal; -[ RECORD 1 ]+-- wal_records | 121781 wal_fpi | 2287 wal_bytes| 36055146 wal_buffers_full | 799 wal_write| 12770 wal_write_time | 4.469 wal_sync | 11962 wal_sync_time| 132.352 stats_reset | 2021-01-25 00:51:40.674412+00 ``` I put a further comment: ``` @@ -3485,7 +3485,53 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i wal_buffers_full bigint - Number of times WAL data was written to disk because WAL buffers became full + Total number of times WAL data was written to disk because WAL buffers became full + + + + + + wal_write bigint + + + Total number of times WAL data was written to disk + + + + + + wal_write_time double precision + + + Total amount of time that has been spent in the portion of + WAL data was written to disk, in milliseconds + (if is enabled, otherwise zero). + + + + + + wal_sync bigint + + + Total number of times WAL data was synced to disk + (if is open_datasync or + open_sync, this value is zero because WAL data is synced + when to write it). + + + + + + wal_sync_time double precision + + + Total amount of time that has been spent in the portion of + WAL data was synced to disk, in milliseconds + (if is enabled, otherwise zero. + if is open_datasync or + open_sync, this value is zero too because WAL data is synced + when to write it). ``` Maybe "Total amount of time" should be used, not "Total number of time." Other views use "amount." I have no comments anymore. Hayato Kuroda FUJITSU LIMITED
Re: About to add WAL write/fsync statistics to pg_stat_wal view
Hi, Japin Thanks for your comments. On 2021-01-23 01:46, japin wrote: Hi, Masahiro Thanks for you update the v4 patch. Here are some comments: (1) + char*msg = NULL; + boolsync_called;/* whether to sync data to the disk. */ + instr_time start; + instr_time duration; + + /* check whether to sync data to the disk is really occurred. */ + sync_called = false; Maybe we can initialize the "sync_called" variable when declare it. Yes, I fixed it. (2) + if (sync_called) + { + /* increment the i/o timing and the number of times to fsync WAL data */ + if (track_wal_io_timing) + { + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); + } + + WalStats.m_wal_sync++; + } There is an extra space before INSTR_TIME_GET_MICROSEC(duration). Yes, I removed it. In the issue_xlog_fsync(), the comment says that if sync_method is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC, it already write synced. Does that mean it synced when write the WAL data? And for those cases, we cannot get accurate write/sync timing and number of write/sync times, right? case SYNC_METHOD_OPEN: case SYNC_METHOD_OPEN_DSYNC: /* write synced it already */ break; Yes, I add the following comments in the document. @@ -3515,6 +3515,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Total number of times WAL data was synced to disk + (if is open_datasync or + open_sync, this value is zero because WAL data is synced + when to write it). @@ -3525,7 +3528,10 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Total amount of time that has been spent in the portion of WAL data was synced to disk, in milliseconds - (if is enabled, otherwise zero) + (if is enabled, otherwise zero. + if is open_datasync or + open_sync, this value is zero too because WAL data is synced + when to write it). I attached a modified patch. Regards, -- Masahiro Ikeda NTT DATA CORPORATIONFrom ee1b7d17391b9d9619f709afeacdd118973471d6 Mon Sep 17 00:00:00 2001 From: Masahiro Ikeda Date: Fri, 22 Jan 2021 21:38:31 +0900 Subject: [PATCH] Add statistics related to write/sync wal records. This patch adds following statistics to pg_stat_wal view to track WAL I/O activity. - the total number of writing/syncing WAL data. - the total amount of time that has been spent in writing/syncing WAL data. Since to track I/O timing may leads significant overhead, GUC parameter "track_wal_io_timing" is introduced. Only if this is on, the I/O timing is measured. The statistics related to sync are zero when "wal_sync_method" is "open_datasync" or "open_sync", because it leads syncing WAL data at same time when to write it. Author: Masahiro Ikeda Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com (This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID) --- doc/src/sgml/config.sgml | 21 +++ doc/src/sgml/monitoring.sgml | 48 ++- src/backend/access/transam/xlog.c | 59 ++- src/backend/catalog/system_views.sql | 4 ++ src/backend/postmaster/pgstat.c | 4 ++ src/backend/postmaster/walwriter.c| 3 + src/backend/utils/adt/pgstatfuncs.c | 24 +++- src/backend/utils/misc/guc.c | 9 +++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/access/xlog.h | 1 + src/include/catalog/pg_proc.dat | 14 ++--- src/include/pgstat.h | 8 +++ src/test/regress/expected/rules.out | 6 +- 13 files changed, 189 insertions(+), 13 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 82864bbb24..43f3fbcaf8 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7416,6 +7416,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; + + track_wal_io_timing (boolean) + + track_wal_io_timing configuration parameter + + + + +Enables timing of WAL I/O calls. This parameter is off by default, +because it will repeatedly query the operating system for +the current time, which may cause significant overhead on some +platforms. You can use the tool to +measure the overhead of timing on your system. +I/O timing information is +displayed in
Re: About to add WAL write/fsync statistics to pg_stat_wal view
Hi, Masahiro Thanks for you update the v4 patch. Here are some comments: (1) + char*msg = NULL; + boolsync_called;/* whether to sync data to the disk. */ + instr_time start; + instr_time duration; + + /* check whether to sync data to the disk is really occurred. */ + sync_called = false; Maybe we can initialize the "sync_called" variable when declare it. (2) + if (sync_called) + { + /* increment the i/o timing and the number of times to fsync WAL data */ + if (track_wal_io_timing) + { + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); + } + + WalStats.m_wal_sync++; + } There is an extra space before INSTR_TIME_GET_MICROSEC(duration). In the issue_xlog_fsync(), the comment says that if sync_method is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC, it already write synced. Does that mean it synced when write the WAL data? And for those cases, we cannot get accurate write/sync timing and number of write/sync times, right? case SYNC_METHOD_OPEN: case SYNC_METHOD_OPEN_DSYNC: /* write synced it already */ break; On Fri, 22 Jan 2021 at 21:05, Masahiro Ikeda wrote: > On 2021-01-22 14:50, Masahiko Sawada wrote: >> On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda >> wrote: >>> >>> Hi, >>> >>> I rebased the patch to the master branch. >> >> Thank you for working on this. I've read the latest patch. Here are >> comments: >> >> --- >> + if (track_wal_io_timing) >> + { >> + INSTR_TIME_SET_CURRENT(duration); >> + INSTR_TIME_SUBTRACT(duration, start); >> + WalStats.m_wal_write_time += >> INSTR_TIME_GET_MILLISEC(duration); >> + } >> >> * I think it should add the time in micro sec. >> After running pgbench with track_wal_io_timing = on for 30 sec, >> pg_stat_wal showed the following on my environment: >> >> postgres(1:61569)=# select * from pg_stat_wal; >> -[ RECORD 1 ]+- >> wal_records | 285947 >> wal_fpi | 53285 >> wal_bytes| 442008213 >> wal_buffers_full | 0 >> wal_write| 25516 >> wal_write_time | 0 >> wal_sync | 25437 >> wal_sync_time| 14490 >> stats_reset | 2021-01-22 10:56:13.29464+09 >> >> Since writes can complete less than a millisecond, wal_write_time >> didn't increase. I think sync_time could also have the same problem. > > Thanks for your comments. I didn't notice that. > I changed the unit from milliseconds to microseconds. > >> --- >> + /* >> +* Measure i/o timing to fsync WAL data. >> +* >> +* The wal receiver skip to collect it to avoid performance >> degradation of standy servers. >> +* If sync_method doesn't have its fsync method, to skip too. >> +*/ >> + if (!AmWalReceiverProcess() && track_wal_io_timing && >> fsyncMethodCalled()) >> + INSTR_TIME_SET_CURRENT(start); >> >> * Why does only the wal receiver skip it even if track_wal_io_timinig >> is true? I think the performance degradation is also true for backend >> processes. If there is another reason for that, I think it's better to >> mention in both the doc and comment. >> * How about checking track_wal_io_timing first? >> * s/standy/standby/ > > I fixed it. > As kuroda-san mentioned too, the skip is no need to be considered. > >> --- >> + /* increment the i/o timing and the number of times to fsync WAL >> data */ >> + if (fsyncMethodCalled()) >> + { >> + if (!AmWalReceiverProcess() && track_wal_io_timing) >> + { >> + INSTR_TIME_SET_CURRENT(duration); >> + INSTR_TIME_SUBTRACT(duration, start); >> + WalStats.m_wal_sync_time += >> INSTR_TIME_GET_MILLISEC(duration); >> + } >> + >> + WalStats.m_wal_sync++; >> + } >> >> * I'd avoid always calling fsyncMethodCalled() in this path. How about >> incrementing m_wal_sync after each sync operation? > > I think if syncing the disk does not occur, m_wal_sync should not be > incremented. > It depends enableFsync and sync_method. > > enableFsync is checked in each fsync method like > pg_fsync_no_writethrough(), > so if incrementing m_wal_sync after each sync operation, it should be > implemented > in each fsync method. It leads to many duplicated codes. > > So, why don't you change the function to a flag whether to > sync data to the disk will be occurred or not in issue_xlog_fsync()? > > >> --- >> +/* >> + * Check if fsync mothod is called. >> + */ >> +static bool >> +fsyncMethodCalled() >> +{ >> + if (!enableFsync) >> + return false; >> + >> + switch (sync_method) >> + { >> + case SYNC_METHOD_FSYNC: >> + case
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-01-22 14:50, Masahiko Sawada wrote: On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda wrote: Hi, I rebased the patch to the master branch. Thank you for working on this. I've read the latest patch. Here are comments: --- + if (track_wal_io_timing) + { + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_write_time += INSTR_TIME_GET_MILLISEC(duration); + } * I think it should add the time in micro sec. After running pgbench with track_wal_io_timing = on for 30 sec, pg_stat_wal showed the following on my environment: postgres(1:61569)=# select * from pg_stat_wal; -[ RECORD 1 ]+- wal_records | 285947 wal_fpi | 53285 wal_bytes| 442008213 wal_buffers_full | 0 wal_write| 25516 wal_write_time | 0 wal_sync | 25437 wal_sync_time| 14490 stats_reset | 2021-01-22 10:56:13.29464+09 Since writes can complete less than a millisecond, wal_write_time didn't increase. I think sync_time could also have the same problem. Thanks for your comments. I didn't notice that. I changed the unit from milliseconds to microseconds. --- + /* +* Measure i/o timing to fsync WAL data. +* +* The wal receiver skip to collect it to avoid performance degradation of standy servers. +* If sync_method doesn't have its fsync method, to skip too. +*/ + if (!AmWalReceiverProcess() && track_wal_io_timing && fsyncMethodCalled()) + INSTR_TIME_SET_CURRENT(start); * Why does only the wal receiver skip it even if track_wal_io_timinig is true? I think the performance degradation is also true for backend processes. If there is another reason for that, I think it's better to mention in both the doc and comment. * How about checking track_wal_io_timing first? * s/standy/standby/ I fixed it. As kuroda-san mentioned too, the skip is no need to be considered. --- + /* increment the i/o timing and the number of times to fsync WAL data */ + if (fsyncMethodCalled()) + { + if (!AmWalReceiverProcess() && track_wal_io_timing) + { + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_sync_time += INSTR_TIME_GET_MILLISEC(duration); + } + + WalStats.m_wal_sync++; + } * I'd avoid always calling fsyncMethodCalled() in this path. How about incrementing m_wal_sync after each sync operation? I think if syncing the disk does not occur, m_wal_sync should not be incremented. It depends enableFsync and sync_method. enableFsync is checked in each fsync method like pg_fsync_no_writethrough(), so if incrementing m_wal_sync after each sync operation, it should be implemented in each fsync method. It leads to many duplicated codes. So, why don't you change the function to a flag whether to sync data to the disk will be occurred or not in issue_xlog_fsync()? --- +/* + * Check if fsync mothod is called. + */ +static bool +fsyncMethodCalled() +{ + if (!enableFsync) + return false; + + switch (sync_method) + { + case SYNC_METHOD_FSYNC: + case SYNC_METHOD_FSYNC_WRITETHROUGH: + case SYNC_METHOD_FDATASYNC: + return true; + default: + /* others don't have a specific fsync method */ + return false; + } +} * I'm concerned that the function name could confuse the reader because it's called even before the fsync method is called. As I commented above, calling to fsyncMethodCalled() can be eliminated. That way, this function is called at only once. So do we really need this function? Thanks to your comments, I removed them. * As far as I read the code, issue_xlog_fsync() seems to do fsync even if enableFsync is false. Why does the function return false in that case? I might be missing something. IIUC, the reason is that I thought that each fsync functions like pg_fsync_no_writethrough() check enableFsync. If this code doesn't check, m_wal_sync_time may be incremented even though some sync methods like SYNC_METHOD_OPEN don't call to sync some data to the disk at the time. * void is missing as argument? * s/mothod/method/ I removed them. Regards, -- Masahiro Ikeda NTT DATA CORPORATIONFrom 3db5e6fc1b68ae02b393a02b2d28f0d163d7bd03 Mon Sep 17 00:00:00 2001 From: Masahiro Ikeda Date: Fri, 22 Jan 2021 21:38:31 +0900 Subject: [PATCH] Add statistics related to write/sync wal records. This patch adds following statistics to pg_stat_wal view to track WAL I/O activity. - the total number of writing/syncing WAL data. - the total amount of time that has been spent in writing/syncing WAL data. Since to track I/O timing may leads significant overhead, GUC parameter "track_wal_io_timing" is introduced. Only if this is on, the I/O timing is measured. Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com
RE: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-01-22 11:54, kuroda.hay...@fujitsu.com wrote: Dear Ikeda-san, This patch cannot be applied to the HEAD, but anyway I put a comment. ``` + /* +* Measure i/o timing to fsync WAL data. +* +* The wal receiver skip to collect it to avoid performance degradation of standy servers. +* If sync_method doesn't have its fsync method, to skip too. +*/ + if (!AmWalReceiverProcess() && track_wal_io_timing && fsyncMethodCalled()) + INSTR_TIME_SET_CURRENT(start); ``` I think m_wal_sync_time should be collected even if the process is WalRecevier. Because all wal_fsync should be recorded, and some performance issues have been aleady occurred if track_wal_io_timing is turned on. I think it's strange only to take care of the walrecevier case. Kuroda-san, Thanks for your comments. Although I thought that the performance impact may be bigger in standby servers because WALReceiver didn't use wal buffers, it's no need to be considered. I agreed that if track_wal_io_timing is turned on, the primary server's performance degradation occurs too. I will make rebased and modified. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda wrote: > > Hi, > > I rebased the patch to the master branch. Thank you for working on this. I've read the latest patch. Here are comments: --- + if (track_wal_io_timing) + { + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_write_time += INSTR_TIME_GET_MILLISEC(duration); + } * I think it should add the time in micro sec. After running pgbench with track_wal_io_timing = on for 30 sec, pg_stat_wal showed the following on my environment: postgres(1:61569)=# select * from pg_stat_wal; -[ RECORD 1 ]+- wal_records | 285947 wal_fpi | 53285 wal_bytes| 442008213 wal_buffers_full | 0 wal_write| 25516 wal_write_time | 0 wal_sync | 25437 wal_sync_time| 14490 stats_reset | 2021-01-22 10:56:13.29464+09 Since writes can complete less than a millisecond, wal_write_time didn't increase. I think sync_time could also have the same problem. --- + /* +* Measure i/o timing to fsync WAL data. +* +* The wal receiver skip to collect it to avoid performance degradation of standy servers. +* If sync_method doesn't have its fsync method, to skip too. +*/ + if (!AmWalReceiverProcess() && track_wal_io_timing && fsyncMethodCalled()) + INSTR_TIME_SET_CURRENT(start); * Why does only the wal receiver skip it even if track_wal_io_timinig is true? I think the performance degradation is also true for backend processes. If there is another reason for that, I think it's better to mention in both the doc and comment. * How about checking track_wal_io_timing first? * s/standy/standby/ --- + /* increment the i/o timing and the number of times to fsync WAL data */ + if (fsyncMethodCalled()) + { + if (!AmWalReceiverProcess() && track_wal_io_timing) + { + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_sync_time += INSTR_TIME_GET_MILLISEC(duration); + } + + WalStats.m_wal_sync++; + } * I'd avoid always calling fsyncMethodCalled() in this path. How about incrementing m_wal_sync after each sync operation? --- +/* + * Check if fsync mothod is called. + */ +static bool +fsyncMethodCalled() +{ + if (!enableFsync) + return false; + + switch (sync_method) + { + case SYNC_METHOD_FSYNC: + case SYNC_METHOD_FSYNC_WRITETHROUGH: + case SYNC_METHOD_FDATASYNC: + return true; + default: + /* others don't have a specific fsync method */ + return false; + } +} * I'm concerned that the function name could confuse the reader because it's called even before the fsync method is called. As I commented above, calling to fsyncMethodCalled() can be eliminated. That way, this function is called at only once. So do we really need this function? * As far as I read the code, issue_xlog_fsync() seems to do fsync even if enableFsync is false. Why does the function return false in that case? I might be missing something. * void is missing as argument? * s/mothod/method/ Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: About to add WAL write/fsync statistics to pg_stat_wal view
Dear Ikeda-san, This patch cannot be applied to the HEAD, but anyway I put a comment. ``` + /* +* Measure i/o timing to fsync WAL data. +* +* The wal receiver skip to collect it to avoid performance degradation of standy servers. +* If sync_method doesn't have its fsync method, to skip too. +*/ + if (!AmWalReceiverProcess() && track_wal_io_timing && fsyncMethodCalled()) + INSTR_TIME_SET_CURRENT(start); ``` I think m_wal_sync_time should be collected even if the process is WalRecevier. Because all wal_fsync should be recorded, and some performance issues have been aleady occurred if track_wal_io_timing is turned on. I think it's strange only to take care of the walrecevier case. Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: About to add WAL write/fsync statistics to pg_stat_wal view
Hi, I rebased the patch to the master branch. Regards, -- Masahiro Ikeda NTT DATA CORPORATIONdiff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 4b60382778..45d54cd394 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7388,6 +7388,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; + + track_wal_io_timing (boolean) + + track_wal_io_timing configuration parameter + + + + +Enables timing of WAL I/O calls. This parameter is off by default, +because it will repeatedly query the operating system for +the current time, which may cause significant overhead on some +platforms. You can use the tool to +measure the overhead of timing on your system. +I/O timing information is +displayed in +pg_stat_wal. Only superusers can +change this setting. + + + + track_functions (enum) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 3d6c901306..1a5aad819f 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3479,7 +3479,51 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i wal_buffers_full bigint - Number of times WAL data was written to disk because WAL buffers became full + Total number of times WAL data was written to disk because WAL buffers became full + + + + + + wal_write bigint + + + Total number of times WAL data was written to disk + + + + + + wal_write_time double precision + + + Total amount of time that has been spent in the portion of + WAL data was written to disk, in milliseconds + (if is enabled, otherwise zero). + To avoid standby server's performance degradation, they don't collect + this statistics + + + + + + wal_sync bigint + + + Total number of times WAL data was synced to disk + + + + + + wal_sync_time double precision + + + Total amount of time that has been spent in the portion of + WAL data was synced to disk, in milliseconds + (if is enabled, otherwise zero). + To avoid standby server's performance degradation, they don't collect + this statistics diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 9867e1b403..19d101647e 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -109,6 +109,7 @@ int CommitDelay = 0; /* precommit delay in microseconds */ int CommitSiblings = 5; /* # concurrent xacts needed to sleep */ int wal_retrieve_retry_interval = 5000; int max_slot_wal_keep_size_mb = -1; +bool track_wal_io_timing = false; #ifdef WAL_DEBUG bool XLOG_DEBUG = false; @@ -980,6 +981,8 @@ static void WALInsertLockAcquireExclusive(void); static void WALInsertLockRelease(void); static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt); +static bool fsyncMethodCalled(); + /* * Insert an XLOG record represented by an already-constructed chain of data * chunks. This is a low-level routine; to construct the WAL record header @@ -2538,6 +2541,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible) Size nbytes; Size nleft; int written; + instr_time start; + instr_time duration; /* OK to write the page(s) */ from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ; @@ -2546,9 +2551,24 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible) do { errno = 0; + +/* Measure i/o timing to write WAL data */ +if (track_wal_io_timing) + INSTR_TIME_SET_CURRENT(start); + pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE); written = pg_pwrite(openLogFile, from, nleft, startoffset); pgstat_report_wait_end(); + +if (track_wal_io_timing) +{ + INSTR_TIME_SET_CURRENT(duration); + INSTR_TIME_SUBTRACT(duration, start); + WalStats.m_wal_write_time += INSTR_TIME_GET_MILLISEC(duration); +} + +WalStats.m_wal_write++; + if (written <= 0) { char xlogfname[MAXFNAMELEN]; @@ -10489,6 +10509,27 @@ assign_xlog_sync_method(int new_sync_method, void *extra) } } +/* + * Check if fsync mothod is called. + */ +static bool +fsyncMethodCalled() +{ + if (!enableFsync) + return false; + + switch (sync_method) + { + case SYNC_METHOD_FSYNC: + case SYNC_METHOD_FSYNC_WRITETHROUGH: + case SYNC_METHOD_FDATASYNC: + return true; + default: + /* others don't have a specific fsync method */ + return false; + } +} + /* * Issue appropriate kind of fsync (if any) for an XLOG output file. @@ -10499,8 +10540,19 @@ assign_xlog_sync_method(int new_sync_method, void *extra) void issue_xlog_fsync(int fd, XLogSegNo segno) { + instr_time
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2020-12-08 16:45, Li Japin wrote: Hi, On Dec 8, 2020, at 1:06 PM, Masahiro Ikeda wrote: Hi, I propose to add wal write/fsync statistics to pg_stat_wal view. It's useful not only for developing/improving source code related to WAL but also for users to detect workload changes, HW failure, and so on. I introduce "track_wal_io_timing" parameter and provide the following information on pg_stat_wal view. I separate the parameter from "track_io_timing" to "track_wal_io_timing" because IIUC, WAL I/O activity may have a greater impact on query performance than database I/O activity. ``` postgres=# SELECT wal_write, wal_write_time, wal_sync, wal_sync_time FROM pg_stat_wal; -[ RECORD 1 ]--+ wal_write | 650 # Total number of times WAL data was written to the disk wal_write_time | 43 # Total amount of time that has been spent in the portion of WAL data was written to disk # if track-wal-io-timing is enabled, otherwise zero wal_sync | 78 # Total number of times WAL data was synced to the disk wal_sync_time | 104 # Total amount of time that has been spent in the portion of WAL data was synced to disk # if track-wal-io-timing is enabled, otherwise zero ``` What do you think? Please let me know your comments. Regards -- Masahiro Ikeda NTT DATA CORPORATION<0001_add_wal_io_activity_to_the_pg_stat_wal.patch> There is a no previous prototype warning for ‘fsyncMethodCalled’, and it now only used in xlog.c, should we declare with static? And this function wants a boolean as a return, should we use true/false other than 0/1? +/* + * Check if fsync mothod is called. + */ +bool +fsyncMethodCalled() +{ + if (!enableFsync) + return 0; + + switch (sync_method) + { + case SYNC_METHOD_FSYNC: + case SYNC_METHOD_FSYNC_WRITETHROUGH: + case SYNC_METHOD_FDATASYNC: + return 1; + default: + /* others don't have a specific fsync method */ + return 0; + } +} + Thanks for your review. I agree with your comments. I fixed them. Regards -- Masahiro Ikeda NTT DATA CORPORATIONdiff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 8cd3d690..ba923a2b 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7388,6 +7388,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; + + track_wal_io_timing (boolean) + + track_wal_io_timing configuration parameter + + + + +Enables timing of WAL I/O calls. This parameter is off by default, +because it will repeatedly query the operating system for +the current time, which may cause significant overhead on some +platforms. You can use the tool to +measure the overhead of timing on your system. +I/O timing information is +displayed in +pg_stat_wal. Only superusers can +change this setting. + + + + track_functions (enum) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 52a69a53..ce4f652d 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3479,7 +3479,51 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i wal_buffers_full bigint - Number of times WAL data was written to the disk because WAL buffers got full + Total number of times WAL data was written to the disk because WAL buffers got full + + + + + + wal_write bigint + + + Total number of times WAL data was written to the disk + + + + + + wal_write_time double precision + + + Total amount of time that has been spent in the portion of + WAL data was written to disk, in milliseconds + (if is enabled, otherwise zero). + To avoid standby server's performance degradation, they don't collect + this statistics + + + + + + wal_sync bigint + + + Total number of times WAL data was synced to the disk + + + + + + wal_sync_time double precision + + + Total amount of time that has been spent in the portion of + WAL data was synced to disk, in milliseconds + (if is enabled, otherwise zero). + To avoid standby server's performance degradation, they don't collect + this statistics diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 7e81ce4f..18d6ecc5 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -109,6 +109,7 @@ int CommitDelay = 0; /* precommit delay in microseconds */ int CommitSiblings = 5; /* # concurrent xacts needed to sleep */ int
Re: About to add WAL write/fsync statistics to pg_stat_wal view
Hi, > On Dec 8, 2020, at 1:06 PM, Masahiro Ikeda wrote: > > Hi, > > I propose to add wal write/fsync statistics to pg_stat_wal view. > It's useful not only for developing/improving source code related to WAL > but also for users to detect workload changes, HW failure, and so on. > > I introduce "track_wal_io_timing" parameter and provide the following > information on pg_stat_wal view. > I separate the parameter from "track_io_timing" to "track_wal_io_timing" > because IIUC, WAL I/O activity may have a greater impact on query performance > than database I/O activity. > > ``` > postgres=# SELECT wal_write, wal_write_time, wal_sync, wal_sync_time FROM > pg_stat_wal; > -[ RECORD 1 ]--+ > wal_write | 650 # Total number of times WAL data was written to the disk > > wal_write_time | 43 # Total amount of time that has been spent in the > portion of WAL data was written to disk > # if track-wal-io-timing is enabled, otherwise zero > > wal_sync | 78 # Total number of times WAL data was synced to the disk > > wal_sync_time | 104 # Total amount of time that has been spent in the > portion of WAL data was synced to disk > # if track-wal-io-timing is enabled, otherwise zero > ``` > > What do you think? > Please let me know your comments. > > Regards > -- > Masahiro Ikeda > NTT DATA CORPORATION<0001_add_wal_io_activity_to_the_pg_stat_wal.patch> There is a no previous prototype warning for ‘fsyncMethodCalled’, and it now only used in xlog.c, should we declare with static? And this function wants a boolean as a return, should we use true/false other than 0/1? +/* + * Check if fsync mothod is called. + */ +bool +fsyncMethodCalled() +{ + if (!enableFsync) + return 0; + + switch (sync_method) + { + case SYNC_METHOD_FSYNC: + case SYNC_METHOD_FSYNC_WRITETHROUGH: + case SYNC_METHOD_FDATASYNC: + return 1; + default: + /* others don't have a specific fsync method */ + return 0; + } +} + -- Best regards ChengDu WenWu Information Technology Co.,Ltd. Japin Li