Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-25 Thread Fujii Masao




On 2021/03/25 11:50, Masahiro Ikeda wrote:



On 2021/03/23 16:10, Fujii Masao wrote:



On 2021/03/22 20:25, ikedamsh wrote:

Agreed. Users can know whether the stats is for walreceiver or not. The
pg_stat_wal view in standby server shows for the walreceiver, and in primary
server it shows for the others. So, I updated the document.
(v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the docs!

There was the discussion about when the stats collector is invoked, at [1].
Currently during archive recovery or standby, the stats collector is
invoked when the startup process reaches the consistent state, sends
PMSIGNAL_BEGIN_HOT_STANDBY, and then the system is starting accepting
read-only connections. But walreceiver can be invoked at earlier stage.
This can cause walreceiver to generate and send the statistics about WAL
writing even though the stats collector has not been running yet. This might
be problematic? If so, maybe we need to ensure that the stats collector is
invoked before walreceiver?

During recovery, the stats collector is not invoked if hot standby mode is
disabled. But walreceiver can be running in this case. So probably we should
change walreceiver so that it's invoked even when hot standby is disabled?
Otherwise we cannnot collect the statistics about WAL writing by walreceiver
in that case.

[1]
https://postgr.es/m/e5a982a5-8bb4-5a10-cf9a-40dd1921b...@oss.nttdata.com


Thanks for comments! I didn't notice that.
As I mentioned[1], if my understanding is right, this issue seem to be not for
only the wal receiver.

Since the shared memory thread already handles these issues, does this patch,
which to collect the stats for the wal receiver and make a common function for
writing wal files, have to be committed after the patches for share memory
stats are committed? Or to handle them in this thread because we don't know
when the shared memory stats patches will be committed.

I think the former is better because to collect stats in shared memory is very
useful feature for users and it make a big change in design. So, I think it's
beneficial to make an effort to move the shared memory stats thread forward
(by reviewing or testing) instead of handling the issues in this thread.


Sounds reasonable. Agreed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-24 Thread Masahiro Ikeda



On 2021/03/23 16:10, Fujii Masao wrote:
> 
> 
> On 2021/03/22 20:25, ikedamsh wrote:
>> Agreed. Users can know whether the stats is for walreceiver or not. The
>> pg_stat_wal view in standby server shows for the walreceiver, and in primary
>> server it shows for the others. So, I updated the document.
>> (v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)
> 
> Thanks for updating the docs!
> 
> There was the discussion about when the stats collector is invoked, at [1].
> Currently during archive recovery or standby, the stats collector is
> invoked when the startup process reaches the consistent state, sends
> PMSIGNAL_BEGIN_HOT_STANDBY, and then the system is starting accepting
> read-only connections. But walreceiver can be invoked at earlier stage.
> This can cause walreceiver to generate and send the statistics about WAL
> writing even though the stats collector has not been running yet. This might
> be problematic? If so, maybe we need to ensure that the stats collector is
> invoked before walreceiver?
> 
> During recovery, the stats collector is not invoked if hot standby mode is
> disabled. But walreceiver can be running in this case. So probably we should
> change walreceiver so that it's invoked even when hot standby is disabled?
> Otherwise we cannnot collect the statistics about WAL writing by walreceiver
> in that case.
> 
> [1]
> https://postgr.es/m/e5a982a5-8bb4-5a10-cf9a-40dd1921b...@oss.nttdata.com

Thanks for comments! I didn't notice that.
As I mentioned[1], if my understanding is right, this issue seem to be not for
only the wal receiver.

Since the shared memory thread already handles these issues, does this patch,
which to collect the stats for the wal receiver and make a common function for
writing wal files, have to be committed after the patches for share memory
stats are committed? Or to handle them in this thread because we don't know
when the shared memory stats patches will be committed.

I think the former is better because to collect stats in shared memory is very
useful feature for users and it make a big change in design. So, I think it's
beneficial to make an effort to move the shared memory stats thread forward
(by reviewing or testing) instead of handling the issues in this thread.

[1]
https://www.postgresql.org/message-id/9f4e19ad-518d-b91a-e500-25a666471c42%40oss.nttdata.com

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-23 Thread Fujii Masao




On 2021/03/22 20:25, ikedamsh wrote:

Agreed. Users can know whether the stats is for walreceiver or not. The
pg_stat_wal view in standby server shows for the walreceiver, and in primary
server it shows for the others. So, I updated the document.
(v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the docs!

There was the discussion about when the stats collector is invoked, at [1].
Currently during archive recovery or standby, the stats collector is
invoked when the startup process reaches the consistent state, sends
PMSIGNAL_BEGIN_HOT_STANDBY, and then the system is starting accepting
read-only connections. But walreceiver can be invoked at earlier stage.
This can cause walreceiver to generate and send the statistics about WAL
writing even though the stats collector has not been running yet. This might
be problematic? If so, maybe we need to ensure that the stats collector is
invoked before walreceiver?

During recovery, the stats collector is not invoked if hot standby mode is
disabled. But walreceiver can be running in this case. So probably we should
change walreceiver so that it's invoked even when hot standby is disabled?
Otherwise we cannnot collect the statistics about WAL writing by walreceiver
in that case.

[1]
https://postgr.es/m/e5a982a5-8bb4-5a10-cf9a-40dd1921b...@oss.nttdata.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-22 Thread ikedamsh


On 2021/03/22 16:50, Fujii Masao wrote:
> 
> 
> On 2021/03/22 9:50, ikedamsh wrote:
>> Agreed. I separated the patches.
>>
>> If only the former is committed, my trivial concern is that there may be
>> a disadvantage, but no advantage for the standby server. It may lead to
>> performance degradation to the wal receiver by calling
>> INSTR_TIME_SET_CURRENT(), but the stats can't visible for users until the
>> latter patch is committed.
> 
> Your concern is valid, so let's polish and commit also the 0003 patch to v14.
> I'm still thinking that it's better to separate wal_xxx columns into
> walreceiver's and the others. But if we count even walreceiver activity on
> the existing columns, regarding 0003 patch, we need to update the document?
> For example, "Number of times WAL buffers were written out to disk via
> XLogWrite request." should be "Number of times WAL buffers were written
> out to disk via XLogWrite request and by WAL receiver process."? Maybe
> we need to append some descriptions about this into "WAL configuration"
> section?

Agreed. Users can know whether the stats is for walreceiver or not. The
pg_stat_wal view in standby server shows for the walreceiver, and in primary
server it shows for the others. So, I updated the document.
(v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)

>> I followed the argument of pg_pwrite().
>> But, I think "char *" is better, so fixed it.
> 
> Thanks for updating the patch!
> 
> +extern int    XLogWriteFile(int fd, char *buf,
> +  size_t nbyte, off_t offset,
> +  TimeLineID timelineid, XLogSegNo segno,
> +  bool write_all);
> 
> write_all seems not to be necessary. You added this flag for walreceiver,
> I guess. But even without the argument, walreceiver seems to work expectedly.
> So, what about the attached patch? I applied some cosmetic changes to the 
> patch.

Thanks a lot. Yes, "write_all" is unnecessary.
Your patch is looks good to me.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index db4b4e460c..281b13b9fa 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3493,7 +3493,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
   
Number of times WAL buffers were written out to disk via
-   XLogWrite request.
+   XLogWrite request and WAL data were written
+   out to disk by the WAL receiver process.
See  for more information about
the internal WAL function XLogWrite.
   
@@ -3521,7 +3522,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
   
Total amount of time spent writing WAL buffers to disk via
-   XLogWrite request, in milliseconds
+   XLogWrite request and WAL data to disk
+   by the WAL receiver process, in milliseconds
(if  is enabled,
otherwise zero).  This includes the sync time when
wal_sync_method is either
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index ae4a3c1293..39e7028c96 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -769,7 +769,7 @@
   
 
   
-   There are two internal functions to write WAL data to disk:
+   There are two internal functions to write generated WAL data to disk:
XLogWrite and issue_xlog_fsync.
When  is enabled, the total
amounts of time XLogWrite writes and
@@ -795,7 +795,19 @@
issue_xlog_fsync syncs WAL data to disk are also
counted as wal_write and wal_sync
in pg_stat_wal, respectively.
-  
+   To write replicated WAL data to disk by the WAL receiver is almost the same
+   as above except for some points. First, there is a dedicated code path for the
+   WAL receiver to write data although issue_xlog_fsync is
+   the same for syncing data.
+   Second, the WAL receiver writes replicated WAL data per bytes from the local
+   memory although the generated WAL data is written per WAL buffer pages.
+   The counters of wal_write, wal_sync,
+   wal_write_time, and wal_sync_time are
+   common statistics for writing/syncing both generated and replicated WAL data.
+   But, you can distinguish them because the generated WAL data is written/synced
+   in the primary server and the replicated WAL data is written/synced in
+   the standby server.
+   
  
 
  
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index a7a94d2a83..df028c5039 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -771,6 +771,9 @@ WalRcvDie(int code, Datum arg)
 	/* Ensure that all WAL records received are flushed to disk */
 	XLogWalRcvFlush(true);
 
+	/* Send WAL statistics to the stats collector before terminating */
+	pgstat_send_wal(true);
+
 	/* Mark ourselves inactive in shared memory */
 	SpinLockAcquire(>mutex);
 	Assert(walrcv->walRcvState 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-22 Thread Fujii Masao



On 2021/03/22 9:50, ikedamsh wrote:

Agreed. I separated the patches.

If only the former is committed, my trivial concern is that there may be
a disadvantage, but no advantage for the standby server. It may lead to
performance degradation to the wal receiver by calling
INSTR_TIME_SET_CURRENT(), but the stats can't visible for users until the
latter patch is committed.


Your concern is valid, so let's polish and commit also the 0003 patch to v14.
I'm still thinking that it's better to separate wal_xxx columns into
walreceiver's and the others. But if we count even walreceiver activity on
the existing columns, regarding 0003 patch, we need to update the document?
For example, "Number of times WAL buffers were written out to disk via
XLogWrite request." should be "Number of times WAL buffers were written
out to disk via XLogWrite request and by WAL receiver process."? Maybe
we need to append some descriptions about this into "WAL configuration"
section?



I followed the argument of pg_pwrite().
But, I think "char *" is better, so fixed it.


Thanks for updating the patch!

+extern int XLogWriteFile(int fd, char *buf,
+ size_t nbyte, off_t offset,
+ TimeLineID timelineid, 
XLogSegNo segno,
+ bool write_all);

write_all seems not to be necessary. You added this flag for walreceiver,
I guess. But even without the argument, walreceiver seems to work expectedly.
So, what about the attached patch? I applied some cosmetic changes to the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 6f8810e149..9d8ea7edca 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2534,63 +2534,15 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
{
char   *from;
Sizenbytes;
-   Sizenleft;
int written;
-   instr_time  start;
 
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;
-   nleft = nbytes;
-   do
-   {
-   errno = 0;
-
-   /* Measure I/O timing to write WAL data */
-   if (track_wal_io_timing)
-   INSTR_TIME_SET_CURRENT(start);
-
-   pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
-   written = pg_pwrite(openLogFile, from, nleft, 
startoffset);
-   pgstat_report_wait_end();
-
-   /*
-* Increment the I/O timing and the number of 
times WAL data
-* were written out to disk.
-*/
-   if (track_wal_io_timing)
-   {
-   instr_time  duration;
-
-   INSTR_TIME_SET_CURRENT(duration);
-   INSTR_TIME_SUBTRACT(duration, start);
-   WalStats.m_wal_write_time += 
INSTR_TIME_GET_MICROSEC(duration);
-   }
-
-   WalStats.m_wal_write++;
-
-   if (written <= 0)
-   {
-   charxlogfname[MAXFNAMELEN];
-   int save_errno;
-
-   if (errno == EINTR)
-   continue;
-
-   save_errno = errno;
-   XLogFileName(xlogfname, ThisTimeLineID, 
openLogSegNo,
-
wal_segment_size);
-   errno = save_errno;
-   ereport(PANIC,
-   
(errcode_for_file_access(),
-errmsg("could not 
write to log file %s "
-   "at 
offset %u, length %zu: %m",
-   
xlogfname, startoffset, nleft)));
-   }
-   nleft -= written;
-   

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-21 Thread ikedamsh
On 2021-03-19 16:30, Fujii Masao wrote:
> On 2021/03/15 10:39, Masahiro Ikeda wrote:
>> Thanks, I understood get_sync_bit() checks the sync flags and
>> the write unit of generated wal data and replicated wal data is 
>> different.
>> (It's interesting optimization whether to use kernel cache or not.)
>> 
>> OK. Although I agree to separate the stats for the walrecever,
>> I want to hear opinions from other people too. I didn't change the 
>> patch.
>> 
>> Please feel to your comments.
> 
> What about applying the patch for common WAL write function like
> XLogWriteFile(), separately from the patch for walreceiver's stats?
> Seems the former reaches the consensus, so we can commit it firstly.
> Also even only the former change is useful because which allows
> walreceiver to report WALWrite wait event.

Agreed. I separated the patches.

If only the former is committed, my trivial concern is that there may be
a disadvantage, but no advantage for the standby server. It may lead to
performance degradation to the wal receiver by calling
INSTR_TIME_SET_CURRENT(), but the stats can't visible for users until the
latter patch is committed.

I think it's ok because this not happening in the case to disable the
"track_wal_io_timing" in the standby server. Although some users may start the
standby server using the backup which "track_wal_io_timing" is enabled in the
primary server, they will say it's ok since the users already accept the
performance degradation in the primary server.

>> OK. I agree.
>> 
>> I wonder to change the error check ways depending on who calls this 
>> function?
>> Now, only the walreceiver checks (1)errno==0 and doesn't check 
>> (2)errno==ENITR.
>> Other processes are the opposite.
>> 
>> IIUC, it's appropriate that every process checks (1)(2).
>> Please let me know my understanding is wrong.
> 
> I'm thinking the same. Regarding (2), commit 79ce29c734 introduced
> that code. According to the following commit log, it seems harmless
> to retry on EINTR even walreceiver.
> 
> Also retry on EINTR. All signals used in the backend are flagged 
> SA_RESTART
> nowadays, so it shouldn't happen, but better to be defensive.

Thanks, I understood.


>>> BTW, currently XLogWrite() increments IO timing even when pg_pwrite()
>>> reports an error. But this is useless. Probably IO timing should be
>>> incremented after the return code of pg_pwrite() is checked, instead?
>> 
>> Yes, I agree. I fixed it.
>> (v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)
> 
> Thanks for the patch!
> 
>   nleft = nbytes;
>   do
>   {
> - errno = 0;
> + written = XLogWriteFile(openLogFile, from, 
> nleft, (off_t) 
> startoffset,
> + 
> ThisTimeLineID, openLogSegNo, wal_segment_size);
> 
> Can we merge this do-while loop in XLogWrite() into the loop
> in XLogWriteFile()?
> If we do that, ISTM that the following codes are not necessary in 
> XLogWrite().
> 
>   nleft -= written;
>   from += written;

OK, I fixed it.


> + * 'segsize' is a segment size of WAL segment file.
> 
> Since segsize is always wal_segment_size, segsize argument seems
> not necessary in XLogWriteFile().

Right. I fixed it.


> +XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset,
> +   TimeLineID timelineid, XLogSegNo segno, int segsize)
> 
> Why did you use "const void *" instead of "char *" for *buf?

I followed the argument of pg_pwrite().
But, I think "char *" is better, so fixed it.


> Regarding 0005 patch, I will review it later.

Thanks.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index a7a94d2a83..df028c5039 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -771,6 +771,9 @@ WalRcvDie(int code, Datum arg)
 	/* Ensure that all WAL records received are flushed to disk */
 	XLogWalRcvFlush(true);
 
+	/* Send WAL statistics to the stats collector before terminating */
+	pgstat_send_wal(true);
+
 	/* Mark ourselves inactive in shared memory */
 	SpinLockAcquire(>mutex);
 	Assert(walrcv->walRcvState == WALRCV_STREAMING ||
@@ -910,6 +913,12 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 	XLogArchiveForceDone(xlogfname);
 else
 	XLogArchiveNotify(xlogfname);
+
+/*
+ * Send WAL statistics to the stats collector when finishing
+ * the current WAL segment file to avoid overloading it.
+ */
+pgstat_send_wal(false);
 			}
 			recvFile = -1;
 



diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bd5e787e55..4c7d90f1b9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2536,61 +2536,14 @@ 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-19 Thread Fujii Masao




On 2021/03/15 10:39, Masahiro Ikeda wrote:

Thanks, I understood get_sync_bit() checks the sync flags and
the write unit of generated wal data and replicated wal data is different.
(It's interesting optimization whether to use kernel cache or not.)

OK. Although I agree to separate the stats for the walrecever,
I want to hear opinions from other people too. I didn't change the patch.

Please feel to your comments.


What about applying the patch for common WAL write function like
XLogWriteFile(), separately from the patch for walreceiver's stats?
Seems the former reaches the consensus, so we can commit it firstly.
Also even only the former change is useful because which allows
walreceiver to report WALWrite wait event.


OK. I agree.

I wonder to change the error check ways depending on who calls this function?
Now, only the walreceiver checks (1)errno==0 and doesn't check (2)errno==ENITR.
Other processes are the opposite.

IIUC, it's appropriate that every process checks (1)(2).
Please let me know my understanding is wrong.


I'm thinking the same. Regarding (2), commit 79ce29c734 introduced
that code. According to the following commit log, it seems harmless
to retry on EINTR even walreceiver.

Also retry on EINTR. All signals used in the backend are flagged SA_RESTART
nowadays, so it shouldn't happen, but better to be defensive.


BTW, currently XLogWrite() increments IO timing even when pg_pwrite()
reports an error. But this is useless. Probably IO timing should be
incremented after the return code of pg_pwrite() is checked, instead?


Yes, I agree. I fixed it.
(v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for the patch!

nleft = nbytes;
do
{
-   errno = 0;
+   written = XLogWriteFile(openLogFile, from, 
nleft, (off_t) startoffset,
+   
ThisTimeLineID, openLogSegNo, wal_segment_size);

Can we merge this do-while loop in XLogWrite() into the loop
in XLogWriteFile()?

If we do that, ISTM that the following codes are not necessary in XLogWrite().

nleft -= written;
from += written;

+ * 'segsize' is a segment size of WAL segment file.

Since segsize is always wal_segment_size, segsize argument seems
not necessary in XLogWriteFile().

+XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset,
+ TimeLineID timelineid, XLogSegNo segno, int segsize)

Why did you use "const void *" instead of "char *" for *buf?

Regarding 0005 patch, I will review it later.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-14 Thread Masahiro Ikeda

On 2021/03/11 21:29, Masahiro Ikeda wrote:

On 2021-03-11 11:52, Fujii Masao wrote:

On 2021/03/11 9:38, Masahiro Ikeda wrote:

On 2021-03-10 17:08, Fujii Masao wrote:
IIUC the stats collector basically exits after checkpointer and 
walwriter exit.

But there seems no guarantee that the stats collector processes
all the messages that other processes have sent during the 
shutdown of

the server.


Thanks, I understood the above postmaster behaviors.

PMState manages the status and after checkpointer is exited, the 
postmaster sends
SIGQUIT signal to the stats collector if the shutdown mode is smart 
or fast.

(IIUC, although the postmaster kill the walsender, the archiver and
the stats collector at the same time, it's ok because the walsender
and the archiver doesn't send stats to the stats collector now.)

But, there might be a corner case to lose stats sent by background 
workers like
the checkpointer before they exit (although this is not implemented 
yet.)


For example,

1. checkpointer send the stats before it exit
2. stats collector receive the signal and break before processing
    the stats message from checkpointer. In this case, 1's message 
is lost.

3. stats collector writes the stats in the statsfiles and exit

Why don't you recheck the coming message is zero just before the 
2th procedure?

(v17-0004-guarantee-to-collect-last-stats-messages.patch)


Yes, I was thinking the same. This is the straight-forward fix for 
this issue.

The stats collector should process all the outstanding messages when
normal shutdown is requested, as the patch does. On the other hand,
if immediate shutdown is requested or emergency bailout (by 
postmaster death)
is requested, maybe the stats collector should skip those 
processings

and exit immediately.

But if we implement that, we would need to teach the stats collector
the shutdown type (i.e., normal shutdown or immediate one). Because
currently SIGQUIT is sent to the collector whichever shutdown is 
requested,
and so the collector cannot distinguish the shutdown type. I'm 
afraid that

change is a bit overkill for now.

BTW, I found that the collector calls pgstat_write_statsfiles() even 
at

emergency bailout case, before exiting. It's not necessary to save
the stats to the file in that case because subsequent server startup 
does

crash recovery and clears that stats file. So it's better to make
the collector exit immediately without calling 
pgstat_write_statsfiles()
at emergency bailout case? Probably this should be discussed in 
other
thread because it's different topic from the feature we're 
discussing here,

though.


IIUC, only the stats collector has another hander for SIGQUIT 
although
other background processes have a common hander for it and just call 
_exit(2).
I thought to guarantee when TerminateChildren(SIGTERM) is invoked, 
don't make stats

collector shutdown before other background processes are shutdown.

I will make another thread to discuss that the stats collector should 
know the shutdown type or not.
If it should be, it's better to make the stats collector exit as soon 
as possible if the shutdown type

is an immediate, and avoid losing the remaining stats if it's normal.


+1


I researched the past discussion related to writing the stats files when 
the immediate
shutdown is requested. And I found the following thread([1]) although 
the discussion is

stopped on 12/1/2016.

IIUC, the thread's consensus are

(1) To kill the stats collector soon before writing the stats file is 
needed in some case
because there is a possibility that it takes a long time until the 
failover happens.
The possible reasons are that disk write speed is slow, stats files 
are big, and so on.


(2) It needs to change the behavior from removing all stats files when 
the startup does

crash recovery because autovacuum uses the stats.

(3) It's ok that the stats collector exit without calling 
pgstat_write_statsfiles() if
the stats file is written every X minutes (using wal or another 
mechanism) and startup

process can restore the stats with slightly low freshness.

(4) It needs to find the way how to handle the (2)'s stats file when 
deleting on PITR

rewind or stats collector crash happens.

So, I need to ping the threads. But I don't have any idea to handle (4) 
yet...


[1] 
https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F5EF25A%40G01JPEXMBYT05









I measured the timing of the above in my linux laptop using 
v17-measure-timing.patch.
I don't have any strong opinion to handle this case since this 
result shows to receive and processes
the messages takes too short time (less than 1ms) although the 
stats collector receives the shutdown

signal in 5msec(099->104) after the checkpointer process exits.


Agreed.



```
1615421204.556 [checkpointer] DEBUG:  received shutdown request 
signal
1615421208.099 [checkpointer] DEBUG:  proc_exit(-1): 0 callbacks to 
make  # exit and send the 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-14 Thread Masahiro Ikeda

On 2021-03-12 12:39, Fujii Masao wrote:

On 2021/03/11 21:29, Masahiro Ikeda wrote:

In addition, I rebased the patch for WAL receiver.
(v17-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks! Will review this later.


Thanks a lot!


I read through the 0003 patch. Here are some comments for that.

With the patch, walreceiver's stats are counted as wal_write,
wal_sync, wal_write_time and wal_sync_time in pg_stat_wal. But they
should be counted as different columns because WAL IO is different
between walreceiver and other processes like a backend? For example,
open_sync or open_datasync is chosen as wal_sync_method, those other
processes use O_DIRECT flag to open WAL files, but walreceiver does
not. For example, those other procesess write WAL data in block units,
but walreceiver does not. So I'm concerned that mixing different WAL
IO stats in the same columns would confuse the users. Thought? I'd
like to hear more opinions about how to expose walreceiver's stats to
users.


Thanks, I understood get_sync_bit() checks the sync flags and
the write unit of generated wal data and replicated wal data is 
different.

(It's interesting optimization whether to use kernel cache or not.)

OK. Although I agree to separate the stats for the walrecever,
I want to hear opinions from other people too. I didn't change the 
patch.


Please feel to your comments.





+int
+XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset)

This common function writes WAL data and measures IO timing. IMO we
can refactor the code furthermore by making this function handle the
case where pg_write() reports an error. In other words, I think that
the function should do what do-while loop block in XLogWrite() does.
Thought?


OK. I agree.

I wonder to change the error check ways depending on who calls this 
function?
Now, only the walreceiver checks (1)errno==0 and doesn't check 
(2)errno==ENITR.

Other processes are the opposite.

IIUC, it's appropriate that every process checks (1)(2).
Please let me know my understanding is wrong.




BTW, currently XLogWrite() increments IO timing even when pg_pwrite()
reports an error. But this is useless. Probably IO timing should be
incremented after the return code of pg_pwrite() is checked, instead?


Yes, I agree. I fixed it.
(v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)



BTW, thanks for your comments in person that the bgwriter process will 
generate wal data.
I checked that it generates the WAL to take a snapshot via 
LogStandySnapshot().

I attached the patch for bgwriter to send the wal stats.
(v18-0005-send-stats-for-bgwriter-when-shutdown.patch)

This patch includes the following changes.

(1) introduce pgstat_send_bgwriter() the mechanism to send the stats
if PGSTAT_STAT_INTERVAL msec has passed like pgstat_send_wal()
to avoid overloading to stats collector because "bgwriter_delay"
can be set for 10msec or more.

(2) remove pgstat_report_wal() and integrate with pgstat_send_wal()
because bgwriter sends WalStats.m_wal_records and to avoid 
overloading (see (1)).
IIUC, although the pros to separate them is to reduce the 
calculation cost of

WalUsageAccumDiff(), the impact is limited.

(3) make a new signal handler for bgwriter to force sending remaining 
stats during shutdown
because of (1) and remove HandleMainLoopInterrupts() because there 
are no processes to use it.



Regards,
--
Masahiro Ikeda
NTT DATA CORPORATIONdiff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9b2eb0d10b..c7bda9127f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2536,7 +2536,6 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			Size		nbytes;
 			Size		nleft;
 			int			written;
-			instr_time	start;
 
 			/* OK to write the page(s) */
 			from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
@@ -2544,49 +2543,9 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			nleft = nbytes;
 			do
 			{
-errno = 0;
+written = XLogWriteFile(openLogFile, from, nleft, (off_t) startoffset,
+		ThisTimeLineID, openLogSegNo, wal_segment_size);
 
-/* Measure I/O timing to write WAL data */
-if (track_wal_io_timing)
-	INSTR_TIME_SET_CURRENT(start);
-
-pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
-written = pg_pwrite(openLogFile, from, nleft, startoffset);
-pgstat_report_wait_end();
-
-/*
- * Increment the I/O timing and the number of times WAL data
- * were written out to disk.
- */
-if (track_wal_io_timing)
-{
-	instr_time	duration;
-
-	INSTR_TIME_SET_CURRENT(duration);
-	INSTR_TIME_SUBTRACT(duration, start);
-	WalStats.m_wal_write_time += INSTR_TIME_GET_MICROSEC(duration);
-}
-
-WalStats.m_wal_write++;
-
-if (written <= 0)
-{
-	char		xlogfname[MAXFNAMELEN];
-	int			save_errno;
-
-	if (errno == EINTR)
-		continue;
-
-	save_errno = errno;
-	

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-11 Thread Fujii Masao




On 2021/03/11 23:33, Fujii Masao wrote:



On 2021/03/11 21:29, Masahiro Ikeda wrote:

On 2021-03-11 11:52, Fujii Masao wrote:

On 2021/03/11 9:38, Masahiro Ikeda wrote:

On 2021-03-10 17:08, Fujii Masao wrote:

On 2021/03/10 14:11, Masahiro Ikeda wrote:

On 2021-03-09 17:51, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 
patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-11 Thread Fujii Masao




On 2021/03/11 21:29, Masahiro Ikeda wrote:

On 2021-03-11 11:52, Fujii Masao wrote:

On 2021/03/11 9:38, Masahiro Ikeda wrote:

On 2021-03-10 17:08, Fujii Masao wrote:

On 2021/03/10 14:11, Masahiro Ikeda wrote:

On 2021-03-09 17:51, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 
patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+  

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-11 Thread Fujii Masao




On 2021/03/11 21:29, Masahiro Ikeda wrote:

On 2021-03-11 11:52, Fujii Masao wrote:

On 2021/03/11 9:38, Masahiro Ikeda wrote:

On 2021-03-10 17:08, Fujii Masao wrote:

On 2021/03/10 14:11, Masahiro Ikeda wrote:

On 2021-03-09 17:51, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 
patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+  

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-11 Thread Masahiro Ikeda

On 2021-03-11 11:52, Fujii Masao wrote:

On 2021/03/11 9:38, Masahiro Ikeda wrote:

On 2021-03-10 17:08, Fujii Masao wrote:

On 2021/03/10 14:11, Masahiro Ikeda wrote:

On 2021-03-09 17:51, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked 
during an
XLogFlush request (see ...).  This is 
also

incremented by the WAL receiver during replication.

("which normally called" should be "which is normally 
called" or
"which normally is called" if you want to keep true to the 
original)
You missed the adding the space before an opening 
parenthesis here and

elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly 
query the

operating system..."
", because" -> "as"


Thanks, I fixed them.

wal_write_time and the sync items also need the note: 
"This is also

incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the 
WAL receiver

in pg_stat_wal_receiver.

"The number of times it happened..." -> " (the tally of 
this event is
reported in wal_buffers_full in) This is undesirable 
because ..."


Thanks, I fixed it.

I notice that the patch for WAL receiver doesn't require 
explicitly
computing the sync statistics but does require computing 
the write
statistics.  This is because of the presence of 
issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I 
observe that
the XLogWrite code path calls pgstat_report_wait_*() while 
the WAL
receiver path does not.  It seems technically 
straight-forward to
refactor here to avoid the almost-duplicated logic in the 
two places,
though I suspect there may be a trade-off for not adding 
another
function call to the stack given the importance of WAL 
processing
(though that seems marginalized compared to the cost of 
actually
writing the WAL).  Or, as Fujii noted, go the other way 
and don't have
any shared code between the two but instead implement the 
WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, 
this

half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver 
stats.

(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!


I added the infrastructure code to communicate the WAL 
receiver stats messages between the WAL receiver and the 
stats collector, and
the stats for WAL receiver is counted in 
pg_stat_wal_receiver.

What do you think?


On second thought, this idea seems not good. Because those 
stats are

collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver 
process running
at that moment. IOW, it seems strange that some values show 
dynamic
stats and the others show collected stats, even though they 
are in

the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in 
pg_stat_wal view in v11 patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right 
now */

+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or 
open_data_sync,

to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has 
elapsed to minimize

+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes 
wal_writer_delay
before walwriter's WAL stats is sent after 
XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, 
some values in
pg_stat_wal would be not up-to-date meaninglessly for those 
seconds.
So I'm thinking to withdraw my previous comment and it's ok to 
send
the stats every after XLogBackgroundFlush() is called. 
Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a 
risk
that the WAL stats are sent too frequently. I agree that's a 
problem.




Why don't to make another way to check the timestamp?

+   /*
+  

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-10 Thread Fujii Masao



On 2021/03/11 9:38, Masahiro Ikeda wrote:

On 2021-03-10 17:08, Fujii Masao wrote:

On 2021/03/10 14:11, Masahiro Ikeda wrote:

On 2021-03-09 17:51, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 
patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+ 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-10 Thread Masahiro Ikeda

On 2021-03-10 17:08, Fujii Masao wrote:

On 2021/03/10 14:11, Masahiro Ikeda wrote:

On 2021-03-09 17:51, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during 
an
XLogFlush request (see ...).  This is 
also

incremented by the WAL receiver during replication.

("which normally called" should be "which is normally 
called" or
"which normally is called" if you want to keep true to the 
original)
You missed the adding the space before an opening 
parenthesis here and

elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly 
query the

operating system..."
", because" -> "as"


Thanks, I fixed them.

wal_write_time and the sync items also need the note: "This 
is also

incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL 
receiver

in pg_stat_wal_receiver.

"The number of times it happened..." -> " (the tally of this 
event is
reported in wal_buffers_full in) This is undesirable 
because ..."


Thanks, I fixed it.

I notice that the patch for WAL receiver doesn't require 
explicitly
computing the sync statistics but does require computing the 
write
statistics.  This is because of the presence of 
issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I 
observe that
the XLogWrite code path calls pgstat_report_wait_*() while 
the WAL
receiver path does not.  It seems technically 
straight-forward to
refactor here to avoid the almost-duplicated logic in the 
two places,
though I suspect there may be a trade-off for not adding 
another
function call to the stack given the importance of WAL 
processing
(though that seems marginalized compared to the cost of 
actually
writing the WAL).  Or, as Fujii noted, go the other way and 
don't have
any shared code between the two but instead implement the 
WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, 
this

half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver 
stats.

(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!


I added the infrastructure code to communicate the WAL 
receiver stats messages between the WAL receiver and the 
stats collector, and
the stats for WAL receiver is counted in 
pg_stat_wal_receiver.

What do you think?


On second thought, this idea seems not good. Because those 
stats are

collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver 
process running
at that moment. IOW, it seems strange that some values show 
dynamic
stats and the others show collected stats, even though they 
are in

the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in 
pg_stat_wal view in v11 patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now 
*/

+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or 
open_data_sync,

to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has 
elapsed to minimize

+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes 
wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() 
is called.
For example, if wal_writer_delay is set to several seconds, some 
values in
pg_stat_wal would be not up-to-date meaninglessly for those 
seconds.
So I'm thinking to withdraw my previous comment and it's ok to 
send

the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a 
risk
that the WAL stats are sent too frequently. I agree that's a 
problem.




Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+   

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-10 Thread Fujii Masao




On 2021/03/10 14:11, Masahiro Ikeda wrote:

On 2021-03-09 17:51, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 
patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+    * msec since we last sent one
+    */
+   now = 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-09 Thread Masahiro Ikeda

On 2021-03-09 17:51, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is 
also

incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" 
or
"which normally is called" if you want to keep true to the 
original)
You missed the adding the space before an opening parenthesis 
here and

elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query 
the

operating system..."
", because" -> "as"


Thanks, I fixed them.

wal_write_time and the sync items also need the note: "This is 
also

incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL 
receiver

in pg_stat_wal_receiver.

"The number of times it happened..." -> " (the tally of this 
event is
reported in wal_buffers_full in) This is undesirable 
because ..."


Thanks, I fixed it.

I notice that the patch for WAL receiver doesn't require 
explicitly
computing the sync statistics but does require computing the 
write
statistics.  This is because of the presence of 
issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I 
observe that
the XLogWrite code path calls pgstat_report_wait_*() while the 
WAL
receiver path does not.  It seems technically straight-forward 
to
refactor here to avoid the almost-duplicated logic in the two 
places,
though I suspect there may be a trade-off for not adding 
another
function call to the stack given the importance of WAL 
processing
(though that seems marginalized compared to the cost of 
actually
writing the WAL).  Or, as Fujii noted, go the other way and 
don't have
any shared code between the two but instead implement the WAL 
receiver

one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!


I added the infrastructure code to communicate the WAL receiver 
stats messages between the WAL receiver and the stats 
collector, and

the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats 
are

collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process 
running
at that moment. IOW, it seems strange that some values show 
dynamic
stats and the others show collected stats, even though they are 
in

the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal 
view in v11 patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed 
to minimize

+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes 
wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() 
is called.
For example, if wal_writer_delay is set to several seconds, some 
values in
pg_stat_wal would be not up-to-date meaninglessly for those 
seconds.
So I'm thinking to withdraw my previous comment and it's ok to 
send

the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+    * msec since we last sent one
+    */
+   now = 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-09 Thread Fujii Masao




On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 
patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+    * msec since we last sent one
+    */
+   now = GetCurrentTimestamp();
+   if (TimestampDifferenceExceeds(last_report, now,

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-09 Thread Fujii Masao




On 2021/03/09 4:47, David G. Johnston wrote:

On Mon, Mar 8, 2021 at 8:48 AM Fujii Masao mailto:masao.fu...@oss.nttdata.com>> wrote:


Thanks for updating the patch! I applied cosmetic changes to that.
Patch attached. Barring any objection, I will commit this version.


Read over the patch and it looks good.


Thanks for the review! I committed the patch.




One minor "the" omission (in a couple of places, copy-paste style):

+       See  for more information about
+       internal WAL function XLogWrite.

"about *the* internal WAL function"


I added "the" in such two places. Thanks!




Also, I'm not sure why you find omitting documentation that the millisecond 
field has a fractional part out to microseconds to be helpful.


If this information should be documented, we should do that for
not only wal_write/sync_time but also other several columns,
for example, pg_stat_database.blk_write_time?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-08 Thread David G. Johnston
On Mon, Mar 8, 2021 at 8:48 AM Fujii Masao 
wrote:

>
> Thanks for updating the patch! I applied cosmetic changes to that.
> Patch attached. Barring any objection, I will commit this version.
>

Read over the patch and it looks good.

One minor "the" omission (in a couple of places, copy-paste style):

+   See  for more information about
+   internal WAL function XLogWrite.

"about *the* internal WAL function"

Also, I'm not sure why you find omitting documentation that the millisecond
field has a fractional part out to microseconds to be helpful.

David J.


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-08 Thread Fujii Masao



On 2021/03/08 19:42, Masahiro Ikeda wrote:

On 2021-03-08 13:44, Fujii Masao wrote:

On 2021/03/05 19:54, Masahiro Ikeda wrote:

On 2021-03-05 12:47, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 
patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-08 Thread Masahiro Ikeda

On 2021-03-08 13:44, Fujii Masao wrote:

On 2021/03/05 19:54, Masahiro Ikeda wrote:

On 2021-03-05 12:47, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during 
an
XLogFlush request (see ...).  This is 
also

incremented by the WAL receiver during replication.

("which normally called" should be "which is normally 
called" or
"which normally is called" if you want to keep true to the 
original)
You missed the adding the space before an opening 
parenthesis here and

elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly 
query the

operating system..."
", because" -> "as"


Thanks, I fixed them.

wal_write_time and the sync items also need the note: "This 
is also

incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL 
receiver

in pg_stat_wal_receiver.

"The number of times it happened..." -> " (the tally of this 
event is
reported in wal_buffers_full in) This is undesirable 
because ..."


Thanks, I fixed it.

I notice that the patch for WAL receiver doesn't require 
explicitly
computing the sync statistics but does require computing the 
write
statistics.  This is because of the presence of 
issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I 
observe that
the XLogWrite code path calls pgstat_report_wait_*() while 
the WAL
receiver path does not.  It seems technically 
straight-forward to
refactor here to avoid the almost-duplicated logic in the 
two places,
though I suspect there may be a trade-off for not adding 
another
function call to the stack given the importance of WAL 
processing
(though that seems marginalized compared to the cost of 
actually
writing the WAL).  Or, as Fujii noted, go the other way and 
don't have
any shared code between the two but instead implement the 
WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, 
this

half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver 
stats.

(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!


I added the infrastructure code to communicate the WAL 
receiver stats messages between the WAL receiver and the 
stats collector, and
the stats for WAL receiver is counted in 
pg_stat_wal_receiver.

What do you think?


On second thought, this idea seems not good. Because those 
stats are

collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver 
process running
at that moment. IOW, it seems strange that some values show 
dynamic
stats and the others show collected stats, even though they 
are in

the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in 
pg_stat_wal view in v11 patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now 
*/

+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or 
open_data_sync,

to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has 
elapsed to minimize

+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes 
wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() 
is called.
For example, if wal_writer_delay is set to several seconds, some 
values in
pg_stat_wal would be not up-to-date meaninglessly for those 
seconds.
So I'm thinking to withdraw my previous comment and it's ok to 
send

the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a 
risk
that the WAL stats are sent too frequently. I agree that's a 
problem.




Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+   

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-07 Thread Fujii Masao




On 2021/03/05 19:54, Masahiro Ikeda wrote:

On 2021-03-05 12:47, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 
patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+    * msec since we last sent one
+    */
+   now = 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-05 Thread Masahiro Ikeda

On 2021-03-05 12:47, Fujii Masao wrote:

On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is 
also

incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" 
or
"which normally is called" if you want to keep true to the 
original)
You missed the adding the space before an opening parenthesis 
here and

elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query 
the

operating system..."
", because" -> "as"


Thanks, I fixed them.

wal_write_time and the sync items also need the note: "This is 
also

incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL 
receiver

in pg_stat_wal_receiver.

"The number of times it happened..." -> " (the tally of this 
event is
reported in wal_buffers_full in) This is undesirable 
because ..."


Thanks, I fixed it.

I notice that the patch for WAL receiver doesn't require 
explicitly
computing the sync statistics but does require computing the 
write
statistics.  This is because of the presence of 
issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I 
observe that
the XLogWrite code path calls pgstat_report_wait_*() while the 
WAL
receiver path does not.  It seems technically straight-forward 
to
refactor here to avoid the almost-duplicated logic in the two 
places,
though I suspect there may be a trade-off for not adding 
another
function call to the stack given the importance of WAL 
processing
(though that seems marginalized compared to the cost of 
actually
writing the WAL).  Or, as Fujii noted, go the other way and 
don't have
any shared code between the two but instead implement the WAL 
receiver

one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!


I added the infrastructure code to communicate the WAL receiver 
stats messages between the WAL receiver and the stats 
collector, and

the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats 
are

collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process 
running
at that moment. IOW, it seems strange that some values show 
dynamic
stats and the others show collected stats, even though they are 
in

the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal 
view in v11 patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed 
to minimize

+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes 
wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() 
is called.
For example, if wal_writer_delay is set to several seconds, some 
values in
pg_stat_wal would be not up-to-date meaninglessly for those 
seconds.
So I'm thinking to withdraw my previous comment and it's ok to 
send

the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+    * msec since we last sent one
+    */
+   now = 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-04 Thread Fujii Masao



On 2021/03/05 8:38, Masahiro Ikeda wrote:

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 
patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+    * msec since we last sent one
+    */
+   now = GetCurrentTimestamp();
+   if (TimestampDifferenceExceeds(last_report, now,

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-04 Thread Masahiro Ikeda

On 2021-03-05 01:02, Fujii Masao wrote:

On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the 
original)
You missed the adding the space before an opening parenthesis 
here and

elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query 
the

operating system..."
", because" -> "as"


Thanks, I fixed them.

wal_write_time and the sync items also need the note: "This is 
also

incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL 
receiver

in pg_stat_wal_receiver.

"The number of times it happened..." -> " (the tally of this 
event is
reported in wal_buffers_full in) This is undesirable because 
..."


Thanks, I fixed it.

I notice that the patch for WAL receiver doesn't require 
explicitly
computing the sync statistics but does require computing the 
write
statistics.  This is because of the presence of issue_xlog_fsync 
but
absence of an equivalent pg_xlog_pwrite.  Additionally, I 
observe that
the XLogWrite code path calls pgstat_report_wait_*() while the 
WAL
receiver path does not.  It seems technically straight-forward 
to
refactor here to avoid the almost-duplicated logic in the two 
places,

though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL 
processing

(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and 
don't have
any shared code between the two but instead implement the WAL 
receiver

one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!


I added the infrastructure code to communicate the WAL receiver 
stats messages between the WAL receiver and the stats collector, 
and

the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats 
are

collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process 
running
at that moment. IOW, it seems strange that some values show 
dynamic

stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal 
view in v11 patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed 
to minimize

+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes 
wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is 
called.
For example, if wal_writer_delay is set to several seconds, some 
values in

pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+    * msec since we last sent one
+    */
+   now = GetCurrentTimestamp();
+   if (TimestampDifferenceExceeds(last_report, now,

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-04 Thread Fujii Masao



On 2021/03/04 16:14, Masahiro Ikeda wrote:

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 
patch.


Thanks for updating the patches! I'm now reading 001 patch.

+    /* Check whether the WAL file was synced to disk right now */
+    if (enableFsync &&
+    (sync_method == SYNC_METHOD_FSYNC ||
+ sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+ sync_method == SYNC_METHOD_FDATASYNC))
+    {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+    /*
+ * Send WAL statistics only if WalWriterDelay has elapsed to minimize
+ * the overhead in WAL-writing.
+ */
+    if (rc & WL_TIMEOUT)
+    pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.


Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.



Why don't to make another way to check the timestamp?

+   /*
+    * Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+    * msec since we last sent one
+    */
+   now = GetCurrentTimestamp();
+   if (TimestampDifferenceExceeds(last_report, now,
PGSTAT_STAT_INTERVAL))
+   {
+   pgstat_send_wal();
+ 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-04 Thread Ibrar Ahmed
On Thu, Mar 4, 2021 at 12:14 PM Masahiro Ikeda 
wrote:

> On 2021-03-03 20:27, Masahiro Ikeda wrote:
> > On 2021-03-03 16:30, Fujii Masao wrote:
> >> On 2021/03/03 14:33, Masahiro Ikeda wrote:
> >>> On 2021-02-24 16:14, Fujii Masao wrote:
>  On 2021/02/15 11:59, Masahiro Ikeda wrote:
> > On 2021-02-10 00:51, David G. Johnston wrote:
> >> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
> >>  wrote:
> >>
> >>> I pgindented the patches.
> >>
> >> ... XLogWrite, which is invoked during an
> >> XLogFlush request (see ...).  This is also
> >> incremented by the WAL receiver during replication.
> >>
> >> ("which normally called" should be "which is normally called" or
> >> "which normally is called" if you want to keep true to the
> >> original)
> >> You missed the adding the space before an opening parenthesis here
> >> and
> >> elsewhere (probably copy-paste)
> >>
> >> is ether -> is either
> >> "This parameter is off by default as it will repeatedly query the
> >> operating system..."
> >> ", because" -> "as"
> >
> > Thanks, I fixed them.
> >
> >> wal_write_time and the sync items also need the note: "This is
> >> also
> >> incremented by the WAL receiver during replication."
> >
> > I skipped changing it since I separated the stats for the WAL
> > receiver
> > in pg_stat_wal_receiver.
> >
> >> "The number of times it happened..." -> " (the tally of this event
> >> is
> >> reported in wal_buffers_full in) This is undesirable because
> >> ..."
> >
> > Thanks, I fixed it.
> >
> >> I notice that the patch for WAL receiver doesn't require
> >> explicitly
> >> computing the sync statistics but does require computing the write
> >> statistics.  This is because of the presence of issue_xlog_fsync
> >> but
> >> absence of an equivalent pg_xlog_pwrite.  Additionally, I observe
> >> that
> >> the XLogWrite code path calls pgstat_report_wait_*() while the WAL
> >> receiver path does not.  It seems technically straight-forward to
> >> refactor here to avoid the almost-duplicated logic in the two
> >> places,
> >> though I suspect there may be a trade-off for not adding another
> >> function call to the stack given the importance of WAL processing
> >> (though that seems marginalized compared to the cost of actually
> >> writing the WAL).  Or, as Fujii noted, go the other way and don't
> >> have
> >> any shared code between the two but instead implement the WAL
> >> receiver
> >> one to use pg_stat_wal_receiver instead.  In either case, this
> >> half-and-half implementation seems undesirable.
> >
> > OK, as Fujii-san mentioned, I separated the WAL receiver stats.
> > (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)
> 
>  Thanks for updating the patches!
> 
> 
> > I added the infrastructure code to communicate the WAL receiver
> > stats messages between the WAL receiver and the stats collector,
> > and
> > the stats for WAL receiver is counted in pg_stat_wal_receiver.
> > What do you think?
> 
>  On second thought, this idea seems not good. Because those stats are
>  collected between multiple walreceivers, but other values in
>  pg_stat_wal_receiver is only related to the walreceiver process
>  running
>  at that moment. IOW, it seems strange that some values show dynamic
>  stats and the others show collected stats, even though they are in
>  the same view pg_stat_wal_receiver. Thought?
> >>>
> >>> OK, I fixed it.
> >>> The stats collected in the WAL receiver is exposed in pg_stat_wal
> >>> view in v11 patch.
> >>
> >> Thanks for updating the patches! I'm now reading 001 patch.
> >>
> >> +/* Check whether the WAL file was synced to disk right now */
> >> +if (enableFsync &&
> >> +(sync_method == SYNC_METHOD_FSYNC ||
> >> + sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
> >> + sync_method == SYNC_METHOD_FDATASYNC))
> >> +{
> >>
> >> Isn't it better to make issue_xlog_fsync() return immediately
> >> if enableFsync is off, sync_method is open_sync or open_data_sync,
> >> to simplify the code more?
> >
> > Thanks for the comments.
> > I added the above code in v12 patch.
> >
> >>
> >> +/*
> >> + * Send WAL statistics only if WalWriterDelay has elapsed
> to
> >> minimize
> >> + * the overhead in WAL-writing.
> >> + */
> >> +if (rc & WL_TIMEOUT)
> >> +pgstat_send_wal();
> >>
> >> On second thought, this change means that it always takes
> >> wal_writer_delay
> >> before walwriter's WAL stats is sent after XLogBackgroundFlush() is
> >> called.
> >> For example, if wal_writer_delay is set to several seconds, some
> >> values in
> >> pg_stat_wal would be not up-to-date 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-03 Thread Masahiro Ikeda

On 2021-03-03 20:27, Masahiro Ikeda wrote:

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the 
original)
You missed the adding the space before an opening parenthesis here 
and

elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.

wal_write_time and the sync items also need the note: "This is 
also

incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL 
receiver

in pg_stat_wal_receiver.

"The number of times it happened..." -> " (the tally of this event 
is
reported in wal_buffers_full in) This is undesirable because 
..."


Thanks, I fixed it.

I notice that the patch for WAL receiver doesn't require 
explicitly

computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync 
but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe 
that

the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two 
places,

though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't 
have
any shared code between the two but instead implement the WAL 
receiver

one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!


I added the infrastructure code to communicate the WAL receiver 
stats messages between the WAL receiver and the stats collector, 
and

the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process 
running

at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal 
view in v11 patch.


Thanks for updating the patches! I'm now reading 001 patch.

+   /* Check whether the WAL file was synced to disk right now */
+   if (enableFsync &&
+   (sync_method == SYNC_METHOD_FSYNC ||
+sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+sync_method == SYNC_METHOD_FDATASYNC))
+   {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+   /*
+		 * Send WAL statistics only if WalWriterDelay has elapsed to 
minimize

+* the overhead in WAL-writing.
+*/
+   if (rc & WL_TIMEOUT)
+   pgstat_send_wal();

On second thought, this change means that it always takes 
wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is 
called.
For example, if wal_writer_delay is set to several seconds, some 
values in

pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.

Why don't to make another way to check the timestamp?

+   /*
+* Don't send a message unless it's been at least
PGSTAT_STAT_INTERVAL
+* msec since we last sent one
+*/
+   now = GetCurrentTimestamp();
+   if (TimestampDifferenceExceeds(last_report, now,
PGSTAT_STAT_INTERVAL))
+   {
+   pgstat_send_wal();
+   last_report = now;
+   }
+

Although I worried that it's better to 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-03 Thread Masahiro Ikeda

On 2021-03-03 16:30, Fujii Masao wrote:

On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the 
original)
You missed the adding the space before an opening parenthesis here 
and

elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL 
receiver

in pg_stat_wal_receiver.

"The number of times it happened..." -> " (the tally of this event 
is
reported in wal_buffers_full in) This is undesirable because 
..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync 
but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe 
that

the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two 
places,

though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't 
have
any shared code between the two but instead implement the WAL 
receiver

one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!


I added the infrastructure code to communicate the WAL receiver 
stats messages between the WAL receiver and the stats collector, and

the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process 
running

at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view 
in v11 patch.


Thanks for updating the patches! I'm now reading 001 patch.

+   /* Check whether the WAL file was synced to disk right now */
+   if (enableFsync &&
+   (sync_method == SYNC_METHOD_FSYNC ||
+sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+sync_method == SYNC_METHOD_FDATASYNC))
+   {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


Thanks for the comments.
I added the above code in v12 patch.



+   /*
+		 * Send WAL statistics only if WalWriterDelay has elapsed to 
minimize

+* the overhead in WAL-writing.
+*/
+   if (rc & WL_TIMEOUT)
+   pgstat_send_wal();

On second thought, this change means that it always takes 
wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is 
called.
For example, if wal_writer_delay is set to several seconds, some values 
in

pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?


Thanks, I didn't notice that.

Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
default value is 200msec and it may be set shorter time.

Why don't to make another way to check the timestamp?

+   /*
+* Don't send a message unless it's been at least 
PGSTAT_STAT_INTERVAL

+* msec since we last sent one
+*/
+   now = GetCurrentTimestamp();
+   if (TimestampDifferenceExceeds(last_report, now, 
PGSTAT_STAT_INTERVAL))

+   {
+   pgstat_send_wal();
+   last_report = now;
+   }
+

Although I worried that it's better to add the check code in 
pgstat_send_wal(),


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-02 Thread Fujii Masao




On 2021/03/03 14:33, Masahiro Ikeda wrote:

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 
patch.


Thanks for updating the patches! I'm now reading 001 patch.

+   /* Check whether the WAL file was synced to disk right now */
+   if (enableFsync &&
+   (sync_method == SYNC_METHOD_FSYNC ||
+sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
+sync_method == SYNC_METHOD_FDATASYNC))
+   {

Isn't it better to make issue_xlog_fsync() return immediately
if enableFsync is off, sync_method is open_sync or open_data_sync,
to simplify the code more?


+   /*
+* Send WAL statistics only if WalWriterDelay has elapsed to 
minimize
+* the overhead in WAL-writing.
+*/
+   if (rc & WL_TIMEOUT)
+   pgstat_send_wal();

On second thought, this change means that it always takes wal_writer_delay
before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
For example, if wal_writer_delay is set to several seconds, some values in
pg_stat_wal would be not up-to-date meaninglessly for those seconds.
So I'm thinking to withdraw my previous comment and it's ok to send
the stats every after XLogBackgroundFlush() is called. Thought?

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-02 Thread Masahiro Ikeda

On 2021-02-24 16:14, Fujii Masao wrote:

On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here 
and

elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe 
that

the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't 
have
any shared code between the two but instead implement the WAL 
receiver

one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!


I added the infrastructure code to communicate the WAL receiver stats 
messages between the WAL receiver and the stats collector, and

the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?


OK, I fixed it.
The stats collected in the WAL receiver is exposed in pg_stat_wal view 
in v11 patch.


I notice that the patch for WAL receiver doesn't require explicitly 
computing the sync statistics but does require computing the write 
statistics.  This is because of the presence of issue_xlog_fsync but 
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that 
the XLogWrite code path calls pgstat_report_wait_*() while the WAL 
receiver path does not.  It seems technically straight-forward to 
refactor here to avoid the almost-duplicated logic in the two places, 
though I suspect there may be a trade-off for not adding another 
function call to the stack given the importance of WAL processing 
(though that seems marginalized compared to the cost of actually 
writing the WAL).  Or, as Fujii noted, go the other way and don't have 
any shared code between the two but instead implement the WAL receiver 
one to use pg_stat_wal_receiver instead.  In either case, this 
half-and-half implementation seems undesirable.


I refactored the logic to write xlog file to unify collecting the write 
stats.
As David said, although pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE) 
is not called in the WAL receiver's path,

I agreed that the cost to write the WAL is much bigger.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATIONFrom c97fff03cd2bd51b28f6c6fde56c48792683f44e Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Fri, 12 Feb 2021 11:19:59 +0900
Subject: [PATCH 1/2] Add statistics related to write/sync wal records.

This patch adds following statistics to pg_stat_wal view
to track WAL I/O activity by backends and background processes
except WAL receiver.

- the total number of times writing/syncing WAL data.
- the total amount of time spent writing/syncing WAL data.

Since to track I/O timing may leads significant overhead,
GUC parameter "track_wal_io_timing" is introduced.
Only if this is on, the I/O timing is measured.

The statistics related to sync are zero when "wal_sync_method"
is "open_datasync" or "open_sync", because it doesn't call each
sync method.

(This requires a catversion bump, as well as an update to
 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-02-23 Thread Fujii Masao




On 2021/02/15 11:59, Masahiro Ikeda wrote:

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the patches!



I added the infrastructure code to communicate the WAL receiver stats messages 
between the WAL receiver and the stats collector, and
the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?


On second thought, this idea seems not good. Because those stats are
collected between multiple walreceivers, but other values in
pg_stat_wal_receiver is only related to the walreceiver process running
at that moment. IOW, it seems strange that some values show dynamic
stats and the others show collected stats, even though they are in
the same view pg_stat_wal_receiver. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-02-14 Thread Masahiro Ikeda

On 2021-02-10 00:51, David G. Johnston wrote:

On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
 wrote:


I pgindented the patches.


... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also
incremented by the WAL receiver during replication.

("which normally called" should be "which is normally called" or
"which normally is called" if you want to keep true to the original)
You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either
"This parameter is off by default as it will repeatedly query the
operating system..."
", because" -> "as"


Thanks, I fixed them.


wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."


I skipped changing it since I separated the stats for the WAL receiver
in pg_stat_wal_receiver.


"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."


Thanks, I fixed it.


I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
the XLogWrite code path calls pgstat_report_wait_*() while the WAL
receiver path does not.  It seems technically straight-forward to
refactor here to avoid the almost-duplicated logic in the two places,
though I suspect there may be a trade-off for not adding another
function call to the stack given the importance of WAL processing
(though that seems marginalized compared to the cost of actually
writing the WAL).  Or, as Fujii noted, go the other way and don't have
any shared code between the two but instead implement the WAL receiver
one to use pg_stat_wal_receiver instead.  In either case, this
half-and-half implementation seems undesirable.


OK, as Fujii-san mentioned, I separated the WAL receiver stats.
(v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)

I added the infrastructure code to communicate the WAL receiver stats 
messages between the WAL receiver and the stats collector, and

the stats for WAL receiver is counted in pg_stat_wal_receiver.
What do you think?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATIONFrom 4ecbc467f88a2f923b3bc2eb6fe2c4f7725c02be Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Fri, 12 Feb 2021 11:19:59 +0900
Subject: [PATCH 1/2] Add statistics related to write/sync wal records.

This patch adds following statistics to pg_stat_wal view
to track WAL I/O activity by backends and background processes
except WAL receiver.

- the total number of times writing/syncing WAL data.
- the total amount of time spent writing/syncing WAL data.

Since to track I/O timing may leads significant overhead,
GUC parameter "track_wal_io_timing" is introduced.
Only if this is on, the I/O timing is measured.

The statistics related to sync are zero when "wal_sync_method"
is "open_datasync" or "open_sync", because it doesn't call each
sync method.

(This requires a catversion bump, as well as an update to
 PGSTAT_FILE_FORMAT_ID)

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David
Johnston, Fujii Masao
Discussion:
https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.co
---
 doc/src/sgml/config.sgml  | 23 +++-
 doc/src/sgml/monitoring.sgml  | 50 
 doc/src/sgml/wal.sgml | 12 +++-
 src/backend/access/transam/xlog.c | 57 +++
 src/backend/catalog/system_views.sql  |  4 ++
 src/backend/postmaster/pgstat.c   |  4 ++
 src/backend/postmaster/walwriter.c| 16 --
 src/backend/utils/adt/pgstatfuncs.c   | 24 +++-
 src/backend/utils/misc/guc.c  |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h |  1 +
 src/include/catalog/pg_proc.dat   |  6 +-
 src/include/pgstat.h  | 10 
 src/test/regress/expected/rules.out   |  6 +-
 14 files changed, 208 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5ef1c7ad3c..0bb03d0717 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7404,7 +7404,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   

 Enables timing of database I/O calls.  This parameter is off by
-default, because it will repeatedly query the operating system for
+default, as it will repeatedly query the operating system for
 the current time, which may cause significant overhead on some
 platforms.  You can use the  tool to
 measure the overhead of timing on your system.
@@ -7418,6 +7418,27 @@ COPY postgres_log FROM 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-02-14 Thread Masahiro Ikeda

On 2021-02-08 14:26, Fujii Masao wrote:

On 2021/02/08 13:01, Fujii Masao wrote:



On 2021/02/05 8:45, Masahiro Ikeda wrote:

I pgindented the patches.


Thanks for updating the patches!

+   XLogWrite, which nomally called by an
+   issue_xlog_fsync, which nomally called by 
an


Typo: "nomally" should be "normally"?

+   XLogFlush request(see linkend="wal-configuration"/>)
+   XLogFlush request(see linkend="wal-configuration"/>),


Isn't it better to add a space character just after "request"?

+    INSTR_TIME_SET_CURRENT(duration);
+    INSTR_TIME_SUBTRACT(duration, start);
+    WalStats.m_wal_write_time = 
INSTR_TIME_GET_MICROSEC(duration);


If several cycles happen in the do-while loop, m_wal_write_time should 
be
updated with the sum of "duration" in those cycles instead of 
"duration"
in the last cycle? If yes, "+=" should be used instead of "=" when 
updating

m_wal_write_time?

+    INSTR_TIME_SET_CURRENT(duration);
+    INSTR_TIME_SUBTRACT(duration, start);
+    WalStats.m_wal_sync_time = 
INSTR_TIME_GET_MICROSEC(duration);


Also "=" should be "+=" in the above?


+   /* Send WAL statistics */
+   pgstat_send_wal();

This may cause overhead in WAL-writing by walwriter because it's called
every cycles even when walwriter needs to write more WAL next cycle
(don't need to sleep on WaitLatch)? If this is right, pgstat_send_wal()
should be called only when WaitLatch() returns with WL_TIMEOUT?


Thanks, I didn't notice that.
I'll fix it.


-   XLogFlush request(see )
+   XLogFlush request(see ),
+   or WAL data written out to disk by WAL receiver.

So regarding walreceiver, only wal_write, wal_write_time, wal_sync, and
wal_sync_time are updated even while the other values are not. Isn't 
this
confusing to users? If so, what about reporting those walreceiver stats 
in

pg_stat_wal_receiver?


OK, I'll add new infrastructure code to interect with wal receiver
and stats collector and show the stats in pg_stat_wal_receiver.


if (endofwal)
+   {
+   /* Send WAL statistics to the stats 
collector */
+   pgstat_send_wal();
break;

You added pgstat_send_wal() so that it's called in some cases where
walreceiver exits. But ISTM that there are other walreceiver-exit 
cases.

For example, in the case where SIGTERM is received. Instead,
pgstat_send_wal() should be called in WalRcvDie() for those all cases?


Thanks, I forgot the case.
I'll fix it.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-02-14 Thread Masahiro Ikeda

On 2021-02-08 13:01, Fujii Masao wrote:

On 2021/02/05 8:45, Masahiro Ikeda wrote:

I pgindented the patches.


Thanks for updating the patches!


Thanks for checking the patches.


+   XLogWrite, which nomally called by an
+   issue_xlog_fsync, which nomally called by 
an


Typo: "nomally" should be "normally"?


Yes, I'll fix it.


+   XLogFlush request(see )
+   XLogFlush request(see ),

Isn't it better to add a space character just after "request"?


Thanks, I'll fix it.


+   INSTR_TIME_SET_CURRENT(duration);
+   INSTR_TIME_SUBTRACT(duration, start);
+   WalStats.m_wal_write_time = 
INSTR_TIME_GET_MICROSEC(duration);

If several cycles happen in the do-while loop, m_wal_write_time should 
be
updated with the sum of "duration" in those cycles instead of 
"duration"
in the last cycle? If yes, "+=" should be used instead of "=" when 
updating

m_wal_write_time?
+   INSTR_TIME_SET_CURRENT(duration);
+   INSTR_TIME_SUBTRACT(duration, start);
+   WalStats.m_wal_sync_time = 
INSTR_TIME_GET_MICROSEC(duration);

Also "=" should be "+=" in the above?


Yes, they are my mistake when changing the unit from milliseconds to 
microseconds.


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-02-09 Thread David G. Johnston
On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda 
wrote:

> I pgindented the patches.
>
>
... XLogWrite, which is invoked during an
XLogFlush request (see ...).  This is also incremented
by the WAL receiver during replication.

("which normally called" should be "which is normally called" or "which
normally is called" if you want to keep true to the original)

You missed the adding the space before an opening parenthesis here and
elsewhere (probably copy-paste)

is ether -> is either

"This parameter is off by default as it will repeatedly query the operating
system..."
", because" -> "as"

wal_write_time and the sync items also need the note: "This is also
incremented by the WAL receiver during replication."

"The number of times it happened..." -> " (the tally of this event is
reported in wal_buffers_full in) This is undesirable because ..."

I notice that the patch for WAL receiver doesn't require explicitly
computing the sync statistics but does require computing the write
statistics.  This is because of the presence of issue_xlog_fsync but
absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that the
XLogWrite code path calls pgstat_report_wait_*() while the WAL receiver
path does not.  It seems technically straight-forward to refactor here to
avoid the almost-duplicated logic in the two places, though I suspect there
may be a trade-off for not adding another function call to the stack given
the importance of WAL processing (though that seems marginalized compared
to the cost of actually writing the WAL).  Or, as Fujii noted, go the other
way and don't have any shared code between the two but instead implement
the WAL receiver one to use pg_stat_wal_receiver instead.  In either case,
this half-and-half implementation seems undesirable.

David J.


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-02-07 Thread Fujii Masao




On 2021/02/08 13:01, Fujii Masao wrote:



On 2021/02/05 8:45, Masahiro Ikeda wrote:

I pgindented the patches.


Thanks for updating the patches!

+   XLogWrite, which nomally called by an
+   issue_xlog_fsync, which nomally called by an

Typo: "nomally" should be "normally"?

+   XLogFlush request(see )
+   XLogFlush request(see ),

Isn't it better to add a space character just after "request"?

+    INSTR_TIME_SET_CURRENT(duration);
+    INSTR_TIME_SUBTRACT(duration, start);
+    WalStats.m_wal_write_time = 
INSTR_TIME_GET_MICROSEC(duration);

If several cycles happen in the do-while loop, m_wal_write_time should be
updated with the sum of "duration" in those cycles instead of "duration"
in the last cycle? If yes, "+=" should be used instead of "=" when updating
m_wal_write_time?

+    INSTR_TIME_SET_CURRENT(duration);
+    INSTR_TIME_SUBTRACT(duration, start);
+    WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration);

Also "=" should be "+=" in the above?


+   /* Send WAL statistics */
+   pgstat_send_wal();

This may cause overhead in WAL-writing by walwriter because it's called
every cycles even when walwriter needs to write more WAL next cycle
(don't need to sleep on WaitLatch)? If this is right, pgstat_send_wal()
should be called only when WaitLatch() returns with WL_TIMEOUT?

-   XLogFlush request(see )
+   XLogFlush request(see ),
+   or WAL data written out to disk by WAL receiver.

So regarding walreceiver, only wal_write, wal_write_time, wal_sync, and
wal_sync_time are updated even while the other values are not. Isn't this
confusing to users? If so, what about reporting those walreceiver stats in
pg_stat_wal_receiver?

if (endofwal)
+   {
+   /* Send WAL statistics to the stats 
collector */
+   pgstat_send_wal();
break;

You added pgstat_send_wal() so that it's called in some cases where
walreceiver exits. But ISTM that there are other walreceiver-exit cases.
For example, in the case where SIGTERM is received. Instead,
pgstat_send_wal() should be called in WalRcvDie() for those all cases?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-02-07 Thread Fujii Masao




On 2021/02/05 8:45, Masahiro Ikeda wrote:

I pgindented the patches.


Thanks for updating the patches!

+   XLogWrite, which nomally called by an
+   issue_xlog_fsync, which nomally called by an

Typo: "nomally" should be "normally"?

+   XLogFlush request(see )
+   XLogFlush request(see ),

Isn't it better to add a space character just after "request"?

+   INSTR_TIME_SET_CURRENT(duration);
+   INSTR_TIME_SUBTRACT(duration, start);
+   WalStats.m_wal_write_time = 
INSTR_TIME_GET_MICROSEC(duration);

If several cycles happen in the do-while loop, m_wal_write_time should be
updated with the sum of "duration" in those cycles instead of "duration"
in the last cycle? If yes, "+=" should be used instead of "=" when updating
m_wal_write_time?

+   INSTR_TIME_SET_CURRENT(duration);
+   INSTR_TIME_SUBTRACT(duration, start);
+   WalStats.m_wal_sync_time = 
INSTR_TIME_GET_MICROSEC(duration);

Also "=" should be "+=" in the above?

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-02-04 Thread Masahiro Ikeda

I pgindented the patches.

Regards
--
Masahiro Ikeda
NTT DATA CORPORATIONFrom 47f436d7e423ece33a25adebf4265eac02e575f3 Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Fri, 29 Jan 2021 16:41:34 +0900
Subject: [PATCH 1/2] Add statistics related to write/sync wal records.

This patch adds following statistics to pg_stat_wal view
to track WAL I/O activity.

- the total number of times writing/syncing WAL data.
- the total amount of time spent writing/syncing WAL data.

Since to track I/O timing may leads significant overhead,
GUC parameter "track_wal_io_timing" is introduced.
Only if this is on, the I/O timing is measured.

The statistics related to sync are zero when "wal_sync_method"
is "open_datasync" or "open_sync", because it doesn't call each
sync method.

(This requires a catversion bump, as well as an update to
 PGSTAT_FILE_FORMAT_ID)

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com
---
 doc/src/sgml/config.sgml  | 21 +++
 doc/src/sgml/monitoring.sgml  | 50 
 doc/src/sgml/wal.sgml | 12 +++-
 src/backend/access/transam/xlog.c | 57 +++
 src/backend/catalog/system_views.sql  |  4 ++
 src/backend/postmaster/checkpointer.c |  2 +-
 src/backend/postmaster/pgstat.c   |  4 ++
 src/backend/postmaster/walwriter.c|  3 +
 src/backend/utils/adt/pgstatfuncs.c   | 24 +++-
 src/backend/utils/misc/guc.c  |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h |  1 +
 src/include/catalog/pg_proc.dat   |  6 +-
 src/include/pgstat.h  | 10 
 src/test/regress/expected/rules.out   |  6 +-
 15 files changed, 199 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5ef1c7ad3c..4bdc341141 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7418,6 +7418,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_wal_io_timing (boolean)
+  
+   track_wal_io_timing configuration parameter
+  
+  
+  
+   
+Enables timing of WAL I/O calls. This parameter is off by default,
+because it will repeatedly query the operating system for
+the current time, which may cause significant overhead on some
+platforms.  You can use the  tool to
+measure the overhead of timing on your system.
+I/O timing information is
+displayed in 
+pg_stat_wal.  Only superusers can
+change this setting.
+   
+  
+ 
+
  
   track_functions (enum)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c602ee4427..2435f401db 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3487,6 +3487,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ 
+  
+   wal_write bigint
+  
+  
+   Number of times WAL buffers were written out to disk via 
+   XLogWrite, which nomally called by an 
+   XLogFlush request(see )
+  
+ 
+
+ 
+  
+   wal_write_time double precision
+  
+  
+   Total amount of time spent writing WAL data to disk, excluding sync time unless 
+is ether open_datasync or 
+   open_sync. Units are in milliseconds with microsecond resolution.
+   This is zero when  is disabled.
+  
+ 
+
+ 
+  
+   wal_sync bigint
+  
+  
+   Number of times WAL files were synced to disk via 
+   issue_xlog_fsync, which nomally called by an 
+   XLogFlush request(see ),
+   while  was set to one of the 
+   "sync at commit" options (i.e., fdatasync, 
+   fsync, or fsync_writethrough).
+  
+ 
+
+ 
+  
+   wal_sync_time double precision
+  
+  
+   Total amount of time spent syncing WAL files to disk, in milliseconds with microsecond 
+   resolution. This requires setting  to one of 
+   the "sync at commit" options (i.e., fdatasync, fsync,
+   or fsync_writethrough).
+   This is zero when  is disabled.
+  
+ 
+
  
   
stats_reset timestamp with time zone
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f8..984cb5764c 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -663,7 +663,9 @@
the WAL buffers in shared memory. If there is no
space for the new record, XLogInsertRecord will have
to write (move to kernel cache) a few filled WAL
-   buffers. This is undesirable because XLogInsertRecord
+   buffers. The number of times it happend is counted as 
+   wal_buffers_full in .
+   This is 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-29 Thread Masahiro Ikeda

On 2021-01-27 00:14, David G. Johnston wrote:

On Mon, Jan 25, 2021 at 11:56 PM Masahiro Ikeda
 wrote:


(wal_write)
The number of times WAL buffers were written out to disk via

XLogWrite




Thanks.

I thought it's better to omit "The" and "XLogWrite" because other
views'
description
omits "The" and there is no description of "XlogWrite" in the
documents.
What do you think?


The documentation for WAL does get into the public API level of detail
and doing so here makes what this measures crystal clear.  The
potential absence of sufficient detail elsewhere should be corrected
instead of making this description more vague.  Specifically, probably
XLogWrite should be added to the WAL overview as part of this update
and probably even have the descriptive section of the documentation
note that the number of times that said function is executed is
exposed as a counter in the wal statistics table - thus closing the
loop.


Thanks for your comments.

I added the descriptions in documents and separated the patch
into attached two patches. First is to add wal i/o activity statistics.
Second is to make the wal receiver report the wal statistics.

Please let me know if you have any comments.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATIONFrom a93516dd9836345f97a6f4081597a7079dff4932 Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Fri, 29 Jan 2021 16:41:34 +0900
Subject: [PATCH 1/2] Add statistics related to write/sync wal records.

This patch adds following statistics to pg_stat_wal view
to track WAL I/O activity.

- the total number of times writing/syncing WAL data.
- the total amount of time spent writing/syncing WAL data.

Since to track I/O timing may leads significant overhead,
GUC parameter "track_wal_io_timing" is introduced.
Only if this is on, the I/O timing is measured.

The statistics related to sync are zero when "wal_sync_method"
is "open_datasync" or "open_sync", because it doesn't call each
sync method.

(This requires a catversion bump, as well as an update to
 PGSTAT_FILE_FORMAT_ID)

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com
---
 doc/src/sgml/config.sgml  | 21 +++
 doc/src/sgml/monitoring.sgml  | 50 
 doc/src/sgml/wal.sgml | 12 +++-
 src/backend/access/transam/xlog.c | 59 ++-
 src/backend/catalog/system_views.sql  |  4 ++
 src/backend/postmaster/checkpointer.c |  2 +-
 src/backend/postmaster/pgstat.c   |  4 ++
 src/backend/postmaster/walwriter.c|  3 +
 src/backend/utils/adt/pgstatfuncs.c   | 24 +++-
 src/backend/utils/misc/guc.c  |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h |  1 +
 src/include/catalog/pg_proc.dat   | 14 ++---
 src/include/pgstat.h  |  8 +++
 src/test/regress/expected/rules.out   |  6 +-
 15 files changed, 202 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f1037df5a9..3f1b3c1715 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7416,6 +7416,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_wal_io_timing (boolean)
+  
+   track_wal_io_timing configuration parameter
+  
+  
+  
+   
+Enables timing of WAL I/O calls. This parameter is off by default,
+because it will repeatedly query the operating system for
+the current time, which may cause significant overhead on some
+platforms.  You can use the  tool to
+measure the overhead of timing on your system.
+I/O timing information is
+displayed in 
+pg_stat_wal.  Only superusers can
+change this setting.
+   
+  
+ 
+
  
   track_functions (enum)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c602ee4427..2435f401db 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3487,6 +3487,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ 
+  
+   wal_write bigint
+  
+  
+   Number of times WAL buffers were written out to disk via 
+   XLogWrite, which nomally called by an 
+   XLogFlush request(see )
+  
+ 
+
+ 
+  
+   wal_write_time double precision
+  
+  
+   Total amount of time spent writing WAL data to disk, excluding sync time unless 
+is ether open_datasync or 
+   open_sync. Units are in milliseconds with microsecond resolution.
+   This is zero when  is disabled.
+  
+ 
+
+ 
+  
+   wal_sync bigint
+  
+  
+   Number of times WAL files were 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-26 Thread David G. Johnston
On Mon, Jan 25, 2021 at 11:56 PM Masahiro Ikeda 
wrote:

>
> > (wal_write)
> > The number of times WAL buffers were written out to disk via XLogWrite
> >
>
> Thanks.
>
> I thought it's better to omit "The" and "XLogWrite" because other views'
> description
> omits "The" and there is no description of "XlogWrite" in the documents.
> What do you think?
>
>
The documentation for WAL does get into the public API level of detail and
doing so here makes what this measures crystal clear.  The potential
absence of sufficient detail elsewhere should be corrected instead of
making this description more vague.  Specifically, probably XLogWrite
should be added to the WAL overview as part of this update and probably
even have the descriptive section of the documentation note that the number
of times that said function is executed is exposed as a counter in the wal
statistics table - thus closing the loop.

David J.


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-25 Thread Masahiro Ikeda

Hi, David.

Thanks for your comments.

On 2021-01-26 08:48, David G. Johnston wrote:

On Mon, Jan 25, 2021 at 8:03 AM Masahiko Sawada
 wrote:


On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda
 wrote:


Hi, thanks for the reviews.

I updated the attached patch.


Thank you for updating the patch!


Your original email with "total number of times" is more correct,
removing the "of times" and just writing "total number of WAL" is not
good wording.

Specifically, this change is strictly worse than the original.

-   Number of times WAL data was written to disk because WAL
buffers became full
+   Total number of WAL data written to disk because WAL buffers
became full

Both have the flaw that they leave implied exactly what it means to
"write WAL to disk".  It is also unclear whether a counter, bytes, or
both, would be more useful here. I've incorporated this into my
documentation suggestions below:
(wal_buffers_full)

-- Revert - the original was better, though maybe add more detail
similar to the below.  I didn't research exactly how this works.


OK, I understood.
I reverted since this is a counter statistics.



(wal_write)
The number of times WAL buffers were written out to disk via XLogWrite



Thanks.

I thought it's better to omit "The" and "XLogWrite" because other views' 
description
omits "The" and there is no description of "XlogWrite" in the documents. 
What do you think?



-- Seems like this should have a bytes version too


Do you mean that we need to separate statistics for wal write?



(wal_write_time)
The amount of time spent writing WAL buffers to disk, excluding sync
time unless the wal_sync_method is either open_datasync or open_sync.
Units are in milliseconds with microsecond resolution.  This is zero
when track_wal_io_timing is disabled.


Thanks, I'll fix it.



(wal_sync)
The number of times WAL files were synced to disk while
wal_sync_method was set to one of the "sync at commit" options (i.e.,
fdatasync, fsync, or fsync_writethrough).


Thanks, I'll fix it.



-- it is not going to be zero just because those settings are
presently disabled as they could have been enabled at some point since
the last time these statistics were reset.


Right, your description is correct.
The "track_wal_io_timing" has the same limitation, doesn't it?



(wal_sync_time)
The amount of time spent syncing WAL files to disk, in milliseconds
with microsecond resolution.  This requires setting wal_sync_method to
one of the "sync at commit" options (i.e., fdatasync, fsync, or
fsync_writethrough).


Thanks, I'll fix it.
I will add the comments related to "track_wal_io_timing".



Also,

I would suggest extracting the changes to postmaster/pgstat.c and
replication/walreceiver.c to a separate patch as you've fundamentally
changed how it behaves with regards to that function and how it
interacts with the WAL receiver.  That seems an entirely separate
topic warranting its own patch and discussion.


OK, I will separate two patches.


On 2021-01-26 08:52, David G. Johnston wrote:

On Mon, Jan 25, 2021 at 4:37 PM Masahiro Ikeda
 wrote:


I agree with your comments. I think it should report when
reaching the end of WAL too. I add the code to report the stats
when finishing the current WAL segment file when timeout in the
main loop and when reaching the end of WAL.


The following is not an improvement:

- /* Send WAL statistics to the stats collector. */+ /* Send WAL
statistics to stats collector */

The word "the" there makes it proper English.  Your copy-pasting
should have kept the existing good wording in the other locations
rather than replace the existing location with the newly added
incorrect wording.


Thanks, I'll fix it.



This doesn't make sense:

* current WAL segment file to avoid loading stats collector.

Maybe "overloading" or "overwhelming"?

I see you removed the pgstat_send_wal(force) change.  The rest of my
comments on the v6 patch still stand I believe.


Yes, "overloading" is right. Thanks.


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-25 Thread David G. Johnston
On Mon, Jan 25, 2021 at 4:37 PM Masahiro Ikeda 
wrote:

>
> I agree with your comments. I think it should report when
> reaching the end of WAL too. I add the code to report the stats
> when finishing the current WAL segment file when timeout in the
> main loop and when reaching the end of WAL.
>
>
The following is not an improvement:

- /* Send WAL statistics to the stats collector. */
+ /* Send WAL statistics to stats collector */

The word "the" there makes it proper English.  Your copy-pasting should
have kept the existing good wording in the other locations rather than
replace the existing location with the newly added incorrect wording.

This doesn't make sense:

* current WAL segment file to avoid loading stats collector.

Maybe "overloading" or "overwhelming"?

I see you removed the pgstat_send_wal(force) change.  The rest of my
comments on the v6 patch still stand I believe.

David J.


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-25 Thread David G. Johnston
On Mon, Jan 25, 2021 at 8:03 AM Masahiko Sawada 
wrote:

> On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda 
> wrote:
> >
> > Hi, thanks for the reviews.
> >
> > I updated the attached patch.
>
> Thank you for updating the patch!
>

Your original email with "total number of times" is more correct, removing
the "of times" and just writing "total number of WAL" is not good wording.

Specifically, this change is strictly worse than the original.

-   Number of times WAL data was written to disk because WAL buffers
became full
+   Total number of WAL data written to disk because WAL buffers became
full

Both have the flaw that they leave implied exactly what it means to "write
WAL to disk".  It is also unclear whether a counter, bytes, or both, would
be more useful here. I've incorporated this into my documentation
suggestions below:

(wal_buffers_full)
-- Revert - the original was better, though maybe add more detail similar
to the below.  I didn't research exactly how this works.

(wal_write)
The number of times WAL buffers were written out to disk via XLogWrite

-- Seems like this should have a bytes version too

(wal_write_time)
The amount of time spent writing WAL buffers to disk, excluding sync time
unless the wal_sync_method is either open_datasync or open_sync.
Units are in milliseconds with microsecond resolution.  This is zero when
track_wal_io_timing is disabled.

(wal_sync)
The number of times WAL files were synced to disk while wal_sync_method was
set to one of the "sync at commit" options (i.e., fdatasync, fsync,
or fsync_writethrough).

-- it is not going to be zero just because those settings are presently
disabled as they could have been enabled at some point since the last time
these statistics were reset.

(wal_sync_time)
The amount of time spent syncing WAL files to disk, in milliseconds with
microsecond resolution.  This requires setting wal_sync_method to one of
the "sync at commit" options (i.e., fdatasync, fsync,
or fsync_writethrough).


Also,

I would suggest extracting the changes to postmaster/pgstat.c and
replication/walreceiver.c to a separate patch as you've fundamentally
changed how it behaves with regards to that function and how it interacts
with the WAL receiver.  That seems an entirely separate topic warranting
its own patch and discussion.

David J.


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-25 Thread Masahiro Ikeda

On 2021-01-26 00:03, Masahiko Sawada wrote:
On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda 
 wrote:


Hi, thanks for the reviews.

I updated the attached patch.


Thank you for updating the patch!


The summary of the changes is following.

1. fix document

I followed another view's comments.


2. refactor issue_xlog_fsync()

I removed "sync_called" variables, narrowed the "duration" scope and
change the switch statement to if statement.


Looking at the code again, I think if we check if an fsync was really
called when calculating the I/O time, it's better to check that before
starting the measurement.

bool issue_fsync = false;

if (enableFsync &&
(sync_method == SYNC_METHOD_FSYNC ||
 sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
 sync_method == SYNC_METHOD_FDATASYNC))
{
if (track_wal_io_timing)
INSTR_TIME_SET_CURRENT(start);
issue_fsync = true;
}
(snip)
if (issue_fsync)
{
if (track_wal_io_timing)
{
instr_time  duration;

INSTR_TIME_SET_CURRENT(duration);
INSTR_TIME_SUBTRACT(duration, start);
WalStats.m_wal_sync_time = 
INSTR_TIME_GET_MICROSEC(duration);

}
WalStats.m_wal_sync++;
}

So I prefer either the above which is a modified version of the
original approach or my idea that doesn’t introduce a new local
variable I proposed before. But I'm not going to insist on that.


Thanks for the comments.
I change the code to the above.




3. make wal-receiver report WAL statistics

I add the code to collect the statistics for a written operation
in XLogWalRcvWrite() and to report stats in WalReceiverMain().

Since WalReceiverMain() can loop fast, to avoid loading stats 
collector,

I add "force" argument to the pgstat_send_wal function. If "force" is
false, it can skip reporting until at least 500 msec since it last
reported. WalReceiverMain() almost calls pgstat_send_wal() with 
"force"

as false.


 void
-pgstat_send_wal(void)
+pgstat_send_wal(bool force)
 {
/* We assume this initializes to zeroes */
static const PgStat_MsgWal all_zeroes;
+   static TimestampTz last_report = 0;

+   TimestampTz now;
WalUsagewalusage;

+   /*
+* Don't send a message unless it's been at least 
PGSTAT_STAT_INTERVAL

+* msec since we last sent one or specified "force".
+*/
+   now = GetCurrentTimestamp();
+   if (!force &&
+   !TimestampDifferenceExceeds(last_report, now, 
PGSTAT_STAT_INTERVAL))

+   return;
+
+   last_report = now;

Hmm, I don’t think it's good to use PGSTAT_STAT_INTERVAL for this
purpose since it is used as a minimum time for stats file updates. If
we want an interval, I think we should define another one Also, with
the patch, pgstat_send_wal() calls GetCurrentTimestamp() every time
even if track_wal_io_timing is off, which is not good. On the other
hand, I agree that your concern that the wal receiver should not send
the stats for whenever receiving wal records. So an idea could be to
send the wal stats when finishing the current WAL segment file and
when timeout in the main loop. That way we can guarantee that the wal
stats on a replica is updated at least every time finishing a WAL
segment file when actively receiving WAL records and every
NAPTIME_PER_CYCLE in other cases.


I agree with your comments. I think it should report when
reaching the end of WAL too. I add the code to report the stats
when finishing the current WAL segment file when timeout in the
main loop and when reaching the end of WAL.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION--- v6-0001-Add-statistics-related-to-write-sync-wal-records.patch	2021-01-25 16:27:50.749429666 +0900
+++ v7-0001-Add-statistics-related-to-write-sync-wal-records.patch	2021-01-26 08:19:48.269760642 +0900
@@ -1,6 +1,6 @@
-From e9aad92097c5cff5565b67ce1a8ec6d7b4c8a4d9 Mon Sep 17 00:00:00 2001
+From 02f0888efeb09ae641d9ef905788d995d687c56f Mon Sep 17 00:00:00 2001
 From: Masahiro Ikeda 
-Date: Mon, 25 Jan 2021 16:26:04 +0900
+Date: Tue, 26 Jan 2021 08:18:37 +0900
 Subject: [PATCH] Add statistics related to write/sync wal records.
 
 This patch adds following statistics to pg_stat_wal view
@@ -22,24 +22,24 @@
 Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada
 Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com
 
-(This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID)
+(This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID
 ---
- doc/src/sgml/config.sgml  | 21 
- doc/src/sgml/monitoring.sgml  | 48 -
- src/backend/access/transam/xlog.c | 51 ++-
+ doc/src/sgml/config.sgml  | 21 +++
+ doc/src/sgml/monitoring.sgml  | 48 +++-
+ src/backend/access/transam/xlog.c | 56 ++-
  src/backend/catalog/system_views.sql  |  4 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-25 Thread Masahiko Sawada
On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda  wrote:
>
> Hi, thanks for the reviews.
>
> I updated the attached patch.

Thank you for updating the patch!

> The summary of the changes is following.
>
> 1. fix document
>
> I followed another view's comments.
>
>
> 2. refactor issue_xlog_fsync()
>
> I removed "sync_called" variables, narrowed the "duration" scope and
> change the switch statement to if statement.

Looking at the code again, I think if we check if an fsync was really
called when calculating the I/O time, it's better to check that before
starting the measurement.

bool issue_fsync = false;

if (enableFsync &&
(sync_method == SYNC_METHOD_FSYNC ||
 sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
 sync_method == SYNC_METHOD_FDATASYNC))
{
if (track_wal_io_timing)
INSTR_TIME_SET_CURRENT(start);
issue_fsync = true;
}
(snip)
if (issue_fsync)
{
if (track_wal_io_timing)
{
instr_time  duration;

INSTR_TIME_SET_CURRENT(duration);
INSTR_TIME_SUBTRACT(duration, start);
WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration);
}
WalStats.m_wal_sync++;
}

So I prefer either the above which is a modified version of the
original approach or my idea that doesn’t introduce a new local
variable I proposed before. But I'm not going to insist on that.

>
>
> 3. make wal-receiver report WAL statistics
>
> I add the code to collect the statistics for a written operation
> in XLogWalRcvWrite() and to report stats in WalReceiverMain().
>
> Since WalReceiverMain() can loop fast, to avoid loading stats collector,
> I add "force" argument to the pgstat_send_wal function. If "force" is
> false, it can skip reporting until at least 500 msec since it last
> reported. WalReceiverMain() almost calls pgstat_send_wal() with "force"
> as false.

 void
-pgstat_send_wal(void)
+pgstat_send_wal(bool force)
 {
/* We assume this initializes to zeroes */
static const PgStat_MsgWal all_zeroes;
+   static TimestampTz last_report = 0;

+   TimestampTz now;
WalUsagewalusage;

+   /*
+* Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
+* msec since we last sent one or specified "force".
+*/
+   now = GetCurrentTimestamp();
+   if (!force &&
+   !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
+   return;
+
+   last_report = now;

Hmm, I don’t think it's good to use PGSTAT_STAT_INTERVAL for this
purpose since it is used as a minimum time for stats file updates. If
we want an interval, I think we should define another one Also, with
the patch, pgstat_send_wal() calls GetCurrentTimestamp() every time
even if track_wal_io_timing is off, which is not good. On the other
hand, I agree that your concern that the wal receiver should not send
the stats for whenever receiving wal records. So an idea could be to
send the wal stats when finishing the current WAL segment file and
when timeout in the main loop. That way we can guarantee that the wal
stats on a replica is updated at least every time finishing a WAL
segment file when actively receiving WAL records and every
NAPTIME_PER_CYCLE in other cases.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-24 Thread Masahiro Ikeda

Hi, thanks for the reviews.

I updated the attached patch.
The summary of the changes is following.

1. fix document

I followed another view's comments.


2. refactor issue_xlog_fsync()

I removed "sync_called" variables, narrowed the "duration" scope and
change the switch statement to if statement.


3. make wal-receiver report WAL statistics

I add the code to collect the statistics for a written operation
in XLogWalRcvWrite() and to report stats in WalReceiverMain().

Since WalReceiverMain() can loop fast, to avoid loading stats collector,
I add "force" argument to the pgstat_send_wal function. If "force" is
false, it can skip reporting until at least 500 msec since it last 
reported. WalReceiverMain() almost calls pgstat_send_wal() with "force" 
as false.


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION--- v5-0001-Add-statistics-related-to-write-sync-wal-records.patch	2021-01-23 09:26:01.919248712 +0900
+++ v6-0001-Add-statistics-related-to-write-sync-wal-records.patch	2021-01-25 16:27:50.749429666 +0900
@@ -1,6 +1,6 @@
-From ee1b7d17391b9d9619f709afeacdd118973471d6 Mon Sep 17 00:00:00 2001
+From e9aad92097c5cff5565b67ce1a8ec6d7b4c8a4d9 Mon Sep 17 00:00:00 2001
 From: Masahiro Ikeda 
-Date: Fri, 22 Jan 2021 21:38:31 +0900
+Date: Mon, 25 Jan 2021 16:26:04 +0900
 Subject: [PATCH] Add statistics related to write/sync wal records.
 
 This patch adds following statistics to pg_stat_wal view
@@ -24,20 +24,22 @@
 
 (This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID)
 ---
- doc/src/sgml/config.sgml  | 21 +++
- doc/src/sgml/monitoring.sgml  | 48 ++-
- src/backend/access/transam/xlog.c | 59 ++-
+ doc/src/sgml/config.sgml  | 21 
+ doc/src/sgml/monitoring.sgml  | 48 -
+ src/backend/access/transam/xlog.c | 51 ++-
  src/backend/catalog/system_views.sql  |  4 ++
- src/backend/postmaster/pgstat.c   |  4 ++
- src/backend/postmaster/walwriter.c|  3 +
- src/backend/utils/adt/pgstatfuncs.c   | 24 +++-
- src/backend/utils/misc/guc.c  |  9 +++
+ src/backend/postmaster/checkpointer.c |  2 +-
+ src/backend/postmaster/pgstat.c   | 26 --
+ src/backend/postmaster/walwriter.c|  3 ++
+ src/backend/replication/walreceiver.c | 30 +++
+ src/backend/utils/adt/pgstatfuncs.c   | 24 +++--
+ src/backend/utils/misc/guc.c  |  9 
  src/backend/utils/misc/postgresql.conf.sample |  1 +
  src/include/access/xlog.h |  1 +
  src/include/catalog/pg_proc.dat   | 14 ++---
- src/include/pgstat.h  |  8 +++
- src/test/regress/expected/rules.out   |  6 +-
- 13 files changed, 189 insertions(+), 13 deletions(-)
+ src/include/pgstat.h  | 10 +++-
+ src/test/regress/expected/rules.out   |  6 ++-
+ 15 files changed, 232 insertions(+), 18 deletions(-)
 
 diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
 index 82864bbb24..43f3fbcaf8 100644
@@ -72,7 +74,7 @@
track_functions (enum)

 diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
-index f05140dd42..36764dc64f 100644
+index f05140dd42..5a8fc4eb0c 100644
 --- a/doc/src/sgml/monitoring.sgml
 +++ b/doc/src/sgml/monitoring.sgml
 @@ -3485,7 +3485,53 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
@@ -80,7 +82,7 @@


 -   Number of times WAL data was written to disk because WAL buffers became full
-+   Total number of times WAL data was written to disk because WAL buffers became full
++   Total number of WAL data written to disk because WAL buffers became full
 +  
 + 
 +
@@ -89,7 +91,7 @@
 +   wal_write bigint
 +  
 +  
-+   Total number of times WAL data was written to disk
++   Total number of WAL data written to disk
 +  
 + 
 +
@@ -109,7 +111,7 @@
 +   wal_sync bigint
 +  
 +  
-+   Total number of times WAL data was synced to disk
++   Total number of WAL data synced to disk
 +   (if  is open_datasync or 
 +   open_sync, this value is zero because WAL data is synced 
 +   when to write it).
@@ -131,7 +133,7 @@
   
  
 diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
-index 470e113b33..b3890f97a2 100644
+index 470e113b33..1c4860bee7 100644
 --- a/src/backend/access/transam/xlog.c
 +++ b/src/backend/access/transam/xlog.c
 @@ -110,6 +110,7 @@ int			CommitDelay = 0;	/* precommit delay in microseconds */
@@ -142,16 +144,15 @@
  
  #ifdef WAL_DEBUG
  bool		XLOG_DEBUG = false;
-@@ -2540,6 +2541,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
+@@ -2540,6 +2541,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
  			Size		nbytes;
  			Size		nleft;
  			int	

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-24 Thread Masahiko Sawada
On Mon, Jan 25, 2021 at 1:28 PM Masahiro Ikeda  wrote:
>
> On 2021-01-25 13:15, Masahiro Ikeda wrote:
> > On 2021-01-25 10:36, Masahiko Sawada wrote:
> >> On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda
> >>  wrote:
> >>>
> >>> On 2021-01-22 14:50, Masahiko Sawada wrote:
> >>> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda
> >>> >  wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I rebased the patch to the master branch.
> >>> >
> >>> > Thank you for working on this. I've read the latest patch. Here are
> >>> > comments:
> >>> >
> >>> > ---
> >>> > +   if (track_wal_io_timing)
> >>> > +   {
> >>> > +   INSTR_TIME_SET_CURRENT(duration);
> >>> > +   INSTR_TIME_SUBTRACT(duration, start);
> >>> > +   WalStats.m_wal_write_time +=
> >>> > INSTR_TIME_GET_MILLISEC(duration);
> >>> > +   }
> >>> >
> >>> > * I think it should add the time in micro sec.
> >>> > After running pgbench with track_wal_io_timing = on for 30 sec,
> >>> > pg_stat_wal showed the following on my environment:
> >>> >
> >>> > postgres(1:61569)=# select * from pg_stat_wal;
> >>> > -[ RECORD 1 ]+-
> >>> > wal_records  | 285947
> >>> > wal_fpi  | 53285
> >>> > wal_bytes| 442008213
> >>> > wal_buffers_full | 0
> >>> > wal_write| 25516
> >>> > wal_write_time   | 0
> >>> > wal_sync | 25437
> >>> > wal_sync_time| 14490
> >>> > stats_reset  | 2021-01-22 10:56:13.29464+09
> >>> >
> >>> > Since writes can complete less than a millisecond, wal_write_time
> >>> > didn't increase. I think sync_time could also have the same problem.
> >>>
> >>> Thanks for your comments. I didn't notice that.
> >>> I changed the unit from milliseconds to microseconds.
> >>>
> >>> > ---
> >>> > +   /*
> >>> > +* Measure i/o timing to fsync WAL data.
> >>> > +*
> >>> > +* The wal receiver skip to collect it to avoid performance
> >>> > degradation of standy servers.
> >>> > +* If sync_method doesn't have its fsync method, to skip too.
> >>> > +*/
> >>> > +   if (!AmWalReceiverProcess() && track_wal_io_timing &&
> >>> > fsyncMethodCalled())
> >>> > +   INSTR_TIME_SET_CURRENT(start);
> >>> >
> >>> > * Why does only the wal receiver skip it even if track_wal_io_timinig
> >>> > is true? I think the performance degradation is also true for backend
> >>> > processes. If there is another reason for that, I think it's better to
> >>> > mention in both the doc and comment.
> >>> > * How about checking track_wal_io_timing first?
> >>> > * s/standy/standby/
> >>>
> >>> I fixed it.
> >>> As kuroda-san mentioned too, the skip is no need to be considered.
> >>
> >> I think you also removed the code to have the wal receiver report the
> >> stats. So with the latest patch, the wal receiver tracks those
> >> statistics but doesn't report.
> >> And maybe XLogWalRcvWrite() also needs to track I/O?
> >
> > Thanks, I forgot to add them.
> > I'll fix it.
> >
> >
> >>>
> >>> > ---
> >>> > +   /* increment the i/o timing and the number of times to fsync WAL
> >>> > data */
> >>> > +   if (fsyncMethodCalled())
> >>> > +   {
> >>> > +   if (!AmWalReceiverProcess() && track_wal_io_timing)
> >>> > +   {
> >>> > +   INSTR_TIME_SET_CURRENT(duration);
> >>> > +   INSTR_TIME_SUBTRACT(duration, start);
> >>> > +   WalStats.m_wal_sync_time +=
> >>> > INSTR_TIME_GET_MILLISEC(duration);
> >>> > +   }
> >>> > +
> >>> > +   WalStats.m_wal_sync++;
> >>> > +   }
> >>> >
> >>> > * I'd avoid always calling fsyncMethodCalled() in this path. How about
> >>> > incrementing m_wal_sync after each sync operation?
> >>>
> >>> I think if syncing the disk does not occur, m_wal_sync should not be
> >>> incremented.
> >>> It depends enableFsync and sync_method.
> >>>
> >>> enableFsync is checked in each fsync method like
> >>> pg_fsync_no_writethrough(),
> >>> so if incrementing m_wal_sync after each sync operation, it should be
> >>> implemented
> >>> in each fsync method. It leads to many duplicated codes.
> >>
> >> Right. I missed that each fsync function checks enableFsync.
> >>
> >>> So, why don't you change the function to a flag whether to
> >>> sync data to the disk will be occurred or not in issue_xlog_fsync()?
> >>
> >> Looks better. Since we don't necessarily need to increment m_wal_sync
> >> after doing fsync we can write the code without an additional variable
> >> as follows:
> >>
> >> if (enableFsync)
> >> {
> >> switch (sync_method)
> >> {
> >> case SYNC_METHOD_FSYNC:
> >> #ifdef HAVE_FSYNC_WRITETHROUGH
> >> case SYNC_METHOD_FSYNC_WRITETHROUGH:
> >> #endif
> >> #ifdef HAVE_FDATASYNC
> >> case SYNC_METHOD_FDATASYNC:
> >> #endif
> >> WalStats.m_wal_sync++;
> >> if (track_wal_io_timing)
> >> INSTR_TIME_SET_CURRENT(start);
> >> break;
> >> default:
> >>

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-24 Thread Masahiro Ikeda

On 2021-01-25 13:15, Masahiro Ikeda wrote:

On 2021-01-25 10:36, Masahiko Sawada wrote:

On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda
 wrote:


On 2021-01-22 14:50, Masahiko Sawada wrote:
> On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda
>  wrote:
>>
>> Hi,
>>
>> I rebased the patch to the master branch.
>
> Thank you for working on this. I've read the latest patch. Here are
> comments:
>
> ---
> +   if (track_wal_io_timing)
> +   {
> +   INSTR_TIME_SET_CURRENT(duration);
> +   INSTR_TIME_SUBTRACT(duration, start);
> +   WalStats.m_wal_write_time +=
> INSTR_TIME_GET_MILLISEC(duration);
> +   }
>
> * I think it should add the time in micro sec.
> After running pgbench with track_wal_io_timing = on for 30 sec,
> pg_stat_wal showed the following on my environment:
>
> postgres(1:61569)=# select * from pg_stat_wal;
> -[ RECORD 1 ]+-
> wal_records  | 285947
> wal_fpi  | 53285
> wal_bytes| 442008213
> wal_buffers_full | 0
> wal_write| 25516
> wal_write_time   | 0
> wal_sync | 25437
> wal_sync_time| 14490
> stats_reset  | 2021-01-22 10:56:13.29464+09
>
> Since writes can complete less than a millisecond, wal_write_time
> didn't increase. I think sync_time could also have the same problem.

Thanks for your comments. I didn't notice that.
I changed the unit from milliseconds to microseconds.

> ---
> +   /*
> +* Measure i/o timing to fsync WAL data.
> +*
> +* The wal receiver skip to collect it to avoid performance
> degradation of standy servers.
> +* If sync_method doesn't have its fsync method, to skip too.
> +*/
> +   if (!AmWalReceiverProcess() && track_wal_io_timing &&
> fsyncMethodCalled())
> +   INSTR_TIME_SET_CURRENT(start);
>
> * Why does only the wal receiver skip it even if track_wal_io_timinig
> is true? I think the performance degradation is also true for backend
> processes. If there is another reason for that, I think it's better to
> mention in both the doc and comment.
> * How about checking track_wal_io_timing first?
> * s/standy/standby/

I fixed it.
As kuroda-san mentioned too, the skip is no need to be considered.


I think you also removed the code to have the wal receiver report the
stats. So with the latest patch, the wal receiver tracks those
statistics but doesn't report.
And maybe XLogWalRcvWrite() also needs to track I/O?


Thanks, I forgot to add them.
I'll fix it.




> ---
> +   /* increment the i/o timing and the number of times to fsync WAL
> data */
> +   if (fsyncMethodCalled())
> +   {
> +   if (!AmWalReceiverProcess() && track_wal_io_timing)
> +   {
> +   INSTR_TIME_SET_CURRENT(duration);
> +   INSTR_TIME_SUBTRACT(duration, start);
> +   WalStats.m_wal_sync_time +=
> INSTR_TIME_GET_MILLISEC(duration);
> +   }
> +
> +   WalStats.m_wal_sync++;
> +   }
>
> * I'd avoid always calling fsyncMethodCalled() in this path. How about
> incrementing m_wal_sync after each sync operation?

I think if syncing the disk does not occur, m_wal_sync should not be
incremented.
It depends enableFsync and sync_method.

enableFsync is checked in each fsync method like
pg_fsync_no_writethrough(),
so if incrementing m_wal_sync after each sync operation, it should be
implemented
in each fsync method. It leads to many duplicated codes.


Right. I missed that each fsync function checks enableFsync.


So, why don't you change the function to a flag whether to
sync data to the disk will be occurred or not in issue_xlog_fsync()?


Looks better. Since we don't necessarily need to increment m_wal_sync
after doing fsync we can write the code without an additional variable
as follows:

if (enableFsync)
{
switch (sync_method)
{
case SYNC_METHOD_FSYNC:
#ifdef HAVE_FSYNC_WRITETHROUGH
case SYNC_METHOD_FSYNC_WRITETHROUGH:
#endif
#ifdef HAVE_FDATASYNC
case SYNC_METHOD_FDATASYNC:
#endif
WalStats.m_wal_sync++;
if (track_wal_io_timing)
INSTR_TIME_SET_CURRENT(start);
break;
default:
break;
}
}

  (do fsync and error handling here)

   /* increment the i/o timing and the number of times to fsync WAL 
data */

   if (track_wal_io_timing)
   {
   INSTR_TIME_SET_CURRENT(duration);
   INSTR_TIME_SUBTRACT(duration, start);
   WalStats.m_wal_sync_time =  INSTR_TIME_GET_MICROSEC(duration);
   }


IIUC, I think we can't handle the following case.

When "sync_method" is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC and
"track_wal_io_timing" is enabled, "start" doesn't be initialized.

My understanding is something wrong, isn't it?


I thought the following is better.


```
/* Measure i/o timing to sync WAL data.*/
if (track_wal_io_timing)
INSTR_TIME_SET_CURRENT(start);

  (do fsync and error handling 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-24 Thread Masahiro Ikeda

On 2021-01-25 11:47, japin wrote:
On Mon, 25 Jan 2021 at 09:36, Masahiko Sawada  
wrote:

On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda
 wrote:


On 2021-01-22 14:50, Masahiko Sawada wrote:
> On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda
>  wrote:
>>
>> Hi,
>>
>> I rebased the patch to the master branch.
>
> Thank you for working on this. I've read the latest patch. Here are
> comments:
>
> ---
> +   if (track_wal_io_timing)
> +   {
> +   INSTR_TIME_SET_CURRENT(duration);
> +   INSTR_TIME_SUBTRACT(duration, start);
> +   WalStats.m_wal_write_time +=
> INSTR_TIME_GET_MILLISEC(duration);
> +   }
>
> * I think it should add the time in micro sec.
> After running pgbench with track_wal_io_timing = on for 30 sec,
> pg_stat_wal showed the following on my environment:
>
> postgres(1:61569)=# select * from pg_stat_wal;
> -[ RECORD 1 ]+-
> wal_records  | 285947
> wal_fpi  | 53285
> wal_bytes| 442008213
> wal_buffers_full | 0
> wal_write| 25516
> wal_write_time   | 0
> wal_sync | 25437
> wal_sync_time| 14490
> stats_reset  | 2021-01-22 10:56:13.29464+09
>
> Since writes can complete less than a millisecond, wal_write_time
> didn't increase. I think sync_time could also have the same problem.

Thanks for your comments. I didn't notice that.
I changed the unit from milliseconds to microseconds.

> ---
> +   /*
> +* Measure i/o timing to fsync WAL data.
> +*
> +* The wal receiver skip to collect it to avoid performance
> degradation of standy servers.
> +* If sync_method doesn't have its fsync method, to skip too.
> +*/
> +   if (!AmWalReceiverProcess() && track_wal_io_timing &&
> fsyncMethodCalled())
> +   INSTR_TIME_SET_CURRENT(start);
>
> * Why does only the wal receiver skip it even if track_wal_io_timinig
> is true? I think the performance degradation is also true for backend
> processes. If there is another reason for that, I think it's better to
> mention in both the doc and comment.
> * How about checking track_wal_io_timing first?
> * s/standy/standby/

I fixed it.
As kuroda-san mentioned too, the skip is no need to be considered.


I think you also removed the code to have the wal receiver report the
stats. So with the latest patch, the wal receiver tracks those
statistics but doesn't report.

And maybe XLogWalRcvWrite() also needs to track I/O?



> ---
> +   /* increment the i/o timing and the number of times to fsync WAL
> data */
> +   if (fsyncMethodCalled())
> +   {
> +   if (!AmWalReceiverProcess() && track_wal_io_timing)
> +   {
> +   INSTR_TIME_SET_CURRENT(duration);
> +   INSTR_TIME_SUBTRACT(duration, start);
> +   WalStats.m_wal_sync_time +=
> INSTR_TIME_GET_MILLISEC(duration);
> +   }
> +
> +   WalStats.m_wal_sync++;
> +   }
>
> * I'd avoid always calling fsyncMethodCalled() in this path. How about
> incrementing m_wal_sync after each sync operation?

I think if syncing the disk does not occur, m_wal_sync should not be
incremented.
It depends enableFsync and sync_method.

enableFsync is checked in each fsync method like
pg_fsync_no_writethrough(),
so if incrementing m_wal_sync after each sync operation, it should be
implemented
in each fsync method. It leads to many duplicated codes.


Right. I missed that each fsync function checks enableFsync.


So, why don't you change the function to a flag whether to
sync data to the disk will be occurred or not in issue_xlog_fsync()?


Looks better. Since we don't necessarily need to increment m_wal_sync
after doing fsync we can write the code without an additional variable
as follows:

if (enableFsync)
{
switch (sync_method)
{
case SYNC_METHOD_FSYNC:
#ifdef HAVE_FSYNC_WRITETHROUGH
case SYNC_METHOD_FSYNC_WRITETHROUGH:
#endif
#ifdef HAVE_FDATASYNC
case SYNC_METHOD_FDATASYNC:
#endif
WalStats.m_wal_sync++;
if (track_wal_io_timing)
INSTR_TIME_SET_CURRENT(start);
break;
default:
break;
}
}

  (do fsync and error handling here)

   /* increment the i/o timing and the number of times to fsync WAL 
data */

   if (track_wal_io_timing)
   {
   INSTR_TIME_SET_CURRENT(duration);
   INSTR_TIME_SUBTRACT(duration, start);
   WalStats.m_wal_sync_time =  INSTR_TIME_GET_MICROSEC(duration);
   }

I think we can change the first switch-case to an if statement.



+1. We can also narrow the scope of "duration" into "if
(track_wal_io_timing)" branch.


Thanks, I'll change it.

--
Masahiro Ikeda
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-24 Thread Masahiro Ikeda

On 2021-01-25 10:36, Masahiko Sawada wrote:

On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda
 wrote:


On 2021-01-22 14:50, Masahiko Sawada wrote:
> On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda
>  wrote:
>>
>> Hi,
>>
>> I rebased the patch to the master branch.
>
> Thank you for working on this. I've read the latest patch. Here are
> comments:
>
> ---
> +   if (track_wal_io_timing)
> +   {
> +   INSTR_TIME_SET_CURRENT(duration);
> +   INSTR_TIME_SUBTRACT(duration, start);
> +   WalStats.m_wal_write_time +=
> INSTR_TIME_GET_MILLISEC(duration);
> +   }
>
> * I think it should add the time in micro sec.
> After running pgbench with track_wal_io_timing = on for 30 sec,
> pg_stat_wal showed the following on my environment:
>
> postgres(1:61569)=# select * from pg_stat_wal;
> -[ RECORD 1 ]+-
> wal_records  | 285947
> wal_fpi  | 53285
> wal_bytes| 442008213
> wal_buffers_full | 0
> wal_write| 25516
> wal_write_time   | 0
> wal_sync | 25437
> wal_sync_time| 14490
> stats_reset  | 2021-01-22 10:56:13.29464+09
>
> Since writes can complete less than a millisecond, wal_write_time
> didn't increase. I think sync_time could also have the same problem.

Thanks for your comments. I didn't notice that.
I changed the unit from milliseconds to microseconds.

> ---
> +   /*
> +* Measure i/o timing to fsync WAL data.
> +*
> +* The wal receiver skip to collect it to avoid performance
> degradation of standy servers.
> +* If sync_method doesn't have its fsync method, to skip too.
> +*/
> +   if (!AmWalReceiverProcess() && track_wal_io_timing &&
> fsyncMethodCalled())
> +   INSTR_TIME_SET_CURRENT(start);
>
> * Why does only the wal receiver skip it even if track_wal_io_timinig
> is true? I think the performance degradation is also true for backend
> processes. If there is another reason for that, I think it's better to
> mention in both the doc and comment.
> * How about checking track_wal_io_timing first?
> * s/standy/standby/

I fixed it.
As kuroda-san mentioned too, the skip is no need to be considered.


I think you also removed the code to have the wal receiver report the
stats. So with the latest patch, the wal receiver tracks those
statistics but doesn't report.
And maybe XLogWalRcvWrite() also needs to track I/O?


Thanks, I forgot to add them.
I'll fix it.




> ---
> +   /* increment the i/o timing and the number of times to fsync WAL
> data */
> +   if (fsyncMethodCalled())
> +   {
> +   if (!AmWalReceiverProcess() && track_wal_io_timing)
> +   {
> +   INSTR_TIME_SET_CURRENT(duration);
> +   INSTR_TIME_SUBTRACT(duration, start);
> +   WalStats.m_wal_sync_time +=
> INSTR_TIME_GET_MILLISEC(duration);
> +   }
> +
> +   WalStats.m_wal_sync++;
> +   }
>
> * I'd avoid always calling fsyncMethodCalled() in this path. How about
> incrementing m_wal_sync after each sync operation?

I think if syncing the disk does not occur, m_wal_sync should not be
incremented.
It depends enableFsync and sync_method.

enableFsync is checked in each fsync method like
pg_fsync_no_writethrough(),
so if incrementing m_wal_sync after each sync operation, it should be
implemented
in each fsync method. It leads to many duplicated codes.


Right. I missed that each fsync function checks enableFsync.


So, why don't you change the function to a flag whether to
sync data to the disk will be occurred or not in issue_xlog_fsync()?


Looks better. Since we don't necessarily need to increment m_wal_sync
after doing fsync we can write the code without an additional variable
as follows:

if (enableFsync)
{
switch (sync_method)
{
case SYNC_METHOD_FSYNC:
#ifdef HAVE_FSYNC_WRITETHROUGH
case SYNC_METHOD_FSYNC_WRITETHROUGH:
#endif
#ifdef HAVE_FDATASYNC
case SYNC_METHOD_FDATASYNC:
#endif
WalStats.m_wal_sync++;
if (track_wal_io_timing)
INSTR_TIME_SET_CURRENT(start);
break;
default:
break;
}
}

  (do fsync and error handling here)

   /* increment the i/o timing and the number of times to fsync WAL 
data */

   if (track_wal_io_timing)
   {
   INSTR_TIME_SET_CURRENT(duration);
   INSTR_TIME_SUBTRACT(duration, start);
   WalStats.m_wal_sync_time =  INSTR_TIME_GET_MICROSEC(duration);
   }


IIUC, I think we can't handle the following case.

When "sync_method" is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC and
"track_wal_io_timing" is enabled, "start" doesn't be initialized.

My understanding is something wrong, isn't it?



I think we can change the first switch-case to an if statement.


Yes, I'll change it.


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




RE: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-24 Thread kuroda.hay...@fujitsu.com
Dear Ikeda-san,

> I checked columns' descriptions of other views.
> There are "Number of xxx", "Total number of xxx", "Total amount of time 
> that xxx" and "Total time spent xxx".

Right.

> Since the "time" is used for showing spending time, not count,
> I'll change it to "Total number of WAL data written/synced to disk".
> Thought?

I misread your patch, sorry. I prefer your suggestion.
Please fix like that way with others.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





RE: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-24 Thread Masahiro Ikeda

On 2021-01-25 10:34, kuroda.hay...@fujitsu.com wrote:

Dear Ikeda-san,

Thank you for updating the patch. This can be applied to master, and
can be used on my RHEL7.
wal_write_time and wal_sync_time increase normally :-).

```
postgres=# select * from pg_stat_wal;
-[ RECORD 1 ]+--
wal_records  | 121781
wal_fpi  | 2287
wal_bytes| 36055146
wal_buffers_full | 799
wal_write| 12770
wal_write_time   | 4.469
wal_sync | 11962
wal_sync_time| 132.352
stats_reset  | 2021-01-25 00:51:40.674412+00
```


Thanks for checking.


I put a further comment:

```
@@ -3485,7 +3485,53 @@ SELECT pid, wait_event_type, wait_event FROM
pg_stat_activity WHERE wait_event i
wal_buffers_full bigint
   
   
-   Number of times WAL data was written to disk because WAL
buffers became full
+   Total number of times WAL data was written to disk because WAL
buffers became full
+  
+ 
+
+ 
+  role="column_definition">

+   wal_write bigint
+  
+  
+   Total number of times WAL data was written to disk
+  
+ 
+
+ 
+  role="column_definition">
+   wal_write_time double 
precision

+  
+  
+   Total amount of time that has been spent in the portion of
+   WAL data was written to disk, in milliseconds
+   (if  is enabled,
otherwise zero).
+  
+ 
+
+ 
+  role="column_definition">

+   wal_sync bigint
+  
+  
+   Total number of times WAL data was synced to disk
+   (if  is
open_datasync or
+   open_sync, this value is zero because WAL
data is synced
+   when to write it).
+  
+ 
+
+ 
+  role="column_definition">
+   wal_sync_time double 
precision

+  
+  
+   Total amount of time that has been spent in the portion of
+   WAL data was synced to disk, in milliseconds
+   (if  is enabled,
otherwise zero.
+   if  is
open_datasync or
+   open_sync, this value is zero too because
WAL data is synced
+   when to write it).
   
  
 ```

Maybe "Total amount of time" should be used, not "Total number of 
time."

Other views use "amount."


Thanks.

I checked columns' descriptions of other views.
There are "Number of xxx", "Total number of xxx", "Total amount of time 
that xxx" and "Total time spent xxx".


Since the "time" is used for showing spending time, not count,
I'll change it to "Total number of WAL data written/synced to disk".
Thought?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-24 Thread japin


On Mon, 25 Jan 2021 at 09:36, Masahiko Sawada  wrote:
> On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda
>  wrote:
>>
>> On 2021-01-22 14:50, Masahiko Sawada wrote:
>> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda
>> >  wrote:
>> >>
>> >> Hi,
>> >>
>> >> I rebased the patch to the master branch.
>> >
>> > Thank you for working on this. I've read the latest patch. Here are
>> > comments:
>> >
>> > ---
>> > +   if (track_wal_io_timing)
>> > +   {
>> > +   INSTR_TIME_SET_CURRENT(duration);
>> > +   INSTR_TIME_SUBTRACT(duration, start);
>> > +   WalStats.m_wal_write_time +=
>> > INSTR_TIME_GET_MILLISEC(duration);
>> > +   }
>> >
>> > * I think it should add the time in micro sec.
>> > After running pgbench with track_wal_io_timing = on for 30 sec,
>> > pg_stat_wal showed the following on my environment:
>> >
>> > postgres(1:61569)=# select * from pg_stat_wal;
>> > -[ RECORD 1 ]+-
>> > wal_records  | 285947
>> > wal_fpi  | 53285
>> > wal_bytes| 442008213
>> > wal_buffers_full | 0
>> > wal_write| 25516
>> > wal_write_time   | 0
>> > wal_sync | 25437
>> > wal_sync_time| 14490
>> > stats_reset  | 2021-01-22 10:56:13.29464+09
>> >
>> > Since writes can complete less than a millisecond, wal_write_time
>> > didn't increase. I think sync_time could also have the same problem.
>>
>> Thanks for your comments. I didn't notice that.
>> I changed the unit from milliseconds to microseconds.
>>
>> > ---
>> > +   /*
>> > +* Measure i/o timing to fsync WAL data.
>> > +*
>> > +* The wal receiver skip to collect it to avoid performance
>> > degradation of standy servers.
>> > +* If sync_method doesn't have its fsync method, to skip too.
>> > +*/
>> > +   if (!AmWalReceiverProcess() && track_wal_io_timing &&
>> > fsyncMethodCalled())
>> > +   INSTR_TIME_SET_CURRENT(start);
>> >
>> > * Why does only the wal receiver skip it even if track_wal_io_timinig
>> > is true? I think the performance degradation is also true for backend
>> > processes. If there is another reason for that, I think it's better to
>> > mention in both the doc and comment.
>> > * How about checking track_wal_io_timing first?
>> > * s/standy/standby/
>>
>> I fixed it.
>> As kuroda-san mentioned too, the skip is no need to be considered.
>
> I think you also removed the code to have the wal receiver report the
> stats. So with the latest patch, the wal receiver tracks those
> statistics but doesn't report.
>
> And maybe XLogWalRcvWrite() also needs to track I/O?
>
>>
>> > ---
>> > +   /* increment the i/o timing and the number of times to fsync WAL
>> > data */
>> > +   if (fsyncMethodCalled())
>> > +   {
>> > +   if (!AmWalReceiverProcess() && track_wal_io_timing)
>> > +   {
>> > +   INSTR_TIME_SET_CURRENT(duration);
>> > +   INSTR_TIME_SUBTRACT(duration, start);
>> > +   WalStats.m_wal_sync_time +=
>> > INSTR_TIME_GET_MILLISEC(duration);
>> > +   }
>> > +
>> > +   WalStats.m_wal_sync++;
>> > +   }
>> >
>> > * I'd avoid always calling fsyncMethodCalled() in this path. How about
>> > incrementing m_wal_sync after each sync operation?
>>
>> I think if syncing the disk does not occur, m_wal_sync should not be
>> incremented.
>> It depends enableFsync and sync_method.
>>
>> enableFsync is checked in each fsync method like
>> pg_fsync_no_writethrough(),
>> so if incrementing m_wal_sync after each sync operation, it should be
>> implemented
>> in each fsync method. It leads to many duplicated codes.
>
> Right. I missed that each fsync function checks enableFsync.
>
>> So, why don't you change the function to a flag whether to
>> sync data to the disk will be occurred or not in issue_xlog_fsync()?
>
> Looks better. Since we don't necessarily need to increment m_wal_sync
> after doing fsync we can write the code without an additional variable
> as follows:
>
> if (enableFsync)
> {
> switch (sync_method)
> {
> case SYNC_METHOD_FSYNC:
> #ifdef HAVE_FSYNC_WRITETHROUGH
> case SYNC_METHOD_FSYNC_WRITETHROUGH:
> #endif
> #ifdef HAVE_FDATASYNC
> case SYNC_METHOD_FDATASYNC:
> #endif
> WalStats.m_wal_sync++;
> if (track_wal_io_timing)
> INSTR_TIME_SET_CURRENT(start);
> break;
> default:
> break;
> }
> }
>
>   (do fsync and error handling here)
>
>/* increment the i/o timing and the number of times to fsync WAL data */
>if (track_wal_io_timing)
>{
>INSTR_TIME_SET_CURRENT(duration);
>INSTR_TIME_SUBTRACT(duration, start);
>WalStats.m_wal_sync_time =  INSTR_TIME_GET_MICROSEC(duration);
>}
>
> I think we can change the first switch-case to an if statement.
>

+1. We can also narrow the scope of "duration" into "if (track_wal_io_timing)" 
branch.

-- 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-24 Thread Masahiko Sawada
On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda
 wrote:
>
> On 2021-01-22 14:50, Masahiko Sawada wrote:
> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda
> >  wrote:
> >>
> >> Hi,
> >>
> >> I rebased the patch to the master branch.
> >
> > Thank you for working on this. I've read the latest patch. Here are
> > comments:
> >
> > ---
> > +   if (track_wal_io_timing)
> > +   {
> > +   INSTR_TIME_SET_CURRENT(duration);
> > +   INSTR_TIME_SUBTRACT(duration, start);
> > +   WalStats.m_wal_write_time +=
> > INSTR_TIME_GET_MILLISEC(duration);
> > +   }
> >
> > * I think it should add the time in micro sec.
> > After running pgbench with track_wal_io_timing = on for 30 sec,
> > pg_stat_wal showed the following on my environment:
> >
> > postgres(1:61569)=# select * from pg_stat_wal;
> > -[ RECORD 1 ]+-
> > wal_records  | 285947
> > wal_fpi  | 53285
> > wal_bytes| 442008213
> > wal_buffers_full | 0
> > wal_write| 25516
> > wal_write_time   | 0
> > wal_sync | 25437
> > wal_sync_time| 14490
> > stats_reset  | 2021-01-22 10:56:13.29464+09
> >
> > Since writes can complete less than a millisecond, wal_write_time
> > didn't increase. I think sync_time could also have the same problem.
>
> Thanks for your comments. I didn't notice that.
> I changed the unit from milliseconds to microseconds.
>
> > ---
> > +   /*
> > +* Measure i/o timing to fsync WAL data.
> > +*
> > +* The wal receiver skip to collect it to avoid performance
> > degradation of standy servers.
> > +* If sync_method doesn't have its fsync method, to skip too.
> > +*/
> > +   if (!AmWalReceiverProcess() && track_wal_io_timing &&
> > fsyncMethodCalled())
> > +   INSTR_TIME_SET_CURRENT(start);
> >
> > * Why does only the wal receiver skip it even if track_wal_io_timinig
> > is true? I think the performance degradation is also true for backend
> > processes. If there is another reason for that, I think it's better to
> > mention in both the doc and comment.
> > * How about checking track_wal_io_timing first?
> > * s/standy/standby/
>
> I fixed it.
> As kuroda-san mentioned too, the skip is no need to be considered.

I think you also removed the code to have the wal receiver report the
stats. So with the latest patch, the wal receiver tracks those
statistics but doesn't report.

And maybe XLogWalRcvWrite() also needs to track I/O?

>
> > ---
> > +   /* increment the i/o timing and the number of times to fsync WAL
> > data */
> > +   if (fsyncMethodCalled())
> > +   {
> > +   if (!AmWalReceiverProcess() && track_wal_io_timing)
> > +   {
> > +   INSTR_TIME_SET_CURRENT(duration);
> > +   INSTR_TIME_SUBTRACT(duration, start);
> > +   WalStats.m_wal_sync_time +=
> > INSTR_TIME_GET_MILLISEC(duration);
> > +   }
> > +
> > +   WalStats.m_wal_sync++;
> > +   }
> >
> > * I'd avoid always calling fsyncMethodCalled() in this path. How about
> > incrementing m_wal_sync after each sync operation?
>
> I think if syncing the disk does not occur, m_wal_sync should not be
> incremented.
> It depends enableFsync and sync_method.
>
> enableFsync is checked in each fsync method like
> pg_fsync_no_writethrough(),
> so if incrementing m_wal_sync after each sync operation, it should be
> implemented
> in each fsync method. It leads to many duplicated codes.

Right. I missed that each fsync function checks enableFsync.

> So, why don't you change the function to a flag whether to
> sync data to the disk will be occurred or not in issue_xlog_fsync()?

Looks better. Since we don't necessarily need to increment m_wal_sync
after doing fsync we can write the code without an additional variable
as follows:

if (enableFsync)
{
switch (sync_method)
{
case SYNC_METHOD_FSYNC:
#ifdef HAVE_FSYNC_WRITETHROUGH
case SYNC_METHOD_FSYNC_WRITETHROUGH:
#endif
#ifdef HAVE_FDATASYNC
case SYNC_METHOD_FDATASYNC:
#endif
WalStats.m_wal_sync++;
if (track_wal_io_timing)
INSTR_TIME_SET_CURRENT(start);
break;
default:
break;
}
}

  (do fsync and error handling here)

   /* increment the i/o timing and the number of times to fsync WAL data */
   if (track_wal_io_timing)
   {
   INSTR_TIME_SET_CURRENT(duration);
   INSTR_TIME_SUBTRACT(duration, start);
   WalStats.m_wal_sync_time =  INSTR_TIME_GET_MICROSEC(duration);
   }

I think we can change the first switch-case to an if statement.

>
>
> > * As far as I read the code, issue_xlog_fsync() seems to do fsync even
> > if enableFsync is false. Why does the function return false in that
> > case? I might be missing something.
>
> IIUC, the reason is that I thought that each fsync functions like
> pg_fsync_no_writethrough() check enableFsync.
>
> If this code doesn't 

RE: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-24 Thread kuroda.hay...@fujitsu.com
Dear Ikeda-san,

Thank you for updating the patch. This can be applied to master, and
can be used on my RHEL7.
wal_write_time and wal_sync_time increase normally :-).

```
postgres=# select * from pg_stat_wal;
-[ RECORD 1 ]+--
wal_records  | 121781
wal_fpi  | 2287
wal_bytes| 36055146
wal_buffers_full | 799
wal_write| 12770
wal_write_time   | 4.469
wal_sync | 11962
wal_sync_time| 132.352
stats_reset  | 2021-01-25 00:51:40.674412+00
```

I put a further comment:

```
@@ -3485,7 +3485,53 @@ SELECT pid, wait_event_type, wait_event FROM 
pg_stat_activity WHERE wait_event i
wal_buffers_full bigint
   
   
-   Number of times WAL data was written to disk because WAL buffers became 
full
+   Total number of times WAL data was written to disk because WAL buffers 
became full
+  
+ 
+
+ 
+  
+   wal_write bigint
+  
+  
+   Total number of times WAL data was written to disk
+  
+ 
+
+ 
+  
+   wal_write_time double precision
+  
+  
+   Total amount of time that has been spent in the portion of
+   WAL data was written to disk, in milliseconds
+   (if  is enabled, otherwise 
zero).
+  
+ 
+
+ 
+  
+   wal_sync bigint
+  
+  
+   Total number of times WAL data was synced to disk
+   (if  is 
open_datasync or 
+   open_sync, this value is zero because WAL data is 
synced 
+   when to write it).
+  
+ 
+
+ 
+  
+   wal_sync_time double precision
+  
+  
+   Total amount of time that has been spent in the portion of
+   WAL data was synced to disk, in milliseconds
+   (if  is enabled, otherwise 
zero.
+   if  is 
open_datasync or 
+   open_sync, this value is zero too because WAL data 
is synced 
+   when to write it).
   
  
 ```

Maybe "Total amount of time" should be used, not "Total number of time."
Other views use "amount."

I have no comments anymore.

Hayato Kuroda
FUJITSU LIMITED




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-24 Thread Masahiro Ikeda

Hi, Japin

Thanks for your comments.

On 2021-01-23 01:46, japin wrote:

Hi, Masahiro

Thanks for you update the v4 patch.  Here are some comments:

(1)
+   char*msg = NULL;
+   boolsync_called;/* whether to sync
data to the disk. */
+   instr_time  start;
+   instr_time  duration;
+
+   /* check whether to sync data to the disk is really occurred. 
*/

+   sync_called = false;

Maybe we can initialize the "sync_called" variable when declare it.


Yes, I fixed it.


(2)
+   if (sync_called)
+   {
+   /* increment the i/o timing and the number of times to
fsync WAL data */
+   if (track_wal_io_timing)
+   {
+   INSTR_TIME_SET_CURRENT(duration);
+   INSTR_TIME_SUBTRACT(duration, start);
+   WalStats.m_wal_sync_time =
INSTR_TIME_GET_MICROSEC(duration);
+   }
+
+   WalStats.m_wal_sync++;
+   }

There is an extra space before INSTR_TIME_GET_MICROSEC(duration).


Yes, I removed it.


In the issue_xlog_fsync(), the comment says that if sync_method is
SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC, it already write synced.
Does that mean it synced when write the WAL data?  And for those cases, 
we
cannot get accurate write/sync timing and number of write/sync times, 
right?


case SYNC_METHOD_OPEN:
case SYNC_METHOD_OPEN_DSYNC:
/* write synced it already */
break;


Yes, I add the following comments in the document.

@@ -3515,6 +3515,9 @@ SELECT pid, wait_event_type, wait_event FROM 
pg_stat_activity WHERE wait_event i

   
   
Total number of times WAL data was synced to disk
+   (if  is 
open_datasync or
+   open_sync, this value is zero because WAL 
data is synced

+   when to write it).
   
  

@@ -3525,7 +3528,10 @@ SELECT pid, wait_event_type, wait_event FROM 
pg_stat_activity WHERE wait_event i

   
Total amount of time that has been spent in the portion of
WAL data was synced to disk, in milliseconds
-   (if  is enabled, 
otherwise zero)
+   (if  is enabled, 
otherwise zero.
+   if  is 
open_datasync or
+   open_sync, this value is zero too because WAL 
data is synced

+   when to write it).
   
  


I attached a modified patch.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATIONFrom ee1b7d17391b9d9619f709afeacdd118973471d6 Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Fri, 22 Jan 2021 21:38:31 +0900
Subject: [PATCH] Add statistics related to write/sync wal records.

This patch adds following statistics to pg_stat_wal view
to track WAL I/O activity.

- the total number of writing/syncing WAL data.
- the total amount of time that has been spent in
  writing/syncing WAL data.

Since to track I/O timing may leads significant overhead,
GUC parameter "track_wal_io_timing" is introduced.
Only if this is on, the I/O timing is measured.

The statistics related to sync are zero when "wal_sync_method"
is "open_datasync" or "open_sync", because it leads syncing
WAL data at same time when to write it.

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com

(This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID)
---
 doc/src/sgml/config.sgml  | 21 +++
 doc/src/sgml/monitoring.sgml  | 48 ++-
 src/backend/access/transam/xlog.c | 59 ++-
 src/backend/catalog/system_views.sql  |  4 ++
 src/backend/postmaster/pgstat.c   |  4 ++
 src/backend/postmaster/walwriter.c|  3 +
 src/backend/utils/adt/pgstatfuncs.c   | 24 +++-
 src/backend/utils/misc/guc.c  |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h |  1 +
 src/include/catalog/pg_proc.dat   | 14 ++---
 src/include/pgstat.h  |  8 +++
 src/test/regress/expected/rules.out   |  6 +-
 13 files changed, 189 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 82864bbb24..43f3fbcaf8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7416,6 +7416,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_wal_io_timing (boolean)
+  
+   track_wal_io_timing configuration parameter
+  
+  
+  
+   
+Enables timing of WAL I/O calls. This parameter is off by default,
+because it will repeatedly query the operating system for
+the current time, which may cause significant overhead on some
+platforms.  You can use the  tool to
+measure the overhead of timing on your system.
+I/O timing information is
+displayed in 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-22 Thread japin


Hi, Masahiro

Thanks for you update the v4 patch.  Here are some comments:

(1)
+   char*msg = NULL;
+   boolsync_called;/* whether to sync data to the 
disk. */
+   instr_time  start;
+   instr_time  duration;
+
+   /* check whether to sync data to the disk is really occurred. */
+   sync_called = false;

Maybe we can initialize the "sync_called" variable when declare it.

(2)
+   if (sync_called)
+   {
+   /* increment the i/o timing and the number of times to fsync 
WAL data */
+   if (track_wal_io_timing)
+   {
+   INSTR_TIME_SET_CURRENT(duration);
+   INSTR_TIME_SUBTRACT(duration, start);
+   WalStats.m_wal_sync_time =  
INSTR_TIME_GET_MICROSEC(duration);
+   }
+
+   WalStats.m_wal_sync++;
+   }

There is an extra space before INSTR_TIME_GET_MICROSEC(duration).

In the issue_xlog_fsync(), the comment says that if sync_method is
SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC, it already write synced.
Does that mean it synced when write the WAL data?  And for those cases, we
cannot get accurate write/sync timing and number of write/sync times, right?

case SYNC_METHOD_OPEN:
case SYNC_METHOD_OPEN_DSYNC:
/* write synced it already */
break;

On Fri, 22 Jan 2021 at 21:05, Masahiro Ikeda  wrote:
> On 2021-01-22 14:50, Masahiko Sawada wrote:
>> On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda 
>>  wrote:
>>> 
>>> Hi,
>>> 
>>> I rebased the patch to the master branch.
>> 
>> Thank you for working on this. I've read the latest patch. Here are 
>> comments:
>> 
>> ---
>> +   if (track_wal_io_timing)
>> +   {
>> +   INSTR_TIME_SET_CURRENT(duration);
>> +   INSTR_TIME_SUBTRACT(duration, start);
>> +   WalStats.m_wal_write_time +=
>> INSTR_TIME_GET_MILLISEC(duration);
>> +   }
>> 
>> * I think it should add the time in micro sec.
>> After running pgbench with track_wal_io_timing = on for 30 sec,
>> pg_stat_wal showed the following on my environment:
>> 
>> postgres(1:61569)=# select * from pg_stat_wal;
>> -[ RECORD 1 ]+-
>> wal_records  | 285947
>> wal_fpi  | 53285
>> wal_bytes| 442008213
>> wal_buffers_full | 0
>> wal_write| 25516
>> wal_write_time   | 0
>> wal_sync | 25437
>> wal_sync_time| 14490
>> stats_reset  | 2021-01-22 10:56:13.29464+09
>> 
>> Since writes can complete less than a millisecond, wal_write_time
>> didn't increase. I think sync_time could also have the same problem.
>
> Thanks for your comments. I didn't notice that.
> I changed the unit from milliseconds to microseconds.
>
>> ---
>> +   /*
>> +* Measure i/o timing to fsync WAL data.
>> +*
>> +* The wal receiver skip to collect it to avoid performance
>> degradation of standy servers.
>> +* If sync_method doesn't have its fsync method, to skip too.
>> +*/
>> +   if (!AmWalReceiverProcess() && track_wal_io_timing && 
>> fsyncMethodCalled())
>> +   INSTR_TIME_SET_CURRENT(start);
>> 
>> * Why does only the wal receiver skip it even if track_wal_io_timinig
>> is true? I think the performance degradation is also true for backend
>> processes. If there is another reason for that, I think it's better to
>> mention in both the doc and comment.
>> * How about checking track_wal_io_timing first?
>> * s/standy/standby/
>
> I fixed it.
> As kuroda-san mentioned too, the skip is no need to be considered.
>
>> ---
>> +   /* increment the i/o timing and the number of times to fsync WAL 
>> data */
>> +   if (fsyncMethodCalled())
>> +   {
>> +   if (!AmWalReceiverProcess() && track_wal_io_timing)
>> +   {
>> +   INSTR_TIME_SET_CURRENT(duration);
>> +   INSTR_TIME_SUBTRACT(duration, start);
>> +   WalStats.m_wal_sync_time += 
>> INSTR_TIME_GET_MILLISEC(duration);
>> +   }
>> +
>> +   WalStats.m_wal_sync++;
>> +   }
>> 
>> * I'd avoid always calling fsyncMethodCalled() in this path. How about
>> incrementing m_wal_sync after each sync operation?
>
> I think if syncing the disk does not occur, m_wal_sync should not be 
> incremented.
> It depends enableFsync and sync_method.
>
> enableFsync is checked in each fsync method like 
> pg_fsync_no_writethrough(),
> so if incrementing m_wal_sync after each sync operation, it should be 
> implemented
> in each fsync method. It leads to many duplicated codes.
>
> So, why don't you change the function to a flag whether to
> sync data to the disk will be occurred or not in issue_xlog_fsync()?
>
>
>> ---
>> +/*
>> + * Check if fsync mothod is called.
>> + */
>> +static bool
>> +fsyncMethodCalled()
>> +{
>> +   if (!enableFsync)
>> +   return false;
>> +
>> +   switch (sync_method)
>> +   {
>> +   case SYNC_METHOD_FSYNC:
>> +   case 

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-22 Thread Masahiro Ikeda

On 2021-01-22 14:50, Masahiko Sawada wrote:
On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda 
 wrote:


Hi,

I rebased the patch to the master branch.


Thank you for working on this. I've read the latest patch. Here are 
comments:


---
+   if (track_wal_io_timing)
+   {
+   INSTR_TIME_SET_CURRENT(duration);
+   INSTR_TIME_SUBTRACT(duration, start);
+   WalStats.m_wal_write_time +=
INSTR_TIME_GET_MILLISEC(duration);
+   }

* I think it should add the time in micro sec.
After running pgbench with track_wal_io_timing = on for 30 sec,
pg_stat_wal showed the following on my environment:

postgres(1:61569)=# select * from pg_stat_wal;
-[ RECORD 1 ]+-
wal_records  | 285947
wal_fpi  | 53285
wal_bytes| 442008213
wal_buffers_full | 0
wal_write| 25516
wal_write_time   | 0
wal_sync | 25437
wal_sync_time| 14490
stats_reset  | 2021-01-22 10:56:13.29464+09

Since writes can complete less than a millisecond, wal_write_time
didn't increase. I think sync_time could also have the same problem.


Thanks for your comments. I didn't notice that.
I changed the unit from milliseconds to microseconds.


---
+   /*
+* Measure i/o timing to fsync WAL data.
+*
+* The wal receiver skip to collect it to avoid performance
degradation of standy servers.
+* If sync_method doesn't have its fsync method, to skip too.
+*/
+   if (!AmWalReceiverProcess() && track_wal_io_timing && 
fsyncMethodCalled())

+   INSTR_TIME_SET_CURRENT(start);

* Why does only the wal receiver skip it even if track_wal_io_timinig
is true? I think the performance degradation is also true for backend
processes. If there is another reason for that, I think it's better to
mention in both the doc and comment.
* How about checking track_wal_io_timing first?
* s/standy/standby/


I fixed it.
As kuroda-san mentioned too, the skip is no need to be considered.


---
+   /* increment the i/o timing and the number of times to fsync WAL 
data */

+   if (fsyncMethodCalled())
+   {
+   if (!AmWalReceiverProcess() && track_wal_io_timing)
+   {
+   INSTR_TIME_SET_CURRENT(duration);
+   INSTR_TIME_SUBTRACT(duration, start);
+   WalStats.m_wal_sync_time += 
INSTR_TIME_GET_MILLISEC(duration);

+   }
+
+   WalStats.m_wal_sync++;
+   }

* I'd avoid always calling fsyncMethodCalled() in this path. How about
incrementing m_wal_sync after each sync operation?


I think if syncing the disk does not occur, m_wal_sync should not be 
incremented.

It depends enableFsync and sync_method.

enableFsync is checked in each fsync method like 
pg_fsync_no_writethrough(),
so if incrementing m_wal_sync after each sync operation, it should be 
implemented

in each fsync method. It leads to many duplicated codes.

So, why don't you change the function to a flag whether to
sync data to the disk will be occurred or not in issue_xlog_fsync()?



---
+/*
+ * Check if fsync mothod is called.
+ */
+static bool
+fsyncMethodCalled()
+{
+   if (!enableFsync)
+   return false;
+
+   switch (sync_method)
+   {
+   case SYNC_METHOD_FSYNC:
+   case SYNC_METHOD_FSYNC_WRITETHROUGH:
+   case SYNC_METHOD_FDATASYNC:
+   return true;
+   default:
+   /* others don't have a specific fsync method */
+   return false;
+   }
+}

* I'm concerned that the function name could confuse the reader
because it's called even before the fsync method is called. As I
commented above, calling to fsyncMethodCalled() can be eliminated.
That way, this function is called at only once. So do we really need
this function?


Thanks to your comments, I removed them.



* As far as I read the code, issue_xlog_fsync() seems to do fsync even
if enableFsync is false. Why does the function return false in that
case? I might be missing something.


IIUC, the reason is that I thought that each fsync functions like 
pg_fsync_no_writethrough() check enableFsync.


If this code doesn't check, m_wal_sync_time may be incremented
even though some sync methods like SYNC_METHOD_OPEN don't call to sync 
some data to the disk at the time.



* void is missing as argument?

* s/mothod/method/


I removed them.


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATIONFrom 3db5e6fc1b68ae02b393a02b2d28f0d163d7bd03 Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Fri, 22 Jan 2021 21:38:31 +0900
Subject: [PATCH] Add statistics related to write/sync wal records.

This patch adds following statistics to pg_stat_wal view
to track WAL I/O activity.

- the total number of writing/syncing WAL data.
- the total amount of time that has been spent in
  writing/syncing WAL data.

Since to track I/O timing may leads significant overhead,
GUC parameter "track_wal_io_timing" is introduced.
Only if this is on, the I/O timing is measured.

Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com

RE: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-22 Thread Masahiro Ikeda

On 2021-01-22 11:54, kuroda.hay...@fujitsu.com wrote:

Dear Ikeda-san,

This patch cannot be applied to the HEAD, but anyway I put a comment.

```
+   /*
+* Measure i/o timing to fsync WAL data.
+*
+* The wal receiver skip to collect it to avoid performance
degradation of standy servers.
+* If sync_method doesn't have its fsync method, to skip too.
+*/
+	if (!AmWalReceiverProcess() && track_wal_io_timing && 
fsyncMethodCalled())

+   INSTR_TIME_SET_CURRENT(start);
```

I think m_wal_sync_time should be collected even if the process is 
WalRecevier.

Because all wal_fsync should be recorded, and
some performance issues have been aleady occurred if
track_wal_io_timing is turned on.
I think it's strange only to take care of the walrecevier case.


Kuroda-san, Thanks for your comments.

Although I thought that the performance impact may be bigger in standby 
servers
because WALReceiver didn't use wal buffers, it's no need to be 
considered.

I agreed that if track_wal_io_timing is turned on, the primary server's
performance degradation occurs too.

I will make rebased and modified.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-21 Thread Masahiko Sawada
On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda  wrote:
>
> Hi,
>
> I rebased the patch to the master branch.

Thank you for working on this. I've read the latest patch. Here are comments:

---
+   if (track_wal_io_timing)
+   {
+   INSTR_TIME_SET_CURRENT(duration);
+   INSTR_TIME_SUBTRACT(duration, start);
+   WalStats.m_wal_write_time +=
INSTR_TIME_GET_MILLISEC(duration);
+   }

* I think it should add the time in micro sec.

After running pgbench with track_wal_io_timing = on for 30 sec,
pg_stat_wal showed the following on my environment:

postgres(1:61569)=# select * from pg_stat_wal;
-[ RECORD 1 ]+-
wal_records  | 285947
wal_fpi  | 53285
wal_bytes| 442008213
wal_buffers_full | 0
wal_write| 25516
wal_write_time   | 0
wal_sync | 25437
wal_sync_time| 14490
stats_reset  | 2021-01-22 10:56:13.29464+09

Since writes can complete less than a millisecond, wal_write_time
didn't increase. I think sync_time could also have the same problem.

---
+   /*
+* Measure i/o timing to fsync WAL data.
+*
+* The wal receiver skip to collect it to avoid performance
degradation of standy servers.
+* If sync_method doesn't have its fsync method, to skip too.
+*/
+   if (!AmWalReceiverProcess() && track_wal_io_timing && fsyncMethodCalled())
+   INSTR_TIME_SET_CURRENT(start);

* Why does only the wal receiver skip it even if track_wal_io_timinig
is true? I think the performance degradation is also true for backend
processes. If there is another reason for that, I think it's better to
mention in both the doc and comment.

* How about checking track_wal_io_timing first?

* s/standy/standby/

---
+   /* increment the i/o timing and the number of times to fsync WAL data */
+   if (fsyncMethodCalled())
+   {
+   if (!AmWalReceiverProcess() && track_wal_io_timing)
+   {
+   INSTR_TIME_SET_CURRENT(duration);
+   INSTR_TIME_SUBTRACT(duration, start);
+   WalStats.m_wal_sync_time += INSTR_TIME_GET_MILLISEC(duration);
+   }
+
+   WalStats.m_wal_sync++;
+   }

* I'd avoid always calling fsyncMethodCalled() in this path. How about
incrementing m_wal_sync after each sync operation?

---
+/*
+ * Check if fsync mothod is called.
+ */
+static bool
+fsyncMethodCalled()
+{
+   if (!enableFsync)
+   return false;
+
+   switch (sync_method)
+   {
+   case SYNC_METHOD_FSYNC:
+   case SYNC_METHOD_FSYNC_WRITETHROUGH:
+   case SYNC_METHOD_FDATASYNC:
+   return true;
+   default:
+   /* others don't have a specific fsync method */
+   return false;
+   }
+}

* I'm concerned that the function name could confuse the reader
because it's called even before the fsync method is called. As I
commented above, calling to fsyncMethodCalled() can be eliminated.
That way, this function is called at only once. So do we really need
this function?

* As far as I read the code, issue_xlog_fsync() seems to do fsync even
if enableFsync is false. Why does the function return false in that
case? I might be missing something.

* void is missing as argument?

* s/mothod/method/

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-21 Thread kuroda.hay...@fujitsu.com
Dear Ikeda-san,

This patch cannot be applied to the HEAD, but anyway I put a comment.

```
+   /*
+* Measure i/o timing to fsync WAL data.
+*
+* The wal receiver skip to collect it to avoid performance degradation 
of standy servers.
+* If sync_method doesn't have its fsync method, to skip too.
+*/
+   if (!AmWalReceiverProcess() && track_wal_io_timing && 
fsyncMethodCalled())
+   INSTR_TIME_SET_CURRENT(start);
```

I think m_wal_sync_time should be collected even if the process is WalRecevier.
Because all wal_fsync should be recorded, and
some performance issues have been aleady occurred if track_wal_io_timing is 
turned on.
I think it's strange only to take care of the walrecevier case.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: About to add WAL write/fsync statistics to pg_stat_wal view

2020-12-25 Thread Masahiro Ikeda

Hi,

I rebased the patch to the master branch.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATIONdiff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4b60382778..45d54cd394 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7388,6 +7388,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_wal_io_timing (boolean)
+  
+   track_wal_io_timing configuration parameter
+  
+  
+  
+   
+Enables timing of WAL I/O calls. This parameter is off by default,
+because it will repeatedly query the operating system for
+the current time, which may cause significant overhead on some
+platforms.  You can use the  tool to
+measure the overhead of timing on your system.
+I/O timing information is
+displayed in 
+pg_stat_wal.  Only superusers can
+change this setting.
+   
+  
+ 
+
  
   track_functions (enum)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3d6c901306..1a5aad819f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3479,7 +3479,51 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
wal_buffers_full bigint
   
   
-   Number of times WAL data was written to disk because WAL buffers became full
+   Total number of times WAL data was written to disk because WAL buffers became full
+  
+ 
+
+ 
+  
+   wal_write bigint
+  
+  
+   Total number of times WAL data was written to disk
+  
+ 
+
+ 
+  
+   wal_write_time double precision
+  
+  
+   Total amount of time that has been spent in the portion of
+   WAL data was written to disk, in milliseconds
+   (if  is enabled, otherwise zero).
+   To avoid standby server's performance degradation, they don't collect
+   this statistics
+  
+ 
+
+ 
+  
+   wal_sync bigint
+  
+  
+   Total number of times WAL data was synced to disk
+  
+ 
+
+ 
+  
+   wal_sync_time double precision
+  
+  
+   Total amount of time that has been spent in the portion of
+   WAL data was synced to disk, in milliseconds
+   (if  is enabled, otherwise zero).
+   To avoid standby server's performance degradation, they don't collect
+   this statistics
   
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9867e1b403..19d101647e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -109,6 +109,7 @@ int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
 int			max_slot_wal_keep_size_mb = -1;
+bool		track_wal_io_timing = false;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -980,6 +981,8 @@ static void WALInsertLockAcquireExclusive(void);
 static void WALInsertLockRelease(void);
 static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
 
+static bool fsyncMethodCalled();
+
 /*
  * Insert an XLOG record represented by an already-constructed chain of data
  * chunks.  This is a low-level routine; to construct the WAL record header
@@ -2538,6 +2541,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			Size		nbytes;
 			Size		nleft;
 			int			written;
+			instr_time	start;
+			instr_time	duration;
 
 			/* OK to write the page(s) */
 			from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
@@ -2546,9 +2551,24 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			do
 			{
 errno = 0;
+
+/* Measure i/o timing to write WAL data */
+if (track_wal_io_timing)
+	INSTR_TIME_SET_CURRENT(start);
+
 pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 written = pg_pwrite(openLogFile, from, nleft, startoffset);
 pgstat_report_wait_end();
+
+if (track_wal_io_timing)
+{
+	INSTR_TIME_SET_CURRENT(duration);
+	INSTR_TIME_SUBTRACT(duration, start);
+	WalStats.m_wal_write_time += INSTR_TIME_GET_MILLISEC(duration);
+}
+
+WalStats.m_wal_write++;
+
 if (written <= 0)
 {
 	char		xlogfname[MAXFNAMELEN];
@@ -10489,6 +10509,27 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
 	}
 }
 
+/*
+ * Check if fsync mothod is called.
+ */
+static bool
+fsyncMethodCalled()
+{
+	if (!enableFsync)
+		return false;
+
+	switch (sync_method)
+	{
+		case SYNC_METHOD_FSYNC:
+		case SYNC_METHOD_FSYNC_WRITETHROUGH:
+		case SYNC_METHOD_FDATASYNC:
+			return true;
+		default:
+			/* others don't have a specific fsync method */
+			return false;
+	}
+}
+
 
 /*
  * Issue appropriate kind of fsync (if any) for an XLOG output file.
@@ -10499,8 +10540,19 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
 void
 issue_xlog_fsync(int fd, XLogSegNo segno)
 {
+	instr_time	

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2020-12-08 Thread Masahiro Ikeda

On 2020-12-08 16:45, Li Japin wrote:

Hi,

On Dec 8, 2020, at 1:06 PM, Masahiro Ikeda  
wrote:


Hi,

I propose to add wal write/fsync statistics to pg_stat_wal view.
It's useful not only for developing/improving source code related to 
WAL

but also for users to detect workload changes, HW failure, and so on.

I introduce "track_wal_io_timing" parameter and provide the following 
information on pg_stat_wal view.
I separate the parameter from "track_io_timing" to 
"track_wal_io_timing"
because IIUC, WAL I/O activity may have a greater impact on query 
performance than database I/O activity.


```
postgres=# SELECT wal_write, wal_write_time, wal_sync, wal_sync_time 
FROM pg_stat_wal;

-[ RECORD 1 ]--+
wal_write  | 650  # Total number of times WAL data was written to 
the disk


wal_write_time | 43   # Total amount of time that has been spent in 
the portion of WAL data was written to disk
 # if track-wal-io-timing is enabled, otherwise 
zero


wal_sync   | 78   # Total number of times WAL data was synced to 
the disk


wal_sync_time  | 104  # Total amount of time that has been spent in 
the portion of WAL data was synced to disk
 # if track-wal-io-timing is enabled, otherwise 
zero

```

What do you think?
Please let me know your comments.

Regards
--
Masahiro Ikeda
NTT DATA 
CORPORATION<0001_add_wal_io_activity_to_the_pg_stat_wal.patch>


There is a no previous prototype warning for ‘fsyncMethodCalled’, and
it now only used in xlog.c,
should we declare with static? And this function wants a boolean as a
return, should we use
true/false other than 0/1?

+/*
+ * Check if fsync mothod is called.
+ */
+bool
+fsyncMethodCalled()
+{
+   if (!enableFsync)
+   return 0;
+
+   switch (sync_method)
+   {
+   case SYNC_METHOD_FSYNC:
+   case SYNC_METHOD_FSYNC_WRITETHROUGH:
+   case SYNC_METHOD_FDATASYNC:
+   return 1;
+   default:
+   /* others don't have a specific fsync method */
+   return 0;
+   }
+}
+


Thanks for your review.
I agree with your comments. I fixed them.

Regards
--
Masahiro Ikeda
NTT DATA CORPORATIONdiff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8cd3d690..ba923a2b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7388,6 +7388,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_wal_io_timing (boolean)
+  
+   track_wal_io_timing configuration parameter
+  
+  
+  
+   
+Enables timing of WAL I/O calls. This parameter is off by default,
+because it will repeatedly query the operating system for
+the current time, which may cause significant overhead on some
+platforms.  You can use the  tool to
+measure the overhead of timing on your system.
+I/O timing information is
+displayed in 
+pg_stat_wal.  Only superusers can
+change this setting.
+   
+  
+ 
+
  
   track_functions (enum)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 52a69a53..ce4f652d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3479,7 +3479,51 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
wal_buffers_full bigint
   
   
-   Number of times WAL data was written to the disk because WAL buffers got full
+   Total number of times WAL data was written to the disk because WAL buffers got full
+  
+ 
+
+ 
+  
+   wal_write bigint
+  
+  
+   Total number of times WAL data was written to the disk
+  
+ 
+
+ 
+  
+   wal_write_time double precision
+  
+  
+   Total amount of time that has been spent in the portion of
+   WAL data was written to disk, in milliseconds
+   (if  is enabled, otherwise zero).
+   To avoid standby server's performance degradation, they don't collect
+   this statistics
+  
+ 
+
+ 
+  
+   wal_sync bigint
+  
+  
+   Total number of times WAL data was synced to the disk
+  
+ 
+
+ 
+  
+   wal_sync_time double precision
+  
+  
+   Total amount of time that has been spent in the portion of
+   WAL data was synced to disk, in milliseconds
+   (if  is enabled, otherwise zero).
+   To avoid standby server's performance degradation, they don't collect
+   this statistics
   
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7e81ce4f..18d6ecc5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -109,6 +109,7 @@ int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			

Re: About to add WAL write/fsync statistics to pg_stat_wal view

2020-12-07 Thread Li Japin
Hi,

> On Dec 8, 2020, at 1:06 PM, Masahiro Ikeda  wrote:
> 
> Hi,
> 
> I propose to add wal write/fsync statistics to pg_stat_wal view.
> It's useful not only for developing/improving source code related to WAL
> but also for users to detect workload changes, HW failure, and so on.
> 
> I introduce "track_wal_io_timing" parameter and provide the following 
> information on pg_stat_wal view.
> I separate the parameter from "track_io_timing" to "track_wal_io_timing"
> because IIUC, WAL I/O activity may have a greater impact on query performance 
> than database I/O activity.
> 
> ```
> postgres=# SELECT wal_write, wal_write_time, wal_sync, wal_sync_time FROM 
> pg_stat_wal;
> -[ RECORD 1 ]--+
> wal_write  | 650  # Total number of times WAL data was written to the disk
> 
> wal_write_time | 43   # Total amount of time that has been spent in the 
> portion of WAL data was written to disk
>  # if track-wal-io-timing is enabled, otherwise zero
> 
> wal_sync   | 78   # Total number of times WAL data was synced to the disk
> 
> wal_sync_time  | 104  # Total amount of time that has been spent in the 
> portion of WAL data was synced to disk
>  # if track-wal-io-timing is enabled, otherwise zero
> ```
> 
> What do you think?
> Please let me know your comments.
> 
> Regards
> -- 
> Masahiro Ikeda
> NTT DATA CORPORATION<0001_add_wal_io_activity_to_the_pg_stat_wal.patch>

There is a no previous prototype warning for ‘fsyncMethodCalled’, and it now 
only used in xlog.c,
should we declare with static? And this function wants a boolean as a return, 
should we use
true/false other than 0/1?

+/*
+ * Check if fsync mothod is called.
+ */
+bool
+fsyncMethodCalled()
+{
+   if (!enableFsync)
+   return 0;
+
+   switch (sync_method)
+   {
+   case SYNC_METHOD_FSYNC:
+   case SYNC_METHOD_FSYNC_WRITETHROUGH:
+   case SYNC_METHOD_FDATASYNC:
+   return 1;
+   default:
+   /* others don't have a specific fsync method */
+   return 0;
+   }
+}
+

--
Best regards
ChengDu WenWu Information Technology Co.,Ltd.
Japin Li



About to add WAL write/fsync statistics to pg_stat_wal view

2020-12-07 Thread Masahiro Ikeda

Hi,

I propose to add wal write/fsync statistics to pg_stat_wal view.
It's useful not only for developing/improving source code related to WAL
but also for users to detect workload changes, HW failure, and so on.

I introduce "track_wal_io_timing" parameter and provide the following 
information on pg_stat_wal view.

I separate the parameter from "track_io_timing" to "track_wal_io_timing"
because IIUC, WAL I/O activity may have a greater impact on query 
performance than database I/O activity.


```
postgres=# SELECT wal_write, wal_write_time, wal_sync, wal_sync_time 
FROM pg_stat_wal;

-[ RECORD 1 ]--+
wal_write  | 650  # Total number of times WAL data was written to 
the disk


wal_write_time | 43   # Total amount of time that has been spent in the 
portion of WAL data was written to disk
  # if track-wal-io-timing is enabled, otherwise 
zero


wal_sync   | 78   # Total number of times WAL data was synced to the 
disk


wal_sync_time  | 104  # Total amount of time that has been spent in the 
portion of WAL data was synced to disk
  # if track-wal-io-timing is enabled, otherwise 
zero

```

What do you think?
Please let me know your comments.

Regards
--
Masahiro Ikeda
NTT DATA CORPORATIONdiff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8cd3d690..ba923a2b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7388,6 +7388,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_wal_io_timing (boolean)
+  
+   track_wal_io_timing configuration parameter
+  
+  
+  
+   
+Enables timing of WAL I/O calls. This parameter is off by default,
+because it will repeatedly query the operating system for
+the current time, which may cause significant overhead on some
+platforms.  You can use the  tool to
+measure the overhead of timing on your system.
+I/O timing information is
+displayed in 
+pg_stat_wal.  Only superusers can
+change this setting.
+   
+  
+ 
+
  
   track_functions (enum)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 52a69a53..ce4f652d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3479,7 +3479,51 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
wal_buffers_full bigint
   
   
-   Number of times WAL data was written to the disk because WAL buffers got full
+   Total number of times WAL data was written to the disk because WAL buffers got full
+  
+ 
+
+ 
+  
+   wal_write bigint
+  
+  
+   Total number of times WAL data was written to the disk
+  
+ 
+
+ 
+  
+   wal_write_time double precision
+  
+  
+   Total amount of time that has been spent in the portion of
+   WAL data was written to disk, in milliseconds
+   (if  is enabled, otherwise zero).
+   To avoid standby server's performance degradation, they don't collect
+   this statistics
+  
+ 
+
+ 
+  
+   wal_sync bigint
+  
+  
+   Total number of times WAL data was synced to the disk
+  
+ 
+
+ 
+  
+   wal_sync_time double precision
+  
+  
+   Total amount of time that has been spent in the portion of
+   WAL data was synced to disk, in milliseconds
+   (if  is enabled, otherwise zero).
+   To avoid standby server's performance degradation, they don't collect
+   this statistics
   
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7e81ce4f..c9f33d23 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -109,6 +109,7 @@ int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
 int			max_slot_wal_keep_size_mb = -1;
+bool		track_wal_io_timing = false;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -2528,6 +2529,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			Size		nbytes;
 			Size		nleft;
 			int			written;
+			instr_time	start;
+			instr_time	duration;
 
 			/* OK to write the page(s) */
 			from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
@@ -2536,9 +2539,24 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			do
 			{
 errno = 0;
+
+/* Measure i/o timing to write WAL data */
+if (track_wal_io_timing)
+	INSTR_TIME_SET_CURRENT(start);
+
 pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 written = pg_pwrite(openLogFile, from, nleft, startoffset);
 pgstat_report_wait_end();
+
+if (track_wal_io_timing)
+{
+	INSTR_TIME_SET_CURRENT(duration);
+	INSTR_TIME_SUBTRACT(duration, start);
+	WalStats.m_wal_write_time +=