Re: [HACKERS] wait events for disk I/O

2017-03-20 Thread Rajkumar Raghuwanshi
On Sat, Mar 18, 2017 at 10:52 PM, Rushabh Lathia
 wrote:
>
>
> On Sat, Mar 18, 2017 at 5:15 PM, Robert Haas  wrote:
>>
>> On Fri, Mar 17, 2017 at 10:01 AM, Rushabh Lathia
>>  wrote:
>> > I tried to cover all the suggestions in the attached latest patch.
>>
>> Committed.  I reworded the documentation entries, renamed a few of the
>> wait events to make things more consistent, put all three lists in
>> rigorous alphabetical order, and I fixed a couple of places where an
>> error return from a system call could lead to returning without
>> clearing the wait event.
>>

Thanks. I ran pgbench with shared_buffers set to 128kB and was able to see
the IO wait events below.

DataFileRead
DataFileWrite
WALInitSync
WALInitWrite
WALWrite
DataFileSync
WALRead
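
For reference, these can be sampled from a second session with a simple
query along the following lines (a rough sketch; each row is a
point-in-time observation, so the output depends on timing):

-- Sample the IO wait events of active backends while pgbench is running;
-- repeat the query to catch different events.
SELECT pid, wait_event_type, wait_event, state
FROM pg_stat_activity
WHERE wait_event_type = 'IO';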

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation




Re: [HACKERS] wait events for disk I/O

2017-03-18 Thread Rushabh Lathia
On Sat, Mar 18, 2017 at 5:15 PM, Robert Haas  wrote:

> On Fri, Mar 17, 2017 at 10:01 AM, Rushabh Lathia
>  wrote:
> > I tried to cover all the suggestions in the attached latest patch.
>
> Committed.  I reworded the documentation entries, renamed a few of the
> wait events to make things more consistent, put all three lists in
> rigorous alphabetical order, and I fixed a couple of places where an
> error return from a system call could lead to returning without
> clearing the wait event.
>
>
Thanks Robert.


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Rushabh Lathia


Re: [HACKERS] wait events for disk I/O

2017-03-18 Thread Robert Haas
On Fri, Mar 17, 2017 at 10:01 AM, Rushabh Lathia
 wrote:
> I tried to cover all the suggestions in the attached latest patch.

Committed.  I reworded the documentation entries, renamed a few of the
wait events to make things more consistent, put all three lists in
rigorous alphabetical order, and I fixed a couple of places where an
error return from a system call could lead to returning without
clearing the wait event.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] wait events for disk I/O

2017-03-17 Thread Rushabh Lathia
Thanks Robert for the review.

On Thu, Mar 16, 2017 at 8:05 PM, Robert Haas  wrote:

> On Thu, Mar 16, 2017 at 8:28 AM, Rahila Syed 
> wrote:
> > Thank you for the updated patch.
> >
> > I have applied and tested it on latest sources and the patch looks good
> to
> > me.
>
> The documentation puts the new wait events in a pretty random order.
> I think they should be alphabetized, like we do with the IPC events.
>

Done.


> I also suggest we change the naming scheme so that the kind of thing
> being operated on is first and this is followed by the operation name.
> This will let us keep related entries next to each other after
> alphabetizing.  So with that principle in mind:
>
>
Yes, the above naming scheme is clearer than the one I chose.

> - instead of ReadDataBlock etc. I propose DataFileRead, DataFileWrite,
> DataFileSync, DataFileExtend, DataFileFlush, DataFilePrefetch,
> DataFileTruncate.  using file instead of block avoids singular/plural
> confusion.
> - instead of RelationSync and RelationImmedSync I proposed
> DataFileSync and DataFileImmediateSync; these are md.c operations like
> the previous set, so why name it differently?
>

Yes, you are right, DataFileSync and DataFileImmediateSync make more
sense.


> - instead of WriteRewriteDataBlock and SyncRewriteDataBlock and
> TruncateLogicalMappingRewrite, which aren't consistent with each other
> even though they are related, I propose LogicalRewriteWrite,
> LogicalRewriteSync, and LogicalRewriteTruncate, which are also closer
> to the names of the functions that contain those wait points
> - for ReadBuffile and WriteBuffile seem OK, I propose BufFileRead and
> BufFileWrite, again reversing the order and also tweaking the
> capitalization
> - in keeping with our new policy of referring to xlog as wal in user
> visible interfaces, I propose WALRead, WALCopyRead, WALWrite,
> WALInitWrite, WALCopyWrite, WALBootstrapWrite, WALInitSync,
> WALBootstrapSync, WALSyncMethodAssign
> - for control file ops, ControlFileRead, ControlFileWrite,
> ControlFileWriteUpdate, ControlFileSync, ControlFileSyncUpdate
>

> - ReadApplyLogicalMapping and friends seem to have to do with the
> reorderbuffer code, so maybe ReorderBufferRead etc.
> - there seems to be some discrepancy between the documentation and
> pgstat_get_wait_io for the snapbuild stuff.  maybe SnapBuildWrite,
> SnapBuildSync, SnapBuildRead.
> - SLRURead, SLRUWrite, etc.
> - TimelineHistoryRead, etc.
> - the syslogger changes should be dropped, since the syslogger is not
> and should not be connected to shared memory
>

OK, removed.


> - the replslot terminology seems like a case of odd capitalization and
> overeager abbreviation.  why not ReplicationSlotRead,
> ReplicationSlotWrite, etc?  similarly RelationMapRead,
> RelationMapWrite, etc?
> - CopyFileRead, CopyFileWrite
> - LockFileCreateRead, etc.
> - AddToDataDirLockFileRead is a little long and incomprehensible;
> maybe LockFileUpdateRead etc.
>

How about LockFileAddToDataDirRead? Even though it's a little long, it
gives a clear understanding of what's going on.


> - DSMWriteZeroBytes, maybe?
>
>
DSMFillZeroWrite? Basically I want to keep the file IO operation at the
end of the event name.


> Of course the constants should be renamed to match.
>

I tried to cover all the suggestions in the attached latest patch.


Thanks,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9eaf43a..acf310e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -716,6 +716,12 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   point.
  
 
+
+ 
+  IO: The server process is waiting for an IO to complete.
+  wait_event will identify the specific wait point.
+ 
+

   
  
@@ -1272,6 +1278,271 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  RecoveryApplyDelay
  Waiting to apply WAL at recovery because it is delayed.
 
+
+ IO
+ BufFileRead
+ Waiting during buffer read operation.
+
+
+ BufFileWrite
+ Waiting during buffer write operation.
+
+
+ CheckpointLogicalRewriteSync
+ Waiting to sync logical mapping during a checkpoint for logical rewrite mappings.
+
+
+ ControlFileRead
+ Waiting to read the control file.
+
+
+ ControlFileSync
+ Waiting to sync the control file.
+
+
+ ControlFileSyncUpdate
+ Waiting to sync the control file during update control file.
+
+
+ ControlFileWrite
+ Waiting to write the control file.
+
+
+ ControlFileWriteUpdate
+ Waiting to write the control file during update control file.
+

Re: [HACKERS] wait events for disk I/O

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 8:28 AM, Rahila Syed  wrote:
> Thank you for the updated patch.
>
> I have applied and tested it on latest sources and the patch looks good to
> me.

The documentation puts the new wait events in a pretty random order.
I think they should be alphabetized, like we do with the IPC events.
I also suggest we change the naming scheme so that the kind of thing
being operated on is first and this is followed by the operation name.
This will let us keep related entries next to each other after
alphabetizing.  So with that principle in mind:

- instead of ReadDataBlock etc. I propose DataFileRead, DataFileWrite,
DataFileSync, DataFileExtend, DataFileFlush, DataFilePrefetch,
DataFileTruncate.  using file instead of block avoids singular/plural
confusion.
- instead of RelationSync and RelationImmedSync I proposed
DataFileSync and DataFileImmediateSync; these are md.c operations like
the previous set, so why name it differently?
- instead of WriteRewriteDataBlock and SyncRewriteDataBlock and
TruncateLogicalMappingRewrite, which aren't consistent with each other
even though they are related, I propose LogicalRewriteWrite,
LogicalRewriteSync, and LogicalRewriteTruncate, which are also closer
to the names of the functions that contain those wait points
- for ReadBuffile and WriteBuffile seem OK, I propose BufFileRead and
BufFileWrite, again reversing the order and also tweaking the
capitalization
- in keeping with our new policy of referring to xlog as wal in user
visible interfaces, I propose WALRead, WALCopyRead, WALWrite,
WALInitWrite, WALCopyWrite, WALBootstrapWrite, WALInitSync,
WALBootstrapSync, WALSyncMethodAssign
- for control file ops, ControlFileRead, ControlFileWrite,
ControlFileWriteUpdate, ControlFileSync, ControlFileSyncUpdate
- ReadApplyLogicalMapping and friends seem to have to do with the
reorderbuffer code, so maybe ReorderBufferRead etc.
- there seems to be some discrepancy between the documentation and
pgstat_get_wait_io for the snapbuild stuff.  maybe SnapBuildWrite,
SnapBuildSync, SnapBuildRead.
- SLRURead, SLRUWrite, etc.
- TimelineHistoryRead, etc.
- the syslogger changes should be dropped, since the syslogger is not
and should not be connected to shared memory
- the replslot terminology seems like a case of odd capitalization and
overeager abbreviation.  why not ReplicationSlotRead,
ReplicationSlotWrite, etc?  similarly RelationMapRead,
RelationMapWrite, etc?
- CopyFileRead, CopyFileWrite
- LockFileCreateRead, etc.
- AddToDataDirLockFileRead is a little long and incomprehensible;
maybe LockFileUpdateRead etc.
- DSMWriteZeroBytes, maybe?

Of course the constants should be renamed to match.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] wait events for disk I/O

2017-03-16 Thread Rahila Syed
Thank you for the updated patch.

I have applied and tested it on latest sources and the patch looks good to
me.

>I am not quite sure about this, as this is for stat statements. Apart from
>the place you found, there are many other fwrite() calls in
>pg_stat_statements, and I intentionally haven't added an event here as
>these are very small writes of stats; it doesn't look like we should add
>events for those calls.
I agree that this writes a small amount of data, although tracking it can
be useful too in scenarios where pg_stat_statements is lagging due to an
I/O bottleneck.
I will leave this decision to the committer.

Thank you,
Rahila Syed

On Wed, Mar 15, 2017 at 1:03 PM, Rushabh Lathia 
wrote:

> Thanks Rahila for reviewing this patch.
>
> On Tue, Mar 14, 2017 at 8:13 PM, Rahila Syed 
> wrote:
>
>> Hello,
>>
>> I applied and tested this patch on latest sources and it works fine.
>>
>> Following are some comments,
>>
>> >+   /* Wait event for SNRU */
>> >+   WAIT_EVENT_READ_SLRU_PAGE,
>> Typo in the comment.
>>
>>
> Fixed.
>
>
>> >FileWriteback(v->mdfd_vfd, seekpos, (off_t) BLCKSZ * nflush,
>> WAIT_EVENT_FLUSH_DATA_BLOCK);
>> This call is inside mdwriteback() which can flush more than one block so
>> should WAIT_EVENT_FLUSH_DATA_BLOCK
>> be renamed to WAIT_EVENT_FLUSH_DATA_BLOCKS?
>>
>>
> Changed to WAIT_EVENT_FLUSH_DATA_BLOCKS.
>
>
>> Should calls to write() in the following functions be tracked too?
>>  qtext_store()  - This is related to pg_stat_statements
>>
>>
>
> I am not quite sure about this, as this is for stat statements. Apart
> from the place you found, there are many other fwrite() calls in
> pg_stat_statements, and I intentionally haven't added an event here as
> these are very small writes of stats; it doesn't look like we should add
> events for those calls.
>
>
>
>> dsm_impl_mmap() - This is in relation to creating dsm segments.
>>
>>
> Added a new event here. Actually, that particular write call is
> zero-filling the DSM file.
>
>
>> write_auto_conf_file()-  This is called when updated configuration
>> parameters are
>>  written to a temp file.
>>
>>
> write_auto_conf_file() gets called during ALTER SYSTEM. Here the write
> happens only when someone explicitly runs ALTER SYSTEM. This is an
> administrator operation, so it doesn't seem necessary to add a separate
> wait event for it.
>
> PFA latest patch with other fixes.
>
>
>>
>> On Wed, Mar 8, 2017 at 4:50 PM, Rushabh Lathia 
>> wrote:
>>
>>>
>>>
>>> On Wed, Mar 8, 2017 at 8:23 AM, Robert Haas 
>>> wrote:
>>>
 On Tue, Mar 7, 2017 at 9:32 PM, Amit Kapila 
 wrote:
 > On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas 
 wrote:
 >> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila 
 wrote:
 >>> Sure, if you think both Writes and Reads at OS level can have some
 >>> chance of blocking in obscure cases, then we should add a wait event
 >>> for them.
 >>
 >> I think writes have a chance of blocking even in cases that are not
 >> very obscure at all.
 >
 > Point taken for writes, but I think in general we should have some
 > criteria based on which we can decide whether to have a wait event for
 > a particular call. It should not happen that we have tons of wait
 > events of which only a few are helpful in most real-world scenarios.

 Well, the problem is that if you pick and choose which wait events to
 add based on what you think will be common, you're actually kind of
 hosing yourself. Because now when something uncommon happens, suddenly
 you don't get any wait event data and you can't tell what's happening.
 I think the number of new wait events added by Rushabh's patch is
 wholly reasonable.  Yeah, some of those are going to be a lot more
 common than others, but so what?  We add wait events so that we can
 find out what's going on.  I don't want to sometimes know when a
 backend is blocked on an I/O.  I want to ALWAYS know.


>>> Yes, I agree with Robert. Knowing what we want and what we don't
>>> want is difficult to judge. Thinking something is not useful
>>> information, and later ending up in a situation where we have to
>>> re-think and add the missing events, is not good. Having more
>>> information about the system, especially for monitoring purposes, is
>>> always good.
>>>
>>> I am attaching another version of the patch, as I found a silly
>>> mistake in the earlier version, where I missed initializing the first
>>> value of the WaitEventIO enum. Also, the earlier version was not
>>> applying cleanly on the current sources.
>>>
>>>
>>>
>>> --
>>> Rushabh Lathia
>>> EnterpriseDB: http://www.enterprisedb.com
>>> The Enterprise PostgreSQL Company
>>>
>>>
>>>
>>> --

Re: [HACKERS] wait events for disk I/O

2017-03-15 Thread Rushabh Lathia
Thanks Rahila for reviewing this patch.

On Tue, Mar 14, 2017 at 8:13 PM, Rahila Syed  wrote:

> Hello,
>
> I applied and tested this patch on latest sources and it works fine.
>
> Following are some comments,
>
> >+   /* Wait event for SNRU */
> >+   WAIT_EVENT_READ_SLRU_PAGE,
> Typo in the comment.
>
>
Fixed.


> >FileWriteback(v->mdfd_vfd, seekpos, (off_t) BLCKSZ * nflush,
> WAIT_EVENT_FLUSH_DATA_BLOCK);
> This call is inside mdwriteback() which can flush more than one block so
> should WAIT_EVENT_FLUSH_DATA_BLOCK
> be renamed to WAIT_EVENT_FLUSH_DATA_BLOCKS?
>
>
Changed to WAIT_EVENT_FLUSH_DATA_BLOCKS.


> Should calls to write() in the following functions be tracked too?
>  qtext_store()  - This is related to pg_stat_statements
>
>

I am not quite sure about this, as this is for stat statements. Apart from
the place you found, there are many other fwrite() calls in
pg_stat_statements, and I intentionally haven't added an event here as
these are very small writes of stats; it doesn't look like we should add
events for those calls.



> dsm_impl_mmap() - This is in relation to creating dsm segments.
>
>
Added a new event here. Actually, that particular write call is
zero-filling the DSM file.


> write_auto_conf_file()-  This is called when updated configuration
> parameters are
>  written to a temp file.
>
>
write_auto_conf_file() gets called during ALTER SYSTEM. Here the write
happens only when someone explicitly runs ALTER SYSTEM. This is an
administrator operation, so it doesn't seem necessary to add a separate
wait event for it.

PFA latest patch with other fixes.


>
> On Wed, Mar 8, 2017 at 4:50 PM, Rushabh Lathia 
> wrote:
>
>>
>>
>> On Wed, Mar 8, 2017 at 8:23 AM, Robert Haas 
>> wrote:
>>
>>> On Tue, Mar 7, 2017 at 9:32 PM, Amit Kapila 
>>> wrote:
>>> > On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas 
>>> wrote:
>>> >> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila 
>>> wrote:
>>> >>> Sure, if you think both Writes and Reads at OS level can have some
>>> >>> chance of blocking in obscure cases, then we should add a wait event
>>> >>> for them.
>>> >>
>>> >> I think writes have a chance of blocking even in cases that are
>>> >> not very obscure at all.
>>> >
>>> > Point taken for writes, but I think in general we should have some
>>> > criteria based on which we can decide whether to have a wait event for
>>> > a particular call. It should not happen that we have tons of wait
>>> > events of which only a few are helpful in most real-world scenarios.
>>>
>>> Well, the problem is that if you pick and choose which wait events to
>>> add based on what you think will be common, you're actually kind of
>>> hosing yourself. Because now when something uncommon happens, suddenly
>>> you don't get any wait event data and you can't tell what's happening.
>>> I think the number of new wait events added by Rushabh's patch is
>>> wholly reasonable.  Yeah, some of those are going to be a lot more
>>> common than others, but so what?  We add wait events so that we can
>>> find out what's going on.  I don't want to sometimes know when a
>>> backend is blocked on an I/O.  I want to ALWAYS know.
>>>
>>>
>> Yes, I agree with Robert. Knowing what we want and what we don't
>> want is difficult to judge. Thinking something is not useful
>> information, and later ending up in a situation where we have to
>> re-think and add the missing events, is not good. Having more
>> information about the system, especially for monitoring purposes, is
>> always good.
>>
>> I am attaching another version of the patch, as I found a silly mistake
>> in the earlier version, where I missed initializing the first value of
>> the WaitEventIO enum. Also, the earlier version was not applying
>> cleanly on the current sources.
>>
>>
>>
>> --
>> Rushabh Lathia
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>>
>>
>>
>>
>


Regards,
Rushabh Lathia
www.EnterpriseDB.com
The Enterprise PostgreSQL Company
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4d03531..45fedec 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -716,6 +716,12 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   point.
  
 
+
+ 
+  IO: The server process is waiting for an IO to complete.
+  wait_event will identify the specific wait point.
+ 
+

   
  
@@ -1272,6 +1278,276 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   

Re: [HACKERS] wait events for disk I/O

2017-03-14 Thread Rahila Syed
Hello,

I applied and tested this patch on latest sources and it works fine.

Following are some comments,

>+   /* Wait event for SNRU */
>+   WAIT_EVENT_READ_SLRU_PAGE,
Typo in the comment.

>FileWriteback(v->mdfd_vfd, seekpos, (off_t) BLCKSZ * nflush,
WAIT_EVENT_FLUSH_DATA_BLOCK);
This call is inside mdwriteback() which can flush more than one block so
should WAIT_EVENT_FLUSH_DATA_BLOCK
be renamed to WAIT_EVENT_FLUSH_DATA_BLOCKS?

Should calls to write() in the following functions be tracked too?
 qtext_store()  - This is related to pg_stat_statements

dsm_impl_mmap() - This is in relation to creating dsm segments.

write_auto_conf_file()-  This is called when updated configuration
parameters are
 written to a temp file.

Thank you,
Rahila Syed

On Wed, Mar 8, 2017 at 4:50 PM, Rushabh Lathia 
wrote:

>
>
> On Wed, Mar 8, 2017 at 8:23 AM, Robert Haas  wrote:
>
>> On Tue, Mar 7, 2017 at 9:32 PM, Amit Kapila 
>> wrote:
>> > On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas 
>> wrote:
>> >> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila 
>> wrote:
>> >>> Sure, if you think both Writes and Reads at OS level can have some
>> >>> chance of blocking in obscure cases, then we should add a wait event
>> >>> for them.
>> >>
>> >> I think writes have a chance of blocking even in cases that are not
>> >> very obscure at all.
>> >
>> > Point taken for writes, but I think in general we should have some
>> > criteria based on which we can decide whether to have a wait event for
>> > a particular call. It should not happen that we have tons of wait
>> > events of which only a few are helpful in most real-world scenarios.
>>
>> Well, the problem is that if you pick and choose which wait events to
>> add based on what you think will be common, you're actually kind of
>> hosing yourself. Because now when something uncommon happens, suddenly
>> you don't get any wait event data and you can't tell what's happening.
>> I think the number of new wait events added by Rushabh's patch is
>> wholly reasonable.  Yeah, some of those are going to be a lot more
>> common than others, but so what?  We add wait events so that we can
>> find out what's going on.  I don't want to sometimes know when a
>> backend is blocked on an I/O.  I want to ALWAYS know.
>>
>>
> Yes, I agree with Robert. Knowing what we want and what we don't
> want is difficult to judge. Thinking something is not useful
> information, and later ending up in a situation where we have to
> re-think and add the missing events, is not good. Having more
> information about the system, especially for monitoring purposes, is
> always good.
>
> I am attaching another version of the patch, as I found a silly mistake
> in the earlier version, where I missed initializing the first value of
> the WaitEventIO enum. Also, the earlier version was not applying
> cleanly on the current sources.
>
>
>
> --
> Rushabh Lathia
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>
>
>


Re: [HACKERS] wait events for disk I/O

2017-03-12 Thread Rajkumar Raghuwanshi
On Thu, Mar 9, 2017 at 10:54 AM, Rushabh Lathia
 wrote:
> Thanks Rajkumar for performing tests on this patch.
>
> Yes, I also noticed similar results in my testing. Additionally,
> sometimes I also noticed the ReadSLRUPage event on my system.
>
> I also ran the REINDEX DATABASE command and I noticed the IO events
> below:
>
> SyncImmedRelation
> WriteDataBlock
> WriteBuffile
> WriteXLog
> ReadDataBlock

I have tried generating some more events by running pgbench on a
master/slave configuration, and was able to see three more events:
WriteInitXLogFile
SyncInitXLogFile
ReadXLog
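
On the master these WAL-related waits can be isolated with a filter along
these lines (a sketch; the event names are those used by the patch version
under test):

-- Narrow the sampled IO waits down to the XLog-related events listed
-- above; run repeatedly while pgbench is active.
SELECT pid, wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'IO'
  AND wait_event LIKE '%XLog%';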




Re: [HACKERS] wait events for disk I/O

2017-03-08 Thread Rushabh Lathia
Thanks Rajkumar for performing tests on this patch.

Yes, I also noticed similar results in my testing. Additionally,
sometimes I also noticed the ReadSLRUPage event on my system.

I also ran the REINDEX DATABASE command and I noticed the IO events below:

SyncImmedRelation
WriteDataBlock
WriteBuffile
WriteXLog
ReadDataBlock



On Wed, Mar 8, 2017 at 6:41 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> On Wed, Mar 8, 2017 at 4:50 PM, Rushabh Lathia 
> wrote:
>
>> I am attaching another version of the patch, as I found a silly mistake
>> in the earlier version, where I missed initializing the first value of
>> the WaitEventIO enum. Also, the earlier version was not applying
>> cleanly on the current sources.
>>
>
> I have applied the attached patch, set shared_buffers to 128kB, and ran
> pgbench. I am able to see the distinct IO events below.
>
> --with ./pgbench -i -s 500 postgres
> application_name  wait_event_type  wait_event       query
> pgbench           IO               ExtendDataBlock  copy pgbench_account
> pgbench           IO               WriteXLog        copy pgbench_account
> pgbench           IO               WriteDataBlock   copy pgbench_account
> pgbench           IO               ReadDataBlock    vacuum analyze pgben
> pgbench           IO               ReadBuffile      alter table pgbench_
>
> --with ./pgbench -T 600 postgres (readwrite)
> application_name  wait_event_type  wait_event       query
> pgbench           IO               ReadDataBlock    UPDATE pgbench_accou
> pgbench           IO               WriteDataBlock   UPDATE pgbench_telle
>                   IO               SyncDataBlock
> pgbench           IO               SyncDataBlock    UPDATE pgbench_telle
>                   IO               SyncDataBlock    autovacuum: VACUUM A
> pgbench           IO               WriteXLog        END;
> pgbench           IO               ExtendDataBlock  copy pgbench_account
>
> --with ./pgbench -T 600 -S postgres (select only)
> application_name  wait_event_type  wait_event       query
> pgbench           IO               ReadDataBlock    SELECT abalance FROM
>
> Attached excel with all IO event values.
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
> QMG, EnterpriseDB Corporation
>
>
>


-- 
Rushabh Lathia
www.EnterpriseDB.com


Re: [HACKERS] wait events for disk I/O

2017-03-08 Thread Rajkumar Raghuwanshi
On Wed, Mar 8, 2017 at 4:50 PM, Rushabh Lathia 
wrote:

> I am attaching another version of the patch, as I found a silly mistake
> in the earlier version, where I missed initializing the first value of
> the WaitEventIO enum. Also, the earlier version was not applying
> cleanly on the current sources.
>

I have applied the attached patch, set shared_buffers to 128kB, and ran
pgbench. I am able to see the distinct IO events below.

--with ./pgbench -i -s 500 postgres
application_name  wait_event_type  wait_event       query
pgbench           IO               ExtendDataBlock  copy pgbench_account
pgbench           IO               WriteXLog        copy pgbench_account
pgbench           IO               WriteDataBlock   copy pgbench_account
pgbench           IO               ReadDataBlock    vacuum analyze pgben
pgbench           IO               ReadBuffile      alter table pgbench_

--with ./pgbench -T 600 postgres (readwrite)
application_name  wait_event_type  wait_event       query
pgbench           IO               ReadDataBlock    UPDATE pgbench_accou
pgbench           IO               WriteDataBlock   UPDATE pgbench_telle
                  IO               SyncDataBlock
pgbench           IO               SyncDataBlock    UPDATE pgbench_telle
                  IO               SyncDataBlock    autovacuum: VACUUM A
pgbench           IO               WriteXLog        END;
pgbench           IO               ExtendDataBlock  copy pgbench_account

--with ./pgbench -T 600 -S postgres (select only)
application_name  wait_event_type  wait_event       query
pgbench           IO               ReadDataBlock    SELECT abalance FROM
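
For context, output in this shape can be collected with a query roughly
like the one below (the left(query, 20) truncation is an assumption, to
match the query text shown above):

-- Hypothetical reconstruction of the sampling query behind these tables.
SELECT application_name, wait_event_type, wait_event,
       left(query, 20) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'IO';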

Attached excel with all IO event values.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


wait_event_for_disk_io.ods
Description: application/vnd.oasis.opendocument.spreadsheet



Re: [HACKERS] wait events for disk I/O

2017-03-08 Thread Rushabh Lathia
On Wed, Mar 8, 2017 at 8:23 AM, Robert Haas  wrote:

> On Tue, Mar 7, 2017 at 9:32 PM, Amit Kapila 
> wrote:
> > On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas 
> wrote:
> >> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila 
> wrote:
> >>> Sure, if you think both Writes and Reads at OS level can have some
> >>> chance of blocking in obscure cases, then we should add a wait event
> >>> for them.
> >>
> >> I think writes have a chance of blocking even in cases that are not
> >> very obscure at all.
> >
> > Point taken for writes, but I think in general we should have some
> > criteria based on which we can decide whether to have a wait event for
> > a particular call. It should not happen that we have tons of wait
> > events of which only a few are helpful in most real-world scenarios.
>
> Well, the problem is that if you pick and choose which wait events to
> add based on what you think will be common, you're actually kind of
> hosing yourself. Because now when something uncommon happens, suddenly
> you don't get any wait event data and you can't tell what's happening.
> I think the number of new wait events added by Rushabh's patch is
> wholly reasonable.  Yeah, some of those are going to be a lot more
> common than others, but so what?  We add wait events so that we can
> find out what's going on.  I don't want to sometimes know when a
> backend is blocked on an I/O.  I want to ALWAYS know.
>
>
Yes, I agree with Robert. Knowing what we want and what we don't
want is difficult to judge. Thinking something is not useful information,
and later ending up in a situation where we have to re-think and add the
missing events, is not good. Having more information about the system,
especially for monitoring purposes, is always good.

I am attaching another version of the patch, as I found a silly mistake
in the earlier version, where I missed initializing the first value of
the WaitEventIO enum. Also, the earlier version was not applying cleanly
on the current sources.



-- 
Rushabh Lathia
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


wait_event_for_disk_IO_v2.patch
Description: application/download



Re: [HACKERS] wait events for disk I/O

2017-03-07 Thread Robert Haas
On Tue, Mar 7, 2017 at 9:32 PM, Amit Kapila  wrote:
> On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas  wrote:
>> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila  wrote:
>>> Sure, if you think both Writes and Reads at OS level can have some
>>> chance of blocking in obscure cases, then we should add a wait event
>>> for them.
>>
>> I think writes have a chance of blocking even in cases that are not
>> very obscure at all.
>
> Point taken for writes, but I think in general we should have some
> criteria based on which we can decide whether to have a wait event for
> a particular call. It should not happen that we have tons of wait
> events of which only a few are helpful in most real-world scenarios.

Well, the problem is that if you pick and choose which wait events to
add based on what you think will be common, you're actually kind of
hosing yourself. Because now when something uncommon happens, suddenly
you don't get any wait event data and you can't tell what's happening.
I think the number of new wait events added by Rushabh's patch is
wholly reasonable.  Yeah, some of those are going to be a lot more
common than others, but so what?  We add wait events so that we can
find out what's going on.  I don't want to sometimes know when a
backend is blocked on an I/O.  I want to ALWAYS know.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] wait events for disk I/O

2017-03-07 Thread Amit Kapila
On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas  wrote:
> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila  wrote:
>> Sure, if you think both Writes and Reads at OS level can have some
>> chance of blocking in obscure cases, then we should add a wait event
>> for them.
>
> I think writes have a chance of blocking even in cases that are not
> very obscure at all.
>

Point taken for writes, but I think in general we should have some
criteria based on which we can decide whether to have a wait event for
a particular call. It should not happen that we have tons of wait
events of which only a few are helpful in most real-world scenarios.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] wait events for disk I/O

2017-03-07 Thread Robert Haas
On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila  wrote:
> Sure, if you think both Writes and Reads at OS level can have some
> chance of blocking in obscure cases, then we should add a wait event
> for them.

I think writes have a chance of blocking even in cases that are not very
obscure at all.

>> I think somebody should try a pgbench run with this patch applied,
>> using a scale factor greater than shared_buffers, and generate a wait
>> event profile, just to see if these are showing up and how often.
>
> Yeah, that makes sense to me and we should try for both read-write and
> read-only tests.

wfm.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] wait events for disk I/O

2017-03-06 Thread Amit Kapila
On Tue, Mar 7, 2017 at 5:27 AM, Robert Haas  wrote:
> On Mon, Mar 6, 2017 at 3:27 AM, Rushabh Lathia  
> wrote:
>> Yes, I thought of adding a wait event only for the sync, but then
>> recorded the wait event for both write and sync. I understand that
>> OS-level writes are cheap, but they still have some cost attached.
>> Also, I thought a monitoring tool developed using these wait events
>> would capture more useful data if we try to collect as much info as we
>> can. Or maybe not.
>>
>> I am open to other opinions/suggestions.
>
> Writes are NOT always fast.  I've seen cases of write() blocking for
> LONG periods of time on systems that are in the process of failing, or
> just busy.  So I think we certainly want to advertise a wait event for
> those.
>

Sure, if you think both Writes and Reads at OS level can have some
chance of blocking in obscure cases, then we should add a wait event
for them.

> Likewise, I agree that the prefetch call probably SHOULDN'T block, but
> just because it shouldn't doesn't mean it never will.
>
> I think somebody should try a pgbench run with this patch applied,
> using a scale factor greater than shared_buffers, and generate a wait
> event profile, just to see if these are showing up and how often.
>

Yeah, that makes sense to me and we should try for both read-write and
read-only tests.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] wait events for disk I/O

2017-03-06 Thread Michael Paquier
On Tue, Mar 7, 2017 at 8:57 AM, Robert Haas  wrote:
> On Mon, Mar 6, 2017 at 3:27 AM, Rushabh Lathia  
> wrote:
>> Yes, I thought of adding a wait event only for the sync, but then
>> recorded the wait event for both write and sync. I understand that
>> OS-level writes are cheap, but they still have some cost attached.
>> Also, I thought a monitoring tool developed using these wait events
>> would capture more useful data if we try to collect as much info as we
>> can. Or maybe not.
>>
>> I am open to other opinions/suggestions.
>
> Writes are NOT always fast.  I've seen cases of write() blocking for
> LONG periods of time on systems that are in the process of failing, or
> just busy.  So I think we certainly want to advertise a wait event for
> those.

+1. I see that happen quite often, particularly on overloaded VMs. Having
to debug such systems until you notice that what is slow is just the
underlying system is no fun. So being able to surface an event state in
memory saying that things are stuck on I/O, purely through the PG
interface, can really help diagnostics without having to look at
storage-level logs for a first impression.
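
A first-pass diagnostic could then be as simple as the sketch below
(assuming the IO wait event type proposed in this patch):

-- Spot backends currently waiting on disk I/O and how long their
-- statements have been running; long runtimes here implicate storage.
SELECT pid, wait_event, state, now() - query_start AS query_runtime
FROM pg_stat_activity
WHERE wait_event_type = 'IO'
ORDER BY query_runtime DESC;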
-- 
Michael




Re: [HACKERS] wait events for disk I/O

2017-03-06 Thread Robert Haas
On Mon, Mar 6, 2017 at 3:27 AM, Rushabh Lathia  wrote:
> Yes, I thought of adding a wait event only for the sync, but then
> recorded the wait event for both write and sync. I understand that
> OS-level writes are cheap, but they still have some cost attached.
> Also, I thought a monitoring tool developed using these wait events
> would capture more useful data if we try to collect as much info as we
> can. Or maybe not.
>
> I am open to other opinions/suggestions.

Writes are NOT always fast.  I've seen cases of write() blocking for
LONG periods of time on systems that are in the process of failing, or
just busy.  So I think we certainly want to advertise a wait event for
those.

Likewise, I agree that the prefetch call probably SHOULDN'T block, but
just because it shouldn't doesn't mean it never will.

I think somebody should try a pgbench run with this patch applied,
using a scale factor greater than shared_buffers, and generate a wait
event profile, just to see if these are showing up and how often.
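
A crude way to build such a profile, sketched here assuming the patch is
applied, is to repeatedly sample pg_stat_activity from psql (e.g. with
\watch 1) and aggregate:

-- Count which wait events the backends are in right now; re-running this
-- every second gives a rough sampled profile of the dominant waits.
SELECT wait_event_type, wait_event, count(*) AS backends
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY backends DESC;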

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] wait events for disk I/O

2017-03-06 Thread Rushabh Lathia
On Sat, Mar 4, 2017 at 7:53 PM, Amit Kapila  wrote:

> On Mon, Feb 20, 2017 at 4:04 PM, Rushabh Lathia
>  wrote:
> >
> > My colleague Rahila reported compilation issue with
> > the patch. Issue was only coming with we do the clean
> > build on the branch.
> >
> > Fixed the same into latest version of patch.
> >
>
> A few assorted comments:
>
> 1.
> +
> + WriteBuffile
> + Waiting during buffile read operation.
> +
>
> The operation name and the definition do not match.
>
>
Will fix this.


>
> 2.
> +FilePrefetch(File file, off_t offset, int amount, uint32 wait_event_info)
>  {
>  #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
>   int returnCode;
> @@ -1527,8 +1527,10 @@ FilePrefetch(File file, off_t offset, int amount)
>   if (returnCode < 0)
>   return returnCode;
>
> + pgstat_report_wait_start(wait_event_info);
>   returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
> POSIX_FADV_WILLNEED);
> + pgstat_report_wait_end();
>
> AFAIK, this call is non-blocking and will just initiate a read, so do
> you think we should record a wait event for such a call?
>
>
Yes, the prefetch call is non-blocking and will just initiate a read. But
this info about the prefetch in the wait events will give more info about
the system.

> 3.
> - written = FileWrite(src->vfd, waldata_start, len);
> + written = FileWrite(src->vfd, waldata_start, len,
> + WAIT_EVENT_WRITE_REWRITE_DATA_BLOCK);
>   if (written != len)
>   ereport(ERROR,
>   (errcode_for_file_access(),
> @@ -957,7 +960,7 @@ logical_end_heap_rewrite(RewriteState state)
>   hash_seq_init(&seq_status, state->rs_logical_mappings);
>   while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != NULL)
>   {
> - if (FileSync(src->vfd) != 0)
> + if (FileSync(src->vfd, WAIT_EVENT_SYNC_REWRITE_DATA_BLOCK) != 0)
>
>
> Do we want to consider recording a wait event for both write and sync?
> It seems to me OS-level writes are relatively cheap and sync calls are
> expensive, so shouldn't we just log sync calls?  I can see similar
> usage at multiple places in the patch.
>
>
Yes, I thought of adding a wait event only for the sync, but then
recorded the wait event for both write and sync. I understand that
OS-level writes are cheap, but they still have some cost attached. Also,
I thought a monitoring tool developed using these wait events would
capture more useful data if we try to collect as much info as we can. Or
maybe not.

I am open to other opinions/suggestions.


--
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>



-- 
Rushabh Lathia


Re: [HACKERS] wait events for disk I/O

2017-03-04 Thread Amit Kapila
On Mon, Feb 20, 2017 at 4:04 PM, Rushabh Lathia
 wrote:
>
> My colleague Rahila reported a compilation issue with
> the patch. The issue only appeared when we did a clean
> build on the branch.
>
> Fixed the same in the latest version of the patch.
>

A few assorted comments:

1.
+
+ WriteBuffile
+ Waiting during buffile read operation.
+

The operation name and the definition do not match.


2.
+FilePrefetch(File file, off_t offset, int amount, uint32 wait_event_info)
 {
 #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
  int returnCode;
@@ -1527,8 +1527,10 @@ FilePrefetch(File file, off_t offset, int amount)
  if (returnCode < 0)
  return returnCode;

+ pgstat_report_wait_start(wait_event_info);
  returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
POSIX_FADV_WILLNEED);
+ pgstat_report_wait_end();

AFAIK, this call is non-blocking and will just initiate a read, so do
you think we should record a wait event for such a call?

3.
- written = FileWrite(src->vfd, waldata_start, len);
+ written = FileWrite(src->vfd, waldata_start, len,
+ WAIT_EVENT_WRITE_REWRITE_DATA_BLOCK);
  if (written != len)
  ereport(ERROR,
  (errcode_for_file_access(),
@@ -957,7 +960,7 @@ logical_end_heap_rewrite(RewriteState state)
  hash_seq_init(&seq_status, state->rs_logical_mappings);
  while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != NULL)
  {
- if (FileSync(src->vfd) != 0)
+ if (FileSync(src->vfd, WAIT_EVENT_SYNC_REWRITE_DATA_BLOCK) != 0)


Do we want to consider recording a wait event for both write and sync?
It seems to me OS-level writes are relatively cheap and sync calls are
expensive, so shouldn't we just log sync calls?  I can see similar usage
at multiple places in the patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] wait events for disk I/O

2017-02-20 Thread Rushabh Lathia
My colleague Rahila reported a compilation issue with
the patch. The issue only appeared when we did a clean
build on the branch.

Fixed the same in the latest version of the patch.

Thanks,


On Tue, Jan 31, 2017 at 11:09 AM, Rushabh Lathia 
wrote:

>
>
> On Tue, Jan 31, 2017 at 8:54 AM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>> On Mon, Jan 30, 2017 at 10:01 PM, Rushabh Lathia
>>  wrote:
>> > Attached is a patch which extends the existing wait event
>> > infrastructure to implement wait events for disk I/O. Basically,
>> > pg_stat_activity's wait event information now shows data about disk
>> > I/O as well as IPC primitives.
>> >
>> > Implementation details:
>> >
>> > - Added PG_WAIT_IO to pgstat.h and a new enum WaitEventIO.
>> > - Added a wait_event_info argument to FileRead, FileWrite, FilePrefetch,
>> > FileWriteback, FileSync, and FileTruncate. Set this wait event just
>> > before performing the file system operation and clear it just after.
>> > - Pass down an appropriate wait event from the caller of any of those
>> > functions.
>> > - Also set and clear a wait event around standalone calls to read(),
>> > write(), fsync() in other parts of the system.
>> > - Added documentation for all newly added wait events.
>>
>> Looks neat, those are unlikely to overlap with other wait events.
>>
>
> Thanks.
>
>
>>
>> > Open issue:
>> > - Might have missed a few standalone calls to read(), write(), etc.
>> > which need to pass the wait_event_info.
>>
>> It may be an idea to use LD_PRELOAD with custom versions of read(),
>> write() and fsync(), and look at the paths where no flags are set in
>> MyProc->wait_event_info, and log information when that happens.
>>
>>
> Yes, maybe I will try this.
>
>
>> > Thanks to my colleague Robert Haas for his help in design.
>> > Please let me know your thought, and thanks for reading.
>>
>> Did you consider a wrapper of the type pg_read_event() or
>> pg_write_event(), etc.?
>>
>
> I thought about that, but eventually stuck with this approach as it
> looks neater and more uniform with the other wait event implementations.
>
>
>
>> --
>> Michael
>>
>
>
>
> Thanks,
> Rushabh Lathia
> www.EnterpriseDB.com
>



-- 
Rushabh Lathia


wait_event_for_disk_IO_v1.patch
Description: application/download



Re: [HACKERS] wait events for disk I/O

2017-01-30 Thread Rushabh Lathia
On Tue, Jan 31, 2017 at 8:54 AM, Michael Paquier 
wrote:

> On Mon, Jan 30, 2017 at 10:01 PM, Rushabh Lathia
>  wrote:
> > Attached is a patch which extends the existing wait event
> > infrastructure to implement wait events for disk I/O. Basically,
> > pg_stat_activity's wait event information now shows data about disk
> > I/O as well as IPC primitives.
> >
> > Implementation details:
> >
> > - Added PG_WAIT_IO to pgstat.h and a new enum WaitEventIO.
> > - Added a wait_event_info argument to FileRead, FileWrite, FilePrefetch,
> > FileWriteback, FileSync, and FileTruncate. Set this wait event just
> > before performing the file system operation and clear it just after.
> > - Pass down an appropriate wait event from the caller of any of those
> > functions.
> > - Also set and clear a wait event around standalone calls to read(),
> > write(), fsync() in other parts of the system.
> > - Added documentation for all newly added wait events.
>
> Looks neat, those are unlikely to overlap with other wait events.
>

Thanks.


>
> > Open issue:
> > - Might have missed a few standalone calls to read(), write(), etc.
> > which need to pass the wait_event_info.
>
> It may be an idea to use LD_PRELOAD with custom versions of read(),
> write() and fsync(), and look at the paths where no flags are set in
> MyProc->wait_event_info, and log information when that happens.
>
>
Yes, maybe I will try this.


> > Thanks to my colleague Robert Haas for his help in design.
> > Please let me know your thought, and thanks for reading.
>
> Did you consider a wrapper of the type pg_read_event() or
> pg_write_event(), etc.?
>

I thought about that, but eventually stuck with this approach as it looks
neater and more uniform with the other wait event implementations.



> --
> Michael
>



Thanks,
Rushabh Lathia
www.EnterpriseDB.com


Re: [HACKERS] wait events for disk I/O

2017-01-30 Thread Michael Paquier
On Mon, Jan 30, 2017 at 10:01 PM, Rushabh Lathia
 wrote:
> Attached is a patch which extends the existing wait event infrastructure
> to implement wait events for disk I/O. Basically, pg_stat_activity's
> wait event information now shows data about disk I/O as well as IPC
> primitives.
>
> Implementation details:
>
> - Added PG_WAIT_IO to pgstat.h and a new enum WaitEventIO.
> - Added a wait_event_info argument to FileRead, FileWrite, FilePrefetch,
> FileWriteback, FileSync, and FileTruncate. Set this wait event just
> before performing the file system operation and clear it just after.
> - Pass down an appropriate wait event from the caller of any of those
> functions.
> - Also set and clear a wait event around standalone calls to read(),
> write(), fsync() in other parts of the system.
> - Added documentation for all newly added wait events.

Looks neat, those are unlikely to overlap with other wait events.

> Open issue:
> - Might have missed a few standalone calls to read(), write(), etc.
> which need to pass the wait_event_info.

It may be an idea to use LD_PRELOAD with custom versions of read(),
write() and fsync(), and look at the paths where no flags are set in
MyProc->wait_event_info, and log information when that happens.

> Thanks to my colleague Robert Haas for his help in design.
> Please let me know your thought, and thanks for reading.

Did you consider a wrapper of the type pg_read_event() or
pg_write_event(), etc.?
-- 
Michael

