On Fri, Apr 19, 2024 at 1:32 PM Nazir Bilal Yavuz <byavu...@gmail.com> wrote: > > > I wanted to inform you that the 73f0a13266 commit changed all WALRead > > calls to read variable bytes, only the WAL receiver was reading > > variable bytes before. > > I want to start working on this again if possible. I will try to > summarize the current status:
Thanks for working on this. > * With the 73f0a13266 commit, the WALRead() function started to read > variable bytes in every case. Before, only the WAL receiver was > reading variable bytes. > > * With the 91f2cae7a4 commit, WALReadFromBuffers() is merged. We were > discussing what we have to do when this is merged. It is decided that > WALReadFromBuffers() does not call pgstat_report_wait_start() because > this function does not perform any IO [1]. We may follow the same > logic by not including these to pg_stat_io? Right. WALReadFromBuffers doesn't do any I/O. Whoever reads WAL from disk (backends, walsenders, recovery process) using pg_pread (XLogPageRead, WALRead) needs to be tracked in pg_stat_io or some other view. If it were to be in pg_stat_io, although we may not be able to distinguish WAL read stats at a backend level (like how many times/bytes a walsender or recovery process or a backend read WAL from disk), but it can help understand overall impact of WAL read I/O at a cluster level. With this approach, the WAL I/O stats are divided up - WAL read I/O and write I/O stats are in pg_stat_io and pg_stat_wal respectively. This makes me think if we need to add WAL read I/O stats also to pg_stat_wal. Then, we can also add WALReadFromBuffers stats hits/misses there. With this approach, pg_stat_wal can be a one-stop view for all the WAL related stats. If needed, we can join info from pg_stat_wal to pg_stat_io in system_views.sql so that the I/O stats are emitted to the end-user via pg_stat_io. > * With the b5a9b18cd0 commit, streaming I/O is merged but AFAIK this > does not block anything related to putting WAL stats in pg_stat_io. > > If I am not missing any new changes, the only problem is reading > variable bytes now. We have discussed a couple of solutions: > > 1- Change op_bytes to something like -1, 0, 1, NULL etc. and document > that this means some variable byte I/O is happening. > > I kind of dislike this solution because if the *only* read I/O is > happening in variable bytes, it will look like write and extend I/Os > are happening in variable bytes as well. As a solution, it could be > documented that only read I/Os could happen in variable bytes for now. Yes, read I/O for relation and WAL can happen in variable bytes. I think this idea seems reasonable and simple yet useful to know the cluster-wide read I/O. However, another point here is how the total number of bytes read is represented with existing pg_stat_io columns 'reads' and 'op_bytes'. It is known now with 'reads' * 'op_bytes', but with variable bytes, how is read bytes calculated? Maybe add new columns read_bytes/write_bytes? > 2- Use op_bytes_[read | write | extend] columns instead of one > op_bytes column, also use the first solution. > > This can solve the first solution's weakness but it introduces two > more columns. This is more future proof compared to the first solution > if there is a chance that some variable I/O could happen in write and > extend calls as well in the future. -1 as more columns impact the readability and usability. > 3- Create a new pg_stat_io_wal view to put WAL I/Os here instead of > pg_stat_io. > > pg_stat_io could remain untouchable and we will have flexibility to > edit this new view as much as we want. But the original aim of the > pg_stat_io is evaluating all I/O from a single view and adding a new > view breaks this aim. -1 as it defeats the very purpose of one-stop view pg_stat_io for all kinds of I/O. PS: see my response above about adding both WAL write I/O and read I/O stats to pg_stat_wal. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com