Hi,

On 1/6/23 6:41 PM, Bharath Rupireddy wrote:
On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:

On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:

I'm also wondering if it would make sense to extend the test coverage of it (and 
pg_waldump) to "validate" that both
extracted images are the same and matches the one modified right after the 
checkpoint.

What do you think? (could be done later in another patch though).

I think pageinspect can be used here. We can fetch the raw page from
the table after the checkpoint and raw FPI from the WAL record logged
as part of the update. I've tried to do so [1], but I see a slight
difference in the raw output. The expectation is that they both be the
same. It might be that the update operation logs the FPI with some
more info set (prune_xid). I'll try to see why it is so.

I'm attaching the v2 patch for further review.

[1]
SELECT * FROM page_header(:'page_from_table');
     lsn    | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
  0/1891D78 |        0 |     0 |    40 |  8064 |    8192 |     8192 |
     4 |         0
(1 row)

SELECT * FROM page_header(:'page_from_wal');
     lsn    | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
  0/1891D78 |        0 |     0 |    44 |  8032 |    8192 |     8192 |
     4 |       735
(1 row)

Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
further review. Sorry for the noise.

I took a stab at how and what gets logged as FPI in WAL records:

Option 1:
WAL record with FPI contains both the unmodified table page from the
disk after checkpoint and new tuple (not applied to the unmodified
page) and the recovery (redo) applies the new tuple to the unmodified
page as part of recovery. A bit more WAL is needed to store both
unmodified page and new tuple data in the WAL record and recovery can
get slower a bit too as it needs to stitch the modified page.

Option 2:
WAL record with FPI contains only the modified page (new tuple applied
to the unmodified page from the disk after checkpoint) and the
recovery (redo)  just returns the applied block as BLK_RESTORED.
Recovery can get faster with this approach and less WAL is needed to
store just the modified page.

My earlier understanding was that postgres does option (1), however, I
was wrong, option (2) is what actually postgres has implemented for
the obvious advantages specified.

I now made the tests a bit stricter in checking the FPI contents
(tuple values) pulled from the WAL record with raw page contents
pulled from the table using the pageinspect extension. Please see the
attached v4 patch.


Thanks for updating the patch!

+-- Compare FPI from WAL record and page from table, they must be same

I think "must be the same" or "must be identical" sounds better (but not 100% 
sure).

Except this nit, V4 looks good to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Reply via email to