On Wed, Apr 27, 2022 at 8:57 AM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:
>
> On Wed, Apr 27, 2022 at 8:45 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> >
> > I wrote:
> > > Thomas Munro <thomas.mu...@gmail.com> writes:
> > >> BTW If you had your local change from debug.patch (upthread), that'd
> > >> defeat the patch. I mean this:
> > >
> > >> +    if(!*errormsg)
> > >> +        *errormsg = "decode_queue_head is null";
> > >
> > > Oh! Okay, I'll retry without that.
> >
> > I've now done several runs with your patch and not seen the test failure.
> > However, I think we ought to rethink this API a bit rather than just
> > apply the patch as-is. Even if it were documented, relying on
> > errormsg = NULL to mean something doesn't seem like a great plan.
>
> Sorry for being late in the game, occupied with other stuff.
>
> How about using private_data of XLogReaderState for
> read_local_xlog_page_no_wait, something like this?
>
> typedef struct ReadLocalXLogPageNoWaitPrivate
> {
>     bool end_of_wal;
> } ReadLocalXLogPageNoWaitPrivate;
>
> In read_local_xlog_page_no_wait:
>
>     /* If asked, let's not wait for future WAL. */
>     if (!wait_for_wal)
>     {
>         private_data->end_of_wal = true;
>         break;
>     }
>
>     /*
>      * Opaque data for callbacks to use. Not used by XLogReader.
>      */
>     void *private_data;
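
To make the quoted idea a bit more concrete, here is a standalone sketch. It is not PostgreSQL code: MockReaderState, mock_read_page_no_wait, mock_read_record and ReadResult are made-up names, and LSNs are simplified to plain longs. Only the private_data/end_of_wal shape comes from the proposal above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for XLogReaderState; only the field used here. */
typedef struct MockReaderState
{
    void *private_data;     /* opaque data for callbacks to use */
} MockReaderState;

/* Same shape as the struct proposed in the quoted mail. */
typedef struct ReadLocalXLogPageNoWaitPrivate
{
    bool end_of_wal;
} ReadLocalXLogPageNoWaitPrivate;

/* Hypothetical result codes, for illustration only. */
typedef enum { READ_OK, READ_END_OF_WAL, READ_ERROR } ReadResult;

/*
 * Hypothetical page-read callback. Returns the number of bytes available
 * at 'target', or -1 if 'target' is at or past 'flushed_upto'. In the
 * latter case it does not wait; it records end-of-WAL in private_data.
 */
int
mock_read_page_no_wait(MockReaderState *state, long target, long flushed_upto)
{
    ReadLocalXLogPageNoWaitPrivate *priv = state->private_data;

    assert(priv != NULL);

    if (target >= flushed_upto)
    {
        priv->end_of_wal = true;
        return -1;
    }
    return (int) (flushed_upto - target);
}

/*
 * Hypothetical caller. A failed read is reported as end-of-WAL only when
 * the callback set the flag, so no meaning is attached to a NULL errormsg.
 */
ReadResult
mock_read_record(MockReaderState *state, long target, long flushed_upto)
{
    ReadLocalXLogPageNoWaitPrivate *priv = state->private_data;

    if (mock_read_page_no_wait(state, target, flushed_upto) >= 0)
        return READ_OK;

    return priv->end_of_wal ? READ_END_OF_WAL : READ_ERROR;
}
```

The point of the sketch is only that the caller checks an explicit flag set by the callback, rather than inferring end-of-WAL from the error message being NULL.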

I found an easy way to reproduce this consistently (I think on any
server):

I basically generated a huge WAL record (I used a fun extension that I
wrote - https://github.com/BRupireddy/pg_synthesize_wal, but one can
use pg_logical_emit_message as well).

session 1:
select * from pg_synthesize_wal_record(1*1024*1024); --> generate a 1 MB
WAL record first and make a note of the output lsn (lsn1)

session 2:
select * from pg_get_wal_records_info_till_end_of_wal(lsn1);
\watch 1

session 1:
select * from pg_synthesize_wal_record(1000*1024*1024); --> generate a
~1 GB WAL record, and we see "ERROR: could not read WAL at XXXXX" in
session 2.

Delay the checkpoint (set checkpoint_timeout to 1hr) just so that the WAL
files are not recycled while we run the pg_walinspect functions; no other
changes are required from the default initdb settings on the server.

And, Thomas's patch fixes the issue.

Regards,
Bharath Rupireddy.