On 06/06/18 04:04, Michael Paquier wrote: > On Tue, Jun 05, 2018 at 01:00:30PM +0200, Petr Jelinek wrote: >> I didn't say anything about CreateDecodingContext though. I am talking >> about the fact that we then use the same variable as input to >> XLogReadRecord later in the logical slot code path. The whole point of >> having restart_lsn for logical slot is to have correct start point for >> XLogReadRecord so using the confirmed_flush there is wrong (and has been >> wrong since the original commit). > > OK, I finally see the point you are coming at after a couple of hours > brainstorming about it, and indeed using confirmed_lsn is logically > incorrect so the current code is inconsistent with what > DecodingContextFindStartpoint() does, so we rely on restart_lsn to scan > for all the records and the decoder state is here to make sure that the > slot's confirmed_lsn is updated to a consistent position. I have added > your feedback in the attached (indented and such), which results in some > simplifications with a couple of routines. > > I am attaching as well the patch I sent yesterday. 0001 is candidate > for a back-patch, 0002 is for HEAD to fix the slot advance stuff. > > There is another open item related to slot advancing here: > https://www.postgresql.org/message-id/2840048a-1184-417a-9da8-3299d207a1d7%40postgrespro.ru > And per my tests the patch is solving this item as well. I have just > used the test mentioned in the thread which: > 1) creates a slot > 2) does some activity to generate a couple of WAL pages > 3) advances the slot at page boundary > 4) Moves again the slot. > This test crashes on HEAD at step 4, and not with the attached. > > What do you think?
Seems reasonable to me. I think the only thing to note about the patches from my side is that we probably don't want to default to restart_lsn for the pg_logical_replication_slot_advance() return value (when nothing was done) but rather the confirmed_lsn. As it is in current patch if we call the function repeatedly and one call moved slot forward but the next one didn't the return value will go backwards as restart_lsn tends to be behind the confirmed one. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services