On Thu, Oct 9, 2025 at 3:09 PM Srinath Reddy Sadipiralla
<[email protected]> wrote:
> Just a second late :( I was about to post a patch addressing the refactors
> which Robert mentioned. Anyway, I will have a look at your latest patch, John,
> thanks :). Curious about the TAP test.
>
> While I was writing the patch, something suddenly struck me: why are we
> even depending on last_common_segno? Once we reach
> decide_wal_file_action, the file exists in both target and
> source. AFAIK this can only happen with WAL segments older than or equal to
> last_common_segno, because once the promotion completes, the filenames of
> the WAL files change to use the new timeline ID (2). For example, if
> last_common_segno is 000000010000000000000003, then based on the rules in
> XLogInitNewTimeline:
> 1) If the timeline switch happens in the middle of a segment, copy data from
> the last WAL segment and create a WAL file with the same segno but a
> different timeline ID; in this case the starting WAL file for the new
> timeline will be 000000020000000000000003.
> 2) If the timeline switch happens at a segment boundary, just create the next
> segment; in this case the starting WAL file for the new timeline will be
> 000000020000000000000004.
>
> So basically, the files which exist in source and not in target, like the new
> timeline's WAL segments, will be copied to target in full before we reach
> decide_wal_file_action. So I don't think we need to worry about copying WAL
> files after the divergence point by calculating and checking against
> last_common_segno, as our current approach does. I think we can
> just do

What makes me nervous about this is that it isn't necessarily the case
that the servers were perfectly in sync at the time of the failure.
Suppose that the primary was in the middle of writing
000000010000000000000003. The standby might also have this file, but
it might contain less valid data than the one on the primary;
therefore, if we don't copy the file, the two servers might not have
an identical file. Maybe that wouldn't really matter, in the sense
that the extra valid data that exists on the original primary
shouldn't prevent it from replaying WAL on the new primary's timeline,
which is probably all we really care about. But it feels dangerous to
me.
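
To make the file names in the quoted example concrete, here is a rough
Python sketch of how a WAL segment name is built from a timeline ID and a
segment number, and which segment would start the new timeline in the two
cases described above. It assumes the default 16MB segment size, and the
helper names (wal_file_name, first_segment_on_new_timeline) are made up for
illustration; this is not the server code. Note that the name says nothing
about how much valid data a segment holds, which is the scenario I'm
worried about.

```python
# Illustrative sketch only; helper names are hypothetical, not PostgreSQL APIs.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default wal_segment_size (16MB)


def wal_file_name(tli: int, segno: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build a 24-hex-digit WAL file name: timeline, then segno split in two."""
    segments_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (
        tli,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


def first_segment_on_new_timeline(
    new_tli: int, switch_lsn: int, seg_size: int = WAL_SEGMENT_SIZE
) -> str:
    """Name of the first WAL file on the new timeline, per the two quoted rules."""
    # Segment containing the last byte written on the old timeline.
    end_segno = (switch_lsn - 1) // seg_size
    if switch_lsn % seg_size == 0:
        # Rule 2: switch at a segment boundary, so start with the next segment.
        start_segno = end_segno + 1
    else:
        # Rule 1: switch mid-segment, so reuse the segno under the new timeline.
        start_segno = end_segno
    return wal_file_name(new_tli, start_segno, seg_size)


if __name__ == "__main__":
    mid_segment_lsn = 3 * WAL_SEGMENT_SIZE + 100  # somewhere inside segment 3
    boundary_lsn = 4 * WAL_SEGMENT_SIZE           # exactly at the end of segment 3
    print(wal_file_name(1, 3))
    print(first_segment_on_new_timeline(2, mid_segment_lsn))
    print(first_segment_on_new_timeline(2, boundary_lsn))
```

In both cases 000000010000000000000003 can exist on source and target under
the same name, whether or not the two copies contain the same amount of
valid WAL.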

-- 
Robert Haas
EDB: http://www.enterprisedb.com

