On Tue, May 18, 2021 at 12:22 PM Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> And finally I think I could reach the situation the commit wanted to fix.
>
> I took a basebackup from a standby just before replaying the first
> checkpoint of the new timeline (by using debugger), without copying
> pg_wal. In this backup, the control file contains checkPointCopy of
> the previous timeline.
>
> I modified StartXLOG so that expectedTLEs is set just after first
> determining recoveryTargetTLI, then started the grandchild node. I
> have the following error and the server fails to continue replication.
>
> [postmaster] LOG: starting PostgreSQL 14beta1 on x86_64-pc-linux-gnu...
> [startup] LOG: database system was interrupted while in recovery at log...
> [startup] LOG: set expectedtles tli=6, length=1
> [startup] LOG: Probing history file for TLI=7
> [startup] LOG: entering standby mode
> [startup] LOG: scanning segment 3 TLI 6, source 0
> [startup] LOG: Trying fetching history file for TLI=6
> [walreceiver] LOG: fetching timeline history file for timeline 5 from pri...
> [walreceiver] LOG: fetching timeline history file for timeline 6 from pri...
> [walreceiver] LOG: started streaming ... primary at 0/3000000 on timeline 5
> [walreceiver] DETAIL: End of WAL reached on timeline 5 at 0/30006E0.
> [startup] LOG: unexpected timeline ID 1 in log segment
> 000000050000000000000003, offset 0
> [startup] LOG: Probing history file for TLI=7
> [startup] LOG: scanning segment 3 TLI 6, source 0
> (repeats forever)

So IIUC, this log shows that "ControlFile->checkPointCopy.ThisTimeLineID"
is 6 but the "ControlFile->checkPoint" record is on TL 5? I think if you
had the old version of the code (before the commit), or the code below [1],
right after initializing expectedTLEs, then you would have hit the FATAL
that the patch fixed.

While debugging, did you check what the "ControlFile->checkPoint" LSN was
vs. the first LSN of the first segment with TL6?

expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
[1]
if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) !=
    ControlFile->checkPointCopy.ThisTimeLineID)
{
    report(FATAL..
}

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
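
For illustration only, the comparison in [1] can be modelled outside the server. The following is a minimal standalone sketch, not the PostgreSQL source: TimeLineEntry, tli_of_point() and checkpoint_tli_matches() are hypothetical names, the history is a plain array rather than the list returned by readTimeLineHistory(), and any LSNs passed in are made-up values. It assumes each history entry covers the half-open range [begin, end), with 0 meaning "unbounded" on that side.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* simplified stand-in for an LSN */
typedef uint32_t TimeLineID;

#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical, simplified timeline history entry. */
typedef struct
{
    TimeLineID tli;
    XLogRecPtr begin;   /* inclusive; 0 means "from the very start" */
    XLogRecPtr end;     /* exclusive; 0 means "still open" */
} TimeLineEntry;

/*
 * Return the timeline whose [begin, end) range contains ptr,
 * or 0 if no entry covers it (history not contiguous).
 */
TimeLineID
tli_of_point(XLogRecPtr ptr, const TimeLineEntry *history, int n)
{
    for (int i = 0; i < n; i++)
    {
        const TimeLineEntry *e = &history[i];

        if ((e->begin == InvalidXLogRecPtr || e->begin <= ptr) &&
            (e->end == InvalidXLogRecPtr || ptr < e->end))
            return e->tli;
    }
    return 0;
}

/*
 * Model of the check in [1]: does the timeline that the history assigns
 * to the checkpoint LSN agree with the timeline stored in the control file?
 */
bool
checkpoint_tli_matches(XLogRecPtr checkpoint_lsn, TimeLineID control_tli,
                       const TimeLineEntry *history, int n)
{
    return tli_of_point(checkpoint_lsn, history, n) == control_tli;
}
```

In this model, a history of {TLI 6 from 0/30006E0, TLI 5 up to 0/30006E0} maps a checkpoint LSN below the switch point to timeline 5, so comparing it against a control-file timeline of 6 reports a mismatch. A single-entry history with both bounds open maps every LSN to that one timeline, so the comparison cannot fail there.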