>> On Thu, 2026-01-15 at 14:14 +0800, [email protected] wrote:
>> Now if we run pg_rewind on server A, it examines the local WAL to find all the blocks
>> that were modified after the last common checkpoint (which happened in step 3 above).
>> If neither wal_log_hints = on nor checksums are enabled (which effectively forces
>> WAL-logging hint bit changes), there is no track of step 5 in the WAL, and pg_rewind
>> fails to copy that block from server B. The consequence is that after pg_rewind, the
>> row is *still* visible on server A because of the hint bits. That is data corruption.
>> Therefore, the requirement cannot be relaxed.

> Currently pg_rewind searches WAL starting at the checkpoint LSN or redo LSN. I mean to
> search more WAL to cover the related transactions in full, so that every related page
> is copied and we never have to worry about the hint bits issue.

Based on the discussion, I wrote a patch. Here is an introduction to it.

Currently pg_rewind starts at divergerec and walks backward to find the checkpoint.
Then it collects the changed pages from the checkpoint forward to divergerec.

The patch modifies the second step. While scanning forward, it also collects the
minimal committed transaction ID, named min_commited_xid, and the 'first appeared'
transaction ID taken from the XLOG_RUNNING_XACTS WAL record, named base_xid. If
base_xid <= min_commited_xid, we can do a safe rewind.

However, if 'base_xid <= min_commited_xid' is not met, then as a third step we read
WAL from the checkpoint and walk backward until the goal is met. Of course we keep
collecting changed pages during this third step. If the goal still cannot be met in
the end, we report an error because the rewind cannot be completed.

The third step may be slow, so I added an option (-d or --deep-dig). By default,
pg_rewind stops if the goal cannot be met in the second step, and the user has to
add -d to run the third step.

Patch attached.

----
Best Regards,
Movead Li
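
For readers who prefer code to prose, the three steps described above can be sketched roughly as below. This is only a model in Python, not code from the patch: the WalRecord type, the record kinds, the function plan_rewind and the exception RewindNotSafe are all hypothetical names made up for illustration. It also assumes that base_xid is the oldest running XID reported by a running-xacts record, that a scan with no commits trivially satisfies the goal, and that XIDs can be compared as plain integers (no wraparound handling).

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Page = Tuple[str, int]  # (relation name, block number), hypothetical shape


class RewindNotSafe(Exception):
    """The examined WAL does not satisfy base_xid <= min_commited_xid."""


@dataclass(frozen=True)
class WalRecord:
    # Simplified stand-in for a WAL record, not PostgreSQL's real layout.
    lsn: int
    kind: str  # "page_change", "commit" or "running_xacts"
    xid: Optional[int] = None
    page: Optional[Page] = None
    oldest_running_xid: Optional[int] = None


def goal_met(base_xid: Optional[int], min_commited_xid: Optional[int]) -> bool:
    # Assumption: without any running-xacts record we cannot prove safety;
    # without any commit there is nothing the scan could have missed.
    if base_xid is None:
        return False
    return min_commited_xid is None or base_xid <= min_commited_xid


def plan_rewind(wal: List[WalRecord], checkpoint_lsn: int, diverge_lsn: int,
                deep_dig: bool = False) -> Set[Page]:
    """Return the pages to copy, or raise RewindNotSafe."""
    records = sorted(wal, key=lambda r: r.lsn)
    pages: Set[Page] = set()
    base_xid: Optional[int] = None
    min_commited_xid: Optional[int] = None  # spelling as in the mail

    # Step 2: forward scan from the checkpoint to the divergence point.
    for rec in records:
        if not (checkpoint_lsn <= rec.lsn <= diverge_lsn):
            continue
        if rec.kind == "page_change":
            pages.add(rec.page)
        elif rec.kind == "commit":
            if min_commited_xid is None or rec.xid < min_commited_xid:
                min_commited_xid = rec.xid
        elif rec.kind == "running_xacts" and base_xid is None:
            base_xid = rec.oldest_running_xid  # the 'first appeared' xid

    if goal_met(base_xid, min_commited_xid):
        return pages
    if not deep_dig:
        raise RewindNotSafe("goal not met in step 2; retry with deep_dig")

    # Step 3 (deep dig): walk backward from the checkpoint, still collecting
    # changed pages, and stop as soon as the goal is met.
    for rec in reversed(records):
        if rec.lsn >= checkpoint_lsn:
            continue
        if rec.kind == "page_change":
            pages.add(rec.page)
        elif rec.kind == "commit":
            if min_commited_xid is None or rec.xid < min_commited_xid:
                min_commited_xid = rec.xid
        elif rec.kind == "running_xacts":
            base_xid = rec.oldest_running_xid
        if goal_met(base_xid, min_commited_xid):
            return pages

    raise RewindNotSafe("ran out of WAL before the goal was met")
```

In this model, calling plan_rewind with deep_dig left at False corresponds to running without -d: it raises as soon as the forward scan cannot establish the condition, while deep_dig=True keeps reading older WAL and returns a larger page set.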
0001-Enable-pg_rewind-without-page-consistence.patch
