On Wed, Dec 9, 2020 at 6:35 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> On 08/12/2020 06:45, Kyotaro Horiguchi wrote: > > At Mon, 7 Dec 2020 20:13:25 +0200, Heikki Linnakangas <hlinn...@iki.fi> > wrote in > >> I think we should fix this properly. I'm not sure if it can lead to a > >> broken cluster, but at least it can cause pg_rewind to fail > >> unnecessarily and in a user-unfriendly way. But this is actually > >> pretty simple to fix. pg_rewind looks at the control file to find out > >> the timeline the server is on. When promotion happens, the startup > >> process updates minRecoveryPoint and minRecoveryPointTLI fields in the > >> control file. We just need to read it from there. Patch attached. > > > > Looks fine to me. A bit concerned about making sourceHistory > > needlessly file-local but on the other hand unifying sourceHistory and > > targetHistory looks better. > > Looking closer, findCommonAncestorTimeline() was freeing sourceHistory, > which was pretty horrible when it's a file-local variable. I changed it > so that both the source and target histories are passed to > findCommonAncestorTimeline() as arguments. That seems more clear. > > > For the test part, that change doesn't necessariry catch the failure > > of the current version, but I *believe* the prevous code is the result > > of an actual failure in the past so the test probablistically (or > > dependently on platforms?) hits the failure if it happned. > > Right. I think the current test coverage is good enough. We've been > bitten by this a few times by now, when we've forgotten to add the > manual checkpoint commands to new tests, and the buildfarm has caught it > pretty quickly. > > >> I think we should also backpatch this. Back in 2015, we decided that > >> we can live with this, but it's always been a bit bogus, and seems > >> simple enough to fix. > > > > I don't think this changes any successful behavior and it just saves > > the failure case so +1 for back-patching. > > Thanks for the review! New patch version attached. > > - Heikki > The patch does not apply successfully http://cfbot.cputube.org/patch_32_2864.log 1 out of 10 hunks FAILED -- saving rejects to file src/bin/pg_rewind/pg_rewind.c.rej There is a minor issue therefore I rebase the patch. Please take a look at that. -- Ibrar Ahmed
v3-0001-pg_rewind-Fix-determining-TLI-when-server-was-jus.patch
Description: Binary data