On 06/19/2017 10:30 AM, Andres Freund wrote:
Greg Burek from Heroku (CCed) reported a weird issue on IM, that was
weird enough to be interesting.  What he'd observed was that he promoted
some PITR standby, and early clones of that node worked, but later clones
did not, failing to read some segment.

The problem turns out to be the following:  [explanation]

Good detective work!

The minimal fix here is presumably not to use XLByteToPrevSeg() in
RemoveXlogFile(), but XLByteToSeg(). I don't quite see what purpose it
serves here - I don't think it's ever needed.

Agreed, I don't see a reason for it either.
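
To make the difference between the two macros concrete, here is a minimal standalone sketch. It is not the PostgreSQL source: the sketch_* names and the fixed 16 MB segment size are assumptions made for illustration. The point is only that the two computations disagree exactly when the position sits on a segment boundary, where the "prev" variant names the segment before it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the default 16 MB WAL segment size. */
#define SKETCH_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/*
 * Segment number containing the byte at position ptr.
 * This mirrors the idea behind XLByteToSeg().
 */
static inline uint64_t
sketch_byte_to_seg(uint64_t ptr)
{
    return ptr / SKETCH_SEG_SIZE;
}

/*
 * Segment number containing the byte just before ptr.
 * This mirrors the idea behind XLByteToPrevSeg().
 * The caller must pass ptr > 0, otherwise ptr - 1 wraps around.
 */
static inline uint64_t
sketch_byte_to_prev_seg(uint64_t ptr)
{
    return (ptr - 1) / SKETCH_SEG_SIZE;
}
```

For a position exactly at the start of segment 3, sketch_byte_to_seg() yields 3 while sketch_byte_to_prev_seg() yields 2; for any position inside a segment the two agree.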

There seems to be a larger question here though: Why does
XLogFileReadAnyTLI() probe all timelines even if they weren't a parent
at that period?  That seems like a bad idea, especially in more
complicated scenarios where some precursor timeline might live for
longer than it was a parent.  ISTM XLogFileReadAnyTLI() should check
which timeline a segment ought to come from, based on the history?

Yeah. I've had that thought for years as well, but there has never been any pressing reason to bite the bullet and rewrite it, so I haven't gotten around to it.
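
Just to illustrate the idea of a history-based check, a rough sketch follows. It is not a patch and not the existing code: SketchHistoryEntry and sketch_tli_of_point() are hypothetical names, and the history is assumed to be a plain array of (timeline, begin, end) ranges. A real implementation would also have to deal with the segment that contains a switch point, which this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical history entry: timeline tli covers [begin, end). */
typedef struct SketchHistoryEntry
{
    uint32_t    tli;
    uint64_t    begin;      /* inclusive start position */
    uint64_t    end;        /* exclusive end; UINT64_MAX if still current */
} SketchHistoryEntry;

/*
 * Return the timeline that the given history says owns position ptr.
 * Returns 0, used here as a "not found" marker, if no entry covers ptr.
 */
static inline uint32_t
sketch_tli_of_point(const SketchHistoryEntry *history, size_t n, uint64_t ptr)
{
    for (size_t i = 0; i < n; i++)
    {
        if (ptr >= history[i].begin && ptr < history[i].end)
            return history[i].tli;
    }
    return 0;
}
```

With a history of timeline 1 covering [0, 1000), timeline 2 covering [1000, 5000) and timeline 3 from 5000 onwards, a lookup for position 1500 returns 2. A file that timeline 1 kept writing past position 1000 would then never be probed for that position, which is the case a probe-everything approach gets wrong.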

- Heikki

