On Wed, Dec 9, 2015 at 10:41 AM, Andres Freund <and...@anarazel.de> wrote:
>> (I am glad you talked the author out of back-patching; otherwise,
>> 9.4.5 and 9.3.10 would have introduced a data loss bug.)
>
> Isn't that a bug in a, as far as we know, impossible scenario? Unless I
> miss something there's no known case where it's "expected" that
> find_multixact_start() fails after initially succeeding? Sure, it sucks
> that the bug survived review and that it was written in the first
> place. But it not showing up during testing isn't meaningful, given it's
> a should-never-happen scenario.

If I correctly understand the scenario that you are describing, that
does happen - not for the same MXID, but for different ones. At least
the last time I checked, and I'm not sure if we've fixed this, it could
happen because the SLRU page that contains the multixact wasn't flushed
out of the SLRU buffers yet. But apart from that, it could happen any
time there's a gap in the sequence of files, and that sure doesn't seem
like a can't-happen situation. We know that, on 9.3, there's definitely
a sequence of events that leads to a 0000 file followed by a gap
followed by the series of files that are still live. Given the number
of other bugs we've fixed in this area, I would not like to bet on that
being the only scenario where this crops up.

It *shouldn't* happen, and as far as we know, if you start and end on a
version newer than 4f627f8 and aa29c1c, it won't. Older branches,
though, I wouldn't like to bet on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
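
To make the "0000 file followed by a gap" layout concrete, here is a minimal sketch in Python, not PostgreSQL code. It assumes segment files carry hexadecimal names such as 0000 and 0005; the function name find_segment_gaps and the sample listing are hypothetical, and wraparound of the segment number space is deliberately ignored.

```python
def find_segment_gaps(segment_names):
    """Return (prev, next) pairs of adjacent segment numbers that are not consecutive.

    segment_names: iterable of hexadecimal file names such as "0000", "0005".
    Wraparound of the segment number space is not handled in this sketch.
    """
    # Parse the hex names, drop duplicates, and order them numerically.
    numbers = sorted({int(name, 16) for name in segment_names})
    # Any adjacent pair whose difference is not exactly 1 marks a gap.
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b != a + 1]


# Hypothetical listing: a stale 0000 file, a gap, then the live files.
listing = ["0000", "0005", "0006", "0007"]
for a, b in find_segment_gaps(listing):
    print(f"gap between {a:04X} and {b:04X}")
```

For the sample listing this reports one gap, between 0000 and 0005. Any lookup that expects the files to be contiguous would land in that hole, which is the kind of failure being discussed.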