On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> While chatting with Robert about this issue I came across the following
> section of code:
>
> 		/*
> 		 * If the FSM knows nothing of the rel, try the last page before we
> 		 * give up and extend.  This avoids one-tuple-per-page syndrome during
> 		 * bootstrapping or in a recently-started system.
> 		 */
> 		if (targetBlock == InvalidBlockNumber)
> 		{
> 			BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
>
> 			if (nblocks > 0)
> 				targetBlock = nblocks - 1;
> 		}
>
>
> I think that explains the issue (albeit not why it is much more frequent
> on BSDs).  Because we're not going through the FSM, it's perfectly
> possible to find a page that is uninitialized, *and* is not yet in the
> FSM.  The only reason this wasn't previously actively broken, I think, is
> that while we previously *also* looked that page (before the extending
> backend acquired a lock!), when looking at the page
> PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
> space because it just interprets the zeroes in pd_upper - pd_lower as no
> free space.

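To illustrate the last point of the quoted text, here is a minimal standalone sketch of why an all-zeroes page looks "full" to a free-space check that only computes pd_upper - pd_lower. It is not the PostgreSQL source: the sketch_ names, the cut-down two-field header, the 8192-byte page size and the 4-byte line pointer size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLCKSZ      8192	/* assumed page size */
#define SKETCH_ITEMID_SIZE 4	/* assumed size of one line pointer */

/* Hypothetical cut-down page header: only the two fields that matter here. */
typedef struct SketchPageHeader
{
	uint16_t	pd_lower;		/* offset to start of free space */
	uint16_t	pd_upper;		/* offset to end of free space */
} SketchPageHeader;

/* Free space is pd_upper - pd_lower, minus room for one new line pointer. */
size_t
sketch_page_get_free_space(const void *page)
{
	SketchPageHeader hdr;
	int			space;

	memcpy(&hdr, page, sizeof(hdr));
	space = (int) hdr.pd_upper - (int) hdr.pd_lower;

	/* On a zeroed page this is 0 - 0 = 0, so the page is reported as full. */
	if (space < SKETCH_ITEMID_SIZE)
		return 0;
	return (size_t) (space - SKETCH_ITEMID_SIZE);
}

/* Stand-in for page initialization: header at the front, the rest is free. */
void
sketch_page_init(void *page)
{
	SketchPageHeader hdr;

	memset(page, 0, SKETCH_BLCKSZ);
	hdr.pd_lower = (uint16_t) sizeof(SketchPageHeader);
	hdr.pd_upper = SKETCH_BLCKSZ;
	memcpy(page, &hdr, sizeof(hdr));
}

/* Self-check: a zeroed page reports no free space, an initialized one does. */
void
sketch_self_check(void)
{
	static unsigned char page[SKETCH_BLCKSZ];	/* static storage: all zeroes */

	assert(sketch_page_get_free_space(page) == 0);
	sketch_page_init(page);
	assert(sketch_page_get_free_space(page) > 0);
}
```

In this model the zeroed page is skipped only as a side effect of the subtraction, not because anything checks whether the page has been initialized.
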
FWIW, after commenting out the quoted block and adapting a few regression
tests to the changed plans, I could not reproduce the issue on a FreeBSD
machine in 31 runs, where it previously triggered in roughly 1/3 of
cases.  I still don't quite understand why it is so much more likely on
BSD...

Greetings,

Andres Freund