On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> While chatting with Robert about this issue I came across the following
> section of code:
> 
>               /*
>                * If the FSM knows nothing of the rel, try the last page before we
>                * give up and extend.  This avoids one-tuple-per-page syndrome during
>                * bootstrapping or in a recently-started system.
>                */
>               if (targetBlock == InvalidBlockNumber)
>               {
>                       BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
> 
>                       if (nblocks > 0)
>                               targetBlock = nblocks - 1;
>               }
> 
> 
> I think that explains the issue (albeit not why it is much more frequent
> on BSDs).  Because we're not going through the FSM, it's perfectly
> possible to find a page that is uninitialized, *and* is not yet in the
> FSM. The only reason this wasn't previously actively broken, I think, is
> that while we previously *also* looked at that page (before the extending
> backend acquired a lock!), when looking at the page,
> PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
> space because it just interprets the zeroes in pd_upper - pd_lower as no
> free space.
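
To illustrate that last point, here is a minimal standalone sketch of why an
all-zeroes page looks full. It is not the actual bufpage.c code: the Model*
names, the two-field header and the 8192-byte page size are assumptions made
up for the example, and only the pd_upper - pd_lower arithmetic is meant to
mirror what PageGetFreeSpace() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BLCKSZ 8192		/* assumed page size for this sketch */

/* Hypothetical stand-in for the pd_lower/pd_upper part of a page header. */
typedef struct ModelPageHeader
{
	uint16_t	pd_lower;		/* offset to start of free space */
	uint16_t	pd_upper;		/* offset to end of free space */
} ModelPageHeader;

/* Hypothetical stand-in for a line pointer (4 bytes in this model). */
typedef struct ModelItemId
{
	uint32_t	bits;
} ModelItemId;

typedef union ModelPage
{
	ModelPageHeader hdr;
	unsigned char bytes[MODEL_BLCKSZ];
} ModelPage;

/*
 * Free space is the gap between pd_lower and pd_upper, minus room for one
 * new line pointer.  On a page that is still all zeroes both offsets are 0,
 * so the gap is 0 and the page is reported as having no free space.
 */
size_t
model_page_get_free_space(const ModelPage *page)
{
	int			space = (int) page->hdr.pd_upper - (int) page->hdr.pd_lower;

	if (space < (int) sizeof(ModelItemId))
		return 0;
	return (size_t) space - sizeof(ModelItemId);
}

/* Model of initializing a page: free space spans header end to page end. */
void
model_page_init(ModelPage *page)
{
	assert(page != NULL);
	memset(page, 0, sizeof(*page));
	page->hdr.pd_lower = (uint16_t) sizeof(ModelPageHeader);
	page->hdr.pd_upper = (uint16_t) MODEL_BLCKSZ;
}
```

In this model, a page that has only been memset() to zero yields 0 from
model_page_get_free_space(), so a backend probing it would move on rather
than try to use it; only after model_page_init() does it report usable space.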

FWIW, after commenting out that block and adapting a few regression
tests to changed plans, I could not reproduce the issue on a FreeBSD
machine in 31 runs, where it previously triggered in roughly 1/3 of cases.

Still don't quite understand why it's so much more likely on BSD...

Greetings,

Andres Freund
