Hi, On 2018-07-15 18:48:43 -0400, Tom Lane wrote: > So basically, WAL replay hits an error while holding a buffer pin, and > nothing is done to release the buffer pin, but AtProcExit_Buffers thinks > something should have been done.
I think there's a few other cases where we hit this. I've seen something similar from inside checkpointer / BufferSync(). I'd be surprised if bgwriter couldn't be triggered into the same. > What seems like a better answer for now is to adjust AtProcExit_Buffers > so that it doesn't cause an assertion failure in this case. I think we > can define "this case" as being "failure exit from the startup process": > if that happens, the postmaster is going to throw up its hands and force > a panic shutdown anyway, so the failure to free a buffer pin shouldn't > have any serious consequences. > The attached one-liner patch does that, and is enough to get through > Michael's test case without an assertion. This is just proof of concept > though --- my inclination, if we go this route, is to make a slightly > longer patch that would fix CheckForBufferLeaks to still print the WARNING > messages if any, but not die with an assertion afterwards. > > Another question is whether this is really a sufficient band-aid. It > looks to me like AtProcExit_Buffers will be called in any auxiliary > process type, not only the startup process. We could just replace the Assert() with a PANIC? > Do, or should we, force a panic restart for nonzero-exit-code failures > of all aux process types? If not, what are we going to do to clean up > similar failures in other aux process types? I'm pretty sure that we do *not* force a panic on all nonzero-exit-code cases for other subprocesses. > BTW, this assertion has been there since fe548629; before that, there > was code that would release any leaked buffer pins, relying on the > PrivateRefCount data for that. So another idea is to restore some > version of that capability, although I think we really really don't > wanna scan the PrivateRefCount array if we can avoid it. I don't think scanning PrivateRefCount would be particularly problematic - in almost all cases it's going to be tiny. These day it's not NBuffers sized anymore, so I can't forsee any performance problems? I think we could invent something like like enter/leave "transactional mode" wherein we throw a PANIC when a buffer leaked. Processes that don't enter it - like startup, checkpointer, etc - would just WARN? Greetings, Andres Freund