On Fri, Apr 6, 2018 at 11:34 AM, Andrew Gierth <and...@tao11.riddles.org.uk> wrote:
>>>>>> "Thomas" == Thomas Munro <thomas.mu...@enterprisedb.com> writes:
>
>  >> As far as I can tell from reading the code, if a checkpoint fails the
>  >> checkpointer is supposed to keep all the outstanding fsync requests for
>  >> next time. Am I wrong, or is there some failure in the logic to do this?
>
>  Thomas> Yikes. I think this is suspicious:
>
> Yes, tracing through a checkpoint shows that this is clearly wrong.
>
>  Thomas> Why is it OK to unlink the bitmapset? We still need its
>  Thomas> contents, in the case that the fsync fails!
>
> Right.
>
> But I don't think just copying the value is sufficient; if a new bit was
> set while we were processing the old ones, how would we know which to
> clear? We couldn't just clear all the bits afterwards because then we
> might lose a request.

Agreed. The attached draft patch handles that correctly, I think.

--
Thomas Munro
http://www.enterprisedb.com

draft.patch
Description: Binary data
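
For anyone reading without the attachment, here is a minimal sketch of one way to avoid the problem discussed above. It is not the contents of draft.patch and not PostgreSQL code; `PendingEntry`, `remember_request`, `begin_sync`, `sync_pass` and the demo callbacks are hypothetical names, and a 64-bit mask stands in for the bitmapset. The idea is to keep two sets per entry: requests that arrived since the pass began, and requests the current (or a previously failed) pass still owes. New bits only ever go into the first set, and bits are cleared from the second set one at a time, only after their fsync succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry in the pending-fsync table. */
typedef struct PendingEntry
{
    uint64_t requests;    /* segments requested since the current pass began */
    uint64_t in_progress; /* segments the current or a failed pass still owes */
} PendingEntry;

/* Record a new fsync request; it may arrive while a pass is running. */
static void
remember_request(PendingEntry *e, unsigned seg)
{
    assert(seg < 64);
    e->requests |= UINT64_C(1) << seg;
}

/* Start a pass: merge new requests into any left over from a failed pass. */
static void
begin_sync(PendingEntry *e)
{
    e->in_progress |= e->requests;
    e->requests = 0;
}

/*
 * Run one pass.  A bit is cleared only after its fsync succeeds, so a
 * failure leaves the unfinished bits in in_progress for the next attempt,
 * and requests that arrived meanwhile stay untouched in requests.
 */
static bool
sync_pass(PendingEntry *e, bool (*fsync_seg) (PendingEntry *, unsigned))
{
    begin_sync(e);
    for (unsigned seg = 0; seg < 64; seg++)
    {
        uint64_t bit = UINT64_C(1) << seg;

        if ((e->in_progress & bit) == 0)
            continue;
        if (!fsync_seg(e, seg))
            return false;       /* keep this bit and all later ones */
        e->in_progress &= ~bit;
    }
    return true;
}

/* Demo callback: a request for segment 5 arrives mid-pass; segment 3 fails. */
static bool
flaky_fsync(PendingEntry *e, unsigned seg)
{
    remember_request(e, 5);
    return seg != 3;
}

/* Demo callback: every fsync succeeds. */
static bool
good_fsync(PendingEntry *e, unsigned seg)
{
    (void) e;
    (void) seg;
    return true;
}
```

With requests pending for segments 1 and 3, a pass using `flaky_fsync` fails but leaves segment 3 in `in_progress` and the newly arrived segment 5 in `requests`; a later pass using `good_fsync` picks up both, so no request is lost and nothing is cleared that was not actually synced.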