On 21 September 2017 at 16:56, Michael Paquier <michael.paqu...@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada <sawada.m...@gmail.com> > wrote: > > On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier > > <michael.paqu...@gmail.com> wrote: > >> On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada <sawada.m...@gmail.com> > wrote: > >>> The bug can happen in PostgreSQL 9.1 or higher that non-exclusive > >>> backup has been introduced, so we should back-patch to the all > >>> supported versions. > >> > >> There is a typo here right? Non-exclusive backups have been introduced > >> in 9.6. Why would a back-patch further down be needed? > > > > I think the non-exclusive backups infrastructure has been introduced > > in 9.1 for pg_basebackup. I've not checked yet that it can be > > reproduced using pg_basebackup in PG9.1 but I thought it could happen > > as far as I checked the code. > > Yep, but the deficiency is caused by the use before_shmem_exit() in > the SQL-level functions present in 9.6~ which make the cleanup happen > potentially twice. If you look at the code of basebackup.c, you would > notice that the cleanups of the counters only happen within the > PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be > enough. > > >> +- Assert(XLogCtl->Insert.nonExclusiveBackups >= 0); > >> + if (XLogCtl->Insert.nonExclusiveBackups > 0) > >> + XLogCtl->Insert.nonExclusiveBackups--; > >> Hm, no, I don't agree. I think that instead you should just leave > >> do_pg_abort_backup() immediately if sessionBackupState is set to > >> SESSION_BACKUP_NONE. This variable is the link between the global > >> counters and the session stopping the backup so I don't think that we > >> should touch this assertion of this counter. I think that this method > >> would be safe as well for backup start as pg_start_backup_callback > >> takes care of any cleanup. Also because the counters are incremented > >> before entering in the PG_ENSURE_ERROR_CLEANUP block, and > >> sessionBackupState is updated just after leaving the block. > > > > I think that the assertion failure still can happen if the process > > aborts after decremented the counter and before setting to > > SESSION_BACKUP_NONE. Am I missing something? > > The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger > the cleanup at this moment. So this happens when waiting for the > archives to be done, and the session flag is set to NONE at this > point. > Another one to watch out for is that elog(...) and ereport(...) invoke CHECK_FOR_INTERRUPTS. That's given me exciting surprises before when combined with assertion checking and various exit cleanup hooks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services