On Thu, Mar 26, 2020 at 11:44:24PM -0500, Justin Pryzby wrote: > On Fri, Mar 27, 2020 at 09:49:29AM +0530, Amit Kapila wrote: > > On Fri, Mar 27, 2020 at 3:47 AM Justin Pryzby <pry...@telsasoft.com> wrote: > > > > > > > Hm, I was just wondering what happens if an error happens *during* > > > > update_vacuum_error_cbarg(). It seems like if we set > > > > errcbarg->phase=VACUUM_INDEX before setting errcbarg->indname=indname, > > > > then an > > > > error would cause a crash. > > > > > > > > Can't that be avoided if you check if cbarg->indname is non-null in > > vacuum_error_callback as we are already doing for > > VACUUM_ERRCB_PHASE_TRUNCATE? > > > > > > And if we pfree and set indname before phase, it'd > > > > be a problem when going from an index phase to non-index phase. > > > > How is it possible that we move to the non-index phase without > > clearing indname as we always revert back the old phase information? > > The crash scenario I'm trying to avoid would be like statement_timeout or > other > asynchronous event occurring between two non-atomic operations. > > I said that there's an issue no matter what order we set indname/phase; > If we wrote: > |cbarg->indname = indname; > |cbarg->phase = phase; > ..and hit a timeout (or similar) between setting indname=NULL but before > setting phase=VACUUM_INDEX, then we can crash due to null pointer. > > But if we write: > |cbarg->phase = phase; > |if (cbarg->indname) {pfree(cbarg->indname);} > |cbarg->indname = indname ? pstrdup(indname) : NULL; > ..then we can still crash if we timeout between freeing cbarg->indname and > setting it to null, due to acccessing a pfreed allocation.
If "phase" is updated before "indname", I'm able to induce a synthetic crash like this: +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname==NULL) +{ +kill(getpid(), SIGINT); +pg_sleep(1); // that's needed since signals are delivered asynchronously +} And another crash if we do this after pfree but before setting indname. +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname!=NULL) +{ +kill(getpid(), SIGINT); +pg_sleep(1); +} I'm not sure if those are possible outside of "induced" errors. Maybe the function is essentially atomic due to no CHECK_FOR_INTERRUPTS or similar? -- Justin