On Sun, Jun 11, 2023 at 11:07 PM torikoshia <torikos...@oss.nttdata.com> wrote: > > On 2023-06-06 03:26, James Coleman wrote: > > On Mon, Jun 5, 2023 at 4:30 AM torikoshia <torikos...@oss.nttdata.com> > > wrote: > >> > >> On 2023-06-03 02:51, James Coleman wrote: > >> > Hello, > >> > > >> > Thanks for working on this patch! > > > > Sure thing! I'm *very interested* in seeing this available, and I > > think it paves the way for some additional features later on... > > > >> > On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikos...@oss.nttdata.com> > >> ... > >> > To put it positively: we believe that, for example, catalog accesses > >> > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside > >> > an existing valid transaction/query state, as it would be for this > >> > patch -- are safe. If there were problems, then those problems are > >> > likely bugs we already have in other CFI cases. > >> > >> Thanks a lot for the discussion and sharing it! > >> I really appreciate it. > >> > >> BTW I'm not sure whether all the CFI are called in valid transaction, > >> do you think we should check each of them? > > > > I kicked off the regressions tests with a call to > > ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS() > > call. Several hours and 52 GB of logs later I have confirmed that > > (with the attached revision) at the very least the regression test > > suite can't trigger any kind of failures regardless of when we trigger > > this. The existing code in the patch for only running the explain when > > there's an active query handling that. > > Thanks for the testing! > > > I've attached v27. The important change here in 0001 is that it > > guarantees the interrupt handler is re-entrant, since that was a bug > > exposed by my testing. I've also included 0002 which is only meant for > > testing (it attempts to log in the plan in every > > CHECK_FOR_INTERRUPTS() call). > > When SIGINT is sent during ProcessLogQueryPlanInterrupt(), > ProcessLogQueryPlanInterruptActive can remain true. > After that, pg_log_query_plan() does nothing but just returns. > > AFAIU, v26 logs plan for each pg_log_query_plan() even when another > pg_log_query_plan() is running, but it doesn't cause errors or critical > problem. > > Considering the problem solved and introduced by v27, I think v26 might > be fine. > How do you think?
The testing I did with calling this during every CFI is what uncovered the re-entrancy problem. IIRC (without running that test again) the problem was a stack overflow. Now, to be sure this is a particularly degenerate case because in real-world usage it'd be impossible in practice, I think, to trigger that many calls to this function (and by extension the interrupt handler). If SIGINT is the only concern we could reset ProcessLogQueryPlanInterruptActive in error handling code. I admit that part of my thought process here is thinking ahead to an additional patch I'd like to see on top of this, which is logging a query plan before cleaning up when statement timeout occurs. The re-entrancy issue becomes more interesting then, I think, since we would then have automated calling of the logging code. BTW: I'd thought that would make a nice follow-up patch for this, but if you'd prefer I could add it as another patch in the series here. What do you think about resetting the flag versus just not having it? Regards, James Coleman