On Sat, Dec 3, 2022 at 12:45 AM Nathan Bossart <nathandboss...@gmail.com> wrote: > > On Fri, Dec 02, 2022 at 12:11:35PM +0530, Bharath Rupireddy wrote: > > On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart <nathandboss...@gmail.com> > > wrote: > >> The test appears to reliably create snapshot and mapping files, so if the > >> directories are empty at some point after the checkpoint at the end, we can > >> be reasonably certain the custodian took action. I didn't add explicit > >> checks that there are files in the directories before the checkpoint > >> because a concurrent checkpoint could make such checks unreliable. > > > > I think you're right. I added sqls to see if the snapshot and mapping > > files count > 0, see [1] and the cirrus-ci members are happy too - > > https://github.com/BRupireddy/postgres/tree/custodian_review_2. I > > think we can consider adding these count > 0 checks to tests. > > My worry about adding "count > 0" checks is that a concurrent checkpoint > could make them unreliable. In other words, those checks might ordinarily > work, but if an automatic checkpoint causes the files be cleaned up just > beforehand, they will fail.
Hm. It would have been better with a TAP test module for testing the custodian code reliably. Anyway, that mustn't stop the patch getting in. If required, we can park the TAP test module for later - IMO. Others may have different thoughts here. > > Having said above, I'm okay to process ShutdownRequestPending as early > > as possible, however, should we also add CHECK_FOR_INTERRUPTS() > > alongside ShutdownRequestPending? > > I'm not seeing a need for CHECK_FOR_INTERRUPTS. Do you see one? Since the custodian has SignalHandlerForShutdownRequest as SIGINT and SIGTERM handlers, unlike StatementCancelHandler and die respectively, no need of CFI I guess. And also none of the CFI signal handler flags applies to the custodian. > > While thinking about this, one thing that really struck me is what > > happens if we let the custodian exit, say after processing > > ShutdownRequestPending immediately or after a restart, leaving other > > queued tasks? The custodian will never get to work on those tasks > > unless the requestors (say checkpoint or some other process) requests > > it to do so after restart. Maybe, we don't need to worry about it. > > Maybe we need to worry about it. Maybe it's an overkill to save the > > custodian's task state to disk so that it can come up and do the > > leftover tasks upon restart. > > Yes, tasks will need to be retried when the server starts again. The ones > in this patch set should be requested again during the next checkpoint. > Temporary file cleanup would always be requested during server start, so > that should be handled as well. Even today, the server might abruptly shut > down while executing these tasks, and we don't have any special handling > for that. Right. The v18 patch set posted upthread https://www.postgresql.org/message-id/20221201214026.GA1799688%40nathanxps13 looks good to me. I see the CF entry is marked RfC - https://commitfest.postgresql.org/41/3448/. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com