Re: Autovacuum worker doesn't immediately exit on postmaster death
Greetings, * Stephen Frost (sfr...@snowman.net) wrote: > * Stephen Frost (sfr...@snowman.net) wrote: > > * Michael Paquier (mich...@paquier.xyz) wrote: > > > On Mon, Mar 22, 2021 at 04:07:12PM -0400, Robert Haas wrote: > > > > On Mon, Mar 22, 2021 at 1:48 PM Stephen Frost > > > > wrote: > > > >> Thanks for that. Attached is just a rebased version with a commit > > > >> message added. If there aren't any other concerns, I'll commit this in > > > >> the next few days and back-patch it. When it comes to 12 and older, > > > >> does anyone want to opine about the wait event to use? I was thinking > > > >> PG_WAIT_TIMEOUT or WAIT_EVENT_PG_SLEEP ... > > > > > > > > I'm not sure if we should back-patch this, but I think if you do you > > > > should just add a wait event, rather than using a generic one. > > > > > > I would not back-patch that either, as this is an improvement of the > > > current state. I agree that this had better introduce a new wait > > > event. Even if this stuff gets backpatched, you won't introduce an > > > ABI incompatibility with a new event as long as you add the new event > > > at the end of the existing enum lists, but let's keep the wait events > > > ordered on HEAD. > > > > Adding CFI's in places that really should have them is something we > > certainly have back-patched in the past, and that's just 'an improvement > > of the current state' too, so I don't quite follow the argument being > > made here that this shouldn't be back-patched. > > > > I don't have any problem with adding into the older releases, at the end > > of the existing lists, the same wait event that exists in 13+ for this > > already. > > > > Any other thoughts on this, particularly about back-patching or not..? > > We seem to be at a bit of an impasse on this regarding back-patching, > which seems unfortunate to me, but without someone else commenting it > seems like it's stalled. > > I'll go ahead and push the change to HEAD soon, as there doesn't seem to > be any contention regarding that. Done. Thanks! Stephen signature.asc Description: PGP signature
Re: Autovacuum worker doesn't immediately exit on postmaster death
Greetings, * Stephen Frost (sfr...@snowman.net) wrote: > * Michael Paquier (mich...@paquier.xyz) wrote: > > On Mon, Mar 22, 2021 at 04:07:12PM -0400, Robert Haas wrote: > > > On Mon, Mar 22, 2021 at 1:48 PM Stephen Frost wrote: > > >> Thanks for that. Attached is just a rebased version with a commit > > >> message added. If there aren't any other concerns, I'll commit this in > > >> the next few days and back-patch it. When it comes to 12 and older, > > >> does anyone want to opine about the wait event to use? I was thinking > > >> PG_WAIT_TIMEOUT or WAIT_EVENT_PG_SLEEP ... > > > > > > I'm not sure if we should back-patch this, but I think if you do you > > > should just add a wait event, rather than using a generic one. > > > > I would not back-patch that either, as this is an improvement of the > > current state. I agree that this had better introduce a new wait > > event. Even if this stuff gets backpatched, you won't introduce an > > ABI incompatibility with a new event as long as you add the new event > > at the end of the existing enum lists, but let's keep the wait events > > ordered on HEAD. > > Adding CFI's in places that really should have them is something we > certainly have back-patched in the past, and that's just 'an improvement > of the current state' too, so I don't quite follow the argument being > made here that this shouldn't be back-patched. > > I don't have any problem with adding into the older releases, at the end > of the existing lists, the same wait event that exists in 13+ for this > already. > > Any other thoughts on this, particularly about back-patching or not..? We seem to be at a bit of an impasse on this regarding back-patching, which seems unfortunate to me, but without someone else commenting it seems like it's stalled. I'll go ahead and push the change to HEAD soon, as there doesn't seem to be any contention regarding that. Thanks, Stephen signature.asc Description: PGP signature
Re: Autovacuum worker doesn't immediately exit on postmaster death
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Mon, Mar 22, 2021 at 04:07:12PM -0400, Robert Haas wrote: > > On Mon, Mar 22, 2021 at 1:48 PM Stephen Frost wrote: > >> Thanks for that. Attached is just a rebased version with a commit > >> message added. If there aren't any other concerns, I'll commit this in > >> the next few days and back-patch it. When it comes to 12 and older, > >> does anyone want to opine about the wait event to use? I was thinking > >> PG_WAIT_TIMEOUT or WAIT_EVENT_PG_SLEEP ... > > > > I'm not sure if we should back-patch this, but I think if you do you > > should just add a wait event, rather than using a generic one. > > I would not back-patch that either, as this is an improvement of the > current state. I agree that this had better introduce a new wait > event. Even if this stuff gets backpatched, you won't introduce an > ABI incompatibility with a new event as long as you add the new event > at the end of the existing enum lists, but let's keep the wait events > ordered on HEAD. Adding CFI's in places that really should have them is something we certainly have back-patched in the past, and that's just 'an improvement of the current state' too, so I don't quite follow the argument being made here that this shouldn't be back-patched. I don't have any problem with adding into the older releases, at the end of the existing lists, the same wait event that exists in 13+ for this already. Any other thoughts on this, particularly about back-patching or not..? Thanks, Stephen signature.asc Description: PGP signature
Re: Autovacuum worker doesn't immediately exit on postmaster death
On Mon, Mar 22, 2021 at 04:07:12PM -0400, Robert Haas wrote: > On Mon, Mar 22, 2021 at 1:48 PM Stephen Frost wrote: >> Thanks for that. Attached is just a rebased version with a commit >> message added. If there aren't any other concerns, I'll commit this in >> the next few days and back-patch it. When it comes to 12 and older, >> does anyone want to opine about the wait event to use? I was thinking >> PG_WAIT_TIMEOUT or WAIT_EVENT_PG_SLEEP ... > > I'm not sure if we should back-patch this, but I think if you do you > should just add a wait event, rather than using a generic one. I would not back-patch that either, as this is an improvement of the current state. I agree that this had better introduce a new wait event. Even if this stuff gets backpatched, you won't introduce an ABI incompatibility with a new event as long as you add the new event at the end of the existing enum lists, but let's keep the wait events ordered on HEAD. -- Michael signature.asc Description: PGP signature
Re: Autovacuum worker doesn't immediately exit on postmaster death
On Mon, Mar 22, 2021 at 1:48 PM Stephen Frost wrote: > Thanks for that. Attached is just a rebased version with a commit > message added. If there aren't any other concerns, I'll commit this in > the next few days and back-patch it. When it comes to 12 and older, > does anyone want to opine about the wait event to use? I was thinking > PG_WAIT_TIMEOUT or WAIT_EVENT_PG_SLEEP ... I'm not sure if we should back-patch this, but I think if you do you should just add a wait event, rather than using a generic one. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Autovacuum worker doesn't immediately exit on postmaster death
Greetings, * Thomas Munro (thomas.mu...@gmail.com) wrote: > On Fri, Dec 11, 2020 at 7:57 AM Stephen Frost wrote: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > > The if-we're-going-to-delay-anyway path in vacuum_delay_point seems > > > OK to add a touch more overhead to, though. > > > > Alright, for this part at least, seems like it'd be something like the > > attached. > > > > Only lightly tested, but does seem to address the specific example which > > was brought up on this thread. > > > > Thoughts..? > > +1 Thanks for that. Attached is just a rebased version with a commit message added. If there aren't any other concerns, I'll commit this in the next few days and back-patch it. When it comes to 12 and older, does anyone want to opine about the wait event to use? I was thinking PG_WAIT_TIMEOUT or WAIT_EVENT_PG_SLEEP ... Or do folks think this shouldn't be backpatched? That would mean it wouldn't help anyone for years, which would be pretty unfortuante, hence my feeling that it's worthwhile to backpatch. Thanks! Stephen From 9daf52b78d106c86e038dcefdb1d8345d22b9756 Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Mon, 22 Mar 2021 13:25:57 -0400 Subject: [PATCH] Use a WaitLatch for vacuum/autovacuum sleeping Instead of using pg_usleep() in vacuum_delay_point(), use a WaitLatch. This has the advantage that we will realize if the postmaster has been killed since the last time we decided to sleep while vacuuming. Discussion: https://postgr.es/m/CAFh8B=kcdk8k-Y21RfXPu5dX=bgPqJ8TC3p_qxR_ygdBS=j...@mail.gmail.com Backpatch: 9.6- --- src/backend/commands/vacuum.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index c064352e23..662aff04b4 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -2080,9 +2080,11 @@ vacuum_delay_point(void) if (msec > VacuumCostDelay * 4) msec = VacuumCostDelay * 4; - pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY); - pg_usleep((long) (msec * 1000)); - pgstat_report_wait_end(); + (void) WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, + msec, + WAIT_EVENT_VACUUM_DELAY); + ResetLatch(MyLatch); VacuumCostBalance = 0; -- 2.27.0 signature.asc Description: PGP signature
Re: Autovacuum worker doesn't immediately exit on postmaster death
On Fri, Dec 11, 2020 at 7:57 AM Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > The if-we're-going-to-delay-anyway path in vacuum_delay_point seems > > OK to add a touch more overhead to, though. > > Alright, for this part at least, seems like it'd be something like the > attached. > > Only lightly tested, but does seem to address the specific example which > was brought up on this thread. > > Thoughts..? +1
Re: Autovacuum worker doesn't immediately exit on postmaster death
On Fri, Dec 11, 2020 at 8:34 AM Robert Haas wrote: > On Thu, Oct 29, 2020 at 5:36 PM Alvaro Herrera > wrote: > > Maybe instead of thinking specifically in terms of vacuum, we could > > count buffer accesses (read from kernel) and check the latch once every > > 1000th such, or something like that. Then a very long query doesn't > > have to wait until it's run to completion. The cost is one integer > > addition per syscall, which should be bearable. > > Interesting idea. One related case is where everything is fine on the > server side but the client has disconnected and we don't notice that > the socket has changed state until something makes us try to send a > message to the client, which might be a really long time if the > server's doing like a lengthy computation before generating any rows. > It would be really nice if we could find a cheap way to check for both > postmaster death and client disconnect every now and then, like if a > single system call could somehow answer both questions. For the record, an alternative approach was proposed[1] that periodically checks for disconnected sockets using a timer, that will then cause the next CFI() to abort. Doing the check (a syscall) based on elapsed time rather than every nth CFI() or buffer access or whatever seems better in some ways, considering the difficulty of knowing what the frequency will be. One of the objections was that it added unacceptable setitimer() calls. We discussed an idea to solve that problem generally, and then later I prototyped that idea in another thread[2] about idle session timeouts (not sure about that yet, comments welcome). I've also wondered about checking postmaster_possibly_dead in CFI() on platforms where we have it (and working to increase that set of platforms), instead of just reacting to PM death when sleeping. But it seems like the real problem in this specific case is the use of pg_usleep() where WaitLatch() should be used, no? The recovery loop is at the opposite end of the spectrum: while vacuum doesn't check for postmaster death often enough, the recovery loop checks potentially hundreds of thousands or millions of times per seconds, which sucks on systems that don't have parent-death signals and slows down recovery quite measurably. In the course of the discussion about fixing that[3] we spotted other places that are using a pg_usleep() where they ought to be using WaitLatch() (which comes with exit-on-PM-death behaviour built-in). By the way, the patch in that thread does almost what Robert described, namely check for PM death every nth time (which in this case means every nth WAL record), except it's not in the main CFI(), it's in a special variant used just for recovery. [1] https://www.postgresql.org/message-id/flat/77def86b27e41f0efcba411460e929ae%40postgrespro.ru [2] https://www.postgresql.org/message-id/flat/763a0689-f189-459e-946f-f0ec44589...@hotmail.com [3] https://www.postgresql.org/message-id/flat/CA+hUKGK1607VmtrDUHQXrsooU=ap4g4r2yaobywooa3m8xe...@mail.gmail.com
Re: Autovacuum worker doesn't immediately exit on postmaster death
On Thu, Oct 29, 2020 at 5:36 PM Alvaro Herrera wrote: > Maybe instead of thinking specifically in terms of vacuum, we could > count buffer accesses (read from kernel) and check the latch once every > 1000th such, or something like that. Then a very long query doesn't > have to wait until it's run to completion. The cost is one integer > addition per syscall, which should be bearable. Interesting idea. One related case is where everything is fine on the server side but the client has disconnected and we don't notice that the socket has changed state until something makes us try to send a message to the client, which might be a really long time if the server's doing like a lengthy computation before generating any rows. It would be really nice if we could find a cheap way to check for both postmaster death and client disconnect every now and then, like if a single system call could somehow answer both questions. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Autovacuum worker doesn't immediately exit on postmaster death
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Alvaro Herrera writes: > > On 2020-Oct-29, Stephen Frost wrote: > >> I do think it'd be good to find a way to check every once in a while > >> even when we aren't going to delay though. Not sure what the best > >> answer there is. > > > Maybe instead of thinking specifically in terms of vacuum, we could > > count buffer accesses (read from kernel) and check the latch once every > > 1000th such, or something like that. Then a very long query doesn't > > have to wait until it's run to completion. The cost is one integer > > addition per syscall, which should be bearable. > > I'm kind of unwilling to add any syscalls at all to normal execution > code paths for this purpose. People shouldn't be sig-kill'ing the > postmaster, or if they do, cleaning up the mess is their responsibility. > I'd also suggest that adding nearly-untestable code paths for this > purpose is a fine way to add bugs we'll never catch. > > The if-we're-going-to-delay-anyway path in vacuum_delay_point seems > OK to add a touch more overhead to, though. Alright, for this part at least, seems like it'd be something like the attached. Only lightly tested, but does seem to address the specific example which was brought up on this thread. Thoughts..? Thanks, Stephen diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index 98270a1049..c90a4edb98 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -2069,9 +2069,11 @@ vacuum_delay_point(void) if (msec > VacuumCostDelay * 4) msec = VacuumCostDelay * 4; - pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY); - pg_usleep((long) (msec * 1000)); - pgstat_report_wait_end(); + (void) WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, + msec, + WAIT_EVENT_VACUUM_DELAY); + ResetLatch(MyLatch); VacuumCostBalance = 0; signature.asc Description: PGP signature
Re: Autovacuum worker doesn't immediately exit on postmaster death
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Alvaro Herrera writes: > > On 2020-Oct-29, Stephen Frost wrote: > >> I do think it'd be good to find a way to check every once in a while > >> even when we aren't going to delay though. Not sure what the best > >> answer there is. > > > Maybe instead of thinking specifically in terms of vacuum, we could > > count buffer accesses (read from kernel) and check the latch once every > > 1000th such, or something like that. Then a very long query doesn't > > have to wait until it's run to completion. The cost is one integer > > addition per syscall, which should be bearable. > > I'm kind of unwilling to add any syscalls at all to normal execution > code paths for this purpose. People shouldn't be sig-kill'ing the > postmaster, or if they do, cleaning up the mess is their responsibility. > I'd also suggest that adding nearly-untestable code paths for this > purpose is a fine way to add bugs we'll never catch. Not sure if either is at all viable, but I had a couple of thoughts about other ways to possibly address this. The first simplistic idea is this- we have lots of processes that pick up pretty quickly on the postmaster going away due to checking if it's still around while waiting for something else to happen anyway (like the autovacuum launcher...), and we have CFI's in a lot of places where it's reasonable to do a CFI but isn't alright to check for postmaster death. While it'd be better if there were more platforms where parent death would send a signal to the children, that doesn't seem to be coming any time soon- so why don't we do it ourselves? That is, when we discover that the postmaster has died, scan through the proc array (carefully, since it could be garbage, but all we're looking for are the PIDs of anything that might still be around) and try sending a signal to any processes that are left? Those signals would hopefully get delivered and the other backends would discover the signal through CFI and exit reasonably quickly. The other thought I had was around trying to check for postmaster death when we're about to do some I/O, which would probably catch a large number of these cases too though technically some process might stick around for a while if it's only dealing with things that are already in shared buffers, I suppose. Also seems complicated and expensive to do. > The if-we're-going-to-delay-anyway path in vacuum_delay_point seems > OK to add a touch more overhead to, though. Yeah, this certainly seems reasonable to do too and on a well run system would likely be enough 90+% of the time. Thanks, Stephen signature.asc Description: PGP signature
Re: Autovacuum worker doesn't immediately exit on postmaster death
Alvaro Herrera writes: > On 2020-Oct-29, Stephen Frost wrote: >> I do think it'd be good to find a way to check every once in a while >> even when we aren't going to delay though. Not sure what the best >> answer there is. > Maybe instead of thinking specifically in terms of vacuum, we could > count buffer accesses (read from kernel) and check the latch once every > 1000th such, or something like that. Then a very long query doesn't > have to wait until it's run to completion. The cost is one integer > addition per syscall, which should be bearable. I'm kind of unwilling to add any syscalls at all to normal execution code paths for this purpose. People shouldn't be sig-kill'ing the postmaster, or if they do, cleaning up the mess is their responsibility. I'd also suggest that adding nearly-untestable code paths for this purpose is a fine way to add bugs we'll never catch. The if-we're-going-to-delay-anyway path in vacuum_delay_point seems OK to add a touch more overhead to, though. regards, tom lane
Re: Autovacuum worker doesn't immediately exit on postmaster death
On 2020-Oct-29, Stephen Frost wrote: > I do think it'd be good to find a way to check every once in a while > even when we aren't going to delay though. Not sure what the best > answer there is. Maybe instead of thinking specifically in terms of vacuum, we could count buffer accesses (read from kernel) and check the latch once every 1000th such, or something like that. Then a very long query doesn't have to wait until it's run to completion. The cost is one integer addition per syscall, which should be bearable. (This doesn't help with a query that's running arbitrarily outside of Postgres, or doing something that doesn't access disk -- but it'd help with a majority of problem cases.)
Re: Autovacuum worker doesn't immediately exit on postmaster death
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2020-10-29 12:27:53 -0400, Tom Lane wrote: > > Maybe put a check into vacuum_delay_point, and poll the pipe when we're > > about to sleep anyway? > > Perhaps we should just replace the pg_usleep() with a latch wait? I'm not sure why, but I had the thought that we already had done that, and was a bit surprised that it wasn't that way, so +1 from my part. I do think it'd be good to find a way to check every once in a while even when we aren't going to delay though. Not sure what the best answer there is. Thanks, Stephen signature.asc Description: PGP signature
Re: Autovacuum worker doesn't immediately exit on postmaster death
Hi, On 2020-10-29 12:27:53 -0400, Tom Lane wrote: > Maybe put a check into vacuum_delay_point, and poll the pipe when we're > about to sleep anyway? Perhaps we should just replace the pg_usleep() with a latch wait? Greetings, Andres Freund
Re: Autovacuum worker doesn't immediately exit on postmaster death
On 2020-Oct-28, Alexander Kukushkin wrote: > Hello, > > I know, nobody in their mind should do that, but, if the postmaster > process is killed with SIGKILL signal, most backend processes > correctly notice the fact of the postmaster process absence and exit. > There is one exception though, when there are autovacuum worker > processes they are continuing to run until eventually finish and exit. So, if you have a manual vacuum running on the table (with vacuum_cost_delay=0) and kill -KILL the postmaster, that one also lingers arbitrarily long afterwards? (I suppose the problem is not as obvious just because the vacuum wouldn't run as long, because of no vacuum cost delay; but it'd still be a problem if you made the table bigger.)
Re: Autovacuum worker doesn't immediately exit on postmaster death
On 2020-Oct-29, Stephen Frost wrote: > > It's hard to do better than that, because on most platforms there's > > no way to get a signal on parent-process death, so the only way to > > notice would be to poll the postmaster-death pipe constantly; which > > would be hugely expensive in comparison to the value. > > I agree that 'constantly' wouldn't be great, but with some periodicity > that's more frequent than 'not until a few hours later when we finally > finish vacuuming this relation' would be nice. At least with autovauum > we may be periodically sleeping anyway so it doesn't seem like polling > at that point would really be terrible, though it'd be nice to check > every once in a while even if we aren't sleeping. vacuum_delay_point seems an obvious candidate, as soon as we've determined that the sleep interval is > 0; since we're going to sleep, the cost of a syscall seems negligible. I'm not sure what to suggest for vacuums that don't have vacuum costing active, though.
Re: Autovacuum worker doesn't immediately exit on postmaster death
Stephen Frost writes: > I agree that 'constantly' wouldn't be great, but with some periodicity > that's more frequent than 'not until a few hours later when we finally > finish vacuuming this relation' would be nice. At least with autovauum > we may be periodically sleeping anyway so it doesn't seem like polling > at that point would really be terrible, though it'd be nice to check > every once in a while even if we aren't sleeping. Maybe put a check into vacuum_delay_point, and poll the pipe when we're about to sleep anyway? That wouldn't fix anything except autovacuum, but if you're right that that's a primary pain point then it'd help. regards, tom lane
Re: Autovacuum worker doesn't immediately exit on postmaster death
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Victor Yegorov writes: > > ср, 28 окт. 2020 г. в 19:44, Alexander Kukushkin : > >> I know, nobody in their mind should do that, but, if the postmaster > >> process is killed with SIGKILL signal, most backend processes > >> correctly notice the fact of the postmaster process absence and exit. > >> There is one exception though, when there are autovacuum worker > >> processes they are continuing to run until eventually finish and exit. > > > Do you get the same behaviour also on master? > > As there was some work in this area for 14, see > > https://git.postgresql.org/pg/commitdiff/44fc6e259b > > That was about SIGQUIT response, which isn't really related to this > scenario. But I do not think Alexander has accurately characterized > the situation. *No* server processes will react instantly to postmaster > death. Typically they'll only detect it while waiting for some other > condition, such as client input, or in some cases while iterating their > outermost loop. So if they're busy with calculations they might not > notice for a long time. I don't think autovacuum is any worse than > a busy client backend on this score. Considering how long an autovacuum can run, it seems like it'd be worthwhile to find a useful place to check for postmaster-death. Typical well-running systems are going to be waiting for the client pretty frequently and therefore this does make autovacuum stick out in this case. > It's hard to do better than that, because on most platforms there's > no way to get a signal on parent-process death, so the only way to > notice would be to poll the postmaster-death pipe constantly; which > would be hugely expensive in comparison to the value. I agree that 'constantly' wouldn't be great, but with some periodicity that's more frequent than 'not until a few hours later when we finally finish vacuuming this relation' would be nice. At least with autovauum we may be periodically sleeping anyway so it doesn't seem like polling at that point would really be terrible, though it'd be nice to check every once in a while even if we aren't sleeping. Thanks, Stephen signature.asc Description: PGP signature
Re: Autovacuum worker doesn't immediately exit on postmaster death
Victor Yegorov writes: > ср, 28 окт. 2020 г. в 19:44, Alexander Kukushkin : >> I know, nobody in their mind should do that, but, if the postmaster >> process is killed with SIGKILL signal, most backend processes >> correctly notice the fact of the postmaster process absence and exit. >> There is one exception though, when there are autovacuum worker >> processes they are continuing to run until eventually finish and exit. > Do you get the same behaviour also on master? > As there was some work in this area for 14, see > https://git.postgresql.org/pg/commitdiff/44fc6e259b That was about SIGQUIT response, which isn't really related to this scenario. But I do not think Alexander has accurately characterized the situation. *No* server processes will react instantly to postmaster death. Typically they'll only detect it while waiting for some other condition, such as client input, or in some cases while iterating their outermost loop. So if they're busy with calculations they might not notice for a long time. I don't think autovacuum is any worse than a busy client backend on this score. It's hard to do better than that, because on most platforms there's no way to get a signal on parent-process death, so the only way to notice would be to poll the postmaster-death pipe constantly; which would be hugely expensive in comparison to the value. On the whole I'm skeptical that this is a useful consideration to expend effort on. You shouldn't be killing the postmaster that way. If you do, you'll soon learn not to, for plenty of reasons besides this one. regards, tom lane
Re: Autovacuum worker doesn't immediately exit on postmaster death
ср, 28 окт. 2020 г. в 19:44, Alexander Kukushkin : > I know, nobody in their mind should do that, but, if the postmaster > process is killed with SIGKILL signal, most backend processes > correctly notice the fact of the postmaster process absence and exit. > There is one exception though, when there are autovacuum worker > processes they are continuing to run until eventually finish and exit. > > … > > I was able to reproduce it with 13.0 and 12.4, and I believe older > versions are also affected. > Do you get the same behaviour also on master? As there was some work in this area for 14, see https://git.postgresql.org/pg/commitdiff/44fc6e259b -- Victor Yegorov