Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
On Fri, 23 Jun 2017 19:43:35 -0400 Stephen Frostwrote: > Alvaro, all, > > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > Yugo Nagata wrote: > > > > > I tried to make it. Patch attached. > > > > > > It is easy and simple. Although I haven't tried for autovacuum worker, > > > I confirmed I can change other process' parameters without affecting > > > others. > > > Do you want this in PG? > > > > One advantage of this gadget is that you can signal backends that you > > own without being superuser, so +1 for the general idea. (Please do > > create a fresh thread so that this can be appropriately linked to in > > commitfest app, though.) > > Well, that wouldn't quite work with the proposed patch because the > proposed patch REVOKE's execute from public... Yes. It is intentional to revoke execute from public because we need to change configuration before signaling other backend and it needs superuser privilege. > I'm trying to figure out how it's actually useful to be able to signal a > backend that you own about a config change that you *aren't* able to > make without being a superuser.. Now, if you could tell the other > backend to use a new value for some USERSET GUC, then *that* would be > useful and interesting. > > I haven't got any particularly great ideas for how to make that happen > though. > > > You need a much bigger patch for the autovacuum worker. The use case I > > had in mind is that you have a maintenance window and can run fast > > vacuuming during it, but if the table is large and vacuum can't finish > > within that window then you want vacuum to slow down, without stopping > > it completely. But implementing this requires juggling the > > stdVacuumCostDelay/Limit values during the reload, which are currently > > read at start of vacuuming only; the working values are overwritten from > > those during a rebalance. > > Being able to change those values during a vacuuming run would certainly > be useful, I've had cases where I would have liked to have been able to > do just exactly that. That's largely independent of this though. > > Thanks! > > Stephen -- Yugo Nagata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
On Sat, 24 Jun 2017 08:09:52 +0900 Michael Paquierwrote: > On Sat, Jun 24, 2017 at 5:07 AM, Alvaro Herrera > wrote: > > Yugo Nagata wrote: > > > >> I tried to make it. Patch attached. > >> > >> It is easy and simple. Although I haven't tried for autovacuum worker, > >> I confirmed I can change other process' parameters without affecting > >> others. > >> Do you want this in PG? > > Just browsing the patch... > > +if (r == SIGNAL_BACKEND_NOSUPERUSER) > +ereport(WARNING, > +(errmsg("must be a superuser to terminate superuser > process"))); > + > +if (r == SIGNAL_BACKEND_NOPERMISSION) > +ereport(WARNING, > + (errmsg("must be a member of the role whose process > is being terminated or member of pg_signal_backend"))); > Both messages are incorrect. This is not trying to terminate a process. I'll fix this. > > +Datum > +pg_reload_conf_pid(PG_FUNCTION_ARGS) > I think that the naming is closer to pg_reload_backend. > > Documentation is needed as well. Agree with pg_reload_backend and I'll write the documetation. > > > One advantage of this gadget is that you can signal backends that you > > own without being superuser, so +1 for the general idea. (Please do > > create a fresh thread so that this can be appropriately linked to in > > commitfest app, though.) > > That would be nice. Sure. I'll create a new thread and attach the new patch to it. > > > You need a much bigger patch for the autovacuum worker. The use case I > > had in mind is that you have a maintenance window and can run fast > > vacuuming during it, but if the table is large and vacuum can't finish > > within that window then you want vacuum to slow down, without stopping > > it completely. But implementing this requires juggling the > > stdVacuumCostDelay/Limit values during the reload, which are currently > > read at start of vacuuming only; the working values are overwritten from > > those during a rebalance. > > Yeah, that's independent from the patch above. Agree. It would be another feature from pg_reload_backend and I think it would be implmented as another patch. > -- > Michael > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Yugo Nagata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
Alvaro, all, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Yugo Nagata wrote: > > > I tried to make it. Patch attached. > > > > It is easy and simple. Although I haven't tried for autovacuum worker, > > I confirmed I can change other process' parameters without affecting others. > > Do you want this in PG? > > One advantage of this gadget is that you can signal backends that you > own without being superuser, so +1 for the general idea. (Please do > create a fresh thread so that this can be appropriately linked to in > commitfest app, though.) Well, that wouldn't quite work with the proposed patch because the proposed patch REVOKE's execute from public... I'm trying to figure out how it's actually useful to be able to signal a backend that you own about a config change that you *aren't* able to make without being a superuser.. Now, if you could tell the other backend to use a new value for some USERSET GUC, then *that* would be useful and interesting. I haven't got any particularly great ideas for how to make that happen though. > You need a much bigger patch for the autovacuum worker. The use case I > had in mind is that you have a maintenance window and can run fast > vacuuming during it, but if the table is large and vacuum can't finish > within that window then you want vacuum to slow down, without stopping > it completely. But implementing this requires juggling the > stdVacuumCostDelay/Limit values during the reload, which are currently > read at start of vacuuming only; the working values are overwritten from > those during a rebalance. Being able to change those values during a vacuuming run would certainly be useful, I've had cases where I would have liked to have been able to do just exactly that. That's largely independent of this though. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
On Sat, Jun 24, 2017 at 5:07 AM, Alvaro Herrerawrote: > Yugo Nagata wrote: > >> I tried to make it. Patch attached. >> >> It is easy and simple. Although I haven't tried for autovacuum worker, >> I confirmed I can change other process' parameters without affecting others. >> Do you want this in PG? Just browsing the patch... +if (r == SIGNAL_BACKEND_NOSUPERUSER) +ereport(WARNING, +(errmsg("must be a superuser to terminate superuser process"))); + +if (r == SIGNAL_BACKEND_NOPERMISSION) +ereport(WARNING, + (errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend"))); Both messages are incorrect. This is not trying to terminate a process. +Datum +pg_reload_conf_pid(PG_FUNCTION_ARGS) I think that the naming is closer to pg_reload_backend. Documentation is needed as well. > One advantage of this gadget is that you can signal backends that you > own without being superuser, so +1 for the general idea. (Please do > create a fresh thread so that this can be appropriately linked to in > commitfest app, though.) That would be nice. > You need a much bigger patch for the autovacuum worker. The use case I > had in mind is that you have a maintenance window and can run fast > vacuuming during it, but if the table is large and vacuum can't finish > within that window then you want vacuum to slow down, without stopping > it completely. But implementing this requires juggling the > stdVacuumCostDelay/Limit values during the reload, which are currently > read at start of vacuuming only; the working values are overwritten from > those during a rebalance. Yeah, that's independent from the patch above. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
Yugo Nagata wrote: > I tried to make it. Patch attached. > > It is easy and simple. Although I haven't tried for autovacuum worker, > I confirmed I can change other process' parameters without affecting others. > Do you want this in PG? One advantage of this gadget is that you can signal backends that you own without being superuser, so +1 for the general idea. (Please do create a fresh thread so that this can be appropriately linked to in commitfest app, though.) You need a much bigger patch for the autovacuum worker. The use case I had in mind is that you have a maintenance window and can run fast vacuuming during it, but if the table is large and vacuum can't finish within that window then you want vacuum to slow down, without stopping it completely. But implementing this requires juggling the stdVacuumCostDelay/Limit values during the reload, which are currently read at start of vacuuming only; the working values are overwritten from those during a rebalance. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
On Thu, 22 Jun 2017 14:08:30 +0900 Michael Paquierwrote: > On Thu, Jun 22, 2017 at 1:52 PM, Yugo Nagata wrote: > > On Thu, 22 Jun 2017 12:05:19 +0900 > > Michael Paquier wrote: > >> signal-able). Different thought, but I'd love to see a SQL function > >> that allows triggering SIGHUP on a specific process, like an > >> autovacuum worker to change its cost parameters. There is > >> pg_reload_conf() to do so but that's system-wide. > > > > For example, is that like this? > > > > =# alter system set autovacuum_vacuum_cost_delay to 10; > > =# select pg_reload_conf(); > > =# alter system reset autovacuum_vacuum_cost_delay; > > No need to reset the parameter afterwards as this makes it sensible to > updates afterwards, but you have the idea. Note that this is rather > recent, as autovacuum listens to SIGHUP only since a75fb9b3. I tried to make it. Patch attached. It is easy and simple. Although I haven't tried for autovacuum worker, I confirmed I can change other process' parameters without affecting others. Do you want this in PG? [session A (PID:18375)] =# show autovacuum_vacuum_cost_delay; autovacuum_vacuum_cost_delay -- 20ms (1 row) [session B] postgres=# alter system set autovacuum_vacuum_cost_delay to 10; ALTER SYSTEM postgres=# select pg_reload_conf(18375); pg_reload_conf t (1 row) postgres=# show autovacuum_vacuum_cost_delay; autovacuum_vacuum_cost_delay -- 20ms (1 row) [session A again] postgres=# show autovacuum_vacuum_cost_delay; autovacuum_vacuum_cost_delay -- 10ms (1 row) > -- > Michael -- Yugo Nagata diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 0fdad0c..d79faf7 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1128,6 +1128,7 @@ REVOKE EXECUTE ON FUNCTION pg_wal_replay_pause() FROM public; REVOKE EXECUTE ON FUNCTION pg_wal_replay_resume() FROM public; REVOKE EXECUTE ON FUNCTION pg_rotate_logfile() FROM public; REVOKE EXECUTE ON FUNCTION pg_reload_conf() FROM public; +REVOKE EXECUTE ON FUNCTION pg_reload_conf(int) FROM public; REVOKE EXECUTE ON FUNCTION pg_current_logfile() FROM public; REVOKE EXECUTE ON FUNCTION pg_current_logfile(text) FROM public; diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 62341b8..cc173ba 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -339,6 +339,36 @@ pg_reload_conf(PG_FUNCTION_ARGS) PG_RETURN_BOOL(true); } +/* + * Signal to reload the database configuration for a specified backend + * + * Permission checking for this function is managed through the normal + * GRANT system. + */ +Datum +pg_reload_conf_pid(PG_FUNCTION_ARGS) +{ + int pid = PG_GETARG_INT32(0); + int r = pg_signal_backend(pid, SIGHUP); + + if (r == SIGNAL_BACKEND_NOSUPERUSER) + ereport(WARNING, +(errmsg("must be a superuser to terminate superuser process"))); + + if (r == SIGNAL_BACKEND_NOPERMISSION) + ereport(WARNING, + (errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend"))); + + if (r) + { + ereport(WARNING, +(errmsg("failed to send signal to backend: %d", pid))); + PG_RETURN_BOOL(false); + } + + PG_RETURN_BOOL(true); +} + /* * Rotate log file diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index 1191b4a..7258f15 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -3255,6 +3255,8 @@ DESCR("true if wal replay is paused"); DATA(insert OID = 2621 ( pg_reload_conf PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_reload_conf _null_ _null_ _null_ )); DESCR("reload configuration files"); +DATA(insert OID = 3409 ( pg_reload_conf PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 16 "23" _null_ _null_ _null_ _null_ _null_ pg_reload_conf_pid _null_ _null_ _null_ )); +DESCR("reload configuration files"); DATA(insert OID = 2622 ( pg_rotate_logfile PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_rotate_logfile _null_ _null_ _null_ )); DESCR("rotate log file"); DATA(insert OID = 3800 ( pg_current_logfilePGNSP PGUID 12 1 0 0 0 f f f f f f v s 0 0 25 "" _null_ _null_ _null_ _null_ _null_ pg_current_logfile _null_ _null_ _null_ )); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
On Thu, Jun 22, 2017 at 1:52 PM, Yugo Nagatawrote: > On Thu, 22 Jun 2017 12:05:19 +0900 > Michael Paquier wrote: >> signal-able). Different thought, but I'd love to see a SQL function >> that allows triggering SIGHUP on a specific process, like an >> autovacuum worker to change its cost parameters. There is >> pg_reload_conf() to do so but that's system-wide. > > For example, is that like this? > > =# alter system set autovacuum_vacuum_cost_delay to 10; > =# select pg_reload_conf(); > =# alter system reset autovacuum_vacuum_cost_delay; No need to reset the parameter afterwards as this makes it sensible to updates afterwards, but you have the idea. Note that this is rather recent, as autovacuum listens to SIGHUP only since a75fb9b3. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
On Thu, 22 Jun 2017 12:05:19 +0900 Michael Paquierwrote: > On Thu, Jun 22, 2017 at 11:52 AM, Andres Freund wrote: > > On 2017-06-22 11:49:47 +0900, Yugo Nagata wrote: > >> I agree that we can kill theses processes by the OS command. However, > >> It seems to me that pg_{cancel,terminate}_backend don't need to be able to > >> kill processes except for client backends because we can do same thing by > >> the OS command if necessary, and acutually these functions cannot kill > >> most other processes, for example, background writer. Are the autovacuum > >> launcher and background worker special for these functions? > > > > I strongly disagree with this - I think it's quite useful to be able to > > kill things via SQL that can hold lock on database objects. I'm not > > seeing which problem would be solved by prohibiting this? > > +1. Controlling them as of now is useful, particularly now that all > processes show wait events, even auxiliary ones (though not I got it. I agree that the current behaviour and it isn't worth changint it. In my understand, processes that the functions can kill (client backends, background workers, autovacuum processes) are ones that can hold lock on database objects. > signal-able). Different thought, but I'd love to see a SQL function > that allows triggering SIGHUP on a specific process, like an > autovacuum worker to change its cost parameters. There is > pg_reload_conf() to do so but that's system-wide. For example, is that like this? =# alter system set autovacuum_vacuum_cost_delay to 10; =# select pg_reload_conf(); =# alter system reset autovacuum_vacuum_cost_delay; > -- > Michael -- Yugo Nagata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
On Thu, Jun 22, 2017 at 11:52 AM, Andres Freundwrote: > On 2017-06-22 11:49:47 +0900, Yugo Nagata wrote: >> I agree that we can kill theses processes by the OS command. However, >> It seems to me that pg_{cancel,terminate}_backend don't need to be able to >> kill processes except for client backends because we can do same thing by >> the OS command if necessary, and acutually these functions cannot kill >> most other processes, for example, background writer. Are the autovacuum >> launcher and background worker special for these functions? > > I strongly disagree with this - I think it's quite useful to be able to > kill things via SQL that can hold lock on database objects. I'm not > seeing which problem would be solved by prohibiting this? +1. Controlling them as of now is useful, particularly now that all processes show wait events, even auxiliary ones (though not signal-able). Different thought, but I'd love to see a SQL function that allows triggering SIGHUP on a specific process, like an autovacuum worker to change its cost parameters. There is pg_reload_conf() to do so but that's system-wide. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
On 2017-06-22 11:49:47 +0900, Yugo Nagata wrote: > On Wed, 21 Jun 2017 11:04:34 -0400 > Robert Haaswrote: > > > On Wed, Jun 21, 2017 at 7:56 AM, Yugo Nagata wrote: > > > I have found that we can cancel/terminate autovacuum launchers and > > > background worker processes by pg_cancel/terminate_backend function. > > > I'm wondering this behavior is not expected and if not I want to fix it. > > > > I think it is expected. Even if we blocked it, those processes have > > to cope gracefully with SIGTERM, because anyone with access to the OS > > user can kill them that way by hand. > > I agree that we can kill theses processes by the OS command. However, > It seems to me that pg_{cancel,terminate}_backend don't need to be able to > kill processes except for client backends because we can do same thing by > the OS command if necessary, and acutually these functions cannot kill > most other processes, for example, background writer. Are the autovacuum > launcher and background worker special for these functions? I strongly disagree with this - I think it's quite useful to be able to kill things via SQL that can hold lock on database objects. I'm not seeing which problem would be solved by prohibiting this? - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
On Wed, 21 Jun 2017 11:04:34 -0400 Robert Haaswrote: > On Wed, Jun 21, 2017 at 7:56 AM, Yugo Nagata wrote: > > I have found that we can cancel/terminate autovacuum launchers and > > background worker processes by pg_cancel/terminate_backend function. > > I'm wondering this behavior is not expected and if not I want to fix it. > > I think it is expected. Even if we blocked it, those processes have > to cope gracefully with SIGTERM, because anyone with access to the OS > user can kill them that way by hand. I agree that we can kill theses processes by the OS command. However, It seems to me that pg_{cancel,terminate}_backend don't need to be able to kill processes except for client backends because we can do same thing by the OS command if necessary, and acutually these functions cannot kill most other processes, for example, background writer. Are the autovacuum launcher and background worker special for these functions? > > > However, we can terminate background workers by pg_terminate_backend. > > In the following example, I terminated the logical replication launcher, > > and this process did not appear again[1]. > > > > postgres=# select pg_terminate_backend(30902); > > pg_terminate_backend > > -- > > t > > (1 row) > > That seems to be a bug in logical replication. > > > Similarly, we can terminate autovacuum launcher by pg_terminate_backend, > > but a new process is restarted by postmaster in this case.[2] > > > > postgres=# select pg_terminate_backend(30900); > > pg_terminate_backend > > -- > > t > > (1 row) > > That is as I would expect. > > > [2] > > On the other hand, when we use pg_cancel_backend for autovacuum launcher, > > it causes the following error. I'll report the detail in another thread. > > > > ERROR: can't attach the same segment more than once > > I think that's a bug. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company -- Yugo Nagata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
Yugo Nagata wrote: > However, we can terminate background workers by pg_terminate_backend. > In the following example, I terminated the logical replication launcher, > and this process did not appear again[1]. > > postgres=# select pg_terminate_backend(30902); > pg_terminate_backend > -- > t > (1 row) I think failing to restart the replication launcher after it stops is probably a bug, and should be fixed by having postmaster start another one if it dies. > Similarly, we can terminate autovacuum launcher by pg_terminate_backend, > but a new process is restarted by postmaster in this case.[2] Yeah, this is operating as intended. You can turn autovacuum off and the launcher should go away, and turn it back on and launcher should start. So we expect the autovac launcher to respond to signals. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
On Wed, Jun 21, 2017 at 7:56 AM, Yugo Nagatawrote: > I have found that we can cancel/terminate autovacuum launchers and > background worker processes by pg_cancel/terminate_backend function. > I'm wondering this behavior is not expected and if not I want to fix it. I think it is expected. Even if we blocked it, those processes have to cope gracefully with SIGTERM, because anyone with access to the OS user can kill them that way by hand. > However, we can terminate background workers by pg_terminate_backend. > In the following example, I terminated the logical replication launcher, > and this process did not appear again[1]. > > postgres=# select pg_terminate_backend(30902); > pg_terminate_backend > -- > t > (1 row) That seems to be a bug in logical replication. > Similarly, we can terminate autovacuum launcher by pg_terminate_backend, > but a new process is restarted by postmaster in this case.[2] > > postgres=# select pg_terminate_backend(30900); > pg_terminate_backend > -- > t > (1 row) That is as I would expect. > [2] > On the other hand, when we use pg_cancel_backend for autovacuum launcher, > it causes the following error. I'll report the detail in another thread. > > ERROR: can't attach the same segment more than once I think that's a bug. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers
Hi, I have found that we can cancel/terminate autovacuum launchers and background worker processes by pg_cancel/terminate_backend function. I'm wondering this behavior is not expected and if not I want to fix it. The current pg_stat_activity shows background workers and autovacuum lancher as below. It made me come up with the question. postgres=# select pid, wait_event, backend_type from pg_stat_activity ; pid | wait_event |backend_type ---+-+- 30902 | LogicalLauncherMain | background worker 30900 | AutoVacuumMain | autovacuum launcher 30923 | | client backend 30898 | BgWriterMain| background writer 30897 | CheckpointerMain| checkpointer 30899 | WalWriterMain | walwriter (6 rows) We cannot use pg_terminate/cancel_backend for most processes except client backends. For example, when I tried to terminate the background writer, I got a warning and failed. postgres=# select pg_terminate_backend(30899); WARNING: PID 30899 is not a PostgreSQL server process pg_terminate_backend -- f (1 row) However, we can terminate background workers by pg_terminate_backend. In the following example, I terminated the logical replication launcher, and this process did not appear again[1]. postgres=# select pg_terminate_backend(30902); pg_terminate_backend -- t (1 row) postgres=# select pid, wait_event, backend_type from pg_stat_activity ; pid |wait_event |backend_type ---+---+- 30900 | AutoVacuumMain| autovacuum launcher 30923 | | client backend 30898 | BgWriterHibernate | background writer 30897 | CheckpointerMain | checkpointer 30899 | WalWriterMain | walwriter (5 rows) Similarly, we can terminate autovacuum launcher by pg_terminate_backend, but a new process is restarted by postmaster in this case.[2] postgres=# select pg_terminate_backend(30900); pg_terminate_backend -- t (1 row) postgres=# select pid, wait_event, backend_type from pg_stat_activity ; pid |wait_event |backend_type ---+---+- 32483 | AutoVacuumMain| autovacuum launcher 30923 | | client backend 30898 | BgWriterHibernate | background writer 30897 | CheckpointerMain | checkpointer 30899 | WalWriterMain | walwriter (5 rows) My question is whether the behavior of pg_terminate/cancel_backend is expected. If these functions should succeed only for client backends, we need to fix the behavior. Attached is a patch to fix it in that case. In my patch, process type is checked in pg_signal_backend(), and if it is background worker or autovacuum launcher then throw a warning and fail. BackendPidGetProc() returns valid PGPROC for proccesses that are initialized by PostgresInit(), and, in my understand, all such proccess are client backends, background workers, and autovacuum launcher. So, if this is neither background woker nor autovacuum launcher, this should be a normal client backend. For this check, I added a new field, isAutoVacuumLauncher, to PGPROC. Any comments would be appreciated. - [1] AFAIK, we have to restart the server to enable logical replication after this. I'm not sure this is expected, but I found the following comment in ProcessInterrupts(). Does "can be stopped at any time" mean that we can drop this process completely? 2852 else if (IsLogicalLauncher()) 2853 { 2854 ereport(DEBUG1, 2855 (errmsg("logical replication launcher shutting down"))); 2856 2857 /* The logical replication launcher can be stopped at any time. */ 2858 proc_exit(0); 2859 } When the logical replication launcher receive SIGTERM, this exits with exitstatus 0, so this is not restarted by the postmaster. [2] On the other hand, when we use pg_cancel_backend for autovacuum launcher, it causes the following error. I'll report the detail in another thread. ERROR: can't attach the same segment more than once - Regards, -- Yugo Nagatapg_terminate_backend.pach Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Tue, Jun 26, 2012 at 10:58 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Mar 29, 2012 at 3:04 AM, Daniel Farina dan...@heroku.com wrote: On Mon, Mar 26, 2012 at 12:14 AM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote: On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 Here's a patch implementing the simple version, with no more guards against signal racing than have been seen previously. The more elaborate variants to close those races is being discussed in a parallel thread, but I thought I'd get this simple version out there. Review: After reading through the threads, it looks like there was no real objection to this approach -- pid recycling is not something we're concerned about. I think you're missing a doc update though, in func.sgml: For the less restrictive functionpg_cancel_backend/, the role of an active backend can be found from the structfieldusename/structfield column of the structnamepg_stat_activity/structname view. Also, high-availability.sgml makes reference to pg_cancel_backend(), and it might be worthwhile to add an ...and pg_terminate_backend() there. Other than that, it looks good to me. Good comments. Patch attached to address the doc issues. The only iffy thing is that the paragraph For the less restrictive... I have opted to remove in its entirely. I think the documents are already pretty clear about the same-user rule, and I'm not sure if this is the right place for a crash-course on attributes in pg_stat_activity (but maybe it is). ...and pg_terminate_backend seems exactly right. And I think now that the system post-patch doesn't have such a strange contrast between the ability to send SIGINT vs. SIGTERM, such a contrast may not be necessary. I'm also not sure what the policy is about filling paragraphs in the manual. I filled one, which increases the sgml churn a bit. git (show|diff) --word-diff helps clean it up. I went ahead and committed this. I kinda think we should back-patch this into 9.2. It doesn't involve a catalog change, and would make the behavior consistent between the two releases, instead of changing in 9.1 and then changing again in 9.2. Thoughts? +1. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Wed, Jun 27, 2012 at 8:13 AM, Magnus Hagander mag...@hagander.net wrote: I went ahead and committed this. I kinda think we should back-patch this into 9.2. It doesn't involve a catalog change, and would make the behavior consistent between the two releases, instead of changing in 9.1 and then changing again in 9.2. Thoughts? +1. Three votes with no objections is good enough for me, so, done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Thu, Mar 29, 2012 at 3:04 AM, Daniel Farina dan...@heroku.com wrote: On Mon, Mar 26, 2012 at 12:14 AM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote: On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 Here's a patch implementing the simple version, with no more guards against signal racing than have been seen previously. The more elaborate variants to close those races is being discussed in a parallel thread, but I thought I'd get this simple version out there. Review: After reading through the threads, it looks like there was no real objection to this approach -- pid recycling is not something we're concerned about. I think you're missing a doc update though, in func.sgml: For the less restrictive functionpg_cancel_backend/, the role of an active backend can be found from the structfieldusename/structfield column of the structnamepg_stat_activity/structname view. Also, high-availability.sgml makes reference to pg_cancel_backend(), and it might be worthwhile to add an ...and pg_terminate_backend() there. Other than that, it looks good to me. Good comments. Patch attached to address the doc issues. The only iffy thing is that the paragraph For the less restrictive... I have opted to remove in its entirely. I think the documents are already pretty clear about the same-user rule, and I'm not sure if this is the right place for a crash-course on attributes in pg_stat_activity (but maybe it is). ...and pg_terminate_backend seems exactly right. And I think now that the system post-patch doesn't have such a strange contrast between the ability to send SIGINT vs. SIGTERM, such a contrast may not be necessary. I'm also not sure what the policy is about filling paragraphs in the manual. I filled one, which increases the sgml churn a bit. git (show|diff) --word-diff helps clean it up. I went ahead and committed this. I kinda think we should back-patch this into 9.2. It doesn't involve a catalog change, and would make the behavior consistent between the two releases, instead of changing in 9.1 and then changing again in 9.2. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Tue, Jun 26, 2012 at 1:58 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Mar 29, 2012 at 3:04 AM, Daniel Farina dan...@heroku.com wrote: On Mon, Mar 26, 2012 at 12:14 AM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote: On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 Here's a patch implementing the simple version, with no more guards against signal racing than have been seen previously. The more elaborate variants to close those races is being discussed in a parallel thread, but I thought I'd get this simple version out there. Review: After reading through the threads, it looks like there was no real objection to this approach -- pid recycling is not something we're concerned about. I think you're missing a doc update though, in func.sgml: For the less restrictive functionpg_cancel_backend/, the role of an active backend can be found from the structfieldusename/structfield column of the structnamepg_stat_activity/structname view. Also, high-availability.sgml makes reference to pg_cancel_backend(), and it might be worthwhile to add an ...and pg_terminate_backend() there. Other than that, it looks good to me. Good comments. Patch attached to address the doc issues. The only iffy thing is that the paragraph For the less restrictive... I have opted to remove in its entirely. I think the documents are already pretty clear about the same-user rule, and I'm not sure if this is the right place for a crash-course on attributes in pg_stat_activity (but maybe it is). ...and pg_terminate_backend seems exactly right. And I think now that the system post-patch doesn't have such a strange contrast between the ability to send SIGINT vs. SIGTERM, such a contrast may not be necessary. I'm also not sure what the policy is about filling paragraphs in the manual. I filled one, which increases the sgml churn a bit. git (show|diff) --word-diff helps clean it up. I went ahead and committed this. I kinda think we should back-patch this into 9.2. It doesn't involve a catalog change, and would make the behavior consistent between the two releases, instead of changing in 9.1 and then changing again in 9.2. Thoughts? I think that would be good. It is at the level of pain whereas I would backpatch from day one, but I think it would be a welcome change to people who couldn't justify gnashing their teeth and distributing a tweaked Postgres like I would. It saves me some effort, too. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Mon, Mar 26, 2012 at 12:14 AM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote: On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 Here's a patch implementing the simple version, with no more guards against signal racing than have been seen previously. The more elaborate variants to close those races is being discussed in a parallel thread, but I thought I'd get this simple version out there. Review: After reading through the threads, it looks like there was no real objection to this approach -- pid recycling is not something we're concerned about. I think you're missing a doc update though, in func.sgml: For the less restrictive functionpg_cancel_backend/, the role of an active backend can be found from the structfieldusename/structfield column of the structnamepg_stat_activity/structname view. Also, high-availability.sgml makes reference to pg_cancel_backend(), and it might be worthwhile to add an ...and pg_terminate_backend() there. Other than that, it looks good to me. Good comments. Patch attached to address the doc issues. The only iffy thing is that the paragraph For the less restrictive... I have opted to remove in its entirely. I think the documents are already pretty clear about the same-user rule, and I'm not sure if this is the right place for a crash-course on attributes in pg_stat_activity (but maybe it is). ...and pg_terminate_backend seems exactly right. And I think now that the system post-patch doesn't have such a strange contrast between the ability to send SIGINT vs. SIGTERM, such a contrast may not be necessary. I'm also not sure what the policy is about filling paragraphs in the manual. I filled one, which increases the sgml churn a bit. git (show|diff) --word-diff helps clean it up. -- fdr Extend-same-role-backend-management-to-pg_terminate_backend-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote: On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 Here's a patch implementing the simple version, with no more guards against signal racing than have been seen previously. The more elaborate variants to close those races is being discussed in a parallel thread, but I thought I'd get this simple version out there. Review: After reading through the threads, it looks like there was no real objection to this approach -- pid recycling is not something we're concerned about. I think you're missing a doc update though, in func.sgml: For the less restrictive functionpg_cancel_backend/, the role of an active backend can be found from the structfieldusename/structfield column of the structnamepg_stat_activity/structname view. Also, high-availability.sgml makes reference to pg_cancel_backend(), and it might be worthwhile to add an ...and pg_terminate_backend() there. Other than that, it looks good to me. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 Here's a patch implementing the simple version, with no more guards against signal racing than have been seen previously. The more elaborate variants to close those races is being discussed in a parallel thread, but I thought I'd get this simple version out there. -- fdr From 73c794a0cce148c2848adfb06be9aac985ac41d8 Mon Sep 17 00:00:00 2001 From: Daniel Farina dan...@heroku.com Date: Sun, 18 Mar 2012 20:08:37 -0700 Subject: [PATCH] Extend same-role backend management to pg_terminate_backend This makes it more similar to pg_cancel_backend, except it gives users the ability to close runaway connections entirely. Signed-off-by: Daniel Farina dan...@heroku.com --- doc/src/sgml/func.sgml |6 +- src/backend/utils/adt/misc.c | 12 +++- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 34fea16..7cece3a 100644 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 14403,14409 SELECT set_config('log_statement_stats', 'off', false); literalfunctionpg_terminate_backend(parameterpid/parameter typeint/)/function/literal /entry entrytypeboolean/type/entry !entryTerminate a backend/entry /row /tbody /tgroup --- 14403,14413 literalfunctionpg_terminate_backend(parameterpid/parameter typeint/)/function/literal /entry entrytypeboolean/type/entry !entryTerminate a backend. You can execute this against ! another backend that has exactly the same role as the user ! calling the function. In all other cases, you must be a ! superuser. !/entry /row /tbody /tgroup *** a/src/backend/utils/adt/misc.c --- b/src/backend/utils/adt/misc.c *** *** 162,179 pg_cancel_backend(PG_FUNCTION_ARGS) } /* ! * Signal to terminate a backend process. Only allowed by superuser. */ Datum pg_terminate_backend(PG_FUNCTION_ARGS) { ! if (!superuser()) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), ! errmsg(must be superuser to terminate other server processes), ! errhint(You can cancel your own processes with pg_cancel_backend().))); ! PG_RETURN_BOOL(pg_signal_backend(PG_GETARG_INT32(0), SIGTERM) == SIGNAL_BACKEND_SUCCESS); } /* --- 162,181 } /* ! * Signal to terminate a backend process. This is allowed if you are superuser ! * or have the same role as the process being terminated. */ Datum pg_terminate_backend(PG_FUNCTION_ARGS) { ! int r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM); ! ! if (r == SIGNAL_BACKEND_NOPERMISSION) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), ! (errmsg(must be superuser or have the same role to terminate backends running in other server processes; ! PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS); } /* -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Fri, Mar 16, 2012 at 04:42:07PM -0700, Daniel Farina wrote: On Fri, Mar 16, 2012 at 3:42 PM, Noah Misch n...@leadboat.com wrote: On Thu, Mar 15, 2012 at 04:14:03PM -0700, Daniel Farina wrote: I imagine the problem is a race condition whereby a pid might be reused by another process owned by another user (doesn't that also affect pg_cancel_backend?). ?Shall we just do everything using the MyCancelKey (which I think could just be called SessionKey, SessionSecret, or even just Session) as to ensure we have no case of mistaken identity? Or does that end up being problematic? No, I think the hazard you identify here is orthogonal to the question of when to authorize pg_terminate_backend(). ?As you note downthread, protocol-level cancellations available in released versions already exhibit this hazard. ?I wouldn't mind a clean fix for this, but it's an independent subject. Hmm. Well, here's a patch that implements exactly that, I think, similar to the one posted to this thread, but not using BackendIds, but rather the newly-introduced SessionId. Would appreciate comments. Because an out-of-band signaling method has been integrated more complex behaviors -- such as closing the TERM-against-SECURITY-DEFINER-FUNCTION hazard -- can be addressed. For now I've only attempted to solve the problem of backend ambiguity, which basically necessitated out-of-line information transfer as per the usual means. This patch still changes the policy for pg_terminate_backend(), and it does not fix other SIGINT senders like processCancelRequest() and ProcSleep(). If you're concerned about PID-reuse races, audit all backend signalling. Either fix all such problems or propose a plan to get there eventually. Any further discussion of this topic needs a new subject line; mixing its consideration with proposals to change the policy behind pg_terminate_backend() reduces the chances of the right people commenting on these distinct questions. Currently, when pg_terminate_backend() follows a pg_cancel_backend() on which the target has yet to act, the eventual outcome is a terminated process. With this patch, the pg_terminate_backend() becomes a no-op with this warning: ! ereport(WARNING, ! (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), ! (errmsg(process is busy responding to administrative ! request)), ! (errhint(This is temporary, and may be retried.; That's less useful than the current behavior. That being said, I can't get too excited about closing PID-reuse races. I've yet to see another program do so. I've never seen a trouble report around this race for any software. Every OS I have used assigns PIDs so as to maximize the reuse interval, which seems like an important POLA measure given typical admin formulae like kill `ps | grep ...`. This defense can only matter in fork-bomb conditions, at which point a stray signal is minor. I do think it's worth keeping this idea in a back pocket for achieving those more complex behaviors, should we ever desire them. Here I discussed a hazard specific to allowing pg_terminate_backend(): http://archives.postgresql.org/message-id/20110602045955.gc8...@tornado.gateway.2wire.net To summarize, user code can trap SIGINT cancellations, but it cannot trap SIGTERM terminations. ?If a backend is executing a SECURITY DEFINER function when another backend of the same role calls pg_terminate_backend() thereon, the pg_terminate_backend() caller could achieve something he cannot achieve in PostgreSQL 9.1. ?I vote that this is an acceptable loss. I'll throw out a patch that just lets this hazard exist and see what happens, although it is obsoleted/incompatible with the one already attached. +1. Has anyone actually said that the PID-reuse race or the thing I mention above should block such a patch? I poked back through the threads I could remember and found nothing. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Thu, Mar 15, 2012 at 10:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: On Thu, Mar 15, 2012 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: But actually I don't see what you hope to gain from such a change, even if it can be made to work. Anyone who can do kill(SIGINT) can do kill(SIGKILL), say --- so you have to be able to trust the signal sender. What's the point of not trusting it to verify the client identity? No longer true with pg_cancel_backend not-by-superuser, no? No. That doesn't affect the above argument in the least. And in fact if there's any question whatsoever as to whether unprivileged cross-backend signals are secure, they are not going in in the first place. Okay, well, I believe there is a race in handling common administrative signals that *might* possibly matter. In the past, pg_cancel_backend was superuser only, which is a lot like saying only available to people who can be the postgres user and run kill. The cancellation packet is handled via first checking the other backend's BackendKey and then SIGINTing it, leaving only the most narrow possibility for a misdirected SIGINT. But it really is unfortunate that I, a user, run a query or have a problematic connection of my own role and just want the thing to stop, but I can't do anything about it without superuser. In recognition of that, pg_cancel_backend now can operate on backends owned by the same user (once again, checked before the signal is processed by the receiver, just like with the cancellation packet). There was some angst (but I'm not sure about how specific or uniform) to extend such signaling power to pg_terminate_backend, and the only objection I can think of is there is this race, or so it would seem to me. Maybe it's too small to care, in which case we can just extend the same policy to pg_terminate_backend, or maybe it's not, in which case we could get rid of any signaling race conditions. The only hypothetical person who would be happy with the current situation, if characterized correctly, would be one who thinks that pid-race on SIGINT from non-superusers (long has been true in the form of the cancellation packet) is okay but on SIGTERM such a race is not. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Thu, Mar 15, 2012 at 11:01 PM, Daniel Farina dan...@heroku.com wrote: Okay, well, I believe there is a race in handling common administrative signals that *might* possibly matter. In the past, pg_cancel_backend was superuser only, which is a lot like saying only available to people who can be the postgres user and run kill. The cancellation packet is handled via first checking the other backend's BackendKey and then SIGINTing it, leaving only the most narrow possibility for a misdirected SIGINT. Attached is a patch that sketches a removal of the caveat by relying on the BackendId in PGPROC instead of the pid. Basically, the idea is to make it work more like how cancellation keys work, except for internal SQL functions. I think the unsatisfying crux here is the uniqueness of BackendId over the life of one *postmaster* invocation: sinvaladt.c MyBackendId = (stateP - segP-procState[0]) + 1; /* Advertise assigned backend ID in MyProc */ MyProc-backendId = MyBackendId; I'm not sure precisely what to think about how this numbering winds up working on quick inspection. Clearly, if BackendIds can be reused quickly then the pid-race problem comes back in spirit right away. But given the contract of MyBackendId as I understand it (it just has to be unique among all backends at any given time), it could be changed. I don't *think* it's used for its arithmetic relationship to its underlying components anywhere. Another similar solution (not attached) would be to send information about the originating backend through PGPROC and having the check be against those rather than merely a correct and unambiguous MyBackendId. I also see now that cancellation packets does not have this caveat because the postmaster is control of all forking and joining in a serially executed path, so it need not worry about pid racing. Another nice use for this might be to, say, deliver the memory or performance stats of another process while-in-execution, without having to be superuser or and/or gdbing in back to the calling backend. -- fdr From f466fe53e8e64cfa49bf56dbdf5920f9ea4e3562 Mon Sep 17 00:00:00 2001 From: Daniel Farina dan...@heroku.com Date: Thu, 15 Mar 2012 20:46:38 -0700 Subject: [PATCH] Implement race-free sql-originated backend cancellation/termination If promising, this patch requires a few documentation updates in addendum. Since 0495aaad8b337642830a4d4e82f8b8c02b27b1be, pg_cancel_backend can be run on backends that have the same role as the backend pg_cancel_backend is being invoked from. Since that time, a documented caveat exists stating that there was a race whereby a process death-and-recycle could result in an otherwise unrelated backend receiving SIGINT. Now it is desirable for pg_terminate_backend -- which also has the effect of having the backend exit, and closing the socket -- to also be usable by non-superusers. Presuming SIGINT was acceptable to race, it's not clear that it's acceptable for SIGTERM to race in the same way, so this patch seeks to try to do something about that. This patch attempts to close this race condition by targeting query cancellation/termination against the per-backend-startup unique BackendId to unambiguously identify the session rather than a PID. This makes the SQL function act more like how cancellation keys work already (perhaps these paths can be converged somehow). Signed-off-by: Daniel Farina dan...@heroku.com --- src/backend/access/transam/twophase.c |4 + src/backend/storage/ipc/procsignal.c |3 + src/backend/storage/lmgr/proc.c |3 + src/backend/tcop/postgres.c | 55 + src/backend/utils/adt/misc.c | 104 +++- src/include/miscadmin.h |1 + src/include/storage/proc.h| 17 + src/include/storage/procsignal.h |1 + 8 files changed, 146 insertions(+), 42 deletions(-) *** a/src/backend/access/transam/twophase.c --- b/src/backend/access/transam/twophase.c *** *** 62,67 --- 62,68 #include storage/procarray.h #include storage/sinvaladt.h #include storage/smgr.h + #include storage/spin.h #include utils/builtins.h #include utils/memutils.h #include utils/timestamp.h *** *** 326,331 MarkAsPreparing(TransactionId xid, const char *gid, --- 327,335 proc-backendId = InvalidBackendId; proc-databaseId = databaseid; proc-roleId = owner; + SpinLockInit(MyProc-adminMutex); + proc-adminAction = ADMIN_ACTION_NONE; + proc-adminBackendId = InvalidBackendId; proc-lwWaiting = false; proc-lwWaitMode = 0; proc-lwWaitLink = NULL; *** a/src/backend/storage/ipc/procsignal.c --- b/src/backend/storage/ipc/procsignal.c *** *** 258,263 procsignal_sigusr1_handler(SIGNAL_ARGS) --- 258,266 if (CheckProcSignal(PROCSIG_NOTIFY_INTERRUPT)) HandleNotifyInterrupt(); + if (CheckProcSignal(PROCSIG_ADMIN_ACTION_INTERRUPT)) +
Re: [HACKERS] pg_terminate_backend for same-role
On Thu, Mar 15, 2012 at 04:14:03PM -0700, Daniel Farina wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 I imagine the problem is a race condition whereby a pid might be reused by another process owned by another user (doesn't that also affect pg_cancel_backend?). Shall we just do everything using the MyCancelKey (which I think could just be called SessionKey, SessionSecret, or even just Session) as to ensure we have no case of mistaken identity? Or does that end up being problematic? No, I think the hazard you identify here is orthogonal to the question of when to authorize pg_terminate_backend(). As you note downthread, protocol-level cancellations available in released versions already exhibit this hazard. I wouldn't mind a clean fix for this, but it's an independent subject. Here I discussed a hazard specific to allowing pg_terminate_backend(): http://archives.postgresql.org/message-id/20110602045955.gc8...@tornado.gateway.2wire.net To summarize, user code can trap SIGINT cancellations, but it cannot trap SIGTERM terminations. If a backend is executing a SECURITY DEFINER function when another backend of the same role calls pg_terminate_backend() thereon, the pg_terminate_backend() caller could achieve something he cannot achieve in PostgreSQL 9.1. I vote that this is an acceptable loss. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Fri, Mar 16, 2012 at 3:42 PM, Noah Misch n...@leadboat.com wrote: On Thu, Mar 15, 2012 at 04:14:03PM -0700, Daniel Farina wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 I imagine the problem is a race condition whereby a pid might be reused by another process owned by another user (doesn't that also affect pg_cancel_backend?). Shall we just do everything using the MyCancelKey (which I think could just be called SessionKey, SessionSecret, or even just Session) as to ensure we have no case of mistaken identity? Or does that end up being problematic? No, I think the hazard you identify here is orthogonal to the question of when to authorize pg_terminate_backend(). As you note downthread, protocol-level cancellations available in released versions already exhibit this hazard. I wouldn't mind a clean fix for this, but it's an independent subject. Hmm. Well, here's a patch that implements exactly that, I think, similar to the one posted to this thread, but not using BackendIds, but rather the newly-introduced SessionId. Would appreciate comments. Because an out-of-band signaling method has been integrated more complex behaviors -- such as closing the TERM-against-SECURITY-DEFINER-FUNCTION hazard -- can be addressed. For now I've only attempted to solve the problem of backend ambiguity, which basically necessitated out-of-line information transfer as per the usual means. Here I discussed a hazard specific to allowing pg_terminate_backend(): http://archives.postgresql.org/message-id/20110602045955.gc8...@tornado.gateway.2wire.net To summarize, user code can trap SIGINT cancellations, but it cannot trap SIGTERM terminations. If a backend is executing a SECURITY DEFINER function when another backend of the same role calls pg_terminate_backend() thereon, the pg_terminate_backend() caller could achieve something he cannot achieve in PostgreSQL 9.1. I vote that this is an acceptable loss. I'll throw out a patch that just lets this hazard exist and see what happens, although it is obsoleted/incompatible with the one already attached. -- fdr Implement-race-free-sql-originated-backend-cancellation-v2.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Fri, Mar 16, 2012 at 4:42 PM, Daniel Farina dan...@heroku.com wrote: Hmm. Well, here's a patch that implements exactly that, I think, That version had some screws loose due to some editor snafus. Hopefully all better. -- fdr Implement-race-free-sql-originated-backend-cancellation-v3.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_terminate_backend for same-role
Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. I imagine the problem is a race condition whereby a pid might be reused by another process owned by another user (doesn't that also affect pg_cancel_backend?). Shall we just do everything using the MyCancelKey (which I think could just be called SessionKey, SessionSecret, or even just Session) as to ensure we have no case of mistaken identity? Or does that end up being problematic? -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 I imagine the problem is a race condition whereby a pid might be reused by another process owned by another user (doesn't that also affect pg_cancel_backend?). Yes, but I think it's too unlikely to happen. Not sure if we really should worry about that. Shall we just do everything using the MyCancelKey (which I think could just be called SessionKey, SessionSecret, or even just Session) as to ensure we have no case of mistaken identity? Or does that end up being problematic? What if pid is unfortunately reused after passing the test of MyCancelKey and before sending the signal? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote: Shall we just do everything using the MyCancelKey (which I think could just be called SessionKey, SessionSecret, or even just Session) as to ensure we have no case of mistaken identity? Or does that end up being problematic? What if pid is unfortunately reused after passing the test of MyCancelKey and before sending the signal? The way MyCancelKey is checked now is backwards, in my mind. It seems like it would be better checked by the receiving PID (one can use a check/recheck also, if so inclined). Is there a large caveat to that? I'm working on a small patch to do this I hope to post soon (modulus difficulties), but am fully aware that messing around PGPROC and signal handling can be a bit fiddly. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
Daniel Farina dan...@heroku.com writes: The way MyCancelKey is checked now is backwards, in my mind. It seems like it would be better checked by the receiving PID (one can use a check/recheck also, if so inclined). Is there a large caveat to that? You mean, other than the fact that kill(2) can't transmit such a key? But actually I don't see what you hope to gain from such a change, even if it can be made to work. Anyone who can do kill(SIGINT) can do kill(SIGKILL), say --- so you have to be able to trust the signal sender. What's the point of not trusting it to verify the client identity? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
On Thu, Mar 15, 2012 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: The way MyCancelKey is checked now is backwards, in my mind. It seems like it would be better checked by the receiving PID (one can use a check/recheck also, if so inclined). Is there a large caveat to that? You mean, other than the fact that kill(2) can't transmit such a key? I was planning on using an out-of-line mechanism. Bad idea? But actually I don't see what you hope to gain from such a change, even if it can be made to work. Anyone who can do kill(SIGINT) can do kill(SIGKILL), say --- so you have to be able to trust the signal sender. What's the point of not trusting it to verify the client identity? No longer true with pg_cancel_backend not-by-superuser, no? Now there are new people who can do kill(SIGINT) (modulus the already existing cancel requests). -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend for same-role
Daniel Farina dan...@heroku.com writes: On Thu, Mar 15, 2012 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: But actually I don't see what you hope to gain from such a change, even if it can be made to work. Anyone who can do kill(SIGINT) can do kill(SIGKILL), say --- so you have to be able to trust the signal sender. What's the point of not trusting it to verify the client identity? No longer true with pg_cancel_backend not-by-superuser, no? No. That doesn't affect the above argument in the least. And in fact if there's any question whatsoever as to whether unprivileged cross-backend signals are secure, they are not going in in the first place. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend and pg_cancel_backend by not administrator user
2011/6/2 Noah Misch n...@leadboat.com: On Wed, Jun 01, 2011 at 10:26:34PM -0400, Josh Kupershmidt wrote: On Wed, Jun 1, 2011 at 5:55 PM, Noah Misch n...@leadboat.com wrote: On Sun, May 29, 2011 at 10:56:02AM -0400, Josh Kupershmidt wrote: Looking around, I see there were real problems[1] with sending SIGTERM to individual backends back in 2005 or so, and pg_terminate_backend() was only deemed safe enough to put in for 8.4 [2]. So expanding pg_terminate_backend() privileges does make me a tad nervous. The documentation for the CREATE USER flag would boil down to omit this flag only if you're worried about undiscovered PostgreSQL bugs in this area. ?I'd echo Tom's sentiment from the first thread, In any case I think we have to solve it, not create new mechanisms to try to ignore it. I do agree with Tom's sentiment from that thread. But, if we are confident that pg_terminate_backend() is safe enough to relax permissions on, then I take it you agree we should plan to extend this power to all users? Yes; that's what I was trying to say. Having thought about this some more, I do now see a risk. Currently, a SECURITY DEFINER function (actually any function, but that's where it matters) can trap query_canceled. By doing so, the author can ensure that only superusers and crashes may halt the function during a section protected in this way. One might use it to guard a series of updates made over dblink. pg_terminate_backend() breaks this protection. I've never designed something this way; it only suffices when you merely sort-of-care about transactional integrity. Perhaps it's an acceptable loss for this feature? And if so, is this patch a good first step on that path? Understand that the pg_terminate_backend() is able to kill process that need not to be killed. I suppose that looking inside the internal postgreql table in order to not allow a normal db owner to kill a superuser connection can avoid this problem? Yes. Reading through those old threads made me realize this patch would give database owners the ability to kill off autovacuum workers. Seems like we'd want to restrict that power to superusers. Would we? ?Any old user can already stifle VACUUM by holding a transaction open. This is true, though it's possible we might at some point want a backend process which really shouldn't be killable by non-superusers (if vacuum/autovacuum isn't one already.) Actually, I could easily imagine a superuser running an important query on a database getting peeved if a non-superuser were allowed to cancel/terminate his queries. That's really a different level of user isolation than we have. If your important query runs on a database owned by someone else, calls functions owned by someone else, or reads tables owned by someone else, you're substantially at the mercy of those object owners. That situation probably is unsatisfactory to some folks. Adding the possibility that a database owner could cancel your query seems like an extension of that codependency more than a new exposure. If I am the database owner I need to be able to manage my DB. Ok for superuser connection (and internal administrative process like autovacuum) I am the developer, not the DBA, so sometimes, when I wrong something, I need to kill my session if I wrong something Can we suppose, in a more generic case, that an user can kill connection only from the same user even if this is not the database owner? Best Regards, Torello -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend and pg_cancel_backend by not administrator user
On Wed, Jun 1, 2011 at 5:55 PM, Noah Misch n...@leadboat.com wrote: On Sun, May 29, 2011 at 10:56:02AM -0400, Josh Kupershmidt wrote: Looking around, I see there were real problems[1] with sending SIGTERM to individual backends back in 2005 or so, and pg_terminate_backend() was only deemed safe enough to put in for 8.4 [2]. So expanding pg_terminate_backend() privileges does make me a tad nervous. The documentation for the CREATE USER flag would boil down to omit this flag only if you're worried about undiscovered PostgreSQL bugs in this area. I'd echo Tom's sentiment from the first thread, In any case I think we have to solve it, not create new mechanisms to try to ignore it. I do agree with Tom's sentiment from that thread. But, if we are confident that pg_terminate_backend() is safe enough to relax permissions on, then I take it you agree we should plan to extend this power to all users? And if so, is this patch a good first step on that path? Reading through those old threads made me realize this patch would give database owners the ability to kill off autovacuum workers. Seems like we'd want to restrict that power to superusers. Would we? Any old user can already stifle VACUUM by holding a transaction open. This is true, though it's possible we might at some point want a backend process which really shouldn't be killable by non-superusers (if vacuum/autovacuum isn't one already.) Actually, I could easily imagine a superuser running an important query on a database getting peeved if a non-superuser were allowed to cancel/terminate his queries. Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend and pg_cancel_backend by not administrator user
On Sun, May 29, 2011 at 5:04 AM, Noah Misch n...@leadboat.com wrote: What risks arise from unconditionally allowing these calls for the same user's backends? `pg_cancel_backend' ought to be safe enough; the user always has access to the standard cancellation protocol, making the SQL interface a mere convenience (albeit a compelling one). `pg_terminate_backend' does open up access to a new behavior, but no concrete risks come to mind. Looking around, I see there were real problems[1] with sending SIGTERM to individual backends back in 2005 or so, and pg_terminate_backend() was only deemed safe enough to put in for 8.4 [2]. So expanding pg_terminate_backend() privileges does make me a tad nervous. Reading through those old threads made me realize this patch would give database owners the ability to kill off autovacuum workers. Seems like we'd want to restrict that power to superusers. On the other hand, this *would* be substantial new authority for database owners. Seems like a reasonable authority to grant, though. And I also realized that this patch's approach might force us to maintain a permissions wart if we ever want to implement fine-grained control for this stuff, e.g. a per-role setting enabling self-kills. It would be a bit lame to have to document Use this CREATE/ALTER ROLE flag. Or be the database owner. Or be a superuser. Now, a few technical comments about the patch: 1.) This bit looks dangerous: + backend = pgstat_fetch_stat_beentry(i); + if (backend-st_procpid == pid) { Since pgstat_fetch_stat_beentry() might return NULL. I think you want BackendPidGetProc(). Ah, thanks for the pointer. Josh -- [1] http://postgresql.1045698.n5.nabble.com/pg-terminate-backend-idea-td1930120.html [2] http://postgresql.1045698.n5.nabble.com/Re-COMMITTERS-pgsql-Add-pg-terminate-backend-to-allow-terminating-only-a-single-td1983763i20.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend and pg_cancel_backend by not administrator user
On Fri, Mar 11, 2011 at 8:54 AM, Bruce Momjian br...@momjian.us wrote: I have added it to the next commit fest. Hi Torello, I have volunteered (more accurately, Greg Smith volunteered me :-) to be a reviewer for this patch. I know you're a bit new here, so I thought I'd outline where this patch stands and what's expected if you'd like to move it along. We organize patch reviews via commitfests lasting a month or so. Some more information about this process: http://wiki.postgresql.org/wiki/CommitFest Each commitfest is a period wherein you can expect to receive some feedback on your patch and advice on things which might need to be improved (in this case, it's my job to provide you this feedback). Your patch is in the upcoming commitfest, scheduled to run from June 15 to July 14. So if you're interested in being responsible for this patch, or some variant of it, eventually making its way into PostgreSQL 9.2, you should be willing to update your patch based on feedback, request advice, etc. during this period. If you're not interested in getting sucked into this process that's OK -- just please advise us if that's the case, and maybe someone else will be willing to take charge of the patch. Anssi and I posted some initial feedback on the patch's goals earlier. I would like to ultimately see users have the capability to pg_cancel_backend() their own queries. But I could at least conceive of others not wanting this behavior enabled by default. So perhaps this patch's approach of granting extra privs to the database owner could work as a first attempt. And maybe a later version could introduce a GUC allowing the DBA to control whether users can cancel/terminate their backends, or we could instead have an option flag to CREATE/ALTER ROLE, allowing per-user configuration. It would be helpful to hear from others whether this patch's goals would work as a first pass at this problem, so that Torello doesn't waste time on a doomed approach. Also, it might be helpful to add an entry on the Todo list for 'allow non-superusers to use pg_cancel_backend()', in case this patch gets sunk. Now, a few technical comments about the patch: 1.) This bit looks dangerous: +backend = pgstat_fetch_stat_beentry(i); +if (backend-st_procpid == pid) { Since pgstat_fetch_stat_beentry() might return NULL. I'm a bit suspicious about whether looping through pgstat_fetch_stat_beentry() is the best way to determine the database owner for a given backend PID, but I haven't dug in enough yet to suggest a better alternative. 2.) The way the code inside pg_signal_backend() is structured, doing: select pg_cancel_backend(12345); as non-superuser, where '12345' is a fictitious PID, can now give you the incorrect error message: ERROR: must be superuser or target database owner to signal other server processes 3.) No documentation adjustments, and the comments need some cleaup. Torello: I'll be happy to handle comments/documentation for you as a native English speaker, so you don't have to worry about this part. That's it for now. Torello, I look forward to hearing back from you, and hope that you have some time to work on this patch further. Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend and pg_cancel_backend by not administrator user
Kevin Grittner wrote: Torello Querci tque...@gmail.com wrote: I attach a path for this It's too late in the release cycle to consider this for version 9.1. Please add it to the open CommitFest for consideration for 9.2: https://commitfest.postgresql.org/action/commitfest_view/open I have added it to the next commit fest. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend and pg_cancel_backend by not administrator user
On Mon, Feb 14, 2011 at 8:58 AM, Anssi Kääriäinen anssi.kaariai...@thl.fi wrote: On 02/14/2011 02:10 PM, Torello Querci wrote: I suppose that give the right to the owner db user to terminate or cancel other session connected to the database which it is owner is a good thing. I not see any security problem because this user can cancel or terminate only the session related with the own database, but if you think that this is a problem, a configuration parameter can be used. For what it's worth, a big +1 from me. We have pretty much the same use case. It would be good if you could also terminate your own connections. The superuser-only restriction for pg_cancel_backend() has been a pet peeve of mine as well. I actually posted a patch a while back to let users pg_cancel_backend() their own queries, see: http://archives.postgresql.org/pgsql-admin/2010-02/msg00052.php IMO it'd be better to do away with this patch's check of: /* If the user not is the superuser, need to be the db owner. */ and instead just check if the target session's user matches that of the cancel requester. Additionally, this patch keeps all the permission checking inside pg_signal_backend(). That's fine if we're sure that we want pg_cancel_backend() and pg_terminate_backend() to undergo the same permissions check, but perhaps it's a bad idea to relax the permissions check on pg_terminate_backend() ? Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_terminate_backend and pg_cancel_backend by not administrator user
Hi, this is the first time that I post here, so if I wrong please don't kill me ... I see that pg_terminate_backend and pg_cancel_backend can be execute only by admin users. This approach seems to be too restrictive in a lots of real situation. In dept, I have a situation where it is created one database machine for all the postgresql database. This database machine is managed by IT staff that have created two user for each application. One user is the owner db user that create, drop, grant on this db, while the other user is the application db. In this situation I (the developer) not able to disconnect any client and stop any high weight queries. Unfortunately the application run on application server that is manager, again, by IT staff and I not have the right to stop it. I suppose that give the right to the owner db user to terminate or cancel other session connected to the database which it is owner is a good thing. I not see any security problem because this user can cancel or terminate only the session related with the own database, but if you think that this is a problem, a configuration parameter can be used. Of course I can create a function with admin right that do the same thing but the IT staff need to install, configure, and give the right grant. So, I suppose, that this can to be only a workaround, not the solution. Sorry for my English. I attach a path for this Best Regards, Torello diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 5bda4af..5327447 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -33,6 +33,7 @@ #include storage/procarray.h #include utils/builtins.h #include tcop/tcopprot.h +#include pgstat.h #define atooid(x) ((Oid) strtoul((x), NULL, 10)) @@ -75,9 +76,33 @@ static bool pg_signal_backend(int pid, int sig) { if (!superuser()) - ereport(ERROR, -(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), - (errmsg(must be superuser to signal other server processes; + { +bool haveRight = false; +PgBackendStatus *backend; + + /* If the user not is the superuser, need to be the db owner. */ + if (pg_database_ownercheck(MyDatabaseId, GetUserId())) { + +/* Check for the specify backend in the stat info table */ +int nBackend = pgstat_fetch_stat_numbackends(); +int i; +for (i = 1; i=nBackend; ++i) { +backend = pgstat_fetch_stat_beentry(i); +if (backend-st_procpid == pid) { +if (backend-st_databaseid == MyDatabaseId) +haveRight = true; +break; +} +} +} + +if (!haveRight) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), +(errmsg(must be superuser or database destination owner to signal other server processes; + } + + if (!IsBackendPid(pid)) { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend and pg_cancel_backend by not administrator user
On 02/14/2011 02:10 PM, Torello Querci wrote: I suppose that give the right to the owner db user to terminate or cancel other session connected to the database which it is owner is a good thing. I not see any security problem because this user can cancel or terminate only the session related with the own database, but if you think that this is a problem, a configuration parameter can be used. For what it's worth, a big +1 from me. We have pretty much the same use case. It would be good if you could also terminate your own connections. - Anssi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend and pg_cancel_backend by not administrator user
Torello Querci tque...@gmail.com wrote: I attach a path for this It's too late in the release cycle to consider this for version 9.1. Please add it to the open CommitFest for consideration for 9.2: https://commitfest.postgresql.org/action/commitfest_view/open -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: The closest thing I can think of to an automated test is to run repeated sets of the parallel regression tests, and each time SIGTERM a randomly chosen backend at a randomly chosen time. Then see if anything funny Yep, that was my plan, plus running the parallel regression tests you get the possibility of 2 backends. I was intentionally suggesting only one kill per test cycle. Multiple kills will probably create an O(N^2) explosion in the set of possible downstream-failure deltas. I doubt you'd really get any improvement in testing coverage to justify the much larger amount of hand validation needed. It also strikes me that you could make some simple alterations to the regression tests to reduce the set of observable downstream deltas. For example, anyplace where a test loads a table with successive INSERTs and that table is used by later tests, wrap the INSERT sequence with BEGIN/END. Then there is only one possible downstream delta (empty table) and not N different possibilities for an N-row table. I have added pg_terminate_backend() to use SIGTERM and will start running tests as discussed with Tom. I will post my scripts too. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I think if we want pg_terminate_backend, we have to do it the way that was originally discussed: have it issue SIGTERM and fix whatever needs to be fixed in the SIGTERM code path. Well, with no movement on this TODO item since it was removed in 8.0, I am willing to give users something that works most of the time. If the users want it so bad, why has no one stepped up to do the testing? Good question. Tom and I talked about this on the phone today. I think the problem is testing to try to prove the lack of a bug. How long does someone test to know they have reasonably proven a bug doesn't exist? Right. I think we have *a lot* of users that regularly use SIGTERM to kill the backends, becuase they have no idea it's dangerous. They just use ps to find the backend and kill to see it go away. If we had a *big* problem with it, we'd hear about it. So I doubt we have. But it's quite possible that we have a narrow problem that can cause problems in some issues - but I expect most people who run this stuff without knowing it's dangerous aren't running the worlds largest and most loaded databases... //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane wrote: I'm willing to enable a SIGTERM-based pg_terminate_backend for 8.4 if there is some reasonable amount of testing done during this development cycle to try to expose any problems. If no one is willing to do any such testing, then as far as I'm concerned there is no market for the feature and we should drop it from TODO. If someone can come up with an automated script to do this kind of testing, I can commit a VM or three to running this 24/7 for a month, easily... But I don't trust myself in coming up with a test-case that's good enough :-P //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane [EMAIL PROTECTED] writes: ISTM that there will be more cases like this in future, so we need a general solution anyway. I propose the following sort of code structure for these situations: on_shmem_exit(cleanup_routine, arg); PG_TRY(); { ... do something ... cancel_shmem_exit(cleanup_routine, arg); } PG_CATCH(); { cancel_shmem_exit(cleanup_routine, arg); cleanup_routine(arg); PG_RE_THROW(); } PG_END_TRY(); [We would also have to block SIGTERM around the second cancel_shmem_exit and cleanup_routine, no? Or if it's idempotent (actually, wouldn't it have to be?) run them in the reverse order.] This seems exceptionally fragile. Someone's bound to add another case without realizing it. I thought about doing something like having a flag in_unprotected_pg_try which PG_TRY would set to true, and on_shmem_exit would clear. Then any LWLock or other shmem operation could assert it's cleared. But that doesn't work for nested PG_TRY blocks. Possibly we could get it to work by keeping a pair of counters but I'm not sure that's sufficient. Are all the known cases LWLocks? Is there a reason we couldn't just have a generic cleanup routine which just releases all LWLocks we hold before exiting? Or weakly -- an assert in CHECK_FOR_INTERRUPTS that there are no LWLocks held or if there are that there's a cleanup routine in place. We wouldn't know for sure that it's the *right* cleanup routine or enough cleanup routines but hopefully we would catch the error often enough to notice. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Gregory Stark [EMAIL PROTECTED] writes: Or weakly -- an assert in CHECK_FOR_INTERRUPTS oops, that's irrelevant of course. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: ISTM that there will be more cases like this in future, so we need a general solution anyway. I propose the following sort of code structure for these situations: [We would also have to block SIGTERM around the second cancel_shmem_exit and cleanup_routine, no? Or if it's idempotent (actually, wouldn't it have to be?) run them in the reverse order.] No, we wouldn't, because a SIGTERM can only actually fire at a CHECK_FOR_INTERRUPTS() call. You'd just need to be sure there wasn't one in the cleanup code. Are all the known cases LWLocks? *None* of the known cases are LWLocks, nor any other resource that we have generic cleanup code for. The problem cases are one-off resources that it seemed we could avoid having a real cleanup mechanism for. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane [EMAIL PROTECTED] writes: [We would also have to block SIGTERM around the second cancel_shmem_exit and cleanup_routine, no? Or if it's idempotent (actually, wouldn't it have to be?) run them in the reverse order.] No, we wouldn't, because a SIGTERM can only actually fire at a CHECK_FOR_INTERRUPTS() call. You'd just need to be sure there wasn't one in the cleanup code. Wait, huh? In that case I don't see what advantage any of this has over Bruce's patch. And his approach seemed a lot more robust. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: No, we wouldn't, because a SIGTERM can only actually fire at a CHECK_FOR_INTERRUPTS() call. You'd just need to be sure there wasn't one in the cleanup code. Wait, huh? In that case I don't see what advantage any of this has over Bruce's patch. And his approach seemed a lot more robust. Maybe I missed something, but I thought he was just proposing some macro syntactic sugar over the same code that I described. No, I meant the earlier patch which you rejected with the flag in MyProc. I realize there were other issues but the initial concern was that it wouldn't respond promptly because it would wait for CHECK_FOR_INTERRUPTS. But if sigterm was never handled except at a CHECK_FOR_INTERRUPTS then that was never a factor. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: No, we wouldn't, because a SIGTERM can only actually fire at a CHECK_FOR_INTERRUPTS() call. You'd just need to be sure there wasn't one in the cleanup code. Wait, huh? In that case I don't see what advantage any of this has over Bruce's patch. And his approach seemed a lot more robust. Maybe I missed something, but I thought he was just proposing some macro syntactic sugar over the same code that I described. I'm not against syntactic sugar, but it doesn't seem to be totally clean; I think you would need two repetitions of the function and arg anyway: PG_ENSURE_ERROR_CLEANUP(cleanup_function, arg); { ... risky code here ... } PG_END_ENSURE_ERROR_CLEANUP(cleanup_function, arg); because both the before and after parts need to know the function name and the arg to pass it. The only way to avoid duplication would be if the whole controlled code block was actually an argument to the macro: PG_ENSURE_ERROR_CLEANUP(cleanup_function, arg, ... risky code here ... ); but there are very good reasons not to do that when the controlled code can be large; for instance that #if can't be used portably in a macro argument. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Gregory Stark [EMAIL PROTECTED] writes: No, I meant the earlier patch which you rejected with the flag in MyProc. I realize there were other issues but the initial concern was that it wouldn't respond promptly because it would wait for CHECK_FOR_INTERRUPTS. No, that's not the concern in the least. The concern is that something could trap the attempted throwing of the error *after* CHECK_FOR_INTERRUPTS has noticed the signal. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I am starting to think we need to analyze the source code rather than testing, because of what we are testing for. I was thinking about that a bit more, in connection with trying to verbalize why I don't like your patch ;-) The fundamental assumption here is that we think that proc_exit() from the main loop of PostgresMain() should be safe, because that's exercised anytime a client quits or dies unexpectedly, and so it should be a well-tested path. What is lacking such test coverage is the many code paths that start proc_exit() from any random CHECK_FOR_INTERRUPTS() point in the backend. Some of those points might be (in fact are known to be, as of today) holding locks or otherwise in a state that prevents a clean proc_exit(). Your patch proposes to replace that code path by one that fakes a QueryCancel error and then does proc_exit() once control reaches the outer level. While that certainly *should* work (ignoring the question of whether control gets to the outer level in any reasonable amount of time), it's a bit of a stretch to claim that it's a well-tested case. At the moment it would only arise if a QueryCancel interrupt arrived at approximately the same time that the client died or sent a disconnect message, which is surely far less probable than client death by itself. So I don't buy the argument that this is a better-tested path, and I definitely don't want to introduce new complexity in order to make it happen like that. I would like my use of SIGINT to be like someone pressing ^C and then \q in psql, but I can see that the SIGINT might be delivered in places where libpq would not deliver it, like during shutdown. Hopefully that could be addressed, but I agree using SIGTERM is better and more useful. Now, as to whether a SIGTERM-style exit is safe. With the exception of the PG_CATCH issues that we already know about, any other case seems like a clear coding error: you'd have to be doing something that you know could throw an error, without having made provision to clean up whatever unclean state you are in. It'd be nice to think that we haven't been that silly anywhere. Experience however teaches that such an assumption is hubris. I think that the way forward is to fix the PG_CATCH issues (which I will try to get done later this week) and then get someone to mount a serious effort to break it, as outlined in previous discussions: http://archives.postgresql.org/pgsql-hackers/2006-08/msg00174.php I was thinking of running two parallel regression tests and killing one at random and see if the others complete. One problem is that the regression tests issue a lot of server error messages so somehow I would need to filter out the normal errors, which I think I could do. Magnus has offered to test too. I'm willing to enable a SIGTERM-based pg_terminate_backend for 8.4 if there is some reasonable amount of testing done during this development cycle to try to expose any problems. If no one is willing to do any such testing, then as far as I'm concerned there is no market for the feature and we should drop it from TODO. Agreed, but the feature is going to be asked for again soon so we would probably have to put it on the features we don't want, and that is going to look pretty odd. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: I'm willing to enable a SIGTERM-based pg_terminate_backend for 8.4 if there is some reasonable amount of testing done during this development cycle to try to expose any problems. If someone can come up with an automated script to do this kind of testing, I can commit a VM or three to running this 24/7 for a month, easily... But I don't trust myself in coming up with a test-case that's good enough :-P The closest thing I can think of to an automated test is to run repeated sets of the parallel regression tests, and each time SIGTERM a randomly chosen backend at a randomly chosen time. Then see if anything funny happens. The hard part here is distinguishing expected from unexpected regression outputs, especially in view of the fact that some of the tests depend on database contents set up by earlier tests. I'm thinking that you could automatically discard the regression diff for the specific test that got SIGTERM'd, as long as it looked like the normal output up to the point where the terminated by administrator error appears. Then what you'd have is the potential for downstream failures due to things not being created, which *should* fall into a fairly stylized set of possible diffs. So get the script to throw away any diffs that exactly match ones seen previously. Run it for awhile, and then hand-validate the set of diffs that it's saved ... or if any of 'em look funny, report. One gotcha I can think of is that killing the prepared_xacts test can leave you with open 2PC transactions, which will interfere with starting the next cycle of the tests (you have to kill them before you can dropdb). But you could add a rollback prepared to the driver script to clean out any uncommitted prepared xact. Whether this is workable or not depends on the size of the set of expected downstream-failure diffs. My gut feeling from many years of watching regression test crashes is that it'd be large but not completely impractical to look through by hand. I haven't time to write something like that myself, but offhand it seems like it could be done without more than a day or so's work, especially if you start from the buildfarm infrastructure. BTW, don't forget to include autovac workers in the set of SIGTERM target candidates. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: I'm willing to enable a SIGTERM-based pg_terminate_backend for 8.4 if there is some reasonable amount of testing done during this development cycle to try to expose any problems. If someone can come up with an automated script to do this kind of testing, I can commit a VM or three to running this 24/7 for a month, easily... But I don't trust myself in coming up with a test-case that's good enough :-P The closest thing I can think of to an automated test is to run repeated sets of the parallel regression tests, and each time SIGTERM a randomly chosen backend at a randomly chosen time. Then see if anything funny Yep, that was my plan, plus running the parallel regression tests you get the possibility of 2 backends. happens. The hard part here is distinguishing expected from unexpected regression outputs, especially in view of the fact that some of the tests depend on database contents set up by earlier tests. Oh, that is a problem. I was going to get the typical errors, sort them and put them in a file. Then when I run the test, I sort the log again and use diff | grep '' to see an differences. If there are cases that generate new errors on a previous failure, I will have to add that output too. I'm thinking that you could automatically discard the regression diff for the specific test that got SIGTERM'd, as long as it looked like the normal output up to the point where the terminated by administrator error appears. Then what you'd have is the potential for downstream failures due to things not being created, which *should* fall into a fairly stylized set of possible diffs. So get the script to throw away any diffs that exactly match ones seen previously. Run it for awhile, and then hand-validate the set of diffs that it's saved ... or if any of 'em look funny, report. Yep, see above. One gotcha I can think of is that killing the prepared_xacts test can leave you with open 2PC transactions, which will interfere with starting the next cycle of the tests (you have to kill them before you can dropdb). But you could add a rollback prepared to the driver script to clean out any uncommitted prepared xact. Good idea. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: The closest thing I can think of to an automated test is to run repeated sets of the parallel regression tests, and each time SIGTERM a randomly chosen backend at a randomly chosen time. Then see if anything funny Yep, that was my plan, plus running the parallel regression tests you get the possibility of 2 backends. I was intentionally suggesting only one kill per test cycle. Multiple kills will probably create an O(N^2) explosion in the set of possible downstream-failure deltas. I doubt you'd really get any improvement in testing coverage to justify the much larger amount of hand validation needed. It also strikes me that you could make some simple alterations to the regression tests to reduce the set of observable downstream deltas. For example, anyplace where a test loads a table with successive INSERTs and that table is used by later tests, wrap the INSERT sequence with BEGIN/END. Then there is only one possible downstream delta (empty table) and not N different possibilities for an N-row table. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: The closest thing I can think of to an automated test is to run repeated sets of the parallel regression tests, and each time SIGTERM a randomly chosen backend at a randomly chosen time. Then see if anything funny Yep, that was my plan, plus running the parallel regression tests you get the possibility of 2 backends. I was intentionally suggesting only one kill per test cycle. Multiple kills will probably create an O(N^2) explosion in the set of possible downstream-failure deltas. I doubt you'd really get any improvement in testing coverage to justify the much larger amount of hand validation needed. No, the point is that you have 2 backends to choose from to kill. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_terminate_backend() idea
Gregory Stark wrote: Add pg_terminate_backend() to allow terminating only a single session. I'm interested in this because I was already looking for a solution to the out of signals problem we have. I think we could expand this by having a bunch of boolean flags, one each for different conditions including the sinval processing conditions, interrupt, info, and terminate. (Any more?) The two things we would have to check to be sure of is: 1) Do we care about how many times events are processed? Ie, if you press interrupt twice is it important that that be handled as an interrupt twice? It doesn't work that way currently for interrupt but are any of the other conditions sensitive to this? I don't think so. Not that I can think of. 2) Do we care what order things happen in? Ie, if you send an info request and then a cancel request is it ok if the cancel is handled first. I don't see why not myself. And if it's a terminate request we *clear* don't want to bother handling any other events first. I don't think we care. It seems to me we could replace all of the above with either SIGINT or USR1 and have a bunch of boolean flags in MyProc. I'm not sure of the implication for sinval processing of having to get a whole bunch of LWLocks though. Keep in mind PGPROC-terminate was easier because once it was set it was never cleared. If you need to clear the flag and keep going the code is going to need to be a little more sophisticated to avoid lost interrupts. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_terminate_backend() issues
Tom Lane wrote: I wrote: All in all, this patch wasn't ready to apply without review. I'm currently looking to see whether it's salvageable or not. After further study, I've concluded that it is in fact not salvageable, and I respectfully request that it be reverted. OK, reverted. I was able to get things to more or less work most of the time with three or four additional ugly hacks in postgres.c, but I still don't have any great confidence that there aren't windows where a terminate request wouldn't be ignored (even without considering the uncooperative- function scenario). Moreover it's entirely unclear that this approach actually dodges any of the hypothetical bugs in SIGTERM handling. I don't understand. If we call proc_exit(0) instead, it is the same as someone exiting the backend via PQfinish(). Given the complexity that it adds, I think it's fair to say that this approach is probably buggier than the other way. I think if we want pg_terminate_backend, we have to do it the way that was originally discussed: have it issue SIGTERM and fix whatever needs to be fixed in the SIGTERM code path. As the TODO list used to say before it got editorialized upon, this is mostly a matter of testing (something that I can tell this patch was sadly lacking :-(). We do need also to go around and fix any places that think a PG_TRY block is a sufficient way to clean up state in shared memory --- cf this AFAIK-still-unfixed bug: http://archives.postgresql.org/pgsql-hackers/2007-04/msg00218.php Well, with no movement on this TODO item since it was removed in 8.0, I am willing to give users something that works most of the time. As you said already, cancel isn't going to be 100% reliable anyway because of places we can't check for cancel. I will work on a new version of the patch that people can see. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: I was able to get things to more or less work most of the time with three or four additional ugly hacks in postgres.c, but I still don't have any great confidence that there aren't windows where a terminate request wouldn't be ignored (even without considering the uncooperative- function scenario). Moreover it's entirely unclear that this approach actually dodges any of the hypothetical bugs in SIGTERM handling. I don't understand. If we call proc_exit(0) instead, it is the same as someone exiting the backend via PQfinish(). Well, using proc_exit() instead of die() might have alleviated that particular objection, but there are still the others. I think if we want pg_terminate_backend, we have to do it the way that was originally discussed: have it issue SIGTERM and fix whatever needs to be fixed in the SIGTERM code path. Well, with no movement on this TODO item since it was removed in 8.0, I am willing to give users something that works most of the time. If the users want it so bad, why has no one stepped up to do the testing? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
I did a quick grep for PG_CATCH uses to see what we have along the lines of this bug: http://archives.postgresql.org/pgsql-hackers/2007-04/msg00218.php In current sources there are three places at risk: btbulkdelete, as noted in the above message pg_start_backup needs to reset forcePageWrites = false createdb wants to UnlockSharedObject and delete any already-copied files In createdb, the Unlock actually doesn't need to get cleaned up, since transaction abort would release the lock anyway. But leaving a possibly large mess of orphaned files doesn't seem nice. ISTM that there will be more cases like this in future, so we need a general solution anyway. I propose the following sort of code structure for these situations: on_shmem_exit(cleanup_routine, arg); PG_TRY(); { ... do something ... cancel_shmem_exit(cleanup_routine, arg); } PG_CATCH(); { cancel_shmem_exit(cleanup_routine, arg); cleanup_routine(arg); PG_RE_THROW(); } PG_END_TRY(); where cancel_shmem_exit is defined to pop the latest shmem_exit_list entry if it matches the passed arguments (I don't see any particular need to search further than that in the list). This structure guarantees that cleanup_routine will be called on the way out of either a plain ERROR or a FATAL exit. Thoughts? We clearly must do something about this before we can even think of calling retail SIGTERM a supported feature ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() idea
Bruce Momjian [EMAIL PROTECTED] writes: Gregory Stark wrote: It seems to me we could replace all of the above with either SIGINT or USR1 and have a bunch of boolean flags in MyProc. I'm not sure of the implication for sinval processing of having to get a whole bunch of LWLocks though. Keep in mind PGPROC-terminate was easier because once it was set it was never cleared. If you need to clear the flag and keep going the code is going to need to be a little more sophisticated to avoid lost interrupts. That's kind of the point. I don't think we care about lost interrupts for any of these things. As long as we know a sinval flush happened I think we don't care how many more happened. Or a cancel request or terminate request. For sinval of course it would be important to clear the flags (with MyProc locked) prior to processing the queue in case one arrives as soon as we're finished. But I think that's the only detail. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane wrote: I did a quick grep for PG_CATCH uses to see what we have along the lines of this bug: http://archives.postgresql.org/pgsql-hackers/2007-04/msg00218.php In current sources there are three places at risk: btbulkdelete, as noted in the above message pg_start_backup needs to reset forcePageWrites = false createdb wants to UnlockSharedObject and delete any already-copied files In createdb, the Unlock actually doesn't need to get cleaned up, since transaction abort would release the lock anyway. But leaving a possibly large mess of orphaned files doesn't seem nice. ISTM that there will be more cases like this in future, so we need a general solution anyway. I propose the following sort of code structure for these situations: on_shmem_exit(cleanup_routine, arg); PG_TRY(); { ... do something ... cancel_shmem_exit(cleanup_routine, arg); } PG_CATCH(); { cancel_shmem_exit(cleanup_routine, arg); cleanup_routine(arg); PG_RE_THROW(); } PG_END_TRY(); where cancel_shmem_exit is defined to pop the latest shmem_exit_list entry if it matches the passed arguments (I don't see any particular need to search further than that in the list). This structure guarantees that cleanup_routine will be called on the way out of either a plain ERROR or a FATAL exit. Thoughts? We clearly must do something about this before we can even think of calling retail SIGTERM a supported feature ... Here is a little different idea. Seems we want something more like: PG_ON_EXIT(); { cleanup_routine(arg); } PG_TRY(); { ... do something ... } PG_CATCH(); { PG_RE_THROW(); } PG_END_TRY(); PG_ON_EXIT() would place cleanup_routine() into the shmem_exit_list and it would be called at the end of PG_CATCH and removed from shmem_exit_list by PG_END_TRY(). Another idea would be to define the PG_CATCH() first so it would be put in the shmem_exit_list before the TRY was tried, but I assume you don't want all those functions to run on exit. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Bruce Momjian [EMAIL PROTECTED] writes: Here is a little different idea. Seems we want something more like: Yeah, I thought about building a set of macros that would have this ability folded in, but it didn't seem like much notational advantage given that you have to write the cleanup routine out-of-line anyway. Also, until we have a lot more than three use-sites, more macros don't seem to be worth the trouble. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Here is a little different idea. Seems we want something more like: Yeah, I thought about building a set of macros that would have this ability folded in, but it didn't seem like much notational advantage given that you have to write the cleanup routine out-of-line anyway. Yea, agreed. I am a little worried about the out-of-line issue because some cases might become more complex because you have to pass lots of parameters, perhaps. Also, until we have a lot more than three use-sites, more macros don't seem to be worth the trouble. Oh, only three use sites. I thought we had three wrong sites but that more had to be changed. Yea, go for it! -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I think if we want pg_terminate_backend, we have to do it the way that was originally discussed: have it issue SIGTERM and fix whatever needs to be fixed in the SIGTERM code path. Well, with no movement on this TODO item since it was removed in 8.0, I am willing to give users something that works most of the time. If the users want it so bad, why has no one stepped up to do the testing? Good question. Tom and I talked about this on the phone today. I think the problem is testing to try to prove the lack of a bug. How long does someone test to know they have reasonably proven a bug doesn't exist? I think the other problem is what to test. One SIGTERM problem that was reported was related to schema changes. Certainly that has to be tested, but what else: prepared statements, queries, HOT updates, connect/disconnect, notify? There are lots of place to test and it is hard to know which ones, and for how long. I am starting to think we need to analyze the source code rather than testing, because of what we are testing for. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend() issues
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: I was able to get things to more or less work most of the time with three or four additional ugly hacks in postgres.c, but I still don't have any great confidence that there aren't windows where a terminate request wouldn't be ignored (even without considering the uncooperative- function scenario). Moreover it's entirely unclear that this approach actually dodges any of the hypothetical bugs in SIGTERM handling. I don't understand. If we call proc_exit(0) instead, it is the same as someone exiting the backend via PQfinish(). Well, using proc_exit() instead of die() might have alleviated that particular objection, but there are still the others. Tom, thanks for you feedback. It was very helpful. I have adjusted the patch to return a locked PGPROC entry, initialize the PGPROC-terminate boolean, called proc_exit(), and moved the PGPROC-terminate test out of the cancel loop entirely so lost cancel signals are not as big a problem anymore. I agree it would be best for pg_terminate_backend() and for the postmaster use of SIGTERM if SIGTERM was more reliable about cleanup, so hopefully that will work for 8.4. I have attached my patch in hopes we can use it as a backup in case we can't get SIGTERM to work for individual backends for 8.4. [ I have posted about testing in a separate email.] -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: doc/src/sgml/func.sgml === RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v retrieving revision 1.432 diff -c -c -r1.432 func.sgml *** doc/src/sgml/func.sgml 15 Apr 2008 20:28:46 - 1.432 --- doc/src/sgml/func.sgml 16 Apr 2008 03:59:20 - *** *** 11849,11854 --- 11849,11857 primarypg_cancel_backend/primary /indexterm indexterm + primarypg_terminate_backend/primary +/indexterm +indexterm primarypg_reload_conf/primary /indexterm indexterm *** *** 11885,11890 --- 11888,11900 /row row entry + literalfunctionpg_terminate_backend/function(parameterpid/parameter typeint/)/literal + /entry +entrytypeboolean/type/entry +entryTerminate a backend/entry + /row + row +entry literalfunctionpg_reload_conf/function()/literal /entry entrytypeboolean/type/entry *** *** 11907,11915 /para para ! functionpg_cancel_backend/ sends a query cancel ! (systemitemSIGINT/) signal to a backend process identified by ! process ID. The process ID of an active backend can be found from the structfieldprocpid/structfield column in the structnamepg_stat_activity/structname view, or by listing the commandpostgres/command processes on the server with --- 11917,11926 /para para ! functionpg_cancel_backend/ and functionpg_terminate_backend/ ! send a query cancel (systemitemSIGINT/) signal to a backend process ! identified by process ID. The ! process ID of an active backend can be found from the structfieldprocpid/structfield column in the structnamepg_stat_activity/structname view, or by listing the commandpostgres/command processes on the server with Index: doc/src/sgml/runtime.sgml === RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v retrieving revision 1.413 diff -c -c -r1.413 runtime.sgml *** doc/src/sgml/runtime.sgml 15 Apr 2008 20:28:46 - 1.413 --- doc/src/sgml/runtime.sgml 16 Apr 2008 03:59:20 - *** *** 1372,1377 --- 1372,1384 well. /para /important + + para +To terminate a session while allowing other sessions to continue, use +functionpg_terminate_backend()/ (xref +linkend=functions-admin-signal-table) rather than sending a signal +to the child process. + /para /sect1 sect1 id=preventing-server-spoofing Index: src/backend/access/transam/twophase.c === RCS file: /cvsroot/pgsql/src/backend/access/transam/twophase.c,v retrieving revision 1.41 diff -c -c -r1.41 twophase.c *** src/backend/access/transam/twophase.c 25 Mar 2008 22:42:42 - 1.41 --- src/backend/access/transam/twophase.c 16 Apr 2008 03:59:20 - *** *** 283,288 --- 283,289 gxact-proc.databaseId = databaseid; gxact-proc.roleId = owner; gxact-proc.inCommit = false; + gxact-proc.terminate = false; gxact-proc.vacuumFlags = 0; gxact-proc.lwWaiting = false; gxact-proc.lwExclusive = false; Index:
Re: [HACKERS] pg_terminate_backend() issues
Bruce Momjian [EMAIL PROTECTED] writes: I am starting to think we need to analyze the source code rather than testing, because of what we are testing for. I was thinking about that a bit more, in connection with trying to verbalize why I don't like your patch ;-) The fundamental assumption here is that we think that proc_exit() from the main loop of PostgresMain() should be safe, because that's exercised anytime a client quits or dies unexpectedly, and so it should be a well-tested path. What is lacking such test coverage is the many code paths that start proc_exit() from any random CHECK_FOR_INTERRUPTS() point in the backend. Some of those points might be (in fact are known to be, as of today) holding locks or otherwise in a state that prevents a clean proc_exit(). Your patch proposes to replace that code path by one that fakes a QueryCancel error and then does proc_exit() once control reaches the outer level. While that certainly *should* work (ignoring the question of whether control gets to the outer level in any reasonable amount of time), it's a bit of a stretch to claim that it's a well-tested case. At the moment it would only arise if a QueryCancel interrupt arrived at approximately the same time that the client died or sent a disconnect message, which is surely far less probable than client death by itself. So I don't buy the argument that this is a better-tested path, and I definitely don't want to introduce new complexity in order to make it happen like that. Now, as to whether a SIGTERM-style exit is safe. With the exception of the PG_CATCH issues that we already know about, any other case seems like a clear coding error: you'd have to be doing something that you know could throw an error, without having made provision to clean up whatever unclean state you are in. It'd be nice to think that we haven't been that silly anywhere. Experience however teaches that such an assumption is hubris. I think that the way forward is to fix the PG_CATCH issues (which I will try to get done later this week) and then get someone to mount a serious effort to break it, as outlined in previous discussions: http://archives.postgresql.org/pgsql-hackers/2006-08/msg00174.php I'm willing to enable a SIGTERM-based pg_terminate_backend for 8.4 if there is some reasonable amount of testing done during this development cycle to try to expose any problems. If no one is willing to do any such testing, then as far as I'm concerned there is no market for the feature and we should drop it from TODO. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_terminate_backend
Since I have a stuck backend without client again, I'll have to kill -SIGTERM a backend. Fortunately, I do have console access to that machine and it's not win32 but a decent OS. You do know that on Windows you can use pg_ctl to send a pseudo SIGTERM to a backend, don't you? The main issue still is that console access id required, on any OS. Yeah. Though for the Windows case only, we could easily enough make it possible to run pg_ctl kill remotely, since we use a named pipe. Does this seem like a good or bad idea? //Magnus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pg_terminate_backend
Magnus Hagander wrote: Since I have a stuck backend without client again, I'll have to kill -SIGTERM a backend. Fortunately, I do have console access to that machine and it's not win32 but a decent OS. You do know that on Windows you can use pg_ctl to send a pseudo SIGTERM to a backend, don't you? The main issue still is that console access id required, on any OS. Yeah. Though for the Windows case only, we could easily enough make it possible to run pg_ctl kill remotely, since we use a named pipe. Does this seem like a good or bad idea? Not too helpful. How to kill a win32 backend from a linux workstation? Additionally, NP requires an authenticated RPC connection. I you're not allowed to access the console, you probably haven't got sufficient access permissions to NP as well, or you'd need extra policy tweaking or so. Nightmarish, just to avoid the easy and intuitive way. Regards, Andreas ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] pg_terminate_backend
Magnus Hagander [EMAIL PROTECTED] writes: Though for the Windows case only, we could easily enough make it possible to run pg_ctl kill remotely, since we use a named pipe. Does this seem like a good or bad idea? Seems like we'd be opening a can of security worms :-( regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pg_terminate_backend
Though for the Windows case only, we could easily enough make it possible to run pg_ctl kill remotely, since we use a named pipe. Does this seem like a good or bad idea? Seems like we'd be opening a can of security worms :-( Not really, standard windows ACL already applies to everything, so you need to be an admin on the machine to make it work. Anyhoo, I don't really see the gain in it, which also seems to be what others think, so let's just drop that idea. //Magnus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] pg_terminate_backend
Andreas Pflug [EMAIL PROTECTED] writes: utils/adt/misc.c says: //* Disabled in 8.0 due to reliability concerns; FIXME someday *// Datum *pg_terminate_backend*(PG_FUNCTION_ARGS) Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please remove that comment and make the function live finally? No, you have that backwards. The burden of proof is on those who want it to show that it's now safe. The situation is not different than it was before, except that we can now actually point to a specific bug that did exist, whereas the original concern was just an unfocused one that the code path hadn't been adequately exercised. That concern is now even more pressing than it was. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] pg_terminate_backend
Andrew Dunstan wrote: Andreas Pflug wrote: Since I have a stuck backend without client again, I'll have to kill -SIGTERM a backend. Fortunately, I do have console access to that machine and it's not win32 but a decent OS. You do know that on Windows you can use pg_ctl to send a pseudo SIGTERM to a backend, don't you? The main issue still is that console access id required, on any OS. Regards, Andreas ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pg_terminate_backend
Tom Lane wrote: Andreas Pflug [EMAIL PROTECTED] writes: utils/adt/misc.c says: //* Disabled in 8.0 due to reliability concerns; FIXME someday *// Datum *pg_terminate_backend*(PG_FUNCTION_ARGS) Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please remove that comment and make the function live finally? No, you have that backwards. The burden of proof is on those who want it to show that it's now safe. The situation is not different than it was before, except that we can now actually point to a specific bug that did exist, whereas the original concern was just an unfocused one that the code path hadn't been adequately exercised. That concern is now even more pressing than it was. If the backend's stuck, I'll have to SIGTERM it, whether there's pg_terminate_backend or not. Ultimately, if resources should remain locked, there's no chance except restarting the whole server anyway. SIGTERM gives me a fair chance (90%) that it will work without restart. The persistent refusal of supporting the function makes it more painful to execute, but not less necessary. Regards, Andreas ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pg_terminate_backend
Tom Lane wrote: Andreas Pflug [EMAIL PROTECTED] writes: utils/adt/misc.c says: //* Disabled in 8.0 due to reliability concerns; FIXME someday *// Datum *pg_terminate_backend*(PG_FUNCTION_ARGS) Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please remove that comment and make the function live finally? No, you have that backwards. The burden of proof is on those who want it to show that it's now safe. The situation is not different than it was before, except that we can now actually point to a specific bug that did exist, whereas the original concern was just an unfocused one that the code path hadn't been adequately exercised. That concern is now even more pressing than it was. I am not sure how you prove the non-existance of a bug. Ideas? -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] pg_terminate_backend
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: No, you have that backwards. The burden of proof is on those who want it to show that it's now safe. The situation is not different than it was before, except that we can now actually point to a specific bug that did exist, whereas the original concern was just an unfocused one that the code path hadn't been adequately exercised. That concern is now even more pressing than it was. I am not sure how you prove the non-existance of a bug. Ideas? What I'm looking for is some concentrated testing. The fact that some people once in a while SIGTERM a backend doesn't give me any confidence in it. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] pg_terminate_backend
What I'm looking for is some concentrated testing. The fact that some people once in a while SIGTERM a backend doesn't give me any confidence in it. Now wait a minute, is there some risk of lockup if I kill a backend ? Cause I do that relatively often (say 20 times a day, when some web users time out but their query keeps running). Should I rather not do it ? Thanks, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] pg_terminate_backend
Csaba Nagy [EMAIL PROTECTED] writes: Now wait a minute, is there some risk of lockup if I kill a backend ? Cause I do that relatively often (say 20 times a day, when some web users time out but their query keeps running). Should I rather not do it ? statement_timeout is your friend. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pg_terminate_backend
You didn't answer the original question: is killing SIGTERM a backend known/suspected to be dangerous ? And if yes, what's the risk (pointers to discussions would be nice too). statement_timeout is your friend. I know, but unfortunately I can't use it. I did try to use statement_timeout and it worked out quite bad (due to our usage scenario). Some of the web requests which time out on the web should still go through... and we have activities which should not observe statement timeout at all, i.e. they must finish however long that takes. I know it would be possible to use a different user with it's own statement timeout for those requests, but that means we have to rewrite a lot of code which is not possible immediately, and our admins would resist to add even more configuration (additional users=additional connection pool+caches and all to be configured). We also can fix the queries so no timeout happens in the first place, but that will take us even more time. Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pg_terminate_backend
Andreas Pflug [EMAIL PROTECTED] writes: Tom Lane wrote: No, you have that backwards. The burden of proof is on those who want it to show that it's now safe. If the backend's stuck, I'll have to SIGTERM it, whether there's pg_terminate_backend or not. Stuck? You have not shown us a case where SIGTERM rather than SIGINT is necessary or appropriate. It seems to me the above is assuming the existence of unknown backend bugs, exactly the same thing you think I shouldn't be assuming ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] pg_terminate_backend
On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote: You didn't answer the original question: is killing SIGTERM a backend ^^^ Nevermind, I don't do that. I do 'kill backend_pid' without specifying the signal, and I'm sufficiently unfamiliar with the unix signal names to have confused them. Is a plain kill still dangerous ? Thanks, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pg_terminate_backend
Stuck? You have not shown us a case where SIGTERM rather than SIGINT is necessary or appropriate. It seems to me the above is assuming the existence of unknown backend bugs, exactly the same thing you think I shouldn't be assuming ... I do know a case where a plain kill will seem to be stucked: on vacuum of a big table. I guess when it starts an index's cleanup scan it will insist to finish it before stopping. I'm not sure if that's the cause, but I have seen delays of 30 minutes for killing a vacuum... it's true that finally it always did die... but it's also true that I have 'kill -9'-ed it before because I thought it's stucked. Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] pg_terminate_backend
Csaba Nagy [EMAIL PROTECTED] writes: I do know a case where a plain kill will seem to be stucked: on vacuum of a big table. I guess when it starts an index's cleanup scan it will insist to finish it before stopping. We've fixed a few cases of missing CHECK_FOR_INTERRUPTS lately, and will fix more if you can point them out. Note though that SIGTERM is just as vulnerable to that as SIGINT. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] pg_terminate_backend
Tom Lane wrote: Andreas Pflug [EMAIL PROTECTED] writes: Tom Lane wrote: No, you have that backwards. The burden of proof is on those who want it to show that it's now safe. If the backend's stuck, I'll have to SIGTERM it, whether there's pg_terminate_backend or not. Stuck? You have not shown us a case where SIGTERM rather than SIGINT is necessary or appropriate. Last night, I had a long-running query I launched from pgAdmin. It was happily running and completing on the server (took about 2 hours), and the backend went back to IDLE. pgAdmin didn't get back a response, assuming the query was still running. Apparently, the VPN router had interrupted the connection silently without notifying either side of the tcp connection. Since the backend is IDLE, there's no query to cancel and SIGINT won't help. So Stuck for me means a backend *not* responding to SIGINT. BTW, there's another scenario where SIGINT won't help. Imagine an app running wild hammering the server with queries regardless of query cancels (maybe some retry mechanism). You'd like to interrupt that connection, i.e. get rid of the backend. Regards, Andreas ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] pg_terminate_backend
Csaba Nagy wrote: On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote: You didn't answer the original question: is killing SIGTERM a backend ^^^ Nevermind, I don't do that. I do 'kill backend_pid' without specifying the signal, and I'm sufficiently unfamiliar with the unix signal names to have confused them. Is a plain kill still dangerous ? SIGTERM is the default kill parameter, so you do exactly what I'm talking about. Regards, Andreas ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] pg_terminate_backend
Csaba Nagy [EMAIL PROTECTED] writes: On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote: You didn't answer the original question: is killing SIGTERM a backend ^^^ Nevermind, I don't do that. I do 'kill backend_pid' without specifying the signal, man kill says the default is SIGTERM. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pg_terminate_backend
Bruce Momjian wrote: I am not sure how you prove the non-existance of a bug. Ideas? Would be worth at least the Nobel prize :-) Regards, Andreas ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] pg_terminate_backend
man kill says the default is SIGTERM. OK, so that means I do use it... is it known to be dangerous ? I thought till now that it is safe to use. What about select pg_cancel_backend() ? Thanks, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pg_terminate_backend
Csaba Nagy wrote: man kill says the default is SIGTERM. OK, so that means I do use it... is it known to be dangerous ? I thought till now that it is safe to use. Apparently you never suffered any problems from that; neither did I. What about select pg_cancel_backend() That's the function wrapper around kill -SIGINT, which is probably the way you could safely stop your queries most of the time. Regards, Andreas ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] pg_terminate_backend
I am not sure how you prove the non-existance of a bug. Ideas? I do that by deleting all of my code (usually by accident :-) No code, no bugs! -- Korry
Re: [HACKERS] pg_terminate_backend
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: No, you have that backwards. The burden of proof is on those who want it to show that it's now safe. The situation is not different than it was before, except that we can now actually point to a specific bug that did exist, whereas the original concern was just an unfocused one that the code path hadn't been adequately exercised. That concern is now even more pressing than it was. I am not sure how you prove the non-existance of a bug. Ideas? What I'm looking for is some concentrated testing. The fact that some people once in a while SIGTERM a backend doesn't give me any confidence in it. OK, here is an opportunity for someone to run tests to get this into 8.2. The code already exists in CVS, but we need testing to enable it. I would think running a huge workload and killing it over and over again would be a good test. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] pg_terminate_backend
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: What I'm looking for is some concentrated testing. The fact that some people once in a while SIGTERM a backend doesn't give me any confidence in it. OK, here is an opportunity for someone to run tests to get this into 8.2. The code already exists in CVS, but we need testing to enable it. I would think running a huge workload and killing it over and over again would be a good test. Big multiprocess workload and you kill individual processes at random while letting the rest run. It probably needs to be something that stresses more of the code than pgbench would, too. (For instance, it'd be a good idea if some of the workload involved having a few 2PC transactions getting prepared and then either committed or rolled back ... SIGTERM during a COMMIT PREPARED strikes me as the sort of corner case that's probably never been exercised.) regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] pg_terminate_backend
Thanks. Good plan. --- Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: What I'm looking for is some concentrated testing. The fact that some people once in a while SIGTERM a backend doesn't give me any confidence in it. OK, here is an opportunity for someone to run tests to get this into 8.2. The code already exists in CVS, but we need testing to enable it. I would think running a huge workload and killing it over and over again would be a good test. Big multiprocess workload and you kill individual processes at random while letting the rest run. It probably needs to be something that stresses more of the code than pgbench would, too. (For instance, it'd be a good idea if some of the workload involved having a few 2PC transactions getting prepared and then either committed or rolled back ... SIGTERM during a COMMIT PREPARED strikes me as the sort of corner case that's probably never been exercised.) regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] pg_terminate_backend
Since I have a stuck backend without client again, I'll have to kill -SIGTERM a backend. Fortunately, I do have console access to that machine and it's not win32 but a decent OS. For other cases I'd really really really appreciate if that function would make it into 8.2. utils/adt/misc.c says: #*ifdef* NOT_USED //* Disabled in 8.0 due to reliability concerns; FIXME someday *// Datum *pg_terminate_backend*(PG_FUNCTION_ARGS) Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please remove that comment and make the function live finally? Regards, Andreas ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] pg_terminate_backend
Andreas Pflug wrote: Since I have a stuck backend without client again, I'll have to kill -SIGTERM a backend. Fortunately, I do have console access to that machine and it's not win32 but a decent OS. You do know that on Windows you can use pg_ctl to send a pseudo SIGTERM to a backend, don't you? cheers andrew ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pg_terminate_backend idea
On Tue, 2005-06-21 at 23:34 -0400, Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: In any case the correct way to solve the problem is to find out what's being left corrupt by SIGTERM, rather than install more messiness in order to avoid facing the real issue ... I am confused. Are you talking about the client SIGTERM or the server? I am talking about Rod Taylor's reports that SIGTERM'ing individual backends tends to lead to lock table corrupted crashes awhile later. Now, I've been playing the part of Chicken Little on this for awhile, but seeing an actual report of problems from the field certainly strengthens my feelings about it. Bringing this thread back to life. I have not seen a lock table corruption issue with SIGTERM in 8.1 on Solaris/Sun IV, Linux/AMD64, or Linux/Intel. I don't recall seeing one on 8.0.3 either though I'm pretty sure there were several on 8.0.1. There are times when locks for a process hang around for a few minutes before getting cleared. I don't recall whether they were ungranted table locks or entries waiting on a transaction ID lock, but the source was Slony and a large pg_listener structure with more than 2 pages (yes, pages not tuples). I have also seen processes refusing to acknowledge the signal and exit during btree index builds, but that's not a data corruption issue. -- ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pg_terminate_backend idea
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Assuming we don't get such a case, and a chance to fix it, before 8.1 (while still hoping we will get it fixed properly, we can't be sure, can we? If we were, it'd be fixed already). In this case, will you consider such a kludgy solution as a temporary fix to resolve a problem that a lot of users are having? And then plan to have it removed once sending SIGTERM directly to a backend can be considered safe? Kluges tend to become institutionalized, so my reaction is no. It's also worth pointing out that with so little understanding of the problem Rod is reporting, it's tough to make a convincing case that this kluge will avoid it. SIGTERM exit *shouldn't* be leaving any corrupted locktable entries behind; it's not that much different from the normal case. Until we find out what's going on, introducing still another exit path isn't really going to make me feel more comfortable, no matter how close it's alleged to be to the normal path. I have been running some tests to try to see the lock table corruption but I have been unable to reproduce the problem. I have attached my crude test scripts. I would run the scripts and then open another session as a different user and do UPDATE and LOCK to cause the psql session to block. The only functional difference I can see between a SIGTERM exit and a cancel followed by a normal exit is the call to AbortCurrentTransaction(). Could that be significant? Because I can't reproduce the failure I can't know for sure. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 #!/usr/contrib/bin/expect -- set timeout -1 eval spawn sql test expect -nocase test= send begin;\r expect -nocase test= send lock pg_class;\r expect -nocase test= send select * from pg_locks;\r expect -nocase test= send update test set x=3;\r expect -nocase test= expect eof exit while : do XPID=`/letc/ps_sysv -ef | grep 'postgres test'|grep -v grep|awk '{print $2}'` if [ $XPID != ] thenkill $XPID echo $XPID XPID=`/letc/ps_sysv -ef | grep 'psql test'|grep -v execargs|awk '{print $2}'` kill $XPID fi sleep 1 done ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] pg_terminate_backend idea
Bruce Momjian pgman@candle.pha.pa.us writes: I have been running some tests to try to see the lock table corruption but I have been unable to reproduce the problem. It's possible that what Rod was complaining of is fixed in CVS tip --- see discussion about RemoveFromWaitQueue() bug. If so, it would have been a bug that could be seen in other circumstances too, but maybe SIGTERM made it more probable for some reason. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] pg_terminate_backend idea
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: I have been running some tests to try to see the lock table corruption but I have been unable to reproduce the problem. It's possible that what Rod was complaining of is fixed in CVS tip --- see discussion about RemoveFromWaitQueue() bug. If so, it would have been a bug that could be seen in other circumstances too, but maybe SIGTERM made it more probable for some reason. Was that backpatched to 8.0.X? If not, I can test that branch of CVS for the problem. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster