On Thu, Mar 18, 2021 at 1:11 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > On Thu, Mar 18, 2021 at 12:46 PM Fujii Masao > <masao.fu...@oss.nttdata.com> wrote: > > On 2021/03/17 11:58, Kyotaro Horiguchi wrote: > > > The first suggested signature for pg_terminate_backend() with timeout > > > was pg_terminate_backend(pid, timeout). The current signature (pid, > > > wait?, timeout) looks redundant. Maybe the reason for rejecting 0 > > > astimeout is pg_terminate_backend(pid, true, 0) looks odd but it we > > > can wait forever in that case (as other features does). > > > > I'm afraid that "waiting forever" can cause something like deadlock > > situation, > > as follows. We have no mechanism to detect this for now. > > > > 1. backend 1 took the lock on the relation A. > > 2. backend 2 took the lock on the relation B. > > 3. backend 1 tries to take the lock on the relation B and is waiting for > > the lock to be released. > > 4. backend 2 accidentally executes pg_wait_for_backend_termination() with > > the pid of backend 1, and then is waiting for backend 1 to be > > terminated. > > Yeah this can happen. > > So, as stated upthread, how about a timeout 0 (which is default) > telling "don't wait", erroring out on negative value and when > specified a positive milliseconds value, then wait for that amount of > time. With this semantics, we can remove the wait flag for > pg_terminate_backend(pid, 0). Thoughts? > > And for pg_wait_for_backend_termination timeout 0 or negative, we > error out. Thoughts?
Attaching v11 patch that removed the wait boolean flag in the pg_terminate_backend and timeout 0 indicates "no wait", negative value "errors out", positive value "waits for those many milliseconds". Also addressed other review comments that I received upthread. Please review v11 further. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
From 96c74184bdf378cb6105ae7d24bfc7a6888b955a Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com> Date: Fri, 19 Mar 2021 11:19:36 +0530 Subject: [PATCH v11] pg_terminate_backend with wait and timeout This patch adds extra default argument timeout (with default value 0 meaning don't wait), to pg_terminate_backend which will be used to terminate and wait timeout milliseconds for the backend's termination. This patch also adds a new function pg_wait_for_backend_termination(pid, timeout) which checks for the existence of the backend with given PID and waits timeout milliseconds for its termination. --- doc/src/sgml/func.sgml | 32 ++++++- doc/src/sgml/monitoring.sgml | 4 + src/backend/catalog/system_views.sql | 10 ++ src/backend/postmaster/pgstat.c | 3 + src/backend/storage/ipc/signalfuncs.c | 132 +++++++++++++++++++++++++- src/include/catalog/pg_proc.dat | 9 +- src/include/pgstat.h | 3 +- 7 files changed, 184 insertions(+), 9 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 9492a3c6b9..523e6dd72a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24821,7 +24821,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); <indexterm> <primary>pg_terminate_backend</primary> </indexterm> - <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type> ) + <function>pg_terminate_backend</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>0</literal> ) <returnvalue>boolean</returnvalue> </para> <para> @@ -24829,7 +24829,35 @@ SELECT collation for ('foo' COLLATE "de_DE"); specified process ID. This is also allowed if the calling role is a member of the role whose backend is being terminated or the calling role has been granted <literal>pg_signal_backend</literal>, - however only superusers can terminate superuser backends. + however only superusers can terminate superuser backends. When + <parameter>timeout</parameter> is not specified, this function sends + SIGTERM to the backend with the given process ID without waiting for + its actual termination and returns <literal>true</literal>. Sometimes + the backend process may still exist. With <parameter>timeout</parameter> + specified in milliseconds, the function ensures that the backend + process is actually goes away from the system processes. It keeps + checking for the existence of the backend process until the given + <parameter>timeout</parameter> occurs and returns <literal>true</literal> + if the process doesn't exit. On timeout a warning is emitted and + <literal>false</literal> is returned. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>pg_wait_for_backend_termination</primary> + </indexterm> + <function>pg_wait_for_backend_termination</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>timeout</parameter> <type>bigint</type> <literal>DEFAULT</literal> <literal>5000</literal> ) + <returnvalue>boolean</returnvalue> + </para> + <para> + Checks the existence of backend process with specified process ID. It + waits until the given <parameter>timeout</parameter> or the default + 5000 milliseconds or 5 seconds. <literal>true</literal> is returned + when the backend is found to be not existing within the + <parameter>timeout</parameter> milliseconds. On timeout, a warning is + emitted and <literal>false</literal> is returned. </para></entry> </row> </tbody> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index db4b4e460c..f362184d5e 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -1569,6 +1569,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser </thead> <tbody> + <row> + <entry><literal>BackendTermination</literal></entry> + <entry>Waiting for the termination of a backend.</entry> + </row> <row> <entry><literal>BackupWaitWalArchive</literal></entry> <entry>Waiting for WAL files required for a backup to be successfully diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 0dca65dc7b..192c0ae2bd 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1277,6 +1277,16 @@ CREATE OR REPLACE FUNCTION RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote' PARALLEL SAFE; +CREATE OR REPLACE FUNCTION + pg_terminate_backend(pid integer, timeout int8 DEFAULT 0) + RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend' + PARALLEL SAFE; + +CREATE OR REPLACE FUNCTION + pg_wait_for_backend_termination(pid integer, timeout int8 DEFAULT 5000) + RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_for_backend_termination' + PARALLEL SAFE; + -- legacy definition for compatibility with 9.3 CREATE OR REPLACE FUNCTION json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false) diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 208a33692f..78729f1b92 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -3998,6 +3998,9 @@ pgstat_get_wait_ipc(WaitEventIPC w) switch (w) { + case WAIT_EVENT_BACKEND_TERMINATION: + event_name = "BackendTermination"; + break; case WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE: event_name = "BackupWaitWalArchive"; break; diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c index 69fe23a256..1e141fba7c 100644 --- a/src/backend/storage/ipc/signalfuncs.c +++ b/src/backend/storage/ipc/signalfuncs.c @@ -18,6 +18,7 @@ #include "catalog/pg_authid.h" #include "miscadmin.h" +#include "pgstat.h" #include "postmaster/syslogger.h" #include "storage/pmsignal.h" #include "storage/proc.h" @@ -126,15 +127,96 @@ pg_cancel_backend(PG_FUNCTION_ARGS) } /* - * Signal to terminate a backend process. This is allowed if you are a member - * of the role whose process is being terminated. + * Wait until there is no backend process with the given PID and return true. + * On timeout, a warning is emitted and false is returned. + */ +static bool +pg_wait_until_termination(int pid, int64 timeout) +{ + /* + * Wait in steps of waittime milliseconds until this function exits or + * timeout. + */ + int64 waittime = 100; + /* + * Initially remaining time is the entire timeout specified by the user. + */ + int64 remainingtime = timeout; + + /* + * Check existence of the backend. If it doesn't exist, then check for the + * errno ESRCH that confirms it and return true. If the backend still + * exists, then wait for waittime milliseconds, again check for the + * existence. Repeat this until timeout or an error occurs or a pending + * interrupt such as query cancel gets processed. + */ + do + { + if (remainingtime < waittime) + waittime = remainingtime; + + if (kill(pid, 0) == -1) + { + if (errno == ESRCH) + return true; + else + ereport(ERROR, + (errcode(ERRCODE_INTERNAL_ERROR), + errmsg("could not check the existence of the backend with PID %d: %m", + pid))); + } + + /* Process interrupts, if any, before going for wait. */ + CHECK_FOR_INTERRUPTS(); + + (void) WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, + waittime, + WAIT_EVENT_BACKEND_TERMINATION); + + ResetLatch(MyLatch); + + remainingtime -= waittime; + } while (remainingtime > 0); + + ereport(WARNING, + (errmsg("backend with PID %d did not terminate within %lld milliseconds", + pid, (long long int) timeout))); + + return false; +} + +/* + * Signal to terminate a backend process. This is allowed if you are a member + * of the role whose process is being terminated. If timeout input argument is + * 0 (which is default), then this function just signals the backend and + * doesn't wait. Otherwise it waits until given the timeout milliseconds or no + * process has the given PID and returns true. On timeout, a warning is emitted + * and false is returned. * * Note that only superusers can signal superuser-owned processes. */ Datum pg_terminate_backend(PG_FUNCTION_ARGS) { - int r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM); + int pid; + int r; + int timeout; + bool wait; + bool result; + + pid = PG_GETARG_INT32(0); + timeout = PG_GETARG_INT64(1); + + if (timeout < 0) + ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("\"timeout\" must not be negative"))); + + /* If timeout is zero (which is default), then don't wait. */ + wait = (timeout == 0) ? false : true; + + r = pg_signal_backend(pid, SIGTERM); if (r == SIGNAL_BACKEND_NOSUPERUSER) ereport(ERROR, @@ -146,7 +228,49 @@ pg_terminate_backend(PG_FUNCTION_ARGS) (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend"))); - PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS); + result = (r == SIGNAL_BACKEND_SUCCESS); + + /* Wait only if requested and the termination is successful. */ + if (wait && result) + result = pg_wait_until_termination(pid, timeout); + + PG_RETURN_BOOL(result); +} + +/* + * This function waits for a backend with the given PID to be terminated or + * until the given timeout milliseconds occurs. Returns true if the backend + * doesn't exist. On Timeout a warning is emitted and false is returned. + */ +Datum +pg_wait_for_backend_termination(PG_FUNCTION_ARGS) +{ + int pid; + int64 timeout; + PGPROC *proc = NULL; + bool result = false; + + pid = PG_GETARG_INT32(0); + timeout = PG_GETARG_INT64(1); + + if (timeout <= 0) + ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("\"timeout\" must not be negative or zero"))); + + proc = BackendPidGetProc(pid); + + if (proc == NULL) + { + ereport(WARNING, + (errmsg("PID %d is not a PostgreSQL server process", pid))); + + PG_RETURN_BOOL(result); + } + + result = pg_wait_until_termination(pid, timeout); + + PG_RETURN_BOOL(result); } /* diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 93393fcfd4..2f5fa24bf0 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -6143,9 +6143,14 @@ { oid => '2171', descr => 'cancel a server process\' current query', proname => 'pg_cancel_backend', provolatile => 'v', prorettype => 'bool', proargtypes => 'int4', prosrc => 'pg_cancel_backend' }, -{ oid => '2096', descr => 'terminate a server process', +{ oid => '2096', descr => 'terminate a backend process and if timeout is specified, wait for its exit or until timeout occurs', proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool', - proargtypes => 'int4', prosrc => 'pg_terminate_backend' }, + proargtypes => 'int4 int8', proargnames => '{pid,timeout}', + prosrc => 'pg_terminate_backend' }, +{ oid => '16386', descr => 'wait for a backend process exit or timeout occurs', + proname => 'pg_wait_for_backend_termination', provolatile => 'v', prorettype => 'bool', + proargtypes => 'int4 int8', proargnames => '{pid,timeout}', + prosrc => 'pg_wait_for_backend_termination' }, { oid => '2172', descr => 'prepare for taking an online backup', proname => 'pg_start_backup', provolatile => 'v', proparallel => 'r', prorettype => 'pg_lsn', proargtypes => 'text bool bool', diff --git a/src/include/pgstat.h b/src/include/pgstat.h index be43c04802..bbfe2f2f04 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -967,7 +967,8 @@ typedef enum */ typedef enum { - WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE = PG_WAIT_IPC, + WAIT_EVENT_BACKEND_TERMINATION = PG_WAIT_IPC, + WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE, WAIT_EVENT_BGWORKER_SHUTDOWN, WAIT_EVENT_BGWORKER_STARTUP, WAIT_EVENT_BTREE_PAGE, -- 2.25.1