Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On Wed, Mar 8, 2017 at 9:05 PM, Masahiko Sawadawrote: > On Wed, Feb 8, 2017 at 12:04 AM, Fujii Masao wrote: >> On Tue, Feb 7, 2017 at 2:13 AM, Petr Jelinek >> wrote: >>> On 06/02/17 17:33, Fujii Masao wrote: On Sun, Feb 5, 2017 at 5:11 AM, Petr Jelinek wrote: > On 03/02/17 19:38, Fujii Masao wrote: >> On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masao >> wrote: >>> On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI >>> wrote: At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao wrote in
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On Wed, Feb 8, 2017 at 12:04 AM, Fujii Masaowrote: > On Tue, Feb 7, 2017 at 2:13 AM, Petr Jelinek > wrote: >> On 06/02/17 17:33, Fujii Masao wrote: >>> On Sun, Feb 5, 2017 at 5:11 AM, Petr Jelinek >>> wrote: On 03/02/17 19:38, Fujii Masao wrote: > On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masao > wrote: >> On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI >> wrote: >>> At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao >>> wrote in >>>
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On Sat, Feb 4, 2017 at 3:11 PM, Petr Jelinekwrote: > That was the reason why DropSubscription didn't release the lock in the > first place. It was supposed to be released at the end of the > transaction though. Holding an LWLock until end-of-transaction is a phenomenally bad idea, both because you lose interruptibility and because of the deadlock risk. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On Tue, Feb 7, 2017 at 2:13 AM, Petr Jelinekwrote: > On 06/02/17 17:33, Fujii Masao wrote: >> On Sun, Feb 5, 2017 at 5:11 AM, Petr Jelinek >> wrote: >>> On 03/02/17 19:38, Fujii Masao wrote: On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masao wrote: > On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI > wrote: >> At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao >> wrote in >>
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On 06/02/17 17:33, Fujii Masao wrote: > On Sun, Feb 5, 2017 at 5:11 AM, Petr Jelinek >wrote: >> On 03/02/17 19:38, Fujii Masao wrote: >>> On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masao wrote: On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI wrote: > At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao > wrote in >
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On Sun, Feb 5, 2017 at 5:11 AM, Petr Jelinekwrote: > On 03/02/17 19:38, Fujii Masao wrote: >> On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masao wrote: >>> On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI >>> wrote: At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao wrote in
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On 03/02/17 19:38, Fujii Masao wrote: > On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masaowrote: >> On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI >> wrote: >>> At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao >>> wrote in >>>
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masaowrote: > On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI > wrote: >> At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao wrote >> in
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHIwrote: > At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao wrote > in
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masaowrote in
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On Thu, Feb 2, 2017 at 2:36 PM, Michael Paquierwrote: > On Thu, Feb 2, 2017 at 2:13 PM, Tom Lane wrote: >> Kyotaro HORIGUCHI writes: >>> Then, the reason for the TRY-CATCH cluase is that I found that >>> some functions called from there can throw exceptions. >> >> Yes, but all LWLocks should be released by normal error recovery. >> It should not be necessary for this code to clean that up by hand. >> If it were necessary, there would be TRY-CATCH around every single >> LWLockAcquire in the backend, and we'd have an unreadable and >> unmaintainable system. Please don't add a TRY-CATCH unless it's >> *necessary* -- and you haven't explained why this one is. Yes. > Putting hands into the code and at the problem, I can see that > dropping a subscription on a node makes it unresponsive in case of a > stop. And that's just because calls to LWLockRelease are missing as in > the patch attached. A try/catch problem should not be necessary. Thanks for the patch! With the patch, LogicalRepLauncherLock is released at the end of DropSubscription(). But ISTM that the lock should be released just after logicalrep_worker_stop() and there is no need to protect the removal of replication slot with the lock. /* * If we found worker but it does not have proc set it is starting up, * wait for it to finish and then kill it. */ while (worker && !worker->proc) { ISTM that the above loop in logicalrep_worker_stop() is not necessary because LogicalRepLauncherLock ensures that the above condition is always false. Thought? Am I missing something? If the above condition is true, which means that there is the worker slot having the "subid" of the worker to kill, but its "proc" has not been set yet. Without LogicalRepLauncherLock, this situation can happen after "subid" is set by the launcher and before "proc" is set by the worker. But LogicalRepLauncherLock protects those operations, so logicalrep_worker_stop() called while holding the lock should always think the above condition is false. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On Thu, Feb 2, 2017 at 2:13 PM, Tom Lanewrote: > Kyotaro HORIGUCHI writes: >> Then, the reason for the TRY-CATCH cluase is that I found that >> some functions called from there can throw exceptions. > > Yes, but all LWLocks should be released by normal error recovery. > It should not be necessary for this code to clean that up by hand. > If it were necessary, there would be TRY-CATCH around every single > LWLockAcquire in the backend, and we'd have an unreadable and > unmaintainable system. Please don't add a TRY-CATCH unless it's > *necessary* -- and you haven't explained why this one is. Putting hands into the code and at the problem, I can see that dropping a subscription on a node makes it unresponsive in case of a stop. And that's just because calls to LWLockRelease are missing as in the patch attached. A try/catch problem should not be necessary. -- Michael drop-subs-locks.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
Kyotaro HORIGUCHIwrites: > Then, the reason for the TRY-CATCH cluase is that I found that > some functions called from there can throw exceptions. Yes, but all LWLocks should be released by normal error recovery. It should not be necessary for this code to clean that up by hand. If it were necessary, there would be TRY-CATCH around every single LWLockAcquire in the backend, and we'd have an unreadable and unmaintainable system. Please don't add a TRY-CATCH unless it's *necessary* -- and you haven't explained why this one is. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
At Thu, 2 Feb 2017 08:46:11 +0900, Michael Paquierwrote in > On Thu, Feb 2, 2017 at 2:14 AM, Fujii Masao wrote: > > The lwlock would be released when an exception occurs, so I don't think > > that TRY-CATCH is necessary here. Or it's necessary for another reason? > > +PG_CATCH(); > +{ > +LWLockRelease(LogicalRepLauncherLock); > +PG_RE_THROW(); > +} > +PG_END_TRY(); > Just to do that, a TRY/CATCH block looks like an overkill to me. Why > not just call LWLockRelease in the ERROR and return code paths? I though the same first. The modification at the "if (wrconn ==" is the remains of that. It is reverted inthe attached patch. Then, the reason for the TRY-CATCH cluase is that I found that some functions called from there can throw exceptions. logicalrep_worker_stop and replorigin_drop have ereport in its path. load_library apparently can throw exception. (walrcv_(libpq_) functions don't seeem to.) regards, -- Kyotaro Horiguchi NTT Open Source Software Center >From d0ca653bb2aa776742a2e7a697b02794b1ad66d9 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 2 Feb 2017 11:33:40 +0900 Subject: [PATCH] Fix DROP SUBSCRIPTION's lock leak. DROP SUBSCRIPTION acquires a lock on LogicalRepLauncherLock but never releases. This fixes it. --- src/backend/commands/subscriptioncmds.c | 86 ++--- 1 file changed, 48 insertions(+), 38 deletions(-) diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 5de..223eea4 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -511,52 +511,62 @@ DropSubscription(DropSubscriptionStmt *stmt) /* Protect against launcher restarting the worker. */ LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE); - /* Kill the apply worker so that the slot becomes accessible. */ - logicalrep_worker_stop(subid); - - /* Remove the origin tracking if exists. */ - snprintf(originname, sizeof(originname), "pg_%u", subid); - originid = replorigin_by_name(originname, true); - if (originid != InvalidRepOriginId) - replorigin_drop(originid); - - /* If the user asked to not drop the slot, we are done mow.*/ - if (!stmt->drop_slot) - { - heap_close(rel, NoLock); - return; - } - /* - * Otherwise drop the replication slot at the publisher node using - * the replication connection. + * Some functions called here can throw exceptions. we must release + * LogicalRepLauncherLock for the case. */ - load_file("libpqwalreceiver", false); + PG_TRY(); + { + /* Kill the apply worker so that the slot becomes accessible. */ + logicalrep_worker_stop(subid); - initStringInfo(); - appendStringInfo(, "DROP_REPLICATION_SLOT \"%s\"", slotname); + /* Remove the origin tracking if exists. */ + snprintf(originname, sizeof(originname), "pg_%u", subid); + originid = replorigin_by_name(originname, true); + if (originid != InvalidRepOriginId) + replorigin_drop(originid); - wrconn = walrcv_connect(conninfo, true, subname, ); - if (wrconn == NULL) - ereport(ERROR, -(errmsg("could not connect to publisher when attempting to " - "drop the replication slot \"%s\"", slotname), - errdetail("The error was: %s", err))); + /* Do the follwoing only if the user asked to actually drop the slot */ + if (stmt->drop_slot) + { + /* + * Drop the replication slot at the publisher node using the + * replication connection. + */ + load_file("libpqwalreceiver", false); - if (!walrcv_command(wrconn, cmd.data, )) - ereport(ERROR, -(errmsg("could not drop the replication slot \"%s\" on publisher", - slotname), - errdetail("The error was: %s", err))); - else - ereport(NOTICE, -(errmsg("dropped replication slot \"%s\" on publisher", - slotname))); + initStringInfo(); + appendStringInfo(, "DROP_REPLICATION_SLOT \"%s\"", slotname); + + wrconn = walrcv_connect(conninfo, true, subname, ); + if (wrconn == NULL) +ereport(ERROR, + (errmsg("could not connect to publisher when attempting to " +"drop the replication slot \"%s\"", slotname), + errdetail("The error was: %s", err))); - walrcv_disconnect(wrconn); + else if (!walrcv_command(wrconn, cmd.data, )) +ereport(ERROR, + (errmsg("could not drop the replication slot \"%s\" on publisher", +slotname), + errdetail("The error was: %s", err))); + + ereport(NOTICE, + (errmsg("dropped replication slot \"%s\" on publisher", + slotname))); - pfree(cmd.data); + walrcv_disconnect(wrconn); + pfree(cmd.data); + } + } + PG_CATCH(); + { + LWLockRelease(LogicalRepLauncherLock); + PG_RE_THROW(); + } + PG_END_TRY(); + LWLockRelease(LogicalRepLauncherLock); heap_close(rel, NoLock); } -- 2.9.2 -- Sent via pgsql-hackers mailing list
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On Thu, Feb 2, 2017 at 2:14 AM, Fujii Masaowrote: > The lwlock would be released when an exception occurs, so I don't think > that TRY-CATCH is necessary here. Or it's necessary for another reason? +PG_CATCH(); +{ +LWLockRelease(LogicalRepLauncherLock); +PG_RE_THROW(); +} +PG_END_TRY(); Just to do that, a TRY/CATCH block looks like an overkill to me. Why not just call LWLockRelease in the ERROR and return code paths? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
On Wed, Feb 1, 2017 at 5:36 PM, Kyotaro HORIGUCHIwrote: > Hello, while looking another bug, I found that standby cannot > shutdown after DROP SUBSCRIPTION. > > standby=# CREATE SUBSCRPTION sub1 ... > standby=# > standby=# DROP SUBSCRIPTION sub1; > > Ctrl-C to the standby fails to work. ApplyLauncherMain is waiting > LogicalRepLauncherLock forever. > > The culprit is DropSbuscription. It acquires > LogicalRepLauncherLock but never releases. > > The attached patch fixes it. Most part of the fucntion is now > enclosed by PG_TRY-CATCH since some functions can throw > exceptions. The lwlock would be released when an exception occurs, so I don't think that TRY-CATCH is necessary here. Or it's necessary for another reason? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
Hello, while looking another bug, I found that standby cannot shutdown after DROP SUBSCRIPTION. standby=# CREATE SUBSCRPTION sub1 ... standby=# standby=# DROP SUBSCRIPTION sub1; Ctrl-C to the standby fails to work. ApplyLauncherMain is waiting LogicalRepLauncherLock forever. The culprit is DropSbuscription. It acquires LogicalRepLauncherLock but never releases. The attached patch fixes it. Most part of the fucntion is now enclosed by PG_TRY-CATCH since some functions can throw exceptions. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 5de..f07143e 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -511,51 +511,67 @@ DropSubscription(DropSubscriptionStmt *stmt) /* Protect against launcher restarting the worker. */ LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE); - /* Kill the apply worker so that the slot becomes accessible. */ - logicalrep_worker_stop(subid); - - /* Remove the origin tracking if exists. */ - snprintf(originname, sizeof(originname), "pg_%u", subid); - originid = replorigin_by_name(originname, true); - if (originid != InvalidRepOriginId) - replorigin_drop(originid); - - /* If the user asked to not drop the slot, we are done mow.*/ - if (!stmt->drop_slot) - { - heap_close(rel, NoLock); - return; - } - /* - * Otherwise drop the replication slot at the publisher node using - * the replication connection. + * replorigin_drop can throw an exception. we must release + * LogicalRepLauncherLock for the case. */ - load_file("libpqwalreceiver", false); + PG_TRY(); + { + /* Kill the apply worker so that the slot becomes accessible. */ + logicalrep_worker_stop(subid); - initStringInfo(); - appendStringInfo(, "DROP_REPLICATION_SLOT \"%s\"", slotname); + /* Remove the origin tracking if exists. */ + snprintf(originname, sizeof(originname), "pg_%u", subid); + originid = replorigin_by_name(originname, true); + if (originid != InvalidRepOriginId) + replorigin_drop(originid); - wrconn = walrcv_connect(conninfo, true, subname, ); - if (wrconn == NULL) - ereport(ERROR, -(errmsg("could not connect to publisher when attempting to " - "drop the replication slot \"%s\"", slotname), - errdetail("The error was: %s", err))); + /* If the user asked to not drop the slot, we are done now.*/ + if (stmt->drop_slot) + { + /* + * Otherwise drop the replication slot at the publisher node using + * the replication connection. + */ + load_file("libpqwalreceiver", false); + + initStringInfo(); + appendStringInfo(, "DROP_REPLICATION_SLOT \"%s\"", slotname); + + wrconn = walrcv_connect(conninfo, true, subname, ); + if (wrconn == NULL || !walrcv_command(wrconn, cmd.data, )) + { +if (wrconn == NULL) + ereport(ERROR, + (errmsg("could not connect to publisher when " + "attempting to drop the " + "replication slot \"%s\"", slotname), + errdetail("The error was: %s", err))); - if (!walrcv_command(wrconn, cmd.data, )) - ereport(ERROR, -(errmsg("could not drop the replication slot \"%s\" on publisher", - slotname), - errdetail("The error was: %s", err))); - else - ereport(NOTICE, -(errmsg("dropped replication slot \"%s\" on publisher", - slotname))); +ereport(ERROR, + (errmsg("could not drop the replication slot \"%s\" on publisher", +slotname), + errdetail("The error was: %s", err))); + } - walrcv_disconnect(wrconn); + ereport(NOTICE, + (errmsg("dropped replication slot \"%s\" on publisher", + slotname))); - pfree(cmd.data); + pfree(cmd.data); + } + } + PG_CATCH(); + { + LWLockRelease(LogicalRepLauncherLock); + PG_RE_THROW(); + } + PG_END_TRY(); + + LWLockRelease(LogicalRepLauncherLock); + + if (wrconn) + walrcv_disconnect(wrconn); heap_close(rel, NoLock); } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers