Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-09-09 Thread Robert Haas
On Wed, Sep 9, 2015 at 8:14 AM, Ashutosh Bapat
 wrote:
> PFA patch with improved test module and fix for a bug.
>
> bgworker_sigusr1_handler() should set the latch when set_latch_on_sigusr1 is
> true, similar to procsignal_sigusr1_handler(). Without this fix, if a
> background worker without DATABASE_CONNECTION flag calls
> WaitForBackgroundWorker*() functions, those functions wait indefinitely as
> the latch is never set upon receipt of SIGUSR1.

This is hardly an insurmountable problem given that they can replace
bgworker_sigusr1_handler() with another handler whenever they like.
It might be a good idea anyway, but not without saving and restoring
errno.

>> Thanks.  I don't think this test module is suitable for commit for a
>> number of reasons, including the somewhat hackish use of exit(0)
>> instead of proper error reporting
> I have changed this part of code.

Maybe I should have been more clear: I don't really want a test module
for this.  Yeah, there was a bug, but we fixed it, and I don't see
that it's going to come back.  We might have a different one, but this
module is so special-purpose that it won't catch that.  If there are
some +1s to the idea of this test module from other people then I am
not unwilling to reconsider, but my opinion is that this is the wrong
thing to spend time on.  If we had some plan to build out a test
framework here that would really thoroughly exercise these code paths,
that might be valuable -- I'm not opposed to the idea of trying to
have better test coverage of the bgworker code.  But I just don't see
that this particular module does enough to make it worthwhile.

Considering that, I'm not really excited about spending a lot of time
on review right now, but FWIW I still think the error handling in here
is weak.  Why elog() and not ereport()?  Yeah, this is not user-facing
stuff exactly because it's just a test module, but where else do we
use elog() like this?  Why elog(WARNING) and proc_exit(1) instead of
elog(FATAL) or elog(PANIC)? The messages don't follow the style
guidelines either, like "found launcher in unexpected state (expected
status %d, got %d), exiting." -- that doesn't look anything like our
usual style for reporting errors.
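For reference, the conventional shape of such a report — ereport() with an errcode and a lowercase, period-free primary message — would look something like the following; the severity, errcode, and wording here are illustrative only, not taken from the patch:

```c
ereport(FATAL,
		(errcode(ERRCODE_INTERNAL_ERROR),
		 errmsg("launcher is in unexpected state %d, expected state %d",
				status, expected_status)));
```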

>> , the fact that you didn't integrate
>> it into the Makefile structure properly
>
> What was missing? I am able to run make {check,clean,install} from the
> directory. I can also run make -C  check from the repository's root.

make check-world won't run it, right?  So there would be no BF coverage.
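For context, check-world only reaches a module under src/test/modules if the module is listed in that directory's own Makefile, which the recursive build descends through. A sketch of the one-line addition that would be needed (the neighboring entries are from memory of the 9.5 tree and may differ):

```makefile
# src/test/modules/Makefile -- sketch; check-world recurses through SUBDIRS,
# so a module missing from this list gets no buildfarm coverage.
SUBDIRS = \
		  test_shm_mq \
		  worker_spi \
		  test_bgwnotify
```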

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-09-09 Thread Ashutosh Bapat
On Wed, Sep 2, 2015 at 2:02 AM, Robert Haas  wrote:

> On Mon, Aug 31, 2015 at 8:01 AM, Ashutosh Bapat
>  wrote:
> >> Thanks.  It needs testing though to see if it really works as
> >> intended.  Can you look into that?
> >
> > PFA the patch containing your code changes + test module. See if that
> meets
> > your expectations.
>
>
PFA patch with improved test module and fix for a bug.

bgworker_sigusr1_handler() should set the latch when set_latch_on_sigusr1
is true, similar to procsignal_sigusr1_handler(). Without this fix, if a
background worker without DATABASE_CONNECTION flag calls
WaitForBackgroundWorker*() functions, those functions wait indefinitely as
the latch is never set upon receipt of SIGUSR1.


> Thanks.  I don't think this test module is suitable for commit for a
> number of reasons, including the somewhat hackish use of exit(0)
> instead of proper error reporting


I have changed this part of code.


> , the fact that you didn't integrate
> it into the Makefile structure properly


What was missing? I am able to run make {check,clean,install} from the
directory. I can also run make -C  check from the repository's root.


> and the fact that without the
> postmaster.c changes it hangs forever instead of causing a test
> failure.


Changed this too. The SQL-level function test_bgwnotify() now errors out if
it doesn't receive a notification within a specific time.
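The pattern for a bounded wait of that sort, sketched against the latch API (the timeout value and message are illustrative, not the module's actual code):

```c
#define TEST_TIMEOUT_MS 5000	/* illustrative timeout, not from the patch */

	/* Wait for the notification, but give up after a bounded interval. */
	rc = WaitLatch(MyLatch,
				   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
				   TEST_TIMEOUT_MS);
	ResetLatch(MyLatch);

	if (rc & WL_TIMEOUT)
		ereport(ERROR,
				(errmsg("did not receive worker notification within %d ms",
						TEST_TIMEOUT_MS)));
```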


> But it's sufficient to show that the code changes have the
> intended effect.
>

Looking at the kind of bugs I am finding, I think we should commit this test
module. Let me know your comments, and I will fix them.


> I've committed this and back-patched it to 9.5, but not further.  It's
> a bug fix, but it's also a refactoring exercise, so I'd rather not
> push it into released branches.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 68c9505..fbbf97e 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -556,20 +556,23 @@ bgworker_die(SIGNAL_ARGS)
  * Standard SIGUSR1 handler for unconnected workers
  *
  * Here, we want to make sure an unconnected worker will at least heed
  * latch activity.
  */
 static void
 bgworker_sigusr1_handler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
+	if (set_latch_on_sigusr1)
+		SetLatch(MyLatch);
+
 	latch_sigusr1_handler();
 
 	errno = save_errno;
 }
 
 /*
  * Start a new background worker
  *
  * This is the main entry point for background worker, to be called from
  * postmaster.
diff --git a/src/test/modules/test_bgwnotify/Makefile b/src/test/modules/test_bgwnotify/Makefile
new file mode 100644
index 0000000..6931c09
--- /dev/null
+++ b/src/test/modules/test_bgwnotify/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/test_bgwnotify/Makefile
+
+MODULE_big = test_bgwnotify
+OBJS = test_bgwnotify.o $(WIN32RES)
+PGFILEDESC = "test_bgwnotify - example use of background worker notification infrastructure"
+
+EXTENSION = test_bgwnotify
+DATA = test_bgwnotify--1.0.sql
+
+REGRESS = test_bgwnotify
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_bgwnotify
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_bgwnotify/expected/test_bgwnotify.out b/src/test/modules/test_bgwnotify/expected/test_bgwnotify.out
new file mode 100644
index 0000000..b7e1b65
--- /dev/null
+++ b/src/test/modules/test_bgwnotify/expected/test_bgwnotify.out
@@ -0,0 +1,11 @@
+CREATE EXTENSION test_bgwnotify;
+--
+-- We're checking that the operations complete without crashing or hanging and
+-- that none of their internal sanity tests fail.
+--
+SELECT test_bgwnotify();
+ test_bgwnotify 
+
+ t
+(1 row)
+
diff --git a/src/test/modules/test_bgwnotify/sql/test_bgwnotify.sql b/src/test/modules/test_bgwnotify/sql/test_bgwnotify.sql
new file mode 100644
index 0000000..e448ef0
--- /dev/null
+++ b/src/test/modules/test_bgwnotify/sql/test_bgwnotify.sql
@@ -0,0 +1,7 @@
+CREATE EXTENSION test_bgwnotify;
+
+--
+-- We're checking that the operations complete without crashing or hanging and
+-- that none of their internal sanity tests fail.
+--
+SELECT test_bgwnotify();
diff --git a/src/test/modules/test_bgwnotify/test_bgwnotify--1.0.sql b/src/test/modules/test_bgwnotify/test_bgwnotify--1.0.sql
new file mode 100644
index 0000000..f3423bc
--- /dev/null
+++ b/src/test/modules/test_bgwnotify/test_bgwnotify--1.0.sql
@@ -0,0 +1,7 @@
+/* src/test/modules/test_bgwnotify/test_bgwnotify--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_bgwnotify" to load this file. \quit
+
+CREATE FUNCTION test_bgwnotify() RETURNS pg_catalog.bool STRICT
+	AS 

Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-09-01 Thread Robert Haas
On Mon, Aug 31, 2015 at 8:01 AM, Ashutosh Bapat
 wrote:
>> Thanks.  It needs testing though to see if it really works as
>> intended.  Can you look into that?
>
> PFA the patch containing your code changes + test module. See if that meets
> your expectations.

Thanks.  I don't think this test module is suitable for commit for a
number of reasons, including the somewhat hackish use of exit(0)
instead of proper error reporting, the fact that you didn't integrate
it into the Makefile structure properly, and the fact that without the
postmaster.c changes it hangs forever instead of causing a test
failure.  But it's sufficient to show that the code changes have the
intended effect.

I've committed this and back-patched it to 9.5, but not further.  It's
a bug fix, but it's also a refactoring exercise, so I'd rather not
push it into released branches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-08-31 Thread Ashutosh Bapat
On Sat, Aug 8, 2015 at 7:46 PM, Robert Haas  wrote:

> On Wed, Aug 5, 2015 at 3:33 AM, Ashutosh Bapat
>  wrote:
> > This idea looks good.
>
> Thanks.  It needs testing though to see if it really works as
> intended.  Can you look into that?
>

PFA the patch containing your code changes + test module. See if that meets
your expectations.


>
> > Looking at the larger picture, we should also enable this feature to be
> > used by auxiliary processes. It's very hard to add a new auxiliary
> > process in the current code. One has to add code in many places to make
> > sure that the auxiliary processes die and are restarted correctly. It is
> > even tougher to add a parent auxiliary process which spawns multiple
> > worker processes. That would be a whole lot simpler if we could allow
> > the auxiliary processes to use the background worker infrastructure
> > (which is what they ultimately are).
>
> That's a separate patch, but, sure, we could do that.  I agree with
> Alvaro's comments: the postmaster should start all children.  Other
> processes should just request that it do so.  We have two mechanisms
> for that right now: the one used by bgworkers, and the one used by the
> AV launcher.
>

By children I really meant workers that it requests the postmaster to start,
not children in the OS sense.


>
> > BGWORKER_SHMEM_ACCESS has similar usage, except that it resets the
> > on-exit callbacks and detaches the shared memory segment from the
> > background worker. That avoids a full cluster restart when one of those
> > workers, which cannot corrupt shared memory, dies. But I do not see any
> > check to prevent such a backend from calling PGSharedMemoryReattach().
>
> There isn't, but you shouldn't do that.  :-)
>
> This is C code; you can't protect against actively malicious code.
>

We have taken pains to check whether the worker was started with the
BGWORKER_BACKEND_DATABASE_CONNECTION flag when it requests to connect to a
database. I think it makes sense to do the same with BGWORKER_SHMEM_ACCESS.
Otherwise, some buggy extension could connect to shared memory and exit
without the postmaster restarting all the backends. Obviously one can argue
that memory corruption is possible even without this flag, but we should try
to protect exposed interfaces.

>
> > So it looks like it suffices to assume that a background worker either
> > needs to access shared memory or doesn't. Any background worker having
> > shared memory access can also access a database and thus becomes part of
> > the backend list. Or maybe we just avoid these flags and treat every
> > background worker as if it passed both these flags. That will simplify a
> > lot of code.
>
> I think it's useful to support workers that don't have shared memory
> access at all, because those can crash without causing a system-wide
> reset.  But I don't see the point in distinguishing between workers
> with shared-memory access and those with a database connection.  I
> mean, obviously the worker needs to be able to initialize itself
> either way, but there seems to be no reason to force that to be
> signalled in bgw_flags.  It can just depend on whether
> BackgroundWorkerInitializeConnection gets called.
>

+1.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 000524d..1818f7c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -148,22 +148,21 @@
  * children we have and send them appropriate signals when necessary.
  *
  * "Special" children such as the startup, bgwriter and autovacuum launcher
  * tasks are not in this list.  Autovacuum worker and walsender are in it.
  * Also, "dead_end" children are in it: these are children launched just for
  * the purpose of sending a friendly rejection message to a would-be client.
  * We must track them because they are attached to shared memory, but we know
  * they will never become live backends.  dead_end children are not assigned a
  * PMChildSlot.
  *
- * Background workers that request shared memory access during registration are
- * in this list, too.
+ * Background workers are in this list, too.
  */
 typedef struct bkend
 {
 	pid_t		pid;			/* process id of backend */
 	long		cancel_key;		/* cancel key for cancels for this backend */
 	int			child_slot;		/* PMChildSlot for this backend, if any */
 
 	/*
 	 * Flavor of backend or auxiliary process.  Note that BACKEND_TYPE_WALSND
 	 * backends initially announce themselves as BACKEND_TYPE_NORMAL, so if
@@ -397,27 +396,25 @@ static int	ServerLoop(void);
 static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool SSLdone);
 static void processCancelRequest(Port *port, void *pkt);
 static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state canAcceptConnections(void);
 static long PostmasterRandom(void);
 static void Rand

Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-08-08 Thread Robert Haas
On Wed, Aug 5, 2015 at 3:33 AM, Ashutosh Bapat
 wrote:
> This idea looks good.

Thanks.  It needs testing though to see if it really works as
intended.  Can you look into that?

> Looking at the larger picture, we should also enable this feature to be
> used by auxiliary processes. It's very hard to add a new auxiliary process
> in the current code. One has to add code in many places to make sure that
> the auxiliary processes die and are restarted correctly. It is even tougher
> to add a parent auxiliary process which spawns multiple worker processes.
> That would be a whole lot simpler if we could allow the auxiliary processes
> to use the background worker infrastructure (which is what they ultimately
> are).

That's a separate patch, but, sure, we could do that.  I agree with
Alvaro's comments: the postmaster should start all children.  Other
processes should just request that it do so.  We have two mechanisms
for that right now: the one used by bgworkers, and the one used by the
AV launcher.

> BGWORKER_SHMEM_ACCESS has similar usage, except that it resets the on-exit
> callbacks and detaches the shared memory segment from the background
> worker. That avoids a full cluster restart when one of those workers, which
> cannot corrupt shared memory, dies. But I do not see any check to prevent
> such a backend from calling PGSharedMemoryReattach().

There isn't, but you shouldn't do that.  :-)

This is C code; you can't protect against actively malicious code.

> So it looks like it suffices to assume that a background worker either
> needs to access shared memory or doesn't. Any background worker having
> shared memory access can also access a database and thus becomes part of
> the backend list. Or maybe we just avoid these flags and treat every
> background worker as if it passed both these flags. That will simplify a
> lot of code.

I think it's useful to support workers that don't have shared memory
access at all, because those can crash without causing a system-wide
reset.  But I don't see the point in distinguishing between workers
with shared-memory access and those with a database connection.  I
mean, obviously the worker needs to be able to initialize itself
either way, but there seems to be no reason to force that to be
signalled in bgw_flags.  It can just depend on whether
BackgroundWorkerInitializeConnection gets called.
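For context, the bgw_notify_pid / bgw_flags interplay this thread is about shows up at registration time. A hedged sketch against the 9.5 dynamic-registration API — the worker, library, and function names are made up for illustration:

```c
	BackgroundWorker worker;
	BackgroundWorkerHandle *handle;
	pid_t		pid;

	memset(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;	/* no database connection */
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	snprintf(worker.bgw_name, BGW_MAXLEN, "demo worker");		/* made up */
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "demo");		/* made up */
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "demo_main");/* made up */

	/*
	 * The crux of the thread: bgw_notify_pid asks the postmaster to signal
	 * us on state changes, and whether that worked used to depend on the
	 * requester's own flags.
	 */
	worker.bgw_notify_pid = MyProcPid;

	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
		ereport(ERROR,
				(errmsg("could not register background worker")));

	/*
	 * If the requester is itself an unconnected bgworker, this is the
	 * call that hung before the latch fix discussed in this thread.
	 */
	(void) WaitForBackgroundWorkerStartup(handle, &pid);
```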

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-08-05 Thread Alvaro Herrera
Ashutosh Bapat wrote:

> Looking at the larger picture, we should also enable this feature to be
> used by auxiliary processes. It's very hard to add a new auxiliary
> process in the current code.  One has to add code in many places to
> make sure that the auxiliary processes die and are restarted
> correctly.

No kidding.

> It is even tougher to add a parent auxiliary process, which spawns
> multiple worker processes.

I'm not sure about this point.  The autovacuum launcher, which is an
obvious candidate to have "children" processes, does not do that but
instead talks to postmaster to do it instead.  We did it that way to
avoid the danger of children processes connected to shared memory that
would not be direct children of postmaster; that could cause reliability
problems if the intermediate process fails to signal its children for
some reason.  Along the same lines I would suggest that other bgworker
processes should also be controllers only, that is it can ask postmaster
to start other processes but not start them directly.

> That would be a whole lot simpler if we could allow the auxiliary
> processes to use the background worker infrastructure (which is what they
> ultimately are).
> 
> About the flags BGWORKER_BACKEND_DATABASE_CONNECTION and
> BGWORKER_SHMEM_ACCESS:
> BGWORKER_BACKEND_DATABASE_CONNECTION is used at seven places in the code:
> one is an assertion; two check the existence of this flag when the backend
> actually connects to a database; the fourth checks whether
> BGWORKER_SHMEM_ACCESS is also set; the fifth creates parallel workers with
> this flag; the sixth uses the flag to add the backend to the backend list
> (which you have removed). The seventh usage is the only one that installs
> signal handlers based on this flag, which too I guess can be overridden
> (or set correctly) by the actual background worker code.
> 
> BGWORKER_SHMEM_ACCESS has similar usage, except that it resets the on-exit
> callbacks and detaches the shared memory segment from the background
> worker. That avoids a full cluster restart when one of those workers,
> which cannot corrupt shared memory, dies. But I do not see any check to
> prevent such a backend from calling PGSharedMemoryReattach().
> 
> So it looks like it suffices to assume that a background worker either
> needs to access shared memory or doesn't. Any background worker having
> shared memory access can also access a database and thus becomes part of
> the backend list. Or maybe we just avoid these flags and treat every
> background worker as if it passed both these flags. That will simplify a
> lot of code.

If you want to make aux processes use the bgworker infrastructure (a
worthy goal I think), it is possible that more uses would be found for
those flags.  For instance, avlauncher is equivalent to a worker that
has SHMEM_ACCESS but no DATABASE_CONNECTION; maybe some of the code
would differ depending on whether or not DATABASE_CONNECTION is
specified.  The parallel to the avlauncher was the main reason I decided
to separate those two flags.  Therefore I wouldn't try to simplify just
yet.  If you succeed in making the avlauncher (and more generally all of
autovacuum) use bgworker code, perhaps that would be the time to search
for simplifications to make, because we would know more about how it
would be used.

From your list above it doesn't sound like DATABASE_CONNECTION actually
does anything useful other than sanity checks.  I wonder if the behavior
that avlauncher becomes part of the backend list would be sane (this is
what would happen if you simply remove the DATABASE_CONNECTION flag and
then turn avlauncher into a regular bgworker).

On further thought, given that almost every aux process has particular
timing needs for start/stop under different conditions, I am doubtful
that you could turn any of them into a bgworker.  Maybe pgstat would be
the only exception, perhaps bgwriter, not sure.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-08-05 Thread Ashutosh Bapat
On Wed, Aug 5, 2015 at 2:07 AM, Robert Haas  wrote:

> On Tue, Jul 7, 2015 at 4:34 AM, Ashutosh Bapat
>  wrote:
> > With that notion of backend, to fix the original problem I reported,
> > PostmasterMarkPIDForWorkerNotify() should also look at the
> > BackgroundWorkerList. As per the comments in the prologue of this
> function,
> > it assumes that only backends need to be notified about worker state
> change,
> > which looks too restrictive. Any backend or background worker, which
> wants
> > to be notified about a background worker it requested to be spawned,
> should
> > be allowed to do so.
>
> Yeah.  I'm wondering if we should fix this problem by just insisting
> that all workers have an entry in BackendList.  At first look, this
> seems like it would make the code a lot simpler and have virtually no
> downside.  As it is, we're repeatedly reinventing new and different
> ways for unconnected background workers to do things that work fine in
> all other cases, and I don't see the point of that.
>
> I haven't really tested the attached patch - sadly, we have no testing
> of background workers without shared memory access in the tree - but
> it shows what I have in mind.
>
> Thoughts?
>
>
This idea looks good.

Looking at the larger picture, we should also enable this feature to be used
by auxiliary processes. It's very hard to add a new auxiliary process in the
current code. One has to add code in many places to make sure that the
auxiliary processes die and are restarted correctly. It is even tougher to
add a parent auxiliary process which spawns multiple worker processes. That
would be a whole lot simpler if we could allow the auxiliary processes to
use the background worker infrastructure (which is what they ultimately are).

About the flags BGWORKER_BACKEND_DATABASE_CONNECTION and
BGWORKER_SHMEM_ACCESS:
BGWORKER_BACKEND_DATABASE_CONNECTION is used at seven places in the code:
one is an assertion; two check the existence of this flag when the backend
actually connects to a database; the fourth checks whether
BGWORKER_SHMEM_ACCESS is also set; the fifth creates parallel workers with
this flag; the sixth uses the flag to add the backend to the backend list
(which you have removed). The seventh usage is the only one that installs
signal handlers based on this flag, which too I guess can be overridden (or
set correctly) by the actual background worker code.

BGWORKER_SHMEM_ACCESS has similar usage, except that it resets the on-exit
callbacks and detaches the shared memory segment from the background
worker. That avoids a full cluster restart when one of those workers, which
cannot corrupt shared memory, dies. But I do not see any check to prevent
such a backend from calling PGSharedMemoryReattach().

So it looks like it suffices to assume that a background worker either needs
to access shared memory or doesn't. Any background worker having shared
memory access can also access a database and thus becomes part of the
backend list. Or maybe we just avoid these flags and treat every background
worker as if it passed both these flags. That will simplify a lot of code.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-08-04 Thread Robert Haas
On Tue, Jul 7, 2015 at 4:34 AM, Ashutosh Bapat
 wrote:
> With that notion of backend, to fix the original problem I reported,
> PostmasterMarkPIDForWorkerNotify() should also look at the
> BackgroundWorkerList. As per the comments in the prologue of this function,
> it assumes that only backends need to be notified about worker state change,
> which looks too restrictive. Any backend or background worker, which wants
> to be notified about a background worker it requested to be spawned, should
> be allowed to do so.

Yeah.  I'm wondering if we should fix this problem by just insisting
that all workers have an entry in BackendList.  At first look, this
seems like it would make the code a lot simpler and have virtually no
downside.  As it is, we're repeatedly reinventing new and different
ways for unconnected background workers to do things that work fine in
all other cases, and I don't see the point of that.

I haven't really tested the attached patch - sadly, we have no testing
of background workers without shared memory access in the tree - but
it shows what I have in mind.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 000524d..1818f7c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -155,8 +155,7 @@
  * they will never become live backends.  dead_end children are not assigned a
  * PMChildSlot.
  *
- * Background workers that request shared memory access during registration are
- * in this list, too.
+ * Background workers are in this list, too.
  */
 typedef struct bkend
 {
@@ -404,13 +403,11 @@ static long PostmasterRandom(void);
 static void RandomSalt(char *md5Salt);
 static void signal_child(pid_t pid, int signal);
 static bool SignalSomeChildren(int signal, int targets);
-static bool SignalUnconnectedWorkers(int signal);
 static void TerminateChildren(int signal);
 
 #define SignalChildren(sig)			   SignalSomeChildren(sig, BACKEND_TYPE_ALL)
 
 static int	CountChildren(int target);
-static int	CountUnconnectedWorkers(void);
 static void maybe_start_bgworker(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
 static pid_t StartChildProcess(AuxProcType type);
@@ -2414,7 +2411,6 @@ SIGHUP_handler(SIGNAL_ARGS)
 (errmsg("received SIGHUP, reloading configuration files")));
 		ProcessConfigFile(PGC_SIGHUP);
 		SignalChildren(SIGHUP);
-		SignalUnconnectedWorkers(SIGHUP);
 		if (StartupPID != 0)
 			signal_child(StartupPID, SIGHUP);
 		if (BgWriterPID != 0)
@@ -2491,7 +2487,6 @@ pmdie(SIGNAL_ARGS)
 /* and bgworkers too; does this need tweaking? */
 SignalSomeChildren(SIGTERM,
 			   BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
-SignalUnconnectedWorkers(SIGTERM);
 /* and the autovac launcher too */
 if (AutoVacPID != 0)
 	signal_child(AutoVacPID, SIGTERM);
@@ -2543,11 +2538,11 @@ pmdie(SIGNAL_ARGS)
 signal_child(BgWriterPID, SIGTERM);
 			if (WalReceiverPID != 0)
 signal_child(WalReceiverPID, SIGTERM);
-			SignalUnconnectedWorkers(SIGTERM);
 			if (pmState == PM_RECOVERY)
 			{
+SignalSomeChildren(SIGTERM, BACKEND_TYPE_BGWORKER);
 /*
- * Only startup, bgwriter, walreceiver, unconnected bgworkers,
+ * Only startup, bgwriter, walreceiver, possibly bgworkers,
  * and/or checkpointer should be active in this state; we just
  * signaled the first four, and we don't want to kill
  * checkpointer yet.
@@ -2999,25 +2994,21 @@ CleanupBackgroundWorker(int pid,
 		}
 
 		/* Get it out of the BackendList and clear out remaining data */
-		if (rw->rw_backend)
-		{
-			Assert(rw->rw_worker.bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION);
-			dlist_delete(&rw->rw_backend->elem);
+		dlist_delete(&rw->rw_backend->elem);
 #ifdef EXEC_BACKEND
-			ShmemBackendArrayRemove(rw->rw_backend);
+		ShmemBackendArrayRemove(rw->rw_backend);
 #endif
 
-			/*
-			 * It's possible that this background worker started some OTHER
-			 * background worker and asked to be notified when that worker
-			 * started or stopped.  If so, cancel any notifications destined
-			 * for the now-dead backend.
-			 */
-			if (rw->rw_backend->bgworker_notify)
-BackgroundWorkerStopNotifications(rw->rw_pid);
-			free(rw->rw_backend);
-			rw->rw_backend = NULL;
-		}
+		/*
+		 * It's possible that this background worker started some OTHER
+		 * background worker and asked to be notified when that worker
+		 * started or stopped.  If so, cancel any notifications destined
+		 * for the now-dead backend.
+		 */
+		if (rw->rw_backend->bgworker_notify)
+			BackgroundWorkerStopNotifications(rw->rw_pid);
+		free(rw->rw_backend);
+		rw->rw_backend = NULL;
 		rw->rw_pid = 0;
 		rw->rw_child_slot = 0;
 		ReportBackgroundWorkerPID(rw);	/* report child death */
@@ -3160,15 +3151,12 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 			 * Found entry for freshly-de

Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-07-07 Thread Ashutosh Bapat
In CleanupBackgroundWorker(), we seem to differentiate between a background
worker with shared memory access and a backend.

2914 /*
2915  * Additionally, for shared-memory-connected workers, just like a
2916  * backend, any exit status other than 0 or 1 is considered a crash
2917  * and causes a system-wide restart.
2918  */

There is an assertion on line 2943 which implies that a backend should have
a database connection.

2939
2940 /* Get it out of the BackendList and clear out remaining data */
2941 if (rw->rw_backend)
2942 {
2943 Assert(rw->rw_worker.bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION);

A backend is a process created to handle a client connection [1], which is
always connected to a database. After custom background workers were
introduced, we seem to have continued with that notion; hence only the
background workers which request a database connection are in BackendList.
So maybe we should keep that notion and correct the comment to read
"Background workers that request a database connection during registration
are in this list, too.", or delete that comment altogether.

With that notion of backend, to fix the original problem I reported,
PostmasterMarkPIDForWorkerNotify() should also look at the
BackgroundWorkerList. As per the comments in the prologue of this function,
it assumes that only backends need to be notified about worker state
change, which looks too restrictive. Any backend or background worker,
which wants to be notified about a background worker it requested to be
spawned, should be allowed to do so.

5655 /*
5656  * When a backend asks to be notified about worker state changes, we
5657  * set a flag in its backend entry.  The background worker machinery needs
5658  * to know when such backends exit.
5659  */
5660 bool
5661 PostmasterMarkPIDForWorkerNotify(int pid)

PFA the patch which changes PostmasterMarkPIDForWorkerNotify to look at
BackgroundWorkerList as well.

The backends that request to be notified are marked using bgworker_notify,
so that when such a backend dies, the notifications to it can be turned off
using BackgroundWorkerStopNotifications(). Now that we allow a background
worker to be notified, we have to build a similar flag into
RegisteredBgWorker for the same purpose. The patch does that.
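The shape of that change, sketched from the description above (the real patch is attached to the mail; the new RegisteredBgWorker field's name below is invented for illustration and may differ from the patch):

```c
bool
PostmasterMarkPIDForWorkerNotify(int pid)
{
	dlist_iter	iter;
	slist_iter	siter;
	Backend    *bp;
	RegisteredBgWorker *rw;

	/* Existing behavior: a regular backend asked for notifications. */
	dlist_foreach(iter, &BackendList)
	{
		bp = dlist_container(Backend, elem, iter.cur);
		if (bp->pid == pid)
		{
			bp->bgworker_notify = true;
			return true;
		}
	}

	/*
	 * New behavior per the mail: a background worker may also have
	 * requested a worker and asked to be notified, so scan the
	 * registered-worker list too.
	 */
	slist_foreach(siter, &BackgroundWorkerList)
	{
		rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur);
		if (rw->rw_pid == pid)
		{
			rw->rw_worker_notify = true;	/* hypothetical new flag */
			return true;
		}
	}
	return false;
}
```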
[1]. http://www.postgresql.org/developer/backend/

On Mon, Jun 8, 2015 at 11:22 PM, Robert Haas  wrote:

> On Mon, Jun 8, 2015 at 1:51 PM, Alvaro Herrera 
> wrote:
> > Robert Haas wrote:
> >
> >> After studying this, I think it's a bug.  See this comment:
> >>
> >>  * Normal child backends can only be launched when we are in PM_RUN or
> >>  * PM_HOT_STANDBY state.  (We also allow launch of normal
> >>  * child backends in PM_WAIT_BACKUP state, but only for superusers.)
> >>  * In other states we handle connection requests by launching "dead_end"
> >>  * child processes, which will simply send the client an error message
> and
> >>  * quit.  (We track these in the BackendList so that we can know when
> they
> >>  * are all gone; this is important because they're still connected to
> shared
> >>  * memory, and would interfere with an attempt to destroy the shmem
> segment,
> >>  * possibly leading to SHMALL failure when we try to make a new one.)
> >>  * In PM_WAIT_DEAD_END state we are waiting for all the dead_end
> children
> >>  * to drain out of the system, and therefore stop accepting connection
> >>  * requests at all until the last existing child has quit (which
> hopefully
> >>  * will not be very long).
> >>
> >> That comment seems to imply that, at the very least, all backends with
> >> shared-memory access need to be part of BackendList.  But really, I'm
> >> unclear why we'd ever want to exit without all background workers
> >> having shut down, so maybe they should all be in there.  Isn't that
> >> sorta the point of this facility?
> >>
> >> Alvaro, any thoughts?
> >
> > As I recall, my thinking was:
> >
> > * anything connected to shmem that crashes could leave things in bad
> > state (for example locks improperly held), whereas things not connected
> > to shmem could crash without it being a problem for the rest of the
> > system and thus do not require a full restart cycle.  This stuff is
> > detected with the PMChildSlot thingy; therefore all bgworkers with shmem
> > access should have one of these.
> >
> > * I don't recall offhand what the criteria is for stuff to be in
> > postmaster's BackendList.  According to the comment on top of struct
> > Backend, bgworkers connected to shmem should be on that list, even if
> > they did not have the BGWORKER_BACKEND_DATABASE_CONNECTION flag on
> > registration.  So that would be a bug.
> >
> > Does this help you any?
>
> Well, it sounds like we at least need to think about making it
> consistently depend on shmem-access rather than database-connection.
> I'm less sure what to do with workers that have not even got shmem
> access.
>
> --
> Robert Haas

Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-06-08 Thread Robert Haas
On Mon, Jun 8, 2015 at 1:51 PM, Alvaro Herrera  wrote:
> Robert Haas wrote:
>
>> After studying this, I think it's a bug.  See this comment:
>>
>>  * Normal child backends can only be launched when we are in PM_RUN or
>>  * PM_HOT_STANDBY state.  (We also allow launch of normal
>>  * child backends in PM_WAIT_BACKUP state, but only for superusers.)
>>  * In other states we handle connection requests by launching "dead_end"
>>  * child processes, which will simply send the client an error message and
>>  * quit.  (We track these in the BackendList so that we can know when they
>>  * are all gone; this is important because they're still connected to shared
>>  * memory, and would interfere with an attempt to destroy the shmem segment,
>>  * possibly leading to SHMALL failure when we try to make a new one.)
>>  * In PM_WAIT_DEAD_END state we are waiting for all the dead_end children
>>  * to drain out of the system, and therefore stop accepting connection
>>  * requests at all until the last existing child has quit (which hopefully
>>  * will not be very long).
>>
>> That comment seems to imply that, at the very least, all backends with
>> shared-memory access need to be part of BackendList.  But really, I'm
>> unclear why we'd ever want to exit without all background workers
>> having shut down, so maybe they should all be in there.  Isn't that
>> sorta the point of this facility?
>>
>> Alvaro, any thoughts?
>
> As I recall, my thinking was:
>
> * anything connected to shmem that crashes could leave things in bad
> state (for example locks improperly held), whereas things not connected
> to shmem could crash without it being a problem for the rest of the
> system and thus do not require a full restart cycle.  This stuff is
> detected with the PMChildSlot thingy; therefore all bgworkers with shmem
> access should have one of these.
>
> * I don't recall offhand what the criteria is for stuff to be in
> postmaster's BackendList.  According to the comment on top of struct
> Backend, bgworkers connected to shmem should be on that list, even if
> they did not have the BGWORKER_BACKEND_DATABASE_CONNECTION flag on
> registration.  So that would be a bug.
>
> Does this help you any?

Well, it sounds like we at least need to think about making it
consistently depend on shmem-access rather than database-connection.
I'm less sure what to do with workers that have not even got shmem
access.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-06-08 Thread Alvaro Herrera
Robert Haas wrote:

> After studying this, I think it's a bug.  See this comment:
> 
>  * Normal child backends can only be launched when we are in PM_RUN or
>  * PM_HOT_STANDBY state.  (We also allow launch of normal
>  * child backends in PM_WAIT_BACKUP state, but only for superusers.)
>  * In other states we handle connection requests by launching "dead_end"
>  * child processes, which will simply send the client an error message and
>  * quit.  (We track these in the BackendList so that we can know when they
>  * are all gone; this is important because they're still connected to shared
>  * memory, and would interfere with an attempt to destroy the shmem segment,
>  * possibly leading to SHMALL failure when we try to make a new one.)
>  * In PM_WAIT_DEAD_END state we are waiting for all the dead_end children
>  * to drain out of the system, and therefore stop accepting connection
>  * requests at all until the last existing child has quit (which hopefully
>  * will not be very long).
> 
> That comment seems to imply that, at the very least, all backends with
> shared-memory access need to be part of BackendList.  But really, I'm
> unclear why we'd ever want to exit without all background workers
> having shut down, so maybe they should all be in there.  Isn't that
> sorta the point of this facility?
> 
> Alvaro, any thoughts?

As I recall, my thinking was:

* anything connected to shmem that crashes could leave things in bad
state (for example locks improperly held), whereas things not connected
to shmem could crash without it being a problem for the rest of the
system and thus do not require a full restart cycle.  This stuff is
detected with the PMChildSlot thingy; therefore all bgworkers with shmem
access should have one of these.

* I don't recall offhand what the criteria is for stuff to be in
postmaster's BackendList.  According to the comment on top of struct
Backend, bgworkers connected to shmem should be on that list, even if
they did not have the BGWORKER_BACKEND_DATABASE_CONNECTION flag on
registration.  So that would be a bug.

Does this help you any?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-06-08 Thread Robert Haas
On Thu, Jun 4, 2015 at 8:52 AM, Ashutosh Bapat
 wrote:
> Documentation here http://www.postgresql.org/docs/devel/static/bgworker.html
> does not indicate any relation between the fields bgw_notify_pid and
> bgw_flags of the BackgroundWorker structure. But one has to set
> BGWORKER_BACKEND_DATABASE_CONNECTION in order to use the bgw_notify_pid feature.

The comments in maybe_start_bgworker make it fairly clear that this
was done intentionally, but they don't explain why:

 * If not connected, we don't need a Backend element, but we still
 * need a PMChildSlot.

But there's another comment elsewhere that says that the criteria is
shared-memory access, not database connection:

 * Background workers that request shared memory access during registration are
 * in this list, too.

So at a minimum, the current situation is not self-consistent.

After studying this, I think it's a bug.  See this comment:

 * Normal child backends can only be launched when we are in PM_RUN or
 * PM_HOT_STANDBY state.  (We also allow launch of normal
 * child backends in PM_WAIT_BACKUP state, but only for superusers.)
 * In other states we handle connection requests by launching "dead_end"
 * child processes, which will simply send the client an error message and
 * quit.  (We track these in the BackendList so that we can know when they
 * are all gone; this is important because they're still connected to shared
 * memory, and would interfere with an attempt to destroy the shmem segment,
 * possibly leading to SHMALL failure when we try to make a new one.)
 * In PM_WAIT_DEAD_END state we are waiting for all the dead_end children
 * to drain out of the system, and therefore stop accepting connection
 * requests at all until the last existing child has quit (which hopefully
 * will not be very long).

That comment seems to imply that, at the very least, all backends with
shared-memory access need to be part of BackendList.  But really, I'm
unclear why we'd ever want to exit without all background workers
having shut down, so maybe they should all be in there.  Isn't that
sorta the point of this facility?

Alvaro, any thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-06-04 Thread Ashutosh Bapat
Hi,
Documentation here http://www.postgresql.org/docs/devel/static/bgworker.html
does not indicate any relation between the fields bgw_notify_pid and
bgw_flags of the BackgroundWorker structure. But one has to set
BGWORKER_BACKEND_DATABASE_CONNECTION in order to use the bgw_notify_pid feature.

In BackgroundWorkerStateChange
/*
 * Copy the PID to be notified about state changes, but only if the
 * postmaster knows about a backend with that PID.  It isn't an error
 * if the postmaster doesn't know about the PID, because the backend
 * that requested the worker could have died (or been killed) just
 * after doing so.  Nonetheless, at least until we get some experience
 * with how this plays out in the wild, log a message at a relatively
 * high debug level.
 */
rw->rw_worker.bgw_notify_pid = slot->worker.bgw_notify_pid;
if (!PostmasterMarkPIDForWorkerNotify(rw->rw_worker.bgw_notify_pid))
{
    elog(DEBUG1, "worker notification PID %lu is not valid",
         (long) rw->rw_worker.bgw_notify_pid);
    rw->rw_worker.bgw_notify_pid = 0;
}

bgw_notify_pid gets wiped out, and silently at that, if
PostmasterMarkPIDForWorkerNotify() returns false.
PostmasterMarkPIDForWorkerNotify() only looks at BackendList, which does not
contain all the background workers created by Register*BackgroundWorker()
calls. Only a background worker which has set
BGWORKER_BACKEND_DATABASE_CONNECTION gets added to BackendList, in
maybe_start_bgworker():

if (rw->rw_worker.bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
{
    if (!assign_backendlist_entry(rw))
        return;
}
else
    rw->rw_child_slot = MyPMChildSlot = AssignPostmasterChildSlot();

do_start_bgworker(rw);  /* sets rw->rw_pid */

if (rw->rw_backend)
{
    dlist_push_head(&BackendList, &rw->rw_backend->elem);

Should we change the documentation to say "one needs to set
BGWORKER_BACKEND_DATABASE_CONNECTION in order to use the bgw_notify_pid
feature"? Or should we fix the code not to wipe out bgw_notify_pid in the
code above (maybe by changing PostmasterMarkPIDForWorkerNotify() to scan the
list of other background workers as well)?
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company