Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-27 16:35:29 -0400, Tom Lane wrote:
>> It looks like it might be sufficient to do "#ifdef EPOLL_CLOEXEC"
>> in latch.c, rather than bothering with a full-blown configure check.

> Yea, that sounds worth trying.  Wonder if we need to care about kernels
> not supporting it, but glibc having support?  I'd be ok skimping on that
> for now.

On my RHEL6 box, <sys/epoll.h> is provided by glibc, not the kernel:

$ rpm -qf /usr/include/sys/epoll.h
glibc-headers-2.12-1.209.el6_9.1.x86_64

So I think it's probably safe to assume that the header is in sync
with what glibc can do.

As for kernel (much) older than glibc, I'd rather expect glibc to paper
over that, though I've not looked at the source code to be sure.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Andres Freund
On 2017-04-27 16:35:29 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-04-26 17:05:39 -0400, Tom Lane wrote:
> >> I went ahead and changed the call to epoll_create into epoll_create1.
> >> I'm not too concerned about loss of portability there --- it seems
> >> unlikely that many people are still using ten-year-old glibc, and
> >> even less likely that any of them would be interested in running
> >> current Postgres on their stable-unto-death platform.  We could add
> >> a configure test for epoll_create1 if you feel one's needed, but
> >> I think it'd just be a waste of cycles.
> 
> > Yea, I think we can live with that.  If we find it's a problem, we can
> > add a configure test later.
> 
> Well, according to the buildfarm, "later" is "now" :-(.

Too bad.


> If RHEL5 is too old to have epoll_create1, I think your dates for it
> might be a bit off.  Anyway, I'll go do something about that in a
> little bit.

2008-11-13 is when glibc 2.9 was released. Appears that RHEL5 still
ships with glibc 2.5, released 2006-09-29. Given that RHEL 5 originally
was released 2007-03-05, that's not too surprising?


> It looks like it might be sufficient to do "#ifdef EPOLL_CLOEXEC"
> in latch.c, rather than bothering with a full-blown configure check.

Yea, that sounds worth trying.  Wonder if we need to care about kernels
not supporting it, but glibc having support?  I'd be ok skimping on that
for now.

Greetings,

Andres Freund




Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-26 17:05:39 -0400, Tom Lane wrote:
>> I went ahead and changed the call to epoll_create into epoll_create1.
>> I'm not too concerned about loss of portability there --- it seems
>> unlikely that many people are still using ten-year-old glibc, and
>> even less likely that any of them would be interested in running
>> current Postgres on their stable-unto-death platform.  We could add
>> a configure test for epoll_create1 if you feel one's needed, but
>> I think it'd just be a waste of cycles.

> Yea, I think we can live with that.  If we find it's a problem, we can
> add a configure test later.

Well, according to the buildfarm, "later" is "now" :-(.

If RHEL5 is too old to have epoll_create1, I think your dates for it
might be a bit off.  Anyway, I'll go do something about that in a
little bit.

It looks like it might be sufficient to do "#ifdef EPOLL_CLOEXEC"
in latch.c, rather than bothering with a full-blown configure check.
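[Editorial note: a minimal sketch of the proposed check, with a hypothetical function name; the real logic would sit inline in latch.c. Since EPOLL_CLOEXEC is a macro defined in <sys/epoll.h> only where glibc knows about epoll_create1(), an #ifdef on it doubles as the feature test, with no configure probe needed:]

```c
#include <fcntl.h>
#include <sys/epoll.h>

int
create_epoll_fd(void)
{
#ifdef EPOLL_CLOEXEC
	/* glibc 2.9 or later: create the descriptor close-on-exec atomically */
	return epoll_create1(EPOLL_CLOEXEC);
#else
	/* older glibc: fall back to epoll_create() plus fcntl() */
	int			fd = epoll_create(1);

	if (fd >= 0)
		(void) fcntl(fd, F_SETFD, FD_CLOEXEC);
	return fd;
#endif
}
```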

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Tom Lane
Robert Haas  writes:
> On Wed, Apr 26, 2017 at 8:37 PM, Andres Freund  wrote:
>>> 3. Go ahead with converting the postmaster to use WaitEventSet, a la
>>> the draft patch I posted earlier.  I'd be happy to do this if we were
>>> at the start of a devel cycle, but right now seems a bit late --- not
>>> to mention that we really need to fix 9.6 as well.

>> Yea, backpatching this to 9.6 seems like a bigger hammer than
>> appropriate.  I'm on the fence WRT master, I think there's an argument
>> to be made that this is going to become a bigger and bigger problem, and
>> that we'll wish in a year or two that we had fewer releases with
>> parallelism etc that don't use WaitEventSets.

> I think changing this might be wise.  This problem isn't going away
> for real until we do this, right?

Sure, but we have a lot of other problems that aren't going away until
we fix them, either.  This patch smells like new development to me ---
and it's not even very complete, because really what Andres wants to do
(and I concur) is to get rid of the postmaster's habit of doing
interesting things in signal handlers.  I'm definitely not on board with
doing that for v10 at this point.  But if we apply this patch, and then
do that in v11, then v10 will look like neither earlier nor later branches
with respect to the postmaster's event wait mechanisms.  I think that's
a recipe for undue maintenance pain.

I believe that the already-committed patches represent a sufficient-for-now
response to the known performance problems here, and so I'm thinking we
should stop here for v10.  I'm okay with pushing the latch.c changes
I just proposed, because those provide a useful safeguard against
extension modules doing something exciting in the postmaster process.
But I don't think we should go much further than that for v10.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Andres Freund
On 2017-04-26 17:05:39 -0400, Tom Lane wrote:
> Here's an updated version of that, which makes use of our previous
> conclusion that F_SETFD/FD_CLOEXEC are available everywhere except
> Windows, and fixes some sloppy thinking about the EXEC_BACKEND case.
> 
> I went ahead and changed the call to epoll_create into epoll_create1.
> I'm not too concerned about loss of portability there --- it seems
> unlikely that many people are still using ten-year-old glibc, and
> even less likely that any of them would be interested in running
> current Postgres on their stable-unto-death platform.  We could add
> a configure test for epoll_create1 if you feel one's needed, but
> I think it'd just be a waste of cycles.

Yea, I think we can live with that.  If we find it's a problem, we can
add a configure test later.


> I propose to push this into HEAD and 9.6 too.

Cool.


Change looks good to me.


Greetings,

Andres Freund




Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Robert Haas
On Wed, Apr 26, 2017 at 8:37 PM, Andres Freund  wrote:
>> 3. Go ahead with converting the postmaster to use WaitEventSet, a la
>> the draft patch I posted earlier.  I'd be happy to do this if we were
>> at the start of a devel cycle, but right now seems a bit late --- not
>> to mention that we really need to fix 9.6 as well.
>
> Yea, backpatching this to 9.6 seems like a bigger hammer than
> appropriate.  I'm on the fence WRT master, I think there's an argument
> to be made that this is going to become a bigger and bigger problem, and
> that we'll wish in a year or two that we had fewer releases with
> parallelism etc that don't use WaitEventSets.

I think changing this might be wise.  This problem isn't going away
for real until we do this, right?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Andres Freund
On 2017-04-26 11:42:38 -0400, Tom Lane wrote:
> 1. Let HEAD stand as it is.  We have a problem with slow response to
> bgworker start requests that arrive while ServerLoop is active, but that's
> a pretty tight window usually (although I believe I've seen it hit at
> least once in testing).
> 
> 2. Reinstall the pselect patch, blacklisting NetBSD and HPUX and whatever
> else we find to be flaky.  Then only the blacklisted platforms have the
> problem.

That seems unattractive at this point.  I'm not looking forward to
having to debug more random platforms that implement this badly in
yet another weird way.


> 3. Go ahead with converting the postmaster to use WaitEventSet, a la
> the draft patch I posted earlier.  I'd be happy to do this if we were
> at the start of a devel cycle, but right now seems a bit late --- not
> to mention that we really need to fix 9.6 as well.

Yea, backpatching this to 9.6 seems like a bigger hammer than
appropriate.  I'm on the fence WRT master, I think there's an argument
to be made that this is going to become a bigger and bigger problem, and
that we'll wish in a year or two that we had fewer releases with
parallelism etc that don't use WaitEventSets.


> I'm leaning to doing #1 plus the maybe_start_bgworker change.  There's
> certainly room for difference of opinion here, though.  Thoughts?

I see you did the bgworker thing - that seems good to me.

Thanks,

Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Tom Lane
Andres Freund  writes:
> I'd still like to get something like your CLOEXEC patch applied
> independently however.

Here's an updated version of that, which makes use of our previous
conclusion that F_SETFD/FD_CLOEXEC are available everywhere except
Windows, and fixes some sloppy thinking about the EXEC_BACKEND case.

I went ahead and changed the call to epoll_create into epoll_create1.
I'm not too concerned about loss of portability there --- it seems
unlikely that many people are still using ten-year-old glibc, and
even less likely that any of them would be interested in running
current Postgres on their stable-unto-death platform.  We could add
a configure test for epoll_create1 if you feel one's needed, but
I think it'd just be a waste of cycles.

I propose to push this into HEAD and 9.6 too.

regards, tom lane

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 0d0701a..946fbff 100644
*** a/src/backend/storage/ipc/latch.c
--- b/src/backend/storage/ipc/latch.c
*** static volatile sig_atomic_t waiting = f
*** 118,123 
--- 118,126 
  static int	selfpipe_readfd = -1;
  static int	selfpipe_writefd = -1;
  
+ /* Process owning the self-pipe --- needed for checking purposes */
+ static int	selfpipe_owner_pid = 0;
+ 
  /* Private function prototypes */
  static void sendSelfPipeByte(void);
  static void drainSelfPipe(void);
*** InitializeLatchSupport(void)
*** 146,152 
  #ifndef WIN32
  	int			pipefd[2];
  
! 	Assert(selfpipe_readfd == -1);
  
  	/*
  	 * Set up the self-pipe that allows a signal handler to wake up the
--- 149,189 
  #ifndef WIN32
  	int			pipefd[2];
  
! 	if (IsUnderPostmaster)
! 	{
! 		/*
! 		 * We might have inherited connections to a self-pipe created by the
! 		 * postmaster.  It's critical that child processes create their own
! 		 * self-pipes, of course, and we really want them to close the
! 		 * inherited FDs for safety's sake.
! 		 */
! 		if (selfpipe_owner_pid != 0)
! 		{
! 			/* Assert we go through here but once in a child process */
! 			Assert(selfpipe_owner_pid != MyProcPid);
! 			/* Release postmaster's pipe FDs; ignore any error */
! 			(void) close(selfpipe_readfd);
! 			(void) close(selfpipe_writefd);
! 			/* Clean up, just for safety's sake; we'll set these below */
! 			selfpipe_readfd = selfpipe_writefd = -1;
! 			selfpipe_owner_pid = 0;
! 		}
! 		else
! 		{
! 			/*
! 			 * Postmaster didn't create a self-pipe ... or else we're in an
! 			 * EXEC_BACKEND build, in which case it doesn't matter since the
! 			 * postmaster's pipe FDs were closed by the action of FD_CLOEXEC.
! 			 */
! 			Assert(selfpipe_readfd == -1);
! 		}
! 	}
! 	else
! 	{
! 		/* In postmaster or standalone backend, assert we do this but once */
! 		Assert(selfpipe_readfd == -1);
! 		Assert(selfpipe_owner_pid == 0);
! 	}
  
  	/*
  	 * Set up the self-pipe that allows a signal handler to wake up the
*** InitializeLatchSupport(void)
*** 154,176 
  	 * that SetLatch won't block if the event has already been set many times
  	 * filling the kernel buffer. Make the read-end non-blocking too, so that
  	 * we can easily clear the pipe by reading until EAGAIN or EWOULDBLOCK.
  	 */
  	if (pipe(pipefd) < 0)
  		elog(FATAL, "pipe() failed: %m");
  	if (fcntl(pipefd[0], F_SETFL, O_NONBLOCK) == -1)
! 		elog(FATAL, "fcntl() failed on read-end of self-pipe: %m");
  	if (fcntl(pipefd[1], F_SETFL, O_NONBLOCK) == -1)
! 		elog(FATAL, "fcntl() failed on write-end of self-pipe: %m");
  
  	selfpipe_readfd = pipefd[0];
  	selfpipe_writefd = pipefd[1];
  #else
  	/* currently, nothing to do here for Windows */
  #endif
  }
  
  /*
!  * Initialize a backend-local latch.
   */
  void
  InitLatch(volatile Latch *latch)
--- 191,220 
  	 * that SetLatch won't block if the event has already been set many times
  	 * filling the kernel buffer. Make the read-end non-blocking too, so that
  	 * we can easily clear the pipe by reading until EAGAIN or EWOULDBLOCK.
+ 	 * Also, make both FDs close-on-exec, since we surely do not want any
+ 	 * child processes messing with them.
  	 */
  	if (pipe(pipefd) < 0)
  		elog(FATAL, "pipe() failed: %m");
  	if (fcntl(pipefd[0], F_SETFL, O_NONBLOCK) == -1)
! 		elog(FATAL, "fcntl(F_SETFL) failed on read-end of self-pipe: %m");
  	if (fcntl(pipefd[1], F_SETFL, O_NONBLOCK) == -1)
! 		elog(FATAL, "fcntl(F_SETFL) failed on write-end of self-pipe: %m");
! 	if (fcntl(pipefd[0], F_SETFD, FD_CLOEXEC) == -1)
! 		elog(FATAL, "fcntl(F_SETFD) failed on read-end of self-pipe: %m");
! 	if (fcntl(pipefd[1], F_SETFD, FD_CLOEXEC) == -1)
! 		elog(FATAL, "fcntl(F_SETFD) failed on write-end of self-pipe: %m");
  
  	selfpipe_readfd = pipefd[0];
  	selfpipe_writefd = pipefd[1];
+ 	selfpipe_owner_pid = MyProcPid;
  #else
  	/* currently, nothing to do here for Windows */
  #endif
  }
  
  /*
!  * Initialize a process-local latch.
   */
  void
  InitLatch(volatile Latch *latch)
*** InitLatc

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Tom Lane
I wrote:
> Rémi Zara  writes:
>> coypu was not stuck (no buildfarm-related process running), but failed to
>> clean up shared memory and semaphores.
>> I’ve done the clean-up.

> Huh, that's even more interesting.

I installed NetBSD 5.1.5 on an old Mac G4; I believe this is a reasonable
approximation to coypu's environment.  With the pselect patch installed,
I can replicate the behavior we saw in the buildfarm of connections
immediately failing with "the database system is starting up".
Investigation shows that pselect reports ready sockets correctly (which is
what allows connections to get in at all), and it does stop waiting either
for a signal or for a timeout.  What it forgets to do is to actually
service the signal.  The observed behavior is caused by the fact that
reaper() is never called so the postmaster never realizes that the startup
process has finished.

I experimented with putting

PG_SETMASK(&UnBlockSig);
PG_SETMASK(&BlockSig);

immediately after the pselect() call, and found that indeed that lets
signals get serviced, and things work pretty much normally.

However, closer inspection finds that pselect only stops waiting when
a signal arrives *while it's waiting*, not if there was a signal already
pending.  So this is actually even more broken than the so-called
"non-atomic" behavior we had expected to see --- at least with that, the
pending signal would have gotten serviced promptly, even if ServerLoop
itself didn't iterate.

This is all giving me less than warm fuzzy feelings about the state of
pselect support out in the real world.

So at this point we seem to have three plausible alternatives:

1. Let HEAD stand as it is.  We have a problem with slow response to
bgworker start requests that arrive while ServerLoop is active, but that's
a pretty tight window usually (although I believe I've seen it hit at
least once in testing).

2. Reinstall the pselect patch, blacklisting NetBSD and HPUX and whatever
else we find to be flaky.  Then only the blacklisted platforms have the
problem.

3. Go ahead with converting the postmaster to use WaitEventSet, a la
the draft patch I posted earlier.  I'd be happy to do this if we were
at the start of a devel cycle, but right now seems a bit late --- not
to mention that we really need to fix 9.6 as well.

We could substantially ameliorate the slow-response problem by allowing
maybe_start_bgworker to launch multiple workers per call, which is
something I think we should do regardless.  (I have a patch written to
allow it to launch up to N workers per call, but have held off committing
that till after the dust settles in ServerLoop.)
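[Editorial note: a toy model of the idea, with hypothetical names and a counter standing in for the postmaster's pending-start list; the point is simply to drain up to N requests per call instead of one per trip through ServerLoop:]

```c
#define MAX_BGWORKERS_PER_CALL 100	/* the "N" suggested above */

static int	pending_workers = 0;	/* stand-in for the pending-start list */
static int	started_workers = 0;

static void
start_one_worker(void)
{
	pending_workers--;
	started_workers++;
}

static void
maybe_start_bgworkers(void)
{
	int			launched = 0;

	/* Launch until the queue is empty or we hit the per-call cap. */
	while (pending_workers > 0 && launched < MAX_BGWORKERS_PER_CALL)
	{
		start_one_worker();
		launched++;
	}
}
```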

I'm leaning to doing #1 plus the maybe_start_bgworker change.  There's
certainly room for difference of opinion here, though.  Thoughts?

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-25 Thread Tom Lane
Rémi Zara  writes:
>> Le 25 avr. 2017 à 01:47, Tom Lane  a écrit :
>> It looks like coypu is going to need manual intervention (ie, kill -9
>> on the leftover postmaster) to get unwedged :-(.  That's particularly
>> disturbing because it implies that ServerLoop isn't iterating at all;
>> otherwise, it'd have noticed by now that the buildfarm script deleted
>> its data directory out from under it.

> coypu was not stuck (no buildfarm-related process running), but failed to
> clean up shared memory and semaphores.
> I’ve done the clean-up.

Huh, that's even more interesting.

Looking at the code, what ServerLoop actually does when it notices that
the postmaster.pid file has been removed is

kill(MyProcPid, SIGQUIT);

So if our hypothesis is that pselect() failed to unblock signals,
then failure to quit is easily explained: the postmaster never
received/acted on its own signal.  But that should have left you
with a running postmaster holding the shared memory and semaphores.
Seems like if it is gone but it failed to remove those, somebody must've
kill -9'd it ... but who?  I see nothing in the buildfarm script that
would.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Rémi Zara

> Le 25 avr. 2017 à 01:47, Tom Lane  a écrit :
> 
> I wrote:
>> What I'm inclined to do is to revert the pselect change but not the other,
>> to see if that fixes these two animals.  If it does, we could look into
>> blacklisting these particular platforms when choosing pselect.
> 
> It looks like coypu is going to need manual intervention (ie, kill -9
> on the leftover postmaster) to get unwedged :-(.  That's particularly
> disturbing because it implies that ServerLoop isn't iterating at all;
> otherwise, it'd have noticed by now that the buildfarm script deleted
> its data directory out from under it.  Even if NetBSD's pselect had
> forgotten to unblock signals, you'd figure it'd time out after a
> minute ... so it's even more broken than that.
> 

Hi,

coypu was not stuck (no buildfarm-related process running), but failed to
clean up shared memory and semaphores.
I’ve done the clean-up.

Regards,

Rémi





Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
I wrote:
> What I'm inclined to do is to revert the pselect change but not the other,
> to see if that fixes these two animals.  If it does, we could look into
> blacklisting these particular platforms when choosing pselect.

It looks like coypu is going to need manual intervention (ie, kill -9
on the leftover postmaster) to get unwedged :-(.  That's particularly
disturbing because it implies that ServerLoop isn't iterating at all;
otherwise, it'd have noticed by now that the buildfarm script deleted
its data directory out from under it.  Even if NetBSD's pselect had
forgotten to unblock signals, you'd figure it'd time out after a
minute ... so it's even more broken than that.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-24 18:14:41 -0400, Tom Lane wrote:
>> A bit of googling establishes that NetBSD 5.1 has a broken pselect
>> implementation:
>> 
>> http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=43625

> Yikes.  Do I understand correctly that they effectively just mapped
> pselect to select?

That's what it sounds like.  Probably not intentionally, but in effect.

>> What I'm inclined to do is to revert the pselect change but not the other,
>> to see if that fixes these two animals.  If it does, we could look into
>> blacklisting these particular platforms when choosing pselect.

> Seems sensible.

Will do.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-24 18:14:41 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-04-24 17:33:39 -0400, Tom Lane wrote:
> >> coypu's problem is unrelated:
> 
> > Note I was linking the 9.6 report from coypu, not HEAD. Afaics the 9.6
> > failure is the same as gharial's mode of failure.
> 
> [ looks closer... ]  Oh: the 9.6 run occurred first, and the failures on
> HEAD and 9.5 are presumably follow-on damage because the stuck postmaster
> hasn't released semaphores.
> 
> A bit of googling establishes that NetBSD 5.1 has a broken pselect
> implementation:
> 
> http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=43625

Yikes.  Do I understand correctly that they effectively just mapped
pselect to select?


> What I'm inclined to do is to revert the pselect change but not the other,
> to see if that fixes these two animals.  If it does, we could look into
> blacklisting these particular platforms when choosing pselect.

Seems sensible.

- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-24 17:33:39 -0400, Tom Lane wrote:
>> coypu's problem is unrelated:

> Note I was linking the 9.6 report from coypu, not HEAD. Afaics the 9.6
> failure is the same as gharial's mode of failure.

[ looks closer... ]  Oh: the 9.6 run occurred first, and the failures on
HEAD and 9.5 are presumably follow-on damage because the stuck postmaster
hasn't released semaphores.

A bit of googling establishes that NetBSD 5.1 has a broken pselect
implementation:

http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=43625

That says they fixed it in later versions but not 5.1 :-(

I can't find any similar smoking gun on the web for HPUX, but
I'd fully expect their bug database to be behind a paywall.

What I'm inclined to do is to revert the pselect change but not the other,
to see if that fixes these two animals.  If it does, we could look into
blacklisting these particular platforms when choosing pselect.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-24 17:33:39 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-04-24 13:16:44 -0700, Andres Freund wrote:
> >> Unclear if related, but
> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2017-04-24%2019%3A30%3A42
> >> has a suspicious timing of failing in a weird way.
> 
> > Given that gharial is also failing on 9.6 (same set of commits) and
> > coypu fails (again same set) on 9.6
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2017-04-24%2018%3A20%3A33
> 
> coypu's problem is unrelated:
> 
> running bootstrap script ... 2017-04-24 23:04:53.084 CEST [21114] FATAL:  could not create semaphores: No space left on device
> 2017-04-24 23:04:53.084 CEST [21114] DETAIL:  Failed system call was semget(1, 17, 03600).

Note I was linking the 9.6 report from coypu, not HEAD. Afaics the 9.6
failure is the same as gharial's mode of failure.

- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-24 13:16:44 -0700, Andres Freund wrote:
>> Unclear if related, but
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2017-04-24%2019%3A30%3A42
>> has a suspicious timing of failing in a weird way.

> Given that gharial is also failing on 9.6 (same set of commits) and
> coypu fails (again same set) on 9.6
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2017-04-24%2018%3A20%3A33

coypu's problem is unrelated:

running bootstrap script ... 2017-04-24 23:04:53.084 CEST [21114] FATAL:  could not create semaphores: No space left on device
2017-04-24 23:04:53.084 CEST [21114] DETAIL:  Failed system call was semget(1, 17, 03600).

but it does seem likely that one of these patches broke gharial.
That's pretty annoying :-(

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-24 13:16:44 -0700, Andres Freund wrote:
> Unclear if related, but
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2017-04-24%2019%3A30%3A42
> has a suspicious timing of failing in a weird way.

Given that gharial is also failing on 9.6 (same set of commits) and
coypu fails (again same set) on 9.6
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2017-04-24%2018%3A20%3A33

I'm afraid it's more likely to be related.

gharial & coypu owners, any chance you could try starting postgres with
log_min_messages=debug5 on one of the affected machines?

- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-21 23:50:41 -0400, Tom Lane wrote:
> I wrote:
> > Attached is a lightly-tested draft patch that converts the postmaster to
> > use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
> > about whether this is the direction to proceed, though.
> 
> Attached are a couple of patches that represent a plausible Plan B.
> The first one changes the postmaster to run its signal handlers without
> specifying SA_RESTART.  I've confirmed that that seems to fix the
> select_parallel-test-takes-a-long-time problem on gaur/pademelon.
> The second one uses pselect, if available, to replace the unblock-signals/
> select()/block-signals dance in ServerLoop.  On platforms where pselect
> exists and works properly, that should fix the race condition I described
> previously.  On platforms where it doesn't, we're no worse off than
> before.
> 
> As mentioned in the comments for the second patch, even if we don't
> have working pselect(), the only problem is that ServerLoop's response
> to an interrupt might be delayed by as much as the up-to-1-minute timeout.
> The only existing case where that's really bad is launching multiple
> bgworkers.  I would therefore advocate also changing maybe_start_bgworker
> to start up to N bgworkers per call, where N is large enough to pretty
> much always satisfy simultaneously-arriving requests.  I'd pick 100 or
> so, but am willing to negotiate.
> 
> I think that these patches represent something we could back-patch
> without a lot of trepidation, unlike the WaitEventSet-based approach.
> Therefore, my proposal is to apply and backpatch these changes, and
> call it good for v10.  For v11, we could work on changing the postmaster
> to not do work in signal handlers, as discussed upthread.  That would
> supersede these two patches completely, though I'd still advocate for
> keeping the change in maybe_start_bgworker.
> 
> Note: for testing purposes, these patches are quite independent; just
> ignore the hunk in the second patch that changes a comment added by
> the first one.

Unclear if related, but
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2017-04-24%2019%3A30%3A42
has a suspicious timing of failing in a weird way.

- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
On 2017-04-21 23:50:41 -0400, Tom Lane wrote:
> Attached are a couple of patches that represent a plausible Plan B.
> The first one changes the postmaster to run its signal handlers without
> specifying SA_RESTART.  I've confirmed that that seems to fix the
> select_parallel-test-takes-a-long-time problem on gaur/pademelon.

> The second one uses pselect, if available, to replace the unblock-signals/
> select()/block-signals dance in ServerLoop.  On platforms where pselect
> exists and works properly, that should fix the race condition I described
> previously.  On platforms where it doesn't, we're no worse off than
> before.

We probably should note somewhere prominently that pselect isn't
actually race-free on a number of platforms.


> I think that these patches represent something we could back-patch
> without a lot of trepidation, unlike the WaitEventSet-based approach.
> Therefore, my proposal is to apply and backpatch these changes, and
> call it good for v10.  For v11, we could work on changing the postmaster
> to not do work in signal handlers, as discussed upthread.  That would
> supersede these two patches completely, though I'd still advocate for
> keeping the change in maybe_start_bgworker.

I haven't looked at your patches yet, but that sounds like a reasonable
plan.  I'd still like to get something like your CLOEXEC patch applied
independently, however.


< patches >

Looks reasonable on a quick skim.


Greetings,

Andres Freund




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
I wrote:
> Attached is a lightly-tested draft patch that converts the postmaster to
> use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
> about whether this is the direction to proceed, though.

Attached are a couple of patches that represent a plausible Plan B.
The first one changes the postmaster to run its signal handlers without
specifying SA_RESTART.  I've confirmed that that seems to fix the
select_parallel-test-takes-a-long-time problem on gaur/pademelon.
The second one uses pselect, if available, to replace the unblock-signals/
select()/block-signals dance in ServerLoop.  On platforms where pselect
exists and works properly, that should fix the race condition I described
previously.  On platforms where it doesn't, we're no worse off than
before.

As mentioned in the comments for the second patch, even if we don't
have working pselect(), the only problem is that ServerLoop's response
to an interrupt might be delayed by as much as the up-to-1-minute timeout.
The only existing case where that's really bad is launching multiple
bgworkers.  I would therefore advocate also changing maybe_start_bgworker
to start up to N bgworkers per call, where N is large enough to pretty
much always satisfy simultaneously-arriving requests.  I'd pick 100 or
so, but am willing to negotiate.

I think that these patches represent something we could back-patch
without a lot of trepidation, unlike the WaitEventSet-based approach.
Therefore, my proposal is to apply and backpatch these changes, and
call it good for v10.  For v11, we could work on changing the postmaster
to not do work in signal handlers, as discussed upthread.  That would
supersede these two patches completely, though I'd still advocate for
keeping the change in maybe_start_bgworker.

Note: for testing purposes, these patches are quite independent; just
ignore the hunk in the second patch that changes a comment added by
the first one.

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index c382345..52b5e2c 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*** PostmasterMain(int argc, char *argv[])
*** 610,615 
--- 610,624 
  	/*
  	 * Set up signal handlers for the postmaster process.
  	 *
+ 	 * In the postmaster, we want to install non-ignored handlers *without*
+ 	 * SA_RESTART.  This is because they'll be blocked at all times except
+ 	 * when ServerLoop is waiting for something to happen, and during that
+ 	 * window, we want signals to exit the select(2) wait so that ServerLoop
+ 	 * can respond if anything interesting happened.  On some platforms,
+ 	 * signals marked SA_RESTART would not cause the select() wait to end.
+ 	 * Child processes will generally want SA_RESTART, but we expect them to
+ 	 * set up their own handlers before unblocking signals.
+ 	 *
  	 * CAUTION: when changing this list, check for side-effects on the signal
  	 * handling setup of child processes.  See tcop/postgres.c,
  	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
*** PostmasterMain(int argc, char *argv[])
*** 620,635 
  	pqinitmask();
  	PG_SETMASK(&BlockSig);
  
! 	pqsignal(SIGHUP, SIGHUP_handler);	/* reread config file and have
! 		 * children do same */
! 	pqsignal(SIGINT, pmdie);	/* send SIGTERM and shut down */
! 	pqsignal(SIGQUIT, pmdie);	/* send SIGQUIT and die */
! 	pqsignal(SIGTERM, pmdie);	/* wait for children and shut down */
  	pqsignal(SIGALRM, SIG_IGN); /* ignored */
  	pqsignal(SIGPIPE, SIG_IGN); /* ignored */
! 	pqsignal(SIGUSR1, sigusr1_handler); /* message from child process */
! 	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
! 	pqsignal(SIGCHLD, reaper);	/* handle child termination */
  	pqsignal(SIGTTIN, SIG_IGN); /* ignored */
  	pqsignal(SIGTTOU, SIG_IGN); /* ignored */
  	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
--- 629,648 
  	pqinitmask();
  	PG_SETMASK(&BlockSig);
  
! 	pqsignal_no_restart(SIGHUP, SIGHUP_handler);		/* reread config file
! 		 * and have children do
! 		 * same */
! 	pqsignal_no_restart(SIGINT, pmdie); /* send SIGTERM and shut down */
! 	pqsignal_no_restart(SIGQUIT, pmdie);		/* send SIGQUIT and die */
! 	pqsignal_no_restart(SIGTERM, pmdie);		/* wait for children and shut
!  * down */
  	pqsignal(SIGALRM, SIG_IGN); /* ignored */
  	pqsignal(SIGPIPE, SIG_IGN); /* ignored */
! 	pqsignal_no_restart(SIGUSR1, sigusr1_handler);		/* message from child
! 		 * process */
! 	pqsignal_no_restart(SIGUSR2, dummy_handler);		/* unused, reserve for
! 		 * children */
! 	pqsignal_no_restart(SIGCHLD, reaper);		/* handle child termination */
  	pqsignal(SIGTTIN, SIG_IGN); /* ignored */
  	pqsignal(SIGTTOU, SIG_IGN); /* ignored */
  	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
diff --git a/src/include/port.h

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> It also appears to me that do_start_bgworker's treatment of fork
>> failure is completely brain dead.  Did anyone really think about
>> that case?

> Hmm, I probably modelled it on autovacuum without giving that case much
> additional consideration.

Attached is a proposed patch that should make fork failure behave
sanely, ie it works much the same as a worker crash immediately after
launch.  I also refactored things a bit to make do_start_bgworker
fully responsible for updating the RegisteredBgWorker's state,
rather than doing just some of it as before.

I tested this by hot-wiring the fork_process call to fail some of
the time, which showed that the postmaster now seems to recover OK,
but parallel.c's logic is completely innocent of the idea that
worker-startup failure is possible.  The leader backend just freezes,
and nothing short of kill -9 on that backend will get you out of it.
Fixing that seems like material for a separate patch though.

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index c382345..8461c24 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*** static void TerminateChildren(int signal
*** 420,425 
--- 420,426 
  #define SignalChildren(sig)			   SignalSomeChildren(sig, BACKEND_TYPE_ALL)
  
  static int	CountChildren(int target);
+ static bool assign_backendlist_entry(RegisteredBgWorker *rw);
  static void maybe_start_bgworker(void);
  static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
  static pid_t StartChildProcess(AuxProcType type);
*** bgworker_forkexec(int shmem_slot)
*** 5531,5543 
   * Start a new bgworker.
   * Starting time conditions must have been checked already.
   *
   * This code is heavily based on autovacuum.c, q.v.
   */
! static void
  do_start_bgworker(RegisteredBgWorker *rw)
  {
  	pid_t		worker_pid;
  
  	ereport(DEBUG1,
  			(errmsg("starting background worker process \"%s\"",
  	rw->rw_worker.bgw_name)));
--- 5532,5564 
   * Start a new bgworker.
   * Starting time conditions must have been checked already.
   *
+  * Returns true on success, false on failure.
+  * In either case, update the RegisteredBgWorker's state appropriately.
+  *
   * This code is heavily based on autovacuum.c, q.v.
   */
! static bool
  do_start_bgworker(RegisteredBgWorker *rw)
  {
  	pid_t		worker_pid;
  
+ 	Assert(rw->rw_pid == 0);
+ 
+ 	/*
+ 	 * Allocate and assign the Backend element.  Note we must do this before
+ 	 * forking, so that we can handle out of memory properly.
+ 	 *
+ 	 * Treat failure as though the worker had crashed.  That way, the
+ 	 * postmaster will wait a bit before attempting to start it again; if it
+ 	 * tried again right away, most likely it'd find itself repeating the
+ 	 * out-of-memory or fork failure condition.
+ 	 */
+ 	if (!assign_backendlist_entry(rw))
+ 	{
+ 		rw->rw_crashed_at = GetCurrentTimestamp();
+ 		return false;
+ 	}
+ 
  	ereport(DEBUG1,
  			(errmsg("starting background worker process \"%s\"",
  	rw->rw_worker.bgw_name)));
*** do_start_bgworker(RegisteredBgWorker *rw
*** 5549,5557 
  #endif
  	{
  		case -1:
  			ereport(LOG,
  	(errmsg("could not fork worker process: %m")));
! 			return;
  
  #ifndef EXEC_BACKEND
  		case 0:
--- 5570,5586 
  #endif
  	{
  		case -1:
+ 			/* in postmaster, fork failed ... */
  			ereport(LOG,
  	(errmsg("could not fork worker process: %m")));
! 			/* undo what assign_backendlist_entry did */
! 			ReleasePostmasterChildSlot(rw->rw_child_slot);
! 			rw->rw_child_slot = 0;
! 			free(rw->rw_backend);
! 			rw->rw_backend = NULL;
! 			/* mark entry as crashed, so we'll try again later */
! 			rw->rw_crashed_at = GetCurrentTimestamp();
! 			break;
  
  #ifndef EXEC_BACKEND
  		case 0:
*** do_start_bgworker(RegisteredBgWorker *rw
*** 5575,5588 
  			PostmasterContext = NULL;
  
  			StartBackgroundWorker();
  			break;
  #endif
  		default:
  			rw->rw_pid = worker_pid;
  			rw->rw_backend->pid = rw->rw_pid;
  			ReportBackgroundWorkerPID(rw);
! 			break;
  	}
  }
  
  /*
--- 5604,5627 
  			PostmasterContext = NULL;
  
  			StartBackgroundWorker();
+ 
+ 			exit(1);			/* should not get here */
  			break;
  #endif
  		default:
+ 			/* in postmaster, fork successful ... */
  			rw->rw_pid = worker_pid;
  			rw->rw_backend->pid = rw->rw_pid;
  			ReportBackgroundWorkerPID(rw);
! 			/* add new worker to lists of backends */
! 			dlist_push_head(&BackendList, &rw->rw_backend->elem);
! #ifdef EXEC_BACKEND
! 			ShmemBackendArrayAdd(rw->rw_backend);
! #endif
! 			return true;
  	}
+ 
+ 	return false;
  }
  
  /*
*** bgworker_should_start_now(BgWorkerStartT
*** 5629,5634 
--- 5668,5675 
   * Allocate the Backend struct for a connected background worker, but don't
   * add it to the list of backends just yet.
   *
+  * On fa

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Tom Lane  writes:
> Andres Freund  writes:
>> On 2017-04-21 14:08:21 -0400, Tom Lane wrote:
>>> but I see that SUSv2
>>> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own
>>> coding rules it ought to be okay to assume they're there.  I'm tempted to
>>> rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see
>>> if anything in the buildfarm complains.

>> +1

> Done, we'll soon see what happens.

Should have seen this coming, I guess: some of the Windows critters are
falling over, apparently because they lack fcntl() altogether.  So the
#ifdef F_SETFD was really acting as a proxy for "#ifdef HAVE_FCNTL".

There's no HAVE_FCNTL test in configure ATM, and I'm thinking it would
be pretty silly to add one, since surely it would succeed on anything
Unix-y enough to run the configure script.

I'm guessing the best thing to do is put back #ifdef F_SETFD;
alternatively we might spell it like "#ifndef WIN32", but I'm unsure
if that'd do what we want on Cygwin or MinGW.

In non-Windows code paths in latch.c, we probably wouldn't need to
bother with #ifdef F_SETFD.

Hopefully we can leave in the removal of "#define FD_CLOEXEC".
Will wait a bit longer for more results.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-21 14:08:21 -0400, Tom Lane wrote:
>> but I see that SUSv2
>> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own
>> coding rules it ought to be okay to assume they're there.  I'm tempted to
>> rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see
>> if anything in the buildfarm complains.

> +1

Done, we'll soon see what happens.

In the same area, I noticed that POSIX does not say that the success
result for fcntl(F_SETFD) and related cases is 0.  It says that the
failure result is -1 and the success result is some other value.
We seem to have this right in most places, but e.g. port/noblock.c
gets it wrong.  The lack of field complaints implies that just about
everybody actually does return 0 on success, but I still think it
would be a good idea to run around and make all the calls test
specifically for -1.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
Tom,

On 2017-04-21 13:49:27 -0400, Tom Lane wrote:
> >> 	 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
> >> -	 * any new connections, so we don't call select(), and just sleep.
> >> +	 * any new connections, so we don't call WaitEventSetWait(), and just
> >> +	 * sleep.  XXX not ideal
> >> 	 */
> 
> > Couldn't we just deactivate the sockets in the set instead?
> 
> Yeah, I think it'd be better to do something like that.  The pg_usleep
> call has the same issue of possibly not responding to interrupts.  The
> risks are a lot less, since it's a much shorter wait, but I would rather
> eliminate the separate code path in favor of doing it honestly.  Didn't
> seem like something to fuss over in the first draft though.

Ok, cool.


> >> +  ServerWaitSet = CreateWaitEventSet(PostmasterContext, MAXLISTEN + 1);
> 
> > Why are we using MAXLISTEN, rather than the actual number of things to
> > listen to?
> 
> It'd take more code (ie, an additional scan of the array) to predetermine
> that.  I figured the space-per-item in the WaitEventSet wasn't enough to
> worry about ... do you think differently?

I'm not sure.  We do create an epoll handler with enough space, and that
has some overhead. Don't know whether that's worthwhile to care about.


> > I kind of would like, in master, take a chance of replace all the work
> > done in signal handlers, by just a SetLatch(), and do it outside of
> > signal handlers instead.  Forking from signal handlers is just plain
> > weird.
> 
> Yeah, maybe it's time.  But in v11, and not for back-patch.

Agreed.


- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
On 2017-04-21 14:08:21 -0400, Tom Lane wrote:
> but I see that SUSv2
> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own
> coding rules it ought to be okay to assume they're there.  I'm tempted to
> rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see
> if anything in the buildfarm complains.

+1

- Andres





Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> On 2017-04-21 12:50:04 -0400, Tom Lane wrote:
>>> +#ifndef FD_CLOEXEC
>>> +#define FD_CLOEXEC 1
>>> +#endif

>> Hm? Are we sure this is portable?  Is there really cases that have
>> F_SETFD, but not CLOEXEC?

> Copied-and-pasted from our only existing use of FD_CLOEXEC, in libpq.

Looking closer, that code dates to 

Author: Tom Lane 
Branch: master Release: REL8_0_BR [7627b91cd] 2004-10-21 20:23:19 +

Set the close-on-exec flag for libpq's socket to the backend, to avoid
any possible problems from child programs executed by the client app.
Per suggestion from Elliot Lee of Red Hat.

and while the public discussion about it

https://www.postgresql.org/message-id/flat/18172.1098382248%40sss.pgh.pa.us

doesn't really say, I suspect the specific coding was Elliot's suggestion
as well.  It probably had some value at one time ... but I see that SUSv2
mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own
coding rules it ought to be okay to assume they're there.  I'm tempted to
rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see
if anything in the buildfarm complains.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-21 12:50:04 -0400, Tom Lane wrote:
>> Attached is a lightly-tested draft patch that converts the postmaster to
>> use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
>> about whether this is the direction to proceed, though.  It adds at least
>> a couple of kernel calls per postmaster signal delivery, and probably to
>> every postmaster connection acceptance (ServerLoop iteration), to fix
>> problems that are so corner-casey that we've never even known we had them
>> till now.

> I'm not concerned much about the signal delivery paths, and I can't
> quite imagine that another syscall in the accept path is going to be
> measurable - worth ensuring though.
> ...
> On the other hand, most types of our processes do SetLatch() in just
> about all the relevant signal handlers anyway, so they're pretty close
> to that behaviour already.

True.  Maybe I'm being too worried.

>> 	 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
>> -	 * any new connections, so we don't call select(), and just sleep.
>> +	 * any new connections, so we don't call WaitEventSetWait(), and just
>> +	 * sleep.  XXX not ideal
>> 	 */

> Couldn't we just deactivate the sockets in the set instead?

Yeah, I think it'd be better to do something like that.  The pg_usleep
call has the same issue of possibly not responding to interrupts.  The
risks are a lot less, since it's a much shorter wait, but I would rather
eliminate the separate code path in favor of doing it honestly.  Didn't
seem like something to fuss over in the first draft though.

>> +ServerWaitSet = CreateWaitEventSet(PostmasterContext, MAXLISTEN + 1);

> Why are we using MAXLISTEN, rather than the actual number of things to
> listen to?

It'd take more code (ie, an additional scan of the array) to predetermine
that.  I figured the space-per-item in the WaitEventSet wasn't enough to
worry about ... do you think differently?

> Random note: Do we actually have any code that errors out if too many
> sockets are being listened to?

Yes, see StreamServerPort, about line 400.

> I kind of would like, in master, take a chance of replace all the work
> done in signal handlers, by just a SetLatch(), and do it outside of
> signal handlers instead.  Forking from signal handlers is just plain
> weird.

Yeah, maybe it's time.  But in v11, and not for back-patch.

>> +#ifndef FD_CLOEXEC
>> +#define FD_CLOEXEC 1
>> +#endif

> Hm? Are we sure this is portable?  Is there really cases that have
> F_SETFD, but not CLOEXEC?

Copied-and-pasted from our only existing use of FD_CLOEXEC, in libpq.
Might well be obsolete but I see no particular reason not to do it.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> I'm coming back to the idea that at least in the back branches, the
>> thing to do is allow maybe_start_bgworker to start multiple workers.
>> 
>> Is there any actual evidence for the claim that that might have
>> bad side effects?

> Well, I ran tests with a few tens of thousands of sample workers, and the
> neglect of other things (such as connection requests) was visible, but
> that's probably not a scenario many servers run often currently.

Indeed.  I'm pretty skeptical that that's an interesting case, and if it
is, the current coding is broken anyway, because with that many workers
you are going to start noticing that running maybe_start_bgworker over
again for each worker is an O(N^2) proposition.  Admittedly, iterating
the loop in maybe_start_bgworker is really cheap compared to a fork(),
but eventually the big-O problem is going to eat your lunch.

> I don't strongly object to the idea of removing the "return" in older
> branches, since it's evidently a problem.  However, as bgworkers start
> to be used more, I think we should definitely have some protection.  In
> a system with a large number of workers available for parallel queries,
> it seems possible for a high velocity server to get stuck in the loop
> for some time.  (I haven't actually verified this, though.  My
> experiments were with the early kind, static bgworkers.)

It might be sensible to limit the number of workers launched per call,
but I think the limit should be quite a bit higher than 1 ... something
like 100 or 1000 might be appropriate.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
Hi,

On 2017-04-21 12:50:04 -0400, Tom Lane wrote:
> Attached is a lightly-tested draft patch that converts the postmaster to
> use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
> about whether this is the direction to proceed, though.  It adds at least
> a couple of kernel calls per postmaster signal delivery, and probably to
> every postmaster connection acceptance (ServerLoop iteration), to fix
> problems that are so corner-casey that we've never even known we had them
> till now.

I'm not concerned much about the signal delivery paths, and I can't
quite imagine that another syscall in the accept path is going to be
measurable - worth ensuring though.

I do agree that it's a bit of a big stick for the back-branches...


> A different line of thought is to try to provide a bulletproof solution,
> but push the implementation problems down into latch.c --- that is, the
> goal would be to provide a pselect-like variant of WaitEventSetWait that
> is guaranteed to return if interrupted, as opposed to the current behavior
> where it's guaranteed not to.  But that seems like quite a bit of work.

Seems like a sane idea to me.  The use of latches has already grown due
to parallelism and it'll likely grow further - some of them seem likely
to also have to deal with such concerns.  I'd much rather centralize
things down to a common place.

On the other hand, most types of our processes do SetLatch() in just
about all the relevant signal handlers anyway, so they're pretty close
to that behaviour already.



> Whether or not we decide to change over the postmaster.c code, I think
> it'd likely be a good idea to apply most or all of the attached changes
> in latch.c.  Setting CLOEXEC on the relevant FDs is clearly a good thing,
> and the other changes will provide some safety if some preloaded extension
> decides to create a latch in the postmaster process.

On the principle, I agree.  Reading through the changes now.


> @@ -1667,76 +1656,64 @@ ServerLoop(void)
>* do nontrivial work.
>*
> 	 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
> -	 * any new connections, so we don't call select(), and just sleep.
> +	 * any new connections, so we don't call WaitEventSetWait(), and just
> +	 * sleep.  XXX not ideal
> 	 */

Couldn't we just deactivate the sockets in the set instead?


>  /*
> - * Initialise the masks for select() for the ports we are listening on.
> - * Return the number of sockets to listen on.
> + * Create a WaitEventSet for ServerLoop() to wait on.  This includes the
> + * sockets we are listening on, plus the PostmasterLatch.
>   */
> -static int
> -initMasks(fd_set *rmask)
> +static void
> +initServerWaitSet(void)
>  {
> - int maxsock = -1;
>   int i;
>  
> - FD_ZERO(rmask);
> + ServerWaitSet = CreateWaitEventSet(PostmasterContext, MAXLISTEN + 1);

Why are we using MAXLISTEN, rather than the actual number of things to
listen to?  The only benefit of this seems to be that we could
theoretically allow dynamic reconfiguration of the sockets a bit more
easily in the future, but that could just as well be done by recreating the set.

Random note: Do we actually have any code that errors out if too many
sockets are being listened to?


> @@ -2553,6 +2543,9 @@ SIGHUP_handler(SIGNAL_ARGS)
>  #endif
>   }
>  
> + /* Force ServerLoop to iterate */
> + SetLatch(&PostmasterLatch);
> +
>   PG_SETMASK(&UnBlockSig);
>  
>   errno = save_errno;
> @@ -2724,6 +2717,9 @@ pmdie(SIGNAL_ARGS)
>   break;
>   }
>  
> + /* Force ServerLoop to iterate */
> + SetLatch(&PostmasterLatch);
> +
>   PG_SETMASK(&UnBlockSig);
>  
>   errno = save_errno;
> @@ -3037,6 +3033,9 @@ reaper(SIGNAL_ARGS)
>*/
>   PostmasterStateMachine();
>  
> + /* Force ServerLoop to iterate */
> + SetLatch(&PostmasterLatch);
> +
>   /* Done with signal handler */
>   PG_SETMASK(&UnBlockSig);
>  
> @@ -5078,6 +5077,9 @@ sigusr1_handler(SIGNAL_ARGS)
>   signal_child(StartupPID, SIGUSR2);
>   }
>  
> + /* Force ServerLoop to iterate */
> + SetLatch(&PostmasterLatch);
> +
>   PG_SETMASK(&UnBlockSig);
>  
>   errno = save_errno;

I kind of would like, in master, take a chance of replace all the work
done in signal handlers, by just a SetLatch(), and do it outside of
signal handlers instead.  Forking from signal handlers is just plain
weird.



> diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
> index 4798370..92d2ff0 100644
> --- a/src/backend/storage/ipc/latch.c
> +++ b/src/backend/storage/ipc/latch.c
> @@ -62,6 +62,10 @@
>  #include "storage/pmsignal.h"
>  #include "storage/shmem.h"
>  
> +#ifndef FD_CLOEXEC
> +#define FD_CLOEXEC 1
> +#endif

Hm? Are we sure this is portable?  Is there really cases that have
F_SETFD, but not CLOEXEC?

Greetings,

Andres Freund


Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Attached is a lightly-tested draft patch that converts the postmaster to
use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
about whether this is the direction to proceed, though.  It adds at least
a couple of kernel calls per postmaster signal delivery, and probably to
every postmaster connection acceptance (ServerLoop iteration), to fix
problems that are so corner-casey that we've never even known we had them
till now.

I'm looking longingly at pselect(2), ppoll(2), and epoll_pwait(2), which
would solve the problem without need for a self-pipe.  But the latter two
are Linux-only, and while pselect does exist in recent POSIX editions,
it's nonetheless got portability issues.  Googling suggests that on a
number of platforms, pselect is non-atomic, ie it's nothing but a
wrapper for sigprocmask/select/sigprocmask, which would mean that the
race condition I described in my previous message still exists.

Despite that, a pretty attractive proposal is to do, essentially,

#ifdef HAVE_PSELECT
selres = pselect(nSockets, &rmask, NULL, NULL, &timeout, &UnBlockSig);
#else
PG_SETMASK(&UnBlockSig);
selres = select(nSockets, &rmask, NULL, NULL, &timeout);
PG_SETMASK(&BlockSig);
#endif

This fixes the race on platforms where pselect exists and is correctly
implemented, and we're no worse off than before where that's not true.

The other component of the problem is the possibility that select() will
restart if the signal is marked SA_RESTART.  (Presumably that would apply
to pselect too.)  I am thinking that maybe the answer is "if it hurts,
don't do it" --- that is, in the postmaster maybe we shouldn't use
SA_RESTART, at least not for these signals.

A different line of thought is to try to provide a bulletproof solution,
but push the implementation problems down into latch.c --- that is, the
goal would be to provide a pselect-like variant of WaitEventSetWait that
is guaranteed to return if interrupted, as opposed to the current behavior
where it's guaranteed not to.  But that seems like quite a bit of work.

Whether or not we decide to change over the postmaster.c code, I think
it'd likely be a good idea to apply most or all of the attached changes
in latch.c.  Setting CLOEXEC on the relevant FDs is clearly a good thing,
and the other changes will provide some safety if some preloaded extension
decides to create a latch in the postmaster process.

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6831342..658ba73 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***
*** 79,88 
  #include 
  #include 
  
- #ifdef HAVE_SYS_SELECT_H
- #include <sys/select.h>
- #endif
- 
  #ifdef USE_BONJOUR
  #include <dns_sd.h>
  #endif
--- 79,84 
*** static bool LoadedSSL = false;
*** 379,384 
--- 375,386 
  static DNSServiceRef bonjour_sdref = NULL;
  #endif
  
+ /* WaitEventSet that ServerLoop uses to wait for connections */
+ static WaitEventSet *ServerWaitSet = NULL;
+ 
+ /* Latch used to ensure that signals interrupt the wait in ServerLoop */
+ static Latch PostmasterLatch;
+ 
  /*
   * postmaster.c - function prototypes
   */
*** static int	ServerLoop(void);
*** 409,415 
  static int	BackendStartup(Port *port);
  static int	ProcessStartupPacket(Port *port, bool SSLdone);
  static void processCancelRequest(Port *port, void *pkt);
! static int	initMasks(fd_set *rmask);
  static void report_fork_failure_to_client(Port *port, int errnum);
  static CAC_state canAcceptConnections(void);
  static bool RandomCancelKey(int32 *cancel_key);
--- 411,417 
  static int	BackendStartup(Port *port);
  static int	ProcessStartupPacket(Port *port, bool SSLdone);
  static void processCancelRequest(Port *port, void *pkt);
! static void initServerWaitSet(void);
  static void report_fork_failure_to_client(Port *port, int errnum);
  static CAC_state canAcceptConnections(void);
  static bool RandomCancelKey(int32 *cancel_key);
*** checkDataDir(void)
*** 1535,1541 
  }
  
  /*
!  * Determine how long should we let ServerLoop sleep.
   *
   * In normal conditions we wait at most one minute, to ensure that the other
   * background tasks handled by ServerLoop get done even when no requests are
--- 1537,1543 
  }
  
  /*
!  * Determine how long should we let ServerLoop sleep (in msec).
   *
   * In normal conditions we wait at most one minute, to ensure that the other
   * background tasks handled by ServerLoop get done even when no requests are
*** checkDataDir(void)
*** 1543,1551 
   * we don't actually sleep so that they are quickly serviced.  Other exception
   * cases are as shown in the code.
   */
! static void
! DetermineSleepTime(struct timeval * timeout)
  {
  	TimestampTz next_wakeup = 0;
  
  	/*
--- 1545,1554 
   * we don't actually sleep so that they are quickly serviced.  Other exception
   * ca

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Alvaro Herrera
Tom Lane wrote:

> After sleeping and thinking more, I've realized that the
> slow-bgworker-start issue actually exists on *every* platform, it's just
> harder to hit when select() is interruptible.  But consider the case
> where multiple bgworker-start requests arrive while ServerLoop is
> actively executing (perhaps because a connection request just came in).
> The postmaster has signals blocked, so nothing happens for the moment.
> When we go around the loop and reach
> 
> PG_SETMASK(&UnBlockSig);
> 
> the pending SIGUSR1 is delivered, and sigusr1_handler reads all the
> bgworker start requests, and services just one of them.  Then control
> returns and proceeds to
> 
> selres = select(nSockets, &rmask, NULL, NULL, &timeout);
> 
> But now there's no interrupt pending.  So the remaining start requests
> do not get serviced until (a) some other postmaster interrupt arrives,
> or (b) the one-minute timeout elapses.  They could be waiting awhile.
> 
> Bottom line is that any request for more than one bgworker at a time
> faces a non-negligible risk of suffering serious latency.

Interesting.  It's hard to hit, for sure.

> I'm coming back to the idea that at least in the back branches, the
> thing to do is allow maybe_start_bgworker to start multiple workers.
>
> Is there any actual evidence for the claim that that might have
> bad side effects?

Well, I ran tests with a few tens of thousands of sample workers, and the
neglect of other things (such as connection requests) was visible, but
that's probably not a scenario many servers run often currently.  I
don't strongly object to the idea of removing the "return" in older
branches, since it's evidently a problem.  However, as bgworkers start
to be used more, I think we should definitely have some protection.  In
a system with a large number of workers available for parallel queries,
it seems possible for a high velocity server to get stuck in the loop
for some time.  (I haven't actually verified this, though.  My
experiments were with the early kind, static bgworkers.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-20 00:50:13 -0400, Tom Lane wrote:
>> My first reaction was that that sounded like a lot more work than removing
>> two lines from maybe_start_bgworker and adjusting some comments.  But on
>> closer inspection, the slow-bgworker-start issue isn't the only problem
>> here.

> FWIW, I vaguely remember somewhat related issues on x86/linux too.

After sleeping and thinking more, I've realized that the
slow-bgworker-start issue actually exists on *every* platform, it's just
harder to hit when select() is interruptable.  But consider the case
where multiple bgworker-start requests arrive while ServerLoop is
actively executing (perhaps because a connection request just came in).
The postmaster has signals blocked, so nothing happens for the moment.
When we go around the loop and reach

PG_SETMASK(&UnBlockSig);

the pending SIGUSR1 is delivered, and sigusr1_handler reads all the
bgworker start requests, and services just one of them.  Then control
returns and proceeds to

selres = select(nSockets, &rmask, NULL, NULL, &timeout);

But now there's no interrupt pending.  So the remaining start requests
do not get serviced until (a) some other postmaster interrupt arrives,
or (b) the one-minute timeout elapses.  They could be waiting awhile.

Bottom line is that any request for more than one bgworker at a time
faces a non-negligible risk of suffering serious latency.

I'm coming back to the idea that at least in the back branches, the
thing to do is allow maybe_start_bgworker to start multiple workers.
Is there any actual evidence for the claim that that might have
bad side effects?

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 20:10:41 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-04-20 20:05:02 -0400, Tom Lane wrote:
> >> Also, if it's not there we'd fall back to using plain poll(), which is
> >> not so awful that we need to work hard to avoid it.  I'd just as soon
> >> keep the number of combinations down.
> 
> > Just using fcntl(SET, CLOEXEC) wouldn't increase the number of
> > combinations?
> 
> True, if you just did it that way unconditionally.  But doesn't that
> require an extra kernel call per CreateWaitEventSet()?

It does - the question is whether that matters much.  FE/BE uses a
persistent wait set, but unfortunately many of the other latch users
don't.  And some of them can be somewhat frequent - so I guess that'd
possibly be measurable.  Ok, so I'm on board with epoll_create1.

If somebody were to change more frequent latch users to use persistent
wait sets, that'd be good too.

- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-20 20:05:02 -0400, Tom Lane wrote:
>> Also, if it's not there we'd fall back to using plain poll(), which is
>> not so awful that we need to work hard to avoid it.  I'd just as soon
>> keep the number of combinations down.

> Just using fcntl(SET, CLOEXEC) wouldn't increase the number of
> combinations?

True, if you just did it that way unconditionally.  But doesn't that
require an extra kernel call per CreateWaitEventSet()?

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 20:05:02 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-04-20 19:53:02 -0400, Tom Lane wrote:
> >> So ... what would you say to replacing epoll_create() with
> >> epoll_create1(EPOLL_CLOEXEC) ?  Then a WaitEventSet would not
> >> represent inheritable-across-exec resources on any platform,
> >> making it a lot easier to deal with the EXEC_BACKEND case.
> 
> > I'm generally quite in favor of using CLOEXEC as much as possible in our
> > tree.  I'm a bit concerned with epoll_create1's availability tho - the
> > glibc support for it was introduced in 2.9, whereas epoll_create is in
> > 2.3.2.  On the other hand 2.9 was released 2008-11-13.
> 
> Also, if it's not there we'd fall back to using plain poll(), which is
> not so awful that we need to work hard to avoid it.  I'd just as soon
> keep the number of combinations down.

Just using fcntl(SET, CLOEXEC) wouldn't increase the number of
combinations?

- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-20 19:53:02 -0400, Tom Lane wrote:
>> So ... what would you say to replacing epoll_create() with
>> epoll_create1(EPOLL_CLOEXEC) ?  Then a WaitEventSet would not
>> represent inheritable-across-exec resources on any platform,
>> making it a lot easier to deal with the EXEC_BACKEND case.

> I'm generally quite in favor of using CLOEXEC as much as possible in our
> tree.  I'm a bit concerned with epoll_create1's availability tho - the
> glibc support for it was introduced in 2.9, whereas epoll_create is in
> 2.3.2.  On the other hand 2.9 was released 2008-11-13.

Also, if it's not there we'd fall back to using plain poll(), which is
not so awful that we need to work hard to avoid it.  I'd just as soon
keep the number of combinations down.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 19:53:02 -0400, Tom Lane wrote:
> I wrote:
> > Andres Freund  writes:
> >> On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
> >>> or are the HANDLEs in a Windows WaitEventSet not inheritable
> >>> resources?
> 
> >> So that kind of sounds like it should be doable.
> 
> > Ah, good.  I'll add a comment about that and press on.
> 
> So ... what would you say to replacing epoll_create() with
> epoll_create1(EPOLL_CLOEXEC) ?  Then a WaitEventSet would not
> represent inheritable-across-exec resources on any platform,
> making it a lot easier to deal with the EXEC_BACKEND case.
> 
> AFAIK, both APIs are Linux-only, and epoll_create1() is not much
> newer than epoll_create(), so it seems like we'd not be giving up
> much portability if we insist on epoll_create1.

I'm generally quite in favor of using CLOEXEC as much as possible in our
tree.  I'm a bit concerned with epoll_create1's availability tho - the
glibc support for it was introduced in 2.9, whereas epoll_create is in
2.3.2.  On the other hand 2.9 was released 2008-11-13.   If we remain
concerned we could just fcntl(fd, F_SETFD, FD_CLOEXEC) instead - that
should only be like three lines more code or such, and should be
available for a lot longer.

- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
>>> or are the HANDLEs in a Windows WaitEventSet not inheritable
>>> resources?

>> So that kind of sounds like it should be doable.

> Ah, good.  I'll add a comment about that and press on.

So ... what would you say to replacing epoll_create() with
epoll_create1(EPOLL_CLOEXEC) ?  Then a WaitEventSet would not
represent inheritable-across-exec resources on any platform,
making it a lot easier to deal with the EXEC_BACKEND case.

AFAIK, both APIs are Linux-only, and epoll_create1() is not much
newer than epoll_create(), so it seems like we'd not be giving up
much portability if we insist on epoll_create1.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 00:50:13 -0400, Tom Lane wrote:
> I wrote:
> > Alvaro Herrera  writes:
> >> Tom Lane wrote:
> >>> Hm.  Do you have a more-portable alternative?
> 
> >> I was thinking in a WaitEventSet from latch.c.
> 
> My first reaction was that that sounded like a lot more work than removing
> two lines from maybe_start_bgworker and adjusting some comments.  But on
> closer inspection, the slow-bgworker-start issue isn't the only problem
> here.  On a machine that restarts select()'s timeout after an interrupt,
> as (at least) HPUX does, the postmaster will actually never iterate
> ServerLoop's loop except immediately after receiving a new connection
> request.  The stream of interrupts from the autovac launcher is alone
> sufficient to prevent the initial 60-second timeout from ever elapsing.
> So if there are no new connection requests for awhile, none of the
> housekeeping actions in ServerLoop get done.

FWIW, I vaguely remember somewhat related issues on x86/linux too.  On
busy machines in autovacuum_freeze_max_age territory (pretty frequent
thing these days), the signalling frequency caused problems in
postmaster, but I unfortunately don't remember the symptoms very well.


> So it's looking to me like we do need to do something about this, and
> ideally back-patch it all the way.  But WaitEventSet doesn't exist
> before 9.6.  Do we have the stomach for back-patching that into
> stable branches?

Hm, that's not exactly no code - on the other hand it's (excepting
the select(2) path) reasonably well exercised.  What we could do is to
convert master, see how the beta process likes it, and then backpatch?
These don't seem like super pressing issues.

- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
>> or are the HANDLEs in a Windows WaitEventSet not inheritable
>> resources?

> I think we have control over that. According to
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms724466(v=vs.85).aspx
> CreateProcess() has to be called with bInheritHandles = true (which we
> do for backends), and SECURITY_ATTRIBUTES.bInheritHandle has to be true
> too.  The latter we already only do for InitSharedLatch(), but not for
> InitLatch(), nor for the WSACreateEvent's created for sockets - those
> apparently can never be inherited.

> So that kind of sounds like it should be doable.

Ah, good.  I'll add a comment about that and press on.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
> or are the HANDLEs in a Windows WaitEventSet not inheritable
> resources?

I think we have control over that. According to
https://msdn.microsoft.com/en-us/library/windows/desktop/ms724466(v=vs.85).aspx
CreateProcess() has to be called with bInheritHandles = true (which we
do for backends), and SECURITY_ATTRIBUTES.bInheritHandle has to be true
too.  The latter we already only do for InitSharedLatch(), but not for
InitLatch(), nor for the WSACreateEvent's created for sockets - those
apparently can never be inherited.

So that kind of sounds like it should be doable.

- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
I wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> Hm.  Do you have a more-portable alternative?

>> I was thinking in a WaitEventSet from latch.c.

> My first reaction was that that sounded like a lot more work than removing
> two lines from maybe_start_bgworker and adjusting some comments.  But on
> closer inspection, the slow-bgworker-start issue isn't the only problem
> here.  On a machine that restarts select()'s timeout after an interrupt,
> as (at least) HPUX does, the postmaster will actually never iterate
> ServerLoop's loop except immediately after receiving a new connection
> request.

I had a go at making the postmaster use a WaitEventSet, and immediately
ran into problems: latch.c is completely unprepared to be used anywhere
except a postmaster child process.  I think we can get around the issue
for the self-pipe, as per the attached untested patch.  But there remains
a problem: we should do a FreeWaitEventSet() after forking a child
process to ensure that postmaster children aren't running around with
open FDs for the postmaster's stuff.  This is no big problem in a regular
Unix build; we can give ClosePostmasterPorts() the responsibility.
It *is* a problem in EXEC_BACKEND children, which won't have inherited
the WaitEventSet data structure.  Maybe we could ignore the problem for
Unix EXEC_BACKEND builds, since we consider those to be for debug
purposes only, not for production.  But I don't think we can get away
with it for Windows --- or are the HANDLEs in a Windows WaitEventSet
not inheritable resources?

regards, tom lane

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 4798370..d4ac54c 100644
*** a/src/backend/storage/ipc/latch.c
--- b/src/backend/storage/ipc/latch.c
*** static volatile sig_atomic_t waiting = f
*** 129,134 
--- 129,136 
  /* Read and write ends of the self-pipe */
  static int	selfpipe_readfd = -1;
  static int	selfpipe_writefd = -1;
+ /* Process owning the self-pipe --- needed for checking purposes */
+ static int	selfpipe_owner_pid = 0;
  
  /* Private function prototypes */
  static void sendSelfPipeByte(void);
*** InitializeLatchSupport(void)
*** 158,164 
  #ifndef WIN32
  	int			pipefd[2];
  
! 	Assert(selfpipe_readfd == -1);
  
  	/*
  	 * Set up the self-pipe that allows a signal handler to wake up the
--- 160,202 
  #ifndef WIN32
  	int			pipefd[2];
  
! 	if (IsUnderPostmaster)
! 	{
! 		/*
! 		 * We might have inherited connections to a self-pipe created by the
! 		 * postmaster.  It's critical that child processes create their own
! 		 * self-pipes, of course, and we really want them to close the
! 		 * inherited FDs for safety's sake.
! 		 */
! 		if (selfpipe_owner_pid != 0)
! 		{
! 			/* Assert we go through here but once in a child process */
! 			Assert(selfpipe_owner_pid != MyProcPid);
! 			/* Release postmaster's pipe */
! 			close(selfpipe_readfd);
! 			close(selfpipe_writefd);
! 			/* Clean up, just for safety's sake; we'll set these in a bit */
! 			selfpipe_readfd = selfpipe_writefd = -1;
! 			selfpipe_owner_pid = 0;
! 		}
! 		else
! 		{
! 			/*
! 			 * Postmaster didn't create a self-pipe ... or else we're in an
! 			 * EXEC_BACKEND build, and don't know because we didn't inherit
! 			 * values for the static variables.  (In that case we'll fail to
! 			 * close the inherited FDs, but that seems acceptable since
! 			 * EXEC_BACKEND builds aren't meant for production on Unix.)
! 			 */
! 			Assert(selfpipe_readfd == -1);
! 		}
! 	}
! 	else
! 	{
! 		/* In postmaster or standalone backend, assert we do this but once */
! 		Assert(selfpipe_readfd == -1);
! 		Assert(selfpipe_owner_pid == 0);
! 	}
  
  	/*
  	 * Set up the self-pipe that allows a signal handler to wake up the
*** InitializeLatchSupport(void)
*** 176,188 
  
  	selfpipe_readfd = pipefd[0];
  	selfpipe_writefd = pipefd[1];
  #else
  	/* currently, nothing to do here for Windows */
  #endif
  }
  
  /*
!  * Initialize a backend-local latch.
   */
  void
  InitLatch(volatile Latch *latch)
--- 214,227 
  
  	selfpipe_readfd = pipefd[0];
  	selfpipe_writefd = pipefd[1];
+ 	selfpipe_owner_pid = MyProcPid;
  #else
  	/* currently, nothing to do here for Windows */
  #endif
  }
  
  /*
!  * Initialize a process-local latch.
   */
  void
  InitLatch(volatile Latch *latch)
*** InitLatch(volatile Latch *latch)
*** 193,199 
  
  #ifndef WIN32
  	/* Assert InitializeLatchSupport has been called in this process */
! 	Assert(selfpipe_readfd >= 0);
  #else
  	latch->event = CreateEvent(NULL, TRUE, FALSE, NULL);
  	if (latch->event == NULL)
--- 232,238 
  
  #ifndef WIN32
  	/* Assert InitializeLatchSupport has been called in this process */
! 	Assert(selfpipe_readfd >= 0 && selfpipe_owner_pid == MyProcPid);
  #else
  	latch->event = CreateEvent(NULL, TRUE, FALSE, NULL);
  	if (latch->event == NULL)
*** OwnLatch(volatile Latch *latch)
*** 256,262 ***

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
I wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> Hm.  Do you have a more-portable alternative?

>> I was thinking in a WaitEventSet from latch.c.

My first reaction was that that sounded like a lot more work than removing
two lines from maybe_start_bgworker and adjusting some comments.  But on
closer inspection, the slow-bgworker-start issue isn't the only problem
here.  On a machine that restarts select()'s timeout after an interrupt,
as (at least) HPUX does, the postmaster will actually never iterate
ServerLoop's loop except immediately after receiving a new connection
request.  The stream of interrupts from the autovac launcher is alone
sufficient to prevent the initial 60-second timeout from ever elapsing.
So if there are no new connection requests for awhile, none of the
housekeeping actions in ServerLoop get done.

Most of those actions are concerned with restarting failed background
tasks, which is something we could get by without --- it's unlikely
that those tasks would fail without causing a database-wide restart,
and then there really isn't going to be much need for them until at least
one new connection request has arrived.  But the last step in that loop is
concerned with touching the sockets and lock files to prevent aggressive
/tmp cleaners from removing them, and that's something that can't be let
slide, or we might as well not have the logic at all.

I've confirmed experimentally that on my HPUX box, a postmaster not
receiving new connections for an hour or more in fact fails to update
the mod times on the sockets and lock files.  So that problem isn't
hypothetical.

So it's looking to me like we do need to do something about this, and
ideally back-patch it all the way.  But WaitEventSet doesn't exist
before 9.6.  Do we have the stomach for back-patching that into
stable branches?

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Andres Freund  writes:
> FWIW, I'd wished before that we used something a bit more modern than
> select() if available... It's nice to be able to listen to a larger
> number of sockets without repeated O(sockets) overhead.

[ raised eyebrow... ]  Is anyone really running postmasters with enough
listen sockets for that to be meaningful?

> BTW, we IIRC had discussed removing the select() backed latch
> implementation in this release.  I'll try to dig up that discussion.

Might be sensible.  Even my pet dinosaurs have poll(2).  We should
check the buildfarm to see if the select() implementation is being
tested at all.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Andres Freund
On 2017-04-19 18:56:26 -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> Hm.  Do you have a more-portable alternative?
> 
> > I was thinking in a WaitEventSet from latch.c.
> 
> Yeah, some googling turns up the suggestion that a self-pipe is a portable
> way to get consistent semantics from select(); latch.c has already done
> that.  I suppose that using latch.c would be convenient in that we'd have
> to write little new code, but it's a tad annoying to add the overhead of a
> self-pipe on platforms where we don't need it (which seems to be most).

FWIW, I'd wished before that we used something a bit more modern than
select() if available... It's nice to be able to listen to a larger
number of sockets without repeated O(sockets) overhead.

BTW, we IIRC had discussed removing the select() backed latch
implementation in this release.  I'll try to dig up that discussion.

- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Hm.  Do you have a more-portable alternative?

> I was thinking in a WaitEventSet from latch.c.

Yeah, some googling turns up the suggestion that a self-pipe is a portable
way to get consistent semantics from select(); latch.c has already done
that.  I suppose that using latch.c would be convenient in that we'd have
to write little new code, but it's a tad annoying to add the overhead of a
self-pipe on platforms where we don't need it (which seems to be most).

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> So I'm wondering what the design rationale was for only starting one
> >> bgworker per invocation.
> 
> > The rationale was that there may be other tasks waiting for postmaster
> > attention, and if there are many bgworkers needing to be started, the
> > other work may be delayed for a long time.  This is not the first time
> > that this rationale has been challenged, but so far there hasn't been
> > any good reason to change it.  One option is to just remove it as you
> > propose, but a different one is to stop using select(2) in ServerLoop,
> > because those behavior differences seem to make it rather unusable.
> 
> Hm.  Do you have a more-portable alternative?

I was thinking in a WaitEventSet from latch.c.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> So I'm wondering what the design rationale was for only starting one
>> bgworker per invocation.

> The rationale was that there may be other tasks waiting for postmaster
> attention, and if there are many bgworkers needing to be started, the
> other work may be delayed for a long time.  This is not the first time
> that this rationale has been challenged, but so far there hasn't been
> any good reason to change it.  One option is to just remove it as you
> propose, but a different one is to stop using select(2) in ServerLoop,
> because those behavior differences seem to make it rather unusable.

Hm.  Do you have a more-portable alternative?

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Alvaro Herrera
Tom Lane wrote:

> While I haven't yet tested it, it seems like a fix might be as simple
> as deleting these lines in maybe_start_bgworker:
> 
> /*
>  * Have ServerLoop call us again.  Note that there might not
>  * actually *be* another runnable worker, but we don't care all
>  * that much; we will find out the next time we run.
>  */
> StartWorkerNeeded = true;
> return;
> 
> So I'm wondering what the design rationale was for only starting one
> bgworker per invocation.

The rationale was that there may be other tasks waiting for postmaster
attention, and if there are many bgworkers needing to be started, the
other work may be delayed for a long time.  This is not the first time
that this rationale has been challenged, but so far there hasn't been
any good reason to change it.  One option is to just remove it as you
propose, but a different one is to stop using select(2) in ServerLoop,
because those behavior differences seem to make it rather unusable.

> It also appears to me that do_start_bgworker's treatment of fork
> failure is completely brain dead.  Did anyone really think about
> that case?

Hmm, I probably modelled it on autovacuum without giving that case much
additional consideration.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

