Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-13 Thread Heikki Linnakangas

On 12/09/10 20:13, Jeff Davis wrote:

On Sun, 2010-09-12 at 12:29 -0400, Tom Lane wrote:

... why throw an ERROR there if it can't happen (or
indicates an inconsistent state when it does happen)?


Are you suggesting that an Assert would be sufficient?


I'm not too picky about whether it's Assert, ERROR, or PANIC (Asserts
aren't available in production systems, which I assume is why elog was
used);


Right, OwnLatch is a not performance-critical, so it's better to elog() 
IMHO.



but we should be consistent and document that:
  (a) it shouldn't happen
  (b) that it's just a sanity check and we're ignoring the race


Would this be sufficient?

--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -156,6 +156,7 @@ OwnLatch(volatile Latch *latch)
if (selfpipe_readfd == -1)
initSelfPipe();

+   /* sanity check */
if (latch-owner_pid != 0)
elog(ERROR, latch already owned);
latch-owner_pid = MyProcPid;

Or you want to suggest something better?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-13 Thread Jeff Davis
On Mon, 2010-09-13 at 09:10 +0300, Heikki Linnakangas wrote:
  but we should be consistent and document that:
(a) it shouldn't happen
(b) that it's just a sanity check and we're ignoring the race
 
 Would this be sufficient?
 
 --- a/src/backend/port/unix_latch.c
 +++ b/src/backend/port/unix_latch.c
 @@ -156,6 +156,7 @@ OwnLatch(volatile Latch *latch)
   if (selfpipe_readfd == -1)
   initSelfPipe();
 
 + /* sanity check */
   if (latch-owner_pid != 0)
   elog(ERROR, latch already owned);
   latch-owner_pid = MyProcPid;
 
 Or you want to suggest something better?

Perfect. I was just slightly confused reading it the first time, and I
think that would have cleared it up.

Thanks,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-13 Thread Heikki Linnakangas

On 13/09/10 20:43, Jeff Davis wrote:

On Mon, 2010-09-13 at 09:10 +0300, Heikki Linnakangas wrote:

but we should be consistent and document that:
   (a) it shouldn't happen
   (b) that it's just a sanity check and we're ignoring the race


Would this be sufficient?

--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -156,6 +156,7 @@ OwnLatch(volatile Latch *latch)
if (selfpipe_readfd == -1)
initSelfPipe();

+   /* sanity check */
if (latch-owner_pid != 0)
elog(ERROR, latch already owned);
latch-owner_pid = MyProcPid;

Or you want to suggest something better?


Perfect. I was just slightly confused reading it the first time, and I
think that would have cleared it up.


Ok, added that.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-12 Thread Jeff Davis
On Sat, 2010-09-11 at 19:15 +0300, Heikki Linnakangas wrote:
 Committed. I'll take a look at making walreceiver respond quickly when 
 WAL arrives in the standby, using latches, but that shouldn't interfere 
 with what you're doing.

I glanced at the code, and I see (in OwnLatch()):

+   if (latch-owner_pid != 0)
+   elog(ERROR, latch already owned);
+   latch-owner_pid = MyProcPid;

But it looks like there may be a race there. I assume the callers are
supposed to have a lock at this point (and it looks like the current
caller is safe, but I didn't look in detail). Something still seems
strange to me though -- why throw an ERROR there if it can't happen (or
indicates an inconsistent state when it does happen)? And it doesn't
look safe to _not_ throw an error (due to a race) if it does happen.

It seems like OwnLatch is only supposed to be used when you_not_ to
throw an error in the event of a race (I don't think it is). already
know that you own it, because there's no error code returned or blocking
behavior, it just throws an ERROR. But if that's the case, what's the
point of OwnLatch()?

Perhaps add a few comments to describe to other users of the API how to
properly own a shared latch?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-12 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 I glanced at the code, and I see (in OwnLatch()):

 +   if (latch-owner_pid != 0)
 +   elog(ERROR, latch already owned);
 +   latch-owner_pid = MyProcPid;

 But it looks like there may be a race there.

Yeah, that error check is only intended to catch gross logic errors,
not to guard against race conditions.  I don't think we really could
prevent a race there without adding a spinlock, which seems like
overkill.

 ... why throw an ERROR there if it can't happen (or
 indicates an inconsistent state when it does happen)?

Are you suggesting that an Assert would be sufficient?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-12 Thread Jeff Davis
On Sun, 2010-09-12 at 12:29 -0400, Tom Lane wrote:
  ... why throw an ERROR there if it can't happen (or
  indicates an inconsistent state when it does happen)?
 
 Are you suggesting that an Assert would be sufficient?
 

I'm not too picky about whether it's Assert, ERROR, or PANIC (Asserts
aren't available in production systems, which I assume is why elog was
used); but we should be consistent and document that:
 (a) it shouldn't happen
 (b) that it's just a sanity check and we're ignoring the race

However, that also means that the whole concept of OwnLatch/DisownLatch
is entirely redundant, and only there for asserts because it doesn't do
anything else. That seems a little strange to me, as well, so (at
minimum) it should be documented that the functions really have no
effect on execution and are required only to support debugging.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-12 Thread Jeff Davis
On Sun, 2010-09-12 at 10:13 -0700, Jeff Davis wrote:
 I'm not too picky about whether it's Assert, ERROR, or PANIC (Asserts
 aren't available in production systems, which I assume is why elog was
 used); but we should be consistent and document that:
  (a) it shouldn't happen
  (b) that it's just a sanity check and we're ignoring the race

I should add that Assert seems to imply both of those things (at least
to me), so that would solve one of the confusing aspects of the API.

The other confusing part that I think still needs comments is that
OwnLatch/DisownLatch don't really do anything; they just carry
information for sanity checks later.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-12 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 However, that also means that the whole concept of OwnLatch/DisownLatch
 is entirely redundant, and only there for asserts because it doesn't do
 anything else. That seems a little strange to me, as well, so (at
 minimum) it should be documented that the functions really have no
 effect on execution and are required only to support debugging.

Uh, this is nonsense.  You have to have something like these functions
to support transferring ownership of a latch from one process to
another, which is required at least for the walreceiver usage.

It's correct that the latch code itself isn't trying very hard to avoid
a race condition in acquiring ownership, but that doesn't make the whole
thing useless, it just means that we're assuming that will be avoided
by logic elsewhere.  If there is a bug elsewhere that allows two
different processes to try to take ownership of the same latch, the
current coding will expose that bug soon enough.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-12 Thread Jeff Davis
On Sun, 2010-09-12 at 14:12 -0400, Tom Lane wrote:
 Uh, this is nonsense.  You have to have something like these functions
 to support transferring ownership of a latch from one process to
 another, which is required at least for the walreceiver usage.

Oh, I see. It's needed to know where to send the signal, of course.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-11 Thread Heikki Linnakangas

On 06/09/10 19:27, Heikki Linnakangas wrote:

On 06/09/10 17:18, Tom Lane wrote:

BTW, on reflection the AcquireLatch/ReleaseLatch terminology seems a bit
ill chosen: ReleaseLatch sounds way too much like something that would
just unlock or clear the latch. Perhaps OwnLatch/DisownLatch, or
something along that line.


Yeah, I see what you mean. Although, maybe it's just me but Own/Disown
looks ugly. Anyone have a better suggestion?


Magnus suggested AssociateLatch, given that the description of the 
function is that it associates the latch with the current process. I 
went with Own/Disown after all, it feels more precise, and having made 
the changes it doesn't look that ugly to me anymore.



Here's an updated patch, with all the issues reported this far fixed,
except for that naming issue, and Fujii's suggestion to use poll()
instead of select() where available. I've also polished it quite a bit,
improving comments etc. Magnus, can you take a look at the Windows
implementation to check that it's sane? At least it seems to work.


We discussed the patch over IM, there's one point minor point I'd like 
to get into the archives:



It seems that NumSharedLatches() is entirely wrongly placed if it's in
  the win32 specific code! That needs to be somewhere shared, so people find it,


Yeah. There's a notice of that in OwnLatch(), but it would be nice if we 
could make it even more prominent. One idea is to put in latch.h as:


#define NumSharedLatches() (max_wal_senders /* + something else in the 
future */ )


When it's a #define, we don't need to put #include walsender.h in 
latch.h, it's enough to put it in win32_latch.c. It's a bit weird to 
have a #define in one header file that doesn't work unless you #include 
another header file in where you use it, but it would work. Any opinions 
on whether that's better than having NumSharedLatches() defined in the 
Win32-specific win32_latch.c file? I'm inclined to leave it as it is, in 
win32_latch.c, but I'm not sure.


Barring any last-minute objections, I'll commit this in the next few 
days. This patch doesn't affect walreceiver yet; I think the next step 
is to use the latches to eliminate the polling loop in walreceiver too, 
so that as soon as a piece of WAL is fsync'd to disk in the standby, 
it's applied.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-11 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 06/09/10 19:27, Heikki Linnakangas wrote:
 It seems that NumSharedLatches() is entirely wrongly placed if it's in
 the win32 specific code! That needs to be somewhere shared, so people find 
 it,

 Yeah. There's a notice of that in OwnLatch(), but it would be nice if we 
 could make it even more prominent. One idea is to put in latch.h as:

 #define NumSharedLatches() (max_wal_senders /* + something else in the 
 future */ )

 When it's a #define, we don't need to put #include walsender.h in 
 latch.h, it's enough to put it in win32_latch.c. It's a bit weird to 
 have a #define in one header file that doesn't work unless you #include 
 another header file in where you use it, but it would work.

We have other precedents for that.  But having said that ...

 Any opinions 
 on whether that's better than having NumSharedLatches() defined in the 
 Win32-specific win32_latch.c file? I'm inclined to leave it as it is, in 
 win32_latch.c, but I'm not sure.

I'd leave it alone.  I see no very good reason to expose
NumSharedLatches globally.

 Barring any last-minute objections, I'll commit this in the next few 
 days. This patch doesn't affect walreceiver yet; I think the next step 
 is to use the latches to eliminate the polling loop in walreceiver too, 
 so that as soon as a piece of WAL is fsync'd to disk in the standby, 
 it's applied.

I will do some work as well once it's in.  Since I was the one
complaining about extra wakeups in the other processes, I'm willing
to go fix 'em.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-11 Thread Heikki Linnakangas

On 11/09/10 18:02, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

Barring any last-minute objections, I'll commit this in the next few
days. This patch doesn't affect walreceiver yet; I think the next step
is to use the latches to eliminate the polling loop in walreceiver too,
so that as soon as a piece of WAL is fsync'd to disk in the standby,
it's applied.


I will do some work as well once it's in.  Since I was the one
complaining about extra wakeups in the other processes, I'm willing
to go fix 'em.


Committed. I'll take a look at making walreceiver respond quickly when 
WAL arrives in the standby, using latches, but that shouldn't interfere 
with what you're doing.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-09 Thread Markus Wanner

On 09/08/2010 08:18 PM, Tom Lane wrote:

Considering that we know that major platforms
such as FreeBSD have changed their implementations *very* recently,
it seems foolish to assume that an executable built on a machine with
corrected pselect could not be run on one with an older implementation.


FWIW testing a recent development (i.e. 9.0-devel) version of FreeBSD 
still failed to properly support pselect().



Also, we have plenty of experience with substituting poll() for
select(), so I'm not too worried about copy-and-pasting such code.


It's certainly a simpler change than the self-pipe trick vs. pselect(), yes.

I'm happy to go with the self-pipe trick. A quick micro-benchmark didn't 
even show any significant difference compared to pselect(), so form that 
perspective, it's not a big deal.


And we'd then have a major project using the self-pipe trick. (I would 
still like to know what others exist).


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-09 Thread Markus Wanner

On 09/08/2010 08:01 PM, Heikki Linnakangas wrote:

Yeah, there isn't much you can do about it. Perhaps you could set a
mayday flag (a global boolean variable) if it fails, and check that in
the main loop, elogging a warning there instead. But I don't think we
need to go to such lengths, realistically the write() will never fail or
you have bigger problems.


Hm.. I think I'd like to see such a mayday flag. Just so we at least 
have a chance of finding out that something has gone wrong - whether or 
not there's a bigger problem.



Perhaps, although it should be very rare to have more than one byte in
the pipe. SetLatch doesn't write another byte if the latch is already
set, so you only get multiple bytes in the pipe if many processes set
the latch at the same instant.


Depending on what you use these latches for, it might not be that rare 
anymore. Trying to read more than one byte at a time doesn't cost anything.


Regards

Markus

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-08 Thread Markus Wanner

Hi,

On 09/06/2010 11:03 PM, Tom Lane wrote:

I don't entirely see the point of opening ourselves up to the risk of
using a pselect that's not safe under the hood.


It should be possible to reliably determine the platforms that provide 
an atomic pselect(). For those, I'm hesitant to use a trick, where 
pselect() clearly provides a simpler and more official alternative. 
Especially considering that those platforms form the vast majority for 
running Postgres on.


What I'm most concerned about is the write() syscall within the signal 
handler. If that fails for another reason than those covered, we miss 
the signal. As Heikki points out in the comment, it's hard to deal with 
such a failure.


Regarding the exact implementation, the positioning of drainSelfPipe in 
Heikki's implementation seems strange to me. Most descriptions of the 
self-pipe trick [1] [2] [4] put the drainSelfPipe() just after the 
select(), where you can be sure there actually is something to read. 
(Except [3], which recommends putting it inside the signal handler, 
which I find even more frightening).


Maybe you can read more than one byte at a time in drainSelfPipe(), to 
save some syscalls?


Talking about the trick itself again: I found a lot of descriptions and 
mentioning of the self-pipe trick, but so far I only found an unknown 
window manager [5] and the custom inetd that's mentioned in the LWN 
article [4] which really use that trick. Digging deeper revealed that 
there's a sigsafe library [6] as well as the bglibs [7] which both seems 
to use the self-pipe trick as well (of which the later doesn't even care 
about the write()'s return value in the signal handler). None of these 
two libraries seems to be used in any project of relevance.


Overall I got the impression that people like to describe the trick, 
because it sounds so nifty and clever. However, I'd feel more 
comfortable if I knew there were some medium to large projects that 
actually use that trick. But AFAICT not even Bernstein's qmail does.



In any case, on most
modern platforms poll() is preferable to any variant of select().


Only Linux provides a ppoll() variant. And poll() itself doesn't replace 
pselect().



Overall, I'm glad this gets addressed. Note that this is a long standing 
issue for Postgres-R and it's covered with a lengthy comment in its TODO 
file [8].


Regards

Markus Wanner


[1] D. J. Bernstein, The self-pipe trick
http://cr.yp.to/docs/selfpipe.html

[2] Emile van Bergen, Avoiding races with Unix signals and select()
http://www.xs4all.nl/~evbergen/unix-signals.html

[3] Alex Pennace, Safe UNIX Signal Handling Tips
http://osiris.978.org/~alex/safesignalhandling.html

[4] LWN Article: The new pselect() system call, mentions the self-pipe 
trick in a comment

http://lwn.net/Articles/176911/

[5] Karmen: a window manager
http://freshmeat.net/projects/karmen

[6] sigsafe library
http://www.slamb.org/projects/sigsafe/

[7] Bruce Guenter, one stop library package
http://untroubled.org/bglibs/

[8] Postgres-R TOOD entry
http://git.postgres-r.org/?p=Postgres-R;a=blob;f=src/backend/replication/TODO;h=7bfc37ee9629943b9ff052d571b9d933ab38a0a8;hb=HEAD#l12


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-08 Thread Heikki Linnakangas

On 08/09/10 20:36, Markus Wanner wrote:

On 09/06/2010 11:03 PM, Tom Lane wrote:

I don't entirely see the point of opening ourselves up to the risk of
using a pselect that's not safe under the hood.


It should be possible to reliably determine the platforms that provide
an atomic pselect(). For those, I'm hesitant to use a trick, where
pselect() clearly provides a simpler and more official alternative.
Especially considering that those platforms form the vast majority for
running Postgres on.


Perhaps, but I'm equally concerned that having different implementations 
for different platforms means that all implementations get less testing 
than if we use only one. Because of that I'm actually reluctant to even 
use poll() where available instead of select(). At least in the first 
phase, until someone demonstrates that there's a measurable difference 
in performance. We only call poll/select when we're about to sleep, so 
it's not really that performance critical anyway.



What I'm most concerned about is the write() syscall within the signal
handler. If that fails for another reason than those covered, we miss
the signal. As Heikki points out in the comment, it's hard to deal with
such a failure.


Yeah, there isn't much you can do about it. Perhaps you could set a 
mayday flag (a global boolean variable) if it fails, and check that in 
the main loop, elogging a warning there instead. But I don't think we 
need to go to such lengths, realistically the write() will never fail or 
you have bigger problems.



Maybe you can read more than one byte at a time in drainSelfPipe(), to
save some syscalls?


Perhaps, although it should be very rare to have more than one byte in 
the pipe. SetLatch doesn't write another byte if the latch is already 
set, so you only get multiple bytes in the pipe if many processes set 
the latch at the same instant.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-08 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 08/09/10 20:36, Markus Wanner wrote:
 It should be possible to reliably determine the platforms that provide
 an atomic pselect(). For those, I'm hesitant to use a trick, where
 pselect() clearly provides a simpler and more official alternative.
 Especially considering that those platforms form the vast majority for
 running Postgres on.

 Perhaps, but I'm equally concerned that having different implementations 
 for different platforms means that all implementations get less testing 
 than if we use only one.

There's that, and there's also that Markus' premise is full of holes.
Exactly how will you determine that pselect is safe at compile time?
Even if you correctly determine that, how can you be sure that the
finished executable will only be run against a version of libc that has
a safe implementation?  Considering that we know that major platforms
such as FreeBSD have changed their implementations *very* recently,
it seems foolish to assume that an executable built on a machine with
corrected pselect could not be run on one with an older implementation.

 Because of that I'm actually reluctant to even 
 use poll() where available instead of select(). At least in the first 
 phase, until someone demonstrates that there's a measurable difference 
 in performance.

select() is demonstrably a loser whenever the process has a lot of open
files.  Also, we have plenty of experience with substituting poll() for
select(), so I'm not too worried about copy-and-pasting such code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-08 Thread Tom Lane
Martijn van Oosterhout klep...@svana.org writes:
 If the issue is just that select() doesn't get interrupted and we don't
 care about a couple of syscalls, would it not be better to simply use
 sigaction to turn on SA_RESTART just prior to the select() and turn it
 off just after. Or are these systems so broken that select() won't be
 interrupted, even if the signal handler is explicitly configured to do
 so?

I think you mean turn *off* SA_RESTART.  We'd have to do that for each
signal that we were concerned about allowing to interrupt the select(),
so it's more than just two added calls.  Another small problem is that
the latch code doesn't/shouldn't know what handlers are active, and
AFAICS you can't use sigaction() to flip that flag without setting the
handler address too.  So while maybe we could do it that way, it'd be
pretty dang messy.

In my mind the main value of the Latch code will be to have a clean
platform-independent API for waiting.  Why all the angst about whether
the implementation underneath is clean or not?  It's more important that
it *works* and we don't have to worry about whether it will break on
platform XYZ.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-08 Thread Martijn van Oosterhout
On Mon, Sep 06, 2010 at 07:24:59PM +0200, Markus Wanner wrote:
 Do I understand correctly that the purpose of this patch is to work  
 around the brokenness of select() on very few platforms? Or is there any  
 additional feature that plain signals don't give us?

If the issue is just that select() doesn't get interrupted and we don't
care about a couple of syscalls, would it not be better to simply use
sigaction to turn on SA_RESTART just prior to the select() and turn it
off just after. Or are these systems so broken that select() won't be
interrupted, even if the signal handler is explicitly configured to do
so?

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-08 Thread Heikki Linnakangas

On 08/09/10 23:07, Martijn van Oosterhout wrote:

On Mon, Sep 06, 2010 at 07:24:59PM +0200, Markus Wanner wrote:

Do I understand correctly that the purpose of this patch is to work
around the brokenness of select() on very few platforms? Or is there any
additional feature that plain signals don't give us?


If the issue is just that select() doesn't get interrupted and we don't
care about a couple of syscalls, would it not be better to simply use
sigaction to turn on SA_RESTART just prior to the select() and turn it
off just after. Or are these systems so broken that select() won't be
interrupted, even if the signal handler is explicitly configured to do
so?


I don't know if SA_RESTART is portable. But in any case, that will do 
nothing about the race condition where the signal arrives just *before* 
the select() call.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-07 Thread Heikki Linnakangas

On 06/09/10 23:10, Markus Wanner wrote:

Good. How about syscall overhead? One more write operation to the
self-pipe per signal from within the signal handler and one read to
actually clear the 'ready' state of the pipe from the waiter portion of
the code, right?


Right.


Do we plan to replace all (or most) existing internal signals with these
latches to circumvent the interruption problem? Or just the ones we need
to wait for using pg_usleep()?


At least the poll loops in bgwriter and walwriter need to be replaced if 
we want to fix the issue Tom mentioned earlier that the server currently 
wakes up periodically, waking up the CPU which wastes electricity. 
There's no hurry to replace other code.



For Postgres-R, I'd probably have to extend it to call select() not only
on the self-pipe, but on at least one other socket as well (to talk to
the GCS). As long as that's possible, it looks like a more portable
replacement for the pselect() variant that's currently in place there.


Yeah, that would be a straightforward extension.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-07 Thread Heikki Linnakangas

On 06/09/10 20:24, Markus Wanner wrote:

On 09/06/2010 06:27 PM, Heikki Linnakangas wrote:

+ * It's important to reset the latch*before* checking if there's work to
+ * do. Otherwise, if someone sets the latch between the check and the
+ * ResetLatch call, you will miss it and Wait will block.


Why doesn't WaitLatch() clear it? What's the use case for waiting for a
latch and *not* wanting to reset it?


Setting a latch that's already set is very fast, so you want to keep it 
set until the last moment. See the coding in walsender for example, it 
goes to some lengths to avoid clearing the latch until it's very sure 
there's no more work for it to do. That helps to keep the overhead in 
backends committing transactions low. (no-one has tried to measure that 
yet, though)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-07 Thread Markus Wanner

On 09/07/2010 09:06 AM, Heikki Linnakangas wrote:

Setting a latch that's already set is very fast, so you want to keep it
set until the last moment. See the coding in walsender for example, it
goes to some lengths to avoid clearing the latch until it's very sure
there's no more work for it to do. That helps to keep the overhead in
backends committing transactions low. (no-one has tried to measure that
yet, though)


Understood, thanks.

Markus

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-06 Thread Heikki Linnakangas

On 03/09/10 21:50, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

On 03/09/10 21:16, Tom Lane wrote:

It's probably not too unreasonable to assume that pid_t assignment is
atomic.  But I'm still thinking that we have bigger problems than that
if there are really cases where SetLatch can execute at approximately
the same time as a latch owner is coming or going.



I don't see how to avoid it. A walsender, or any process really, can
exit at any time. It can make the latch inaccessible to others before it
exits to minimize the window, but it's always going to be possible that
another process is just about to call SetLatch when you exit.


Well, in that case what we need to do is presume that the latch object
has a continuing existence but the owner/receiver can come and go.
I would suggest that InitLatch needs to initialize the object into a
valid but unowned state; there is *no* deinitialize operation; and
there are AcquireLatch and ReleaseLatch operations to become owner
or stop being owner.


I think we have just a terminology issue. What you're describing is 
exactly how it works now, if you just s/InitLatch/AcquireLatch. At the 
moment there's no need for an initialization function other than the 
InitLatch/AcquireLatch that associates the latch with the current 
process. I can add one for the sake of future-proofing, and to have 
better-defined behavior for setting a latch that has not been owned by 
anyone yet, but it's not strictly necessary.



 We also need to define the semantics of SetLatch
on an unowned latch --- does this set a signal condition that will be
available to the next owner?


At the moment, no. Perhaps that would be useful, separating the Init and 
Acquire operations is needed to make that sane.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-06 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 03/09/10 21:50, Tom Lane wrote:
 Well, in that case what we need to do is presume that the latch object
 has a continuing existence but the owner/receiver can come and go.
 I would suggest that InitLatch needs to initialize the object into a
 valid but unowned state; there is *no* deinitialize operation; and
 there are AcquireLatch and ReleaseLatch operations to become owner
 or stop being owner.

 I think we have just a terminology issue. What you're describing is 
 exactly how it works now, if you just s/InitLatch/AcquireLatch.

No, it isn't.  What I'm suggesting requires breaking InitLatch into two
operations.

 We also need to define the semantics of SetLatch
 on an unowned latch --- does this set a signal condition that will be
 available to the next owner?

 At the moment, no. Perhaps that would be useful, separating the Init and 
 Acquire operations is needed to make that sane.

Exactly.  I'm not totally sure either if it would be useful, but the
current design makes it impossible to allow that.

BTW, on reflection the AcquireLatch/ReleaseLatch terminology seems a bit
ill chosen: ReleaseLatch sounds way too much like something that would
just unlock or clear the latch.  Perhaps OwnLatch/DisownLatch, or
something along that line.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-06 Thread Heikki Linnakangas

On 06/09/10 17:18, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

I think we have just a terminology issue. What you're describing is
exactly how it works now, if you just s/InitLatch/AcquireLatch.


No, it isn't.  What I'm suggesting requires breaking InitLatch into two
operations.


We also need to define the semantics of SetLatch
on an unowned latch --- does this set a signal condition that will be
available to the next owner?



At the moment, no. Perhaps that would be useful, separating the Init and
Acquire operations is needed to make that sane.


Exactly.  I'm not totally sure either if it would be useful, but the
current design makes it impossible to allow that.


Ok, I've split the Init and Acquire steps into two.


BTW, on reflection the AcquireLatch/ReleaseLatch terminology seems a bit
ill chosen: ReleaseLatch sounds way too much like something that would
just unlock or clear the latch.  Perhaps OwnLatch/DisownLatch, or
something along that line.


Yeah, I see what you mean. Although, maybe it's just me but Own/Disown 
looks ugly. Anyone have a better suggestion?


Here's an updated patch, with all the issues reported this far fixed, 
except for that naming issue, and Fujii's suggestion to use poll() 
instead of select() where available. I've also polished it quite a bit, 
improving comments etc. Magnus, can you take a look at the Windows 
implementation to check that it's sane? At least it seems to work.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/configure b/configure
index bd9b347..432cd58 100755
--- a/configure
+++ b/configure
@@ -27773,6 +27773,13 @@ _ACEOF
   SHMEM_IMPLEMENTATION=src/backend/port/win32_shmem.c
 fi
 
+# Select latch implementation type.
+if test $PORTNAME != win32; then
+  LATCH_IMPLEMENTATION=src/backend/port/unix_latch.c
+else
+  LATCH_IMPLEMENTATION=src/backend/port/win32_latch.c
+fi
+
 # If not set in template file, set bytes to use libc memset()
 if test x$MEMSET_LOOP_LIMIT = x ; then
   MEMSET_LOOP_LIMIT=1024
@@ -29098,7 +29105,7 @@ fi
 ac_config_files=$ac_config_files GNUmakefile src/Makefile.global
 
 
-ac_config_links=$ac_config_links src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION} src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION} src/include/dynloader.h:src/backend/port/dynloader/${template}.h src/include/pg_config_os.h:src/include/port/${template}.h src/Makefile.port:src/makefiles/Makefile.${template}
+ac_config_links=$ac_config_links src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION} src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION} src/backend/port/pg_latch.c:${LATCH_IMPLEMENTATION} src/include/dynloader.h:src/backend/port/dynloader/${template}.h src/include/pg_config_os.h:src/include/port/${template}.h src/Makefile.port:src/makefiles/Makefile.${template}
 
 
 if test $PORTNAME = win32; then
@@ -29722,6 +29729,7 @@ do
 src/backend/port/dynloader.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c ;;
 src/backend/port/pg_sema.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION} ;;
 src/backend/port/pg_shmem.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION} ;;
+src/backend/port/pg_latch.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/pg_latch.c:${LATCH_IMPLEMENTATION} ;;
 src/include/dynloader.h) CONFIG_LINKS=$CONFIG_LINKS src/include/dynloader.h:src/backend/port/dynloader/${template}.h ;;
 src/include/pg_config_os.h) CONFIG_LINKS=$CONFIG_LINKS src/include/pg_config_os.h:src/include/port/${template}.h ;;
 src/Makefile.port) CONFIG_LINKS=$CONFIG_LINKS src/Makefile.port:src/makefiles/Makefile.${template} ;;
diff --git a/configure.in b/configure.in
index 7b09986..7f84cea 100644
--- a/configure.in
+++ b/configure.in
@@ -1700,6 +1700,13 @@ else
   SHMEM_IMPLEMENTATION=src/backend/port/win32_shmem.c
 fi
 
+# Select latch implementation type.
+if test $PORTNAME != win32; then
+  LATCH_IMPLEMENTATION=src/backend/port/unix_latch.c
+else
+  LATCH_IMPLEMENTATION=src/backend/port/win32_latch.c
+fi
+
 # If not set in template file, set bytes to use libc memset()
 if test x$MEMSET_LOOP_LIMIT = x ; then
   MEMSET_LOOP_LIMIT=1024
@@ -1841,6 +1848,7 @@ AC_CONFIG_LINKS([
   src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c
   src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION}
   src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION}
+  src/backend/port/pg_latch.c:${LATCH_IMPLEMENTATION}
   src/include/dynloader.h:src/backend/port/dynloader/${template}.h
   src/include/pg_config_os.h:src/include/port/${template}.h
   src/Makefile.port:src/makefiles/Makefile.${template}
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 615a7fa..094d0c9 100644
--- 

Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-06 Thread Markus Wanner

Hi,

On 09/06/2010 06:27 PM, Heikki Linnakangas wrote:

Here's an updated patch, with all the issues reported this far fixed,
except for that naming issue, and Fujii's suggestion to use poll()
instead of select() where available. I've also polished it quite a bit,
improving comments etc. Magnus, can you take a look at the Windows
implementation to check that it's sane? At least it seems to work.


Is pselect() really as unportable as stated in the patch? What platforms 
have problems with pselect()?


Using the self-pipe trick, don't we risk running into the open file 
handles limitation? Or is it just two handles per process?


Do I understand correctly that the purpose of this patch is to work 
around the brokenness of select() on very few platforms? Or is there any 
additional feature that plain signals don't give us?



+ * It's important to reset the latch*before*  checking if there's work to
+ * do. Otherwise, if someone sets the latch between the check and the
+ * ResetLatch call, you will miss it and Wait will block.


Why doesn't WaitLatch() clear it? What's the use case for waiting for a 
latch and *not* wanting to reset it?


Regards

Markus

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-06 Thread Tom Lane
Markus Wanner mar...@bluegap.ch writes:
 Is pselect() really as unportable as stated in the patch? What platforms 
 have problems with pselect()?

Well, it's not defined in the Single Unix Spec, which is our customary
reference for portability.  Also, it's alleged that some platforms have
it but in a form that's not actually any safer than select().  For
example, I read in the Darwin man page for it

IMPLEMENTATION NOTES
 The pselect() function is implemented in the C library as a wrapper
 around select().

and that man page appears to be borrowed verbatim from FreeBSD.

 Using the self-pipe trick, don't we risk running into the open file 
 handles limitation? Or is it just two handles per process?

It's just two handles per process.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-06 Thread Markus Wanner

On 09/06/2010 08:46 PM, Tom Lane wrote:

Well, it's not defined in the Single Unix Spec, which is our customary
reference for portability.


FWIW, I bet the self-pipe trick isn't mentioned there, either... any 
good precedence that it actually works as expected on all of the target 
platforms? Existing users of the self-pipe trick?


(You are certainly aware that pselect() is defined in newer versions of 
POSIX).



Also, it's alleged that some platforms have
it but in a form that's not actually any safer than select().  For
example, I read in the Darwin man page for it

IMPLEMENTATION NOTES
  The pselect() function is implemented in the C library as a wrapper
  around select().


Ouch. Indeed, quick googling reveals the following source code for Darwin:

http://www.opensource.apple.com/source/Libc/Libc-391.5.22/gen/FreeBSD/pselect.c

Now that you are mentioning it, I seem to recall that even glibc had a 
user-space implementation of pselect. Fortunately, that's quite some 
years ago.



and that man page appears to be borrowed verbatim from FreeBSD.


At least FreeBSD seems to have fixed this about 8 months ago:
http://svn.freebsd.org/viewvc/base?view=revisionrevision=200725

Maybe Darwin catches up eventually?


AFAICT the custom select() implementation we are using for Windows could 
easily be changed to mimic pselect() instead. Thus most reasonably 
up-to-date Linux distributions plus Windows certainly provide a workable 
pselect() syscall. Would it be worth using pselect() for those (and 
maybe others that support pselect() appropriately)?



It's just two handles per process.


Good. How about syscall overhead? One more write operation to the 
self-pipe per signal from within the signal handler and one read to 
actually clear the 'ready' state of the pipe from the waiter portion of 
the code, right?


Do we plan to replace all (or most) existing internal signals with these 
latches to circumvent the interruption problem? Or just the ones we need 
to wait for using pg_usleep()?


For Postgres-R, I'd probably have to extend it to call select() not only 
on the self-pipe, but on at least one other socket as well (to talk to 
the GCS). As long as that's possible, it looks like a more portable 
replacement for the pselect() variant that's currently in place there.


Regards

Markus

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-06 Thread Tom Lane
Markus Wanner mar...@bluegap.ch writes:
 AFAICT the custom select() implementation we are using for Windows could 
 easily be changed to mimic pselect() instead. Thus most reasonably 
 up-to-date Linux distributions plus Windows certainly provide a workable 
 pselect() syscall. Would it be worth using pselect() for those (and 
 maybe others that support pselect() appropriately)?

I don't entirely see the point of opening ourselves up to the risk of
using a pselect that's not safe under the hood.  In any case, on most
modern platforms poll() is preferable to any variant of select().

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-05 Thread Simon Riggs
On Fri, 2010-09-03 at 18:24 -0400, Tom Lane wrote:
 Now the HS case likewise appears to be set up so that the signal can
 only directly interrupt ProcWaitForSignal, so I think the core issue
 is
 whether any deadlock situations are possible.  Given that this gets
 called from a low-level place like LockBufferForCleanup, I don't feel
 too comfortable about that.

LockBufferForCleanup is only ever called during recovery by
heap_xlog_clean() or btree_xlog_vacuum().

The actions taken to replay a WAL record are independent of all other
WAL records from a locking perspective, so replay of every WAL record
starts with no LWlocks held by startup process. LockBufferForCleanup is
taken early on in replay a heap or btree cleanup record and so we can
easily check that no other LWlocks are held while it is called.

   I certainly haven't seen any analysis or
 documentation of what locks can safely be held at that point.
 The deadlock checker only tries to take the LockMgr LWLocks, so
 extrapolating from whether it is safe to whether touching the
 ProcArrayLock is safe seems entirely unfounded.

So the startup process calls one LWlock, ProcArrayLock, and is not
holding any other LWlock when it does. The deadlock checker attempts to
get and hold all of the other lock partition locks. So deadlock checker
already does the thing you're saying might be dangerous and the startup
process doesn't.

The ProcArrayLock is only taken as a way of signaling other backends. If
that is particularly unsafe we could redesign that aspect.

 It might be worth pointing out here that LockBufferForCleanup is
 already
 known to be a risk factor for undetected deadlocks, even without HS in
 the picture, because of the possibility of deadlocks involving a chain
 of both heavyweight locks and LWLocks.  Whether HS makes it materially
 worse may be something that we need field experience to determine.

You may be right and that it will be a problem.

The deadlock risk we're protecting against is a deadlock involving both
normal locks and buffer pins. We're safer having it than not having this
code, IMHO.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Heikki Linnakangas

On 02/09/10 23:13, Tom Lane wrote:

The WaitLatch ...timeout API could use a bit of refinement.  I'd suggest
defining negative timeout as meaning wait forever, so that timeout = 0
can be used for check but don't wait.  Also, it seems like the
function shouldn't just return void but should return a bool to show
whether it saw the latch set or timed out.


In case of WaitLatchOrSocket, the caller might want to know if a latch 
was set, the socket became readable, or it timed out. So we need three 
different return values.


 (Yeah, I realize the caller
 could look into the latch to find that out, but callers really ought to
 treat latches as opaque structs.)

Hmm, maybe we need a TestLatch function to check if a latch is set.


I don't think you have the select-failed logic right in
WaitLatchOrSocket; on EINTR it will suppose that FD_ISSET is a valid
test to make, which I think ain't the case.  Just continue around
the loop.


Yep.

I also realized that the timeout handling is a bit surprising with 
interrupts. After EINTR we call select() again with the same timeout, so 
a signal effectively restarts the timer. We seem to have similar 
behavior in a couple of other places, in pgstat.c and auth.c. So maybe 
that's OK and just needs to be documented, but I thought I'd bring it up.



It seems like both implementations are #include'ing more than they
ought to --- why replication/walsender.h, in particular?


Windows implementation needs it for the max_wal_senders variable, to 
allocate enough shared Event objects in LatchShmemInit. In unix_latch.c 
it's not needed.



Also, using sig_atomic_t for owner_pid is entirely not sane.
On many platforms sig_atomic_t is only a byte, and besides
which you have no need for that field to be settable by a
signal handler.


Hmm, true, it doesn't need to be set from signal handler, but is there 
an atomicity problem if one process calls ReleaseLatch while another 
process is in SetLatch? ReleaseLatch sets owner_pid to 0, while SetLatch 
reads it and calls kill() on it. Can we assume that pid_t is atomic, or 
do we need a spinlock to protect it? (Windows implementation has a 
similar issue with HANDLE instead of pid_t)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 On Fri, Sep 3, 2010 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 elog(FATAL) is *certainly* not a better idea.  I think there's really
 nothing that can be done, you just have to silently ignore the error.

 Hmm.. some functions called by a signal handler use elog(FATAL), e.g.,
 RecoveryConflictInterrupt() do that when unknown conflict mode is given
 as an argument. Are these calls unsafe, too?

[ shrug... ]  I stated before that the Hot Standby patch is doing
utterly unsafe things in signal handlers.  Simon rejected that.
I am waiting for irrefutable evidence to emerge from the field
(and am very confident that it will be forthcoming...) before
I argue with him further.  Meanwhile, I'm not going to accept anything
unsafe in a core facility like this patch is going to be.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 02/09/10 23:13, Tom Lane wrote:
 (Yeah, I realize the caller
 could look into the latch to find that out, but callers really ought to
 treat latches as opaque structs.)

 Hmm, maybe we need a TestLatch function to check if a latch is set.

+1.  It could be a macro for now.  I wish that we could declare Latch as
an opaque struct, but we probably need to let callers embed it in a
larger struct, so we'll just have to rely on callers to code cleanly.

 I also realized that the timeout handling is a bit surprising with 
 interrupts. After EINTR we call select() again with the same timeout, so 
 a signal effectively restarts the timer.

Actually it's platform-dependent.  On some machines (I think
BSD-derived) the value of the timeout struct gets reduced by the elapsed
time before returning, so that if you just loop around you don't get
more than the originally specified delay time in total.

 We seem to have similar 
 behavior in a couple of other places, in pgstat.c and auth.c. So maybe 
 that's OK and just needs to be documented, but I thought I'd bring it up.

Yeah.  I am hoping that this facility will greatly reduce the need for
callers to care about the exact timeout delay, so I think that what we
should do for now is just document that the timeout might be several
times as long as specified, and see how it goes from there.

We could fix the problem if we had to, by adding some gettimeofday()
calls and computing the remaining delay time each time through the
loop.  But let's avoid doing that until it's clear we need it.

 Also, using sig_atomic_t for owner_pid is entirely not sane.

 Hmm, true, it doesn't need to be set from signal handler, but is there 
 an atomicity problem if one process calls ReleaseLatch while another 
 process is in SetLatch?

If there is *any* possibility of that happening then you have far worse
problems than whether the field is atomically readable or not: the
behavior will be unpredictable at just slightly larger timescales.
This is the reason why I think it'd be better if ReleaseLatch simply
didn't exist.  That'd discourage people from designing dynamic latch
structures, which are fundamentally going to be subject to race
conditions.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 On Fri, Sep 3, 2010 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 elog(FATAL) is *certainly* not a better idea.  I think there's really
 nothing that can be done, you just have to silently ignore the error.

 Hmm.. some functions called by a signal handler use elog(FATAL), e.g.,
 RecoveryConflictInterrupt() do that when unknown conflict mode is given
 as an argument. Are these calls unsafe, too?

 [ shrug... ]  I stated before that the Hot Standby patch is doing
 utterly unsafe things in signal handlers.  Simon rejected that.
 I am waiting for irrefutable evidence to emerge from the field
 (and am very confident that it will be forthcoming...) before
 I argue with him further.  Meanwhile, I'm not going to accept anything
 unsafe in a core facility like this patch is going to be.

Oh.  I thought you had ignored his objections and fixed it.  Why are
we releasing 9.0 with this problem again?  Surely this is nuts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  I stated before that the Hot Standby patch is doing
 utterly unsafe things in signal handlers.  Simon rejected that.
 I am waiting for irrefutable evidence to emerge from the field
 (and am very confident that it will be forthcoming...) before
 I argue with him further.  Meanwhile, I'm not going to accept anything
 unsafe in a core facility like this patch is going to be.

 Oh.  I thought you had ignored his objections and fixed it.  Why are
 we releasing 9.0 with this problem again?  Surely this is nuts.

My original review of hot standby found about half a dozen things
I thought were broken:
http://archives.postgresql.org/pgsql-hackers/2010-05/msg00178.php
After a *very* long-drawn-out fight I fixed one of them
(max_standby_delay), largely still over Simon's objections.  I don't
have the energy to repeat that another half-dozen times, so I'm going
to wait for the suspected problems to be proven by field experience.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 12:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  I stated before that the Hot Standby patch is doing
 utterly unsafe things in signal handlers.  Simon rejected that.
 I am waiting for irrefutable evidence to emerge from the field
 (and am very confident that it will be forthcoming...) before
 I argue with him further.  Meanwhile, I'm not going to accept anything
 unsafe in a core facility like this patch is going to be.

 Oh.  I thought you had ignored his objections and fixed it.  Why are
 we releasing 9.0 with this problem again?  Surely this is nuts.

 My original review of hot standby found about half a dozen things
 I thought were broken:
 http://archives.postgresql.org/pgsql-hackers/2010-05/msg00178.php
 After a *very* long-drawn-out fight I fixed one of them
 (max_standby_delay), largely still over Simon's objections.  I don't
 have the energy to repeat that another half-dozen times, so I'm going
 to wait for the suspected problems to be proven by field experience.

Bummer.  Allow me to cast a vote for doing something about the fact
that handle_standby_sig_alarm() thinks it can safely acquire an LWLock
in a signal handler.  I think we should be making our decisions on
what to change in the code based on what is technically sound, rather
than based on how much the author complains about changing it.  Of
course there may be cases where there is a legitimate difference of
opinion concerning the best way forward, but I don't believe this is
one of them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 17:51, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

On 02/09/10 23:13, Tom Lane wrote:

Also, using sig_atomic_t for owner_pid is entirely not sane.



Hmm, true, it doesn't need to be set from signal handler, but is there
an atomicity problem if one process calls ReleaseLatch while another
process is in SetLatch?


If there is *any* possibility of that happening then you have far worse
problems than whether the field is atomically readable or not: the
behavior will be unpredictable at just slightly larger timescales.
This is the reason why I think it'd be better if ReleaseLatch simply
didn't exist.  That'd discourage people from designing dynamic latch
structures, which are fundamentally going to be subject to race
conditions.


Each Walsender needs a latch, and walsenders come and go. I first 
experimented with had no ReleaseLatch function; instead any process 
could call WaitLatch on any shared latch, as long as only one process 
waits on a given latch at a time. But it had exactly the same problem, 
WaitLatch had to set the pid on the Latch struct to allow other 
processes to send the signal. Another process could call SetLatch and 
read the pid field, while WaitLatch is just setting it. I think we'll 
have to put a spinlock there, if we can't assume that assignment of 
pid_t is atomic. It's not the end of the world..


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 03/09/10 17:51, Tom Lane wrote:
 If there is *any* possibility of that happening then you have far worse
 problems than whether the field is atomically readable or not: the
 behavior will be unpredictable at just slightly larger timescales.

 Each Walsender needs a latch, and walsenders come and go.

Well, then we need to think extremely hard about the circumstances in
which we need to send a cross-process latch signal to walsenders and
what the behavior needs to be in the race conditions.

 WaitLatch had to set the pid on the Latch struct to allow other 
 processes to send the signal. Another process could call SetLatch and 
 read the pid field, while WaitLatch is just setting it. I think we'll 
 have to put a spinlock there, if we can't assume that assignment of 
 pid_t is atomic. It's not the end of the world..

Yes it is.  Signal handlers can't take spinlocks (what if they interrupt
while the mainline is holding the lock?).

It's probably not too unreasonable to assume that pid_t assignment is
atomic.  But I'm still thinking that we have bigger problems than that
if there are really cases where SetLatch can execute at approximately
the same time as a latch owner is coming or going.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 21:16, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

WaitLatch had to set the pid on the Latch struct to allow other
processes to send the signal. Another process could call SetLatch and
read the pid field, while WaitLatch is just setting it. I think we'll
have to put a spinlock there, if we can't assume that assignment of
pid_t is atomic. It's not the end of the world..


Yes it is.  Signal handlers can't take spinlocks (what if they interrupt
while the mainline is holding the lock?).


Ok, I see.


It's probably not too unreasonable to assume that pid_t assignment is
atomic.  But I'm still thinking that we have bigger problems than that
if there are really cases where SetLatch can execute at approximately
the same time as a latch owner is coming or going.


I don't see how to avoid it. A walsender, or any process really, can 
exit at any time. It can make the latch inaccessible to others before it 
exits to minimize the window, but it's always going to be possible that 
another process is just about to call SetLatch when you exit.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 03/09/10 21:16, Tom Lane wrote:
 It's probably not too unreasonable to assume that pid_t assignment is
 atomic.  But I'm still thinking that we have bigger problems than that
 if there are really cases where SetLatch can execute at approximately
 the same time as a latch owner is coming or going.

 I don't see how to avoid it. A walsender, or any process really, can 
 exit at any time. It can make the latch inaccessible to others before it 
 exits to minimize the window, but it's always going to be possible that 
 another process is just about to call SetLatch when you exit.

Well, in that case what we need to do is presume that the latch object
has a continuing existence but the owner/receiver can come and go.
I would suggest that InitLatch needs to initialize the object into a
valid but unowned state; there is *no* deinitialize operation; and
there are AcquireLatch and ReleaseLatch operations to become owner
or stop being owner.  We also need to define the semantics of SetLatch
on an unowned latch --- does this set a signal condition that will be
available to the next owner?

This amount of complexity might be overkill for local latches, but I
think we need it for shared ones.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Ron Mayer
Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  I stated before that the Hot Standby patch is doing
 utterly unsafe things in signal handlers.  Simon rejected that.
 I am waiting for irrefutable evidence to emerge from the field
 (and am very confident that it will be forthcoming...) [...]
 
 [...]Why are
 we releasing 9.0 with this problem again?  Surely this is nuts.

Will the docs give enough info so that release note readers
will know when they're giving well-informed consent to volunteer
to produce such field evidence?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 3:11 PM, Ron Mayer rm...@cheapcomplexdevices.com wrote:
 Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  I stated before that the Hot Standby patch is doing
 utterly unsafe things in signal handlers.  Simon rejected that.
 I am waiting for irrefutable evidence to emerge from the field
 (and am very confident that it will be forthcoming...) [...]

 [...]Why are
 we releasing 9.0 with this problem again?  Surely this is nuts.

 Will the docs give enough info so that release note readers
 will know when they're giving well-informed consent to volunteer
 to produce such field evidence?

Yeah, exactly.  Good news: you can now run queries on the standby.
Bad news: we've abandoned our policy of not releasing with known bugs.
 Have fun and enjoy using PostgreSQL, the world's most advanced open
source databSegmentation fault

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 19:38, Robert Haas wrote:

On Fri, Sep 3, 2010 at 12:10 PM, Tom Lanet...@sss.pgh.pa.us  wrote:

Robert Haasrobertmh...@gmail.com  writes:

On Fri, Sep 3, 2010 at 10:07 AM, Tom Lanet...@sss.pgh.pa.us  wrote:

[ shrug... ]  I stated before that the Hot Standby patch is doing
utterly unsafe things in signal handlers.  Simon rejected that.
I am waiting for irrefutable evidence to emerge from the field
(and am very confident that it will be forthcoming...) before
I argue with him further.  Meanwhile, I'm not going to accept anything
unsafe in a core facility like this patch is going to be.



Oh.  I thought you had ignored his objections and fixed it.  Why are
we releasing 9.0 with this problem again?  Surely this is nuts.


My original review of hot standby found about half a dozen things
I thought were broken:
http://archives.postgresql.org/pgsql-hackers/2010-05/msg00178.php
After a *very* long-drawn-out fight I fixed one of them
(max_standby_delay), largely still over Simon's objections.  I don't
have the energy to repeat that another half-dozen times, so I'm going
to wait for the suspected problems to be proven by field experience.


Bummer.  Allow me to cast a vote for doing something about the fact
that handle_standby_sig_alarm() thinks it can safely acquire an LWLock
in a signal handler.  I think we should be making our decisions on
what to change in the code based on what is technically sound, rather
than based on how much the author complains about changing it.  Of
course there may be cases where there is a legitimate difference of
opinion concerning the best way forward, but I don't believe this is
one of them.


Hmm, just to make the risk more concrete, here's one scenario that could 
happen:


1. Startup process tries to acquire cleanup lock on a page. It's pinned, 
so it has to wait, and calls ResolveRecoveryConflictWithBufferPin().
2. ResolveRecoveryConflictWithBufferPin enables the standby SIGALRM 
handler by calling enable_standby_sig_alarm(), and calls 
ProcWaitForSignal().


3. ProcWaitForSignal() calls semop() (assuming sysv semaphores here) to 
wait for the process semaphore


4. Max standby delay is reached and SIGALRM fired. CheckStandbyTimeout() 
is called in signal handler. CheckStandbyTimeout() calls 
SendRecoveryConflictWithBufferPin(), which calls CancelDBBackends()


5. CancelDBBackends() tries to acquire ProcArrayLock in exclusive mode. 
It's being held by another process, so we have to sleep


6. To sleep, LWLockAcquire calls PGSemaphoreLock, which calls semop() to 
wait on on the process semaphore.


So we now have the same process nested twice inside a semop() call. 
Looking at the Linux signal (7) man page, it is undefined what happens 
if semop() is re-entered in a signal handler while another semop() call 
is happening in main line of execution. Assuming it somehow works, which 
semop() call is going to return when the semaphore is incremented?


Maybe that's ok, if I'm reading the deadlock checker code correctly, it 
also calls semop() to increment the another process' semaphore, and the 
deadlock checker can be invoked from a signal handler while in semop() 
to wait on our process' semaphore. BTW, sem_post(), which is the Posix 
function for incrementing a semaphore, is listed as a safe function to 
call in a signal handler. But it's certainly fishy.


A safer approach would be to just PGSemaphoreUnlock() in the signal 
handler, and do all the other processing outside it. You'd still call 
semop() within semop(), but at least it would be closer to the semop() 
within semop() we already do in the deadlock checker. And there would be 
less randomness from timing and lock contention involved, making it 
easier to test the behavior on various platforms.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 4:20 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Maybe that's ok, if I'm reading the deadlock checker code correctly, it also
 calls semop() to increment the another process' semaphore, and the deadlock
 checker can be invoked from a signal handler while in semop() to wait on our
 process' semaphore. BTW, sem_post(), which is the Posix function for
 incrementing a semaphore, is listed as a safe function to call in a signal
 handler. But it's certainly fishy.

Color me confused; I may need to backpedal rapidly here.  I had
thought that what Tom was complaining about was the fact that the
signal handler was taking LWLocks, which I would have thought to be
totally unsafe.  But it seems the deadlock detector does the same
thing, more or less because the signal handlers are set up so that
they don't do anything unless we're within a limited section of code
where nothing too interesting can happen.  I'm not too sure why we
think that it's safe to invoke the deadlock detector that way, but
it's also not too clear to me why this case is any worse.

It furthermore appears that Simon's reply to Tom's complaint about
this function was: This was modelled very closely on
handle_sig_alarm() and was reviewed by other hackers. I'm not great on
that, as you know, so if you can explain what it is I can't do, and
how that differs from handle_sig_alarm running the deadlock detector
in the same way, then I'll work on it some more.

I guess I need the same explanation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 A safer approach would be to just PGSemaphoreUnlock() in the signal 
 handler, and do all the other processing outside it.

I don't see any particularly good reason to assume that
PGSemaphoreUnlock is safe either: you're still talking about nested
semop operations.

The pre-existing SIGALRM handler uses a self-signal (kill(MyProcPid,
SIGINT)) to kick the process off any wait it might be doing.  I'd rather
do something like that.

Or maybe the work you're doing on latches would help ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Color me confused; I may need to backpedal rapidly here.  I had
 thought that what Tom was complaining about was the fact that the
 signal handler was taking LWLocks, which I would have thought to be
 totally unsafe.

Well, it's unsafe if the signal could interrupt mainline code that is
trying to take an LWLock or already holds the same LWLock --- or, if you
consider deadlock risks against other processes, already holding other
LWLocks might be problematic too.  And trying to use facilities like
elog or palloc is also pretty unsafe if the mainline is too.

The reason the deadlock checker is okay is that there is only a *very*
narrow range of mainline code that it could possibly be interrupting,
basically nothing but the PGSemaphoreLock() call in ProcSleep.
(There's a lot of other code inside the loop there, but notice that it's
protected by tests that ensure that it won't run unless the deadlock
checker already did.)

One salient thing about ProcSleep is that you shouldn't call it while
holding any LWLocks, and another is that if you're sleeping while
holding regular heavyweight locks, the deadlock checker is exactly what
will get you out of trouble if there's a deadlock.

Now the HS case likewise appears to be set up so that the signal can
only directly interrupt ProcWaitForSignal, so I think the core issue is
whether any deadlock situations are possible.  Given that this gets
called from a low-level place like LockBufferForCleanup, I don't feel
too comfortable about that.  I certainly haven't seen any analysis or
documentation of what locks can safely be held at that point.
The deadlock checker only tries to take the LockMgr LWLocks, so
extrapolating from whether it is safe to whether touching the
ProcArrayLock is safe seems entirely unfounded.

It might be worth pointing out here that LockBufferForCleanup is already
known to be a risk factor for undetected deadlocks, even without HS in
the picture, because of the possibility of deadlocks involving a chain
of both heavyweight locks and LWLocks.  Whether HS makes it materially
worse may be something that we need field experience to determine.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 6:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Now the HS case likewise appears to be set up so that the signal can
 only directly interrupt ProcWaitForSignal, so I think the core issue is
 whether any deadlock situations are possible.  Given that this gets
 called from a low-level place like LockBufferForCleanup, I don't feel
 too comfortable about that.  I certainly haven't seen any analysis or
 documentation of what locks can safely be held at that point.
 The deadlock checker only tries to take the LockMgr LWLocks, so
 extrapolating from whether it is safe to whether touching the
 ProcArrayLock is safe seems entirely unfounded.

 It might be worth pointing out here that LockBufferForCleanup is already
 known to be a risk factor for undetected deadlocks, even without HS in
 the picture, because of the possibility of deadlocks involving a chain
 of both heavyweight locks and LWLocks.  Whether HS makes it materially
 worse may be something that we need field experience to determine.

OK, well, that does make it more clear what you're worried about, but
it seems that moving the work out of the signal handler isn't going to
solve this problem.  The core issue appears to be whether the caller
might be holding a lock which another process might attempt to take
while already holding ProcArrayLock, or (in the case of a three or
more way deadlock) whether the caller might be holding a lock that
some other process could try to acquire after seizing ProcArrayLock.
As far as I can tell from looking through the code, the only locks we
ever acquire while holding ProcArrayLock are XidGenLock and
known_assigned_xids_lck (which is a spin lock).  The latter is never
held more than VERY briefly.  XidGenLock is generally also held only
briefly, not acquiring any other locks in the meantime, but
GetNewTransactionId() is not obviously (to me) safe.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-02 Thread Heikki Linnakangas

On 02/09/10 06:46, Fujii Masao wrote:

On Wed, Sep 1, 2010 at 4:11 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

The obvious next question is how to wait for multiple sockets and a latch at
the same time? Perhaps we should have a select()-like interface where you
can pass multiple file descriptors. Then again, looking at the current
callers of select() in the backend, apart from postmaster they all wait for
only one fd.


Currently backends have not waited for multiple sockets, so I don't think that
interface is required for now. Similarly, we don't need to wait for the socket
to be ready to *write* because there is no use case for now.


Ok, here's an updated patch with WaitLatchOrSocket that let's you do that.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/configure b/configure
index bd9b347..432cd58 100755
--- a/configure
+++ b/configure
@@ -27773,6 +27773,13 @@ _ACEOF
   SHMEM_IMPLEMENTATION=src/backend/port/win32_shmem.c
 fi
 
+# Select latch implementation type.
+if test $PORTNAME != win32; then
+  LATCH_IMPLEMENTATION=src/backend/port/unix_latch.c
+else
+  LATCH_IMPLEMENTATION=src/backend/port/win32_latch.c
+fi
+
 # If not set in template file, set bytes to use libc memset()
 if test x$MEMSET_LOOP_LIMIT = x ; then
   MEMSET_LOOP_LIMIT=1024
@@ -29098,7 +29105,7 @@ fi
 ac_config_files=$ac_config_files GNUmakefile src/Makefile.global
 
 
-ac_config_links=$ac_config_links src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION} src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION} src/include/dynloader.h:src/backend/port/dynloader/${template}.h src/include/pg_config_os.h:src/include/port/${template}.h src/Makefile.port:src/makefiles/Makefile.${template}
+ac_config_links=$ac_config_links src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION} src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION} src/backend/port/pg_latch.c:${LATCH_IMPLEMENTATION} src/include/dynloader.h:src/backend/port/dynloader/${template}.h src/include/pg_config_os.h:src/include/port/${template}.h src/Makefile.port:src/makefiles/Makefile.${template}
 
 
 if test $PORTNAME = win32; then
@@ -29722,6 +29729,7 @@ do
 src/backend/port/dynloader.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c ;;
 src/backend/port/pg_sema.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION} ;;
 src/backend/port/pg_shmem.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION} ;;
+src/backend/port/pg_latch.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/pg_latch.c:${LATCH_IMPLEMENTATION} ;;
 src/include/dynloader.h) CONFIG_LINKS=$CONFIG_LINKS src/include/dynloader.h:src/backend/port/dynloader/${template}.h ;;
 src/include/pg_config_os.h) CONFIG_LINKS=$CONFIG_LINKS src/include/pg_config_os.h:src/include/port/${template}.h ;;
 src/Makefile.port) CONFIG_LINKS=$CONFIG_LINKS src/Makefile.port:src/makefiles/Makefile.${template} ;;
diff --git a/configure.in b/configure.in
index 7b09986..7f84cea 100644
--- a/configure.in
+++ b/configure.in
@@ -1700,6 +1700,13 @@ else
   SHMEM_IMPLEMENTATION=src/backend/port/win32_shmem.c
 fi
 
+# Select latch implementation type.
+if test $PORTNAME != win32; then
+  LATCH_IMPLEMENTATION=src/backend/port/unix_latch.c
+else
+  LATCH_IMPLEMENTATION=src/backend/port/win32_latch.c
+fi
+
 # If not set in template file, set bytes to use libc memset()
 if test x$MEMSET_LOOP_LIMIT = x ; then
   MEMSET_LOOP_LIMIT=1024
@@ -1841,6 +1848,7 @@ AC_CONFIG_LINKS([
   src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c
   src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION}
   src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION}
+  src/backend/port/pg_latch.c:${LATCH_IMPLEMENTATION}
   src/include/dynloader.h:src/backend/port/dynloader/${template}.h
   src/include/pg_config_os.h:src/include/port/${template}.h
   src/Makefile.port:src/makefiles/Makefile.${template}
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 615a7fa..094d0c9 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -55,6 +55,7 @@
 #include miscadmin.h
 #include pg_trace.h
 #include pgstat.h
+#include replication/walsender.h
 #include storage/fd.h
 #include storage/procarray.h
 #include storage/sinvaladt.h
@@ -1025,6 +1026,13 @@ EndPrepare(GlobalTransaction gxact)
 
 	/* If we crash now, we have prepared: WAL replay will fix things */
 
+	/*
+	 * Wake up all walsenders to send WAL up to the PREPARE record
+	 * immediately if replication is enabled
+	 */
+	if (max_wal_senders  0)
+		WalSndWakeup();
+
 	/* write correct CRC and close file */
 	if ((write(fd, statefile_crc, sizeof(pg_crc32))) != sizeof(pg_crc32))
 	{
@@ -2005,6 +2013,13 @@ 

Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-02 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Ok, here's an updated patch with WaitLatchOrSocket that let's you do that.

Minor code review items:

s/There is three/There are three/ in unix_latch.c header comment

The header comment points out the correct usage of ResetLatch, but I
think it should also emphasize that the correct usage of SetLatch is
to set whatever flags indicate work-to-do before SetLatch.

I don't care for the Asserts in InitLatch and InitSharedLatch.
Initialization functions ought not assume that the struct they are
told to initialize contains anything but garbage.  And *especially*
not assume that without documentation.

s/inter-proess/inter-process/, at least 2 places

Does ReleaseLatch() have any actual use-case, and if so what would it be?
I think we could do without it.

The WaitLatch timeout API could use a bit of refinement.  I'd suggest
defining negative timeout as meaning wait forever, so that timeout = 0
can be used for check but don't wait.  Also, it seems like the
function shouldn't just return void but should return a bool to show
whether it saw the latch set or timed out.  (Yeah, I realize the caller
could look into the latch to find that out, but callers really ought to
treat latches as opaque structs.)

I don't think you have the select-failed logic right in
WaitLatchOrSocket; on EINTR it will suppose that FD_ISSET is a valid
test to make, which I think ain't the case.  Just continue around
the loop.

Comment for unix_latch's latch_sigusr1_handler refers to SetEvent,
which is a Windows-ism.

+* XXX: Is it safe to elog(ERROR) in a signal handler?

No, it isn't.

It seems like both implementations are #include'ing more than they
ought to --- why replication/walsender.h, in particular?
I don't think unix_latch needs spin.h either.

+typedef struct
+{
+   volatile sig_atomic_t   is_set;
+   volatile sig_atomic_t   owner_pid;
+} Latch;

I don't believe it is either sane or portable to declare struct fields
as volatile.  You need to attach the volatile qualifier to the function
arguments instead, eg

extern WaitLatch(volatile Latch *latch, ...)

Also, using sig_atomic_t for owner_pid is entirely not sane.
On many platforms sig_atomic_t is only a byte, and besides
which you have no need for that field to be settable by a
signal handler.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-02 Thread Fujii Masao
On Fri, Sep 3, 2010 at 5:13 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Does ReleaseLatch() have any actual use-case, and if so what would it be?
 I think we could do without it.

In Unix, probably we can live without that. But in Windows, we need to
free SharedEventHandles slot for upcoming process using a latch when
ending.

 The WaitLatch timeout API could use a bit of refinement.  I'd suggest
 defining negative timeout as meaning wait forever, so that timeout = 0
 can be used for check but don't wait.  Also, it seems like the
 function shouldn't just return void but should return a bool to show
 whether it saw the latch set or timed out.  (Yeah, I realize the caller
 could look into the latch to find that out, but callers really ought to
 treat latches as opaque structs.)

Agreed.

 I don't think you have the select-failed logic right in
 WaitLatchOrSocket; on EINTR it will suppose that FD_ISSET is a valid
 test to make, which I think ain't the case.  Just continue around
 the loop.

EINTR already makes us go back to the top of the loop since FD_ISSET(pipe)
is not checked. Then we would clear the pipe and break out of the loop
because of latch-is_set == true.

 +                        * XXX: Is it safe to elog(ERROR) in a signal handler?

 No, it isn't.

We should use elog(FATAL) or check proc_exit_inprogress, instead?

+   if (errno != EAGAIN  errno != EWOULDBLOCK)
+   {
+   /*
+* XXX: Is it safe to elog(ERROR) in a signal handler?
+*/
+   elog(ERROR, write() on self-pipe failed: %m);
+   }
+   if (errno == EINTR)
+   goto retry;

errno == EINTR) seems to be never checked.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-02 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 +   * XXX: Is it safe to elog(ERROR) in a signal handler?
 
 No, it isn't.

 We should use elog(FATAL) or check proc_exit_inprogress, instead?

elog(FATAL) is *certainly* not a better idea.  I think there's really
nothing that can be done, you just have to silently ignore the error.
BTW, if we retry, there had probably better be a limit on how many times
to retry ...

 + if (errno != EAGAIN  errno != EWOULDBLOCK)
 + {
 + /*
 +  * XXX: Is it safe to elog(ERROR) in a signal handler?
 +  */
 + elog(ERROR, write() on self-pipe failed: %m);
 + }
 + if (errno == EINTR)
 + goto retry;

 errno == EINTR) seems to be never checked.

Another issue with coding like that is that it supposes elog() won't
change errno.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-02 Thread Fujii Masao
On Fri, Sep 3, 2010 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 +               * XXX: Is it safe to elog(ERROR) in a signal handler?

 No, it isn't.

 We should use elog(FATAL) or check proc_exit_inprogress, instead?

 elog(FATAL) is *certainly* not a better idea.  I think there's really
 nothing that can be done, you just have to silently ignore the error.

Hmm.. some functions called by a signal handler use elog(FATAL), e.g.,
RecoveryConflictInterrupt() do that when unknown conflict mode is given
as an argument. Are these calls unsafe, too?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-01 Thread Heikki Linnakangas

On 31/08/10 15:47, Fujii Masao wrote:

On Fri, Aug 27, 2010 at 4:39 PM, Fujii Masaomasao.fu...@gmail.com  wrote:

/*
  * XXX: Should we invent an API to wait for data coming from the
  * client connection too? It's not critical, but we could then
  * eliminate the timeout altogether and go to sleep for good.
  */


Yes, it would be very helpful when walsender waits for the ACK from
the standby in upcoming synchronous replication.


I propose to change WaitLatch() so that it accepts the socket
descriptor as an argument, to wait for data coming from the
client connection.


Yeah, we probably should do that now.


 WaitLatch() monitors not only the self-pipe
but also the given socket. If -1 is supplied, it checks only
the self-pipe.


The obvious next question is how to wait for multiple sockets and a 
latch at the same time? Perhaps we should have a select()-like interface 
where you can pass multiple file descriptors. Then again, looking at the 
current callers of select() in the backend, apart from postmaster they 
all wait for only one fd.



The socket should be monitored by using poll() if the platform
has it, since poll() is usually more efficient.


Yeah, I guess.


So I'd like to change Unix implementation of WaitLatch() as
follows. Thought?

---
define WaitLatch(latch, timeout) WaitLatchAndSocket(latch, -1, timeout)

void WaitLatchAndSocket(Latch *latch, int sock, long timeout);
{
 ...

 FD_SET(selfpipe_readfd,input_mask);
 if (sock != -1)
 FD_SET(sock,input_mask);

#ifdef HAVE_POLL
 poll(...)
#else
 select(...)
  #endif   /* HAVE_POLL */

 ...
}
---


Yep.


Windows implementation of WaitLatchAndSocket() seems not to be
so simple. We would need to wait for both the latch event and
the packet from the socket by using WaitForMultipleObjectsEx().


Well, we already use WaitForMultipleObjectsEx() to implement select() on 
Windows, so it should be straightforward to copy that. I'll look into that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-01 Thread Fujii Masao
On Wed, Sep 1, 2010 at 4:11 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 The obvious next question is how to wait for multiple sockets and a latch at
 the same time? Perhaps we should have a select()-like interface where you
 can pass multiple file descriptors. Then again, looking at the current
 callers of select() in the backend, apart from postmaster they all wait for
 only one fd.

Currently backends have not waited for multiple sockets, so I don't think that
interface is required for now. Similarly, we don't need to wait for the socket
to be ready to *write* because there is no use case for now.

 Windows implementation of WaitLatchAndSocket() seems not to be
 so simple. We would need to wait for both the latch event and
 the packet from the socket by using WaitForMultipleObjectsEx().

 Well, we already use WaitForMultipleObjectsEx() to implement select() on
 Windows, so it should be straightforward to copy that. I'll look into that.

Agreed.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-31 Thread Heikki Linnakangas
Here's a 2nd version of the latch patch. Now with a Windows 
implementation. Comments welcome.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/configure b/configure
index bd9b347..432cd58 100755
--- a/configure
+++ b/configure
@@ -27773,6 +27773,13 @@ _ACEOF
   SHMEM_IMPLEMENTATION=src/backend/port/win32_shmem.c
 fi
 
+# Select latch implementation type.
+if test $PORTNAME != win32; then
+  LATCH_IMPLEMENTATION=src/backend/port/unix_latch.c
+else
+  LATCH_IMPLEMENTATION=src/backend/port/win32_latch.c
+fi
+
 # If not set in template file, set bytes to use libc memset()
 if test x$MEMSET_LOOP_LIMIT = x ; then
   MEMSET_LOOP_LIMIT=1024
@@ -29098,7 +29105,7 @@ fi
 ac_config_files=$ac_config_files GNUmakefile src/Makefile.global
 
 
-ac_config_links=$ac_config_links src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION} src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION} src/include/dynloader.h:src/backend/port/dynloader/${template}.h src/include/pg_config_os.h:src/include/port/${template}.h src/Makefile.port:src/makefiles/Makefile.${template}
+ac_config_links=$ac_config_links src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION} src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION} src/backend/port/pg_latch.c:${LATCH_IMPLEMENTATION} src/include/dynloader.h:src/backend/port/dynloader/${template}.h src/include/pg_config_os.h:src/include/port/${template}.h src/Makefile.port:src/makefiles/Makefile.${template}
 
 
 if test $PORTNAME = win32; then
@@ -29722,6 +29729,7 @@ do
 src/backend/port/dynloader.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c ;;
 src/backend/port/pg_sema.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION} ;;
 src/backend/port/pg_shmem.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION} ;;
+src/backend/port/pg_latch.c) CONFIG_LINKS=$CONFIG_LINKS src/backend/port/pg_latch.c:${LATCH_IMPLEMENTATION} ;;
 src/include/dynloader.h) CONFIG_LINKS=$CONFIG_LINKS src/include/dynloader.h:src/backend/port/dynloader/${template}.h ;;
 src/include/pg_config_os.h) CONFIG_LINKS=$CONFIG_LINKS src/include/pg_config_os.h:src/include/port/${template}.h ;;
 src/Makefile.port) CONFIG_LINKS=$CONFIG_LINKS src/Makefile.port:src/makefiles/Makefile.${template} ;;
diff --git a/configure.in b/configure.in
index 7b09986..7f84cea 100644
--- a/configure.in
+++ b/configure.in
@@ -1700,6 +1700,13 @@ else
   SHMEM_IMPLEMENTATION=src/backend/port/win32_shmem.c
 fi
 
+# Select latch implementation type.
+if test $PORTNAME != win32; then
+  LATCH_IMPLEMENTATION=src/backend/port/unix_latch.c
+else
+  LATCH_IMPLEMENTATION=src/backend/port/win32_latch.c
+fi
+
 # If not set in template file, set bytes to use libc memset()
 if test x$MEMSET_LOOP_LIMIT = x ; then
   MEMSET_LOOP_LIMIT=1024
@@ -1841,6 +1848,7 @@ AC_CONFIG_LINKS([
   src/backend/port/dynloader.c:src/backend/port/dynloader/${template}.c
   src/backend/port/pg_sema.c:${SEMA_IMPLEMENTATION}
   src/backend/port/pg_shmem.c:${SHMEM_IMPLEMENTATION}
+  src/backend/port/pg_latch.c:${LATCH_IMPLEMENTATION}
   src/include/dynloader.h:src/backend/port/dynloader/${template}.h
   src/include/pg_config_os.h:src/include/port/${template}.h
   src/Makefile.port:src/makefiles/Makefile.${template}
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 615a7fa..094d0c9 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -55,6 +55,7 @@
 #include miscadmin.h
 #include pg_trace.h
 #include pgstat.h
+#include replication/walsender.h
 #include storage/fd.h
 #include storage/procarray.h
 #include storage/sinvaladt.h
@@ -1025,6 +1026,13 @@ EndPrepare(GlobalTransaction gxact)
 
 	/* If we crash now, we have prepared: WAL replay will fix things */
 
+	/*
+	 * Wake up all walsenders to send WAL up to the PREPARE record
+	 * immediately if replication is enabled
+	 */
+	if (max_wal_senders  0)
+		WalSndWakeup();
+
 	/* write correct CRC and close file */
 	if ((write(fd, statefile_crc, sizeof(pg_crc32))) != sizeof(pg_crc32))
 	{
@@ -2005,6 +2013,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	/* Flush XLOG to disk */
 	XLogFlush(recptr);
 
+	/*
+	 * Wake up all walsenders to send WAL up to the COMMIT PREPARED record
+	 * immediately if replication is enabled
+	 */
+	if (max_wal_senders  0)
+		WalSndWakeup();
+
 	/* Mark the transaction committed in pg_clog */
 	TransactionIdCommitTree(xid, nchildren, children);
 
@@ -2078,6 +2093,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	XLogFlush(recptr);
 
 	/*
+	 * Wake up all walsenders to send WAL up to the ABORT PREPARED record
+	 * immediately if replication is enabled
+	 */
+	if (max_wal_senders  0)
+		WalSndWakeup();
+
+	/*
 	 * Mark the transaction 

Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-31 Thread Fujii Masao
On Tue, Aug 31, 2010 at 4:06 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Here's a 2nd version of the latch patch. Now with a Windows
 implementation. Comments welcome.

Seems good.

Two minor comments:

 rc = WaitForSingleObject(latch-event, timeout / 1000);
 if (rc == WAIT_FAILED)
 {
   ereport(ERROR,
   (errcode_for_socket_access(),
errmsg(WaitForSingleObject() failed: error code %d, (int) 
 GetLastError(;
 }
 if (rc == WAIT_TIMEOUT)
   break;  /* timeout exceeded */

We should also check rc == WAIT_OBJECT_0?

 static volatile HANDLE waitingEvent = false;

s/false/NULL?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-31 Thread Fujii Masao
On Fri, Aug 27, 2010 at 4:39 PM, Fujii Masao masao.fu...@gmail.com wrote:
 /*
  * XXX: Should we invent an API to wait for data coming from the
  * client connection too? It's not critical, but we could then
  * eliminate the timeout altogether and go to sleep for good.
  */

 Yes, it would be very helpful when walsender waits for the ACK from
 the standby in upcoming synchronous replication.

I propose to change WaitLatch() so that it accepts the socket
descriptor as an argument, to wait for data coming from the
client connection. WaitLatch() monitors not only the self-pipe
but also the given socket. If -1 is supplied, it checks only
the self-pipe.

The socket should be monitored by using poll() if the platform
has it, since poll() is usually more efficient.

So I'd like to change Unix implementation of WaitLatch() as
follows. Thought?

---
define WaitLatch(latch, timeout) WaitLatchAndSocket(latch, -1, timeout)

void WaitLatchAndSocket(Latch *latch, int sock, long timeout);
{
...

FD_SET(selfpipe_readfd, input_mask);
if (sock != -1)
FD_SET(sock, input_mask);

#ifdef HAVE_POLL
poll(...)
#else
select(...)
 #endif   /* HAVE_POLL */

...
}
---

Windows implementation of WaitLatchAndSocket() seems not to be
so simple. We would need to wait for both the latch event and
the packet from the socket by using WaitForMultipleObjectsEx().

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-31 Thread Bruce Momjian
Tom Lane wrote:
 Greg Smith g...@2ndquadrant.com writes:
  Tom Lane wrote:
  Well, yes they are.  They cause unnecessary process wakeups and thereby
  consume cycles even when the database is idle.  See for example a
  longstanding complaint here:
  https://bugzilla.redhat.com/show_bug.cgi?id=252129
 
  ...  The only clear case where this is 
  always a win is when the system it totally idle.
 
 If you'll climb down off that horse for a moment: yeah, the idle case is
 *exactly* what they're complaining about.  In particular, the complaint
 is that it's unreasonable to have Postgres running on a machine at all
 unless it's actively being used, because it forces significant CPU power
 drain anyway.  That gets in the way of our plan for world domination,
 no?  If you can't have a PG sitting unobtrusively in the background,
 waiting for you to have a need for it, it won't get installed in the
 first place.  People will pick mysql, or something else with a smaller
 footprint, to put on their laptops, and then we lose some more mindshare.
 I see this as just another facet of the argument about whether it's okay
 to have default settings that try to take over the entire machine.

FYI, last week I was running PG 8.4.X with default settings on a T43
laptop running XP with 1 Gig of RAM and it caused a racing game I was
playing to run jerkily.  When I shut down the Postgres server, the
problem was fixed.  I was kind of surprised that an idle PG server could
cause that.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-28 Thread Greg Smith

Tom Lane wrote:

Greg Smith g...@2ndquadrant.com writes:
  
...  The only clear case where this is 
always a win is when the system it totally idle.


If you'll climb down off that horse for a moment: yeah, the idle case is
*exactly* what they're complaining about.


I wasn't on a horse here--I saw the specific case they must be most 
annoyed with.  If it's possible to turn this around into a push model 
instead of how it works now, without tanking so that performance (and 
maybe even power use!) suffers for the worst or even average case, that 
refactoring could end up outperforming the current model.  I'd like the 
background writer to never go to sleep at all really if enough works 
come in to keep it always busy, and I think that might fall out nicely 
as a result of aiming for the other end--making it sleeper deeper when 
it's not needed.


Just pointing out the very specific place where I suspect that will go 
wrong if it's not done carefully is the fsync absorption, because it's 
already broken enough that we're reworking it here.  PostgreSQL already 
ship a bad enough default configuration to decide yet another spot 
should be yielded to somebody else's priorities, unless that actually 
meets performance goals.  I think I can quantify what those should be 
with a test case for this part.



I see this as just another facet of the argument about whether it's okay
to have default settings that try to take over the entire machine.
  


Mostly agreed here.  So long as the result is automatic enough to not 
introduce yet another GUC set to the wrong thing for busy systems by 
default, this could turn out great.  The list of things you must change 
to get the database to work right for serious work is already far too 
long, and I dread the thought of putting yet another on there just for 
the sake of lower power use.  Another part of the plan for world 
domination is that Oracle DBAs install the database for a test and say 
wow, that wasn't nearly as complicated as I'm used to and it performs 
well, that was nice.  That those people matter too is all I'm saying.  
I could easily file a bug from their perspective saying Background 
writer is a lazy SOB in default config that would be no less valid than 
the one being discussed here.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-27 Thread Fujii Masao
On Thu, Aug 26, 2010 at 7:40 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Here's a first attempt at implementing that. To demonstrate how it works, I
 modified walsender to use the new latch facility, also to respond quickly to
 SIGHUP and SIGTERM.

Great!

 There's two kinds of latches, local and global. Local latches can only be
 set from the same process - allowing you to replace pg_usleep() with
 something that is always interruptible by signals (by setting the latch in
 the signal handler). The global latches work the same, and indeed the
 implementation is the same, but the latch resides in shared memory, and can
 be set by any process attached to shared memory. On Unix, when you set a
 latch waited for by another process, the setter sends SIGUSR1 to the waiting
 process, and the signal handler sends the byte to the self-pipe to wake up
 the select().

According to this explanation, the latch which walsender uses seems to be
local. If it's true, walsender should call InitSharedLatch rather than
InitLatch?

 /*
  * XXX: Should we invent an API to wait for data coming from the
  * client connection too? It's not critical, but we could then
  * eliminate the timeout altogether and go to sleep for good.
  */

Yes, it would be very helpful when walsender waits for the ACK from
the standby in upcoming synchronous replication.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-27 Thread Heikki Linnakangas

On 27/08/10 10:39, Fujii Masao wrote:

On Thu, Aug 26, 2010 at 7:40 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

There's two kinds of latches, local and global. Local latches can only be
set from the same process - allowing you to replace pg_usleep() with
something that is always interruptible by signals (by setting the latch in
the signal handler). The global latches work the same, and indeed the
implementation is the same, but the latch resides in shared memory, and can
be set by any process attached to shared memory. On Unix, when you set a
latch waited for by another process, the setter sends SIGUSR1 to the waiting
process, and the signal handler sends the byte to the self-pipe to wake up
the select().


According to this explanation, the latch which walsender uses seems to be
local. If it's true, walsender should call InitSharedLatch rather than
InitLatch?


Yes, it should.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-27 Thread Greg Smith

Tom Lane wrote:

Well, yes they are.  They cause unnecessary process wakeups and thereby
consume cycles even when the database is idle.  See for example a
longstanding complaint here:
https://bugzilla.redhat.com/show_bug.cgi?id=252129

If we're going to go to the trouble of having a mechanism like this,
I'd like it to fix that problem so I can close out that bug.
  


The way the background writer wakes up periodically to absorb fsync 
requests is already way too infrequent on a busy system.  That component 
of the bug report reeks as not a bug to me.  Don't like the default?  
Increase bgwriter_delay; move on with your life.  We're not going to 
increase the default, to screw over regular users, just to instead 
prioritize people who prefer low power consumption.  There's a knob for 
it, you can tune the other way if you want.  And based on this pile of 
data I'm sorting through the last few weeks, my guess is that if the 
default is going anywhere in 9.1 it's to make the BGW run *more* often, 
not less.  What those asking for the default change don't realize is 
that if the BGW stops doing useful work, backends will start doing more 
disk writes with their own fsync calls, and now you've just messed with 
out how often the disks have to be powered up.  I could probably 
construct a test case that uses more power with the behavior they think 
they want than the current one does.  The only clear case where this is 
always a win is when the system it totally idle.


The complaint that there's no similar way to detune the logger for lower 
power use, something you can't really tweak on your own, is a much more 
reasonable demand.


I have a patch that adds a new column to pg_stat_bgwriter to count how 
often backend fsync calls happen directly, because the background writer 
couldn't be bothered to absorb them.  If this latching idea goes 
somewhere, that should be a reasonable way to measure if the new 
implementation is getting in the way of that particular processing 
requirement.  It's important to realize that the fsync absorb function 
of the background writer has become absolutely critical on modern 
systems that hit high transaction rates, so any refactoring of its basic 
design needs to stay responsive to those incoming backend requests.  I 
don't see any high-level issues with the latch design approach Heikki 
has proposed in regards to that.  I'll take a look at some of the other 
test cases I have here to see if they can help quantify its impact on 
this aspect of BGW behavior.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-27 Thread Robert Haas
On Fri, Aug 27, 2010 at 5:54 PM, Greg Smith g...@2ndquadrant.com wrote:
 Tom Lane wrote:

 Well, yes they are.  They cause unnecessary process wakeups and thereby
 consume cycles even when the database is idle.  See for example a
 longstanding complaint here:
 https://bugzilla.redhat.com/show_bug.cgi?id=252129

 If we're going to go to the trouble of having a mechanism like this,
 I'd like it to fix that problem so I can close out that bug.

 The way the background writer wakes up periodically to absorb fsync requests
 is already way too infrequent on a busy system.

Maybe instead of a fixed-duration sleep we could wake it up when it
needs to do something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-27 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Aug 27, 2010 at 5:54 PM, Greg Smith g...@2ndquadrant.com wrote:
 The way the background writer wakes up periodically to absorb fsync requests
 is already way too infrequent on a busy system.

 Maybe instead of a fixed-duration sleep we could wake it up when it
 needs to do something.

*Any* fixed delay is going to be too long for some people and not long
enough for others; and the very same system might fall into both
categories at different times of day.  I don't think make
bgwriter_delay customizable is an adequate answer.  We've put up with
that so far because it wasn't possible to do better given the
infrastructure we had for waiting; but if we're going to try to improve
the infrastructure, we should have the ambition of getting rid of this
type of problem.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-27 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 Tom Lane wrote:
 Well, yes they are.  They cause unnecessary process wakeups and thereby
 consume cycles even when the database is idle.  See for example a
 longstanding complaint here:
 https://bugzilla.redhat.com/show_bug.cgi?id=252129

 ...  The only clear case where this is 
 always a win is when the system it totally idle.

If you'll climb down off that horse for a moment: yeah, the idle case is
*exactly* what they're complaining about.  In particular, the complaint
is that it's unreasonable to have Postgres running on a machine at all
unless it's actively being used, because it forces significant CPU power
drain anyway.  That gets in the way of our plan for world domination,
no?  If you can't have a PG sitting unobtrusively in the background,
waiting for you to have a need for it, it won't get installed in the
first place.  People will pick mysql, or something else with a smaller
footprint, to put on their laptops, and then we lose some more mindshare.
I see this as just another facet of the argument about whether it's okay
to have default settings that try to take over the entire machine.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-26 Thread Heikki Linnakangas

On 24/08/10 02:44, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

[ latch proposal ]


This seems reasonably clean as far as signal conditions generated
internally to Postgres go, but I remain unclear on how it helps for
response to actual signals.


You can can set the latch in the signal handler.

Here's a first attempt at implementing that. To demonstrate how it 
works, I modified walsender to use the new latch facility, also to 
respond quickly to SIGHUP and SIGTERM.


There's two kinds of latches, local and global. Local latches can only 
be set from the same process - allowing you to replace pg_usleep() with 
something that is always interruptible by signals (by setting the latch 
in the signal handler). The global latches work the same, and indeed the 
implementation is the same, but the latch resides in shared memory, and 
can be set by any process attached to shared memory. On Unix, when you 
set a latch waited for by another process, the setter sends SIGUSR1 to 
the waiting process, and the signal handler sends the byte to the 
self-pipe to wake up the select().


On Windows, we use WaitEvent to wait on a latch, and SetEvent to wake 
up. The difference between global and local latches is that for global 
latches, the Event object needs to be created upfront at postmaster 
startup so that its inherited to all child processes, and stored in 
shared memory. A local Event object can be created only in the process 
that needs it.


I put the code in src/backend/storage/ipc/latch.c now, but it probably 
ought to go in src/backend/portability instead, with a separate 
win32_latch.c file for Windows.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 615a7fa..094d0c9 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -55,6 +55,7 @@
 #include miscadmin.h
 #include pg_trace.h
 #include pgstat.h
+#include replication/walsender.h
 #include storage/fd.h
 #include storage/procarray.h
 #include storage/sinvaladt.h
@@ -1025,6 +1026,13 @@ EndPrepare(GlobalTransaction gxact)
 
 	/* If we crash now, we have prepared: WAL replay will fix things */
 
+	/*
+	 * Wake up all walsenders to send WAL up to the PREPARE record
+	 * immediately if replication is enabled
+	 */
+	if (max_wal_senders  0)
+		WalSndWakeup();
+
 	/* write correct CRC and close file */
 	if ((write(fd, statefile_crc, sizeof(pg_crc32))) != sizeof(pg_crc32))
 	{
@@ -2005,6 +2013,13 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	/* Flush XLOG to disk */
 	XLogFlush(recptr);
 
+	/*
+	 * Wake up all walsenders to send WAL up to the COMMIT PREPARED record
+	 * immediately if replication is enabled
+	 */
+	if (max_wal_senders  0)
+		WalSndWakeup();
+
 	/* Mark the transaction committed in pg_clog */
 	TransactionIdCommitTree(xid, nchildren, children);
 
@@ -2078,6 +2093,13 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	XLogFlush(recptr);
 
 	/*
+	 * Wake up all walsenders to send WAL up to the ABORT PREPARED record
+	 * immediately if replication is enabled
+	 */
+	if (max_wal_senders  0)
+		WalSndWakeup();
+
+	/*
 	 * Mark the transaction aborted in clog.  This is not absolutely necessary
 	 * but we may as well do it while we are here.
 	 */
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 6bcc55c..942d5c2 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,6 +36,7 @@
 #include libpq/be-fsstubs.h
 #include miscadmin.h
 #include pgstat.h
+#include replication/walsender.h
 #include storage/bufmgr.h
 #include storage/fd.h
 #include storage/lmgr.h
@@ -1068,6 +1069,13 @@ RecordTransactionCommit(void)
 		XLogFlush(XactLastRecEnd);
 
 		/*
+		 * Wake up all walsenders to send WAL up to the COMMIT record
+		 * immediately if replication is enabled
+		 */
+		if (max_wal_senders  0)
+			WalSndWakeup();
+
+		/*
 		 * Now we may update the CLOG, if we wrote a COMMIT record above
 		 */
 		if (markXidCommitted)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 53c2581..9f5d1af 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -66,9 +66,6 @@ bool		am_walsender = false;		/* Am I a walsender process ? */
 int			max_wal_senders = 0;	/* the maximum number of concurrent walsenders */
 int			WalSndDelay = 200;	/* max sleep time between some actions */
 
-#define NAPTIME_PER_CYCLE 10L		/* max sleep time between cycles
-		 * (100ms) */
-
 /*
  * These variables are used similarly to openLogFile/Id/Seg/Off,
  * but for walsender to read the XLOG.
@@ -93,6 +90,7 @@ static volatile sig_atomic_t ready_to_stop = false;
 static void WalSndSigHupHandler(SIGNAL_ARGS);
 static void WalSndShutdownHandler(SIGNAL_ARGS);
 static void WalSndQuickDieHandler(SIGNAL_ARGS);
+static void 

Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-24 Thread Heikki Linnakangas

On 24/08/10 04:08, Alvaro Herrera wrote:

Excerpts from Tom Lane's message of lun ago 23 19:44:02 -0400 2010:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:



[ latch proposal ]


This seems reasonably clean as far as signal conditions generated
internally to Postgres go, but I remain unclear on how it helps for
response to actual signals.


This could probably replace the signalling between postmaster and
autovac launcher, as well.


Hmm, postmaster needs to stay out of shared memory..

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-23 Thread Heikki Linnakangas

On 20/08/10 17:28, Tom Lane wrote:

[ It's way past time to change the thread title ]

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

On 20/08/10 16:24, Tom Lane wrote:

You keep on proposing solutions that only work for walsender :-(.



Well yes, the other places where we use pg_usleep() are not really a
problem as is.


Well, yes they are.  They cause unnecessary process wakeups and thereby
consume cycles even when the database is idle.  See for example a
longstanding complaint here:
https://bugzilla.redhat.com/show_bug.cgi?id=252129

If we're going to go to the trouble of having a mechanism like this,
I'd like it to fix that problem so I can close out that bug.


Hmm, if you want to put bgwriter and walwriter to deep sleep, then 
someone will need to wake them up when they have work to do. Currently 
they poll. Maybe they should just sleep longer, like 10 seconds, if 
there hasn't been any work to do in the last X wakeups.


We've been designing the new sleep facility so that the event that wakes 
up the sleep is sent from the signal handler in the same process, but it 
seems that all the potential users would actually want to be woken up 
from *another* process, so the signal handler seems like an unnecessary 
middleman. Particularly on Windows where signals are simulated with 
pipes and threads, while you could just send a Windows event directly 
from one process to another.


A common feature that all the users of this facility want is that once 
the event is sent, re-sending it is a fast no-op until re-enabled by the 
receiver. For example, if we need backends to wake up bgwriter after 
dirtying a buffer, you don't want to waste many cycles determining that 
bgwriter is already active and doesn't need to be woken up.


Let's call these latches. I'm thinking of something like this:

/* Similar to LWLockId */
typedef enum
{
  BgwriterLatch,
  WalwriterLatch,
  /* plus one for each walsender */
} LatchId;

/*
 * Wait for given latch to be set. Only one process can wait
 * for a given latch at a time.
 */
WaitLatch(LatchId latch, long timeout);

/*
 * Sets latch. Returns quickly if the latch is set already.
 */
SetLatch(LatchId latch);

/*
 * Clear the latch. Calling WaitLatch after this will sleep, unless
 * the latch is set again before the WaitLatch call.
 */
ResetLatch(LatchId latch);

There would be a boolean for each latch in shared memory, to indicate if 
the latch is armed, allowing quick return from SetLatch if the latch 
is already set. Plus a signal to wake up the waiting process (maybe use 
procsignal.c), and the self-pipe trick within the receiving process to 
make it race condition free. On Windows, the signal and the self-pipe 
trick are replaced with Windows events.


I'll try out this approach tomorrow..

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-23 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 20/08/10 17:28, Tom Lane wrote:
 Well, yes they are.  They cause unnecessary process wakeups and thereby
 consume cycles even when the database is idle.  See for example a
 longstanding complaint here:
 https://bugzilla.redhat.com/show_bug.cgi?id=252129
 
 If we're going to go to the trouble of having a mechanism like this,
 I'd like it to fix that problem so I can close out that bug.

 Hmm, if you want to put bgwriter and walwriter to deep sleep, then 
 someone will need to wake them up when they have work to do. Currently 
 they poll. Maybe they should just sleep longer, like 10 seconds, if 
 there hasn't been any work to do in the last X wakeups.

I should probably clarify that I don't expect this patch to solve that
problem all by itself.  But ISTM that a guaranteed-interruptible version
of pg_usleep is a necessary prerequisite before we can even think about
dialing down the database's CPU consumption at idle.

 We've been designing the new sleep facility so that the event that wakes 
 up the sleep is sent from the signal handler in the same process, but it 
 seems that all the potential users would actually want to be woken up 
 from *another* process, so the signal handler seems like an unnecessary 
 middleman.

No, because there are also lots of cases where the signal is arriving
from a non-Postgres process; SIGTERM arriving from init being the killer
case that you simply don't get to propose an alternative design for.
More generally, I'm uninterested in decoupling cases like SIGHUP for
postgresql.conf change from the existing signal mechanism, because it's
just too useful to be able to trigger that externally.

 Particularly on Windows where signals are simulated with 
 pipes and threads, while you could just send a Windows event directly 
 from one process to another.

Windows of course has different constraints, and in particular it lacks
the constraint that we must be able to respond to system-defined
signals.  So I wouldn't object to the underlying implementation being
different for Windows.

 [ latch proposal ]

This seems reasonably clean as far as signal conditions generated
internally to Postgres go, but I remain unclear on how it helps for
response to actual signals.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-23 Thread Alvaro Herrera
Excerpts from Tom Lane's message of lun ago 23 19:44:02 -0400 2010:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

  [ latch proposal ]
 
 This seems reasonably clean as far as signal conditions generated
 internally to Postgres go, but I remain unclear on how it helps for
 response to actual signals.

This could probably replace the signalling between postmaster and
autovac launcher, as well.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-20 Thread tomas
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Thu, Aug 19, 2010 at 08:36:09PM +0300, Heikki Linnakangas wrote:

[...]

 Hmm, will need to think about a suitable API for that. The nice thing would 
 be that we could implement it using pselect() where available. (And 
 reliable - the Linux select() man page says that glibc's pselect() is 
 emulated using select(), and suffers from the very same race condition 
 pselect() was invented to solve. How awful is that!?)

It is indeed. It seems, though, that from Linux kernel 2.6.16 and glibc
2.4 on, things look better [1]. As a reference, Debian stable (not known
to adventure too far into the present ;-) is libc 2.7 on kernel 2.6.26.

Of course, enterprise GNU/Linux distros are said to be even more
conservative...

[1] http://lwn.net/Articles/176911/

Regards
- -- tomás
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFMbiauBcgs9XrR2kYRAhiwAJ41f29jSIy409epTH0eJRXW17oByACeIkRo
CRg2BCw8tn3PkdnNR1i/MCY=
=GVMT
-END PGP SIGNATURE-

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-20 Thread Heikki Linnakangas

On 19/08/10 20:59, Tom Lane wrote:

Offhand I'd suggest something like

SetSleepInterrupt() -- called by signal handlers, writes pipe
ClearSleepInterrupt()   -- called by sleep-and-do-something loops, clears pipe

pg_usleep() itself remains the same, but it is now guaranteed to return
immediately if SetSleepInterrupt is called, or has been called since the
last ClearSleepInterrupt.


Hmm, we have pg_usleep() calls in some fairly low-level functions, like 
mdunlink() and s_lock(). If someone has called SetSleepInterrupt(), we 
don't want those pg_usleep()s to return immediately. And pg_usleep() is 
used in some client code too. I think we need a separate sleep function 
for this.


Another idea is to not use unix signals at all, but ProcSendSignal() and 
ProcWaitForSignal(). We would not need the signal handler at all. 
Walsender would use ProcWaitForSignal() instead of pg_usleep() and 
backends that want to wake it up would use ProcSendSignal(). The problem 
is that there is currently no way to specify a timeout, but I presume 
the underlying semaphore operations have that capability, and we could 
expose it.


Actually ProcSendSignal()/ProcWaitForSignal() won't work as is, because 
walsender doesn't have a PGPROC entry, but you could easily build a 
similar mechanism, using PGSemaphoreLock/Unlock like 
ProcSendSignal()/WaitForSignal() does.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-20 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Hmm, we have pg_usleep() calls in some fairly low-level functions, like 
 mdunlink() and s_lock(). If someone has called SetSleepInterrupt(), we 
 don't want those pg_usleep()s to return immediately. And pg_usleep() is 
 used in some client code too. I think we need a separate sleep function 
 for this.

Well, we'd need some careful thought about which sleeps need what, but I
don't necessarily have an objection to a separate interruptable sleep
function.

 Another idea is to not use unix signals at all, but ProcSendSignal() and 
 ProcWaitForSignal(). We would not need the signal handler at all. 
 Walsender would use ProcWaitForSignal() instead of pg_usleep() and 
 backends that want to wake it up would use ProcSendSignal().

You keep on proposing solutions that only work for walsender :-(.
Most of the other places where we have pg_usleep actually do want
a timed sleep, I believe.  It's also unclear that we can always expect
ProcSendSignal to be usable --- for example, stuff like SIGHUP would
be sent by processes that might not be connected to shared memory.

 The problem 
 is that there is currently no way to specify a timeout, but I presume 
 the underlying semaphore operations have that capability, and we could 
 expose it.

They don't, or at least the semop-based implementation doesn't.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-20 Thread Heikki Linnakangas

On 20/08/10 16:24, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

Hmm, we have pg_usleep() calls in some fairly low-level functions, like
mdunlink() and s_lock(). If someone has called SetSleepInterrupt(), we
don't want those pg_usleep()s to return immediately. And pg_usleep() is
used in some client code too. I think we need a separate sleep function
for this.


Well, we'd need some careful thought about which sleeps need what, but I
don't necessarily have an objection to a separate interruptable sleep
function.


If we have to, we could also support multiple interrupts with multiple 
self-pipes, so that you can choose at pg_usleep() which ones to wake up on.



Another idea is to not use unix signals at all, but ProcSendSignal() and
ProcWaitForSignal(). We would not need the signal handler at all.
Walsender would use ProcWaitForSignal() instead of pg_usleep() and
backends that want to wake it up would use ProcSendSignal().


You keep on proposing solutions that only work for walsender :-(.


Well yes, the other places where we use pg_usleep() are not really a 
problem as is. If as a side-effect we can make them respond more quickly 
to signals, with small changes, that's good, but walsender is the only 
one that's performance critical.


That said, a select() based solution is my current favorite.


Most of the other places where we have pg_usleep actually do want
a timed sleep, I believe.  It's also unclear that we can always expect
ProcSendSignal to be usable --- for example, stuff like SIGHUP would
be sent by processes that might not be connected to shared memory.


The problem
is that there is currently no way to specify a timeout, but I presume
the underlying semaphore operations have that capability, and we could
expose it.


They don't, or at least the semop-based implementation doesn't.


There's semtimedop(). I don't know how portable it is, but it seems to 
exist at least on Linux, Solaris, HPUX and AIX. On what platforms do we 
use sysv semaphores?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-08-20 Thread Tom Lane
[ It's way past time to change the thread title ]

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 20/08/10 16:24, Tom Lane wrote:
 You keep on proposing solutions that only work for walsender :-(.

 Well yes, the other places where we use pg_usleep() are not really a 
 problem as is.

Well, yes they are.  They cause unnecessary process wakeups and thereby
consume cycles even when the database is idle.  See for example a
longstanding complaint here:
https://bugzilla.redhat.com/show_bug.cgi?id=252129

If we're going to go to the trouble of having a mechanism like this,
I'd like it to fix that problem so I can close out that bug.

 There's semtimedop(). I don't know how portable it is, but it seems to 
 exist at least on Linux, Solaris, HPUX and AIX.

It's not on my HPUX, and I don't see it in the Single Unix Spec.

 On what platforms do we use sysv semaphores?

AFAIK, everything except Windows and extremely old versions of OS X.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-19 Thread Heikki Linnakangas

On 19/08/10 04:46, Robert Haas wrote:

At any rate, we should definitely NOT wait another
month to start thinking about Sync Rep again.


Agreed. EnterpriseDB is interested in having that feature, so I'm on the 
hook to spend time on it regardless of commitfests.



I haven't actually
looked at any of the Sync Rep code AT ALL but IIRC Heikki expressed
the view that the biggest thing standing in the way of a halfway
decent Sync Rep implementation was a number of polling loops that
needed to be replaced with something that wouldn't introduce
up-to-100ms delays.


Well, that's the only uncontroversial thing about it that doesn't 
require any fighting over the UI or desired behavior. That's why I've 
focused on that first, and also because it's useful regardless of 
synchronous replication. But once that's done, we'll have to nail down 
how synchronous replication is supposed to behave, and how to configure it.



 And so far we haven't seen a patch for that.
Somebody write one.  And then let's get it reviewed and committed RSN.


Fujii is on vacation, but I've started working on it. The two issues 
with Fujii's latest patch are that it would not respond promptly on 
platforms where signals don't interrupt sleep, and it suffers the 
classic race condition that pselect() was invented for. I'm going to 
replace pg_usleep() with select(), and use the so called self-pipe 
trick to get over the race condition. I have that written up but I want 
to do some testing and cleanup before posting the patch.



  It may seem like we're early in the release cycle yet, but for a
feature of this magnitude we are not.  We committed way too much big
stuff at the very end of the last release cycle; Hot Standby was still
being cleaned up in May after commit in November.  We'll be lucky to
commit sync rep that early.


Agreed. We need to decide the scope and minimum set of features real 
soon to get something concrete finished.


BTW, on what platforms signals don't interrupt sleep? Although that 
issue has been discussed many times before, I couldn't find any 
reference to a real platform in the archives.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-19 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 BTW, on what platforms signals don't interrupt sleep? Although that 
 issue has been discussed many times before, I couldn't find any 
 reference to a real platform in the archives.

I've got one in captivity (my old HPUX box).  Happy to test whatever you
come up with.

Considering that pg_usleep is implemented with select, I'm not following
what you mean by replace pg_usleep() with select()?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-19 Thread Alvaro Herrera
Excerpts from Heikki Linnakangas's message of jue ago 19 02:02:34 -0400 2010:
 On 19/08/10 04:46, Robert Haas wrote:

   And so far we haven't seen a patch for that.
  Somebody write one.  And then let's get it reviewed and committed RSN.
 
 Fujii is on vacation, but I've started working on it. The two issues 
 with Fujii's latest patch are that it would not respond promptly on 
 platforms where signals don't interrupt sleep, and it suffers the 
 classic race condition that pselect() was invented for. I'm going to 
 replace pg_usleep() with select(), and use the so called self-pipe 
 trick to get over the race condition. I have that written up but I want 
 to do some testing and cleanup before posting the patch.

Hmm, IIRC the self-pipe trick doesn't work on Windows, mainly because
select() doesn't handle pipes, only sockets.  You may need some extra
hack to make it work there.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-19 Thread Magnus Hagander
On Thu, Aug 19, 2010 at 17:08, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Heikki Linnakangas's message of jue ago 19 02:02:34 -0400 2010:
 On 19/08/10 04:46, Robert Haas wrote:

   And so far we haven't seen a patch for that.
  Somebody write one.  And then let's get it reviewed and committed RSN.

 Fujii is on vacation, but I've started working on it. The two issues
 with Fujii's latest patch are that it would not respond promptly on
 platforms where signals don't interrupt sleep, and it suffers the
 classic race condition that pselect() was invented for. I'm going to
 replace pg_usleep() with select(), and use the so called self-pipe
 trick to get over the race condition. I have that written up but I want
 to do some testing and cleanup before posting the patch.

 Hmm, IIRC the self-pipe trick doesn't work on Windows, mainly because
 select() doesn't handle pipes, only sockets.  You may need some extra
 hack to make it work there.

We have a pipe implementation using sockets that is used on Windows
for just this reason, IIRC.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-19 Thread Heikki Linnakangas

On 19/08/10 18:08, Alvaro Herrera wrote:

Excerpts from Heikki Linnakangas's message of jue ago 19 02:02:34 -0400 2010:

On 19/08/10 04:46, Robert Haas wrote:



  And so far we haven't seen a patch for that.
Somebody write one.  And then let's get it reviewed and committed RSN.


Fujii is on vacation, but I've started working on it. The two issues
with Fujii's latest patch are that it would not respond promptly on
platforms where signals don't interrupt sleep, and it suffers the
classic race condition that pselect() was invented for. I'm going to
replace pg_usleep() with select(), and use the so called self-pipe
trick to get over the race condition. I have that written up but I want
to do some testing and cleanup before posting the patch.


Hmm, IIRC the self-pipe trick doesn't work on Windows, mainly because
select() doesn't handle pipes, only sockets.  You may need some extra
hack to make it work there.


That's fine, we can do the naive set-a-flag implementation on Windows 
because on Windows our signals are only delivered at 
CHECK_FOR_INTERRUPTS(), so we don't have the race condition to begin with.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-19 Thread Heikki Linnakangas

On 19/08/10 16:38, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

BTW, on what platforms signals don't interrupt sleep? Although that
issue has been discussed many times before, I couldn't find any
reference to a real platform in the archives.


I've got one in captivity (my old HPUX box).  Happy to test whatever you
come up with.

Considering that pg_usleep is implemented with select, I'm not following
what you mean by replace pg_usleep() with select()?


Instead of using pg_usleep(), call select() directly, waiting not only 
for the timeout, but also for data to arrive on the self-pipe. The 
signal handler writes a byte to the self-pipe, waking up the select(). 
That way the select() is interupted by the signal arriving, even if 
signals per se don't interrupt it. And it closes the race condition 
involved with setting a flag in the signal handler and checking that in 
the main loop.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-19 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 19/08/10 16:38, Tom Lane wrote:
 Considering that pg_usleep is implemented with select, I'm not following
 what you mean by replace pg_usleep() with select()?

 Instead of using pg_usleep(), call select() directly, waiting not only 
 for the timeout, but also for data to arrive on the self-pipe. The 
 signal handler writes a byte to the self-pipe, waking up the select(). 
 That way the select() is interupted by the signal arriving, even if 
 signals per se don't interrupt it. And it closes the race condition 
 involved with setting a flag in the signal handler and checking that in 
 the main loop.

Hmm, but couldn't you still do that inside pg_usleep?  Signal handlers
that do that couldn't know if they were interrupting a sleep per se,
so this would have to be a backend-wide convention.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-19 Thread Heikki Linnakangas

On 19/08/10 19:57, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

On 19/08/10 16:38, Tom Lane wrote:

Considering that pg_usleep is implemented with select, I'm not following
what you mean by replace pg_usleep() with select()?



Instead of using pg_usleep(), call select() directly, waiting not only
for the timeout, but also for data to arrive on the self-pipe. The
signal handler writes a byte to the self-pipe, waking up the select().
That way the select() is interupted by the signal arriving, even if
signals per se don't interrupt it. And it closes the race condition
involved with setting a flag in the signal handler and checking that in
the main loop.


Hmm, but couldn't you still do that inside pg_usleep?  Signal handlers
that do that couldn't know if they were interrupting a sleep per se,
so this would have to be a backend-wide convention.


I don't understand, do what inside pg_usleep()?

We only need to respond quickly to one signal, the one that tells 
walsender there's some new WAL that you should send. We can rely on 
polling for all the other signals like SIGHUP for config reload or 
shutdown request, like we do today.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-19 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 19/08/10 19:57, Tom Lane wrote:
 Hmm, but couldn't you still do that inside pg_usleep?  Signal handlers
 that do that couldn't know if they were interrupting a sleep per se,
 so this would have to be a backend-wide convention.

 I don't understand, do what inside pg_usleep()?

I was envisioning still using pg_usleep, and having this interaction
between signal handlers and the delay be private to pg_usleep, rather
than putting such ugly code into forty-nine different places.  There
are a *lot* of places where we have loops that break down delays
into at-most-one-second pg_usleep calls, and if we're going to have a
hack like this we should fix them all, not just two or three that SR
cares about.

But I'm still not seeing how this self-pipe hack avoids a race
condition.  If the signal handler is sending a byte whenever it
executes, then you could have bytes already stacked up in the pipe
from previous interrupts that didn't happen to come while inside
pg_usleep.  If you clear those before sleeping, you have a race
condition, and if you don't, then you fail to sleep the intended
amount of time even though there was no interrupt this time.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-19 Thread Heikki Linnakangas

On 19/08/10 20:18, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

On 19/08/10 19:57, Tom Lane wrote:

Hmm, but couldn't you still do that inside pg_usleep?  Signal handlers
that do that couldn't know if they were interrupting a sleep per se,
so this would have to be a backend-wide convention.



I don't understand, do what inside pg_usleep()?


I was envisioning still using pg_usleep, and having this interaction
between signal handlers and the delay be private to pg_usleep, rather
than putting such ugly code into forty-nine different places.  There
are a *lot* of places where we have loops that break down delays
into at-most-one-second pg_usleep calls, and if we're going to have a
hack like this we should fix them all, not just two or three that SR
cares about.


Hmm, will need to think about a suitable API for that. The nice thing 
would be that we could implement it using pselect() where available. 
(And reliable - the Linux select() man page says that glibc's pselect() 
is emulated using select(), and suffers from the very same race 
condition pselect() was invented to solve. How awful is that!?)



But I'm still not seeing how this self-pipe hack avoids a race
condition.  If the signal handler is sending a byte whenever it
executes, then you could have bytes already stacked up in the pipe
from previous interrupts that didn't happen to come while inside
pg_usleep.  If you clear those before sleeping, you have a race
condition, and if you don't, then you fail to sleep the intended
amount of time even though there was no interrupt this time.


You clear the pipe after waking up. Before sending all the pending WAL 
and deciding that you're fully caught up again:


for(;;)
{
  1. select()
  2. clear pipe
  3. send any pending WAL
}

There's more information on the self-pipe trick at e.g 
http://lwn.net/Articles/177897/


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-19 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 19/08/10 20:18, Tom Lane wrote:
 But I'm still not seeing how this self-pipe hack avoids a race
 condition.  If the signal handler is sending a byte whenever it
 executes, then you could have bytes already stacked up in the pipe
 from previous interrupts that didn't happen to come while inside
 pg_usleep.  If you clear those before sleeping, you have a race
 condition, and if you don't, then you fail to sleep the intended
 amount of time even though there was no interrupt this time.

 You clear the pipe after waking up.

Hmm ... that doesn't answer my second objection about failing to sleep
the expected amount of time, but on reflection I guess that can't be
done: we have to be able to cope with interrupts occurring just before
the sleep actually begins, and there's no way to define just before
except by reference to the calling code having done whatever it might
need to do before/after sleeping.

 Hmm, will need to think about a suitable API for that.

Offhand I'd suggest something like

SetSleepInterrupt() -- called by signal handlers, writes pipe
ClearSleepInterrupt()   -- called by sleep-and-do-something loops, clears pipe

pg_usleep() itself remains the same, but it is now guaranteed to return
immediately if SetSleepInterrupt is called, or has been called since the
last ClearSleepInterrupt.

 The nice thing 
 would be that we could implement it using pselect() where available. 
 (And reliable - the Linux select() man page says that glibc's pselect() 
 is emulated using select(), and suffers from the very same race 
 condition pselect() was invented to solve. How awful is that!?)

Ick.  So how would you tell if it's trustworthy?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-18 Thread Robert Haas
Kevin didn't send out an official gavel-banging announcement of the
end of CommitFest 2009-07 (possibly because I neglected until today to
give him privileges to actually change it in the web application), but
I'd just like to take a minute to thank him publicly for his efforts.
We started this CommitFest with something like 60 patches, which is
definitely on the larger side for a CommitFest, and Kevin did a great
job staying on top of what was going on with all of them and, I felt,
really helped keep us on track.  At the same time, I felt he did this
with a very light touch that made the whole thing go very smoothly.
So -- thanks, Kevin!

I also appreciate the efforts of all those who reviewed.  Good reviews
are really critical to keep the burden from building up on committers,
and I appreciate the efforts of everyone who contributed, in many
cases probably on their own time.  I'm particularly grateful to the
people who were vigilant about spelling, grammar, coding style,
whitespace, and other nitpicky little issues that are not much fun,
but which at least for me are a major time sink if they're still
lingering when it comes time to do the actual commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-18 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I'd just like to take a minute to thank him publicly for his
 efforts.  We started this CommitFest with something like 60
 patches, which is definitely on the larger side for a CommitFest,
 and Kevin did a great job staying on top of what was going on with
 all of them and, I felt, really helped keep us on track.  At the
 same time, I felt he did this with a very light touch that made
 the whole thing go very smoothly.  So -- thanks, Kevin!
 
You're welcome.  It was educational for me.  I don't think I want to
try to handle two in a row, and I think your style is better suited
than mine to the final CF for a release, but I might be able to take
on the 2010-11 CF if people want that.
 
My hand was not always so light behind the scenes, though -- I sent
or received about 100 off-list emails to try to keep things moving. 
Hopefully nobody was too offended by my nagging.  :-)
 
Oh, and thanks for putting together the CF web application.  Without
that, I couldn't have done half as well as I did.
 
 I also appreciate the efforts of all those who reviewed.
 
Yes, I'll second that!  I've always been impressed with the
PostgreSQL community, and managing this CF gave me new insights and
appreciation for the intelligence, professionalism, and community
spirit of its members -- authors, reviewers, and committers.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-18 Thread Greg Smith

Kevin Grittner wrote:

I don't think I want to try to handle two in a row, and I think your style is 
better suited
than mine to the final CF for a release, but I might be able to take on the 
2010-11 CF if people want that


Ha, you just put yourself right back on the hook with that comment, and 
Robert does seem like the right guy for CF-4 @ 2011-01.  Leaving the 
question of what's going to happen with CF-2 next month.


I think the crucial thing with the 2010-09 CF is that we have to get 
serious progress made sorting out all the sync rep ideas before/during 
that one.  The review Yeb did and subsequent discussion was really 
helpful, but the scope on that needs to actually get nailed down to 
*something* concrete if it's going to get built early enough in the 9.1 
release to be properly reviewed and tested for more than one round.  
Parts of the design and scope still feel like they're expanding to me, 
and I think having someone heavily involved in the next CF who is 
willing to push on nailing down that particular area is pretty 
important.  Will volunteer myself if I can stay on schedule to make it 
past the major time commitment sink I've had so far this year by then.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!

2010-08-18 Thread Robert Haas
On Wed, Aug 18, 2010 at 7:46 PM, Greg Smith g...@2ndquadrant.com wrote:
 Kevin Grittner wrote:

 I don't think I want to try to handle two in a row, and I think your style
 is better suited
 than mine to the final CF for a release, but I might be able to take on
 the 2010-11 CF if people want that

 Ha, you just put yourself right back on the hook with that comment, and
 Robert does seem like the right guy for CF-4 @ 2011-01.  Leaving the
 question of what's going to happen with CF-2 next month.

My reputation precedes me, apparently.  Although I appreciate everyone
so far being willing to avoid mentioning exactly what that reputation
might be.  :-)

 I think the crucial thing with the 2010-09 CF is that we have to get serious
 progress made sorting out all the sync rep ideas before/during that one.
  The review Yeb did and subsequent discussion was really helpful, but the
 scope on that needs to actually get nailed down to *something* concrete if
 it's going to get built early enough in the 9.1 release to be properly
 reviewed and tested for more than one round.  Parts of the design and scope
 still feel like they're expanding to me, and I think having someone heavily
 involved in the next CF who is willing to push on nailing down that
 particular area is pretty important.  Will volunteer myself if I can stay on
 schedule to make it past the major time commitment sink I've had so far this
 year by then.

Sitting on Sync Rep is a job and a half by itself, without adding all
the other CF work on top of it.  Maybe we should try to find two
vi^Holunteers: a CommitFest Manager (CFM) and a Major Feature
Babysitter (MBS).  At any rate, we should definitely NOT wait another
month to start thinking about Sync Rep again.  I haven't actually
looked at any of the Sync Rep code AT ALL but IIRC Heikki expressed
the view that the biggest thing standing in the way of a halfway
decent Sync Rep implementation was a number of polling loops that
needed to be replaced with something that wouldn't introduce
up-to-100ms delays.  And so far we haven't seen a patch for that.
Somebody write one.  And then let's get it reviewed and committed RSN.
 It may seem like we're early in the release cycle yet, but for a
feature of this magnitude we are not.  We committed way too much big
stuff at the very end of the last release cycle; Hot Standby was still
being cleaned up in May after commit in November.  We'll be lucky to
commit sync rep that early.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers