Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

Florian Pflug Mon, 04 Jul 2011 09:37:00 -0700

On Jul 4, 2011, at 17:53, Heikki Linnakangas wrote:
>>       Under Linux, select() may report a socket file descriptor as "ready for
>>       reading",  while nevertheless a subsequent read blocks.  This could for
>>       example happen when data has arrived but  upon  examination  has  wrong
>>       checksum and is discarded.  There may be other circumstances in which a
>>       file descriptor is spuriously reported as ready.  Thus it may be  safer
>>       to use O_NONBLOCK on sockets that should not block.
> 
> So in theory, on Linux WaitLatch might sometimes incorrectly return
> WL_POSTMASTER_DEATH. None of the callers checks for the WL_POSTMASTER_DEATH
> return code; they call PostmasterIsAlive() before assuming the postmaster
> has died, so that won't affect correctness at the moment. I doubt that
> scenario can even happen in our case (select() on a pipe that is never
> written to). But maybe we should add an assertion to WaitLatch checking
> that if select() reports that the postmaster pipe has been closed,
> PostmasterIsAlive() also returns false.


The correct solution would be to read() from the pipe after select()
returns, and only return WL_POSTMASTER_DEATH if the read doesn't fail with
EAGAIN (once the write end is closed, read() returns 0, i.e. EOF). To
prevent that read() from blocking if the read event was indeed spurious,
O_NONBLOCK must be set on the pipe, but the patch already does that.
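
A minimal C sketch of the select()-then-read() check, assuming a POSIX system. The helper names (set_nonblocking, confirm_pipe_closed, postmaster_pipe_closed) are hypothetical and not taken from the patch; read_fd stands for the read end of a pipe whose write end only the postmaster holds. A readiness report from select() is trusted only once a nonblocking read() confirms it: EOF means the write end is gone, while EAGAIN means the wakeup was spurious.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: put the pipe's read end into nonblocking mode. */
static int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Confirm a readiness report with a nonblocking read(). Only EAGAIN means
 * "spurious, the pipe is still open"; anything else (normally EOF, i.e.
 * read() returning 0) is treated as postmaster death.
 */
static bool
confirm_pipe_closed(int read_fd)
{
    char c;

    if (read(read_fd, &c, 1) < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return false;
    return true;
}

/*
 * Poll the pipe once without sleeping; report death only if select() marks
 * it readable AND the follow-up read() confirms it.
 */
static bool
postmaster_pipe_closed(int read_fd)
{
    fd_set rfds;
    struct timeval tv = {0, 0};

    FD_ZERO(&rfds);
    FD_SET(read_fd, &rfds);
    if (select(read_fd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return false;   /* timeout or select() error: nothing to report */
    if (!FD_ISSET(read_fd, &rfds))
        return false;
    return confirm_pipe_closed(read_fd);
}
```

Because confirm_pipe_closed() only relies on the descriptor being nonblocking, it also handles the spurious case from the man page: on an open, empty pipe it returns false instead of blocking.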

Btw, with the death-watch / life-sign / whatever infrastructure in place,
shouldn't PostmasterIsAlive() be using that instead of getppid() / kill(0)?
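
For comparison, a rough sketch of a pipe-based liveness check next to a getppid()-style one. All names are hypothetical; open_death_watch_pipe() assumes the postmaster creates the pipe before forking and that every child closes its copy of the write end, so the postmaster holds the only one. The pipe check needs no PID bookkeeping, whereas a kill(pid, 0) check can report a live process if the postmaster's PID has been reused.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical setup: the postmaster would call this before forking and keep
 * fds[1] open; each child would close its copy of fds[1].
 */
static int
open_death_watch_pipe(int fds[2])
{
    int flags;

    if (pipe(fds) != 0)
        return -1;
    flags = fcntl(fds[0], F_GETFL);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) != 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}

/* getppid()-style check for a direct child: an orphan gets reparented. */
static bool
alive_via_getppid(pid_t postmaster_pid)
{
    return getppid() == postmaster_pid;
}

/* Pipe-based check: independent of reparenting and of PID values. */
static bool
alive_via_pipe(int read_fd)
{
    char c;

    if (read(read_fd, &c, 1) < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return true;   /* write end still open, nothing to read */
    return false;      /* EOF (or anything but EAGAIN) is treated as death */
}
```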

best regards,
Florian Pflug

