Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol

2017-06-06 Thread Petr Jelinek
On 06/06/17 23:42, Andres Freund wrote:
> On 2017-06-06 23:24:50 +0200, Petr Jelinek wrote:
>> On 06/06/17 23:17, Andres Freund wrote:
>>> Right.  I found a couple more instance of similarly iffy, although not
>>> quite as broken, patterns in launcher.c.  It's easy to get this wrong,
>>> but it's a lot easy if you do it differently everywhere you use a
>>> latch.  It's not good if code in the same file, by the same author(s),
>>> has different ways of using latches.
>>
>> Huh? I see same pattern everywhere in launcher.c, what am I missing?
> 
> WaitForReplicationWorkerAttach:
> while (...)
> CHECK_FOR_INTERRUPTS();
> /* other stuff including returns */
> WaitLatch()
> WL_POSTMASTER_DEATH
> ResetLatch()
> 
> logicalrep_worker_stop loop 1:
> while (...)
> /* other stuff */
> CHECK_FOR_INTERRUPTS()
> WaitLatch()
> POSTMASTER_DEATH
> ResetLatch()
> /* other stuff including returns */
> logicalrep_worker_stop loop 1:
> while (...)
> /* other stuff including returns */
> CHECK_FOR_INTERRUPTS();
> WaitLatch()
> WL_POSTMASTER_DEATH
> ResetLatch()
> 
> ApplyLauncherMain:
> while (!got_SIGTERM)
> /* lots other stuff */
> WaitLatch()
> WL_POSTMASTER_DEATH
> /* some other stuff */
> ResetLatch()
> (note no CFI)
> 
> they're not hugely different, but subtely there are differences.
> Sometimes you're guaranteed to check for interrupts after resetting the
> latch, in other cases not. Sometimes expensive-ish things happen before
> a CFI...
> 

Ah that's because signals in launcher are broken, see
https://www.postgresql.org/message-id/fe072153-babd-3b5d-8052-73527a6eb...@2ndquadrant.com
which also includes patch to fix it.

We originally had custom signal handling everywhere, then I realized it
was mistake for workers because of locking interaction but missed same
issue with launcher (the CFI in current launcher doesn't work).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol

2017-06-06 Thread Andres Freund
On 2017-06-06 23:24:50 +0200, Petr Jelinek wrote:
> On 06/06/17 23:17, Andres Freund wrote:
> > Right.  I found a couple more instance of similarly iffy, although not
> > quite as broken, patterns in launcher.c.  It's easy to get this wrong,
> > but it's a lot easy if you do it differently everywhere you use a
> > latch.  It's not good if code in the same file, by the same author(s),
> > has different ways of using latches.
> 
> Huh? I see same pattern everywhere in launcher.c, what am I missing?

WaitForReplicationWorkerAttach:
while (...)
CHECK_FOR_INTERRUPTS();
/* other stuff including returns */
WaitLatch()
WL_POSTMASTER_DEATH
ResetLatch()

logicalrep_worker_stop loop 1:
while (...)
/* other stuff */
CHECK_FOR_INTERRUPTS()
WaitLatch()
POSTMASTER_DEATH
ResetLatch()
/* other stuff including returns */
logicalrep_worker_stop loop 1:
while (...)
/* other stuff including returns */
CHECK_FOR_INTERRUPTS();
WaitLatch()
WL_POSTMASTER_DEATH
ResetLatch()

ApplyLauncherMain:
while (!got_SIGTERM)
/* lots other stuff */
WaitLatch()
WL_POSTMASTER_DEATH
/* some other stuff */
ResetLatch()
(note no CFI)

they're not hugely different, but subtely there are differences.
Sometimes you're guaranteed to check for interrupts after resetting the
latch, in other cases not. Sometimes expensive-ish things happen before
a CFI...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol

2017-06-06 Thread Petr Jelinek
On 06/06/17 23:17, Andres Freund wrote:
> On 2017-06-06 17:14:59 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>>> The function  in $subject does:
>>
>>> ResetLatch(>procLatch);
>>> rc = WaitLatchOrSocket(>procLatch,
>>>WL_POSTMASTER_DEATH | WL_SOCKET_READABLE 
>>> |
>>>WL_LATCH_SET,
>>>PQsocket(streamConn),
>>>0,
>>>WAIT_EVENT_LIBPQWALRECEIVER);
>>
>> Yeah, this is certainly broken.
>>
>>> Afaict, the ResetLatch() really should just instead be in the if (rc & 
>>> WL_LATCH_SET) block.
>>
>> And, to be specific, it should be before the CHECK_FOR_INTERRUPTS call,
>> since that is the useful work that we want to be sure occurs after
>> any latch-setting event.
> 
> Right.  I found a couple more instance of similarly iffy, although not
> quite as broken, patterns in launcher.c.  It's easy to get this wrong,
> but it's a lot easy if you do it differently everywhere you use a
> latch.  It's not good if code in the same file, by the same author(s),
> has different ways of using latches.

Huh? I see same pattern everywhere in launcher.c, what am I missing?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol

2017-06-06 Thread Andres Freund
On 2017-06-06 17:14:59 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > The function  in $subject does:
> 
> > ResetLatch(>procLatch);
> > rc = WaitLatchOrSocket(>procLatch,
> >WL_POSTMASTER_DEATH | WL_SOCKET_READABLE 
> > |
> >WL_LATCH_SET,
> >PQsocket(streamConn),
> >0,
> >WAIT_EVENT_LIBPQWALRECEIVER);
> 
> Yeah, this is certainly broken.
> 
> > Afaict, the ResetLatch() really should just instead be in the if (rc & 
> > WL_LATCH_SET) block.
> 
> And, to be specific, it should be before the CHECK_FOR_INTERRUPTS call,
> since that is the useful work that we want to be sure occurs after
> any latch-setting event.

Right.  I found a couple more instance of similarly iffy, although not
quite as broken, patterns in launcher.c.  It's easy to get this wrong,
but it's a lot easy if you do it differently everywhere you use a
latch.  It's not good if code in the same file, by the same author(s),
has different ways of using latches.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol

2017-06-06 Thread Tom Lane
Andres Freund  writes:
> The function  in $subject does:

> ResetLatch(>procLatch);
> rc = WaitLatchOrSocket(>procLatch,
>WL_POSTMASTER_DEATH | WL_SOCKET_READABLE |
>WL_LATCH_SET,
>PQsocket(streamConn),
>0,
>WAIT_EVENT_LIBPQWALRECEIVER);

Yeah, this is certainly broken.

> Afaict, the ResetLatch() really should just instead be in the if (rc & 
> WL_LATCH_SET) block.

And, to be specific, it should be before the CHECK_FOR_INTERRUPTS call,
since that is the useful work that we want to be sure occurs after
any latch-setting event.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] libpqrcv_PQexec() seems to violate latch protocol

2017-06-06 Thread Andres Freund
Hi,

The function  in $subject does:
while (PQisBusy(streamConn))
{
int rc;

/*
 * We don't need to break down the sleep into smaller increments,
 * since we'll get interrupted by signals and can either handle
 * interrupts here or elog(FATAL) within SIGTERM signal handler if
 * the signal arrives in the middle of establishment of
 * replication connection.
 */
ResetLatch(>procLatch);
rc = WaitLatchOrSocket(>procLatch,
   WL_POSTMASTER_DEATH | WL_SOCKET_READABLE |
   WL_LATCH_SET,
   PQsocket(streamConn),
   0,
   WAIT_EVENT_LIBPQWALRECEIVER);
if (rc & WL_POSTMASTER_DEATH)
exit(1);
/* interrupted */
if (rc & WL_LATCH_SET)
{
CHECK_FOR_INTERRUPTS();
continue;
}

Doing ResetLatch();WaitLatch() like that makes it possible to miss a the
latch being set, e.g. if it happens just after WaitLatchOrSocket()
returns.

Afaict, the ResetLatch() really should just instead be in the if (rc & 
WL_LATCH_SET)
block.

Unless somebody protests, I'll make it so.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers