On Thu, Aug 6, 2015 at 2:38 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Aug 6, 2015 at 10:10 AM, Antonin Houska <a...@cybertec.at> wrote:
>> During my experiments with parallel workers I sometimes saw the "master" and
>> worker process blocked. The master uses shm queue to send data to the worker,
>> both sides nowait==false. I concluded that the following happened:
>>
>> The worker process set itself as a receiver on the queue after
>> shm_mq_wait_internal() has completed its first check of "ptr", so this
>> function left sender's procLatch in reset state. But before the procLatch was
>> reset, the receiver still managed to read some data and set sender's 
>> procLatch
>> to signal the reading, and eventually called its (receiver's) WaitLatch().
>>
>> So sender has effectively missed the receiver's notification and called
>> WaitLatch() too (if the receiver already waits on its latch, it does not help
>> for sender to call shm_mq_notify_receiver(): receiver won't do anything
>> because there's no new data in the queue).
>>
>> Below is my patch proposal.
>
> Another good catch.  However, I would prefer to fix this without
> introducing a "continue" as I think that will make the control flow
> clearer.  Therefore, I propose the attached variant of your idea.

Err, that doesn't work at all.  Have a look at this version instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index e765cea..0e60dbc 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -777,33 +777,37 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 			return SHM_MQ_DETACHED;
 		}
 
-		if (available == 0)
+		if (available == 0 && !mqh->mqh_counterparty_attached)
 		{
-			shm_mq_result res;
-
 			/*
 			 * The queue is full, so if the receiver isn't yet known to be
 			 * attached, we must wait for that to happen.
 			 */
-			if (!mqh->mqh_counterparty_attached)
+			if (nowait)
 			{
-				if (nowait)
+				if (shm_mq_get_receiver(mq) == NULL)
 				{
-					if (shm_mq_get_receiver(mq) == NULL)
-					{
-						*bytes_written = sent;
-						return SHM_MQ_WOULD_BLOCK;
-					}
-				}
-				else if (!shm_mq_wait_internal(mq, &mq->mq_receiver,
-											   mqh->mqh_handle))
-				{
-					mq->mq_detached = true;
 					*bytes_written = sent;
-					return SHM_MQ_DETACHED;
+					return SHM_MQ_WOULD_BLOCK;
 				}
-				mqh->mqh_counterparty_attached = true;
 			}
+			else if (!shm_mq_wait_internal(mq, &mq->mq_receiver,
+										   mqh->mqh_handle))
+			{
+				mq->mq_detached = true;
+				*bytes_written = sent;
+				return SHM_MQ_DETACHED;
+			}
+			mqh->mqh_counterparty_attached = true;
+
+			/*
+			 * The receiver may have read some data after attaching, so we
+			 * must not wait without rechecking the queue state.
+			 */
+		}
+		else if (available == 0)
+		{
+			shm_mq_result res;
 
 			/* Let the receiver know that we need them to read some data. */
 			res = shm_mq_notify_receiver(mq);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to