Re: [HACKERS] shared memory message queues

Robert Haas Mon, 09 Dec 2013 11:27:42 -0800

On Sun, Dec 8, 2013 at 5:52 AM, Kohei KaiGai <[email protected]> wrote:
> 2013/12/6 Kohei KaiGai <[email protected]>:
>> What will happen if the sender tries to send a large chunk that needs to
>> be split into multiple sub-chunks, and the receiver concurrently detaches
>> itself from the queue while the sender is writing?
>> It seems to me that the sender gets SHM_MQ_DETACHED and only the
>> earlier half of the chunk remains on the queue, even though
>> its total length was already written to the message queue.
>> It may eventually lead to an infinite loop on the receiver side if another
>> receiver attaches later and then reads the incomplete chunk.
>> Is that a feasible scenario? If so, a solution might be to prohibit
>> enqueuing anything without a receiver, and to reset the queue when a new
>> receiver is attached.
>>
> It isn't an intended usage to attach a peer process to a message
> queue from which one has already detached, is it?
> If so, a solution may be to put an ereport() in shm_mq_set_receiver()
> and shm_mq_set_sender() to prohibit assigning a process to a
> message queue with mq_detached = true. That would simplify the
> situation.


It's not intended that you should be able to attach a new reader or
writer in place of an old one that detached.  That would in fact be
pretty tricky to support, because if the detached process was in the
middle of reading or writing a message at the time it died, then
there's no way to recover protocol sync.  We could design some
mechanism for that, but in the case of background workers connected to
dynamic shared memory segments it isn't needed, because I assume that
when the background worker croaks, you're going to tear down the
dynamic shared memory segment and thus the whole queue will disappear;
if the user retries the query, we'll create a whole new segment
containing a whole new queue (or queues).
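
To make the loss of protocol sync concrete, here is a minimal sketch of a
length-prefixed byte stream. Everything in it (ToyStream, toy_write,
toy_read, the 4-byte length word) is a made-up toy for illustration and is
not the real shm_mq layout or API. A first writer announces 10 bytes but
dies after 4; if a second writer were then allowed to attach, its length
word and payload would be consumed as the tail of the first message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy length-prefixed stream; an assumption for illustration only. */
typedef struct ToyStream
{
    unsigned char buf[256];
    size_t      used;           /* bytes written so far */
} ToyStream;

/*
 * Append the length word, then only the first 'sent' bytes of the payload.
 * sent < len models a writer that dies in the middle of a message.
 */
void
toy_write(ToyStream *s, const void *data, uint32_t len, uint32_t sent)
{
    assert(sent <= len);
    assert(s->used + sizeof(len) + sent <= sizeof(s->buf));
    memcpy(s->buf + s->used, &len, sizeof(len));
    s->used += sizeof(len);
    memcpy(s->buf + s->used, data, sent);
    s->used += sent;
}

/*
 * Read one message starting at *pos into 'out', which must have room for
 * at least sizeof(s->buf) bytes.  Returns 1 on success, or 0 if fewer
 * bytes are available than the length word promises.
 */
int
toy_read(const ToyStream *s, size_t *pos, unsigned char *out, uint32_t *len)
{
    if (s->used - *pos < sizeof(*len))
        return 0;
    memcpy(len, s->buf + *pos, sizeof(*len));
    if (s->used - *pos - sizeof(*len) < *len)
        return 0;
    memcpy(out, s->buf + *pos + sizeof(*len), *len);
    *pos += sizeof(*len) + *len;
    return 1;
}
```

If you write "0123456789" with sent = 4 and then a complete 5-byte "hello",
the first toy_read() reports a 10-byte message that starts with "0123" and
continues with the second writer's length word, and the next toy_read()
finds too few bytes to even hold a length word. The reader has no way to
tell where the real message boundary was.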

Now, if we wanted to use these queues in permanent shared memory, we'd
probably need to think a little bit harder about this.  It is not
impossible to make it work even as things stand, because you could
reuse the same chunk of shared memory and just overwrite it with a
newly-initialized queue.  You'd need some mechanism to figure out when
to do that, and it might be kind of ugly, but I think it'd be doable.
That wasn't the design center for this attempt, though, and if we want
to use it that way then we probably should spend some time figuring
out how to support both a "clean" detach, where the reader or writer
goes away at a message boundary, and possibly also a "dirty" detach,
where the reader or writer goes away in the middle of a message.  I
view those as problems for future patches, though.

> Regarding test-shm-mq-v1.patch, setup_background_workers()
> tries to launch nworkers background worker processes, but
> may fail partway through if max_worker_processes is too low.
> Is this a situation where the BGWORKER_EPHEMERAL flag should be
> attached once your patch gets committed?

I dropped the proposal for BGWORKER_EPHEMERAL; I no longer think we
need that.  If not all of the workers can be registered,
setup_background_workers() will throw an error when
RegisterDynamicBackgroundWorker returns false.  If the workers are
successfully registered but don't get as far as connecting to the
shm_mq, wait_for_workers_to_become_ready() will detect that condition
and throw an error.  If all of the workers start up and attach to
the shared memory message queues but then later one of them dies, the
fact that it got as far as connecting to the shm_mq means that the
message queue's on_dsm_detach callback will run, which will mark the
queues to which it is connected as detached.  That will cause the
workers on either side of it to exit also until eventually the failure
propagates back around to the user backend.  This is all a bit complex
but I don't see a simpler solution.
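
The propagation can be sketched with a toy simulation. This is only a
model under my own assumptions, not code from the patch: the processes
form a ring with the user backend as process 0, process i writes to queue
i and reads from queue i - 1, a dying process marks both of its queues
detached, and any live worker that finds one of its queues detached exits
on the next step. The function name and TOY_MAX_PROCS are made up.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PROCS 16

/*
 * Returns how many propagation steps pass before the user backend
 * (process 0) sees one of its own queues marked detached, given that
 * worker 'dead' dies after attaching.  Returns -1 on bad arguments.
 */
int
toy_steps_until_backend_notices(int nprocs, int dead)
{
    bool        detached[TOY_MAX_PROCS] = {false};
    bool        alive[TOY_MAX_PROCS];
    int         steps = 0;

    if (nprocs < 2 || nprocs > TOY_MAX_PROCS || dead < 1 || dead >= nprocs)
        return -1;
    for (int i = 0; i < nprocs; i++)
        alive[i] = true;

    /* The detach callback marks both of the dead worker's queues. */
    alive[dead] = false;
    detached[dead] = true;
    detached[(dead + nprocs - 1) % nprocs] = true;

    /* The backend writes to queue 0 and reads from queue nprocs - 1. */
    while (!detached[0] && !detached[nprocs - 1])
    {
        bool        exiting[TOY_MAX_PROCS] = {false};

        /* First decide who exits, based on the state before this step. */
        for (int i = 1; i < nprocs; i++)
            if (alive[i] &&
                (detached[i] || detached[(i + nprocs - 1) % nprocs]))
                exiting[i] = true;

        /* Then let each exiting worker detach from both of its queues. */
        for (int i = 1; i < nprocs; i++)
            if (exiting[i])
            {
                alive[i] = false;
                detached[i] = true;
                detached[(i + nprocs - 1) % nprocs] = true;
            }
        steps++;
    }
    return steps;
}
```

With the backend plus 3 workers (nprocs = 4), a death of worker 1 or
worker 3 is visible to the backend immediately (0 steps), while a death of
worker 2 takes one step, because workers 1 and 3 have to exit first.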

> Also, test_shm_mq_setup() waits for the background worker processes
> to finish starting up. I'm uncertain whether that is really
> needed, because this shared memory message queue allows sending
> a byte stream without a receiver, and also blocks until a byte
> stream arrives from a peer that is set later.

That's actually a very important check.  Suppose we've got just 3
worker processes, so that the message queues are connected like this:

user backend -> worker 1 -> worker 2 -> worker 3 -> user backend

When test_shm_mq_setup connects to the queues linking it to worker 1
and worker 3, it passes a BackgroundWorkerHandle to shm_mq_attach. As
a result, if either worker 1 or worker 3 fails during startup, before
attaching to the queue, the user backend would notice that and error
out right away, even if it didn't do
wait_for_workers_to_become_ready().  However, if worker 2 fails during
startup, neither the user backend nor either of the other workers
would notice that without wait_for_workers_to_become_ready(): the user
backend isn't connected to worker 2 by a shm_mq at all, and workers 1
and 3 have no BackgroundWorkerHandle to pass to shm_mq_attach(),
because they're not the process that registered worker 2 in the first
place.  So everything would just hang.  The arrangement I've actually
got here should ensure that no matter how many workers you have and
which ones die at what point in their life cycle, everything will shut
down properly.  If that's not the case, it's a bug.
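
The case analysis above boils down to a very small predicate. The sketch
below is a hypothetical model, not code from the patch: it assumes workers
are numbered 1..nworkers around the ring, that a worker failing before it
attaches never marks any queue as detached, and that
wait_for_workers_to_become_ready() always catches a worker that failed to
start.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Would anyone notice that worker 'failed' died during startup, before
 * attaching to its queues?
 */
bool
toy_startup_failure_detected(int nworkers, int failed, bool wait_for_ready)
{
    assert(nworkers >= 1);
    assert(failed >= 1 && failed <= nworkers);

    /* The explicit readiness check covers every worker. */
    if (wait_for_ready)
        return true;

    /*
     * Otherwise only the user backend has a BackgroundWorkerHandle to
     * pass to shm_mq_attach, and only for the queues it is attached to:
     * the one to worker 1 and the one from the last worker.  No process
     * is watching the workers in the middle of the ring.
     */
    return failed == 1 || failed == nworkers;
}
```

For the 3-worker example, toy_startup_failure_detected(3, 2, false) is
false, which is the hang described above; passing true for the last
argument makes every case come out detected.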

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


