2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <oleksandr.shul...@zalando.de>
> On Mon, Sep 14, 2015 at 3:09 PM, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>> On Mon, Sep 14, 2015 at 2:11 PM, Tomas Vondra <
>> tomas.von...@2ndquadrant.com> wrote:
>>>> Now the backend that has been signaled on the second call to
>>>> pg_cmdstatus (it can be either some other backend, or the backend B
>>>> again) will not find an unprocessed slot, thus it will not try to
>>>> attach/detach the queue and the backend A will block forever.
>>>> This requires a really bad timing and the user should be able to
>>>> interrupt the querying backend A still.
>>> I think we can't rely on the low probability of this happening, and
>>> we should not rely on people interrupting the backend. Being able to detect
>>> the situation and fail gracefully should be possible.
>>> It may be possible to introduce some lock-less protocol preventing such
>>> situations, but it's not there at the moment. If you believe it's possible,
>>> you need to explain and "prove" that it's actually safe.
>>> Otherwise we may need to introduce some basic locking - for example we
>>> may introduce a LWLock for each slot, and lock it with dontWait=true (and
>>> skip it if we couldn't lock it). This should prevent most scenarios where
>>> one corrupted slot blocks many processes.
>> OK, I will revisit this part then.
> I have a radical proposal to remove the need for locking: make the
> CmdStatusSlot struct consist of a mere dsm_handle and move all the required
> metadata like sender_pid, request_type, etc. into the shared memory segment.
> If we allow only the requesting process to update the slot (that is
> the handle value itself) this removes the need for locking between sender
> and receiver.
> The sender will walk through the slots looking for a non-zero dsm handle
> (according to dsm_create() implementation 0 is considered an invalid
> handle), and if it finds a valid one, it will attach and look inside, to
> check if it's destined for this process ID. At first that might sound
> strange, but I would expect 99% of the time that the only valid slot would
> be for the process that has been just signaled.
> The sender process will then calculate the response message, update the
> result_code in the shared memory segment and finally send the message
> through the queue. If the receiver has since detached we get a detached
> result code and bail out.
> Clearing the slot after receiving the message should be the requesting
> process' responsibility. This way the receiver only writes to the slot and
> the sender only reads from it.
> By the way, is it safe to assume atomic read/writes of dsm_handle
> (uint32)? I would be surprised if not.
I don't see any reason why it should not work. Only a few processes will
wait for data, so the cost of the wasted attach/detach shm operations should
not be too high.
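
To make the proposed protocol concrete, here is a rough, self-contained
sketch of how I understand it. This is not the patch's code and makes several
assumptions: each requesting backend owns one slot index, the "segments"
array is a toy stand-in for what dsm_create()/dsm_attach() would provide,
SegmentHeader and all function names are hypothetical, and the message queue
is left out entirely. C11 atomics are used only to make the single-word
load/store of the handle explicit; whether plain uint32 access would be
enough is exactly the open question above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_SLOTS      8
#define INVALID_HANDLE 0u   /* 0 means "no request", like an invalid dsm handle */

/* Hypothetical metadata that would live inside the shared memory segment. */
typedef struct
{
    int target_pid;     /* backend that should answer the request */
    int request_type;
    int result_code;    /* written by the answering backend */
} SegmentHeader;

/* A slot holds nothing but a handle; only the slot's owner ever writes it. */
static _Atomic uint32_t slots[NUM_SLOTS];

/* Toy segment table: handle h maps to segments[h - 1]. */
static SegmentHeader segments[NUM_SLOTS];

/* Requester: fill in the segment first, then publish its handle. */
void
publish_request(int my_slot, uint32_t handle, int target_pid, int request_type)
{
    assert(my_slot >= 0 && my_slot < NUM_SLOTS);
    assert(handle != INVALID_HANDLE && handle <= NUM_SLOTS);

    SegmentHeader *seg = &segments[handle - 1];

    seg->target_pid = target_pid;
    seg->request_type = request_type;
    seg->result_code = 0;

    /* Release ordering: the metadata is visible before the handle is. */
    atomic_store_explicit(&slots[my_slot], handle, memory_order_release);
}

/*
 * Signaled backend: scan all slots, "attach" to every valid handle and
 * answer only the requests addressed to my_pid.  It never writes a slot.
 * Returns the number of requests answered.
 */
int
answer_requests(int my_pid, int result_code)
{
    int answered = 0;

    for (int i = 0; i < NUM_SLOTS; i++)
    {
        uint32_t handle = atomic_load_explicit(&slots[i], memory_order_acquire);

        if (handle == INVALID_HANDLE)
            continue;           /* empty slot */

        SegmentHeader *seg = &segments[handle - 1];

        if (seg->target_pid != my_pid)
            continue;           /* someone else's request: "detach" and move on */

        seg->result_code = result_code;
        answered++;
    }
    return answered;
}

/* Requester: clearing the slot is its own responsibility. */
void
clear_slot(int my_slot)
{
    assert(my_slot >= 0 && my_slot < NUM_SLOTS);
    atomic_store_explicit(&slots[my_slot], INVALID_HANDLE, memory_order_release);
}
```

In this sketch a requester would call publish_request(), signal the target,
wait for the answer, and then call clear_slot(). A backend that gets signaled
for a slot that has already been cleared simply finds nothing addressed to it
and returns 0, at the price of the extra attach/detach work mentioned above.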