2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <oleksandr.shul...@zalando.de>:
> On Mon, Sep 14, 2015 at 3:09 PM, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Mon, Sep 14, 2015 at 2:11 PM, Tomas Vondra <
>> tomas.von...@2ndquadrant.com> wrote:
>>
>>>
>>>> Now the backend that has been signaled on the second call to
>>>> pg_cmdstatus (it can be either some other backend, or the backend B
>>>> again) will not find an unprocessed slot, thus it will not try to
>>>> attach/detach the queue and the backend A will block forever.
>>>>
>>>> This requires really bad timing, and the user should still be able to
>>>> interrupt the querying backend A.
>>>>
>>>
>>> I think we can't rely on the low probability of this happening, and
>>> we should not rely on people interrupting the backend. Being able to detect
>>> the situation and fail gracefully should be possible.
>>>
>>> It may be possible to introduce some lock-less protocol preventing such
>>> situations, but it's not there at the moment. If you believe it's possible,
>>> you need to explain and "prove" that it's actually safe.
>>>
>>> Otherwise we may need to introduce some basic locking - for example we
>>> may introduce a LWLock for each slot, and lock it with dontWait=true (and
>>> skip it if we couldn't lock it). This should prevent most scenarios where
>>> one corrupted slot blocks many processes.
>>
>>
>> OK, I will revisit this part then.
>>
>
> I have a radical proposal to remove the need for locking: make the
> CmdStatusSlot struct consist of a mere dsm_handle and move all the required
> metadata like sender_pid, request_type, etc. into the shared memory segment
> itself.
>
> If we allow only the requesting process to update the slot (that is
> the handle value itself) this removes the need for locking between sender
> and receiver.
>
> The sender will walk through the slots looking for a non-zero dsm handle
> (according to dsm_create() implementation 0 is considered an invalid
> handle), and if it finds a valid one, it will attach and look inside, to
> check if it's destined for this process ID. At first that might sound
> strange, but I would expect 99% of the time that the only valid slot would
> be for the process that has been just signaled.
>
> The sender process will then calculate the response message, update the
> result_code in the shared memory segment and finally send the message
> through the queue. If the receiver has since detached we get a detached
> result code and bail out.
>
> Clearing the slot after receiving the message should be the requesting
> process' responsibility. This way the receiver only writes to the slot and
> the sender only reads from it.
>
> By the way, is it safe to assume atomic read/writes of dsm_handle
> (uint32)? I would be surprised if not.
>

I don't see any reason why it should not work. Only a few processes will
wait for data, so the wasted attach/detach shm operations should not add up
to much.

Pavel

>
> --
> Alex
>
>
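
For illustration, here is a minimal sketch of the slot protocol proposed above, as I read it. It is a single-process model in plain C11, not PostgreSQL code. The names Segment, slots, segments, publish_request, process_requests and finish_request are all made up for this example, and a plain array stands in for the DSM segments. The check that skips already-answered requests is my own assumption and is not part of the proposal. The real thing would presumably use dsm_attach()/dsm_detach() and send the response through the queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_SLOTS      8    /* hypothetical slot count */
#define INVALID_HANDLE 0u   /* the thread treats 0 as an invalid dsm_handle */

enum { RESULT_PENDING = 0, RESULT_OK = 1 };   /* hypothetical result codes */

/* Stand-in for the metadata that the proposal moves into the segment. */
typedef struct {
    int target_pid;     /* process expected to answer the request */
    int request_type;
    int result_code;
} Segment;

/* Handle h simply indexes this array; index 0 is never a valid handle. */
static Segment segments[NUM_SLOTS + 1];

/* A slot is nothing but a handle, written only by the requesting process. */
static _Atomic uint32_t slots[NUM_SLOTS];

/* Requester: fill in the segment first, then publish the handle. */
static void publish_request(int slot, uint32_t handle,
                            int target_pid, int request_type)
{
    assert(handle != INVALID_HANDLE && handle <= NUM_SLOTS);
    Segment *seg = &segments[handle];
    seg->target_pid = target_pid;
    seg->request_type = request_type;
    seg->result_code = RESULT_PENDING;
    /* Release ordering: the segment contents become visible before the handle. */
    atomic_store_explicit(&slots[slot], handle, memory_order_release);
}

/* Responder: reads the slots but never writes them. Returns requests answered. */
static int process_requests(int my_pid)
{
    int answered = 0;
    for (int i = 0; i < NUM_SLOTS; i++) {
        uint32_t h = atomic_load_explicit(&slots[i], memory_order_acquire);
        if (h == INVALID_HANDLE)
            continue;                       /* empty slot */
        Segment *seg = &segments[h];        /* real code would attach here */
        if (seg->target_pid != my_pid)
            continue;                       /* not for us; real code would detach */
        if (seg->result_code != RESULT_PENDING)
            continue;                       /* assumption: skip answered requests */
        seg->result_code = RESULT_OK;       /* then send the message via the queue */
        answered++;
    }
    return answered;
}

/* Requester: clear its own slot after receiving the response. */
static int finish_request(int slot)
{
    uint32_t h = atomic_exchange_explicit(&slots[slot], INVALID_HANDLE,
                                          memory_order_acq_rel);
    return h == INVALID_HANDLE ? RESULT_PENDING : segments[h].result_code;
}
```

The point of the sketch is the ownership rule: only publish_request and finish_request (the requester) ever store to a slot, while process_requests only loads from it. The sketch does not model concurrency, so it says nothing about the timing problem discussed earlier in the thread.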