We still need to decide what to do about queue-full situations in the proposed listen/notify implementation. I have a new version of the patch that allows for a variable payload size. However, the whole notification must fit into one page, so the payload needs to be less than 8K.
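As a minimal standalone sketch of that constraint (not from the patch; QUEUE_PAGESZ, QUEUE_ENTRY_OVERHEAD, and payload_fits are hypothetical names, and the overhead value is an assumption):

```c
#include <stdbool.h>
#include <string.h>

/* Assumed layout: a page is 8192 bytes, and some of it is taken up by
 * the page header and the notification's own metadata (channel name,
 * XID, lengths), so the usable payload is somewhat less than 8K.
 * QUEUE_ENTRY_OVERHEAD is a placeholder value, not the patch's. */
#define QUEUE_PAGESZ          8192
#define QUEUE_ENTRY_OVERHEAD  64
#define MAX_PAYLOAD           (QUEUE_PAGESZ - QUEUE_ENTRY_OVERHEAD)

/* Return true if the payload (plus its terminating NUL) fits into a
 * single queue page. */
static bool
payload_fits(const char *payload)
{
    return strlen(payload) + 1 <= MAX_PAYLOAD;
}
```

A writer would reject a NOTIFY with an oversized payload up front rather than discovering mid-write that it spans pages.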
I have also added the XID, so that we can write to the queue before committing to clog; this allows for rollback if we encounter write errors (disk full, for example). The implications of this change in particular make the patch a lot more complicated.

The queue is SLRU-based, and SLRU uses int page numbers, so with some small changes in slru.c we can use up to 2147483647 (INT_MAX) pages.

When do we have a full queue? Well, the idea is that notifications are written to the queue and are read as soon as the notifying transaction commits. Only if a listening backend is busy will it fail to read the notifications, and so it won't advance its pointer for some time. With the current space we can accommodate at least 2147483647 notifications, or more depending on the payload length. That gives us something between 214 GB (100 bytes per notification) and 17 TB (8000 bytes per notification). So in order to fill the queue, we need to generate that amount of notifications while one backend is still busy and not reading the accumulating notifications. In general the chances that anyone will ever have a full notification queue are not too high, but we need to define the behavior anyway... These are the solutions that I currently see:

1) Drop new notifications if the queue is full (silently or with rollback).

2) Block until readers catch up (but what if the backend that tries to write the notifications is itself the "lazy" reader that everybody is waiting for to proceed?).

3) Invent a new signal reason and send SIGUSR1 to the "lazy" readers; they need to interrupt whatever they are doing and copy the notifications into their own address space (without delivering the notifications, since they are in a transaction at that moment).

For 1) there can be warnings well ahead of when the queue is actually full, like one when it is 50% full, another when it is 75% full, and so on, and they could point to the backend that is furthest behind in reading notifications...
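The early-warning idea in 1) amounts to computing how full the ring of pages is from the writer's head page and the slowest reader's tail page. A standalone sketch of that arithmetic, assuming the queue wraps modulo INT_MAX pages (queue_fill_percent is my name, not the patch's):

```c
/* The SLRU queue is treated as a ring of INT_MAX pages identified by
 * int page numbers that wrap around.  Given the head (where the next
 * notification will be written) and the tail (the page of the backend
 * furthest behind), the fill fraction is the distance between them
 * modulo the ring size; warnings could fire at 50%, 75%, and so on. */
#define QUEUE_MAX_PAGES 2147483647LL   /* INT_MAX pages in slru.c */

static int
queue_fill_percent(long long head, long long tail)
{
    /* Normalize (head - tail) into [0, QUEUE_MAX_PAGES) even when the
     * head has wrapped past the tail. */
    long long used = ((head - tail) % QUEUE_MAX_PAGES + QUEUE_MAX_PAGES)
                     % QUEUE_MAX_PAGES;

    return (int) (used * 100 / QUEUE_MAX_PAGES);
}
```

A writer would call this before appending: below 100% it appends (perhaps emitting a warning past a threshold), at 100% it applies whichever of the three policies is chosen.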
I think that 2) is the least practical approach. If there is a pile of at least 2,147,483,647 notifications, then a backend hasn't read notifications for a long, long time... Chances are low that it will read them within the next few seconds. In a sense 2) implies 3) for the special case that the writing backend is the one everybody is waiting for to proceed with reading notifications; in the end this backend would be waiting for itself.

For 3) the question is whether we can just invent a new signal reason, PROCSIG_NOTIFYCOPY_INTERRUPT or similar, upon reception of which the backend copies the notification data to its private address space. Would this function be called by every backend within at most a few seconds, even if it is processing a long-running query? Admittedly, once 3) is in place we could also put a smaller queue into shared memory and remove the SLRU thing altogether, but then we need to be sure that we can interrupt the backends at any time, since the queue size would be a lot smaller than 200 GB...

Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers