Re: proposal: make NOTIFY list de-duplication optional
I wrote: > I think that we'd probably be better off fixing the root performance issue > than adding semantic complexity to bypass it. ... > Accordingly, I looked into making a hash table when there are more than > a small number of notifications pending, and attached is a lightly-tested > version of that. This seems to be more or less similar speed to the > existing code for up to 100 or so distinct notifies, but it soon pulls > away above that. I noticed that the cfbot was unhappy with this, because it (intentionally) changes the results of the async-notify isolation tests I added awhile ago. So here's an updated version that adjusts that test, and also changes the NOTIFY documentation to remove the old weasel wording about whether we de-dup or not. > I also noticed that as things stand, it costs us two or three pallocs to > construct a Notification event. It wouldn't be terribly hard to reduce > that to one palloc, and maybe it'd be worthwhile if we're thinking that > transactions with many many notifies are a case worth optimizing. > But I didn't do that here; it seems like a separable patch. I also did that, attached as the second patch below. This way ends up requiring us to palloc the Notification event and then pfree it again, if it turns out to be a dup. Despite that, it's faster than the first patch alone, and also faster than HEAD in every case I tried. Not much faster, if there's not a lot of dups, but as far as I can find there isn't any case where it loses compared to HEAD. Even with subtransactions, where in principle the time to merge subtransaction event lists into the parent transaction ought to cost us. You can't get that to matter unless the subtransaction had a lot of distinct events, and then HEAD hits its O(N^2) behavior inside the subxact. So I can't really see any reason not to commit these. That leaves the question of whether we want to continue pursuing the proposed feature for user control of de-duping. I'd tend to vote against, because it seems like semantic complexity we don't need. While the idea sounds straightforward, I think it isn't so much when you start to think hard about how notifies issued with and without "collapse" ought to interact. regards, tom lane diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml index e0e125a..d7dcbea 100644 --- a/doc/src/sgml/ref/notify.sgml +++ b/doc/src/sgml/ref/notify.sgml @@ -94,9 +94,9 @@ NOTIFY channel [ , - If the same channel name is signaled multiple times from the same - transaction with identical payload strings, the - database server can decide to deliver a single notification only. + If the same channel name is signaled multiple times with identical + payload strings within the same transaction, only one instance of the + notification event is delivered to listeners. On the other hand, notifications with distinct payload strings will always be delivered as distinct notifications. Similarly, notifications from different transactions will never get folded into one notification. diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index 6e9c580..3f5f054 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -135,6 +135,7 @@ #include "storage/sinval.h" #include "tcop/tcopprot.h" #include "utils/builtins.h" +#include "utils/hashutils.h" #include "utils/memutils.h" #include "utils/ps_status.h" #include "utils/snapmgr.h" @@ -323,14 +324,25 @@ static List *upperPendingActions = NIL; /* list of upper-xact lists */ /* * State for outbound notifies consists of a list of all channels+payloads - * NOTIFYed in the current transaction. We do not actually perform a NOTIFY - * until and unless the transaction commits. pendingNotifies is NIL if no - * NOTIFYs have been done in the current transaction. + * NOTIFYed in the current transaction. We do not actually perform a NOTIFY + * until and unless the transaction commits. pendingNotifies is NULL if no + * NOTIFYs have been done in the current (sub) transaction. + * + * We discard duplicate notify events issued in the same transaction. + * Hence, in addition to the list proper (which we need to track the order + * of the events, since we guarantee to deliver them in order), we build a + * hash table which we can probe to detect duplicates. Since building the + * hash table is somewhat expensive, we do so only once we have at least + * MIN_HASHABLE_NOTIFIES events queued in the current (sub) transaction; + * before that we just scan the events linearly. * * The list is kept in CurTransactionContext. In subtransactions, each * subtransaction has its own list in its own CurTransactionContext, but - * successful subtransactions attach their lists to their parent's list. - * Failed subtransactions simply discard their lists. + * successful subtransactions add their entries to their parent's list. + * Failed subtransactions simply discard their
Re: proposal: make NOTIFY list de-duplication optional
=?UTF-8?Q?Filip_Rembia=C5=82kowski?= writes: > Still no hash table fallback is implemented, so this is *not* a > performance improvement. Only a little more flexibility. I think that we'd probably be better off fixing the root performance issue than adding semantic complexity to bypass it. Especially since, if you don't de-duplicate, that's going to cost you when it comes time to actually send the notifications, receive them, forward them to clients, and process them in the clients. Admittedly, if the app *knows* that it's generating non-duplicate events, maybe it'd be all right to skip checking that. But if we can make the check cheap, I'd just as soon keep it. Accordingly, I looked into making a hash table when there are more than a small number of notifications pending, and attached is a lightly-tested version of that. This seems to be more or less similar speed to the existing code for up to 100 or so distinct notifies, but it soon pulls away above that. A point that needs discussion is that this patch, unlike the existing code, *does* de-duplicate fully: any events generated by a subtransaction that duplicate events already emitted by a parent will get removed when the subxact is merged to its parent. I did this partly because we have to expend O(N) work to merge N subtransaction notifies in any case, now that we have to make new hashtable entries in the parent xact; so the old excuse that subxact-end processing is really cheap no longer applies. Also because the Assert(!found) assertions in the attached hash coding fall over if we cheat on this. If we really want to maintain exactly the old semantics here, we could relax the hashtable code to just ignore duplicate entries. But, per the argument above, de-duplication is a good thing so I'm inclined to keep it like this. I also noticed that as things stand, it costs us two or three pallocs to construct a Notification event. It wouldn't be terribly hard to reduce that to one palloc, and maybe it'd be worthwhile if we're thinking that transactions with many many notifies are a case worth optimizing. But I didn't do that here; it seems like a separable patch. I also thought for awhile about not having the hashtable as an auxiliary data structure, but making it the main data structure. We could preserve the required notification ordering information by threading the live hashtable entries into an slist, say. However, this would greatly increase the overhead for transactions with just one or a few distinct NOTIFY events, since we'd have to set up the hashtable at the first one. I think that's a common enough case that we shouldn't de-optimize it. A smaller objection is that such a data structure would absolutely commit us to de-duplication semantics, whereas the list plus separate hashtable can cope with not de-duping if someone persuades us that's sane. Thoughts? regards, tom lane diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index 6e9c580..c21daa5 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -135,6 +135,7 @@ #include "storage/sinval.h" #include "tcop/tcopprot.h" #include "utils/builtins.h" +#include "utils/hashutils.h" #include "utils/memutils.h" #include "utils/ps_status.h" #include "utils/snapmgr.h" @@ -323,14 +324,25 @@ static List *upperPendingActions = NIL; /* list of upper-xact lists */ /* * State for outbound notifies consists of a list of all channels+payloads - * NOTIFYed in the current transaction. We do not actually perform a NOTIFY - * until and unless the transaction commits. pendingNotifies is NIL if no - * NOTIFYs have been done in the current transaction. + * NOTIFYed in the current transaction. We do not actually perform a NOTIFY + * until and unless the transaction commits. pendingNotifies is NULL if no + * NOTIFYs have been done in the current (sub) transaction. + * + * We discard duplicate notify events issued in the same transaction. + * Hence, in addition to the list proper (which we need to track the order + * of the events, since we guarantee to deliver them in order), we build a + * hash table which we can probe to detect duplicates. Since building the + * hash table is somewhat expensive, we do so only once we have at least + * MIN_HASHABLE_NOTIFIES events queued in the current (sub) transaction; + * before that we just scan the events linearly. * * The list is kept in CurTransactionContext. In subtransactions, each * subtransaction has its own list in its own CurTransactionContext, but - * successful subtransactions attach their lists to their parent's list. - * Failed subtransactions simply discard their lists. + * successful subtransactions add their entries to their parent's list. + * Failed subtransactions simply discard their lists. Since these lists + * are independent, there may be notify events in a subtransaction's list + * that duplicate events in some ancestor (sub) transaction; we get rid
Re: Re: proposal: make NOTIFY list de-duplication optional
Thank you. Here is my latest attempt, with actual syntax error handling. Also, the syntax is updated to what Tom Lane suggested in other thread (with another variant of the same thing, from Julien Demoor) NOTIFY [ ( option [, ...] ) ] channel [ , payload ] Still no hash table fallback is implemented, so this is *not* a performance improvement. Only a little more flexibility. On Sat, Mar 9, 2019 at 3:31 AM Thomas Munro wrote: > > On Fri, Mar 8, 2019 at 1:37 PM Filip Rembiałkowski > wrote: > > See attached patch... I'm ready to work on so it can get merged in the next > > CF. > > Hi Filip, > > Seen on Travis: > > async... FAILED 126 ms > > Looks like the new error isn't being raised for invalid send mode? > (What kind of error message is "?" anyway? :-)) > > ERROR: channel name too long > -- Should fail. Invalid 3rd parameter > NOTIFY notify_async2, 'test', 'invalid'; > -ERROR: ? > NOTIFY notify_async2, 'test', true; > -ERROR: ? > --Should work. Valid NOTIFY/LISTEN/UNLISTEN commands > NOTIFY notify_async2; > NOTIFY notify_async2, ''; > > -- > Thomas Munro > https://enterprisedb.com contrib/tcn/tcn.c | 2 +- doc/src/sgml/ref/notify.sgml| 64 +++-- src/backend/commands/async.c| 93 +++-- src/backend/nodes/copyfuncs.c | 7 +-- src/backend/nodes/equalfuncs.c | 7 +-- src/backend/nodes/outfuncs.c| 3 +- src/backend/nodes/readfuncs.c | 3 +- src/backend/parser/gram.y | 78 ++- src/backend/tcop/utility.c | 8 ++-- src/backend/utils/adt/ruleutils.c | 2 +- src/include/catalog/pg_proc.dat | 4 ++ src/include/commands/async.h| 4 +- src/include/nodes/parsenodes.h | 12 +++-- src/test/regress/expected/async.out | 21 + src/test/regress/sql/async.sql | 10 15 files changed, 257 insertions(+), 61 deletions(-) diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c index 5355a64c5e..b80337a5ce 100644 --- a/contrib/tcn/tcn.c +++ b/contrib/tcn/tcn.c @@ -161,7 +161,7 @@ triggered_change_notification(PG_FUNCTION_ARGS) strcpy_quoted(payload, SPI_getvalue(trigtuple, tupdesc, colno), '\''); } -Async_Notify(channel, payload->data); +Async_Notify(channel, payload->data, true); } ReleaseSysCache(indexTuple); break; diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml index e0e125a2a2..e06b00f1f3 100644 --- a/doc/src/sgml/ref/notify.sgml +++ b/doc/src/sgml/ref/notify.sgml @@ -21,7 +21,11 @@ PostgreSQL documentation -NOTIFY channel [ , payload ] +NOTIFY [ ( option [, ...] ) ] channel [ , payload ] + +where option can be one of: + +COLLAPSE [ boolean ] @@ -47,10 +51,10 @@ NOTIFY channel [ , - The information passed to the client for a notification event includes the - notification channel - name, the notifying session's server process PID, and the - payload string, which is an empty string if it has not been specified. + The information passed to the client for a notification event includes + the notification channel name, the notifying session's server process + PID, and the payload string, which is an empty string + if it has not been specified. @@ -92,21 +96,13 @@ NOTIFY channel [ , NOTIFY for real-time signaling should try to keep their transactions short. - - If the same channel name is signaled multiple times from the same - transaction with identical payload strings, the - database server can decide to deliver a single notification only. - On the other hand, notifications with distinct payload strings will - always be delivered as distinct notifications. Similarly, notifications from - different transactions will never get folded into one notification. Except for dropping later instances of duplicate notifications, NOTIFY guarantees that notifications from the same transaction get delivered in the order they were sent. It is also - guaranteed that messages from different transactions are delivered in - the order in which the transactions committed. + guaranteed that messages from different transactions are delivered + in the order in which the transactions committed. - It is common for a client that executes NOTIFY to be listening on the same notification channel itself. In that case @@ -147,6 +143,21 @@ NOTIFY channel [ , + +COLLAPSE + + + Controls collapsing of repeated messages. + + When set to on, notification will be skipped if + the same channel was already signaled with identical payload in the same + transaction block (finished with COMMIT, + END or SAVEPOINT). + + When set to off, duplicates will not be collapsed. + + + @@ -190,6 +201,13 @@ NOTIFY channel [ , NOTIFY command if you need to work with non-constant
Re: Re: proposal: make NOTIFY list de-duplication optional
On Fri, Mar 8, 2019 at 1:37 PM Filip Rembiałkowski wrote: > See attached patch... I'm ready to work on so it can get merged in the next > CF. Hi Filip, Seen on Travis: async... FAILED 126 ms Looks like the new error isn't being raised for invalid send mode? (What kind of error message is "?" anyway? :-)) ERROR: channel name too long -- Should fail. Invalid 3rd parameter NOTIFY notify_async2, 'test', 'invalid'; -ERROR: ? NOTIFY notify_async2, 'test', true; -ERROR: ? --Should work. Valid NOTIFY/LISTEN/UNLISTEN commands NOTIFY notify_async2; NOTIFY notify_async2, ''; -- Thomas Munro https://enterprisedb.com
Re: Re: proposal: make NOTIFY list de-duplication optional
Thanks for all the input. Finally I found time and motivation to revive this. See attached patch... I'm ready to work on so it can get merged in the next CF. On Thu, Mar 17, 2016 at 12:44 AM David Steele wrote: > > On 3/11/16 1:46 PM, David Steele wrote: > > Hi Filip, > > > > On 2/20/16 8:00 AM, Filip Rembiałkowski wrote: > >> On Fri, Feb 19, 2016 at 10:09 PM, Catalin Iacob >> On 2/9/16, Tom Lane mailto:t...@sss.pgh.pa.us>> > >> wrote: > >> > FWIW, I think it would be a good thing if the NOTIFY statement > >> syntax were > >> > not remarkably different from the syntax used in the pg_notify() > >> function > >> > call. To do otherwise would certainly be confusing. So on the > >> whole > >> > I'd go with the "NOTIFY channel [ , payload [ , mode ] ]" option. > >> > >> Filip, do you agree with Tom's proposal? Do you plan to rework the > >> patch on these lines? If you are I'll try to review it, if not I could > >> give it a shot as I'm interested in having this in 9.6. > >> > >> I see that Tom's remarks give more flexibility, and your refinement > >> makes sense. > > > > It looks like we are waiting on a new patch from you before this can be > > reviewed. Are you close to having that done? > > > > Meanwhile, I have marked it "Waiting on Author". > > Since there has been no activity on this thread since before the CF and > no response from the author I have marked this "returned with feedback". > Please feel free to resubmit for 9.7! > > -- > -David > da...@pgmasters.net contrib/tcn/tcn.c | 2 +- doc/src/sgml/ref/notify.sgml| 63 + src/backend/commands/async.c| 69 +++-- src/backend/nodes/copyfuncs.c | 7 ++-- src/backend/nodes/equalfuncs.c | 7 ++-- src/backend/nodes/outfuncs.c| 3 +- src/backend/nodes/readfuncs.c | 3 +- src/backend/parser/gram.y | 44 +-- src/backend/tcop/utility.c | 8 ++--- src/backend/utils/adt/ruleutils.c | 4 ++- src/include/catalog/pg_proc.dat | 4 +++ src/include/commands/async.h| 8 - src/include/nodes/parsenodes.h | 18 +++--- src/test/regress/expected/async.out | 20 +++ src/test/regress/sql/async.sql | 9 + 15 files changed, 212 insertions(+), 57 deletions(-) diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c index 5355a64c5e..196e0989a4 100644 --- a/contrib/tcn/tcn.c +++ b/contrib/tcn/tcn.c @@ -161,7 +161,7 @@ triggered_change_notification(PG_FUNCTION_ARGS) strcpy_quoted(payload, SPI_getvalue(trigtuple, tupdesc, colno), '\''); } -Async_Notify(channel, payload->data); +Async_Notify(channel, payload->data, NOTIFY_SEND_UNIQUE); } ReleaseSysCache(indexTuple); break; diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml index e0e125a2a2..ad0795fd2d 100644 --- a/doc/src/sgml/ref/notify.sgml +++ b/doc/src/sgml/ref/notify.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -NOTIFY channel [ , payload ] +NOTIFY channel [ , payload [ , send_mode ] ] @@ -47,10 +47,10 @@ NOTIFY channel [ , - The information passed to the client for a notification event includes the - notification channel - name, the notifying session's server process PID, and the - payload string, which is an empty string if it has not been specified. + The information passed to the client for a notification event includes + the notification channel name, the notifying session's server process + PID, and the payload string, which is an empty string + if it has not been specified. @@ -92,21 +92,13 @@ NOTIFY channel [ , NOTIFY for real-time signaling should try to keep their transactions short. - - If the same channel name is signaled multiple times from the same - transaction with identical payload strings, the - database server can decide to deliver a single notification only. - On the other hand, notifications with distinct payload strings will - always be delivered as distinct notifications. Similarly, notifications from - different transactions will never get folded into one notification. Except for dropping later instances of duplicate notifications, NOTIFY guarantees that notifications from the same transaction get delivered in the order they were sent. It is also - guaranteed that messages from different transactions are delivered in - the order in which the transactions committed. + guaranteed that messages from different transactions are delivered + in the order in which the transactions committed. - It is common for a client that executes NOTIFY to be listening on the same notification channel itself. In that case @@ -147,6 +139,23 @@ NOTIFY channel [ , + +send_mode + + + Controls collapsing of repeated messages. Can be either + 'unique' (the default) or