Re: proposal: make NOTIFY list de-duplication optional

2019-08-14 Thread Tom Lane
I wrote:
> I think that we'd probably be better off fixing the root performance issue
> than adding semantic complexity to bypass it. ...
> Accordingly, I looked into making a hash table when there are more than
> a small number of notifications pending, and attached is a lightly-tested
> version of that.  This seems to be more or less similar speed to the
> existing code for up to 100 or so distinct notifies, but it soon pulls
> away above that.

I noticed that the cfbot was unhappy with this, because it (intentionally)
changes the results of the async-notify isolation tests I added awhile
ago.  So here's an updated version that adjusts that test, and also
changes the NOTIFY documentation to remove the old weasel wording about
whether we de-dup or not.

> I also noticed that as things stand, it costs us two or three pallocs to
> construct a Notification event.  It wouldn't be terribly hard to reduce
> that to one palloc, and maybe it'd be worthwhile if we're thinking that
> transactions with many many notifies are a case worth optimizing.
> But I didn't do that here; it seems like a separable patch.

I also did that, attached as the second patch below.  This way ends up
requiring us to palloc the Notification event and then pfree it again,
if it turns out to be a dup.  Despite that, it's faster than the first
patch alone, and also faster than HEAD in every case I tried.  Not
much faster, if there's not a lot of dups, but as far as I can find
there isn't any case where it loses compared to HEAD.  Even with
subtransactions, where in principle the time to merge subtransaction
event lists into the parent transaction ought to cost us.  You can't
get that to matter unless the subtransaction had a lot of distinct
events, and then HEAD hits its O(N^2) behavior inside the subxact.

So I can't really see any reason not to commit these.

That leaves the question of whether we want to continue pursuing
the proposed feature for user control of de-duping.  I'd tend
to vote against, because it seems like semantic complexity we
don't need.  While the idea sounds straightforward, I think it
isn't so much when you start to think hard about how notifies
issued with and without "collapse" ought to interact.

regards, tom lane

diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml
index e0e125a..d7dcbea 100644
--- a/doc/src/sgml/ref/notify.sgml
+++ b/doc/src/sgml/ref/notify.sgml
@@ -94,9 +94,9 @@ NOTIFY channel [ , 
 
   
-   If the same channel name is signaled multiple times from the same
-   transaction with identical payload strings, the
-   database server can decide to deliver a single notification only.
+   If the same channel name is signaled multiple times with identical
+   payload strings within the same transaction, only one instance of the
+   notification event is delivered to listeners.
On the other hand, notifications with distinct payload strings will
always be delivered as distinct notifications. Similarly, notifications from
different transactions will never get folded into one notification.
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 6e9c580..3f5f054 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -135,6 +135,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/hashutils.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -323,14 +324,25 @@ static List *upperPendingActions = NIL; /* list of upper-xact lists */
 
 /*
  * State for outbound notifies consists of a list of all channels+payloads
- * NOTIFYed in the current transaction. We do not actually perform a NOTIFY
- * until and unless the transaction commits.  pendingNotifies is NIL if no
- * NOTIFYs have been done in the current transaction.
+ * NOTIFYed in the current transaction.  We do not actually perform a NOTIFY
+ * until and unless the transaction commits.  pendingNotifies is NULL if no
+ * NOTIFYs have been done in the current (sub) transaction.
+ *
+ * We discard duplicate notify events issued in the same transaction.
+ * Hence, in addition to the list proper (which we need to track the order
+ * of the events, since we guarantee to deliver them in order), we build a
+ * hash table which we can probe to detect duplicates.  Since building the
+ * hash table is somewhat expensive, we do so only once we have at least
+ * MIN_HASHABLE_NOTIFIES events queued in the current (sub) transaction;
+ * before that we just scan the events linearly.
  *
  * The list is kept in CurTransactionContext.  In subtransactions, each
  * subtransaction has its own list in its own CurTransactionContext, but
- * successful subtransactions attach their lists to their parent's list.
- * Failed subtransactions simply discard their lists.
+ * successful subtransactions add their entries to their parent's list.
+ * Failed subtransactions simply discard their 

Re: proposal: make NOTIFY list de-duplication optional

2019-07-26 Thread Tom Lane
=?UTF-8?Q?Filip_Rembia=C5=82kowski?=  writes:
> Still no hash table fallback is implemented, so this is *not* a
> performance improvement. Only a little more flexibility.

I think that we'd probably be better off fixing the root performance issue
than adding semantic complexity to bypass it.  Especially since, if you
don't de-duplicate, that's going to cost you when it comes time to
actually send the notifications, receive them, forward them to clients,
and process them in the clients.

Admittedly, if the app *knows* that it's generating non-duplicate events,
maybe it'd be all right to skip checking that.  But if we can make the
check cheap, I'd just as soon keep it.

Accordingly, I looked into making a hash table when there are more than
a small number of notifications pending, and attached is a lightly-tested
version of that.  This seems to be more or less similar speed to the
existing code for up to 100 or so distinct notifies, but it soon pulls
away above that.

A point that needs discussion is that this patch, unlike the existing
code, *does* de-duplicate fully: any events generated by a subtransaction
that duplicate events already emitted by a parent will get removed when
the subxact is merged to its parent.  I did this partly because we have
to expend O(N) work to merge N subtransaction notifies in any case,
now that we have to make new hashtable entries in the parent xact;
so the old excuse that subxact-end processing is really cheap no
longer applies.  Also because the Assert(!found) assertions in the
attached hash coding fall over if we cheat on this.  If we really want
to maintain exactly the old semantics here, we could relax the hashtable
code to just ignore duplicate entries.  But, per the argument above,
de-duplication is a good thing so I'm inclined to keep it like this.

I also noticed that as things stand, it costs us two or three pallocs to
construct a Notification event.  It wouldn't be terribly hard to reduce
that to one palloc, and maybe it'd be worthwhile if we're thinking that
transactions with many many notifies are a case worth optimizing.
But I didn't do that here; it seems like a separable patch.

I also thought for awhile about not having the hashtable as an auxiliary
data structure, but making it the main data structure.  We could preserve
the required notification ordering information by threading the live
hashtable entries into an slist, say.  However, this would greatly
increase the overhead for transactions with just one or a few distinct
NOTIFY events, since we'd have to set up the hashtable at the first one.
I think that's a common enough case that we shouldn't de-optimize it.
A smaller objection is that such a data structure would absolutely commit
us to de-duplication semantics, whereas the list plus separate hashtable
can cope with not de-duping if someone persuades us that's sane.

Thoughts?

regards, tom lane

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 6e9c580..c21daa5 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -135,6 +135,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/hashutils.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -323,14 +324,25 @@ static List *upperPendingActions = NIL; /* list of upper-xact lists */
 
 /*
  * State for outbound notifies consists of a list of all channels+payloads
- * NOTIFYed in the current transaction. We do not actually perform a NOTIFY
- * until and unless the transaction commits.  pendingNotifies is NIL if no
- * NOTIFYs have been done in the current transaction.
+ * NOTIFYed in the current transaction.  We do not actually perform a NOTIFY
+ * until and unless the transaction commits.  pendingNotifies is NULL if no
+ * NOTIFYs have been done in the current (sub) transaction.
+ *
+ * We discard duplicate notify events issued in the same transaction.
+ * Hence, in addition to the list proper (which we need to track the order
+ * of the events, since we guarantee to deliver them in order), we build a
+ * hash table which we can probe to detect duplicates.  Since building the
+ * hash table is somewhat expensive, we do so only once we have at least
+ * MIN_HASHABLE_NOTIFIES events queued in the current (sub) transaction;
+ * before that we just scan the events linearly.
  *
  * The list is kept in CurTransactionContext.  In subtransactions, each
  * subtransaction has its own list in its own CurTransactionContext, but
- * successful subtransactions attach their lists to their parent's list.
- * Failed subtransactions simply discard their lists.
+ * successful subtransactions add their entries to their parent's list.
+ * Failed subtransactions simply discard their lists.  Since these lists
+ * are independent, there may be notify events in a subtransaction's list
+ * that duplicate events in some ancestor (sub) transaction; we get rid 

Re: Re: proposal: make NOTIFY list de-duplication optional

2019-03-10 Thread Filip Rembiałkowski
Thank you.

Here is my latest attempt, with actual syntax error handling.

Also,  the syntax is updated to what Tom Lane suggested in other
thread (with another variant of the same thing, from Julien Demoor)
NOTIFY [ ( option [, ...] ) ] channel [ , payload ]

Still no hash table fallback is implemented, so this is *not* a
performance improvement. Only a little more flexibility.












On Sat, Mar 9, 2019 at 3:31 AM Thomas Munro  wrote:
>
> On Fri, Mar 8, 2019 at 1:37 PM Filip Rembiałkowski
>  wrote:
> > See attached patch... I'm ready to work on so it can get merged in the next 
> > CF.
>
> Hi Filip,
>
> Seen on Travis:
>
>  async... FAILED  126 ms
>
> Looks like the new error isn't being raised for invalid send mode?
> (What kind of error message is "?" anyway? :-))
>
>  ERROR:  channel name too long
>  -- Should fail. Invalid 3rd parameter
>  NOTIFY notify_async2, 'test', 'invalid';
> -ERROR:  ?
>  NOTIFY notify_async2, 'test', true;
> -ERROR:  ?
>  --Should work. Valid NOTIFY/LISTEN/UNLISTEN commands
>  NOTIFY notify_async2;
>  NOTIFY notify_async2, '';
>
> --
> Thomas Munro
> https://enterprisedb.com
 contrib/tcn/tcn.c   |  2 +-
 doc/src/sgml/ref/notify.sgml| 64 +++--
 src/backend/commands/async.c| 93 +++--
 src/backend/nodes/copyfuncs.c   |  7 +--
 src/backend/nodes/equalfuncs.c  |  7 +--
 src/backend/nodes/outfuncs.c|  3 +-
 src/backend/nodes/readfuncs.c   |  3 +-
 src/backend/parser/gram.y   | 78 ++-
 src/backend/tcop/utility.c  |  8 ++--
 src/backend/utils/adt/ruleutils.c   |  2 +-
 src/include/catalog/pg_proc.dat |  4 ++
 src/include/commands/async.h|  4 +-
 src/include/nodes/parsenodes.h  | 12 +++--
 src/test/regress/expected/async.out | 21 +
 src/test/regress/sql/async.sql  | 10 
 15 files changed, 257 insertions(+), 61 deletions(-)

diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 5355a64c5e..b80337a5ce 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -161,7 +161,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 	strcpy_quoted(payload, SPI_getvalue(trigtuple, tupdesc, colno), '\'');
 }
 
-Async_Notify(channel, payload->data);
+Async_Notify(channel, payload->data, true);
 			}
 			ReleaseSysCache(indexTuple);
 			break;
diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml
index e0e125a2a2..e06b00f1f3 100644
--- a/doc/src/sgml/ref/notify.sgml
+++ b/doc/src/sgml/ref/notify.sgml
@@ -21,7 +21,11 @@ PostgreSQL documentation
 
  
 
-NOTIFY channel [ , payload ]
+NOTIFY [ ( option [, ...] ) ] channel [ , payload ]
+
+where option can be one of:
+
+COLLAPSE [ boolean ]
 
  
 
@@ -47,10 +51,10 @@ NOTIFY channel [ , 
 
   
-   The information passed to the client for a notification event includes the
-   notification channel
-   name, the notifying session's server process PID, and the
-   payload string, which is an empty string if it has not been specified.
+   The information passed to the client for a notification event includes
+   the notification channel name, the notifying session's server process
+   PID, and the payload string, which is an empty string
+   if it has not been specified.
   
 
   
@@ -92,21 +96,13 @@ NOTIFY channel [ , NOTIFY for real-time signaling
should try to keep their transactions short.
   
-
   
-   If the same channel name is signaled multiple times from the same
-   transaction with identical payload strings, the
-   database server can decide to deliver a single notification only.
-   On the other hand, notifications with distinct payload strings will
-   always be delivered as distinct notifications. Similarly, notifications from
-   different transactions will never get folded into one notification.
Except for dropping later instances of duplicate notifications,
NOTIFY guarantees that notifications from the same
transaction get delivered in the order they were sent.  It is also
-   guaranteed that messages from different transactions are delivered in
-   the order in which the transactions committed.
+   guaranteed that messages from different transactions are delivered
+   in the order in which the transactions committed.
   
-
   
It is common for a client that executes NOTIFY
to be listening on the same notification channel itself.  In that case
@@ -147,6 +143,21 @@ NOTIFY channel [ , 
 

+   
+COLLAPSE
+
+ 
+  Controls collapsing of repeated messages.
+
+  When set to on, notification will be skipped if
+  the same channel was already signaled with identical payload in the same
+  transaction block (finished with COMMIT,
+  END or SAVEPOINT).
+  
+  When set to off, duplicates will not be collapsed.
+ 
+
+   
   
  
 
@@ -190,6 +201,13 @@ NOTIFY channel [ , NOTIFY command if you need to work with
 non-constant 

Re: Re: proposal: make NOTIFY list de-duplication optional

2019-03-08 Thread Thomas Munro
On Fri, Mar 8, 2019 at 1:37 PM Filip Rembiałkowski
 wrote:
> See attached patch... I'm ready to work on so it can get merged in the next 
> CF.

Hi Filip,

Seen on Travis:

 async... FAILED  126 ms

Looks like the new error isn't being raised for invalid send mode?
(What kind of error message is "?" anyway? :-))

 ERROR:  channel name too long
 -- Should fail. Invalid 3rd parameter
 NOTIFY notify_async2, 'test', 'invalid';
-ERROR:  ?
 NOTIFY notify_async2, 'test', true;
-ERROR:  ?
 --Should work. Valid NOTIFY/LISTEN/UNLISTEN commands
 NOTIFY notify_async2;
 NOTIFY notify_async2, '';

-- 
Thomas Munro
https://enterprisedb.com



Re: Re: proposal: make NOTIFY list de-duplication optional

2019-03-07 Thread Filip Rembiałkowski
Thanks for all the input.

Finally I found time and motivation to revive this.

See attached patch... I'm ready to work on so it can get merged in the next CF.


On Thu, Mar 17, 2016 at 12:44 AM David Steele  wrote:
>
> On 3/11/16 1:46 PM, David Steele wrote:
> > Hi Filip,
> >
> > On 2/20/16 8:00 AM, Filip Rembiałkowski wrote:
> >> On Fri, Feb 19, 2016 at 10:09 PM, Catalin Iacob  >>  On 2/9/16, Tom Lane mailto:t...@sss.pgh.pa.us>>
> >>  wrote:
> >>  > FWIW, I think it would be a good thing if the NOTIFY statement 
> >> syntax were
> >>  > not remarkably different from the syntax used in the pg_notify() 
> >> function
> >>  > call.  To do otherwise would certainly be confusing.  So on the 
> >> whole
> >>  > I'd go with the "NOTIFY channel [ , payload [ , mode ] ]" option.
> >>
> >>  Filip, do you agree with Tom's proposal? Do you plan to rework the
> >>  patch on these lines? If you are I'll try to review it, if not I could
> >>  give it a shot as I'm interested in having this in 9.6.
> >>
> >> I see that Tom's remarks give more flexibility, and your refinement
> >> makes sense.
> >
> > It looks like we are waiting on a new patch from you before this can be
> > reviewed.  Are you close to having that done?
> >
> > Meanwhile, I have marked it "Waiting on Author".
>
> Since there has been no activity on this thread since before the CF and
> no response from the author I have marked this "returned with feedback".
> Please feel free to resubmit for 9.7!
>
> --
> -David
> da...@pgmasters.net
 contrib/tcn/tcn.c   |  2 +-
 doc/src/sgml/ref/notify.sgml| 63 +
 src/backend/commands/async.c| 69 +++--
 src/backend/nodes/copyfuncs.c   |  7 ++--
 src/backend/nodes/equalfuncs.c  |  7 ++--
 src/backend/nodes/outfuncs.c|  3 +-
 src/backend/nodes/readfuncs.c   |  3 +-
 src/backend/parser/gram.y   | 44 +--
 src/backend/tcop/utility.c  |  8 ++---
 src/backend/utils/adt/ruleutils.c   |  4 ++-
 src/include/catalog/pg_proc.dat |  4 +++
 src/include/commands/async.h|  8 -
 src/include/nodes/parsenodes.h  | 18 +++---
 src/test/regress/expected/async.out | 20 +++
 src/test/regress/sql/async.sql  |  9 +
 15 files changed, 212 insertions(+), 57 deletions(-)

diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 5355a64c5e..196e0989a4 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -161,7 +161,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 	strcpy_quoted(payload, SPI_getvalue(trigtuple, tupdesc, colno), '\'');
 }
 
-Async_Notify(channel, payload->data);
+Async_Notify(channel, payload->data, NOTIFY_SEND_UNIQUE);
 			}
 			ReleaseSysCache(indexTuple);
 			break;
diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml
index e0e125a2a2..ad0795fd2d 100644
--- a/doc/src/sgml/ref/notify.sgml
+++ b/doc/src/sgml/ref/notify.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-NOTIFY channel [ , payload ]
+NOTIFY channel [ , payload [ , send_mode ] ]
 
  
 
@@ -47,10 +47,10 @@ NOTIFY channel [ , 
 
   
-   The information passed to the client for a notification event includes the
-   notification channel
-   name, the notifying session's server process PID, and the
-   payload string, which is an empty string if it has not been specified.
+   The information passed to the client for a notification event includes
+   the notification channel name, the notifying session's server process
+   PID, and the payload string, which is an empty string
+   if it has not been specified.
   
 
   
@@ -92,21 +92,13 @@ NOTIFY channel [ , NOTIFY for real-time signaling
should try to keep their transactions short.
   
-
   
-   If the same channel name is signaled multiple times from the same
-   transaction with identical payload strings, the
-   database server can decide to deliver a single notification only.
-   On the other hand, notifications with distinct payload strings will
-   always be delivered as distinct notifications. Similarly, notifications from
-   different transactions will never get folded into one notification.
Except for dropping later instances of duplicate notifications,
NOTIFY guarantees that notifications from the same
transaction get delivered in the order they were sent.  It is also
-   guaranteed that messages from different transactions are delivered in
-   the order in which the transactions committed.
+   guaranteed that messages from different transactions are delivered
+   in the order in which the transactions committed.
   
-
   
It is common for a client that executes NOTIFY
to be listening on the same notification channel itself.  In that case
@@ -147,6 +139,23 @@ NOTIFY channel [ , 
 

+   
+send_mode
+
+ 
+  Controls collapsing of repeated messages. Can be either
+  'unique' (the default) or