Re: [PERFORM] bad COPY performance with NOTIFY in a trigger
On Sat, Feb 6, 2016 at 6:03 AM, Filip Rembiałkowskiwrote: > Thanks for the feedback. > > This patch is my first and obvious approach. > > @Merlin, I'm not sure if I get your idea: > - keep previous behaviour as obligatory? (which is: automatic > de-duplicating of incoming messages by channel+payload), > - instead of trivial search (sorting by browsing) use some kind of > faster lookups? > > I'm not sure if this statement in async.c is carved in stone: > > * Duplicate notifications from the same transaction are sent out as one > * notification only. This is done to save work when for example a trigger > * on a 2 million row table fires a notification for each row that has been > * changed. If the application needs to receive every single notification > * that has been sent, it can easily add some unique string into the extra > * payload parameter. > > 1) "work-saving" is disputable in some cases > > 2) an idea to "add some unique string" is OK logical-wise but it's not > OK performance-wise. > > Current search code is a sequential search: > https://github.com/filiprem/postgres/blob/master/src/backend/commands/async.c#L2139 > > I'm not that smart to devise an algorithm for faster lookups - > probably you guys can give some advice. > > Again, my rationale is... This feature can burn a lot of CPU for > nothing. I was hoping to use NOTIFY/LISTEN as superfast notification > mechanism. Superfast regardless on whether you insert 100, 10k or 1m > rows. Sure, I get it -- you want to have fast notification events -- this is a good thing to want to have. However, a GUC is probably not the best way to do that in this particular case. It's way to fringey and the bar for behavior controlling GUC is incredibly high (short version: most modern introductions were to manage security issues). I'm far from the last word on this thoug, but it's better to get this all sorted out now. Anyways, it should be possible to micro-optimize that path. Perhaps using a hash table? I'm not sure. Another possible way to work things out here is to expose your switch in the syntax of the command itself, or perhaps via the pg_notify function to avoid syntax issues. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] bad COPY performance with NOTIFY in a trigger
On Mon, Feb 8, 2016 at 8:35 AM, Merlin Moncurewrote: > On Sat, Feb 6, 2016 at 6:03 AM, Filip Rembiałkowski > wrote: >> Thanks for the feedback. >> >> This patch is my first and obvious approach. >> >> @Merlin, I'm not sure if I get your idea: >> - keep previous behaviour as obligatory? (which is: automatic >> de-duplicating of incoming messages by channel+payload), >> - instead of trivial search (sorting by browsing) use some kind of >> faster lookups? >> >> I'm not sure if this statement in async.c is carved in stone: >> >> * Duplicate notifications from the same transaction are sent out as one >> * notification only. This is done to save work when for example a trigger >> * on a 2 million row table fires a notification for each row that has been >> * changed. If the application needs to receive every single notification >> * that has been sent, it can easily add some unique string into the extra >> * payload parameter. >> >> 1) "work-saving" is disputable in some cases >> >> 2) an idea to "add some unique string" is OK logical-wise but it's not >> OK performance-wise. >> >> Current search code is a sequential search: >> https://github.com/filiprem/postgres/blob/master/src/backend/commands/async.c#L2139 >> >> I'm not that smart to devise an algorithm for faster lookups - >> probably you guys can give some advice. >> >> Again, my rationale is... This feature can burn a lot of CPU for >> nothing. I was hoping to use NOTIFY/LISTEN as superfast notification >> mechanism. Superfast regardless on whether you insert 100, 10k or 1m >> rows. > > Sure, I get it -- you want to have fast notification events -- this is > a good thing to want to have. However, a GUC is probably not the best > way to do that in this particular case. It's way to fringey and the > bar for behavior controlling GUC is incredibly high (short version: > most modern introductions were to manage security issues). I'm far > from the last word on this thoug, but it's better to get this all > sorted out now. > > Anyways, it should be possible to micro-optimize that path. Perhaps > using a hash table? I'm not sure. > > Another possible way to work things out here is to expose your switch > in the syntax of the command itself, or perhaps via the pg_notify > function to avoid syntax issues. whoops, I just noticed this thread moved to -hackers -- so please respond there. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] bad COPY performance with NOTIFY in a trigger
Thanks for the feedback. This patch is my first and obvious approach. @Merlin, I'm not sure if I get your idea: - keep previous behaviour as obligatory? (which is: automatic de-duplicating of incoming messages by channel+payload), - instead of trivial search (sorting by browsing) use some kind of faster lookups? I'm not sure if this statement in async.c is carved in stone: * Duplicate notifications from the same transaction are sent out as one * notification only. This is done to save work when for example a trigger * on a 2 million row table fires a notification for each row that has been * changed. If the application needs to receive every single notification * that has been sent, it can easily add some unique string into the extra * payload parameter. 1) "work-saving" is disputable in some cases 2) an idea to "add some unique string" is OK logical-wise but it's not OK performance-wise. Current search code is a sequential search: https://github.com/filiprem/postgres/blob/master/src/backend/commands/async.c#L2139 I'm not that smart to devise an algorithm for faster lookups - probably you guys can give some advice. Again, my rationale is... This feature can burn a lot of CPU for nothing. I was hoping to use NOTIFY/LISTEN as superfast notification mechanism. Superfast regardless on whether you insert 100, 10k or 1m rows. On Fri, Feb 5, 2016 at 8:52 PM, Merlin Moncurewrote: > On Fri, Feb 5, 2016 at 9:33 AM, Filip Rembiałkowski > wrote: >> patch submitted on -hackers list. >> http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com >> >> results after the patch: >> >> trigger= BEGIN RETURN NULL; END >> rows=4 >> 228ms COPY test.tab FROM '/tmp/test.dat' >> 205ms COPY test.tab FROM '/tmp/test.dat' >> rows=8 >> 494ms COPY test.tab FROM '/tmp/test.dat' >> 395ms COPY test.tab FROM '/tmp/test.dat' >> rows=12 >> 678ms COPY test.tab FROM '/tmp/test.dat' >> 652ms COPY test.tab FROM '/tmp/test.dat' >> rows=16 >> 956ms COPY test.tab FROM '/tmp/test.dat' >> 822ms COPY test.tab FROM '/tmp/test.dat' >> rows=20 >> 1184ms COPY test.tab FROM '/tmp/test.dat' >> 1072ms COPY test.tab FROM '/tmp/test.dat' >> trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END >> rows=4 >> 440ms COPY test.tab FROM '/tmp/test.dat' >> 406ms COPY test.tab FROM '/tmp/test.dat' >> rows=8 >> 887ms COPY test.tab FROM '/tmp/test.dat' >> 769ms COPY test.tab FROM '/tmp/test.dat' >> rows=12 >> 1346ms COPY test.tab FROM '/tmp/test.dat' >> 1171ms COPY test.tab FROM '/tmp/test.dat' >> rows=16 >> 1710ms COPY test.tab FROM '/tmp/test.dat' >> 1709ms COPY test.tab FROM '/tmp/test.dat' >> rows=20 >> 2189ms COPY test.tab FROM '/tmp/test.dat' >> 2206ms COPY test.tab FROM '/tmp/test.dat' > > I'm not so sure that this is a great idea. Generally, we tend to > discourage GUCs that control behavior at the SQL level. Are you 100% > certain that there is no path to optimizing this case without changing > behvior? > > merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] bad COPY performance with NOTIFY in a trigger
patch submitted on -hackers list. http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com results after the patch: trigger= BEGIN RETURN NULL; END rows=4 228ms COPY test.tab FROM '/tmp/test.dat' 205ms COPY test.tab FROM '/tmp/test.dat' rows=8 494ms COPY test.tab FROM '/tmp/test.dat' 395ms COPY test.tab FROM '/tmp/test.dat' rows=12 678ms COPY test.tab FROM '/tmp/test.dat' 652ms COPY test.tab FROM '/tmp/test.dat' rows=16 956ms COPY test.tab FROM '/tmp/test.dat' 822ms COPY test.tab FROM '/tmp/test.dat' rows=20 1184ms COPY test.tab FROM '/tmp/test.dat' 1072ms COPY test.tab FROM '/tmp/test.dat' trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END rows=4 440ms COPY test.tab FROM '/tmp/test.dat' 406ms COPY test.tab FROM '/tmp/test.dat' rows=8 887ms COPY test.tab FROM '/tmp/test.dat' 769ms COPY test.tab FROM '/tmp/test.dat' rows=12 1346ms COPY test.tab FROM '/tmp/test.dat' 1171ms COPY test.tab FROM '/tmp/test.dat' rows=16 1710ms COPY test.tab FROM '/tmp/test.dat' 1709ms COPY test.tab FROM '/tmp/test.dat' rows=20 2189ms COPY test.tab FROM '/tmp/test.dat' 2206ms COPY test.tab FROM '/tmp/test.dat' On Fri, Feb 5, 2016 at 1:45 PM, Filip Rembiałkowski < filip.rembialkow...@gmail.com> wrote: > On Thu, Feb 4, 2016 at 11:41 PM, Tom Lanewrote: > >> =?UTF-8?Q?Filip_Rembia=C5=82kowski?= >> writes: >> > A table has a trigger. >> > The trigger sends a NOTIFY. >> > Test with COPY FROM shows non-linear correlation between number of >> inserted >> > rows and COPY duration. >> >> No surprise, see AsyncExistsPendingNotify. You would have a lot of other >> performance issues with sending hundreds of thousands of distinct notify >> events from one transaction anyway, so I can't get terribly excited about >> this. >> > > > What kind of issues? Do you mean, problems in postgres or problems in > client? > > Is there an additional non-linear cost on COMMIT (extra to the cost I > already showed)? > > The 8GB internal queue (referenced in a Note at > http://www.postgresql.org/docs/current/static/sql-notify.html) should be > able to keep ~ 1E8 such notifications (assumed one notification will fit in > 80 bytes). > > On client side, this seems legit - the LISTENer deamon will collect these > notifications and process them in line. > There might be no LISTENer running at all. > > Still, the main problem I get with this approach is quadratic cost on big > insert transactions. > I wonder if this behavior is possible to change in future postgres > versions. And how much programming work does it require. > > Is duplicate-elimination a fundamental, non-negotiable requirement? > > > > Thank you, > Filip > >
Re: [PERFORM] bad COPY performance with NOTIFY in a trigger
Tom Lanewrites: > No surprise, see AsyncExistsPendingNotify. You would have a lot of other > performance issues with sending hundreds of thousands of distinct notify > events from one transaction anyway, so I can't get terribly excited about > this. @Filip: you probably want a per-statement trigger rather than a per-row trigger: insert all rows with COPY, then send one notification. You have to mark the new rows somehow yourself; unfortunately PostgreSQL has no way to tell them in a statement trigger. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] bad COPY performance with NOTIFY in a trigger
On Fri, Feb 5, 2016 at 9:33 AM, Filip Rembiałkowskiwrote: > patch submitted on -hackers list. > http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com > > results after the patch: > > trigger= BEGIN RETURN NULL; END > rows=4 > 228ms COPY test.tab FROM '/tmp/test.dat' > 205ms COPY test.tab FROM '/tmp/test.dat' > rows=8 > 494ms COPY test.tab FROM '/tmp/test.dat' > 395ms COPY test.tab FROM '/tmp/test.dat' > rows=12 > 678ms COPY test.tab FROM '/tmp/test.dat' > 652ms COPY test.tab FROM '/tmp/test.dat' > rows=16 > 956ms COPY test.tab FROM '/tmp/test.dat' > 822ms COPY test.tab FROM '/tmp/test.dat' > rows=20 > 1184ms COPY test.tab FROM '/tmp/test.dat' > 1072ms COPY test.tab FROM '/tmp/test.dat' > trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END > rows=4 > 440ms COPY test.tab FROM '/tmp/test.dat' > 406ms COPY test.tab FROM '/tmp/test.dat' > rows=8 > 887ms COPY test.tab FROM '/tmp/test.dat' > 769ms COPY test.tab FROM '/tmp/test.dat' > rows=12 > 1346ms COPY test.tab FROM '/tmp/test.dat' > 1171ms COPY test.tab FROM '/tmp/test.dat' > rows=16 > 1710ms COPY test.tab FROM '/tmp/test.dat' > 1709ms COPY test.tab FROM '/tmp/test.dat' > rows=20 > 2189ms COPY test.tab FROM '/tmp/test.dat' > 2206ms COPY test.tab FROM '/tmp/test.dat' I'm not so sure that this is a great idea. Generally, we tend to discourage GUCs that control behavior at the SQL level. Are you 100% certain that there is no path to optimizing this case without changing behvior? merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] bad COPY performance with NOTIFY in a trigger
On Thu, Feb 4, 2016 at 11:41 PM, Tom Lanewrote: > =?UTF-8?Q?Filip_Rembia=C5=82kowski?= > writes: > > A table has a trigger. > > The trigger sends a NOTIFY. > > Test with COPY FROM shows non-linear correlation between number of > inserted > > rows and COPY duration. > > No surprise, see AsyncExistsPendingNotify. You would have a lot of other > performance issues with sending hundreds of thousands of distinct notify > events from one transaction anyway, so I can't get terribly excited about > this. > What kind of issues? Do you mean, problems in postgres or problems in client? Is there an additional non-linear cost on COMMIT (extra to the cost I already showed)? The 8GB internal queue (referenced in a Note at http://www.postgresql.org/docs/current/static/sql-notify.html) should be able to keep ~ 1E8 such notifications (assumed one notification will fit in 80 bytes). On client side, this seems legit - the LISTENer deamon will collect these notifications and process them in line. There might be no LISTENer running at all. Still, the main problem I get with this approach is quadratic cost on big insert transactions. I wonder if this behavior is possible to change in future postgres versions. And how much programming work does it require. Is duplicate-elimination a fundamental, non-negotiable requirement? Thank you, Filip
Re: [PERFORM] bad COPY performance with NOTIFY in a trigger
=?UTF-8?Q?Filip_Rembia=C5=82kowski?=writes: > A table has a trigger. > The trigger sends a NOTIFY. > Test with COPY FROM shows non-linear correlation between number of inserted > rows and COPY duration. No surprise, see AsyncExistsPendingNotify. You would have a lot of other performance issues with sending hundreds of thousands of distinct notify events from one transaction anyway, so I can't get terribly excited about this. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] bad COPY performance with NOTIFY in a trigger
Hi. A table has a trigger. The trigger sends a NOTIFY. Test with COPY FROM shows non-linear correlation between number of inserted rows and COPY duration. Table "test.tab" Column | Type | Modifiers -+-+--- id | integer | not null default nextval('test.tab_id_seq'::regclass) payload | text| Indexes: "tab_pkey" PRIMARY KEY, btree (id) Triggers: trg AFTER INSERT ON test.tab FOR EACH ROW EXECUTE PROCEDURE test.fun() Test Series 1. Trigger code: BEGIN RETURN NULL; END You can see linear scaling. rows=4 191ms COPY test.tab FROM '/tmp/test.dat' 201ms COPY test.tab FROM '/tmp/test.dat' rows=8 426ms COPY test.tab FROM '/tmp/test.dat' 415ms COPY test.tab FROM '/tmp/test.dat' rows=12 634ms COPY test.tab FROM '/tmp/test.dat' 616ms COPY test.tab FROM '/tmp/test.dat' rows=16 843ms COPY test.tab FROM '/tmp/test.dat' 834ms COPY test.tab FROM '/tmp/test.dat' rows=20 1101ms COPY test.tab FROM '/tmp/test.dat' 1094ms COPY test.tab FROM '/tmp/test.dat' Test Series 2. Trigger code: BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; You can see non-linear scaling. rows=4 9767ms COPY test.tab FROM '/tmp/test.dat' 8901ms COPY test.tab FROM '/tmp/test.dat' rows=8 37409ms COPY test.tab FROM '/tmp/test.dat' 38015ms COPY test.tab FROM '/tmp/test.dat' rows=12 90227ms COPY test.tab FROM '/tmp/test.dat' 87838ms COPY test.tab FROM '/tmp/test.dat' rows=16 160080ms COPY test.tab FROM '/tmp/test.dat' 159801ms COPY test.tab FROM '/tmp/test.dat' rows=20 247330ms COPY test.tab FROM '/tmp/test.dat' 251191ms COPY test.tab FROM '/tmp/test.dat' O(N^2)