Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-08 Thread Merlin Moncure
On Sat, Feb 6, 2016 at 6:03 AM, Filip Rembiałkowski
 wrote:
> Thanks for the feedback.
>
> This patch is my first and obvious approach.
>
> @Merlin, I'm not sure if I get your idea:
> - keep previous behaviour as obligatory? (which is: automatic
> de-duplicating of incoming messages by channel+payload),
> - instead of trivial search (sorting by browsing) use some kind of
> faster lookups?
>
> I'm not sure if this statement in async.c is carved in stone:
>
> * Duplicate notifications from the same transaction are sent out as one
> * notification only. This is done to save work when for example a trigger
> * on a 2 million row table fires a notification for each row that has been
> * changed. If the application needs to receive every single notification
> * that has been sent, it can easily add some unique string into the extra
> * payload parameter.
>
> 1) "work-saving" is disputable in some cases
>
> 2) an idea to "add some unique string" is OK logical-wise but it's not
> OK performance-wise.
>
> Current search code is a sequential search:
> https://github.com/filiprem/postgres/blob/master/src/backend/commands/async.c#L2139
>
> I'm not that smart to devise an algorithm for faster lookups -
> probably you guys can give some advice.
>
> Again, my rationale is... This feature can burn a lot of CPU for
> nothing. I was hoping to use NOTIFY/LISTEN as superfast notification
> mechanism. Superfast regardless on whether you insert 100, 10k or 1m
> rows.

Sure, I get it -- you want to have fast notification events -- this is
a good thing to want to have.  However, a GUC is probably not the best
way to do that in this particular case.  It's way to fringey and the
bar for behavior controlling GUC is incredibly high (short version:
most modern introductions were to manage security issues).  I'm far
from the last word on this thoug, but it's better to get this all
sorted out now.

Anyways, it should be possible to micro-optimize that path.  Perhaps
using a hash table?  I'm not sure.

Another possible way to work things out here is to expose your switch
in the syntax of the command itself, or perhaps via the pg_notify
function to avoid syntax issues.

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-08 Thread Merlin Moncure
On Mon, Feb 8, 2016 at 8:35 AM, Merlin Moncure  wrote:
> On Sat, Feb 6, 2016 at 6:03 AM, Filip Rembiałkowski
>  wrote:
>> Thanks for the feedback.
>>
>> This patch is my first and obvious approach.
>>
>> @Merlin, I'm not sure if I get your idea:
>> - keep previous behaviour as obligatory? (which is: automatic
>> de-duplicating of incoming messages by channel+payload),
>> - instead of trivial search (sorting by browsing) use some kind of
>> faster lookups?
>>
>> I'm not sure if this statement in async.c is carved in stone:
>>
>> * Duplicate notifications from the same transaction are sent out as one
>> * notification only. This is done to save work when for example a trigger
>> * on a 2 million row table fires a notification for each row that has been
>> * changed. If the application needs to receive every single notification
>> * that has been sent, it can easily add some unique string into the extra
>> * payload parameter.
>>
>> 1) "work-saving" is disputable in some cases
>>
>> 2) an idea to "add some unique string" is OK logical-wise but it's not
>> OK performance-wise.
>>
>> Current search code is a sequential search:
>> https://github.com/filiprem/postgres/blob/master/src/backend/commands/async.c#L2139
>>
>> I'm not that smart to devise an algorithm for faster lookups -
>> probably you guys can give some advice.
>>
>> Again, my rationale is... This feature can burn a lot of CPU for
>> nothing. I was hoping to use NOTIFY/LISTEN as superfast notification
>> mechanism. Superfast regardless on whether you insert 100, 10k or 1m
>> rows.
>
> Sure, I get it -- you want to have fast notification events -- this is
> a good thing to want to have.  However, a GUC is probably not the best
> way to do that in this particular case.  It's way to fringey and the
> bar for behavior controlling GUC is incredibly high (short version:
> most modern introductions were to manage security issues).  I'm far
> from the last word on this thoug, but it's better to get this all
> sorted out now.
>
> Anyways, it should be possible to micro-optimize that path.  Perhaps
> using a hash table?  I'm not sure.
>
> Another possible way to work things out here is to expose your switch
> in the syntax of the command itself, or perhaps via the pg_notify
> function to avoid syntax issues.

whoops, I just noticed this thread moved to -hackers -- so please respond there.

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-06 Thread Filip Rembiałkowski
Thanks for the feedback.

This patch is my first and obvious approach.

@Merlin, I'm not sure if I get your idea:
- keep previous behaviour as obligatory? (which is: automatic
de-duplicating of incoming messages by channel+payload),
- instead of trivial search (sorting by browsing) use some kind of
faster lookups?

I'm not sure if this statement in async.c is carved in stone:

* Duplicate notifications from the same transaction are sent out as one
* notification only. This is done to save work when for example a trigger
* on a 2 million row table fires a notification for each row that has been
* changed. If the application needs to receive every single notification
* that has been sent, it can easily add some unique string into the extra
* payload parameter.

1) "work-saving" is disputable in some cases

2) an idea to "add some unique string" is OK logical-wise but it's not
OK performance-wise.

Current search code is a sequential search:
https://github.com/filiprem/postgres/blob/master/src/backend/commands/async.c#L2139

I'm not that smart to devise an algorithm for faster lookups -
probably you guys can give some advice.

Again, my rationale is... This feature can burn a lot of CPU for
nothing. I was hoping to use NOTIFY/LISTEN as superfast notification
mechanism. Superfast regardless on whether you insert 100, 10k or 1m
rows.




On Fri, Feb 5, 2016 at 8:52 PM, Merlin Moncure  wrote:
> On Fri, Feb 5, 2016 at 9:33 AM, Filip Rembiałkowski
>  wrote:
>> patch submitted on -hackers list.
>> http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com
>>
>> results after the patch:
>>
>> trigger= BEGIN RETURN NULL; END
>> rows=4
>>   228ms COPY test.tab FROM '/tmp/test.dat'
>>   205ms COPY test.tab FROM '/tmp/test.dat'
>> rows=8
>>   494ms COPY test.tab FROM '/tmp/test.dat'
>>   395ms COPY test.tab FROM '/tmp/test.dat'
>> rows=12
>>   678ms COPY test.tab FROM '/tmp/test.dat'
>>   652ms COPY test.tab FROM '/tmp/test.dat'
>> rows=16
>>   956ms COPY test.tab FROM '/tmp/test.dat'
>>   822ms COPY test.tab FROM '/tmp/test.dat'
>> rows=20
>>  1184ms COPY test.tab FROM '/tmp/test.dat'
>>  1072ms COPY test.tab FROM '/tmp/test.dat'
>> trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
>> rows=4
>>   440ms COPY test.tab FROM '/tmp/test.dat'
>>   406ms COPY test.tab FROM '/tmp/test.dat'
>> rows=8
>>   887ms COPY test.tab FROM '/tmp/test.dat'
>>   769ms COPY test.tab FROM '/tmp/test.dat'
>> rows=12
>>  1346ms COPY test.tab FROM '/tmp/test.dat'
>>  1171ms COPY test.tab FROM '/tmp/test.dat'
>> rows=16
>>  1710ms COPY test.tab FROM '/tmp/test.dat'
>>  1709ms COPY test.tab FROM '/tmp/test.dat'
>> rows=20
>>  2189ms COPY test.tab FROM '/tmp/test.dat'
>>  2206ms COPY test.tab FROM '/tmp/test.dat'
>
> I'm not so sure that this is a great idea.  Generally, we tend to
> discourage GUCs that control behavior at the SQL level.  Are you 100%
> certain that there is no path to optimizing this case without changing
> behvior?
>
> merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-05 Thread Filip Rembiałkowski
patch submitted on -hackers list.
http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com

results after the patch:

trigger= BEGIN RETURN NULL; END
rows=4
  228ms COPY test.tab FROM '/tmp/test.dat'
  205ms COPY test.tab FROM '/tmp/test.dat'
rows=8
  494ms COPY test.tab FROM '/tmp/test.dat'
  395ms COPY test.tab FROM '/tmp/test.dat'
rows=12
  678ms COPY test.tab FROM '/tmp/test.dat'
  652ms COPY test.tab FROM '/tmp/test.dat'
rows=16
  956ms COPY test.tab FROM '/tmp/test.dat'
  822ms COPY test.tab FROM '/tmp/test.dat'
rows=20
 1184ms COPY test.tab FROM '/tmp/test.dat'
 1072ms COPY test.tab FROM '/tmp/test.dat'
trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
rows=4
  440ms COPY test.tab FROM '/tmp/test.dat'
  406ms COPY test.tab FROM '/tmp/test.dat'
rows=8
  887ms COPY test.tab FROM '/tmp/test.dat'
  769ms COPY test.tab FROM '/tmp/test.dat'
rows=12
 1346ms COPY test.tab FROM '/tmp/test.dat'
 1171ms COPY test.tab FROM '/tmp/test.dat'
rows=16
 1710ms COPY test.tab FROM '/tmp/test.dat'
 1709ms COPY test.tab FROM '/tmp/test.dat'
rows=20
 2189ms COPY test.tab FROM '/tmp/test.dat'
 2206ms COPY test.tab FROM '/tmp/test.dat'



On Fri, Feb 5, 2016 at 1:45 PM, Filip Rembiałkowski <
filip.rembialkow...@gmail.com> wrote:

> On Thu, Feb 4, 2016 at 11:41 PM, Tom Lane  wrote:
>
>> =?UTF-8?Q?Filip_Rembia=C5=82kowski?= 
>> writes:
>> > A table has a trigger.
>> > The trigger sends a NOTIFY.
>> > Test with COPY FROM shows non-linear correlation between number of
>> inserted
>> > rows and COPY duration.
>>
>> No surprise, see AsyncExistsPendingNotify.  You would have a lot of other
>> performance issues with sending hundreds of thousands of distinct notify
>> events from one transaction anyway, so I can't get terribly excited about
>> this.
>>
>
>
> What kind of issues? Do you mean, problems in postgres or problems in
> client?
>
> Is there an additional non-linear cost on COMMIT (extra to the cost I
> already showed)?
>
> The 8GB internal queue (referenced in a Note at
> http://www.postgresql.org/docs/current/static/sql-notify.html) should be
> able to keep ~ 1E8 such notifications (assumed one notification will fit in
> 80 bytes).
>
> On client side, this seems legit - the LISTENer deamon will collect these
> notifications and process them in line.
> There might be no LISTENer running at all.
>
> Still, the main problem I get with this approach is quadratic cost on big
> insert transactions.
> I wonder if this behavior is possible to change in future postgres
> versions. And how much programming work does it require.
>
> Is duplicate-elimination a fundamental, non-negotiable requirement?
>
>
>
> Thank you,
> Filip
>
>


Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-05 Thread Harald Fuchs
Tom Lane  writes:

> No surprise, see AsyncExistsPendingNotify.  You would have a lot of other
> performance issues with sending hundreds of thousands of distinct notify
> events from one transaction anyway, so I can't get terribly excited about
> this.

@Filip: you probably want a per-statement trigger rather than a per-row
trigger: insert all rows with COPY, then send one notification.

You have to mark the new rows somehow yourself; unfortunately PostgreSQL
has no way to tell them in a statement trigger.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-05 Thread Merlin Moncure
On Fri, Feb 5, 2016 at 9:33 AM, Filip Rembiałkowski
 wrote:
> patch submitted on -hackers list.
> http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com
>
> results after the patch:
>
> trigger= BEGIN RETURN NULL; END
> rows=4
>   228ms COPY test.tab FROM '/tmp/test.dat'
>   205ms COPY test.tab FROM '/tmp/test.dat'
> rows=8
>   494ms COPY test.tab FROM '/tmp/test.dat'
>   395ms COPY test.tab FROM '/tmp/test.dat'
> rows=12
>   678ms COPY test.tab FROM '/tmp/test.dat'
>   652ms COPY test.tab FROM '/tmp/test.dat'
> rows=16
>   956ms COPY test.tab FROM '/tmp/test.dat'
>   822ms COPY test.tab FROM '/tmp/test.dat'
> rows=20
>  1184ms COPY test.tab FROM '/tmp/test.dat'
>  1072ms COPY test.tab FROM '/tmp/test.dat'
> trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
> rows=4
>   440ms COPY test.tab FROM '/tmp/test.dat'
>   406ms COPY test.tab FROM '/tmp/test.dat'
> rows=8
>   887ms COPY test.tab FROM '/tmp/test.dat'
>   769ms COPY test.tab FROM '/tmp/test.dat'
> rows=12
>  1346ms COPY test.tab FROM '/tmp/test.dat'
>  1171ms COPY test.tab FROM '/tmp/test.dat'
> rows=16
>  1710ms COPY test.tab FROM '/tmp/test.dat'
>  1709ms COPY test.tab FROM '/tmp/test.dat'
> rows=20
>  2189ms COPY test.tab FROM '/tmp/test.dat'
>  2206ms COPY test.tab FROM '/tmp/test.dat'

I'm not so sure that this is a great idea.  Generally, we tend to
discourage GUCs that control behavior at the SQL level.  Are you 100%
certain that there is no path to optimizing this case without changing
behvior?

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-05 Thread Filip Rembiałkowski
On Thu, Feb 4, 2016 at 11:41 PM, Tom Lane  wrote:

> =?UTF-8?Q?Filip_Rembia=C5=82kowski?= 
> writes:
> > A table has a trigger.
> > The trigger sends a NOTIFY.
> > Test with COPY FROM shows non-linear correlation between number of
> inserted
> > rows and COPY duration.
>
> No surprise, see AsyncExistsPendingNotify.  You would have a lot of other
> performance issues with sending hundreds of thousands of distinct notify
> events from one transaction anyway, so I can't get terribly excited about
> this.
>


What kind of issues? Do you mean, problems in postgres or problems in
client?

Is there an additional non-linear cost on COMMIT (extra to the cost I
already showed)?

The 8GB internal queue (referenced in a Note at
http://www.postgresql.org/docs/current/static/sql-notify.html) should be
able to keep ~ 1E8 such notifications (assumed one notification will fit in
80 bytes).

On client side, this seems legit - the LISTENer deamon will collect these
notifications and process them in line.
There might be no LISTENer running at all.

Still, the main problem I get with this approach is quadratic cost on big
insert transactions.
I wonder if this behavior is possible to change in future postgres
versions. And how much programming work does it require.

Is duplicate-elimination a fundamental, non-negotiable requirement?



Thank you,
Filip


Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-04 Thread Tom Lane
=?UTF-8?Q?Filip_Rembia=C5=82kowski?=  writes:
> A table has a trigger.
> The trigger sends a NOTIFY.
> Test with COPY FROM shows non-linear correlation between number of inserted
> rows and COPY duration.

No surprise, see AsyncExistsPendingNotify.  You would have a lot of other
performance issues with sending hundreds of thousands of distinct notify
events from one transaction anyway, so I can't get terribly excited about
this.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-04 Thread Filip Rembiałkowski
Hi.

A table has a trigger.
The trigger sends a NOTIFY.

Test with COPY FROM shows non-linear correlation between number of inserted
rows and COPY duration.

 Table "test.tab"
 Column  |  Type   |   Modifiers
-+-+---
 id  | integer | not null default nextval('test.tab_id_seq'::regclass)
 payload | text|
Indexes:
"tab_pkey" PRIMARY KEY, btree (id)
Triggers:
trg AFTER INSERT ON test.tab FOR EACH ROW EXECUTE PROCEDURE test.fun()


Test Series 1. Trigger code:
BEGIN RETURN NULL; END
You can see linear scaling.

rows=4
  191ms COPY test.tab FROM '/tmp/test.dat'
  201ms COPY test.tab FROM '/tmp/test.dat'
rows=8
  426ms COPY test.tab FROM '/tmp/test.dat'
  415ms COPY test.tab FROM '/tmp/test.dat'
rows=12
  634ms COPY test.tab FROM '/tmp/test.dat'
  616ms COPY test.tab FROM '/tmp/test.dat'
rows=16
  843ms COPY test.tab FROM '/tmp/test.dat'
  834ms COPY test.tab FROM '/tmp/test.dat'
rows=20
 1101ms COPY test.tab FROM '/tmp/test.dat'
 1094ms COPY test.tab FROM '/tmp/test.dat'


Test Series 2. Trigger code:
BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL;
You can see non-linear scaling.

rows=4
 9767ms COPY test.tab FROM '/tmp/test.dat'
 8901ms COPY test.tab FROM '/tmp/test.dat'
rows=8
37409ms COPY test.tab FROM '/tmp/test.dat'
38015ms COPY test.tab FROM '/tmp/test.dat'
rows=12
90227ms COPY test.tab FROM '/tmp/test.dat'
87838ms COPY test.tab FROM '/tmp/test.dat'
rows=16
   160080ms COPY test.tab FROM '/tmp/test.dat'
   159801ms COPY test.tab FROM '/tmp/test.dat'
rows=20
   247330ms COPY test.tab FROM '/tmp/test.dat'
   251191ms COPY test.tab FROM '/tmp/test.dat'


O(N^2)