Re: [PERFORM] gin performance issue.

2016-02-05 Thread Tom Lane
Marc Mamin  writes:
> http://explain.depesz.com/s/wKv7
> Postgres Version 9.3.10 (Linux)

> Hello,
> this is a large daily table that only gets bulk inserts (200-400 per day) with
> no updates.
> After rebuilding the whole table, the Bitmap Index Scan on
> r_20160204_ix_toprid falls to under 1 second (from 800)

> Fastupdate is at its default setting, but autovacuum is disabled on that
> table, which contains 30 million rows.

Pre-9.5, it's a pretty bad idea to disable autovacuum on a GIN index,
because then the "pending list" only gets flushed when it exceeds
work_mem.  (Obviously, using a large work_mem setting makes this worse.)
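As an aside, on these versions the current size of the pending list can be inspected with the pgstattuple contrib module. A sketch, with the index name taken from the plan referenced above:

```sql
-- Requires the pgstattuple extension (shipped in contrib).
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Reports pending_pages and pending_tuples for a GIN index.
SELECT * FROM pgstatginindex('r_20160204_ix_toprid'::regclass);
```

If pending_pages is large, the pending list has not been flushed recently.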

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-05 Thread Filip Rembiałkowski
Patch submitted on the -hackers list.
http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com

results after the patch:

trigger= BEGIN RETURN NULL; END
rows=4
  228ms COPY test.tab FROM '/tmp/test.dat'
  205ms COPY test.tab FROM '/tmp/test.dat'
rows=8
  494ms COPY test.tab FROM '/tmp/test.dat'
  395ms COPY test.tab FROM '/tmp/test.dat'
rows=12
  678ms COPY test.tab FROM '/tmp/test.dat'
  652ms COPY test.tab FROM '/tmp/test.dat'
rows=16
  956ms COPY test.tab FROM '/tmp/test.dat'
  822ms COPY test.tab FROM '/tmp/test.dat'
rows=20
 1184ms COPY test.tab FROM '/tmp/test.dat'
 1072ms COPY test.tab FROM '/tmp/test.dat'
trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
rows=4
  440ms COPY test.tab FROM '/tmp/test.dat'
  406ms COPY test.tab FROM '/tmp/test.dat'
rows=8
  887ms COPY test.tab FROM '/tmp/test.dat'
  769ms COPY test.tab FROM '/tmp/test.dat'
rows=12
 1346ms COPY test.tab FROM '/tmp/test.dat'
 1171ms COPY test.tab FROM '/tmp/test.dat'
rows=16
 1710ms COPY test.tab FROM '/tmp/test.dat'
 1709ms COPY test.tab FROM '/tmp/test.dat'
rows=20
 2189ms COPY test.tab FROM '/tmp/test.dat'
 2206ms COPY test.tab FROM '/tmp/test.dat'
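For context, the test above can be reconstructed roughly as follows. The table layout and trigger names are assumptions, since the original post does not show the DDL:

```sql
-- Hypothetical reconstruction of the benchmark setup.
CREATE SCHEMA IF NOT EXISTS test;
CREATE TABLE test.tab (id integer);

CREATE OR REPLACE FUNCTION test.notify_row() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('test', NEW.id::text);
  RETURN NULL;  -- a BEFORE trigger returning NULL suppresses the insert
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tab_notify BEFORE INSERT ON test.tab
  FOR EACH ROW EXECUTE PROCEDURE test.notify_row();

-- Timed with \timing in psql:
COPY test.tab FROM '/tmp/test.dat';
```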



On Fri, Feb 5, 2016 at 1:45 PM, Filip Rembiałkowski <
filip.rembialkow...@gmail.com> wrote:

> On Thu, Feb 4, 2016 at 11:41 PM, Tom Lane  wrote:
>
>> Filip Rembiałkowski writes:
>> > A table has a trigger.
>> > The trigger sends a NOTIFY.
>> > Test with COPY FROM shows non-linear correlation between number of
>> inserted
>> > rows and COPY duration.
>>
>> No surprise, see AsyncExistsPendingNotify.  You would have a lot of other
>> performance issues with sending hundreds of thousands of distinct notify
>> events from one transaction anyway, so I can't get terribly excited about
>> this.
>>
>
>
> What kind of issues? Do you mean problems in Postgres, or problems in the
> client?
>
> Is there an additional non-linear cost on COMMIT (extra to the cost I
> already showed)?
>
> The 8GB internal queue (referenced in a note at
> http://www.postgresql.org/docs/current/static/sql-notify.html) should be
> able to hold ~1E8 such notifications (assuming each notification fits in
> 80 bytes).
>
> On the client side this seems manageable: the LISTENing daemon will collect
> these notifications and process them in turn.
> There might also be no LISTENer running at all.
>
> Still, the main problem I see with this approach is the quadratic cost on
> big insert transactions.
> I wonder whether this behavior could be changed in future Postgres
> versions, and how much programming work that would require.
>
> Is duplicate-elimination a fundamental, non-negotiable requirement?
>
>
>
> Thank you,
> Filip
>
>


Re: [PERFORM] View containing a recursive function

2016-02-05 Thread Mathieu De Zutter
On Mon, 1 Feb 2016 at 10:45 Tom Lane  wrote:

> Mathieu De Zutter  writes:
> Assuming you have a reasonably late-model PG, you could rewrite the
> view with a lateral function call:
>
> CREATE OR REPLACE VIEW covering_works_r AS
>   SELECT
>     w.id AS work_id,
>     fn.f AS covering_work_id
>   FROM work w, fn_covering_works(w.id) AS fn(f);
>
> which puts the SRF into FROM where the planner can deal with it much
> better.
>

Thanks a lot. That fixes it!

> Another problem is that you let the function default to being VOLATILE,
> which would have disabled view flattening even if this didn't.  I see
> no reason for this function not to be marked STABLE.
>

By marking it STABLE, the planner ignores my row estimate of 1 - I guess
because of the inlining. The number of results is usually just 1, though it
can go up to 10 in exceptional cases. That's still a lot better than the
planner's inexplicable estimate of 101 when the function is marked STABLE,
which often leads to a hash join instead of a nested loop in complex queries:

->  Recursive Union  (cost=0.00..795.53 rows=*101* width=4) (actual time=0.001..0.009 rows=1 loops=4)
      ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=4)
      ->  Nested Loop  (cost=0.29..79.35 rows=10 width=4) (actual time=0.005..0.005 rows=0 loops=5)
            ->  WorkTable Scan on func f  (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.000 rows=1 loops=5)
            ->  Index Scan using adaptation_adapted_idx on adaptation ad  (cost=0.29..7.91 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=5)
                  Index Cond: (adapted_id = f.work_id)
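One possible workaround is to attach an explicit row estimate to the function; this is a sketch, and the exact signature of fn_covering_works is an assumption. Note that a ROWS estimate only takes effect when the planner does not inline the function body:

```sql
-- Keep the function STABLE but declare an explicit row estimate
-- (signature assumed; ROWS is ignored if the function gets inlined).
ALTER FUNCTION fn_covering_works(integer) STABLE ROWS 2;
```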


Thanks again,

Mathieu


Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-05 Thread Harald Fuchs
Tom Lane  writes:

> No surprise, see AsyncExistsPendingNotify.  You would have a lot of other
> performance issues with sending hundreds of thousands of distinct notify
> events from one transaction anyway, so I can't get terribly excited about
> this.

@Filip: you probably want a per-statement trigger rather than a per-row
trigger: insert all rows with COPY, then send one notification.

You have to mark the new rows somehow yourself; unfortunately PostgreSQL
has no way to identify the affected rows in a statement-level trigger.
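A sketch of that approach, with all names illustrative rather than taken from the original post: tag each COPY batch with the loading transaction's id, and send a single NOTIFY carrying that id from a statement-level trigger.

```sql
-- Tag each inserted row with the transaction that loaded it.
ALTER TABLE test.tab ADD COLUMN batch_id bigint DEFAULT txid_current();

CREATE OR REPLACE FUNCTION test.notify_batch() RETURNS trigger AS $$
BEGIN
  -- One notification per COPY statement, naming the batch to pick up.
  PERFORM pg_notify('test', txid_current()::text);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tab_notify_stmt AFTER INSERT ON test.tab
  FOR EACH STATEMENT EXECUTE PROCEDURE test.notify_batch();
```

The listener can then fetch the batch with `SELECT ... FROM test.tab WHERE batch_id = <payload>`.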





Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-05 Thread Merlin Moncure
On Fri, Feb 5, 2016 at 9:33 AM, Filip Rembiałkowski
 wrote:
> patch submitted on -hackers list.
> http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com
>
> results after the patch:
>
> trigger= BEGIN RETURN NULL; END
> rows=4
>   228ms COPY test.tab FROM '/tmp/test.dat'
>   205ms COPY test.tab FROM '/tmp/test.dat'
> rows=8
>   494ms COPY test.tab FROM '/tmp/test.dat'
>   395ms COPY test.tab FROM '/tmp/test.dat'
> rows=12
>   678ms COPY test.tab FROM '/tmp/test.dat'
>   652ms COPY test.tab FROM '/tmp/test.dat'
> rows=16
>   956ms COPY test.tab FROM '/tmp/test.dat'
>   822ms COPY test.tab FROM '/tmp/test.dat'
> rows=20
>  1184ms COPY test.tab FROM '/tmp/test.dat'
>  1072ms COPY test.tab FROM '/tmp/test.dat'
> trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
> rows=4
>   440ms COPY test.tab FROM '/tmp/test.dat'
>   406ms COPY test.tab FROM '/tmp/test.dat'
> rows=8
>   887ms COPY test.tab FROM '/tmp/test.dat'
>   769ms COPY test.tab FROM '/tmp/test.dat'
> rows=12
>  1346ms COPY test.tab FROM '/tmp/test.dat'
>  1171ms COPY test.tab FROM '/tmp/test.dat'
> rows=16
>  1710ms COPY test.tab FROM '/tmp/test.dat'
>  1709ms COPY test.tab FROM '/tmp/test.dat'
> rows=20
>  2189ms COPY test.tab FROM '/tmp/test.dat'
>  2206ms COPY test.tab FROM '/tmp/test.dat'

I'm not so sure that this is a great idea.  Generally, we tend to
discourage GUCs that control behavior at the SQL level.  Are you 100%
certain that there is no path to optimizing this case without changing
behavior?

merlin




Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-05 Thread Filip Rembiałkowski
On Thu, Feb 4, 2016 at 11:41 PM, Tom Lane  wrote:

> Filip Rembiałkowski writes:
> > A table has a trigger.
> > The trigger sends a NOTIFY.
> > Test with COPY FROM shows non-linear correlation between number of
> inserted
> > rows and COPY duration.
>
> No surprise, see AsyncExistsPendingNotify.  You would have a lot of other
> performance issues with sending hundreds of thousands of distinct notify
> events from one transaction anyway, so I can't get terribly excited about
> this.
>


What kind of issues? Do you mean problems in Postgres, or problems in the
client?

Is there an additional non-linear cost on COMMIT (extra to the cost I
already showed)?

The 8GB internal queue (referenced in a note at
http://www.postgresql.org/docs/current/static/sql-notify.html) should be
able to hold ~1E8 such notifications (assuming each notification fits in
80 bytes).

On the client side this seems manageable: the LISTENing daemon will collect
these notifications and process them in turn.
There might also be no LISTENer running at all.

Still, the main problem I see with this approach is the quadratic cost on
big insert transactions.
I wonder whether this behavior could be changed in future Postgres
versions, and how much programming work that would require.

Is duplicate-elimination a fundamental, non-negotiable requirement?



Thank you,
Filip


[PERFORM] gin performance issue.

2016-02-05 Thread Marc Mamin
http://explain.depesz.com/s/wKv7
Postgres Version 9.3.10 (Linux)

Hello,
this is a large daily table that only gets bulk inserts (200-400 per day) with
no updates.
After rebuilding the whole table, the Bitmap Index Scan on r_20160204_ix_toprid
falls to under 1 second (from 800)

Fastupdate is at its default setting, but autovacuum is disabled on that table,
which contains 30 million rows.
Another specific characteristic is that the cardinality of the indexed column
is very high; the average count per distinct value is only 2.7.

I'm not sure what the problem is. Does the missing vacuum affect the GIN
index's health beyond not cleaning the pending list?
As I understand it, this list is merged into the index automatically when it
gets full, independently of the vacuum setting.

Could it be an index bloat issue?

And last but not least, can I reduce the problem through configuration?
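For what it's worth, two configuration-level options I am considering (a sketch; the table name is inferred from the index name, so treat it as an assumption):

```sql
-- Option 1: new entries go straight into the GIN index instead of the
-- pending list (slower inserts, but no large flushes at query time).
ALTER INDEX r_20160204_ix_toprid SET (fastupdate = off);

-- Option 2: flush the pending list periodically; a plain VACUUM merges
-- it into the main index structure even with autovacuum disabled.
VACUUM r_20160204;
```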

regards,

Marc Mamin