Re: [PERFORM] gin performance issue.

2016-02-08 Thread Marc Mamin


> -Original Message-
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Freitag, 5. Februar 2016 16:07
 

> > http://explain.depesz.com/s/wKv7
> > Postgres Version 9.3.10 (Linux)
> > 
> > Hello,
> > this is a large daily table that only get bulk inserts (200-400 /days) with 
> > no update.
> > After rebuilding the whole table, the Bitmap Index Scan on
> > r_20160204_ix_toprid falls under 1 second (from 800)
> >
> > Fastupdate is using the default, but autovacuum is disabled on that
> > table which contains 30 Mio rows.


> Pre-9.5, it's a pretty bad idea to disable autovacuum on a GIN index,
> because then the "pending list" only gets flushed when it exceeds
> work_mem.  (Obviously, using a large work_mem setting makes this
> worse.)
> 
>   regards, tom lane


Hello,
knowing what the problem is don't really help here:

- auto vacuum will not run as these are insert only tables
- according to this post, auto analyze would also do the job:
  http://postgresql.nabble.com/Performance-problem-with-gin-index-td5867870.html
  It seems that this information is missing in the doc
  
  but it sadly neither triggers in our case as we have manual analyzes called 
during the dataprocesssing just following the imports.
  Manual vacuum is just too expensive here.
  
  Hence disabling fast update seems to be our only option. 
  
  I hope this problem will help push up the 9.5 upgrade on our todo list :)
  
  Ideally, we would then like to flush the pending list inconditionally after 
the imports. 
  I guess we could achieve something approaching while modifying the analyze 
scale factor  and gin_pending_list_limit
  before/after the (bulk) imports, but having the possibility to flush it per 
SQL would be better. 
  Is this a reasonable feature wish?
  
  And a last question: how does the index update work with bulk (COPY) inserts:
  without pending list: is it like a per row trigger or will the index be cared 
of afterwards ?
  with small pending lists : is there a concurrency problem, or can both tasks 
cleanly work in parallel ?
  
  best regards,
  
  Marc mamin
  
  


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] gin performance issue.

2016-02-08 Thread Jeff Janes
On Mon, Feb 8, 2016 at 2:21 AM, Marc Mamin  wrote:
>
> - auto vacuum will not run as these are insert only tables
> - according to this post, auto analyze would also do the job:
>   
> http://postgresql.nabble.com/Performance-problem-with-gin-index-td5867870.html
>   It seems that this information is missing in the doc
>
>   but it sadly neither triggers in our case as we have manual analyzes called 
> during the dataprocesssing just following the imports.
>   Manual vacuum is just too expensive here.
>
>   Hence disabling fast update seems to be our only option.

Does disabling fast update cause problems?  I always start with
fastupdate disabled, and only turn on if it I have a demonstrable
problem with it being off.

I would think "off" is likely to be better for you.  You say each
distinct key only appears in 2.7 rows.  So you won't get much benefit
from aggregating together all the new rows for each key before
updating the index for that key, as there is very little to aggregate.

Also, you say the inserts come in bulk.  It is generally a good thing
to slow down bulk operations by making them clean up their own messes,
for the sake of everyone else.


>   I hope this problem will help push up the 9.5 upgrade on our todo list :)
>
>   Ideally, we would then like to flush the pending list inconditionally after 
> the imports.
>   I guess we could achieve something approaching while modifying the analyze 
> scale factor  and gin_pending_list_limit
>   before/after the (bulk) imports, but having the possibility to flush it per 
> SQL would be better.
>   Is this a reasonable feature wish?

That feature has already been committed for the 9.6 branch.

>   And a last question: how does the index update work with bulk (COPY) 
> inserts:
>   without pending list: is it like a per row trigger or will the index be 
> cared of afterwards ?

Done for each row.

>   with small pending lists : is there a concurrency problem, or can both 
> tasks cleanly work in parallel ?

I don't understand the question.  What are the two tasks you are
referring to? Do you have multiple COPY running at the same time in
different processes?

Cheers,

Jeff


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] gin performance issue.

2016-02-05 Thread Tom Lane
Marc Mamin  writes:
> http://explain.depesz.com/s/wKv7
> Postgres Version 9.3.10 (Linux)

> Hello,
> this is a large daily table that only get bulk inserts (200-400 /days) with 
> no update.
> After rebuilding the whole table, the Bitmap Index Scan on 
> r_20160204_ix_toprid falls under 1 second (from 800)

> Fastupdate is using the default, but autovacuum is disabled on that table 
> which contains 30 Mio rows.

Pre-9.5, it's a pretty bad idea to disable autovacuum on a GIN index,
because then the "pending list" only gets flushed when it exceeds
work_mem.  (Obviously, using a large work_mem setting makes this worse.)

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] gin performance issue.

2016-02-05 Thread Marc Mamin
http://explain.depesz.com/s/wKv7
Postgres Version 9.3.10 (Linux)

Hello,
this is a large daily table that only get bulk inserts (200-400 /days) with no 
update.
After rebuilding the whole table, the Bitmap Index Scan on r_20160204_ix_toprid 
falls under 1 second (from 800)

Fastupdate is using the default, but autovacuum is disabled on that table which 
contains 30 Mio rows.
Another specificity is that the cardinality of the indexed column is very high. 
The average count per distinct values is only 2.7

I'm not sure what the problem is. Does the missing vacuum affect the gin index 
sanity further than not cleaning the pending list?
As I understand it, this list will be merged into the index automatically when 
it get full, independently from the vaccum setting.

Can it be an index bloating issue ?

and last but not least, can I reduce the problem by configuration ?

regards,

Marc Mamin