[PERFORM] seqscan on UNION'ed views

2013-02-27 Thread Dmitry Karasik
Dear all,

I have a problem with seqscan I hope you might help me with.
Attached is a simple script that reproduces the database and the results; I
have tested it on both 9.0.4 and 9.3-devel with identical results.

I need a sort of time machine, where SELECT statements on tables can easily be
replaced by SELECT statements on the same tables as they were at some time in
the past, including all related tables. To do so, I used views (see the
script) that UNION the current and archive tables and filter them by a
timestamp.
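
For reference, here is a minimal sketch of what such a pair of views might
look like. This is a hypothetical reconstruction, since the attached script is
not reproduced in this archive; the table and column names (table_a,
archive_a, updated, start_time, end_time) and the as-of timestamp 1000 are
inferred from the plans below:

    -- View showing "a" as of timestamp 1000: current rows last updated
    -- no later than then, plus archived rows whose validity interval
    -- covers that moment. A real version would parametrize the timestamp.
    CREATE VIEW a AS
        SELECT id FROM table_a   WHERE updated <= 1000
        UNION
        SELECT id FROM archive_a WHERE start_time <= 1000 AND end_time > 1000;

    CREATE VIEW b AS
        SELECT id, a_id FROM table_b   WHERE updated <= 1000
        UNION
        SELECT id, a_id FROM archive_b WHERE start_time <= 1000 AND end_time > 1000;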

The problem arises when two such views are used in a JOIN: apparently the
query planner doesn't look deep enough into the views and creates a very slow
seqscan-based plan. The setup here demonstrates how a join that needs to
extract a single row includes a seqscan over the whole table (see "1: Bad
plan" below), where all 1000 rows are scanned. For test purposes 1000 rows is
not a high number, but on my production system it is several million, and the
scan takes significant time.

If I rewrite the query into what I would expect the planner to do for me (see
"2: Good plan"), then, as expected, there are no seqscans. But I'm using an
ORM which can't rewrite joins that way automatically, and there are so many of
those generated queries that rewriting them by hand is also a rather bad
alternative. So my question is: is it possible to somehow nudge the planner in
the right direction?

Thank you in advance!

-- 
Sincerely,
Dmitry Karasik

1: Bad plan:
==

testjoin=> EXPLAIN ANALYZE SELECT id FROM a JOIN b ON a.id = b.a_id WHERE b.id = 1;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=122.29..151.29 rows=12 width=12) (actual time=4.854..4.974 rows=1 loops=1)
   Hash Cond: (archive_a.id = archive_b.a_id)
   ->  HashAggregate  (cost=73.05..85.21 rows=1216 width=4) (actual time=3.399..4.139 rows=1000 loops=1)
         ->  Append  (cost=0.00..70.01 rows=1216 width=4) (actual time=0.052..2.164 rows=1000 loops=1)
               ->  Seq Scan on archive_a  (cost=0.00..39.10 rows=216 width=4) (actual time=0.019..0.019 rows=0 loops=1)
                     Filter: ((start_time <= 1000) AND (end_time > 1000))
                     Rows Removed by Filter: 1
               ->  Seq Scan on table_a  (cost=0.00..18.75 rows=1000 width=4) (actual time=0.032..1.100 rows=1000 loops=1)
                     Filter: (updated <= 1000)
                     Rows Removed by Filter: 100
   ->  Hash  (cost=49.21..49.21 rows=2 width=8) (actual time=0.172..0.172 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Unique  (cost=49.18..49.19 rows=2 width=8) (actual time=0.165..0.168 rows=1 loops=1)
               ->  Sort  (cost=49.18..49.18 rows=2 width=8) (actual time=0.162..0.163 rows=1 loops=1)
                     Sort Key: archive_b.id, archive_b.a_id
                     Sort Method: quicksort  Memory: 25kB
                     ->  Append  (cost=0.00..49.17 rows=2 width=8) (actual time=0.067..0.070 rows=1 loops=1)
                           ->  Seq Scan on archive_b  (cost=0.00..40.98 rows=1 width=8) (actual time=0.041..0.041 rows=0 loops=1)
                                 Filter: ((start_time <= 1000) AND (end_time > 1000) AND (id = 1))
                                 Rows Removed by Filter: 1
                           ->  Index Scan using table_b_pkey on table_b  (cost=0.15..8.17 rows=1 width=8) (actual time=0.024..0.027 rows=1 loops=1)
                                 Index Cond: (id = 1)
                                 Filter: (updated <= 1000)
 Total runtime: 5.455 ms
(24 rows)


2: Good plan:
==

EXPLAIN ANALYZE
 SELECT a.id FROM table_a   a JOIN table_b   b ON a.id = b.a_id WHERE b.id = 1
   AND a.updated <= 1000 AND b.updated <= 1000
UNION
 SELECT a.id FROM table_a   a JOIN archive_b b ON a.id = b.a_id WHERE b.id = 1
   AND a.updated <= 1000 AND b.start_time <= 1000 AND b.end_time > 1000
UNION
 SELECT a.id FROM archive_a a JOIN table_b   b ON a.id = b.a_id WHERE b.id = 1
   AND a.start_time <= 1000 AND a.end_time > 1000 AND b.updated <= 1000
UNION
 SELECT a.id FROM archive_a a JOIN archive_b b ON a.id = b.a_id WHERE b.id = 1
   AND a.start_time <= 1000 AND a.end_time > 1000 AND b.start_time <= 1000
   AND b.end_time > 1000
;

   QUERY PLAN   


Re: [PERFORM] Server stalls, all CPU 100% system time

2013-02-27 Thread Andre

On 27/02/2013 9:29 AM, Josh Berkus wrote:

> Andre,
>
> Please see the related thread on this list, "High CPU usage / load
> average after upgrading to Ubuntu 12.04".  You may be experiencing some
> of the same issues.  General perspective seems to be that kernels 3.0
> through 3.4 have serious performance issues.



Josh,

I saw that thread, but it did not appear to describe the same symptoms I had.
Where they saw a consistently high load average, I only saw spikes during
which the server was unresponsive; during those the load would jump to 50-70
(on 24 cores). Anyway, after upgrading the kernel to 3.4.28 and installing the
latest Intel network card driver, the problem seems to be gone.




Re: [PERFORM] seqscan on UNION'ed views

2013-02-27 Thread Tom Lane
Dmitry Karasik dmi...@karasik.eu.org writes:
> I need a sort of time machine, where SELECT statements on tables can
> easily be replaced by SELECT statements on the same tables as they were
> at some time in the past, including all related tables. To do so, I used
> views (see the script) that UNION the current and archive tables and
> filter them by a timestamp.

If you use UNION ALL instead of UNION, you should get better results
(as well as inherently cheaper queries, since no duplicate-elimination
step will be needed).
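
Assuming view definitions along the lines sketched earlier in the thread
(again a hypothetical reconstruction, not the poster's actual script), the
change would look like this:

    -- UNION ALL keeps the two arms as a plain Append with no
    -- de-duplication step, so the planner can push the join and WHERE
    -- conditions down into each arm. This assumes a row is never both
    -- current and archived for the same as-of timestamp, so dropping
    -- duplicate elimination is safe.
    CREATE OR REPLACE VIEW a AS
        SELECT id FROM table_a   WHERE updated <= 1000
        UNION ALL
        SELECT id FROM archive_a WHERE start_time <= 1000 AND end_time > 1000;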

regards, tom lane




Re: [PERFORM] Estimation question...

2013-02-27 Thread Matt Daw
Quick follow up... I've found that the row estimate in:

explain select count(id) from versions where project_id IN (80,115)
AND project_id=115;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Aggregate  (cost=178572.75..178572.76 rows=1 width=4)
   ->  Index Scan using dneg_versions_project_id on versions  (cost=0.00..178306.94 rows=106323 width=4)
         Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))


... is the sum of two other estimates, seen when rewriting the query
using OR instead of IN:


explain select count(id) from versions where (project_id = 80 OR
project_id = 115) AND project_id=115;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Aggregate  (cost=305896.95..305896.96 rows=1 width=4)
   ->  Bitmap Heap Scan on versions  (cost=2315.08..305632.00 rows=105980 width=4)
         Recheck Cond: (((project_id = 80) AND (project_id = 115)) OR ((project_id = 115) AND (project_id = 115)))
         ->  BitmapOr  (cost=2315.08..2315.08 rows=106323 width=0)
               ->  Bitmap Index Scan on dneg_versions_project_id  (cost=0.00..94.52 rows=3709 width=0)
                     Index Cond: ((project_id = 80) AND (project_id = 115))
               ->  Bitmap Index Scan on dneg_versions_project_id  (cost=0.00..2167.57 rows=102614 width=0)
                     Index Cond: ((project_id = 115) AND (project_id = 115))

106323 = 3709 + 102614

Looks like the underlying problem is that the estimate for
((project_id = 115) AND (project_id = 115)) doesn't end up being the
same as (project_id=115) on its own.
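
A plausible reading of these numbers (my inference, not something confirmed
in this thread) is that the planner multiplies the selectivities of the two
redundant clauses as if they were independent:

    estimate(p = 115 AND p = 115) ~ sel(p = 115) * sel(p = 115) * N
                           102614 ~ (1142461 / N) * (1142461 / N) * N
                                  => N ~ 12.7 million rows

    so the redundant clause scales the estimate down by an extra factor of
    sel(p = 115) ~ 0.09, matching the ~10x underestimate noted in the
    quoted message below.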

Matt

On Tue, Feb 26, 2013 at 11:35 AM, Matt Daw m...@shotgunsoftware.com wrote:
> Howdy, the query generator in my app sometimes creates redundant
> filters of the form:
>
> project_id IN ( list of projects user has permission to see ) AND
> project_id = single project user is looking at
>
> ... and this is leading to a bad estimate (and thus a bad plan) on a
> few complex queries. I've included simplified examples below. This
> server is running 9.0.10 and the statistics target has been updated to
> 1000 on the project_id column. I've also loaded the one table into a
> 9.2.2 instance and replicated the behaviour.
>
> I can change how the query is being generated, but I'm curious why I'm
> getting a bad estimate. Is this an expected result?
>
> Thanks!
>
> Matt
>
> =====
>
> 1) Filter on project_id only, row estimate for Bitmap Index Scan quite good.
>
> explain (analyze,buffers) select count(id) from versions WHERE project_id=115;
>
>                                       QUERY PLAN
> ---------------------------------------------------------------------------------------
>  Aggregate  (cost=1218111.01..1218111.02 rows=1 width=4) (actual time=1531.341..1531.342 rows=1 loops=1)
>    Buffers: shared hit=452619
>    ->  Bitmap Heap Scan on versions  (cost=34245.06..1215254.86 rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197 loops=1)
>          Recheck Cond: (project_id = 115)
>          Buffers: shared hit=452619
>          ->  Bitmap Index Scan on versions_project_id  (cost=0.00..33959.45 rows=1142461 width=0) (actual time=139.709..139.709 rows=1116037 loops=1)
>                Index Cond: (project_id = 115)
>                Buffers: shared hit=22077
>  Total runtime: 1531.399 ms
>
> 2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower.
>
> explain (analyze,buffers) select count(id) from versions WHERE
> project_id IN (80,115) AND project_id=115;
>
>                                       QUERY PLAN
> ---------------------------------------------------------------------------------------
>  Aggregate  (cost=327066.18..327066.19 rows=1 width=4) (actual time=1637.889..1637.889 rows=1 loops=1)
>    Buffers: shared hit=458389
>    ->  Bitmap Heap Scan on versions  (cost=3546.56..326793.17 rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180 loops=1)
>          Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))
>          Buffers: shared hit=458389
>          ->  Bitmap Index Scan on versions_project_id  (cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502 rows=1125436 loops=1)
>                Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))
>                Buffers: shared hit=22076
>  Total runtime: 1637.941 ms




Re: [PERFORM] Are bitmap index scans slow to start?

2013-02-27 Thread Jeff Janes
On Tue, Feb 26, 2013 at 4:33 PM, Carlo Stonebanks 
stonec.regis...@sympatico.ca wrote:

>> Is each of these write operations just covering a single row?  Does this
>> description apply to just one of the many (how many?) databases, so that
>> there are really 14*N concurrent sessions?
>
> All writes are single row. All DB’s have exactly the same structure, only
> the content is different. Currently the server is hosting five active
> DB’s; although there are 14 DB’s actually on the host, the balance are
> backups and/or testing environments.


I had thought you were saying that any one ETL procedure into one database
used 14 concurrent threads.  But really, each ETL procedure is
single-threaded, and there can be up to 5 (or theoretically up to 14) of
them running at a time into different databases?



> When a feed comes in, it can be anything from dozens to millions of rows,
> and may take minutes or days to run. I had asked that pgbouncer be
> installed in front of the host to act as a traffic cop. Try as I may to
> convince the engineering team that fewer sessions running faster is
> optimal, they say that the 14 concurrent sessions are based on real-world
> experience of what imports the fastest.



pgbouncer is more for making connections line up single-file when the line
is moving at a very fast clip, say 0.01 second per turn.  If I were trying
to make tasks that can each last for hours or days line up and take turns,
I don't think pgbouncer would be the way to go.



 

>> You really need to know whether those reads and writes are
>> concentrated in a small region (relative to the amount of your RAM), or
>> widely scattered.  If you are reading and writing intensively (which you
>> do seem to be doing) but only within a compact region, then it should
>> not drive other data out of the cache.  But, since you do seem to have
>> IO problems from cache misses, and you do have a high level of activity,
>> the easy conclusion is that you have too little RAM to hold the working
>> size of your data.
>
> It won’t be a problem of physical RAM, I believe there is at least 32GB of
> RAM. What constitutes “a compact region”?


If you have 14 loads actively going on simultaneously, I'd say a compact
region would then be about 512 MB (32 GB / 14 / margin of safety of 4).
Again, assuming that that is the problem.


> The ETL process takes the feed and distributes it to 85 core tables.  I
> have been through many PG configuration cycles with the generous help of
> people in this forum. I think the big problem when getting help has been
> getting those offering assistance to understand that the whopping
> majority of the time, the system is performing single row reads and
> writes. The assumption tends to be that the end point of an ETL should
> just be a series of COPY statements, and it should all happen very
> quickly in classic SQL bulk queries.


That is often a reasonable assumption, as ETL does end with L :)

Is the original query you posted part of the transform process, rather than
being the production query you run after the ETL is over?

If so, maybe you need an EL(S)TL process, where you first load the data into a
staging table in bulk, and then transform it in bulk rather than one row at a
time.
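
A rough sketch of that shape; the table names, columns, and file path here
are invented purely for illustration:

    -- 1. Bulk-load the raw feed into an unindexed staging table.
    TRUNCATE staging_feed;
    COPY staging_feed (source_id, raw_value)
        FROM '/path/to/feed.csv' WITH (FORMAT csv);

    -- 2. Transform and load in one set-based statement instead of
    --    row-at-a-time reads and writes.
    INSERT INTO core_table (source_id, value)
    SELECT source_id, upper(trim(raw_value))
    FROM staging_feed;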

Cheers,

Jeff


Re: [PERFORM] Are bitmap index scans slow to start?

2013-02-27 Thread Carlo Stonebanks
> pgbouncer is more for making connections line up single-file when the
> line is moving at a very fast clip, say 0.01 second per turn.  If I were
> trying to make tasks that can each last for hours or days line up and
> take turns, I don't think pgbouncer would be the way to go.


 

The recommendation at the time assumed that write contention was slowing
things down and consuming resources, since I can't stop people from creating
big multi-threaded imports. Each import consists of about 50 writes.


 

> Is the original query you posted part of the transform process, rather
> than being the production query you run after the ETL is over?



Neither; it is part of our auditing and maintenance processes. It is not
called with any great frequency. The audit report generates rows defining how
a particular item (an item being a particular table/row) was created: it
returns the names of the import tables, the row ids, the write operations and
any transformation messages that may have been generated, all in the order
they occurred.

 

You can imagine how useful this is in creating a document describing what
happened and why.

 

The same data generated by the report is used to resurrect an item. If, for
example, our business logic has changed, but the change only affects a small
sub-set of our core data, then we perform a rollback (a logical cascading
delete) on the affected items. Then we create a "rebuild", which is a
generated script that re-imports ONLY the import table rows defined in the
audit report.

 

So, this query is not called often, but the fact is that if it takes over 30
seconds to load an item (because the audit report takes so long to prepare
the bitmap index scan when passed new query parameters) then it severely
restricts how much data we can resurrect at any one time.



Re: [PERFORM] Server stalls, all CPU 100% system time

2013-02-27 Thread Josh Berkus

> Someone commented they think it might be related to this kernel bug:
>
> https://lkml.org/lkml/2012/10/9/210

We have some evidence that that is the case.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [PERFORM] Are bitmap index scans slow to start?

2013-02-27 Thread Carlo Stonebanks
 

> I had thought you were saying that any one ETL procedure into one database
> used 14 concurrent threads.  But really, each ETL procedure is
> single-threaded, and there can be up to 5 (or theoretically up to 14) of
> them running at a time into different databases?


 

Sorry, just caught this. 

 

Your first interpretation was correct. Each DB runs an ETL that can have up
to 14 concurrent threads. I don't think the number should be that high, but
the engineering team insists the load time is better than with fewer threads
running faster.