[PERFORM] seqscan on UNION'ed views
Dear all, I have a problem with seqscan that I hope you might help me with. Attached is a simple script that reproduces the database and results; I have tested it on both 9.0.4 and 9.3-devel with identical results. I need a sort of a time machine, where select statements on tables can easily be replaced by select statements on those tables as they were at some time in the past, including all related tables. To do so, I used views (see the script) that UNION both the current and archive tables and filter them by a timestamp. The problem arises when two such views are used in a JOIN: apparently the query planner doesn't look deep enough into the views and creates a very slow seqscan-based plan. The setup here demonstrates how a join that needs to extract a single row ends up including a seqscan over the whole table (see 1: Bad plan in the attached explain.txt), where 1000 rows are scanned. For test purposes 1000 rows is not a high number, but on my system it is several millions, and that takes significant time. If I rewrite the query into what I would expect the planner to do for me (see 2: Good plan), then, as expected, there are no seqscans. But I'm using an ORM which can't rewrite joins that way automatically, and there are so many of those automated queries that rewriting them by hand is also a rather bad alternative. So my question is: is it possible to somehow nudge the planner in the right direction? Thank you in advance!
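For context, the views from the attached script presumably look something like the sketch below; the table and column names are inferred from the plans that follow and may differ from the actual script (the "as of" timestamp is hardcoded as 1000 here):

```sql
-- Hypothetical reconstruction of the "time machine" views: each view unions
-- the current table with its archive, both filtered by an "as of" timestamp.
CREATE VIEW a AS
    SELECT id FROM table_a   WHERE updated <= 1000
    UNION
    SELECT id FROM archive_a WHERE start_time <= 1000 AND end_time > 1000;

CREATE VIEW b AS
    SELECT id, a_id FROM table_b   WHERE updated <= 1000
    UNION
    SELECT id, a_id FROM archive_b WHERE start_time <= 1000 AND end_time > 1000;
```

The UNION in each view is what forces the HashAggregate/Unique steps visible in the bad plan below, since the planner must deduplicate each view's output before joining.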
-- Sincerely, Dmitry Karasik

1: Bad plan:
==
testjoin=# EXPLAIN ANALYZE SELECT id FROM a JOIN b ON a.id = b.a_id WHERE b.id = 1;
                                                        QUERY PLAN
 Hash Join  (cost=122.29..151.29 rows=12 width=12) (actual time=4.854..4.974 rows=1 loops=1)
   Hash Cond: (archive_a.id = archive_b.a_id)
   ->  HashAggregate  (cost=73.05..85.21 rows=1216 width=4) (actual time=3.399..4.139 rows=1000 loops=1)
         ->  Append  (cost=0.00..70.01 rows=1216 width=4) (actual time=0.052..2.164 rows=1000 loops=1)
               ->  Seq Scan on archive_a  (cost=0.00..39.10 rows=216 width=4) (actual time=0.019..0.019 rows=0 loops=1)
                     Filter: ((start_time <= 1000) AND (end_time > 1000))
                     Rows Removed by Filter: 1
               ->  Seq Scan on table_a  (cost=0.00..18.75 rows=1000 width=4) (actual time=0.032..1.100 rows=1000 loops=1)
                     Filter: (updated <= 1000)
                     Rows Removed by Filter: 100
   ->  Hash  (cost=49.21..49.21 rows=2 width=8) (actual time=0.172..0.172 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Unique  (cost=49.18..49.19 rows=2 width=8) (actual time=0.165..0.168 rows=1 loops=1)
               ->  Sort  (cost=49.18..49.18 rows=2 width=8) (actual time=0.162..0.163 rows=1 loops=1)
                     Sort Key: archive_b.id, archive_b.a_id
                     Sort Method: quicksort  Memory: 25kB
                     ->  Append  (cost=0.00..49.17 rows=2 width=8) (actual time=0.067..0.070 rows=1 loops=1)
                           ->  Seq Scan on archive_b  (cost=0.00..40.98 rows=1 width=8) (actual time=0.041..0.041 rows=0 loops=1)
                                 Filter: ((start_time <= 1000) AND (end_time > 1000) AND (id = 1))
                                 Rows Removed by Filter: 1
                           ->  Index Scan using table_b_pkey on table_b  (cost=0.15..8.17 rows=1 width=8) (actual time=0.024..0.027 rows=1 loops=1)
                                 Index Cond: (id = 1)
                                 Filter: (updated <= 1000)
 Total runtime: 5.455 ms
(24 rows)

2: Good plan:
==
EXPLAIN ANALYZE
SELECT a.id FROM table_a a JOIN table_b b ON a.id = b.a_id
 WHERE b.id = 1 AND a.updated <= 1000 AND b.updated <= 1000
UNION
SELECT a.id FROM table_a a JOIN archive_b b ON a.id = b.a_id
 WHERE b.id = 1 AND a.updated <= 1000 AND b.start_time <= 1000 AND b.end_time > 1000
UNION
SELECT a.id FROM archive_a a JOIN table_b b ON a.id = b.a_id
 WHERE b.id = 1 AND a.start_time <= 1000 AND a.end_time > 1000 AND b.updated <= 1000
UNION
SELECT a.id FROM archive_a a JOIN archive_b b ON a.id = b.a_id
 WHERE b.id = 1 AND a.start_time <= 1000 AND a.end_time > 1000 AND b.start_time <= 1000 AND b.end_time > 1000;
                                                        QUERY PLAN
Re: [PERFORM] Server stalls, all CPU 100% system time
On 27/02/2013 9:29 AM, Josh Berkus wrote:
> Andre, please see the related thread on this list, "High CPU usage / load average after upgrading to Ubuntu 12.04". You may be experiencing some of the same issues. General perspective seems to be that kernels 3.0 through 3.4 have serious performance issues.

Josh, I saw that thread, but it did not appear to be the same symptoms that I had. Where they have a high load average, I only saw spikes during which the server was unresponsive. During that time the load would jump to 50-70 (on 24 cores). Anyway, after upgrading the kernel to 3.4.28 and the latest Intel network card driver the problem seems to be gone.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] seqscan on UNION'ed views
Dmitry Karasik <dmi...@karasik.eu.org> writes:
> I need to have a sort of a time machine, where select statements on tables could be easily replaced by select statements on tables as they were some time in the past, including all related tables. To do so, I used views (see in the script) that UNION both current and archive tables and filter them by a timestamp.

If you use UNION ALL instead of UNION, you should get better results (as well as inherently cheaper queries, since no duplicate-elimination step will be needed).

			regards, tom lane
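Applied to the original poster's setup, the change is simply UNION → UNION ALL in the view definitions. A hedged sketch, with table and column names inferred from the posted plans, and assuming a given row never appears in both the current and archive table for the same timestamp (so no duplicates need removing):

```sql
-- UNION ALL avoids the Sort/Unique (or HashAggregate) step, and lets the
-- planner push quals such as b.id = 1 down into each arm of the Append,
-- where they can use the underlying indexes.
CREATE VIEW b AS
    SELECT id, a_id FROM table_b   WHERE updated <= 1000
    UNION ALL
    SELECT id, a_id FROM archive_b WHERE start_time <= 1000 AND end_time > 1000;
```

The correctness caveat is the stated assumption: if the same logical row can satisfy both arms' filters at once, UNION ALL will return it twice where UNION returned it once.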
Re: [PERFORM] Estimation question...
Quick follow up... I've found that the row estimate in:

explain select count(id) from versions where project_id IN (80,115) AND project_id=115;
                                            QUERY PLAN
 Aggregate  (cost=178572.75..178572.76 rows=1 width=4)
   ->  Index Scan using dneg_versions_project_id on versions  (cost=0.00..178306.94 rows=106323 width=4)
         Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))

... is the sum of two other estimates, seen when rewriting the query using OR instead of IN:

explain select count(id) from versions where (project_id = 80 OR project_id = 115) AND project_id=115;
                                            QUERY PLAN
 Aggregate  (cost=305896.95..305896.96 rows=1 width=4)
   ->  Bitmap Heap Scan on versions  (cost=2315.08..305632.00 rows=105980 width=4)
         Recheck Cond: (((project_id = 80) AND (project_id = 115)) OR ((project_id = 115) AND (project_id = 115)))
         ->  BitmapOr  (cost=2315.08..2315.08 rows=106323 width=0)
               ->  Bitmap Index Scan on dneg_versions_project_id  (cost=0.00..94.52 rows=3709 width=0)
                     Index Cond: ((project_id = 80) AND (project_id = 115))
               ->  Bitmap Index Scan on dneg_versions_project_id  (cost=0.00..2167.57 rows=102614 width=0)
                     Index Cond: ((project_id = 115) AND (project_id = 115))

106323 = 3709 + 102614

Looks like the underlying problem is that the estimate for ((project_id = 115) AND (project_id = 115)) doesn't end up being the same as (project_id = 115) on its own.

Matt

On Tue, Feb 26, 2013 at 11:35 AM, Matt Daw m...@shotgunsoftware.com wrote:
> Howdy, the query generator in my app sometimes creates redundant filters of the form: project_id IN ( list of projects the user has permission to see ) AND project_id = single project the user is looking at ... and this is leading to a bad estimate (and thus a bad plan) on a few complex queries. I've included simplified examples below. This server is running 9.0.10 and the statistics target has been updated to 1000 on the project_id column. I've also loaded the one table into a 9.2.2 instance and replicated the behaviour.
> I can change how the query is being generated, but I'm curious why I'm getting a bad estimate. Is this an expected result? Thanks! Matt
>
> =
>
> 1) Filter on project_id only, row estimate for Bitmap Index Scan quite good.
>
> explain (analyze,buffers) select count(id) from versions WHERE project_id=115;
>                                             QUERY PLAN
>  Aggregate  (cost=1218111.01..1218111.02 rows=1 width=4) (actual time=1531.341..1531.342 rows=1 loops=1)
>    Buffers: shared hit=452619
>    ->  Bitmap Heap Scan on versions  (cost=34245.06..1215254.86 rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197 loops=1)
>          Recheck Cond: (project_id = 115)
>          Buffers: shared hit=452619
>          ->  Bitmap Index Scan on versions_project_id  (cost=0.00..33959.45 rows=1142461 width=0) (actual time=139.709..139.709 rows=1116037 loops=1)
>                Index Cond: (project_id = 115)
>                Buffers: shared hit=22077
>  Total runtime: 1531.399 ms
>
> 2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower.
>
> explain (analyze,buffers) select count(id) from versions WHERE project_id IN (80,115) AND project_id=115;
>                                             QUERY PLAN
>  Aggregate  (cost=327066.18..327066.19 rows=1 width=4) (actual time=1637.889..1637.889 rows=1 loops=1)
>    Buffers: shared hit=458389
>    ->  Bitmap Heap Scan on versions  (cost=3546.56..326793.17 rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180 loops=1)
>          Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))
>          Buffers: shared hit=458389
>          ->  Bitmap Index Scan on versions_project_id  (cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502 rows=1125436 loops=1)
>                Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))
>                Buffers: shared hit=22076
>  Total runtime: 1637.941 ms
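Since the planner multiplies the selectivities of the IN list and the equality filter as if they were independent clauses, one application-side workaround is for the query generator to drop the IN list whenever the single-project equality value is already a member of it. A sketch of the equivalence (column and table names from the examples above):

```sql
-- These two predicates are logically equivalent, because 115 is in the list,
-- but only the second one gets a good row estimate:
--   WHERE project_id IN (80,115) AND project_id = 115   -- ~10x underestimate
--   WHERE project_id = 115                              -- good estimate
SELECT count(id) FROM versions WHERE project_id = 115;
```

If 115 were *not* in the permission list, the combined predicate would be unsatisfiable and the generator could skip the query entirely, so the membership check is cheap to make on the application side.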
Re: [PERFORM] Are bitmap index scans slow to start?
On Tue, Feb 26, 2013 at 4:33 PM, Carlo Stonebanks <stonec.regis...@sympatico.ca> wrote:
>> Is each of these write operations just covering a single row? Does this description apply to just one of the many (how many?) databases, so that there are really 14*N concurrent sessions?
>
> All writes are single row. All DB's have exactly the same structure, only the content is different. Currently the server is hosting five active DB's – although there are 14 DB's actually on the host, the balance are backups and/or testing environments.

I had thought you were saying that any one ETL procedure into one database used 14 concurrent threads. But really, each ETL procedure is single-threaded, and there can be up to 5 (or theoretically up to 14) of them running at a time into different databases?

> When a feed comes in, it can be anything from dozens to millions of rows, and may take minutes or days to run. I had asked that pgbouncer be installed in front of the host to act as a traffic cop. Try as I may to convince the engineering team that fewer sessions running faster is optimal, they say that the 14 concurrent sessions is based on real-world experience of what imports the fastest.

pgbouncer is more for making connections line up single-file when the line is moving at a very fast clip, say 0.01 second per turn. If I were trying to make tasks that can each last for hours or days line up and take turns, I don't think pgbouncer would be the way to go.

You really need to know whether those reads and writes are concentrated in a small region (relative to the amount of your RAM), or widely scattered. If you are reading and writing intensively (which you do seem to be doing) but only within a compact region, then it should not drive other data out of the cache. But, since you do seem to have IO problems from cache misses, and you do have a high level of activity, the easy conclusion is that you have too little RAM to hold the working size of your data.
> It won’t be a problem of physical RAM, I believe there is at least 32GB of RAM. What constitutes “a compact region”?

If you have 14 ETL processes actively going simultaneously, I'd say a compact region would then be about 512 MB (32GB / 14 / margin of safety of 4). Again, assuming that that is the problem.

> The ETL process takes the feed and distributes it to 85 core tables. I have been through many PG configuration cycles with the generous help of people in this forum. I think the big problem when getting help has been the issue of those offering assistance understanding that the whopping majority of the time, the system is performing single row reads and writes. The assumption tends to be that the end point of an ETL should just be a series of COPY statements, and it should all happen very quickly in classic SQL bulk queries.

That is often a reasonable assumption, as ETL does end with L :)

Is the original query you posted part of the transform process, rather than being the production query you run after the ETL is over? If so, maybe you need an EL(S)TL process, where you first load the data to a staging table in bulk, and then transform it in bulk rather than one row at a time.

Cheers, Jeff
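A minimal sketch of that staging-table approach; the table names, column names, file path, and transform are all hypothetical stand-ins, not anything from this thread:

```sql
-- Extract/Load: bulk-load the raw feed into an unindexed staging table.
CREATE TABLE staging_feed (LIKE core_table);
COPY staging_feed FROM '/path/to/feed.csv' WITH (FORMAT csv);

-- Transform/Load: move the rows into the core table with one set-based
-- statement instead of a row-at-a-time loop of reads and writes.
INSERT INTO core_table (id, payload)
SELECT s.id, upper(s.payload)          -- upper() stands in for the real transform
FROM staging_feed s
WHERE NOT EXISTS (SELECT 1 FROM core_table c WHERE c.id = s.id);

DROP TABLE staging_feed;
```

The win is that the transform runs as one scan and one bulk insert per core table, so the working set is touched sequentially rather than via millions of scattered single-row operations.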
Re: [PERFORM] Are bitmap index scans slow to start?
> pgbouncer is more for making connections line up single-file when the line is moving at a very fast clip, say 0.01 second per turn. If I were trying to make tasks that can each last for hours or days line up and take turns, I don't think pgbouncer would be the way to go.

The recommendation at the time was assuming that write contention was slowing things down and consuming resources, since I can't stop people from creating big multi-threaded imports. Each import consists of about 50 writes.

> Is the original query you posted part of the transform process, rather than being the production query you run after the ETL is over?

Neither, it is part of our auditing and maintenance processes. It is not called with any great frequency. The audit report generates rows defining how a particular item (an item being a particular table/row) was created: it returns the names of the import tables, the row ids, the write operations and any transformation messages that may have been generated - all in the order they occurred. You can imagine how useful this is in creating a document describing what happened and why. The same data generated by the report is used to resurrect an item. If - for example - our business logic has changed, but the change only affects a small sub-set of our core data, then we perform a rollback (a logical cascading delete) on the affected items. Then we create a rebuild, which is a script that is generated to re-import ONLY the import table rows defined in the audit report. So, this query is not called often, but the fact is that if it takes over 30 seconds to load an item (because the audit report takes so long to prepare the bitmap index scan when passed new query parameters) then it severely restricts how much data we can resurrect at any one time.
Re: [PERFORM] Server stalls, all CPU 100% system time
Someone commented they think it might be related to this kernel bug: https://lkml.org/lkml/2012/10/9/210

We have some evidence that that is the case.

-- Josh Berkus, PostgreSQL Experts Inc., http://pgexperts.com
Re: [PERFORM] Are bitmap index scans slow to start?
> I had thought you were saying that any one ETL procedure into one database used 14 concurrent threads. But really, each ETL procedure is single-threaded, and there can be up to 5 (or theoretically up to 14) of them running at a time into different databases?

Sorry, just caught this. Your first interpretation was correct: each DB runs an ETL that can have up to 14 concurrent threads. I don't think the number should be that high, but the engineering team insists the load time is better with 14 threads than with fewer, faster-running ones.