[PERFORM] Parallel Select query performance and shared buffers
We have several independent tables on a multi-core machine serving Select queries. These tables fit into memory, and each Select query goes over one table's pages sequentially. In this experiment, there are no indexes or table joins.

When we send concurrent Select queries to these tables, query performance doesn't scale out with the number of CPU cores. We find that complex Select queries scale out better than simpler ones. We also find that increasing the block size from 8 KB to 32 KB, or increasing shared_buffers to include the working set mitigates the problem to some extent.

For our experiments, we chose an 8-core machine with 68 GB of memory from Amazon's EC2 service. We installed PostgreSQL 9.3.1 on the instance, and set shared_buffers to 4 GB.

We then generated 1, 2, 4, and 8 separate tables using the data generator from the industry standard TPC-H benchmark. Each table we generated, called lineitem-1, lineitem-2, etc., had about 750 MB of data. Next, we sent 1, 2, 4, and 8 concurrent Select queries to these tables to observe the scale out behavior.

We expected that, since this machine had 8 cores, run times would stay constant throughout, and that the machine's CPU utilization would go up to 100% at 8 concurrent queries. Neither assumption held. Query run times degraded as we increased the number of concurrent Select queries, and CPU utilization flattened out at less than 50% for the simpler queries.

Full results with block size of 8 KB are below:

Table                  select count(*)   TPC-H Simple (#6)[2]   TPC-H Complex (#1)[1]
1 Table / 1 query      1.5 s             2.5 s                  8.4 s
2 Tables / 2 queries   1.5 s             2.5 s                  8.4 s
4 Tables / 4 queries   2.0 s             2.9 s                  8.8 s
8 Tables / 8 queries   3.3 s             4.0 s                  9.6 s

We then increased the block size (BLCKSZ) from 8 KB to 32 KB and recompiled PostgreSQL. This change had a positive impact on query completion times. Here are the new results with block size of 32 KB:

Table                  select count(*)   TPC-H Simple (#6)[2]   TPC-H Complex (#1)[1]
1 Table / 1 query      1.5 s             2.3 s                  8.0 s
2 Tables / 2 queries   1.5 s             2.3 s                  8.0 s
4 Tables / 4 queries   1.6 s             2.4 s                  8.1 s
8 Tables / 8 queries   1.8 s             2.7 s                  8.3 s

As a quick aside, we also repeated the same experiment on an EC2 instance with 16 CPU cores, and found that the scale out behavior became worse there. (We also tried increasing shared_buffers to 30 GB. This change completely solved the scaling out problem on this instance type, but hurt our performance on the hi1.4xlarge instances.)

Unfortunately, increasing the block size from 8 KB to 32 KB has other implications for some of our customers. Could you help us out with the problem here? What can we do to identify the problem's root cause? Can we work around it?

Thank you,
Metin

[1] http://examples.citusdata.com/tpch_queries.html#query-1
[2] http://examples.citusdata.com/tpch_queries.html#query-6
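
One way to read the reported numbers is as a slowdown factor: the run time at N concurrent queries divided by the single-query run time, where 1.0 would mean perfect scale out. The sketch below only does that arithmetic on the figures quoted in the post; the function names are made up for illustration and nothing here measures PostgreSQL itself.

```python
# Run times in seconds, copied from the two result tables in the post.
# Inner keys are the number of concurrent queries (one table per query).
RUN_TIMES = {
    "8 KB": {
        "select count(*)": {1: 1.5, 2: 1.5, 4: 2.0, 8: 3.3},
        "TPC-H Simple (#6)": {1: 2.5, 2: 2.5, 4: 2.9, 8: 4.0},
        "TPC-H Complex (#1)": {1: 8.4, 2: 8.4, 4: 8.8, 8: 9.6},
    },
    "32 KB": {
        "select count(*)": {1: 1.5, 2: 1.5, 4: 1.6, 8: 1.8},
        "TPC-H Simple (#6)": {1: 2.3, 2: 2.3, 4: 2.4, 8: 2.7},
        "TPC-H Complex (#1)": {1: 8.0, 2: 8.0, 4: 8.1, 8: 8.3},
    },
}


def slowdown(times, concurrency):
    """Run time at `concurrency` divided by the single-query run time."""
    return times[concurrency] / times[1]


def slowdown_table(run_times, concurrency=8):
    """Slowdown factor per block size and query, rounded to two decimals."""
    return {
        block_size: {
            query: round(slowdown(times, concurrency), 2)
            for query, times in queries.items()
        }
        for block_size, queries in run_times.items()
    }


if __name__ == "__main__":
    for block_size, queries in slowdown_table(RUN_TIMES).items():
        for query, factor in queries.items():
            print(f"{block_size:>5}  {query:<20} {factor:.2f}x")
```

With the 8 KB block size, count(*) at 8 concurrent queries takes about 2.2 times as long as a single query, while the complex query takes about 1.14 times as long. This matches the observation that complex queries scale out better than simple ones.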
Re: [PERFORM] Parallel Select query performance and shared buffers
Metin Doslu wrote:

> When we send concurrent Select queries to these tables, query performance doesn't scale out with the number of CPU cores. We find that complex Select queries scale out better than simpler ones. We also find that increasing the block size from 8 KB to 32 KB, or increasing shared_buffers to include the working set mitigates the problem to some extent.

Maybe you could help test this patch:
http://www.postgresql.org/message-id/20131115194725.gg5...@awork2.anarazel.de

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Parallel Select query performance and shared buffers
On Tue, Dec 3, 2013 at 10:49 AM, Metin Doslu me...@citusdata.com wrote:

> We have several independent tables on a multi-core machine serving Select queries. These tables fit into memory, and each Select query goes over one table's pages sequentially. In this experiment, there are no indexes or table joins.
>
> When we send concurrent Select queries to these tables, query performance doesn't scale out with the number of CPU cores. We find that complex Select queries scale out better than simpler ones. We also find that increasing the block size from 8 KB to 32 KB, or increasing shared_buffers to include the working set mitigates the problem to some extent.
>
> For our experiments, we chose an 8-core machine with 68 GB of memory from Amazon's EC2 service. We installed PostgreSQL 9.3.1 on the instance, and set shared_buffers to 4 GB.

If you are certain your tables fit in RAM, you may want to disable synchronized sequential scans, as they will create contention between the threads.
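
For readers who want to try this suggestion, the relevant PostgreSQL setting is synchronize_seqscans. The sketch below shows one possible way to turn it off for a single client session by passing it through libpq's PGOPTIONS environment variable. The helper names and the database name tpch are hypothetical, and the sketch only builds the environment and the psql argument list without running anything. As the follow-up in this thread points out, synchronized scans only matter when several backends scan the same table, so this may have no effect on the workload described here.

```python
import os


def session_env(settings, base_env=None):
    """Return a copy of the environment with PGOPTIONS set so that a libpq
    client started with it (psql, for example) applies `settings` at session
    start. Any PGOPTIONS value already present is overwritten."""
    env = dict(os.environ if base_env is None else base_env)
    env["PGOPTIONS"] = " ".join(
        f"-c {name}={value}" for name, value in settings.items()
    )
    return env


def psql_command(dbname, sql):
    """Build the argument list for running a single statement with psql."""
    return ["psql", "-d", dbname, "-c", sql]


if __name__ == "__main__":
    # "tpch" is a hypothetical database name; the table name contains a
    # hyphen, so it is written as a quoted identifier.
    env = session_env({"synchronize_seqscans": "off"})
    cmd = psql_command("tpch", 'SELECT count(*) FROM "lineitem-1"')
    print(env["PGOPTIONS"])
    print(cmd)
```

To actually run the query, the two values could be handed to subprocess.run(cmd, env=env). Setting the parameter per session keeps the experiment from changing server-wide configuration.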
Re: [PERFORM] Parallel Select query performance and shared buffers
Looking into syncscan.c, it says in comments:

    When multiple backends run a sequential scan on the same table, we try to keep them synchronized to reduce the overall I/O needed.

But in my workload, every process was running on a different table.

On Tue, Dec 3, 2013 at 5:56 PM, Claudio Freire klaussfre...@gmail.com wrote:

> On Tue, Dec 3, 2013 at 10:49 AM, Metin Doslu me...@citusdata.com wrote:
>
>> We have several independent tables on a multi-core machine serving Select queries. These tables fit into memory, and each Select query goes over one table's pages sequentially. In this experiment, there are no indexes or table joins.
>>
>> When we send concurrent Select queries to these tables, query performance doesn't scale out with the number of CPU cores. We find that complex Select queries scale out better than simpler ones. We also find that increasing the block size from 8 KB to 32 KB, or increasing shared_buffers to include the working set mitigates the problem to some extent.
>>
>> For our experiments, we chose an 8-core machine with 68 GB of memory from Amazon's EC2 service. We installed PostgreSQL 9.3.1 on the instance, and set shared_buffers to 4 GB.
>
> If you are certain your tables fit in RAM, you may want to disable synchronized sequential scans, as they will create contention between the threads.
Re: [PERFORM] Parallel Select query performance and shared buffers
On Tue, Dec 3, 2013 at 1:24 PM, Metin Doslu me...@citusdata.com wrote:

> Looking into syncscan.c, it says in comments:
>
>     When multiple backends run a sequential scan on the same table, we try to keep them synchronized to reduce the overall I/O needed.
>
> But in my workload, every process was running on a different table.

Ah, ok, so that's what you meant by independent tables.
Re: [PERFORM] One query run twice in parallel results in huge performance decrease
Jeff Janes wrote:

> I think what I would do next is EXPLAIN (without ANALYZE) one of the queries repeatedly, say once a second, while the other query either runs or doesn't run repeatedly, that is the other query runs for 11 minutes (or however long it takes to run), and then sleeps for 11 minutes in a loop. Then you can see if the explain plan differs very reliably, and if the transition is exactly aligned with the other starting and stopping or if it is offset.

Hi Jeff,

I ran the one analyze over and over again as you proposed, but the result never changed.

But I think I found a solution for the problem. While browsing through the manual I found a statement about GIN indexes:

    For tables with GIN indexes, VACUUM (in any form) also completes any pending index insertions, by moving pending index entries to the appropriate places in the main GIN index structure.

I use a GiST index and not a GIN index, but I tried to vacuum the (freshly filled) table, and it helped. It seems that the planner is simply not aware of the existence of the index, although I ran an ANALYZE on the table right before the query.

Thank you all for your suggestions!

Jan
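
The polling procedure Jeff describes can be sketched as a small loop that fetches the plan at a fixed interval and records every point where it changes. This is only an illustration under stated assumptions: watch_plan and its parameters are hypothetical names, and fetch_plan stands in for whatever code runs EXPLAIN against the database and returns the plan text (for example through a database driver), which is not shown here.

```python
import time


def watch_plan(fetch_plan, iterations, interval=1.0,
               sleep=time.sleep, clock=time.time):
    """Call fetch_plan() `iterations` times, `interval` seconds apart.

    Returns a list of (timestamp, plan) pairs: one for the first plan seen
    and one for every later call whose plan differs from the previous call.
    """
    changes = []
    previous = None
    for i in range(iterations):
        plan = fetch_plan()
        if i == 0 or plan != previous:
            changes.append((clock(), plan))
        previous = plan
        if i < iterations - 1:
            sleep(interval)
    return changes


if __name__ == "__main__":
    # Stand-in for a real EXPLAIN call: a fixed sequence of plan texts.
    fake_plans = iter(["Seq Scan", "Seq Scan", "Index Scan", "Seq Scan"])
    for stamp, plan in watch_plan(lambda: next(fake_plans), 4, interval=0.0):
        print(stamp, plan)
```

Comparing the recorded timestamps with the start and stop times of the other query would show whether plan transitions line up with it or are offset, which is the question Jeff raises.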