Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-10 Thread Merlin Moncure
On Wed, Nov 27, 2013 at 2:28 AM, Metin Doslu me...@citusdata.com wrote: We have several independent tables on a multi-core machine serving Select queries. These tables fit into memory, and each Select query goes over one table's pages sequentially. In this experiment, there are no indexes or

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-10 Thread Claudio Freire
On Tue, Dec 10, 2013 at 5:03 PM, Merlin Moncure mmonc...@gmail.com wrote: Also, can I see a typical 'top' during poor scaling count(*) activity? In particular, what's sys cpu%? I'm guessing it's non-trivial. There was another thread; this seems like a mistaken double post or something like

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-10 Thread Merlin Moncure
On Tue, Dec 10, 2013 at 2:06 PM, Claudio Freire klaussfre...@gmail.com wrote: On Tue, Dec 10, 2013 at 5:03 PM, Merlin Moncure mmonc...@gmail.com wrote: Also, can I see a typical 'top' during poor scaling count(*) activity? In particular, what's sys cpu%? I'm guessing it's non-trivial. There

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-05 Thread Metin Doslu
- When we increased NUM_BUFFER_PARTITIONS to 1024, this problem disappeared on 8-core machines and came back on 16-core machines on Amazon EC2. Could it be related to PostgreSQL's locking mechanism? If we build with -DLWLOCK_STATS to print locking stats from PostgreSQL, we see tons of

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-05 Thread Andres Freund
On 2013-12-05 11:15:20 +0200, Metin Doslu wrote: - When we increased NUM_BUFFER_PARTITIONS to 1024, this problem disappeared on 8-core machines and came back on 16-core machines on Amazon EC2. Could it be related to PostgreSQL's locking mechanism? If we build with -DLWLOCK_STATS to

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-05 Thread Metin Doslu
Is your workload bigger than RAM? RAM is bigger than workload (more than a couple of times). I think a good bit of the contention you're seeing in that listing is populating shared_buffers - and might actually vanish once you're halfway cached. From what I've seen so far the bigger problem

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-05 Thread Andres Freund
On 2013-12-05 11:33:29 +0200, Metin Doslu wrote: Is your workload bigger than RAM? RAM is bigger than workload (more than a couple of times). I think a good bit of the contention you're seeing in that listing is populating shared_buffers - and might actually vanish once you're halfway

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-05 Thread Metin Doslu
From what I've seen so far, the bigger problem, rather than contention in the lwlocks themselves, is the spinlock protecting the lwlocks... Postgres 9.3.1 also reports spindelay; it seems that there is no contention on spinlocks. PID 21121 lwlock 0: shacq 0 exacq 33 blk 1 spindelay 0 PID 21121 lwlock 33:
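
To compare counters such as blk and spindelay across many backends, the LWLOCK_STATS lines can be totalled per lock. The following is a minimal sketch, assuming every record has exactly the field layout of the sample line quoted in this message (PID, lwlock id, shacq, exacq, blk, spindelay); other PostgreSQL versions may print different fields. The helper names summarize and most_blocked are hypothetical and not part of PostgreSQL.

```python
import re
from collections import defaultdict

# Field layout copied from the sample line quoted in the message; this is an
# assumption, since other PostgreSQL builds may print different fields.
LINE_RE = re.compile(
    r"PID (?P<pid>\d+) lwlock (?P<lock>\d+): "
    r"shacq (?P<shacq>\d+) exacq (?P<exacq>\d+) "
    r"blk (?P<blk>\d+) spindelay (?P<spindelay>\d+)"
)


def summarize(text):
    """Sum the counters per lwlock id across all backends (hypothetical helper).

    Records are located with finditer, so it does not matter whether they
    arrive one per line or run together; incomplete records are skipped.
    """
    totals = defaultdict(lambda: {"shacq": 0, "exacq": 0, "blk": 0, "spindelay": 0})
    for match in LINE_RE.finditer(text):
        entry = totals[int(match.group("lock"))]
        for field in entry:
            entry[field] += int(match.group(field))
    return dict(totals)


def most_blocked(totals, n=5):
    """Return the n (lock id, counters) pairs with the highest blk totals."""
    return sorted(totals.items(), key=lambda kv: kv[1]["blk"], reverse=True)[:n]
```

Feeding the server's stderr log to summarize and then calling most_blocked shows which lock ids block most often; a high blk with spindelay at 0 would match the observation above that the spinlocks themselves are not contended.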

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-05 Thread Claudio Freire
On Thu, Dec 5, 2013 at 1:03 PM, Metin Doslu me...@citusdata.com wrote: From what I've seen so far, the bigger problem, rather than contention in the lwlocks themselves, is the spinlock protecting the lwlocks... Postgres 9.3.1 also reports spindelay; it seems that there is no contention on spinlocks. Did

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-04 Thread Metin Doslu
Maybe you could help test this patch: http://www.postgresql.org/message-id/20131115194725.gg5...@awork2.anarazel.de Which repository should I apply these patches to? I tried the main repository, 9.3 stable, and the source code of 9.3.1, and in my trials at least one of the patches failed. What patch

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-04 Thread Metin Doslu
Here is some extra information: - When we increased NUM_BUFFER_PARTITIONS to 1024, this problem disappeared on 8-core machines and came back on 16-core machines on Amazon EC2. Could it be related to PostgreSQL's locking mechanism? - I tried this test with 4-core machines including my

[PERFORM] Parallel Select query performance and shared buffers

2013-12-04 Thread Metin Doslu
We have several independent tables on a multi-core machine serving Select queries. These tables fit into memory, and each Select query goes over one table's pages sequentially. In this experiment, there are no indexes or table joins. When we send concurrent Select queries to these tables, query

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-04 Thread Amit Kapila
On Wed, Dec 4, 2013 at 11:49 PM, Metin Doslu me...@citusdata.com wrote: Here is some extra information: - When we increased NUM_BUFFER_PARTITIONS to 1024, this problem disappeared on 8-core machines and came back on 16-core machines on Amazon EC2. Could it be related to PostgreSQL

[PERFORM] Parallel Select query performance and shared buffers

2013-12-03 Thread Metin Doslu
We have several independent tables on a multi-core machine serving Select queries. These tables fit into memory, and each Select query goes over one table's pages sequentially. In this experiment, there are no indexes or table joins. When we send concurrent Select queries to these tables, query

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-03 Thread Alvaro Herrera
Metin Doslu wrote: When we send concurrent Select queries to these tables, query performance doesn't scale out with the number of CPU cores. We find that complex Select queries scale out better than simpler ones. We also find that increasing the block size from 8 KB to 32 KB, or increasing

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-03 Thread Claudio Freire
On Tue, Dec 3, 2013 at 10:49 AM, Metin Doslu me...@citusdata.com wrote: We have several independent tables on a multi-core machine serving Select queries. These tables fit into memory, and each Select query goes over one table's pages sequentially. In this experiment, there are no indexes or

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-03 Thread Metin Doslu
Looking into syncscan.c, it says in comments: When multiple backends run a sequential scan on the same table, we try to keep them synchronized to reduce the overall I/O needed. But in my workload, every process was running on a different table. On Tue, Dec 3, 2013 at 5:56 PM, Claudio Freire

Re: [PERFORM] Parallel Select query performance and shared buffers

2013-12-03 Thread Claudio Freire
On Tue, Dec 3, 2013 at 1:24 PM, Metin Doslu me...@citusdata.com wrote: Looking into syncscan.c, it says in comments: When multiple backends run a sequential scan on the same table, we try to keep them synchronized to reduce the overall I/O needed. But in my workload, every process was