Hi,

In the nearby thread at
http://archives.postgresql.org/message-id/20140202140014.GM5930%40awork2.anarazel.de
Peter and I discovered that there is a large performance difference
between different max_connections settings on a larger machine (4x Opteron 6272,
64 cores in total) in read-only pgbench tests...

Just for reference, we're talking about a performance degradation from
475963.613865 tps to 197744.913556 tps in a pgbench -S -cj64 run, just by
setting max_connections to 90 instead of 91...

On 2014-02-02 15:00:14 +0100, Andres Freund wrote:
> On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote:
> > Here are the results of a benchmark on Nathan Boley's 64-core, 4
> > socket server: 
> > http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
>
> That's interesting. The maximum you see here (~293125)
> is markedly lower than what I can get.
>
> ... poke around ...
>
> Hm, that's partially because you're using pgbench without -M prepared, if
> I see that correctly. The bottleneck in that case is primarily memory
> allocation. But even after accounting for that, I am getting higher
> numbers: ~342497.
>
> Trying to nail down the difference, it oddly seems to come down to your
> max_connections=80 vs. my 100. The profiles in the two cases are markedly
> different: much more spinlock contention with 80, all in
> Pin/UnpinBuffer().
>
> I think =80 must lead to some data being badly aligned. I can
> reproduce that =91 has *much* better performance than =90: 170841.844938
> vs 368490.268577 in a 10s test. Reproducible both with and without the test.
> That's certainly worth some investigation.
> This is *not* reproducible on the Intel machine, so it might be the
> associativity of the L1/L2 cache on the AMD.

So, I looked into this, and I am fairly certain it's caused by the
(mis-)alignment of the buffer descriptors. With some max_connections
settings, InitBufferPool() happens to get 64-byte-aligned addresses; with
others it doesn't. I checked the alignment with gdb to confirm that.

A quick hack (attached) that makes BufferDescriptors 64-byte aligned indeed
restored performance across all max_connections settings. It's not
surprising that misaligned buffer descriptors cause problems: a 64-byte
descriptor that doesn't start on a cacheline boundary spans two
cachelines, each shared with a neighbouring descriptor, so there'll be
plenty of false sharing of the spinlocks. Curious that the Intel machine
isn't hurt much by this.

Now, all this hinges on the fact that, by mere accident,
BufferDescriptors are 64 bytes in size:
struct sbufdesc {
        BufferTag                  tag;                  /*     0    20 */
        BufFlags                   flags;                /*    20     2 */
        uint16                     usage_count;          /*    22     2 */
        unsigned int               refcount;             /*    24     4 */
        int                        wait_backend_pid;     /*    28     4 */
        slock_t                    buf_hdr_lock;         /*    32     1 */

        /* XXX 3 bytes hole, try to pack */

        int                        buf_id;               /*    36     4 */
        int                        freeNext;             /*    40     4 */

        /* XXX 4 bytes hole, try to pack */

        LWLock *                   io_in_progress_lock;  /*    48     8 */
        LWLock *                   content_lock;         /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */

        /* size: 64, cachelines: 1, members: 10 */
        /* sum members: 57, holes: 2, sum holes: 7 */
};

We could polish up the attached patch and apply it to all the branches;
the memory cost is minimal. But I wonder if we shouldn't instead
make ShmemInitStruct() always return cacheline-aligned addresses. That
will require some fiddling, but it might be a good idea nonetheless?

I think we should also consider some more reliable measure to keep
BufferDescriptors cacheline-sized, rather than relying on a happy
accident. Debugging alignment issues isn't fun; it's too much of a guessing
game...

Thoughts?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index e187242..96b4eea 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -78,10 +78,14 @@ InitBufferPool(void)
 	BufferDescriptors = (BufferDesc *)
 		ShmemInitStruct("Buffer Descriptors",
-						NBuffers * sizeof(BufferDesc), &foundDescs);
+						NBuffers * sizeof(BufferDesc) + 64, &foundDescs);
+	BufferDescriptors = (BufferDesc *)(
+		TYPEALIGN(64, BufferDescriptors));
 
 	BufferBlocks = (char *)
 		ShmemInitStruct("Buffer Blocks",
-						NBuffers * (Size) BLCKSZ, &foundBufs);
+						NBuffers * (Size) BLCKSZ + 64, &foundBufs);
+	BufferBlocks = (char *) (
+		TYPEALIGN(64, BufferBlocks));
 
 	if (foundDescs || foundBufs)
 	{
@@ -167,9 +171,11 @@ BufferShmemSize(void)
 
 	/* size of buffer descriptors */
 	size = add_size(size, mul_size(NBuffers, sizeof(BufferDesc)));
+	size = add_size(size, 64);
 
 	/* size of data pages */
 	size = add_size(size, mul_size(NBuffers, BLCKSZ));
+	size = add_size(size, 64);
 
 	/* size of stuff controlled by freelist.c */
 	size = add_size(size, StrategyShmemSize());
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)