Hi,
In the nearby thread at
http://archives.postgresql.org/message-id/20140202140014.GM5930%40awork2.anarazel.de
Peter and I discovered that there is a large performance difference
between different max_connections on a larger machine (4x Opteron 6272,
64 cores together) in a readonly pgbench tests...
Just as reference, we're talking about a performance degradation from
475963.613865 tps to 197744.913556 in a pgbench -S -cj64 just by setting
max_connections to 90, from 91...
On 2014-02-02 15:00:14 +0100, Andres Freund wrote:
> On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote:
> > Here are the results of a benchmark on Nathan Boley's 64-core, 4
> > socket server:
> > http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
>
> That's interesting. The maximum number of what you see here (~293125)
> is markedly lower than what I can get.
>
> ... poke around ...
>
> Hm, that's partially because you're using pgbench without -M prepared if
> I see that correctly. The bottleneck in that case is primarily memory
> allocation. But even after that I am getting higher
> numbers: ~342497.
>
> Trying to nail down the differnce it oddly seems to be your
> max_connections=80 vs my 100. The profile in both cases is markedly
> different, way much more spinlock contention with 80. All in
> Pin/UnpinBuffer().
>
> I think =80 has to lead to some data being badly aligned. I can
> reproduce that =91 has *much* better performance than =90. 170841.844938
> vs 368490.268577 in a 10s test. Reproducable both with an without the test.
> That's certainly worth some investigation.
> This is *not* reproducable on the intel machine, so it might the
> associativity of the L1/L2 cache on the AMD.
So, I looked into this, and I am fairly certain it's because of the
(mis-)alignment of the buffer descriptors. With certain max_connections
settings InitBufferPool() happens to get 64byte aligned addresses, with
others not. I checked the alignment with gdb to confirm that.
A quick hack (attached) making BufferDescriptor 64byte aligned indeed
restored performance across all max_connections settings. It's not
surprising that a misaligned buffer descriptor causes problems -
there'll be plenty of false sharing of the spinlocks otherwise. Curious
that the the intel machine isn't hurt much by this.
Now all this hinges on the fact that by a mere accident
BufferDescriptors are 64byte in size:
struct sbufdesc {
BufferTag tag; /* 0 20 */
BufFlags flags; /* 20 2 */
uint16 usage_count; /* 22 2 */
unsigned int refcount; /* 24 4 */
int wait_backend_pid; /* 28 4 */
slock_t buf_hdr_lock; /* 32 1 */
/* XXX 3 bytes hole, try to pack */
int buf_id; /* 36 4 */
int freeNext; /* 40 4 */
/* XXX 4 bytes hole, try to pack */
LWLock * io_in_progress_lock; /* 48 8 */
LWLock * content_lock; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
/* size: 64, cachelines: 1, members: 10 */
/* sum members: 57, holes: 2, sum holes: 7 */
};
We could polish up the attached patch and apply it to all the branches,
the costs of memory are minimal. But I wonder if we shouldn't instead
make ShmemInitStruct() always return cacheline aligned addresses. That
will require some fiddling, but it might be a good idea nonetheless?
I think we should also consider some more reliable measures to have
BufferDescriptors cacheline sized, rather than relying on the happy
accident. Debugging alignment issues isn't fun, too much of a guessing
game...
Thoughts?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index e187242..96b4eea 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -78,10 +78,14 @@ InitBufferPool(void)
BufferDescriptors = (BufferDesc *)
ShmemInitStruct("Buffer Descriptors",
NBuffers * sizeof(BufferDesc), &foundDescs);
+ BufferDescriptors = (BufferDesc *)(
+ TYPEALIGN(64, BufferDescriptors));
BufferBlocks = (char *)
ShmemInitStruct("Buffer Blocks",
NBuffers * (Size) BLCKSZ, &foundBufs);
+ BufferBlocks = (char *) (
+ TYPEALIGN(64, BufferBlocks));
if (foundDescs || foundBufs)
{
@@ -167,9 +171,11 @@ BufferShmemSize(void)
/* size of buffer descriptors */
size = add_size(size, mul_size(NBuffers, sizeof(BufferDesc)));
+ size = add_size(size, 64);
/* size of data pages */
size = add_size(size, mul_size(NBuffers, BLCKSZ));
+ size = add_size(size, 64);
/* size of stuff controlled by freelist.c */
size = add_size(size, StrategyShmemSize());
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers