On 2014-02-04 16:24:02 -0800, Peter Geoghegan wrote:
> On Mon, Feb 3, 2014 at 3:38 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> >> > A quick hack (attached) making BufferDescriptor 64byte aligned indeed
> >> > restored performance across all max_connections settings. It's not
> >> > surprising that a misaligned buffer descriptor causes problems -
> >> > there'll be plenty of false sharing of the spinlocks otherwise. Curious
> >> > that the the intel machine isn't hurt much by this.
> >
> >> What fiddling are you thinking of?
> >
> > Basically always doing a TYPEALIGN(CACHELINE_SIZE, addr) before
> > returning from ShmemAlloc() (and thereby ShmemInitStruct).
>
> There is something you have not drawn explicit attention to that is
> very interesting. If we take REL9_3_STABLE tip to be representative
> (built with full -O2 optimization, no assertions just debugging
> symbols), setting max_connections to 91 from 90 does not have the
> effect of making the BufferDescriptors array aligned; it has the
> effect of making it *misaligned*. You reported that 91 was much better
> than 90. I think that the problem actually occurs when the array *is*
> aligned!

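For anyone following along, here is a rough sketch of what rounding an
address up to a cacheline boundary buys us. It is not the attached patch.
align_up() and cachelines_touched() are hypothetical helpers written for
this mail (align_up() is modeled on the rounding that TYPEALIGN does), and
the 64 byte CACHELINE_SIZE is an assumption that holds on common x86
hardware but not everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cacheline size; 64 bytes is typical on x86 but not universal. */
#define CACHELINE_SIZE 64

/*
 * Hypothetical helper: round addr up to the next multiple of align.
 * align must be a nonzero power of two.
 */
static uintptr_t
align_up(uintptr_t align, uintptr_t addr)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(align - 1);
}

/*
 * Hypothetical helper: number of cachelines covered by an object of
 * len bytes (len > 0) that starts at addr.
 */
static size_t
cachelines_touched(uintptr_t addr, size_t len)
{
    uintptr_t first = addr / CACHELINE_SIZE;
    uintptr_t last = (addr + len - 1) / CACHELINE_SIZE;

    return (size_t) (last - first + 1);
}
```

A 64 byte descriptor starting at offset 32 straddles two cachelines, so its
spinlock shares a line with a neighbouring descriptor. Once the start
address is passed through align_up(CACHELINE_SIZE, ...), each descriptor
sits in exactly one line.
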
I don't think you can learn much from the alignment in 9.3 vs. HEAD. A lot
has changed since, most prominently and recently Robert's LWLock work. That
certainly has changed allocation patterns. It will also depend on some
other parameters, e.g. changing max_wal_senders or max_background_workers
will also change the offset. It's not that 91 is intrinsically better, it
just happened to give an aligned BufferDescriptors array when the other
parameters weren't changed at the same time.

> I suspect that the scenario described in this article accounts for the
> quite noticeable effect reported: http://danluu.com/3c-conflict

I don't think that's applicable here. What's described there is relevant
for access patterns whose stride is a larger multiple of the cacheline
size - but ours is exactly cacheline sized. What can happen in such
scenarios is that all your accesses map to the same set of cachelines, so
instead of using most of the cache, you end up using only 8 or so lines
(8 ways is a common associativity for set associative caches these days).

Theoretically we could see something like that for shared_buffers itself,
but I *think* our accesses are too far spread around in them for that to be
a significant issue.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
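
PS: a toy model of the set associativity argument above, in case it helps.
The geometry (32kB, 8 ways, 64 byte lines, so 64 sets) is an assumption
picked to resemble a typical L1 data cache. set_index() and
distinct_sets() are hypothetical helpers, and real caches may index on
physical addresses or hash the index bits, so treat this strictly as a
sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache geometry, roughly a typical L1 data cache. */
#define LINE_SIZE   64
#define NUM_WAYS    8
#define CACHE_SIZE  (32 * 1024)
#define NUM_SETS    (CACHE_SIZE / (LINE_SIZE * NUM_WAYS))   /* 64 sets */

/* Set an address maps to under simple modulo indexing. */
static unsigned
set_index(uintptr_t addr)
{
    return (unsigned) ((addr / LINE_SIZE) % NUM_SETS);
}

/* Number of distinct sets hit by n accesses at base, base + stride, ... */
static unsigned
distinct_sets(uintptr_t base, size_t stride, size_t n)
{
    unsigned char seen[NUM_SETS] = {0};
    unsigned    count = 0;

    for (size_t i = 0; i < n; i++)
    {
        unsigned    s = set_index(base + i * stride);

        if (!seen[s])
        {
            seen[s] = 1;
            count++;
        }
    }
    return count;
}
```

With a stride of exactly one cacheline, 64 consecutive accesses spread over
all 64 sets. With a stride of 4096 bytes they all land in a single set, so
only NUM_WAYS (8) lines can be cached at once. The first case is the
BufferDescriptor access pattern, the second is what the article describes.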