On Wed, Nov 16, 2011 at 9:47 AM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:
> This suggests that in the long term, it might be worth investigating
> whether we can arrange for a connection's process to have some
> degree of core affinity and encourage each process to allocate local
> memory from RAM controlled by that core. To some extent I would
> expect the process-based architecture of PostgreSQL to help with
> that, as you would expect a NUMA-aware OS to try to arrange that to
> some degree.

I've done some testing on HP-UX/Itanium and have not been able to
demonstrate any significant performance benefit from overriding the
operating system's default policies regarding processor affinity. For
example, I hacked the code to request that the shared memory be
allocated as cell-local memory, then used mpsched with the FILL_TREE
policy to bind everything to a single cell, and sure enough it all ran
in that cell, but it wasn't any better than 4 clients running on
different cells with the shared memory segment allocated interleaved.

This result didn't really make much sense to me, because it seemed
like it SHOULD have helped. So it's possible I did something wrong,
but if so, I couldn't find it. The other possibility is that the OS is
smart enough about moving things around to get good locality that
sticking locality hints on top doesn't really make any difference.
Certainly, I would expect any OS to be smart enough to allocate
backend-local memory on the same processor where the task is running,
and to avoid moving processes between cells more than necessary.

Regarding the instability of the results: on some patch sets I've
tried, I've seen very unstable performance. I've also noticed that a
very short run sometimes gives much higher performance than a longer
run.

My working theory is that this is the result of spinlock contention.
Suppose you have a spinlock that is getting passed around very quickly
between, say, 32 processes. Since the data protected by the spinlock
is on the same cache line as the lock, what ideally happens is that a
process gets the lock, finishes its work, and releases the lock before
anyone else tries to pull the cache line away. And at the beginning of
the run, that's what does actually happen. But then for some reason a
process gets context-switched out while it holds the lock, or maybe
somebody just gets unlucky enough to have the cache line stolen before
they can release the spinlock and can't quite get it back fast enough.
Now people start to pile up trying to get that spinlock, and that
means trouble, because now it's much harder for any given process to
get the cache line and finish its work before the cache line gets
stolen away. So my theory is that the performance then goes down more
or less "permanently", unless or until there's some momentary break in
the action that lets the queue of people waiting for that spinlock
drain out.

This is just a wild-ass guess, and I might be totally wrong...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
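
P.S. To make the spinlock theory above a bit more concrete, here is a
toy model in Python. It is only a sketch under invented assumptions,
not a measurement of anything: the function name simulate, the tick
counts, and above all the cost model (every process still spinning
adds `penalty` ticks to the holder's critical section, standing in for
the cache line being stolen) are made up for illustration. It runs the
same one-off stall with and without that penalty and counts lock
hand-offs before the stall and at the end of the run.

```python
from collections import deque


def simulate(penalty, n_procs=32, think=100, base_hold=1,
             stall_at=1000, stall_len=200, total=5000):
    """Toy closed-loop model of one contended lock.

    Returns the number of lock releases in the 1000 ticks before the
    stall and in the last 1000 ticks of the run.
    """
    ready_at = [3 * i for i in range(n_procs)]  # staggered first requests
    waiting = deque()                           # FIFO of spinning processes
    holder, release_at, stalled = None, 0, False
    done_at = []                                # tick of every lock release

    for t in range(total):
        # 1. The holder finishes its critical section and goes off to "think".
        if holder is not None and t >= release_at:
            ready_at[holder] = t + think
            done_at.append(t)
            holder = None
        # 2. Processes that finished thinking start spinning on the lock.
        for p in range(n_procs):
            if ready_at[p] == t:
                waiting.append(p)
        # 3. If the lock is free, the first waiter takes it.
        if holder is None and waiting:
            holder = waiting.popleft()
            # ASSUMED cost model: every remaining spinner adds `penalty`
            # ticks to this hold (a stand-in for cache line stealing).
            hold = base_hold + penalty * len(waiting)
            if not stalled and t >= stall_at:
                hold += stall_len               # one-off context switch
                stalled = True
            release_at = t + hold

    window = 1000
    before = sum(1 for t in done_at if stall_at - window <= t < stall_at)
    after = sum(1 for t in done_at if total - window <= t < total)
    return before, after


if __name__ == "__main__":
    for penalty in (0, 1):
        before, after = simulate(penalty)
        print(f"penalty={penalty}: {before} hand-offs before the stall, "
              f"{after} at the end of the run")
```

In this model the run without a penalty drains its queue after the
stall and returns to its earlier rate, while the run with a penalty
stays stuck with most processes spinning long after the stall is over.
That is the "permanent" slowdown I'm guessing at, but the model proves
nothing about what the real hardware does.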