Re: [HACKERS] why roll-your-own s_lock? / improving scalability
Martijn van Oosterhout writes:
> And then you have fabulous things like:
> https://git.reviewboard.kde.org/r/102145/
> (OSX defines _POSIX_THREAD_PROCESS_SHARED but does not actually support
> it.)
> Seems not very well tested in any case.

> It might be worthwhile testing futexes on Linux though, they are
> specifically supported on any kind of shared memory (shm/mmap/fork/etc)
> and quite well tested.

Yeah, a Linux-specific replacement of spinlocks with futexes seems like
a lot safer idea than "let's rely on posix mutexes everywhere".  It's
still unproven whether it'd be an improvement, but you could expect to
prove it one way or the other with a well-defined amount of testing.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
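For readers who have not used futexes directly, here is a minimal sketch of
the kind of Linux-only lock being discussed. It is not PostgreSQL code and
nobody in this thread posted it: the names futex_region, futex_call,
futex_lock, futex_unlock and futex_counter_demo are all made up for
illustration, and the three-state scheme (0 free, 1 held, 2 held with
possible waiters) is the textbook one rather than anything tuned. It only
shows that a futex word in MAP_SHARED memory can serialize forked processes,
which is the property Martijn points out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared region: one futex word protecting one counter. */
typedef struct {
    atomic_int word;    /* 0 = free, 1 = held, 2 = held, waiters possible */
    long counter;
} futex_region;

/* Thin wrapper; glibc does not export a futex() function. */
static long futex_call(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void futex_lock(atomic_int *word)
{
    int seen = 0;
    if (atomic_compare_exchange_strong(word, &seen, 1))
        return;                         /* uncontended: no system call */
    if (seen != 2)
        seen = atomic_exchange(word, 2);
    while (seen != 0) {
        /* Sleep in the kernel while the word still reads 2. */
        futex_call(word, FUTEX_WAIT, 2);
        seen = atomic_exchange(word, 2);
    }
}

static void futex_unlock(atomic_int *word)
{
    if (atomic_fetch_sub(word, 1) != 1) {
        /* Someone may be sleeping: release fully and wake one waiter. */
        atomic_store(word, 0);
        futex_call(word, FUTEX_WAKE, 1);
    }
}

/* Fork nprocs children that each bump the counter iters times.
 * Returns the final counter, or -1 on setup failure. */
long futex_counter_demo(int nprocs, int iters)
{
    assert(nprocs >= 0 && iters >= 0);
    futex_region *r = mmap(NULL, sizeof(*r), PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED)
        return -1;
    atomic_init(&r->word, 0);
    r->counter = 0;

    int forked = 0;
    for (; forked < nprocs; forked++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            for (int i = 0; i < iters; i++) {
                futex_lock(&r->word);
                r->counter++;
                futex_unlock(&r->word);
            }
            _exit(0);
        }
    }
    for (int i = 0; i < forked; i++)
        wait(NULL);

    long total = (forked == nprocs) ? r->counter : -1;
    munmap(r, sizeof(*r));
    return total;
}
```

Calling futex_counter_demo(4, 1000) should return 4000 if the lock works.
Note that this sketch never spins before sleeping; whether a spin-then-sleep
hybrid would beat s_lock is exactly the open question that needs measuring.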
Re: [HACKERS] why roll-your-own s_lock? / improving scalability
On Tue, Jun 26, 2012 at 01:46:06PM -0500, Merlin Moncure wrote:
> Well, that would introduce a backend dependency on pthreads, which is
> unpleasant.  Also you'd need to feature test via
> _POSIX_THREAD_PROCESS_SHARED to make sure you can mutex between
> processes (and configure your mutexes as such when you do).  There are
> probably other reasons why this can't be done, but I personally don't
> know of any.

And then you have fabulous things like:

https://git.reviewboard.kde.org/r/102145/
(OSX defines _POSIX_THREAD_PROCESS_SHARED but does not actually support
it.)

Seems not very well tested in any case.

It might be worthwhile testing futexes on Linux though; they are
specifically supported on any kind of shared memory (shm/mmap/fork/etc)
and quite well tested.

Have a nice day,
--
Martijn van Oosterhout http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer
Re: [HACKERS] why roll-your-own s_lock? / improving scalability
> But if you start with "let's not support any platforms that don't have this
> feature"

This will never be my intention.

Nils
Re: [HACKERS] why roll-your-own s_lock? / improving scalability
Hi Merlin,

> _POSIX_THREAD_PROCESS_SHARED

sure.

> Also, it's forbidden to do things like invoke i/o in the backend while
> holding only a spinlock. As to your larger point, it's an interesting
> assertion -- some data to back it up would help.

Let's see if I can get any. ATM I've only got indications, but no proof.

Nils
Re: [HACKERS] why roll-your-own s_lock? / improving scalability
Nils Goroll writes:
> Now that the scene is set, here's the simple question: Why all this? Why not
> simply use posix mutexes which, on modern platforms, will map to efficient
> implementations like adaptive mutexes or futexes?

(1) They do not exist everywhere.

(2) There is absolutely no evidence to suggest that they'd make things
better.

If someone cared to rectify (2), we could consider how to use them as an
alternative implementation.  But if you start with "let's not support
any platforms that don't have this feature", you're going to get a cold
reception.

regards, tom lane
Re: [HACKERS] why roll-your-own s_lock? / improving scalability
On Tue, Jun 26, 2012 at 12:02 PM, Nils Goroll wrote:
> Hi,
>
> I am currently trying to understand what looks like really bad scalability of
> 9.1.3 on a 64core 512GB RAM system: the system runs OK when at 30% usr, but
> only marginal amounts of additional load seem to push it to 70% and the
> application becomes highly unresponsive.
>
> My current understanding basically matches the issues being addressed by
> various 9.2 improvements, well summarized in
> http://wiki.postgresql.org/images/e/e8/FOSDEM2012-Multi-CPU-performance-in-9.2.pdf
>
> An additional aspect is that, in order to address the latent risk of data
> loss & corruption with WBCs and async replication, we have deliberately moved
> the db from a similar system with WB cached storage to ssd based storage
> without a WBC, which, by design, has (in the best WBC case) approx. 100x
> higher latencies, but much higher sustained throughput.
>
> On the new system, even with 30% user "acceptable" load, oprofile makes
> apparent significant lock contention:
>
> opreport --symbols --merge tgid -l /mnt/db1/hdd/pgsql-9.1/bin/postgres
>
> Profiling through timer interrupt
> samples  %        image name    symbol name
> 30240    27.9720  postgres      s_lock
> 5069      4.6888  postgres      GetSnapshotData
> 3743      3.4623  postgres      AllocSetAlloc
> 3167      2.9295  libc-2.12.so  strcoll_l
> 2662      2.4624  postgres      SearchCatCache
> 2495      2.3079  postgres      hash_search_with_hash_value
> 2143      1.9823  postgres      nocachegetattr
> 1860      1.7205  postgres      LWLockAcquire
> 1642      1.5189  postgres      base_yyparse
> 1604      1.4837  libc-2.12.so  __strcmp_sse42
> 1543      1.4273  libc-2.12.so  __strlen_sse42
> 1156      1.0693  libc-2.12.so  memcpy
>
> Unfortunately I don't have profiling data for the high-load / contention
> condition yet, but I fear the picture will be worse and pointing in the same
> direction.
>
> In particular, the _impression_ is that lock contention could also be related
> to I/O latencies making me fear that cases could exist where spin locks are
> being held while blocking on IO.
>
> Looking at the code, it appears to me that the roll-your-own s_lock code
> cannot handle a couple of cases, for instance it will also spin when the lock
> holder is not running at all or blocking on IO (which could even be implicit,
> e.g. for a page flush). These issues have long been addressed by adaptive
> mutexes and futexes.
>
> Also, the s_lock code tries to be somewhat adaptive using spins_per_delay
> (when having spun for long (but not blocked), spin even longer in future),
> which appears to me to have the potential of becoming highly
> counter-productive.
>
> Now that the scene is set, here's the simple question: Why all this? Why not
> simply use posix mutexes which, on modern platforms, will map to efficient
> implementations like adaptive mutexes or futexes?

Well, that would introduce a backend dependency on pthreads, which is
unpleasant.  Also you'd need to feature test via
_POSIX_THREAD_PROCESS_SHARED to make sure you can mutex between
processes (and configure your mutexes as such when you do).  There are
probably other reasons why this can't be done, but I personally don't
know of any.

Also, it's forbidden to do things like invoke i/o in the backend while
holding only a spinlock.  As to your larger point, it's an interesting
assertion -- some data to back it up would help.

merlin
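To make the "feature test and configure your mutexes as such" step concrete,
here is a rough sketch of what it could look like. It is not a proposed
patch: shared_region, shared_region_create and mutex_counter_demo are
hypothetical names, and the region is an anonymous MAP_SHARED mapping rather
than PostgreSQL's real shared memory segment. The compile-time test on
_POSIX_THREAD_PROCESS_SHARED is backed by checking the return code of
pthread_mutexattr_setpshared() at run time, since (as noted elsewhere in this
thread) a platform can define the macro without supporting the feature.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared region: one mutex protecting one counter. */
typedef struct {
    pthread_mutex_t lock;
    long counter;
} shared_region;

/* Map and initialise the region; returns NULL if process-shared
 * mutexes are unavailable or any step fails. */
static shared_region *shared_region_create(void)
{
#if !defined(_POSIX_THREAD_PROCESS_SHARED) || _POSIX_THREAD_PROCESS_SHARED <= 0
    return NULL;                        /* compile-time feature test failed */
#else
    shared_region *r = mmap(NULL, sizeof(*r), PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc == 0) {
        /* Run-time check: the macro alone is not proof of support. */
        rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0)
            rc = pthread_mutex_init(&r->lock, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    if (rc != 0) {
        munmap(r, sizeof(*r));
        return NULL;
    }
    r->counter = 0;
    return r;
#endif
}

/* Fork nprocs children that each bump the counter iters times under
 * the mutex. Returns the final counter, or -1 on setup failure. */
long mutex_counter_demo(int nprocs, int iters)
{
    assert(nprocs >= 0 && iters >= 0);
    shared_region *r = shared_region_create();
    if (r == NULL)
        return -1;

    int forked = 0;
    for (; forked < nprocs; forked++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            for (int i = 0; i < iters; i++) {
                pthread_mutex_lock(&r->lock);
                r->counter++;
                pthread_mutex_unlock(&r->lock);
            }
            _exit(0);
        }
    }
    for (int i = 0; i < forked; i++)
        wait(NULL);

    long total = (forked == nprocs) ? r->counter : -1;
    pthread_mutex_destroy(&r->lock);
    munmap(r, sizeof(*r));
    return total;
}
```

On a platform with working process-shared mutexes, mutex_counter_demo(4, 1000)
should return 4000; a return of -1 means the feature test or setup failed.
Older toolchains may need -pthread at link time.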
[HACKERS] why roll-your-own s_lock? / improving scalability
Hi,

I am currently trying to understand what looks like really bad scalability of
9.1.3 on a 64core 512GB RAM system: the system runs OK when at 30% usr, but
only marginal amounts of additional load seem to push it to 70% and the
application becomes highly unresponsive.

My current understanding basically matches the issues being addressed by
various 9.2 improvements, well summarized in
http://wiki.postgresql.org/images/e/e8/FOSDEM2012-Multi-CPU-performance-in-9.2.pdf

An additional aspect is that, in order to address the latent risk of data
loss & corruption with WBCs and async replication, we have deliberately moved
the db from a similar system with WB cached storage to ssd based storage
without a WBC, which, by design, has (in the best WBC case) approx. 100x
higher latencies, but much higher sustained throughput.

On the new system, even with 30% user "acceptable" load, oprofile makes
apparent significant lock contention:

opreport --symbols --merge tgid -l /mnt/db1/hdd/pgsql-9.1/bin/postgres

Profiling through timer interrupt
samples  %        image name    symbol name
30240    27.9720  postgres      s_lock
5069      4.6888  postgres      GetSnapshotData
3743      3.4623  postgres      AllocSetAlloc
3167      2.9295  libc-2.12.so  strcoll_l
2662      2.4624  postgres      SearchCatCache
2495      2.3079  postgres      hash_search_with_hash_value
2143      1.9823  postgres      nocachegetattr
1860      1.7205  postgres      LWLockAcquire
1642      1.5189  postgres      base_yyparse
1604      1.4837  libc-2.12.so  __strcmp_sse42
1543      1.4273  libc-2.12.so  __strlen_sse42
1156      1.0693  libc-2.12.so  memcpy

Unfortunately I don't have profiling data for the high-load / contention
condition yet, but I fear the picture will be worse and pointing in the same
direction.

In particular, the _impression_ is that lock contention could also be related
to I/O latencies making me fear that cases could exist where spin locks are
being held while blocking on IO.

Looking at the code, it appears to me that the roll-your-own s_lock code
cannot handle a couple of cases, for instance it will also spin when the lock
holder is not running at all or blocking on IO (which could even be implicit,
e.g. for a page flush). These issues have long been addressed by adaptive
mutexes and futexes.

Also, the s_lock code tries to be somewhat adaptive using spins_per_delay
(when having spun for long (but not blocked), spin even longer in future),
which appears to me to have the potential of becoming highly
counter-productive.

Now that the scene is set, here's the simple question: Why all this? Why not
simply use posix mutexes which, on modern platforms, will map to efficient
implementations like adaptive mutexes or futexes?

Thanks, Nils
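To illustrate the spins_per_delay behaviour described above, here is a
heavily simplified sketch of a spin-then-sleep lock with that kind of
feedback. It is not the s_lock.c source: the bounds (10 and 1000), the
starting value (100), the step sizes (+100 / -1) and the fixed 1 ms sleep are
assumptions chosen for illustration, and the names adapt_spins_per_delay,
sketch_lock_acquire and sketch_lock_release are hypothetical. The point is
only the shape of the rule: an acquisition that succeeded by spinning alone
raises the spin budget, while one that had to sleep lowers it slightly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Assumed bounds and starting value, for illustration only. */
enum { MIN_SPINS_PER_DELAY = 10, MAX_SPINS_PER_DELAY = 1000 };
int spins_per_delay = 100;

/* Feedback rule: grow quickly if we never slept, shrink slowly if we did. */
int adapt_spins_per_delay(int current, int delays)
{
    if (delays == 0) {
        int next = current + 100;
        return next > MAX_SPINS_PER_DELAY ? MAX_SPINS_PER_DELAY : next;
    }
    int next = current - 1;
    return next < MIN_SPINS_PER_DELAY ? MIN_SPINS_PER_DELAY : next;
}

/* Spin up to spins_per_delay times, then sleep and start over.
 * Returns how many times the caller had to sleep. */
int sketch_lock_acquire(atomic_flag *lock)
{
    int spins = 0;
    int delays = 0;

    while (atomic_flag_test_and_set(lock)) {
        if (++spins >= spins_per_delay) {
            /* Fixed 1 ms sleep; a real implementation would vary this. */
            struct timespec ts = { 0, 1000000 };
            nanosleep(&ts, NULL);
            delays++;
            spins = 0;
        }
    }
    spins_per_delay = adapt_spins_per_delay(spins_per_delay, delays);
    return delays;
}

void sketch_lock_release(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}
```

Under these assumed numbers, one sleep-free acquisition cancels out the
effect of a hundred acquisitions that did have to sleep, so the budget tends
to sit near its upper bound unless sleeps are very frequent. Note also that
the spinning waiter learns nothing about whether the lock holder is actually
running, which is the case the adaptive mutexes mentioned above handle.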