Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Tom Lane
Martijn van Oosterhout  writes:
> And then you have fabulous things like:
> https://git.reviewboard.kde.org/r/102145/
> (OSX defines _POSIX_THREAD_PROCESS_SHARED but does not actually support
> it.)

> Seems not very well tested in any case.

> It might be worthwhile testing futexes on Linux though, they are
> specifically supported on any kind of shared memory (shm/mmap/fork/etc)
> and quite well tested.

Yeah, a Linux-specific replacement of spinlocks with futexes seems like
a lot safer idea than "let's rely on posix mutexes everywhere".  It's
still unproven whether it'd be an improvement, but you could expect to
prove it one way or the other with a well-defined amount of testing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Martijn van Oosterhout
On Tue, Jun 26, 2012 at 01:46:06PM -0500, Merlin Moncure wrote:
> Well, that would introduce a backend dependency on pthreads, which is
> unpleasant.  Also you'd need to feature test via
> _POSIX_THREAD_PROCESS_SHARED to make sure you can mutex between
> processes (and configure your mutexes as such when you do).  There are
> probably other reasons why this can't be done, but I personally don' t
> klnow of any.

And then you have fabulous things like:

https://git.reviewboard.kde.org/r/102145/
(OSX defines _POSIX_THREAD_PROCESS_SHARED but does not actually support
it.)

Seems not very well tested in any case.

It might be worthwhile testing futexes on Linux though, they are
specifically supported on any kind of shared memory (shm/mmap/fork/etc)
and quite well tested.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Nils Goroll

> But if you start with "let's not support any platforms that don't have this 
> feature"

This will never be my intention.

Nils

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Nils Goroll
Hi Merlin,

> _POSIX_THREAD_PROCESS_SHARED

sure.

> Also, it's forbidden to do things like invoke i/o in the backend while
> holding only a spinlock. As to your larger point, it's an interesting
> assertion -- some data to back it up would help.

Let's see if I can get any. ATM I've only got indications, but no proof.

Nils

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Tom Lane
Nils Goroll  writes:
> Now that the scene is set, here's the simple question: Why all this? Why not
> simply use posix mutexes which, on modern platforms, will map to efficient
> implementations like adaptive mutexes or futexes?

(1) They do not exist everywhere.
(2) There is absolutely no evidence to suggest that they'd make things better.

If someone cared to rectify (2), we could consider how to use them as an
alternative implementation.  But if you start with "let's not support
any platforms that don't have this feature", you're going to get a cold
reception.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Merlin Moncure
On Tue, Jun 26, 2012 at 12:02 PM, Nils Goroll  wrote:
> Hi,
>
> I am currently trying to understand what looks like really bad scalability of
> 9.1.3 on a 64core 512GB RAM system: the system runs OK when at 30% usr, but 
> only
> marginal amounts of additional load seem to push it to 70% and the application
> becomes highly unresponsive.
>
> My current understanding basically matches the issues being addressed by 
> various
> 9.2 improvements, well summarized in
> http://wiki.postgresql.org/images/e/e8/FOSDEM2012-Multi-CPU-performance-in-9.2.pdf
>
> An additional aspect is that, in order to address the latent risk of data 
> loss &
> corruption with WBCs and async replication, we have deliberately moved the db
> from a similar system with WB cached storage to ssd based storage without a 
> WBC,
> which, by design, has (in the best WBC case) approx. 100x higher latencies, 
> but
> much higher sustained throughput.
>
>
> On the new system, even with 30% user "acceptable" load, oprofile makes 
> apparent
> significant lock contention:
>
> opreport --symbols --merge tgid -l /mnt/db1/hdd/pgsql-9.1/bin/postgres
>
>
> Profiling through timer interrupt
> samples  %        image name               symbol name
> 30240    27.9720  postgres                 s_lock
> 5069      4.6888  postgres                 GetSnapshotData
> 3743      3.4623  postgres                 AllocSetAlloc
> 3167      2.9295  libc-2.12.so             strcoll_l
> 2662      2.4624  postgres                 SearchCatCache
> 2495      2.3079  postgres                 hash_search_with_hash_value
> 2143      1.9823  postgres                 nocachegetattr
> 1860      1.7205  postgres                 LWLockAcquire
> 1642      1.5189  postgres                 base_yyparse
> 1604      1.4837  libc-2.12.so             __strcmp_sse42
> 1543      1.4273  libc-2.12.so             __strlen_sse42
> 1156      1.0693  libc-2.12.so             memcpy
>
> Unfortunately I don't have profiling data for the high-load / contention
> condition yet, but I fear the picture will be worse and pointing in the same
> direction.
>
> 
> In particular, the _impression_ is that lock contention could also be related 
> to
> I/O latencies making me fear that cases could exist where spin locks are being
> helt while blocking on IO.
> 
>
>
> Looking at the code, it appears to me that the roll-your-own s_lock code 
> cannot
> handle a couple of cases, for instance it will also spin when the lock holder 
> is
> not running at all or blocking on IO (which could even be implicit, e.g. for a
> page flush). These issues have long been addressed by adaptive mutexes and 
> futexes.
>
> Also, the s_lock code tries to be somehow adaptive using spins_per_delay (when
> having spun for long (not not blocked), spin even longer in future), which
> appears to me to have the potential of becoming highly counter-productive.
>
>
> Now that the scene is set, here's the simple question: Why all this? Why not
> simply use posix mutexes which, on modern platforms, will map to efficient
> implementations like adaptive mutexes or futexes?

Well, that would introduce a backend dependency on pthreads, which is
unpleasant.  Also you'd need to feature test via
_POSIX_THREAD_PROCESS_SHARED to make sure you can mutex between
processes (and configure your mutexes as such when you do).  There are
probably other reasons why this can't be done, but I personally don' t
klnow of any.

Also, it's forbidden to do things like invoke i/o in the backend while
holding only a spinlock. As to your larger point, it's an interesting
assertion -- some data to back it up would help.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] why roll-your-own s_lock? / improving scalability

2012-06-26 Thread Nils Goroll
Hi,

I am currently trying to understand what looks like really bad scalability of
9.1.3 on a 64core 512GB RAM system: the system runs OK when at 30% usr, but only
marginal amounts of additional load seem to push it to 70% and the application
becomes highly unresponsive.

My current understanding basically matches the issues being addressed by various
9.2 improvements, well summarized in
http://wiki.postgresql.org/images/e/e8/FOSDEM2012-Multi-CPU-performance-in-9.2.pdf

An additional aspect is that, in order to address the latent risk of data loss &
corruption with WBCs and async replication, we have deliberately moved the db
from a similar system with WB cached storage to ssd based storage without a WBC,
which, by design, has (in the best WBC case) approx. 100x higher latencies, but
much higher sustained throughput.


On the new system, even with 30% user "acceptable" load, oprofile makes apparent
significant lock contention:

opreport --symbols --merge tgid -l /mnt/db1/hdd/pgsql-9.1/bin/postgres


Profiling through timer interrupt
samples  %image name   symbol name
3024027.9720  postgres s_lock
5069  4.6888  postgres GetSnapshotData
3743  3.4623  postgres AllocSetAlloc
3167  2.9295  libc-2.12.so strcoll_l
2662  2.4624  postgres SearchCatCache
2495  2.3079  postgres hash_search_with_hash_value
2143  1.9823  postgres nocachegetattr
1860  1.7205  postgres LWLockAcquire
1642  1.5189  postgres base_yyparse
1604  1.4837  libc-2.12.so __strcmp_sse42
1543  1.4273  libc-2.12.so __strlen_sse42
1156  1.0693  libc-2.12.so memcpy

Unfortunately I don't have profiling data for the high-load / contention
condition yet, but I fear the picture will be worse and pointing in the same
direction.


In particular, the _impression_ is that lock contention could also be related to
I/O latencies making me fear that cases could exist where spin locks are being
helt while blocking on IO.



Looking at the code, it appears to me that the roll-your-own s_lock code cannot
handle a couple of cases, for instance it will also spin when the lock holder is
not running at all or blocking on IO (which could even be implicit, e.g. for a
page flush). These issues have long been addressed by adaptive mutexes and 
futexes.

Also, the s_lock code tries to be somehow adaptive using spins_per_delay (when
having spun for long (not not blocked), spin even longer in future), which
appears to me to have the potential of becoming highly counter-productive.


Now that the scene is set, here's the simple question: Why all this? Why not
simply use posix mutexes which, on modern platforms, will map to efficient
implementations like adaptive mutexes or futexes?

Thanks, Nils

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers