Re: [HACKERS] spinlock-pthread_mutex : real world results

2012-08-06 Thread Robert Haas
On Sun, Aug 5, 2012 at 7:19 PM, Nils Goroll sl...@schokola.de wrote:
> meanwhile we're using the patch in production (again, this is 9.1.3) and
> after running it under full load for one week I believe it is pretty safe to
> say that replacing the spinlock code with pthread_mutexes on Linux (which
> basically are a futex wrapper) has solved the scalability issue and all
> stability/performance problems on this system are simply gone.
>
> While the improved pgbench run had already given a clear indication
> regarding the optimization potential, we can now be pretty certain that
> spinlock contention had really been the most significant root cause for the
> issues I had described in my earlier postings (why roll-your-own s_lock? /
> improving scalability / experimental: replace s_lock spinlock code with
> pthread_mutex on linux).
>
> I am attaching annotated graphs showing the load averages and cpu statistics
> of the respective machine. Please note that the highest spikes have
> been averaged out in these graphs. As I had mentioned before, with the
> original code in place we had seen saturation of 64 cores and load averages
> in excess of 300.
>
> I fully agree that improvements in more recent pgsql code to reduce the
> number of required locks or, even better, lockless data structures are the
> way to go, but for the remaining cases it should now have become apparent
> that favoring efficient mutex implementations is advantageous for large
> SMPs, where they exist (e.g. futexes on Linux).

Interesting data.  I guess the questions in my mind are:

1. How much are we paying for this in the uncontended case?

2. Should we be modifying our spinlock implementation on Linux to use
futexes rather than pulling pthreads into the mix?

Anyone have data on the first point, or opinions on the second one?

I certainly think there is some potential here in terms of preventing
the worst-case situation where the entire machine ends up spending a
major portion of its CPU time in s_lock.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlock-pthread_mutex : real world results

2012-08-06 Thread Martijn van Oosterhout
On Mon, Aug 06, 2012 at 08:54:11AM -0400, Robert Haas wrote:
> 2. Should we be modifying our spinlock implementation on Linux to use
> futexes rather than pulling pthreads into the mix?
>
> Anyone have data on the first point, or opinions on the second one?

I'm not sure whether pthreads is such a thick layer. Or are you
referring to the fact that you don't want to link against the library
at all?

If we've found a situation where our locks work better than the ones in
pthreads, then either (a) we're doing something wrong or (b) the
pthreads implementation could do with improvement.

In either case it might be worth some investigation. If we can improve
the standard pthreads implementation everybody wins.

BTW, I read that some *BSDs have futex implementations (to emulate
Linux); it might be an idea to see where they're going.

e.g. http://osdir.com/ml/os.dragonfly-bsd.kernel/2003-10/msg00232.html

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] spinlock-pthread_mutex : real world results

2012-08-06 Thread Nils Goroll

Robert,


> 1. How much are we paying for this in the uncontended case?


Using glibc, we have the overhead of an additional library function call, which 
we could eliminate by pulling in the code from glibc/nptl or another source of 
proven reference code.


The pgbench results I had posted before 
(http://archives.postgresql.org/pgsql-hackers/2012-07/msg00061.php) could give an 
indication of the higher base cost of the simple approach.



I have mentioned this before: while I agree that minimizing the base overhead is 
good, IMHO optimizing the worst case is the important part here.


Nils
