Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-06-28 Thread Jeff Janes
On Tue, Jun 26, 2012 at 3:58 PM, Nils Goroll sl...@schokola.de wrote:
 It's still unproven whether it'd be an improvement, but you could expect
 to prove it one way or the other with a well-defined amount of testing.

 I've hacked the code to use adaptive pthread mutexes instead of spinlocks.
 See attached patch. The patch is for the git head, but it can easily be
 applied to 9.1.3, which is what I did for my tests.

 This had disastrous effects on Solaris because it does not use anything
 similar to futexes for PTHREAD_PROCESS_SHARED mutexes (only the _PRIVATE
 mutexes do without syscalls for the simple case).

 But I was surprised to see that it works relatively well on linux. Here's a
 glimpse of my results:

 hacked code 9.1.3:
...
 tps = 485.964355 (excluding connections establishing)

 original code (vanilla build on amd64) 9.1.3:
...
 tps = 510.410883 (excluding connections establishing)


It looks like the hacked code is slower than the original.  That
doesn't seem so good to me.  Am I misreading this?

Also, 20 transactions per connection is not enough of a run to base any
evaluation on.

How many cores are you testing on?

 Regarding the actual production issue, I did not manage to synthetically
 provoke the saturation we are seeing in production using pgbench - I could
 not even get anywhere near the production load.

What metrics/tools are you using to compare the two loads?  What is
the production load like?

Each transaction has to update one of ten pgbench_branch rows, so you
can't have more than ten transactions productively active at any given
time, even though you have 768 connections.  So you need to jack up
the pgbench scale, or switch to using -N mode.

Also, you should use -M prepared, otherwise you spend more time
parsing and planning the statements than executing them.
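
For illustration, a run along these lines would avoid both the branch-row
bottleneck and the parse/plan overhead (the scale factor and duration are
just placeholders, and port 55502 is taken from your command line):

    ./pgbench -i -s 100 -p 55502 postgres
    ./pgbench -c 768 -j 128 -T 1800 -N -M prepared -p 55502 postgres

-s 100 creates 100 pgbench_branches rows, -N skips the pgbench_tellers and
pgbench_branches updates entirely, and -T 1800 gives a fixed half-hour run
instead of a fixed (and tiny) transaction count.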

Cheers,

Jeff



Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-06-28 Thread Robert Haas
On Thu, Jun 28, 2012 at 11:21 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 Also, 20 transactions per connection is not enough of a run to base any
 evaluation on.

FWIW, I kicked off a looong benchmarking run on this a couple of days
ago on the IBM POWER7 box, testing pgbench -S, regular pgbench, and
pgbench --unlogged-tables at various client counts with and without
the patch; three half-hour test runs for each test configuration.  It
should be done tonight and I will post the results once they're in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-06-27 Thread Martijn van Oosterhout
On Wed, Jun 27, 2012 at 12:58:47AM +0200, Nils Goroll wrote:
 So it looks like using pthread_mutexes could at least be an option on Linux.
 
 Using futexes directly could be even cheaper.

Note that below this level you only have the futex(2) system call. Futexes
require all counter manipulation to happen in userspace, just like now,
so all the per-architecture stuff remains.  On Linux pthread mutexes
are really just a thin wrapper on top of this.

The futex(2) system call merely provides an interface for handling the
blocking and waking of other processes and releasing locks on process
exit (so everything can still work after a kill -9).

So it's more a replacement for the SysV semaphores than anything else.
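
To make that division of labor concrete, here is a minimal sketch of the
classic futex-based lock (along the lines of Ulrich Drepper's "Futexes Are
Tricky" paper); the three-state encoding and all names are illustrative,
not taken from the attached patch:

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* 0 = unlocked, 1 = locked/no waiters, 2 = locked/waiters may exist */

    static long
    futex(int *uaddr, int op, int val)
    {
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
    }

    static void
    lock(int *f)
    {
        /* userspace fast path: uncontended acquire needs no syscall */
        int c = __sync_val_compare_and_swap(f, 0, 1);

        while (c != 0)
        {
            /* mark the lock contended, then sleep in the kernel */
            if (c == 2 || __sync_val_compare_and_swap(f, 1, 2) != 0)
                futex(f, FUTEX_WAIT, 2);
            c = __sync_val_compare_and_swap(f, 0, 2);
        }
    }

    static void
    unlock(int *f)
    {
        /* userspace fast path: no waiters, no syscall */
        if (__sync_fetch_and_sub(f, 1) != 1)
        {
            *f = 0;
            futex(f, FUTEX_WAKE, 1);    /* wake exactly one waiter */
        }
    }

All counter manipulation happens in userspace via atomics; the kernel is
entered only on contention, for FUTEX_WAIT and FUTEX_WAKE.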

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-06-27 Thread Nils Goroll
 Using futexes directly could be even cheaper.
 Note that below this you only have the futex(2) system call.
I was only referring to the fact that we could save one function and one library
call, which could make a difference for the uncontended case.



[HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-06-26 Thread Nils Goroll
 It's still unproven whether it'd be an improvement, but you could expect
 to prove it one way or the other with a well-defined amount of testing.

I've hacked the code to use adaptive pthread mutexes instead of spinlocks.
See attached patch. The patch is for the git head, but it can easily be
applied to 9.1.3, which is what I did for my tests.
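
For reference, the initialization boils down to something like the
following sketch (the general glibc recipe, not an excerpt from the patch;
PTHREAD_MUTEX_ADAPTIVE_NP is a glibc extension and needs _GNU_SOURCE):

    #define _GNU_SOURCE
    #include <pthread.h>

    /* Set up a mutex living in shared memory so all backend processes
     * can use it; "adaptive" makes glibc spin briefly in userspace
     * before falling back to a futex wait. */
    static void
    init_shared_mutex(pthread_mutex_t *m)
    {
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
        pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
    }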

This had disastrous effects on Solaris because it does not use anything similar
to futexes for PTHREAD_PROCESS_SHARED mutexes (only the _PRIVATE mutexes do
without syscalls for the simple case).

But I was surprised to see that it works relatively well on linux. Here's a
glimpse of my results:

hacked code 9.1.3:

-bash-4.1$ rsync -av --delete /tmp/test_template_data/ ../data/ ; /usr/bin/time
./postgres -D ../data -p 55502 & ppid=$! ; pid=$(pgrep -P $ppid) ; sleep 15 ;
./pgbench -c 768 -t 20 -j 128 -p 55502 postgres ; kill $pid
sending incremental file list
...
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 768
number of threads: 128
number of transactions per client: 20
number of transactions actually processed: 15360/15360
tps = 476.873261 (including connections establishing)
tps = 485.964355 (excluding connections establishing)
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
-bash-4.1$ LOG:  shutting down
LOG:  database system is shut down
210.58user 78.88system 0:50.64elapsed 571%CPU (0avgtext+0avgdata
1995968maxresident)k
0inputs+1153872outputs (0major+2464649minor)pagefaults 0swaps

original code (vanilla build on amd64) 9.1.3:

-bash-4.1$ rsync -av --delete /tmp/test_template_data/ ../data/ ; /usr/bin/time
./postgres -D ../data -p 55502 & ppid=$! ; pid=$(pgrep -P $ppid) ; sleep 15 ;
./pgbench -c 768 -t 20 -j 128 -p 55502 postgres ; kill $pid
sending incremental file list
...
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 768
number of threads: 128
number of transactions per client: 20
number of transactions actually processed: 15360/15360
tps = 499.993685 (including connections establishing)
tps = 510.410883 (excluding connections establishing)
LOG:  received smart shutdown request
-bash-4.1$ LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
196.21user 71.38system 0:47.99elapsed 557%CPU (0avgtext+0avgdata
1360800maxresident)k
0inputs+1147904outputs (0major+2375965minor)pagefaults 0swaps


config:

-bash-4.1$ egrep '^[a-z]' /tmp/test_template_data/postgresql.conf
max_connections = 1800  # (change requires restart)
shared_buffers = 10GB   # min 128kB
temp_buffers = 64MB # min 800kB
work_mem = 256MB                # min 64kB, default 1MB
maintenance_work_mem = 2GB  # min 1MB, default 16MB
bgwriter_delay = 10ms           # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000# 0-1000 max buffers written/round
bgwriter_lru_multiplier = 10.0  # 0-10.0 multiplier on buffers scanned/round
wal_level = hot_standby # minimal, archive, or hot_standby
wal_buffers = 64MB              # min 32kB, -1 sets based on shared_buffers
commit_delay = 1                # range 0-100000, in microseconds
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'     # locale for system error message
lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'  # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 1.5  # same scale as above (default: 4.0)
cpu_tuple_cost = 0.005
cpu_index_tuple_cost = 0.0025
cpu_operator_cost = 0.0001
effective_cache_size = 192GB



So it looks like using pthread_mutexes could at least be an option on Linux.

Using futexes directly could be even cheaper.


As a side note, it looks like I have not expressed myself clearly:

I did not intend to suggest replacing proven, working code (which is probably
the best you can get for some platforms) with POSIX calls. I apologize for the
provocative question.


Regarding the actual production issue, I did not manage to synthetically provoke
the saturation we are seeing in production using pgbench - I could not even get
anywhere near the production load. So I cannot currently test if reducing the
amount of spinning and waking up exactly one waiter (which is what linux/nptl
pthread_mutex_unlock does) would solve/mitigate the production issue I am
working on, and I'd highly appreciate any pointers in this direction.

Cheers, Nils
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index bc8d89f..a45fdf6 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -20,6 +20,8 @@