>> That would be great.  What I've been using as a test case is pgbench
>> -S -c $NUM_CPU_CORES -j $NUM_CPU_CORES with scale factor 100 and
>> shared_buffers=8GB.
>>
>> I think what you'd want to compare is the performance of unpatched
>> master, vs. the performance with this line added to s_lock.h for your
>> architecture:
>>
>> #define TAS_SPIN(lock)  (*(lock) ? 1 : TAS(lock))
>>
>> We've now added that line for ia64 (the line is present in two
>> different places in the file, one for GCC and the other for HP's
>> compiler).  So the question is whether we need it for any other
>> architectures.
>
> Ok. Let me talk to IBM guys...
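
For readers unfamiliar with the idiom, the TAS_SPIN line quoted above is a "test and test-and-set": while spinning, the lock word is first read with a plain load, and the atomic operation is attempted only when the lock looks free. Below is a minimal, portable sketch of that idea. It is not the s_lock.h code: the slock_t typedef, tas(), spin_acquire(), spin_release() and spin_self_check() are stand-ins written for illustration, and C11 atomics are assumed in place of the per-architecture assembly that PostgreSQL uses.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the lock type; not PostgreSQL's definition. */
typedef atomic_int slock_t;

/* Atomic test-and-set: returns 0 if we got the lock, nonzero if it was held. */
static int tas(volatile slock_t *lock)
{
    return atomic_exchange(lock, 1);
}

#define TAS(lock)       tas(lock)

/* The line under discussion: read the lock first, and only issue the
 * (expensive, cache-line-invalidating) atomic operation if it looks free. */
#define TAS_SPIN(lock)  (*(lock) ? 1 : TAS(lock))

/* Illustrative acquire loop: one direct TAS, then spin using TAS_SPIN. */
static void spin_acquire(volatile slock_t *lock)
{
    if (TAS(lock))
    {
        while (TAS_SPIN(lock))
            ;                   /* busy-wait; a real loop would back off */
    }
}

static void spin_release(volatile slock_t *lock)
{
    atomic_store(lock, 0);
}

/* Single-threaded sanity check of the return-value convention. */
static void spin_self_check(void)
{
    slock_t lock;

    atomic_init(&lock, 0);
    assert(TAS_SPIN(&lock) == 0);   /* free: acquired */
    assert(TAS_SPIN(&lock) == 1);   /* held: reported busy without a TAS */
    spin_release(&lock);
    spin_acquire(&lock);            /* free again, returns immediately */
    assert(atomic_load(&lock) == 1);
}
```

The intent is that waiters spin on a shared, read-only copy of the cache line rather than repeatedly requesting it exclusively; whether that helps on a given architecture is exactly what the benchmarks in this thread are meant to show.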

With help from IBM Japan Ltd., we ran some tests on a larger IBM
machine than the one Tom Lane used for his test
(http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us).
His machine was an IBM 8406-71Y, which has 8 physical cores with 4-way
SMT (32 hardware threads). Ours is an IBM Power 750 Express, which has
32 physical cores with 4-way SMT (128 hardware threads) and 256GB RAM.

The test method was the same as the one in the message above. The
differences are the OS (RHEL 6.1), the gcc version (4.4.5) and the
shared buffer size (8GB).

We tested 3 methods to reduce spinlock contention:

1) Add the "hint" parameter to the lwarx op, which is usable on POWER6
   or later architectures.

2) Add a non-locked test in TAS()

3) #1 + #2

We saw a small performance improvement with #1, a larger one with #2,
and an even better one with #1 + #2.

Stock git head:

pgbench -c 1   -j 1  -S -T 300 bench   tps = 10356.306513 (including ...
pgbench -c 2   -j 1  -S -T 300 bench   tps = 21841.10285 (including ...
pgbench -c 8   -j 4  -S -T 300 bench   tps = 63800.868529 (including ...
pgbench -c 16  -j 8  -S -T 300 bench   tps = 144872.64726 (including ...
pgbench -c 32  -j 16 -S -T 300 bench   tps = 120943.238461 (including ...
pgbench -c 64  -j 32 -S -T 300 bench   tps = 108144.933981 (including ...
pgbench -c 128 -j 64 -S -T 300 bench   tps = 92202.782791 (including ...

With hint (method #1):

pgbench -c 1   -j 1  -S -T 300 bench   tps = 11198.1872 (including ...
pgbench -c 2   -j 1  -S -T 300 bench   tps = 21390.592014 (including ...
pgbench -c 8   -j 4  -S -T 300 bench   tps = 74423.488089 (including ...
pgbench -c 16  -j 8  -S -T 300 bench   tps = 153766.351105 (including ...
pgbench -c 32  -j 16 -S -T 300 bench   tps = 134313.758113 (including ...
pgbench -c 64  -j 32 -S -T 300 bench   tps = 129392.154047 (including ...
pgbench -c 128 -j 64 -S -T 300 bench   tps = 105506.948058 (including ...

Non-locked test in TAS() (method #2):

pgbench -c 1   -j 1  -S -T 300 bench   tps = 10537.893154 (including ...
pgbench -c 2   -j 1  -S -T 300 bench   tps = 22019.388666 (including ...
pgbench -c 8   -j 4  -S -T 300 bench   tps = 78763.930379 (including ...
pgbench -c 16  -j 8  -S -T 300 bench   tps = 142791.99724 (including ...
pgbench -c 32  -j 16 -S -T 300 bench   tps = 222008.903675 (including ...
pgbench -c 64  -j 32 -S -T 300 bench   tps = 209912.691058 (including ...
pgbench -c 128 -j 64 -S -T 300 bench   tps = 199437.23965 (including ...

With hint and non-locked test in TAS() (#1 + #2):

pgbench -c 1   -j 1  -S -T 300 bench   tps = 11419.881375 (including ...
pgbench -c 2   -j 1  -S -T 300 bench   tps = 21919.530209 (including ...
pgbench -c 8   -j 4  -S -T 300 bench   tps = 74788.242876 (including ...
pgbench -c 16  -j 8  -S -T 300 bench   tps = 156354.988564 (including ...
pgbench -c 32  -j 16 -S -T 300 bench   tps = 240521.495 (including ...
pgbench -c 64  -j 32 -S -T 300 bench   tps = 235709.272642 (including ...
pgbench -c 128 -j 64 -S -T 300 bench   tps = 220135.729663 (including ...

Since usage of each core was only around 50% during the benchmark,
there is room for further performance improvement by eliminating other
sources of contention, tuning compiler options, etc.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp