Here are some stats for the loop and native memset after enabling optimization, using the same test tool (tested for long and long-aligned using MemSetAligned). The corresponding glibc is linked on PPCle, and the AIX libc is linked on AIX.
https://postgrespro.com/list/thread-id/1673194

                                  AIX loop-1    PPCle loop-1  AIX loop-2    PPCle loop-2
Loop by long (size=8)           : 0             0             0.000001      0
Loop Align by long (size=8)     : 0             0             0             0
memset by long (size=8)         : 0.00999       0.010229      0.00994       0.010211
Loop by long (size=16)          : 0             0             0             0
Loop Align by long (size=16)    : 0             0             0             0
memset by long (size=16)        : 0.010082      0.010036      0.010094      0.01003
Loop by long (size=32)          : 0.32903       0.227726      0.329027      0.227707
Loop Align by long (size=32)    : 0.329486      0.227705      0.328932      0.227712
memset by long (size=32)        : 0.021061      0.01064       0.021115      0.01064
Loop by long (size=64)          : 0.334761      0.227714      0.34326       0.227688
Loop Align by long (size=64)    : 0.329005      0.236937      0.329084      0.236906
memset by long (size=64)        : 0.059559      0.025612      0.053004      0.029589
Loop by long (size=128)         : 0.420381      0.329634      0.420332      0.329524
Loop Align by long (size=128)   : 0.420376      0.337169      0.42022       0.337162
memset by long (size=128)       : 0.420153      0.098774      0.420312      0.101888
Loop by long (size=256)         : 0.472187      0.428049      0.472774      0.429217
Loop Align by long (size=256)   : 0.472586      0.438316      0.472447      0.438325
memset by long (size=256)       : 0.473731      0.428013      0.473864      0.42759
Loop by long (size=512)         : 0.676089      0.435649      0.632774      0.43574
Loop Align by long (size=512)   : 0.66702       0.428013      0.630751      0.427319
memset by long (size=512)       : 0.666619      0.427989      0.691485      0.427263
Loop by long (size=1024)        : 1.00773       0.45079       0.925212      0.452131
Loop Align by long (size=1024)  : 0.92114       0.45084       0.920574      0.452994
memset by long (size=1024)      : 0.935062      0.450821      0.917         0.452396
Loop by long (size=2048)        : 1.52585       0.702127      1.265107      0.701822
Loop Align by long (size=2048)  : 1.57524       0.702158      1.439109      0.702651
memset by long (size=2048)      : 1.614771      0.702247      1.384672      0.701857
Loop by long (size=4096)        : 1.418133      1.37568       1.325803      1.376005
Loop Align by long (size=4096)  : 1.421619      1.375741      1.325743      1.376071
memset by long (size=4096)      : 1.423404      1.375716      1.325666      1.376091

With optimization enabled, both perform similarly.
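For readers who want to try a comparison of this kind themselves, here is a minimal sketch of the two zeroing strategies and a timing helper. It is not the test tool used for the numbers above. The names zero_by_long_loop, zero_by_memset and time_zeroing are made up for illustration, and the loop assumes a long-aligned buffer whose length is a multiple of sizeof(long), in the spirit of MemSetAligned.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Zero len bytes by storing one long at a time.
 * Assumption: buf is long-aligned and len is a multiple of sizeof(long).
 */
void zero_by_long_loop(void *buf, size_t len)
{
    long *p = (long *) buf;
    long *stop = (long *) ((char *) buf + len);

    while (p < stop)
        *p++ = 0;
}

/* Zero len bytes with the C library's memset(). */
void zero_by_memset(void *buf, size_t len)
{
    memset(buf, 0, len);
}

typedef void (*zero_fn) (void *, size_t);

/*
 * Return the CPU seconds spent calling fn iters times on a len-byte
 * buffer, or -1.0 if len does not fit the 4096-byte scratch buffer.
 */
double time_zeroing(zero_fn fn, size_t len, long iters)
{
    static long storage[4096 / sizeof(long)];
    volatile unsigned char *probe = (volatile unsigned char *) storage;
    clock_t start;
    long i;

    if (len > sizeof(storage))
        return -1.0;

    start = clock();
    for (i = 0; i < iters; i++)
    {
        fn(storage, len);
        /* volatile read, to discourage the optimizer from dropping the stores */
        (void) probe[0];
    }
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_zeroing(zero_by_long_loop, 512, iters) and time_zeroing(zero_by_memset, 512, iters) with the same iters gives a pair of numbers comparable to one row of the table. The absolute values depend on the compiler, the optimization level and the libc that is linked in.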
As both perform similarly, we have removed the MEMSET_LOOP in the AIX template and ran the pgbench benchmark below.

Run#1
>> pgbench -c 50 -p 5678 -d postgres -T 180 -r -P 10 -L 10 -j 20
pgbench (18devel)
starting vacuum...end.
progress: 10.0 s, 2603.2 tps, lat 18.692 ms stddev 61.947, 0 failed
progress: 20.0 s, 3373.2 tps, lat 14.841 ms stddev 17.724, 0 failed
progress: 30.0 s, 2599.6 tps, lat 19.222 ms stddev 99.307, 0 failed
progress: 40.0 s, 3531.3 tps, lat 14.159 ms stddev 14.786, 0 failed
progress: 50.0 s, 2561.3 tps, lat 15.180 ms stddev 33.532, 0 failed
progress: 60.0 s, 3315.4 tps, lat 18.421 ms stddev 111.988, 0 failed
progress: 70.0 s, 3517.4 tps, lat 14.203 ms stddev 14.931, 0 failed
progress: 80.0 s, 2023.4 tps, lat 21.858 ms stddev 125.718, 0 failed
progress: 90.0 s, 3472.1 tps, lat 16.049 ms stddev 55.152, 0 failed
progress: 100.0 s, 3580.5 tps, lat 13.966 ms stddev 14.636, 0 failed
progress: 110.0 s, 2823.4 tps, lat 14.572 ms stddev 20.433, 0 failed
progress: 120.0 s, 3140.3 tps, lat 18.717 ms stddev 120.447, 0 failed
progress: 130.0 s, 3488.4 tps, lat 14.329 ms stddev 15.057, 0 failed
progress: 140.0 s, 2503.7 tps, lat 19.966 ms stddev 125.551, 0 failed
progress: 150.0 s, 3083.3 tps, lat 16.212 ms stddev 56.652, 0 failed
progress: 160.0 s, 3572.0 tps, lat 13.991 ms stddev 14.660, 0 failed
progress: 170.0 s, 3642.2 tps, lat 13.722 ms stddev 14.507, 0 failed
progress: 180.0 s, 2453.6 tps, lat 20.364 ms stddev 133.126, 0 failed
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 50
number of threads: 20
maximum number of tries: 1
duration: 180 s
number of transactions actually processed: 552889
number of failed transactions: 0 (0.000%)
number of transactions above the 10.0 ms latency limit: 227816/552889 (41.205%)
latency average = 16.252 ms
latency stddev = 69.656 ms
initial connection time = 237.245 ms
tps = 3074.421213 (without initial connection time)
statement latencies in milliseconds and failures:
         0.002           0  \set aid random(1, 100000 * :scale)
         0.001           0  \set bid random(1, 1 * :scale)
         0.001           0  \set tid random(1, 10 * :scale)
         0.001           0  \set delta random(-5000, 5000)
         1.090           0  BEGIN;
         3.153           0  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         1.462           0  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         2.012           0  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         4.060           0  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         1.224           0  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_T
         3.246           0  END;

Run#2
>> pgbench -c 50 -p 5678 -d postgres -T 180 -r -P 10 -L 10 -j 20
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 50
number of threads: 20
maximum number of tries: 1
duration: 180 s
number of transactions actually processed: 577290
number of failed transactions: 0 (0.000%)
number of transactions above the 10.0 ms latency limit: 234815/577290 (40.675%)
latency average = 15.558 ms
latency stddev = 65.428 ms
initial connection time = 314.109 ms
tps = 3211.642930 (without initial connection time)
statement latencies in milliseconds and failures:
         0.002           0  \set aid random(1, 100000 * :scale)
         0.001           0  \set bid random(1, 1 * :scale)
         0.001           0  \set tid random(1, 10 * :scale)
         0.001           0  \set delta random(-5000, 5000)
         1.084           0  BEGIN;
         2.761           0  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         1.371           0  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         2.000           0  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         4.014           0  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         1.229           0  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_T
         3.093           0  END;

>> diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
> - Does GCC on AIX (still) use the IBM assembler?
> - Does the IBM assembler still not understand the label syntax?
> - Is there some other label syntax that would work on the IBM assembler?
> - Is it possible to use the GNU assembler instead?

GCC on AIX still uses only the AIX native assembler. The GNU assembler has some level of support on AIX through some patches, but the GCC/GNU assembler combination is still not well tested.

We removed the AIX-specific changes for TAS(), which now uses the __sync_lock_test_and_set() routines directly instead, and ran pgbench on it.

+ pgbench -c 50 -p 5678 -d postgres -T 180 -r -P 10 -L 10 -j 20
pgbench (18devel)
starting vacuum...end.
scaling factor: 50
query mode: simple
number of clients: 50
number of threads: 20
maximum number of tries: 1
duration: 180 s
number of transactions actually processed: 550838
number of failed transactions: 0 (0.000%)
number of transactions above the 10.0 ms latency limit: 227805/550838 (41.356%)
latency average = 16.323 ms
latency stddev = 68.404 ms
initial connection time = 235.449 ms
tps = 3061.041640 (without initial connection time)
statement latencies in milliseconds and failures:
         0.002           0  \set aid random(1, 100000 * :scale)
         0.001           0  \set bid random(1, 1 * :scale)
         0.001           0  \set tid random(1, 10 * :scale)
         0.001           0  \set delta random(-5000, 5000)
         1.098           0  BEGIN;
         2.993           0  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         1.501           0  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         2.004           0  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         4.127           0  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         1.238           0  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         3.356           0  END;

> 21 -- test overflow/underflow handling
> > 22 SELECT gamma(float8 '-infinity');
> > 23 ERROR: value out of range: overflow

WRT the failure in lgamma(), we worked with the libm team to resolve it.
It’s an issue with the errno that is being set. I’ll work on the testcase.

>> ./gamma-test NaN
Gamma and natural logarithm of gamma for the input values:
Gamma(NaNS) = NaNQ errno: 34
lgamma(NaNS) = NaNQ errno: 34

With fixed libm
+ ./gamma-test NaN
Gamma and natural logarithm of gamma for the input values:
Gamma(NaNS) = NaNQ errno: 0
lgamma(NaNS) = NaNQ errno: 0

Warm Regards,
Sriram