Here are some stats for the loop and the native memset after enabling 
optimization, using the same test tool (tested for long and long-aligned using 
MemSetAligned). The corresponding glibc is linked on PPCle, and the AIX libc is 
linked on AIX.

https://postgrespro.com/list/thread-id/1673194


                                        AIX loop-1  PPCle loop-1  AIX loop-2  PPCle loop-2
    Loop by long (size=8)           :   0           0             0.000001    0
    Loop Align by long (size=8)     :   0           0             0           0
    memset by long (size=8)         :   0.00999     0.010229      0.00994     0.010211
    Loop by long (size=16)          :   0           0             0           0
    Loop Align by long (size=16)    :   0           0             0           0
    memset by long (size=16)        :   0.010082    0.010036      0.010094    0.01003
    Loop by long (size=32)          :   0.32903     0.227726      0.329027    0.227707
    Loop Align by long (size=32)    :   0.329486    0.227705      0.328932    0.227712
    memset by long (size=32)        :   0.021061    0.01064       0.021115    0.01064
    Loop by long (size=64)          :   0.334761    0.227714      0.34326     0.227688
    Loop Align by long (size=64)    :   0.329005    0.236937      0.329084    0.236906
    memset by long (size=64)        :   0.059559    0.025612      0.053004    0.029589
    Loop by long (size=128)         :   0.420381    0.329634      0.420332    0.329524
    Loop Align by long (size=128)   :   0.420376    0.337169      0.42022     0.337162
    memset by long (size=128)       :   0.420153    0.098774      0.420312    0.101888
    Loop by long (size=256)         :   0.472187    0.428049      0.472774    0.429217
    Loop Align by long (size=256)   :   0.472586    0.438316      0.472447    0.438325
    memset by long (size=256)       :   0.473731    0.428013      0.473864    0.42759
    Loop by long (size=512)         :   0.676089    0.435649      0.632774    0.43574
    Loop Align by long (size=512)   :   0.66702     0.428013      0.630751    0.427319
    memset by long (size=512)       :   0.666619    0.427989      0.691485    0.427263
    Loop by long (size=1024)        :   1.00773     0.45079       0.925212    0.452131
    Loop Align by long (size=1024)  :   0.92114     0.45084       0.920574    0.452994
    memset by long (size=1024)      :   0.935062    0.450821      0.917       0.452396
    Loop by long (size=2048)        :   1.52585     0.702127      1.265107    0.701822
    Loop Align by long (size=2048)  :   1.57524     0.702158      1.439109    0.702651
    memset by long (size=2048)      :   1.614771    0.702247      1.384672    0.701857
    Loop by long (size=4096)        :   1.418133    1.37568       1.325803    1.376005
    Loop Align by long (size=4096)  :   1.421619    1.375741      1.325743    1.376071
    memset by long (size=4096)      :   1.423404    1.375716      1.325666    1.376091
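
For anyone who wants to see the two code paths being compared, here is a 
minimal sketch in C. It is not the actual test tool and not the MemSetAligned 
macro itself; the function names are hypothetical. It only illustrates zeroing 
a long-aligned buffer one long at a time versus handing the job to the libc 
memset().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: zero a long-aligned buffer one long at a time.
 * Assumes len is a multiple of sizeof(long), as the aligned loop path does.
 */
static void
zero_by_long_loop(long *start, size_t len)
{
    long *stop = (long *) ((char *) start + len);

    while (start < stop)
        *start++ = 0;
}

/* Hypothetical helper: the alternative path, a plain libc memset() call. */
static void
zero_by_memset(void *start, size_t len)
{
    memset(start, 0, len);
}
```

Both helpers leave the buffer in the same state; only the time taken differs, 
and an optimizing compiler is free to turn the loop into something else 
entirely, so timings should always be taken with the same flags as the real 
build.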

With optimization enabled, the loop and the native memset perform similarly.
Since they perform similarly, we removed MEMSET_LOOP from the AIX template and 
then ran pgbench; the results are below.

    Run#1
    >> pgbench -c 50 -p 5678 -d postgres -T 180 -r -P 10  -L 10 -j 20
    pgbench (18devel)
    starting vacuum...end.
    progress: 10.0 s, 2603.2 tps, lat 18.692 ms stddev 61.947, 0 failed
    progress: 20.0 s, 3373.2 tps, lat 14.841 ms stddev 17.724, 0 failed
    progress: 30.0 s, 2599.6 tps, lat 19.222 ms stddev 99.307, 0 failed
    progress: 40.0 s, 3531.3 tps, lat 14.159 ms stddev 14.786, 0 failed
    progress: 50.0 s, 2561.3 tps, lat 15.180 ms stddev 33.532, 0 failed
    progress: 60.0 s, 3315.4 tps, lat 18.421 ms stddev 111.988, 0 failed
    progress: 70.0 s, 3517.4 tps, lat 14.203 ms stddev 14.931, 0 failed
    progress: 80.0 s, 2023.4 tps, lat 21.858 ms stddev 125.718, 0 failed
    progress: 90.0 s, 3472.1 tps, lat 16.049 ms stddev 55.152, 0 failed
    progress: 100.0 s, 3580.5 tps, lat 13.966 ms stddev 14.636, 0 failed
    progress: 110.0 s, 2823.4 tps, lat 14.572 ms stddev 20.433, 0 failed
    progress: 120.0 s, 3140.3 tps, lat 18.717 ms stddev 120.447, 0 failed
    progress: 130.0 s, 3488.4 tps, lat 14.329 ms stddev 15.057, 0 failed
    progress: 140.0 s, 2503.7 tps, lat 19.966 ms stddev 125.551, 0 failed
    progress: 150.0 s, 3083.3 tps, lat 16.212 ms stddev 56.652, 0 failed
    progress: 160.0 s, 3572.0 tps, lat 13.991 ms stddev 14.660, 0 failed
    progress: 170.0 s, 3642.2 tps, lat 13.722 ms stddev 14.507, 0 failed
    progress: 180.0 s, 2453.6 tps, lat 20.364 ms stddev 133.126, 0 failed
    transaction type: <builtin: TPC-B (sort of)>
    scaling factor: 50
    query mode: simple
    number of clients: 50
    number of threads: 20
    maximum number of tries: 1
    duration: 180 s
    number of transactions actually processed: 552889
    number of failed transactions: 0 (0.000%)
    number of transactions above the 10.0 ms latency limit: 227816/552889 (41.205%)
    latency average = 16.252 ms
    latency stddev = 69.656 ms
    initial connection time = 237.245 ms
    tps = 3074.421213 (without initial connection time)
    statement latencies in milliseconds and failures:
             0.002           0 \set aid random(1, 100000 * :scale)
             0.001           0 \set bid random(1, 1 * :scale)
             0.001           0 \set tid random(1, 10 * :scale)
             0.001           0 \set delta random(-5000, 5000)
             1.090           0 BEGIN;
             3.153           0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
             1.462           0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
             2.012           0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
             4.060           0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
             1.224           0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_T
             3.246           0 END;


    Run#2
    >> pgbench -c 50 -p 5678 -d postgres -T 180 -r -P 10  -L 10 -j 20
    pgbench (18devel)
    starting vacuum...end.

    transaction type: <builtin: TPC-B (sort of)>
    scaling factor: 50
    query mode: simple
    number of clients: 50
    number of threads: 20
    maximum number of tries: 1
    duration: 180 s
    number of transactions actually processed: 577290
    number of failed transactions: 0 (0.000%)
    number of transactions above the 10.0 ms latency limit: 234815/577290 (40.675%)
    latency average = 15.558 ms
    latency stddev = 65.428 ms
    initial connection time = 314.109 ms
    tps = 3211.642930 (without initial connection time)
    statement latencies in milliseconds and failures:
             0.002           0 \set aid random(1, 100000 * :scale)
             0.001           0 \set bid random(1, 1 * :scale)
             0.001           0 \set tid random(1, 10 * :scale)
             0.001           0 \set delta random(-5000, 5000)
             1.084           0 BEGIN;
             2.761           0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
             1.371           0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
             2.000           0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
             4.014           0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
             1.229           0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_T
             3.093           0 END;



>> diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h

> - Does GCC on AIX (still) use the IBM assembler?
> - Does the IBM assembler still not understand the label syntax?
> - Is there some other label syntax that would work on the IBM assembler?
> - Is it possible to use the GNU assembler instead?

GCC on AIX still uses only the AIX native assembler. The GNU assembler has some 
level of support on AIX through a few patches, but the GCC/GNU assembler 
combination is not well tested.

We removed the AIX-specific changes for TAS(), so it now uses the 
__sync_lock_test_and_set() routines directly, and we ran pgbench on that build.

    + pgbench -c 50 -p 5678 -d postgres -T 180 -r -P 10 -L 10 -j 20
    pgbench (18devel)
    starting vacuum...end.

    scaling factor: 50
    query mode: simple
    number of clients: 50
    number of threads: 20
    maximum number of tries: 1
    duration: 180 s
    number of transactions actually processed: 550838
    number of failed transactions: 0 (0.000%)
    number of transactions above the 10.0 ms latency limit: 227805/550838 (41.356%)
    latency average = 16.323 ms
    latency stddev = 68.404 ms
    initial connection time = 235.449 ms
    tps = 3061.041640 (without initial connection time)
    statement latencies in milliseconds and failures:
             0.002           0 \set aid random(1, 100000 * :scale)
             0.001           0 \set bid random(1, 1 * :scale)
             0.001           0 \set tid random(1, 10 * :scale)
             0.001           0 \set delta random(-5000, 5000)
             1.098           0 BEGIN;
             2.993           0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
             1.501           0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
             2.004           0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
             4.127           0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
             1.238           0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
             3.356           0 END;
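
As a rough illustration of the TAS() path exercised in the run above, here is a 
minimal sketch of a test-and-set spinlock built on the GCC __sync builtins. 
The slock_t typedef and the spin_lock()/spin_unlock() wrappers are assumptions 
made for this example, not a copy of s_lock.h; only 
__sync_lock_test_and_set() and __sync_lock_release() are real compiler 
builtins.

```c
/* Assumed lock type for this sketch; 0 means free, 1 means held. */
typedef int slock_t;

/*
 * Atomically store 1 into the lock and return the previous value.
 * A return of 0 means the caller acquired the lock; nonzero means it
 * was already held.
 */
static inline int
tas(volatile slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/* Release the lock by writing 0 with release semantics. */
static inline void
spin_unlock(volatile slock_t *lock)
{
    __sync_lock_release(lock);
}

/* Hypothetical wrapper: busy-wait until the lock is acquired. */
static inline void
spin_lock(volatile slock_t *lock)
{
    while (tas(lock))
        ;                       /* spin */
}
```

The point of relying on the builtin is that the compiler emits the 
platform's atomic sequence itself, so no hand-written assembler labels are 
needed.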



> 21  -- test overflow/underflow handling
>
> 22  SELECT gamma(float8 '-infinity');
>
> 23  ERROR:  value out of range: overflow

Regarding the failure in lgamma(), we worked with the libm team to resolve it. 
The issue was with the errno being set. I'll work on the test case.

   >> ./gamma-test NaN

    Gamma and natural logarithm of gamma for the input values:
    Gamma(NaNS) = NaNQ errno: 34
    lgamma(NaNS) = NaNQ errno: 34


   With fixed libm

    + ./gamma-test NaN

    Gamma and natural logarithm of gamma for the input values:
    Gamma(NaNS) = NaNQ errno: 0
    lgamma(NaNS) = NaNQ errno: 0



Warm Regards,
Sriram





