Hello. While running pgbench, I found tps excluding connection somewhat strange. To emphasize the behavior, I inserted some sleep at the end of doConnect() and ran pgbench with several times.
$ pgbench -T 5 -j 1 -c 4 (with 1s sleep in doConnect()) > number of transactions actually processed: 409 ! latency average = 48.980 ms > tps = 81.665671 (including connections establishing) > tps = 102.323285 (excluding connections establishing) $ pgbench -T 5 -j 4 -c 4 (with 1s sleep in doConnect()) > number of transactions actually processed: 1628 ! latency average = 12.303 ms > tps = 325.123448 (including connections establishing) > tps = 422.671015 (excluding connections establishing) Why the latencies are so different? If I tell pgbench to measure the latencies, it is about 8-9ms in both cases. $ pgbench -T 5 -j 1 -c 4 -L100 (with 1s sleep in doConnect()) > number of transactions actually processed: 419 > number of transactions above the 100.0 ms latency limit: 0/419 (0.000 %) ! latency average = 8.984 ms > latency stddev = 3.259 ms > tps = 83.633694 (including connections establishing) > tps = 104.909130 (excluding connections establishing) $ pgbench -T 5 -j 4 -c 4 -L100 (with 1s sleep in doConnect()) > number of transactions actually processed: 2047 > number of transactions above the 100.0 ms latency limit: 0/2047 (0.000 %) ! latency average = 7.807 ms > latency stddev = 2.916 ms > tps = 408.677024 (including connections establishing) > tps = 511.837509 (excluding connections establishing) That's reasonable. So the latency estimation is apparently a kind of bogus. It seems to me that the calculation is wrong to use time_include. I think it should use time_exclude instead. $ pgbench -T 5 -j 1 -c 4 (with 1s sleep in doConnect()) > number of transactions actually processed: 407 ! latency average = 11.674 ms > tps = 81.262992 (including connections establishing) > tps = 101.724507 (excluding connections establishing) > $ pgbench -T 5 -j 4 -c 4 (with 1s sleep in doConnect()) > number of transactions actually processed: 1718 ! latency average = 11.674 ms > tps = 342.652333 (including connections establishing) > tps = 428.850338 (excluding connections establishing) Looks far better, but "tps excluding connection establishing" looks strange. The reason seems to me threadRun() counts connection time a wrong way. thread->conn_time is assumed to be the sum of the time took by each client until it runs the first transaction. If each connection took 1 second, conn_time of a thread that runs three clients is calculated as 3 seconds. But in reality, if each connection took 1 second, all clients on the thread waits for 3 seconds before starting a transaction so it should be 3s * 3(clients/thread) = 9 seconds. We could multiply conn_time by nstate on each thread, but the same happens for -C (is_connect) mode. When a client took 1 second to establish a connection, no other clients can do anything meanwhile. Somewhat different from the !is_connect case precisely, but basically the same at the point that making a connection of a client actually blocks all other clients on the same thread. Therefore I changed nclients in the calculation of time_exclude to nthreads. $ pgbench -T 5 -j 1 -c 4 (with 1s sleep in doConnect()) > number of transactions actually processed: 399 > latency average = 9.402 ms > tps = 79.671581 (including connections establishing) > tps = 425.460179 (excluding connections establishing) $ pgbench -T 5 -j 4 -c 4 (with 1s sleep in doConnect()) > number of transactions actually processed: 1903 > latency average = 8.330 ms > tps = 379.944770 (including connections establishing) > tps = 480.211639 (excluding connections establishing) Looks fine. Also the -C cases looks sainer. (- is unpatched, + is patched) > $ pgbench -T 5 -j 1 -c 4 -C (with 100ms sleep in doConnect()) ### time_include=5.1, conn_total_time=4.9 - number of transactions actually processed: 43 - latency average = 469.850 ms - tps = 8.513360 (including connections establishing) - tps = 11.246110 (excluding connections establishing) + number of transactions actually processed: 38 + latency average = 16.735 ms + tps = 7.493800 (including connections establishing) + tps = 239.022345 (excluding connections establishing) > $ pgbench -T 5 -j 4 -c 4 -C (with 100ms sleep in doConnect()) ### time_include=5.1, conn_total_time=18.3 -number of transactions actually processed: 172 -latency average = 117.571 ms -tps = 34.021873 (including connections establishing) -tps = 346.296591 (excluding connections establishing) + number of transactions actually processed: 172 + latency average = 11.671 ms + tps = 33.851599 (including connections establishing) + tps = 342.716081 (excluding connections establishing) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From 117118699e24085db209ab66698cb33c7a70fd21 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyoga....@gmail.com> Date: Thu, 17 Sep 2020 17:20:10 +0900 Subject: [PATCH] Fix latency and tps calculation of pgbench Fix the calculation for "latency average" and the "tps excluding connections establishing" not to wrongly affected by connections establishing time. --- src/bin/pgbench/pgbench.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 332eabf637..adb522ed77 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -5194,16 +5194,35 @@ printResults(StatsData *total, instr_time total_time, instr_time conn_total_time, int64 latency_late) { double time_include, + time_exclude, tps_include, tps_exclude; int64 ntx = total->cnt - total->skipped; time_include = INSTR_TIME_GET_DOUBLE(total_time); + /* + * conn_total_time is the sum of the time each clients took to establish a + * connection. In the multi-threaded case, all clients run on a thread + * waits for all the clients to establish a connection. So the actual total + * connection time of a thread is thread->conn_time * thread->nstate. So + * the total time took for connection establishment is: + * + * sum(thread->conn_time * thread->nstate) / nclients + * + * Assuming all client took the same time to connection establishment and + * clients are distributed equally to threads, the expression is + * approximated as: + * + * thread->conn_time * (nclients/nthreads) / nclients + * = conn_total_time / nthreads + */ + time_exclude = (time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / + nthreads)); + /* tps is about actually executed transactions */ tps_include = ntx / time_include; - tps_exclude = ntx / - (time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients)); + tps_exclude = ntx / time_exclude; /* Report test parameters. */ printf("transaction type: %s\n", @@ -5249,7 +5268,7 @@ printResults(StatsData *total, instr_time total_time, { /* no measurement, show average latency computed from run time */ printf("latency average = %.3f ms\n", - 1000.0 * time_include * nclients / total->cnt); + 1000.0 * time_exclude * nclients / total->cnt); } if (throttle_delay) -- 2.18.4