pgbench calculates summary numbers a wrong way.

Kyotaro Horiguchi Thu, 17 Sep 2020 01:42:42 -0700

Hello.

While running pgbench, I found tps excluding connection somewhat
strange. To emphasize the behavior, I inserted some sleep at the end
of doConnect() and ran pgbench with several times.


$ pgbench -T 5 -j 1 -c 4  (with 1s sleep in doConnect())
>   number of transactions actually processed: 409
!   latency average = 48.980 ms
>   tps = 81.665671 (including connections establishing)
>   tps = 102.323285 (excluding connections establishing)

$ pgbench -T 5 -j 4 -c 4  (with 1s sleep in doConnect())
>   number of transactions actually processed: 1628
!   latency average = 12.303 ms
>   tps = 325.123448 (including connections establishing)
>   tps = 422.671015 (excluding connections establishing)

Why the latencies are so different?  If I tell pgbench to measure the
latencies, it is about 8-9ms in both cases.

$ pgbench -T 5 -j 1 -c 4 -L100  (with 1s sleep in doConnect())
>   number of transactions actually processed: 419
>   number of transactions above the 100.0 ms latency limit: 0/419 (0.000 %)
!   latency average = 8.984 ms
>   latency stddev = 3.259 ms
>   tps = 83.633694 (including connections establishing)
>   tps = 104.909130 (excluding connections establishing)

$ pgbench -T 5 -j 4 -c 4 -L100  (with 1s sleep in doConnect())
>   number of transactions actually processed: 2047
>   number of transactions above the 100.0 ms latency limit: 0/2047 (0.000 %)
!   latency average = 7.807 ms
>   latency stddev = 2.916 ms
>   tps = 408.677024 (including connections establishing)
>   tps = 511.837509 (excluding connections establishing)

That's reasonable. So the latency estimation is apparently a kind of
bogus.  It seems to me that the calculation is wrong to use
time_include. I think it should use time_exclude instead.

$ pgbench -T 5 -j 1 -c 4  (with 1s sleep in doConnect())
>   number of transactions actually processed: 407
!   latency average = 11.674 ms
>   tps = 81.262992 (including connections establishing)
>   tps = 101.724507 (excluding connections establishing)
> 

$ pgbench -T 5 -j 4 -c 4  (with 1s sleep in doConnect())
>   number of transactions actually processed: 1718
!   latency average = 11.674 ms
>   tps = 342.652333 (including connections establishing)
>   tps = 428.850338 (excluding connections establishing)

Looks far better, but "tps excluding connection establishing" looks
strange. The reason seems to me threadRun() counts connection time a
wrong way. thread->conn_time is assumed to be the sum of the time took
by each client until it runs the first transaction.  If each
connection took 1 second, conn_time of a thread that runs three
clients is calculated as 3 seconds. But in reality, if each connection
took 1 second, all clients on the thread waits for 3 seconds before
starting a transaction so it should be 3s * 3(clients/thread) = 9
seconds.

We could multiply conn_time by nstate on each thread, but the same
happens for -C (is_connect) mode. When a client took 1 second to
establish a connection, no other clients can do anything meanwhile.
Somewhat different from the !is_connect case precisely, but basically
the same at the point that making a connection of a client actually
blocks all other clients on the same thread.

Therefore I changed nclients in the calculation of time_exclude to
nthreads.

$ pgbench -T 5 -j 1 -c 4  (with 1s sleep in doConnect())
>  number of transactions actually processed: 399
>  latency average = 9.402 ms
>  tps = 79.671581 (including connections establishing)
>  tps = 425.460179 (excluding connections establishing)

$ pgbench -T 5 -j 4 -c 4 (with 1s sleep in doConnect())
>  number of transactions actually processed: 1903
>  latency average = 8.330 ms
>  tps = 379.944770 (including connections establishing)
>  tps = 480.211639 (excluding connections establishing)

Looks fine. Also the -C cases looks sainer.

(- is unpatched, + is patched)

> $ pgbench -T 5 -j 1 -c 4 -C (with 100ms sleep in doConnect())
### time_include=5.1, conn_total_time=4.9
- number of transactions actually processed: 43
- latency average = 469.850 ms
- tps = 8.513360 (including connections establishing)
- tps = 11.246110 (excluding connections establishing)
+ number of transactions actually processed: 38
+ latency average = 16.735 ms
+ tps = 7.493800 (including connections establishing)
+ tps = 239.022345 (excluding connections establishing)


> $ pgbench -T 5 -j 4 -c 4 -C (with 100ms sleep in doConnect())
### time_include=5.1, conn_total_time=18.3
-number of transactions actually processed: 172
-latency average = 117.571 ms
-tps = 34.021873 (including connections establishing)
-tps = 346.296591 (excluding connections establishing)
+ number of transactions actually processed: 172
+ latency average = 11.671 ms
+ tps = 33.851599 (including connections establishing)
+ tps = 342.716081 (excluding connections establishing)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 117118699e24085db209ab66698cb33c7a70fd21 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga....@gmail.com>
Date: Thu, 17 Sep 2020 17:20:10 +0900
Subject: [PATCH] Fix latency and tps calculation of pgbench

Fix the calculation for "latency average" and the "tps excluding
connections establishing" not to wrongly affected by connections
establishing time.
---
 src/bin/pgbench/pgbench.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 332eabf637..adb522ed77 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5194,16 +5194,35 @@ printResults(StatsData *total, instr_time total_time,
 			 instr_time conn_total_time, int64 latency_late)
 {
 	double		time_include,
+				time_exclude,
 				tps_include,
 				tps_exclude;
 	int64		ntx = total->cnt - total->skipped;
 
 	time_include = INSTR_TIME_GET_DOUBLE(total_time);
 
+	/*
+	 * conn_total_time is the sum of the time each clients took to establish a
+	 * connection. In the multi-threaded case, all clients run on a thread
+	 * waits for all the clients to establish a connection. So the actual total
+	 * connection time of a thread is thread->conn_time * thread->nstate. So
+	 * the total time took for connection establishment is:
+	 *
+	 *   sum(thread->conn_time * thread->nstate) / nclients
+	 *
+	 * Assuming all client took the same time to connection establishment and
+	 * clients are distributed equally to threads, the expression is
+	 * approximated as:
+	 *
+	 *   thread->conn_time * (nclients/nthreads) / nclients
+	 * = conn_total_time / nthreads
+	 */
+	time_exclude = (time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) /
+									nthreads));
+
 	/* tps is about actually executed transactions */
 	tps_include = ntx / time_include;
-	tps_exclude = ntx /
-		(time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
+	tps_exclude = ntx / time_exclude;
 
 	/* Report test parameters. */
 	printf("transaction type: %s\n",
@@ -5249,7 +5268,7 @@ printResults(StatsData *total, instr_time total_time,
 	{
 		/* no measurement, show average latency computed from run time */
 		printf("latency average = %.3f ms\n",
-			   1000.0 * time_include * nclients / total->cnt);
+			   1000.0 * time_exclude * nclients / total->cnt);
 	}
 
 	if (throttle_delay)
-- 
2.18.4

pgbench calculates summary numbers a wrong way.

Reply via email to