On 09/09/2014 03:35 PM, Fabien COELHO wrote:
Hello Heikki,
I think we have to reconsider what we're reporting in 9.4, when --rate
is enabled, even though it's already very late in the release cycle.
It's a bad idea to change the definition of latency between 9.4 and 9.5,
so let's get it right in 9.4.
Indeed.
As per the attached patch. I think we should commit this to 9.4. Any
objections?
Ok for me. Some more propositions about the doc below.
I looked closer at this, and per Jan's comments, realized that we
don't log the lag time in the per-transaction log file. I think that's a
serious omission; when --rate is used, the schedule lag time is
important information to make sense of the result. I think we have to
apply the attached patch, and backpatch to 9.4. (The documentation on
the log file format needs updating.)
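To illustrate, with the patch the per-transaction log lines simply gain a
trailing schedule lag column, in the same field order as the fprintf further
down (client id, transaction count, latency, script number, epoch seconds,
epoch microseconds, lag; times in microseconds). The numbers here are made up:

  0 81 4621 0 1412345678 912698 3005
  0 82 6173 0 1412345678 919547 2781

Note that under --rate the logged latency is measured from the scheduled
start time, so the pure execution time is latency minus lag: in the first
line above, 4621 - 3005 = 1616 usec was spent actually executing the
transaction.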
Also, this is bizarre:
/*
 * Use inverse transform sampling to randomly generate a delay, such
 * that the series of delays will approximate a Poisson distribution
 * centered on the throttle_delay time.
 *
 * 10000 implies a 9.2 (-log(1/10000)) to 0.0 (log 1) delay
 * multiplier, and results in a 0.055 % target underestimation bias:
 *
 * SELECT 1.0/AVG(-LN(i/10000.0)) FROM generate_series(1,10000) AS i;
 *   = 1.000552717032611116335474
 *
 * If transactions are too slow or a given wait is shorter than a
 * transaction, the next transaction will start right away.
 */
int64 wait = (int64) (throttle_delay *
    1.00055271703 * -log(getrand(thread, 1, 10000) / 10000.0));
We're using getrand() to generate a uniformly distributed random value
between 1 and 10000, and then convert it to a double between 0.0 and
1.0. But getrand() is implemented by taking a double between 0.0 and 1.0
and converting it to an integer, so we're just truncating the original
floating point number unnecessarily. I think we should add a new
function, getPoissonRand(), that uses pg_erand48() directly. We already
have similar getGaussianRand() and getExponentialRand() functions.
Barring objections, I'll prepare another patch to do that, and backpatch
to 9.4.
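Roughly what I have in mind is the sketch below. The function name is the one
mentioned above; the exact signature and placement are just a guess, and the
final patch may well differ:

static int64
getPoissonRand(TState *thread, int64 center)
{
    /*
     * Inverse transform sampling: draw a uniform value straight from
     * pg_erand48() and map it to an exponentially distributed delay
     * whose mean is 'center'.
     */
    double      uniform;

    /* pg_erand48() returns [0, 1); flip to (0, 1] so log() never sees 0 */
    uniform = 1.0 - pg_erand48(thread->random_state);

    return (int64) (-log(uniform) * ((double) center));
}

With that, the 1.00055271703 correction factor can go away too, since the
bias only existed because getrand() quantized the uniform value to 1/10000
steps.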
- Heikki
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 2f7d80e..f7a3a5f 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -210,10 +210,10 @@ typedef struct
* sent */
int sleeping; /* 1 indicates that the client is napping */
bool throttling; /* whether nap is for throttling */
- int64 until; /* napping until (usec) */
Variable *variables; /* array of variable definitions */
int nvariables;
- instr_time txn_begin; /* used for measuring transaction latencies */
+ int64 txn_scheduled; /* scheduled start time of transaction (usec) */
+ instr_time txn_begin; /* used for measuring schedule lag times */
instr_time stmt_begin; /* used for measuring statement latencies */
int64 txn_latencies; /* cumulated latencies */
int64 txn_sqlats; /* cumulated square latencies */
@@ -284,12 +284,17 @@ typedef struct
long start_time; /* when does the interval start */
int cnt; /* number of transactions */
- double min_duration; /* min/max durations */
- double max_duration;
- double sum; /* sum(duration), sum(duration^2) - for
+
+ double min_latency; /* min/max latencies */
+ double max_latency;
+ double sum_latency; /* sum(latency), sum(latency^2) - for
* estimates */
- double sum2;
+ double sum2_latency;
+ double min_lag;
+ double max_lag;
+ double sum_lag; /* sum(lag) */
+ double sum2_lag; /* sum(lag*lag) */
} AggVals;
static Command **sql_files[MAX_FILES]; /* SQL script files */
@@ -968,12 +973,18 @@ agg_vals_init(AggVals *aggs, instr_time start)
{
/* basic counters */
aggs->cnt = 0; /* number of transactions */
- aggs->sum = 0; /* SUM(duration) */
- aggs->sum2 = 0; /* SUM(duration*duration) */
+ aggs->sum_latency = 0; /* SUM(latency) */
+ aggs->sum2_latency = 0; /* SUM(latency*latency) */
/* min and max transaction duration */
- aggs->min_duration = 0;
- aggs->max_duration = 0;
+ aggs->min_latency = 0;
+ aggs->max_latency = 0;
+
+ /* schedule lag counters */
+ aggs->sum_lag = 0;
+ aggs->sum2_lag = 0;
+ aggs->min_lag = 0;
+ aggs->max_lag = 0;
/* start of the current interval */
aggs->start_time = INSTR_TIME_GET_DOUBLE(start);
@@ -1016,7 +1027,7 @@ top:
thread->throttle_trigger += wait;
- st->until = thread->throttle_trigger;
+ st->txn_scheduled = thread->throttle_trigger;
st->sleeping = 1;
st->throttling = true;
st->is_throttled = true;
@@ -1032,13 +1043,13 @@ top:
INSTR_TIME_SET_CURRENT(now);
now_us = INSTR_TIME_GET_MICROSEC(now);
- if (st->until <= now_us)
+ if (st->txn_scheduled <= now_us)
{
st->sleeping = 0; /* Done sleeping, go ahead with next command */
if (st->throttling)
{
/* Measure lag of throttled transaction relative to target */
- int64 lag = now_us - st->until;
+ int64 lag = now_us - st->txn_scheduled;
thread->throttle_lag += lag;
if (lag > thread->throttle_lag_max)
@@ -1052,6 +1063,11 @@ top:
if (st->listen)
{ /* are we receiver? */
+ instr_time now;
+ bool now_valid = false;
+
+ INSTR_TIME_SET_ZERO(now); /* initialize to keep compiler quiet */
+
if (commands[st->state]->type == SQL_COMMAND)
{
if (debug)
@@ -1071,10 +1087,13 @@ top:
*/
if (is_latencies)
{
- instr_time now;
int cnum = commands[st->state]->command_num;
- INSTR_TIME_SET_CURRENT(now);
+ if (!now_valid)
+ {
+ INSTR_TIME_SET_CURRENT(now);
+ now_valid = true;
+ }
INSTR_TIME_ACCUM_DIFF(thread->exec_elapsed[cnum],
now, st->stmt_begin);
thread->exec_count[cnum]++;
@@ -1083,12 +1102,16 @@ top:
/* transaction finished: record latency under progress or throttling */
if ((progress || throttle_delay) && commands[st->state + 1] == NULL)
{
- instr_time diff;
int64 latency;
- INSTR_TIME_SET_CURRENT(diff);
- INSTR_TIME_SUBTRACT(diff, st->txn_begin);
- latency = INSTR_TIME_GET_MICROSEC(diff);
+ if (!now_valid)
+ {
+ INSTR_TIME_SET_CURRENT(now);
+ now_valid = true;
+ }
+
+ latency = INSTR_TIME_GET_MICROSEC(now) - st->txn_scheduled;
+
st->txn_latencies += latency;
/*
@@ -1106,9 +1129,8 @@ top:
*/
if (logfile && commands[st->state + 1] == NULL)
{
- instr_time now;
- instr_time diff;
- double usec;
+ double lag;
+ double latency;
/*
* write the log entry if this row belongs to the random sample,
@@ -1117,10 +1139,13 @@ top:
if (sample_rate == 0.0 ||
pg_erand48(thread->random_state) <= sample_rate)
{
- INSTR_TIME_SET_CURRENT(now);
- diff = now;
- INSTR_TIME_SUBTRACT(diff, st->txn_begin);
- usec = (double) INSTR_TIME_GET_MICROSEC(diff);
+ if (!now_valid)
+ {
+ INSTR_TIME_SET_CURRENT(now);
+ now_valid = true;
+ }
+ latency = (double) (INSTR_TIME_GET_MICROSEC(now) - st->txn_scheduled);
+ lag = (double) (INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled);
/* should we aggregate the results or not? */
if (agg_interval > 0)
@@ -1132,15 +1157,27 @@ top:
if (agg->start_time + agg_interval >= INSTR_TIME_GET_DOUBLE(now))
{
agg->cnt += 1;
- agg->sum += usec;
- agg->sum2 += usec * usec;
+ agg->sum_latency += latency;
+ agg->sum2_latency += latency * latency;
/* first in this aggregation interval */
- if ((agg->cnt == 1) || (usec < agg->min_duration))
- agg->min_duration = usec;
+ if ((agg->cnt == 1) || (latency < agg->min_latency))
+ agg->min_latency = latency;
+
+ if ((agg->cnt == 1) || (latency > agg->max_latency))
+ agg->max_latency = latency;
+
+ /* and the same for schedule lag */
+ if (throttle_delay)
+ {
+ agg->sum_lag += lag;
+ agg->sum2_lag += lag * lag;
- if ((agg->cnt == 1) || (usec > agg->max_duration))
- agg->max_duration = usec;
+ if ((agg->cnt == 1) || (lag < agg->min_lag))
+ agg->min_lag = lag;
+ if ((agg->cnt == 1) || (lag > agg->max_lag))
+ agg->max_lag = lag;
+ }
}
else
{
@@ -1156,23 +1193,32 @@ top:
* ifdef in usage), so we don't need to handle
* this in a special way (see below).
*/
- fprintf(logfile, "%ld %d %.0f %.0f %.0f %.0f\n",
+ fprintf(logfile, "%ld %d %.0f %.0f %.0f %.0f %.0f %.0f %.0f %.0f\n",
agg->start_time,
agg->cnt,
- agg->sum,
- agg->sum2,
- agg->min_duration,
- agg->max_duration);
+ agg->sum_latency,
+ agg->sum2_latency,
+ agg->min_latency,
+ agg->max_latency,
+ agg->sum_lag,
+ agg->sum2_lag,
+ agg->min_lag,
+ agg->max_lag
+ );
/* move to the next inteval */
agg->start_time = agg->start_time + agg_interval;
/* reset for "no transaction" intervals */
agg->cnt = 0;
- agg->min_duration = 0;
- agg->max_duration = 0;
- agg->sum = 0;
- agg->sum2 = 0;
+ agg->min_latency = 0;
+ agg->max_latency = 0;
+ agg->sum_latency = 0;
+ agg->sum2_latency = 0;
+ agg->min_lag = 0;
+ agg->max_lag = 0;
+ agg->sum_lag = 0;
+ agg->sum2_lag = 0;
}
/*
@@ -1180,10 +1226,14 @@ top:
* current)
*/
agg->cnt = 1;
- agg->min_duration = usec;
- agg->max_duration = usec;
- agg->sum = usec;
- agg->sum2 = usec * usec;
+ agg->min_latency = latency;
+ agg->max_latency = latency;
+ agg->sum_latency = latency;
+ agg->sum2_latency = latency * latency;
+ agg->min_lag = lag;
+ agg->max_lag = lag;
+ agg->sum_lag = lag;
+ agg->sum2_lag = lag * lag;
}
}
else
@@ -1195,17 +1245,17 @@ top:
* This is more than we really ought to know about
* instr_time
*/
- fprintf(logfile, "%d %d %.0f %d %ld %ld\n",
- st->id, st->cnt, usec, st->use_file,
- (long) now.tv_sec, (long) now.tv_usec);
+ fprintf(logfile, "%d %d %.0f %d %ld %ld %.0f\n",
+ st->id, st->cnt, latency, st->use_file,
+ (long) now.tv_sec, (long) now.tv_usec, lag);
#else
/*
* On Windows, instr_time doesn't provide a timestamp
* anyway
*/
- fprintf(logfile, "%d %d %.0f %d 0 0\n",
- st->id, st->cnt, usec, st->use_file);
+ fprintf(logfile, "%d %d %.0f %d 0 0 %.0f\n",
+ st->id, st->cnt, latency, st->use_file, lag);
#endif
}
}
@@ -1295,8 +1345,17 @@ top:
/* Record transaction start time under logging, progress or throttling */
if ((logfile || progress || throttle_delay) && st->state == 0)
+ {
INSTR_TIME_SET_CURRENT(st->txn_begin);
+ /*
+ * When not throttling, this is also the transaction's scheduled start
+ * time.
+ */
+ if (!throttle_delay)
+ st->txn_scheduled = INSTR_TIME_GET_MICROSEC(st->txn_begin);
+ }
+
/* Record statement start time if per-command latencies are requested */
if (is_latencies)
INSTR_TIME_SET_CURRENT(st->stmt_begin);
@@ -1620,7 +1679,7 @@ top:
usec *= 1000000;
INSTR_TIME_SET_CURRENT(now);
- st->until = INSTR_TIME_GET_MICROSEC(now) + usec;
+ st->txn_scheduled = INSTR_TIME_GET_MICROSEC(now) + usec;
st->sleeping = 1;
st->listen = 1;
@@ -3309,7 +3368,7 @@ threadRun(void *arg)
now_usec = INSTR_TIME_GET_MICROSEC(now);
}
- this_usec = st->until - now_usec;
+ this_usec = st->txn_scheduled - now_usec;
if (min_usec > this_usec)
min_usec = this_usec;
}
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index b479105..f293954 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -403,8 +403,10 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
Show progress report every <literal>sec</> seconds. The report
includes the time since the beginning of the run, the tps since the
last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</>), it
- also includes the average schedule lag time since the last report.
+ deviation since the last report. Under throttling (<option>-R</>),
+ the latency is computed with respect to the transaction's scheduled
+ start time, not the actual transaction beginning time, and thus it
+ also includes the average schedule lag time.
</para>
</listitem>
</varlistentry>
@@ -440,29 +442,25 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
possible for later ones to catch up again.
</para>
<para>
- When throttling is active, the average and maximum transaction
- schedule lag time are reported in ms. This is the delay between
- the original scheduled transaction time and the actual transaction
- start times. The schedule lag shows whether a transaction was
- started on time or late. Once a client starts running behind its
- schedule, every following transaction can continue to be penalized
- for schedule lag. If faster transactions are able to catch up, it's
- possible for them to get back on schedule again. The lag measurement
- of every transaction is shown when pgbench is run with debugging
- output.
+ When throttling is active, the transaction latency reported at the
+ end of the run is calculated from the scheduled start times, so it
+ includes the time each transaction had to wait for the previous
+ transaction to finish. The wait time is called the schedule lag time,
+ and its average and maximum are also reported separately. The
+ transaction latency with respect to the actual transaction start time,
+ i.e. the time spent executing the transaction in the database, can be
+ computed by subtracting the reported schedule lag time from the
+ latency.
</para>
+
<para>
- High rate limit schedule lag values, that is lag values that are large
- compared to the actual transaction latency, indicate that something is
- amiss in the throttling process. High schedule lag can highlight a subtle
- problem there even if the target rate limit is met in the end. One
- possible cause of schedule lag is insufficient pgbench threads to
- handle all of the clients. To improve that, consider reducing the
- number of clients, increasing the number of threads in pgbench, or
- running pgbench on a separate host. Another possibility is that the
- database is not keeping up with the load at some point. When that
- happens, you will have to reduce the expected transaction rate to
- lower schedule lag.
+ A high schedule lag time is an indication that the system cannot
+ process transactions at the specified rate, with the chosen number of
+ clients and threads. When the average transaction execution time is
+ longer than the scheduled interval between each transaction, each
+ successive transaction will fall further behind, and the schedule lag
+ time will keep increasing for as long as the test runs. When that
+ happens, you will have to reduce the specified transaction rate.
</para>
</listitem>
</varlistentry>
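For completeness, the aggregated log (--aggregate-interval) also grows from
six to ten fields with this patch: interval start, transaction count,
sum(latency), sum(latency^2), min latency, max latency, sum(lag), sum(lag^2),
min lag, max lag. A made-up sample line:

  1412345678 782 3471523 18970604396 1523 12653 940183 2519686483 0 9657

The pgbench.sgml description of the log file formats needs to be updated to
match, per the note above.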