Hi,
On 2025/07/15 11:16, Yugo Nagata wrote:
>> I noticed one small thing I’d like to discuss. I'm not sure that users
>> clearly
>> understand which aborted in the following error message, the client or the
>> script.
>>> pgbench: error: client 0 script 0 aborted in command ... query ...
>>
>> Since the code path always results in a client abort, I wonder if the
>> following
>> message might be clearer:
>>> pgbench: error: client 0 aborted in script 0 command ... query ...
>
> Indeed, it seems clearer to explicitly state that it is the client that
> was aborted.
>
> I've attached an updated patch that replaces the remaining message mentioned
> above with a call to commandFailed(). With this change, the output in such
> situations will now be:
>
> "client 0 aborted in command 0 (SQL) of script 0; ...."
Thank you for updating the patch!
When I executed a custom script that may raise a unique constraint violation, I
got the following output:
> pgbench: error: client 0 script 0 aborted in command 1 query 0: ERROR:
duplicate key value violates unique constraint "test_col2_key"
I think we should also change the error message in pg_log_error. I modified the
patch v8-0003 as follows:
@@ -3383,8 +3383,8 @@ readCommandResponse(CState *st, MetaCommand meta, char
*varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in
command %d query %d: %s",
- st->id, st->use_file,
st->command, qrynum,
+ pg_log_error("client %d aborted in command %d
query %d of script %d: %s",
+ st->id, st->command,
qrynum, st->use_file,
PQerrorMessage(st->con));
goto error;
}
With this change, the output now is like this:
> pgbench: error: client 0 aborted in command 1 query 0 of script 0: ERROR:
duplicate key value violates unique constraint "test_col2_key"
I want hear your thoughts.
Also, let me ask one question. In this case, I directly modified your commit in
the v8-0003 patch. Is that the right way to update the patch?
Regards,
Rintaro Ikeda
From 202e24cfad77763bf4da2f3023845223adb60e2c Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarinta...@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:36:37 +0900
Subject: [PATCH v9 1/3] Add --continue-on-error option
When the option is set, client rolls back the failed transaction and starts a
new one when its transaction fails due to the reason other than the deadlock and
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 71 +++++++++++++++-----
src/bin/pgbench/pgbench.c | 57 +++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++++
3 files changed, 125 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..15fcb45e223 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable>
</optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable>
</optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue their run even if an SQL statement fails
due to
+ errors other than serialization or deadlock. Unlike serialization and
deadlock
+ failures, clients do not retry the same transactions but start new
transaction.
+ This option is useful when your custom script may raise errors due to
some
+ reason like unique constraints violation. Without this option, the
client is
+ aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client
to be
+ aborted even after clients retries <option>--max-tries</option> times
by
+ default, so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported
as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got a SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log
--rate=1000 --latency-limit=10 --failures-detailed --max-tries=10
test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64
7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598
7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64
7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598
7392 26621 4527 0 0
</screen>
</para>
@@ -2839,9 +2871,11 @@ statement latencies in milliseconds, failures and
retries:
<option>--exit-on-abort</option> is specified. Otherwise in the worst
case they only lead to the abortion of the failed client while other
clients continue their run (but some client errors are handled without
- an abortion of the client and reported separately, see below). Later
in
- this section it is assumed that the discussed errors are only the
- direct client errors and they are not internal
+ an abortion of the client and reported separately, see below). When
+ <option>--continue-on-error</option> is specified, the client
+ continues to process new transactions even if it encounters an error.
+ Later in this section it is assumed that the discussed errors are only
+ the direct client errors and they are not internal
<application>pgbench</application> errors.
</para>
</listitem>
@@ -2851,14 +2885,17 @@ statement latencies in milliseconds, failures and
retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was
reached
- without completing the last transaction. In addition, if execution of an SQL
+ without completing the last transaction. By default, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock
errors,
- the client is aborted. Otherwise, if an SQL command fails with
serialization or
- deadlock errors, the client is not aborted. In such cases, the current
- transaction is rolled back, which also includes setting the client variables
- as they were before the run of this transaction (it is assumed that one
- transaction script contains only one transaction; see
- <xref linkend="transactions-and-scripts"/> for more information).
+ the client is aborted. However, if the --continue-on-error option is
specified,
+ the client does not abort and proceeds to the next transaction regardless of
+ the error. These cases are reported as "other failures" in the output.
+ On contrast, if an SQL command fails with serialization or deadlock errors,
the
+ client is not aborted even without <option>--continue-on-error</option>.
+ Instead, the current transaction is rolled back, which also includes setting
+ the client variables as they were before the run of this transaction
+ (it is assumed that one transaction script contains only one transaction;
+ see <xref linkend="transactions-and-scripts"/> for more information).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum
number of tries (specified by the <option>--max-tries</option> option) /
the maximum
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..edd8b01f794 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,15 +402,23 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed
on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried
transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
*
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were
not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the
transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +448,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that
were not
*
successfully retried after a deadlock
*
error */
+ int64 other_sql_failures; /* number of failed transactions for
+ *
reasons other than
+ *
serialization/deadlock failure, which
+ * is
counted if --continue-on-error is
+ *
specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +783,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +968,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log
file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run
transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL
error\n"
" --progress-timestamp use Unix epoch timestamps for
progress\n"
" --random-seed=SEED set random seed (\"time\",
\"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log
(e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1482,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1532,9 @@ accumStats(StatsData *stats, bool skipped, double lat,
double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3356,7 +3375,7 @@ readCommandResponse(CState *st, MetaCommand meta, char
*varprefix)
case PGRES_FATAL_ERROR:
st->estatus =
getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error ||
canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st,
PQerrorMessage(st->con));
@@ -4007,7 +4026,8 @@ advanceConnectionState(TState *thread, CState *st,
StatsData *agg)
if (PQpipelineStatus(st->con) !=
PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if ((st->estatus ==
ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4528,7 +4548,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4569,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d",
estatus);
@@ -4603,6 +4626,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4667,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures =
agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " "
INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6311,7 @@ printProgressReport(TState *threads, int64 test_start,
pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6427,7 +6454,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to other than serialization or deadlock errors and
--continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -6443,6 +6471,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT "
(%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6577,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 *
sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other
failures: " INT64_FORMAT " (%.3f%%)\n",
+
sstats->other_sql_failures,
+ (100.0 *
sstats->other_sql_failures /
+
script_total_cnt));
}
/*
@@ -6705,6 +6740,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7094,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more
information.", progname);
@@ -7413,6 +7453,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures +=
thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl
b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..8bb35dda5f7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);' . 'INSERT INTO unique_table
VALUES (0);');
+
+$node->pgbench(
+ '-t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 0/10\b},
+ qr{other failures: 10\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '002_continue_on_error' => q{
+ insert into unique_table values 0;
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.39.5 (Apple Git-154)
From e92761bfffc97117b732d589a786f8ee4d9e29a7 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarinta...@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:50:36 +0900
Subject: [PATCH v9 2/3] Rename a confusing enumerator
Rename the confusing enumerator which may be mistakenly assumed to be related to
other_sql_errors
---
src/bin/pgbench/pgbench.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index edd8b01f794..7dbeb79ca8d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -485,7 +485,7 @@ typedef enum TStatus
TSTATUS_IDLE,
TSTATUS_IN_BLOCK,
TSTATUS_CONN_ERROR,
- TSTATUS_OTHER_ERROR,
+ TSTATUS_UNKNOWN_ERROR,
} TStatus;
/* Various random sequences are initialized from this one. */
@@ -3577,12 +3577,12 @@ getTransactionStatus(PGconn *con)
* not. Internal error which should never occur.
*/
pg_log_error("unexpected transaction status %d",
tx_status);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/* not reached */
Assert(false);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/*
--
2.39.5 (Apple Git-154)
From e83f635c5dbafa6d7c2de6c2b9a111b4fd906e55 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nag...@sraoss.co.jp>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v9 3/3] Improve error messages for errors that cause client
abortion
This commit modifies relevant error messages to explicitly indicate that the
client was aborted. As part of this change, pg_log_error was replaced with
commandFailed().
---
src/bin/pgbench/pgbench.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7dbeb79ca8d..3e855c1b0aa 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3309,8 +3309,7 @@ readCommandResponse(CState *st, MetaCommand meta, char
*varprefix)
case PGRES_EMPTY_QUERY: /* may be used for testing
no-op overhead */
if (is_last && meta == META_GSET)
{
- pg_log_error("client %d script %d
command %d query %d: expected one row, got %d",
- st->id,
st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset",
psprintf("expected one row, got %d", 0));
st->estatus =
ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3324,8 +3323,7 @@ readCommandResponse(CState *st, MetaCommand meta, char
*varprefix)
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the
error */
- pg_log_error("client %d script
%d command %d query %d: expected one row, got %d",
-
st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ commandFailed(st, "gset",
psprintf("expected one row, got %d", PQntuples(res)));
st->estatus =
ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3339,18 +3337,18 @@ readCommandResponse(CState *st, MetaCommand meta, char
*varprefix)
for (int fld = 0; fld < PQnfields(res);
fld++)
{
char *varname =
PQfname(res, fld);
+ char *cmd = (meta ==
META_ASET ? "aset" : "gset");
/* allocate varname only if
necessary, freed below */
if (*varprefix != '\0')
varname =
psprintf("%s%s", varprefix, varname);
/* store last row result as a
string */
- if
(!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+ if
(!putVariable(&st->variables, cmd, varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
- pg_log_error("client %d
script %d command %d query %d: error storing into variable %s",
-
st->id, st->use_file, st->command, qrynum, varname);
+ commandFailed(st, cmd,
psprintf("error storing into variable %s", varname));
st->estatus =
ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3385,8 +3383,8 @@ readCommandResponse(CState *st, MetaCommand meta, char
*varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in
command %d query %d: %s",
- st->id, st->use_file,
st->command, qrynum,
+ pg_log_error("client %d aborted in command %d
query %d of script %d: %s",
+ st->id, st->command,
qrynum, st->use_file,
PQerrorMessage(st->con));
goto error;
}
--
2.39.5 (Apple Git-154)