On Wed, 16 Jul 2025 21:35:01 +0900
Rintaro Ikeda <[email protected]> wrote:
> Hi,
>
> On 2025/07/15 11:16, Yugo Nagata wrote:
> >> I noticed one small thing I’d like to discuss. I'm not sure that users
> >> clearly
> >> understand which aborted in the following error message, the client or the
> >> script.
> >>> pgbench: error: client 0 script 0 aborted in command ... query ...
> >>
> >> Since the code path always results in a client abort, I wonder if the
> >> following
> >> message might be clearer:
> >>> pgbench: error: client 0 aborted in script 0 command ... query ...
> >
> > Indeed, it seems clearer to explicitly state that it is the client that
> > was aborted.
> >
> > I've attached an updated patch that replaces the remaining message mentioned
> > above with a call to commandFailed(). With this change, the output in such
> > situations will now be:
> >
> > "client 0 aborted in command 0 (SQL) of script 0; ...."
>
> Thank you for updating the patch!
>
> When I executed a custom script that may raise a unique constraint violation,
> I got the following output:
> > pgbench: error: client 0 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "test_col2_key"
I'm sorry. I must have failed to attach the correct patch in my previous post.
As a result, patch v8 was actually the same as v7, and the message in question
was not modified as intended.
>
> I think we should also change the error message in pg_log_error. I modified
> the patch v8-0003 as follows:
>
> @@ -3383,8 +3383,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
>
>  			default:
>  				/* anything else is unexpected */
> -				pg_log_error("client %d script %d aborted in command %d query %d: %s",
> -							 st->id, st->use_file, st->command, qrynum,
> +				pg_log_error("client %d aborted in command %d query %d of script %d: %s",
> +							 st->id, st->command, qrynum, st->use_file,
>  							 PQerrorMessage(st->con));
>  				goto error;
>  			}
>
> With this change, the output now is like this:
> > pgbench: error: client 0 aborted in command 1 query 0 of script 0: ERROR: duplicate key value violates unique constraint "test_col2_key"
>
> I want to hear your thoughts.
My idea is to modify this as follows:
 			default:
 				/* anything else is unexpected */
-				pg_log_error("client %d script %d aborted in command %d query %d: %s",
-							 st->id, st->use_file, st->command, qrynum,
-							 PQerrorMessage(st->con));
+				commandFailed(st, "SQL", PQerrorMessage(st->con));
 				goto error;
 			}
This fix was originally planned to be included in patch v8, but was missed.
It is now included in the attached patch, v10.
With this change, the output becomes:
pgbench: error: client 0 aborted in command 0 (SQL) of script 0; ERROR: duplicate key value violates unique constraint "t2_pkey"
Although there is a slight difference, the message is essentially the same as
your proposal. Also, I believe the use of commandFailed() makes the code simpler
and more consistent.
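For reference, commandFailed() in pgbench.c is roughly the following (paraphrased
here as a sketch, not quoted verbatim from the sources):

static void
commandFailed(CState *st, const char *cmd, const char *message)
{
	pg_log_error("client %d aborted in command %d (%s) of script %d; %s",
				 st->id, st->command, cmd, st->use_file, message);
}

so the "(SQL)" part of the message above is simply the cmd string passed at the
call site.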
What do you think?
> Also, let me ask one question. In this case, I directly modified your commit
> in the v8-0003 patch. Is that the right way to update the patch?
I’m not sure if that’s the best way, but I think modifying the patch directly
is a valid way to propose an alternative approach during discussion, as long as
the original patch is respected. It can often help clarify suggestions.
Regards,
Yugo Nagata
--
Yugo Nagata <[email protected]>
From 9b45e1a0d5a2efd9443002bd84e0f3b93e6a4332 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <[email protected]>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v10 3/3] Improve error messages for errors that cause client
abortion
This commit modifies the relevant error messages to explicitly indicate that
the client is aborted. As part of this change, the pg_log_error() calls are
replaced with commandFailed().
---
src/bin/pgbench/pgbench.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7dbeb79ca8d..4124c7b341c 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3309,8 +3309,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
if (is_last && meta == META_GSET)
{
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3324,8 +3323,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ commandFailed(st, "gset", psprintf("expected one row, got %d", PQntuples(res)));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3339,18 +3337,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
for (int fld = 0; fld < PQnfields(res); fld++)
{
char *varname = PQfname(res, fld);
+ char *cmd = (meta == META_ASET ? "aset" : "gset");
/* allocate varname only if necessary, freed below */
if (*varprefix != '\0')
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, cmd, varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
- pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
- st->id, st->use_file, st->command, qrynum, varname);
+ commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3385,9 +3383,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ commandFailed(st, "SQL", PQerrorMessage(st->con));
goto error;
}
--
2.43.0
From 54ae59a76bd9b465f546c02ac248df14d82aa36c Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <[email protected]>
Date: Wed, 9 Jul 2025 23:50:36 +0900
Subject: [PATCH v10 2/3] Rename a confusing enumerator
Rename the confusing enumerator TSTATUS_OTHER_ERROR, which might mistakenly be
assumed to be related to other_sql_failures.
---
src/bin/pgbench/pgbench.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index edd8b01f794..7dbeb79ca8d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -485,7 +485,7 @@ typedef enum TStatus
TSTATUS_IDLE,
TSTATUS_IN_BLOCK,
TSTATUS_CONN_ERROR,
- TSTATUS_OTHER_ERROR,
+ TSTATUS_UNKNOWN_ERROR,
} TStatus;
/* Various random sequences are initialized from this one. */
@@ -3577,12 +3577,12 @@ getTransactionStatus(PGconn *con)
* not. Internal error which should never occur.
*/
pg_log_error("unexpected transaction status %d", tx_status);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/* not reached */
Assert(false);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/*
--
2.43.0
From 7d948731260679b7dde6861a7176a0cf8cb2b8b9 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <[email protected]>
Date: Wed, 9 Jul 2025 23:36:37 +0900
Subject: [PATCH v10 1/3] Add --continue-on-error option
When this option is set, the client rolls back the failed transaction and
starts a new one when its transaction fails for a reason other than a deadlock
or serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 71 +++++++++++++++-----
src/bin/pgbench/pgbench.c | 57 +++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++++
3 files changed, 125 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..15fcb45e223 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+        Allows clients to continue their run even if an SQL statement fails due to
+        errors other than serialization or deadlock. Unlike serialization and deadlock
+        failures, clients do not retry the same transactions but start new transactions.
+        This option is useful when your custom script may raise errors for reasons
+        such as unique constraint violations. Without this option, the client is
+        aborted after such errors.
+ </para>
+ <para>
+        Note that serialization and deadlock failures never cause the client to be
+        aborted, even when they are not successfully retried within
+        <option>--max-tries</option> attempts, so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+        number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2839,9 +2871,11 @@ statement latencies in milliseconds, failures and retries:
<option>--exit-on-abort</option> is specified. Otherwise in the worst
case they only lead to the abortion of the failed client while other
clients continue their run (but some client errors are handled without
- an abortion of the client and reported separately, see below). Later in
- this section it is assumed that the discussed errors are only the
- direct client errors and they are not internal
+ an abortion of the client and reported separately, see below). When
+ <option>--continue-on-error</option> is specified, the client
+ continues to process new transactions even if it encounters an error.
+ Later in this section it is assumed that the discussed errors are only
+ the direct client errors and they are not internal
<application>pgbench</application> errors.
</para>
</listitem>
@@ -2851,14 +2885,17 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
+ without completing the last transaction. By default, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
- transaction is rolled back, which also includes setting the client variables
- as they were before the run of this transaction (it is assumed that one
- transaction script contains only one transaction; see
- <xref linkend="transactions-and-scripts"/> for more information).
+   the client is aborted. However, if <option>--continue-on-error</option> is specified,
+ the client does not abort and proceeds to the next transaction regardless of
+ the error. These cases are reported as "other failures" in the output.
+   In contrast, if an SQL command fails with serialization or deadlock errors, the
+ client is not aborted even without <option>--continue-on-error</option>.
+ Instead, the current transaction is rolled back, which also includes setting
+ the client variables as they were before the run of this transaction
+ (it is assumed that one transaction script contains only one transaction;
+ see <xref linkend="transactions-and-scripts"/> for more information).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum
number of tries (specified by the <option>--max-tries</option> option) / the maximum
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..edd8b01f794 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,15 +402,23 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
*
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +448,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +783,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +968,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1482,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1532,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3356,7 +3375,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQerrorMessage(st->con));
@@ -4007,7 +4026,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4528,7 +4548,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4569,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4626,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4667,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6311,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6427,7 +6454,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+	 * to errors other than serialization or deadlock errors and
+	 * --continue-on-error is not set.
*/
if (total_cnt <= 0)
return;
@@ -6443,6 +6471,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6577,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6705,6 +6740,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7094,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7413,6 +7453,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..8bb35dda5f7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);' . 'INSERT INTO unique_table VALUES (0);');
+
+$node->pgbench(
+ '-t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 0/10\b},
+ qr{other failures: 10\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '002_continue_on_error' => q{
+		insert into unique_table values (0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.43.0