Hi,
On 2025/10/02 1:22, Fujii Masao wrote:
> Regarding 0002:
>
> - if (canRetryError(st->estatus))
> + if (continue_on_error || canRetryError(st->estatus))
> {
> if (verbose_errors)
> commandError(st, PQresultErrorMessage(res));
> goto error;
>
> With this change, even non-SQL errors (e.g., connection failures) would
> satisfy the condition when --continue-on-error is set. Isn't that a problem?
> Shouldn't we also check that the error status is one that
> --continue-on-error is meant to handle?
I agree that connection failures should not be ignored even when
--continue-on-error is specified.
I’m not sure whether other error statuses would cause problems, so for now the
updated patch explicitly checks the connection status and emits an error message
when the connection has been lost.
>
>
> + * Without --continue-on-error:
> *
> * failed (the number of failed transactions) =
> * 'serialization_failures' (they got a serialization error and were not
> * successfully retried) +
> * 'deadlock_failures' (they got a deadlock error and were not
> * successfully retried).
> *
> + * With --continue-on-error:
> + *
> + * failed (number of failed transactions) =
> + * 'serialization_failures' + 'deadlock_failures' +
> + *   'other_sql_failures' (they got some other SQL error; the transaction was
> + *     not retried and counted as failed due to --continue-on-error).
>
> About the comments on failed transactions: I don't think we need
> to split them into separate "with/without --continue-on-error" sections.
> How about simplifying them like this?
>
>
> ------------------------
> * failed (the number of failed transactions) =
> * 'serialization_failures' (they got a serialization error and were not
> * successfully retried) +
> * 'deadlock_failures' (they got a deadlock error and were not
> * successfully retried) +
> * 'other_sql_failures' (they failed on the first try or after retries
> * due to a SQL error other than serialization or
> * deadlock; they are counted as a failed transaction
> * only when --continue-on-error is specified).
> ------------------------
>
Thank you for the suggestion. I’ve updated the comments as you proposed.
>
> * 'retried' (number of all retried transactions) =
> * successfully retried transactions +
> * failed transactions.
>
> Since transactions that failed on the first try (i.e., no retries) due to
> an SQL error are not counted as 'retried', shouldn't this source comment
> be updated?
Agreed. I updated the comment to clarify that failed transactions are counted
in 'retried' only when they were actually retried.
I've attached the updated patch v17-0002. 0003 remains unchanged.
Best regards,
Rintaro Ikeda
From 8ae5be55a2704f813e200917968ae040146486ab Mon Sep 17 00:00:00 2001
From: Fujii Masao <[email protected]>
Date: Fri, 19 Sep 2025 16:54:49 +0900
Subject: [PATCH v17 2/3] pgbench: Add --continue-on-error option
When this option is set, the client rolls back the failed transaction and
starts a new one when a transaction fails for a reason other than a deadlock or
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 64 ++++++++++++++++----
src/bin/pgbench/pgbench.c | 63 +++++++++++++++----
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 +++++++
3 files changed, 125 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..63230102357 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable></optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable></optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+        Allows clients to continue running even if an SQL statement fails due to
+        errors other than serialization or deadlock. Unlike serialization and
+        deadlock failures, clients do not retry the same transactions but proceed
+        to the next transaction. This option is useful when your custom script
+        may raise errors for reasons such as unique constraint violations.
+        Without this option, the client is aborted after such errors.
+ </para>
+ <para>
+        Note that serialization and deadlock failures never cause the client to
+        be aborted, even after the client has retried the transaction
+        <option>--max-tries</option> times, so they are not affected by this
+        option. See <xref linkend="failures-and-retries"/> for more information.
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2851,10 +2883,20 @@ statement latencies in milliseconds, failures and retries:
  <para>
   A client's run is aborted in case of a serious error; for example, the
   connection with the database server was lost or the end of script was reached
-  without completing the last transaction. In addition, if execution of an SQL
-  or meta command fails for reasons other than serialization or deadlock errors,
-  the client is aborted. Otherwise, if an SQL command fails with serialization or
-  deadlock errors, the client is not aborted. In such cases, the current
+  without completing the last transaction. The client also aborts
+  if a meta command fails, or if an SQL command fails for reasons other than
+  serialization or deadlock errors when <option>--continue-on-error</option>
+  is not specified. With <option>--continue-on-error</option>,
+  the client does not abort on such SQL errors and instead proceeds to
+  the next transaction. These cases are reported as
+  <literal>other failures</literal> in the output. If the error occurs
+  in a meta command, however, the client still aborts even when this option
+  is specified.
+ </para>
+ <para>
+  If an SQL command fails due to serialization or deadlock errors, the
+  client does not abort, regardless of whether
+  <option>--continue-on-error</option> is used. Instead, the current
   transaction is rolled back, which also includes setting the client variables
   as they were before the run of this transaction (it is assumed that one
   transaction script contains only one transaction; see
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8656a87d280..7aa4dd0a893 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,14 +402,15 @@ typedef struct StatsData
	 * directly successful transactions (they were successfully completed on
	 * the first try).
	 *
-	 * A failed transaction is defined as unsuccessfully retried transactions.
-	 * It can be one of two types:
-	 *
-	 * failed (the number of failed transactions) =
+	 * 'failed' (the number of failed transactions) =
	 *   'serialization_failures' (they got a serialization error and were not
-	 *                            successfully retried) +
+	 *                            successfully retried) +
	 *   'deadlock_failures' (they got a deadlock error and were not
-	 *                       successfully retried).
+	 *                       successfully retried) +
+	 *   'other_sql_failures' (they failed on the first try or after retries
+	 *                         due to a SQL error other than serialization or
+	 *                         deadlock; they are counted as a failed transaction
+	 *                         only when --continue-on-error is specified).
	 *
	 * If the transaction was retried after a serialization or a deadlock
	 * error this does not guarantee that this retry was successful. Thus
@@ -421,7 +422,7 @@ typedef struct StatsData
	 *
	 * 'retried' (number of all retried transactions) =
	 *   successfully retried transactions +
-	 *   failed transactions.
+	 *   unsuccessfully retried transactions.
	 *----------
	 */
	int64		cnt;			/* number of successful transactions, not
@@ -440,6 +441,11 @@ typedef struct StatsData
	int64		deadlock_failures;	/* number of transactions that were not
									 * successfully retried after a deadlock
									 * error */
+	int64		other_sql_failures; /* number of failed transactions for
+									 * reasons other than
+									 * serialization/deadlock failure, which
+									 * is counted if --continue-on-error is
+									 * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +776,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +961,7 @@ usage(void)
		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
		   "                           (default: \"pgbench_log\")\n"
		   "  --max-tries=NUM          max number of tries to run transaction (default: 1)\n"
+		   "  --continue-on-error      continue running after an SQL error\n"
		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1475,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1525,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3356,7 +3368,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
			case PGRES_FATAL_ERROR:
				st->estatus = getSQLErrorStatus(PQresultErrorField(res,
																   PG_DIAG_SQLSTATE));
-				if (canRetryError(st->estatus))
+				if ((continue_on_error || canRetryError(st->estatus)) &&
+					PQstatus(st->con) != CONNECTION_BAD)
				{
					if (verbose_errors)
						commandError(st, PQresultErrorMessage(res));
@@ -4020,7 +4033,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
					if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
						st->state = CSTATE_END_COMMAND;
				}
-				else if (canRetryError(st->estatus))
+				else if (PQstatus(st->con) == CONNECTION_BAD)
+					st->state = CSTATE_ABORTED;
+				else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+						 canRetryError(st->estatus))
					st->state = CSTATE_ERROR;
				else
					st->state = CSTATE_ABORTED;
@@ -4541,7 +4557,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4561,6 +4578,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
			pg_fatal("unexpected error status: %d", estatus);
@@ -4616,6 +4635,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4656,10 +4676,12 @@ doLog(TState *thread, CState *st,
{
			serialization_failures = agg->serialization_failures;
			deadlock_failures = agg->deadlock_failures;
+			other_sql_failures = agg->other_sql_failures;
		}
-		fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+		fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
				serialization_failures,
-				deadlock_failures);
+				deadlock_failures,
+				other_sql_failures);
fputc('\n', logfile);
@@ -6298,6 +6320,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
		cur.serialization_failures += threads[i].stats.serialization_failures;
		cur.deadlock_failures += threads[i].stats.deadlock_failures;
+		cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6440,7 +6463,8 @@ printResults(StatsData *total,
	/*
	 * Remaining stats are nonsensical if we failed to execute any xacts due
-	 * to others than serialization or deadlock errors
+	 * to errors other than serialization or deadlock and
+	 * --continue-on-error is not set.
	 */
if (total_cnt <= 0)
return;
@@ -6456,6 +6480,9 @@ printResults(StatsData *total,
		printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
			   total->deadlock_failures,
			   100.0 * total->deadlock_failures / total_cnt);
+		printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+			   total->other_sql_failures,
+			   100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6559,6 +6586,10 @@ printResults(StatsData *total,
							   sstats->deadlock_failures,
							   (100.0 * sstats->deadlock_failures /
								script_total_cnt));
+				printf("  - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+					   sstats->other_sql_failures,
+					   (100.0 * sstats->other_sql_failures /
+						script_total_cnt));
			}
/*
@@ -6718,6 +6749,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7071,6 +7103,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7426,6 +7462,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
		stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..3c19a36a005 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);');
+
+$node->pgbench(
+ '-n -t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '001_continue_on_error' => q{
+ INSERT INTO unique_table VALUES(0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.39.5 (Apple Git-154)