Hi,
It is quite common to set a global statement_timeout to a few seconds
(or minutes) in postgresql.conf in order to avoid hitting a production
server with slow/bad queries.
This value is applied to all connections in the system unless it is
redefined per database, user, or explicitly changed in the connection.
pg_rewind runs quite a few queries on the primary, and if
statement_timeout cancels one of them it breaks the whole process. I
can't tell for sure whether this leaves things in an unrecoverable
state; maybe a second run of pg_rewind would be able to finish the
job. But even if a retry is possible, it makes pg_rewind hard to use
for reliable automation.
There are a few workarounds to this problem:
1. ALTER DATABASE postgres SET statement_timeout = 0;
2. ALTER ROLE rewind_username SET statement_timeout = 0;
3. Run export PGOPTIONS="-c statement_timeout=0" before calling pg_rewind.
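For automation, the third workaround could be wrapped in a small
script along these lines (a sketch only; the pg_rewind arguments and
conninfo are placeholders for your own setup):

```shell
#!/bin/sh
# Disable statement_timeout for every libpq connection started from
# this shell, including the ones pg_rewind opens to the source server.
export PGOPTIONS="-c statement_timeout=0"
echo "PGOPTIONS=$PGOPTIONS"

# Placeholder invocation -- adjust the data directory and conninfo:
# pg_rewind --target-pgdata=/var/lib/postgresql/data \
#           --source-server='host=primary user=rewind_user dbname=postgres'
```

The server applies options passed via PGOPTIONS at session start, so
they override both the postgresql.conf value and any per-database or
per-role setting.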
All of them have certain pros and cons. The third approach works well
for automation, but IMHO we should simply fix pg_rewind itself and SET
statement_timeout = 0 right after establishing a connection, so
everybody will benefit from it.
Patch attached.
Regards,
--
Alexander Kukushkin
diff --git a/src/bin/pg_rewind/libpq_fetch.c b/src/bin/pg_rewind/libpq_fetch.c
index 37eccc3126..18740c15d0 100644
--- a/src/bin/pg_rewind/libpq_fetch.c
+++ b/src/bin/pg_rewind/libpq_fetch.c
@@ -54,6 +54,13 @@ libpqConnect(const char *connstr)
 	if (showprogress)
 		pg_log_info("connected to server");
 
+	/* We don't want our queries cancelled due to statement timeout */
+	res = PQexec(conn, "SET statement_timeout = 0");
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pg_fatal("could not set up statement_timeout: %s",
+				 PQresultErrorMessage(res));
+	PQclear(res);
+
 	res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 		pg_fatal("could not clear search_path: %s",