Hello,
On PG 9.0.8, we just observed a long-running query executing on a replica (~1
hour), which was effectively blocking replication. I say effectively, as checks
on streaming replication appeared as if everything was up-to-date (using SELECT
pg_current_xlog_location() on the primary and SELECT
pg_last_xlog_receive_location() on the replica). However, when we checked a
frequently updated table on the replica, it was ~1 hour behind the primary.
It has been our experience (and configuration) that long running queries that
block replication get cancelled after at most 10 minutes. These are the
relevant settings from our postgresql.conf on the replica:
# - Standby Servers -
hot_standby = on # "on" allows queries during recovery
# (change requires restart)
max_standby_archive_delay = 600s # max delay before canceling queries
# when reading WAL from archive;
# -1 allows indefinite delay
max_standby_streaming_delay = 600s # max delay before canceling queries
# when reading streaming WAL;
# -1 allows indefinite delay
----
It is worth noting that the query was joining a table from the main schema that
is not updated often with a table from another schema (that may or may not be
updated often). However, it appears that replication ground to a halt until we
terminated the query (which triggered crash recovery).
Are we seeing something abnormal or unexpected here? It caught us by surpriseā¦
Thank you,
Andrew Hannon
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general