We've had more buildfarm failures due to hard-coded, short timeouts:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2018-10-13%2021%3A06%3A58
10s timeout while running pg_recvlogical
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-12-03%2005%3A52%3A12
30s timeout while running pg_recvlogical
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2018-11-30%2014%3A31%3A18
60s timeout in isolationtester try_complete_step()
The 180s timeout in poll_query_until has been trouble-free since 2a0f89c
introduced it two years ago. I plan to raise the timeouts in question to
180s, and isolationtester's final give-up limit from 75s to 200s, as attached.
diff --git a/src/test/isolation/README b/src/test/isolation/README
index bea278a..780b6dc 100644
--- a/src/test/isolation/README
+++ b/src/test/isolation/README
@@ -108,8 +108,8 @@ Each step may contain commands that block until further action has been taken
deadlock). A test that uses this ability must manually specify valid
permutations, i.e. those that would not expect a blocked session to execute a
command. If a test fails to follow that rule, isolationtester will cancel it
-after 60 seconds. If the cancel doesn't work, isolationtester will exit
-uncleanly after a total of 75 seconds of wait time. Testing invalid
+after 180 seconds. If the cancel doesn't work, isolationtester will exit
+uncleanly after a total of 200 seconds of wait time. Testing invalid
permutations should be avoided because they can make the isolation tests take
a very long time to run, and they serve no useful testing purpose.
diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c
index 7df67da..9134b05 100644
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
@@ -783,15 +783,15 @@ try_complete_step(Step *step, int flags)
td += (int64) current_time.tv_usec - (int64) start_time.tv_usec;
/*
- * After 60 seconds, try to cancel the query.
+ * After 180 seconds, try to cancel the query.
*
* If the user tries to test an invalid permutation, we don't want
* to hang forever, especially when this is running in the
- * buildfarm. So try to cancel it after a minute. This will
- * presumably lead to this permutation failing, but remaining
- * permutations and tests should still be OK.
+ * buildfarm. This will presumably lead to this permutation
+ * failing, but remaining permutations and tests should still be
+ * OK.
*/
- if (td > 60 * USECS_PER_SEC && !canceled)
+ if (td > 180 * USECS_PER_SEC && !canceled)
{
PGcancel *cancel = PQgetCancel(conn);
@@ -808,15 +808,15 @@ try_complete_step(Step *step, int flags)
}
/*
- * After 75 seconds, just give up and die.
+ * After 200 seconds, just give up and die.
*
* Since cleanup steps won't be run in this case, this may cause
* later tests to fail. That stinks, but it's better than waiting
* forever for the server to respond to the cancel.
*/
- if (td > 75 * USECS_PER_SEC)
+ if (td > 200 * USECS_PER_SEC)
{
- fprintf(stderr, "step %s timed out after 75 seconds\n",
+ fprintf(stderr, "step %s timed out after 200 seconds\n",
step->name);
exit_nicely();
}
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index 884b0ae..c23cc4d 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -72,7 +72,7 @@ my $endpos = $node_master->safe_psql('postgres',
print "waiting to replay $endpos\n";
my $stdout_recv = $node_master->pg_recvlogical_upto(
- 'postgres', 'test_slot', $endpos, 10,
+ 'postgres', 'test_slot', $endpos, 180,
'include-xids' => '0',
'skip-empty-xacts' => '1');
chomp($stdout_recv);
@@ -84,7 +84,7 @@ $node_master->poll_query_until('postgres',
) or die "slot never became inactive";
$stdout_recv = $node_master->pg_recvlogical_upto(
- 'postgres', 'test_slot', $endpos, 10,
+ 'postgres', 'test_slot', $endpos, 180,
'include-xids' => '0',
'skip-empty-xacts' => '1');
chomp($stdout_recv);
diff --git a/src/test/recovery/t/010_logical_decoding_timelines.pl b/src/test/recovery/t/010_logical_decoding_timelines.pl
index 4fbd386..e582b20 100644
--- a/src/test/recovery/t/010_logical_decoding_timelines.pl
+++ b/src/test/recovery/t/010_logical_decoding_timelines.pl
@@ -183,7 +183,7 @@ my $endpos = $node_replica->safe_psql('postgres',
$stdout = $node_replica->pg_recvlogical_upto(
'postgres', 'before_basebackup',
- $endpos, 30,
+ $endpos, 180,
'include-xids' => '0',
'skip-empty-xacts' => '1');
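
For readers skimming the isolationtester hunks, the two-stage timeout they
adjust can be sketched roughly as follows. This is only an illustration, not
code from the patch: the names CANCEL_AFTER_SECS, GIVE_UP_AFTER_SECS,
TimeoutAction and timeout_action() are hypothetical (the patch keeps the
literals 180 and 200 inline in try_complete_step()), and USECS_PER_SEC is
redefined here so the fragment stands alone. It also simplifies one detail:
the real loop tests the two thresholds one after the other in the same pass,
whereas the sketch returns a single decision and checks the give-up limit
first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Redefined locally so the sketch compiles on its own. */
#define USECS_PER_SEC INT64_C(1000000)

/* Hypothetical names; the patch hard-codes these values inline. */
#define CANCEL_AFTER_SECS 180	/* try to cancel the blocked query */
#define GIVE_UP_AFTER_SECS 200	/* stop waiting for the cancel to work */

typedef enum
{
	WAIT_MORE,					/* keep polling */
	SEND_CANCEL,				/* first threshold crossed, cancel not yet sent */
	GIVE_UP						/* second threshold crossed, exit uncleanly */
} TimeoutAction;

/*
 * Decide what to do after elapsed_usec microseconds of waiting.
 * "canceled" says whether a cancel request has already been sent.
 * Both comparisons are strict, matching the "td > N * USECS_PER_SEC"
 * tests in the patch.
 */
static TimeoutAction
timeout_action(int64_t elapsed_usec, bool canceled)
{
	if (elapsed_usec > GIVE_UP_AFTER_SECS * USECS_PER_SEC)
		return GIVE_UP;
	if (elapsed_usec > CANCEL_AFTER_SECS * USECS_PER_SEC && !canceled)
		return SEND_CANCEL;
	return WAIT_MORE;
}
```

With these values a blocked step gets 180s before a cancel is attempted, and
the cancel then has a further 20s to take effect before isolationtester exits.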