Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests

Craig Ringer Sun, 25 Jun 2017 18:49:15 -0700

On 26 June 2017 at 05:10, Tom Lane <[email protected]> wrote:
> I've been experimenting with a change to pg_ctl, which I'll post
> separately, to reduce its reaction time so that it reports success
> more quickly after a wait for postmaster start/stop.  I found one
> case in "make check-world" that got a failure when I reduced the
> reaction time to ~1ms.  That's the very last test in 001_stream_rep.pl,
> "cascaded slot xmin reset after startup with hs feedback reset", and
> the cause appears to be that it's not allowing any delay time for a
> replication slot's state to update after a postmaster restart.
>
> This seems worth fixing independently of any possible code changes,
> because it shows that this test could fail on a slow or overloaded
> machine.  I couldn't find any instances of such a failure in the
> buildfarm archives, but that may have a lot to do with the fact that
> owners of slow buildfarm animals are (mostly?) not running this test.
>
> Some experimentation says that the minimum delay needed to make it
> work reliably on my workstation is about 100ms, so a simple patch
> along the lines of the attached might be good enough.  I find this
> approach conceptually dissatisfying, though, since it's still
> potentially vulnerable to the failure under sufficient load.
> I wonder if there is an easy way to improve that ... maybe convert
> to something involving poll_query_until?


This should do the trick:

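# Poll until the slot's xmin has been cleared, instead of sleeping for a fixed time.
$node_standby_1->poll_query_until('postgres',
	q[SELECT xmin IS NULL FROM pg_replication_slots WHERE slot_name = '] . $slotname_2 . q['])
  or die "timed out waiting for slot xmin to be reset";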
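
For anyone following along, here is a rough standalone sketch of why
polling is more robust than a fixed sleep. Note that poll_until is a
hypothetical helper written only for illustration, not the PostgresNode
implementation, and the closure merely simulates a slot whose xmin
becomes NULL after a few checks. The attempt count and delay are
arbitrary assumptions.

```perl
use strict;
use warnings;
use Time::HiRes qw(usleep);

# Hypothetical helper (an assumption, not PostgresNode code): call $check
# repeatedly until it returns true or $max_attempts checks have been made.
sub poll_until
{
	my ($check, $max_attempts, $delay_us) = @_;
	for my $attempt (1 .. $max_attempts)
	{
		return 1 if $check->();
		usleep($delay_us);    # back off briefly before the next check
	}
	return 0;                 # gave up: condition never became true
}

# Stand-in for the real query: pretend the slot's xmin is cleared
# only on the fifth check after the restart.
my $calls        = 0;
my $xmin_is_null = sub { return ++$calls >= 5; };

# Up to 100 checks, 10ms apart. A fast machine finishes after a few
# checks; a slow one just keeps polling until the overall limit.
my $ok = poll_until($xmin_is_null, 100, 10_000);
print $ok ? "condition reached after $calls checks\n" : "timed out\n";
```

The point is that the test waits only as long as it needs to, so it
neither slows down fast machines nor fails on an overloaded one unless
the overall limit is exceeded.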




-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests
