On 26 June 2017 at 05:10, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I've been experimenting with a change to pg_ctl, which I'll post
> separately, to reduce its reaction time so that it reports success
> more quickly after a wait for postmaster start/stop. I found one
> case in "make check-world" that got a failure when I reduced the
> reaction time to ~1ms. That's the very last test in 001_stream_rep.pl,
> "cascaded slot xmin reset after startup with hs feedback reset", and
> the cause appears to be that it's not allowing any delay time for a
> replication slot's state to update after a postmaster restart.
>
> This seems worth fixing independently of any possible code changes,
> because it shows that this test could fail on a slow or overloaded
> machine. I couldn't find any instances of such a failure in the
> buildfarm archives, but that may have a lot to do with the fact that
> owners of slow buildfarm animals are (mostly?) not running this test.
>
> Some experimentation says that the minimum delay needed to make it
> work reliably on my workstation is about 100ms, so a simple patch
> along the lines of the attached might be good enough. I find this
> approach conceptually dissatisfying, though, since it's still
> potentially vulnerable to the failure under sufficient load.
> I wonder if there is an easy way to improve that ... maybe convert
> to something involving poll_query_until?

This should do the trick:

    $node_standby_1->poll_query_until('postgres',
        q[SELECT xmin IS NULL from pg_replication_slots WHERE slot_name = ']
        . $slotname_2 . q[']);

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
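
For anyone following the thread who wants to see why polling is more robust than a fixed delay, here is a rough, self-contained sketch of the general poll-until pattern. It is not PostgresNode's actual implementation of poll_query_until; the helper name poll_until, its parameters, and the stand-in check are all hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper (not part of PostgresNode): call $check repeatedly
# until it returns true or $timeout seconds have elapsed.
# Returns 1 on success, 0 on timeout.
sub poll_until
{
	my ($check, $timeout, $interval) = @_;
	my $deadline = time() + $timeout;
	while (1)
	{
		return 1 if $check->();
		return 0 if time() >= $deadline;
		sleep($interval);
	}
}

# Stand-in for the slot query: becomes true on the third call, as if the
# slot's xmin were cleared shortly after the restart.
my $calls = 0;
my $ok = poll_until(sub { return ++$calls >= 3 }, 5, 0.01);
print $ok ? "condition met after $calls checks\n" : "timed out\n";
```

Unlike a fixed sleep, this returns as soon as the condition holds on a fast machine and keeps waiting (up to the timeout) on a slow or overloaded one, so the test only fails if the state genuinely never updates.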