> On 20 Aug 2021, at 20:47, Tom Lane <[email protected]> wrote:
> 
> Daniel Gustafsson <[email protected]> writes:
>> If we want the test to run but not fail the entire test suite if it fails then
>> it should use a TODO block instead, but that’s intended for tests known to fail
>> and this doesn’t seem to fall in that category.
> 
> That seems pretty useless.  If we did break things in this area,
> such a test would not help us notice.

For sure.  I wasn’t advocating it, merely pointing out that the SKIP block
doesn’t work the way it was described upthread.

> The problem with the test seems blindingly obvious from here: it
> is assuming first that psql will start fast enough to print its
> PID within one second, and next that we'll be able to issue
> the cancel (and have the backend react) in less than 2 seconds
> more.  This seems about guaranteed to fail on cache-clobber
> animals, for example, but animals that are merely slow or overloaded
> would have issues too.
> 
> I think you should drop the overly-cute bit with a SIGALRM handler,
> and instead have a loop-with-delay around an attempt to read the
> psql.pid file, after launching the psql run without an immediate
> wait for termination.  That gets rid of the first problem (though
> you still want the loop to time out eventually; it could wait up
> to, say, 180 seconds, as we do elsewhere).  Then the second problem
> is easy to solve by making the pg_sleep delay twice as much.

This could perhaps be done with a PostgresNode::interactive_psql session?  I
used that in a similar, but far from the same, test setup in the online
checksums patchset.
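
For reference, the loop-with-delay idea quoted above could look roughly like
the following.  This is only a sketch, written in Python for illustration
rather than in the Perl of the TAP suite; the function name
wait_for_pid_file, its parameters, and the assumption that the PID is written
as a single newline-terminated line are all hypothetical, not taken from the
patch.

```python
import time


def wait_for_pid_file(path, timeout=180.0, interval=0.1):
    """Poll `path` until it holds a complete PID line.

    Returns the PID as an int, or None if `timeout` seconds pass first.
    Assumes (hypothetically) that the writer emits the PID followed by a
    newline, so a missing newline is treated as a partial write.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as f:
                text = f.read()
            # Accept only a fully written, purely numeric line.
            if text.endswith("\n") and text.strip().isdigit():
                return int(text)
        except FileNotFoundError:
            # psql has not created the file yet; keep waiting.
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

The caller would start psql without waiting for it to exit, call this with the
path to psql.pid, and fail the test if it gets None back; only after a PID is
returned would it send the cancel.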

--
Daniel Gustafsson               https://vmware.com/


