> On 20 Aug 2021, at 20:47, Tom Lane <[email protected]> wrote:
>
> Daniel Gustafsson <[email protected]> writes:
>> If we want the test to run but not fail the entire test suite if it fails then
>> it should use a TODO block instead, but that’s intended for tests known to fail
>> and this doesn’t seem to fall in that category.
>
> That seems pretty useless.  If we did break things in this area,
> such a test would not help us notice.

For sure.  I wasn’t advocating it, merely indicating that the SKIP block isn’t
working the way attributed to upthread.

> The problem with the test seems blindingly obvious from here: it
> is assuming first that psql will start fast enough to print its
> PID within one second, and next that we'll be able to issue
> the cancel (and have the backend react) in less than 2 seconds
> more.  This seems about guaranteed to fail on cache-clobber
> animals, for example, but animals that are merely slow or overloaded
> would have issues too.
>
> I think you should drop the overly-cute bit with a SIGALRM handler,
> and instead have a loop-with-delay around an attempt to read the
> psql.pid file, after launching the psql run without an immediate
> wait for termination.  That gets rid of the first problem (though
> you still want the loop to timeout eventually, it could wait up
> to say 180 seconds, as we do elsewhere).  Then the second problem
> is easy to solve by making the pg_sleep delay twice as much.

This could perhaps be done with a PostgresNode::interactive_psql session?  I
used that in a similar, but far from the same, test setup in the online
checksums patchset.

--
Daniel Gustafsson		https://vmware.com/
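
For illustration only, the loop-with-delay around reading psql.pid that Tom
describes could look roughly like the sketch below.  It is written in Python
rather than the Perl the TAP test would actually use, and the function name
wait_for_pid_file, its parameters, and the 0.1 second poll interval are all
made up here; only the psql.pid file and the 180 second cap come from the
discussion above.

```python
import time


def wait_for_pid_file(path, timeout=180.0, interval=0.1):
    """Poll `path` until it holds a PID, or give up after `timeout` seconds.

    Hypothetical helper: the name, signature, and poll interval are
    assumptions made for this sketch, not part of any PostgreSQL test code.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as f:
                text = f.read().strip()
            # The file may exist but still be empty or half-written,
            # so only accept content that is entirely digits.
            if text.isdigit():
                return int(text)
        except FileNotFoundError:
            # psql has not created the file yet; keep waiting.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no PID in {path} after {timeout} seconds")
        time.sleep(interval)
```

A caller would launch psql without waiting for it to exit, call
wait_for_pid_file("psql.pid"), and only then send the cancel.  A slow animal
then just takes longer instead of failing, and a psql that never starts still
ends the test after the timeout.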
