Andrew Dunstan <andrew.duns...@2ndquadrant.com> writes:
> It turns out I was wrong about the problem jacana has been having with
> the pg_ctl tests hanging. The problem was not the use of select as a
> timeout mechanism, although I think the change to using
> Time::HiRes::usleep() is correct and shouldn't be reverted.
>
> The problem is command_like's use of redirection to strings. Why this
> should be a problem for this particular use is a matter of speculation.
> I suspect it's to do with the fact that in this instance pg_ctl is
> leaving behind some child processes (i.e. postmaster and children) after
> it exits, and so on this particular path IPC::Run isn't detecting the
> exit properly. The workaround I have found to work is to redirect
> command_like's output instead to a couple of files and then slurp in
> those files and delete them. A bit hacky, I know, so I'm open to other
> suggestions.

Yeah, I'd been eyeing that behavior of IPC::Run a month or so back,
though from the opposite direction. If you are reading either stdout
or stderr of the executed command into Perl, then it detects command
completion by waiting till it gets EOF on those stream(s). If you are
reading neither, then it goes into this wonky backoff behavior where it
sleeps a bit and then checks waitpid(WNOHANG), with the value of "a bit"
continually increasing until it reaches a fairly large value, half a
second or a second (I forget). So you have potentially some sizable
fraction of a second that's just wasted after command termination.
I'd been able to make a small but noticeable improvement in the runtime
of some of our TAP test suites by forcing the first behavior, ie reading
stdout even if we were going to throw it away.

So I'm not really that excited about going in the other direction ;-).
It shouldn't matter much time-wise for short-lived commands, but it's
disturbing if the EOF technique fails entirely for some cases.

I looked at jacana's two recent pg_ctlCheck failures, and they both
seem to have failed on this:

    command_like([ 'pg_ctl', 'start', '-D', "$tempdir/data",
                   '-l', "$TestLib::log_path/001_start_stop_server.log" ],
                 qr/done.*server started/s, 'pg_ctl start');

That is redirecting the postmaster's stdout/stderr into a file, for sure,
so the child processes shouldn't impact EOF detection AFAICS.
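
For anyone following along, here is a rough sketch in core Perl of the two
completion-detection modes described above. It is not IPC::Run's code: the
sub names run_until_eof, run_to_files and slurp are hypothetical, and the
10 ms starting interval and 0.5 s cap are guesses standing in for whatever
backoff values IPC::Run really uses. run_to_files also mimics the proposed
workaround of sending output to files and slurping them afterwards, which
is exactly the "reading neither stream" case that falls back to polling.

```perl
#!/usr/bin/perl
# Sketch only: NOT IPC::Run's implementation. Sub names and backoff
# values are hypothetical.
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(usleep time);
use File::Temp qw(tempfile);

# Mode 1: read the child's stdout into Perl and treat EOF as completion.
# EOF arrives only after EVERY process holding the pipe's write end has
# closed it, including background grandchildren that inherited it.
sub run_until_eof {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "cannot run @cmd: $!";
    my $out = do { local $/; <$fh> };    # blocks until EOF
    close($fh);                          # reaps the child; status lands in $?
    return ($out // '', $? >> 8);
}

# Mode 2 (like the file-redirect workaround): send stdout/stderr to temp
# files, poll waitpid(WNOHANG) with a growing sleep, then slurp the files.
sub run_to_files {
    my @cmd = @_;
    my ($out_fh, $out_file) = tempfile(UNLINK => 1);
    my ($err_fh, $err_file) = tempfile(UNLINK => 1);
    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {
        open(STDOUT, '>&', $out_fh) or POSIX::_exit(127);
        open(STDERR, '>&', $err_fh) or POSIX::_exit(127);
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }
    my $delay_us = 10_000;     # hypothetical first interval: 10 ms
    my $max_us   = 500_000;    # hypothetical cap: 0.5 s
    while (waitpid($pid, WNOHANG) == 0) {
        usleep($delay_us);     # up to one full interval is wasted after exit
        $delay_us *= 2;
        $delay_us = $max_us if $delay_us > $max_us;
    }
    my $status = $? >> 8;
    return (slurp($out_file), slurp($err_file), $status);
}

# Read a whole file into a string.
sub slurp {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "cannot read $file: $!";
    local $/;
    my $text = <$fh>;
    return $text // '';
}

# The hazard Andrew suspected: the backgrounded sleep inherits stdout,
# so EOF waits for it even though sh itself exits at once.
my $t0 = time;
run_until_eof('sh', '-c', 'sleep 2 & echo started');
printf "grandchild inherits stdout: %.1fs\n", time - $t0;

# Redirecting the grandchild's output (as pg_ctl -l does for the
# postmaster) lets EOF arrive as soon as sh exits.
$t0 = time;
run_until_eof('sh', '-c', 'sleep 2 >/dev/null 2>&1 & echo started');
printf "grandchild redirected:      %.1fs\n", time - $t0;
```

On a Unix-like system the first timing should come out near two seconds and
the second near zero. That is the crux of the objection: with -l, the
postmaster's output is not on the pipe at all, so a lingering postmaster
should not be able to delay EOF the way the first case does.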
It's also hard to explain this way why it only fails some of the time.
I think we need to look at what the recent changes were in this area
and try to form a better theory of why it's started to fail here.

regards, tom lane