At 10:04 AM 4/22/2002 -0400, Charles Lane wrote:
>The ones that I've seen are hanging in the "my_waitpid" code. Here's
>what seems to be going on:
>
>    --> pipe to/from subprocess, it does its thing and exits
>    --> pipe code picks up exit via termination ast, deletes pipe structs
>    --> my_waitpid called from Perl:
>           doesn't find match to pipe (it was deleted)...
>           does a getjpi and finds a termination mbx
>           tries to read termination mbx, hangs forever

Urk. I considered the my_waitpid stuff, but had ruled it out because I
trusted its ability to know who was a pipe subprocess and who wasn't.

>Now, the piping code does *not* set up termination mailboxes...it uses
>LIB$SPAWN to create subprocesses, and LIB$SPAWN does not give you that
>option.
>
>Why? It looks like LIB$SPAWN is using a termination mailbox
>internally, to trigger the termination AST that we're waiting for.
>
>So if we are successful in opening a channel to the termination mbx and
>grabbing the termination message, we'll mess up whatever code was
>waiting for that message.

Really? What prevents two readers from reading the same thing?

>But if we don't grab the termination message,
>we hang forever.
>
>Triggering this problem is timing dependent (I triggered it on a variety
>of the torture tests... a bit of delay here or there could change which
>test was more likely to hang), because it has to occur:
>   (a) AFTER the termination message goes to the piping code, so that
>       the pid is removed from the "open pipes" list.
>and (b) BEFORE the process is finally deleted by VMS, so that getjpi
>       still returns successfully.
>
>Possible action items:
>   keep a list of pipe/subprocess PIDs around to match with waitpid calls
>      (a memory leak if we keep all of them...just the last N perhaps?)
>   get rid of the attempts to grab termination mailboxes

What about putting a timeout on the $qiow that is reading from the
termination mailbox? If it completes with a timeout, we can requeue it,
but in the meanwhile whatever else was pending should have a chance to
fire.

We really need to get the pipe torture tests or some subset of them into
the test suite so we catch these problems when they first arise.
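
For what it's worth, the "just the last N perhaps?" idea from the action
items could be sketched as a small fixed-size ring buffer, which bounds
the memory use. This is only an illustration in portable C, not code
from vms.c: the names recent_remember, recent_lookup and RECENT_MAX, the
value of N, and the use of unsigned long for the pid are all assumptions
made for the example.

```c
#include <stddef.h>

/* Hypothetical N: how many exited pipe subprocesses to remember. */
#define RECENT_MAX 8

/* One remembered exit; 'valid' marks slots that hold a live entry. */
struct recent_exit {
    unsigned long pid;
    int status;
    int valid;
};

static struct recent_exit recent[RECENT_MAX];
static size_t recent_next = 0;   /* slot the next entry will overwrite */

/* Record that a pipe subprocess exited. Once the buffer is full, the
 * oldest entry is overwritten, so memory use stays fixed. */
void recent_remember(unsigned long pid, int status)
{
    recent[recent_next].pid = pid;
    recent[recent_next].status = status;
    recent[recent_next].valid = 1;
    recent_next = (recent_next + 1) % RECENT_MAX;
}

/* Look up a pid on behalf of a waitpid-style call. Returns 1 and stores
 * the exit status if the pid is still remembered, else returns 0. A hit
 * consumes the entry, so a second wait on the same pid does not match. */
int recent_lookup(unsigned long pid, int *status)
{
    size_t i;

    for (i = 0; i < RECENT_MAX; i++) {
        if (recent[i].valid && recent[i].pid == pid) {
            *status = recent[i].status;
            recent[i].valid = 0;
            return 1;
        }
    }
    return 0;
}
```

The intent would be for the pipe cleanup to call recent_remember() at the
point where it currently deletes the pipe structs, and for the wait code
to try recent_lookup() before falling back to getjpi and the termination
mailbox. A pid that has aged out of the last N would still take the old
path, so this narrows the window rather than closing it.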
