Re: piping hangs again

Craig A. Berry Mon, 22 Apr 2002 14:31:59 -0700

At 04:52 PM 4/22/2002 -0400, Charles Lane wrote:

>(it sounded like you were assuming that a read doesn't remove the
>message; it does)


I guess I was assuming that whatever writes the termination message will 
satisfy an arbitrary number of readers. I'm not sure where I got this 
impression, and it may be completely wrong.  I could swear I saw an example 
somewhere (possibly in the Apache sources?) of reading a termination mailbox 
where the reader discarded a message and requeued itself if the pid didn't 
match (perhaps it belonged to a different child of the same parent) or the 
message type was anything other than process termination (since the same mbx 
is used, as you indicate below, for other purposes).  This would only work if 
the writer were willing to satisfy multiple requests since otherwise, as you 
suggest, whoever got there first would steal the message from the others.
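
To make the concern concrete, here is a toy model in portable C of the discard-and-requeue pattern I half remember. It is only a sketch: `struct msg`, `struct mbx`, `mbx_read` and `wait_for_pid` are names invented for illustration, not anything in vms.c or the system services, and the one assumption built in is the one you describe, namely that a read removes the message from the mailbox.

```c
#include <stddef.h>

/* Hypothetical message layout for illustration only; a real termination
   message is an accounting record, not this struct. */
enum msg_type { MSG_TERMINATION, MSG_OTHER };

struct msg {
    enum msg_type type;
    unsigned int  pid;
};

/* Toy mailbox: a fixed array consumed from the front.  Reading a message
   removes it, so no later reader can ever see it. */
struct mbx {
    const struct msg *msgs;
    size_t            count;
    size_t            next;    /* index of the next unread message */
};

/* Returns 1 and fills *out if a message was available, otherwise 0. */
static int mbx_read(struct mbx *m, struct msg *out)
{
    if (m->next >= m->count)
        return 0;
    *out = m->msgs[m->next++];
    return 1;
}

/* Read until a termination message for `pid` turns up, discarding
   everything else.  Returns the number of messages discarded along the
   way, or -1 if the mailbox drained without a match. */
static int wait_for_pid(struct mbx *m, unsigned int pid)
{
    struct msg got;
    int discarded = 0;

    while (mbx_read(m, &got)) {
        if (got.type == MSG_TERMINATION && got.pid == pid)
            return discarded;
        discarded++;           /* consumed here, lost to every other reader */
    }
    return -1;
}
```

If the mailbox holds a termination message for one child followed by one for a second child, a reader waiting on the second child throws the first message away, and anyone who later waits on the first child never sees it.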

>> It looked to me like the only time we could fail to recognize a pipe
>> subprocess was when my_pclose had already initiated shutdown with either
>> $delprc or $forcex.

>Not true, I was seeing the hang even when the child process exits
>normally. 

In which case the process would still be in the midst of getting deleted, 
right?

>> --- vms/vms.c;-0        Tue Apr  9 14:26:06 2002
>> +++ vms/vms.c   Mon Apr 22 12:14:09 2002
>>                memset((void *) &trmmsg, 0, sizeof(trmmsg));
>> -              sts = sys$qiow(0,mbxchan,IO$_READVBLK,&qio_iosb,0,0,
>> +              sts = sys$qiow(0,mbxchan,IO$_READVBLK|IO$M_WRITERCHECK,&qio_iosb,0,0,
>>                               &trmmsg,ACC$K_TERMLEN,0,0,0,0);
> 
>it's that WRITERCHECK that's bailing you out here;  my bet is that
>the termination message is going back to the parent process, and the
>child is deleted.  After that happens, there is no "writer" and the
>read will terminate with an error.

The WRITERCHECK made no discernible difference by itself.  I left it in 
because it seemed like the right thing to do, though if the parent has a 
read/write channel open then the WRITERCHECK will be ignored.  It was 
definitely the addition of a check for process deletion in progress that 
kept it from hanging.  

>Which isn't that bad a way of getting a "wait for termination" without
>having to do polling, but it still is problematic.
>
>For example (and I have a program that does stuff like this):
>
>    Process A creates a "general purpose message mailbox", opens read/write
>        channel to that mailbox, queues a read to mbx.
>    Process A spawns child B, with the term mbx -> general purpose mbx
>    Process A spawns child C, with the term mbx -> general purpose mbx

Eek!  Why is process A using its general purpose mbx as the termination mbx 
of its children?  That will definitely cause hilarity, if not mayhem.  It 
couldn't happen in the piping code since you use lib$spawn instead of 
sys$creprc.

>In this case, the WRITERCHECK won't help, because A maintains a r/w
>channel to the mailbox.  (the write is used to put "internal" messages
>in its input queue).
>
>Perl is waiting for child B, so you queue a read to the mbx; child B
>terminates, but since process A had a read queued first, it gets the
>termination message. 
>
>Then A queues another read.
>
>Then child C exits, giving another termination message, or perhaps
>something else writes to the mailbox (a DECnet event, for example).
>
>Since Perl's waitpid queued its read before A's, waitpid will get the
>message and A won't.  Hilarity ensues.
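
For my own benefit, the sequence described above can be modeled in a few lines of portable C. This is a sketch under the assumption the description itself makes, that each write completes the oldest queued read; `struct mbx`, `queue_read` and `write_msg` are made-up names, and readers are identified by a single character ('A' for process A, 'P' for Perl's waitpid).

```c
#include <stddef.h>

#define MAX_PENDING 8

/* Toy model: a pending read is just the name of whoever queued it.
   The array is not circular, so at most MAX_PENDING reads can ever be
   queued; that is enough for a short demonstration. */
struct mbx {
    char   pending[MAX_PENDING];
    size_t head;    /* oldest read still waiting */
    size_t tail;    /* one past the most recently queued read */
};

/* Queue a read on behalf of `reader`.  Returns 1 on success, 0 if the
   toy queue is full. */
static int queue_read(struct mbx *m, char reader)
{
    if (m->tail >= MAX_PENDING)
        return 0;
    m->pending[m->tail++] = reader;
    return 1;
}

/* Write one message.  Returns the reader whose read it completed, or 0
   if no read was pending. */
static char write_msg(struct mbx *m)
{
    if (m->head >= m->tail)
        return 0;
    return m->pending[m->head++];
}
```

Starting from an empty `struct mbx`, queue reads for 'A' and then 'P'. The first `write_msg` (child B's termination) returns 'A'. After A queues another read, the second `write_msg` (child C's termination, or anything else) returns 'P', so waitpid ends up holding a message that was never meant for it.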

It's got to be a bit more complicated than that.  Perhaps the writer can 
satisfy multiple readers but not an arbitrary number of them, i.e., perhaps 
it knows how many children it has (not how many read channels are open) and 
what I've done is create a sort of musical chairs situation where one child 
always gets left without a mailbox message.

>In the above example, the problem is that while a CHILD subprocess has
>a single termination mailbox, that mailbox can also be used by the
>PARENT for many disparate purposes, and while we *can* break into the
>communication between parent and child, we probably shouldn't unless
>we know exactly what we're doing.  Which is not something one can do
>in a "general purpose" utility routine.

Possibly not.  Unless someone can help us out with how termination mailboxes 
really work, we should probably ditch waitpid's attempt to use them for now.
