Re: piping hangs again

Charles Lane Mon, 22 Apr 2002 13:34:45 -0700

Craig A. Berry writes:
> At 02:25 PM 4/22/2002 -0400, Charles Lane wrote:
> >Craig A. Berry writes:
> >> At 10:04 AM 4/22/2002 -0400, Charles Lane wrote:


> >> >So if we are successful in opening a channel to the termination mbx and
> >> >grabbing the termination message, we'll mess up whatever code was
> >> >waiting for that message.  
> >
> >> Really?  What prevents two readers from reading the same thing?
> >
> >Well, whoever reads the message first causes the message to be removed
> >from the mailbox.  

> If only the $creprc caller can safely read from the termination 
> mailbox, then yes, my assumptions in making waitpid do so are off base.  I 
> wonder why lib$spawn even bothers with a termination mailbox then.

I'm getting a strong feeling that one (or both) of us is misunderstanding!

As I understand mailbox I/O....

    process A creates termination mbx
    process A queues a read to mbx
    process A spawns subprocess B

    process C (perl interloper) queues a read to mbx
    process B terminates, writing message to mbx

    THEN, either: 
       process A has its read complete, gets the message, *which is removed
           from the mailbox*
    OR:
       process C has its read complete, gets the message, *which is removed
            from the mailbox*

(it sounded like you were assuming that a read doesn't remove the
message; it does)

Which one of these occurs? Either it's random -OR- it depends on the order
in which they are queued, in which case A will always win. (Thinking about
it, the `queue order' sounds right).

Random or not, the result is undesirable for Perl's waitpid.

I think this is basic mailbox i/o.
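
The "queue order" reading above can be sketched as a toy model in portable
C. This is only an illustration under that assumption, not real $QIO code:
ToyMbx, toy_queue_read, toy_write and toy_first_scenario are made-up names,
and 'A' and 'C' are the processes from the list above.

```c
#include <assert.h>
#include <string.h>

#define MAX_READERS 8

/* Toy mailbox (hypothetical): pending reads kept in the order queued. */
typedef struct {
    char readers[MAX_READERS];   /* one-letter process names, oldest first */
    int  nreaders;
} ToyMbx;

/* Queue a read on behalf of process `who`; returns 0, or -1 if full. */
static int toy_queue_read(ToyMbx *m, char who)
{
    assert(m != NULL);
    if (m->nreaders >= MAX_READERS)
        return -1;
    m->readers[m->nreaders++] = who;
    return 0;
}

/* Write one message.  It completes the oldest pending read and is then
   gone from the mailbox.  Returns the receiving process, or 0 if no read
   was pending (this toy model does not store unread messages). */
static char toy_write(ToyMbx *m)
{
    char winner;

    assert(m != NULL);
    if (m->nreaders == 0)
        return 0;
    winner = m->readers[0];
    m->nreaders--;
    memmove(m->readers, m->readers + 1, (size_t)m->nreaders);
    return winner;
}

/* The sequence from the text: A queues a read, C queues a read, then B
   terminates and its termination message is written. */
static char toy_first_scenario(void)
{
    ToyMbx m = { {0}, 0 };

    toy_queue_read(&m, 'A');
    toy_queue_read(&m, 'C');
    return toy_write(&m);
}
```

In this model toy_first_scenario() reports that A receives the message, and
C's read is left pending with nothing to complete it.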

> >Here's a patch that implements my suggested changes to waitpid; give it 
> >a shot.  It seems to get us through the pipe torture tests. 

> OK, I'll try it.  And just to confuse things, here's an alternate patch.  It 
> looked to me like the only time we could fail to recognize a pipe subprocess 
> was when my_pclose had already initiated shutdown with either $delprc or 
> $forcex.  So, this just checks to see if the process is in one of those 
> states and refrains from posting a read to the termination mailbox.  It also 
> protects itself from the case of no writer on the mailbox existing.  
> test_pipe.pl looks good after this, but if a successful read from the 
> mailbox messes up another reader then your patch is probably better.

Not true; I was seeing the hang even when the child process exits
normally.  The pipe structure is deleted after the i/o completes (EOF
to/from child) and a termination AST is received...do a waitpid after
that, and the child process won't be in the list of pipe structures.

> --- vms/vms.c;-0        Tue Apr  9 14:26:06 2002
> +++ vms/vms.c   Mon Apr 22 12:14:09 2002
>                memset((void *) &trmmsg, 0, sizeof(trmmsg));
> -              sts = sys$qiow(0,mbxchan,IO$_READVBLK,&qio_iosb,0,0,
> +              sts = sys$qiow(0,mbxchan,IO$_READVBLK|IO$M_WRITERCHECK,&qio_iosb,0,0,
>                               &trmmsg,ACC$K_TERMLEN,0,0,0,0);
 
It's that WRITERCHECK that's bailing you out here; my bet is that
the termination message is going back to the parent process, and the
child is deleted.  After that happens, there is no "writer" and the
read will terminate with an error.

Which isn't that bad a way of getting a "wait for termination" without
having to do polling, but it still is problematic.

For example (and I have a program that does stuff like this):

    Process A creates a "general purpose message mailbox", opens read/write
        channel to that mailbox, queues a read to mbx.
    Process A spawns child B, with the term mbx -> general purpose mbx
    Process A spawns child C, with the term mbx -> general purpose mbx

In this case, the WRITERCHECK won't help, because A maintains a r/w
channel to the mailbox.  (the write is used to put "internal" messages
in its input queue).

Perl is waiting for child B, so you queue a read to the mbx; child B
terminates, but since process A had a read queued first, it gets the
termination message.  

Then A queues another read.

Then child C exits, giving another termination message, or perhaps
something else writes to the mailbox (a DECnet event, for example).

Since Perl's waitpid queued its read before A's new read, waitpid will
get the message and A won't.  Hilarity ensues.
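
The sequence above can be replayed with a small toy model, assuming reads
complete strictly in the order they were queued. This is portable C for
illustration only; every name in it is made up, with 'A' standing for the
parent and 'P' for Perl's waitpid.

```c
#include <assert.h>
#include <string.h>

#define MAX_READERS 8

/* Toy mailbox (hypothetical): pending reads, oldest first. */
typedef struct {
    char readers[MAX_READERS];   /* 'A' = parent, 'P' = Perl's waitpid */
    int  nreaders;
} ToyMbx;

/* Queue a read on behalf of `who`. */
static void toy_queue_read(ToyMbx *m, char who)
{
    assert(m->nreaders < MAX_READERS);
    m->readers[m->nreaders++] = who;
}

/* Deliver one message to the oldest pending read and drop that read. */
static char toy_deliver(ToyMbx *m)
{
    char winner;

    assert(m->nreaders > 0);
    winner = m->readers[0];
    m->nreaders--;
    memmove(m->readers, m->readers + 1, (size_t)m->nreaders);
    return winner;
}

/* Replay the sequence from the text.  got[0] is who receives B's
   termination message, got[1] is who receives C's. */
static void toy_replay(char got[2])
{
    ToyMbx m = { {0}, 0 };

    toy_queue_read(&m, 'A');     /* A's standing read */
    toy_queue_read(&m, 'P');     /* waitpid, waiting for child B */
    got[0] = toy_deliver(&m);    /* B terminates */
    toy_queue_read(&m, 'A');     /* A queues another read */
    got[1] = toy_deliver(&m);    /* C terminates */
}
```

Under that assumption A ends up with B's message and waitpid ends up with
C's, so waitpid returns for the wrong child and A never hears about C.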

In the above example, the problem is that while a CHILD subprocess has
a single termination mailbox, that mailbox can also be used by the
PARENT for many disparate purposes.  So while we *can* break into the
communication between parent and child, we probably shouldn't unless
we know exactly what we're doing, which is not something one can do
in a "general purpose" utility routine.

About the only "general purpose" utility for getting at termination and
status information on other processes is using the accounting facilities,
which are likely to be too tightly protected for casual use by waitpid.
--
 Drexel University       \V                    --Chuck Lane
======]---------->--------*------------<-------[===========
     (215) 895-1545     _/ \  Particle Physics
FAX: (215) 895-5934     /\ /~~~~~~~~~~~        [EMAIL PROTECTED]
