Several people have been able to reproduce a problem building Perl with MMK on OpenVMS Alpha v7.3-1. The symptom is a lot of looping in kernel mode, with the parent process doing tens of thousands of DIOs per second (or so MONITOR and SHOW PROC/CONT claim). I believe the sequence of events when shutting down a child process in MMK is tripping over something introduced as part of the performance improvements in 7.3-1, perhaps those affecting AST delivery or those affecting mailbox I/O. The performance improvements in 7.3-1 are summarized here:
<http://www.openvms.compaq.com/doc/731FINAL/6657/6657pro.html#pfeat>

What it comes down to is that the write attention AST, which is supposed to set up notification for when the child wants to send info to the parent via a mailbox, keeps requeueing itself indefinitely even if the child no longer exists. This is timing sensitive and doesn't always happen, probably because if the parent image manages to exit soon enough the looping sequence never gets started, or perhaps because it sometimes manages to delete the mailbox quickly enough.

I have no idea why a successfully queued write attention AST is counted as a DIO rather than a BIO. I also don't understand why the write attention AST would continually fire when the only process that could be writing to the mailbox no longer exists; this may well be a bug in some of the new VMS code. However, it seems clear that you never want to queue a write attention AST when you know for sure the writer is already gone. I've made the modifications described below to accomplish that.

All may not be peachy yet, however. I did see an accvio once after this patch, though I could not reproduce it, so there may still be something fishy going on in the shutdown sequence. Reinvoking MMK after the accvio completed the build successfully. I also got Perl 5.8.0 to run its test suite, which uses MMK extensively, and it passed all tests.

I've patched echo_ast in build_target.c to store and return the status it gets from sp_receive. I've patched sp_wrtattn_ast in sp_mgr.c to check the return status it gets from echo_ast. If sp_wrtattn_ast gets SS$_NONEXPR, it knows it does not need to queue itself again: the child no longer exists, so there's not much point in wanting to be notified when it writes to its output mailbox.

These changes against MMK 3.9-3 are available below as a GNU unified diff and also as the output of DIFFERENCES/SLP. The former can be applied with GNU patch and the latter with EDIT/SUM.
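For anyone who wants to see the idea without a VMS system handy, here is a minimal portable C sketch of the re-arm decision. It is only a model under stated assumptions: the sim_* names, the sim_ctx structure, and the SIM_* status values are all hypothetical stand-ins (they are not MMK routines or real SS$_ values), and the "keeps firing while armed" behavior in sim_run simply imitates the symptom described above rather than explaining it.

```c
/* Stand-in status codes for illustration only; these are NOT the real
 * OpenVMS values of SS$_NORMAL or SS$_NONEXPR. */
enum { SIM_NORMAL = 1, SIM_NODATA = 2, SIM_NONEXPR = 4 };

/* Hypothetical model of the parent's view of one subprocess. */
typedef struct {
    int child_alive;   /* nonzero while the simulated child exists */
    int pending;       /* messages waiting in the simulated mailbox */
    int armed;         /* nonzero if a write attention request is queued */
} sim_ctx;

/* Stand-in for sp_receive: one message, or the reason there is none. */
static int sim_receive(sim_ctx *ctx)
{
    if (ctx->pending > 0) {
        ctx->pending--;
        return SIM_NORMAL;
    }
    return ctx->child_alive ? SIM_NODATA : SIM_NONEXPR;
}

/* Stand-in for the patched echo_ast: drain the mailbox, then report
 * the status that ended the read loop instead of a fixed success. */
static int sim_echo(sim_ctx *ctx)
{
    int status;
    while ((status = sim_receive(ctx)) == SIM_NORMAL) {
        /* the real routine would echo the received text here */
    }
    return status;
}

/* Stand-in for sp_wrtattn_ast. With guard == 0 it always re-arms (the
 * old behavior); with guard != 0 it skips re-arming once the child is
 * reported gone. */
static int sim_wrtattn(sim_ctx *ctx, int guard)
{
    int status = sim_echo(ctx);
    ctx->armed = (!guard || status != SIM_NONEXPR);
    return status;
}

/* Fires the handler for as long as it stays armed, up to max_fires
 * times. Returns how many times the handler ran. */
static int sim_run(sim_ctx *ctx, int guard, int max_fires)
{
    int fires = 0;
    ctx->armed = 1;
    while (ctx->armed && fires < max_fires) {
        fires++;
        sim_wrtattn(ctx, guard);
    }
    return fires;
}
```

Calling sim_run on a context whose child is already gone runs until the max_fires cap when guard is 0, but stops after a single pass when guard is nonzero. A context with a live child keeps re-arming in both cases, which is the point: the guard only suppresses the requeue once the reader has been told the writer no longer exists.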
--- build_target.c;-0   Mon Dec 28 07:00:47 1998
+++ build_target.c      Thu Oct 17 13:42:12 2002
@@ -869,3 +869,3 @@
 **  Keeps reading the output and echoing it until it gets the magic
-**  end-of-command text.
+**  end-of-command text or the read fails.
 **
@@ -891,5 +891,6 @@
     $DESCRIPTOR(end_marker,EOM_TEXT);
+    unsigned int status;
     INIT_DYNDESC(rcvstr);
 
-    while (OK(sp_receive(&spctx, &rcvstr, 0))) {
+    while (OK(status = sp_receive(&spctx, &rcvstr, 0))) {
         if (rcvstr.dsc$w_length > EOM_LEN &&
@@ -909,3 +910,3 @@
     }
-    return SS$_NORMAL;
+    return status;
 } /* echo_ast */
--- sp_mgr.c;-0 Mon Dec 28 06:15:58 1998
+++ sp_mgr.c    Thu Oct 17 14:06:42 2002
@@ -451,9 +451,10 @@
     unsigned int status;
 
     status = (ctx->rcvast)(ctx->astprm);
-    sys$qiow(0, ctx->outchn, IO$_SETMODE|IO$M_WRTATTN, 0, 0, 0,
-            sp_wrtattn_ast, ctx, 0, 0, 0, 0);
-
+    if (status != SS$_NONEXPR) {
+        sys$qiow(0, ctx->outchn, IO$_SETMODE|IO$M_WRTATTN, 0, 0, 0,
+                sp_wrtattn_ast, ctx, 0, 0, 0, 0);
+    }
     return status;
 }
 
[end of patch]

$ type build_target.dif
-  870,  870
**  end-of-command text or the read fails.
-  892,  894
    unsigned int status;
    INIT_DYNDESC(rcvstr);

    while (OK(status = sp_receive(&spctx, &rcvstr, 0))) {
-  910,  910
    return status;
/
[end of BUILD_TARGET.DIF]

$ type sp_mgr.dif
-  454,  456
    if (status != SS$_NONEXPR) {
        sys$qiow(0, ctx->outchn, IO$_SETMODE|IO$M_WRTATTN, 0, 0, 0,
                sp_wrtattn_ast, ctx, 0, 0, 0, 0);
    }
/
[end of SP_MGR.DIF]

--
________________________________________
Craig A. Berry
mailto:craigberry@;mac.com

"... getting out of a sonnet is much more
difficult than getting in."
                 Brad Leithauser
