Barry:

Can you tolerate the following workaround for Hydra's error cleanup, or
do you need it to be internal?  The wrapper script always exits 0, so
Hydra never sees the bad termination.  I presume you know enough bash
to generalize a.sh appropriately.

alcfwl181:build jhammond$ cat a.sh
#!/bin/sh
$1
true
alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.sh ./a.out
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.sh ./a.out

alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.out


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 61123 RUNNING AT alcfwl181.alcf.anl.gov
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.out
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 61126 RUNNING AT alcfwl181.alcf.anl.gov
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
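
A minimal sketch of how a.sh might be generalized, assuming the only
goal is to forward every argument to the wrapped program and always
hand exit code 0 back to mpiexec.  The function name run_quiet and the
WRAP_VERBOSE variable are made up for illustration; neither is part of
MPICH or Hydra.

```shell
#!/bin/sh
# Hypothetical generalization of a.sh (not part of MPICH or Hydra).
# Runs the wrapped command with all of its arguments, then reports
# success so the launcher sees exit code 0 instead of the abort code.
run_quiet() {
    status=0
    # "$@" keeps arguments containing spaces intact, unlike a bare $1.
    "$@" || status=$?
    # WRAP_VERBOSE is an invented variable: when it is set, the real exit
    # status is written to stderr so the failure is not lost entirely.
    if [ -n "${WRAP_VERBOSE:-}" ]; then
        echo "wrapped command exited with status $status" >&2
    fi
    return 0
}

run_quiet "$@"
```

Saved as, say, wrap.sh, it would be launched the same way as a.sh
above, for example "mpiexec -n 1 ./wrap.sh ./a.out arg1 arg2".  Note
that it hides every nonzero exit code from the launcher, not only the
ones that come from MPI_Abort, so scripts that check the status of
mpiexec will no longer notice failures.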

On Fri, Feb 21, 2014 at 3:10 PM, Jeff Hammond <[email protected]> wrote:
> Barry:
>
> Would the following behavior be acceptable to you?  I have only made
> the changes in MPI but am looking at the process manager now.
>
> Jeff
>
>
> # Without the process manager
>
> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=0
> alcfwl181:build jhammond$ ./a.out
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=1
> alcfwl181:build jhammond$ ./a.out
>
> alcfwl181:build jhammond$ unset MPIR_CVAR_SUPPRESS_ABORT_MESSAGE
> alcfwl181:build jhammond$ ./a.out
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>
> # With the process manager
>
> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.out
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 61023 RUNNING AT alcfwl181.alcf.anl.gov
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.out
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 61026 RUNNING AT alcfwl181.alcf.anl.gov
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> alcfwl181:build jhammond$ mpiexec -n 1 ./a.out
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 61032 RUNNING AT alcfwl181.alcf.anl.gov
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
>
>
>
> On Thu, Feb 20, 2014 at 11:33 AM, Barry Smith <[email protected]> wrote:
>>
>>    Is there any way to turn off MPICH (and others) printing messages about 
>> MPI_Abort?  We have already prepared and presented useful error messages to 
>> the user about the situation and would like to avoid having these additional 
>> messages printed (that often make the situation look worse than it is)
>>
>>     Thanks
>>
>>    Barry
>>
>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>> [cli_0]: aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   EXIT CODE: 56
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>>
>>
>>
>>
>> _______________________________________________
>> discuss mailing list     [email protected]
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
> --
> Jeff Hammond
> [email protected]



-- 
Jeff Hammond
[email protected]
