Barry: Can you tolerate the following workaround for Hydra's error cleanup or do you need it to be internal? I presume you know enough bash to generalize a.sh appropriately.
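For what it's worth, here is one way the a.sh shown in the session below might be generalized. This is only a sketch: the function name mask_exit is made up for illustration, and it assumes you want every argument forwarded to the application and are willing to hide the application's exit code from mpiexec entirely.

```shell
#!/bin/sh
# Hypothetical generalization of a.sh (mask_exit is an illustrative name).
# Runs the given command with all of its arguments, then reports success,
# so the launcher never sees the nonzero status left behind by MPI_Abort.
mask_exit() {
    # "$@" forwards every argument intact, including ones containing spaces;
    # "|| true" discards a failing exit status.
    "$@" || true
}

mask_exit "$@"
```

It would be used the same way as a.sh, e.g. `mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.sh ./a.out arg1 arg2`. The obvious cost is that anything that relies on the job's exit code loses it, so you may prefer to record the code somewhere before masking it.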

alcfwl181:build jhammond$ cat a.sh
#!/bin/sh
$1
true

alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.sh ./a.out
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.sh ./a.out
alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.out

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 61123 RUNNING AT alcfwl181.alcf.anl.gov
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.out
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 61126 RUNNING AT alcfwl181.alcf.anl.gov
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

On Fri, Feb 21, 2014 at 3:10 PM, Jeff Hammond <[email protected]> wrote:
> Barry:
>
> Would the following behavior be acceptable to you? I have only made
> the changes in MPI but am looking at the process manager now.
>
> Jeff
>
>
> # Without the process manager
>
> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=0
> alcfwl181:build jhammond$ ./a.out
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=1
> alcfwl181:build jhammond$ ./a.out
>
> alcfwl181:build jhammond$ unset MPIR_CVAR_SUPPRESS_ABORT_MESSAGE
> alcfwl181:build jhammond$ ./a.out
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>
> # With the process manager
>
> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.out
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 61023 RUNNING AT alcfwl181.alcf.anl.gov
> = EXIT CODE: 1
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.out
>
>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 61026 RUNNING AT alcfwl181.alcf.anl.gov
> = EXIT CODE: 1
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> alcfwl181:build jhammond$ mpiexec -n 1 ./a.out
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 61032 RUNNING AT alcfwl181.alcf.anl.gov
> = EXIT CODE: 1
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
>
>
>
> On Thu, Feb 20, 2014 at 11:33 AM, Barry Smith <[email protected]> wrote:
>>
>> Is there any way to turn off MPICH (and others) printing messages about
>> MPI_Abort? We have already prepared and presented useful error messages to
>> the user about the situation and would like to avoid having these additional
>> messages printed (that often make the situation look worse than it is)
>>
>> Thanks
>>
>> Barry
>>
>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>> [cli_0]: aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>
>> ===================================================================================
>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> = EXIT CODE: 56
>> = CLEANING UP REMAINING PROCESSES
>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>>
>>
>>
>>
>> _______________________________________________
>> discuss mailing list [email protected]
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
> --
> Jeff Hammond
> [email protected]

--
Jeff Hammond
[email protected]
