There are several output-controlling options - e.g., you could redirect the 
output from each process to its own file or directory.
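
One such option, if memory serves, is mpirun's --output-filename: it redirects 
each process's stdout/stderr to a process-unique file (the exact file/directory 
layout it creates has changed across releases, so check the man page for your 
version; the "run_out" and "./solver" names below are just placeholders):

    mpirun --output-filename run_out -np 64 ./solver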

However, it makes little sense to me for someone to write convergence data into 
a file and then parse it. Typically, convergence data results from all procs 
reaching the end of a computational epoch or cycle - i.e., you need all the 
procs to reach the same point. So why not just have the procs report their 
convergence data to rank=0 using an MPI_Gather collective, and then have that 
proc output whatever info you want to see?
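
A minimal sketch of that pattern, assuming each rank contributes a single 
double-precision value per cycle (the residual computation, variable names, 
and output format below are placeholders for whatever your code actually does):

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, i;
        double local_residual;
        double *all_residuals = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Placeholder for the per-rank convergence value for this cycle */
        local_residual = 1.0 / (rank + 1);

        if (rank == 0) {
            all_residuals = malloc(size * sizeof(double));
        }

        /* Every rank reaches this point at the end of the cycle, so the
           gather also serves as the synchronization point */
        MPI_Gather(&local_residual, 1, MPI_DOUBLE,
                   all_residuals, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            /* Only rank 0 writes, so lines cannot interleave */
            for (i = 0; i < size; i++) {
                printf("rank %d residual %e\n", i, all_residuals[i]);
            }
            free(all_residuals);
        }

        MPI_Finalize();
        return 0;
    }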

You would then no longer be dependent on some implementation-specific "mpirun" 
cmd line option, so you could run the same code using srun, aprun, prun, or 
mpirun and get the exact same output.

Am I missing something?
Ralph


On Dec 5, 2021, at 12:19 PM, Gus Correa via users <users@lists.open-mpi.org> wrote:

Hi Mark

Back in the day I liked the mpirun/mpiexec --tag-output option.
Jeff: Does it still exist?
It may not prevent the splitting of output lines 100% of the time,
but tagging each line with the process rank helps.
You can then grep the stdout log for the rank that you want,
which helps a lot when several processes are talking.
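
Something along these lines (rank 17 and the file names are just examples, and 
the exact tag format can vary between releases, but it is roughly 
"[jobid,rank]<stdout>:"):

    mpirun --tag-output -np 64 ./solver > run.log 2>&1
    grep '\[1,17\]' run.log     # pull out just rank 17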

I hope this helps,
Gus Correa


On Sun, Dec 5, 2021 at 1:12 PM Jeff Squyres (jsquyres) via users 
<users@lists.open-mpi.org> wrote:
FWIW: Open MPI 4.1.2 has been released -- you can probably stop using an RC 
release.

I think you're probably running into an issue that is just a fact of life.  
Especially when there's a lot of output simultaneously from multiple MPI 
processes (potentially on different nodes), the stdout/stderr lines can just 
get munged together.

Can you check for convergence a different way?

--
Jeff Squyres
jsquy...@cisco.com

________________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Fisher (US), Mark S 
via users <users@lists.open-mpi.org>
Sent: Thursday, December 2, 2021 10:48 AM
To: users@lists.open-mpi.org
Cc: Fisher (US), Mark S
Subject: [OMPI users] stdout scrambled in file

We are using Mellanox HPC-X MPI, which is based on Open MPI 4.1.1RC1, and are 
occasionally having issues with output lines getting scrambled together. This 
causes problems for our convergence-checking code, since we write convergence 
data to stdout. We are not using any mpirun options for stdout; we just 
redirect stdout/stderr to a file before we run the mpirun command, so all 
output goes there. We had a similar issue with Intel MPI in the past and used 
its -ordered-output option to fix it, but I do not see a similar option for 
Open MPI. See the example below. Is there any way to ensure that a line 
written by one process ends up as a single line in the output file?


The scrambled data below should look like the cleaned-up version that follows. 
You can see that a line from a different process was inserted inside a line 
from another process, and the rest of the interrupted line ended up a couple 
of lines further down.

ZONE   0 : Min/Max CFL= 5.000E-01 1.500E+01 Min/Max DT= 8.411E-10 1.004E-01 sec

*IGSTAB* 1626 7.392E-02 2.470E-01 -9.075E-04 8.607E-03 -5.911E-04 -4.945E-06  
aerosurfs
*IGMNTAERO* 1626 -6.120E-04 1.406E-02 6.395E-04 4.473E-08 3.112E-04 -2.785E-05  
aerosurfs
*IGSTAB* 1626 7.392E-02 2.470E-01 -9.075E-04 8.607E-03 -5.911E-04 -4.945E-06  
Aircraft-Total
*IGMNTAERO* 1626 -6.120E-04 1.406E-02 6.395E-04 4.473E-08 3.112E-04 -2.785E-05  
Aircr Warning: BCFD: US_UPDATEQ: izon, iter, nBadpmin:  699  1625     12
Warning: BCFD: US_UPDATEQ: izon, iter, nBadpmin:  111  1626      6
aft-Total
*IGSTAB* 1626 6.623E-02 2.137E-01 -9.063E-04 8.450E-03 -5.485E-04 -4.961E-06  
Aircraft-OML
*IGMNTAERO* 1626 -6.118E-04 -1.602E-02 6.404E-04 5.756E-08 3.341E-04 -2.791E-05 
 Aircraft-OML


Cleaned up version:

ZONE   0 : Min/Max CFL= 5.000E-01 1.500E+01 Min/Max DT= 8.411E-10 1.004E-01 sec

*IGSTAB* 1626 7.392E-02 2.470E-01 -9.075E-04 8.607E-03 -5.911E-04 -4.945E-06  
aerosurfs
*IGMNTAERO* 1626 -6.120E-04 1.406E-02 6.395E-04 4.473E-08 3.112E-04 -2.785E-05  
aerosurfs
*IGSTAB* 1626 7.392E-02 2.470E-01 -9.075E-04 8.607E-03 -5.911E-04 -4.945E-06  
Aircraft-Total
*IGMNTAERO* 1626 -6.120E-04 1.406E-02 6.395E-04 4.473E-08 3.112E-04 -2.785E-05  
Aircraft-Total
 Warning: BCFD: US_UPDATEQ: izon, iter, nBadpmin:  699  1625     12
Warning: BCFD: US_UPDATEQ: izon, iter, nBadpmin:  111  1626      6
*IGSTAB* 1626 6.623E-02 2.137E-01 -9.063E-04 8.450E-03 -5.485E-04 -4.961E-06  
Aircraft-OML
*IGMNTAERO* 1626 -6.118E-04 -1.602E-02 6.404E-04 5.756E-08 3.341E-04 -2.791E-05 
 Aircraft-OML

Thanks!
