Hi Mark,

Back in the day I liked the mpirun/mpiexec --tag-output option.
Jeff: Does it still exist?
It may not completely prevent output lines from being split, but tagging each line with the process rank helps.
You can grep the stdout log for the rank that you want, which helps a lot when several processes are writing at the same time.

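Here is a minimal sketch of that grep idea in Python. It assumes each tagged line starts with a prefix of the form `[jobid,rank]<stdout>:`; the exact prefix may differ between Open MPI versions, so please check it against a real log first. The helper name `lines_for_rank` and the sample lines are made up for illustration (shortened from the output quoted further down).

```python
import re

# Assumed --tag-output prefix: "[jobid,rank]<stream>:" (verify against a real log).
TAG = re.compile(r"^\[(\d+),(\d+)\]<(stdout|stderr)>:(.*)$")


def lines_for_rank(lines, rank):
    """Return the text of every tagged line that was written by the given rank."""
    selected = []
    for line in lines:
        match = TAG.match(line.rstrip("\n"))
        # Untagged lines (or lines with a different prefix format) are skipped.
        if match and int(match.group(2)) == rank:
            selected.append(match.group(4))
    return selected


if __name__ == "__main__":
    # Illustrative lines only, not real mpirun output.
    sample = [
        "[1,0]<stdout>:IGSTAB 1626 7.392E-02 aerosurfs",
        "[1,3]<stdout>:Warning: BCFD: US_UPDATEQ: izon, iter, nBadpmin: 699 1625 12",
        "[1,0]<stdout>:IGMNTAERO 1626 -6.120E-04 aerosurfs",
    ]
    for text in lines_for_rank(sample, 0):
        print(text)
```

To use it on a real log, pass an open file object instead of the sample list. Filtering by rank only separates the processes after the fact; it does not by itself stop two lines from being interleaved.
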
I hope this helps,
Gus Correa

On Sun, Dec 5, 2021 at 1:12 PM Jeff Squyres (jsquyres) via users <users@lists.open-mpi.org> wrote:

> FWIW: Open MPI 4.1.2 has been released -- you can probably stop using an
> RC release.
>
> I think you're probably running into an issue that is just a fact of
> life. Especially when there's a lot of output simultaneously from multiple
> MPI processes (potentially on different nodes), the stdout/stderr lines can
> just get munged together.
>
> Can you check for convergence a different way?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ________________________________________
> From: users <users-boun...@lists.open-mpi.org> on behalf of Fisher (US),
> Mark S via users <users@lists.open-mpi.org>
> Sent: Thursday, December 2, 2021 10:48 AM
> To: users@lists.open-mpi.org
> Cc: Fisher (US), Mark S
> Subject: [OMPI users] stdout scrambled in file
>
> We are using Mellanox HPC-X MPI based on OpenMPI 4.1.1RC1 and are having
> issues with lines occasionally being scrambled together. This causes
> problems for our convergence-checking code, since we put convergence data
> there. We are not using any mpirun options for stdout; we just redirect
> stdout/stderr to a file before we run the mpirun command, so all output
> goes there. We had a similar issue with Intel MPI in the past and used the
> -ordered-output option to fix it, but I do not see any similar option for
> OpenMPI. See the example below. Is there any way to ensure that a line from
> a process gets one line in the output file?
>
>
> The data in red below is scrambled and should look like the cleaned-up
> version. You can see that a line from a different process was put inside a
> line from another process, and the rest of that line ended up a couple of
> lines down.
>
> ZONE 0 : Min/Max CFL= 5.000E-01 1.500E+01 Min/Max DT= 8.411E-10 1.004E-01 sec
>
> *IGSTAB* 1626 7.392E-02 2.470E-01 -9.075E-04 8.607E-03 -5.911E-04 -4.945E-06 aerosurfs
> *IGMNTAERO* 1626 -6.120E-04 1.406E-02 6.395E-04 4.473E-08 3.112E-04 -2.785E-05 aerosurfs
> *IGSTAB* 1626 7.392E-02 2.470E-01 -9.075E-04 8.607E-03 -5.911E-04 -4.945E-06 Aircraft-Total
> *IGMNTAERO* 1626 -6.120E-04 1.406E-02 6.395E-04 4.473E-08 3.112E-04 -2.785E-05 Aircr Warning: BCFD: US_UPDATEQ: izon, iter, nBadpmin: 699 1625 12
> Warning: BCFD: US_UPDATEQ: izon, iter, nBadpmin: 111 1626 6
> aft-Total
> *IGSTAB* 1626 6.623E-02 2.137E-01 -9.063E-04 8.450E-03 -5.485E-04 -4.961E-06 Aircraft-OML
> *IGMNTAERO* 1626 -6.118E-04 -1.602E-02 6.404E-04 5.756E-08 3.341E-04 -2.791E-05 Aircraft-OML
>
>
> Cleaned up version:
>
> ZONE 0 : Min/Max CFL= 5.000E-01 1.500E+01 Min/Max DT= 8.411E-10 1.004E-01 sec
>
> *IGSTAB* 1626 7.392E-02 2.470E-01 -9.075E-04 8.607E-03 -5.911E-04 -4.945E-06 aerosurfs
> *IGMNTAERO* 1626 -6.120E-04 1.406E-02 6.395E-04 4.473E-08 3.112E-04 -2.785E-05 aerosurfs
> *IGSTAB* 1626 7.392E-02 2.470E-01 -9.075E-04 8.607E-03 -5.911E-04 -4.945E-06 Aircraft-Total
> *IGMNTAERO* 1626 -6.120E-04 1.406E-02 6.395E-04 4.473E-08 3.112E-04 -2.785E-05 Aircraft-Total
> Warning: BCFD: US_UPDATEQ: izon, iter, nBadpmin: 699 1625 12
> Warning: BCFD: US_UPDATEQ: izon, iter, nBadpmin: 111 1626 6
> *IGSTAB* 1626 6.623E-02 2.137E-01 -9.063E-04 8.450E-03 -5.485E-04 -4.961E-06 Aircraft-OML
> *IGMNTAERO* 1626 -6.118E-04 -1.602E-02 6.404E-04 5.756E-08 3.341E-04 -2.791E-05 Aircraft-OML
>
> Thanks!
>