Re: [OMPI users] mpirun --output-filename behavior

2019-10-29 Thread Jeff Squyres (jsquyres) via users
On Oct 29, 2019, at 7:30 PM, Kulshrestha, Vipul via users 
<users@lists.open-mpi.org> wrote:

Hi,

We recently shifted from Open MPI 2.0.1 to 4.0.1 and are seeing an important 
behavior change with respect to the above option.

We invoke mpirun as

% mpirun --output-filename <dir>/app.log -np <N> <application>

With 2.0.1, the above produced a /app.log.<rank> file for the stdout of 
the application, where <rank> is the rank of the process. However, with 4.0.1, the 
above produces /app.log/1/rank.<rank>/stdout.

Is this an intentional change? The documentation does not seem to indicate any 
expected change in behavior. How do I get 4.0.1 to produce log files as it did 
in 2.0.1?

This was reported recently here:

 https://github.com/open-mpi/ompi/issues/7095

It looks like we changed this behavior quite some time ago, and neglected to 
update the documentation.  :-\

Further, mpirun previously produced no output on stdout, except to indicate when all 
the child processes had completed. Now, however, mpirun also spews out all of the 
application's stdout messages (from each rank's process) on its own stdout. How do I 
stop that?

Oh, did the prior behavior *only* output to the file and not to stdout/stderr?  
Huh.

I guess a workaround for that would be:

mpirun  ... > /dev/null
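
If the application also emits output on stderr and the goal is to keep the 
terminal completely quiet, redirecting stderr as well may be needed, for 
example:

mpirun  ... > /dev/null 2>&1

This assumes the per-rank files written under the --output-filename directory 
are the only copy of the output you want to keep.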

--
Jeff Squyres
jsquy...@cisco.com



[OMPI users] mpirun --output-filename behavior

2019-10-29 Thread Kulshrestha, Vipul via users
Hi,

We recently shifted from Open MPI 2.0.1 to 4.0.1 and are seeing an important 
behavior change with respect to the above option.

We invoke mpirun as

% mpirun --output-filename <dir>/app.log -np <N> <application>

With 2.0.1, the above produced a /app.log.<rank> file for the stdout of 
the application, where <rank> is the rank of the process. However, with 4.0.1, the 
above produces /app.log/1/rank.<rank>/stdout.

Is this an intentional change? The documentation does not seem to indicate any 
expected change in behavior. How do I get 4.0.1 to produce log files as it did 
in 2.0.1?

Further, mpirun previously produced no output on stdout, except to indicate when all 
the child processes had completed. Now, however, mpirun also spews out all of the 
application's stdout messages (from each rank's process) on its own stdout. How do I 
stop that?

Thanks,
Vipul



Re: [OMPI users] Program hangs when MPI_Bcast is called rapidly

2019-10-29 Thread George Bosilca via users
Charles,

Having implemented some of the underlying collective algorithms, I am
puzzled by the need to force the sync to 1 to have things flowing. I would
definitely appreciate a reproducer so that I can identify (and hopefully)
fix the underlying problem.

Thanks,
  George.


On Tue, Oct 29, 2019 at 2:20 PM Garrett, Charles via users <
users@lists.open-mpi.org> wrote:

> Last time I did a reply on here, it created a new thread.  Sorry about
> that everyone.  I just hit the Reply via email button.  Hopefully this one
> will work.
>
>
>
> To Gilles Gouaillardet:
>
> My first thread has a reproducer that causes the problem.
>
>
>
> To George Bosilca:
>
> I had to set coll_sync_barrier_before=1.  Even setting it to 10 did not fix
> my problem.  I was surprised by this, and I’m still surprised given your
> comment about setting it to anything larger than a few tens.  Thanks for the
> explanation about the problem.
>
>
>
> Charles Garrett
>


Re: [OMPI users] Program hangs when MPI_Bcast is called rapidly

2019-10-29 Thread Garrett, Charles via users
Last time I did a reply on here, it created a new thread.  Sorry about that 
everyone.  I just hit the Reply via email button.  Hopefully this one will work.

To Gilles Gouaillardet:
My first thread has a reproducer that causes the problem.

To George Bosilca:
I had to set coll_sync_barrier_before=1.  Even setting it to 10 did not fix my 
problem.  I was surprised by this, and I'm still surprised given your comment about 
setting it to anything larger than a few tens.  Thanks for the explanation about the 
problem.

Charles Garrett


Re: [OMPI users] Program hangs when MPI_Bcast is called rapidly

2019-10-29 Thread George Bosilca via users
Charles,

There is a known issue with calling collectives in a tight loop, due to the
lack of flow control at the network level. It results in a significant
slow-down that might appear as a deadlock to users. The workaround
is to enable the sync collective module, which will insert a fake barrier at
regular intervals in the tight collective loop, allowing more streamlined
usage of the network.

Run `ompi_info --param coll sync -l 9` to see the options you need to play
with. I think setting either coll_sync_barrier_before
or coll_sync_barrier_after to anything larger than a few tens should be
good enough.
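
For illustration only, a minimal sketch of the kind of tight broadcast loop
being described (this is not the actual reproducer from the earlier thread;
the iteration count, message size, and the MCA value shown in the comment are
arbitrary):

/*
 * bcast_loop.c -- illustrative tight MPI_Bcast loop.
 *
 * A possible launch line, with an arbitrary sync interval:
 *   mpirun --mca coll_sync_barrier_before 100 -np 4 ./bcast_loop
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i;
    int buf = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Broadcast repeatedly with nothing else in between; this is the
     * pattern that can outrun network-level flow control. */
    for (i = 0; i < 1000000; i++) {
        if (rank == 0) {
            buf = i;
        }
        MPI_Bcast(&buf, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }

    if (rank == 0) {
        printf("final value: %d\n", buf);
    }

    MPI_Finalize();
    return 0;
}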

  George.


On Mon, Oct 28, 2019 at 9:29 PM Gilles Gouaillardet via users <
users@lists.open-mpi.org> wrote:

> Charles,
>
>
> unless you expect yes or no answers, can you please post a simple
> program that evidences the issue you are facing?
>
>
> Cheers,
>
>
> Gilles
>
> On 10/29/2019 6:37 AM, Garrett, Charles via users wrote:
> >
> > Does anyone have any idea why this is happening?  Has anyone seen this
> > problem before?
> >
>