Re: [OMPI users] Fwd: Minimum time between MPI_Bcast or MPI_Reduce calls?

2019-01-18 Thread Gilles Gouaillardet
Jeff,

That could be a copy/paste error and/or an email client issue.

The syntax is
mpirun --mca variable value ...
(two plain ASCII hyphens: -, -, m, c, a)

The error message instead complains about a missing "—-mca" executable
(an em dash followed by a hyphen: —, -, m, c, a).

This is most likely the root cause of this issue.
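
If you retype the two hyphens as plain ASCII characters, the same command line should be accepted, for example:

% mpirun --mca coll_sync_priority 100 --mca coll_sync_barrier_after 10 -q -np 2 a.out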

Another option is to set these parameters via the environment:
export OMPI_MCA_coll_sync_priority=100
export OMPI_MCA_coll_sync_barrier_after=10
and then invoke mpirun without the --mca options.
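
The same settings can also be put in a parameter file, e.g. $HOME/.openmpi/mca-params.conf, one per line:
coll_sync_priority = 100
coll_sync_barrier_after = 10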

Cheers,

Gilles

On Sat, Jan 19, 2019 at 11:28 AM Jeff Wentworth via users wrote:
>
> Hi,
>
> Thanks for the quick response.  But it looks like I am missing something 
> because neither -mca nor --mca is being recognized by my mpirun command.
>
> % mpirun --mca coll_sync_priority 100 --mca coll_sync_barrier_after 10 -q -np 
> 2 a.out
> --
> mpirun was unable to find the specified executable file, and therefore
> did not launch the job.  This error was first reported for process
> rank 0; it may have occurred for other processes as well.
>
> NOTE: A common cause for this error is misspelling a mpirun command
>   line parameter option (remember that mpirun interprets the first
>   unrecognized command line token as the executable).
>
> Node:   mia
> Executable: —-mca
> --
> 4 total processes failed to start
>
> % which mpirun
> /usr/local/bin/mpirun
> % ls -l /usr/local/bin/mpirun
> lrwxrwxrwx. 1 root root 7 Jan 15 20:50 /usr/local/bin/mpirun -> orterun
>
> jw2002
>

Re: [OMPI users] Fwd: Minimum time between MPI_Bcast or MPI_Reduce calls?

2019-01-18 Thread Jeff Wentworth via users
Hi,

Thanks for the quick response.  But it looks like I am missing something 
because neither -mca nor --mca is being recognized by my mpirun command.  

% mpirun --mca coll_sync_priority 100 --mca coll_sync_barrier_after 10 -q -np 2 
a.out
--
mpirun was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
  line parameter option (remember that mpirun interprets the first
  unrecognized command line token as the executable).

Node:   mia
Executable: —-mca
--
4 total processes failed to start

% which mpirun
/usr/local/bin/mpirun
% ls -l /usr/local/bin/mpirun
lrwxrwxrwx. 1 root root 7 Jan 15 20:50 /usr/local/bin/mpirun -> orterun

jw2002


On Fri, 1/18/19, Nathan Hjelm via users  wrote:

 Subject: [OMPI users] Fwd: Minimum time between MPI_Bcast or MPI_Reduce calls?
 To: "Open MPI Users" 
 Cc: "Nathan Hjelm" 
 Date: Friday, January 18, 2019, 9:00 PM
 
 
 Since neither bcast nor reduce acts as
 a barrier it is possible to run out of resources if either
 of these calls (or both) are used in a tight loop. The sync
 coll component exists for this scenario. You can enable it
 by  adding the following to mpirun (or setting these
 variables through the environment or a file):
 
 —mca coll_sync_priority 100 —mca
 coll_sync_barrier_after 10
 
 
 This will effectively throttle the
 collective calls for you. You can also change the reduce to
 an allreduce.
 
 
 -Nathan
 

[OMPI users] Fwd: Minimum time between MPI_Bcast or MPI_Reduce calls?

2019-01-18 Thread Nathan Hjelm via users

Since neither bcast nor reduce acts as a barrier it is possible to run out of 
resources if either of these calls (or both) are used in a tight loop. The sync 
coll component exists for this scenario. You can enable it by adding the
following to mpirun (or setting these variables through the environment or a 
file):

—mca coll_sync_priority 100 —mca coll_sync_barrier_after 10


This will effectively throttle the collective calls for you. You can also 
change the reduce to an allreduce.
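
For illustration, here is a minimal sketch of the loop pattern described in the
quoted message below, with the allreduce variant as a comment. This is not the
actual code; the payload sizes (1 double = 8 bytes, 26 doubles = 208 bytes) and
the 90 iterations are simply taken from the log:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < 90; i++) {
        double angle = 0.0;          /* 8-byte payload broadcast from the root */
        double partial[26] = {0.0};  /* 208-byte payload reduced to the root */
        double total[26];

        MPI_Bcast(&angle, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* ... compute partial[] from angle on every rank ... */

        MPI_Reduce(partial, total, 26, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        /* Alternative mentioned above: an allreduce delivers the result to
         * every rank, so all ranks wait for it and the loop self-throttles.
         * MPI_Allreduce(partial, total, 26, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
         */

        if (rank == 0) {
            /* write total[] to the output file on the service node */
        }
    }

    MPI_Finalize();
    return 0;
}

With the coll_sync settings above, Open MPI would insert a barrier roughly every
10 such collective calls, which keeps any one rank from running too far ahead.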


-Nathan

> On Jan 18, 2019, at 6:31 PM, Jeff Wentworth via users wrote:
> 
> Greetings everyone,
> 
> I have a scientific code using Open MPI (v3.1.3) that seems to work fine when 
> MPI_Bcast() and MPI_Reduce() calls are well spaced out in time.  Yet if the 
> time between these calls is short, eventually one of the nodes hangs at some 
> random point, never returning from the broadcast or reduce call.  Is there 
> some minimum time between calls that needs to be obeyed in order for Open MPI 
> to process these reliably?
> 
> The reason this has come up is because I am trying to run in a multi-node 
> environment some established acceptance tests in order to verify that the 
> Open MPI configured version of the code yields the same baseline result as 
> the original single node version of the code.  These acceptance tests must 
> pass in order for the code to be considered validated and deliverable to the 
> customer.  One of these acceptance tests that hangs does involve 90 
> broadcasts and 90 reduces in a short period of time (less than .01 cpu sec), 
> as in:
> 
> Broadcast #89 in
>  Broadcast #89 out 8 bytes
>  Calculate angle #89
>  Reduce #89 in
>  Reduce #89 out 208 bytes
> Write result #89 to file on service node
> Broadcast #90 in
>  Broadcast #90 out 8 bytes
>  Calculate angle #89
>  Reduce #90 in
>  Reduce #90 out 208 bytes
> Write result #90 to file on service node
> 
> If I slow down the above acceptance test, for example by running it under 
> valgrind, then it runs to completion and yields the correct result.  So it 
> seems to suggest that something internal to Open MPI is getting swamped.  I 
> understand that these acceptance tests might be pushing the limit, given that 
> they involve so many short calculations combined with frequent, yet tiny, 
> transfers of data among nodes.  
> 
> Would it be worthwhile for me to enforce some minimum wait time between the
> MPI calls, say 0.01 or 0.001 sec via nanosleep()?  The only time it would
> matter would be when acceptance tests are run, as the situation doesn't arise 
> when beefier runs are performed. 
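> 
> For reference, the throttle would be just a sketch along these lines (POSIX
> nanosleep; the 1 ms value is only an example, nothing I have measured):
> 
>     #include <time.h>
> 
>     /* pause roughly 1 ms (0.001 sec) before each MPI_Bcast()/MPI_Reduce() */
>     static void throttle(void)
>     {
>         struct timespec ts = { 0, 1000000L };   /* seconds, nanoseconds */
>         nanosleep(&ts, NULL);
>     }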
> 
> Thanks.
> 
> jw2002


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users