Re: [OMPI users] Memchecker and MPI_Comm_spawn

2020-05-09 Thread Gilles Gouaillardet via users
Kurt,

the error is "valgrind myApp" is not an executable (but this is a
command a shell can interpret)
so you have several options:
 - use a wrapper (e.g. myApp.valgrind) that forks&exec valgrind myApp)
 - MPI_Comm_spawn("valgrind", argv, ...) after you inserted "myApp" at
the beginning of argv
 - use the fork agent: mpirun --mca orte_fork_agent valgrind.sh ...
  so mpirun/orted will fork&exec valgrind.sh a.out instead of the
default a.out (and it is up to you to write the valgrind.sh wrapper)
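
For the second option, here is a rough, untested sketch; "myApp" is a
placeholder for your executable and you may need the full path to valgrind.
Note that argv holds only the arguments to the spawned command (so "myApp"
becomes the first argument to valgrind), it must be NULL-terminated, and the
last parameter can receive the per-process error codes (or be
MPI_ERRCODES_IGNORE):

/* spawn "valgrind myApp" as one new process; "myApp" is a placeholder */
char *spawn_argv[] = { "myApp", NULL };  /* NULL-terminated, no command name */
MPI_Comm intercomm;
int errcodes[1];

MPI_Comm_spawn("valgrind", spawn_argv, 1, MPI_INFO_NULL,
               0, MPI_COMM_SELF, &intercomm, errcodes);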

Cheers,

Gilles

On Sun, May 10, 2020 at 4:31 AM Mccall, Kurt E. (MSFC-EV41) via users wrote:
>
> How can I run OpenMPI's Memchecker on a process created by MPI_Comm_spawn()?
> I've configured OpenMPI 4.0.3 for Memchecker, along with Valgrind 3.15.0, and
> it works quite well on processes created directly by mpiexec.
>
> I tried to do something analogous by pre-pending "valgrind" onto the command
> passed to MPI_Comm_spawn(), but the process is not launched and it returns no
> error code.
>
> char *argv[n];
> MPI_Info info;
> MPI_Comm comm;
> int error_code[1];
>
> MPI_Comm_spawn("valgrind   myApp", argv, 1, info, 0, MPI_COMM_SELF,
>                &comm, error_code);
>
> I didn't change argv, the array of myApp arguments, after adding "valgrind"
> to the command; maybe it needs to be adjusted somehow.
>
> Thanks,
> Kurt


[OMPI users] Memchecker and MPI_Comm_spawn

2020-05-09 Thread Mccall, Kurt E. (MSFC-EV41) via users
How can I run OpenMPI's Memchecker on a process created by MPI_Comm_spawn()?
I've configured OpenMPI 4.0.3 for Memchecker, along with Valgrind 3.15.0, and it
works quite well on processes created directly by mpiexec.

I tried to do something analogous by pre-pending "valgrind" onto the command 
passed to MPI_Comm_spawn(), but the process is not launched and it returns no 
error code.

char *argv[n];
MPI_Info info;
MPI_Comm comm;
int error_code[1];

MPI_Comm_spawn ("valgrind   myApp",  argv,  1,  info,  0,  MPI_COMM_SELF,  
&comm,  error_code);

I didn't change argv, the array of myApp arguments, after adding "valgrind" to
the command; maybe it needs to be adjusted somehow.

Thanks,
Kurt



Re: [OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Patrick Bégou via users
This hardware has been working for nearly 10 years with several generations of
nodes and OpenMPI without any problem. Today it is possible to find
refurbished parts at low prices on the web, which can help in building small
clusters. It is really more efficient than 10Gb Ethernet for parallel
codes thanks to the very low latency.
Now I'm moving to 100-200Gb/s InfiniBand architectures... for the next
10 years. ;-)

Patrick

On 09/05/2020 at 16:09, Heinz, Michael William wrote:
> That's it! I was trying to remember what the setting was, but I haven't
> worked on those HCAs since around 2012, so my recollection was faint.
>
> That said, I found the Intel TrueScale manual online
> at 
> https://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/OFED_Host_Software_UserGuide_G91902_06.pdf
> 
>
> TS is the same hardware as the old QLogic QDR HCAs so the manual might
> be helpful to you in the future.
>
> Sent from my iPad
>
>> On May 9, 2020, at 9:52 AM, Patrick Bégou via users wrote:
>>
>> 
>> On 08/05/2020 at 21:56, Prentice Bisbal via users wrote:
>>>
>>> We often get the following errors when more than one job runs on the
>>> same compute node. We are using Slurm with OpenMPI. The IB cards are
>>> QLogic using PSM:
>>>
>>> 10698ipath_userinit: assign_context command failed: Network is down
>>> node01.10698can't open /dev/ipath, network down (err=26)
>>> node01.10703ipath_userinit: assign_context command failed: Network
>>> is down
>>> node01.10703can't open /dev/ipath, network down (err=26)
>>> node01.10701ipath_userinit: assign_context command failed: Network
>>> is down
>>> node01.10701can't open /dev/ipath, network down (err=26)
>>> node01.10700ipath_userinit: assign_context command failed: Network
>>> is down
>>> node01.10700can't open /dev/ipath, network down (err=26)
>>> node01.10697ipath_userinit: assign_context command failed: Network
>>> is down
>>> node01.10697can't open /dev/ipath, network down (err=26)
>>> --
>>> PSM was unable to open an endpoint. Please make sure that the
>>> network link is
>>> active on the node and the hardware is functioning.
>>>
>>> Error: Could not detect network connectivity
>>> --
>>>
>>> Any Ideas how to fix this?
>>>
>>> -- 
>>> Prentice 
>>
>>
>> Hi Prentice,
>>
>> This is not OpenMPI related but merely due to your hardware. I don't
>> have many details, but I think this occurs when several jobs share the
>> same node and the nodes have a large number of cores (> 14). If this
>> is the case:
>>
>> On QLogic (I'm using such hardware at this time) you have 16 channels
>> for communication on each HCA and, if I remember what I read many
>> years ago, 2 are dedicated to the system. When launching MPI
>> applications, each process of a job requests its own dedicated channel
>> if available; otherwise they share ALL the available channels. So if a
>> second job starts on the same node, no channels remain available.
>>
>> To avoid this situation I force each channel to be shared by 2 MPI
>> processes (my nodes have 20 cores). You can set this with a simple
>> environment variable. On all my cluster nodes I create the file:
>>
>> */etc/profile.d/ibsetcontext.sh*
>>
>> And it contains:
>>
>> # allow 2 processes to share a hardware MPI context
>> # in infiniband with PSM
>> *export PSM_RANKS_PER_CONTEXT=2*
>>
>> Of course, if some people manage to oversubscribe the cores (more than
>> one process per core) the problem could arise again, but we do not
>> oversubscribe.
>>
>> Hope this can help you.
>>
>> Patrick
>>



Re: [OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Heinz, Michael William via users
That's it! I was trying to remember what the setting was, but I haven't worked on
those HCAs since around 2012, so my recollection was faint.

That said, I found the Intel TrueScale manual online at 
https://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/OFED_Host_Software_UserGuide_G91902_06.pdf

TS is the same hardware as the old QLogic QDR HCAs so the manual might be 
helpful to you in the future.

Sent from my iPad

On May 9, 2020, at 9:52 AM, Patrick Bégou via users wrote:


On 08/05/2020 at 21:56, Prentice Bisbal via users wrote:

We often get the following errors when more than one job runs on the same 
compute node. We are using Slurm with OpenMPI. The IB cards are QLogic using 
PSM:

10698ipath_userinit: assign_context command failed: Network is down
node01.10698can't open /dev/ipath, network down (err=26)
node01.10703ipath_userinit: assign_context command failed: Network is down
node01.10703can't open /dev/ipath, network down (err=26)
node01.10701ipath_userinit: assign_context command failed: Network is down
node01.10701can't open /dev/ipath, network down (err=26)
node01.10700ipath_userinit: assign_context command failed: Network is down
node01.10700can't open /dev/ipath, network down (err=26)
node01.10697ipath_userinit: assign_context command failed: Network is down
node01.10697can't open /dev/ipath, network down (err=26)
--
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

Error: Could not detect network connectivity
--

Any Ideas how to fix this?

--
Prentice


Hi Prentice,

This is not OpenMPI related but merely due to your hardware. I don't have many
details, but I think this occurs when several jobs share the same node and the
nodes have a large number of cores (> 14). If this is the case:

On QLogic (I'm using such hardware at this time) you have 16 channels for
communication on each HCA and, if I remember what I read many years ago, 2 are
dedicated to the system. When launching MPI applications, each process of a job
requests its own dedicated channel if available; otherwise they share ALL the
available channels. So if a second job starts on the same node, no channels
remain available.

To avoid this situation I force each channel to be shared by 2 MPI processes
(my nodes have 20 cores). You can set this with a simple environment variable.
On all my cluster nodes I create the file:

/etc/profile.d/ibsetcontext.sh

And it contains:

# allow 2 processes to share a hardware MPI context
# in infiniband with PSM
export PSM_RANKS_PER_CONTEXT=2

Of course, if some people manage to oversubscribe the cores (more than one
process per core) the problem could arise again, but we do not oversubscribe.

Hope this can help you.

Patrick


Re: [OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Patrick Bégou via users
On 08/05/2020 at 21:56, Prentice Bisbal via users wrote:
>
> We often get the following errors when more than one job runs on the
> same compute node. We are using Slurm with OpenMPI. The IB cards are
> QLogic using PSM:
>
> 10698ipath_userinit: assign_context command failed: Network is down
> node01.10698can't open /dev/ipath, network down (err=26)
> node01.10703ipath_userinit: assign_context command failed: Network is down
> node01.10703can't open /dev/ipath, network down (err=26)
> node01.10701ipath_userinit: assign_context command failed: Network is down
> node01.10701can't open /dev/ipath, network down (err=26)
> node01.10700ipath_userinit: assign_context command failed: Network is down
> node01.10700can't open /dev/ipath, network down (err=26)
> node01.10697ipath_userinit: assign_context command failed: Network is down
> node01.10697can't open /dev/ipath, network down (err=26)
> --
> PSM was unable to open an endpoint. Please make sure that the network
> link is
> active on the node and the hardware is functioning.
>
> Error: Could not detect network connectivity
> --
>
> Any Ideas how to fix this?
>
> -- 
> Prentice 


Hi Prentice,

This is not OpenMPI related but merely due to your hardware. I don't
have many details, but I think this occurs when several jobs share the
same node and the nodes have a large number of cores (> 14). If this
is the case:

On QLogic (I'm using such hardware at this time) you have 16 channels
for communication on each HCA and, if I remember what I read many
years ago, 2 are dedicated to the system. When launching MPI
applications, each process of a job requests its own dedicated channel
if available; otherwise they share ALL the available channels. So if a
second job starts on the same node, no channels remain available.

To avoid this situation I force each channel to be shared by 2 MPI
processes (my nodes have 20 cores). You can set this with a simple
environment variable. On all my cluster nodes I create the file:

*/etc/profile.d/ibsetcontext.sh*

And it contains:

# allow 2 processes to share a hardware MPI context
# in infiniband with PSM
*export PSM_RANKS_PER_CONTEXT=2*

Of course, if some people manage to oversubscribe the cores (more than
one process per core) the problem could arise again, but we do not
oversubscribe.

Hope this can help you.

Patrick



[OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Heinz, Michael William via users
Prentice,

Avoiding the obvious question of whether your FM is running and the fabric is
in an active state, it sounds like you're exhausting a resource on the cards.
Ralph is correct that support for QLogic cards is long past, but I'll see
what I can dig up in the archives on Monday to see if there's a parameter you
can adjust.

My vague recollection is that you shouldn't try to have more compute processes
than you have cores, as some resources are allocated on that basis. You might
also look at the modinfo output for the device driver to see if there are any
likely-looking suspects.

Honestly, chances are better that you'll get a hint from modinfo than that I'll
find a tuning guide lying around. Are these cards DDR or QDR?

Sent from my iPad