I fear those cards are past end-of-life so far as support is concerned. I'm not 
sure if anyone can really advise you on them. It sounds like the fabric is 
experiencing failures, but that's just a guess.


On May 8, 2020, at 12:56 PM, Prentice Bisbal via users 
<users@lists.open-mpi.org> wrote:

 

We often get the following errors when more than one job runs on the same 
compute node. We are using Slurm with Open MPI. The IB cards are QLogic, using 
PSM:
 

10698ipath_userinit: assign_context command failed: Network is down
 node01.10698can't open /dev/ipath, network down (err=26)
 node01.10703ipath_userinit: assign_context command failed: Network is down
 node01.10703can't open /dev/ipath, network down (err=26)
 node01.10701ipath_userinit: assign_context command failed: Network is down
 node01.10701can't open /dev/ipath, network down (err=26)
 node01.10700ipath_userinit: assign_context command failed: Network is down
 node01.10700can't open /dev/ipath, network down (err=26)
 node01.10697ipath_userinit: assign_context command failed: Network is down
 node01.10697can't open /dev/ipath, network down (err=26)
--------------------------------------------------------------------------
 PSM was unable to open an endpoint. Please make sure that the network link is
 active on the node and the hardware is functioning. 
 
 Error: Could not detect network connectivity
--------------------------------------------------------------------------

Any ideas on how to fix this?
 

 

-- 
Prentice 

 
