I fear those cards are past end-of-life so far as support is concerned. I'm not sure if anyone can really advise you on them. It sounds like the fabric is experiencing failures, but that's just a guess.

On May 8, 2020, at 12:56 PM, Prentice Bisbal via users <users@lists.open-mpi.org> wrote:

We often get the following errors when more than one job runs on the same compute node. We are using Slurm with OpenMPI. The IB cards are QLogic using PSM:

10698ipath_userinit: assign_context command failed: Network is down
node01.10698can't open /dev/ipath, network down (err=26)
node01.10703ipath_userinit: assign_context command failed: Network is down
node01.10703can't open /dev/ipath, network down (err=26)
node01.10701ipath_userinit: assign_context command failed: Network is down
node01.10701can't open /dev/ipath, network down (err=26)
node01.10700ipath_userinit: assign_context command failed: Network is down
node01.10700can't open /dev/ipath, network down (err=26)
node01.10697ipath_userinit: assign_context command failed: Network is down
node01.10697can't open /dev/ipath, network down (err=26)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network
link is active on the node and the hardware is functioning.

Error: Could not detect network connectivity
--------------------------------------------------------------------------

Any Ideas how to fix this?

--
Prentice