[OMPI users] handle_wc() in openib and IBV_WC_DRIVER2/MLX5DV_WC_RAW_WQE completion code

2022-02-22 Thread Crni Gorac via users
We've encountered OpenMPI crashing in handle_wc(), with the following error message:

    [.../opal/mca/btl/openib/btl_openib_component.c:3610:handle_wc] Unhandled work completion opcode is 136

Our setup is admittedly a little tricky, but I'm still worried that it may be a genuine problem, so please bear…
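For context on the opcode in the error message: in rdma-core's libibverbs, `enum ibv_wc_opcode` places `IBV_WC_RECV` at `1 << 7` (128) and the entries after it count upward, so 136 is `IBV_WC_DRIVER2`, the value mlx5dv exposes as `MLX5DV_WC_RAW_WQE` (as in the subject line). The C sketch below mirrors those values in a local enum (an assumption, so it compiles without the verbs headers) and adds two hypothetical helpers, `classify_wc_opcode()` and `report_unexpected_wc()`, that log a provider-specific opcode instead of failing outright. It is an illustration only, not Open MPI's `handle_wc()`.

```c
#include <stdio.h>

/* Local mirror of libibverbs' enum ibv_wc_opcode (rdma-core
 * <infiniband/verbs.h>). ASSUMPTION: values copied here so the sketch
 * builds without the verbs headers; only a subset is listed. */
enum wc_opcode {
    WC_SEND = 0,
    WC_RDMA_WRITE,          /* 1 */
    WC_RDMA_READ,           /* 2 */
    WC_COMP_SWAP,           /* 3 */
    WC_FETCH_ADD,           /* 4 */
    WC_BIND_MW,             /* 5 */
    WC_LOCAL_INV,           /* 6 */
    WC_TSO,                 /* 7 */
    WC_RECV = 1 << 7,       /* 128 */
    WC_RECV_RDMA_WITH_IMM,  /* 129 */
    WC_TM_ADD,              /* 130 */
    WC_TM_DEL,              /* 131 */
    WC_TM_SYNC,             /* 132 */
    WC_TM_RECV,             /* 133 */
    WC_TM_NO_TAG,           /* 134 */
    WC_DRIVER1,             /* 135 */
    WC_DRIVER2,             /* 136: mlx5dv calls this MLX5DV_WC_RAW_WQE */
    WC_DRIVER3              /* 137 */
};

/* Hypothetical coarse classes a completion handler could switch on. */
enum wc_class {
    WC_CLASS_SEND_SIDE, WC_CLASS_RECV, WC_CLASS_TAG_MATCHING,
    WC_CLASS_DRIVER, WC_CLASS_UNKNOWN
};

enum wc_class classify_wc_opcode(int opcode)
{
    switch (opcode) {
    case WC_SEND: case WC_RDMA_WRITE: case WC_RDMA_READ:
    case WC_COMP_SWAP: case WC_FETCH_ADD: case WC_BIND_MW:
    case WC_LOCAL_INV: case WC_TSO:
        return WC_CLASS_SEND_SIDE;
    case WC_RECV: case WC_RECV_RDMA_WITH_IMM:
        return WC_CLASS_RECV;
    case WC_TM_ADD: case WC_TM_DEL: case WC_TM_SYNC:
    case WC_TM_RECV: case WC_TM_NO_TAG:
        return WC_CLASS_TAG_MATCHING;
    case WC_DRIVER1: case WC_DRIVER2: case WC_DRIVER3:
        /* Provider-specific opcodes (e.g. from mlx5dv); a handler written
         * only for the generic opcodes has no case for these. */
        return WC_CLASS_DRIVER;
    default:
        return WC_CLASS_UNKNOWN;
    }
}

/* Hypothetical: log an unexpected opcode instead of aborting.
 * Returns 1 if the caller should treat the completion as an error. */
int report_unexpected_wc(int opcode)
{
    enum wc_class c = classify_wc_opcode(opcode);
    if (c == WC_CLASS_DRIVER || c == WC_CLASS_UNKNOWN) {
        fprintf(stderr, "unexpected work completion opcode %d%s\n", opcode,
                c == WC_CLASS_DRIVER ? " (provider-specific IBV_WC_DRIVERn)" : "");
        return 1;
    }
    return 0;
}
```

Note that 136 also has the `1 << 7` bit set, so a receive test of the form `opcode & IBV_WC_RECV` would match it; an explicit switch like this one avoids that ambiguity.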

[OMPI users] OMPI_COMM_WORLD_LOCAL_SIZE problem between PBS and MLNX_OFED

2022-01-18 Thread Crni Gorac via users
I'm using OpenMPI 4.1.2 from the MLNX_OFED_LINUX-5.5-1.0.3.2 distribution and have PBS 18.1.4 installed on my cluster (the cluster nodes run CentOS 7.9). When I submit a job that runs on two nodes in the cluster, both ranks get OMPI_COMM_WORLD_LOCAL_SIZE set to 2 instead of 1, and…
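One way to see what each rank was actually told: Open MPI's launcher exports `OMPI_COMM_WORLD_RANK`, `OMPI_COMM_WORLD_SIZE`, `OMPI_COMM_WORLD_LOCAL_RANK` and `OMPI_COMM_WORLD_LOCAL_SIZE` into every process it starts. The C sketch below (the helper names `env_int()` and `print_placement()` are hypothetical) prints those values next to the host name, so a two-node job shows whether the ranks really landed on different hosts.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read an integer environment variable, returning
 * `fallback` if it is unset, empty, or not a clean integer. */
int env_int(const char *name, int fallback)
{
    const char *s = getenv(name);
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return fallback;
    v = strtol(s, &end, 10);
    return (*end == '\0') ? (int)v : fallback;
}

/* Hypothetical helper: print the placement info the launcher exported
 * to this process, prefixed with the host name it is running on. */
void print_placement(void)
{
    char host[256] = "unknown";            /* zero-filled past "unknown" */
    gethostname(host, sizeof host - 1);    /* keep the last byte as NUL */

    printf("host=%s rank=%d/%d local_rank=%d local_size=%d\n", host,
           env_int("OMPI_COMM_WORLD_RANK", -1),
           env_int("OMPI_COMM_WORLD_SIZE", -1),
           env_int("OMPI_COMM_WORLD_LOCAL_RANK", -1),
           env_int("OMPI_COMM_WORLD_LOCAL_SIZE", -1));
}
```

Wrapped in a `main()` that only calls `print_placement()` and launched the same way as the real job, two lines with different host names but `local_size=2` would confirm the mismatch described above.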

Re: [OMPI users] OMPI_COMM_WORLD_LOCAL_SIZE problem between PBS and MLNX_OFED

2022-01-18 Thread Crni Gorac via users
> …that you are using the ssh launcher. What is odd is that you should wind up with two procs on the first node, in which case those envars are correct. If you are seeing one proc on each node, then something is wrong.
>
> On Jan 18, 2022, at 1:33 PM, Crni Gorac via users…
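The reasoning in this reply rests on what `OMPI_COMM_WORLD_LOCAL_SIZE` means: the number of the job's ranks that share the calling rank's node. The hypothetical C helper below computes that value from a rank-to-host list (the host names are made up for illustration), which makes the two cases in the reply concrete.

```c
#include <string.h>

/* Hypothetical helper: hosts[i] is the node rank i runs on. Returns how
 * many ranks share rank r's node, i.e. the value OMPI_COMM_WORLD_LOCAL_SIZE
 * is meant to report for rank r. */
int expected_local_size(const char *hosts[], int nranks, int r)
{
    int count = 0;
    for (int i = 0; i < nranks; i++)
        if (strcmp(hosts[i], hosts[r]) == 0)
            count++;
    return count;
}
```

With both ranks on `node1`, each rank's expected value is 2, so a reported 2 is correct; with one rank on `node1` and one on `node2`, each should see 1.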

Re: [OMPI users] OMPI_COMM_WORLD_LOCAL_SIZE problem between PBS and MLNX_OFED

2022-01-18 Thread Crni Gorac via users
> Afraid I can't understand your scenario: when you say you "submit a job" to run on two nodes, how many processes are you running on each node?
>
> On Jan 18, 2022, at 1:07 PM, Crni Gorac via users wrote:
>
> > Using OpenMPI…

Re: [OMPI users] OMPI_COMM_WORLD_LOCAL_SIZE problem between PBS and MLNX_OFED

2022-01-18 Thread Crni Gorac via users
> …"-host node1:5" assigns 5 slots to node1.
>
> If tm support is included, then we read the PBS allocation and see one slot on each node, and launch accordingly.
>
> On Jan 18, 2022, at 2:44 PM, Crni Gorac via users wrote:
>
> > OK,…
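The slot syntax mentioned in this reply can be illustrated with a small parser for a `name:slots` entry such as `node1:5`. This is a hypothetical C sketch, not Open MPI's host-list parser; treating a bare host name as one slot is an assumption about Open MPI 4.x behaviour.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for one "name[:slots]" host entry (e.g. "node1:5").
 * Copies the host name into `name` and returns the slot count.
 * ASSUMPTION: an entry without ":N" counts as 1 slot.
 * Returns -1 on malformed input or if the name does not fit. */
int parse_host_slots(const char *spec, char *name, size_t name_len)
{
    const char *colon = strchr(spec, ':');
    size_t n = colon ? (size_t)(colon - spec) : strlen(spec);

    if (n == 0 || n >= name_len)
        return -1;
    memcpy(name, spec, n);
    name[n] = '\0';
    if (colon == NULL)
        return 1;

    char *end;
    long slots = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || slots <= 0)
        return -1;
    return (int)slots;
}
```

Read this way, a PBS allocation of one slot on each of two nodes has the same slot counts as the entries `node1:1` and `node2:1`, which matches the reply's point that with tm support the launcher places one process per node.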