On Sep 12, 2018, at 4:54 AM, Balázs Hajgató <balazs.hajg...@vub.be> wrote:
>
> Setting mca oob to tcp works. I will stick to this solution in our production
> environment.

Great!

> I am not sure that it is relevant, but I also tried the patch on a
> non-production OpenMPI 3.1.1, and "mpirun -host nic114,nic151 hostname"
> works without any parameters, but issues the libibverbs error (libibverbs:
> GRH is mandatory For RoCE address handle)

Yeah, I didn't think the fix I did would get rid of that warning. I didn't
dig any deeper in the "ud" oob plugin than looking for that double free.

> However, if I enforce mca oob ud, then it does not work; it hangs after
> issuing the error:
> [nic151:23609] [[45140,0],2] ORTE_ERROR_LOG: Unreachable in file
> oob_ud_send.c at line 141

Somehow the ud oob plugin is failing to make a UD IB verbs handle to contact
all the other possible interfaces for the peer. I'm not sure why that is
happening -- perhaps IPoIB isn't set up? This is likely a question for your
IB support people.

It shouldn't be hanging, either, but that's unlikely to get fixed,
unfortunately (i.e., because this is an uncommon error and because the "ud"
oob component is EOLed / will be removed in Open MPI v4.0.0 -- it's on its
last, dying breaths in the v3.1.x series...).

--
Jeff Squyres
jsquy...@cisco.com

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users