On Sep 12, 2018, at 4:54 AM, Balázs Hajgató <balazs.hajg...@vub.be> wrote:
> 
> Setting mca oob to tcp works. I will stick to this solution in our production 
> environment.

Great!
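
For the archives, here is a minimal sketch of making that setting stick, 
assuming a bash-like shell.  The host names are the ones from this thread; 
nothing else here is specific to your site.  The same value can also go in 
$HOME/.openmpi/mca-params.conf or in the installation's 
etc/openmpi-mca-params.conf as a line reading "oob = tcp", which is probably 
the better fit for a production install.

```shell
# Per-shell default: any mpirun launched from this environment reads
# OMPI_MCA_oob and selects the tcp out-of-band component.
export OMPI_MCA_oob=tcp

# One-off equivalent on the command line:
#   mpirun --mca oob tcp -host nic114,nic151 hostname
```

Whichever form you use, a value given on the mpirun command line takes 
precedence over the environment variable, which in turn takes precedence over 
the parameter files.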

> I am not sure that it is relevant, but I also tried the patch on a 
> non-production Open MPI 3.1.1, and "mpirun -host nic114,nic151 hostname" 
> works without any parameters, although it still prints the libibverbs error 
> (libibverbs: GRH is mandatory For RoCE address handle).

Yeah, I didn't think the fix I did would get rid of that warning.  I didn't dig 
any deeper in the "ud" oob plugin than looking for that double free.

> However, if I force mca oob ud, then it does not work; it hangs after 
> issuing this error:
> [nic151:23609] [[45140,0],2] ORTE_ERROR_LOG: Unreachable in file 
> oob_ud_send.c at line 141

Somehow the ud oob plugin is failing to make a UD IB verbs handle to contact 
all the other possible interfaces for the peer.  I'm not sure why that is 
happening -- perhaps IPoIB isn't set up?  This is likely a question for your IB 
support people.

It shouldn't be hanging, either, but that's unlikely to get fixed, 
unfortunately (i.e., because this is an uncommon error and because the "ud" oob 
component is EOLed / will be removed in Open MPI v4.0.0 -- it's on its last, 
dying breaths in the v3.1.x series...).

-- 
Jeff Squyres
jsquy...@cisco.com
