Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-15 Thread Jeff Squyres (jsquyres)
On May 15, 2018, at 1:39 AM, Max Mellette wrote: > > Thanks everyone for all your assistance. The problem seems to be resolved > now, although I'm not entirely sure why these changes made a difference. > There were two things I changed: > > (1) I had some additional `export

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-15 Thread Gustavo Correa
Hi Max, Name resolution in /etc/hosts is a simple solution for (2). I hope this helps, Gus > On May 15, 2018, at 01:39, Max Mellette wrote: > > Thanks everyone for all your assistance. The problem seems to be resolved > now, although I'm not entirely sure why these changes
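As a rough illustration of the /etc/hosts suggestion above, the sketch below parses hosts-file text and reports which required host names have no entry. The host names (node1, node2, node3) and the addresses are hypothetical placeholders, not taken from the thread, and the helper names parse_hosts and missing_hosts are made up for this example.

```python
def parse_hosts(text):
    """Map each host name or alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # keep the first entry for a name
    return table


def missing_hosts(text, required):
    """Return the required host names that have no entry in the hosts text."""
    table = parse_hosts(text)
    return [name for name in required if name not in table]


# Hypothetical hosts-file content; on a real machine, read /etc/hosts instead.
SAMPLE = """\
127.0.0.1     localhost
192.168.1.10  node1   # hypothetical compute node
192.168.1.11  node2
"""

if __name__ == "__main__":
    print(missing_hosts(SAMPLE, ["node1", "node2", "node3"]))
```

To check a real machine, the same functions could be fed the contents of /etc/hosts on every host named in the mpirun host list, since each node needs to resolve the others.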

Re: [OMPI users] peformance abnormality with openib and tcp framework

2018-05-15 Thread Gilles Gouaillardet
The long story is that you always need a subnet manager to initialize the fabric. That means you can run the subnet manager once and then stop it, so that each HCA is assigned a LID. In that case, the commands that interact with the SM (ibhosts, ibdiagnet) will obviously fail. Cheers, Gilles On

Re: [OMPI users] peformance abnormality with openib and tcp framework

2018-05-15 Thread John Hearns via users
Xie, as far as I know you need to run OpenSM even on two hosts. On 15 May 2018 at 03:29, Blade Shieh wrote: > Hi, John: > > You are right on the network framework. I do have no IB switch and just > connect the servers with an IB cable. I did not even open the opensmd >

Re: [OMPI users] peformance abnormality with openib and tcp framework

2018-05-15 Thread Blade Shieh
Hi Gilles, Thank you for pointing out my error on *-N*. You are also right that I started the opensmd service earlier, so the link could come up correctly. However, many IB-related commands, such as ibhosts and ibdiagnet, cannot be executed correctly. As for pml, I am pretty sure I was using ob1, because