Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-15 Thread Gilles Gouaillardet
The long story is you always need a subnet manager to initialize the fabric. That means you can run the subnet manager and stop it once each HCA has been assigned a LID. In that case, the commands that interact with the SM (ibhosts, ibdiagnet) will obviously fail. Cheers, Gilles On
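A minimal sketch of that workflow, assuming opensm ships as a systemd service (the service name is an assumption and varies by distribution):

    systemctl start opensm   # SM sweeps the fabric and assigns LIDs (service name may differ)
    ibstat                   # wait until the port shows State: Active and a nonzero Base lid
    systemctl stop opensm    # point-to-point traffic keeps working, but ibhosts/ibdiagnet now fail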

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-15 Thread John Hearns via users
Xie, as far as I know you need to run OpenSM even with just two hosts. On 15 May 2018 at 03:29, Blade Shieh wrote: > Hi, John: > > You are right on the network framework. I have no IB switch and just > connect the servers with an IB cable. I did not even start the opensmd >

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-15 Thread Blade Shieh
Hi Gilles, Thank you for pointing out my error with *-N*. And you are right: I had started the opensmd service before, so the link could be brought up correctly. But many IB-related commands cannot be executed correctly, like ibhosts and ibdiagnet. As for the pml, I am pretty sure I was using ob1, because
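One way to confirm which pml was actually selected (a sketch; pml_base_verbose is a standard Open MPI framework-verbosity knob, but the exact wording of the log line varies across releases):

    mpirun --mca pml_base_verbose 10 -n 2 ./osu_latency 2>&1 | grep -i pml
    # look for the line reporting that the ob1 component was selected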

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread Gilles Gouaillardet
Xie Bin, According to the man page, -N is equivalent to npernode, which is equivalent to --map-by ppr:N:node. This is *not* equivalent to --map-by node: the former packs tasks onto the same node, and the latter scatters tasks across the nodes. [gilles@login ~]$ mpirun --host n0:2,n1:2 -N
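An illustration of the difference, reusing the hosts n0 and n1 from the prompt above (the comments show the expected mapping, not captured output):

    mpirun --host n0:2,n1:2 -np 4 --map-by ppr:2:node hostname   # packs: ranks 0,1 on n0; ranks 2,3 on n1 (same as -N 2)
    mpirun --host n0:2,n1:2 -np 4 --map-by node hostname         # scatters round-robin: ranks 0,2 on n0; ranks 1,3 on n1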

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread Blade Shieh
Hi, George: My command lines are:
1) single node:
   mpirun --allow-run-as-root -mca btl self,tcp (or openib) -mca btl_tcp_if_include eth2 -mca btl_openib_if_include mlx5_0 -x OMP_NUM_THREADS=2 -n 32 myapp
2) 2-node cluster:
   mpirun --allow-run-as-root -mca btl ^tcp (or ^openib) -mca btl_tcp_if_include

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread Blade Shieh
Hi, John: You are right on the network framework. I have no IB switch and just connect the servers with an IB cable. I did not even start the opensmd service because it seemed unnecessary in this situation. Can this be the reason why IB performs worse? Interconnection details are in the
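A quick check for a missing subnet manager (a sketch; mlx5_0 is the device name from the mpirun commands earlier in the thread):

    ibstat mlx5_0
    # 'Physical state: LinkUp' together with 'State: Initializing' means the cable
    # is fine but no SM has assigned a LID, so the port never becomes Active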

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread George Bosilca
Shared memory communication is important for multi-core platforms, especially when you have multiple processes per node. But this is only part of your issue here. You haven't specified how your processes will be mapped onto your resources. As a result, ranks 0 and 1 will be on the same node, so you
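For a two-rank latency test, one way to force the ranks onto different nodes so the measurement actually crosses the IB link (hostnames are placeholders):

    mpirun -np 2 --host n0,n1 --map-by node ./osu_latency   # rank 0 on n0, rank 1 on n1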

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread John Hearns via users
Xie Bin, I do hate to ask this. You say "in a two-node cluster (IB direct-connected)." Does that mean that you have no IB switch, and that there is a single IB cable joining up these two servers? If so, please run: ibstatus, ibhosts, ibdiagnet. I am trying to check if the IB fabric is

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread Blade Shieh
Hi, Nathan: Thanks for your reply. 1) It was my mistake not to notice the usage of osu_latency. Now it works well, but the latency is still worse with openib. 2) I did not use sm or vader because I wanted to compare performance between tcp and openib. Besides, I will run the application on a cluster, so vader is not

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-13 Thread Nathan Hjelm
I see several problems: 1) osu_latency only works with two procs. 2) You explicitly excluded shared memory support by specifying only self and openib (or tcp). If you want to just disable tcp or openib, use --mca btl ^tcp or --mca btl ^openib. Also, it looks like you have multiple ports active
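A sketch of that exclusive selection, reusing myapp and the interface names from the commands earlier in the thread:

    mpirun -n 32 --mca btl ^openib --mca btl_tcp_if_include eth2 ./myapp     # TCP test; self and vader stay available
    mpirun -n 32 --mca btl ^tcp --mca btl_openib_if_include mlx5_0 ./myapp   # openib test; shared memory still used intra-node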