Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Max Mellette
Thanks everyone for all your assistance. The problem seems to be resolved now, although I'm not entirely sure why these changes made a difference. There were two things I changed: (1) I had some additional `export ...` lines in .bashrc before the `export PATH=...` and `export LD_LIBRARY_PATH=...`
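Since the fix involved reordering exports in .bashrc, here is a minimal sketch of the ordering that typically matters for non-interactive ssh launches; the install prefix shown is hypothetical and must be replaced with the actual OpenMPI location:

```shell
# Hypothetical prefix /opt/openmpi-3.0.1 -- substitute your real install path.
# Place these near the top of ~/.bashrc, before any other exports and before
# any early-return guard for non-interactive shells, so that
# `ssh host orted ...` launched by mpirun still picks them up.
export PATH=/opt/openmpi-3.0.1/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi-3.0.1/lib:$LD_LIBRARY_PATH
```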

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread Gilles Gouaillardet
Xie Bin, According to the man page, -N is equivalent to --npernode, which is equivalent to --map-by ppr:N:node. This is *not* equivalent to --map-by node: the former packs tasks onto the same node, and the latter scatters tasks across the nodes. [gilles@login ~]$ mpirun --host n0:2,n1:2 -N
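The pack-versus-scatter distinction above can be simulated; a minimal sketch, assuming two hypothetical nodes "n0" and "n1" with two slots each:

```python
# Sketch of the two placement policies described above, simulated in plain
# Python (node names and slot counts are illustrative assumptions).

def map_by_ppr(nodes, ppr, nranks):
    """--map-by ppr:N:node (same as -N / --npernode): pack N consecutive
    ranks onto each node before moving to the next."""
    return [nodes[rank // ppr] for rank in range(nranks)]

def map_by_node(nodes, nranks):
    """--map-by node: scatter ranks round-robin across the nodes."""
    return [nodes[rank % len(nodes)] for rank in range(nranks)]

nodes = ["n0", "n1"]
print(map_by_ppr(nodes, 2, 4))  # ['n0', 'n0', 'n1', 'n1'] -- packed
print(map_by_node(nodes, 4))    # ['n0', 'n1', 'n0', 'n1'] -- scattered
```

With packing, ranks 0 and 1 share a node and can use shared memory; with scattering they sit on different nodes and must use the network, which is one reason the two mappings benchmark so differently.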

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread Blade Shieh
Hi, George: My command lines are: 1) single node: mpirun --allow-run-as-root -mca btl self,tcp (or openib) -mca btl_tcp_if_include eth2 -mca btl_openib_if_include mlx5_0 -x OMP_NUM_THREADS=2 -n 32 myapp 2) 2-node cluster: mpirun --allow-run-as-root -mca btl ^tcp (or ^openib) -mca btl_tcp_if_include

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread Blade Shieh
Hi, John: You are right about the network setup. I have no IB switch and just connect the two servers with an IB cable. I did not even start the opensmd service, because it seems unnecessary in this situation. Could this be the reason why IB performs worse? Interconnection details are in the

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Gilles Gouaillardet
In the initial report, the /usr/bin/ssh process was in the 'T' state (which generally hints that the process has been attached to by a debugger). /usr/bin/ssh -x b09-32 orted did behave as expected (e.g. orted was executed, exited with an error since the command line was invalid, and the error message was received)
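The 'T' state being discussed can be reproduced deliberately; a minimal Linux-only sketch, using SIGSTOP as a stand-in for a debugger attach (a real ptrace attach shows up as 't' on newer kernels):

```python
# Put a child process into the stopped ('T') state that ps reports --
# the same state the hung ssh process showed. Linux-only (reads /proc).
import os
import signal
import subprocess
import time

child = subprocess.Popen(["sleep", "30"])
os.kill(child.pid, signal.SIGSTOP)   # stand-in for a debugger stopping it
time.sleep(0.2)                      # give the kernel time to update state

with open(f"/proc/{child.pid}/stat") as f:
    # Format is "pid (comm) STATE ..."; take the field after the comm.
    state = f.read().split(")")[-1].split()[0]
print(state)  # 'T' while stopped

os.kill(child.pid, signal.SIGCONT)   # resume and clean up
child.terminate()
child.wait()
```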

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Jeff Squyres (jsquyres)
Yes, that "T" state is quite puzzling. You didn't attach a debugger or hit the ssh with a signal, did you? (we had a similar situation on the devel list recently, but it only happened with a very old version of Slurm. We concluded that it was a SLURM bug that has since been fixed. And just

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread r...@open-mpi.org
You got that error because the orted is looking for its rank on the cmd line and not finding it. > On May 14, 2018, at 12:37 PM, Max Mellette wrote: > > Hi Gus, > > Thanks for the suggestions. The correct version of openmpi seems to be > getting picked up; I also

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Max Mellette
Hi Gus, Thanks for the suggestions. The correct version of openmpi seems to be getting picked up; I also prepended .bashrc with the installation path like you suggested, but it didn't seem to help: user@b09-30:~$ cat .bashrc export

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Gus Correa
Hi Max, Just in case, as environment mix-ups often happen: could it be that you are inadvertently picking up another installation of OpenMPI, perhaps installed from packages in /usr or /usr/local? That's easy to check with 'which mpiexec' or 'which mpicc', for instance. Have you tried to prepend (as

Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-14 Thread Nathan Hjelm
Still looks to me like MPI_Scan is what you want. You just need three additional communicators (one for each direction). With a recursive-doubling MPI_Scan implementation it is O(log n) in time, compared to O(n). > On May 14, 2018, at 8:42 AM, Pierre Gubernatis >
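The recursive-doubling scan mentioned above can be sketched without MPI by simulating all ranks in one process; a minimal illustration (the function name is mine, not an MPI API):

```python
# Sketch of a recursive-doubling inclusive scan (sum), the kind of
# algorithm an MPI_Scan implementation can use: O(log n) communication
# rounds instead of an O(n) chain of point-to-point sends.

def recursive_doubling_scan(values):
    vals = list(values)
    n = len(vals)
    dist = 1
    while dist < n:
        # Round k: every rank r >= 2^k receives the partial sum held by
        # rank r - 2^k at the start of the round and adds it to its own.
        prev = list(vals)
        for r in range(n):
            if r - dist >= 0:
                vals[r] = prev[r - dist] + prev[r]
        dist *= 2
    return vals

print(recursive_doubling_scan([1, 2, 3, 4]))  # [1, 3, 6, 10]
```

For 4 ranks this takes 2 rounds (dist = 1, then 2), matching the O(log n) bound; the naive chained version would take 3 sequential steps.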

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread George Bosilca
Shared memory communication is important for multi-core platforms, especially when you have multiple processes per node. But this is only part of your issue here. You haven't specified how your processes will be mapped onto your resources. As a result, ranks 0 and 1 will be on the same node, so you

Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-14 Thread Pierre Gubernatis
Thank you to all of you for your answers (I was away until now). Actually my question wasn't well posed. I stated it more clearly in this post, with the answer:

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Max Mellette
John, Thanks for the suggestions. In this case there is no cluster manager / job scheduler; these are just a couple of individual hosts in a rack. The reason for the generic names is that I anonymized the full network address in the previous posts, truncating to just the host name. My home

Re: [OMPI users] Problem running with UCX/oshmem on single node?

2018-05-14 Thread Michael Di Domenico
On Wed, May 9, 2018 at 9:45 PM, Howard Pritchard wrote: > > You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a > switch), and install that > on your system, or else install xpmem (https://github.com/hjelmn/xpmem). > Note there is a bug right now > in

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread John Hearns via users
Xie Bin, I do hate to ask this. You say "in a two-node cluster (IB direct-connected)." Does that mean that you have no IB switch, and that there is a single IB cable joining up these two servers? If so, please run: ibstatus, ibhosts, ibdiagnet. I am trying to check if the IB fabric is

Re: [OMPI users] performance abnormality with openib and tcp framework

2018-05-14 Thread Blade Shieh
Hi, Nathan: Thanks for your reply. 1) It was my mistake not to notice the usage of osu_latency. Now it works well, but is still worse with openib. 2) I did not use sm or vader because I wanted to compare performance between tcp and openib. Besides, I will run the application on a cluster, so vader is not

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread John Hearns via users
One very, very stupid question here. This arose over on the Slurm list actually. Those hostnames look like quite generic names, i.e. they are part of an HPC cluster? Do they happen to have independent home directories for your userid? Could that possibly make a difference to the MPI launcher? On