Re: [OMPI users] Problem in starting openmpi job - no output just hangs - SOLVED

2020-09-01 Thread Tony Ladd via users
Jeff

I found the solution - RDMA needs significant memory, so the limits on the shell have to be increased. I needed to add the lines

    * soft memlock unlimited
    * hard memlock unlimited

to the end of the file /etc/security/limits.conf. After that the openib driver loads and everything is fine.
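
For anyone hitting the same hang, here is a minimal sketch (not part of the original message) for checking whether the new limits actually reach a process. It assumes Python 3 on Linux and uses only the standard `resource` module; the helper names are made up for illustration. Note that `getrlimit` reports bytes, whereas `ulimit -l` in bash reports kilobytes.

```python
import resource


def format_limit(value):
    """Render an rlimit value in a readable form."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def memlock_is_unlimited():
    """True if the soft locked-memory limit is unlimited."""
    soft, _hard = memlock_limits()
    return soft == resource.RLIM_INFINITY


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"memlock soft limit: {format_limit(soft)}")
    print(f"memlock hard limit: {format_limit(hard)}")
    if not memlock_is_unlimited():
        # Assumption: a finite limit is the likely culprit, as in this thread.
        print("soft limit is finite; RDMA memory registration may fail")
```

Changes to limits.conf normally apply only to new login sessions, so it is worth running the check the same way the MPI processes are started (for example over ssh on each node) rather than only in an already open shell.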

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-25 Thread Jeff Squyres (jsquyres) via users
On Aug 24, 2020, at 9:44 PM, Tony Ladd wrote:
>
> I appreciate your help (and John's as well). At this point I don't think this is
> an OMPI problem - my mistake. I think the communication with RDMA is somehow
> disabled (perhaps it's the verbs layer - I am not very knowledgeable with
> this). It

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-25 Thread John Hearns via users
I apologise. That was an Omnipath issue:
https://www.beowulf.org/pipermail/beowulf/2017-March/034214.html

On Tue, 25 Aug 2020 at 08:17, John Hearns wrote:
> Aha. I dimly remember a problem with the ibverbs /dev device - maybe the
> permissions, or more likely the owner account for that device.

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-25 Thread John Hearns via users
Aha. I dimly remember a problem with the ibverbs /dev device - maybe the permissions, or more likely the owner account for that device.

On Tue, 25 Aug 2020 at 02:44, Tony Ladd wrote:
> Hi Jeff
>
> I appreciate your help (and John's as well). At this point I don't think
> this is an OMPI problem -

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-24 Thread Tony Ladd via users
Hi Jeff

I appreciate your help (and John's as well). At this point I don't think this is an OMPI problem - my mistake. I think the communication with RDMA is somehow disabled (perhaps it's the verbs layer - I am not very knowledgeable with this). It used to work like a dream but Mellanox has

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-24 Thread Jeff Squyres (jsquyres) via users
I'm afraid I don't have many better answers for you. I can't quite tell from your machines, but are you running IMB-MPI1 Sendrecv *on a single node* with `--mca btl openib,self`? I don't remember offhand, but I didn't think that openib was supposed to do loopback communication. E.g., if both

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-23 Thread Tony Ladd via users
Hi John

Thanks for the response. I have run all those diagnostics, and as best I can tell the IB fabric is OK. I have a cluster of 49 nodes (48 clients + server) and the fabric passes all the tests. There is 1 warning:

    I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-23 Thread John Hearns via users
Tony, start at a low level. Is the Infiniband fabric healthy? Run

    ibstatus on every node
    sminfo on one node
    ibdiagnet on one node

On Sun, 23 Aug 2020 at 05:02, Tony Ladd via users wrote:
> Hi Jeff
>
> I installed ucx as you suggested. But I can't get even the simplest code
>
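
The three checks John lists (ibstatus, sminfo, ibdiagnet) can be wrapped in a small script. What follows is only a sketch under stated assumptions: it runs each tool with no arguments on the local node, takes a zero exit code to mean success, and uses made-up names (`DIAGNOSTICS`, `run_tool`). It does not try to interpret the tools' output, and covering "every node" would still mean running it on each host.

```python
import shutil
import subprocess

# Tool names taken from the message above; invoked without arguments.
DIAGNOSTICS = ["ibstatus", "sminfo", "ibdiagnet"]


def run_tool(name, timeout=120):
    """Run one tool and return (status, output).

    status is one of "missing", "ok", "failed", or "timeout".
    """
    path = shutil.which(name)
    if path is None:
        # Tool is not installed or not on PATH.
        return "missing", ""
    try:
        proc = subprocess.run(
            [path], capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    except OSError as exc:
        # For example, the file exists but cannot be executed.
        return "failed", str(exc)
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr


if __name__ == "__main__":
    for tool in DIAGNOSTICS:
        status, output = run_tool(tool)
        print(f"== {tool}: {status}")
        if output:
            print(output.rstrip())
```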

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-22 Thread Tony Ladd via users
Hi Jeff

I installed UCX as you suggested. But I can't get even the simplest code (ucp_client_server) to work across the network. I can compile Open MPI with UCX but it has the same problem - MPI codes will not execute and there are no messages. Really, UCX is not helping. It is adding another

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-19 Thread Jeff Squyres (jsquyres) via users
Tony -- Have you tried compiling Open MPI with UCX support? This is Mellanox's (NVIDIA's) preferred mechanism for InfiniBand support these days -- the openib BTL is legacy. You can run:

    mpirun --mca pml ucx ...

> On Aug 19, 2020, at 12:46 PM, Tony Ladd via users wrote:
>
> One other

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-19 Thread Tony Ladd via users
One other update. I compiled OpenMPI-4.0.4. The outcome was the same, but there is no mention of ibv_obj this time.

Tony

--
Tony Ladd
Chemical Engineering Department
University of Florida
Gainesville, Florida 32611-6005
USA

Email: tladd-"(AT)"-che.ufl.edu
Web    http://ladd.che.ufl.edu
Tel:

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-17 Thread Tony Ladd via users
My apologies - I did not read the FAQs carefully enough - with regard to 14:

1. openib
2. Ubuntu supplied drivers etc.
3. Ubuntu 18.04  4.15.0-112-generic
4. opensm-3.3.5_mlnx-0.1.g6b18e73
5. Attached
6. Attached
7. unlimited on foam and 16384 on f34

I changed the ulimit to unlimited on