I'm afraid I don't have many better answers for you. I can't quite tell from your messages, but are you running IMB-MPI1 Sendrecv *on a single node* with `--mca btl openib,self`?
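(To be concrete, I mean a run entirely on one host along these lines, where the process count and benchmark path are just placeholders:

    mpirun -np 2 --mca btl openib,self ./IMB-MPI1 Sendrecv

i.e., with no shared-memory BTL in the list.)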
I don't remember offhand, but I didn't think that openib was supposed to do loopback communication. E.g., if both MPI processes are on the same node, `--mca btl openib,vader,self` should do the trick (where "vader" = the shared-memory BTL).

More specifically: are you running into a problem running openib (and/or UCX) across multiple nodes?

I can't speak to Nvidia's support for various models of [older] hardware (including UCX support on that hardware). But be aware that openib is definitely going away; it is being wholly replaced by UCX. It may be that your only option is to stick with older software stacks in these hardware environments.
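To make that concrete, here is roughly what I have in mind (the host names, process counts, and IMB binary path are just placeholders for your setup, and `--mca pml ucx` assumes a UCX-enabled build such as your self-compiled 4.0.4):

    # single node: vader (shared memory) carries the on-node traffic
    mpirun -np 2 --mca btl openib,vader,self ./IMB-MPI1 Sendrecv

    # two nodes, one rank per node: openib carries the inter-node traffic
    mpirun -np 2 --host node01,node02 --mca btl openib,vader,self ./IMB-MPI1 Sendrecv

    # the same two-node test over UCX instead of the openib BTL
    mpirun -np 2 --host node01,node02 --mca pml ucx ./IMB-MPI1 Sendrecv

Adding `--mca btl_base_verbose 100` to the openib runs should show which BTLs get selected or excluded on each node, which may tell us more about why openib hangs on your compute nodes but not on your server. And if you do revisit UCX, `ucx_info -d` lists the transports and devices it detects, and setting UCX_TLS (e.g. `UCX_TLS=rc,sm,self`) restricts which transports it will use, which gives back some of the control you mention losing.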
> On Aug 23, 2020, at 9:46 PM, Tony Ladd via users <users@lists.open-mpi.org> wrote:
>
> Hi John
>
> Thanks for the response. I have run all those diagnostics, and as best I can tell the IB fabric is OK. I have a cluster of 49 nodes (48 clients + server) and the fabric passes all the tests. There is 1 warning:
>
> -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
> -W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps
>
> but according to a number of sources this is harmless.
>
> I have run Mellanox's P2P performance tests (ib_write_bw) between different pairs of nodes and it reports 3.22 GB/sec, which is reasonable (it's a PCIe 2 x8 interface, i.e. 4 GB/s). I have also configured 2 nodes back to back to check that the switch is not the problem - it makes no difference.
>
> I have been playing with the btl params with openMPI (v. 2.1.1, which is what is released in Ubuntu 18.04). So with tcp as the transport layer everything works fine - 1 node or 2 node communication - I have tested up to 16 processes (8+8) and it seems fine. Of course the latency is much higher on the tcp interface, so I would still like to access the RDMA layer. But unless I exclude the openib module, it always hangs. Same with OpenMPI v4 compiled from source.
>
> I think an important component is that Mellanox has not supported Connect X2 for some time. This is really infuriating; a $500 network card with no supported drivers, but that is business for you I suppose. I have 50 NICs and I can't afford to replace them all. The other component is that MLNX-OFED is tied to specific software versions, so I can't just run an older set of drivers. I have not seen source files for the Mellanox drivers - I would take a crack at compiling them if I did. In the past I have used the OFED drivers (on CentOS 5) with no problem, but I don't think this is an option now.
>
> Ubuntu claims to support Connect X2 with their drivers (Mellanox confirms this), but of course this is community support and the number of cases is obviously small. I use the Ubuntu drivers right now because the OFED install seems broken and there is no help with it. It's not supported! Neat huh?
>
> The only handle I have is with openmpi v. 2, where there is a message (see my original post) that ibv_obj returns a NULL result. But I don't understand the significance of the message (if any).
>
> I am not enthused about UCX - the documentation has several obvious typos in it, which is not encouraging when you are floundering. I know it's a newish project, but I have used openib for 10+ years and it has never had a problem until now. I think this is not so much openib as the software below.
>
> One other thing I should say is that if I run any recent version of mstflint it always complains:
>
> Failed to identify the device - Can not create SignatureManager!
>
> Going back to my original OFED 1.5 this did not happen, but they are at v5 now.
>
> Everything else works as far as I can see. But I could not burn new firmware except by going back to the 1.5 OS. Perhaps this is connected with the ibv_obj = NULL result.
>
> Thanks for helping out. As you can see I am rather stuck.
>
> Best
>
> Tony
>
> On 8/23/20 3:01 AM, John Hearns via users wrote:
>> [External Email]
>>
>> Tony, start at a low level. Is the Infiniband fabric healthy? Run
>> ibstatus on every node
>> sminfo on one node
>> ibdiagnet on one node
>>
>> On Sun, 23 Aug 2020 at 05:02, Tony Ladd via users <users@lists.open-mpi.org> wrote:
>>
>> Hi Jeff
>>
>> I installed ucx as you suggested. But I can't get even the simplest code (ucp_client_server) to work across the network. I can compile openMPI with UCX but it has the same problem - mpi codes will not execute and there are no messages. Really, UCX is not helping. It is adding another (not so well documented) software layer, which does not offer better diagnostics as far as I can see. It's also unclear to me how to control what drivers are being loaded - UCX wants to make that decision for you. With openMPI I can see that (for instance) the tcp module works both locally and over the network - it must be using the Mellanox NIC for the bandwidth it is reporting on IMB-MPI1 even with tcp protocols. But if I try to use openib (or allow ucx or openmpi to choose the transport layer) it just hangs. Annoyingly I have this server where everything works just fine - I can run locally over openib and it's fine. All the other nodes cannot seem to load openib, so even local jobs fail.
>>
>> The only good (as best I can tell) diagnostic is from openMPI. ibv_obj (from v2.x) complains that openib returns a NULL object, whereas on my server it returns logical_index=1. Can we not try to diagnose the problem with openib not loading (see my original post for details)? I am pretty sure if we can that would fix the problem.
>>
>> Thanks
>>
>> Tony
>>
>> PS I tried configuring two nodes back to back to see if it was a switch issue, but the result was the same.
>>
>> On 8/19/20 1:27 PM, Jeff Squyres (jsquyres) wrote:
>> > [External Email]
>> >
>> > Tony --
>> >
>> > Have you tried compiling Open MPI with UCX support? This is Mellanox's (NVIDIA's) preferred mechanism for InfiniBand support these days -- the openib BTL is legacy.
>> >
>> > You can run: mpirun --mca pml ucx ...
>> >
>> >> On Aug 19, 2020, at 12:46 PM, Tony Ladd via users <users@lists.open-mpi.org> wrote:
>> >>
>> >> One other update. I compiled OpenMPI-4.0.4. The outcome was the same, but there is no mention of ibv_obj this time.
>> >>
>> >> Tony
>
> --
> Tony Ladd
>
> Chemical Engineering Department
> University of Florida
> Gainesville, Florida 32611-6005
> USA
>
> Email: tladd-"(AT)"-che.ufl.edu
> Web http://ladd.che.ufl.edu
>
> Tel: (352)-392-6509
> FAX: (352)-392-9514

--
Jeff Squyres
jsquy...@cisco.com