I apologise. That was an Omnipath issue https://www.beowulf.org/pipermail/beowulf/2017-March/034214.html
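
For what it's worth, the ibverbs /dev check I was alluding to below is just something along these lines (typical device paths - adjust to whatever your nodes actually show):

    ls -l /dev/infiniband/
    # uverbs0, rdma_cm etc. need to be readable/writable by the user running
    # MPI - often world rw, or via an "rdma" group, e.g.
    #   crw-rw-rw- 1 root root ... /dev/infiniband/uverbs0
    id    # check which groups that user is actually in

Though since ib_write_bw works for you, the device permissions are probably not the culprit.
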
On Tue, 25 Aug 2020 at 08:17, John Hearns <hear...@gmail.com> wrote:

> Aha. I dimly remember a problem with the ibverbs /dev device - maybe the
> permissions, or more likely the owner account for that device.
>
> On Tue, 25 Aug 2020 at 02:44, Tony Ladd <tl...@che.ufl.edu> wrote:
>
>> Hi Jeff
>>
>> I appreciate your help (and John's as well). At this point I don't think
>> this is an OMPI problem - my mistake. I think the communication with RDMA
>> is somehow disabled (perhaps it's the verbs layer - I am not very
>> knowledgeable about this). It used to work like a dream, but Mellanox has
>> apparently disabled some of the Connect X2 components, because neither
>> ompi nor ucx (with/without ompi) could connect with the RDMA layer. Some
>> of the InfiniBand tools are also not working on the X2 (mstflint,
>> mstconfig).
>>
>> In fact ompi always tries to access the openib module. I have to
>> explicitly disable it even to run on 1 node. So I think the problem lies
>> in initialization, not communication. This is why (I think) ibv_obj
>> returns NULL. The better news is that with the tcp stack everything works
>> fine (ompi, ucx, 1 node, many nodes) - the bandwidth is similar to rdma,
>> so for large messages it's semi-OK. It's a partial solution - not all I
>> wanted, of course. The direct rdma functions (ib_read_lat etc.) also work
>> fine, with the expected results. I suspect this disabling of the driver
>> is a commercial rather than a technical decision.
>>
>> I am going to try going back to Ubuntu 16.04 - there is a version of
>> OFED that still supports the X2. But I think it may still get messed up
>> by kernel upgrades (it does for 18.04, I found). So it's not an easy
>> path.
>>
>> Thanks again.
>>
>> Tony
>>
>> On 8/24/20 11:35 AM, Jeff Squyres (jsquyres) wrote:
>>>
>>> I'm afraid I don't have many better answers for you.
>>>
>>> I can't quite tell from your machines, but are you running IMB-MPI1
>>> Sendrecv *on a single node* with `--mca btl openib,self`?
>>>
>>> I don't remember offhand, but I didn't think that openib was supposed
>>> to do loopback communication. E.g., if both MPI processes are on the
>>> same node, `--mca btl openib,vader,self` should do the trick (where
>>> "vader" = shared memory support).
>>>
>>> More specifically: are you running into a problem running openib
>>> (and/or UCX) across multiple nodes?
>>>
>>> I can't speak to Nvidia support on various models of [older] hardware
>>> (including UCX support on that hardware). But be aware that openib is
>>> definitely going away; it is wholly being replaced by UCX. It may be
>>> that your only option is to stick with older software stacks in these
>>> hardware environments.
>>>
>>> On Aug 23, 2020, at 9:46 PM, Tony Ladd via users
>>> <users@lists.open-mpi.org> wrote:
>>>>
>>>> Hi John
>>>>
>>>> Thanks for the response. I have run all those diagnostics, and as best
>>>> I can tell the IB fabric is OK. I have a cluster of 49 nodes (48
>>>> clients + server) and the fabric passes all the tests. There is 1
>>>> warning:
>>>>
>>>> I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
>>>> -W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps
>>>>
>>>> but according to a number of sources this is harmless.
>>>>
>>>> I have run Mellanox's P2P performance tests (ib_write_bw) between
>>>> different pairs of nodes and it reports 3.22 GB/sec, which is
>>>> reasonable (it's a PCIe 2 x8 interface, i.e. 4 GB/s).
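>>>>
>>>> (For reference, those point-to-point runs were along these lines - a
>>>> rough sketch only: mlx4_0 is just the name a Connect X2 usually gets
>>>> under the mlx4 driver, so substitute whatever ibv_devices reports, and
>>>> node01 is a placeholder hostname.
>>>>
>>>>     ib_write_bw -d mlx4_0             # on the "server" node
>>>>     ib_write_bw -d mlx4_0 node01      # on the other node, pointing at the first
>>>>     ib_read_lat -d mlx4_0 node01      # same pattern for the latency test
>>>>
>>>> So the verbs path itself can clearly deliver full bandwidth.)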
>>>>
>>>> I have also configured 2 nodes back to back to check that the switch is
>>>> not the problem - it makes no difference.
>>>>
>>>> I have been playing with the btl params with openMPI (v. 2.1.1, which
>>>> is what is released in Ubuntu 18.04). With tcp as the transport layer
>>>> everything works fine - 1 node or 2 node communication - I have tested
>>>> up to 16 processes (8+8) and it seems fine. Of course the latency is
>>>> much higher on the tcp interface, so I would still like to access the
>>>> RDMA layer. But unless I exclude the openib module, it always hangs.
>>>> Same with OpenMPI v4 compiled from source.
>>>>
>>>> I think an important factor is that Mellanox has not supported the
>>>> Connect X2 for some time. This is really infuriating: a $500 network
>>>> card with no supported drivers, but that is business for you, I
>>>> suppose. I have 50 NICs and I can't afford to replace them all. The
>>>> other factor is that MLNX-OFED is tied to specific software versions,
>>>> so I can't just run an older set of drivers. I have not seen source
>>>> files for the Mellanox drivers - I would take a crack at compiling them
>>>> if I did. In the past I have used the OFED drivers (on CentOS 5) with
>>>> no problem, but I don't think this is an option now.
>>>>
>>>> Ubuntu claims to support Connect X2 with their drivers (Mellanox
>>>> confirms this), but of course this is community support and the number
>>>> of cases is obviously small. I use the Ubuntu drivers right now because
>>>> the OFED install seems broken and there is no help with it. It's not
>>>> supported! Neat, huh?
>>>>
>>>> The only handle I have is with openmpi v. 2, where there is a message
>>>> (see my original post) that ibv_obj returns a NULL result. But I don't
>>>> understand the significance of the message (if any).
>>>>
>>>> I am not enthused about UCX - the documentation has several obvious
>>>> typos in it, which is not encouraging when you are floundering. I know
>>>> it's a newish project, but I have used openib for 10+ years and it has
>>>> never had a problem until now. I think this is not so much openib as
>>>> the software below it. One other thing I should say is that if I run
>>>> any recent version of mstflint, it always complains:
>>>>
>>>> Failed to identify the device - Can not create SignatureManager!
>>>>
>>>> Going back to my original OFED 1.5 this did not happen, but they are at
>>>> v5 now. Everything else works as far as I can see. But I could not burn
>>>> new firmware except by going back to the 1.5 OS. Perhaps this is
>>>> connected with the ibv_obj = NULL result.
>>>>
>>>> Thanks for helping out. As you can see I am rather stuck.
>>>>
>>>> Best
>>>>
>>>> Tony
>>>>
>>>> On 8/23/20 3:01 AM, John Hearns via users wrote:
>>>>>
>>>>> Tony, start at a low level. Is the Infiniband fabric healthy?
>>>>> Run
>>>>> ibstatus on every node
>>>>> sminfo on one node
>>>>> ibdiagnet on one node
>>>>>
>>>>> On Sun, 23 Aug 2020 at 05:02, Tony Ladd via users
>>>>> <users@lists.open-mpi.org> wrote:
>>>>>>
>>>>>> Hi Jeff
>>>>>>
>>>>>> I installed ucx as you suggested. But I can't get even the simplest
>>>>>> code (ucp_client_server) to work across the network. I can compile
>>>>>> openMPI with UCX, but it has the same problem - mpi codes will not
>>>>>> execute and there are no messages. Really, UCX is not helping. It is
>>>>>> adding another (not so well documented) software layer, which does
>>>>>> not offer better diagnostics as far as I can see. It's also unclear
>>>>>> to me how to control what drivers are being loaded - UCX wants to
>>>>>> make that decision for you.
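>>>>>>
>>>>>> As far as I can tell the only way to steer it is through environment
>>>>>> variables - something like the sketch below, though I am not certain
>>>>>> the transport list is complete, mlx4_0:1 is only my guess at the
>>>>>> device:port name (ucx_info -d should report the real ones), and the
>>>>>> benchmark binary is just an example:
>>>>>>
>>>>>>     ucx_info -d                      # list the devices/transports UCX thinks it has
>>>>>>     mpirun -np 2 --mca pml ucx \
>>>>>>            -x UCX_TLS=rc,self,sm \
>>>>>>            -x UCX_NET_DEVICES=mlx4_0:1 \
>>>>>>            -x UCX_LOG_LEVEL=debug \
>>>>>>            ./IMB-MPI1 Sendrecv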
>>>>>>
>>>>>> With openMPI I can see that (for instance) the tcp module works both
>>>>>> locally and over the network - it must be using the Mellanox NIC for
>>>>>> the bandwidth it is reporting on IMB-MPI1, even with tcp protocols.
>>>>>> But if I try to use openib (or allow ucx or openmpi to choose the
>>>>>> transport layer) it just hangs. Annoyingly, I have this server where
>>>>>> everything works just fine - I can run locally over openib and it's
>>>>>> fine. All the other nodes cannot seem to load openib, so even local
>>>>>> jobs fail.
>>>>>>
>>>>>> The only good (as best I can tell) diagnostic is from openMPI.
>>>>>> ibv_obj (from v2.x) complains that openib returns a NULL object,
>>>>>> whereas on my server it returns logical_index=1. Can we not try to
>>>>>> diagnose the problem of openib not loading (see my original post for
>>>>>> details)? I am pretty sure that if we can, it would fix the problem.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Tony
>>>>>>
>>>>>> PS I tried configuring two nodes back to back to see if it was a
>>>>>> switch issue, but the result was the same.
>>>>>>
>>>>>> On 8/19/20 1:27 PM, Jeff Squyres (jsquyres) wrote:
>>>>>>>
>>>>>>> Tony --
>>>>>>>
>>>>>>> Have you tried compiling Open MPI with UCX support? This is Mellanox
>>>>>>> (NVIDIA's) preferred mechanism for InfiniBand support these days --
>>>>>>> the openib BTL is legacy.
>>>>>>>
>>>>>>> You can run: mpirun --mca pml ucx ...
>>>>>>>
>>>>>>> On Aug 19, 2020, at 12:46 PM, Tony Ladd via users
>>>>>>> <users@lists.open-mpi.org> wrote:
>>>>>>>>
>>>>>>>> One other update. I compiled OpenMPI-4.0.4. The outcome was the
>>>>>>>> same, but there is no mention of ibv_obj this time.
>>>>>>>>
>>>>>>>> Tony
>>>>>>>>
>>>>>>>> --
>>>>>>>> Tony Ladd
>>>>>>>>
>>>>>>>> Chemical Engineering Department
>>>>>>>> University of Florida
>>>>>>>> Gainesville, Florida 32611-6005
>>>>>>>> USA
>>>>>>>>
>>>>>>>> Email: tladd-"(AT)"-che.ufl.edu
>>>>>>>> Web http://ladd.che.ufl.edu
>>>>>>>>
>>>>>>>> Tel: (352)-392-6509
>>>>>>>> FAX: (352)-392-9514
>>>>>>>>
>>>>>>>> <outf34-4.0><outfoam-4.0>
>>>>>>>
>>>>>>> --
>>>>>>> Jeff Squyres
>>>>>>> jsquy...@cisco.com
>>>>>>
>>>>>> --
>>>>>> Tony Ladd
>>>>>>
>>>>>> Chemical Engineering Department
>>>>>> University of Florida
>>>>>> Gainesville, Florida 32611-6005
>>>>>> USA
>>>>>>
>>>>>> Email: tladd-"(AT)"-che.ufl.edu
>>>>>> Web http://ladd.che.ufl.edu
>>>>>>
>>>>>> Tel: (352)-392-6509
>>>>>> FAX: (352)-392-9514
>>>>
>>>> --
>>>> Tony Ladd
>>>>
>>>> Chemical Engineering Department
>>>> University of Florida
>>>> Gainesville, Florida 32611-6005
>>>> USA
>>>>
>>>> Email: tladd-"(AT)"-che.ufl.edu
>>>> Web http://ladd.che.ufl.edu
>>>>
>>>> Tel: (352)-392-6509
>>>> FAX: (352)-392-9514
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>
>> --
>> Tony Ladd
>>
>> Chemical Engineering Department
>> University of Florida
>> Gainesville, Florida 32611-6005
>> USA
>>
>> Email: tladd-"(AT)"-che.ufl.edu
>> Web http://ladd.che.ufl.edu
>>
>> Tel: (352)-392-6509
>> FAX: (352)-392-9514
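
PS For completeness, "start at a low level" was nothing more sophisticated
than running the standard diagnostics across the cluster, roughly like this
(the node names are made up - pdsh or a plain ssh loop both work):

    for n in node01 node02 node03; do ssh "$n" ibstatus; done   # link state and rate on every node
    sminfo       # on any one node: is a subnet manager responding?
    ibdiagnet    # on any one node: full fabric sweep

which, from your earlier message, you have already been through.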