Aha. I dimly remember a problem with the ibverbs /dev device - maybe the
permissions, or more likely the owner account for that device.
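
A quick check, assuming a standard verbs setup, is to look at the device
nodes directly - something like:

    ls -l /dev/infiniband/
    # the uverbsN (and rdma_cm) nodes normally need to be readable and
    # writable by the account running the MPI jobs

If those are root-only, unprivileged jobs can't open the verbs layer.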



On Tue, 25 Aug 2020 at 02:44, Tony Ladd <tl...@che.ufl.edu> wrote:

> Hi Jeff
>
> I appreciate your help (and John's as well). At this point I don't think
> this is an OMPI problem - my mistake. I think the communication with RDMA
> is somehow disabled (perhaps it's the verbs layer - I am not very
> knowledgeable about this). It used to work like a dream, but Mellanox has
> apparently disabled some of the Connect X2 components, because neither
> ompi nor ucx (with/without ompi) could connect with the RDMA layer. Some
> of the infiniband functions are also not working on the X2 (mstflint,
> mstconfig).
>
> In fact ompi always tries to access the openib module. I have to
> explicitly disable it even to run on 1 node. So I think the problem lies
> in initialization, not communication. This is why (I think) ibv_obj
> returns NULL. The better news is that with the tcp stack everything works
> fine (ompi, ucx, 1 node, many nodes) - the bandwidth is similar to rdma,
> so for large messages it's semi OK. It's a partial solution - not all I
> wanted, of course. The direct rdma functions (ib_read_lat etc.) also work
> fine with expected results. I suspect this disabling of the driver is a
> commercial rather than a technical decision.
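>
> (For what it's worth, the kind of command line I mean is roughly
>
>     mpirun -np 16 --hostfile hosts --mca btl ^openib IMB-MPI1 Sendrecv
>
> i.e. explicitly excluding the openib BTL so it falls back to the tcp,
> vader, and self BTLs; the hostfile name and process count are just
> placeholders.)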
>
> I am going to try going back to Ubuntu 16.04 - there is a version of
> OFED that still supports the X2. But I think it may still get messed up
> by kernel upgrades (it did for 18.04, I found). So it's not an easy path.
>
> Thanks again.
>
> Tony
>
> On 8/24/20 11:35 AM, Jeff Squyres (jsquyres) wrote:
> >
> > I'm afraid I don't have many better answers for you.
> >
> > I can't quite tell from your machines, but are you running IMB-MPI1
> Sendrecv *on a single node* with `--mca btl openib,self`?
> >
> > I don't remember offhand, but I didn't think that openib was supposed to
> do loopback communication.  E.g., if both MPI processes are on the same
> node, `--mca btl openib,vader,self` should do the trick (where "vader" =
> shared memory support).
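> >
> > For example, with both processes on one node, something along the lines
> > of
> >
> >     mpirun -np 2 --mca btl openib,vader,self IMB-MPI1 Sendrecv
> >
> > (binary name/path just illustrative) should work, whereas with only
> > openib,self the two ranks may have no usable path to each other.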
> >
> > More specifically: are you running into a problem running openib (and/or
> UCX) across multiple nodes?
> >
> > I can't speak to Nvidia support on various models of [older] hardware
> (including UCX support on that hardware).  But be aware that openib is
> definitely going away; it is wholly being replaced by UCX.  It may be that
> your only option is to stick with older software stacks in these hardware
> environments.
> >
> >
> >> On Aug 23, 2020, at 9:46 PM, Tony Ladd via users <
> users@lists.open-mpi.org> wrote:
> >>
> >> Hi John
> >>
> >> Thanks for the response. I have run all those diagnostics, and as best
> I can tell the IB fabric is OK. I have a cluster of 49 nodes (48 clients +
> server) and the fabric passes all the tests. There is 1 warning:
> >>
> >> I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps
> SL:0x00
> >> -W- Suboptimal rate for group. Lowest member rate:40Gbps >
> group-rate:10Gbps
> >>
> >> but according to a number of sources this is harmless.
> >>
> >> I have run Mellanox's P2P performance tests (ib_write_bw) between
> different pairs of nodes and it reports 3.22 GB/sec, which is reasonable
> (it's a PCIe 2 x8 interface, i.e. 4 GB/s). I have also configured 2 nodes
> back to back to check that the switch is not the problem - it makes no
> difference.
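> >>
> >> For reference, that is the usual perftest pairing, roughly:
> >>
> >>     ib_write_bw                  # on the first node
> >>     ib_write_bw <first-node>     # on the second node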
> >>
> >> I have been playing with the btl params with openMPI (v. 2.1.1, which
> is what is released in Ubuntu 18.04). So with tcp as the transport layer
> everything works fine - 1 node or 2 node communication - I have tested up
> to 16 processes (8+8) and it seems fine. Of course the latency is much
> higher on the tcp interface, so I would still like to access the RDMA
> layer. But unless I exclude the openib module, it always hangs. Same with
> OpenMPI v4 compiled from source.
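> >>
> >> Concretely, the pattern is roughly (hostfile and process counts just
> >> for illustration):
> >>
> >>     mpirun -np 16 --hostfile hosts --mca btl tcp,self,vader IMB-MPI1
> >>
> >> runs cleanly, while the same command with --mca btl openib,self,vader
> >> hangs.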
> >>
> >> I think an important factor is that Mellanox has not supported the
> Connect X2 for some time. This is really infuriating; a $500 network card
> with no supported drivers, but that is business for you, I suppose. I have
> 50 NICs and I can't afford to replace them all. The other issue is that
> MLNX-OFED is tied to specific software versions, so I can't just run an
> older set of drivers. I have not seen source files for the Mellanox drivers
> - I would take a crack at compiling them if I did. In the past I have used
> the OFED drivers (on CentOS 5) with no problem, but I don't think this is
> an option now.
> >>
> >> Ubuntu claims to support the Connect X2 with their drivers (Mellanox
> confirms this), but of course this is community support and the number of
> cases is obviously small. I use the Ubuntu drivers right now because the
> OFED install seems broken and there is no help with it. It's not supported!
> Neat, huh?
> >>
> >> The only handle I have is with openmpi v. 2 when there is a message
> (see my original post) that ibv_obj returns a NULL result. But I don't
> understand the significance of the message (if any).
> >>
> >> I am not enthused about UCX - the documentation has several obvious
> typos in it, which is not encouraging when you are floundering. I know it's
> a newish project, but I have used openib for 10+ years and it has never had
> a problem until now. I think this is not so much openib as the software
> below it. One other thing I should say is that any recent version of
> mstflint always complains:
> >>
> >> Failed to identify the device - Can not create SignatureManager!
> >>
> >> Going back to my original OFED 1.5 this did not happen, but they are at
> v5 now.
> >>
> >> Everything else works as far as I can see. But I could not burn new
> firmware except by going back to the 1.5 OS. Perhaps this is connected with
> the ibv_obj = NULL result.
> >>
> >> Thanks for helping out. As you can see I am rather stuck.
> >>
> >> Best
> >>
> >> Tony
> >>
> >> On 8/23/20 3:01 AM, John Hearns via users wrote:
> >>>
> >>> Tony, start at a low level. Is the InfiniBand fabric healthy?
> >>> Run
> >>> ibstatus   on every node
> >>> sminfo on one node
> >>> ibdiagnet on one node
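> >>> (perhaps also ibv_devinfo on one node, to check that the HCA and its
> >>> port state are visible to the verbs layer)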
> >>>
> >>> On Sun, 23 Aug 2020 at 05:02, Tony Ladd via users
> <users@lists.open-mpi.org> wrote:
> >>>
> >>>     Hi Jeff
> >>>
> >>>     I installed ucx as you suggested. But I can't get even the
> >>>     simplest code
> >>>     (ucp_client_server) to work across the network. I can compile
> openMPI
> >>>     with UCX but it has the same problem - mpi codes will not execute
> and
> >>>     there are no messages. Really, UCX is not helping. It is adding
> >>>     another
> >>>     (not so well documented) software layer, which does not offer
> better
> >>>     diagnostics as far as I can see. It's also unclear to me how to
> >>>     control
> >>>     what drivers are being loaded - UCX wants to make that decision
> >>>     for you.
> >>>     With openMPI I can see that (for instance) the tcp module works
> both
> >>>     locally and over the network - it must be using the Mellanox NIC
> >>>     for the
> >>>     bandwidth it is reporting on IMB-MPI1 even with tcp protocols. But
> >>>     if I
> >>>     try to use openib (or allow ucx or openmpi to choose the transport
> >>>     layer) it just hangs. Annoyingly I have this server where
> everything
> >>>     works just fine - I can run locally over openib and it's fine. All
> the
> >>>     other nodes cannot seem to load openib so even local jobs fail.
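> >>>
> >>>     (As far as I can tell, the intended way to inspect and steer UCX's
> >>>     transport choice is something like
> >>>
> >>>         ucx_info -d          # list the devices/transports UCX detects
> >>>         mpirun -x UCX_LOG_LEVEL=info --mca pml ucx ...
> >>>         mpirun -x UCX_TLS=rc,sm,self --mca pml ucx ...
> >>>
> >>>     but so far that has not made the failure any clearer.)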
> >>>
> >>>     The only good (as best I can tell) diagnostic is from openMPI.
> >>>     ibv_obj
> >>>     (from v2.x) complains that openib returns a NULL object, whereas
> >>>     on my
> >>>     server it returns logical_index=1. Can we not try to diagnose the
> >>>     problem with openib not loading (see my original post for
> >>>     details)? I am
> >>>     pretty sure if we can that would fix the problem.
> >>>
> >>>     Thanks
> >>>
> >>>     Tony
> >>>
> >>>     PS I tried configuring two nodes back to back to see if it was a
> >>>     switch
> >>>     issue, but the result was the same.
> >>>
> >>>
> >>>     On 8/19/20 1:27 PM, Jeff Squyres (jsquyres) wrote:
> >>>     >
> >>>     > Tony --
> >>>     >
> >>>     > Have you tried compiling Open MPI with UCX support? This is
> >>>     Mellanox's (NVIDIA's) preferred mechanism for InfiniBand support
> >>>     these days -- the openib BTL is legacy.
> >>>     >
> >>>     > You can run: mpirun --mca pml ucx ...
> >>>     >
> >>>     >
> >>>     >> On Aug 19, 2020, at 12:46 PM, Tony Ladd via users
> >>>     <users@lists.open-mpi.org> wrote:
> >>>     >>
> >>>     >> One other update. I compiled OpenMPI-4.0.4. The outcome was the
> >>>     same but there is no mention of ibv_obj this time.
> >>>     >>
> >>>     >> Tony
> >>>     >>
> >>>     >> --
> >>>     >>
> >>>     >> Tony Ladd
> >>>     >>
> >>>     >> Chemical Engineering Department
> >>>     >> University of Florida
> >>>     >> Gainesville, Florida 32611-6005
> >>>     >> USA
> >>>     >>
> >>>     >> Email: tladd-"(AT)"-che.ufl.edu
> >>>     >> Web http://ladd.che.ufl.edu
> >>>     >>
> >>>     >> Tel:   (352)-392-6509
> >>>     >> FAX:   (352)-392-9514
> >>>     >>
> >>>     >
> >>>     > --
> >>>     > Jeff Squyres
> >>>     > jsquy...@cisco.com
> >>>     >
> >>>     --
> >>>     Tony Ladd
> >>>
> >>>     Chemical Engineering Department
> >>>     University of Florida
> >>>     Gainesville, Florida 32611-6005
> >>>     USA
> >>>
> >>>     Email: tladd-"(AT)"-che.ufl.edu
> >>>     Web http://ladd.che.ufl.edu
> >>>
> >>>     Tel:   (352)-392-6509
> >>>     FAX:   (352)-392-9514
> >>>
> >> --
> >> Tony Ladd
> >>
> >> Chemical Engineering Department
> >> University of Florida
> >> Gainesville, Florida 32611-6005
> >> USA
> >>
> >> Email: tladd-"(AT)"-che.ufl.edu
> >> Web    http://ladd.che.ufl.edu
> >>
> >> Tel:   (352)-392-6509
> >> FAX:   (352)-392-9514
> >>
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> >
> --
> Tony Ladd
>
> Chemical Engineering Department
> University of Florida
> Gainesville, Florida 32611-6005
> USA
>
> Email: tladd-"(AT)"-che.ufl.edu
> Web    http://ladd.che.ufl.edu
>
> Tel:   (352)-392-6509
> FAX:   (352)-392-9514
>
>
