I apologise. That was an Omni-Path issue:
https://www.beowulf.org/pipermail/beowulf/2017-March/034214.html

On Tue, 25 Aug 2020 at 08:17, John Hearns <hear...@gmail.com> wrote:

> Aha. I dimly remember a problem with the ibverbs /dev device - maybe
> the permissions, or more likely the owner account for that device.
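>
> (If it helps, a quick thing to check: the verbs device nodes usually
> live under /dev/infiniband - path and expected ownership can vary by
> distro:
>
>     ls -l /dev/infiniband/
>     # inspect owner/group/mode on uverbs*, umad*, rdma_cm
> )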
>
>
>
> On Tue, 25 Aug 2020 at 02:44, Tony Ladd <tl...@che.ufl.edu> wrote:
>
>> Hi Jeff
>>
>> I appreciate your help (and John's as well). At this point I don't think
>> it is an OMPI problem - my mistake. I think communication over RDMA is
>> somehow disabled (perhaps it's the verbs layer - I am not very
>> knowledgeable about this). It used to work like a dream, but Mellanox has
>> apparently disabled some of the ConnectX-2 components, because neither
>> ompi nor ucx (with/without ompi) could connect with the RDMA layer. Some
>> of the InfiniBand tools are also not working on the X2 (mstflint,
>> mstconfig).
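>>
>> (For example, even a basic flash query fails for me; the PCI address
>> below is just a placeholder:
>>
>>     mstflint -d 04:00.0 query
>> )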
>>
>> In fact ompi always tries to access the openib module; I have to
>> explicitly disable it even to run on 1 node. So I think the problem lies
>> in initialization, not communication. This is why (I think) ibv_obj
>> returns NULL. The better news is that with the tcp stack everything
>> works fine (ompi, ucx, 1 node, many nodes) - the bandwidth is similar to
>> rdma, so for large messages it's semi-OK. It's a partial solution - not
>> all I wanted, of course. The direct rdma functions (ib_read_lat etc.)
>> also work fine, with the expected results. I suspect this disabling of
>> the driver is a commercial rather than a technical decision.
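>>
>> (For reference, disabling openib uses the standard MCA "^" exclusion
>> syntax, e.g.:
>>
>>     mpirun --mca btl ^openib -np 2 ./IMB-MPI1 Sendrecv
>>
>> and with that exclusion the single-node runs complete.)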
>>
>> I am going to try going back to Ubuntu 16.04 - there is a version of
>> OFED that still supports the X2. But I think it may still get messed up
>> by kernel upgrades (I found it does on 18.04), so it's not an easy path.
>>
>> Thanks again.
>>
>> Tony
>>
>> On 8/24/20 11:35 AM, Jeff Squyres (jsquyres) wrote:
>> > [External Email]
>> >
>> > I'm afraid I don't have many better answers for you.
>> >
>> > I can't quite tell from your machines, but are you running IMB-MPI1
>> Sendrecv *on a single node* with `--mca btl openib,self`?
>> >
>> > I don't remember offhand, but I didn't think that openib was supposed
>> to do loopback communication.  E.g., if both MPI processes are on the same
>> node, `--mca btl openib,vader,self` should do the trick (where "vader" =
>> shared memory support).
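>> >
>> > For example (assuming IMB-MPI1 is in your PATH):
>> >
>> >     mpirun -np 2 --mca btl openib,vader,self IMB-MPI1 Sendrecv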
>> >
>> > More specifically: are you running into a problem running openib
>> (and/or UCX) across multiple nodes?
>> >
>> > I can't speak to Nvidia support on various models of [older] hardware
>> (including UCX support on that hardware).  But be aware that openib is
>> definitely going away; it is wholly being replaced by UCX.  It may be that
>> your only option is to stick with older software stacks in these hardware
>> environments.
>> >
>> >
>> >> On Aug 23, 2020, at 9:46 PM, Tony Ladd via users <
>> users@lists.open-mpi.org> wrote:
>> >>
>> >> Hi John
>> >>
>> >> Thanks for the response. I have run all those diagnostics, and as best
>> I can tell the IB fabric is OK. I have a cluster of 49 nodes (48 clients +
>> server) and the fabric passes all the tests. There is 1 warning:
>> >>
>> >> I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps
>> SL:0x00
>> >> -W- Suboptimal rate for group. Lowest member rate:40Gbps >
>> group-rate:10Gbps
>> >>
>> >> but according to a number of sources this is harmless.
>> >>
>> >> I have run Mellanox's P2P performance tests (ib_write_bw) between
>> different pairs of nodes, and it reports 3.22 GB/s, which is reasonable
>> (it's a PCIe 2.0 x8 interface, i.e. 4 GB/s). I have also configured 2 nodes back
>> to back to check that the switch is not the problem - it makes no difference.
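>> >>
>> >> (That was the standard perftest invocation; the hostname below is an
>> >> example:
>> >>
>> >>     ib_write_bw            # on the server node
>> >>     ib_write_bw nodeA      # on the client, pointing at the server
>> >> )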
>> >>
>> >> I have been playing with the btl params with openMPI (v. 2.1.1, which
>> is what is released in Ubuntu 18.04). With tcp as the transport layer
>> everything works fine - 1-node or 2-node communication - I have tested up
>> to 16 processes (8+8) and it seems fine. Of course the latency is much
>> higher on the tcp interface, so I would still like to access the RDMA
>> layer. But unless I exclude the openib module, it always hangs. Same with
>> OpenMPI v4 compiled from source.
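>> >>
>> >> (The working tcp runs are of the form below - the hostfile is
>> >> site-specific, so this is just a sketch:
>> >>
>> >>     mpirun -np 16 -hostfile hosts --mca btl tcp,self,vader ./IMB-MPI1
>> >>
>> >> whereas leaving openib in the btl list hangs.)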
>> >>
>> >> I think an important component is that Mellanox has not supported the
>> ConnectX-2 for some time. This is really infuriating; a $500 network card
>> with no supported drivers - but that is business for you, I suppose. I have
>> 50 NICs and I can't afford to replace them all. The other component is that
>> MLNX-OFED is tied to specific software versions, so I can't just run an
>> older set of drivers. I have not seen source files for the Mellanox drivers
>> - I would take a crack at compiling them if I had them. In the past I used
>> the OFED drivers (on CentOS 5) with no problem, but I don't think this is
>> an option now.
>> >>
>> >> Ubuntu claims to support the ConnectX-2 with their drivers (Mellanox
>> confirms this), but of course this is community support and the number of
>> cases is obviously small. I use the Ubuntu drivers right now because the
>> OFED install seems broken and there is no help with it. It's not supported!
>> Neat, huh?
>> >>
>> >> The only handle I have is with openmpi v. 2, where there is a message
>> (see my original post) that ibv_obj returns a NULL result. But I don't
>> understand the significance of the message (if any).
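>> >>
>> >> (In case it points at anything: the verbs layer can also be queried
>> >> directly with the libibverbs utilities, which should at least show
>> >> whether the device registers at all:
>> >>
>> >>     ibv_devices     # list verbs-capable devices
>> >>     ibv_devinfo     # port state, firmware version, GUIDs
>> >> )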
>> >>
>> >> I am not enthused about UCX - the documentation has several obvious
>> typos in it, which is not encouraging when you are floundering. I know it's a
>> newish project, but I have used openib for 10+ years and it never had a
>> problem until now. I think this is not so much openib as the software
>> below it. One other thing I should say: if I run any recent version of
>> mstflint, it always complains:
>> >>
>> >> Failed to identify the device - Can not create SignatureManager!
>> >>
>> >> Going back to my original OFED 1.5 this did not happen, but they are
>> at v5 now.
>> >>
>> >> Everything else works as far as I can see. But I could not burn new
>> firmware except by going back to the 1.5 OS. Perhaps this is connected with
>> the ibv_obj = NULL result.
>> >>
>> >> Thanks for helping out. As you can see I am rather stuck.
>> >>
>> >> Best
>> >>
>> >> Tony
>> >>
>> >> On 8/23/20 3:01 AM, John Hearns via users wrote:
>> >>> *[External Email]*
>> >>>
>> >>> Tony, start at a low level. Is the InfiniBand fabric healthy?
>> >>> Run
>> >>> ibstatus   on every node
>> >>> sminfo on one node
>> >>> ibdiagnet on one node
>> >>>
>> >>> On Sun, 23 Aug 2020 at 05:02, Tony Ladd via users <
>> users@lists.open-mpi.org> wrote:
>> >>>
>> >>>     Hi Jeff
>> >>>
>> >>>     I installed ucx as you suggested. But I can't get even the
>> >>>     simplest code (ucp_client_server) to work across the network. I
>> >>>     can compile openMPI with UCX, but it has the same problem - mpi
>> >>>     codes will not execute and there are no messages. Really, UCX is
>> >>>     not helping. It is adding another (not so well documented)
>> >>>     software layer, which does not offer better diagnostics as far
>> >>>     as I can see. It's also unclear to me how to control what
>> >>>     drivers are being loaded - UCX wants to make that decision for
>> >>>     you (see the note after this paragraph). With openMPI I can see
>> >>>     that (for instance) the tcp module works both locally and over
>> >>>     the network - it must be using the Mellanox NIC, given the
>> >>>     bandwidth it is reporting in IMB-MPI1 even with tcp protocols.
>> >>>     But if I try to use openib (or allow ucx or openmpi to choose
>> >>>     the transport layer), it just hangs. Annoyingly, I have one
>> >>>     server where everything works just fine - I can run locally
>> >>>     over openib and it's fine. All the other nodes cannot seem to
>> >>>     load openib, so even local jobs fail.
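>> >>>
>> >>>     (Note: as far as I can tell, the way to steer UCX is through
>> >>>     environment variables; the device name below is an example:
>> >>>
>> >>>         ucx_info -d     # list the transports/devices UCX detects
>> >>>         UCX_TLS=rc,sm,self UCX_NET_DEVICES=mlx4_0:1 \
>> >>>             mpirun --mca pml ucx -np 2 ./IMB-MPI1 PingPong
>> >>>     )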
>> >>>
>> >>>     The only good diagnostic (as best I can tell) is from openMPI:
>> >>>     ibv_obj (from v2.x) complains that openib returns a NULL object,
>> >>>     whereas on my server it returns logical_index=1. Can we not try
>> >>>     to diagnose the problem of openib not loading (see my original
>> >>>     post for details)? I am pretty sure that if we can, it would fix
>> >>>     the problem.
>> >>>
>> >>>     Thanks
>> >>>
>> >>>     Tony
>> >>>
>> >>>     PS: I tried configuring two nodes back to back to see if it was
>> >>>     a switch issue, but the result was the same.
>> >>>
>> >>>
>> >>>     On 8/19/20 1:27 PM, Jeff Squyres (jsquyres) wrote:
>> >>>     > [External Email]
>> >>>     >
>> >>>     > Tony --
>> >>>     >
>> >>>     > Have you tried compiling Open MPI with UCX support? This is
>> >>>     Mellanox (NVIDIA's) preferred mechanism for InfiniBand support
>> >>>     these days -- the openib BTL is legacy.
>> >>>     >
>> >>>     > You can run: mpirun --mca pml ucx ...
>> >>>     >
>> >>>     >
>> >>>     >> On Aug 19, 2020, at 12:46 PM, Tony Ladd via users
>> >>>     <users@lists.open-mpi.org> wrote:
>> >>>     >>
>> >>>     >> One other update: I compiled OpenMPI-4.0.4. The outcome was
>> >>>     the same, but there is no mention of ibv_obj this time.
>> >>>     >>
>> >>>     >> Tony
>> >>>     >>
>> >>>     >> --
>> >>>     >>
>> >>>     >> Tony Ladd
>> >>>     >>
>> >>>     >> Chemical Engineering Department
>> >>>     >> University of Florida
>> >>>     >> Gainesville, Florida 32611-6005
>> >>>     >> USA
>> >>>     >>
>> >>>     >> Email: tladd-"(AT)"-che.ufl.edu
>> >>>     >> Web http://ladd.che.ufl.edu
>> >>>     >>
>> >>>     >> Tel:   (352)-392-6509
>> >>>     >> FAX:   (352)-392-9514
>> >>>     >>
>> >>>     >
>> >>>     > --
>> >>>     > Jeff Squyres
>> >>>     > jsquy...@cisco.com
>> >>>     >
>> >>>     --
>> >>>     Tony Ladd
>> >>>
>> >>>     Chemical Engineering Department
>> >>>     University of Florida
>> >>>     Gainesville, Florida 32611-6005
>> >>>     USA
>> >>>
>> >>>     Email: tladd-"(AT)"-che.ufl.edu
>> >>>     Web http://ladd.che.ufl.edu
>> >>>
>> >>>     Tel:   (352)-392-6509
>> >>>     FAX:   (352)-392-9514
>> >>>
>> >> --
>> >> Tony Ladd
>> >>
>> >> Chemical Engineering Department
>> >> University of Florida
>> >> Gainesville, Florida 32611-6005
>> >> USA
>> >>
>> >> Email: tladd-"(AT)"-che.ufl.edu
>> >> Web    http://ladd.che.ufl.edu
>> >>
>> >> Tel:   (352)-392-6509
>> >> FAX:   (352)-392-9514
>> >>
>> >
>> > --
>> > Jeff Squyres
>> > jsquy...@cisco.com
>> >
>> --
>> Tony Ladd
>>
>> Chemical Engineering Department
>> University of Florida
>> Gainesville, Florida 32611-6005
>> USA
>>
>> Email: tladd-"(AT)"-che.ufl.edu
>> Web    http://ladd.che.ufl.edu
>>
>> Tel:   (352)-392-6509
>> FAX:   (352)-392-9514
>>
>>
