I'm afraid I don't have many better answers for you.

I can't quite tell from your description, but are you running IMB-MPI1 Sendrecv 
*on a single node* with `--mca btl openib,self`?

I don't remember offhand, but I didn't think that openib was supposed to do 
loopback communication.  E.g., if both MPI processes are on the same node, 
`--mca btl openib,vader,self` should do the trick (where "vader" = shared 
memory support).
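
For example, something along these lines should exercise shared memory for an 
on-node run (a sketch only; adjust the process count and the path to the IMB 
binary for your setup):

    mpirun -np 2 --mca btl openib,vader,self ./IMB-MPI1 Sendrecv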

More specifically: are you running into a problem running openib (and/or UCX) 
across multiple nodes?

I can't speak to Nvidia support on various models of [older] hardware 
(including UCX support on that hardware).  But be aware that openib is 
definitely going away; it is wholly being replaced by UCX.  It may be that your 
only option is to stick with older software stacks in these hardware 
environments.
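
If and when you do test the UCX path across nodes, the usual shape of the 
command looks something like this (a sketch only -- the host names, device 
name, and benchmark path are placeholders for your setup):

    # select the UCX PML, keep the legacy openib BTL out of the way, and make
    # UCX's transport/device selection explicit and verbose
    mpirun -np 2 --host node01,node02 \
        --mca pml ucx --mca btl ^openib \
        -x UCX_TLS=rc,sm,self -x UCX_NET_DEVICES=mlx4_0:1 -x UCX_LOG_LEVEL=debug \
        ./IMB-MPI1 Sendrecv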


> On Aug 23, 2020, at 9:46 PM, Tony Ladd via users <users@lists.open-mpi.org> 
> wrote:
> 
> Hi John
> 
> Thanks for the response. I have run all those diagnostics, and as best I can 
> tell the IB fabric is OK. I have a cluster of 49 nodes (48 clients + server) 
> and the fabric passes all the tests. There is 1 warning:
> 
> I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
> -W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps
> 
> but according to a number of sources this is harmless.
> 
> I have run Mellanox's P2P performance tests (ib_write_bw) between different 
> pairs of nodes and it reports 3.22 GB/sec, which is reasonable (it's a PCIe 2 
> x8 interface, i.e. 4 GB/s). I have also configured 2 nodes back to back to 
> check that the switch is not the problem - it makes no difference.
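> 
> For reference, a minimal run of that test looks roughly like this (node01 
> stands in for the real hostname):
> 
>     ib_write_bw              # on the first node, acting as the server
>     ib_write_bw node01       # on the second node, pointing at the first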
> 
> I have been playing with the btl params with openMPI (v. 2.1.1, which is what 
> is released in Ubuntu 18.04). With tcp as the transport layer everything 
> works fine - 1-node or 2-node communication - I have tested up to 16 
> processes (8+8) and it seems fine. Of course the latency is much higher on 
> the tcp interface, so I would still like to access the RDMA layer. But unless 
> I exclude the openib module, it always hangs. Same with OpenMPI v4 compiled 
> from source.
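> 
> In concrete terms (the hostfile, process count, and benchmark path here are 
> just placeholders):
> 
>     mpirun -np 16 --hostfile hosts --mca btl tcp,self ./IMB-MPI1 Sendrecv     # works, higher latency
>     mpirun -np 16 --hostfile hosts --mca btl openib,self ./IMB-MPI1 Sendrecv  # hangs
>     mpirun -np 16 --hostfile hosts --mca btl ^openib ./IMB-MPI1 Sendrecv      # works once openib is excluded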
> 
> I think an important factor is that Mellanox has not supported the ConnectX-2 
> for some time. This is really infuriating - a $500 network card with no 
> supported drivers - but that is business for you, I suppose. I have 50 NICs 
> and I can't afford to replace them all. The other issue is that MLNX-OFED is 
> tied to specific software versions, so I can't just run an older set of 
> drivers. I have not seen source files for the Mellanox drivers - I would take 
> a crack at compiling them if I did. In the past I have used the OFED drivers 
> (on CentOS 5) with no problem, but I don't think this is an option now.
> 
> Ubuntu claims to support the ConnectX-2 with their drivers (Mellanox confirms 
> this), but of course this is community support and the number of cases is 
> obviously small. I use the Ubuntu drivers right now because the OFED install 
> seems broken and there is no help with it. It's not supported! Neat, huh?
> 
> The only handle I have is with openMPI v2, where there is a message (see my 
> original post) that ibv_obj returns a NULL result. But I don't understand the 
> significance of the message (if any).
> 
> I am not enthused about UCX - the documentation has several obvious typos in 
> it, which is not encouraging when you are floundering. I know it's a newish 
> project, but I have used openib for 10+ years and it has never had a problem 
> until now. I think this is not so much openib as the software below it. One 
> other thing I should say is that if I run any recent version of mstflint, it 
> always complains:
> 
> Failed to identify the device - Can not create SignatureManager!
> 
> Going back to my original OFED 1.5 this did not happen, but they are at v5 
> now.
> 
> Everything else works as far as I can see. But I could not burn new firmware 
> except by going back to the 1.5 OS. Perhaps this is connected with the 
> ibv_obj = NULL result.
> 
> Thanks for helping out. As you can see I am rather stuck.
> 
> Best
> 
> Tony
> 
> On 8/23/20 3:01 AM, John Hearns via users wrote:
>> 
>> Tony, start at a low level. Is the InfiniBand fabric healthy?
>> Run
>> ibstatus   on every node
>> sminfo on one node
>> ibdiagnet on one node
>> 
>> On Sun, 23 Aug 2020 at 05:02, Tony Ladd via users <users@lists.open-mpi.org 
>> <mailto:users@lists.open-mpi.org>> wrote:
>> 
>>    Hi Jeff
>> 
>>    I installed ucx as you suggested. But I can't get even the simplest code
>>    (ucp_client_server) to work across the network. I can compile openMPI
>>    with UCX but it has the same problem - MPI codes will not execute and
>>    there are no messages. Really, UCX is not helping. It is adding another
>>    (not so well documented) software layer, which does not offer better
>>    diagnostics as far as I can see. It's also unclear to me how to control
>>    what drivers are being loaded - UCX wants to make that decision for you.
>>    With openMPI I can see that (for instance) the tcp module works both
>>    locally and over the network - it must be using the Mellanox NIC for the
>>    bandwidth it is reporting on IMB-MPI1 even with tcp protocols. But if I
>>    try to use openib (or allow ucx or openmpi to choose the transport
>>    layer) it just hangs. Annoyingly I have this server where everything
>>    works just fine - I can run locally over openib and it's fine. All the
>>    other nodes cannot seem to load openib, so even local jobs fail.
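>> 
>>    For anyone trying to reproduce: the bundled ucx_perftest tool gives an
>>    equivalent cross-node check, with node01 standing in for the real hostname:
>> 
>>        ucx_perftest -c 0                    # first node, acts as the server
>>        ucx_perftest node01 -t tag_lat -c 1  # second node, connects as client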
>> 
>>    The only good (as best I can tell) diagnostic is from openMPI: ibv_obj
>>    (from v2.x) complains that openib returns a NULL object, whereas on my
>>    server it returns logical_index=1. Can we not try to diagnose the
>>    problem with openib not loading (see my original post for details)? I am
>>    pretty sure that if we can, that would fix the problem.
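>> 
>>    One low-level check here, independent of MPI, is whether libibverbs can
>>    open the HCA at all:
>> 
>>        ibv_devinfo        # shows each HCA and its port state (e.g. PORT_ACTIVE)
>>        ibv_devinfo -v     # adds extended device and port attributes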
>> 
>>    Thanks
>> 
>>    Tony
>> 
>>    PS I tried configuring two nodes back to back to see if it was a switch
>>    issue, but the result was the same.
>> 
>> 
>>    On 8/19/20 1:27 PM, Jeff Squyres (jsquyres) wrote:
>>    >
>>    > Tony --
>>    >
>>    > Have you tried compiling Open MPI with UCX support? This is Mellanox
>>    > (NVIDIA's) preferred mechanism for InfiniBand support these days -- the
>>    > openib BTL is legacy.
>>    >
>>    > You can run: mpirun --mca pml ucx ...
>>    >
>>    >
>>    >> On Aug 19, 2020, at 12:46 PM, Tony Ladd via users
>>    <users@lists.open-mpi.org <mailto:users@lists.open-mpi.org>> wrote:
>>    >>
>>    >> One other update. I compiled OpenMPI-4.0.4. The outcome was the same
>>    >> but there is no mention of ibv_obj this time.
>>    >>
>>    >> Tony
>>    >>
>>    >> --
>>    >>
>>    >> Tony Ladd
>>    >>
>>    >> Chemical Engineering Department
>>    >> University of Florida
>>    >> Gainesville, Florida 32611-6005
>>    >> USA
>>    >>
>>    >> Email: tladd-"(AT)"-che.ufl.edu <http://che.ufl.edu>
>>    >> Web http://ladd.che.ufl.edu
>>    >>
>>    >> Tel:   (352)-392-6509
>>    >> FAX:   (352)-392-9514
>>    >>
>>    >> <outf34-4.0><outfoam-4.0>
>>    >
>>    > --
>>    > Jeff Squyres
>>    > jsquy...@cisco.com <mailto:jsquy...@cisco.com>
>>    >
>>    -- 
>>    Tony Ladd
>> 
>>    Chemical Engineering Department
>>    University of Florida
>>    Gainesville, Florida 32611-6005
>>    USA
>> 
>>    Email: tladd-"(AT)"-che.ufl.edu <http://che.ufl.edu>
>>    Web http://ladd.che.ufl.edu
>> 
>>    Tel:   (352)-392-6509
>>    FAX:   (352)-392-9514
>> 
> -- 
> Tony Ladd
> 
> Chemical Engineering Department
> University of Florida
> Gainesville, Florida 32611-6005
> USA
> 
> Email: tladd-"(AT)"-che.ufl.edu
> Web    http://ladd.che.ufl.edu
> 
> Tel:   (352)-392-6509
> FAX:   (352)-392-9514
> 


-- 
Jeff Squyres
jsquy...@cisco.com
