I'm using the InfiniBand drivers in the CentOS 7 distribution, not the
Mellanox drivers.  The version of Lustre we're using is built against the
distro drivers and breaks if the Mellanox drivers get installed.

Is there a particular version of UCX which should be used with Open MPI
4.0.4?  I downloaded UCX 1.8.1 and installed it, then tried to configure
Open MPI with --with-ucx=<location>, but the configure failed.  Configure
finds the UCX installation OK but thinks some symbols are undeclared.  I
tried to find those symbols in the UCX source tree (in case I configured
UCX wrong) but didn't turn them up anywhere.  Here is the bottom of the
configure output, showing mostly "yes" for the checks but a series of "no"
at the end; a sketch of the commands I ran follows the log.

[...]
checking ucp/api/ucp.h usability... yes
checking ucp/api/ucp.h presence... yes
checking for ucp/api/ucp.h... yes
checking for library containing ucp_cleanup... no
checking whether ucp_tag_send_nbr is declared... yes
checking whether ucp_ep_flush_nb is declared... yes
checking whether ucp_worker_flush_nb is declared... yes
checking whether ucp_request_check_status is declared... yes
checking whether ucp_put_nb is declared... yes
checking whether ucp_get_nb is declared... yes
checking whether ucm_test_events is declared... yes
checking whether UCP_ATOMIC_POST_OP_AND is declared... yes
checking whether UCP_ATOMIC_POST_OP_OR is declared... yes
checking whether UCP_ATOMIC_POST_OP_XOR is declared... yes
checking whether UCP_ATOMIC_FETCH_OP_FAND is declared... yes
checking whether UCP_ATOMIC_FETCH_OP_FOR is declared... yes
checking whether UCP_ATOMIC_FETCH_OP_FXOR is declared... yes
checking whether UCP_PARAM_FIELD_ESTIMATED_NUM_PPN is declared... yes
checking whether UCP_WORKER_ATTR_FIELD_ADDRESS_FLAGS is declared... yes
checking whether ucp_tag_send_nbx is declared... no
checking whether ucp_tag_send_sync_nbx is declared... no
checking whether ucp_tag_recv_nbx is declared... no
checking for ucp_request_param_t... no
configure: error: UCX support requested but not found.  Aborting
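
For reference, this is roughly what I ran; the install prefix below is a
placeholder, not my actual path:

    # configure Open MPI 4.0.4 against the UCX 1.8.1 install
    ./configure --with-ucx=$HOME/sw/ucx-1.8.1
    # -> "configure: error: UCX support requested but not found.  Aborting"

    # how I searched for the symbols the failing checks look for, e.g.:
    grep -rn ucp_tag_send_nbx $HOME/sw/ucx-1.8.1/include/
    grep -rn ucp_request_param_t ucx-1.8.1/    # in the source tree; no hits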


.. Lana (lana.de...@gmail.com)




On Mon, Jul 20, 2020 at 12:43 PM Jeff Squyres (jsquyres) <jsquy...@cisco.com>
wrote:

> Correct, UCX = OpenUCX.org.
>
> If you have the Mellanox drivers package installed, it probably would have
> installed UCX (and Open MPI).  You'll have to talk to your sysadmin and/or
> Mellanox support for details about that.
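>
> A quick way to check what UCX (if any) is already installed is to ask
> UCX itself; ucx_info is a utility that ships with UCX:
>
>     which ucx_info    # is a UCX install already on the PATH?
>     ucx_info -v       # prints the UCX version and build configuration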
>
>
> On Jul 20, 2020, at 11:36 AM, Lana Deere <lana.de...@gmail.com> wrote:
>
> I assume UCX is https://www.openucx.org?  (Google found several things
> called UCX when I searched, but that one seemed right.)  I will try
> installing it and then reinstall Open MPI.  Hopefully it will then choose
> between network transports automatically based on what's available.  I'll
> also look at the slides and see if I can make sense of them.  Thanks.
>
> .. Lana (lana.de...@gmail.com)
>
>
>
>
> On Sat, Jul 18, 2020 at 9:41 AM Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> On Jul 16, 2020, at 2:56 PM, Lana Deere via users <
>> users@lists.open-mpi.org> wrote:
>>
>>
>> I am new to Open MPI.  I built 4.0.4 on a CentOS 7 machine and tried an
>> mpirun of a small program compiled against Open MPI.  It seems to have
>> failed because my host does not have InfiniBand.  I can't figure out how
>> to configure the build so it does what I want: use InfiniBand when there
>> are IB HCAs on the system, and otherwise use the Ethernet on the system.
>>
>>
>> UCX is the underlying library that Mellanox/Nvidia prefers these days for
>> use with MPI and InfiniBand.
>>
>> Meaning: you should first install UCX and then build Open MPI with
>> --with-ucx=/directory/of/ucx/installation.
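>>
>> As a rough sketch (the install prefixes here are just placeholders):
>>
>>     # in the UCX source tree
>>     ./configure --prefix=/opt/ucx
>>     make -j && make install
>>
>>     # then in the Open MPI source tree
>>     ./configure --prefix=/opt/openmpi-4.0.4 --with-ucx=/opt/ucx
>>     make -j && make install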
>>
>> We just hosted parts 1 and 2 of a seminar entitled "The ABCs of Open MPI"
>> that covered topics like this.  Check out:
>>
>> https://www.open-mpi.org/video/?category=general#abcs-of-open-mpi-part-1
>> and
>> https://www.open-mpi.org/video/?category=general#abcs-of-open-mpi-part-2
>>
>> In particular, you might want to look at slides 28-42 in part 2 for a
>> bunch of discussion about how Open MPI (by default) picks the underlying
>> network / APIs to use, and then how you can override that if you want to.
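>>
>> For example, with the 4.0.x series you can steer the selection by hand
>> via MCA parameters on the mpirun command line (./my_app below is just a
>> placeholder for your executable):
>>
>>     mpirun --mca pml ucx ./my_app                      # insist on UCX
>>     mpirun --mca pml ob1 --mca btl tcp,self ./my_app   # force plain TCP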
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>>
>>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
>
