Thanks Arun and Dmitry for your support.
Well, I am building my own libfabric; I export the right variables and source
Intel MPI with -ofi_internal=0. I have figured out where the problem is:
1. If libfabric is built for all providers, i.e. ./configure is run without
explicitly including or excluding any provider, it builds ibverbs among
others; however, the MPI test program hangs during execution.
2. If libfabric is configured with only ibverbs enabled and all other
providers disabled, i.e. ./configure --enable-verbs=yes --enable-rxm=no
--enable-rxd=no --enable-sockets=no --enable-tcp=no --enable-udp=no, the MPI
test program runs through (the full sequence is sketched below).
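For reference, the full sequence for the working case 2 looks roughly like
this (the install prefix, mpivars.sh location, and test binary name are
illustrative, not the exact paths I used):

  # build libfabric with only the verbs provider (case 2)
  ./configure --prefix=/opt/libfabric --enable-verbs=yes \
      --enable-rxm=no --enable-rxd=no --enable-sockets=no \
      --enable-tcp=no --enable-udp=no
  make -j && make install

  # point Intel MPI at the external libfabric instead of its internal one
  source <impi_install_dir>/intel64/bin/mpivars.sh -ofi_internal=0
  export LD_LIBRARY_PATH=/opt/libfabric/lib:$LD_LIBRARY_PATH

  # run the test program
  mpirun -n 2 ./test.e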
Another observation: when I enable debug (--enable-debug), I get the
aforementioned message (here it is again):
prov/verbs/src/ep_rdm/verbs_rdm_cm.c:337: fi_ibv_rdm_process_addr_resolved:
Assertion `id->verbs == ep->domain->verbs' failed.
yet the MPI test program still runs through in case 2 above. I am not sure
whether or not I should take this message seriously.
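In case it helps with triage, this is roughly how I reproduce it with verbose
provider logging (FI_PROVIDER and FI_LOG_LEVEL as documented in the libfabric
man pages; the process count and binary name are just my test setup):

  export FI_PROVIDER=verbs     # restrict provider selection to verbs
  export FI_LOG_LEVEL=debug    # verbose libfabric logging
  mpirun -n 2 ./test.e 2>&1 | tee run.log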
I did not see any difference in the MPI test program's behaviour between
building ibverbs as a DSO (--enable-verbs=dl) and building it into libfabric
itself, which I suppose is the default (--enable-verbs=yes), except that in
the DSO case FI_PROVIDER_PATH must be exported. However, it is worth
mentioning as a probable bug: when ibverbs (or, I assume, any other provider)
is built as a DSO, the libfabric folder under which the provider DSOs are
installed gets the wrong permissions. This means that if you build libfabric
as root and use the default installation folder (/usr/local/lib), your MPI
program will not run through if you launch it as some other user.
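For anyone hitting the same thing, this is roughly how I checked and worked
around it (assuming the default /usr/local prefix; adjust if yours differs):

  ls -ld /usr/local/lib/libfabric           # shows the overly strict mode
  sudo chmod 755 /usr/local/lib/libfabric   # make it searchable by all users
  sudo chmod 644 /usr/local/lib/libfabric/*.so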
Regards,
Mohammed
On Wednesday, November 21, 2018, 19:42:24 CET, Ilango, Arun
<[email protected]> wrote:
Mohammed,
Just to add to what Dmitry said: if you're using your own libfabric, please
make sure it's the latest (i.e. v1.6.2). You can check the version by running
fi_info --version.
Other things to check:
1. Make sure you have librdmacm package installed.
2. Check if the IPoIB interface of the node has been configured with an IP
address and is pingable from other nodes in the cluster.
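If it helps, these can be verified with something along these lines (the
package query assumes an RPM-based distro such as SLES, ib0 is just the usual
IPoIB interface name, and the peer address is a placeholder):

  fi_info --version               # libfabric version in use
  rpm -q librdmacm                # check the librdmacm package is installed
  ip addr show ib0                # check the IPoIB interface has an IP
  ping -c 3 <peer-ipoib-address>  # check reachability from another node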
Thanks,
Arun.
-----Original Message-----
From: Gladkov, Dmitry
Sent: Wednesday, November 21, 2018 10:31 AM
To: Hefty, Sean <[email protected]>; Mohammed Shaheen
<[email protected]>; [email protected];
[email protected]
Cc: Ilango, Arun <[email protected]>
Subject: RE: [libfabric-users] intel mpi with libfabric
Hi Mohammed,
Do you use your own version of libfabric?
IMPI 2019 U1 uses its internal libfabric by default.
If you use your own libfabric, please set LD_LIBRARY_PATH to your library
and, if you use a DL provider, set FI_PROVIDER_PATH to the path of the OFI DL
providers (<ofi_install_dir>/lib/libfabric); otherwise unset this variable
(mpivars.sh sets it).
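Both cases, roughly (<ofi_install_dir> stands for your libfabric install
prefix, as above):

  # external libfabric with a DL (DSO) provider build:
  export LD_LIBRARY_PATH=<ofi_install_dir>/lib:$LD_LIBRARY_PATH
  export FI_PROVIDER_PATH=<ofi_install_dir>/lib/libfabric

  # external libfabric with built-in providers:
  export LD_LIBRARY_PATH=<ofi_install_dir>/lib:$LD_LIBRARY_PATH
  unset FI_PROVIDER_PATH          # mpivars.sh sets this by default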
--
Dmitry
-----Original Message-----
From: Hefty, Sean
Sent: Wednesday, November 21, 2018 8:52 PM
To: Mohammed Shaheen <[email protected]>;
[email protected]; [email protected]
Cc: Ilango, Arun <[email protected]>; Gladkov, Dmitry
<[email protected]>
Subject: RE: [libfabric-users] intel mpi with libfabric
Copying ofiwg and key developers for this issue.
- Sean
> I get the following error running a small MPI test program using Intel
> MPI 2019 from Intel Parallel Studio Cluster Edition Update 1 (the
> newest) on a Mellanox FDR cluster:
>
> test.e: prov/verbs/src/ep_rdm/verbs_rdm_cm.c:337:
> fi_ibv_rdm_process_addr_resolved: Assertion `id->verbs ==
> ep->domain->verbs' failed.
>
> The program hangs on this error message. I installed the newest
> release of libfabric and configured it with only ibverbs support. I
> used the inbox (SLES 11 SP4 and SLES 12 SP3) ibverbs and rdma
> libraries. I also tried with Mellanox OFED, to no avail.
>
> Any ideas how to go about it?
>
> Regards,
>
> Mohammed
_______________________________________________
ofiwg mailing list
[email protected]
https://lists.openfabrics.org/mailman/listinfo/ofiwg