Good morning,

We've never recommended the use of dapl on Linux. I think it might have worked at one time, but I don't think anyone bothered to maintain it.
On Linux, you should probably use native verbs support, instead.

Well, we have been using 'Open MPI + openib' for some years now (we started with Sun's ClusterTools and Open MPI 1.2.x; now we have self-built 1.4.x and 1.5.x Open MPI).

The problem is that on our new, big, sexy cluster (some 1700 nodes connected to a common QDR InfiniBand fabric), running MPI over DAPL seems to be noticeably faster than running over native IB. Yes, it is puzzling.

But it is reproducible:
Intel MPI (over DAPL)  => 100%
Open MPI (over openib) => 90% on ~4/5 of the machines (dual-socket Westmere)
Open MPI (over openib) => 45% on ~1/5 of the machines (quad-socket Nehalem)
Intel MPI (over ofa)   => the same values as Open MPI!

(Bandwidth in a PingPong test, e.g. the Intel MPI Benchmarks, plus two other PingPong tests.)
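
For illustration, the core of such a PingPong looks roughly like this (a minimal sketch, not any of the actual benchmarks we used; message size and repetition count are arbitrary here):

   #include <mpi.h>
   #include <stdio.h>
   #include <stdlib.h>

   /* Minimal PingPong bandwidth sketch; run with -np 2, one rank per node. */
   int main(int argc, char **argv)
   {
       const int reps  = 1000;
       const int bytes = 4 * 1024 * 1024;   /* 4 MiB message */
       int rank, i;
       double t0, t1;
       char *buf = malloc(bytes);

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       MPI_Barrier(MPI_COMM_WORLD);
       t0 = MPI_Wtime();
       for (i = 0; i < reps; i++) {
           if (rank == 0) {
               MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
               MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                        MPI_STATUS_IGNORE);
           } else if (rank == 1) {
               MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                        MPI_STATUS_IGNORE);
               MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
           }
       }
       t1 = MPI_Wtime();

       /* 2 * reps messages of 'bytes' each crossed the wire in (t1 - t0) s */
       if (rank == 0)
           printf("bandwidth: %.1f MB/s\n",
                  2.0 * reps * bytes / (t1 - t0) / 1.0e6);

       free(buf);
       MPI_Finalize();
       return 0;
   }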

The question of WHY native IB is slower than DAPL is a very good one (do you have any ideas?). As said, it is reproducible: switching from dapl to ofa in Intel MPI also switches the PingPong performance accordingly.
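
For reference, that switch is a single environment variable (Intel MPI 4.x syntax; the hostnames are placeholders):

   # DAPL fabric (the fast case):
   mpirun -genv I_MPI_FABRICS shm:dapl -n 2 -hosts node01,node02 ./pingpong
   # native OFA/verbs fabric (the slow case, matching Open MPI's openib):
   mpirun -genv I_MPI_FABRICS shm:ofa  -n 2 -hosts node01,node02 ./pingpong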

(You may say "your test is wrong", but we tried three different PingPong tests, and they produced very similar values.)

The second question is how to get Open MPI to use DAPL.

Meanwhile, I have compiled lots of versions (1.4.3, 1.4.4, 1.5.3, 1.5.4) against at least two DAPL versions, using the option --with-udapl. The builds complete fine, but at startup the initialisation of DAPL always fails (see the message below) and the communication runs over openib as usual.
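
The builds themselves were unspectacular, roughly like this (install prefix, DAPL path and compilers are placeholders):

   ./configure --prefix=/opt/openmpi/1.5.4-udapl \
               --with-udapl=/path/to/dapl \
               CC=gcc CXX=g++ F77=gfortran FC=gfortran
   make -j 8 && make install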

Although the error message says this "may be an invalid entry in the uDAPL Registry which is contained in the dat.conf file", that seems very unlikely: with the same dat.conf, Intel MPI can use DAPL. (And yes, Open MPI really uses the same dat.conf as Intel MPI, set via DAT_OVERRIDE - checked and double-checked.)

--------------------------------------------------------------------------
WARNING: Failed to open "ofa-v2-mlx4_0-1u" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
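
For completeness: a uDAPL 2.0 registry entry for this provider name looks roughly like the following (standard OFED dat.conf format; HCA device and port are of course site-specific), and we point both MPIs at the same file via DAT_OVERRIDE:

   # <name> <api> <threadsafety> <default> <library> <version> <IA params> <platform params>
   ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "mlx4_0 1" ""

   export DAT_OVERRIDE=/path/to/our/dat.conf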

Because of the anticipated performance gain we would be very keen to use DAPL with Open MPI. Does anybody have an idea what could be wrong and what to check?
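
In case it helps with diagnosing: this is roughly how we try to force the udapl BTL and get more initialisation output (standard Open MPI MCA parameters; hostnames are placeholders):

   # request the udapl BTL explicitly (plus shared memory and self),
   # with verbose output from the BTL selection for debugging
   mpiexec --mca btl udapl,sm,self --mca btl_base_verbose 30 \
           -np 2 -host node01,node02 ./pingpong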





On Dec 2, 2011, at 1:21 PM, Paul Kapinos wrote:

Dear Open MPI developers,

OFED 1.5.4 will contain DAPL 2.0.34.

I tried to compile the newest release of Open MPI (1.5.4) with this DAPL 
release and I was not successful.

Configuring with --with-udapl=/path/to/2.0.34/dapl produced the error
"/path/to/2.0.34/dapl/include/dat/udat.h not found".
Looking into the include dir: there is no 'dat' subdirectory, only 'dat2'.
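
So the layout difference, as far as I can see (paths shortened):

   # older DAPL releases - what Open MPI's configure looks for:
   /path/to/dapl/include/dat/udat.h
   # DAPL 2.0.34:
   /path/to/dapl/include/dat2/udat.h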

Just for fun I also tried renaming 'dat2' back to 'dat' (a dirty hack, I know :-) -
the configure stage then succeeded, but the compilation failed. The headers
seem to have really changed, not just moved.

The question: are the Open MPI developers aware of these changes, and when will a
version of Open MPI with support for DAPL 2.0.34 be available?

(Background: we have some trouble with Intel MPI and the current DAPL which we do
not have with DAPL 2.0.34, so our dream is to update as soon as possible.)

Best wishes and a nice weekend,

Paul






http://www.openfabrics.org/downloads/OFED/release_notes/OFED_1.5.4_release_notes



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
