Dear OpenMPI developers / users,

This is much more a comment than a question since I believe I have already
solved my issue. But I would like to report it.

I have noticed that my code performs very badly with OpenMPI when InfiniBand is
enabled, sometimes with +50% or even +100% overhead.
I also see this slowdown when running with a single thread and a single process. In
that case, the only MPI calls are MPI_Init() and MPI_Finalize().
The overhead disappears if I disable the openib btl at runtime, i.e. with '--mca
btl ^openib'.
After further investigation, I figured out that it comes from the memory
allocator, which aligns every memory allocation when InfiniBand is used.
This makes sense because my code is a large, irregular C++ code that creates
and deletes many objects.

Below is the documentation of the relevant MCA parameters as reported by
ompi_info:

MCA btl: parameter "btl_openib_memalign" (current value: "32", data
source: default, level: 9 dev/all, type: int)
         [64 | 32 | 0] - Enable (64bit or 32bit) / Disable (0) memory alignment
for all malloc calls if btl openib is used.

MCA btl: parameter "btl_openib_memalign_threshold" (current value: "0",
data source: default, level: 9 dev/all, type: size_t)
         Allocating memory of more than btl_openib_memalign_threshold bytes
will automatically be aligned to the value of btl_openib_memalign bytes.
memalign_threshold defaults to the same value as mca_btl_openib_eager_limit.

MCA btl: parameter "btl_openib_eager_limit" (current value: "12288",
data source: default, level: 4 tuner/basic, type: size_t)
         Maximum size (in bytes, including header) of "short" messages
(must be >= 1).


In the end, the problem is that the default value for
btl_openib_memalign_threshold is 0, which means that *all* memory
allocations are aligned to 32 bits.
The documentation says that the default value of
btl_openib_memalign_threshold should be the same as
btl_openib_eager_limit, i.e. 12288 instead of 0.

On my side, changing btl_openib_memalign_threshold to 12288 fixes my
performance issue.
However, I believe that the default value of btl_openib_memalign_threshold
should be fixed in the OpenMPI code (or at least the documentation should
be fixed).

I tried OpenMPI 1.8.5, 1.7.3, and 1.6.4, and the behavior is the same in all of them.


Bonus question:
As this issue might impact other users, I'm considering applying a global
fix on our clusters by setting this value in
etc/openmpi-mca-params.conf.
Do you see any good reason not to do this?
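Concretely, the global fix I have in mind is a one-line addition to the
system-wide MCA parameter file (the exact path depends on the installation
prefix); the value 12288 mirrors btl_openib_eager_limit, as the documentation
suggests:

```
# Align only allocations larger than the eager limit (the documented default)
btl_openib_memalign_threshold = 12288
```

The same value can also be set per-run with
'mpirun --mca btl_openib_memalign_threshold 12288 ...'.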

Thank you for your comments.

Best regards,

Xavier


-- 
Dr Xavier BESSERON
Research associate
FSTC, University of Luxembourg
Campus Kirchberg, Office E-007
Phone: +352 46 66 44 5418
http://luxdem.uni.lu/
