On May 25, 2015, at 3:04 AM, Xavier Besseron <xavier.besse...@uni.lu> wrote:
Hi,
Thanks for your reply Ralph.
The only option I'm using when configuring OpenMPI is '--prefix'.
When checking the config.log file, I see
configure:208504: checking whether the openib BTL will use malloc hooks
configure:208510: result: yes
so I guess it is properly enabled (full config.log attached to this email).
However, I think I have found the cause of the bug (line numbers refer to the
source code of OpenMPI 1.8.5):
The default value of memalign_threshold is taken from eager_limit in function
btl_openib_register_mca_params() in btl_openib_mca.c line 717.
But the default value of eager_limit is set in btl_openib_component.c at line
193, right after the call to btl_openib_register_mca_params().
To summarize, memalign_threshold gets its value from eager_limit before this
one gets its value assigned.
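
To illustrate the ordering problem, here is a minimal standalone sketch. It does not use the OpenMPI sources: the struct and every demo_* function are hypothetical stand-ins, and 12288 is simply the eager_limit default reported by ompi_info further down in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two openib parameters discussed here. */
struct demo_params {
    size_t eager_limit;
    size_t memalign_threshold;
};

/* Mimics the registration step: the threshold default is copied from
 * eager_limit as it is at the time of the call. */
void demo_register_params(struct demo_params *p)
{
    p->memalign_threshold = p->eager_limit;
}

/* Order described above: register first, assign eager_limit afterwards. */
struct demo_params demo_init_reported_order(void)
{
    struct demo_params p = {0, 0};
    demo_register_params(&p);   /* copies 0, eager_limit is not set yet */
    p.eager_limit = 12288;      /* too late to influence the threshold */
    return p;
}

/* One possible reordering (an assumption, not the actual upstream fix). */
struct demo_params demo_init_swapped_order(void)
{
    struct demo_params p = {0, 0};
    p.eager_limit = 12288;
    demo_register_params(&p);   /* now copies 12288 */
    return p;
}

/* Small self-check that can be called from main(). */
void demo_self_check(void)
{
    assert(demo_init_reported_order().memalign_threshold == 0);
    assert(demo_init_swapped_order().memalign_threshold == 12288);
}
```

Calling demo_self_check() from a main() passes silently: in the reported order the threshold stays at 0, while the swapped order yields 12288.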
Best regards,
Xavier
On Mon, May 25, 2015 at 2:27 AM, Ralph Castain <r...@open-mpi.org> wrote:
Looking at the code, we do in fact set the memalign_threshold = eager_limit by
default, but only if you configured with --enable-btl-openib-malloc-alignment
AND/OR we found the malloc hook functions were available.
You might check config.log to see if the openib malloc hooks were enabled. My
guess is that they weren’t, for some reason.
On May 24, 2015, at 9:07 AM, Xavier Besseron <xavier.besse...@uni.lu> wrote:
Dear OpenMPI developers / users,
This is more of a comment than a question, since I believe I have already
solved my issue, but I would like to report it.
I have noticed that my code performs very badly with OpenMPI when InfiniBand is
enabled, sometimes with +50% or even +100% overhead.
I also see this slowdown when running with one thread and one process. In such a
case, there is no MPI call other than MPI_Init() and MPI_Finalize().
This overhead disappears if I disable the openib BTL at runtime, i.e. with
'--mca btl ^openib'.
After further investigation, I figured out it comes from the memory allocator
which is aligning every memory allocation when Infiniband is used.
This makes sense because my code is a large irregular C++ code creating and
deleting many objects.
Just below is the documentation of the relevant MCA parameters, as reported by ompi_info:
MCA btl: parameter "btl_openib_memalign" (current value: "32", data source:
default, level: 9 dev/all, type: int)
[64 | 32 | 0] - Enable (64bit or 32bit)/Disable(0) memoryalignment
for all malloc calls if btl openib is used.
MCA btl: parameter "btl_openib_memalign_threshold" (current value: "0", data
source: default, level: 9 dev/all, type: size_t)
Allocating memory more than btl_openib_memalign_threshholdbytes will
automatically be algined to the value of btl_openib_memalign
bytes.memalign_threshhold defaults to the same value as
mca_btl_openib_eager_limit.
MCA btl: parameter "btl_openib_eager_limit" (current value: "12288", data
source: default, level: 4 tuner/basic, type: size_t)
Maximum size (in bytes, including header) of "short" messages (must be
>= 1).
In the end, the problem is that the default value for
btl_openib_memalign_threshold is 0, which means that all memory allocations are
aligned to 32 bits.
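
As a rough illustration of why a threshold of 0 affects every allocation, here is a minimal sketch of a threshold-based allocator. It is not the OpenMPI malloc hook: demo_should_align and demo_alloc are hypothetical names, the "more than" comparison follows the wording of the ompi_info text above, and the alignment is assumed to be given in bytes.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical decision rule, following the ompi_info wording:
 * requests of more than `threshold` bytes get aligned. */
int demo_should_align(size_t size, size_t threshold)
{
    return size > threshold;
}

/* Hypothetical allocator: aligned path above the threshold, plain
 * malloc() otherwise. `alignment` must be a power of two and a
 * multiple of sizeof(void *), as posix_memalign() requires. */
void *demo_alloc(size_t size, size_t threshold, size_t alignment)
{
    void *ptr = NULL;

    if (demo_should_align(size, threshold)) {
        if (posix_memalign(&ptr, alignment, size) != 0) {
            return NULL;
        }
        return ptr;
    }
    return malloc(size);
}

/* Small self-check that can be called from main(). */
void demo_alloc_self_check(void)
{
    /* Threshold 0: even a 16-byte request takes the aligned path. */
    assert(demo_should_align(16, 0));
    /* Threshold 12288: small objects use plain malloc(). */
    assert(!demo_should_align(16, 12288));
    assert(demo_should_align(20000, 12288));
}
```

With a threshold of 0, every non-empty request goes through the aligned path, which matches the slowdown described above for code that creates and deletes many small objects. With 12288, only large buffers pay for the alignment.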
The documentation says that the default value of btl_openib_memalign_threshold
should be the same as btl_openib_eager_limit, i.e. 12288 instead of 0.
On my side, changing btl_openib_memalign_threshold to 12288 fixes my
performance issue.
However, I believe that the default value of btl_openib_memalign_threshold
should be fixed in the OpenMPI code (or at least the documentation should be
fixed).
I tried OpenMPI 1.8.5, 1.7.3 and 1.6.4 and it's all the same.
Bonus question:
As this issue might impact other users, I'm considering applying a global fix
on our clusters by setting this default value in etc/openmpi-mca-params.conf.
Do you see any good reason not to do it?
Thank you for your comments.
Best regards,
Xavier
--
Dr Xavier BESSERON
Research associate
FSTC, University of Luxembourg
Campus Kirchberg, Office E-007
Phone: +352 46 66 44 5418
http://luxdem.uni.lu/
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/05/26913.php
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/05/26915.php