Dear Open MPI developers
Well, I just can't keep my promises for too long ...
So, here I am pestering you again, although this time
it is not a request for more documentation.
Hopefully it is something more legit.
I am having trouble using knem with Open MPI 1.8.3,
and need your help.
I configured Open MPI 1.8.3 with knem.
I had done the same with some builds of Open MPI 1.6.5 before.
When I build and launch the Intel MPI benchmarks (IMB)
with Open MPI 1.6.5,
/dev/knem starts showing non-zero-and-growing statistics right away.
However, when I build and launch IMB with Open MPI 1.8.3,
/dev/knem shows only zeros,
no statistics growing, nothing.
Knem just seems to be completely asleep.
So, my conclusion is that somehow knem is not working with OMPI 1.8.3,
at least not for me.
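
A minimal sketch of the before/after check described above, assuming
/dev/knem is readable and prints its statistics as text when read.
The function name knem_changed and the snapshot file names are
hypothetical, chosen only for illustration:

```shell
#!/bin/sh
# Sketch: decide whether the knem counters moved during a run by
# comparing two snapshots of /dev/knem saved to plain files.

# knem_changed BEFORE AFTER
# Prints "idle" if the two snapshot files are byte-identical,
# "active" otherwise.
knem_changed() {
    if cmp -s "$1" "$2"; then
        echo "idle"
    else
        echo "active"
    fi
}

# Typical use (left commented out so the file can be sourced safely):
#   cat /dev/knem > before.txt
#   <your usual mpirun command line for IMB>
#   cat /dev/knem > after.txt
#   knem_changed before.txt after.txt
```

With this, "active" corresponds to what I see under 1.6.5 and "idle"
to what I see under 1.8.3.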
The runtime environment related to knem is set up the
same way on both OMPI releases.
I tried setting it up both on the command line:
-mca btl_sm_eager_limit 32768 -mca btl_sm_knem_dma_min 1048576
and on the MCA parameter file:
btl_sm_use_knem = 1
btl_sm_eager_limit = 32768
btl_sm_knem_dma_min = 1048576
and the behavior is the same (i.e., knem is active in 1.6.5,
but doesn't seem to be used by 1.8.3, as indicated by the
/dev/knem statistics).
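
To keep the two ways of setting the parameters in sync, the parameter
file can be turned into the equivalent command-line arguments. This is
only a sketch: the helper name mca_file_to_args is hypothetical, and it
assumes one "key = value" pair per line with optional '#' comments:

```shell
#!/bin/sh
# Sketch: convert "key = value" lines from an MCA parameter file
# into the matching "-mca key value" command-line arguments.

# mca_file_to_args FILE
# Prints all arguments on a single line, separated by spaces.
mca_file_to_args() {
    awk -F= '
        { sub(/#.*/, "") }            # strip trailing comments
        NF == 2 {
            gsub(/[ \t]/, "", $1)     # trim whitespace around the key
            gsub(/[ \t]/, "", $2)     # trim whitespace around the value
            args = args (args ? " " : "") "-mca " $1 " " $2
        }
        END { print args }
    ' "$1"
}

# Typical use (commented out; the file name is a placeholder):
#   mpirun $(mca_file_to_args my-mca-params.conf) <benchmark>
```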
When I 'grep -i knem config.log', both 1.6.5 and 1.8.3 builds show:
#define OMPI_BTL_SM_HAVE_KNEM 1
suggesting that both configurations picked up knem correctly.
On the other hand, when I do 'ompi_info --all --all |grep knem',
OMPI 1.6.5 shows "btl_sm_have_knem_support":
'MCA btl: information "btl_sm_have_knem_support" (value: <1>, data
source: default value) Whether this component supports the knem Linux
kernel module or not'
By contrast, in OMPI 1.8.3 ompi_info doesn't show this particular item
although the *other* 'btl sm knem' items are there.
I am scratching my head to understand why a parameter with such a
suggestive name ("btl_sm_have_knem_support"),
so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
somehow vanished from ompi_info in OMPI 1.8.3.
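
The comparison between the two ompi_info outputs can be made mechanical.
The sketch below assumes each build's 'ompi_info --all --all' output was
saved to a file first; the function names are hypothetical, and it relies
on 'grep -o', which GNU and BSD grep provide:

```shell
#!/bin/sh
# Sketch: list the knem-related btl_sm parameter names that appear in
# one saved ompi_info dump but not in another.

# knem_params FILE
# Prints the distinct btl_sm_* names containing "knem", sorted.
knem_params() {
    grep -o 'btl_sm_[a-z_]*knem[a-z_]*' "$1" | sort -u
}

# knem_params_only_in_first A B
# Prints the names found in dump A that are missing from dump B.
knem_params_only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    knem_params "$1" > "$a"
    knem_params "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Typical use (commented out; the file names are placeholders):
#   ompi_info --all --all > ompi165.txt    # with the 1.6.5 build in PATH
#   ompi_info --all --all > ompi183.txt    # with the 1.8.3 build in PATH
#   knem_params_only_in_first ompi165.txt ompi183.txt
```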
- Am I doing something totally wrong,
perhaps with the knem runtime environment?
- Was knem somehow phased out in 1.8.3?
- Could there be a bad interaction with other runtime parameters that
somehow is knocking out knem in 1.8.3?
(FYI, besides knem, I'm just excluding the tcp btl, binding to core, and
reporting the bindings, which is exactly what I do on 1.6.5,
although the runtime parameter syntax has changed.)
- Is knem inadvertently not being activated at runtime in OMPI 1.8.3?
(i.e. a bug)
- Is there a way to increase verbosity to detect if knem is being
used by OMPI?
That would certainly help to check what is going on.
I tried '-mca btl_base_verbose 30' but there was no trace of knem
in stderr/stdout of either 1.6.5 or 1.8.3.
So, the evidence I have that knem is
active in 1.6.5 but not in 1.8.3 comes only from the statistics in
/dev/knem.
PS - As an aside, I also have some questions on the knem setup,
which I mostly copied from the knem web site
(hopefully Brice Goglin is listening ...):
- Is 32768 in 'btl_sm_eager_limit 32768' a good number,
or should it be larger/smaller/something else?
[OK, I know I should benchmark it, but exploring the whole parameter
space takes long, so why not ask?]
- Is it worth using 'btl_sm_knem_dma_min 1048576'?
[I think I read somewhere that this dma engine offload
is an Intel thing, not AMD.]
- How about btl_sm_knem_max_simultaneous?
That one is not mentioned in the knem web site.
Should I leave it at its default of zero, or set it to 1? 2? 4? Something else?
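
If benchmarking turns out to be the only answer, a small sweep over
btl_sm_eager_limit could be generated along these lines. This is only a
sketch: the helper name print_sweep, the '-np 2' process count, and the
benchmark command are assumptions, and the function merely prints the
command lines rather than running them:

```shell
#!/bin/sh
# Sketch: print one mpirun command line per candidate eager limit,
# so the list can be reviewed before anything is launched.

# print_sweep BENCH LIMIT...
# BENCH is the benchmark command; each LIMIT is a value in bytes
# for btl_sm_eager_limit.
print_sweep() {
    bench=$1
    shift
    for limit in "$@"; do
        echo "mpirun -np 2 -mca btl_sm_use_knem 1 -mca btl_sm_eager_limit $limit $bench"
    done
}

# Typical use (commented out; the benchmark path is a placeholder):
#   print_sweep "./IMB-MPI1 PingPong" 4096 16384 32768 65536
```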