Hello,
This is my first post. I've searched the FAQ for (what I think are)
relevant terms but am not finding an answer to my question.
We have several dozen 32-core clustered worker nodes interconnected with
QLogic InfiniBand. Each node has two QLogic QLE7340 HCAs. As I
understand QLogic's technology, each card offers 16 'hardware contexts'
that are consumed by cooperating MPI processes - this is why we have two
cards per host (we do not use 'shared contexts' and do not want to).
What we are seeing is that when a 32-process MPI job runs on one of these
nodes, every cooperating process consumes a hardware context (the job runs
using the psm module). When a second MPI job using the psm module is then
started on the same node, it fails with a 'network not found' error (this
is expected and normal).
We would rather that Open MPI use the shared-memory (sm) module for
intra-node processes. We believe that, given our scheduler's allocation
policy (packing) and our job mix, we might then be able to add nodes to
this cluster with only one HCA per node (again, we would rather not use
'shared contexts').
To test, I started a 32-process MPI job on a single node and observed that
all hardware contexts were consumed (ipathstats | awk '/CtxtsOpen/{print
$2}'). Then I tried to start another job (mpigreetings) on the same node
with these variations of mpirun:
mpirun --mca btl sm --mca mtl psm -np 32 mpigreetings
this fails with 'network not found' (it tried to use psm and never fell
back to sm)
mpirun --mca btl sm --mca mtl ^psm -np 32 mpigreetings
this works (it uses sm). However, excluding psm will not work in general
for our customers, because not all MPI jobs run entirely intra-node.
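For completeness, the context check mentioned above can be run standalone. A sketch (here fed simulated ipathstats output so it runs anywhere; on a live node you would pipe the real ipathstats, and the exact counter name may vary by driver version):

```shell
# On a live node with the QLogic tools installed:
#   ipathstats | awk '/CtxtsOpen/{print $2}'
# Simulated output stands in below so the pipeline itself can be exercised.
printf 'CtxtsOpen 32\n' | awk '/CtxtsOpen/{print $2}'   # prints 32
```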
I also experimented with the MCA parameters mtl_psm_priority and
btl_sm_priority, with no success.
Is it possible to make Open MPI prefer sm over psm, when sm is available,
for processes on the same node?
TIA,
Tom
Tom Harvill
HCC - hcc.unl.edu
402.472.5660