Hello,

This is my first post; I've searched the FAQ for what I think are the relevant terms but am not finding an answer to my question.

We have several dozen 32-core clustered worker nodes interconnected with QLogic InfiniBand. Each node has two QLogic QLE7340 HCAs. As I understand QLogic's technology, each card offers 16 'hardware contexts' that are consumed by cooperating MPI processes; this is why we have two cards per host (2 x 16 contexts = 32, one per core). We do not use 'shared contexts' and do not want to.

What we are seeing is that when a 32-process MPI job runs on one of these nodes (using the psm module), each cooperating process consumes a hardware context. When one then tries to start another MPI job using the psm module on the same node, a 'network not found' error is returned (this is expected and normal).

We would rather that Open MPI use the shared-memory (sm) module for intra-node processes. We believe that, given our scheduler's allocation policy (packing) and our job mix, we might then be able to add nodes to this cluster with only one HCA per node (again, we would rather not use 'shared contexts').

To test, I started a 32-process MPI job on a single node and observed that all hardware contexts were consumed (ipathstats | awk '/CtxtsOpen/{print $2}'). Then I tried to start another job (mpigreetings) on the same node with these variations of mpirun:

mpirun --mca btl sm --mca mtl psm -np 32 mpigreetings

This fails with 'network not found' (it tried to use psm and never tried sm).

mpirun --mca btl sm --mca mtl ^psm -np 32 mpigreetings

This works (it uses sm), but it is not a general solution for our customers because not all of their MPI jobs run strictly intra-node.
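
One more variation occurred to me, though I have not tried it yet (this is only a guess on my part, based on my understanding that the psm mtl is used through the 'cm' pml while the sm btl is only used by the 'ob1' pml, so these component names may not be right for our install):

mpirun --mca pml ob1 --mca btl sm -np 32 mpigreetings

Even if something like that works, we would prefer a selection that happens automatically rather than something every job has to request.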

I also experimented with the MCA parameters mtl_psm_priority and btl_sm_priority, with no success...
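
For example, something along these lines (the priority values are only illustrative; I don't recall the exact numbers I used):

mpirun --mca btl_sm_priority 90 --mca mtl_psm_priority 10 -np 32 mpigreetings

None of the combinations I tried changed the behavior.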

Is it possible to make Open MPI prefer sm over psm, when sm is available, for processes on the same node?

TIA,
Tom

Tom Harvill
HCC - hcc.unl.edu
402.472.5660
