Re: [OMPI users] Question about component priority (psm/sm)

2012-06-01 Thread Tom Harvill



We would rather that OpenMPI use the shared-memory (sm) module when running 
intra-node processes.


Doesn't PSM use shared memory to communicate between peers on the same 
node?


Possibly, yes (I'm not sure).  Even if it does, it appears to consume a 
'hardware context' for each peer - this is what we want to avoid.


We believe that by using our scheduler's allocation policy (packing) 
and considering our job mix, we might be able to add nodes to this 
cluster using only one HCA per node (again, we would rather not use 
'shared contexts').


Are you saying that you want Open MPI to not use PSM when the job 
entirely fits within a single node?


Yes, considering that the use of sm instead of psm would conserve 
hardware contexts (and thus reduce the need for HCAs).


If so, you might want to experiment with the pre-job hook in the job 
scheduler.  You could try setting MCA parameters as environment 
variables (e.g., setenv OMPI_MCA_pml ob1 -- which would exclude the CM 
PML and therefore the PSM MTL) if your pre-job hook can tell if the job 
fits entirely on a single node.


Does that help?



That's an interesting idea that I will investigate.

Thank you,
Tom

Tom Harvill
hcc.unl.edu
402.472.5660


Re: [OMPI users] Question about component priority (psm/sm)

2012-06-01 Thread Jeff Squyres
On Jun 1, 2012, at 4:28 PM, Tom Harvill wrote:

> We would rather that OpenMPI use the shared-memory (sm) module when running 
> intra-node processes.  

Doesn't PSM use shared memory to communicate between peers on the same node?

(that is hidden from us in Open MPI; I am *assuming* that internal to PSM, it 
uses shared memory for on-node communication and QLogic IB for off-node 
communication)

> We believe that by using our scheduler's allocation policy (packing) and 
> considering our job mix, we might be able to add nodes to this cluster using 
> only one HCA per node (again, we would rather not use 'shared contexts').

Are you saying that you want Open MPI to not use PSM when the job entirely fits 
within a single node?

If so, you might want to experiment with the pre-job hook in the job scheduler. 
 You could try setting MCA parameters as environment variables (e.g., setenv 
OMPI_MCA_pml ob1 -- which would exclude the CM PML and therefore the PSM MTL) 
if your pre-job hook can tell if the job fits entirely on a single node.

Does that help?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
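
[Editor's note: the pre-job-hook suggestion above could be sketched roughly as follows. This is a sketch only, assuming a SLURM-style SLURM_JOB_NUM_NODES variable reports the node count; substitute whatever your scheduler exports.]

```shell
# Sketch of a pre-job hook (assumes a SLURM-style SLURM_JOB_NUM_NODES
# variable; adapt the name to your scheduler).  Forcing the ob1 PML
# excludes the CM PML, and therefore the PSM MTL, so a single-node job
# consumes no hardware contexts and uses the sm BTL for on-node traffic.
set_pml_for_job() {
  local nnodes=${1:-${SLURM_JOB_NUM_NODES:-0}}
  if [ "$nnodes" -eq 1 ]; then
    export OMPI_MCA_pml=ob1    # single node: shared memory only
  else
    unset OMPI_MCA_pml         # multi-node: leave PSM selection alone
  fi
}
```

The hook would call set_pml_for_job before launch; mpirun then picks up OMPI_MCA_pml from the environment.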




Re: [OMPI users] Question about component priority (psm/sm)

2012-06-01 Thread George Bosilca
MTL and BTL are mutually exclusive. If you use the psm MTL, there is no way you 
can take advantage of the sm BTL.

  george.
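
[Editor's note: given this exclusivity, the failing command in the quoted post cannot work - selecting the psm MTL rules out every BTL. A hedged sketch of the workaround, forcing ob1 so the sm BTL becomes reachable (mpigreetings is the test program from the post; the helper name is hypothetical):]

```shell
# Illustrative helper (name is hypothetical): emit extra mpirun arguments
# for a job known to fit on one node.  Excluding the CM PML by forcing
# ob1 is what makes the sm BTL usable; "self" must accompany "sm" so a
# process can also send to itself.
mca_args() {
  if [ "${1:-0}" -eq 1 ]; then
    echo "--mca pml ob1 --mca btl sm,self"
  else
    echo ""   # multi-node: let the CM PML / PSM MTL be selected as usual
  fi
}
```

A single-node launch would then look like `mpirun $(mca_args 1) -np 32 mpigreetings` (illustrative; untested against this cluster).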

On Jun 2, 2012, at 05:28 , Tom Harvill wrote:

> 
> Hello,
> 
> This is my first post; I've searched the FAQ for what I think are the 
> relevant terms but am not finding an answer to my question.
> 
> We have several dozen 32-core clustered worker nodes interconnected with 
> QLogic infiniband.  Each node has two QLogic QLE7340 HCAs.  As I understand 
> QLogic's technology, each card offers 16 'hardware contexts' that are 
> consumed by cooperating MPI processes - this is why we have two cards per 
> host (we do not use 'shared contexts' and do not want to).
> 
> What we are seeing is that when a 32-process MPI job runs on one of these 
> nodes, every process consumes a hardware context (the job runs using the psm 
> module).  When one then tries to start a second MPI job using the psm module 
> on the same node, a 'network not found' error is returned (this is expected 
> and normal).
> 
> We would rather that OpenMPI use the shared-memory (sm) module when running 
> intra-node processes.  We believe that by using our scheduler's allocation 
> policy (packing) and considering our job mix, we might be able to add nodes 
> to this cluster using only one HCA per node (again, we would rather not use 
> 'shared contexts').
> 
> To test, I started a 32-process MPI job on a single node and observed that 
> all hardware contexts were consumed (ipathstats | awk '/CtxtsOpen/{print $2}').  
> Then I tried to start another job (mpigreetings) on the same node with these 
> variations of mpirun:
> 
> mpirun --mca btl sm --mca mtl psm -np 32 mpigreetings
> 
> this fails with 'network not found' (it tried to use psm and did not try to 
> use sm).
> 
> mpirun --mca btl sm --mca mtl ^psm -np 32 mpigreetings
> 
> this works (it uses sm).  This will not work in general (for our customers) 
> because not all MPI jobs will run intra-node.
> 
> I messed around with MCA params mtl_psm_priority and btl_sm_priority with no 
> success...
> 
> Is it possible to make OpenMPI prefer sm over psm, when sm is available, for 
> processes on the same node?
> 
> TIA,
> Tom
> 
> Tom Harvill
> HCC - hcc.unl.edu
> 402.472.5660
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users