Hi,

Please subscribe to [email protected]:
https://gridengine.org/mailman/listinfo/users

Regards


On 3/23/2012 12:08 PM, Ventre, Brian D. wrote:
All,

We are in the process of setting up a Rocks 5.4.3 cluster with SGE and need 
some advice on setting up queues.  Our configuration is:
- 1 front-end node (4 cores, Xeon X3460, Lynnfield)
- 1 login node (16 cores, 2x Xeon E5-2690, Sandy Bridge-EP)
- 4 "small" compute nodes (4 cores, Xeon X3460, Lynnfield)
- 5 "large" compute nodes (16 cores, 2x Xeon E5-2690, Sandy Bridge-EP)
All nodes are connected with 1 GbE.  We made two separate purchases about 1.5 years 
apart, hence the node disparity.  This cluster is built specifically to support a 
home-grown MPI application (with later growth to other apps).  The application is 
structured somewhat differently from what I think is standard, in that rank 0 is purely 
single-threaded (a "collector" type process), while the other ranks expand to fill 
every core on their node.

Because of security requirements, there is no shell access via SSH to any of 
the compute nodes (except root).  I think this means no mpirun, and that we 
have to support the scheduler from the start.  I want to set up a special queue 
for our app that:
- Puts rank 0 on one of the small compute nodes (which conceivably could live 
with another rank 0 from a separate instance)
- Spreads all other ranks across the large compute nodes (1 node == 1 rank), up 
to the size specified by a user during job submission.
Since the cluster is so small, I can't really afford to waste a large node on rank 0.  
Processing speed would take a big hit if we end up with one of the non-rank 0 processes 
on a small node.  Alternatively, if we could "double up" rank 0 and one other 
rank, that would probably work as well.
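
To make the intended layout concrete, here is a rough Python sketch of the placement 
described above.  It is only an illustration of the mapping, not SGE configuration: 
the function name place_ranks and the host names are hypothetical, and it assumes 
rank 0 simply takes the first small node in the list.

```python
def place_ranks(small_hosts, large_hosts, n_workers):
    """Return (rank, host) pairs: rank 0 on a small node, one worker rank per large node."""
    if not small_hosts:
        raise ValueError("need at least one small node for rank 0")
    if n_workers > len(large_hosts):
        raise ValueError("more worker ranks requested than large nodes available")
    # Rank 0 (the single-threaded collector) goes on a small node.
    placement = [(0, small_hosts[0])]
    # Each remaining rank gets a whole large node to itself (1 node == 1 rank).
    for rank, host in enumerate(large_hosts[:n_workers], start=1):
        placement.append((rank, host))
    return placement


# Hypothetical host names, for illustration only.
small = ["compute-0-%d" % i for i in range(4)]
large = ["compute-1-%d" % i for i in range(5)]

for rank, host in place_ranks(small, large, 3):
    print(rank, host)
```

Here n_workers stands for the size a user would specify at job submission.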

We are already building a roll with our customizations (user authentication, 
custom apps, etc.), so we have a place to put any modifications.

So my questions:
- What suggestions, if any, do people have for this type of layout?
- Should I make a separate appliance type for my small/large nodes (so I/SGE 
can tell the difference)?
- My experience with SGE is nil, and I haven't found anything that gives a good 
guide to heterogeneous queues.  Anyone have a resource?
- Has anyone set up a queue like the above (maybe 2 separate resource pools for 
large/small, pulling 1 slot from small and the rest from large)?  What does it 
look like?
- Can this custom queue play well with others, and still let us use the cluster 
for other programs?
- I know there's a new kernel in my future (to support AVX).  Any thoughts?

Thanks all for your help.

Brian Ventre
Johns Hopkins University
Applied Physics Laboratory

--
Hung-Sheng Tsao, Ph.D.
Founder & Principal
HopBit GridComputing LLC
cell: 9734950840

http://laotsao.blogspot.com/
http://laotsao.wordpress.com/
http://blogs.oracle.com/hstsao/
