Hi,

Please subscribe: [email protected]
https://gridengine.org/mailman/listinfo/users

Regards

On 3/23/2012 12:08 PM, Ventre, Brian D. wrote:
All,

We are in the process of setting up a Rocks 5.4.3 cluster with SGE and need some advice on setting up queues. Our configuration is:

- 1 front-end node (4 cores, Xeon X3460, Lynnfield)
- 1 login node (16 cores, 2x Xeon E5-2690, Sandy Bridge-EP)
- 4 "small" compute nodes (4 cores, Xeon X3460, Lynnfield)
- 5 "large" compute nodes (16 cores, 2x Xeon E5-2690, Sandy Bridge-EP)

All nodes are connected with 1 GbE. We made two separate purchases about 1.5 years apart, hence the node disparity.

This cluster is built specifically to support a home-grown MPI application (with later growth to other apps). The application is structured somewhat differently than I think is standard, in that rank 0 is purely single threaded (a "collector" type process), while the other ranks expand to fill every core on their node. Because of security requirements, there is no shell access via SSH to any of the compute nodes (except root). I think this means no mpirun, and that we have to support the scheduler from the start.

I want to set up a special queue for our app that:

- Puts rank 0 on one of the small compute nodes (which conceivably could live with another rank 0 from a separate instance)
- Spreads all other ranks across the large compute nodes (1 node == 1 rank), up to the size specified by a user during job submission.

Since the cluster is so small, I can't really afford to waste a large node on rank 0. Processing speed would take a big hit if we end up with one of the non-rank 0 processes on a small node. Alternatively, if we could "double up" rank 0 and one other rank, that would probably work as well.

We are already making a roll with our customizations (user authentication, custom apps, etc), so we have a place to put any modifications.

So my questions:

- What suggestions, if any, do people have for this type of layout?
- Should I make a separate appliance type for my small/large nodes (so I/SGE can tell the difference)?
- My experience with SGE is nil, and I haven't found anything that gives a good guide to heterogeneous queues. Anyone have a resource?
- Has anyone setup a queue like the above (maybe 2 separate resource pools for large/small, and pull 1 from small and rest from large?)? What does it look like?
- Can this custom queue play well with others, and still let us use the cluster for other programs?
- I know there's a new kernel in my future (to support AVX). Any thoughts?

Thanks all for your help.

Brian Ventre
Johns Hopkins University Applied Physics Laboratory
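
For readers following the thread, the placement requested above (rank 0 on a small node, every other rank alone on a large node) can be stated as a small Python sketch. This only illustrates the intended mapping; it is not SGE configuration and not part of the original message. The host names and the `place_ranks` helper are hypothetical, chosen for illustration.

```python
# Sketch of the desired rank-to-host mapping; not an SGE configuration.
# Host names below are hypothetical: the original message does not list any.
SMALL_NODES = ["small-0", "small-1", "small-2", "small-3"]
LARGE_NODES = ["large-0", "large-1", "large-2", "large-3", "large-4"]


def place_ranks(num_ranks, small_nodes, large_nodes):
    """Return a list whose index i is the host assigned to MPI rank i.

    Rank 0 (the single-threaded collector) goes on a small node; every
    other rank gets a large node to itself (1 node == 1 rank).
    """
    if num_ranks < 2:
        raise ValueError("need rank 0 plus at least one worker rank")
    if not small_nodes:
        raise ValueError("no small node available for rank 0")
    workers = num_ranks - 1
    if workers > len(large_nodes):
        raise ValueError("not enough large nodes for one rank per node")
    return [small_nodes[0]] + list(large_nodes[:workers])


if __name__ == "__main__":
    for rank, host in enumerate(place_ranks(4, SMALL_NODES, LARGE_NODES)):
        print(rank, host)
```

With the five large nodes described above, the largest job this layout allows is six ranks: one collector plus five workers.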
--
Hung-Sheng Tsao Ph D.
Founder & Principal
HopBit GridComputing LLC
cell: 9734950840
http://laotsao.blogspot.com/
http://laotsao.wordpress.com/
http://blogs.oracle.com/hstsao/
