On Thu, Oct 27, 2011 at 12:22:52PM -0400, He, Janet (NIH/NIAID) [C] wrote:
Hi all,

1.  What is the purpose of all.q in SGE?  What is the optimal way to configure 
all.q?  Should all.q have all the cores in the cluster or a subset of the 
cores?  Since the host list for all.q includes all the nodes in the cluster, 
when jobs are submitted to this queue, how do we manage oversubscription of
the nodes that also belong to other queues?

all.q is the default queue, and it contains all of the exec *hosts* in
your system.  If you want it to use only a subset of your compute nodes,
you certainly can.

You will have to manage oversubscription by other means.  There are
several options (and this is not a comprehensive list, I'm sure):

* Configure the queues so that they do not share exec hosts.
* Set appropriate load thresholds on the queue instances so that new
  jobs are not sent to nodes when the load is "too high".
* Configure each queue with a limited number of slots on each node, so
  that the total number of slots used by all queues on an exec host does
  not exceed the number of CPU cores.
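To make the third option concrete, here is a minimal Python sketch that checks a proposed slot layout for oversubscription before you apply it.  The host names, core counts, and every queue name other than all.q are hypothetical; in practice the numbers would come from your own queue and host configuration.  It simply sums the slots each queue offers on a host and flags any host where that total exceeds the core count.

```python
# Hypothetical inventory: CPU cores on each exec host.
host_cores = {"node01": 8, "node02": 8, "node03": 16}

# Hypothetical per-queue slot counts on each exec host.
queue_slots = {
    "all.q":   {"node01": 4, "node02": 4, "node03": 8},
    "short.q": {"node01": 4, "node03": 8},
    "long.q":  {"node02": 6},
}


def oversubscribed_hosts(host_cores, queue_slots):
    """Return {host: (total_slots, cores)} for hosts whose slots exceed cores."""
    totals = {}
    # Add up the slots that every queue offers on each host.
    for slots_by_host in queue_slots.values():
        for host, slots in slots_by_host.items():
            totals[host] = totals.get(host, 0) + slots
    # Keep only the hosts where the combined slots exceed the core count.
    return {
        host: (total, host_cores[host])
        for host, total in totals.items()
        if total > host_cores[host]
    }


if __name__ == "__main__":
    report = oversubscribed_hosts(host_cores, queue_slots)
    for host, (total, cores) in sorted(report.items()):
        print(f"{host}: {total} slots across queues, only {cores} cores")
```

With these made-up numbers, node01 (4 + 4 = 8) and node03 (8 + 8 = 16) fit exactly, while node02 (4 + 6 = 10 slots on 8 cores) would be reported.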

2.  Is there a way to assign a node to multiple queues without oversubscribing
the node?  Are there any tips or reference sites with this information?

Yes, see above.

3. How exactly is np_load_avg used in job scheduling?  Does this apply to all 
the nodes in the queue?  How do I calculate the value for np_load_avg if the 
node specifications in the queue are not the same?

SGE will dispatch new jobs to the "least busy" exec hosts, as determined by the 
value of np_load_avg for each exec host.
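As a rough illustration of that behaviour, combined with the load-threshold option mentioned earlier, here is a greatly simplified Python model.  The host names, load values, and threshold are hypothetical, and the model ignores other scheduler settings (such as the scheduler's load formula and queue sort order) that the real scheduler takes into account.

```python
# Hypothetical np_load_avg values reported by each exec host.
np_load = {"node01": 0.95, "node02": 0.20, "node03": 1.90}

# Hypothetical queue load threshold; hosts above it get no new jobs.
THRESHOLD = 1.75


def pick_host(np_load, threshold):
    """Return the least-loaded host not above the threshold, or None."""
    # Drop hosts whose normalized load exceeds the threshold.
    eligible = {host: load for host, load in np_load.items() if load <= threshold}
    if not eligible:
        return None
    # Among the remaining hosts, choose the one with the lowest load.
    return min(eligible, key=eligible.get)


print(pick_host(np_load, THRESHOLD))
```

Here node03 is skipped for being over the threshold, and node02 is chosen as the least busy of the remaining hosts.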

I don't understand the second part of the question.  However,
np_load_avg should be comparable between different systems, even
when the hardware is not the same.  This is because np_load_avg equals
the 5-minute load average divided by the number of CPU cores that SGE
finds on that host.  A load of "1" on a single-core box would be
considered the same as a load of "4" on a quad-core box, for example.
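The normalization described above is just a division, sketched here in Python.  The function name and the sample hosts are illustrative only; SGE computes this value itself from the load it measures and the cores it detects.

```python
def np_load_avg(load_avg, num_cores):
    """Normalize a host's load average by its CPU core count."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    return load_avg / num_cores


# The example from the text: both hosts come out equally busy.
print(np_load_avg(1.0, 1), np_load_avg(4.0, 4))  # 1.0 1.0

# Hypothetical hosts with different hardware but the same relative load.
for name, load, cores in [("small", 2.0, 4), ("big", 8.0, 16)]:
    print(name, np_load_avg(load, cores))
```

Because the value is already per core, a single threshold can be applied across nodes with different core counts.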

4.  If a node has hyper-threading enabled, should SGE count the real number of
cores or the hyper-threaded number?  For example, an 8-core node with
hyper-threading shows 16 cores.  Should SGE use 8 or 16 in the
configuration?

That depends on your specific jobs.  Some will benefit from
hyperthreading, and some will not.  You should try it both ways to see
which works best.


Thanks in advance for your help.

Janet He
Linux/HPC Team



_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users

--
Jesse Becker
NHGRI Linux support (Digicon Contractor)