Hi,

On 16.03.2011 at 11:32, Alex Phillips wrote:

> Dear List,
> We have a cluster of 1920 cores spread over 160 nodes (12 cores/node), we 
> only run one code in one queue, with jobs of between 48 and 256 cores using 
> an mpi pe.
> When benchmarking our code we found a 14-15% speedup by running on 6 
> cores/node, compared with 12 cores/node.

Yes, this can be seen with certain applications.


> We also found that if we ran on 6 cores/node, with a second job on the other 
> 6 cores/node, we still have a 5-6% speedup.
> So I have configured our mpi pe with allocation_rule = 6, and this works, 
> however, as the cluster fills up, the scheduler is starting a second job on 
> some nodes, before all the nodes are busy.

Well, there are two problems. First, there is no rule in SGE that prevents a second
job from starting on a node with this allocation_rule when the node has 12 slots.
You could instead configure two queues with 6 slots each. The second queue could
get a load sensor value (a boolean complex) in its load_thresholds, which enables
the queue only when the first queue has no slots left (a global load sensor that
just counts the free slots in the primary queue). This is not entirely safe: over
time, jobs running in the second queue might mislead the free-slot count of the
primary queue. Still, I think it's worth testing.
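A minimal sketch of such a global load sensor in Python, under assumptions that are not part of this thread: the primary queue is named `primary.q`, the BOOL complex `primary_has_free` is hypothetical (added with `qconf -mc`, reported as 1 while the primary queue still has free slots), and `qstat -g c` prints the 6.2-style columns (CLUSTER QUEUE, CQLOAD, USED, RES, AVAIL, TOTAL, ...). The script follows the load sensor protocol: read a line from stdin, answer with `begin`, one `host:complex:value` line per value (`global` as host for a global value) and `end`, and stop on `quit`. The second queue would then reference this complex in its load_thresholds so that it sits in alarm state while the value is true; check complex(5) and queue_conf(5) for how the relop of a BOOL complex is applied to thresholds before relying on that.

```python
#!/usr/bin/env python3
"""Sketch of a Grid Engine load sensor reporting whether a queue has free slots.

Assumptions (hypothetical, adapt to your site):
  * the primary cluster queue is called "primary.q";
  * a BOOL complex "primary_has_free" was added with `qconf -mc`;
  * `qstat -g c` prints 6.2-style columns: CLUSTER QUEUE, CQLOAD, USED, RES,
    AVAIL, TOTAL, ... (older versions lack RES, so check the header).
"""
import subprocess
import sys

PRIMARY_QUEUE = "primary.q"        # hypothetical queue name
COMPLEX_NAME = "primary_has_free"  # hypothetical BOOL complex


def free_slots_from_qstat_gc(output: str, queue: str) -> int:
    """Return the AVAIL column for `queue` from `qstat -g c` output."""
    lines = output.splitlines()
    if not lines:
        raise ValueError("empty qstat output")
    # "CLUSTER QUEUE" is two tokens in the header but the queue name is a
    # single token in each data row, hence the -1 offset.
    avail_col = lines[0].split().index("AVAIL") - 1
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == queue:
            return int(fields[avail_col])
    raise ValueError(f"queue {queue!r} not found in qstat output")


def report(free_slots: int) -> str:
    """Format one report: begin / global:complex:value / end."""
    value = 1 if free_slots > 0 else 0
    return f"begin\nglobal:{COMPLEX_NAME}:{value}\nend\n"


def main() -> None:
    # execd sends a line whenever it wants a report and "quit" on shutdown.
    while True:
        line = sys.stdin.readline()
        if not line or line.strip() == "quit":
            break
        try:
            out = subprocess.run(["qstat", "-g", "c"], capture_output=True,
                                 text=True, check=True).stdout
            free = free_slots_from_qstat_gc(out, PRIMARY_QUEUE)
        except (OSError, subprocess.CalledProcessError, ValueError):
            # Conservative choice: if qstat fails, claim the primary queue still
            # has free slots, so the second queue is not opened by mistake.
            free = 1
        sys.stdout.write(report(free))
        sys.stdout.flush()


if __name__ == "__main__":
    main()
```

The sensor would be registered via the `load_sensor` entry of the cluster configuration (`qconf -mconf`, see sge_conf(5)), and its environment must allow `qstat` to reach the qmaster.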

The second problem is the missing seq_no for PEs; if it existed, it would solve
this. You could then set up two PEs for the two queues as above, and the second
would only be used once nothing is left in the first queue, thanks to the seq_no.
It's an RFE, though.
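Since PEs have no seq_no today, the following is only a toy model of the behaviour such an RFE would give, not Grid Engine's scheduler. The PE names `mpi_first` and `mpi_second`, the host names, and the alphabetical host order are hypothetical; each PE is assumed to be bound to its own 6-slot queue on every 12-core node, with allocation_rule 6.

```python
"""Toy model of the proposed (not existing) seq_no ordering for PEs."""
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ALLOCATION_RULE = 6  # fixed number of slots taken on each host


@dataclass
class PE:
    name: str
    seq_no: int
    free: Dict[str, int] = field(default_factory=dict)  # host -> free slots


def place(pes: List[PE], slots: int) -> Optional[Tuple[str, List[str]]]:
    """Start a job in the lowest-seq_no PE that can hold it entirely."""
    if slots % ALLOCATION_RULE:
        raise ValueError("slots must be a multiple of the allocation rule")
    hosts_needed = slots // ALLOCATION_RULE
    for pe in sorted(pes, key=lambda p: p.seq_no):
        # Hosts are taken in alphabetical order (simplification of the model).
        candidates = [h for h in sorted(pe.free) if pe.free[h] >= ALLOCATION_RULE]
        if len(candidates) >= hosts_needed:
            chosen = candidates[:hosts_needed]
            for host in chosen:
                pe.free[host] -= ALLOCATION_RULE
            return pe.name, chosen
    return None  # no PE can hold the job: it stays pending


def make_pes(hosts: List[str]) -> List[PE]:
    """Two hypothetical PEs, each with 6 free slots on every host."""
    return [PE("mpi_first", 0, {h: 6 for h in hosts}),
            PE("mpi_second", 1, {h: 6 for h in hosts})]


if __name__ == "__main__":
    pes = make_pes(["node01", "node02", "node03", "node04"])
    for slots in (12, 12, 12, 24):
        print(slots, place(pes, slots))
```

In this model, the third 12-slot job is the first one placed through `mpi_second`, i.e. only after every node already runs one job, which is the behaviour asked for below.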

-- Reuti


> How can we configure the scheduler to run one job on all the nodes, before 
> starting a second job ?
> I have tried defining the number of slots as a complex value on the execution 
> hosts, I’ve tried -np_load_avg, np_load_avg, slots, and -slots as the 
> load_formula, but I can’t get it to work.
> I’ve read 
> http://blogs.sun.com/sgrell/entry/grid_engine_scheduler_hacks_least but I 
> can’t set the allocation rule to $pe_slots, as we only want to run on 6 
> cores/node, not 12.
> Any suggestions ?
> Regards,
> *Alex Phillips*
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
> 
