[slurm-dev] Re: Load based scheduling limits

Moe Jette Tue, 27 Nov 2012 14:31:09 -0800

Slurm is designed to dedicate resources (CPUs, memory, GPUs, etc.)
and bind those resources to specific user jobs. Slurm does record
system load, and this could be used as a basis for scheduling
decisions with minor code changes, but doing so would probably
severely impact parallel job performance, which is what Slurm was
really designed for.


Quoting Mario Kadastik <[email protected]>:

>
> Hi,
>
> Is it possible to configure SLURM to use system load as a basis for
> deciding where to schedule jobs, as well as when to stop sending new
> jobs to shared resources?
>
> The reason I ask is that we have 24-core and 32-core nodes, and
> 99.99% of our jobs are single-core jobs. So we'd configure SLURM
> with consumable resources and thereby allow 24 or 32 jobs to share
> each node. However, we'd like placement to work so that if, say, we
> had 170 work servers, the first 170 jobs would each get one of the
> servers, the next 170 would become the second job on each server, etc.
>
> The second part is that during a transition period we'd like to run
> Slurm and Torque in parallel. We've already set up Torque to schedule
> based on system load: pbs_mom has a max_load attribute that is
> honored, so once the system load rises more than 10% above the core
> count (above 27 for 24 cores, 36 for 32 cores) no more jobs are
> scheduled, because the node is assumed to be busy. This can block the
> scheduling of further jobs even though, count-wise, there may still
> be free slots. Can the same be done for Slurm? That way both could
> schedule jobs, and only if the cluster really gets full would both
> stop scheduling until some jobs end and the load drops below some
> margin.
>
> If that can be achieved, can you let me know what I'd need in
> slurm.conf to make it work like this :) Or is that the default
> behavior?
>
> Thanks,
>
> Mario Kadastik, PhD
> Researcher
>
> ---
>   "Physics is like sex, sure it may have practical reasons, but  
> that's not why we do it"
>      -- Richard P. Feynman
>
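
For illustration only, the behavior requested in the quoted question can be sketched in a few lines of Python. This is not Slurm or Torque code and does not reflect how either scheduler is implemented. The names max_load(), pick_node() and place_jobs(), and the node dictionaries with "name", "cores", "jobs" and "load" keys, are hypothetical. The sketch assumes the load ceiling is the core count plus 10%, rounded up, which reproduces the 27 and 36 figures quoted above.

```python
def max_load(cores, margin_pct=10):
    """Load ceiling: core count plus margin_pct percent, rounded up.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    extra = (cores * margin_pct + 99) // 100  # ceiling of cores * pct / 100
    return cores + extra


def pick_node(nodes):
    """Return the eligible node with the fewest jobs, or None if all are busy.

    A node is eligible while it has a free core slot and its measured
    load has not risen above its ceiling (the assumed max_load rule).
    """
    eligible = [
        n for n in nodes
        if n["jobs"] < n["cores"] and n["load"] <= max_load(n["cores"])
    ]
    if not eligible:
        return None
    # Fewest jobs first, then lowest load; ties go to the earliest node.
    return min(eligible, key=lambda n: (n["jobs"], n["load"]))


def place_jobs(nodes, count):
    """Place `count` single-core jobs; return the chosen node names in order."""
    placed = []
    for _ in range(count):
        node = pick_node(nodes)
        if node is None:
            break  # every node is full or above its load ceiling
        node["jobs"] += 1
        placed.append(node["name"])
    return placed
```

With two idle nodes, four jobs alternate between them, so every node receives its first job before any node receives a second. A node whose load is above its ceiling is skipped even if it still has free slots, which is the blocking effect described in the question.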
