Kesim, what you are saying is that Slurm schedules tasks based on the number of allocated CPUs, rather than the actual load factor on the server. As I recall, Grid Engine actually used the load factor.
However, you comment that "users run programs on the nodes" and "slurm is aware about the load of non-slurm jobs". IMHO, in any well-run HPC setup, any user running jobs without going through the scheduler would have their fingers broken, or at least bruised with the clue stick. Seriously, three points:

a) Tell users to use 'salloc' and 'srun' to run interactive jobs. They can easily open a Bash session on a compute node and do what they like, under the Slurm scheduler.

b) Implement the pam_slurm PAM module. It is a few minutes' work, and it means your users cannot go behind the Slurm scheduler and log into the nodes.

c) On Bright clusters, which I configure, there is a healthcheck running which warns you when a user is detected logging in without using Slurm.

Seriously again: you have implemented an HPC infrastructure, and have gone to the time and effort of implementing a batch scheduling system. A batch scheduler can be adapted to let your users do their jobs, including interactive shell sessions and remote visualization sessions. Do not let the users ride roughshod over you.

________________________________________
From: kesim [[email protected]]
Sent: 18 March 2017 16:16
To: slurm-dev
Subject: [slurm-dev] Re: Fwd: Scheduling jobs according to the CPU load

Unbelievable, but it seems that nobody knows how to do this. It is astonishing that such a sophisticated system fails at such a simple problem. Slurm is aware of the CPU load of non-Slurm jobs, but it does not use that information. My original understanding of LLN was apparently correct. I can practically kill the CPUs on a particular node with non-Slurm tasks, yet Slurm will diligently submit 7 jobs to that node while leaving others idling. I consider this a serious bug in the program.

On Fri, Mar 17, 2017 at 10:32 AM, kesim <[email protected]> wrote:

Dear All,

Yesterday I did some tests and it seemed that the scheduling was following CPU load, but I was wrong.
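A minimal sketch of points a) and b) above. The PAM file path and module name vary by distribution (older setups ship pam_slurm.so, newer ones pam_slurm_adopt), so treat this as an illustration rather than a drop-in configuration:

```shell
# Illustrative only: restrict SSH logins on compute nodes to users who
# hold an active Slurm allocation there, by adding one line to the SSH
# PAM stack on each compute node.
#
# /etc/pam.d/sshd (add to the "account" section):
#   account    required     pam_slurm.so
#
# Interactive work then goes through the scheduler instead, e.g.:
salloc -N1 -n4           # request 4 CPUs on one node
srun --pty bash -l       # open an interactive shell inside the allocation
```

With that in place, a bare `ssh node4` from a user with no allocation is refused, while the same user can still get a shell on a node via the scheduler.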
My configuration is at the moment:

SelectType=select/cons_res
SelectTypeParameters=CR_CPU,CR_LLN

Today I submitted 70 threaded jobs to the queue, and here is the CPU_LOAD info:

node1   0.08  7/0/0/7
node2   0.01  7/0/0/7
node3   0.00  7/0/0/7
node4   2.97  7/0/0/7
node5   0.00  7/0/0/7
node6   0.01  7/0/0/7
node7   0.00  7/0/0/7
node8   0.05  7/0/0/7
node9   0.07  7/0/0/7
node10  0.38  7/0/0/7
node11  0.01  0/7/0/7

As you can see, it allocated 7 CPUs on node4, with CPU_LOAD 2.97, and 0 CPUs on the idling node11. Why is such a simple thing not the default? What am I missing???

On Thu, Mar 16, 2017 at 7:53 PM, kesim <[email protected]> wrote:

Thank you for the great suggestion. It is working! However, the description of CR_LLN is misleading: "Schedule resources to jobs on the least loaded nodes (based upon the number of idle CPUs)". I understood this to mean that if two nodes both have CPUs that are not fully allocated, the node with the smaller number of allocated CPUs takes precedence. Therefore the bracketed comment should be removed from the description.

On Thu, Mar 16, 2017 at 6:24 PM, Paul Edmon <[email protected]> wrote:

You should look at LLN (least loaded nodes): https://slurm.schedmd.com/slurm.conf.html

That should do what you want.

-Paul Edmon-

On 03/16/2017 12:54 PM, kesim wrote:

---------- Forwarded message ----------
From: kesim <[email protected]>
Date: Thu, Mar 16, 2017 at 5:50 PM
Subject: Scheduling jobs according to the CPU load
To: [email protected]

Hi all,

I am a new user and I created a small network of 11 nodes, 7 CPUs per node, out of users' desktops. I configured Slurm with:

SelectType=select/cons_res
SelectTypeParameters=CR_CPU

When I submit a task with srun -n70, it will fill 10 nodes with 7 tasks per node. However, I have no clue what the algorithm for choosing the nodes is. Users run programs on the nodes, and some nodes are busier than others.
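Since Slurm exposes CPU_LOAD through `sinfo -N -o '%N %O %C'` but (as this thread establishes) does not use it for node selection, one workaround is a small wrapper that parses that output and hands the least-loaded node to `srun -w`. A minimal sketch in Python, assuming lines in the format shown above (node name, load, allocated/idle/other/total CPUs); the function name and the wrapper approach itself are my own, not anything shipped with Slurm:

```python
def least_loaded_node(sinfo_lines):
    """Pick the node with the lowest CPU_LOAD that still has idle CPUs.

    Each line is expected to look like 'node4 2.97 0/7/0/7'
    (name, load, allocated/idle/other/total CPUs), as printed by
    sinfo -N -o '%N %O %C'.
    """
    best = None
    for line in sinfo_lines:
        name, load, cpus = line.split()
        alloc, idle, other, total = (int(x) for x in cpus.split("/"))
        if idle == 0:
            continue  # no free CPUs on this node, skip it
        if best is None or float(load) < best[1]:
            best = (name, float(load))
    return best[0] if best else None

sample = [
    "node4 2.97 0/7/0/7",   # idle but heavily loaded by non-Slurm work
    "node11 0.01 0/7/0/7",  # idle and genuinely quiet
    "node1 0.08 7/0/0/7",   # fully allocated, skipped
]
print(least_loaded_node(sample))  # node11
```

The chosen node can then be targeted explicitly, e.g. `srun -w node11 -n7 task`. This is a band-aid, not a fix: it races against other submissions and against load changes between the `sinfo` call and the allocation.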
It seems logical that the scheduler should submit tasks to the less busy nodes, but that is not the case. In the output of sinfo -N -o '%N %O %C' I can see that jobs are allocated to node11, with load 2.06, leaving node4 totally idling. That somehow makes no sense to me.

node1   0.00  7/0/0/7
node2   0.26  7/0/0/7
node3   0.54  7/0/0/7
node4   0.07  0/7/0/7
node5   0.00  7/0/0/7
node6   0.01  7/0/0/7
node7   0.00  7/0/0/7
node8   0.01  7/0/0/7
node9   0.06  7/0/0/7
node10  0.11  7/0/0/7
node11  2.06  7/0/0/7

How can I configure Slurm to fill the node with minimum load first?
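For reference, this is the configuration the thread converges on. Note the caveat kesim discovers above: "least loaded" in CR_LLN means fewest allocated CPUs, not lowest OS load average, so nodes busy with non-Slurm work are not avoided.

```shell
# slurm.conf fragment (as used in this thread): with CR_LLN, jobs are
# placed on the node with the most idle CPUs first.
SelectType=select/cons_res
SelectTypeParameters=CR_CPU,CR_LLN
# Caveat: CR_LLN ranks nodes by idle-CPU count only. The CPU_LOAD that
# sinfo reports (load from non-Slurm processes) is not considered, which
# is exactly the behaviour kesim observes on node4/node11.
```

Hence the advice earlier in the thread: rather than trying to make the scheduler route around untracked load, bring all work under Slurm so the allocated-CPU count actually reflects what the nodes are doing.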
