Hello everybody, I'm a young system administrator who is moving from Torque/MAUI to Slurm. I set up a rather peculiar resource management scheme in the previous queue system and I would like to port it to the new one.
- I have the following two partitions, which are totally independent of each other (like having two separate queues):
  Part A --> 24 cores per node at higher speed, 16 nodes in total;
  Part B --> 4 cores per node at lower speed, 11 nodes in total.
- There are two kinds of accounts (I hope that this is the right word...):
  Acc A --> every user can request up to 24 cores on each of 6 nodes (i.e. 144 CPUs in total) across all of his/her jobs in Part A, and up to 4 cores on each of 11 nodes (i.e. 44 CPUs in total) across all of his/her jobs in Part B; all jobs have very low priority;
  Acc B --> each user can request up to 12 cores on 1 node (i.e. 12 CPUs in total) per job in Part A, and up to 4 cores on each of 3 nodes (i.e. 12 CPUs in total) per job in Part B; all jobs have high priority; only 10 jobs in total (for all users) can run at the same time in Part A, and only 12 jobs can be queued per user in Part A; no such limits in Part B.
- There is no time limit for any job.
- I did not use any database to track cluster usage in the past. If one is needed, I would like to use a very simple one, since I have no experience with them.
- The purpose of this setup is to give more resources to users of Acc A, since they make massive use of the cluster. That said, all jobs of Acc B must run as soon as resources are available, since they are much quicker.

Could you please suggest which keywords I should use in the slurm.conf file? And are there any pages of the manual I should check in order to get this setup working? I would like to use the latest version of Slurm to avoid old bugs and take advantage of new features that could help me.

Thank you very much for your kindness,
Emanuele
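
P.S. In case the limits above are ambiguous, here is a small Python sketch of how I read them. It is only an illustration of the arithmetic, not Slurm configuration: the names `Limit`, `LIMITS` and `fits` are made up for this post, and the check looks only at total CPU counts, not at how the cores are spread over nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """One account/partition limit (hypothetical structure, not a Slurm object)."""
    cores_per_node: int
    max_nodes: int
    scope: str  # "user": sum over all of a user's jobs; "job": per single job

    @property
    def total_cpus(self) -> int:
        return self.cores_per_node * self.max_nodes


# Limits as described in the post, keyed by (account, partition).
LIMITS = {
    ("AccA", "PartA"): Limit(24, 6, "user"),   # 144 CPUs per user
    ("AccA", "PartB"): Limit(4, 11, "user"),   # 44 CPUs per user
    ("AccB", "PartA"): Limit(12, 1, "job"),    # 12 CPUs per job
    ("AccB", "PartB"): Limit(4, 3, "job"),     # 12 CPUs per job
}


def fits(account: str, partition: str, requested_cpus: int,
         cpus_in_use: int = 0) -> bool:
    """Return True if a request stays within the CPU limit.

    For "user" limits, cpus_in_use is what the user's other jobs in that
    partition already hold; for "job" limits it is ignored.
    """
    limit = LIMITS[(account, partition)]
    if limit.scope == "user":
        return cpus_in_use + requested_cpus <= limit.total_cpus
    return requested_cpus <= limit.total_cpus


if __name__ == "__main__":
    for key, limit in LIMITS.items():
        print(key, limit.scope, limit.total_cpus)
```

For example, `fits("AccA", "PartA", 24, cpus_in_use=120)` is allowed because the user would hold exactly 144 CPUs, while `fits("AccB", "PartA", 13)` is rejected because a single Acc B job may use at most 12 CPUs in Part A. The job-count limits (10 running jobs in total, 12 queued jobs per user in Part A) are not modelled here.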