Hello everybody,
I'm a young system administrator who is moving from Torque/MAUI to Slurm.
I set up a rather peculiar resource management scheme in the previous queue
system and I would like to port it to the new one.

- I have the following two partitions, which are totally independent of each
other (like having two separate queues):

Part A  -->  has 24 cores per node at higher speed, 16 nodes in total;

Part B -->  has 4 cores per node at lower speed, 11 nodes in total.


- There are two kinds of accounts (I hope that this is the right word...):

Acc A   -->  every user can request up to 24 cores/6 nodes (i.e. 144 total
CPUs) across all his/her jobs in Part A, and up to 4 cores/11 nodes (i.e.
44 total CPUs) across all his/her jobs in Part B; all jobs have very low
priority;

Acc B  -->  each user can request up to 12 cores/1 node (i.e. 12 total
CPUs) per job in Part A, and up to 4 cores/3 nodes (i.e. 12 total CPUs)
per job in Part B; all jobs have high priority. In Part A, at most 10 jobs
across all users can run at the same time, and at most 12 jobs can be
queued per user; there are no such limits in Part B.


- There is no time limit for any job.


- I did not use any database to track cluster usage in the past. If one is
needed, I would like to use a very simple one, since I have no experience
with databases.


- The purpose of this setup is to give more resources to users of Acc A,
since they make heavy use of the cluster. That said, all jobs of Acc B
must start as soon as resources are available, since they are much
quicker.
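
To make the partition and account limits listed above easier to check, here
is a minimal sketch in Python that writes them down as data and recomputes
the CPU totals. It is only bookkeeping, not Slurm configuration: the class
and field names (Partition, Limit, scope, ...) are hypothetical and chosen
for illustration, and the numbers are simply the ones given in this message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """A partition as described in this message (hypothetical model)."""
    name: str
    cores_per_node: int
    nodes: int

    @property
    def total_cpus(self) -> int:
        return self.cores_per_node * self.nodes


@dataclass(frozen=True)
class Limit:
    """A 'cores/nodes' limit for one account kind in one partition."""
    account: str
    partition: Partition
    cores_per_node: int
    nodes: int
    scope: str  # "per user" (all jobs of a user) or "per job"

    @property
    def total_cpus(self) -> int:
        return self.cores_per_node * self.nodes

    def fits(self) -> bool:
        # The limit cannot ask for more than the partition physically has.
        return (self.cores_per_node <= self.partition.cores_per_node
                and self.nodes <= self.partition.nodes)


PART_A = Partition("Part A", cores_per_node=24, nodes=16)
PART_B = Partition("Part B", cores_per_node=4, nodes=11)

LIMITS = [
    Limit("Acc A", PART_A, cores_per_node=24, nodes=6, scope="per user"),
    Limit("Acc A", PART_B, cores_per_node=4, nodes=11, scope="per user"),
    Limit("Acc B", PART_A, cores_per_node=12, nodes=1, scope="per job"),
    Limit("Acc B", PART_B, cores_per_node=4, nodes=3, scope="per job"),
]


def summarize(limits):
    """Return (account, partition, scope, total CPUs, fits?) per limit."""
    return [(l.account, l.partition.name, l.scope, l.total_cpus, l.fits())
            for l in limits]


if __name__ == "__main__":
    for row in summarize(LIMITS):
        print(row)
```

The recomputed totals match the ones quoted above (144, 44, 12 and 12
CPUs). Note that the Acc A limit in Part B, 44 CPUs, is the whole of
Part B.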

Could you please suggest which keywords I should use in the slurm.conf file?
And as for the manual, are there any pages I should check to get this setup
working?
I would like to use the latest version of Slurm, to avoid bugs that have
already been fixed and to take advantage of new features that could help me.

Thank you very much for your kindness,

Emanuele
