Hello SLURM developers.

I am investigating SLURM for potential use at our company. I have spent some
time experimenting with SLURM and have a few questions related to our specific
needs. For each, I would like to know whether the functionality we seek is a)
available "out of the box", b) achievable by writing a custom plugin, or c)
would require changes to the SLURM core. Of course, any advice on how to
proceed would be highly appreciated.

Here are the main questions:

1) How to allocate a "compound resource". Our software has a client/server
architecture. We need to allocate resources for exactly one server task and
many client tasks. The server and the clients generally have different
requirements for the number of CPUs per task. All of the resources must be
allocated under one job ID. For example, I would like to allocate resources
for 17 tasks: exactly 1 task running on a node with feature "SERVER" using
1 CPU per task, and 16 tasks running on nodes with feature "CLIENT" using
8 CPUs per task. The closest I could come up with is something like
salloc -n17 -c8 --constraint="[SERVER*1]"
but that assumes the number of CPUs per task is the same for the server task
and the client tasks, and it could actually place more than one task on the
SERVER node.
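
To make the mismatch concrete, here is a small Python sketch of the arithmetic
behind the example above. It is illustration only and does not call any SLURM
API; the TaskGroup class and its field names are hypothetical names made up
for this sketch.

```python
# Illustrative arithmetic only; nothing here talks to SLURM.
from dataclasses import dataclass


@dataclass
class TaskGroup:
    """Hypothetical description of one part of a compound request."""
    feature: str        # node feature the tasks must run on
    tasks: int          # number of tasks in this group
    cpus_per_task: int  # CPUs needed by each task

    @property
    def cpus(self) -> int:
        return self.tasks * self.cpus_per_task


# The compound request from the example: 1 server task, 16 client tasks.
wanted = [TaskGroup("SERVER", 1, 1), TaskGroup("CLIENT", 16, 8)]

total_tasks = sum(g.tasks for g in wanted)  # 1 + 16
total_cpus = sum(g.cpus for g in wanted)    # 1*1 + 16*8

# What a uniform "-n17 -c8" request asks for instead.
uniform_cpus = total_tasks * 8

print("tasks wanted:        ", total_tasks)
print("CPUs wanted:         ", total_cpus)
print("CPUs with uniform -c8:", uniform_cpus)
print("CPUs over-requested: ", uniform_cpus - total_cpus)
```

So the uniform request asks for 7 CPUs more than needed for the server task,
in addition to not guaranteeing that only one task lands on the SERVER node.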

2) Resizing a running job. FAQ 24 describes how to add resources to or remove
resources from a running job. The mechanism for adding resources seems to meet
our needs and allows resources to be added at task granularity. However,
removing resources seems to work only at the node level. For example, if I
have a node with 32 CPUs running 16 tasks at 2 CPUs per task, I cannot remove
the allocation for just one task (2 CPUs); I have to relinquish the whole
node. This granularity is too coarse for our use case.
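
The difference in granularity for the example above, as a short Python sketch.
This is plain arithmetic, not SLURM code, and the function names are
hypothetical.

```python
# Illustrative arithmetic only; nothing here talks to SLURM.

def shrink_by_task(tasks_on_node: int, cpus_per_task: int, tasks_removed: int):
    """Desired behaviour: give back only the CPUs of the removed tasks."""
    freed_cpus = tasks_removed * cpus_per_task
    return tasks_on_node - tasks_removed, freed_cpus


def shrink_by_node(tasks_on_node: int, cpus_per_task: int):
    """Observed behaviour: the whole node must be relinquished."""
    freed_cpus = tasks_on_node * cpus_per_task
    return 0, freed_cpus


# Node with 32 CPUs running 16 tasks at 2 CPUs per task.
tasks_left_wanted, cpus_freed_wanted = shrink_by_task(16, 2, 1)
tasks_left_actual, cpus_freed_actual = shrink_by_node(16, 2)

print("task-level removal:", tasks_left_wanted, "tasks left,",
      cpus_freed_wanted, "CPUs freed")
print("node-level removal:", tasks_left_actual, "tasks left,",
      cpus_freed_actual, "CPUs freed")
```

In other words, to give back 2 CPUs we would have to give up all 32 CPUs and
the other 15 tasks on that node.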

Regards,
Martin
