Hello SLURM developers,

I am investigating SLURM for potential use at our company. I have spent some time experimenting with it and have a few questions related to our specific needs. For each, I would like to know whether the functionality we seek is a) available "out of the box", b) achievable by writing a custom plugin, or c) would require changes to the SLURM core. Any advice on how to proceed would be highly appreciated.

Here are the main questions:

1) How to allocate a "compound resource". Our software has a client/server architecture. We need to allocate resources for exactly one server task and many client tasks. The server and the clients generally have different requirements for the number of CPUs per task. All the resources must be allocated under one job ID. For example, I would like to allocate resources for 17 tasks: exactly 1 task running on a node with feature "SERVER" using 1 CPU per task, and 16 tasks running on nodes with feature "CLIENT" using 8 CPUs per task. The closest I could come up with is something like salloc -n17 -c8 --constraint="[SERVER*1]", but that assumes the number of CPUs per task is the same for the server task and the client tasks, and it could actually place more than one task on the SERVER node.
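
To make the request in question 1 concrete, here is a minimal Python sketch that writes the compound resource down as data and compares it with the uniform approximation. It is not SLURM code and calls no SLURM API; the names TaskGroup, request, cpus_needed and cpus_uniform are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskGroup:
    """One homogeneous part of the compound request (hypothetical type)."""
    feature: str        # node feature the tasks must land on
    ntasks: int         # number of tasks in this group
    cpus_per_task: int  # CPUs each task in this group needs

    @property
    def cpus(self) -> int:
        return self.ntasks * self.cpus_per_task


# The example from the question: 1 server task plus 16 client tasks.
request = [
    TaskGroup(feature="SERVER", ntasks=1, cpus_per_task=1),
    TaskGroup(feature="CLIENT", ntasks=16, cpus_per_task=8),
]

total_tasks = sum(group.ntasks for group in request)
cpus_needed = sum(group.cpus for group in request)

# A single -n/-c pair applies one CPUs-per-task value to every task,
# so the largest per-task requirement has to be used for all of them.
cpus_uniform = total_tasks * max(group.cpus_per_task for group in request)

print(f"tasks: {total_tasks}")
print(f"CPUs actually needed: {cpus_needed}")
print(f"CPUs requested by the uniform form: {cpus_uniform}")
```

With these numbers the uniform form asks for 7 more CPUs than the compound request needs, and it still does not express the "exactly one task on a SERVER node" requirement.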
2) Resizing a running job. FAQ 24 describes how to add resources to or remove resources from a running job. The mechanism for adding resources seems to meet our needs and allows resources to be added at task granularity. However, removing resources seems to work only at the node level. For example, if I have a node with 32 CPUs running 16 tasks with 2 CPUs per task, I cannot release the allocation for just one task (2 CPUs); I have to relinquish the whole node. This granularity is too coarse for our use case.

Regards,
Martin
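
P.S. To make the granularity problem in question 2 concrete, here is a small Python sketch of the example node. It is a toy model of the two behaviours, not SLURM code; the function shrink and the constants are hypothetical names chosen for this illustration, and the "node" branch only encodes my understanding that resources can be given back a whole node at a time.

```python
NODE_CPUS = 32
CPUS_PER_TASK = 2
TASKS_ON_NODE = NODE_CPUS // CPUS_PER_TASK  # 16 tasks fill the node


def shrink(tasks_to_remove: int, granularity: str) -> tuple[int, int]:
    """Return (tasks_lost, cpus_released) when shrinking on one node.

    granularity == "task": release exactly the CPUs of the removed tasks.
    granularity == "node": the whole node must be given back.
    """
    if not 0 <= tasks_to_remove <= TASKS_ON_NODE:
        raise ValueError("tasks_to_remove is out of range for this node")
    if tasks_to_remove == 0:
        return 0, 0
    if granularity == "task":
        return tasks_to_remove, tasks_to_remove * CPUS_PER_TASK
    if granularity == "node":
        return TASKS_ON_NODE, NODE_CPUS
    raise ValueError(f"unknown granularity: {granularity!r}")


wanted = shrink(1, "task")    # what we would like to be able to do
observed = shrink(1, "node")  # what node-level removal implies

print("task-level removal (tasks lost, CPUs released):", wanted)
print("node-level removal (tasks lost, CPUs released):", observed)
```

In this model, giving up a single 2-CPU task costs the other 15 tasks on the node as well, which is the overhead we would like to avoid.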
