I would like to report a problem that looks like a calculation bug in Slurm version 15.08.7.

If a node is divided between 2 partitions using MaxCPUsPerNode with this configuration

   slurm.conf:
   NodeName=n1 CPUs=20
   PartitionName=cpu Nodes=n1    MaxCPUsPerNode=16
   PartitionName=gpu Nodes=n1    MaxCPUsPerNode=4

I get a strange scheduling situation.
This situation occurs after a fresh restart of the slurmctld daemon.

I start two jobs one by one:

case 1
   systemctl restart slurmctld.service
   sbatch -n 16 -p cpu cpu.sh
   sbatch -n 1  -p gpu gpu.sh

   => Problem now: the second job stays in PENDING state.

The picture changes if I start the jobs this way

case 2
   systemctl restart slurmctld.service
   sbatch -n 1  -p gpu gpu.sh
   scancel <gpu job_id>
   sbatch -n 16 -p cpu cpu.sh
   sbatch -n 1  -p gpu gpu.sh

and both jobs are running fine.


Looking into the code, I found what seems to be a wrong calculation of 'used_cores' in the function _allocate_sc()

plugins/select/cons_res/job_test.c

_allocate_sc(...)
...
        for (c = core_begin; c < core_end; c++) {
                i = (uint16_t) (c - core_begin) / cores_per_socket;

                if (bit_test(core_map, c)) {
                        free_cores[i]++;
                        free_core_count++;
                } else {
                        used_cores[i]++;
                }
                if (part_core_map && bit_test(part_core_map, c))
                        used_cpu_array[i]++;


This part of the code seems to work only if part_core_map exists for the partition. In case 1, no part_core_map has been created yet for the gpu partition, because no gpu job has run since the restart. As a result, all non-free cores (here, the ones allocated to the job in the cpu partition) are counted as used cores in the gpu partition, and this condition matches

   free_cpu_count + used_cpu_count >  job_ptr->part_ptr->max_cpus_per_node

which is definitely wrong.

I changed the code to the following, and everything works fine.

        for (c = core_begin; c < core_end; c++) {
                i = (uint16_t) (c - core_begin) / cores_per_socket;

                if (bit_test(core_map, c)) {
                        free_cores[i]++;
                        free_core_count++;
                } else {
                        if (part_core_map && bit_test(part_core_map, c)) {
                                used_cpu_array[i]++;
                                used_cores[i]++;
                        }
                }
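
To put numbers on case 1, here is a small standalone sketch of the two counting rules. It is not Slurm code: plain bool arrays stand in for the core bitmaps, one CPU per core is assumed, and the helper names (free_count, used_original, used_patched, over_limit) are made up for illustration. It also assumes, as described above, that the used count feeding the limit check comes from used_cores.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model, not Slurm code: core_map[c] is true when core c is
 * free; part_core_map may be NULL when the partition has no map yet. */

/* Number of free cores on the node. */
int free_count(const bool *core_map, size_t n)
{
        int cnt = 0;
        for (size_t c = 0; c < n; c++)
                if (core_map[c])
                        cnt++;
        return cnt;
}

/* Original rule: every non-free core is counted as used, no matter
 * which partition allocated it. */
int used_original(const bool *core_map, size_t n)
{
        int cnt = 0;
        for (size_t c = 0; c < n; c++)
                if (!core_map[c])
                        cnt++;
        return cnt;
}

/* Changed rule: a non-free core is counted only if the partition's own
 * map exists and has that core set. */
int used_patched(const bool *core_map, const bool *part_core_map, size_t n)
{
        int cnt = 0;
        for (size_t c = 0; c < n; c++)
                if (!core_map[c] && part_core_map && part_core_map[c])
                        cnt++;
        return cnt;
}

/* The limit check quoted in this mail. */
bool over_limit(int free_cnt, int used_cnt, int max_cpus_per_node)
{
        return free_cnt + used_cnt > max_cpus_per_node;
}
```

With 20 cores, of which 16 are taken by the cpu job, and no part_core_map for gpu, the original rule gives 4 free plus 16 used, so the check against MaxCPUsPerNode=4 matches. The changed rule gives 4 free plus 0 used, and the check does not match.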


Best,
Marco
