I would like to report what appears to be a calculation bug in
Slurm version 15.08.7.
If a node is divided into two partitions using MaxCPUsPerNode
with the following configuration in slurm.conf:
NodeName=n1 CPUs=20
PartitionName=cpu NodeName=n1 MaxCPUsPerNode=16
PartitionName=gpu NodeName=n1 MaxCPUsPerNode=4
I get a strange scheduling situation. It occurs after a fresh restart
of the slurmctld daemon, when I submit two jobs one after the other:
case 1
systemctl restart slurmctld.service
sbatch -n 16 -p cpu cpu.sh
sbatch -n 1 -p gpu gpu.sh
=> Problem: the second job stays in the PENDING state.
The picture changes if I submit the jobs this way:
case 2
systemctl restart slurmctld.service
sbatch -n 1 -p gpu gpu.sh
scancel <gpu job_id>
sbatch -n 16 -p cpu cpu.sh
sbatch -n 1 -p gpu gpu.sh
and both jobs run fine.
Looking into the code, I found an incorrect calculation of
'used_cores' in the function _allocate_sc() in
plugins/select/cons_res/job_test.c:
_allocate_sc(...)
...
    for (c = core_begin; c < core_end; c++) {
        i = (uint16_t) (c - core_begin) / cores_per_socket;
        if (bit_test(core_map, c)) {
            free_cores[i]++;
            free_core_count++;
        } else {
            used_cores[i]++;
        }
        if (part_core_map && bit_test(part_core_map, c))
            used_cpu_array[i]++;
    }
This part of the code works correctly only if a part_core_map already
exists for the partition. In case 1, however, no part_core_map has been
created yet, because no gpu job has run before. As a result, all
non-free cores of the cpu partition are counted as used cores of the
gpu partition, so the condition

free_cpu_count + used_cpu_count > job_ptr->part_ptr->max_cpus_per_node

matches (here 4 + 16 > 4), which is definitely wrong: the 16 cores held
by the cpu job should not count against the gpu partition's limit.
I changed the code as follows, and everything works fine:
    for (c = core_begin; c < core_end; c++) {
        i = (uint16_t) (c - core_begin) / cores_per_socket;
        if (bit_test(core_map, c)) {
            free_cores[i]++;
            free_core_count++;
        } else {
            if (part_core_map && bit_test(part_core_map, c)) {
                used_cpu_array[i]++;
                used_cores[i]++;
            }
        }
    }
Best,
Marco