Hey @all,

we are running slurm-2.6.6 on our BULL cluster, which has Intel(R) Xeon(R) E5-2680 v3 CPUs with hyperthreading enabled. Furthermore, we are using the following settings:

SelectType=select/cons_res
PriorityType=priority/multifactor

With the Multifactor Priority Plugin we want to use fair-share scheduling and hence need accurate usage data. However, there is a problem with how RawUsage is computed. In priority_multifactor.c, the function _apply_new_usage(...) contains

real_decay = run_decay * (double)job_ptr->total_cpus;
...
assoc->usage->usage_raw += (long double)real_decay;

where run_decay is essentially the number of seconds that the job ran (in the last period), and total_cpus is what causes the problem.

Using a full node (24 physical cores, each with 2 hyperthreads) exclusively gives different values for total_cpus depending on the setting of --threads-per-core. With HT (--threads-per-core=2), 48 CPUs are accounted; otherwise only 24. This behaviour may be understandable from the viewpoint of the Consumable Resource Allocation Plugin, but it makes the accounting wrong. In the above example, the HT case is charged twice the RawUsage, although both cases use a full node exclusively.
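
To make the arithmetic concrete, here is a minimal sketch. It is not the actual Slurm code: the helper names usage_increment() and print_comparison() are made up for illustration, and it only mirrors the product run_decay * total_cpus quoted above, ignoring everything else that _apply_new_usage(...) does.

```c
#include <stdio.h>

/* Hypothetical helper, not part of Slurm: mirrors the product
 * real_decay = run_decay * (double)job_ptr->total_cpus. */
double usage_increment(double run_decay, unsigned int total_cpus)
{
    return run_decay * (double)total_cpus;
}

/* Hypothetical helper: prints the increment for one exclusive node,
 * once counted as 24 CPUs (no HT) and once as 48 CPUs
 * (--threads-per-core=2). */
void print_comparison(double run_decay)
{
    printf("no HT: %.0f\n", usage_increment(run_decay, 24));
    printf("HT:    %.0f\n", usage_increment(run_decay, 48));
}
```

Calling print_comparison(3600.0), i.e. assuming one hour of undecayed runtime, would give 86400 for the non-HT case and 172800 for the HT case, although the same node is blocked for the same time.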

Do you have any idea whether this is intended behaviour or a bug?

Many thanks in advance,
Hendryk
