Hi all,

We have a SLURM cluster with 2 GPUs per node and have run into two issues. I am sending both in a single email because I suspect they are linked.
ISSUE 1:
When CR_CORE_DEFAULT_DIST_BLOCK is set and a user requests --gres=gpu:1,
slurmctld dies with a segmentation fault. With --gres=gpu:2 it
works fine.
I found that the segfault happens in
./src/plugins/select/cons_res/dist_tasks.c:
/*
 * If SelectTypeParameters mentions to use a block distribution for
 * cores by default, use that kind of distribution if no particular
 * cores distribution specified.
 * Note : cyclic cores distribution, which is the default, is treated
 * by the next code block
 */
if ( slurmctld_conf.select_type_param & CR_CORE_DEFAULT_DIST_BLOCK ) {
    switch(job_ptr->details->task_dist) {
    case SLURM_DIST_ARBITRARY:
    case SLURM_DIST_BLOCK:
    case SLURM_DIST_CYCLIC:
    case SLURM_DIST_UNKNOWN:
        _block_sync_core_bitmap(job_ptr, cr_type);   <-------------------
        return SLURM_SUCCESS;
    }
}
Disabling CR_CORE_DEFAULT_DIST_BLOCK fixes the segfaults. Specifically,
slurmctld dies on this line:
    sufficient = sockets_cpu_cnt[s] >= req_cpus;
because s=3154116728 (according to gdb). My guess is that this in turn
happens because ntasks_per_core=65535 in the same function, which looks
like an integer overflow somewhere.
The stack trace is attached.
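To make the guess above concrete, here is a minimal sketch in C. It is not taken from the Slurm source. It assumes, hypothetically, that 65535 is a 16-bit "no limit" marker (65535 is the largest value a uint16_t can hold) and shows how such a value could be normalised before use, and how the indexed comparison could be guarded so that an out-of-range s is rejected instead of dereferenced. The names SENTINEL_UNLIMITED_16, effective_tasks_per_core and socket_has_enough_cpus are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 65535 (UINT16_MAX) marks "not set / no limit" rather than
 * a real per-core task count. This name is hypothetical. */
#define SENTINEL_UNLIMITED_16 ((uint16_t) 0xffff)

/* Replace the sentinel with a usable value before any arithmetic. */
static uint16_t effective_tasks_per_core(uint16_t ntasks_per_core,
                                         uint16_t threads_per_core)
{
    if (ntasks_per_core == SENTINEL_UNLIMITED_16)
        return threads_per_core;
    return ntasks_per_core;
}

/* Bounds-checked form of "sockets_cpu_cnt[s] >= req_cpus": an
 * out-of-range s (such as 3154116728) yields false instead of
 * reading past the end of the array. */
static bool socket_has_enough_cpus(const uint16_t *sockets_cpu_cnt,
                                   uint32_t nsockets, uint32_t s,
                                   uint16_t req_cpus)
{
    if (sockets_cpu_cnt == NULL || s >= nsockets)
        return false;
    return sockets_cpu_cnt[s] >= req_cpus;
}
```

With two sockets of 8 cores each, socket_has_enough_cpus(cnt, 2, 3154116728u, 2) returns false rather than crashing. A guard like this would only hide the symptom; where s actually gets its bad value is still the open question.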
ISSUE 2:
When a user requests 2 GPUs, the job is *always* rejected. For example:
[roman@headnode ~]$ srun -N1 -c2 -n2 --gres=gpu:2 -p k20 hostname
srun: error: Unable to allocate resources: Requested node configuration is not available
[roman@headnode ~]$
When cons_res is enabled:
[root@headnode ~]# grep Select /etc/slurm/slurm.conf
SelectType=select/cons_res
#SelectTypeParameters=CR_Core,CR_CORE_DEFAULT_DIST_BLOCK
SelectTypeParameters=CR_Core
[root@headnode ~]# grep debug -i /etc/slurm/slurm.conf
DebugFlags=Gres,CPU_BIND,Steps
SlurmctldDebug=5
SlurmdDebug=5
then I see these errors in /var/log/slurmctld:
[2013-07-24T01:03:36+08:00] cons_res: _can_job_run_on_node: 0 cpus on node007(0), mem 0/64000
[2013-07-24T01:03:36+08:00] cons_res: _can_job_run_on_node: 0 cpus on node008(0), mem 0/64000
When a user requests 1 GPU per node, it works fine:
[2013-07-24T01:11:59+08:00] cons_res: _can_job_run_on_node: 8 cpus on node007(0), mem 0/1
[2013-07-24T01:11:59+08:00] cons_res: _can_job_run_on_node: 8 cpus on node008(0), mem 0/1
When cons_res is disabled and 2 GPUs are requested, I see:
[2013-07-24T01:17:56+08:00] gres: gpu state for job 3623
[2013-07-24T01:17:56+08:00] gres_cnt:2 node_cnt:0
[2013-07-24T01:17:56+08:00] _pick_best_nodes: job 3623 never runnable
[2013-07-24T01:17:56+08:00] debug: (node_scheduler.c:165) job id: 3623 -- No nodes in bitmap of job_record!
[2013-07-24T01:17:56+08:00] debug: (node_scheduler.c:1785) job id: 3623 -- job_record->gres: (gpu:2), job_record->gres_alloc: ()
[2013-07-24T01:17:56+08:00] debug: (node_scheduler.c:1687) job id: 3623 -- job_record->gres: (gpu:2), job_record->gres_alloc: ()
[2013-07-24T01:17:56+08:00] _slurm_rpc_allocate_resources: Requested node configuration is not available
Nodes are configured this way:
NodeName=node008 Arch=x86_64 CoresPerSocket=8
CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=0.00 Features=(null)
Gres=gpu:2
NodeAddr=node008 NodeHostName=node008
OS=Linux RealMemory=64000 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
BootTime=2013-07-23T00:31:38 SlurmdStartTime=2013-07-24T00:07:13
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
Each /etc/slurm/gres.conf contains these lines:
Name=gpu File=/dev/nvidia0 CPUs=0-7
Name=gpu File=/dev/nvidia1 CPUs=8-15
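The two CPUs= ranges above are disjoint, and the cons_res log shows 8 cpus for a 1-GPU request but 0 cpus for a 2-GPU request. One way those numbers could arise is if only cores bound to every requested GPU are counted (an intersection), rather than cores bound to any of them (a union). This is purely an assumption and has not been confirmed in the Slurm source. The C sketch below just works through that arithmetic; core_mask_t and all function names are hypothetical.

```c
#include <stdint.h>

/* One bit per core; the nodes here have 16 cores (CPUTot=16). */
typedef uint16_t core_mask_t;

/* Build a mask for an inclusive core range such as 0-7 or 8-15. */
static core_mask_t core_range(unsigned first, unsigned last)
{
    core_mask_t mask = 0;
    for (unsigned c = first; c <= last && c < 16; c++)
        mask |= (core_mask_t) (1u << c);
    return mask;
}

/* Count the set bits, i.e. the number of usable cores. */
static unsigned count_cores(core_mask_t mask)
{
    unsigned n = 0;
    for (; mask; mask &= (core_mask_t) (mask - 1))
        n++;
    return n;
}

/* Hypothetical policy A: only cores bound to ALL requested GPUs count. */
static unsigned usable_cores_intersection(const core_mask_t *gpu_cores,
                                          unsigned ngpus)
{
    core_mask_t mask = 0xffff;
    for (unsigned i = 0; i < ngpus; i++)
        mask &= gpu_cores[i];
    return count_cores(mask);
}

/* Hypothetical policy B: cores bound to ANY requested GPU count. */
static unsigned usable_cores_union(const core_mask_t *gpu_cores,
                                   unsigned ngpus)
{
    core_mask_t mask = 0;
    for (unsigned i = 0; i < ngpus; i++)
        mask |= gpu_cores[i];
    return count_cores(mask);
}
```

With the ranges 0-7 and 8-15 from gres.conf, the intersection policy gives 8 cores for one GPU and 0 cores for two, which matches the "8 cpus" and "0 cpus" lines in the log, while the union policy gives 16 cores for two GPUs.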
This issue may also be related to
https://groups.google.com/forum/#!topic/slurm-devel/N5j1AjAbsbw
but disabling CPU binding does not help.
Any ideas about this puzzle are highly appreciated!
Best regards,
Taras
bt_slurm.log
