Hi,

I'm using the Slurm 16.05.8 release.

I have no problem scheduling an interactive session with 4 CPUs on a
regular compute node using the following command:

$ salloc --immediate -p normal --constraint=xeon-e5 --cpus-per-task=4 \
    srun --pty bash -i

However, it doesn't work when I try to schedule an interactive session with
4 CPUs on a GPU compute node using the following command:

$ salloc --immediate -p gpu --gres=gpu:tesla:1 --constraint=xeon-e5 \
    --cpus-per-task=4 srun --pty bash -i
salloc: error: Job submit/allocate failed: Requested node configuration is
not available
salloc: Job allocation 4674359 has been revoked.

There are many idle GPU nodes.  If I remove the --cpus-per-task=4 option,
I can get an interactive session on a GPU node:

$ salloc --immediate -p gpu --gres=gpu:tesla:1 --constraint=xeon-e5 \
    srun --pty bash -i
salloc: Granted job allocation 4674395

Any suggestions on where to look to resolve this issue?

The Slurm control daemon log says:

[2017-02-17T09:01:40.425] _pick_best_nodes: job 4674359 never runnable in
partition gpu
[2017-02-17T09:01:40.425] _slurm_rpc_allocate_resources: Requested node
configuration is not available


/etc/slurm/slurm.conf
. . .
GresTypes=dbjob,gpu
NodeName=xxxx  Feature=opteron Gres=dbjob:1 RealMemory=128813 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1 State=UNKNOWN
NodeName=yyyy  Feature=xeon-e5 Gres=gpu:tesla:4 RealMemory=515759 Sockets=2 CoresPerSocket=14 ThreadsPerCore=1 State=UNKNOWN

PartitionName=normal  Nodes=xxxxx  Default=YES MaxTime=INFINITE Shared=NO State=UP
PartitionName=gpu     Nodes=yyyy   Default=NO  MaxTime=INFINITE Shared=NO State=UP

/etc/slurm/gres.conf
Name=gpu Type=tesla File=/dev/nvidia0 CPUs=0,1
Name=gpu Type=tesla File=/dev/nvidia1 CPUs=0,1
Name=gpu Type=tesla File=/dev/nvidia2 CPUs=2,3
Name=gpu Type=tesla File=/dev/nvidia3 CPUs=2,3
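One thing I wonder about (I may be misreading the docs): my understanding is that the CPUs= field in gres.conf restricts a job using that GPU to the listed cores, and each GPU above is bound to only two cores, which would make gpu:1 plus --cpus-per-task=4 unsatisfiable. If that's the cause, something like the following sketch might work instead (this assumes the yyyy node's 28 cores are numbered 0-13 on socket 0 and 14-27 on socket 1 -- I haven't verified that layout):

Name=gpu Type=tesla File=/dev/nvidia0 CPUs=0-13
Name=gpu Type=tesla File=/dev/nvidia1 CPUs=0-13
Name=gpu Type=tesla File=/dev/nvidia2 CPUs=14-27
Name=gpu Type=tesla File=/dev/nvidia3 CPUs=14-27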

Thanks,
- Chansup
