I forgot to mention in my previous mail that GPU nodes are available.
For example, this is what I get for nodgpu01:

scontrol show node nodgpu01
NodeName=nodgpu01 Arch=x86_64 CoresPerSocket=12
   CPUAlloc=0 CPUErr=0 CPUTot=24 CPULoad=0.10 Features=Haswell,Tesla,k40m
   Gres=gpu:4
   NodeAddr=sirocco03 NodeHostName=sirocco03 Version=14.11
   OS=Linux RealMemory=128704 AllocMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=1726637 Weight=1
   BootTime=2016-02-18T16:48:24 SlurmdStartTime=2016-02-29T10:17:24
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
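
As a rough cross-check, the totals in the output above can be compared
with the request from my first mail (24 tasks, 4 GPUs). The following is
only a minimal Python sketch: the function names (parse_scontrol_node,
gpu_count, request_fits) are hypothetical helpers, not part of Slurm, and
the check looks only at the reported totals. It does not model Slurm's
actual node selection (socket allocation, the CPU bindings in gres.conf,
memory), so a True result only means the totals alone do not rule the
request out.

```python
import re

def parse_scontrol_node(text):
    """Collect the KEY=VALUE tokens of `scontrol show node` output in a dict."""
    return dict(re.findall(r"(\w+)=(\S+)", text))

def gpu_count(gres):
    """Count GPUs in a Gres string such as 'gpu:4' (other GRES names are ignored)."""
    total = 0
    for entry in gres.split(","):
        parts = entry.split(":")
        if parts[0] == "gpu":
            # A bare 'gpu' entry without a count is treated as one device.
            total += int(parts[-1]) if len(parts) > 1 else 1
    return total

def request_fits(node, ntasks, gpus):
    """Naive check on totals only; NOT Slurm's real selection logic."""
    free_cpus = int(node["CPUTot"]) - int(node["CPUAlloc"])
    return (node["State"] == "IDLE"
            and free_cpus >= ntasks
            and gpu_count(node["Gres"]) >= gpus)

# Fields copied from the nodgpu01 output above.
sample = """NodeName=nodgpu01 Arch=x86_64 CoresPerSocket=12
CPUAlloc=0 CPUErr=0 CPUTot=24 CPULoad=0.10 Features=Haswell,Tesla,k40m
Gres=gpu:4
State=IDLE ThreadsPerCore=1 TmpDisk=1726637 Weight=1"""

node = parse_scontrol_node(sample)
print(request_fits(node, ntasks=24, gpus=4))
```

By these totals alone nodgpu01 looks able to take the job, which is why
the "Requested node configuration is not available" error surprises me.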


Regards

2016-03-03 11:37 GMT+01:00 Redouane Bouchouirbat <[email protected]>:

> Dear all,
>
> I configured the GPU nodes in slurm.conf like this:
> ...
> NodeName=nodgpu[01-05]  Procs=24 CoresPerSocket=12 RealMemory=128000 Sockets=2 ThreadsPerCore=1 TmpDisk=703488 Gres=gpu:4 Feature=Haswell,Tesla,k40m
> ...
>
> GresTypes=Haswell,Tesla,Westmere,gpu,k40m
>
> and
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Socket_Memory
> ...
>
> The gres.conf file on the five nodes:
>
> Name=gpu File=/dev/nvidia0  CPUs=0,2,4,6,8,10,12,14,16,18,20,22
> Name=gpu File=/dev/nvidia1  CPUs=1,3,5,7,9,11,13,15,17,19,21,23
> Name=gpu File=/dev/nvidia2  CPUs=0,2,4,6,8,10,12,14,16,18,20,22
> Name=gpu File=/dev/nvidia3  CPUs=1,3,5,7,9,11,13,15,17,19,21,23
> Name=mic Count=0
>
> The cgroup.conf on each node:
>
> CgroupMountpoint="/sys/fs/cgroup"
> CgroupAutomount=yes
> CgroupReleaseAgentDir="/etc/slurm/cgroup"
> ConstrainRAMSpace=yes
> AllowedRAMSpace=100
> ConstrainCores=yes
> TaskAffinity=no
>
> The Slurm version in use is 14.11.11.
>
> When I request one node with all of its GPUs, Slurm reports that the
> node is not available:
>
> salloc -p testgpu -N1 --ntasks-per-node 24  --gres=gpu:4
>
> salloc: Job allocation 101944 has been revoked.
> salloc: error: Job submit/allocate failed: Requested node configuration is
> not available
>
> The node configuration as read by Slurm:
>
> scontrol show node nodgpu
> NodeName=nodgpu Arch=x86_64 CoresPerSocket=12
>    CPUAlloc=24 CPUErr=0 CPUTot=24 CPULoad=0.94 Features=Haswell,Tesla,k40m
>    Gres=gpu:4
>    NodeAddr=nodgpu NodeHostName=nodgpu Version=14.11
>    OS=Linux RealMemory=128704 AllocMem=64416 Sockets=2 Boards=1
>    State=ALLOCATED ThreadsPerCore=1 TmpDisk=1726637 Weight=1
>    BootTime=2016-02-18T16:48:22 SlurmdStartTime=2016-02-18T17:14:56
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> I don't know what the problem is.
> Any ideas?
>
> Regards
