Hi folks,

I have a question and am wondering if someone can shed some light on the subject or point me in the right direction. I have a GPU cluster: 8 nodes with 4 GPUs each and 16-20 cores per node. The user would like to schedule, in one sbatch, 32 independent GPU tasks with 1 GPU and 2 cores per task. This is what we are doing:

#!/bin/bash
#SBATCH -D /path
#SBATCH -J amber_single_gpu
#SBATCH --partition=defq
#SBATCH --get-user-env
#SBATCH --nodes=8
#SBATCH --cpus-per-task=2
#SBATCH --tasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --time=120:00:00

source /etc/profile.d/modules.sh
export CUDA_HOME=/cm/shared/apps/cuda80/
export LD_LIBRARY_PATH=/cm/shared/apps/cuda80/lib64/

# ( 32 of these.. )
srun --gres=gpu:1 -n1 -N1 pmemd.cuda -O

I have been fiddling with various permutations and have not been able to get this to work. When I submit this, it says no node has this configuration (even though every node's Gres is gpu:4).

sinfo -Nle:

NODELIST  NODES  PARTITION  STATE  CPUS  S:C:T   MEMORY  TMP_DISK  WEIGHT  AVAIL_FE  REASON
node001   1      defq*      idle   16    2:8:2   257870  2038      1       titanx    none
node002   1      defq*      idle   16    2:8:2   257870  2038      1       titanx    none
node003   1      defq*      idle   16    2:8:2   257870  2038      1       titanx    none
node004   1      defq*      idle   16    2:8:2   257870  2038      1       titanx    none
node005   1      defq*      idle   20    2:10:2  257864  2038      1       gtx1080   none
node006   1      defq*      idle   20    2:10:2  257863  2038      1       gtx1080   none
node007   1      defq*      idle   20    2:10:2  257864  2038      1       gtx1080   none
node008   1      defq*      idle   20    2:10:2  257864  2038      1       gtx1080   none

slurm.conf (the important stuff):

SelectType=select/cons_res
SelectTypeParameters=CR_Core
#NodeName=node[001-008] Gres=gpu:4
NodeName=node[001-004] CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 Gres=gpu:4 Feature=titanx
NodeName=node[005-008] CPUs=20 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 Gres=gpu:4 Feature=gtx1080

# Partitions
PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO PriorityJobFactor=1 PriorityTier=1 OverSubscribe=NO State=UP Nodes=node[001-008]

# Generic resources types
GresTypes=gpu,mic

Thanks,
Barrett
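
P.S. In case it helps to see the whole shape of it, here is a minimal sketch of what I am trying to express. The pmemd.cuda input/output arguments are elided, the loop just stands in for the 32 literal srun lines, and I am assuming the steps have to be backgrounded with a final wait so they run concurrently rather than one after another:

#!/bin/bash
#SBATCH --partition=defq
#SBATCH --nodes=8
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=2
#SBATCH --gres=gpu:4
#SBATCH --time=120:00:00

# 32 independent steps, each wanting 1 GPU and 2 cores
# (hypothetical loop standing in for the 32 literal srun lines);
# each step is backgrounded so they run concurrently, then we
# wait for all of them before the job exits.
for i in {1..32}; do
    srun --gres=gpu:1 -n1 -N1 pmemd.cuda -O &
done
wait

The intent is 4 concurrent steps per node, each pinned to one GPU and two cores.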
