Hi folks,

 

I have a question, and I am wondering if anyone can shed some light on the 
subject or point me in the right direction.

 

I have a GPU cluster: 8 nodes with 4 GPUs each and 16-20 cores per node.

 

 

The user would like to schedule, in a single sbatch submission, 32 independent 
GPU tasks, each with 1 GPU and 2 cores.

 

 

This is what we are doing:

 

#!/bin/bash
#SBATCH -D /path
#SBATCH -J amber_single_gpu
#SBATCH --partition=defq
#SBATCH --get-user-env
#SBATCH --nodes=8
#SBATCH --cpus-per-task=2
#SBATCH --tasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --time=120:00:00

source /etc/profile.d/modules.sh
export CUDA_HOME=/cm/shared/apps/cuda80/
export LD_LIBRARY_PATH=/cm/shared/apps/cuda80/lib64/

(32 of these...)

srun --gres=gpu:1 -n1 -N1 pmemd.cuda -O
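
A minimal sketch of what "32 of these" amounts to when written as a loop. It assumes each srun step is put in the background with & and followed by a single wait, which the script above does not show. The function name launch_steps and its "runner" argument are hypothetical, added only so the loop can be dry-run with echo outside of an allocation.

```shell
#!/bin/bash
# Hypothetical helper: start NSTEPS copies of the single-GPU srun line.
# runner="echo" prints the commands (dry run); runner="" actually runs them.
launch_steps() {
    local nsteps=$1 runner=$2
    local i
    for ((i = 1; i <= nsteps; i++)); do
        # Same srun line as in the script, backgrounded (an assumption)
        # so that the steps can run side by side.
        $runner srun --gres=gpu:1 -n1 -N1 pmemd.cuda -O &
    done
    wait    # block until every background step has finished
}

# Dry run with 4 steps: prints the srun command line four times.
launch_steps 4 echo
```

Inside the real batch script the call would be launch_steps 32 "". Whether the 32 steps then spread across the 32 GPUs depends on the allocation being granted in the first place, which is the problem described here.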

 

 

 

I have been fiddling with various permutations and have not been able to get 
this to work.

 

When I submit this, Slurm says that no node has this configuration (the Gres 
setting has 4 GPUs per node).

 

sinfo -Nle

 

NODELIST  NODES  PARTITION  STATE  CPUS  S:C:T   MEMORY  TMP_DISK  WEIGHT  AVAIL_FE  REASON
node001   1      defq*      idle   16    2:8:2   257870  2038      1       titanx    none
node002   1      defq*      idle   16    2:8:2   257870  2038      1       titanx    none
node003   1      defq*      idle   16    2:8:2   257870  2038      1       titanx    none
node004   1      defq*      idle   16    2:8:2   257870  2038      1       titanx    none
node005   1      defq*      idle   20    2:10:2  257864  2038      1       gtx1080   none
node006   1      defq*      idle   20    2:10:2  257863  2038      1       gtx1080   none
node007   1      defq*      idle   20    2:10:2  257864  2038      1       gtx1080   none
node008   1      defq*      idle   20    2:10:2  257864  2038      1       gtx1080   none

 

 

slurm.conf (the important parts):

 

SelectType=select/cons_res
SelectTypeParameters=CR_Core

#NodeName=node[001-008] Gres=gpu:4
NodeName=node[001-004] CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 Gres=gpu:4 Feature=titanx
NodeName=node[005-008] CPUs=20 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 Gres=gpu:4 Feature=gtx1080

 

 

# Partitions
PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO PriorityJobFactor=1 PriorityTier=1 OverSubscribe=NO State=UP Nodes=node[001-008]

# Generic resources types
GresTypes=gpu,mic

 

Thanks,

 

Barrett
