Hello,

I need some guidance on how to set up a priority structure that lets
me do the following.

I have 8 graphics cards, 64 cores and 256 GB of memory in a single computer.

I have to manage GPU usage so that every user has basically the same
priority when submitting jobs.

There is no limit on the number of GPUs a job can request, but the
request should be taken into consideration for the priority of jobs in
the queue: the more GPU time a user has used in the last 7 days, the
lower the priority of that user's jobs in the queue.
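
To make the goal concrete, here is a minimal sketch of the behaviour I
have in mind. It is not Slurm's actual fair-share computation: the
function names and the sample users and numbers are made up for
illustration, and I am assuming that past GPU usage decays with a 7-day
half-life (mirroring PriorityDecayHalfLife=7-0 in my slurm.conf below)
rather than being cut off sharply after 7 days.

```python
HALF_LIFE_DAYS = 7.0  # assumption: mirrors PriorityDecayHalfLife=7-0


def decayed_gpu_usage(jobs, half_life_days=HALF_LIFE_DAYS):
    """Sum GPU-hours, halving each job's weight every half_life_days.

    jobs is a list of (gpu_hours, age_days) tuples (hypothetical format).
    """
    return sum(
        gpu_hours * 0.5 ** (age_days / half_life_days)
        for gpu_hours, age_days in jobs
    )


def rank_users(history):
    """Return users ordered from highest to lowest desired priority,
    i.e. from least to most decayed GPU usage."""
    usage = {user: decayed_gpu_usage(jobs) for user, jobs in history.items()}
    return sorted(usage, key=usage.get)


# Made-up sample data: (gpu_hours, age_days) per finished job.
history = {
    "alice": [(80.0, 1.0)],   # heavy use yesterday
    "bob": [(80.0, 14.0)],    # same use, two half-lives ago
    "carol": [],              # no recent GPU use
}

print(rank_users(history))
```

With this sample data carol's jobs should run first, then bob's, then
alice's, even though alice and bob consumed the same raw GPU time.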

At the moment I have Slurm working with:

slurm.conf
ControlMachine=etse-75-51
AuthType=auth/munge
CryptoType=crypto/munge
MpiDefault=none
ReturnToService=1
SlurmctldPidFile=/var/slurm/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/slurm/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/slurm/SlurmdSpoolDir
SlurmUser=slurm
StateSaveLocation=/var/slurm/StateSaveLocation
SwitchType=switch/none
TaskPlugin=task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityFavorSmall=NO
PriorityMaxAge=7-0
PriorityWeightAge=10
PriorityWeightFairshare=1000
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS=0
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu
PriorityWeightTRES=CPU=1,MEM=1,gres/gpu=100
PriorityFlags=MAX_TRES
ClusterName=dcc
DebugFlags=Gres
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=1
SlurmctldLogFile=/var/slurm/log/ctld.log
SlurmdDebug=1
SlurmdLogFile=/var/slurm/log/d.log
GresTypes=gpu
NodeName=etse-75-51 CPUs=64 RealMemory=257847 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 Gres=gpu:8
PartitionName=dcc Nodes=etse-75-51 Default=YES TRESBillingWeights="CPU=1.0,Mem=1.0G,gres/gpu=100.0" MaxTime=INFINITE State=UP

gres.conf
NodeName=etse-75-51 Name=gpu Count=8 File=/dev/nvidia[0-7] CPUs=0-63

Thanks in advance

Jorge.
