Hello, I need some guidance on how to set up a priority structure that lets me do the following.
I have 8 graphics cards, 64 cores and 256 GB of memory in a single computer. I need to manage GPU usage so that every user has basically the same priority when submitting jobs. There is no limit on the number of GPUs a job can request, but that amount should be taken into account when prioritizing jobs in the queue: the more GPU time a user has used in the last 7 days, the lower the priority of that user's queued jobs.

At the moment I have Slurm working with:

slurm.conf:
ControlMachine=etse-75-51
AuthType=auth/munge
CryptoType=crypto/munge
MpiDefault=none
ReturnToService=1
SlurmctldPidFile=/var/slurm/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/slurm/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/slurm/SlurmdSpoolDir
SlurmUser=slurm
StateSaveLocation=/var/slurm/StateSaveLocation
SwitchType=switch/none
TaskPlugin=task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityFavorSmall=NO
PriorityMaxAge=7-0
PriorityWeightAge=10
PriorityWeightFairshare=1000
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS= 0
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu
PriorityWeightTRES=CPU=1,MEM=1,gres/gpu=100
PriorityFlags=MAX_TRES
ClusterName=dcc
DebugFlags=Gres
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=1
SlurmctldLogFile=/var/slurm/log/ctld.log
SlurmdDebug=1
SlurmdLogFile=/var/slurm/log/d.log
GresTypes=gpu
NodeName=etse-75-51 CPUs=64 RealMemory=257847 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 Gres=gpu:8
PartitionName=dcc Nodes=etse-75-51 Default=YES TRESBillingWeights="CPU=1.0,Mem=1.0G,gres/gpu=100.0" MaxTime=INFINITE State=UP

gres.conf:
NodeName=etse-75-51 Name=gpu Count=8 File=/dev/nvidia[0-7] CPUs=0-63

Thanks in advance,
Jorge.
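To make the effect of these settings more concrete, here is a rough Python sketch (not Slurm code) of how TRESBillingWeights combined with PriorityFlags=MAX_TRES could turn each finished job into a billing rate, and how PriorityDecayHalfLife=7-0 then ages that usage. It assumes, based on my reading of the Slurm documentation, that fairshare usage is accumulated in billing-TRES-seconds, that MAX_TRES makes a job's billing the largest weighted TRES on the node instead of the sum, and that the half-life is an exponential decay rather than a hard 7-day window (usage from 7 days ago still counts at half weight). The Job record, its fields and the example users are hypothetical, and aging each job from its end time is a simplification of Slurm's periodic decay.

```python
"""Hypothetical sketch: how the posted TRESBillingWeights, PriorityFlags=MAX_TRES
and PriorityDecayHalfLife=7-0 roughly turn past jobs into decayed fairshare usage.
This is NOT Slurm's implementation; it only mirrors the documented formulas."""

from dataclasses import dataclass

# Weights taken from TRESBillingWeights="CPU=1.0,Mem=1.0G,gres/gpu=100.0".
# Assumption: "Mem=1.0G" is read as a weight of 1.0 per GB of requested memory.
BILLING_WEIGHTS = {"cpu": 1.0, "mem_gb": 1.0, "gpu": 100.0}

HALF_LIFE_DAYS = 7.0  # PriorityDecayHalfLife=7-0


@dataclass
class Job:
    # Hypothetical record of a finished single-node job; fields are illustrative.
    user: str
    cpus: int
    mem_gb: float
    gpus: int
    runtime_hours: float
    ended_days_ago: float


def billing(job: Job) -> float:
    """Billing rate with PriorityFlags=MAX_TRES: the largest weighted TRES on
    the node (there are no global TRES such as licenses in this setup)."""
    weighted = [
        job.cpus * BILLING_WEIGHTS["cpu"],
        job.mem_gb * BILLING_WEIGHTS["mem_gb"],
        job.gpus * BILLING_WEIGHTS["gpu"],
    ]
    return max(weighted)


def decayed_usage(job: Job) -> float:
    """Billing-seconds charged for the job, scaled by 2^(-age / half-life).
    Simplification: the whole job is aged from its end time."""
    raw = billing(job) * job.runtime_hours * 3600
    return raw * 0.5 ** (job.ended_days_ago / HALF_LIFE_DAYS)


def usage_by_user(jobs: list[Job]) -> dict[str, float]:
    """Sum the decayed usage of every job per user."""
    totals: dict[str, float] = {}
    for job in jobs:
        totals[job.user] = totals.get(job.user, 0.0) + decayed_usage(job)
    return totals


if __name__ == "__main__":
    jobs = [
        Job("alice", cpus=8, mem_gb=32, gpus=4, runtime_hours=10, ended_days_ago=1),
        Job("bob", cpus=16, mem_gb=64, gpus=1, runtime_hours=10, ended_days_ago=1),
        Job("carol", cpus=64, mem_gb=200, gpus=0, runtime_hours=10, ended_days_ago=7),
    ]
    # Lowest decayed usage first: under fairshare, these users would rank highest.
    for user, usage in sorted(usage_by_user(jobs).items(), key=lambda kv: kv[1]):
        print(f"{user:6s} {usage:14.0f}")
```

With these example jobs, alice's 4-GPU job is billed at 400 per second of runtime versus 100 for bob's 1-GPU job. Carol's CPU-only job is billed at 200 because, under MAX_TRES, its 200 GB memory request outweighs its 64 CPUs; since it ended 7 days ago, it counts at half weight.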
