I tried modifying a lot of the values, but the only thing that re-enabled the fairshare priority was adding the following parameter:
PriorityFlags=FAIR_TREE

Nirmal

On Mon, Apr 4, 2016 at 3:47 AM, Loris Bennett <[email protected]> wrote:

> Hi Nirmal,
>
> Nirmal Seenu <[email protected]> writes:
>
> > Fair share priority stopped working
> >
> > Hi,
> >
> > I just noticed that the fair share priority stopped working in the last few days
> > and would appreciate any help in debugging this problem. I am running Slurm
> > version 14.11.11 on Centos 7.2.
> >
> > I am not sure when it stopped working but the only thing that I changed was
> > PriorityDecayHalfLife=00:10:00 and PriorityUsageResetPeriod=WEEKLY. The
> > following are the current values that I have set -- the initial value when fair
> > share was working fine:
> >
> > PriorityType=priority/multifactor
> > PriorityDecayHalfLife=00:01:00
>
> I would think that this value for PriorityDecayHalfLife is much too
> short. The CPU-time usage will decay very rapidly, so the contribution
> to the priority will be similar both for heavy users and for those who
> don't consume much CPU-time. I would guess you want a value more like
> a single-digit number of days.
>
> Cheers,
>
> Loris
>
> > PriorityUsageResetPeriod=NONE
> > PriorityWeightFairshare=10000
> > PriorityWeightAge=100
> > PriorityWeightPartition=10000
> > PriorityWeightJobSize=10000
> > PriorityMaxAge=7-0
> >
> > Everything seems to be fine on the database side:
> >
> > [root@tcs-bcm-1 ~]# sacctmgr list assoc tree format=cluster,account,user,fairshare
> >    Cluster              Account       User     Share
> > ---------- -------------------- ---------- ---------
> > slurm_clu+ root                                    1
> > slurm_clu+ root                       root         1
> > slurm_clu+ dev                                    50
> > slurm_clu+ dev                           c         1
> > slurm_clu+ r                                       1
> > slurm_clu+ r                            a2         1
> > slurm_clu+ r                            a1         1
> > slurm_clu+ r                             b         1
> > slurm_clu+ r                             d         1
> > slurm_clu+ r                             e         1
> > slurm_clu+ r                            j2         1
> > slurm_clu+ r                            j1         1
> > slurm_clu+ r                            m4         1
> > slurm_clu+ r                            m3         1
> > slurm_clu+ r                            m2         1
> > slurm_clu+ r                            m1         1
> > slurm_clu+ r                             r         1
> > slurm_clu+ r                             s         1
> > slurm_clu+ r                             t         1
> >
> > [root@tcs-bcm-1 ~]# sprio -l | head
> >   JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
> > 1378456       j1        385         10          0        276        100          0      0
> > 1378457       j1        385         10          0        276        100          0      0
> > 1378458       j1        385         10          0        276        100          0      0
> >
> > Relevant log entry when I restarted both slurmdbd and slurm:
> >
> > /var/log/slurmctld:
> > [2016-03-22T17:47:13.533] Running as primary controller
> > [2016-03-22T17:47:13.533] Registering slurmctld at port 6817 with slurmdbd.
> > [2016-03-22T17:47:17.817] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=0
> >
> > /var/log/slurmdbd:
> > [2016-03-22T17:46:53.733] Accounting storage MYSQL plugin loaded
> > [2016-03-22T17:46:53.735] error: chdir(/var/log): Permission denied
> > [2016-03-22T17:46:53.735] chdir to /var/tmp
> > [2016-03-22T17:46:53.744] slurmdbd version 14.11.11 started
> > [2016-03-22T17:46:57.010] DBD_JOB_START: cluster not registered
> > [2016-03-22T17:47:01.910] DBD_STEP_START: cluster not registered
> >
> > Thanks in advance for your help!
> > Nirmal
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin         Email [email protected]
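
P.S. For anyone finding this thread later, here is a rough sketch of Loris's point about the half-life. It assumes past usage is simply halved once per PriorityDecayHalfLife, which is a simplification and not Slurm's actual code. The function name and the usage figures are made up for illustration.

```python
def decayed_usage(cpu_seconds, elapsed_seconds, half_life_seconds):
    """Weight that past usage still carries after exponential half-life decay.

    Simplified model (an assumption, not Slurm's implementation):
    usage is halved once per half-life.
    """
    return cpu_seconds * 0.5 ** (elapsed_seconds / half_life_seconds)


HOUR = 3600
DAY = 24 * HOUR

# Hypothetical accumulated usage for two users, in CPU-seconds.
heavy_user = 1_000_000
light_user = 1_000

# Compare how much of each user's usage is still "remembered" one hour later.
for label, half_life in [("00:01:00", 60), ("7-0", 7 * DAY)]:
    heavy = decayed_usage(heavy_user, HOUR, half_life)
    light = decayed_usage(light_user, HOUR, half_life)
    print(f"half-life {label}: heavy={heavy:.3g} light={light:.3g}")
```

Under this model, a one-minute half-life leaves both users with essentially zero remembered usage after an hour, so fairshare has nothing to distinguish them by, whereas a half-life of several days keeps the heavy user's usage clearly above the light user's.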
