Hi,

I just noticed that fair-share priority stopped working in the last few
days, and I would appreciate any help debugging the problem. I am running
Slurm version 14.11.11 on CentOS 7.2.

I am not sure exactly when it stopped working, but the only things I changed
were PriorityDecayHalfLife=00:10:00 and PriorityUsageResetPeriod=WEEKLY. The
following are the values currently set, which are the initial values from
when fair share was working fine:

PriorityType=priority/multifactor
PriorityDecayHalfLife=00:01:00
PriorityUsageResetPeriod=NONE
PriorityWeightFairshare=10000
PriorityWeightAge=100
PriorityWeightPartition=10000
PriorityWeightJobSize=10000
PriorityMaxAge=7-0
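
For context on what the two half-life values mean in practice, here is a
minimal sketch, assuming Slurm halves accumulated usage once per
PriorityDecayHalfLife and that 00:01:00 and 00:10:00 parse as one and ten
minutes. The function name decayed_usage and the starting usage of 1000 are
hypothetical, chosen only for illustration.

```python
def decayed_usage(raw_usage: float, elapsed_s: float, half_life_s: float) -> float:
    """Return the usage left after elapsed_s seconds of exponential decay."""
    # Assumption: usage is multiplied by 0.5 once per half-life interval.
    return raw_usage * 0.5 ** (elapsed_s / half_life_s)


# Compare the two settings mentioned in this message after ten minutes.
for label, half_life_s in (("00:01:00", 60), ("00:10:00", 600)):
    remaining = decayed_usage(1000.0, 600, half_life_s)
    print(f"PriorityDecayHalfLife={label}: {remaining} of 1000 left after 10 min")
```

With a one-minute half-life, recorded usage shrinks to roughly a thousandth
of its value within ten minutes, so historical usage has very little time to
influence the fair-share factor.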


Everything seems to be fine on the database side:

[root@tcs-bcm-1 ~]# sacctmgr list assoc tree format=cluster,account,user,fairshare
   Cluster              Account       User     Share
---------- -------------------- ---------- ---------
slurm_clu+ root                                    1
slurm_clu+  root                      root         1
slurm_clu+  dev                                   50
slurm_clu+   dev                         c         1
slurm_clu+  r                                      1
slurm_clu+   r                          a2         1
slurm_clu+   r                          a1         1
slurm_clu+   r                           b         1
slurm_clu+   r                           d         1
slurm_clu+   r                           e         1
slurm_clu+   r                          j2         1
slurm_clu+   r                          j1         1
slurm_clu+   r                          m4         1
slurm_clu+   r                          m3         1
slurm_clu+   r                          m2         1
slurm_clu+   r                          m1         1
slurm_clu+   r                           r         1
slurm_clu+   r                           s         1
slurm_clu+   r                           t         1


[root@tcs-bcm-1 ~]# sprio -l | head
          JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
        1378456       j1        385         10          0        276        100          0      0
        1378457       j1        385         10          0        276        100          0      0
        1378458       j1        385         10          0        276        100          0      0
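
As a rough way to read the sprio output above, the sketch below divides each
component by its configured weight to recover the implied factor. It assumes
that sprio prints weighted components and that each factor lies between 0
and 1. The dictionary names are illustrative and not part of Slurm.

```python
# Weights copied from the slurm.conf values listed in this message.
weights = {"AGE": 100, "FAIRSHARE": 10000, "JOBSIZE": 10000, "PARTITION": 10000}
# Weighted components copied from the sprio row for job 1378456.
components = {"AGE": 10, "FAIRSHARE": 0, "JOBSIZE": 276, "PARTITION": 100}

# Implied factor = weighted component / weight
# (assumption: sprio shows weight * factor for each column).
implied = {name: components[name] / weights[name] for name in weights}

for name, factor in implied.items():
    print(f"{name:<10} {factor:.4f}")
```

Under that assumption the fair-share factor itself comes out as 0, so the
weight of 10000 contributes nothing to the job priority no matter how large
it is set.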

Relevant log entries from when I restarted both slurmdbd and slurm:
/var/log/slurmctld:
[2016-03-22T17:47:13.533] Running as primary controller
[2016-03-22T17:47:13.533] Registering slurmctld at port 6817 with slurmdbd.
[2016-03-22T17:47:17.817] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=0


/var/log/slurmdbd:
[2016-03-22T17:46:53.733] Accounting storage MYSQL plugin loaded
[2016-03-22T17:46:53.735] error: chdir(/var/log): Permission denied
[2016-03-22T17:46:53.735] chdir to /var/tmp
[2016-03-22T17:46:53.744] slurmdbd version 14.11.11 started
[2016-03-22T17:46:57.010] DBD_JOB_START: cluster not registered
[2016-03-22T17:47:01.910] DBD_STEP_START: cluster not registered

Thanks in advance for your help!
Nirmal
