I tried modifying many of the values, but the only thing that re-enabled
the fairshare priority was adding the following parameter:

PriorityFlags=FAIR_TREE

Nirmal

On Mon, Apr 4, 2016 at 3:47 AM, Loris Bennett <[email protected]>
wrote:

>
> Hi Nirmal,
>
> Nirmal Seenu <[email protected]> writes:
>
> > Fair share priority stopped working
> >
> > Hi,
> >
> > I just noticed that the fair share priority stopped working in the last
> > few days and would appreciate any help in debugging this problem. I am
> > running Slurm version 14.11.11 on CentOS 7.2.
> >
> > I am not sure when it stopped working, but the only things that I changed
> > were PriorityDecayHalfLife=00:10:00 and PriorityUsageResetPeriod=WEEKLY.
> > The following are the current values that I have set -- the initial
> > values from when fair share was working fine:
> >
> > PriorityType=priority/multifactor
> > PriorityDecayHalfLife=00:01:00
>
> I would think that this value for PriorityDecayHalfLife is much too
> short.  The CPU-time usage will decay very rapidly, so the contribution
> to the priority will be similar both for heavy users and for those who
> don't consume much CPU-time.  I would guess you want a value more like
> a single-digit number of days.
>
> Cheers,
>
> Loris
>
> > PriorityUsageResetPeriod=NONE
> > PriorityWeightFairshare=10000
> > PriorityWeightAge=100
> > PriorityWeightPartition=10000
> > PriorityWeightJobSize=10000
> > PriorityMaxAge=7-0
> >
> > Everything seems to be fine on the database side:
> >
> > [root@tcs-bcm-1 ~]# sacctmgr list assoc tree format=cluster,account,user,fairshare
> > Cluster    Account              User           Share
> > ---------- -------------------- ---------- ---------
> > slurm_clu+ root                                    1
> > slurm_clu+ root                 root               1
> > slurm_clu+ dev                                    50
> > slurm_clu+ dev                  c                  1
> > slurm_clu+ r                                       1
> > slurm_clu+ r                    a2                 1
> > slurm_clu+ r                    a1                 1
> > slurm_clu+ r                    b                  1
> > slurm_clu+ r                    d                  1
> > slurm_clu+ r                    e                  1
> > slurm_clu+ r                    j2                 1
> > slurm_clu+ r                    j1                 1
> > slurm_clu+ r                    m4                 1
> > slurm_clu+ r                    m3                 1
> > slurm_clu+ r                    m2                 1
> > slurm_clu+ r                    m1                 1
> > slurm_clu+ r                    r                  1
> > slurm_clu+ r                    s                  1
> > slurm_clu+ r                    t                  1
> >
> > [root@tcs-bcm-1 ~]# sprio -l | head
> >   JOBID  USER  PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE
> > 1378456    j1       385   10          0      276        100    0     0
> > 1378457    j1       385   10          0      276        100    0     0
> > 1378458    j1       385   10          0      276        100    0     0
> >
> > Relevant log entry when I restarted both slurmdbd and slurm:
> > /var/log/slurmctld:
> > [2016-03-22T17:47:13.533] Running as primary controller
> > [2016-03-22T17:47:13.533] Registering slurmctld at port 6817 with slurmdbd.
> > [2016-03-22T17:47:17.817] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=0
> >
> > /var/log/slurmdbd:
> > [2016-03-22T17:46:53.733] Accounting storage MYSQL plugin loaded
> > [2016-03-22T17:46:53.735] error: chdir(/var/log): Permission denied
> > [2016-03-22T17:46:53.735] chdir to /var/tmp
> > [2016-03-22T17:46:53.744] slurmdbd version 14.11.11 started
> > [2016-03-22T17:46:57.010] DBD_JOB_START: cluster not registered
> > [2016-03-22T17:47:01.910] DBD_STEP_START: cluster not registered
> >
> > Thanks in advance for your help!
> > Nirmal
> >
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin         Email [email protected]
>
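
To put rough numbers on the PriorityDecayHalfLife point quoted above, here is
a minimal Python sketch. It assumes plain continuous exponential decay, which
is a simplification: Slurm applies the decay in discrete steps at each
priority recalculation. The helper name decayed_usage and the 1000 CPU-hour
figure are made up for illustration and are not part of Slurm.

```python
def decayed_usage(usage, elapsed_seconds, half_life_seconds):
    """Usage left after simple exponential decay (illustrative model only)."""
    return usage * 0.5 ** (elapsed_seconds / half_life_seconds)


HOUR = 3600
DAY = 24 * HOUR

# Hypothetical heavy user: 1000 CPU-hours recorded, looked at one hour later.
short = decayed_usage(1000.0, HOUR, 60)        # half-life 00:01:00
long_ = decayed_usage(1000.0, HOUR, 7 * DAY)   # half-life of 7 days

print(f"1-minute half-life: {short:.3e} CPU-hours remembered")
print(f"7-day half-life:    {long_:.1f} CPU-hours remembered")
```

With a one-minute half-life the recorded usage is halved sixty times within
the hour, so under this model a heavy user becomes practically
indistinguishable from an idle one. With a half-life of several days, most of
the usage is still counted an hour later.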
