On Tue, 07 Feb 2012 10:22:49 -0800
Danny Auble <d...@schedmd.com> wrote:

> Yuri, turn on the Priority DebugFlag in the slurm.conf and see what is 
> happening.  Perhaps that would shed some light on the subject.  You can 
> do it from sview or alter the slurm.conf file and scontrol reconfig 
> without having to restart the slurmctld.

Ok, I had to submit ~1000 jobs to make it happen again:

$ sprio -j 465060
Unable to find jobs matching user/id(s) specified
$ squeue -j 465060
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
 465060     batch   sbatch   ydelia  PD       0:00      1 (Priority)
$ sacct -j 465060
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
465060           sbatch      batch    default          0    PENDING      0:0 

The slurmctld.log contains the following:

[2012-02-07T19:39:33] Fairshare priority of job 465060 for user ydelia in acct default is 2**(-0.999268/0.050000) = 0.000001
[2012-02-07T19:39:33] Weighted Age priority is 0.000000 * 1000 = 0.00
[2012-02-07T19:39:33] Weighted Fairshare priority is 0.000001 * 10000 = 0.01
[2012-02-07T19:39:33] Weighted JobSize priority is 0.000000 * 0 = 0.00
[2012-02-07T19:39:33] Weighted Partition priority is 1.000000 * 1000 = 1000.00
[2012-02-07T19:39:33] Weighted QOS priority is 0.000000 * 0 = 0.00
[2012-02-07T19:39:33] Job 465060 priority: 0.00 + 0.01 + 0.00 + 1000.00 + 0.00 - 1000 = 2.00
[2012-02-07T19:39:33] _slurm_rpc_submit_batch_job JobId=465060 usec=84514
[2012-02-07T19:39:33] Normalized usage for account default off root 5747776.753815 / 5747776.753815 = 1.000000
[2012-02-07T19:39:33] Effective usage for account default off root 1.000000 1.000000
[2012-02-07T19:39:33] Decay factor over 300 seconds goes from 0.999998854166667 -> 0.999656308878391
[2012-02-07T19:39:34] job 460729 ran for 300 seconds on 1 cpus
[2012-02-07T19:39:34] grp_used_cpu_run_secs is 0, will subtract 0
[2012-02-07T19:39:34] grp_used_cpu_run_secs is 0, will subtract 0
....
(followed by what looks like a priority decay run).
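
For reference, the arithmetic in the debug lines for job 465060 can be
re-derived by hand. The following is only a sketch in Python: the numbers are
copied from the log and the PriorityWeight* settings, the variable names are
illustrative guesses at what the logged values represent, and the
2**(-a/b) form is simply what the log line prints. It stops at the sum of the
weighted factors and does not attempt the final "- 1000 = 2.00" step, which
the log does not explain.

```python
# Values copied from the slurmctld.log line for job 465060.
# The names below are assumptions about what the two numbers mean.
effective_usage = 0.999268   # numerator of the exponent in the log
norm_shares = 0.050000       # denominator of the exponent in the log

# Fairshare factor in the form the log prints it: 2**(-a/b).
fairshare = 2 ** (-effective_usage / norm_shares)

# (factor, weight) pairs; the weights are the PriorityWeight* settings.
components = {
    "Age":       (0.0, 1000),
    "Fairshare": (fairshare, 10000),
    "JobSize":   (0.0, 0),
    "Partition": (1.0, 1000),
    "QOS":       (0.0, 0),
}

# Each weighted term is factor * weight, as in the "Weighted ..." log lines.
weighted = {name: f * w for name, (f, w) in components.items()}
weighted_sum = sum(weighted.values())

print(f"Fairshare factor: {fairshare:.6f}")
for name, value in weighted.items():
    print(f"Weighted {name} priority: {value:.2f}")
print(f"Sum of weighted factors: {weighted_sum:.2f}")
```

Rounded to the precision used in the log, the fairshare factor and each
weighted term match the logged values.
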

It seems that 465060 is the first job in a series of submissions whose
priority has not been calculated. Its submission is immediately followed by a
decay run. The log for the jobs before and after it contains only the following:

[2012-02-07T19:39:34] Fairshare priority of job 465061 for user ydelia in acct default is 2**(-0.999269/0.050000) = 0.000001
[2012-02-07T19:39:34] Weighted Age priority is 0.000000 * 1000 = 0.00
[2012-02-07T19:39:34] Weighted Fairshare priority is 0.000001 * 10000 = 0.01
[2012-02-07T19:39:34] Weighted JobSize priority is 0.000000 * 0 = 0.00
[2012-02-07T19:39:34] Weighted Partition priority is 1.000000 * 1000 = 1000.00
[2012-02-07T19:39:34] Weighted QOS priority is 0.000000 * 0 = 0.00
[2012-02-07T19:39:34] Job 465061 priority: 0.00 + 0.01 + 0.00 + 1000.00 + 0.00 - 1000 = 2.00
(repeated over and over)

My relevant config (if necessary):

DebugFlags              = Priority
PriorityDecayHalfLife   = 7-00:00:00
PriorityCalcPeriod      = 00:05:00
PriorityFavorSmall      = 0
PriorityMaxAge          = 7-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType            = priority/multifactor
PriorityWeightAge       = 1000
PriorityWeightFairShare = 10000
PriorityWeightJobSize   = 0
PriorityWeightPartition = 1000
PriorityWeightQOS       = 0
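
The decay factors in the log can be cross-checked against
PriorityDecayHalfLife and PriorityCalcPeriod. This is a sketch under one
assumption: that the per-second factor is 1 - 0.693/half-life (half-life in
seconds). That form is inferred because it reproduces the logged numbers; it
is not taken from the documentation.

```python
half_life_secs = 7 * 24 * 60 * 60   # PriorityDecayHalfLife = 7-00:00:00
calc_period_secs = 5 * 60           # PriorityCalcPeriod    = 00:05:00

# Assumed form of the per-second decay factor (0.693 approximates ln 2).
per_second = 1 - 0.693 / half_life_secs

# Decay accumulated over one calculation period.
over_period = per_second ** calc_period_secs

print(f"per second:       {per_second:.15f}")
print(f"over {calc_period_secs} seconds: {over_period:.15f}")
```

Both values agree with the "Decay factor over 300 seconds goes from ... -> ..."
log line to at least ten decimal places.
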
