Yuri D'Elia wrote:
> On Thu, 9 Feb 2012 18:03:00 +0100
> "Yuri D'Elia" <[email protected]> wrote:
>
>   
>> On Tue, 7 Feb 2012 19:59:22 +0100
>> "Yuri D'Elia" <[email protected]> wrote:
>>
>>     
>>> [2012-02-07T19:39:33] Normalized usage for account default off root 5747776.753815 / 5747776.753815 = 1.000000
>>> [2012-02-07T19:39:33] Effective usage for account default off root 1.000000 1.000000
>>> [2012-02-07T19:39:33] Decay factor over 300 seconds goes from 0.999998854166667 -> 0.999656308878391
>>> [2012-02-07T19:39:34] job 460729 ran for 300 seconds on 1 cpus
>>> [2012-02-07T19:39:34] grp_used_cpu_run_secs is 0, will subtract 0
>>> [2012-02-07T19:39:34] grp_used_cpu_run_secs is 0, will subtract 0
>>> ....
>>> (followed by what looks like a priority decay run).
>>>
>>> It seems that 465060 is the first submitted job (in a row of submissions) 
>>> where priority has not been calculated. It's immediately followed by a 
>>> decay run. The jobs before/after this job just contain the following:
>>>       
>> It seems that every time "PriorityCalcPeriod" is run, some jobs (usually the 
>> newer ones, beyond some sort of threshold) lose their "association" with 
>> priorities. I wonder if backfilling limits have anything to do with that?
>>     
>
> Just wanted to add that this seems to be a bug in the priority/multifactor 
> plugin. Indeed every time priorities are recalculated (as determined by 
> PriorityCalcPeriod), jobs simply lose their priority/association.
>   

Hi,

We also have this problem: some jobs drop to priority one and stay there,
even though at least one priority point should be added each minute
(PriorityWeightAge=20160, PriorityMaxAge=14-0), and the other priority
factors should in fact put the total priority a bit above 100000.
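
The "one point per minute" figure can be checked with a short sketch. It
assumes the age contribution is PriorityWeightAge multiplied by the job's
waiting time as a fraction of PriorityMaxAge, capped at 1.0, which is how the
multifactor documentation describes the age factor. The names below are
hypothetical and are not taken from the plugin source.

```python
# Values from the configuration quoted above.
PRIORITY_WEIGHT_AGE = 20160            # PriorityWeightAge
PRIORITY_MAX_AGE_MIN = 14 * 24 * 60    # PriorityMaxAge=14-0 (14 days) in minutes


def age_priority(minutes_waiting):
    """Assumed age contribution: weight * min(age, max_age) / max_age."""
    capped = min(minutes_waiting, PRIORITY_MAX_AGE_MIN)
    return PRIORITY_WEIGHT_AGE * capped / PRIORITY_MAX_AGE_MIN


if __name__ == "__main__":
    for minutes in (1, 60, PRIORITY_MAX_AGE_MIN, 2 * PRIORITY_MAX_AGE_MIN):
        print(minutes, age_priority(minutes))
```

Because 14 days is exactly 20160 minutes, the weight and the maximum age
cancel, so under this assumption a waiting job gains one point per minute
until the cap is reached.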

We run PriorityType=priority/multifactor with a slurmdbd talking to a MySQL
server. The SLURM version is 2.4.0-pre3. (I have also seen this problem in
versions below 2.4.)

When running "sprio -l", those priority-one jobs are not shown.

How can this problem be fixed? Our users do not like it when their jobs go
into this state.

Cheers,
-- Lennart Karlsson, UPPMAX, Uppsala University, Sweden
   http://www.uppmax.uu.se
