Hi Uwe,

I also see this from time to time and have been meaning to point it out. Sometimes Slurm restarts cause all jobs in the queue to be niced to the highest priority setting. This is with 14.11.4.
Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

> On Mar 12, 2015, at 3:23 AM, Uwe Sauter <[email protected]> wrote:
>
> Hi,
>
> there is a difference in the output of scontrol show job and sprio (14.11.4).
> I have two jobs: one was submitted before slurmctld was restarted, the other
> one after the restart.
>
> sprio -l shows:
>
>   JOBID  USER  PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION   QOS    NICE
>   14115  XXX      10000    0          0        0          0     0  -10000
>   14204  XXX       1000    0          0        0          0  1000       0
>
> while scontrol shows:
>
>   JobId=14115 JobName=XXX
>   UserId=XXX(XXX) GroupId=XXX(XXX)
>   Priority=1233 Nice=0 Account=XXX QOS=normal
>   […]
>
>   JobId=14204 JobName=XXX
>   UserId=XXX(XXX) GroupId=XXX(XXX)
>   Priority=1000 Nice=0 Account=XXX QOS=normal
>
> While both programs show the same priority for the second job, they differ
> for the first (10000 vs. 1233). The priority for job 14115 was 1233 before
> the restart.
>
> My understanding is that the actual priority should be saved and restored
> when slurmctld is restarted. This is not the case. It seems that slurmctld
> forgets the priority, assigns an arbitrary but higher number to jobs that
> were already in the queue, and also reduces the niceness. It also forgets
> the different sub-priorities that come from the different proportions of
> the multifactor priority plugin.
>
> Regards,
>
> Uwe
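For anyone wanting to check whether their own queue is affected, one quick approach is to compare the priorities reported by the two tools job-by-job. Below is a minimal, hypothetical sketch that parses `sprio -l`-style and `scontrol show job`-style text; the sample strings are trimmed-down versions of the outputs quoted above, and on a live cluster you would feed it the real command output instead (exact column layout can vary by Slurm version, so the parsing here is an assumption):

```python
import re

# Sample data modeled on the outputs quoted in this thread; on a real
# cluster, capture the text from `sprio -l` and `scontrol show job`.
sprio_output = """\
JOBID USER PRIORITY
14115 XXX  10000
14204 XXX  1000
"""

scontrol_output = """\
JobId=14115 JobName=XXX Priority=1233 Nice=0
JobId=14204 JobName=XXX Priority=1000 Nice=0
"""

def sprio_priorities(text):
    """Map JOBID -> PRIORITY from sprio-style tabular output."""
    prios = {}
    for line in text.splitlines()[1:]:          # skip the header row
        fields = line.split()
        if len(fields) >= 3:
            prios[fields[0]] = int(fields[2])   # column 3 is PRIORITY here
    return prios

def scontrol_priorities(text):
    """Map JobId -> Priority from `scontrol show job` key=value output."""
    pairs = re.findall(r"JobId=(\d+).*?Priority=(\d+)", text, re.S)
    return {jobid: int(prio) for jobid, prio in pairs}

def mismatches(sprio_text, scontrol_text):
    """Return {jobid: (sprio_prio, scontrol_prio)} where the two disagree."""
    a = sprio_priorities(sprio_text)
    b = scontrol_priorities(scontrol_text)
    return {j: (a[j], b[j]) for j in a.keys() & b.keys() if a[j] != b[j]}

print(mismatches(sprio_output, scontrol_output))
# → {'14115': (10000, 1233)}
```

With the thread's numbers this flags only job 14115, the one submitted before the slurmctld restart, which matches Uwe's observation.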
