RE: [slurm-dev] Patch: fix wiki2 requeue problem.

Jette, Moe Fri, 13 May 2011 08:12:19 -0700

Hi Pär,

The problem fixed by the earlier patch was that when a job was suspended
or requeued, its priority would be recalculated based upon the new
submit/requeue time rather than the original time. We believed that
preserving the original time would be better, although it caused problems
for use with Moab as you observed. I believe that your patch is probably
the best solution and have applied it.


Moe

________________________________________
From: [email protected] [[email protected]] On Behalf 
Of Pär Andersson [[email protected]]
Sent: Friday, May 13, 2011 7:04 AM
To: [email protected]
Subject: [slurm-dev] Patch: fix wiki2 requeue problem.

Hi,

We discovered a scheduling problem on a cluster recently upgraded from
2.1.15 to 2.2.5, running Moab and wiki2.

The cluster uses job preemption and requeueing. Requeueable jobs that are
in state PENDING after having been requeued at least once effectively
block Moab from starting any other job.

I believe that the root cause is that when a job is requeued it keeps
Priority=100000000, instead of being held.

In the following example 1495705 gets requeued and pending. Moab then
tries to start 1495733 which fails.

slurmctld.log:

1495705 is requeued:

[2011-05-13T13:05:53] wiki msg recv:CK=64e731ac73fec193 TS=1305284753 AUTH=moab DT=CMD=REQUEUEJOB ARG=1495705
[2011-05-13T13:05:53] wiki: requeued job 1495705
[2011-05-13T13:05:53] wiki msg send:CK=7653740ce9476756 TS=1305284753 AUTH=slurm DT=SC=0 RESPONSE=job 1495705 requeued successfully
[2011-05-13T13:05:53] completing job 1495705
[2011-05-13T13:06:06] requeue batch job 1495705
...
[2011-05-13T13:08:42] wiki msg recv:CK=7189ead250b2fd76 TS=1305284922 AUTH=moab DT=CMD=STARTJOB ARG=1495733 TASKLIST=n212
[2011-05-13T13:08:42] error: wiki: Could not start job 1495733(n212): Resources
[2011-05-13T13:08:42] wiki msg send:CK=4867c632e7c67a1a TS=1305284922 AUTH=slurm DT=SC=-913 RESPONSE=Could not start job 1495733(n212): Resources
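
For anyone who wants to pick these wiki messages apart when digging through
slurmctld.log, here is a minimal Python sketch. It assumes the layout seen
in the excerpt above: CK, TS and AUTH header fields, followed by DT= whose
value is the rest of the message. The function name parse_wiki_msg and the
regular expression are illustrative only and are not part of Slurm or Moab.

```python
import re

# KEY=value pairs inside the DT payload. A value runs until the next
# upper-case KEY= token or the end of the string, so RESPONSE text
# containing spaces stays in one piece. (Assumption based on the log
# excerpt, not on the wiki protocol specification.)
_FIELD = re.compile(r"([A-Z]+)=(.*?)(?=\s+[A-Z]+=|$)")


def parse_wiki_msg(msg):
    """Split one wiki message into header fields and a parsed DT payload."""
    header, sep, payload = msg.partition(" DT=")
    if not sep:
        raise ValueError("message has no DT= field")
    fields = dict(token.split("=", 1) for token in header.split())
    fields["DT"] = dict(_FIELD.findall(payload))
    return fields


if __name__ == "__main__":
    request = ("CK=7189ead250b2fd76 TS=1305284922 AUTH=moab "
               "DT=CMD=STARTJOB ARG=1495733 TASKLIST=n212")
    reply = ("CK=4867c632e7c67a1a TS=1305284922 AUTH=slurm "
             "DT=SC=-913 RESPONSE=Could not start job 1495733(n212): Resources")
    print(parse_wiki_msg(request)["DT"])
    print(parse_wiki_msg(reply)["DT"])
```

Pass it only the part of the log line after "recv:" or "send:"; the
timestamp prefix is not handled here.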

scheduler log lines about job 1495705 and 1495733 from the same time
period:

[2011-05-13T13:04:34] sched: JobId=1495705. State=PENDING. Reason=JobHeldAdmin. Priority=0.
[2011-05-13T13:04:35] sched: JobId=1495705 initiated
[2011-05-13T13:04:35] sched: Allocate JobId=1495705 NodeList=n[302,304-305,307,310-315] #CPUs=80
[2011-05-13T13:04:35] sched: _slurm_rpc_job_step_create: StepId=1495705.0 n[302,304-305,307,310-315] usec=267
[2011-05-13T13:04:35] sched: _slurm_rpc_step_complete StepId=1495705.0 usec=12
[2011-05-13T13:06:23] sched: JobId=1495705. State=PENDING. Reason=Resources. Priority=100000000. Partition=r_nehalem.
...
[2011-05-13T13:08:38] sched: JobId=1495705. State=PENDING. Reason=Resources. Priority=100000000. Partition=r_nehalem.
[2011-05-13T13:08:40] sched: JobId=1495733. State=PENDING. Reason=JobHeldAdmin. Priority=0.
[2011-05-13T13:08:40] sched: JobId=1495705. State=PENDING. Reason=Resources. Priority=100000000. Partition=r_nehalem.
[2011-05-13T13:08:41] sched: JobId=1495733. State=PENDING. Reason=JobHeldAdmin. Priority=0.
[2011-05-13T13:08:41] sched: JobId=1495705. State=PENDING. Reason=Resources. Priority=100000000. Partition=r_nehalem.
[2011-05-13T13:08:42] sched: JobId=1495705. State=PENDING. Reason=Resources. Priority=100000000. Partition=r_nehalem.
[2011-05-13T13:08:42] sched: JobId=1495733. State=PENDING. Reason=Resources. Priority=100000000. Partition=nehalem.
[2011-05-13T13:08:43] sched: JobId=1495705. State=PENDING. Reason=Resources. Priority=100000000. Partition=r_nehalem.
[2011-05-13T13:08:43] sched: JobId=1495733. State=PENDING. Reason=Resources. Priority=100000000. Partition=nehalem.
[2011-05-13T13:09:10] sched: JobId=1495733. State=PENDING. Reason=JobHeldAdmin. Priority=0.
[2011-05-13T13:09:10] sched: JobId=1495705. State=PENDING. Reason=Resources. Priority=100000000. Partition=r_nehalem.
[2011-05-13T13:09:44] sched: JobId=1495705. State=PENDING. Reason=Resources. Priority=100000000. Partition=r_nehalem.
[2011-05-13T13:09:44] sched: JobId=1495733. State=PENDING. Reason=Resources. Priority=100000000. Partition=nehalem.
[2011-05-13T13:09:44] sched: JobId=1495705. State=PENDING. Reason=Resources. Priority=100000000. Partition=r_nehalem.
[2011-05-13T13:09:44] sched: JobId=1495733. State=PENDING. Reason=Resources. Priority=100000000. Partition=nehalem.
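
The pattern in the scheduler log is easier to see when repeated records are
collapsed into one line per change. The Python sketch below does that,
assuming the record layout shown above ("JobId=N. State=S. Reason=R.
Priority=P."). It also rejoins records that a mail client has wrapped onto
a second line. The names unwrap and priority_changes are made up for this
example and do not come from Slurm.

```python
import re

# One pending-job record from the scheduler log (layout assumed from the
# excerpt above; the trailing Partition field is optional and ignored).
RECORD = re.compile(
    r"\[(?P<ts>[^\]]+)\] sched: JobId=(?P<job>\d+)\. State=(?P<state>\w+)\. "
    r"Reason=(?P<reason>\w+)\. Priority=(?P<prio>\d+)\."
)


def unwrap(lines):
    """Join continuation lines (not starting with '[') onto the previous record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def priority_changes(lines):
    """Map job id -> [(timestamp, reason, priority), ...], dropping repeats."""
    history = {}
    for record in unwrap(lines):
        m = RECORD.match(record)
        if m is None:
            continue  # "initiated", "Allocate", step records and so on
        entry = (m["reason"], int(m["prio"]))
        seen = history.setdefault(m["job"], [])
        if not seen or seen[-1][1:] != entry:
            seen.append((m["ts"],) + entry)
    return history


if __name__ == "__main__":
    sample = [
        "[2011-05-13T13:04:34] sched: JobId=1495705. State=PENDING. Reason=JobHeldAdmin.",
        "Priority=0.",
        "[2011-05-13T13:04:35] sched: JobId=1495705 initiated",
        "[2011-05-13T13:06:23] sched: JobId=1495705. State=PENDING. Reason=Resources.",
        "Priority=100000000. Partition=r_nehalem.",
        "[2011-05-13T13:08:38] sched: JobId=1495705. State=PENDING. Reason=Resources.",
        "Priority=100000000. Partition=r_nehalem.",
    ]
    for job, changes in priority_changes(sample).items():
        for ts, reason, prio in changes:
            print(job, ts, reason, prio)
```

Run over the excerpt, job 1495705 shows a single change from
JobHeldAdmin/0 to Resources/100000000 after the requeue and never returns
to the held state, while 1495733 alternates between the two.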


I have made a patch that seems to fix the problem for us. See commit
8212b71ec7480cf8bf292fefdb5547bc4a79dbc2 on github:

https://github.com/paran1/slurm/commit/8212b71ec7480cf8bf292fefdb5547bc4a79dbc2

After creating that patch I found the following commit, which sounds like
it might have introduced this problem:

commit 4059a9232bb0415bb40940c42fc9fbbc54a5c5a6
Author: Moe Jette <[email protected]>
Date:   Wed Mar 23 22:04:44 2011 +0000

     -- Do not reset a job's priority when requeued or suspended.
    Fixes bug reported by Bill Brophy, Bull.

What bug did this fix? Would reverting this be more correct than fixing
it in wiki2 like my patch did?

Looking at this also made me realize that priority probably needs to be
reset to 0 in src/plugins/sched/wiki2/suspend_job.c as well, but we
don't use job suspend so unfortunately I can't test that.

Regards,

Pär Andersson
NSC
