The patch below should handle sched/wiki fine. The patch that I sent earlier today would recalculate job priority for requeued jobs under all scheduler types (e.g. sched/backfill, sched/wiki, etc.). Since the requeued job would have a new submit time, it is not clear to me whether resetting its priority would be best.

Quoting Josh England <[email protected]>:

Moe,

I think you already sent me the one-liner that Pär was referring to
(below).  Which patch is the right one to use in my case, or should I
apply both?

diff --git a/src/plugins/sched/wiki/sched_wiki.c b/src/plugins/sched/wiki/sched_wiki.c
index b33b9d6..9d9a668 100644
--- a/src/plugins/sched/wiki/sched_wiki.c
+++ b/src/plugins/sched/wiki/sched_wiki.c
@@ -171,7 +171,7 @@ char *slurm_sched_strerror( int errnum )
/**************************************************************************/
  void slurm_sched_plugin_requeue( struct job_record *job_ptr, char *reason )
  {
-       /* Empty. */
+       job_ptr->priority = 0;
  }

/**************************************************************************/


-JE



On Thu, 2011-08-11 at 09:16 -0600, [email protected] wrote:
This would be a slightly different problem.

Josh,
I think this patch will fix the problem. Please report results if you try it.

diff --git a/src/slurmctld/job_mgr.c b/src/slurmctld/job_mgr.c
index 8f245cc..acf1172 100644
--- a/src/slurmctld/job_mgr.c
+++ b/src/slurmctld/job_mgr.c
@@ -9223,6 +9223,7 @@ extern int job_requeue (uid_t uid, uint32_t job_id, slurm_
         job_ptr->suspend_time = (time_t) 0;
         job_ptr->tot_sus_time = (time_t) 0;
         job_ptr->restart_cnt++;
+       _set_job_prio(job_ptr);
         /* Since the job completion logger removes the submit we need
          * to add it again. */
         acct_policy_add_job_submit(job_ptr);


Quoting Pär Andersson <[email protected]>:

> Hi Josh,
>
> Josh England <[email protected]> writes:
>
>> We're running slurm-2.2.4 on CentOS-5.5 using sched/wiki to interface to
>> a custom scheduler, and there seems to be a bug happening anytime a job
>> is requeued in slurm (either manually or due to node failure).
>
> This sounds familiar. I think this is the same issue that we found and
> fixed for sched/wiki2 back in May (included in 2.2.6).
>
> Please look at this thread:
>
> https://groups.google.com/group/slurm-devel/browse_thread/thread/8c2e072f94873103?pli=1
>
> The fix is a one-line change that resets a job's priority in
> /src/plugins/sched/wiki2/job_requeue.c when requeued:
>
> https://github.com/paran1/slurm/commit/8212b71ec7480cf8bf292fefdb5547bc4a79dbc2
>
> You most likely have to add something very similar to the sched/wiki
> plugin code.
>
> Regards,
> Pär Andersson
> NSC
>











