I found this bug in version 2.6.9. I am not sure whether it still exists
in later versions. The patch is against version 16.05.

The root cause is that the time limit of a job is sometimes INFINITE. In
that case backfill fails to start the job, because some drained nodes are
not excluded when backfill tests whether the job is runnable.
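
To make the intent of the two hunks in the patch below easier to follow, here is a rough standalone sketch of the logic. It is not the actual backfill.c code: the helper names and the SKETCH_* constants are made up for illustration, the constants are only assumed to behave like the scheduler's 32-bit NO_VAL and INFINITE sentinels, and the else branch is simplified because the patch context does not show it.

```c
#include <stdint.h>
#include <time.h>

/* Stand-ins for the scheduler's sentinel values (assumed, not copied). */
#define SKETCH_NO_VAL   0xfffffffeU
#define SKETCH_INFINITE 0xffffffffU

/*
 * Hypothetical helper mirroring the first hunk: a job whose limit is
 * unset (NO_VAL) or INFINITE is planned with the partition limit.
 * Simplification: otherwise the job's own limit is returned unchanged.
 */
static uint32_t effective_time_limit(uint32_t job_limit, uint32_t part_limit)
{
	if (job_limit == SKETCH_NO_VAL || job_limit == SKETCH_INFINITE)
		return part_limit;
	return job_limit;
}

/*
 * Hypothetical helper mirroring the second hunk: compute the planned
 * end time from a limit in minutes, and treat an end time that falls
 * before "now" as having no end at all.
 */
static time_t planned_end_time(uint32_t limit_minutes, time_t start, time_t now)
{
	time_t end_time = (time_t)(limit_minutes * 60) + start;

	if (end_time < now)
		end_time = (time_t)SKETCH_INFINITE;
	return end_time;
}
```

With these helpers, a job submitted with an INFINITE limit in a partition limited to 120 minutes would be planned as a 120-minute job, and an end time computed in the past is replaced by INFINITE instead of being used as is.
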
From c4b58603301f3c885499ba2663fa6d09755fa881 Mon Sep 17 00:00:00 2001
From: Hongjia Cao <[email protected]>
Date: Fri, 16 Oct 2015 12:53:51 +0800
Subject: [PATCH] fix bug in sched/backfill

---
 src/plugins/sched/backfill/backfill.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/plugins/sched/backfill/backfill.c b/src/plugins/sched/backfill/backfill.c
index 4aeabc2..23ee1f4 100644
--- a/src/plugins/sched/backfill/backfill.c
+++ b/src/plugins/sched/backfill/backfill.c
@@ -1069,7 +1069,7 @@ next_task:
 			part_time_limit = YEAR_MINUTES;
 		else
 			part_time_limit = part_ptr->max_time;
-		if (job_ptr->time_limit == NO_VAL) {
+		if (job_ptr->time_limit == NO_VAL || job_ptr->time_limit == INFINITE) {
 			time_limit = part_time_limit;
 			job_ptr->limit_set.time = 1;
 		} else {
@@ -1163,6 +1163,8 @@ next_task:
 			end_time = (time_limit * 60) + start_res;
 		else
 			end_time = (time_limit * 60) + now;
+		if (end_time < now)
+			end_time = INFINITE;
 		resv_end = find_resv_end(start_res);
 		/* Identify usable nodes for this job */
 		bit_and(avail_bitmap, part_ptr->node_bitmap);
-- 
2.6.1
