After giving this some more thought, treating any job with a reservation as
having a higher priority will result in better scheduling performance, with or
without the sched/backfill plugin. The attached patch, which is just a few
lines of code, does not change the job priorities themselves. Instead, it
changes the sort algorithm to consider job reservations in addition to job
priority and preemptability, so that a job with a reservation is treated as if
it had a higher priority than jobs without reservations.
This will be included in SLURM v2.2.4.
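
For readers who want to see the resulting order outside of slurmctld, here is
a minimal standalone sketch of the same comparison. It is not the SLURM code:
demo_job_t, demo_sort_job_queue() and demo_sort_queue() are hypothetical names
made up for this illustration, the preemption checks from the real function
are left out, and the sketch assumes that a resv_id of 0 means "no
reservation" and that a larger priority value should sort first, as in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified job record (NOT the real slurmctld structure). */
typedef struct {
	const char *name;
	unsigned int priority;	/* larger value sorts earlier */
	unsigned int resv_id;	/* assumption: 0 means no reservation */
} demo_job_t;

/*
 * qsort() comparator: jobs that use a reservation sort ahead of jobs that
 * do not; within each group, higher priority sorts first. Job priorities
 * are only read, never modified.
 */
static int demo_sort_job_queue(const void *x, const void *y)
{
	const demo_job_t *job1 = (const demo_job_t *) x;
	const demo_job_t *job2 = (const demo_job_t *) y;
	bool has_resv1 = (job1->resv_id != 0);
	bool has_resv2 = (job2->resv_id != 0);

	if (has_resv1 && !has_resv2)
		return -1;
	if (!has_resv1 && has_resv2)
		return 1;

	if (job1->priority < job2->priority)
		return 1;
	if (job1->priority > job2->priority)
		return -1;
	return 0;
}

/* Sort an array of demo jobs in place using the comparator above. */
static void demo_sort_queue(demo_job_t *queue, size_t count)
{
	assert(queue != NULL || count == 0);
	if (count > 1)
		qsort(queue, count, sizeof(queue[0]), demo_sort_job_queue);
}
```

Given a queue holding user B's job (priority 200, no reservation), user A's
job (priority 100, in a reservation) and a third job (priority 300, no
reservation), demo_sort_queue() places A's job first, followed by the other
two in descending priority order.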
________________________________________
From: [email protected] [[email protected]] On Behalf 
Of [email protected] [[email protected]]
Sent: Wednesday, March 16, 2011 8:25 AM
To: [email protected]
Subject: [slurm-dev] Jobs requesting a reservation prevented from running

Here is a very simple scenario on SLURM 2.2.3 that prevents a job requesting 
use of a reservation from running.

1.  A reservation is created for user 'A' for some number of nodes.

2. Prior to the start time of the reservation,  user 'B' requests a job that 
needs one or more of the nodes in the reservation,  but the time limit on his 
job would run into the reservation time so his job is PENDING for reason 
'Resources'.

3. Still prior to the start time of the reservation,  user A requests a job 
using the reservation.  Because the start time has not arrived, his job is also 
PENDING, with reason 'Reservation'.

4. When the start time of the reservation arrives,  user A's job now goes to 
reason 'Resources'.    But because user A's job is in the queue behind user B's 
job, it can't start.

5. When the reservation reaches its end time, the reservation expires.   Now 
B's job can run, so it does.  When B's job finishes,  A's job is the next in 
the queue, but it can't run because the reservation has expired.   And the 
reservation can't be cleaned up because there is a job attached.   So you end 
up with a job that won't start and an expired reservation that won't go away.

This only occurs on a job that is queued up on the reservation before the 
StartTime.   A job that is submitted by A after the reservation StartTime 
bypasses the waiting B's job on the same partition and runs.   However, if a 
second job by A is requested, and gets set to PENDING with reason 'Resources' 
because it has to wait for the first 'A' job, then it also ends up waiting 
behind user B's job.

I am not sure what is supposed to happen here.   The "Resource Reservation 
Guide" at "https://computing.llnl.gov/linux/slurm/reservations.html" does not 
state what occurs when a user that is not on the reservation tries to run a job 
that uses some of the resources in the reservation.   Should user B's job have 
been rejected instead of going to the PENDING state?   Should the priority of 
user A's jobs have been increased to allow them to go to the head of the queue?  Or 
am I just misunderstanding something about how the reservations work?

        -Don Albert-
Index: src/slurmctld/job_scheduler.c
===================================================================
--- src/slurmctld/job_scheduler.c	(revision 22791)
+++ src/slurmctld/job_scheduler.c	(working copy)
@@ -605,12 +605,20 @@
 {
 	job_queue_rec_t *job_rec1 = (job_queue_rec_t *) x;
 	job_queue_rec_t *job_rec2 = (job_queue_rec_t *) y;
+	bool has_resv1, has_resv2;
 
 	if (slurm_job_preempt_check(job_rec1, job_rec2))
 		return -1;
 	if (slurm_job_preempt_check(job_rec2, job_rec1))
 		return 1;
 
+	has_resv1 = (job_rec1->job_ptr->resv_id != 0);
+	has_resv2 = (job_rec2->job_ptr->resv_id != 0);
+	if (has_resv1 && !has_resv2)
+		return -1;
+	if (!has_resv1 && has_resv2)
+		return 1;
+
 	if (job_rec1->job_ptr->priority < job_rec2->job_ptr->priority)
 		return 1;
 	if (job_rec1->job_ptr->priority > job_rec2->job_ptr->priority)
