Moe,
As you probably guessed, I was using "sched/builtin" in the original
scenario. I switched to "sched/backfill" and obtained mixed results:
sometimes the user A job would run, and sometimes the job would remain
unscheduled. I think this is due to my scenario being a bit contrived;
for the test, I was creating a reservation with
"starttime=now+1minutes" and "duration=5". There were no other jobs
running other than my user A and user B requests that were queued. After
the reservation start time passed, it would take quite some time for the
backfill scheduler to schedule the user A job, sometimes up to 4 minutes
or so when it worked. I believe that sometimes it took longer than the
reservation duration time to attempt to schedule the job, and thus the
reservation expired before the job could run.
With the patch for the reservation priority you supplied, the user A job
gets scheduled almost immediately after the reservation start time
arrives. This patch also works with the "sched/builtin" FIFO scheduler.
Thank you very much!
-Don Albert-
[email protected] wrote on 03/16/2011 12:52:10 PM:
> After giving this some more thought, treating any job with a
> reservation as having a higher priority
> will result in better scheduling performance with or without the
> sched/backfill plugin. The attached
> patch, which is just a few lines of code, will not change the job
> priorities, but will treat a job with a reservation as if it had a
> higher priority than jobs without reservations.
> It changes the sort algorithm to consider job reservations in
> addition to job priority and preemptability.
> This will be included in SLURM v2.2.4.
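To illustrate the sort change described above, here is a minimal Python sketch. It is not the actual SLURM C code from the patch; the Job fields and the sort_key and order_queue names are hypothetical. It places jobs that have a reservation ahead of jobs that do not, then orders by priority, and leaves the stored priorities untouched. Preemptability, which the real sort also considers, is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: int
    user: str
    priority: int
    # Hypothetical field: name of the reservation the job requested, if any.
    reservation: Optional[str] = None


def sort_key(job: Job):
    # Jobs with a reservation sort first (False < True), then higher
    # priority first, then lower job id (earlier submission) as a tie-breaker.
    return (job.reservation is None, -job.priority, job.job_id)


def order_queue(jobs):
    # Returns a new list; the jobs and their priorities are not modified.
    return sorted(jobs, key=sort_key)


queue = [
    Job(job_id=1, user="B", priority=100),
    Job(job_id=2, user="A", priority=100, reservation="resv_a"),
]
print([job.user for job in order_queue(queue)])
```

With equal priorities, user A's job (which has a reservation) is ordered ahead of user B's job even though it was submitted later.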
> ________________________________________
> From: [email protected] [[email protected].
> gov] On Behalf Of [email protected] [[email protected]]
> Sent: Wednesday, March 16, 2011 8:25 AM
> To: [email protected]
> Subject: [slurm-dev] Jobs requesting a reservation prevented from
> running
>
> Here is a very simple scenario on SLURM 2.2.3 that prevents a job
> requesting use of a reservation from running.
>
> 1. A reservation is created for user 'A' for some number of nodes.
>
> 2. Prior to the start time of the reservation, user 'B' requests a
> job that needs one or more of the nodes in the reservation, but the
> time limit on his job would run into the reservation time so his job
> is PENDING for reason 'Resources'.
>
> 3. Still prior to the start time of the reservation, user A
> requests a job using the reservation. Because the start time has
> not arrived, his job is also PENDING, with reason 'Reservation'.
>
> 4. When the start time of the reservation arrives, user A's job now
> goes to reason 'Resources'. But because user A's job is in the
> queue behind user B's job, it can't start.
>
> 5. When the reservation reaches its end time, the reservation
> expires. Now B's job can run, so it does. When B's job finishes,
> A's job is the next in the queue, but it can't run because the
> reservation has expired. And the reservation can't be cleaned up
> because there is a job attached. So you end up with a job that
> won't start and an expired reservation that won't go away.
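The five steps above can be sketched as a toy model in Python. This is an illustration under stated assumptions, not SLURM's scheduler: it assumes strict queue order (only the job at the head of the queue is considered), the times are arbitrary minutes chosen for the example, and the names can_start and next_to_start are hypothetical.

```python
# Arbitrary illustrative reservation window, in minutes.
RESV_START, RESV_END = 1, 6


def can_start(job, now):
    if job["reservation"]:
        # A job tied to the reservation may only start inside its window.
        return RESV_START <= now < RESV_END
    # Any other job must finish before the window opens or start after it closes.
    return now >= RESV_END or now + job["time_limit"] <= RESV_START


def next_to_start(queue, now):
    # Assumption: strict ordering, so only the head of the queue is tried.
    if queue and can_start(queue[0], now):
        return queue[0]["user"]
    return None


job_b = {"user": "B", "reservation": False, "time_limit": 10}
job_a = {"user": "A", "reservation": True, "time_limit": 2}
queue = [job_b, job_a]  # B was submitted first

print(next_to_start(queue, 0))    # None: B overlaps the window (steps 2 and 3)
print(can_start(job_a, 1))        # True: A could run once the window opens
print(next_to_start(queue, 1))    # None: A is stuck behind B (step 4)
print(next_to_start(queue, 6))    # B: the window has closed (step 5)
print(next_to_start([job_a], 6))  # None: A's reservation has expired
```

Under these assumptions, user A's job is runnable for the whole window but is never considered, because user B's job stays at the head of the queue until the reservation ends.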
>
> This only occurs on a job that is queued up on the reservation
> before the StartTime. A job that is submitted by A after the
> reservation StartTime bypasses the waiting B's job on the same
> partition and runs. However, if a second job by A is requested,
> and gets set to PENDING with reason 'Resources' because it has to
> wait for the first 'A' job, then it also ends up waiting behind user
> B's job.
>
> I am not sure what is supposed to happen here. The "Resource
> Reservation Guide" at
> "https://computing.llnl.gov/linux/slurm/reservations.html"
> does not state what occurs when a
> user that is not on the reservation tries to run a job that uses
> some of the resources in the reservation. Should user B's job have
> been rejected instead of going to the PENDING state? Should the
> priority of user A's jobs have been increased to allow them to go to the
> head of the queue? Or am I just misunderstanding something about
> how the reservations work?
>
> -Don Albert-
> [attachment "resv_prio.patch" deleted by Don Albert/US/BULL]