I see the backfill scheduler starting the job that uses the reservation ahead of
the earlier job (which I presume has a higher priority), which is what I would
expect.
________________________________________
From: [email protected] [[email protected]] On Behalf 
Of [email protected] [[email protected]]
Sent: Wednesday, March 16, 2011 8:25 AM
To: [email protected]
Subject: [slurm-dev] Jobs requesting a reservation prevented from running

Here is a very simple scenario on SLURM 2.2.3 that prevents a job requesting 
use of a reservation from running.

1. A reservation is created for user 'A' for some number of nodes.

2. Before the reservation's start time, user 'B' submits a job that needs one
or more of the nodes in the reservation, but the job's time limit would run
into the reservation window, so it stays PENDING with reason 'Resources'.

3. Still before the reservation's start time, user A submits a job that uses
the reservation. Because the start time has not yet arrived, this job is also
PENDING, with reason 'Reservation'.

4. When the reservation's start time arrives, user A's job changes to reason
'Resources'. But because it is queued behind user B's job, it cannot start.

5. When the reservation reaches its end time, it expires. Now B's job can run,
so it does. When B's job finishes, A's job is next in the queue, but it cannot
run because the reservation has expired. And the reservation cannot be cleaned
up because a job is still attached to it. So you end up with a job that will
never start and an expired reservation that won't go away.
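The head-of-line blocking in steps 1-5 can be reproduced with a toy scheduler.
This is not SLURM code: the cluster size, the times, the eligibility rule in
`can_start`, and both scheduling passes are hypothetical simplifications. A
strict priority-order pass stops at the first job that cannot start, so A's job
is never considered while B's job sits ahead of it, and by the time B's job
runs the reservation has expired. A greedy pass that skips blocked jobs (a much
simplified stand-in for the backfill scheduler mentioned in the reply above)
starts A's job as soon as the reservation opens.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

TOTAL_NODES = 4  # hypothetical cluster; the reservation covers every node


@dataclass
class Reservation:
    start: int  # minute the reservation becomes active
    end: int    # minute the reservation expires


@dataclass
class Job:
    name: str
    nodes: int
    time_limit: int
    use_reservation: bool
    start: Optional[int] = None  # filled in when the job is started


def can_start(job: Job, t: int, res: Reservation, free: int) -> bool:
    """Toy eligibility rule (an assumption, not SLURM's actual logic)."""
    if job.nodes > free:
        return False
    if job.use_reservation:
        # Reservation jobs may only start while the reservation is active.
        return res.start <= t < res.end
    # Other jobs must not overlap the reservation window on the reserved nodes.
    return t + job.time_limit <= res.start or t >= res.end


def simulate(queue: List[Job], res: Reservation, backfill: bool,
             horizon: int = 100) -> Dict[str, Optional[int]]:
    running: List[Tuple[Job, int]] = []  # (job, end time)
    for t in range(horizon):
        running = [(j, end) for j, end in running if end > t]
        free = TOTAL_NODES - sum(j.nodes for j, _ in running)
        for job in queue:  # queue is already in priority order
            if job.start is not None:
                continue
            if can_start(job, t, res, free):
                job.start = t
                running.append((job, t + job.time_limit))
                free -= job.nodes
            elif not backfill:
                break  # strict order: a blocked job blocks everything behind it
    return {job.name: job.start for job in queue}


def make_queue() -> List[Job]:
    # B was queued first, so it sits ahead of A (steps 2 and 3).
    return [
        Job("B", nodes=4, time_limit=20, use_reservation=False),
        Job("A", nodes=2, time_limit=5, use_reservation=True),
    ]


RES = Reservation(start=10, end=30)

print(simulate(make_queue(), RES, backfill=False))  # {'B': 30, 'A': None}
print(simulate(make_queue(), RES, backfill=True))   # {'B': 30, 'A': 10}
```

In the strict pass A's start time stays `None`, which is the stuck job from
step 5. A fresh queue is built for each run because `simulate` records start
times on the `Job` objects.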

This only happens to a job that is queued on the reservation before its
StartTime. A job that A submits after the reservation's StartTime bypasses B's
waiting job in the same partition and runs. However, if A submits a second job
and it goes PENDING with reason 'Resources' because it has to wait for the
first 'A' job, then it too ends up waiting behind user B's job.

I am not sure what is supposed to happen here. The "Resource Reservation
Guide" at "https://computing.llnl.gov/linux/slurm/reservations.html" does not
state what happens when a user who is not on the reservation tries to run a job
that uses some of the reservation's resources. Should user B's job have been
rejected instead of going to the PENDING state? Should the priority of user A's
jobs have been increased to move them to the head of the queue? Or am I just
misunderstanding something about how reservations work?

        -Don Albert-
