Here is a very simple scenario on SLURM 2.2.3 that prevents a job 
requesting use of a reservation from running.

1.  A reservation is created for user 'A' for some number of nodes.

2. Prior to the start time of the reservation, user 'B' submits a job 
that needs one or more of the nodes in the reservation, but the job's 
time limit would run into the reservation window, so the job is PENDING 
with reason 'Resources'.

3. Still prior to the start time of the reservation, user A submits a 
job using the reservation.  Because the start time has not arrived, his 
job is also PENDING, with reason 'Reservation'.

4. When the start time of the reservation arrives, user A's job changes 
to reason 'Resources'.  But because user A's job is in the queue behind 
user B's job, it can't start.

5. When the reservation reaches its end time, the reservation expires. Now 
B's job can run, so it does.  When B's job finishes,  A's job is the next 
in the queue, but it can't run because the reservation has expired.   And 
the reservation can't be cleaned up because there is a job attached.   So 
you end up with a job that won't start and an expired reservation that 
won't go away.
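
To make the sequence above easier to follow, here is a toy Python model of 
it. This is only a sketch and not how SLURM's scheduler is implemented: it 
assumes a single node, integer time steps, and a strict queue order in 
which the first pending job blocks every job behind it. The names 
Reservation, Job, eligible, simulate and run_scenario are all made up for 
the illustration, and reusing the 'Reservation' reason after the 
reservation has expired is a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reservation:
    start: int  # first time step of the reservation window
    end: int    # first time step after the window (reservation expired)


@dataclass
class Job:
    name: str
    time_limit: int
    uses_reservation: bool = False
    state: str = "PENDING"
    reason: str = ""
    end_time: Optional[int] = None


def eligible(job, resv, now):
    """Return (ok, reason) for a job considered on its own."""
    if job.uses_reservation:
        # Usable only inside the window. Keeping the 'Reservation' label
        # after expiry is an assumption of this sketch.
        if resv.start <= now < resv.end:
            return True, ""
        return False, "Reservation"
    # A job outside the reservation must not overlap the window.
    if now < resv.end and now + job.time_limit > resv.start:
        return False, "Resources"
    return True, ""


def simulate(resv, queue, horizon):
    """Step through time; only the first pending job may start (assumed)."""
    history = []
    running = None
    for now in range(horizon):
        if running is not None and running.end_time <= now:
            running.state = "COMPLETED"
            running = None
        head_seen = False
        for job in queue:
            if job.state != "PENDING":
                continue
            ok, reason = eligible(job, resv, now)
            if ok and not head_seen and running is None:
                job.state, job.reason = "RUNNING", ""
                job.end_time = now + job.time_limit
                running = job
            else:
                # Eligible but stuck behind another job: 'Resources'.
                job.reason = reason if not ok else "Resources"
            head_seen = True
        history.append({j.name: (j.state, j.reason) for j in queue})
    return history


def run_scenario():
    # Reservation for user A over steps 10..19; B is queued ahead of A.
    resv = Reservation(start=10, end=20)
    queue = [
        Job("B", time_limit=15),
        Job("A", time_limit=5, uses_reservation=True),
    ]
    return simulate(resv, queue, horizon=40)


history = run_scenario()
for t in (5, 12, 20, 39):
    print(t, history[t])
```

In this model, A's job is still PENDING at the last step even though B's 
job has completed, which is the stuck state described in step 5. The 
model does not cover the case of a job submitted by A after the 
reservation StartTime.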

This only occurs for a job that is queued on the reservation before the 
StartTime.  A job that is submitted by A after the reservation StartTime 
bypasses B's waiting job on the same partition and runs.  However, if A 
submits a second job, and it is set to PENDING with reason 'Resources' 
because it has to wait for the first 'A' job, then it also ends up 
waiting behind user B's job.

I am not sure what is supposed to happen here.  The "Resource Reservation 
Guide" at https://computing.llnl.gov/linux/slurm/reservations.html does 
not state what occurs when a user who is not on the reservation tries to 
run a job that uses some of the resources in the reservation.  Should 
user B's job have been rejected instead of going to the PENDING state? 
Should the priority of user A's jobs have been increased to allow them to 
go to the head of the queue?  Or am I just misunderstanding something 
about how reservations work?

        -Don Albert-
