I see the backfill scheduler starting the job with the reservation ahead of the earlier job (which I presume has a higher priority), which is what I would expect.

________________________________________
From: [email protected] [[email protected]] On Behalf Of [email protected] [[email protected]]
Sent: Wednesday, March 16, 2011 8:25 AM
To: [email protected]
Subject: [slurm-dev] Jobs requesting a reservation prevented from running
Here is a very simple scenario on SLURM 2.2.3 that prevents a job requesting use of a reservation from running.

1. A reservation is created for user 'A' for some number of nodes.
2. Prior to the start time of the reservation, user 'B' submits a job that needs one or more of the nodes in the reservation, but the time limit on his job would run into the reservation time, so his job is PENDING with reason 'Resources'.
3. Still prior to the start time of the reservation, user A submits a job using the reservation. Because the start time has not arrived, his job is also PENDING, with reason 'Reservation'.
4. When the start time of the reservation arrives, the reason on user A's job changes to 'Resources'. But because user A's job is in the queue behind user B's job, it can't start.
5. When the reservation reaches its end time, the reservation expires. Now B's job can run, so it does. When B's job finishes, A's job is next in the queue, but it can't run because the reservation has expired. And the reservation can't be cleaned up because there is a job attached to it.

So you end up with a job that won't start and an expired reservation that won't go away.

This only occurs with a job that is queued on the reservation before the StartTime. A job that A submits after the reservation StartTime bypasses B's waiting job on the same partition and runs. However, if A then submits a second job, and it is set to PENDING with reason 'Resources' because it has to wait for the first 'A' job, it also ends up waiting behind user B's job.

I am not sure what is supposed to happen here. The "Resource Reservation Guide" at "https://computing.llnl.gov/linux/slurm/reservations.html" does not state what occurs when a user who is not on the reservation tries to run a job that uses some of the resources in the reservation.

Should user B's job have been rejected instead of going to the PENDING state? Should the priority of user A's jobs have been increased to allow them to go to the head of the queue? Or am I just misunderstanding something about how reservations work?

-Don Albert-
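
For anyone following the thread, the five steps above can be sketched as a toy model. This is not SLURM's scheduling code, only a rough illustration under stated assumptions: one shared pool of nodes that runs a single job at a time, arbitrary integer time units, and hypothetical names (Reservation, Job, can_start, simulate) that do not correspond to any SLURM API. It contrasts a strict queue-order scheduler, where a blocked job at the head holds up everything behind it, with a backfill-style pass that may start a later job when the earlier one cannot run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reservation:
    start: int  # first time unit covered by the reservation
    end: int    # first time unit after the reservation expires


@dataclass
class Job:
    name: str
    duration: int           # time limit, in the same arbitrary units
    uses_reservation: bool  # True if the job was submitted against the reservation
    started_at: Optional[int] = None


def can_start(job: Job, now: int, res: Reservation, busy_until: int) -> bool:
    """Toy eligibility rule (an assumption, not SLURM's actual logic)."""
    if now < busy_until:
        return False  # the nodes are still running another job
    if job.uses_reservation:
        # A reservation job may only start while the reservation is active.
        return res.start <= now < res.end
    # Any other job must finish before the reservation starts,
    # or wait until the reservation has expired.
    return now + job.duration <= res.start or now >= res.end


def simulate(queue, res, horizon, backfill):
    """Walk time forward and return {job name: start time or None}."""
    busy_until = 0
    for now in range(horizon):
        for job in queue:  # queue is in priority (submission) order
            if job.started_at is not None:
                continue
            if can_start(job, now, res, busy_until):
                job.started_at = now
                busy_until = now + job.duration
            elif not backfill:
                break  # strict order: a blocked job blocks everything behind it
    return {job.name: job.started_at for job in queue}


def scenario():
    # B was submitted first with a time limit that runs into the reservation;
    # A was submitted second and requests the reservation.
    return [Job("B", 15, False), Job("A", 5, True)]


if __name__ == "__main__":
    res = Reservation(start=10, end=20)
    print("strict order:", simulate(scenario(), res, 40, backfill=False))
    print("backfill:    ", simulate(scenario(), res, 40, backfill=True))
```

In this sketch the strict-order run starts B only once the reservation has expired and never starts A, which mirrors steps 4 and 5. The backfill-style run starts A when the reservation begins, which is the behavior described in the reply at the top of this message.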
