Thanks, I repeated my tests more accurately and it finally worked. As you said, it was because bf_window was not configured properly.

Greetings,
Joan
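P.S. For the record, this is roughly what we ended up putting in slurm.conf (a minimal sketch, matching the 3-day MaxTime of our THIN partition; adjust the value to your own longest time limit):

  # Let backfill plan as far ahead as the longest possible job
  # (4320 minutes = 3 days); the default window is only 1440 minutes.
  SchedulerParameters=bf_window=4320

After editing the file, 'scontrol reconfigure' makes slurmctld reread the configuration.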
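P.P.S. In case anyone wants to reproduce the sleep test I mention below, this is roughly the kind of job script involved (a sketch; the partition name and node count are the ones from my original message, the rest is illustrative):

  #!/bin/bash
  #SBATCH --partition=thin
  #SBATCH --nodes=10
  #SBATCH --exclusive        # whole nodes, as in our setup
  #SBATCH --time=3-00:00:00  # 3-day limit, the partition default
  # The command finishes long before the 3-day limit; this mismatch
  # between the requested time and the real runtime is what threw off
  # backfill's start-time estimates for the pending 30-node job.
  sleep 600

Submitted with, e.g., 'sbatch test_job.sh'.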
Joan Arbona <[email protected]> writes: Hello all, We have realized that in our cluster the backfill plugin is not working as we expected. When a user submits jobs using an smaller set of nodes, always get running before jobs with a larger set of nodes, even if these have more priority. Our cluster has: - 1 partition of 40 nodes called THIN. 5 of them are requested by a reservation every day, so they are unusable. - Default Max Time of THIN partition is 3 days (4320 minutes) - Fairshare priority scheme - Backfill scheduler - Backfill parameters are all set to default. Lets assume the following circumstance: 1. User A submits jobs of 10 nodes regularly, lets say, twice or three times a day. Those nodes are exclusive for him. He does not specify any time, so job's max time is 3 days. 2. User B submited one job of 30 nodes at 26th of october. This job is waiting for user A jobs to finish. B's jobs have more priority than A's. The following table shows the output of smap: .........333333333322222222221111111111. (those numbers are JOBID in the table below) JOBID PARTITION USER NAME ST TIME NODES NODELIST 1 thin A gromac R 1-00:11:19 10 foner[132-141] 2 thin A gromac R 21:33:49 10 foner[122-131] 3 thin A gromac R 13:31:49 10 foner[112-121] 4 thin B DART_c PD 00:00:00 30 waiting... 5 thin A gromac PD 00:00:00 10 waiting... Theorically and due to backfill , when user A finishes any of his running jobs (1,2 or 3), although job 4 does not fit in the cluster the schedule should not put job 5 to run. The reason is that job 5 it has less priority than job 4, and backfill does not alter the time of jobs with more prioirty. It should wait until other A's jobs finish and then put job 4 to run. Well, this does not happen. As user A is submitting jobs all the time, they're filling all holes that user A's jobs are leaving, because job 4 doesn't fit (it needs 30 nodes, not 10). Then, job 4 will never start until user A stops sending jobs. I have tried it in a test environment using sleeps. I have realized that I get the same behavior when submitting jobs with more slurm max time (--time) than the duration of the command (sleep time). Also, I have tried to adjust parameters like bf_window, that is set to one day by default, without luck. Does anybody knows why does this happen? Why in this case the backfill principle of not altering jobs with more priority does not apply? Is there a way to solve this? Thanks, Joan Attaching slurm.conf and the output of squeue: squeue --start JOBID PARTITION NAME USER ST START_TIME NODES NODELIST(REASON) 5 thin gromacs_ A PD 2014-11-06T12:06:19 10 (Priority) 4 thin DART_cyc B PD 2014-11-06T22:45:49 30 (Resources) In fact, job 4's start_time has been changing all the time when user A's jobs get running. Maybe backfill can't calculate start_time accuratelly? One thing you might need to look at is the value of the scheduler parameter 'bf_window'. The default value is 1440 minutes (1 day) but it should probably be as large as your tMaxTime, i.e. SchedulerParameters=bf_window=4320 See 'man slurm.conf' for more details. Cheers, Loris -- Joan Francesc Arbona Ext. 2582 Centre de Tecnologies de la Informació Universitat de les Illes Balears http://jfdeu.wordpress.com https://mallorca.guifi.net
