>>>>> "B" == Fitzpatrick, Ben <ben.fitzpatr...@metoffice.gov.uk> writes:
Hi Ben,

thanks a lot for pointing this out. I'll check whether 15.08.12 really
fixes the issue.

Cheers,

Roland

-------
http://www.q-leap.com / http://qlustar.com
    --- HPC / Storage / Cloud Linux Cluster OS ---

B> Hi Roland,

B> We had a similar problem, and I think it is fixed by 15.08.7 upwards.
B> Probably this fix in 15.08.7?

B>  -- Backfill scheduling fix: If a job can't be started due to a "group"
B>     resource limit, rather than reserve resources for it when the next
B>     job ends, don't reserve any resources for it.

B> Cheers,
B> Ben

B> -----Original Message-----
B> From: Roland Fehrenbacher [mailto:r...@q-leap.de]
B> Sent: 08 July 2016 15:07
B> To: slurm-dev
B> Subject: [slurm-dev] Queue blocked because of resource limit / priority

B> Hi,

B> on 15.08.5 we have a config where students are only allowed a maximum
B> of 10 cores by using GrpTRES / cpu=10. We also have activated fairshare
B> (using FAIR_TREE).

B> Now when student 1 submits jobs beyond his limit of 10 cores, they are
B> correctly put into state "pending" (showing AssocGrpCpuLimit). So far
B> so good. The problem arises when another, non-student user (user 1) with
B> a much lower fairshare priority value submits a job: that user's job
B> doesn't start even though there are free resources. It sits in the queue
B> because its overall priority is lower than that of the student's pending
B> job. Is this expected behavior, or shouldn't the job of user 1 start
B> regardless of its lower priority, since student 1's job can't start
B> anyway because of the 10-core limit? This problem essentially blocks the
B> cluster as soon as students submit beyond their limit.

B> Thanks,

B> Roland

B> -------
B> http://www.q-leap.com / http://qlustar.com
B>     --- HPC / Storage / Cloud Linux Cluster OS ---
--
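
P.S. For anyone hitting the same thing: a minimal sketch of the kind of
setup described above. The account name "students" and the exact limit
value are just placeholders for illustration, not taken from our real
configuration:

    # slurm.conf -- fairshare via the multifactor plugin with Fair Tree,
    # and backfill scheduling (where the 15.08.7 fix applies)
    PriorityType=priority/multifactor
    PriorityFlags=FAIR_TREE
    SchedulerType=sched/backfill
    # Association limits such as GrpTRES are only enforced with this set:
    AccountingStorageEnforce=limits

    # Cap the (placeholder) "students" account at 10 CPUs in total,
    # so further student jobs go pending with reason AssocGrpCpuLimit:
    sacctmgr modify account students set GrpTRES=cpu=10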