>>>>> "B" == Fitzpatrick, Ben <ben.fitzpatr...@metoffice.gov.uk> writes:
Hi Ben,

thanks a lot for pointing this out. I'll check whether 15.08.12 really
fixes the issue.

Cheers,

Roland

-------
http://www.q-leap.com / http://qlustar.com
    --- HPC / Storage / Cloud Linux Cluster OS ---

B> Hi Roland,

B> We had a similar problem, and I think it is fixed by 15.08.7 upwards.
B> Probably this fix in 15.08.7?

B>  -- Backfill scheduling fix: If a job can't be started due to a "group"
B>     resource limit, rather than reserve resources for it when the next
B>     job ends, don't reserve any resources for it.

B> Cheers,
B> Ben

B> -----Original Message-----
B> From: Roland Fehrenbacher [mailto:r...@q-leap.de]
B> Sent: 08 July 2016 15:07
B> To: slurm-dev
B> Subject: [slurm-dev] Queue blocked because of resource limit / priority

B> Hi,

B> on 15.08.5 we have a config where students are only allowed a maximum
B> of 10 cores by using GrpTRES / cpu=10. We also have activated fairshare
B> (using FAIR_TREE).

B> Now when student 1 submits jobs beyond his limit of 10 cores, they are
B> correctly put into state "pending" (showing AssocGrpCpuLimit). So far
B> so good. The problem arises when another, non-student user (user 1) with
B> a much lower fairshare priority value submits a job: that user's job
B> doesn't start even though there are free resources. It sits in the queue
B> because its overall priority is lower than that of the student's pending
B> job. Is this expected behavior, or shouldn't the job of user 1 start
B> regardless of its lower priority, since student 1's job can't start
B> anyway because of the 10-core limit? This problem essentially blocks the
B> cluster as soon as students submit beyond their limit.

B> Thanks,

B> Roland

B> -------
B> http://www.q-leap.com / http://qlustar.com
B>     --- HPC / Storage / Cloud Linux Cluster OS ---
--
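
P.S. For anyone hitting the same thing: a minimal sketch of the kind of
setup described above. The account name "students" and the exact limit
value are just placeholders for illustration, not taken from our real
configuration:

    # slurm.conf -- fairshare via the multifactor plugin with Fair Tree,
    # and backfill scheduling (where the 15.08.7 fix applies)
    PriorityType=priority/multifactor
    PriorityFlags=FAIR_TREE
    SchedulerType=sched/backfill
    # Association limits such as GrpTRES are only enforced with this set:
    AccountingStorageEnforce=limits

    # Cap the (placeholder) "students" account at 10 CPUs in total,
    # so further student jobs go pending with reason AssocGrpCpuLimit:
    sacctmgr modify account students set GrpTRES=cpu=10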