Hi Paul,

What's the max cycle latency of the main scheduling cycle on your system? You can get it with the sdiag command.
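For reference, something like the following pulls that number out of sdiag output. The sample text here is only illustrative (the exact field layout can differ across Slurm versions), so treat the awk pattern as a sketch:

```shell
# Illustrative excerpt of "Main schedule statistics" from sdiag;
# on a live system you would pipe the real command: sdiag | awk ...
sdiag_sample='Main schedule statistics (microseconds):
        Last cycle:   512
        Max cycle:    18342
        Total cycles: 1024'

# Print the max latency (microseconds) of the main scheduling loop
printf '%s\n' "$sdiag_sample" | awk '/Max cycle:/ {print $3}'
```

On a real cluster the equivalent would be `sdiag | awk '/Max cycle:/ {print $3}'`.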
I've been working on a different mechanism for walking the job queue. It would help sites with a very large number of queued jobs, so it probably makes more sense for HTC than for HPC. It also makes sense for sites that use several partitions and have users submitting jobs to more than one partition. Instead of one global queue, this solution would create one queue per partition, each holding only a configurable number of the highest-priority jobs. The scheduler would then take the highest-priority job from the head of each queue.

Right now the scheduler is not efficient for HTC sites with tens of thousands, or even hundreds of thousands, of queued jobs. When users rely heavily on dependencies and submit jobs to more than one partition, there is a lot of work for the scheduler to do. In fact, if you have some special partition that is seldom used, the scheduler ends up walking the whole queue, even if you try to minimize the problem with scheduler parameters. Even if the per-job cost is small, this can lead to high latencies once tens of thousands of jobs are queued. We see this from time to time, and Slurm can become unresponsive while it is trying to schedule jobs.

Slurm was designed for HPC centers, where such a high number of jobs is unlikely. But if Slurm is being used in other types of centers, like those doing genomics, it would be really useful to have another way of working with queued jobs. Maybe this issue should be discussed at the Slurm Users Meeting next September in Lugano.

On 02/10/2014 03:49 PM, Paul Edmon wrote:
>
> How difficult would it be to put a switch into SLURM where instead of
> considering the global priority chain it would instead consider each
> partition wholly independently with respect to both backfill and main
> scheduling loop? In our environment we have many partitions. We also
> have people submitting 1000's of jobs to those partitions and
> partitions are at different priorities.
> Since SLURM (even in
> backfill) runs down the priority chain, higher-priority queues can
> impact scheduling in lower-priority queues even if those queues do not
> overlap in terms of hardware. It would be better in our case if SLURM
> considered each partition as a wholly independent scheduling run and
> did all of them, both for backfill and the main loop.
>
> I know there is the bf_max_job_part option in the backfill loop, but it
> would be better to just have each partition be independent, as that
> way you don't get any cross talk. Can this be done? It would be
> incredibly helpful for our environment.
>
> -Paul Edmon-
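For what it's worth, the per-partition mechanism I described above could be sketched roughly like this in plain Python. This is only an illustration of the idea, not Slurm code; all the names (build_partition_queues, next_job, top_n) are made up:

```python
from collections import defaultdict

def build_partition_queues(jobs, top_n):
    """Group queued jobs by partition, keeping only the top_n
    highest-priority jobs in each per-partition queue.
    jobs: iterable of (priority, job_id, partition); higher priority wins."""
    by_part = defaultdict(list)
    for prio, jid, part in jobs:
        by_part[part].append((prio, jid))
    # Each queue is sorted by descending priority and truncated to top_n,
    # so the scheduler never walks more than top_n jobs per partition.
    return {part: sorted(q, reverse=True)[:top_n]
            for part, q in by_part.items()}

def next_job(queues):
    """One scheduling step: take the highest-priority job among the
    heads of all per-partition queues."""
    heads = [(q[0], part) for part, q in queues.items() if q]
    if not heads:
        return None
    (prio, jid), part = max(heads)
    queues[part].pop(0)
    return (prio, jid, part)
```

The point is that each scheduling step only compares the queue heads (one per partition), so a seldom-used partition with a deep backlog no longer forces a walk over the whole global queue.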
