Hi Paul,

What's the max cycle latency of the main scheduling cycle on your system? You can get it with the sdiag command.
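For reference, something like the following pulls that number out of sdiag output. The sample text here is only illustrative (the exact field layout can differ across Slurm versions), so treat the awk pattern as a sketch:

```shell
# Illustrative excerpt of "Main schedule statistics" from sdiag;
# on a live system you would pipe the real command: sdiag | awk ...
sdiag_sample='Main schedule statistics (microseconds):
        Last cycle:   512
        Max cycle:    18342
        Total cycles: 1024'

# Print the max latency (microseconds) of the main scheduling loop
printf '%s\n' "$sdiag_sample" | awk '/Max cycle:/ {print $3}'
```

On a real cluster the equivalent would be `sdiag | awk '/Max cycle:/ {print $3}'`.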
I've been working on a different mechanism for walking the job queue. It would help sites with a very large number of queued jobs, so it probably makes more sense for HTC than for HPC. It also makes sense for sites that use several partitions and have users submitting jobs to more than one partition. Instead of one global queue, this solution would create one queue per partition, each holding only a configurable number of the highest-priority jobs. The scheduler would then take the highest-priority job from the head of each queue.

Right now the scheduler is not efficient for HTC sites with tens of thousands, or even hundreds of thousands, of queued jobs. When users rely heavily on dependencies and submit jobs to more than one partition, there is a lot of work for the scheduler to do. In fact, if you have some special partition that is seldom used, the scheduler ends up walking the whole queue, even if you try to minimize the problem with scheduler parameters. Even if the per-job cost is small, this can lead to high latencies once tens of thousands of jobs are queued. We see this from time to time, and Slurm can become unresponsive while it is trying to schedule jobs.

Slurm was designed for HPC centers, where such a high number of jobs is unlikely. But if Slurm is being used in other types of centers, like those doing genomics, it would be really useful to have another way of working with queued jobs. Maybe this issue should be discussed at the Slurm Users Meeting next September in Lugano.

On 02/10/2014 03:49 PM, Paul Edmon wrote:
>
> How difficult would it be to put a switch into SLURM where instead of
> considering the global priority chain it would instead consider each
> partition wholly independently with respect to both backfill and main
> scheduling loop? In our environment we have many partitions. We also
> have people submitting 1000's of jobs to those partitions and
> partitions are at different priorities.
> Since SLURM (even in
> backfill) runs down the priority chain, higher-priority queues can
> impact scheduling in lower-priority queues even if those queues do not
> overlap in terms of hardware. It would be better in our case if SLURM
> considered each partition as a wholly independent scheduling run and
> did all of them, both for backfill and the main loop.
>
> I know there is the bf_max_job_part option in the backfill loop, but it
> would be better to just have each partition be independent, as that
> way you don't get any cross talk. Can this be done? It would be
> incredibly helpful for our environment.
>
> -Paul Edmon-
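For what it's worth, the per-partition mechanism I described above could be sketched roughly like this in plain Python. This is only an illustration of the idea, not Slurm code; all the names (build_partition_queues, next_job, top_n) are made up:

```python
from collections import defaultdict

def build_partition_queues(jobs, top_n):
    """Group queued jobs by partition, keeping only the top_n
    highest-priority jobs in each per-partition queue.
    jobs: iterable of (priority, job_id, partition); higher priority wins."""
    by_part = defaultdict(list)
    for prio, jid, part in jobs:
        by_part[part].append((prio, jid))
    # Each queue is sorted by descending priority and truncated to top_n,
    # so the scheduler never walks more than top_n jobs per partition.
    return {part: sorted(q, reverse=True)[:top_n]
            for part, q in by_part.items()}

def next_job(queues):
    """One scheduling step: take the highest-priority job among the
    heads of all per-partition queues."""
    heads = [(q[0], part) for part, q in queues.items() if q]
    if not heads:
        return None
    (prio, jid), part = max(heads)
    queues[part].pop(0)
    return (prio, jid, part)
```

The point is that each scheduling step only compares the queue heads (one per partition), so a seldom-used partition with a deep backlog no longer forces a walk over the whole global queue.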
