You might be accustomed to soft limits, but I would strongly encourage you to look at Slurm's Quality Of Service (QOS) capability as a better solution. QOS lets you establish different job limits and supports job preemption. Preemption is especially important because you can preempt lower priority jobs on demand, rather than finding your machine full of low priority jobs on Monday morning just because the system went idle over the weekend. A typical configuration would establish a "standby" QOS with large time/size limits whose jobs are preemptable by normal QOS jobs on demand. See:
http://slurm.schedmd.com/qos.html
http://slurm.schedmd.com/preempt.html

Quoting Filippo Spiga <[email protected]>:

Dear Moe, Dear all,

I wonder if someone has already started to work on implementing soft limits in SLURM. Here at HPCS we are very interested in this functionality and we are keen to start developing it as soon as possible.

Best Regards,
Filippo

--
Mr. Filippo SPIGA, M.Sc. - HPC Application Specialist
High Performance Computing Service, University of Cambridge (UK)
http://www.hpc.cam.ac.uk/ ~ http://filippospiga.me ~ skype: filippo.spiga

«Nobody will drive us out of Cantor's paradise.» ~ David Hilbert



On Jul 17, 2013, at 6:46 PM, Moe Jette <[email protected]> wrote:

Slurm has no soft limits.

Quoting Wojciech Turek <[email protected]>:

Moe, thank you for your prompt reply. We are already utilizing various
slurm resource limits and we also use slurm QOS facilities in our
scheduling model. We do not use preemption in our current model.
My question was specifically about slurm support of configuring soft and
hard resource limits. Please see scenario below:

Cluster has 200 nodes
User A and User B use the same QOS and they have the same resource limits.
User A submits 1000 single node jobs and has maxjob limit set to 100
User B submits 200 single node jobs and has maxjob limit set to 100

When the 200 jobs of User B complete, User A can still have only 100
running jobs, so the cluster is 50% underutilized.

A soft/hard limit feature would help keep the cluster fully utilized as
long as there are jobs in the queue.

If User A and User B in the above scenario had limits
maxjob1=100[200], where 100 is the soft and 200 the hard limit, then the
cluster would be fully utilized as long as there are jobs in the queue,
because User A could run 200 jobs if there were no demand from other users.
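
To make the arithmetic of this scenario concrete, here is a minimal Python
sketch. It is a toy model, not Slurm or Moab behaviour: the function name
running_jobs, the two-pass allocation, and the assumption that every job
needs exactly one node are all illustrative choices made for this example.

```python
def running_jobs(queued, soft, hard, nodes):
    """Toy model of per-user job limits; every job uses exactly one node.

    queued: dict mapping user -> number of jobs waiting or running
    soft:   per-user limit that always applies (assumed soft <= hard)
    hard:   per-user ceiling, reachable only when nodes would otherwise idle
    Returns (dict of running jobs per user, number of idle nodes).
    """
    running = {}
    free = nodes
    # Pass 1: every user may run up to the soft limit.
    for user, n in queued.items():
        running[user] = min(n, soft, free)
        free -= running[user]
    # Pass 2: idle nodes go to users with jobs left, up to the hard limit.
    for user, n in queued.items():
        extra = min(n - running[user], hard - running[user], free)
        running[user] += extra
        free -= extra
    return running, free


# After User B's jobs have finished, only User A has work left.
queue = {"A": 1000, "B": 0}
hard_only = running_jobs(queue, soft=100, hard=100, nodes=200)
soft_hard = running_jobs(queue, soft=100, hard=200, nodes=200)
print("hard limit only:", hard_only)
print("soft/hard limit:", soft_hard)
```

Under these assumptions the hard-only case leaves 100 of the 200 nodes
idle, while the 100[200] case lets User A fill all 200 nodes.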

Can this soft/hard limit functionality currently be achieved in SLURM?

If this is not currently possible in slurm, I would like to know how many
people would consider this a useful feature and whether it is worth putting
some development effort into it.

Best regards,

Wojciech


On 17 July 2013 16:40, Moe Jette <[email protected]> wrote:

Slurm has an assortment of hard limits available:
http://www.schedmd.com/slurmdocs/resource_limits.html

Slurm also supports various Qualities Of Service (QOS):
http://www.schedmd.com/slurmdocs/qos.html

Plus job preemption:
http://www.schedmd.com/slurmdocs/preempt.html

In a typical scenario, there would be a low priority QOS, say "standby",
whose jobs can be preempted as needed for higher priority work. Another
option is a low priority job queue (partition), again with preemption.
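
The effect of such a preemptable "standby" QOS can be illustrated with a
small Python sketch. This is only a toy model under stated assumptions:
single-node jobs, a 200-node machine, and a hypothetical helper named
place_jobs. It does not use or reproduce any Slurm API.

```python
def place_jobs(nodes, normal_demand, standby_demand):
    """Toy model: normal jobs get nodes first, standby jobs fill the rest.

    Every job is assumed to use exactly one node.
    Returns (running normal jobs, running standby jobs).
    """
    normal = min(normal_demand, nodes)
    standby = min(standby_demand, nodes - normal)
    return normal, standby


NODES = 200  # assumed cluster size for the illustration

# Weekend: no normal work is queued, so standby jobs fill the machine.
weekend = place_jobs(NODES, normal_demand=0, standby_demand=500)

# Monday: 150 normal jobs arrive; standby jobs are preempted to make room.
monday = place_jobs(NODES, normal_demand=150, standby_demand=500)

preempted = weekend[1] - monday[1]
print("weekend (normal, standby):", weekend)
print("monday  (normal, standby):", monday)
print("standby jobs preempted:", preempted)
```

In this model the machine stays full in both situations, and the normal
jobs never wait behind standby work.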


Quoting Wojciech Turek <[email protected]>:

We are migrating our scheduling system from torque/maui/moab to slurm and
there is a particularly important moab/maui feature [hard and soft limits]
which does not seem to be implemented yet in slurm. Please see the link
below for a description of that feature:
http://docs.adaptivecomputing.com/mwm/archive/6-0/6.2throttlingpolicies.php#limits

My questions are:
a) Am I missing something, and the soft/hard limits feature actually is
implemented in slurm?
b) If this feature does not exist, is there an alternative way of doing
this in slurm?
c) If this feature does not exist, would implementing it in slurm be
easy or difficult?

Caveat:
I would like to avoid cronjob-like solutions that would change limits on
the fly according to cluster state.

Many thanks for all your help

--
Wojciech Turek

Senior System Architect

High Performance Computing Service
University of Cambridge











