This email actually reminded me of what we run here. We've been
operating with Slurm for the past 3 years in an environment similar to
your own. We schedule about 50k cores across 90 partitions on a
heterogeneous architecture: AMD and Intel, high-memory versus
low-memory nodes, and GPUs. Slurm has handled it all, and a lot of the
lessons we learned have been pushed back to the community. Thus the
current version of Slurm should be able to handle all that you have
defined there.
The only tricky part that I am aware of may be the Kerberos piece. We
use LDAP for our auth on the boxes, and Slurm does have a PAM module
for controlling who gets access. That plus cgroups should take care of
your security concerns.
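For readers unfamiliar with the pieces mentioned above: the PAM module is pam_slurm_adopt (shipped in Slurm's contribs), and the cgroup confinement is enabled via slurm.conf and cgroup.conf. A minimal sketch of what that setup might look like (paths and option choices are illustrative, not taken from this thread):

```
# /etc/pam.d/sshd -- deny SSH to a node unless the user has a job running there
account    required     pam_slurm_adopt.so

# slurm.conf -- track and confine tasks with cgroups
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

# cgroup.conf -- fence each job into its allocated cores, memory, and devices
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
```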
Anyways, suffice it to say Slurm can work for your environment, as
your environment is fairly similar to ours.
-Paul Edmon-
On 05/24/2016 07:33 AM, Šimon Tóth wrote:
Architectural constraints of Slurm
Hello,
I'm looking for information on Slurm's architectural constraints, as
we are considering a switch to Slurm.
We are currently running a heavily modified version of Torque with a
custom scheduler.
Our system (~13k CPU cores) is heavily heterogeneous (~40 clusters)
with complex operational constraints. The system generally handles
10k-50k enqueued jobs.
We are currently scheduling CPU cores, memory, GPU cards, scratch
space (local, SSD, and NFS, with different machines having access to
different combinations of these), and software licenses.
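(For reference, the Slurm-side equivalents of these would typically be generic resources (GRES) for GPUs and local scratch, and cluster-wide countable Licenses. A hypothetical slurm.conf fragment, with made-up node names and counts:

```
# slurm.conf -- illustrative fragment, names and counts are hypothetical
GresTypes=gpu
NodeName=gpu[01-08] CPUs=32 RealMemory=256000 Gres=gpu:4
Licenses=matlab:10,ansys:4
```

Jobs would then request these with options such as `--gres=gpu:2` and `--licenses=matlab:1`.)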
Machines are described by a set of physical and software properties
(which users can request) and by their speed (users can request ranges
of machine performance).
Jobs carry complex requests. Each job can request sets of machines,
where each set carries a different specification. Each set is
described by the amount of resources requested, machine properties
(negative specification is supported, to select nodes that do not have
a specific property), and the number of nodes with that specification.
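(In Slurm terms, a multi-set request with property constraints, including negation, could be sketched as a heterogeneous job, where node properties map to features and `--constraint` supports `&` and `!` operators. All names below are hypothetical:

```
#!/bin/bash
# Component 1: 4 nodes that are Intel but NOT high-memory
#SBATCH --nodes=4 --constraint="intel&!highmem"
#SBATCH hetjob
# Component 2: 2 AMD nodes with 2 GPUs each
#SBATCH --nodes=2 --constraint=amd --gres=gpu:2

# Launch one step per component
srun --het-group=0 ./cpu_part : --het-group=1 ./gpu_part
```

Each `hetjob` component carries its own resource specification, which is close in spirit to the per-set specifications described above.)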
Nodes can be allocated exclusively (in which case the specification
describes the minimum amount) or shared, with each resource still
being allocated exclusively (jobs cannot overlap in cores, memory, GPU
cards, ...).
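(This exclusive-versus-shared distinction maps fairly directly onto Slurm's `--exclusive` flag versus its consumable-resource selector, under which jobs share nodes while each core, memory block, and GRES is still handed out to exactly one job. A sketch, with illustrative values:

```
# slurm.conf -- allocate individual cores/memory/GRES rather than whole nodes
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
```

On the submission side, `sbatch --exclusive` requests whole nodes, while an ordinary `sbatch -n 8 --mem=32G` job shares nodes but still gets its cores and memory dedicated to it.)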
We are relying on Kerberos, which is used for both identification and
authentication. Each running task inside a job has a nanny process
that periodically refreshes the Kerberos ticket for that particular
process.
For scalability reasons, our scheduler relies on the server to keep an
up-to-date and complete state of the system. The server therefore
tracks the current resource allocation state for each resource on
each node.
Please let me know if this sounds like something Slurm could handle,
or if there are any limitations in Slurm that would make this
impossible to support.
Sincerely,
Simon Toth