This email actually reminded me of what we run here. We've been operating with Slurm for the past 3 years in an environment similar to your own. We schedule about 50k cores across 90 partitions on a heterogeneous mix of AMD and Intel CPUs, high- and low-memory nodes, and GPUs. Slurm has handled it all, and a lot of the lessons we learned have been pushed back to the community, so the current version of Slurm should be able to handle everything you have described.

The only tricky part that I am aware of may be the Kerberos piece. We use LDAP for auth on our boxes, and Slurm does have a PAM module for controlling who gets access to nodes. That plus cgroups should take care of your security concerns.
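For concreteness, the access-control and containment setup boils down to a handful of lines; the option and module names below are from the versions we run, so double-check them against your own release:

```
# slurm.conf -- track and confine job processes with cgroups
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

# cgroup.conf -- enforce the allocated cores/memory/devices
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes

# /etc/pam.d/sshd -- only let users onto nodes where they hold an allocation
account    required    pam_slurm.so
```

The cgroup plugins keep shared-node jobs from stepping on each other's cores and memory, and the PAM module closes the ssh side door.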

Anyway, suffice it to say Slurm can work for your environment, as it is fairly similar to ours.

-Paul Edmon-

On 05/24/2016 07:33 AM, Šimon Tóth wrote:
Architectural constraints of Slurm
Hello,

I'm looking for information on Slurm architectural constraints as we are considering a switch to Slurm.

We are currently running a heavily modified version of Torque with a custom scheduler.

Our system (~13k CPU cores) is heavily heterogeneous (~40 clusters) with complex operational constraints. The system generally handles 10k-50k enqueued jobs.

We are currently scheduling CPU cores, memory, GPU cards, scratch space (local, SSD, and NFS, with different machines having access to different combinations of these), and software licenses.
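[For readers mapping these onto Slurm's vocabulary: GPUs and local scratch are typically modeled as generic resources (GRES), and license counts are tracked by the controller. A rough slurm.conf sketch; the node, GRES, and license names here are invented for illustration:]

```
# slurm.conf (illustrative names only)
GresTypes=gpu,scratch_ssd
Licenses=matlab:20,ansys:4
NodeName=gpu[01-10] CPUs=16 RealMemory=64000 Gres=gpu:2,scratch_ssd:800 Feature=intel,ssd
```

Jobs would then request these with options like --gres=gpu:1 and --licenses=matlab:2.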

Machines are described by a set of physical and software properties (which users can request) and by their speed (users can request ranges of machine performance).

Jobs carry complex requests. Each job can request sets of machines, where each set carries a different specification. Each set is described by the amount of resources requested, machine properties (negative specification is supported, i.e. requesting nodes that do not have a given property), and the number of nodes matching that specification. Nodes can be allocated exclusively (in which case the specification describes the minimum amount) or shared, with each resource still being allocated exclusively (jobs cannot overlap in cores, memory, GPU cards, ...).
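[For comparison, a single such set would be expressed in Slurm through batch directives roughly like the following; the feature names are invented, and whether Slurm's feature expressions support negation is something to verify against the documentation:]

```bash
#!/bin/bash
#SBATCH --nodes=4                     # number of nodes matching this specification
#SBATCH --constraint="intel&highmem"  # node feature expression (AND of two features)
#SBATCH --exclusive                   # whole-node allocation; omit for shared nodes
#SBATCH --gres=gpu:2                  # two GPU cards per node
#SBATCH --mem=32G                     # memory per node
srun ./my_app                         # hypothetical application binary
```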

We are relying on Kerberos, which is used for both identification and authentication. Each running task inside a job has a nanny process that periodically refreshes the Kerberos ticket for that particular process.

For scalability reasons, our scheduler relies on the server to keep an up-to-date and complete state of the system. The server therefore maintains the current allocation state for each of the resources on each of the nodes.

Please let me know if this sounds like something Slurm could handle, or if there are any limitations in Slurm that would make this impossible to support.

Sincerely,
Simon Toth
