Our CentOS cluster uses a shared installation for all the compute nodes, but separate local installations for the head node and backup head node. The compute nodes share binaries and configuration files via NFS, but keep separate logs in their own local /var/log and the startup script in their local init.d.

The head node and backup head node are independent of each other except for shared state information. See "High Availability" in the SLURM docs:

    http://slurm.schedmd.com/quickstart_admin.html#Config

If NFS is properly configured (hard mounts, which are the Linux default), clients will retry indefinitely and continue where they left off once the server returns. An NFS server failure should therefore not result in loss of data, as long as the server comes back online while the client is still trying to complete its operations.
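As a rough way to check this on a node, here is a minimal sketch that scans text in /proc/mounts format for NFS filesystems mounted with the "soft" option, which gives up after a timeout instead of waiting. It assumes "properly configured" means hard mounts. The function name, host names, and paths in the sample are made up for illustration.

```python
def soft_nfs_mounts(mounts_text):
    """Return mount points of NFS filesystems mounted with the 'soft' option."""
    risky = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("nfs", "nfs4") and "soft" in options.split(","):
            risky.append(mount_point)
    return risky


# Hypothetical sample in /proc/mounts format; hosts and paths are made up.
SAMPLE = """\
head1:/usr/local/slurm /usr/local/slurm nfs rw,relatime,hard,proto=tcp 0 0
head1:/scratch /scratch nfs rw,relatime,soft,timeo=30 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

print(soft_nfs_mounts(SAMPLE))
```

On a live node you could pass in open("/proc/mounts").read() instead of the sample text. An empty list means no soft NFS mounts were found.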

There are pros and cons to keeping the head node and backup head node state information on a separate server. With a separate server, each head node can operate normally while the other is down. However, if the separate server goes down, neither head node can operate normally until it comes back up. And with 3 servers, it is more likely that one of them fails than with 2.
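To put a rough number on the 3-versus-2 point, here is a small sketch. It assumes every server fails independently with the same probability over some period. The 1% figure is purely hypothetical and not a measurement from our cluster.

```python
def p_any_failure(p_single, n_servers):
    """Probability that at least one of n independent servers fails."""
    return 1.0 - (1.0 - p_single) ** n_servers


P_SINGLE = 0.01  # hypothetical per-server failure probability

two = p_any_failure(P_SINGLE, 2)    # head node + backup head node
three = p_any_failure(P_SINGLE, 3)  # plus a separate state server

print(f"2 servers: {two:.4%}")
print(f"3 servers: {three:.4%}")
```

This only counts how often something breaks. It says nothing about which failure actually stops the scheduler, which is the other half of the trade-off.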

If state information is kept on the primary head node, the backup head node will be blocked from updating state information while the primary is down, and vice versa. This shouldn't be a problem as long as the outage is brief, such as a reboot required for system updates. I routinely reboot our primary head node for yum updates (after verifying that the backup head node is running normally).
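The "verify the backup first" step can be scripted. Below is a minimal sketch that parses the text printed by "scontrol ping" and reports whether the backup controller is up. The two output formats it recognizes are my assumption about what different SLURM versions print, so compare them with what your installation actually prints. The function names and host names are made up.

```python
import re


def controller_states(ping_output):
    """Map controller role ('primary', 'backup') to its reported state."""
    # Assumed combined form:
    #   Slurmctld(primary/backup) at head1/head2 are UP/DOWN
    combined = re.search(
        r"Slurmctld\(primary/backup\) at \S+ are (\w+)/(\w+)", ping_output
    )
    if combined:
        return {"primary": combined.group(1), "backup": combined.group(2)}
    # Assumed per-line form:
    #   Slurmctld(primary) at head1 is UP
    states = {}
    for role, state in re.findall(
        r"Slurmctld\((\w+)\) at \S+ is (\w+)", ping_output
    ):
        states[role] = state
    return states


def safe_to_reboot_primary(ping_output):
    """True only if the backup controller is reported as UP."""
    return controller_states(ping_output).get("backup") == "UP"


# Hypothetical sample outputs; host names are made up.
print(safe_to_reboot_primary("Slurmctld(primary/backup) at head1/head2 are UP/UP"))
print(safe_to_reboot_primary("Slurmctld(primary/backup) at head1/head2 are UP/DOWN"))
```

On a real head node the input would come from something like subprocess.run(["scontrol", "ping"], capture_output=True, text=True).stdout. If the output matches neither assumed format, the function returns False, so an unrecognized format blocks the reboot.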

In any case, the server where the state information is kept should be *very* reliable. We keep ours on the primary head node, which uses a hardware RAID1 for the boot disk and has very strict limits to keep the load to a minimum. Memory use and processes are both limited via /etc/security/limits.d/ and the head node has no access to the computational software installed on the cluster, so users aren't tempted to run "quick" jobs on the head node outside the scheduler.

It would be a nice feature if the head node and backup head node could be completely independent of each other, but I imagine that keeping them synchronized would require some challenging coding and the real benefit would be minimal.

Regards,

    Jason

On 07/25/14 03:33, Bastian Krüger wrote:
Using the same (mounted) slurm installation on all nodes
I recently began working with a cluster that consists of 1 control node and several computation nodes. It was set up a couple of years ago by someone else. In the current setup, there is only one actual slurm installation, which is located on the control node in /usr/local/slurm. All the other nodes just mount that directory to their /usr/local/slurm. The only thing that is copied between the nodes is the service startup script in /etc/init.d.

The question is whether that is a good idea or not. I realize that if the control node fails, all the other nodes lose the mounted slurm directory. But how crucial is that?

Also, I'm thinking about adding a backup control node. This node has to share a directory with the first control node. Is there any advice on where this directory should be located? Could it live on the backup control node or would it be better to use a separate server?
