Our CentOS cluster uses a shared installation for all the compute nodes,
but separate local installations for the head node and backup head
node. The compute nodes share binaries and configuration files via NFS,
but each keeps its own logs in its local /var/log and its own copy of
the startup script in its local init.d.
The head node and backup head node are independent of each other except
for shared state information. See "High Availability" in the SLURM docs:
http://slurm.schedmd.com/quickstart_admin.html#Config
If NFS is properly configured, clients will wait indefinitely and
continue where they left off, so an NFS server failure should not result
in loss of data as long as the server comes back online while the client
is still trying to complete its operations.
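As a rough way to check this on a client, here is a minimal Python
sketch that looks at one line in /proc/mounts format and reports whether
an NFS mount is "hard" (retry indefinitely, the NFS default) or "soft"
(give up after a timeout). The function name, the server name "head" and
the sample lines are made up for illustration; they are not taken from
our cluster.

```python
def nfs_retry_mode(mounts_line):
    """Return 'hard' or 'soft' for an NFS entry, or None for anything else.

    mounts_line is one line in /proc/mounts format:
        device mountpoint fstype options dump pass
    """
    fields = mounts_line.split()
    if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
        return None  # malformed line or not an NFS mount
    options = fields[3].split(",")
    # "soft" must be requested explicitly; without it the client
    # retries indefinitely, which is the "hard" behaviour.
    return "soft" if "soft" in options else "hard"


# Hypothetical sample lines (server name and options are assumptions).
SAMPLE_HARD = "head:/usr/local/slurm /usr/local/slurm nfs rw,relatime,vers=3,hard,proto=tcp 0 0"
SAMPLE_SOFT = "head:/usr/local/slurm /usr/local/slurm nfs rw,soft,timeo=30 0 0"
```

To check a real client, the same function could be applied to each line
read from /proc/mounts. A "soft" result for the SLURM directory would
mean operations can fail with an I/O error instead of waiting for the
server to come back.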
There are pros and cons to a separate server for the head node and
backup head node state information. With a separate server, each head
node can operate normally while the other is down. However, if the
separate server goes down, neither head node can operate normally until
it comes back up. A single server failure is more likely with 3 servers
than with 2.
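To put a rough number on that last point, here is a small Python sketch.
It assumes every server fails independently with the same probability
during some period, which is a simplification, and the 1% figure is a
made-up illustration rather than a measurement.

```python
def p_any_failure(p_single, n_servers):
    """Probability that at least one of n_servers fails.

    Assumes independent failures, each with probability p_single.
    """
    return 1.0 - (1.0 - p_single) ** n_servers


# Hypothetical per-server failure probability for the period of interest.
P_SINGLE = 0.01

two_servers = p_any_failure(P_SINGLE, 2)    # head + backup head
three_servers = p_any_failure(P_SINGLE, 3)  # plus a separate state server
```

Under these assumptions the chance of seeing at least one failure goes
from about 2% with 2 servers to about 3% with 3. Whether that matters
depends on which failure hurts more: losing the state server stops both
head nodes, while losing one head node only blocks the other if it holds
the state information.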
If state information is kept on the primary head node, the backup head
node will be blocked from updating state information while the primary
is down, and vice versa. This shouldn't be a problem as long as the
outage is brief, such as a reboot required for system updates. I
routinely reboot our primary head node for yum updates (after verifying
that the backup head node is running normally).
In any case, the server where the state information is kept should be
*very* reliable. We keep ours on the primary head node, which uses a
hardware RAID1 for the boot disk and has very strict limits to keep the
load to a minimum. Memory use and processes are both limited via
/etc/security/limits.d/ and the head node has no access to the
computational software installed on the cluster, so users aren't tempted
to run "quick" jobs on the head node outside the scheduler.
It would be a nice feature if the head node and backup head node could
be completely independent of each other, but I imagine that keeping them
synchronized would require some challenging coding and the real benefit
would be minimal.
Regards,
Jason
On 07/25/14 03:33, Bastian Krüger wrote:
Using the same (mounted) slurm installation on all nodes
I recently began working with a cluster that consists of 1 control
node and several computation nodes. It was set up a couple of years
ago by someone else. In this current setup, there is only one actual
slurm installation, which is located on the control node in
/usr/local/slurm. All the other nodes just mount that directory to
their /usr/local/slurm. The only thing that is copied between the
nodes is the service startup script in /etc/init.d.
The question is whether that is a good idea or not. I realize that if
the control node fails, all the other nodes lose the mounted slurm
directory. But how crucial is that?
Also, I'm thinking about adding a backup control node. This node has
to share a directory with the first control node. Is there any
advice on where this directory should be located? Could it live on
the backup control node or would it be better to use a separate server?