I recently began working with a cluster that consists of one control node and several compute nodes. It was set up a couple of years ago by someone else. In the current setup, there is only one actual Slurm installation, located on the control node in /usr/local/slurm. All the other nodes simply mount that directory at their own /usr/local/slurm. The only thing copied between the nodes is the service startup script in /etc/init.d.
My question is whether that is a good idea. I realize that if the control node fails, all the other nodes lose the mounted Slurm directory. But how critical is that? I'm also thinking about adding a backup control node, which has to share a directory with the first control node. Is there any advice on where this directory should be located? Could it live on the backup control node, or would it be better to use a separate server?
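
For reference, this is roughly what I understand the backup setup would involve in slurm.conf. It is only a sketch: the hostnames ctl1 and ctl2 and the path /shared/slurm/state are placeholders I made up, and I'm assuming a Slurm version that uses SlurmctldHost (older versions use ControlMachine and BackupController instead).

```
# Placeholder hostnames; the first SlurmctldHost entry is the primary,
# the second is the backup controller.
SlurmctldHost=ctl1
SlurmctldHost=ctl2

# Placeholder path; this is the directory that both controllers
# must be able to read and write.
StateSaveLocation=/shared/slurm/state
```

The open question is which machine should host the directory given as StateSaveLocation.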
