Hello all, I’m working on a small test cluster (2 nodes linked with Ethernet and IB) and am trying to install Slurm on it. I have installed Slurm numerous times on normal systems, but I am having trouble starting the Slurm service on the compute node. I am using Warewulf to boot my nodes statelessly, so I installed munge and Slurm in a chroot on my head node and then provisioned that image to my compute node. When the compute node boots, the munge daemon is running, but when I try to start the Slurm daemon I get no output. Querying the status of the Slurm daemon also gives no output. However, if I run “slurmd -C” I see the expected output listing all the resources on the node. The head node’s slurmctld is running, but it cannot connect to the compute node’s slurmd. Both nodes have exactly the same users and the same slurm.conf (they are mounted over NFS).
My slurm.conf puts the log files in /var/log/slurm, but that directory was never created, even though the Slurm daemon appears to be running. I’m guessing that some ownership problem with the Slurm files is causing this. When I installed munge I had to go through and fix the owner on a number of files and directories. Does anyone know which files or directories might be causing this?
Thanks,
Trevor
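A minimal sketch of how the ownership check described above could be scripted on the booted compute node. The helper name check_dir is hypothetical, and the only path taken from the post is /var/log/slurm. Which user should own each path is an assumption that depends on the site's slurm.conf (SlurmUser and SlurmdUser). It relies on GNU stat.

```shell
#!/bin/sh
# check_dir is a hypothetical helper, not part of Slurm or Warewulf.
# It reports whether a directory exists and whether it is owned by
# the expected user.
# Usage: check_dir PATH EXPECTED_OWNER
check_dir() {
    dir=$1
    want=$2
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # stat -c '%U' prints the owning user name (GNU coreutils).
    have=$(stat -c '%U' "$dir")
    if [ "$have" != "$want" ]; then
        echo "wrong owner: $dir is owned by $have, expected $want"
        return 1
    fi
    echo "ok: $dir ($have)"
}
```

It could then be called once per directory named in slurm.conf, for example `check_dir /var/log/slurm root`, where the owner name is an assumption to replace with whatever user the daemon actually runs as on that node.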
