Hello all,

I’m working on a small test cluster (two nodes linked with Ethernet and
InfiniBand) and am trying to install Slurm on them. I have installed Slurm
numerous times on conventional, stateful systems, but I am having issues
starting the Slurm service on the compute node. I am using Warewulf to boot my
nodes statelessly, so I installed munge and Slurm in a chroot on my head node
and then provisioned it to my compute
node. When my compute node boots, the munge daemon is running, but when I try 
to start the slurm daemon I get no output. Also, if I query the status of the 
slurm daemon I get no output. However, if I run “slurmd -C” I see the expected 
output of all the resources on my node. My head node’s slurmctld is running, but
it cannot connect to the compute node’s slurmd. I also have the exact same users
and slurm.conf on both nodes (they are mounted via NFS).
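A quick way to check from the head node whether anything is listening on the compute node at all is a plain TCP connect to the slurmd port. This is a minimal sketch, not a Slurm tool: it assumes Slurm’s default SlurmdPort of 6818 (pass a different port if slurm.conf sets one), and `compute01` in the usage note is a hypothetical hostname.

```python
import socket

# 6818 is Slurm's default SlurmdPort; this is an assumption, so override it
# if your slurm.conf sets a different SlurmdPort.
DEFAULT_SLURMD_PORT = 6818


def slurmd_port_open(host: str, port: int = DEFAULT_SLURMD_PORT,
                     timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Refused connections, timeouts and name-resolution failures are all
    subclasses of OSError, so any of them yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `slurmd_port_open("compute01")` returning False on the head node would suggest that slurmd is not listening (or a firewall is in the way), rather than a munge authentication problem, since authentication only happens after the TCP connection is made.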

My slurm.conf specifies that log files be written to /var/log/slurm, but this
directory was not created, even though the Slurm daemon appears to be running.
I’m guessing that an ownership issue with some of the Slurm files is causing
this; when I installed munge, I had to go through and fix the owners on a
number of files and directories. Does anyone know which files might be causing
this?
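One way to narrow down which paths are the problem is to read the slurmd-related path settings out of slurm.conf and check, on the booted compute node, whether each target directory exists and is writable. This is a minimal sketch under stated assumptions: the selection of keys (SlurmdLogFile, SlurmdPidFile, SlurmdPspoolDir aside, only these three) is my own choice, the parser is deliberately naive, and it should be run as the user slurmd runs as (normally root).

```python
from __future__ import annotations

import os
import pwd
from pathlib import Path

# slurm.conf parameters pointing at files/directories slurmd needs on the
# compute node. This selection is an assumption; adjust it to your config.
SLURMD_PATH_KEYS = ("SlurmdLogFile", "SlurmdPidFile", "SlurmdSpoolDir")


def parse_slurm_conf(text: str) -> dict[str, str]:
    """Collect simple Key=Value pairs from slurm.conf text.

    Keys are lower-cased because slurm.conf keys are case-insensitive.
    Multi-entry lines (e.g. NodeName=...) are split naively on whitespace,
    which is good enough for the single-value path keys checked here.
    """
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                conf[key.lower()] = value
    return conf


def owner_name(path: Path) -> str:
    """Return the owning user's name, or the numeric uid if it has no passwd entry."""
    uid = path.stat().st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def check_slurmd_paths(conf: dict[str, str]) -> list[str]:
    """Report missing or non-writable directories for slurmd's path settings."""
    problems = []
    for key in SLURMD_PATH_KEYS:
        value = conf.get(key.lower())
        if value is None:
            continue  # not set in slurm.conf; slurmd uses its default behaviour
        path = Path(value)
        # The spool setting is itself a directory; the others are files,
        # so the directory that must already exist is their parent.
        directory = path if key == "SlurmdSpoolDir" else path.parent
        if not directory.is_dir():
            problems.append(f"{key}: {directory} is not an existing directory")
        elif not os.access(directory, os.W_OK):
            problems.append(
                f"{key}: {directory} (owner {owner_name(directory)}) "
                f"is not writable by uid {os.getuid()}"
            )
    return problems
```

Usage would be something like `print(check_slurmd_paths(parse_slurm_conf(Path("/etc/slurm/slurm.conf").read_text())))`, where the config path is an assumption that depends on how Slurm was packaged. On a stateless node, a directory that exists on the head node but was never created inside the chroot image would show up here as missing.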

Thanks,
Trevor
