Hi Krishna,

1. Review the logs and increase the debug level if needed.
2. If the slurm config is not exactly the same on every node, the nodes will not be able to communicate with the head node. The logs will report this.
3. Is munge running on the head node? If it did not start, communication will fail.

These are the first items I'd check. While I was upgrading I ran into these issues intermittently.

Kevin

On Oct 14, 2014 10:21 AM, "Krishna Teja" <[email protected]> wrote:

> Hi All,
>
> I am having an issue with running jobs using slurm on our cluster. Slurm
> was working fine until I rebooted the head node and now when I run "srun -l
> /bin/hostname" I get an error saying
>
> srun: error: task 0 launch failed: Slurmd could not execve job
>
> Slurm is running on the head node and all the compute nodes too. I
> couldn't find anything concrete when I tried searching for it online.
>
> I am willing to provide any additional details to troubleshoot this.
>
> Any help appreciated!
>
> Regards
> Krishna
>
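P.S. For item 2 in my reply, here is a rough sketch of how one might compare copies of slurm.conf gathered from the head node and the compute nodes. It is only an illustration: the function names are made up for this example, the sample settings are placeholders, and ignoring comments and blank lines is my own simplification (Slurm's own consistency check may be stricter, and a literal "#" inside a value would be mishandled here).

```python
import hashlib


def normalized_lines(conf_text):
    """Return the non-empty, non-comment lines of a slurm.conf-style text.

    Assumption: everything after '#' on a line is a comment.
    """
    lines = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(line)
    return lines


def conf_digest(conf_text):
    """SHA-256 digest of the normalized config text."""
    joined = "\n".join(normalized_lines(conf_text))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def mismatched_nodes(head_conf, node_confs):
    """Return the sorted names of nodes whose config differs from the head's.

    node_confs maps a node name to the text of that node's config file.
    """
    head = conf_digest(head_conf)
    return sorted(
        name for name, text in node_confs.items()
        if conf_digest(text) != head
    )


if __name__ == "__main__":
    # Placeholder contents; in practice, read each node's real file.
    head = "ClusterName=test\nSlurmctldDebug=3\n"
    nodes = {
        "node01": "ClusterName=test   # same settings\nSlurmctldDebug=3\n",
        "node02": "ClusterName=test\nSlurmctldDebug=5\n",
    }
    print(mismatched_nodes(head, nodes))
```

Any node this reports is one whose config should be re-synced from the head node before restarting its daemon.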
