Hi all,

I figured out what caused the _slurm_connect failure. I was launching the Slurm controller and the Slurm daemons on the compute nodes with /etc/slurm/start_all.sh. That script started the slurmd daemons before the controller (slurmctld) and slurmdbd, so the slurmd processes had no controller to connect to.
Once I changed the startup order, the _slurm_connect error went away. The other problem is still there, though: adding the compute nodes to Slurm causes jobs to fail.
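
In case it helps anyone hitting the same thing, here is a rough sketch of the corrected ordering. It is not my actual start_all.sh. It assumes the daemons run as systemd units named slurmdbd, slurmctld and slurmd, and that the compute nodes are reachable over ssh; the node names and the `run`/`start_all` helpers are made up for illustration.

```shell
#!/bin/sh
# Sketch of a startup order that brings up the accounting daemon and the
# controller before any compute-node daemon.
# Assumptions (not from my real script): systemd unit names, ssh access,
# and the node list below are all placeholders.

DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands, 0 = execute them
NODES="${NODES:-node01 node02}"  # hypothetical compute node hostnames

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_all() {
    # 1. Accounting storage daemon first.
    run systemctl start slurmdbd
    # 2. Then the controller.
    run systemctl start slurmctld
    # 3. Only now the slurmd daemons, so they have a controller to register with.
    for node in $NODES; do
        run ssh "$node" systemctl start slurmd
    done
}
```

Calling `start_all` with the default `DRY_RUN=1` only prints the commands in the order they would run; set `DRY_RUN=0` to actually execute them.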
