Greetings! I have a test Slurm cluster running 14.11.4 on Scientific Linux 6.6. I've been having some weird trouble, and I'm hoping for some assistance. I hope this is the right forum.

Problem number 1: If I issue 'scontrol reconfigure' from any node, the Slurm controller (slurmctld) dies and needs to be restarted (after which it runs fine).

Problem number 2: 'scontrol show jobs' shows jobs in state RUNNING that don't actually appear to exist. Some of these are days old.

What might be going on here?

-- 
Jon Nelson
Dyn / Senior Software Engineer
p. +1 (603) 263-8029
