Hi, it seems that your backup controller is not using the new configuration file and so is not listening on port 6820. This message seems to come from unsuccessful attempts to ask the backup controller to stop acting as the primary.
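
One quick way to confirm whether anything is listening on that port is a plain TCP connect test from the primary controller's node. The snippet below is only a sketch, assuming Python 3 is available there; port_is_open is a hypothetical helper name, not part of SLURM, and the address and port are the ones from your log message.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise.

    Hypothetical helper for illustration only; it just checks that some
    process accepts TCP connections on the port, not that it is slurmctld.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False


# Example call, using the address and port from the log message:
# port_is_open("192.168.1.18", 6820)
```

If this returns False for the backup controller's address, that would be consistent with the backup slurmctld not listening on 6820.
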
Otherwise, how long do your jobs run? Are they short jobs or long jobs? Sometimes the controller can be slowed down by a massive number of jobs that end at the same time.

HTH
Matthieu

2011/2/24 Paul Thirumalai <[email protected]>
> Thanks Par,
> In the slurmctld.log file I see the following error all over the place
>
> Error connecting slurm stream socket at 192.168.1.18:6820: Connection refused
>
> 6820 is one of my slurmctld ports
>
> I will try your suggestions and see if it helps
>
