Hi, it seems that your backup controller is not using the new configuration
file and so is not listening on port 6820. This message seems to come
from unsuccessful attempts to ask the backup controller to stop acting as
the primary.
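
If it helps, here is a minimal sketch for checking from the primary whether
anything accepts TCP connections on that port. It only uses the Python
standard library. The helper name port_is_open is my own and is not part of
SLURM. A refused connection only tells you that nothing is listening there,
not why.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # or a timeout) when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the address and port from your log line you would call
port_is_open("192.168.1.18", 6820). If that returns False while slurmctld is
running on the backup, the daemon there is probably still using the old
configuration.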

Otherwise, how long do your jobs run? Are they short jobs or long jobs?
Sometimes the controller can be slowed down by a massive number of jobs that
end at the same time.

HTH
Matthieu

2011/2/24 Paul Thirumalai <[email protected]>

> Thanks Par,
> In the slurmctld.log file I see the following error all over the place
>
> Error connecting slurm stream socket at 192.168.1.18:6820: Connection
> refused
>
> 6820 is one of my slurmctld ports
>
> I will try your suggestions and see if it helps
>
