Thanks for the responses.  I finally fixed it, though I don't completely
understand what happened.  I logged into each of the slave nodes and killed
all slurm services/daemons with "kill" (some old-slurm processes kept
respawning there too).  Then I returned to the head node and killed all
slurm processes there as well.  Next, on the head node, I ran
/..new-slurm/slurm start, and then did the same on the slave nodes.  Now,
for the first time, slurmctld is working properly, without flooding the log
file with errors, and no old-slurm processes are running.  It seems that the
old-slurm processes running on the slave nodes were triggering old-slurm
accounting processes on the head node.  It has been running well for about
24 hours, so I think it is fixed.
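
In case it helps anyone checking for the same problem, here is a rough
sketch of how one might look for leftover old-slurm processes on a node
before restarting, and see which user owns them.  It is only an
illustration: the prefix /opt/old-slurm/ and the function names are
made up for this example and are not the paths from my cluster, and it
assumes a Linux "ps" that accepts "-eo pid,user,args".

```python
import subprocess

# Hypothetical install prefix of the old Slurm; replace with the real one.
OLD_PREFIX = "/opt/old-slurm/"


def find_old_slurm(ps_lines, old_prefix=OLD_PREFIX):
    """Return (pid, user, command) for processes whose command line
    mentions old_prefix, given lines in `ps -eo pid,user,args` format."""
    hits = []
    for line in ps_lines:
        parts = line.split(None, 2)  # pid, user, full command line
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip the header row and malformed lines
        pid, user, command = parts
        # Substring match, so scripts run through an interpreter
        # (e.g. "/bin/sh /opt/old-slurm/...") are caught as well.
        if old_prefix in command:
            hits.append((int(pid), user, command.strip()))
    return hits


def list_processes():
    """Return the output lines of `ps -eo pid,user,args` on this node."""
    result = subprocess.run(
        ["ps", "-eo", "pid,user,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Calling find_old_slurm(list_processes()) on each node should return an
empty list once nothing from the old installation is running any more.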
Thanks again
Andrew

On Fri, Jun 5, 2015 at 3:12 AM, Loris Bennett <[email protected]>
wrote:

>
> Christopher Samuel <[email protected]> writes:
>
> > On 05/06/15 16:22, Loris Bennett wrote:
> >
> >> 'scontrol' and 'squeue' are not daemons, so they must be being run
> >> directly.
> >
> > It really sounds like they're being run from a cron job somewhere, do
> > you have some sort of health check or accounting checks that are getting
> > done regularly with the old paths hardcoded in them?
> >
> > What users are they running as?
> >
> > All the best,
> > Chris
>
> Bright Cluster Manager has some integration with the queuing system, so
> it could well be running 'scontrol' and 'squeue' periodically.  You
> should have the tool 'wlm-setup', which enables you to choose the
> workload manager.  However, I can't see a way to adjust the path.
>
> I'm very interested in how this pans out as we are also planning to move
> from our current, old version of Slurm to a more recent version.  In
> this context we would also want to change the path for Slurm.
>
> Cheers,
>
> Loris
>
> --
> This signature is currently under construction.
>
