Hi Andrew,

Andrew Petersen <[email protected]> writes:

> Re: old version of scontrol, sacct keeps resurrecting 
>
> I was able to narrow down my problem further. I managed to stop all slurm
> daemons on head and slave nodes. On head node, if I start the new slurm:
> /.../new-slurm/etc/init.d/slurm start
> the slurmctld daemon runs ok, but with error messages in the
> /var/log/slurmctld file about Incompatible versions. If I do
> "ps -u root -F |grep slurm", I can see why. Periodically, the
> /../old-slurm/scontrol and squeue run. It seems like somehow, the
> "new-slurm/etc/init.d/slurm start" is triggering these old daemons
> to run.

'scontrol' and 'squeue' are client commands, not daemons, so something
must be invoking them directly.
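
One way to find out what is invoking them is to walk up the process
tree from one of the stray processes. The following is only a rough
sketch: 'parent_chain' is a helper name I made up, and it assumes a
'ps' that accepts '-o' and '-p' (procps on Linux does).

```shell
# Hypothetical helper: print a process and all of its ancestors,
# one per line, as "PID  command line".
# Usage: parent_chain PID
parent_chain() {
    pid=$1
    # Stop once we run out of parents (PID 0) or the PID is not numeric.
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] 2>/dev/null; do
        # If the process has already exited, ps fails and we stop.
        args=$(ps -o args= -p "$pid") || break
        printf '%s  %s\n' "$pid" "$args"
        # Move up to the parent; strip the padding ps adds.
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

# Example (run as root on the head node, assuming pgrep is available):
#   for p in $(pgrep -x scontrol); do parent_chain "$p"; echo; done
```

As the processes are short-lived, you may have to run the loop a few
times before you catch one. The second line of each chain should be
whatever is launching the old binary.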

> I checked my $PATH, slurm.conf, and the
> /../new-slurm/etc/init.d/slurm file, and there is nothing like that. I would
> appreciate any guidance on resolving this.

Have you updated your module files?  Some users could even be explicitly
loading the old version in their .bashrc.
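
To check for such leftovers, you could search the likely places for
any mention of the old installation. Again, just a sketch: 'find_refs'
is a made-up helper, the paths in the example are placeholders for
wherever your module files and home directories actually live, and it
assumes a grep that supports '-r' (GNU and BSD grep do).

```shell
# Hypothetical helper: list files under the given paths that contain
# a fixed string (for instance a directory name of the old install).
# Usage: find_refs STRING PATH...
find_refs() {
    pattern=$1
    shift
    # -r: recurse, -l: print file names only, -F: treat STRING literally.
    # Unreadable files are silently skipped.
    grep -rlF -- "$pattern" "$@" 2>/dev/null
}

# Example with placeholder paths (adjust to your site):
#   find_refs old-slurm /path/to/modulefiles /home/*/.bashrc
```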

Cheers,

Loris

> Regards 
> Andrew
>
> On Wed, Jun 3, 2015 at 4:52 PM, Andrew Petersen <[email protected]> wrote:
>
>     Hello
>     
>     
>     I installed a new version of slurm, 14.11.3. It works fine. However I
>     noticed that my log file /var/log/slurmctld shows
>     error: slurm_receive_msg: Incompatible versions of client and server code
>     
>     This led me to discover that old slurm scontrol, squeue and sacct are
>     still running on the head node, using
>     ps -u root -F |grep slurm
>     
>     
>     I have tried to kill these every which way, but they won't die; they keep
>     resurrecting with different PIDs. I tried
>     /old-slurm-version/bin/scontrol shutdown
>     
>     but this gives me
>     slurm_shutdown error: Zero Bytes were transmitted or received
>     
>     
>     It seems like something is automatically restarting the old slurm. I am
>     using Bright Cluster Manager, and I set it so that it does NOT auto-start
>     or run the slurm daemon, but that did not help.
>     
>     
>     Can someone help me kill this thing? It is causing the creation of big log
>     zip files, and using up cpu capacity on the head node.
>     
>     
>     Regards 
>     
>     Andrew Petersen

-- 
This signature is currently under construction.
