I have never seen your system, so here are a couple of shots in
the dark:
1. Do you have a cron job that might be starting Slurm, perhaps for
a simple failure-restart solution?
2. If "ps --forest" works on your system, its process tree may
show which parent process is launching the old Slurm tools and
daemon.
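As a rough way to check the cron idea in point 1, the sketch below lists files under the usual cron locations that mention Slurm. The function name find_cron_refs and the default directory list are assumptions for illustration only: cron locations vary by distribution, and a cluster manager may keep its own schedules elsewhere.

```shell
# List files that mention a pattern (default: "slurm") under cron locations.
# Usage: find_cron_refs [pattern] [location ...]
find_cron_refs() {
    pattern=${1:-slurm}
    [ "$#" -gt 0 ] && shift
    # Assumed default locations; adjust these for your distribution.
    [ "$#" -eq 0 ] && set -- /etc/crontab /etc/cron.d /etc/cron.hourly \
        /etc/cron.daily /var/spool/cron
    for loc in "$@"; do
        # Skip locations that do not exist on this system.
        [ -e "$loc" ] || continue
        # -r: recurse, -i: ignore case, -l: print matching file names only.
        grep -ril -- "$pattern" "$loc" 2>/dev/null
    done
    return 0
}
```

Running "find_cron_refs slurm" as root scans the default locations, including per-user crontabs that a normal user cannot read.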
Andy
On 06/04/2015 12:49 PM, Andrew Petersen wrote:
Re: old version of scontrol, sacct keeps resurrecting
I was able to narrow down my problem further. I managed to
stop all Slurm daemons on the head and slave nodes. On the
head node, if I start the new Slurm with
/.../new-slurm/etc/init.d/slurm start
the slurmctld daemon runs OK, but the /var/log/slurmctld file
contains error messages about incompatible versions.
If I run "ps -u root -F |grep slurm", I can see why:
periodically, /../old-slurm/scontrol and squeue run.
It seems that "new-slurm/etc/init.d/slurm start" is somehow
triggering these old binaries to run. I checked my $PATH,
slurm.conf, and the /../new-slurm/etc/init.d/slurm file, and
none of them refers to the old installation. I would
appreciate any guidance on resolving this.
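One way to see what launches the old binaries is to catch one of their short-lived PIDs and walk up its parent chain. The sketch below assumes a Linux /proc filesystem; trace_parents is a hypothetical helper name, not a Slurm or procps tool.

```shell
# Print "PID COMMAND" for a process and each of its ancestors.
# Usage: trace_parents PID
trace_parents() {
    pid=$1
    while [ -r "/proc/$pid/status" ]; do
        # cmdline is NUL-separated; turn the separators into spaces.
        cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
        printf '%s %s\n' "$pid" "$cmd"
        # PPid is 0 for init and for kernel threads, which ends the walk.
        pid=$(sed -n 's/^PPid:[[:space:]]*//p' "/proc/$pid/status")
    done
}
```

For example, "trace_parents 12345", where 12345 stands for a PID taken from the ps output. The old tools may exit quickly, so it can take a few attempts to catch one while it is still alive.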
Regards
Andrew
On Wed, Jun 3, 2015 at 4:52 PM, Andrew Petersen <[email protected]> wrote:
Hello
I installed a new version of Slurm, 14.11.3. It works fine.
However, I noticed that my log file /var/log/slurmctld shows
error: slurm_receive_msg: Incompatible versions of client and server code
This led me to discover that the old Slurm scontrol, squeue
and sacct are still running on the head node. I found them
using
ps -u root -F |grep slurm
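Because the log error points at mismatched client and server versions, it can help to list every copy of a tool on a search path together with the version it reports. This is a minimal sketch: list_tool_copies is a hypothetical helper, and it assumes the tool accepts --version, as scontrol, squeue and sacct do.

```shell
# Print each executable named $1 found on a PATH-style list
# ($2, default $PATH), followed by the first line of its --version output.
# Usage: list_tool_copies TOOL [DIR1:DIR2:...]
list_tool_copies() {
    tool=$1
    search=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    # The list is split on ":" once, when the loop starts.
    for dir in $search; do
        IFS=$old_ifs
        [ -x "$dir/$tool" ] || continue
        printf '%s\t%s\n' "$dir/$tool" \
            "$("$dir/$tool" --version 2>&1 | head -n 1)"
    done
    IFS=$old_ifs
}
```

Calling "list_tool_copies scontrol" checks the current $PATH. A second argument can name extra directories, such as the old and new installation trees, to compare them side by side.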
I have tried to kill these every which way, but they won't
die; they keep resurrecting with different PIDs. I tried
/old-slurm-version/bin/scontrol shutdown
but this gives me
slurm_shutdown error: Zero Bytes were transmitted or received
It seems like something is automatically restarting the old
Slurm. I am using Bright Cluster Manager, and I set it so
that it does NOT auto-start or run the Slurm daemon, but that
did not help.
Can someone help me kill this thing? It is causing the
creation of big log zip files and using up CPU capacity on
the head node.
Regards
Andrew Petersen