Having never seen your system, here are a couple of shots in the
 dark:
 
 1. Do you have a cron job that might be starting Slurm, perhaps for
 a simple failure-restart solution?
 2. If "ps --forest" works on your system, it may suggest that some
 other process is responsible for running the old Slurm tools and
 daemon.
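
 A rough sketch of both checks, in case it helps. It assumes a Linux
 /proc filesystem and the usual cron locations, which may differ on your
 distribution. The function names ancestry and cron_mentions are made up
 for illustration, and "old-slurm" in the usage line below stands in for
 whatever the real path of the old installation is.

```shell
# Print the chain of parents for a PID, one line per process:
# PID <tab> PPID <tab> command line. Stops at PID 1 or when the
# process has already exited. Assumes Linux /proc is mounted.
ancestry() {
    pid=$1
    while [ -r "/proc/$pid/status" ]; do
        ppid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
        cmd=$(cat "/proc/$pid/cmdline" 2>/dev/null | tr '\0' ' ')
        printf '%s\t%s\t%s\n' "$pid" "$ppid" "$cmd"
        # PID 1 reports a parent of 0; an empty value means the process vanished.
        [ "$ppid" -gt 0 ] 2>/dev/null || break
        pid=$ppid
    done
}

# Search common cron locations for entries mentioning a pattern.
# The directory list is an assumption about a typical Linux layout.
cron_mentions() {
    grep -ris "$1" /etc/crontab /etc/cron.d /etc/cron.hourly \
        /etc/cron.daily /var/spool/cron
}
```

 Something like
 for p in $(pgrep -f old-slurm); do ancestry "$p"; echo; done
 should show which parent keeps launching the old scontrol and squeue,
 provided they live long enough to be caught, and "cron_mentions slurm"
 run as root covers the cron idea.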
 
 Andy

 On 06/04/2015 12:49 PM, Andrew Petersen wrote:
 Re: old version of scontrol, sacct keeps resurrecting

 I was able to narrow down my problem further. I managed to stop all
 Slurm daemons on the head and slave nodes. On the head node, if I start
 the new Slurm:

     /.../new-slurm/etc/init.d/slurm start

 the slurmctld daemon runs OK, but with error messages in the
 /var/log/slurmctld file about incompatible versions. If I run
 "ps -u root -F |grep slurm", I can see why: periodically, the
 /../old-slurm/scontrol and squeue run. It seems that somehow
 "new-slurm/etc/init.d/slurm start" is triggering these old commands to
 run. I checked my $PATH, slurm.conf, and the
 /../new-slurm/etc/init.d/slurm file, and none of them refers to the old
 installation. I would appreciate any guidance on resolving this.

 Regards

 Andrew

 On Wed, Jun 3, 2015 at 4:52 PM, Andrew Petersen <[email protected]> wrote:

 Hello

 I installed a new version of Slurm, 14.11.3. It works fine. However, I
 noticed that my log file /var/log/slurmctld shows

     error: slurm_receive_msg: Incompatible versions of client and server code

 This led me to discover that the old Slurm scontrol, squeue and sacct
 are still running on the head node, using

     ps -u root -F |grep slurm

 I have tried to kill them every which way, but they won't die; they keep
 resurrecting with different PIDs. I tried

     /old-slurm-version/bin/scontrol shutdown

 but this gives me

     slurm_shutdown error: Zero Bytes were transmitted or received

 It seems like something is automatically restarting the old Slurm. I am
 using Bright Cluster Manager, and I set it so that it does NOT
 auto-start or run the Slurm daemon, but that did not help.

 Can someone help me kill this thing? It is causing the creation of big
 log zip files and using up CPU capacity on the head node.

 Regards

 Andrew Petersen
