Also, when I run /etc/init.d/slurm start on the node, nothing happens: no log file and no error message.
On Tue, Feb 22, 2011 at 6:43 PM, Paul Thirumalai <[email protected]> wrote:

> About an hour ago I found that some of the nodes in my environment were
> running an old, outdated version of slurmd. By that I mean that I had tried
> to stop all slurm daemons using the script
> /etc/slurm/stop_all.sh. This had not killed the slurm daemon on some
> nodes.
>
> So I explicitly logged into these nodes and did a "kill -9 <pid>" which
> killed the slurmd process.
> I then deleted the file /var/run/slurmd.pid from each of these nodes.
>
> I then tried to start up slurm using the /etc/slurm/start_all.sh script.
> This time the slurm daemon won't start on the same nodes where I had
> explicitly killed the slurm daemon using the kill command.
>
> I did a netstat on some of these nodes to make sure that the SlurmdPort was
> not locked, and it was not.
>
> Is there something I am missing? Perhaps a lock file that I should have
> deleted and did not? Any help is appreciated.
>
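
For anyone hitting the same silent failure, here is a rough sketch of how one might check the pid file state and then get some output from the daemon. The pidfile_state helper is hypothetical (it is not part of Slurm), and the pid file path is simply the one mentioned in the message above; your slurm.conf may set SlurmdPidFile to something else. The two commented-out commands are only suggestions: tracing the init script with sh -x, and running slurmd in the foreground with -D and extra -v flags so that errors go to the terminal instead of a log file that may never be created.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): report whether a pid file is
# missing, stale (no live process behind it), or points at a running process.
# Run as root so that kill -0 can probe processes owned by any user.
pidfile_state() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo missing
        return
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo running
    else
        echo stale
    fi
}

# Path taken from the message above; adjust if SlurmdPidFile differs.
pidfile_state /var/run/slurmd.pid

# Assumed next steps, left commented out because they change system state:
# sh -x /etc/init.d/slurm start   # trace the init script to see where it stops
# slurmd -D -vvv                  # run slurmd in the foreground, verbose
```

If the helper prints "missing" and the init script still does nothing, the foreground run is probably the quickest way to see an actual error message.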
