Also, when I run /etc/init.d/slurm start on the node, nothing happens: there
is no log file and no error message.

On Tue, Feb 22, 2011 at 6:43 PM, Paul Thirumalai
<[email protected]> wrote:

> About an hour ago I found that some of the nodes in my environment were
> running an old, outdated version of slurmd. By that I mean that I had tried
> to stop all slurm daemons using the script
> /etc/slurm/stop_all.sh, but this had not killed the slurm daemon on some
> nodes.
>
> So I explicitly logged into these nodes and did a "kill -9 <pid>" which
> killed the slurmd process.
> I then deleted the file /var/run/slurmd.pid from each of these nodes.
>
> I then tried to start up slurm using the /etc/slurm/start_all.sh script.
> This time the slurm daemon won't start on the same nodes where I had
> explicitly killed the slurm daemon using the kill command.
>
> I did a netstat on some of these nodes to make sure that the SlurmdPort was
> not locked, and it was not.
>
> Is there something I am missing, perhaps a lock file that I should have
> deleted and did not? Any help is appreciated.
>
