Re: [slurm-users] squeue reports ReqNodeNotAvail but node is available

2020-07-13 Thread Ole Holm Nielsen
Hi Janna, If you're running an old Slurm version, you may be hitting bugs that have already been fixed in later versions. You can search for bugs with ReqNodeNotAvail in the title: https://bugs.schedmd.com/buglist.cgi?quicksearch=ReqNodeNotAvail For example, this one might be relevant:

Re: [slurm-users] squeue reports ReqNodeNotAvail but node is available

2020-07-12 Thread Ole Holm Nielsen
In case your ARP cache is the problem, there is some advice on this wiki page: https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-arp-cache-for-large-networks I think there are other causes for ReqNodeNotAvail as well, for example the node being allocated to other jobs. The "scontrol
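To separate the "node is busy or down" causes from a networking problem, it can help to list the pending jobs whose reason is ReqNodeNotAvail next to the node states that sinfo reports. The following is only a rough sketch, not something from this thread: it assumes the Slurm client commands are on the PATH, uses the squeue format codes %i (job ID) and %r (reason) and the sinfo format codes %N (node name) and %T (state), and the helper names are made up for illustration.

```python
import shutil
import subprocess


def parse_pending_reasons(squeue_text):
    """Return {job_id: reason} for jobs whose reason starts with ReqNodeNotAvail.

    Expects lines of the form "jobid|reason", as printed by
    squeue -h -t PENDING -o "%i|%r".
    """
    blocked = {}
    for line in squeue_text.splitlines():
        if "|" not in line:
            continue
        job_id, reason = line.split("|", 1)
        if reason.strip().startswith("ReqNodeNotAvail"):
            blocked[job_id.strip()] = reason.strip()
    return blocked


def parse_node_states(sinfo_text):
    """Return {node: state} from lines of the form "node|state",
    as printed by sinfo -N -h -o "%N|%T"."""
    states = {}
    for line in sinfo_text.splitlines():
        if "|" not in line:
            continue
        node, state = line.split("|", 1)
        states[node.strip()] = state.strip()
    return states


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    if shutil.which("squeue") and shutil.which("sinfo"):
        jobs = parse_pending_reasons(run(["squeue", "-h", "-t", "PENDING", "-o", "%i|%r"]))
        nodes = parse_node_states(run(["sinfo", "-N", "-h", "-o", "%N|%T"]))
        for job_id, reason in jobs.items():
            print(f"job {job_id}: {reason}")
        for node, state in sorted(nodes.items()):
            print(f"node {node}: {state}")
    else:
        print("Slurm client commands not found; nothing to check.")
```

If a job shows ReqNodeNotAvail while every node it could use is reported as idle, the node states alone do not explain it, and the network or version-specific causes discussed in this thread become more likely.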

Re: [slurm-users] squeue reports ReqNodeNotAvail but node is available

2020-07-10 Thread mercan
Hi Janna; It sounds like an ARP cache table problem to me. If your Slurm head node can reach ~1000 or more network devices (all connected network cards, switches, etc., even if they are reachable through different ports of the server), you need to increase some network settings on the head node and
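A quick way to see whether the head node is close to its ARP cache limit is to compare the number of entries in the kernel's IPv4 ARP table with the neighbor-table thresholds. This is a minimal sketch under assumptions: the message above is truncated and does not name the settings, so the choice of gc_thresh1, gc_thresh2, and gc_thresh3 under /proc/sys/net/ipv4/neigh/default is a guess at what is meant, the "ok"/"high"/"full" labels are made up, and /proc/net/arp only approximates the table size because it also lists incomplete entries.

```python
from pathlib import Path

ARP_TABLE = Path("/proc/net/arp")
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")
THRESHOLD_NAMES = ("gc_thresh1", "gc_thresh2", "gc_thresh3")


def count_arp_entries(arp_table_text):
    """Count entries in /proc/net/arp style text (first line is a header)."""
    lines = [line for line in arp_table_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def read_gc_thresholds(neigh_dir=NEIGH_DIR):
    """Read the three neighbor-table garbage-collection thresholds."""
    return {name: int((neigh_dir / name).read_text()) for name in THRESHOLD_NAMES}


def arp_pressure(entries, thresholds):
    """Classify the table size against the soft (gc_thresh2) and hard (gc_thresh3) limits."""
    if entries >= thresholds["gc_thresh3"]:
        return "full"
    if entries >= thresholds["gc_thresh2"]:
        return "high"
    return "ok"


if __name__ == "__main__":
    if ARP_TABLE.exists() and all((NEIGH_DIR / n).exists() for n in THRESHOLD_NAMES):
        entries = count_arp_entries(ARP_TABLE.read_text())
        thresholds = read_gc_thresholds()
        print(f"ARP entries: {entries}, thresholds: {thresholds}")
        print(f"status: {arp_pressure(entries, thresholds)}")
    else:
        print("ARP table or neighbor thresholds not available on this system.")
```

A result of "high" or "full" on the head node would be consistent with the ARP cache explanation; "ok" suggests looking at the other causes mentioned in this thread.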

Re: [slurm-users] squeue reports ReqNodeNotAvail but node is available

2020-07-10 Thread Chris Samuel
On Friday, 10 July 2020 3:34:44 PM PDT Janna Ore Nugent wrote:
> I’ve got an intermittent situation with gpu nodes that sinfo says are
> available and idle, but squeue reports as “ReqNodeNotAvail”. We’ve cycled
> the nodes to restart services but it hasn’t helped. Any suggestions for
>

[slurm-users] squeue reports ReqNodeNotAvail but node is available

2020-07-10 Thread Janna Ore Nugent
Hi All, I’ve got an intermittent situation with gpu nodes that sinfo says are available and idle, but squeue reports as “ReqNodeNotAvail”. We’ve cycled the nodes to restart services but it hasn’t helped. Any suggestions for resolving this or digging into it more deeply? Thanks, Janna Janna