Glad it worked.  I have been burned like that before too.

Eric

On 6/26/15 8:52 AM, Trevor Gale wrote:

That was it! I had iptables disabled on one node but not on the other; once I
disabled it on the other node too, it worked!

Thanks so much!

-Trevor

On Jun 25, 2015, at 5:56 PM, Eric Lund <[email protected]> wrote:


Also check your iptables config on both nodes; there might be a firewall rule
hanging you up...  It sounds like you can talk back to the head node on an
established socket, but you can't establish a new socket to the head node from
the compute node.

Eric
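One way to test Eric's theory is to try opening a fresh TCP connection to the head node's slurmctld port from the compute node. The Python sketch below assumes Slurm's default SlurmctldPort of 6817 (check the value in your slurm.conf); the helper name `can_open_connection` is hypothetical. On Linux, a REJECT rule that replies with an ICMP host-prohibited message typically surfaces as "No route to host", which matches the compute node's log.

```python
import socket

# Assumption: Slurm's default SlurmctldPort; use the value from your slurm.conf.
SLURMCTLD_PORT = 6817


def can_open_connection(host, port=SLURMCTLD_PORT, timeout=3.0):
    """Try to open a *new* TCP connection to host:port (hypothetical helper).

    Returns (True, None) if the handshake completes, otherwise (False, reason).
    A firewall REJECT rule usually shows up as a refused or unreachable error;
    a DROP rule usually shows up as a timeout.
    """
    try:
        # create_connection performs the full TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except socket.timeout:
        return False, "timed out"
    except OSError as exc:
        # exc.strerror holds the OS message, e.g. "No route to host".
        return False, exc.strerror or str(exc)
```

For example, running `print(can_open_connection("10.0.0.1"))` on the compute node (with the head node's IPoIB address in place of the placeholder `10.0.0.1`), and then the reverse check from the head node against the compute node's slurmd port (6818 by default), should show whether only one direction is blocked.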

On 6/25/15 4:38 PM, Peter Kjellström wrote:
On Thu, 25 Jun 2015 14:15:17 -0700
Trevor Gale <[email protected]> wrote:
Hello all,

I am experiencing an odd issue where my head node can see the compute
node but the compute node cannot see the head node. If I run “sinfo”
on the head node, I see both nodes in the idle state, but I can’t run
sinfo on the compute node. The head node's logs show no issues, and I
see entries like “node_did_resp compute0”, but the compute node's log
shows “slurm connect failed: no route to host”. I am using the IP
addresses that I assigned the nodes in my IPoIB config, and I know
these IPs work normally (I can ssh, scp, and ping with them), but for
some reason the compute node does not see the head node.

Does anyone have any idea what the issue might be?

Two ideas:

1) You have different slurm.conf files with different node definitions
across your cluster, causing connectivity problems.

2) You have actual IPoIB connectivity problems, perhaps the fairly recent
RHEL 6/CentOS 6 bug that caused islands of connectivity under certain
circumstances (fixed in -504.16.2)?

/Peter
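Peter's first idea can be checked by comparing every node's copy of slurm.conf, since Slurm expects the same file on all nodes. A minimal Python sketch, assuming the copies have already been gathered onto one machine (for example with scp); the helper `group_by_content` and the file paths are hypothetical:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_content(copies):
    """Group nodes by the SHA-256 digest of their slurm.conf copy.

    copies maps a node name to a local path holding that node's slurm.conf.
    Returns {digest: [node, ...]}; more than one key means the copies differ.
    """
    groups = defaultdict(list)
    for node, path in copies.items():
        # Hash the raw bytes, so even whitespace-only differences are reported.
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(node)
    return {digest: sorted(nodes) for digest, nodes in groups.items()}
```

If the result has more than one key, the nodes listed under each digest are running with different configurations, and a `diff` of two representative copies should show where they disagree.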

