That was it! I had iptables disabled on one node but not on the other; once I disabled it again, it worked!
Thanks so much!

-Trevor

> On Jun 25, 2015, at 5:56 PM, Eric Lund <[email protected]> wrote:
>
> Also check your iptables config on both nodes, there might be a firewall rule
> hanging you up... It sounds like you can talk back to the head node on an
> established socket but you can't establish a socket to the head node from the
> compute node.
>
> Eric
>
> On 6/25/15 4:38 PM, Peter Kjellström wrote:
>> On Thu, 25 Jun 2015 14:15:17 -0700
>> Trevor Gale <[email protected]> wrote:
>>
>> > Hello all,
>> >
>> > I am experiencing an odd issue where my head node can see the compute
>> > node but the compute node cannot see the head node. If I run “sinfo”
>> > on the head node I see both nodes in the state idle, but I can’t run
>> > sinfo on the compute node. If I look at the head node's logs I see no
>> > issues, and I see things like “node_did_resp compute0”, but if I look
>> > at the compute node's log I see “slurm connect failed: no route to
>> > host”. I am using the IP addresses that I assigned the nodes in my
>> > IPoIB config, and I know these IPs work normally (I can ssh, scp, and
>> > ping with them), but for some reason the compute node does not see
>> > the head node.
>> >
>> > Does anyone have any idea what the issue might be?
>>
>> Two ideas:
>>
>> 1) You have different slurm.conf files with different node definitions
>> across your cluster causing connectivity problems.
>>
>> 2) You have actual IPoIB connectivity problems, maybe the quite recent
>> rhel6/centos-6 bug that caused islands of connectivity under certain
>> circumstances? (fixed in -504.16.2).
>>
>> /Peter
>>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
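
For anyone who hits the same symptom, here is a rough sketch of how one might check Eric's diagnosis (whether the compute node can open a *new* TCP connection to the head node) without going through Slurm. It is only an illustration: the function name `probe_tcp` is made up, and the port assumes Slurm's default SlurmctldPort of 6817, so check the value in your own slurm.conf. A firewall rule that rejects with an ICMP "host prohibited" message usually surfaces on the client as "no route to host", even when ssh and ping work.

```python
import errno
import socket

# Assumption: Slurm's default SlurmctldPort; your slurm.conf may set another value.
SLURMCTLD_PORT = 6817


def probe_tcp(host, port, timeout=3.0):
    """Try to open a new TCP connection and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No answer at all: packets may be silently dropped somewhere.
        return False, "timed out"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same error text the slurmd log showed; often a firewall REJECT rule.
            return False, "no route to host"
        if exc.errno == errno.ECONNREFUSED:
            # Host reachable, but nothing is listening on that port.
            return False, "connection refused"
        return False, str(exc)
```

Run it on the compute node against the head node's IPoIB address, for example `probe_tcp("10.0.0.1", SLURMCTLD_PORT)` (the address is a placeholder). If ssh to that address works but this returns "no route to host", a firewall rule on the head node is a likely suspect.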
