Hi,

I inherited an xCAT cluster and am having unusual name-resolution problems
on the compute nodes, which are stopping PBS from working.


Symptoms:

From node5:

[root@separatrix bin]# ssh node5

Last login: Wed Feb  6 18:23:08 2013 from 172.20.0.1
[root@node5 ~]#  telnet separatrix.hpc 42559
separatrix.hpc/42559: Temporary failure in name resolution
[root@node5 ~]#  telnet 172.20.0.1 42559
Trying 172.20.0.1...
telnet: connect to address 172.20.0.1: No route to host
telnet: Unable to connect to remote host: No route to host

However, SSH from the node to the master node does work.



[root@node5 ~]# qstat | more
Cannot resolve default server host 'separatrix.hpc' - check server_name file.
qstat: cannot connect to server separatrix.hpc (errno=15008) Access from host not allowed, or unknown host

Node routing:

[root@node5 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
172.25.0.0      0.0.0.0         255.255.0.0     U     0      0        0 ib0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 usb0
172.20.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0


On the master node, name resolution works for the nodes, the cluster
name, and localhost:

[root@separatrix log]# nslookup node5
Server:         172.20.0.1
Address:        172.20.0.1#53

Name:   node5.hpc
Address: 172.20.101.5

Connection to node5 closed.
[root@separatrix log]# nslookup node5
Server:         172.20.0.1
Address:        172.20.0.1#53

Name:   node5.hpc
Address: 172.20.101.5

[root@separatrix log]# nslookup separatrix
Server:         172.20.0.1
Address:        172.20.0.1#53

Name:   separatrix.hpc
Address: 172.20.0.1

[root@separatrix log]# nslookup separatrix.hpc
Server:         172.20.0.1
Address:        172.20.0.1#53

Name:   separatrix.hpc
Address: 172.20.0.1

[root@separatrix log]# nslookup 127.0.0.1
Server:         172.20.0.1
Address:        172.20.0.1#53

1.0.0.127.in-addr.arpa  name = localhost.


PBS on the master node logs this error:

Feb  7 15:56:07 separatrix PBS_Server: LOG_ERROR::is_request, bad attempt to connect from 127.0.0.1:1022 (address not trusted - check entry in server_priv/nodes)

SELinux: off
Tested with iptables both on and off

Master routing:

[root@separatrix log]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
129.98.105.0    0.0.0.0         255.255.255.0   U     0      0        0 bond1
172.25.0.0      0.0.0.0         255.255.0.0     U     0      0        0 ib0
172.30.0.0      0.0.0.0         255.255.0.0     U     0      0        0 bond0
172.29.0.0      0.0.0.0         255.255.0.0     U     0      0        0 bond0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 usb0
172.20.0.0      0.0.0.0         255.255.0.0     U     0      0        0 bond0
0.0.0.0         129.98.105.1    0.0.0.0         UG    0      0        0 bond1

The master and the nodes are on different segments; relevant hosts entries:

127.0.0.1               separatrix.hpc separatrix localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
172.20.0.1 separatrix separatrix.hpc
172.25.0.1 separatrix-ib0 separatrix-ib0.hpc
172.20.101.1 node1 node1.hpc
172.20.101.10 node10 node10.hpc
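One quick check is which address a name gets from this file. The "files" lookup usually returns the first line that lists the name (the exact behaviour depends on the glibc version and /etc/host.conf), and in the entries above separatrix.hpc appears on the 127.0.0.1 line before the 172.20.0.1 line. That may be related to PBS logging a connection from 127.0.0.1. The helper below is a hypothetical sketch (first_hosts_match is not an xCAT or PBS tool); `getent hosts separatrix.hpc` shows what the full resolver actually returns.

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT or PBS): print the address from the
# FIRST hosts-format line that lists NAME. Assumes the common case where the
# "files" lookup stops at the first match; confirm with `getent hosts NAME`.
# Usage: first_hosts_match NAME [FILE]   (FILE defaults to /etc/hosts)
first_hosts_match() {
    awk -v name="$1" '
        { sub(/#.*/, "") }        # strip comments (awk re-splits the fields)
        NF < 2 { next }           # skip blank lines and address-only lines
        {
            # field 1 is the address, fields 2..NF are hostnames/aliases
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "${2:-/etc/hosts}"
}
```

With the entries listed above, `first_hosts_match separatrix.hpc` prints 127.0.0.1 rather than 172.20.0.1.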


best,
joe

_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
