What happens when you use netcat to ping the node?

    nc -z host.example.com 7777
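If netcat is not installed on the new box, the same check can be sketched in Python. This is only a rough equivalent of `nc -z`: it attempts a plain TCP connect and reports whether it succeeded. The names `port_open`, `check_nodes` and `NODES` are made up for illustration; the addresses and port 7777 are copied from the cluster.conf quoted below.

```python
import socket

# Node names and addresses copied from the quoted cluster.conf (assumed current).
NODES = {
    "web1": "192.168.102.140",
    "web2": "192.168.102.141",
    "web3": "192.168.102.142",
    "rgapp1": "192.168.102.111",
    "deploy": "192.168.102.122",
    "app1": "192.168.102.112",
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_nodes(nodes, port=7777, timeout=3.0):
    """Map each node name to True/False depending on whether its port accepts a connection."""
    return {name: port_open(ip, port, timeout) for name, ip in nodes.items()}
```

Calling `check_nodes(NODES)` on web1 and again on one of the CentOS nodes would show whether a full TCP connect on 7777 works in both directions. Note that this only proves the port accepts a connection; it says nothing about whether the o2net handshake that follows succeeds.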
David Murphy wrote:
> Some additional data:
> From Web1 (new Fedora machine) to Web2:
> [r...@web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141
>
> Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
> Nmap scan report for 192.168.102.141
> Host is up (0.000076s latency).
> Not shown: 993 closed ports
> PORT     STATE SERVICE
> 22/tcp   open  ssh
> 80/tcp   open  http
> 81/tcp   open  hosts2-ns
> 111/tcp  open  rpcbind
> 5666/tcp open  nrpe
> 7777/tcp open  unknown
> 9102/tcp open  jetdirect
> MAC Address: 00:50:56:A3:58:5D (VMware)
>
> Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds
>
>
> From web2 -> web1 (new fedora machine)
> [r...@web2 ~]# nmap 192.168.102.140
>
> Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
> Interesting ports on 192.168.102.140:
> Not shown: 994 closed ports
> PORT     STATE SERVICE
> 22/tcp   open  ssh
> 80/tcp   open  http
> 81/tcp   open  hosts2-ns
> 111/tcp  open  rpcbind
> 443/tcp  open  https
> 7777/tcp open  unknown
> MAC Address: 00:50:56:A3:14:62 (VMWare)
>
> Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds
>
>
> Cluster.conf:
> cluster:
>     node_count = 6
>     name = appshare
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.102.140
>     number = 1
>     name = web1
>     cluster = appshare
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.102.141
>     number = 2
>     name = web2
>     cluster = appshare
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.102.142
>     number = 3
>     name = web3
>     cluster = appshare
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.102.111
>     number = 4
>     name = rgapp1
>     cluster = appshare
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.102.122
>     number = 5
>     name = deploy
>     cluster = appshare
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.102.112
>     number = 6
>     name = app1
>     cluster = appshare
>
> DMESG on WEB1:
> OCFS2 1.5.0
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 2 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 3 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 4 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 5 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 6 after 30.0 seconds, giving up and returning errors.
> (1262,0):dlm_request_join:1035 ERROR: status = -107
> (1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
> (1262,0):dlm_join_domain:1487 ERROR: status = -107
> (1262,0):dlm_register_domain:1753 ERROR: status = -107
> (1262,0):o2cb_cluster_connect:313 ERROR: status = -107
> (1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
> (1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
> ocfs2: Unmounting device (253,1) on (node 0)
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 2 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 3 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 5 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 6 after 30.0 seconds, giving up and returning errors.
> (1323,0):dlm_request_join:1035 ERROR: status = -107
> (1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
> (1323,0):dlm_join_domain:1487 ERROR: status = -107
> (1323,0):dlm_register_domain:1753 ERROR: status = -107
> (1323,0):o2cb_cluster_connect:313 ERROR: status = -107
> (1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
> (1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
> ocfs2: Unmounting device (253,1) on (node 0)
> VMCI: Major device number is: 249
> VMware memory control driver initialized
> vmmemctl: started kernel thread pid=1522
> ocfs2: Unregistered cluster interface o2cb
> OCFS2 Node Manager 1.5.0
> OCFS2 DLM 1.5.0
> ocfs2: Registered cluster interface o2cb
> OCFS2 DLMFS 1.5.0
> OCFS2 User DLM kernel interface loaded
> OCFS2 1.5.0
> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 4 after 30.0 seconds, giving up and returning errors.
> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 5 after 30.0 seconds, giving up and returning errors.
> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 6 after 30.0 seconds, giving up and returning errors.
> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 2 after 30.0 seconds, giving up and returning errors.
> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 3 after 30.0 seconds, giving up and returning errors.
> (1839,0):dlm_request_join:1035 ERROR: status = -107
> (1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
> (1839,0):dlm_join_domain:1487 ERROR: status = -107
> (1839,0):dlm_register_domain:1753 ERROR: status = -107
> (1839,0):o2cb_cluster_connect:313 ERROR: status = -107
> (1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
> (1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
> ocfs2: Unmounting device (253,1) on (node 0)
>
> So clearly the ocfs2 service thinks it cannot connect to the nodes, but nmap
> sees the connection just fine. And Web2 can see the port on web1 just fine,
> so there is no firewall blocking the connections.
>
> I think it might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel module and
> CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?
>
> David
> -----Original Message-----
> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
> Sent: Thursday, March 25, 2010 6:46 PM
> To: David Murphy
> Cc: ocfs2-users@oss.oracle.com
> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>
> hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf and
> populates configfs. AFAIK.
>
> David Murphy wrote:
>
>> We had 6 nodes running CentOS 5.4 using 1.4.3 ocfs2-tools.
>>
>> I decided to rebuild one node with FC12.
>>
>> Which is working fine, however
>>
>> Nmap 192.168.200.112 shows 7777 as open
>>
>> And
>>
>> O2cb_ctl is timing out when trying to connect to that node, which then
>> causes a 107 error. This happens with all nodes, and all nodes have 7777
>> open via nmap from the FC machine.
>>
>> Is there a way to further debug this to see what exactly o2cb_ctl is
>> seeing when trying to connect?
>>
>> David
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users@oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users