Some additional data:

From Web1 (new Fedora machine) to Web2:

[r...@web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141

Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
Nmap scan report for 192.168.102.141
Host is up (0.000076s latency).
Not shown: 993 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
80/tcp   open  http
81/tcp   open  hosts2-ns
111/tcp  open  rpcbind
5666/tcp open  nrpe
7777/tcp open  unknown
9102/tcp open  jetdirect
MAC Address: 00:50:56:A3:58:5D (VMware)

Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds

From web2 -> web1 (new Fedora machine):

[r...@web2 ~]# nmap 192.168.102.140

Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
Interesting ports on 192.168.102.140:
Not shown: 994 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
80/tcp   open  http
81/tcp   open  hosts2-ns
111/tcp  open  rpcbind
443/tcp  open  https
7777/tcp open  unknown
MAC Address: 00:50:56:A3:14:62 (VMWare)

Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds

Cluster.conf:

cluster:
	node_count = 6
	name = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.140
	number = 1
	name = web1
	cluster = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.141
	number = 2
	name = web2
	cluster = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.142
	number = 3
	name = web3
	cluster = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.111
	number = 4
	name = rgapp1
	cluster = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.122
	number = 5
	name = deploy
	cluster = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.112
	number = 6
	name = app1
	cluster = appshare

DMESG on WEB1:

OCFS2 1.5.0
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
(1262,0):dlm_request_join:1035 ERROR: status = -107
(1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
(1262,0):dlm_join_domain:1487 ERROR: status = -107
(1262,0):dlm_register_domain:1753 ERROR: status = -107
(1262,0):o2cb_cluster_connect:313 ERROR: status = -107
(1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
(1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
ocfs2: Unmounting device (253,1) on (node 0)
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
(1323,0):dlm_request_join:1035 ERROR: status = -107
(1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
(1323,0):dlm_join_domain:1487 ERROR: status = -107
(1323,0):dlm_register_domain:1753 ERROR: status = -107
(1323,0):o2cb_cluster_connect:313 ERROR: status = -107
(1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
(1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
ocfs2: Unmounting device (253,1) on (node 0)
VMCI: Major device number is: 249
VMware memory control driver initialized
vmmemctl: started kernel thread pid=1522
ocfs2: Unregistered cluster interface o2cb
OCFS2 Node Manager 1.5.0
OCFS2 DLM 1.5.0
ocfs2: Registered cluster interface o2cb
OCFS2 DLMFS 1.5.0
OCFS2 User DLM kernel interface loaded
OCFS2 1.5.0
(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
(1839,0):dlm_request_join:1035 ERROR: status = -107
(1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
(1839,0):dlm_join_domain:1487 ERROR: status = -107
(1839,0):dlm_register_domain:1753 ERROR: status = -107
(1839,0):o2cb_cluster_connect:313 ERROR: status = -107
(1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
(1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
ocfs2: Unmounting device (253,1) on (node 0)

So clearly the ocfs2 service thinks it cannot connect to the other nodes, but nmap sees the port just fine.
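
For anyone who wants to repeat the port check against every node in one pass rather than running nmap by hand, here is a rough sketch in Python. It is only an illustration: the helper names (parse_cluster_conf, can_connect, check_nodes) are made up for this example and are not part of ocfs2-tools, and the file location /etc/ocfs2/cluster.conf mentioned below is assumed rather than taken from this thread. It does nothing more than a plain TCP connect to each node's ip_port, which is the same thing nmap already confirms, so a clean result would not explain why o2net gives up after 30 seconds. For reference, status = -107 corresponds to ENOTCONN ("Transport endpoint is not connected") on Linux.

```python
import socket


def parse_cluster_conf(text):
    """Parse cluster.conf-style text and return one dict per 'node:' stanza.

    Hypothetical helper for illustration; not part of ocfs2-tools.
    """
    nodes = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Stanza header: only 'node:' stanzas are collected,
            # the 'cluster:' stanza is skipped.
            current = {} if line == "node:" else None
            if current is not None:
                nodes.append(current)
        elif "=" in line and current is not None:
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
    return nodes


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connect to host:port succeeds."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False


def check_nodes(conf_text, timeout=3.0):
    """Return (name, ip_address, ip_port, reachable) for every node."""
    results = []
    for node in parse_cluster_conf(conf_text):
        ok = can_connect(node["ip_address"], node["ip_port"], timeout)
        results.append(
            (node.get("name", "?"), node["ip_address"], int(node["ip_port"]), ok)
        )
    return results
```

To use it, read the file (assumed to be /etc/ocfs2/cluster.conf) and print the result of check_nodes(open(path).read()) on the Fedora node. If every node reports reachable while dmesg still shows the o2net_connect_expired errors, the problem is more likely above the TCP layer than in the network path.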
And Web2 can see the port on web1 just fine, so there is no firewall blocking the connections.

I think it might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel module, while CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?

David

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Thursday, March 25, 2010 6:46 PM
To: David Murphy
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2

hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf
and populates configfs. AFAIK.

David Murphy wrote:
> We had 6 nodes running CentOS 5.4 using 1.4.3 ocfs2-tools.
>
> I decided to rebuild one node with FC12.
>
> Which is working fine, however
>
> Nmap 192.168.200.112 shows 7777 as open
>
> And
>
> O2cb_ctl is timing out when trying to connect to that node which then
> causes a 107 error. This happens with all node and all node have 7777
> open via nmap from the FC machine.
>
> Is there a way to further debug this to see what exactly o2cb_ctl is
> seeing when trying to connect?
>
> David
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users