Does it matter that the nodes are numbered 1-6 instead of 0-5?
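
To make the numbering easy to inspect, here is a minimal Python sketch that lists the node numbers a cluster.conf assigns. It assumes the stanza layout quoted below (a `cluster:` or `node:` header followed by `key = value` lines). The function names and the trimmed two-node sample are illustrative only and are not part of ocfs2-tools. It only reports what the file says; it does not decide whether starting at 1 is a problem.

```python
import re

# Trimmed two-node sample in the layout quoted in this thread (assumption:
# the real file has all six nodes and node_count = 6).
SAMPLE = """
cluster:
    node_count = 2
    name = appshare

node:
    ip_port = 7777
    ip_address = 192.168.102.140
    number = 1
    name = web1
    cluster = appshare

node:
    ip_port = 7777
    ip_address = 192.168.102.141
    number = 2
    name = web2
    cluster = appshare
"""


def parse_cluster_conf(text):
    """Split the text into stanzas: a 'word:' header, then 'key = value' lines."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        header = re.fullmatch(r"(\w+):", line)
        if header:
            current = {"_type": header.group(1)}
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return stanzas


def summarize_nodes(stanzas):
    """Report the node numbers, duplicates, and whether node_count matches."""
    nodes = [s for s in stanzas if s["_type"] == "node"]
    clusters = [s for s in stanzas if s["_type"] == "cluster"]
    numbers = sorted(int(n["number"]) for n in nodes)
    declared = int(clusters[0]["node_count"]) if clusters else None
    return {
        "numbers": numbers,
        "starts_at_zero": bool(numbers) and numbers[0] == 0,
        "duplicates": sorted({n for n in numbers if numbers.count(n) > 1}),
        "declared_count": declared,
        "count_matches": declared == len(nodes),
    }


summary = summarize_nodes(parse_cluster_conf(SAMPLE))
print(summary)
```

Running the same two functions on the contents of the real cluster.conf from each node would show whether every machine agrees on the numbering.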

On Mon, Mar 29, 2010 at 4:25 PM, Sunil Mushran <sunil.mush...@oracle.com> wrote:
> Enable some debugging.
>
> #debugfs.ocfs2 -l TCP allow
> ...do mount...
> #debugfs.ocfs2 -l TCP off
>
>
> David Murphy wrote:
>> [r...@web2 ~]# nc -z 192.168.102.140 7777
>> Connection to 192.168.102.140 7777 port [tcp/cbt] succeeded!
>>
>> [r...@web1 /etc/sysconfig/network-scripts]# nc -z 192.168.102.141 7777
>> Connection to 192.168.102.141 7777 port [tcp/cbt] succeeded!
>>
>> -----Original Message-----
>> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
>> Sent: Monday, March 29, 2010 5:08 PM
>> To: David Murphy
>> Cc: ocfs2-users@oss.oracle.com
>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>
>> What happens when you use netcat to ping the node?
>> nc -z host.example.com 7777
>>
>> David Murphy wrote:
>>
>>> Some additional data:
>>> From Web1 ( New Fedora Machine) to Web2:
>>> [r...@web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141
>>>
>>> Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
>>> Nmap scan report for 192.168.102.141
>>> Host is up (0.000076s latency).
>>> Not shown: 993 closed ports
>>> PORT     STATE SERVICE
>>> 22/tcp   open  ssh
>>> 80/tcp   open  http
>>> 81/tcp   open  hosts2-ns
>>> 111/tcp  open  rpcbind
>>> 5666/tcp open  nrpe
>>> 7777/tcp open  unknown
>>> 9102/tcp open  jetdirect
>>> MAC Address: 00:50:56:A3:58:5D (VMware)
>>>
>>> Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds
>>>
>>>
>>> From web2 -> web1 (new fedora machine)
>>> [r...@web2 ~]# nmap 192.168.102.140
>>>
>>> Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
>>> Interesting ports on 192.168.102.140:
>>> Not shown: 994 closed ports
>>> PORT     STATE SERVICE
>>> 22/tcp   open  ssh
>>> 80/tcp   open  http
>>> 81/tcp   open  hosts2-ns
>>> 111/tcp  open  rpcbind
>>> 443/tcp  open  https
>>> 7777/tcp open  unknown
>>> MAC Address: 00:50:56:A3:14:62 (VMWare)
>>>
>>> Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds
>>>
>>>
>>> Cluster.conf:
>>> cluster:
>>>     node_count = 6
>>>     name = appshare
>>>
>>> node:
>>>     ip_port = 7777
>>>     ip_address = 192.168.102.140
>>>     number = 1
>>>     name = web1
>>>     cluster = appshare
>>>
>>> node:
>>>     ip_port = 7777
>>>     ip_address = 192.168.102.141
>>>     number = 2
>>>     name = web2
>>>     cluster = appshare
>>>
>>> node:
>>>     ip_port = 7777
>>>     ip_address = 192.168.102.142
>>>     number = 3
>>>     name = web3
>>>     cluster = appshare
>>>
>>> node:
>>>     ip_port = 7777
>>>     ip_address = 192.168.102.111
>>>     number = 4
>>>     name = rgapp1
>>>     cluster = appshare
>>>
>>> node:
>>>     ip_port = 7777
>>>     ip_address = 192.168.102.122
>>>     number = 5
>>>     name = deploy
>>>     cluster = appshare
>>>
>>> node:
>>>     ip_port = 7777
>>>     ip_address = 192.168.102.112
>>>     number = 6
>>>     name = app1
>>>     cluster = appshare
>>>
>>> DMESG on WEB1:
>>> OCFS2 1.5.0
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 2 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 3 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 4 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 5 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 6 after 30.0 seconds, giving up and returning errors.
>>> (1262,0):dlm_request_join:1035 ERROR: status = -107
>>> (1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>> (1262,0):dlm_join_domain:1487 ERROR: status = -107
>>> (1262,0):dlm_register_domain:1753 ERROR: status = -107
>>> (1262,0):o2cb_cluster_connect:313 ERROR: status = -107
>>> (1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>> (1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>> ocfs2: Unmounting device (253,1) on (node 0)
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 2 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 3 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 5 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 6 after 30.0 seconds, giving up and returning errors.
>>> (1323,0):dlm_request_join:1035 ERROR: status = -107
>>> (1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>> (1323,0):dlm_join_domain:1487 ERROR: status = -107
>>> (1323,0):dlm_register_domain:1753 ERROR: status = -107
>>> (1323,0):o2cb_cluster_connect:313 ERROR: status = -107
>>> (1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>> (1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>> ocfs2: Unmounting device (253,1) on (node 0)
>>> VMCI: Major device number is: 249
>>> VMware memory control driver initialized
>>> vmmemctl: started kernel thread pid=1522
>>> ocfs2: Unregistered cluster interface o2cb
>>> OCFS2 Node Manager 1.5.0
>>> OCFS2 DLM 1.5.0
>>> ocfs2: Registered cluster interface o2cb
>>> OCFS2 DLMFS 1.5.0
>>> OCFS2 User DLM kernel interface loaded
>>> OCFS2 1.5.0
>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 4 after 30.0 seconds, giving up and returning errors.
>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 5 after 30.0 seconds, giving up and returning errors.
>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 6 after 30.0 seconds, giving up and returning errors.
>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 2 after 30.0 seconds, giving up and returning errors.
>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 3 after 30.0 seconds, giving up and returning errors.
>>> (1839,0):dlm_request_join:1035 ERROR: status = -107
>>> (1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>> (1839,0):dlm_join_domain:1487 ERROR: status = -107
>>> (1839,0):dlm_register_domain:1753 ERROR: status = -107
>>> (1839,0):o2cb_cluster_connect:313 ERROR: status = -107
>>> (1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>> (1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>
>>> So clearly ocfs2 the service things it can connect to the node, but nmap
>>> sees the connection just fine. And Web2 can see the port on web1 just fine,
>>> so there is no firewall blocking the connections.
>>>
>>> I think it might be Fedora 12 used 1.50 for the OCFS kernel module and
>>> CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?
>>>
>>> David
>>> -----Original Message-----
>>> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
>>> Sent: Thursday, March 25, 2010 6:46 PM
>>> To: David Murphy
>>> Cc: ocfs2-users@oss.oracle.com
>>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>>
>>> hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf and
>>> populates configfs. AFAIK.
>>>
>>> David Murphy wrote:
>>>
>>>> We had 6 nodes running CentOS 5.4 using 1.4.3 ocfs2-tools.
>>>>
>>>> I decided to rebuild one node with FC12.
>>>>
>>>> Which is working fine, however
>>>>
>>>> Nmap 192.168.200.112 shows 7777 as open
>>>>
>>>> And
>>>>
>>>> O2cb_ctl is timing out when trying to connect to that node which then
>>>> causes a 107 error. This happens with all node and all node have 7777
>>>> open via nmap from the FC machine.
>>>>
>>>> Is there a way to further debug this to see what exactly o2cb_ctl is
>>>> seeing when trying to connect?
>>>>
>>>> David
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Ocfs2-users mailing list
>>>> Ocfs2-users@oss.oracle.com
>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users