No

On Mar 29, 2010, at 8:10 PM, Angelo McComis <ang...@mccomis.com> wrote:
> Does it matter that the nodes are numbered 1-6 instead of 0-5?
>
> On Mon, Mar 29, 2010 at 4:25 PM, Sunil Mushran <sunil.mush...@oracle.com> wrote:
>> Enable some debugging.
>>
>> #debugfs.ocfs2 -l TCP allow
>> ...do mount...
>> #debugfs.ocfs2 -l TCP off
>>
>> David Murphy wrote:
>>> [r...@web2 ~]# nc -z 192.168.102.140 7777
>>> Connection to 192.168.102.140 7777 port [tcp/cbt] succeeded!
>>>
>>> [r...@web1 /etc/sysconfig/network-scripts]# nc -z 192.168.102.141 7777
>>> Connection to 192.168.102.141 7777 port [tcp/cbt] succeeded!
>>>
>>> -----Original Message-----
>>> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
>>> Sent: Monday, March 29, 2010 5:08 PM
>>> To: David Murphy
>>> Cc: ocfs2-users@oss.oracle.com
>>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>>
>>> What happens when you use netcat to ping the node?
>>> nc -z host.example.com 7777
>>>
>>> David Murphy wrote:
>>>
>>>> Some additional data:
>>>> From Web1 (New Fedora Machine) to Web2:
>>>> [r...@web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141
>>>>
>>>> Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
>>>> Nmap scan report for 192.168.102.141
>>>> Host is up (0.000076s latency).
>>>> Not shown: 993 closed ports
>>>> PORT     STATE SERVICE
>>>> 22/tcp   open  ssh
>>>> 80/tcp   open  http
>>>> 81/tcp   open  hosts2-ns
>>>> 111/tcp  open  rpcbind
>>>> 5666/tcp open  nrpe
>>>> 7777/tcp open  unknown
>>>> 9102/tcp open  jetdirect
>>>> MAC Address: 00:50:56:A3:58:5D (VMware)
>>>>
>>>> Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds
>>>>
>>>> From web2 -> web1 (new fedora machine)
>>>> [r...@web2 ~]# nmap 192.168.102.140
>>>>
>>>> Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
>>>> Interesting ports on 192.168.102.140:
>>>> Not shown: 994 closed ports
>>>> PORT     STATE SERVICE
>>>> 22/tcp   open  ssh
>>>> 80/tcp   open  http
>>>> 81/tcp   open  hosts2-ns
>>>> 111/tcp  open  rpcbind
>>>> 443/tcp  open  https
>>>> 7777/tcp open  unknown
>>>> MAC Address: 00:50:56:A3:14:62 (VMWare)
>>>>
>>>> Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds
>>>>
>>>> Cluster.conf:
>>>> cluster:
>>>> 	node_count = 6
>>>> 	name = appshare
>>>>
>>>> node:
>>>> 	ip_port = 7777
>>>> 	ip_address = 192.168.102.140
>>>> 	number = 1
>>>> 	name = web1
>>>> 	cluster = appshare
>>>>
>>>> node:
>>>> 	ip_port = 7777
>>>> 	ip_address = 192.168.102.141
>>>> 	number = 2
>>>> 	name = web2
>>>> 	cluster = appshare
>>>>
>>>> node:
>>>> 	ip_port = 7777
>>>> 	ip_address = 192.168.102.142
>>>> 	number = 3
>>>> 	name = web3
>>>> 	cluster = appshare
>>>>
>>>> node:
>>>> 	ip_port = 7777
>>>> 	ip_address = 192.168.102.111
>>>> 	number = 4
>>>> 	name = rgapp1
>>>> 	cluster = appshare
>>>>
>>>> node:
>>>> 	ip_port = 7777
>>>> 	ip_address = 192.168.102.122
>>>> 	number = 5
>>>> 	name = deploy
>>>> 	cluster = appshare
>>>>
>>>> node:
>>>> 	ip_port = 7777
>>>> 	ip_address = 192.168.102.112
>>>> 	number = 6
>>>> 	name = app1
>>>> 	cluster = appshare
>>>>
>>>> DMESG on WEB1:
>>>> OCFS2 1.5.0
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
>>>> (1262,0):dlm_request_join:1035 ERROR: status = -107
>>>> (1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>> (1262,0):dlm_join_domain:1487 ERROR: status = -107
>>>> (1262,0):dlm_register_domain:1753 ERROR: status = -107
>>>> (1262,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>> (1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>> (1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
>>>> (1323,0):dlm_request_join:1035 ERROR: status = -107
>>>> (1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>> (1323,0):dlm_join_domain:1487 ERROR: status = -107
>>>> (1323,0):dlm_register_domain:1753 ERROR: status = -107
>>>> (1323,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>> (1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>> (1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>> VMCI: Major device number is: 249
>>>> VMware memory control driver initialized
>>>> vmmemctl: started kernel thread pid=1522
>>>> ocfs2: Unregistered cluster interface o2cb
>>>> OCFS2 Node Manager 1.5.0
>>>> OCFS2 DLM 1.5.0
>>>> ocfs2: Registered cluster interface o2cb
>>>> OCFS2 DLMFS 1.5.0
>>>> OCFS2 User DLM kernel interface loaded
>>>> OCFS2 1.5.0
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
>>>> (1839,0):dlm_request_join:1035 ERROR: status = -107
>>>> (1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>> (1839,0):dlm_join_domain:1487 ERROR: status = -107
>>>> (1839,0):dlm_register_domain:1753 ERROR: status = -107
>>>> (1839,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>> (1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>> (1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>>
>>>> So clearly the ocfs2 service thinks it can't connect to the node,
>>>> but nmap sees the connection just fine. And Web2 can see the port
>>>> on web1 just fine, so there is no firewall blocking the connections.
>>>>
>>>> I think it might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel
>>>> module and CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?
>>>>
>>>> David
>>>> -----Original Message-----
>>>> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
>>>> Sent: Thursday, March 25, 2010 6:46 PM
>>>> To: David Murphy
>>>> Cc: ocfs2-users@oss.oracle.com
>>>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>>>
>>>> hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf
>>>> and populates configfs. AFAIK.
>>>>
>>>> David Murphy wrote:
>>>>
>>>>> We had 6 nodes running CentOS 5.4 using 1.4.3 ocfs2-tools.
>>>>>
>>>>> I decided to rebuild one node with FC12, which is working fine.
>>>>> However, nmap 192.168.200.112 shows 7777 as open, and o2cb_ctl is
>>>>> timing out when trying to connect to that node, which then causes
>>>>> a 107 error. This happens with all nodes, and all nodes have 7777
>>>>> open via nmap from the FC machine.
>>>>>
>>>>> Is there a way to further debug this to see what exactly o2cb_ctl
>>>>> is seeing when trying to connect?
>>>>>
>>>>> David
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users@oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
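The dmesg excerpt in this thread is long but regular: each mount attempt logs o2net timeouts for the other nodes, then the same `status = -107` chain from `dlm_request_join` up to `ocfs2_mount_volume`, then an unmount. A minimal Python sketch that groups the log that way follows, assuming each failed mount ends with an "ocfs2: Unmounting device" line as it does here; `summarize_mount_attempts` is a hypothetical helper, and its regular expressions match only the line shapes quoted in this thread, not a documented log format. The status is a negated Linux errno, and 107 is ENOTCONN ("Transport endpoint is not connected").

```python
import re

# The kernel logs negated Linux errno values. Only the code seen in this
# thread is mapped here: ENOTCONN is 107 in Linux's asm-generic/errno.h.
LINUX_ERRNO_NAMES = {107: "ENOTCONN"}

# Patterns written against the exact dmesg lines quoted in this thread.
EXPIRED_RE = re.compile(r"no connection established with node (\d+)")
STATUS_RE = re.compile(r"\):(\w+):\d+ ERROR: status = (-?\d+)")
UNMOUNT_RE = re.compile(r"ocfs2: Unmounting device")


def _empty_attempt():
    return {"unreachable_nodes": [], "failed_calls": []}


def summarize_mount_attempts(dmesg_text):
    """Group o2net/DLM errors by mount attempt.

    An attempt is assumed to end at each "ocfs2: Unmounting device" line;
    anything logged after the last such line is returned as a final,
    possibly incomplete, attempt.
    """
    attempts = []
    current = _empty_attempt()
    for line in dmesg_text.splitlines():
        m = EXPIRED_RE.search(line)
        if m:
            current["unreachable_nodes"].append(int(m.group(1)))
            continue
        m = STATUS_RE.search(line)
        if m:
            func, status = m.group(1), int(m.group(2))
            # Translate the negated errno; unknown codes stay numeric.
            name = LINUX_ERRNO_NAMES.get(-status, str(status))
            current["failed_calls"].append((func, name))
            continue
        if UNMOUNT_RE.search(line):
            attempts.append(current)
            current = _empty_attempt()
    if current["unreachable_nodes"] or current["failed_calls"]:
        attempts.append(current)
    return attempts
```

Run over the complete excerpt quoted in this thread, it should report three attempts, each failing with ENOTCONN; the second one lists nodes 2, 3, 5 and 6 only.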