# debugfs.ocfs2 -l TCP allow /dev/mapper/OCFS2_200Gp1

Enable it by "allow"ing the tracing.
Also, do it on both nodes: the node you are mounting on and any one other
node, say node 2.

David Murphy wrote:
> [r...@web1 /dev]# debugfs.ocfs2 -l TCP off /dev/mapper/OCFS2_200Gp1
> [r...@web1 /dev]# mount /dev/mapper/OCFS2_200Gp1 -v
> device=/dev/mapper/OCFS2_200Gp1
> mount.ocfs2: Transport endpoint is not connected while mounting
> /dev/mapper/OCFS2_200Gp1 on /mnt/appshare. Check 'dmesg' for more
> information on this error.
> [r...@web1 /dev]#dmesg
>
> DMESG:
> Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656
> ERROR: no connection established with node 2 after 30.0 seconds, giving up
> and returning errors.
> Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656
> ERROR: no connection established with node 3 after 30.0 seconds, giving up
> and returning errors.
> Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656
> ERROR: no connection established with node 4 after 30.0 seconds, giving up
> and returning errors.
> Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656
> ERROR: no connection established with node 5 after 30.0 seconds, giving up
> and returning errors.
> Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656
> ERROR: no connection established with node 6 after 30.0 seconds, giving up
> and returning errors.
> Mar 30 10:23:38 web1 kernel: (1740,0):dlm_request_join:1035 ERROR:
> status = -107
> Mar 30 10:23:38 web1 kernel: (1740,0):dlm_try_to_join_domain:1209
> ERROR: status = -107
> Mar 30 10:23:38 web1 kernel: (1740,0):dlm_join_domain:1487 ERROR:
> status = -107
> Mar 30 10:23:38 web1 kernel: (1740,0):dlm_register_domain:1753
> ERROR: status = -107
> Mar 30 10:23:38 web1 kernel: (1740,0):o2cb_cluster_connect:313
> ERROR: status = -107
> Mar 30 10:23:38 web1 kernel: (1740,0):ocfs2_dlm_init:2963 ERROR:
> status = -107
> Mar 30 10:23:38 web1 kernel: (1740,0):ocfs2_mount_volume:1788 ERROR:
> status = -107
> Mar 30 10:23:38 web1 kernel: ocfs2: Unmounting device (253,1) on
> (node 0)
>
> DEBUGFS:
> debugfs: curdev
> /dev/mapper/OCFS2_200Gp1
> debugfs: controld dump
> controld: Unable to access cluster service while obtaining the debug
> buffer
> debugfs: slotmap
>   Slot#   Node#
>       0       3
>       1       5
>       2       2
>       4       4
>       5       6
> debugfs: stats
>   Revision: 0.90
>   Mount Count: 0   Max Mount Count: 20
>   State: 0   Errors: 0
>   Check Interval: 0   Last Check: Mon Mar 29 10:53:52 2010
>   Creator OS: 0
>   Feature Compat: 1 backup-super
>   Feature Incompat: 16 sparse
>   Tunefs Incomplete: 0
>   Feature RO compat: 1 unwritten
>   Root Blknum: 5   System Dir Blknum: 6
>   First Cluster Group Blknum: 3
>   Block Size Bits: 12   Cluster Size Bits: 12
>   Max Node Slots: 6
>   Extended Attributes Inline Size: 0
>   Label: OCFS2_APPSHARE_200G
>   UUID: D6E0DD0AAC8844ED94A4A459FBB6F7FF
>   UUID_hash: 0 (0x0)
>   Cluster stack: classic o2cb
>   Inode: 2   Mode: 00   Generation: 2428834932 (0x90c51474)
>   FS Generation: 2428834932 (0x90c51474)
>   CRC32: 00000000   ECC: 0000
>   Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
>   Dynamic Features: (0x0)
>   User: 0 (root)   Group: 0 (root)   Size: 0
>   Links: 0   Clusters: 52428119
>   ctime: 0x4a0b2372 -- Wed May 13 14:45:54 2009
>   atime: 0x0 -- Wed Dec 31 18:00:00 1969
>   mtime: 0x4a0b2372 -- Wed May 13 14:45:54 2009
>   dtime: 0x0 -- Wed Dec 31 18:00:00 1969
>   ctime_nsec: 0x00000000 -- 0
>   atime_nsec: 0x00000000 -- 0
>   mtime_nsec: 0x00000000 -- 0
>   Last Extblk: 0
>   Sub Alloc Slot: Global   Sub Alloc Bit: 65535
>
> It doesn't appear any extra debug logging actually was created.
>
> David
> -----Original Message-----
> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
> Sent: Monday, March 29, 2010 10:23 PM
> To: Angelo McComis
> Cc: David Murphy; ocfs2-users@oss.oracle.com
> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>
> No
>
> On Mar 29, 2010, at 8:10 PM, Angelo McComis <ang...@mccomis.com> wrote:
>
>> Does it matter that the nodes are numbered 1-6 instead of 0-5?
>>
>> On Mon, Mar 29, 2010 at 4:25 PM, Sunil Mushran
>> <sunil.mush...@oracle.com> wrote:
>>> Enable some debugging.
>>>
>>> #debugfs.ocfs2 -l TCP allow
>>> ...do mount...
>>> #debugfs.ocfs2 -l TCP off
>>>
>>> David Murphy wrote:
>>>
>>>> [r...@web2 ~]# nc -z 192.168.102.140 7777
>>>> Connection to 192.168.102.140 7777 port [tcp/cbt] succeeded!
>>>>
>>>> [r...@web1 /etc/sysconfig/network-scripts]# nc -z 192.168.102.141 7777
>>>> Connection to 192.168.102.141 7777 port [tcp/cbt] succeeded!
>>>>
>>>> -----Original Message-----
>>>> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
>>>> Sent: Monday, March 29, 2010 5:08 PM
>>>> To: David Murphy
>>>> Cc: ocfs2-users@oss.oracle.com
>>>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>>>
>>>> What happens when you use netcat to ping the node?
>>>> nc -z host.example.com 7777
>>>>
>>>> David Murphy wrote:
>>>>
>>>>> Some additional data:
>>>>> From Web1 (New Fedora Machine) to Web2:
>>>>> [r...@web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141
>>>>>
>>>>> Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
>>>>> Nmap scan report for 192.168.102.141
>>>>> Host is up (0.000076s latency).
>>>>> Not shown: 993 closed ports
>>>>> PORT     STATE SERVICE
>>>>> 22/tcp   open  ssh
>>>>> 80/tcp   open  http
>>>>> 81/tcp   open  hosts2-ns
>>>>> 111/tcp  open  rpcbind
>>>>> 5666/tcp open  nrpe
>>>>> 7777/tcp open  unknown
>>>>> 9102/tcp open  jetdirect
>>>>> MAC Address: 00:50:56:A3:58:5D (VMware)
>>>>>
>>>>> Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds
>>>>>
>>>>> From web2 -> web1 (new fedora machine)
>>>>> [r...@web2 ~]# nmap 192.168.102.140
>>>>>
>>>>> Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
>>>>> Interesting ports on 192.168.102.140:
>>>>> Not shown: 994 closed ports
>>>>> PORT     STATE SERVICE
>>>>> 22/tcp   open  ssh
>>>>> 80/tcp   open  http
>>>>> 81/tcp   open  hosts2-ns
>>>>> 111/tcp  open  rpcbind
>>>>> 443/tcp  open  https
>>>>> 7777/tcp open  unknown
>>>>> MAC Address: 00:50:56:A3:14:62 (VMWare)
>>>>>
>>>>> Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds
>>>>>
>>>>> Cluster.conf:
>>>>> cluster:
>>>>>     node_count = 6
>>>>>     name = appshare
>>>>>
>>>>> node:
>>>>>     ip_port = 7777
>>>>>     ip_address = 192.168.102.140
>>>>>     number = 1
>>>>>     name = web1
>>>>>     cluster = appshare
>>>>>
>>>>> node:
>>>>>     ip_port = 7777
>>>>>     ip_address = 192.168.102.141
>>>>>     number = 2
>>>>>     name = web2
>>>>>     cluster = appshare
>>>>>
>>>>> node:
>>>>>     ip_port = 7777
>>>>>     ip_address = 192.168.102.142
>>>>>     number = 3
>>>>>     name = web3
>>>>>     cluster = appshare
>>>>>
>>>>> node:
>>>>>     ip_port = 7777
>>>>>     ip_address = 192.168.102.111
>>>>>     number = 4
>>>>>     name = rgapp1
>>>>>     cluster = appshare
>>>>>
>>>>> node:
>>>>>     ip_port = 7777
>>>>>     ip_address = 192.168.102.122
>>>>>     number = 5
>>>>>     name = deploy
>>>>>     cluster = appshare
>>>>>
>>>>> node:
>>>>>     ip_port = 7777
>>>>>     ip_address = 192.168.102.112
>>>>>     number = 6
>>>>>     name = app1
>>>>>     cluster = appshare
>>>>>
>>>>> DMESG on WEB1:
>>>>> OCFS2 1.5.0
>>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 2 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 3 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 4 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 5 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 6 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1262,0):dlm_request_join:1035 ERROR: status = -107
>>>>> (1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>>> (1262,0):dlm_join_domain:1487 ERROR: status = -107
>>>>> (1262,0):dlm_register_domain:1753 ERROR: status = -107
>>>>> (1262,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>>> (1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>>> (1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 2 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 3 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 5 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 6 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1323,0):dlm_request_join:1035 ERROR: status = -107
>>>>> (1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>>> (1323,0):dlm_join_domain:1487 ERROR: status = -107
>>>>> (1323,0):dlm_register_domain:1753 ERROR: status = -107
>>>>> (1323,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>>> (1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>>> (1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>>> VMCI: Major device number is: 249
>>>>> VMware memory control driver initialized
>>>>> vmmemctl: started kernel thread pid=1522
>>>>> ocfs2: Unregistered cluster interface o2cb
>>>>> OCFS2 Node Manager 1.5.0
>>>>> OCFS2 DLM 1.5.0
>>>>> ocfs2: Registered cluster interface o2cb
>>>>> OCFS2 DLMFS 1.5.0
>>>>> OCFS2 User DLM kernel interface loaded
>>>>> OCFS2 1.5.0
>>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 4 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 5 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 6 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 2 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>>> established with node 3 after 30.0 seconds, giving up and returning
>>>>> errors.
>>>>> (1839,0):dlm_request_join:1035 ERROR: status = -107
>>>>> (1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>>> (1839,0):dlm_join_domain:1487 ERROR: status = -107
>>>>> (1839,0):dlm_register_domain:1753 ERROR: status = -107
>>>>> (1839,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>>> (1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>>> (1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>>>
>>>>> So clearly the ocfs2 service thinks it can't connect to the node,
>>>>> but nmap sees the connection just fine. And web2 can see the port
>>>>> on web1 just fine, so there is no firewall blocking the connections.
>>>>>
>>>>> I think it might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel
>>>>> module and CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?
>>>>>
>>>>> David
>>>>> -----Original Message-----
>>>>> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
>>>>> Sent: Thursday, March 25, 2010 6:46 PM
>>>>> To: David Murphy
>>>>> Cc: ocfs2-users@oss.oracle.com
>>>>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>>>>
>>>>> hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf
>>>>> and populates configfs. AFAIK.
>>>>>
>>>>> David Murphy wrote:
>>>>>
>>>>>> We had 6 nodes running CentOS 5.4 using ocfs2-tools 1.4.3.
>>>>>>
>>>>>> I decided to rebuild one node with FC12, which is working fine.
>>>>>> However, nmap 192.168.200.112 shows 7777 as open, and o2cb_ctl is
>>>>>> timing out when trying to connect to that node, which then causes
>>>>>> a 107 error. This happens with all nodes, and all nodes have 7777
>>>>>> open via nmap from the FC machine.
>>>>>>
>>>>>> Is there a way to further debug this to see what exactly o2cb_ctl is
>>>>>> seeing when trying to connect?
>>>>>>
>>>>>> David
>>>>>>
>>>>>> _______________________________________________
>>>>>> Ocfs2-users mailing list
>>>>>> Ocfs2-users@oss.oracle.com
>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
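
For anyone who wants to repeat the nc check from this thread against every node listed in cluster.conf, here is a minimal sketch. It is not part of ocfs2-tools: the helper names list_nodes and check_nodes are made up for illustration, the 5-second timeout is an arbitrary choice, and it assumes an nc that accepts -z and -w and a cluster.conf laid out like the one quoted in the thread. Keep in mind that in this thread the port check succeeded while the mount still failed with -107, so a reachable port rules out only the simplest causes.

```shell
# Hypothetical helpers, not shipped with ocfs2-tools.

# Print "name ip port" for every node: stanza in an o2cb cluster.conf.
list_nodes() {
    awk '
        # Print the stanza collected so far (if it was a node), then reset.
        function emit() {
            if (in_node && name != "") print name, ip, port
            name = ""; ip = ""; port = ""
        }
        # A new stanza header closes the previous stanza.
        $1 == "node:" || $1 == "cluster:" {
            emit()
            in_node = ($1 == "node:")
            next
        }
        # Inside a node stanza, pick up the "key = value" pairs we need.
        in_node && $2 == "=" {
            if ($1 == "name") name = $3
            else if ($1 == "ip_address") ip = $3
            else if ($1 == "ip_port") port = $3
        }
        END { emit() }
    ' "$1"
}

# Try a plain TCP connect to each node, as "nc -z host 7777" does by hand.
check_nodes() {
    list_nodes "$1" | while read -r name ip port; do
        # </dev/null keeps nc from consuming the node list on stdin.
        if nc -z -w 5 "$ip" "$port" </dev/null >/dev/null 2>&1; then
            echo "$name ($ip:$port): reachable"
        else
            echo "$name ($ip:$port): NOT reachable"
        fi
    done
}
```

Typical use would be `check_nodes /etc/ocfs2/cluster.conf`, run from the node that fails to mount and again from one of the existing nodes, since the o2net connections in the dmesg output are made in both directions.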