Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....
Add one for --sport as well and I think you'll be ok. Actually, since my cluster traffic all goes over a separate switch, I usually just allow all traffic in/out of eth1.

Brian

Bret Palsson b...@getjive.com 2009-01-15 08:12:

So it looks like iptables is what is stopping it from working. After disabling iptables completely for 1 minute and then trying to mount on node 1, it worked fine.

So my new question is: why did `iptables -A INPUT -p tcp --dport -j ACCEPT ; service iptables save` not allow ocfs2 to talk? What do people add to their iptables?

-Bret

On Jan 14, 2009, at 4:50 PM, Sunil Mushran wrote:

It's part and parcel of the fs. If you want mainline linux, go to http://kernel.org.
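A minimal sketch of the kind of rules discussed above, written as a dry run that only prints the commands. The interface name `eth0`, the port `7777`, and the function name `ocfs2_rules` are assumptions for illustration. The port number does not appear in the thread; 7777 is the value commonly used in OCFS2 examples, so substitute whatever `ip_port` is set to in the cluster configuration. The sketch uses `-I` rather than `-A` because `-A` appends to the end of the chain, where a rule can sit behind an earlier REJECT or DROP rule and never match; whether that was the cause here is not established in the thread. `--sport` is the iptables spelling of the source-port match.

```shell
#!/bin/sh
# Dry run: print (do not apply) iptables rules for the cluster port.
# The interface and port passed in are assumptions; use the interface that
# carries cluster traffic and the ip_port value from the cluster config.
ocfs2_rules() {
    iface="$1"
    port="$2"
    # -I inserts at the top of the chain, ahead of any existing
    # REJECT/DROP rule; -A would append the rule after them.
    echo "iptables -I INPUT -i $iface -p tcp --dport $port -j ACCEPT"
    # Packets coming back from a peer's listening socket carry the
    # cluster port as their *source* port.
    echo "iptables -I INPUT -i $iface -p tcp --sport $port -j ACCEPT"
}

ocfs2_rules eth0 7777
```

Review the printed commands, run them as root on each node, and then persist them with `service iptables save` as in the original message.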
Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....
node 0 (and FS): OCFS2 1.4.1, kernel 2.6.18-92.1.22.el5xen
node 1: OCFS2 1.5, kernel 2.6.28-vs2.3.0.36.4

Output of Node 1 {
OCFS2 Node Manager 1.5.0
OCFS2 DLM 1.5.0
ocfs2: Registered cluster interface o2cb
OCFS2 DLMFS 1.5.0
OCFS2 User DLM kernel interface loaded
device eth0 entered promiscuous mode
OCFS2 1.5.0
}

On Jan 14, 2009, at 1:41 PM, Sunil Mushran wrote:

versions? kernel and fs.

Bret Palsson wrote:

Does anyone have any idea what to try next? Here are the steps I have taken and the problem. (I wanted to post my question on the first line before I explained the problem and what I have tried.)

Node 0 has the file system mounted just fine and works great. When trying to mount on Node 1:

`mount.ocfs2 /dev/mapper/data /cluster/data`

I get this error after about 30 seconds:

mount.ocfs2: Transport endpoint is not connected while mounting /dev/mapper/data on /cluster/data. Check 'dmesg' for more information on this error.

Here is the output of dmesg:

(3130,1):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(4670,1):dlm_request_join:1033 ERROR: status = -107
(4670,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(4670,1):dlm_join_domain:1485 ERROR: status = -107
(4670,1):dlm_register_domain:1732 ERROR: status = -107
(4670,1):o2cb_cluster_connect:302 ERROR: status = -107
(4670,1):ocfs2_dlm_init:2753 ERROR: status = -107
(4670,1):ocfs2_mount_volume:1274 ERROR: status = -107
ocfs2: Unmounting device (253,2) on (node 0)
(3130,0):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(5558,1):dlm_request_join:1033 ERROR: status = -107
(5558,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(5558,1):dlm_join_domain:1485 ERROR: status = -107
(5558,1):dlm_register_domain:1732 ERROR: status = -107
(5558,1):o2cb_cluster_connect:302 ERROR: status = -107
(5558,1):ocfs2_dlm_init:2753 ERROR: status = -107
(5558,1):ocfs2_mount_volume:1274 ERROR: status = -107
ocfs2: Unmounting device (253,2) on (node 0)

So I figured that it must be a firewall issue. I first disabled iptables on both machines and got the same results, so I started iptables and added an exception on both machines:

`iptables -A INPUT -p tcp --dport -j ACCEPT ; service iptables save`

The machines can ping each other, and they have the exact same config:

cluster:
	node_count = 2
	name = ocfs2

node:
	ip_port =
	ip_address = 10.128.255.3
	number = 0
	name = m3.c12.jiveip.net
	cluster = ocfs2

node:
	ip_port =
	ip_address = 10.128.7.33
	number = 1
	name = pbx_33.c12.jiveip.net
	cluster = ocfs2

I then decided to use tcpdump to see what's up (on both machines):

`tcpdump -i eth0 port -v`

Here is a tcpdump capture showing the port is not blocked (I added an exception in iptables):

(Node 0)
13:13:11.711539 IP (tos 0x0, ttl 64, id 18286, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xd272 (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294911253 0,nop,wscale 6>
13:13:14.710703 IP (tos 0x0, ttl 64, id 18287, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xc6ba (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
13:13:14.711213 IP (tos 0x0, ttl 64, id 2241, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.54763 > 10.128.255.3.cbt: S, cksum 0xd2ae (correct), 3862378508:3862378508(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>

(Node 1)
13:13:09.956999 IP (tos 0x0, ttl 64, id 18286, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xd272 (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294911253 0,nop,wscale 6>
13:13:12.956999 IP (tos 0x0, ttl 64, id 18287, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xc6ba (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
13:13:12.956999 IP (tos 0x0, ttl 64, id 2241, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.54763 > 10.128.255.3.cbt: S, cksum 0xd2ae (correct), 3862378508:3862378508(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>

___
Ocfs2-devel mailing list
ocfs2-de...@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel
___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
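The captures above show only connection attempts (the `S` flag, i.e. SYN) from node 1 and nothing coming back from node 0. One way to test the same thing without tcpdump is to try opening a TCP connection to the cluster port from the other node. A minimal sketch, assuming bash and coreutils `timeout` are installed; `probe_port` is a hypothetical helper name, and the port to test is whatever `ip_port` is set to in the cluster configuration.

```shell
#!/bin/sh
# Try to open a TCP connection to HOST:PORT within SECS seconds (default 3).
# Prints "open: ..." and returns 0 on success, otherwise prints
# "unreachable: ..." and returns 1.
probe_port() {
    local host="$1" port="$2" secs="${3:-3}"
    # /dev/tcp/HOST/PORT is a bash redirection feature, not a real file,
    # so the connection attempt is run inside an explicit bash process.
    if timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "open: $host:$port"
    else
        echo "unreachable: $host:$port"
        return 1
    fi
}
```

For example, running `probe_port 10.128.255.3 7777` on node 1 would report whether node 0 accepts connections on port 7777 (an assumed port number here). A timeout rather than an immediate refusal usually points at packets being dropped along the way.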
Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....
What about the dmesg on node 1?

Now ideally we want the fs versions to be the same on all nodes. However, as we have not changed the protocol since 1.4.1, this should still work.

Bret Palsson wrote:

node 0 (and FS): OCFS2 1.4.1, kernel 2.6.18-92.1.22.el5xen
node 1: OCFS2 1.5, kernel 2.6.28-vs2.3.0.36.4

Output of Node 1 {
OCFS2 Node Manager 1.5.0
OCFS2 DLM 1.5.0
ocfs2: Registered cluster interface o2cb
OCFS2 DLMFS 1.5.0
OCFS2 User DLM kernel interface loaded
device eth0 entered promiscuous mode
OCFS2 1.5.0
}
Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....
Output of Node 0 {
OCFS2 Node Manager 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 0f78045c75c0174e50e4cf0934bf9eae)
OCFS2 DLM 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 4ce8fae327880c466761f40fb7619490)
OCFS2 DLMFS 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 4ce8fae327880c466761f40fb7619490)
OCFS2 User DLM kernel interface loaded
SELinux: initialized (dev ocfs2_dlmfs, type ocfs2_dlmfs), not configured for labeling
eth3: no IPv6 routers present
OCFS2 1.4.1 Tue Dec 16 19:18:02 PST 2008 (build 3fc82af4b5669945497b322b6aabd031)
ocfs2_dlm: Nodes in domain (8B2CCF82F1BA4A70B587580B23D9D7F7): 0
kjournald starting. Commit interval 5 seconds
ocfs2: Mounting device (253,3) on (node 0, slot 0) with ordered data mode.
SELinux: initialized (dev dm-3, type ocfs2), not configured for labeling
ocfs2_dlm: Nodes in domain (222B65A090D6477481AD30DE9FCE7961): 0
kjournald starting. Commit interval 5 seconds
ocfs2: Mounting device (253,2) on (node 0, slot 0) with ordered data mode.
SELinux: initialized (dev dm-2, type ocfs2), not configured for labeling
ocfs2_dlm: Nodes in domain (0425C0367AF547E989864A46F3DBD6E6): 0
kjournald starting. Commit interval 5 seconds
ocfs2: Mounting device (253,4) on (node 0, slot 0) with ordered data mode.
SELinux: initialized (dev dm-4, type ocfs2), not configured for labeling
}

Output of Node 1 {
OCFS2 Node Manager 1.5.0
OCFS2 DLM 1.5.0
ocfs2: Registered cluster interface o2cb
OCFS2 DLMFS 1.5.0
OCFS2 User DLM kernel interface loaded
device eth0 entered promiscuous mode
OCFS2 1.5.0
}

On Jan 14, 2009, at 3:58 PM, Sunil Mushran wrote:

What about the dmesg on node 1?

Now ideally we want the fs versions to be the same on all nodes. However as we have not changed the protocol since 1.4.1, this should still work.
Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....
I know it sounds stupid, but I had this error, with similar dmesg output, when the mountpoint simply didn't exist (in my case, I mount /dev/sdc1 to /mnt/www, and /mnt/www didn't exist). It's worth checking at least, though I'm sure you already have.

Michael
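A minimal sketch of the check Michael describes: confirm that the mountpoint directory exists before attempting the mount. `check_mountpoint` is a hypothetical helper name; the device and path in the usage note are the ones from his example.

```shell
#!/bin/sh
# Report whether the given mountpoint directory exists.
# Returns 0 if it does, 1 if it does not.
check_mountpoint() {
    dir="$1"
    if [ -d "$dir" ]; then
        echo "ok: $dir exists"
    else
        echo "missing: $dir"
        return 1
    fi
}
```

It can guard the mount itself, for example `check_mountpoint /mnt/www && mount -t ocfs2 /dev/sdc1 /mnt/www`, so that a missing directory is reported before the mount is attempted.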
Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....
I hate cut-pastes because I have no idea whether I can trust them or not. A mistyped 0 or 1 makes a whole world of difference. But the following seems to indicate that the configuration is bad.

(3130,1):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(4670,1):dlm_request_join:1033 ERROR: status = -107
(4670,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(4670,1):dlm_join_domain:1485 ERROR: status = -107
(4670,1):dlm_register_domain:1732 ERROR: status = -107
(4670,1):o2cb_cluster_connect:302 ERROR: status = -107
(4670,1):ocfs2_dlm_init:2753 ERROR: status = -107
(4670,1):ocfs2_mount_volume:1274 ERROR: status = -107
ocfs2: Unmounting device (253,2) on (node 0)

Why is the mount failing on node 0? I thought it was mounted on node 0?

Maybe best if you file a bugzilla and attach the /var/log/messages of both nodes. Indicate the time you did the mount.

Sunil
Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....
AFAIR, mount will typically error out with something like "mountpoint does not exist". It should.

Michael Moody wrote:

I know it sounds stupid, but I had this error, with similar dmesg output, when the mountpoint simply didn't exist (in my case, I mount /dev/sdc1 to /mnt/www, and /mnt/www didn't exist). It's worth checking at least, though I'm sure you already have.

Michael
Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....
Well, the last time this happened to me, the error was not "mountpoint does not exist". But as that was a while ago, it's very possible that it is now.

Michael

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Wednesday, January 14, 2009 4:30 PM
To: Michael Moody
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting

AFAIR, mount will typically error out with something like "mountpoint does not exist". It should.
Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....
Can I get the source for DLM 1.5.0 and build it on my other machines? If so, where do I grab it?

Thanks,
Bret

On Jan 14, 2009, at 4:28 PM, Sunil Mushran wrote:

I hate cut-paste's because I have no idea whether I can trust it or not. A misspelled 0 and 1 makes a whole world of difference. But the following seems to indicate that the configuration is bad.

(3130,1):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(4670,1):dlm_request_join:1033 ERROR: status = -107
(4670,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(4670,1):dlm_join_domain:1485 ERROR: status = -107
(4670,1):dlm_register_domain:1732 ERROR: status = -107
(4670,1):o2cb_cluster_connect:302 ERROR: status = -107
(4670,1):ocfs2_dlm_init:2753 ERROR: status = -107
(4670,1):ocfs2_mount_volume:1274 ERROR: status = -107
ocfs2: Unmounting device (253,2) on (node 0)

Why is the mount failing on node 0? I thought it was mounted on node 0?

Maybe best if you file a bugzilla and attach the /var/log/messages of both nodes. Indicate the time you did the mount.

Sunil
Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....
It's part and parcel of the fs. If you want mainline linux, go to http://kernel.org.

Bret Palsson wrote:

Can I get the source for DLM 1.5.0 and build it on my other machines? If so, where do I grab it?

Thanks,
Bret