What about the dmesg on node 1?

Ideally, we want the fs versions to be the same on all nodes.
However, as we have not changed the protocol since 1.4.1, this
should still work.

Bret Palsson wrote:
> node 0 (and FS): OCFS2 1.4.1, kernel 2.6.18-92.1.22.el5xen
> node 1: OCFS2 1.5, kernel 2.6.28-vs2.3.0.36.4
>
> Output of Node 1 {
> OCFS2 Node Manager 1.5.0
> OCFS2 DLM 1.5.0
> ocfs2: Registered cluster interface o2cb
> OCFS2 DLMFS 1.5.0
> OCFS2 User DLM kernel interface loaded
> device eth0 entered promiscuous mode
> OCFS2 1.5.0
> }
> On Jan 14, 2009, at 1:41 PM, Sunil Mushran wrote:
>
>   
>> Versions? Kernel and fs.
>>
>> Bret Palsson wrote:
>>     
>>> Does anyone have any idea what to try next? Here are the steps I have
>>> taken and the problem. (I wanted to post my question on the first
>>> line before I explained the problem and what I have tried.)
>>>
>>> ----------
>>>
>>> Node 0 has the file system mounted just fine and works great.
>>>
>>> When trying to mount on Node 1 with
>>> `mount.ocfs2 /dev/mapper/data /cluster/data`, I get this error after
>>> about 30 seconds:
>>>
>>> mount.ocfs2: Transport endpoint is not connected while mounting
>>> /dev/mapper/data on /cluster/data. Check 'dmesg' for more information
>>> on this error.
>>>
>>>
>>> Here is the output of dmesg:
>>> (3130,1):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
>>> (4670,1):dlm_request_join:1033 ERROR: status = -107
>>> (4670,1):dlm_try_to_join_domain:1207 ERROR: status = -107
>>> (4670,1):dlm_join_domain:1485 ERROR: status = -107
>>> (4670,1):dlm_register_domain:1732 ERROR: status = -107
>>> (4670,1):o2cb_cluster_connect:302 ERROR: status = -107
>>> (4670,1):ocfs2_dlm_init:2753 ERROR: status = -107
>>> (4670,1):ocfs2_mount_volume:1274 ERROR: status = -107
>>> ocfs2: Unmounting device (253,2) on (node 0)
>>> (3130,0):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
>>> (5558,1):dlm_request_join:1033 ERROR: status = -107
>>> (5558,1):dlm_try_to_join_domain:1207 ERROR: status = -107
>>> (5558,1):dlm_join_domain:1485 ERROR: status = -107
>>> (5558,1):dlm_register_domain:1732 ERROR: status = -107
>>> (5558,1):o2cb_cluster_connect:302 ERROR: status = -107
>>> (5558,1):ocfs2_dlm_init:2753 ERROR: status = -107
>>> (5558,1):ocfs2_mount_volume:1274 ERROR: status = -107
>>> ocfs2: Unmounting device (253,2) on (node 0)
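The `status = -107` lines follow the kernel's usual negated-errno convention: 107 is ENOTCONN on Linux, whose message is exactly the "Transport endpoint is not connected" that mount.ocfs2 printed. So the DLM and mount layers appear to be passing along the failed o2net connection rather than failing on their own. A minimal Python sketch for decoding such lines, assuming it runs on Linux (errno numbers differ between platforms); `decode_status` is a hypothetical helper, not an OCFS2 tool:

```python
import errno
import os
import re

# Matches the negative "status = -NNN" suffix carried by the OCFS2/DLM error lines.
STATUS_RE = re.compile(r"status = -(\d+)")


def decode_status(line):
    """Return (errno_name, message) for a dmesg line with a negative status, else None.

    Kernel code reports failures as negated errno values, so -107 means errno 107.
    Errno numbering is platform-specific; this assumes the script runs on Linux.
    """
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))  # the minus sign was consumed by the regex
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # On Linux this prints: ('ENOTCONN', 'Transport endpoint is not connected')
    print(decode_status("(4670,1):dlm_request_join:1033 ERROR: status = -107"))
```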
>>>
>>>
>>> So I figured that it must be a firewall issue. I first disabled
>>> iptables on both machines and got the same results, so I re-enabled
>>> iptables and added an exception on both machines:
>>> `iptables -A INPUT -p tcp --dport 7777 -j ACCEPT ; service iptables save`
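Ping only proves that ICMP gets through; a plain TCP connect with a timeout tells you directly whether node 0 answers on the o2net port. A minimal Python sketch, assuming it is run from node 1 against the node 0 address and port in the cluster.conf below; `probe` is a hypothetical helper, not an OCFS2 or iptables tool:

```python
import socket


def probe(host, port, timeout=5.0):
    """Try one TCP connect and classify the outcome.

    "open":    the three-way handshake completed, so something is listening.
    "refused": the peer (or a filter) answered with a reset or port-unreachable.
    "timeout": no answer at all, as with silently dropped SYNs.
    Anything else (e.g. host unreachable from an ICMP reject) is reported as an error.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc
```

For example, `probe("10.128.255.3", 7777)` on node 1 returning "timeout" would be consistent with SYNs that never get a reply, while "refused" would mean node 0 answered but nothing accepted the connection on that port.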
>>>
>>> The machines can ping each other, and they have the exact same
>>> config:
>>> cluster:
>>>     node_count = 2
>>>     name = ocfs2
>>> node:
>>>     ip_port = 7777
>>>     ip_address = 10.128.255.3
>>>     number = 0
>>>     name = m3.c12.jiveip.net
>>>     cluster = ocfs2
>>> node:
>>>     ip_port = 7777
>>>     ip_address = 10.128.7.33
>>>     number = 1
>>>     name = pbx_33.c12.jiveip.net
>>>     cluster = ocfs2
>>>
>>>
>>> I then decided to use tcpdump to see what's up (on both machines):
>>> `tcpdump -i eth0 port 7777 -v`
>>>
>>> Here is a tcpdump capture showing that port 7777 is not blocked (I
>>> added an exception in iptables):
>>> (Node 0)
>>> 13:13:11.711539 IP (tos 0x0, ttl  64, id 18286, offset 0, flags [DF],
>>> proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S,
>>> cksum 0xd272 (correct), 3820380795:3820380795(0) win 5840 <mss
>>> 1460,sackOK,timestamp 4294911253 0,nop,wscale 6>
>>> 13:13:14.710703 IP (tos 0x0, ttl  64, id 18287, offset 0, flags [DF],
>>> proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S,
>>> cksum 0xc6ba (correct), 3820380795:3820380795(0) win 5840 <mss
>>> 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
>>> 13:13:14.711213 IP (tos 0x0, ttl  64, id 2241, offset 0, flags [DF],
>>> proto: TCP (6), length: 60) 10.128.7.33.54763 > 10.128.255.3.cbt: S,
>>> cksum 0xd2ae (correct), 3862378508:3862378508(0) win 5840 <mss
>>> 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
>>>
>>> (Node 1)
>>> 13:13:09.956999 IP (tos 0x0, ttl  64, id 18286, offset 0, flags [DF],
>>> proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S,
>>> cksum 0xd272 (correct), 3820380795:3820380795(0) win 5840 <mss
>>> 1460,sackOK,timestamp 4294911253 0,nop,wscale 6>
>>> 13:13:12.956999 IP (tos 0x0, ttl  64, id 18287, offset 0, flags [DF],
>>> proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S,
>>> cksum 0xc6ba (correct), 3820380795:3820380795(0) win 5840 <mss
>>> 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
>>> 13:13:12.956999 IP (tos 0x0, ttl  64, id 2241, offset 0, flags [DF],
>>> proto: TCP (6), length: 60) 10.128.7.33.54763 > 10.128.255.3.cbt: S,
>>> cksum 0xd2ae (correct), 3862378508:3862378508(0) win 5840 <mss
>>> 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
>>>       
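In both captures every packet is a SYN from node 1 (10.128.7.33) to node 0 (10.128.255.3); `cbt` is only the /etc/services name tcpdump prints for port 7777 (run it with `-nn` to see numeric ports). The SYN from port 47601 is sent again with the same sequence number three seconds later, and no packet from node 0 back to node 1 appears in either capture: the SYNs reach node 0, but nothing comes back on this port. A minimal Python sketch that automates that reading of `tcpdump -v` output; the regexes assume the output format shown here, and `summarize` is a hypothetical helper:

```python
import re
from collections import Counter

# Each tcpdump -v record begins with a timestamp such as "13:13:11.711539 IP ".
RECORD_START = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{6} IP ")
# "10.128.7.33.47601 > 10.128.255.3.cbt:" -> src host, src port, dst host, dst port.
ENDPOINTS = re.compile(r"(\d+(?:\.\d+){3})\.(\w+) > (\d+(?:\.\d+){3})\.(\w+):")
# Sequence numbers as printed for a SYN, e.g. "3820380795:3820380795(0)".
SEQ = re.compile(r"\b(\d+):\d+\(\d+\)")


def summarize(capture):
    """Count packets per (src host, dst host) and list segments seen more than once."""
    text = " ".join(capture.split())        # undo the line wrapping
    records = RECORD_START.split(text)[1:]  # drop text before the first timestamp
    directions = Counter()
    seen = Counter()
    for rec in records:
        ends = ENDPOINTS.search(rec)
        if ends is None:
            continue
        src, sport, dst, _ = ends.groups()
        directions[(src, dst)] += 1
        seq = SEQ.search(rec)
        if seq:
            seen[(src, sport, seq.group(1))] += 1
    retransmitted = sorted(k for k, n in seen.items() if n > 1)
    return directions, retransmitted
```

A healthy handshake would also show a `("10.128.255.3", "10.128.7.33")` entry in `directions`; its absence is the one-way pattern described above.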

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users