Add one for --sport as well and I think you'll be ok.  Actually, since
my cluster traffic all goes over a separate switch I usually just allow
all traffic in/out of eth1.
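
A minimal sketch of both options, written as a dry run that only prints the rules so they can be reviewed before anything is applied. The port 7777 comes from the cluster.conf quoted below; the interface name eth1 and the helper names port_rules and interface_rules are assumptions for illustration, not part of any OCFS2 tooling.

```shell
#!/bin/sh
# Dry-run sketch: print iptables rules for o2net traffic instead of applying them.
# PORT matches ip_port in cluster.conf; CLUSTER_IF is an assumed interface name.
PORT=7777
CLUSTER_IF=eth1

# Option 1: accept TCP packets that have the cluster port as destination or source.
port_rules() {
    echo "iptables -A INPUT -p tcp --dport $PORT -j ACCEPT"
    echo "iptables -A INPUT -p tcp --sport $PORT -j ACCEPT"
}

# Option 2: accept everything on a dedicated cluster interface.
interface_rules() {
    echo "iptables -A INPUT -i $CLUSTER_IF -j ACCEPT"
    echo "iptables -A OUTPUT -o $CLUSTER_IF -j ACCEPT"
}

# Note: -A appends to the end of the chain, so an earlier rule that rejects
# the packet still takes effect first. Check the order with `iptables -L -n`.
port_rules
```

To apply the printed rules, review the output and then run the lines as root, followed by `service iptables save` as in the original message.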

Brian

Bret Palsson <b...@getjive.com> 2009-01-15 08:12:
>    So it looks like iptables is what is stopping it from working. After
>    disabling iptables completely for 1 minute then trying to mount on node 1
>    it worked fine.
>    So my new question is: why did `iptables -A INPUT -p tcp --dport 7777 -j
>    ACCEPT ; service iptables save` not allow ocfs2 to talk?  What do people
>    add to their iptables?
>    -Bret
>    On Jan 14, 2009, at 4:50 PM, Sunil Mushran wrote:
> 
>      It's part and parcel of the fs. If you want mainline linux,
>      go to [1]http://kernel.org.
> 
>      Bret Palsson wrote:
> 
>        Can I get the source for DLM 1.5.0 and build it on my other machines?
> 
>        If so where do I grab it?
> 
>        Thanks,
> 
>        Bret
> 
>        On Jan 14, 2009, at 4:28 PM, Sunil Mushran wrote:
> 
>          I hate cut-pastes because I have no idea whether I can trust them
>          or not. A misspelled 0 or 1 makes a whole world of difference.
>          But the following seems to indicate that the configuration is bad.
> 
>          (3130,1):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
> 
>          (4670,1):dlm_request_join:1033 ERROR: status = -107
> 
>          (4670,1):dlm_try_to_join_domain:1207 ERROR: status = -107
> 
>          (4670,1):dlm_join_domain:1485 ERROR: status = -107
> 
>          (4670,1):dlm_register_domain:1732 ERROR: status = -107
> 
>          (4670,1):o2cb_cluster_connect:302 ERROR: status = -107
> 
>          (4670,1):ocfs2_dlm_init:2753 ERROR: status = -107
> 
>          (4670,1):ocfs2_mount_volume:1274 ERROR: status = -107
> 
>          ocfs2: Unmounting device (253,2) on (node 0)
> 
>          Why is the mount failing on node 0? I thought it was mounted on
>          node 0?
> 
>          Maybe best if you file a bugzilla and attach the /var/log/messages
>          of both nodes. Indicate the time you did the mount.
> 
>          Sunil
> 
>          Bret Palsson wrote:
> 
>            Output of Node 0 {
> 
>            OCFS2 Node Manager 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 0f78045c75c0174e50e4cf0934bf9eae)
>            OCFS2 DLM 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 4ce8fae327880c466761f40fb7619490)
>            OCFS2 DLMFS 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 4ce8fae327880c466761f40fb7619490)
>            OCFS2 User DLM kernel interface loaded
>            SELinux: initialized (dev ocfs2_dlmfs, type ocfs2_dlmfs), not configured for labeling
>            eth3: no IPv6 routers present
>            OCFS2 1.4.1 Tue Dec 16 19:18:02 PST 2008 (build 3fc82af4b5669945497b322b6aabd031)
>            ocfs2_dlm: Nodes in domain ("8B2CCF82F1BA4A70B587580B23D9D7F7"): 0
>            kjournald starting.  Commit interval 5 seconds
>            ocfs2: Mounting device (253,3) on (node 0, slot 0) with ordered data mode.
>            SELinux: initialized (dev dm-3, type ocfs2), not configured for labeling
>            ocfs2_dlm: Nodes in domain ("222B65A090D6477481AD30DE9FCE7961"): 0
>            kjournald starting.  Commit interval 5 seconds
>            ocfs2: Mounting device (253,2) on (node 0, slot 0) with ordered data mode.
>            SELinux: initialized (dev dm-2, type ocfs2), not configured for labeling
>            ocfs2_dlm: Nodes in domain ("0425C0367AF547E989864A46F3DBD6E6"): 0
>            kjournald starting.  Commit interval 5 seconds
>            ocfs2: Mounting device (253,4) on (node 0, slot 0) with ordered data mode.
>            SELinux: initialized (dev dm-4, type ocfs2), not configured for labeling
> 
>            }
> 
>            Output of Node 1 {
> 
>            OCFS2 Node Manager 1.5.0
> 
>            OCFS2 DLM 1.5.0
> 
>            ocfs2: Registered cluster interface o2cb
> 
>            OCFS2 DLMFS 1.5.0
> 
>            OCFS2 User DLM kernel interface loaded
> 
>            device eth0 entered promiscuous mode
> 
>            OCFS2 1.5.0
> 
>            }
> 
>            On Jan 14, 2009, at 3:58 PM, Sunil Mushran wrote:
> 
>              What about the dmesg on node 1?
> 
>              Now ideally we want the fs versions to be the same on all nodes.
> 
>              However as we have not changed the protocol since 1.4.1, this
> 
>              should still work.
> 
>              Bret Palsson wrote:
> 
>                node 0 (and FS) OCFS2 1.4.1 2.6.18-92.1.22.el5xen
> 
>                node 1 OCFS2 1.5 2.6.28-vs2.3.0.36.4
> 
>                Output of Node 1 {
> 
>                OCFS2 Node Manager 1.5.0
> 
>                OCFS2 DLM 1.5.0
> 
>                ocfs2: Registered cluster interface o2cb
> 
>                OCFS2 DLMFS 1.5.0
> 
>                OCFS2 User DLM kernel interface loaded
> 
>                device eth0 entered promiscuous mode
> 
>                OCFS2 1.5.0
> 
>                }
> 
>                On Jan 14, 2009, at 1:41 PM, Sunil Mushran wrote:
> 
>                  versions? kernel and fs.
> 
>                  Bret Palsson wrote:
> 
>                    Does anyone have any idea what to try next? Here are the
>                    steps I have taken and the problem. (I wanted to post my
>                    question on the first line before I explained the problem
>                    and what I have tried.)
> 
>                    ----------
> 
>                    Node 0 has the file system mounted just fine and works
>                    great.
> 
>                    When trying to mount on Node 1 with `mount.ocfs2
>                    /dev/mapper/data /cluster/data`, I get this error after
>                    about 30 seconds:
> 
>                    mount.ocfs2: Transport endpoint is not connected while
>                    mounting /dev/mapper/data on /cluster/data. Check 'dmesg'
>                    for more information on this error.
> 
>                    Here is the output of dmesg:
> 
>                    (3130,1):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
>                    (4670,1):dlm_request_join:1033 ERROR: status = -107
>                    (4670,1):dlm_try_to_join_domain:1207 ERROR: status = -107
>                    (4670,1):dlm_join_domain:1485 ERROR: status = -107
>                    (4670,1):dlm_register_domain:1732 ERROR: status = -107
>                    (4670,1):o2cb_cluster_connect:302 ERROR: status = -107
>                    (4670,1):ocfs2_dlm_init:2753 ERROR: status = -107
>                    (4670,1):ocfs2_mount_volume:1274 ERROR: status = -107
>                    ocfs2: Unmounting device (253,2) on (node 0)
>                    (3130,0):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
>                    (5558,1):dlm_request_join:1033 ERROR: status = -107
>                    (5558,1):dlm_try_to_join_domain:1207 ERROR: status = -107
>                    (5558,1):dlm_join_domain:1485 ERROR: status = -107
>                    (5558,1):dlm_register_domain:1732 ERROR: status = -107
>                    (5558,1):o2cb_cluster_connect:302 ERROR: status = -107
>                    (5558,1):ocfs2_dlm_init:2753 ERROR: status = -107
>                    (5558,1):ocfs2_mount_volume:1274 ERROR: status = -107
>                    ocfs2: Unmounting device (253,2) on (node 0)
> 
>                    So I figured that it must be a firewall issue. I first
>                    disabled iptables on both machines and got the same
>                    results, so I started iptables again, adding an exception
>                    on both machines: `iptables -A INPUT -p tcp --dport 7777
>                    -j ACCEPT ; service iptables save`
> 
>                    The machines can ping each other, and they have the exact
>                    same config:
> 
>                    cluster:
>                      node_count = 2
>                      name = ocfs2
> 
>                    node:
>                      ip_port = 7777
>                      ip_address = 10.128.255.3
>                      number = 0
>                      name = m3.c12.jiveip.net
>                      cluster = ocfs2
> 
>                    node:
>                      ip_port = 7777
>                      ip_address = 10.128.7.33
>                      number = 1
>                      name = pbx_33.c12.jiveip.net
>                      cluster = ocfs2
> 
>                    I then decided to use tcpdump to see what's up (on both
>                    machines):
> 
>                    `tcpdump -i eth0 port 7777 -v`
> 
>                    Here is a tcpdump capture showing port 7777 is not blocked
>                    (I added an exception in iptables):
> 
>                    (Node 0)
> 
>                    13:13:11.711539 IP (tos 0x0, ttl  64, id 18286, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xd272 (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294911253 0,nop,wscale 6>
> 
>                    13:13:14.710703 IP (tos 0x0, ttl  64, id 18287, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xc6ba (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
> 
>                    13:13:14.711213 IP (tos 0x0, ttl  64, id 2241, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.54763 > 10.128.255.3.cbt: S, cksum 0xd2ae (correct), 3862378508:3862378508(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
> 
>                    (Node 1)
> 
>                    13:13:09.956999 IP (tos 0x0, ttl  64, id 18286, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xd272 (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294911253 0,nop,wscale 6>
> 
>                    13:13:12.956999 IP (tos 0x0, ttl  64, id 18287, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xc6ba (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
> 
>                    13:13:12.956999 IP (tos 0x0, ttl  64, id 2241, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.54763 > 10.128.255.3.cbt: S, cksum 0xd2ae (correct), 3862378508:3862378508(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
> 
> References
> 
>    Visible links
>    1. http://kernel.org/

> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
