I have replaced a dead node that was running in dual-primary mode with OCFS2. All the steps work:

`/proc/drbd`

    version: 8.3.13 (api:88/proto:86-96)
    GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by mockbu...@builder10.centos.org, 2012-05-07 11:56:36

     1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
        ns:81 nr:407832 dw:106657970 dr:266340 al:179 bm:6551 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

until I try to mount the volume:

    mount -t ocfs2 /dev/drbd1 /data/webroot/
    mount.ocfs2: Transport endpoint is not connected while mounting /dev/drbd1 on /data/webroot/. Check 'dmesg' for more information on this error.

`/var/log/kern.log`

    kernel: (o2net,11427,1):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
    kernel: (mount.ocfs2,12037,1):dlm_request_join:1036 ERROR: status = -107
    kernel: (mount.ocfs2,12037,1):dlm_try_to_join_domain:1210 ERROR: status = -107
    kernel: (mount.ocfs2,12037,1):dlm_join_domain:1488 ERROR: status = -107
    kernel: (mount.ocfs2,12037,1):dlm_register_domain:1754 ERROR: status = -107
    kernel: (mount.ocfs2,12037,1):ocfs2_dlm_init:2808 ERROR: status = -107
    kernel: (mount.ocfs2,12037,1):ocfs2_mount_volume:1447 ERROR: status = -107
    kernel: ocfs2: Unmounting device (147,1) on (node 1)

I'm sure that `/etc/ocfs2/cluster.conf` is identical on both nodes:

`/etc/ocfs2/cluster.conf`

    node:
        ip_port = 7777
        ip_address = 192.168.3.145
        number = 0
        name = SVR233NTC-3145.localdomain
        cluster = cpc

    node:
        ip_port = 7777
        ip_address = 192.168.2.93
        number = 1
        name = SVR022-293.localdomain
        cluster = cpc

    cluster:
        node_count = 2
        name = cpc

and the connection between them works:

    # nc -z 192.168.3.145 7777
    Connection to 192.168.3.145 7777 port [tcp/cbt] succeeded!

but the O2CB heartbeat is not active on the new node (192.168.2.93):

`/etc/init.d/o2cb status`

    Driver for "configfs": Loaded
    Filesystem "configfs": Mounted
    Driver for "ocfs2_dlmfs": Loaded
    Filesystem "ocfs2_dlmfs": Mounted
    Checking O2CB cluster cpc: Online
      Heartbeat dead threshold = 31
      Network idle timeout: 30000
      Network keepalive delay: 2000
      Network reconnect delay: 2000
    Checking O2CB heartbeat: Not active
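
The port check shown earlier covers only one address. As a rough sketch, it could be repeated from each node for every endpoint listed in `cluster.conf`. The helper name `node_endpoints` is hypothetical and not part of ocfs2-tools; it only assumes the `key = value` stanza layout shown above.

```shell
#!/bin/sh
# node_endpoints FILE (hypothetical helper): print one "ip port" pair per
# node stanza found in an O2CB cluster.conf-style file.
node_endpoints() {
    awk '
        # Print the pair collected for the previous stanza, then reset.
        function emit() {
            if (ip != "" && port != "") print ip, port
            ip = ""; port = ""
        }
        # A stanza header such as "node:" or "cluster:" starts a new block.
        /^[ \t]*[a-z]+:[ \t]*$/ { emit(); next }
        $1 == "ip_address" && $2 == "=" { ip = $3 }
        $1 == "ip_port"    && $2 == "=" { port = $3 }
        END { emit() }
    ' "$1"
}

# Possible usage, run on each node in turn:
#   node_endpoints /etc/ocfs2/cluster.conf | while read -r ip port; do
#       nc -z "$ip" "$port" || echo "cannot reach $ip:$port"
#   done
```

This only confirms that something accepts TCP connections on each listed address and port; it says nothing about whether O2CB accepts the peer once connected.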

Here is the `tcpdump` output captured on node 0 while starting `ocfs2` on node 1:

    1 0.000000 192.168.2.93 -> 192.168.3.145 TCP 70 55274 > cbt [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSval=690432180 TSecr=0
    2 0.000008 192.168.3.145 -> 192.168.2.93 TCP 70 cbt > 55274 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSval=707657223 TSecr=690432180
    3 0.000223 192.168.2.93 -> 192.168.3.145 TCP 66 55274 > cbt [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSval=690432181 TSecr=707657223
    4 0.000286 192.168.2.93 -> 192.168.3.145 TCP 98 55274 > cbt [PSH, ACK] Seq=1 Ack=1 Win=5840 Len=32 TSval=690432181 TSecr=707657223
    5 0.000292 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181
    6 0.000324 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [RST, ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181

This pattern repeats: node 0 answers with a `RST` as the sixth packet of every connection attempt.

What else can I do to debug this?

PS:

OCFS2 versions on node 0:

 - ocfs2-tools-1.4.4-1.el5
 - ocfs2-2.6.18-274.12.1.el5-1.4.7-1.el5

OCFS2 versions on node 1:

 - ocfs2-tools-1.4.4-1.el5
 - ocfs2-2.6.18-308.el5-1.4.7-1.el5
