Some additional data:
From web1 (new Fedora machine) to web2:
        [r...@web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141

        Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
        Nmap scan report for 192.168.102.141
        Host is up (0.000076s latency).
        Not shown: 993 closed ports
        PORT     STATE SERVICE
        22/tcp   open  ssh
        80/tcp   open  http
        81/tcp   open  hosts2-ns
        111/tcp  open  rpcbind
        5666/tcp open  nrpe
        7777/tcp open  unknown
        9102/tcp open  jetdirect
        MAC Address: 00:50:56:A3:58:5D (VMware)
        
        Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds


From web2 to web1 (new Fedora machine):
        [r...@web2 ~]# nmap 192.168.102.140
        
        Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
        Interesting ports on 192.168.102.140:
        Not shown: 994 closed ports
        PORT     STATE SERVICE
        22/tcp   open  ssh
        80/tcp   open  http
        81/tcp   open  hosts2-ns
        111/tcp  open  rpcbind
        443/tcp  open  https
        7777/tcp open  unknown
        MAC Address: 00:50:56:A3:14:62 (VMWare)

        Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds


Cluster.conf:
        cluster:
                node_count = 6
                name = appshare
        
        node:
                ip_port = 7777
                ip_address = 192.168.102.140
                number = 1
                name = web1
                cluster = appshare
        
        node:
                ip_port = 7777
                ip_address = 192.168.102.141
                number = 2
                name = web2
                cluster = appshare
        
        node:
                ip_port = 7777
                ip_address = 192.168.102.142
                number = 3
                name = web3
                cluster = appshare
        
        node:
                ip_port = 7777
                ip_address = 192.168.102.111
                number = 4
                name = rgapp1
                cluster = appshare
        
        node:
                ip_port = 7777
                ip_address = 192.168.102.122
                number = 5
                name = deploy
                cluster = appshare
        
        node:
                ip_port = 7777
                ip_address = 192.168.102.112
                number = 6
                name = app1
                cluster = appshare
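
To rule out a plain typo in the file above, here is a minimal sketch of a sanity check for the stanza layout shown (a "cluster:" or "node:" header followed by "key = value" lines). The function names parse_cluster_conf and check_cluster_conf are made up for illustration and are not part of ocfs2-tools; the checks only cover what is visible in this file (node_count versus the number of node stanzas, duplicate values, matching cluster name).

```python
def parse_cluster_conf(text):
    """Parse stanzas into a list of (stanza_type, {key: value}) tuples."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":") and "=" not in line:
            # A header such as "cluster:" or "node:" starts a new stanza.
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            key, _, value = line.partition("=")
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def check_cluster_conf(text):
    """Return a list of problems found; an empty list means none were found."""
    stanzas = parse_cluster_conf(text)
    clusters = [body for kind, body in stanzas if kind == "cluster"]
    nodes = [body for kind, body in stanzas if kind == "node"]
    if len(clusters) != 1:
        return [f"expected 1 cluster stanza, found {len(clusters)}"]
    cluster = clusters[0]
    problems = []
    declared = int(cluster.get("node_count", "0"))
    if declared != len(nodes):
        problems.append(f"node_count = {declared} but {len(nodes)} node stanzas")
    for field in ("number", "name", "ip_address"):
        values = [node.get(field) for node in nodes]
        if len(set(values)) != len(values):
            problems.append(f"duplicate {field} among node stanzas")
    for node in nodes:
        if node.get("cluster") != cluster.get("name"):
            problems.append(f"node {node.get('name')} names a different cluster")
    return problems
```

Running check_cluster_conf(open("/etc/ocfs2/cluster.conf").read()) on each node (assuming the usual file location) and comparing the results would at least show whether the nodes agree on the layout.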

DMESG on WEB1:
        OCFS2 1.5.0
        (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
        (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
        (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
        (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
        (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
        (1262,0):dlm_request_join:1035 ERROR: status = -107
        (1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
        (1262,0):dlm_join_domain:1487 ERROR: status = -107
        (1262,0):dlm_register_domain:1753 ERROR: status = -107
        (1262,0):o2cb_cluster_connect:313 ERROR: status = -107
        (1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
        (1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
        ocfs2: Unmounting device (253,1) on (node 0)
        (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
        (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
        (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
        (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
        (1323,0):dlm_request_join:1035 ERROR: status = -107
        (1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
        (1323,0):dlm_join_domain:1487 ERROR: status = -107
        (1323,0):dlm_register_domain:1753 ERROR: status = -107
        (1323,0):o2cb_cluster_connect:313 ERROR: status = -107
        (1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
        (1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
        ocfs2: Unmounting device (253,1) on (node 0)
        VMCI: Major device number is: 249
        VMware memory control driver initialized
        vmmemctl: started kernel thread pid=1522
        ocfs2: Unregistered cluster interface o2cb
        OCFS2 Node Manager 1.5.0
        OCFS2 DLM 1.5.0
        ocfs2: Registered cluster interface o2cb
        OCFS2 DLMFS 1.5.0
        OCFS2 User DLM kernel interface loaded
        OCFS2 1.5.0
        (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
        (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
        (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
        (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
        (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
        (1839,0):dlm_request_join:1035 ERROR: status = -107
        (1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
        (1839,0):dlm_join_domain:1487 ERROR: status = -107
        (1839,0):dlm_register_domain:1753 ERROR: status = -107
        (1839,0):o2cb_cluster_connect:313 ERROR: status = -107
        (1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
        (1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
        ocfs2: Unmounting device (253,1) on (node 0)
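
For reading the "status = -107" lines above: the kernel reports errors as negated errno values, and on Linux errno 107 is ENOTCONN ("Transport endpoint is not connected"). A small sketch for decoding such a status follows; decode_status is a hypothetical helper name, and the numeric mapping depends on the platform the script runs on, so it should be run on a Linux box.

```python
import errno
import os


def decode_status(status):
    """Map a negative kernel status (e.g. -107) to (errno name, message)."""
    code = abs(status)
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)
```

So the dlm_* and ocfs2_* errors look like a consequence of the o2net connection timeouts before them, not a separate problem.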
        


So clearly the ocfs2 service thinks it cannot connect to the other nodes,
but nmap sees port 7777 on them just fine. And web2 can see the port on web1
just fine, so there is no firewall blocking the connections.
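
As a cross-check that does not rely on nmap, here is a minimal sketch that attempts a full TCP connection to port 7777 on each peer. The addresses are taken from the cluster.conf above; can_connect and report are names made up for this example. Note the limit of the test: it only shows that the TCP handshake completes, and it does not exercise whatever OCFS2 itself does after the connection is established.

```python
import socket

# Peer addresses copied from cluster.conf (web1 itself is left out).
NODES = {
    "web2": "192.168.102.141",
    "web3": "192.168.102.142",
    "rgapp1": "192.168.102.111",
    "deploy": "192.168.102.122",
    "app1": "192.168.102.112",
}


def can_connect(host, port=7777, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def report(nodes, port=7777):
    """Print one line per node saying whether the port accepted a connection."""
    for name, address in sorted(nodes.items()):
        state = "open" if can_connect(address, port) else "NOT reachable"
        print(f"{name} ({address}) port {port}: {state}")
```

Calling report(NODES) on web1 should agree with the nmap output if the network path is really fine.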

I think the cause might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel
module while CentOS 5.3/5.4 uses 1.4.4-1. Am I correct in thinking this?

David
-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Thursday, March 25, 2010 6:46 PM
To: David Murphy
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2

hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf and
populates configfs. AFAIK.

David Murphy wrote:
>
> We had 6 nodes running CentOS 5.4 using 1.4.3 ocfs2-tools.
>
> I decided to rebuild one node with FC12.
>
> Which is working fine, however nmap 192.168.200.112 shows 7777 as open,
> and o2cb_ctl is timing out when trying to connect to that node, which
> then causes a 107 error. This happens with all nodes, and all nodes have
> 7777 open via nmap from the FC machine.
>
> Is there a way to further debug this to see what exactly o2cb_ctl is
> seeing when trying to connect?
>
> David
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users



