RE: [Ocfs2-users] RE: Access to OCFS2 volume paused when a node crashes

paul fretter (TOC) Tue, 09 Oct 2007 07:26:45 -0700

Many thanks Marcos.


Kind regards

Paul Fretter

 

From: Marcos E. Matsunaga [mailto:[EMAIL PROTECTED] 
Sent: 09 October 2007 13:31
To: paul fretter (TOC)
Cc: [email protected]
Subject: Re: [Ocfs2-users] RE: Access to OCFS2 volume paused when a node crashes

 

You may want to try increasing the network timeout. You will have to change it
on all nodes.

See the FAQ
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#TIMEOUT
with special attention to #104 and #105.
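
Since the change has to match on every node, a quick consistency check can help. The following is only a sketch: the variable names are the ones I believe the FAQ's TIMEOUT section describes for /etc/sysconfig/o2cb, so treat them as assumptions and confirm them against the FAQ for your installed version. The sample values are illustrative, not recommendations.

```python
# Timeout variables assumed from the OCFS2 FAQ (TIMEOUT section);
# confirm the names against the FAQ for your installed version.
TIMEOUT_KEYS = (
    "O2CB_HEARTBEAT_THRESHOLD",
    "O2CB_IDLE_TIMEOUT_MS",
    "O2CB_KEEPALIVE_DELAY_MS",
    "O2CB_RECONNECT_DELAY_MS",
)


def parse_sysconfig(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def find_mismatches(configs):
    """configs maps node name -> config file text.

    Returns {key: {node: value}} for every timeout key whose value
    differs (or is missing on some nodes only) across the cluster.
    """
    parsed = {node: parse_sysconfig(text) for node, text in configs.items()}
    mismatches = {}
    for key in TIMEOUT_KEYS:
        values = {node: cfg.get(key) for node, cfg in parsed.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


if __name__ == "__main__":
    # Hypothetical file contents collected from two nodes.
    example = {
        "node1": "O2CB_IDLE_TIMEOUT_MS=30000\nO2CB_KEEPALIVE_DELAY_MS=2000\n",
        "jic55124": "O2CB_IDLE_TIMEOUT_MS=10000\nO2CB_KEEPALIVE_DELAY_MS=2000\n",
    }
    for key, values in find_mismatches(example).items():
        print(key, values)
```

How the file contents are collected from each node (scp, ssh, a config management tool) is left open here; the function only compares text that has already been gathered.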





Regards,
 
Marcos Eduardo Matsunaga
 
Oracle USA
Linux Engineering
 
 



paul fretter (TOC) wrote: 

To clarify,
 
The host "node1" is the OCFS node 0 in the config file.
 
The log entries are from another system in the cluster.
 
Kind regards
Paul
 
 
 
  

        -----Original Message-----
        From: paul fretter (TOC)
        Sent: 09 October 2007 11:41
        To: [email protected]
        Subject: Access to OCFS2 volume paused when a node crashes
         
        There is a node (node1) on our cluster that for some reason hangs every
        now and again, but it seems that when it happens it also pauses access
        to the OCFS2 volume for the other nodes.
         
        We are running the latest version of OCFS2 and the tools, on RHEL4
        (x86_64) with kernel 2.6.9-42.  All nodes are connected by
        Fibre Channel to a common LUN for data sharing.
         
        I guess there may be something I can do with configuring timeouts
        etc(?), but I thought I'd check with this list first.  Here is the
        relevant info from /var/log/messages:
         
         
        Oct  9 11:24:41 jic55124 kernel: o2net: connection to node node1 (num
        0) at 10.10.10.1:7777 has been idle for 10.0 seconds, shutting it
        down.
        Oct  9 11:24:41 jic55124 kernel: (0,1):o2net_idle_timer:1418 here are
        some times that might help debug the situation: (tmr 1191925471.993435
        now 1191925481.994292 dr 1191925471.993425 adv
        1191925471.993436:1191925471.993437 func (98e2d068:507)
        1191924562.14841:1191924562.14844)
        Oct  9 11:24:41 jic55124 kernel: o2net: no longer connected to node
        node1 (num 0) at 10.10.10.1:7777
        Oct  9 11:24:41 jic55124 kernel: (727,3):dlm_do_master_request:1418
        ERROR: link to 0 went down!
        Oct  9 11:24:41 jic55124 kernel: (727,3):dlm_get_lock_resource:995
        ERROR: status = -112
        [EMAIL PROTECTED] ~]# tail /var/log/messages
        Oct  9 11:28:48 jic55124 kernel: (856,2):dlm_get_lock_resource:995
        ERROR: status = -107
        Oct  9 11:28:48 jic55124 kernel: (856,2):dlm_do_master_request:1418
        ERROR: link to 0 went down!
        Oct  9 11:28:48 jic55124 kernel: (856,2):dlm_get_lock_resource:995
        ERROR: status = -107
        Oct  9 11:33:42 jic55124 kernel: (865,0):dlm_get_lock_resource:921
        6B13C23CB44C4D888150894FE4D35D4E:M000000000000000000007571339968: at
        least one node (0) to recover before lock mastery can begin
        Oct  9 11:33:42 jic55124 kernel: (3765,1):ocfs2_dlm_eviction_cb:119
        device (8,80): dlm has evicted node 0
        Oct  9 11:33:43 jic55124 kernel: (865,0):dlm_get_lock_resource:976
        6B13C23CB44C4D888150894FE4D35D4E:M000000000000000000007571339968: at
        least one node (0) to recover before lock mastery can begin
        Oct  9 11:33:46 jic55124 kernel: (727,3):dlm_restart_lock_mastery:1301
        ERROR: node down! 0
        Oct  9 11:33:46 jic55124 kernel: (727,3):dlm_wait_for_lock_mastery:1118
        ERROR: status = -11
        Oct  9 11:33:48 jic55124 kernel: (865,1):ocfs2_replay_journal:1167
        Recovering node 0 from slot 5 on device (8,80)
        Oct  9 11:33:50 jic55124 kernel: kjournald starting.  Commit interval 5
        seconds
         
         
        Many thanks
        Paul Fretter
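
For anyone reading the log excerpt above, the raw numbers can be decoded with a few lines of Python. This is a sketch under two assumptions: that the digits after the dot in the o2net_idle_timer stamps are a microsecond count rather than a decimal fraction (for the six-digit tmr/now values the two readings agree), and that the negative status values are standard Linux errno numbers. The function names are made up for illustration.

```python
import re

# Linux errno numbers for the three status codes seen in the log.
LINUX_ERRNO = {11: "EAGAIN", 107: "ENOTCONN", 112: "EHOSTDOWN"}


def _stamp_usec(label, line):
    """Return the 'label sec.usec' stamp in a line as integer microseconds."""
    m = re.search(r"\b" + label + r" (\d+)\.(\d+)", line)
    if m is None:
        raise ValueError(f"no {label!r} timestamp in line")
    # Assumption: the second field is a microsecond count.
    return int(m.group(1)) * 1_000_000 + int(m.group(2))


def idle_gap_usec(line):
    """Microseconds between the 'tmr' and 'now' stamps."""
    return _stamp_usec("now", line) - _stamp_usec("tmr", line)


def status_name(line):
    """Map 'status = -N' in a log line to an errno name, or None if absent."""
    m = re.search(r"status\s*=\s*-(\d+)", line)
    if m is None:
        return None
    return LINUX_ERRNO.get(int(m.group(1)), "unknown")


if __name__ == "__main__":
    # Stamps copied from the log, with the stray space in 'now' removed.
    timer_line = "(tmr 1191925471.993435 now 1191925481.994292 dr 1191925471.993425"
    print("idle gap (s):", idle_gap_usec(timer_line) / 1_000_000)
    for status in ("status = -112", "status = -107", "status = -11"):
        print(status, "->", status_name(status))
```

The gap works out to just over 10 seconds, which is consistent with the "idle for 10.0 seconds" message, and the status codes read as host down, not connected, and try again.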
            

 
_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users

