Re: Hmm, here is an example. Re: [Ocfs2-users] Also just a comment to the Oracle guys

Luis Freitas Sat, 10 Feb 2007 16:58:59 -0800

Alexei,
   
      Actually your log seems to show that CSSD (Oracle CRS) rebooted the node 
before OCFS2 got a chance to do it.
   
      On a RAC cluster, if the interconnect is interrupted, all the nodes hang 
until split-brain resolution is complete and all the crashed nodes have been 
recovered. This is needed because every read of an Oracle data block 
needs a ping to the other nodes. 
   
      The view of the data must be consistent: when one node reads a particular 
data block, the Oracle Database first pings the other nodes to ensure that they 
have not modified the block without yet flushing it to disk. Another node 
may even reply with the block itself, avoiding the disk access (Cache 
Fusion). 
   
      When a split brain occurs, the blocks that were not flushed to disk are 
lost, and they are rebuilt using the redo threads of the particular nodes 
that crashed. During this interval all the database instances "freeze", since 
until node recovery is complete there is no way to guarantee that a block 
read from disk was not altered on the crashed node.
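
      To make the argument concrete, here is a toy model of the read path 
described above. It is only a sketch of the reasoning, not Oracle's 
implementation: the names Node, RecoveryPending and read_block are 
hypothetical, and a real instance would wait rather than raise an error.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical cluster node, for illustration only."""
    name: str
    alive: bool = True
    # False while a crashed node's redo thread has not been replayed yet.
    recovered: bool = True
    # Blocks modified in this node's cache but not yet flushed to disk.
    dirty: Dict[int, bytes] = field(default_factory=dict)


class RecoveryPending(Exception):
    """Raised where a real instance would "freeze" and wait for recovery."""


def read_block(block_id: int, peers: List[Node], disk: Dict[int, bytes]) -> bytes:
    """Read a block, asking every peer first (the "ping")."""
    for peer in peers:
        if not peer.alive:
            if not peer.recovered:
                # The crashed node may have changed this block; the copy on
                # disk cannot be trusted until its redo has been applied.
                raise RecoveryPending(peer.name)
            continue
        if block_id in peer.dirty:
            # The peer ships its newer copy, so no disk access is needed.
            return peer.dirty[block_id]
    return disk[block_id]
```

      With a crashed, unrecovered peer in the list, every read fails the same 
way regardless of which block is requested, which is why the whole cluster 
stalls even when there is no disk activity.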
   
      So the fencing is needed even if there is no disk activity, as the entire 
cluster hangs the moment the interconnect goes down. The timeout for 
the fencing must be as small as possible to prevent a long cluster 
reconfiguration delay. Of course, the timeout must still be larger 
than Ethernet switch failovers and storage controller or disk multipath 
failovers. Or, if possible, those failover times should be reduced.
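
      The tuning rule can be written down as a simple check. This is a 
minimal sketch: the function name is hypothetical and the failover times in 
the usage note are made-up placeholders, to be replaced with values measured 
on the actual switches and storage.

```python
from typing import Dict, List


def failovers_exceeding_timeout(fence_timeout_s: float,
                                failover_times_s: Dict[str, float]) -> List[str]:
    """Return the failover events that last at least as long as the fencing
    timeout, i.e. the ones that would get a healthy node fenced."""
    return sorted(name for name, seconds in failover_times_s.items()
                  if seconds >= fence_timeout_s)
```

      For example, with placeholder times of 30 seconds for a switch failover 
and 5 seconds for a multipath failover, a 10 second timeout reports the 
switch failover as a problem, while a 60 second timeout reports nothing.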
   
     Now, on the other hand, I am having problems with OCFS2 too. It seems much 
less robust than ASM and the previous version, OCFS, especially under heavy disk 
activity. But I do expect these problems to get solved in the near future, as 
the 2.4 kernel VM problems were.
   
  Regards,
  Luis
  
Alexei_Roudnev <[EMAIL PROTECTED]> wrote:
          Additional info - the node did not have ANY active OCFSv2 operations (OCFSv2 
is used for backups only, and only from another node). So, if the system just SUSPENDED 
all FS operations and tried to rejoin the cluster, it all could work 
(moreover, the connection to the disk system was intact, so it could close the file 
system gracefully).
   
  It reveals 3 problems at once:
  - single heartbeat link (instead of multiple links)
  - timeout too short (ethernet can't guarantee 10 seconds, it can guarantee 1 
minute minimum);
  - fencing even if system is passive and can remount / reconnect instead of 
rebooting.
   
  All we did in the lab was _disconnect 1 of the trunks between switches for a few 
seconds, then insert it back into the socket_. No other application failed
  (including heartbeat clusters). The database cluster was not doing anything on 
OCFS at the time of the failure (not even backups).
   
  I will try heartbeat between loopback interfaces (and OCFS protocol) next 
time (I am just curious whether it can provide 10 seconds for network 
reconfiguration).
   
  ...
  Feb  1 12:19:13 testrac12 kernel: o2net: connection to node testrac11 (num 0) 
at 10.254.32.111:7777 has been idle for 10 seconds, shutting it down. 
Feb  1 12:19:13 testrac12 kernel: (13,3):o2net_idle_timer:1310 here are some 
times that might help debug the situation: (tmr 1170361135.521061 now 
1170361145.520476 dr 1170361141.852795 adv 1170361135.521063:1170361135.521064 
func (c4378452:505) 1170361067.762941:1170361067.762967) 
Feb  1 12:19:13 testrac12 kernel: o2net: no longer connected to node testrac11 
(num 0) at 10.254.32.111:7777 
Feb  1 12:19:13 testrac12 kernel: (1855,3):dlm_send_remote_convert_request:398 
ERROR: status = -107 
Feb  1 12:19:13 testrac12 kernel: (1855,3):dlm_wait_for_node_death:371 
5AECFF0BBCF74F069A3B8FF79F09FB5A: waiting 5000ms for notification of death of 
node 0 
Feb  1 12:19:13 testrac12 kernel: (1855,1):dlm_send_remote_convert_request:398 
ERROR: status = -107 
Feb  1 12:19:13 testrac12 kernel: (1855,1):dlm_wait_for_node_death:371 
5AECFF0BBCF74F069A3B8FF79F09FB5A: waiting 5000ms for notification of death of 
node 0 
Feb  1 12:22:22 testrac12 kernel: (1855,2):dlm_send_remote_convert_request:398 
ERROR: status = -107 
Feb  1 12:22:22 testrac12 kernel: (1855,2):dlm_wait_for_node_death:371 
5AECFF0BBCF74F069A3B8FF79F09FB5A: waiting 5000ms for notification of death of 
node 0 
Feb  1 12:22:27 testrac12 kernel: (13,3):o2quo_make_decision:144 ERROR: fencing 
this node because it is connected to a half-quorum of 1 out of 2 nodes which 
doesn't include the lowest active node 0 
Feb  1 12:22:27 testrac12 kernel: (13,3):o2hb_stop_all_regions:1889 ERROR: 
stopping heartbeat on all active regions. 
Feb  1 12:22:27 testrac12 kernel: Kernel panic: ocfs2 is very sorry to be 
fencing this system by panicing 
Feb  1 12:22:27 testrac12 kernel: 
Feb  1 12:22:28 testrac12 su: pam_unix2: session finished for user oracle, 
service su 
Feb  1 12:22:29 testrac12 logger: Oracle CSSD failure.  Rebooting for cluster 
integrity. 
Feb  1 12:22:32 testrac12 su: pam_unix2: session finished for user oracle, 
service su 
...
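
The o2quo_make_decision message in the log above describes a tie-break: in a 
cluster with an even number of nodes, a node that can reach exactly half of 
them survives only if that half contains the lowest-numbered active node. 
Below is a minimal sketch of that rule as read from the log message. The 
function name and the set-based interface are assumptions made for 
illustration, and the actual kernel logic may differ in detail.

```python
from typing import Set


def should_fence(heartbeating: Set[int], connected: Set[int]) -> bool:
    """Decide whether the local node should fence itself.

    heartbeating: node numbers currently seen heartbeating (local node included).
    connected: node numbers the local node can talk to (local node included).
    """
    if not heartbeating:
        return False
    reachable = connected & heartbeating
    total = len(heartbeating)
    if total % 2 == 1:
        # Odd cluster: a strict majority is required.
        return len(reachable) < (total + 1) // 2
    half = total // 2
    if len(reachable) < half:
        return True
    if len(reachable) == half:
        # Even split: only the half holding the lowest active node survives.
        return min(heartbeating) not in reachable
    return False
```

In the case logged here, testrac12 sees two heartbeating nodes, 0 and 1, and 
can reach only itself, so should_fence({0, 1}, {1}) is True. Under the same 
assumptions node 0 would survive, since should_fence({0, 1}, {0}) is False.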
_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users


 

