Hi, I just installed 10g RAC on ocfs2 with 2 nodes, on RHEL AS 4 x86_64 (4 GB quad Opteron).
 
Everything seemed OK until the DBA started to build the database and run some heavy operations against it.  Node 0 of the cluster kernel panics after this console message:
 
(6,0):o2hb_write_timeout:164
ERROR: Heartbeat write timeout to device dm-0 after 12001 milliseconds
(6,0):o2hb_stop_all_regions:1673
ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicking
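
For reference, the 12001 ms figure in the message above can be related to the o2cb heartbeat setting. This is only a rough sketch: it assumes the relation described in the OCFS2 documentation as I recall it, timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, and the function names are made up for illustration. It is not a diagnosis of the panic.

```python
import re

# Console text copied from the message above.
CONSOLE = """\
(6,0):o2hb_write_timeout:164
ERROR: Heartbeat write timeout to device dm-0 after 12001 milliseconds
"""

# Matches the device name and the elapsed time in the o2hb error line.
PATTERN = re.compile(
    r"Heartbeat write timeout to device (\S+) after (\d+) milliseconds"
)


def parse_heartbeat_timeout(text):
    """Return (device, timeout_ms) from console text, or None if absent."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def implied_threshold(timeout_ms):
    """Threshold implied by an observed timeout.

    ASSUMPTION: timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds.
    """
    return round(timeout_ms / 1000 / 2) + 1


def timeout_seconds(threshold):
    """Timeout in seconds for a given threshold, under the same assumption."""
    return (threshold - 1) * 2


if __name__ == "__main__":
    device, ms = parse_heartbeat_timeout(CONSOLE)
    print(device, ms, implied_threshold(ms))
```

Under that assumption, the message would correspond to a threshold of 7, and a threshold of 31 would give a 60 second window. Whether raising it is appropriate here depends on why the heartbeat writes to dm-0 are stalling in the first place.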
 
On Node 1, the console showed:
 
(2585,1): o2net_set_nn_state:421
no longer connected to node DC1ORA01 at 192.168.79.169:7777
(32763,1):ocfs2_replay_journal:1123 Recovery node 0 from slot 0 on device (253,0)
 
Node 1's OS was also barely responsive and wouldn't shut down cleanly.
 
The DBA said Oracle was creating numerous trace dumps due to I/O errors, especially during heavy load.  What do these errors point to: storage drivers, ocfs2 bugs, or incompatibilities between 10g RAC and ocfs2?  Where do I start?
 
Oh, device dm-0 is a standard logical volume made up of 3 physical volumes from a SAN array.  I downloaded the 10g install disk images to it and the install went fine, so the storage appears to be working properly, as it does for other server environments.
_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
