The default disk heartbeat timeout is far too low. In short, flushing the
buffered writes from the copy is probably flooding the device and delaying
the heartbeat I/O.
For more, refer to:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#HEARTBEAT
If you are running 1.2.5, also refer to:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#TIMEOUT
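A minimal sketch of the heartbeat arithmetic follows. It assumes the OCFS2 1.2 relationship described in the FAQ's heartbeat section: the disk heartbeat runs every 2 seconds, and the timeout is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, so the default threshold of 7 gives roughly 12 seconds. The function names are hypothetical and only illustrate the calculation. The threshold itself is configured on each node; on Red Hat-style systems it lives in /etc/sysconfig/o2cb.

```python
# Sketch of OCFS2 1.2 disk heartbeat timeout arithmetic.
# Assumption (per the OCFS2 FAQ): 2-second heartbeat period, and
#   timeout_secs = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
HEARTBEAT_INTERVAL_SECS = 2  # assumed heartbeat period in seconds


def heartbeat_timeout_secs(threshold: int) -> int:
    """Disk heartbeat timeout in seconds for a given threshold (hypothetical helper)."""
    if threshold < 2:
        # Sanity check for this sketch only: smaller values give no usable window.
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def threshold_for_timeout(timeout_secs: int) -> int:
    """Smallest threshold whose timeout is at least timeout_secs (hypothetical helper)."""
    # ceil(timeout / interval) + 1, using integer arithmetic only
    return -(-timeout_secs // HEARTBEAT_INTERVAL_SECS) + 1


if __name__ == "__main__":
    for t in (7, 31, 61):
        print(f"O2CB_HEARTBEAT_THRESHOLD={t} -> {heartbeat_timeout_secs(t)} s")
    print(f"threshold for 60 s: {threshold_for_timeout(60)}")
```

Under these assumptions, raising the threshold from the default 7 to 31 stretches the window from about 12 to 60 seconds. That gives a long buffered-write flush more time to finish before the node is declared dead.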
Zosen Wang wrote:
I am trying to copy a single 42 GB file from an ext3 file system to an
ocfs2 file system on node 1. The ocfs2 file system hangs on all nodes
during/after the cp. /p0ebsdb/u13 is an ocfs2 mount point shared with
the other 2 nodes (a 3-node RAC).
The following is the Unix copy command:
[EMAIL PROTECTED] migrate]# time cp aexp02.dmp /p0ebsdb/u13/junk
real 17m49.351s
user 0m0.392s
sys 1m49.065s
The following is the dmesg output on node1:
ocfs2_dlm: Nodes in domain ("A2AECED66891407D915CBF282A9E9299"): 0 1 2
o2net: connection to node b30svrxp-ebsdb2.ameripride.com (num 1) at 192.168.3.70:7777 has been idle for 10.0 seconds, shutting it down.
(0,3):o2net_idle_timer:1418 here are some times that might help debug the situation: (tmr 1184814613.883032 now 1184814623.882842 dr 1184814613.883028 adv 1184814613.883033:1184814613.883033 func (2b61f804:504) 1184814613.882900:1184814613.882904)
o2net: no longer connected to node b30svrxp-ebsdb2.ameripride.com (num 1) at 192.168.3.70:7777
(6047,3):dlm_send_proxy_ast_msg:459 ERROR: status = -107
(6047,3):dlm_flush_asts:600 ERROR: status = -107
(20810,0):dlm_do_master_request:1418 ERROR: link to 1 went down!
(20810,0):dlm_get_lock_resource:995 ERROR: status = -107
The following is the dmesg output on node2:
(26243,1):dlm_send_remote_convert_request:398 ERROR: status = -107
(26243,1):dlm_wait_for_node_death:365 9EA98E20F6E44FF7B7A89789976C1E32: waiting 5000ms for notification of death of node 0
(7427,0):dlm_send_remote_convert_request:398 ERROR: status = -107
(7427,0):dlm_wait_for_node_death:365 75990178D36942BFA473A2AE4149690C: waiting 5000ms for notification of death of node 0
The following is the dmesg output on node3:
mtrr: type mismatch for d8000000,2000000 old: uncachable new: write-combining
adl_trace[9860]: segfault at 000000000000000c rip 0000000040002462 rsp 0000007fbfffe3e0 error 4
Any clue? Thanks in advance.
------------------------------------------------------------------------
_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users