Re: [Ocfs2-users] ocfs2 file system hang during copy files

Sunil Mushran Thu, 19 Jul 2007 10:54:20 -0700

The default disk heartbeat timeouts are way too low. In short, the
buffered write flush is probably flooding the device and delaying
the heartbeat io.


For more, refer to:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#HEARTBEAT

If you are on 1.2.5, then also refer to:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#TIMEOUT
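
As a rough illustration of the arithmetic behind the disk heartbeat setting, here is a minimal sketch. It assumes the relation given in the FAQ, where the heartbeat is written every 2 seconds and O2CB_HEARTBEAT_THRESHOLD = (timeout in seconds / 2) + 1. The function names are hypothetical helpers, not part of the ocfs2 tools; check the FAQ for your release before changing anything.

```python
HEARTBEAT_INTERVAL_SECONDS = 2  # assumption: one disk heartbeat every 2 seconds


def heartbeat_threshold(timeout_seconds: int) -> int:
    """Return the O2CB_HEARTBEAT_THRESHOLD value for a desired timeout.

    Assumes threshold = (timeout / 2) + 1, as described in the FAQ.
    """
    if timeout_seconds <= 0 or timeout_seconds % HEARTBEAT_INTERVAL_SECONDS:
        raise ValueError("timeout must be a positive multiple of 2 seconds")
    return timeout_seconds // HEARTBEAT_INTERVAL_SECONDS + 1


def heartbeat_timeout(threshold: int) -> int:
    """Return the timeout in seconds implied by a threshold (inverse of the above)."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


if __name__ == "__main__":
    for seconds in (12, 60, 120):
        print(f"{seconds}s timeout -> O2CB_HEARTBEAT_THRESHOLD={heartbeat_threshold(seconds)}")
```

Under this formula a 60 second timeout corresponds to a threshold of 31. The same value has to be configured on every node in the cluster.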

Zosen Wang wrote:

I am trying to copy a single 42 GB file from an ext3 file system to an ocfs2 file system on node 1. The ocfs2 file system hangs on all nodes after/during the cp. /p0ebsdb/u13 is an ocfs2 mount point shared with 2 other nodes (3-node RAC).
The following is unix copy command

[EMAIL PROTECTED] migrate]# time cp aexp02.dmp /p0ebsdb/u13/junk
real    17m49.351s
user    0m0.392s
sys     1m49.065s
The following is dmesg on node1
ocfs2_dlm: Nodes in domain ("A2AECED66891407D915CBF282A9E9299"): 0 1 2
o2net: connection to node b30svrxp-ebsdb2.ameripride.com (num 1) at 192.168.3.70:7777 has been idle for 10.0 seconds, shutting it down.
(0,3):o2net_idle_timer:1418 here are some times that might help debug the situation: (tmr 1184814613.883032 now 1184814623.882842 dr 1184814613.883028 adv 1184814613.883033:1184814613.883033 func (2b61f804:504) 1184814613.882900:1184814613.882904)
o2net: no longer connected to node b30svrxp-ebsdb2.ameripride.com (num 1) at 192.168.3.70:7777
(6047,3):dlm_send_proxy_ast_msg:459 ERROR: status = -107
(6047,3):dlm_flush_asts:600 ERROR: status = -107
(20810,0):dlm_do_master_request:1418 ERROR: link to 1 went down!
(20810,0):dlm_get_lock_resource:995 ERROR: status = -107
The following is dmesg on node2

(26243,1):dlm_send_remote_convert_request:398 ERROR: status = -107
(26243,1):dlm_wait_for_node_death:365 9EA98E20F6E44FF7B7A89789976C1E32: waiting 5000ms for notification of death of node 0
(7427,0):dlm_send_remote_convert_request:398 ERROR: status = -107
(7427,0):dlm_wait_for_node_death:365 75990178D36942BFA473A2AE4149690C: waiting 5000ms for notification of death of node 0
The following is dmesg on node3
mtrr: type mismatch for d8000000,2000000 old: uncachable new: write-combining
adl_trace[9860]: segfault at 000000000000000c rip 0000000040002462 rsp 0000007fbfffe3e0 error 4
Any clue? Thanks in advance.

------------------------------------------------------------------------

_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users


