Oh, this is a well-known message (for OCFSv2). Moreover, THIS particular timeout cannot be changed.
In my case, after spending a few days on it, I found that my HugeTLB setting (in Oracle) caused a long kernel loop, and the resulting loss of connection forced OCFSv2 to reboot the node.

PS. I dream of the day when I will see a SET of heartbeat interfaces in OCFSv2. It is THE ONLY system that does not support this (of all the clustered systems I have around). Bonding... hmm, bonding serves a different purpose and has a 20-30 second reconvergence time by design.

----- Original Message -----
From: "Andy Phillips" <[EMAIL PROTECTED]>
To: "Sunil Mushran" <[EMAIL PROTECTED]>
Cc: "ocfs2-users" <[email protected]>
Sent: Monday, August 07, 2006 2:20 AM
Subject: Re: [Ocfs2-users] o2net: connect to node has been idle for 10 secs

> Hello,
>
> Well we had the same problem again;
>
> o2net: connection to node barney (num 0) at 172.16.6.10:7777
> has been idle for 10 seconds, shutting it down.
>
> kernel: (0,0):o2net_idle_timer:1309 here are some times that might help
> debug the situation: (tmr 1154932284.14757 now 1154932294.13147 dr
> 1154932284.14717 adv 1154932284.14767:1154932284.14768 func (06aac8a1:1)
> 1154932279.15062:1154932279.15068)
>
> We upgraded to 1.2.3. And it almost immediately died again with the
> same error. Our cron job that touches a file every 3 seconds did not
> seem to make much difference. This is now quite a serious problem for
> us.
>
> Any suggestions as to how to take this forward?
>
> Sunil, what do you need from us to roll a custom debugging build?
> Can we run the custom build on node 2 and leave the existing build on
> node 1, which is now production?
>
> Andy
>
>
> > > >> Aug 2 19:06:27 fred kernel: o2net: connection to node barney (num 0) at
> > > >> 172.16.6.10:7777 has been idle for 10 seconds, shutting it down.
> > > >> Aug 2 19:06:27 fred kernel: (0,7):o2net_idle_timer:1309 here are some
> > > >> times that might help debug the situation: (tmr 1154545576.798263 now
> > > >> 1154545586.796978 dr 1154545576.798238 adv
> > > >> 1154545576.798291:1154545576.798293 func (06aac8a1:1)
> > > >> 1154545566.800782:1154545566.800787)
> > > >> Aug 2 19:06:27 fred kernel: o2net: no longer connected to node barney
> > > >> (num 0) at 172.16.6.10:7777
> > > >> Aug 2 19:08:33 fred kernel: (25,7):o2quo_make_decision:143 ERROR:
> > > >> fencing this node because it is connected to
> > > >> a half-quorum of 1 out of 2 nodes which doesn't include the lowest
> > > >> active node 0
> > > >> Aug 2 19:08:33 fred kernel: (25,7):o2hb_stop_all_regions:1908 ERROR:
> > > >> stopping heartbeat on all active regions.
> > ________________________________________________________________________
> --
> Andy Phillips, FRAS
> Systems Architect, Information Systems.
>
> _______________________________________________
> Ocfs2-users mailing list
> [email protected]
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
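
For anyone who wants to sanity-check the o2net_idle_timer lines quoted in this thread, here is a rough Python sketch that pulls out the tmr and now stamps and prints the gap between them. It rests on two assumptions that are not confirmed anywhere in this thread: that each stamp is printed as seconds, a dot, and an unpadded microsecond count (which would explain five-digit fractions such as 14757), and that tmr is when the idle timer was armed while now is when it fired. The names parse_stamps and idle_seconds are made up for illustration.

```python
import re

# Matches "<label> <sec>.<usec>" pairs such as "tmr 1154932284.14757".
STAMP = re.compile(r"\b(tmr|now|dr)\s+(\d+)\.(\d+)")


def parse_stamps(line):
    """Return {label: microseconds since the epoch} for tmr, now and dr."""
    stamps = {}
    for label, sec, usec in STAMP.findall(line):
        # Assumption: the part after the dot is an unpadded microsecond
        # count, so "14757" means 14757 us, not 0.14757 s.
        stamps[label] = int(sec) * 1_000_000 + int(usec)
    return stamps


def idle_seconds(line):
    """Seconds between the 'tmr' and 'now' stamps of one log line."""
    stamps = parse_stamps(line)
    return (stamps["now"] - stamps["tmr"]) / 1_000_000


# The timing block from the Aug 7 message quoted in this thread.
line = ("(tmr 1154932284.14757 now 1154932294.13147 dr 1154932284.14717 "
        "adv 1154932284.14767:1154932284.14768 func (06aac8a1:1) "
        "1154932279.15062:1154932279.15068)")

print(f"{idle_seconds(line):.6f}")  # prints 9.998390
```

Under those assumptions the gap comes out just under 10 seconds, which matches the "idle for 10 seconds" text in the log. The Aug 2 line gives a similar figure.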
