Your analysis of the problem is correct. Because you have set the heartbeat timeout to 20 mins (O2CB_HEARTBEAT_THRESHOLD = 601), the cluster waits 20 mins before declaring the node dead and re-admitting it into its dlm domain. There is no solution other than reducing the timeout. That you have to set it to 20 mins at all suggests that the SAN/io setup needs to be looked into.
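For reference, here is a rough sketch of how the threshold maps to the timeout. It assumes the formula documented in the OCFS2 FAQ, timeout = (threshold - 1) * 2 seconds, with the 2-second disk heartbeat interval. The function names are made up for illustration and are not part of any o2cb tool.

```python
# Sketch only: assumes the OCFS2 FAQ formula
#   timeout_secs = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
# The helper names below are hypothetical, not an o2cb API.

HEARTBEAT_INTERVAL_SECS = 2  # assumed disk heartbeat interval


def threshold_to_timeout_secs(threshold: int) -> int:
    """Seconds a node may miss disk heartbeats before it is declared dead."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def timeout_secs_to_threshold(timeout_secs: int) -> int:
    """Threshold value to configure for a desired timeout in seconds."""
    if timeout_secs < HEARTBEAT_INTERVAL_SECS:
        raise ValueError("timeout must be at least one heartbeat interval")
    return timeout_secs // HEARTBEAT_INTERVAL_SECS + 1


if __name__ == "__main__":
    for threshold in (31, 601):
        secs = threshold_to_timeout_secs(threshold)
        print(f"threshold {threshold} -> {secs} s ({secs / 60:g} min)")
```

With these assumptions, 601 works out to 1200 seconds (the 20 mins above) and 31 to 60 seconds, so a threshold between the two trades crash-detection delay against tolerance for slow SAN io.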

Karim Alkhayer wrote:
> Hi Sunil,
>
> Already advised SP4 but the platform supplier is hesitating to support the
> upgrade, servers and SAN impact wise.
>
> It is a dead end when talking about the upgrade, any alternatives?
>
> Regards,
> Karim
>
> -----Original Message-----
> From: Sunil Mushran [mailto:[email protected]]
> Sent: Monday, January 26, 2009 7:52 PM
> To: Karim Alkhayer
> Cc: [email protected]
> Subject: Re: [Ocfs2-users] How to force node [a] to consider node [b] dead?
>
> You are running a 3 year old version of the fs. Please upgrade
> to something more current. Like sles9 sp4 or sles10 sp1 that
> bundles ocfs2 1.2.9, or sles10 sp2 that ships ocfs2 1.4.1.
>
> Karim Alkhayer wrote:
>
>> Hi All,
>>
>> We have O2CB_HEARTBEAT_THRESHOLD set to 601 as the SAN gets overloaded
>> sometimes and hence causing the nodes to panic
>>
>> This value has proven to be more stable than 31. However, there are
>> sometimes where one of the nodes, for instance node [b] crashes, for
>> whatever reason. While attempting to startup the troublesome node,
>> auto mount is enabled but doesn't succeed, "Transport endpoint is not
>> connected" is usually displayed.
>>
>> My opinion is this: the mount doesn't succeed because node [a] still
>> thinks that node [b] is alive
>>
>> We're talking about a restart that can take around 15 minutes, so
>> basically, the threshold is passed
>>
>> I was wondering if there is a workaround to kick node [b] out of the
>> cluster so that it can join it again. What I've done so far, the
>> incident happened once - a month ago, is to restart the cluster
>> services on both machines. This was very expensive solution as all
>> database instances had to go down
>>
>> OCFS2 1.2.1, SLES9 SP3 2.6.5-7.257-default, RAC 10.1.0.5, 5 DBs
>>
>> Thanks
>>
>> Karim
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> [email protected]
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
