On Thu, 2009-06-04 at 18:30 +0200, Lars Marowsky-Bree wrote:
> On 2009-06-04T09:23:04, Steven Dake <sd...@redhat.com> wrote:
>
> > The problem with checking the link status with the current code is that
> > the protocol blocks I/O waiting for a response from the failed ring.
> > This could of course be modified to behave differently.
>
> Right, so the rechecking could possibly be a separate thread, sending an
> occasional liveness packet on the failed ring and trigger the RRP
> recovery after it has heard from other nodes on it?

Well, I prefer totem to remain nonthreaded except for encrypted xmit
operations, but in general, that is the basic idea.

> Some smarts would be needed of course to not constantly retrigger
> partially active rings (which would fail again immediately).
>
> > So the act of failing a link is expensive and we don't want to retest
> > that it is valid very often.
>
> Does "expensive" mean that it'll actually slow down the healthy
> ring(s)?

At the moment it blocks until the problem counter reaches the threshold,
at which point the ring is declared failed and normal communication
continues.

> Regards,
>     Lars
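
To make the idea discussed above concrete, here is a rough Python sketch of a problem counter with a failure threshold, plus a recheck schedule that backs off so a partially active ring is not retriggered constantly. It is an illustration only and not the totem code: the class name, method names, and default values are all made up for this example. It is written to be driven from a single-threaded event loop (the caller passes in the current time), in keeping with the preference for keeping totem nonthreaded.

```python
from dataclasses import dataclass, field


@dataclass
class RingMonitor:
    """Hypothetical per-ring state; names and defaults are assumptions."""

    problem_threshold: int = 10          # assumed value
    base_recheck_interval: float = 5.0   # seconds, assumed value
    max_recheck_interval: float = 300.0  # seconds, assumed value
    problem_count: int = field(default=0, init=False)
    faulty: bool = field(default=False, init=False)
    recheck_interval: float = field(default=0.0, init=False)
    next_recheck_at: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.recheck_interval = self.base_recheck_interval

    def _schedule_recheck(self, now: float) -> None:
        # Schedule the next probe, then double the interval (capped) so a
        # ring that keeps failing is probed less and less often.
        self.next_recheck_at = now + self.recheck_interval
        self.recheck_interval = min(self.recheck_interval * 2,
                                    self.max_recheck_interval)

    def on_timeout(self, now: float) -> None:
        """An expected response did not arrive on this ring."""
        if self.faulty:
            return
        self.problem_count += 1
        if self.problem_count >= self.problem_threshold:
            self.faulty = True
            self._schedule_recheck(now)

    def on_response(self) -> None:
        """A response arrived on a ring that is in service."""
        if not self.faulty:
            self.problem_count = 0

    def should_probe(self, now: float) -> bool:
        """True when a liveness packet should be sent on the failed ring."""
        return self.faulty and now >= self.next_recheck_at

    def on_probe_result(self, now: float, heard_from_peers: bool) -> None:
        """Outcome of a liveness probe on a failed ring."""
        if not self.faulty:
            return
        if heard_from_peers:
            # Re-enable the ring but keep the grown interval, so an
            # immediate second failure is not rechecked right away.
            self.faulty = False
            self.problem_count = 0
        else:
            self._schedule_recheck(now)

    def on_stable(self) -> None:
        """Caller decides the ring has been healthy long enough."""
        if not self.faulty:
            self.recheck_interval = self.base_recheck_interval
```

The interval is deliberately not reset when a probe succeeds; only on_stable() resets it. That is one possible way to avoid retriggering a ring that would fail again immediately. What counts as "healthy long enough" is left to the caller here.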