On Thu, Jul 02, 2009 at 01:15:18PM +0200, Jan Friesse wrote:
> David Teigland wrote:
> > On Wed, Jul 01, 2009 at 01:46:03PM -0500, David Teigland wrote:
> >> other nodes should immediately recognize it has
> >> previously failed and process a complete failure for it.
> >
> > i.e. the full equivalent to what apps (using any APIs) would see if the
> > node had failed via normal token timeout.
>
> More or less agree, but does this patch fix the problem for you or not?

I haven't tried the patch, but based on the description and a quick look at
the patch, I don't think it helps.  Think more broadly about what's happening
here; don't focus on one particular effect.

1. nodes 1,2,3,4: are cluster members
2. nodes 1,2,3,4: are using services A,B,C,D
3. node4: ifdown eth0, kill corosync
4. node4: ifup eth0, start corosync
5. node4: do not start/use any services
6. nodes 1,2,3: never see node4 removed from membership
7. nodes 1,2,3: services A,B,C,D never see node4 removed/fail

Dave
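
To make the expected behaviour concrete, here is a minimal sketch of what
nodes 1,2,3 would ideally do in the scenario above. It is not corosync code
and does not use any corosync API. The MembershipTracker class, the notify
callback, and the "incarnation" value (something that changes every time the
daemon on a node starts) are all assumptions made for illustration. The idea
is only that a node which shows up again with a new incarnation, without ever
having been removed, gets a full leave delivered to services before its join.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical event shape: ("leave" | "join", node_id)
Event = Tuple[str, int]


@dataclass
class MembershipTracker:
    """Hypothetical view of cluster membership held by one surviving node."""

    # Called once per membership event, standing in for services A,B,C,D.
    notify: Callable[[Event], None]
    # node_id -> incarnation (assumed to change on every daemon start)
    members: Dict[int, int] = field(default_factory=dict)

    def on_node_seen(self, node_id: int, incarnation: int) -> None:
        known = self.members.get(node_id)
        if known is None:
            # First time we see this node: ordinary join.
            self.members[node_id] = incarnation
            self.notify(("join", node_id))
        elif known != incarnation:
            # The node restarted but was never removed from membership.
            # Report a complete failure first, then the new join.
            self.notify(("leave", node_id))
            self.members[node_id] = incarnation
            self.notify(("join", node_id))
        # Same incarnation: nothing changed, nothing to report.


# Steps 1-2: nodes 1,2,3,4 are members.
events: List[Event] = []
tracker = MembershipTracker(notify=events.append)
for node in (1, 2, 3, 4):
    tracker.on_node_seen(node, incarnation=1)

# Steps 3-4: node4 is killed and restarted, so it returns with a new
# incarnation before anyone noticed it was gone.
events.clear()
tracker.on_node_seen(4, incarnation=2)
print(events)  # [('leave', 4), ('join', 4)]
```

With this ordering, services on nodes 1,2,3 see the same leave they would
have seen after a normal token timeout, which is the behaviour missing in
steps 6 and 7.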
