On Tue, 2009-11-03 at 08:46 +1030, Darren Thompson wrote:
> Kelly et al
> 
> My experience is from SLES (SUSE Enterprise), so you may need to
> "localise" this for your distribution.
> 
> First of all, get rid of that crappy "network bridge fudge script" that
> Xen uses to monkey around with your comms when xend starts. Find the
> file xend-config.sxp (or the equivalent xend config file).
> Find the network section where it explains the various bridging methods
> and look for the line "network-bridge". Comment this line out completely.
> 
> Now, using the appropriate network tools, create a network bridge for
> each real NIC that you have (in this case br0, br1, br2).
> Within the bridge configuration, connect each of your real Ethernet
> ports to its bridge, remove any IP configuration from the Ethernet
> port, and add the network configuration to the bridge. Do the same on
> both nodes.
> Note: you can also bond the NICs first and then connect the bond to the
> bridge; this also works, now with NIC redundancy.
> 
> This way the bridging is intrinsically set up at network initialisation
> and is not changed part of the way through the boot process (the ugly
> Xen network script caused me all sorts of hassles with clusters, which
> is why I now do it this way).
> 
> I hope this helps.
> 
> Regards
> Darren
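A minimal sketch of what Darren's steps look like on SLES, assuming eth0
carries the 10.255.135.x back channel. File contents, addresses, and key
values here are illustrative, not taken from Darren's actual setup:

```
# /etc/xen/xend-config.sxp -- stop xend from rewiring the network:
# (network-script network-bridge)     <-- leave this line commented out

# /etc/sysconfig/network/ifcfg-br0 -- bridge takes over eth0's IP config
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='10.255.135.2/24'
BRIDGE='yes'
BRIDGE_PORTS='eth0'
BRIDGE_STP='off'

# /etc/sysconfig/network/ifcfg-eth0 -- the enslaved port carries no IP
STARTMODE='auto'
BOOTPROTO='none'
```

Repeat the pattern for br1/eth1 and br2/eth2 (or for the bond device
instead of the raw NIC, if you bond first), then point the domU configs
at br0/br1/br2 directly. Because the bridges come up with normal network
initialisation, xend no longer touches the interfaces openais is using.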
Darren

Thanks for the detailed explanation. There are numerous issues with xend
and openais operating together, because xend does weird networking stuff
after openais is started. I hope the above helps.

Regards
-steve

> On Mon, 2009-11-02 at 16:55 -0500, Madison Kelly wrote:
> > Hi all,
> > 
> > I've been playing with clustering on CentOS 5.x lately, and thus far
> > I've not really understood AIS's role in it. I've been to the AIS
> > website, but there isn't much there.
> > 
> > So my first question is: where can I go to learn the fundamentals of
> > AIS?
> > 
> > My second question relates to a real-world problem I've been having.
> > I've got a pretty simple 2-node cluster running DRBD+LVM on eth1. I
> > use eth0 between the two nodes as a back channel, which connects to
> > an internal network via which we can manage the servers with IPMI.
> > Lastly, there is eth2 on each node, which is Internet-facing.
> > 
> > So far, I can't put eth0 under Xen control (that is, I can't have it
> > virtualized on dom0) without it causing a fence loop. I've tried
> > asking for help elsewhere, but so far I've not heard back.
> > 
> > The reason I am asking here now is that the node that gets fenced
> > shows this in its logs several times just before going down:
> > 
> > Oct 31 00:27:21 vsh02 openais[3133]: [TOTEM] FAILED TO RECEIVE
> > Oct 31 00:27:21 vsh02 openais[3133]: [TOTEM] entering GATHER state from 6.
> > 
> > On the surviving node I see this:
> > 
> > Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] The token was lost in the OPERATIONAL state.
> > Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
> > Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
> > Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] entering GATHER state from 2.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering GATHER state from 0.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Creating commit token because I am the rep.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Saving state aru 2c high seq received 2c
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Storing new sequence id for ring 108
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering COMMIT state.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering RECOVERY state.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] position [0] member 10.255.135.3:
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] previous ring seq 260 rep 10.255.135.2
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] aru 2c high delivered 2c received flag 1
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Did not need to originate any messages in recovery.
> > Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Sending initial ORF token
> > Oct 31 00:35:51 vsh03 openais[3237]: [CLM  ] CLM CONFIGURATION CHANGE
> > Oct 31 00:35:51 vsh03 openais[3237]: [CLM  ] New Configuration:
> > Oct 31 00:35:51 vsh03 kernel: dlm: closing connection to node 1
> > Oct 31 00:35:51 vsh03 fenced[3256]: vsh02.domain.com not a cluster member after 0 sec post_fail_delay
> > Oct 31 00:35:51 vsh03 openais[3237]: [CLM  ] r(0) ip(10.255.135.3)
> > Oct 31 00:35:51 vsh03 fenced[3256]: fencing node "vsh02.domain.com"
> > 
> > This happens when I put eth0 under Xen's management. The nodes will
> > keep fencing until DRBD breaks. At that point, the fencing stops and
> > everything seems to be fine. However, once I fix the DRBD partition
> > and try to look at the LVM, the above errors return and I'm right
> > back in a fence loop until DRBD breaks again.
> > 
> > Even a pointer to where I can learn more about AIS/TOTEM so that I
> > can try to understand what's going on would be awesome. I'm really
> > stuck on this error...
> > 
> > Thanks!
> > Madi

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
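For anyone reading this in the archive: the "FAILED TO RECEIVE" and "The
token was lost in the OPERATIONAL state" messages mean totem went longer
than its configured token timeout without seeing the rotating token, so
the ring reformed and fenced shot the node that dropped out. When Xen's
network-bridge script tears an interface down mid-boot, as Darren
describes, that loss is expected, and fixing the bridging is the real
cure. For completeness, the timeout itself lives in the totem block of
/etc/ais/openais.conf; a sketch with illustrative values (not taken from
the posts above):

```
# /etc/ais/openais.conf -- totem timing knobs (values are illustrative)
totem {
    version: 2
    # milliseconds without seeing the token before declaring it lost;
    # raising this tolerates longer network blips, at the cost of
    # slower detection of real node failures
    token: 10000
    interface {
        ringnumber: 0
        bindnetaddr: 10.255.135.0   # network carrying cluster traffic
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
```

Note that on CentOS 5 the stack is usually started via cman, which builds
its configuration from /etc/cluster/cluster.conf rather than openais.conf,
so check which file your stack actually reads before editing either one.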
