On 02/25/2011 01:48 AM, JiaQiang Xu wrote:
> 2011/2/24 Steven Dake <[email protected]>:
>> redundant ring is completely untested with udpu.  I would focus on
>> getting udpu working first and go from there.
>>
>> passive offers better performance, active consumes more cpu with
>> slightly lower latency.
>>
>
> I did some tests on udpu and rrp mode. Here is my findings.
>
> First I tested 2 interfaces with rrp_mode=active. Here is my config on
> one of the test nodes:
>
> interface {
>     member {
>         memberaddr: 192.168.1.3
>     }
>     member {
>         memberaddr: 192.168.1.4
>     }
>     ringnumber: 0
>     bindnetaddr: 192.168.1.3
>     mcastport: 4000
> }
> interface {
>     member {
>         memberaddr: 192.168.2.3
>     }
>     member {
>         memberaddr: 192.168.2.4
>     }
>     ringnumber: 1
>     bindnetaddr: 192.168.2.3
>     mcastport: 3000
> }
> transport: udpu
>
> Seems in most cases they work smoothly together. But something
> bad happens when I disable and re-enable one of the two physical interfaces.
> After re-enabling the if, corosync sometimes crashes with the following
> message:
>
> corosync: totemsrp.c:1194: memb_consensus_agreed: Assertion
> `token_memb_entries >= 1' failed.
>
> (I did not forget to run "corosync-cfgtool -r" after re-enabling the
> interface.)
>
> This bug still exists even if I set rrp_mode=none and config only one
> interface.
> So I think this bug is not related to the integration of rrp and udpu.
> If I use udp multicast instead, this problem disappears.
> It may be a bug in the udpu code.
>
> I also found another bug (I believe) related to udpu:
> I configure one udpu interface, without rrp on 2 nodes.
> After a regular startup, crm_mon outputs on node 1:
>
> ============
> Last updated: Fri Feb 25 16:43:31 2011
> Stack: openais
> Current DC: ubuntu-1 - partition with quorum
> Version: 1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 0 Resources configured.
> ============
>
> Online: [ ubuntu-1 ubuntu-2 ]
>
> Then I manually disable the net if. crm_mon outputs on node 1 *doesn't
> change*.
> While on node 2, we have:
>
> ============
> Last updated: Fri Feb 25 16:43:29 2011
> Stack: openais
> Current DC: ubuntu-2 - partition WITHOUT quorum
> Version: 1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 0 Resources configured.
> ============
>
> Node ubuntu-1: UNCLEAN (offline)
> Online: [ ubuntu-2 ]
>
> This result is inconsistent with the result I get from udp multicast
> configuration.
> Seems the node who loses a connection fails to update the membership with
> udpu.
>

Thanks for the testing results.  One thing I noticed is that you said
you disabled the interface.  Does this mean you ran "ifconfig eth down"?
See http://www.corosync.org/doku.php?id=faq:ifdown.  An ifdown operation
in redundant ring has unknown, non-deterministic results.  Could you
retest using iptables to do the fencing operation rather than ifconfig?
Then we can get some bugs filed and know where to look.

> BTW, what's your plan for testing udpu with rrp mode?
>

I personally would like to get redundant ring working well first,
including things like automatic ring recovery.  Unfortunately, while a
lot of people have interest in using redundant ring, not a lot of people
have interest in working on the related code and fixing problems with
it.  At this point, redundant ring (and udpu integration thereof) is at
the bottom of my personal TODO list.  But anyone else is free to work on
the code and supply patches.

In most use cases, bonding works appropriately and is what I recommend
for deployments.

Regards
-steve

> Thanks,
> -Jiaqiang

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
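
For anyone repeating the retest with iptables instead of ifconfig, here
is a minimal sketch of one way to do it.  It assumes the ring 0 peer
address 192.168.1.4 from the quoted config.  The helper name ring_rules
is hypothetical, and dropping all traffic to and from the peer (rather
than only corosync's UDP ports) is a simplification, not something
prescribed in this thread.  The script only prints the commands, so
nothing changes until the output is piped to a root shell.

```shell
#!/bin/sh
# Sketch: cut the link to one peer with iptables rules instead of
# taking the interface down, so the local address stays configured.

# Peer address on the ring under test (assumption: taken from the
# quoted config; adjust for your own nodes).
PEER="${PEER:-192.168.1.4}"

# Print the iptables commands for one action.
#   $1 = A (append rules, isolates the peer) or D (delete rules, heals)
#   $2 = peer IP address
ring_rules() {
    action="$1"
    peer="$2"
    echo "iptables -$action INPUT -s $peer -j DROP"
    echo "iptables -$action OUTPUT -d $peer -j DROP"
}

# Dry run: show what would be executed to isolate and then heal.
ring_rules A "$PEER"
ring_rules D "$PEER"
```

To apply for real, run something like `ring_rules A 192.168.1.4 | sh` as
root, watch crm_mon on both nodes, then run `ring_rules D 192.168.1.4 | sh`
to restore traffic.  With redundant ring configured, "corosync-cfgtool -r"
would still be run after healing, as in the original test.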
