On 11 September 2015 at 21:01:30 GMT+03:00, Digimer <[email protected]> wrote:
>On 11/09/15 01:42 PM, Digimer wrote:
>> Hi all,
>>
>> Starting a new thread from the "Clustered LVM with iptables issue"
>> thread...
>>
>> I've decided to review how I do networking entirely in my cluster. I
>> make zero claims to being great at networks, so I would love some feedback.
>>
>> I've got three active/passive bonded interfaces; Back-Channel, Storage
>> and Internet-Facing networks. The IFN is "off limits" to the cluster as
>> it is dedicated to hosted server traffic only.
>>
>> So before, I used only the BCN for cman/corosync multicast traffic,
>> no RRP. A couple months ago, I had a cluster partition when VM live
>> migration (also on the BCN) congested the network. So I decided to
>> enable RRP using the SN as backup, which has been marginally successful.
>>
>> Now, I want to switch to unicast (<cman transport="udpu">), RRP with
>> the SN as the backup and BCN as the primary ring, and do a proper
>> IPTables firewall. Is this sane?
>>
>> When I stopped iptables entirely and started cman with unicast + RRP,
>> I saw this:
>>
>> ====] Node 1
>> Sep 11 17:31:24 node1 kernel: DLM (built Aug 10 2015 09:45:36) installed
>> Sep 11 17:31:24 node1 corosync[2523]: [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
>> Sep 11 17:31:24 node1 corosync[2523]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
>> Sep 11 17:31:24 node1 corosync[2523]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
>> Sep 11 17:31:24 node1 corosync[2523]: [MAIN ] Successfully parsed cman config
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] Initializing transport (UDP/IP Unicast).
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] Initializing transport (UDP/IP Unicast).
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] The network interface [10.20.10.1] is now up.
>> Sep 11 17:31:24 node1 corosync[2523]: [QUORUM] Using quorum provider quorum_cman
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
>> Sep 11 17:31:24 node1 corosync[2523]: [CMAN ] CMAN 3.0.12.1 (built Jul 6 2015 05:30:35) started
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync configuration service
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync profile loading service
>> Sep 11 17:31:24 node1 corosync[2523]: [QUORUM] Using quorum provider quorum_cman
>> Sep 11 17:31:24 node1 corosync[2523]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
>> Sep 11 17:31:24 node1 corosync[2523]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] adding new UDPU member {10.20.10.1}
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] adding new UDPU member {10.20.10.2}
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] The network interface [10.10.10.1] is now up.
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] adding new UDPU member {10.10.10.1}
>> Sep 11 17:31:24 node1 corosync[2523]: [TOTEM ] adding new UDPU member {10.10.10.2}
>> Sep 11 17:31:27 node1 corosync[2523]: [TOTEM ] Incrementing problem counter for seqid 1 iface 10.10.10.1 to [1 of 3]
>> Sep 11 17:31:27 node1 corosync[2523]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Sep 11 17:31:27 node1 corosync[2523]: [CMAN ] quorum regained, resuming activity
>> Sep 11 17:31:27 node1 corosync[2523]: [QUORUM] This node is within the primary component and will provide service.
>> Sep 11 17:31:27 node1 corosync[2523]: [QUORUM] Members[1]: 1
>> Sep 11 17:31:27 node1 corosync[2523]: [QUORUM] Members[1]: 1
>> Sep 11 17:31:27 node1 corosync[2523]: [CPG ] chosen downlist: sender r(0) ip(10.20.10.1) r(1) ip(10.10.10.1) ; members(old:0 left:0)
>> Sep 11 17:31:27 node1 corosync[2523]: [MAIN ] Completed service synchronization, ready to provide service.
>> Sep 11 17:31:27 node1 corosync[2523]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Sep 11 17:31:27 node1 corosync[2523]: [QUORUM] Members[2]: 1 2
>> Sep 11 17:31:27 node1 corosync[2523]: [QUORUM] Members[2]: 1 2
>> Sep 11 17:31:27 node1 corosync[2523]: [CPG ] chosen downlist: sender r(0) ip(10.20.10.1) r(1) ip(10.10.10.1) ; members(old:1 left:0)
>> Sep 11 17:31:27 node1 corosync[2523]: [MAIN ] Completed service synchronization, ready to provide service.
>> Sep 11 17:31:29 node1 corosync[2523]: [TOTEM ] ring 1 active with no faults
>> Sep 11 17:31:29 node1 fenced[2678]: fenced 3.0.12.1 started
>> Sep 11 17:31:29 node1 dlm_controld[2691]: dlm_controld 3.0.12.1 started
>> Sep 11 17:31:30 node1 gfs_controld[2755]: gfs_controld 3.0.12.1 started
>> ====
>>
>> ====] Node 2
>> Sep 11 17:31:23 node2 kernel: DLM (built Aug 10 2015 09:45:36) installed
>> Sep 11 17:31:23 node2 corosync[2271]: [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
>> Sep 11 17:31:23 node2 corosync[2271]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
>> Sep 11 17:31:23 node2 corosync[2271]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
>> Sep 11 17:31:23 node2 corosync[2271]: [MAIN ] Successfully parsed cman config
>> Sep 11 17:31:23 node2 corosync[2271]: [TOTEM ] Initializing transport (UDP/IP Unicast).
>> Sep 11 17:31:23 node2 corosync[2271]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Sep 11 17:31:23 node2 corosync[2271]: [TOTEM ] Initializing transport (UDP/IP Unicast).
>> Sep 11 17:31:23 node2 corosync[2271]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Sep 11 17:31:23 node2 corosync[2271]: [TOTEM ] The network interface [10.20.10.2] is now up.
>> Sep 11 17:31:23 node2 corosync[2271]: [QUORUM] Using quorum provider quorum_cman
>> Sep 11 17:31:23 node2 corosync[2271]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
>> Sep 11 17:31:23 node2 corosync[2271]: [CMAN ] CMAN 3.0.12.1 (built Jul 6 2015 05:30:35) started
>> Sep 11 17:31:23 node2 corosync[2271]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
>> Sep 11 17:31:23 node2 corosync[2271]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
>> Sep 11 17:31:23 node2 corosync[2271]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
>> Sep 11 17:31:23 node2 corosync[2271]: [SERV ] Service engine loaded: corosync configuration service
>> Sep 11 17:31:23 node2 corosync[2271]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
>> Sep 11 17:31:23 node2 corosync[2271]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
>> Sep 11 17:31:23 node2 corosync[2271]: [SERV ] Service engine loaded: corosync profile loading service
>> Sep 11 17:31:23 node2 corosync[2271]: [QUORUM] Using quorum provider quorum_cman
>> Sep 11 17:31:23 node2 corosync[2271]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
>> Sep 11 17:31:23 node2 corosync[2271]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
>> Sep 11 17:31:23 node2 corosync[2271]: [TOTEM ] adding new UDPU member {10.20.10.1}
>> Sep 11 17:31:23 node2 corosync[2271]: [TOTEM ] adding new UDPU member {10.20.10.2}
>> Sep 11 17:31:23 node2 corosync[2271]: [TOTEM ] The network interface [10.10.10.2] is now up.
>> Sep 11 17:31:23 node2 corosync[2271]: [TOTEM ] adding new UDPU member {10.10.10.1}
>> Sep 11 17:31:23 node2 corosync[2271]: [TOTEM ] adding new UDPU member {10.10.10.2}
>> Sep 11 17:31:26 node2 corosync[2271]: [TOTEM ] Incrementing problem counter for seqid 1 iface 10.10.10.2 to [1 of 3]
>> Sep 11 17:31:26 node2 corosync[2271]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Sep 11 17:31:26 node2 corosync[2271]: [CMAN ] quorum regained, resuming activity
>> Sep 11 17:31:26 node2 corosync[2271]: [QUORUM] This node is within the primary component and will provide service.
>> Sep 11 17:31:26 node2 corosync[2271]: [QUORUM] Members[1]: 2
>> Sep 11 17:31:26 node2 corosync[2271]: [QUORUM] Members[1]: 2
>> Sep 11 17:31:26 node2 corosync[2271]: [CPG ] chosen downlist: sender r(0) ip(10.20.10.2) r(1) ip(10.10.10.2) ; members(old:0 left:0)
>> Sep 11 17:31:26 node2 corosync[2271]: [MAIN ] Completed service synchronization, ready to provide service.
>> Sep 11 17:31:27 node2 corosync[2271]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Sep 11 17:31:27 node2 corosync[2271]: [QUORUM] Members[2]: 1 2
>> Sep 11 17:31:27 node2 corosync[2271]: [QUORUM] Members[2]: 1 2
>> Sep 11 17:31:27 node2 corosync[2271]: [CPG ] chosen downlist: sender r(0) ip(10.20.10.1) r(1) ip(10.10.10.1) ; members(old:1 left:0)
>> Sep 11 17:31:27 node2 corosync[2271]: [MAIN ] Completed service synchronization, ready to provide service.
>> Sep 11 17:31:28 node2 corosync[2271]: [TOTEM ] ring 1 active with no faults
>> Sep 11 17:31:28 node2 fenced[2359]: fenced 3.0.12.1 started
>> Sep 11 17:31:28 node2 dlm_controld[2390]: dlm_controld 3.0.12.1 started
>> Sep 11 17:31:29 node2 gfs_controld[2442]: gfs_controld 3.0.12.1 started
>> ====
>>
>>
>> This looked good to me. So I wanted to test RRP by ifdown'ing bcn_bond1
>> on node 1 only, leaving bcn_bond1 up on node2. The cluster survived and
>> seemed to switch to the SN, but I saw this error repeatedly printed:
>>
>> ====] Node 1
>> Sep 11 17:31:46 node1 kernel: bcn_bond1: Removing slave bcn_link1
>> Sep 11 17:31:46 node1 kernel: bcn_bond1: Releasing active interface bcn_link1
>> Sep 11 17:31:46 node1 kernel: bcn_bond1: the permanent HWaddr of bcn_link1 - 52:54:00:b0:e4:c8 - is still in use by bcn_bond1 - set the HWaddr of bcn_link1 to a different address to avoid conflicts
>> Sep 11 17:31:46 node1 kernel: bcn_bond1: making interface bcn_link2 the new active one
>> Sep 11 17:31:46 node1 kernel: ICMPv6 NA: someone advertises our address fe80:0000:0000:0000:5054:00ff:feb0:e4c8 on bcn_link1!
>> Sep 11 17:31:46 node1 kernel: bcn_bond1: Removing slave bcn_link2
>> Sep 11 17:31:46 node1 kernel: bcn_bond1: Releasing active interface bcn_link2
>> Sep 11 17:31:48 node1 ntpd[2037]: Deleting interface #7 bcn_link1, fe80::5054:ff:feb0:e4c8#123, interface stats: received=0, sent=0, dropped=0, active_time=48987 secs
>> Sep 11 17:31:48 node1 ntpd[2037]: Deleting interface #6 bcn_bond1, fe80::5054:ff:feb0:e4c8#123, interface stats: received=0, sent=0, dropped=0, active_time=48987 secs
>> Sep 11 17:31:48 node1 ntpd[2037]: Deleting interface #3 bcn_bond1, 10.20.10.1#123, interface stats: received=0, sent=0, dropped=0, active_time=48987 secs
>> Sep 11 17:31:51 node1 corosync[2523]: [TOTEM ] Incrementing problem counter for seqid 677 iface 10.20.10.1 to [1 of 3]
>> Sep 11 17:31:53 node1 corosync[2523]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:31:57 node1 corosync[2523]: [TOTEM ] Incrementing problem counter for seqid 679 iface 10.20.10.1 to [1 of 3]
>> Sep 11 17:31:59 node1 corosync[2523]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:04 node1 corosync[2523]: [TOTEM ] Incrementing problem counter for seqid 681 iface 10.20.10.1 to [1 of 3]
>> Sep 11 17:32:06 node1 corosync[2523]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:11 node1 corosync[2523]: [TOTEM ] Incrementing problem counter for seqid 683 iface 10.20.10.1 to [1 of 3]
>> Sep 11 17:32:13 node1 corosync[2523]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:17 node1 corosync[2523]: [TOTEM ] Incrementing problem counter for seqid 685 iface 10.20.10.1 to [1 of 3]
>> Sep 11 17:32:19 node1 corosync[2523]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:24 node1 corosync[2523]: [TOTEM ] Incrementing problem counter for seqid 687 iface 10.20.10.1 to [1 of 3]
>> Sep 11 17:32:26 node1 corosync[2523]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:31 node1 corosync[2523]: [TOTEM ] Incrementing problem counter for seqid 689 iface 10.20.10.1 to [1 of 3]
>> Sep 11 17:32:33 node1 corosync[2523]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:37 node1 corosync[2523]: [TOTEM ] Incrementing problem counter for seqid 691 iface 10.20.10.1 to [1 of 3]
>> Sep 11 17:32:39 node1 corosync[2523]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:44 node1 corosync[2523]: [TOTEM ] Incrementing problem counter for seqid 693 iface 10.20.10.1 to [1 of 3]
>> Sep 11 17:32:46 node1 corosync[2523]: [TOTEM ] ring 0 active with no faults
>> ====
>>
>> ====] Node 2
>> Sep 11 17:31:48 node2 corosync[2271]: [TOTEM ] Incrementing problem counter for seqid 676 iface 10.20.10.2 to [1 of 3]
>> Sep 11 17:31:50 node2 corosync[2271]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:31:54 node2 corosync[2271]: [TOTEM ] Incrementing problem counter for seqid 678 iface 10.20.10.2 to [1 of 3]
>> Sep 11 17:31:56 node2 corosync[2271]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:01 node2 corosync[2271]: [TOTEM ] Incrementing problem counter for seqid 680 iface 10.20.10.2 to [1 of 3]
>> Sep 11 17:32:03 node2 corosync[2271]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:08 node2 corosync[2271]: [TOTEM ] Incrementing problem counter for seqid 682 iface 10.20.10.2 to [1 of 3]
>> Sep 11 17:32:10 node2 corosync[2271]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:14 node2 corosync[2271]: [TOTEM ] Incrementing problem counter for seqid 684 iface 10.20.10.2 to [1 of 3]
>> Sep 11 17:32:16 node2 corosync[2271]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:21 node2 corosync[2271]: [TOTEM ] Incrementing problem counter for seqid 686 iface 10.20.10.2 to [1 of 3]
>> Sep 11 17:32:23 node2 corosync[2271]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:28 node2 corosync[2271]: [TOTEM ] Incrementing problem counter for seqid 688 iface 10.20.10.2 to [1 of 3]
>> Sep 11 17:32:30 node2 corosync[2271]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:35 node2 corosync[2271]: [TOTEM ] Incrementing problem counter for seqid 690 iface 10.20.10.2 to [1 of 3]
>> Sep 11 17:32:37 node2 corosync[2271]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:41 node2 corosync[2271]: [TOTEM ] Incrementing problem counter for seqid 692 iface 10.20.10.2 to [1 of 3]
>> Sep 11 17:32:43 node2 corosync[2271]: [TOTEM ] ring 0 active with no faults
>> Sep 11 17:32:48 node2 corosync[2271]: [TOTEM ] Incrementing problem counter for seqid 694 iface 10.20.10.2 to [1 of 3]
>> Sep 11 17:32:50 node2 corosync[2271]: [TOTEM ] ring 0 active with no faults
>> ====
>>
>> When I ifup'ed bcn_bond1 on node1, the messages stopped printing. So
>> before I even start on iptables, I am curious if I am doing something
>> incorrect here.
>>
>> Advice?
>>
>> Thanks!
>
>According to this;
>
>https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/s1-config-network-conga-CA.html
>
>Unicast + GFS2 is NOT recommended. So maybe that idea is already out the
>window?
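
For reference, a minimal cluster.conf sketch of the layout described above (unicast totem plus passive RRP, BCN as ring 0 and SN as ring 1). The cluster name, node names, votes and fencing are assumptions, not taken from the real config; the ".bcn" names are assumed to resolve to 10.20.10.x and the ".sn" altnames to 10.10.10.x:

====] cluster.conf sketch (hypothetical names and values)
<?xml version="1.0"?>
<cluster name="an-cluster" config_version="1">
	<!-- two_node/expected_votes assume a 2-node cluster -->
	<cman transport="udpu" two_node="1" expected_votes="1"/>
	<!-- passive RRP: ring 1 (SN) is only used while ring 0 (BCN) is faulty;
	     cman may already default to this once an altname is present -->
	<totem rrp_mode="passive"/>
	<clusternodes>
		<!-- primary name = ring 0 (BCN) address, altname = ring 1 (SN) address -->
		<clusternode name="node1.bcn" nodeid="1">
			<altname name="node1.sn"/>
			<fence><!-- fencing omitted for brevity --></fence>
		</clusternode>
		<clusternode name="node2.bcn" nodeid="2">
			<altname name="node2.sn"/>
			<fence><!-- fencing omitted for brevity --></fence>
		</clusternode>
	</clusternodes>
</cluster>
====

With both nodes up, "corosync-cfgtool -s" on either node should list ring 0 and ring 1 and report their status, i.e. the same "ring X active with no faults" information that shows up in the logs above.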
That advice is, hmm... weird. The DLM and GFS control daemons use corosync/cman only for membership and quorum. Everything else is done directly in the kernel, which is unaware of what corosync is or what transport it uses.
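
For the iptables step, that point matters: corosync (multicast or udpu) needs its totem ports open on both ring networks, and the in-kernel DLM that dlm_controld sets up for clvmd/GFS2 talks TCP between the nodes on its own port, regardless of which totem transport is in use. A rough sketch, assuming the BCN is 10.20.10.0/24 on bcn_bond1 and the SN is 10.10.10.0/24 on a bond named sn_bond1 (the SN interface name and subnets are guesses; only bcn_bond1 appears in the logs):

====] iptables sketch (interface and subnet names assumed)
# corosync totem traffic on both rings (default ports 5404-5405)
iptables -A INPUT -i bcn_bond1 -s 10.20.10.0/24 -p udp -m multiport --dports 5404,5405 -j ACCEPT
iptables -A INPUT -i sn_bond1 -s 10.10.10.0/24 -p udp -m multiport --dports 5404,5405 -j ACCEPT
# in-kernel DLM lock traffic (clvmd/GFS2), default TCP port 21064
iptables -A INPUT -i bcn_bond1 -s 10.20.10.0/24 -p tcp --dport 21064 -j ACCEPT
iptables -A INPUT -i sn_bond1 -s 10.10.10.0/24 -p tcp --dport 21064 -j ACCEPT
====

These rules need to sit above any blanket REJECT/DROP in the INPUT chain.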
