Re: [ClusterLabs] Making xt_cluster IP load-sharing work with IPv6 (Was: Concept of a Shared ipaddress/resource for generic applicatons)
04.01.2020 01:42, Valentin Vidić wrote:
> On Thu, Jan 02, 2020 at 09:52:09PM +0100, Jan Pokorný wrote:
>> What you've used appears to be akin to what this chunk of manpage
>> suggests (amongst others):
>> https://git.netfilter.org/iptables/tree/extensions/libxt_cluster.man
>>
>> which is (yet another) indicator to me that xt_cluster extension
>> doesn't carry that functionality on its own (like CLUSTERIP target
>> did, as mentioned).
> ...
>
>> * But it doesn't explain the suggested destination MAC renormalization
>> * on INPUT, which is currently yet to be heard of for our purpose...
>
> I did not use the INPUT rules from the xt_cluster documentation and
> to be honest don't understand the setup described there.
>

The ARP RFC (RFC 826) says that in a reply the source and target
hardware addresses are swapped, so the reply is supposed to carry the
original source MAC as its target MAC. AFAICT the Linux ARP driver does
not check this, but I guess it is good practice to make sure a received
packet conforms to the standard's requirement.

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
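To make the RFC 826 reply rule concrete, here is a minimal Python sketch on raw ARP bytes (plain byte handling, not the Linux ARP code). It builds a reply by moving the requester's sender fields into the target fields and then checks that a reply is addressed back to the requester's original MAC and IP. The helper names and the MAC/IP values are hypothetical. Presumably this is also why a request that left a node with its sender MAC rewritten to the cluster multicast MAC would come back with that multicast MAC as the target hardware address.

```python
import struct

# Ethernet/IPv4 ARP payload per RFC 826 (28 bytes):
# htype, ptype, hlen, plen, oper, sender MAC, sender IP, target MAC, target IP
ARP_FMT = "!HHBBH6s4s6s4s"
ARP_REQUEST, ARP_REPLY = 1, 2


def make_request(my_mac: bytes, my_ip: bytes, target_ip: bytes) -> bytes:
    """Who-has target_ip? Tell my_ip. The target MAC is unknown, so zeros."""
    return struct.pack(ARP_FMT, 1, 0x0800, 6, 4, ARP_REQUEST,
                       my_mac, my_ip, bytes(6), target_ip)


def make_reply(request: bytes, my_mac: bytes) -> bytes:
    """Answer per RFC 826: the requester's sender fields become the target
    fields, and our own addresses go into the sender fields."""
    htype, ptype, hlen, plen, oper, sha, spa, _tha, tpa = struct.unpack(ARP_FMT, request)
    if oper != ARP_REQUEST:
        raise ValueError("not an ARP request")
    return struct.pack(ARP_FMT, htype, ptype, hlen, plen, ARP_REPLY,
                       my_mac, tpa, sha, spa)


def reply_conforms(request: bytes, reply: bytes) -> bool:
    """True if the reply targets the requester's original MAC and IP."""
    _, _, _, _, _, req_sha, req_spa, _, _ = struct.unpack(ARP_FMT, request)
    _, _, _, _, oper, _, _, tha, tpa = struct.unpack(ARP_FMT, reply)
    return oper == ARP_REPLY and tha == req_sha and tpa == req_spa
```

A receiver applying this check would reject a reply whose target MAC was altered in transit, even though, as noted above, Linux itself does not appear to enforce it.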
Re: [ClusterLabs] Making xt_cluster IP load-sharing work with IPv6 (Was: Concept of a Shared ipaddress/resource for generic applicatons)
On Thu, Jan 02, 2020 at 09:52:09PM +0100, Jan Pokorný wrote:
> What you've used appears to be akin to what this chunk of manpage
> suggests (amongst others):
> https://git.netfilter.org/iptables/tree/extensions/libxt_cluster.man
>
> which is (yet another) indicator to me that xt_cluster extension
> doesn't carry that functionality on its own (like CLUSTERIP target
> did, as mentioned).

Right, the old ipt_CLUSTERIP.c (908 lines) was a complete solution with
ARP mangling and a /proc file to control node mapping. The new
xt_cluster.c (175 lines) is much more limited and only handles the
hashing part. The rest needs to be done externally, using iptables and
arptables commands (or their nft equivalents).

> 2. Is the following, for me viable explanation correct?
>
> That arrangement is to prevent here unexpectedly leaky specific
> associations (I'd call "fixations") of the interface's true (hence
> non-multicast) MAC address with meant-to-be-shared IP address at hand,
> and hence cancelling the effect of link-multicasted frames (to which
> at most a single recipient would respond per the firewall matching
> rules), and therefore botching the "shared IP" concept altogether from
> the perspective of network members that would undesirably learn
> non-multicast address association for the particular
> meant-to-be-shared IP leaked like this.

Yes, I added the VIP to both nodes as a normal address, so both would
reply to ARP requests with their normal MAC addresses. The arptables
command rewrites the ARP reply to use the multicast MAC for the VIP
instead.

> * But it doesn't explain the suggested destination MAC renormalization
> * on INPUT, which is currently yet to be heard of for our purpose...

I did not use the INPUT rules from the xt_cluster documentation and to
be honest don't understand the setup described there.

> 4. Shall not even existing IPaddr2 (whether in CLUSTERIP-based mode
>    or not) actually verify that
>    /proc/sys/net/netfilter/nf_conntrack_tcp_loose
>    gets cleared, at least until told not to through configuration?
>
>    - looks like a good idea not to allow any after-cut packets
>      interaction (would only apply to anything outside of the
>      critical cluster infrastructure since it uses UDP), as
>      a matter of safety precautions (there are no liveness
>      aspects to wish for in such scenarios, which could
>      otherwise interfere, I think)

I did not use conntrack, so I'm not sure how important this setting is.

> 5. Here, I had a closer look at the code as well and have an option
>    to try -- does this help?
>
>    It appears as if that response in the (solicited) Neighbour
>    Advertisement is -- in Linux kernel -- unconditionally always
>    picked from the very first address configured on the device (not to
>    be confused with "permanent address"). Hence it looks to me that
>    the way to go would be, so as to achieve feature parity IPv4 vs. IPv6,
>    to either:
>
>    - give up on the sole identity of the interface, so that it either
>      operates under selected multicast link layer address or doesn't
>      operate at all (rationale: better not to confuse the network with
>      occasional MAC flips?)
>
>    - stick with a new macvlan pseudointerface, surprise-surprise, yet
>      another virtualization/mimicking/independence-increasing layer :-)
>
>    No experience with macvlan on my side, but bridge mode looks appealing,
>    and would retain the interface addressable through its standard MAC
>    address as well. And importantly, the newly created interface would
>    have the correct (multicast) MAC address to respond with to the
>    respective Neighbour Solicitations (which is exactly what's asked,
>    IIUIC), and I expect it would be the one selected to respond to
>    the very matching IP in question?
>
>    Still, this doesn't resolve any concern around point 3. above
>    (assuming it's not bogus, to begin with).
Yes, macvlan or some other similar trick to bind the VIP to the
multicast MAC could be used here to avoid the whole packet rewrite mess.
The only iptables rules required in this setup would be the ones using
xt_cluster to decide whether an incoming packet is to be handled by the
current node or just ignored (DROP).

-- 
Valentin
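The per-packet decision described above can be sketched in Python, roughly following what xt_cluster.c appears to do: hash the connection's original-direction source address with a seed, scale the hash into the number of nodes, and test that bucket's bit in the local node mask. The function names are hypothetical, and zlib.crc32 stands in for the kernel's jhash, so the bucket numbers will not match the real module; only the shape of the decision is meant to carry over. The 1-based node IDs mirror the --cluster-local-node option, and the node count and seed play the roles of --cluster-total-nodes and --cluster-hash-seed.

```python
import ipaddress
import zlib


def reciprocal_scale(val: int, ep_ro: int) -> int:
    """Map a 32-bit value onto [0, ep_ro) without a division, like the kernel helper."""
    return ((val & 0xFFFFFFFF) * ep_ro) >> 32


def cluster_hash(src_ip: str, total_nodes: int, seed: int) -> int:
    """Bucket for a connection, keyed on its original-direction source address.

    ASSUMPTION: zlib.crc32 stands in for the kernel's jhash, so the buckets
    chosen here will not match those of the real xt_cluster module.
    """
    packed = ipaddress.ip_address(src_ip).packed  # 4 bytes for IPv4, 16 for IPv6
    h = zlib.crc32(packed, seed & 0xFFFFFFFF)     # seed used as the CRC start value
    return reciprocal_scale(h, total_nodes)


def handled_locally(src_ip: str, total_nodes: int,
                    local_nodes: list[int], seed: int) -> bool:
    """True -> this node handles the packet; False -> it should DROP it.

    local_nodes holds 1-based node IDs (as with --cluster-local-node);
    node N corresponds to bit N-1 of the mask.
    """
    node_mask = 0
    for node in local_nodes:
        node_mask |= 1 << (node - 1)
    return bool((1 << cluster_hash(src_ip, total_nodes, seed)) & node_mask)
```

With the same seed and node count configured on every node, each source address lands in exactly one bucket, so exactly one node accepts its packets and the others drop them. The same code path covers IPv6 sources, which is the case this thread is after.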
[ClusterLabs] Making xt_cluster IP load-sharing work with IPv6 (Was: Concept of a Shared ipaddress/resource for generic applicatons)
On 27/12/19 15:04 +0100, Valentin Vidić wrote:
> On Wed, Dec 04, 2019 at 02:44:49PM +0100, Jan Pokorný wrote:
>> For the record, based on my feedback, iptables-extensions man page is
>> headed to (finally) align with the actual in-kernel deprecation
>> message:
>> https://lore.kernel.org/netfilter-devel/20191204130921.2914-1-p...@nwl.cc/
>
> From a quick run of xt_cluster it seems to be working as expected
> for IPv4 FTR.

When the "netfilter"/nftables backend is available, you can either make
use of the iptables-translate conversion utility, or deduce a similar
takeaway from
https://git.netfilter.org/iptables/tree/extensions/libxt_cluster.txlate?h=v1.8.4
possibly allowing one to ditch any dependency on iptables-* tooling, and
on xt_cluster.ko as well!

As mentioned in a newer thread asking about xt_cluster:
https://lists.clusterlabs.org/pipermail/users/2020-January/026718.html
for the envisioned agent, this would be a way to (optionally) allow for
rather lightweight operation in the future, when iptables may not even
be installed by default on some Linux distros. Even a
firewalld-as-a-middleware variant controlled just via D-Bus calls might
be thinkable, meaning that the "nft" tool wouldn't be required either.

> It requires iptables rules and ARP reply rewrite like:
>
> arptables -A OUTPUT -o eth1 --h-length 6 -j mangle --mangle-mac-s 01:00:5e:00:01:01

Pardon my ignorance, but you currently appear to be the greatest expert
on this list with practical experience regarding the topic.

* * *

1. Is it solely your experience with the xt_cluster extension that led
   you to this ARP-level rewrite, unique to using the netfilter backend,
   or would the same actually be needed with the true CLUSTERIP target?
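The effect of the quoted arptables rule can be modeled on raw bytes: for an outgoing ARP packet whose hardware-address length is 6, the sender hardware address inside the ARP payload is replaced by the cluster MAC. The Python sketch below illustrates that under those assumptions (helper names are hypothetical, and the Ethernet header is not touched). It also shows why 01:00:5e:00:01:01 counts as a multicast MAC: the group bit, the least significant bit of the first octet, is set.

```python
def parse_mac(mac: str) -> bytes:
    """'01:00:5e:00:01:01' -> 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def is_multicast_mac(mac: bytes) -> bool:
    """Group (I/G) bit: least significant bit of the first octet."""
    return bool(mac[0] & 0x01)


def mangle_mac_s(arp: bytes, new_mac: bytes) -> bytes:
    """Mimic '--h-length 6 -j mangle --mangle-mac-s NEW_MAC' on an ARP payload.

    ARP layout (RFC 826): htype(2) ptype(2) hlen(1) plen(1) oper(2), then
    the sender hardware address starting at offset 8.
    """
    if len(new_mac) != 6:
        raise ValueError("expected a 6-byte MAC address")
    if len(arp) < 8 or arp[4] != 6:
        return arp                      # rule does not match: pass through unchanged
    return arp[:8] + new_mac + arp[14:]
```

Because every node applies the same rewrite, neighbours only ever learn the multicast MAC for the VIP, never a node's own unicast MAC.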
Actually, I took a look at the code of the CLUSTERIP extension, and it
does in fact do the very same ARP-level mangling, although it is
slightly more precise, akin to (annotated):

    # Rewrite the sender MAC of outgoing ARP replies:
    #   --h-type 1          Ethernet
    #   --proto-type 0x800  IPv4
    #   --h-length 6        perhaps redundant to --h-type?
    #   (cannot express a limitation on the size of the network address,
    #    but that would perhaps be redundant to --proto-type)
    #   --opcode 2          this time for Reply
    # CLUSTERMAC stands for the cluster's multicast MAC address.
    arptables -A OUTPUT \
        --h-type 1 \
        --proto-type 0x800 \
        --h-length 6 \
        --opcode 2 \
        -j mangle --mangle-mac-s CLUSTERMAC

    # see also
    # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4095ebf1e641b0f37ee1cd04c903bb85cf4ed25b

    # Same match, this time for Request (--opcode 1)
    arptables -A OUTPUT \
        --h-type 1 \
        --proto-type 0x800 \
        --h-length 6 \
        --opcode 1 \
        -j mangle --mangle-mac-s CLUSTERMAC

What you've used appears to be akin to what this chunk of manpage
suggests (amongst others):
https://git.netfilter.org/iptables/tree/extensions/libxt_cluster.man

which is (yet another) indicator to me that xt_cluster extension
doesn't carry that functionality on its own (like CLUSTERIP target
did, as mentioned).

*
* Anyway, I'd like to understand why this is necessary in the first
* place, getting to my second question.
*

2. Is the following, for me viable explanation correct?
   That arrangement is to prevent here unexpectedly leaky specific
   associations (I'd call "fixations") of the interface's true (hence
   non-multicast) MAC address with meant-to-be-shared IP address at hand,
   and hence cancelling the effect of link-multicasted frames (to which
   at most a single recipient would respond per the firewall matching
   rules), and therefore botching the "shared IP" concept altogether from
   the perspective of network members that would undesirably learn
   non-multicast address association for the particular
   meant-to-be-shared IP leaked like this.

*
* But it doesn't explain the suggested destination MAC renormalization
* on INPUT, which is currently yet to be heard of for our purpose...
*

3. Is, perhaps, the following plausible explanation sound?

   - this is so as not to spoil the local ARP cache/dependent
     interactions, such as when an actual ARP request is sent with a link
     layer address identical to the particular multicast MAC in use --
     likely feasible with the other host on the network configured the
     same way and with the already mentioned source-MAC rewriting on
     egress in place (so this would actually neutralize the harmful
     network effect it could otherwise be causing)

   - or any other reasons?

*
* Finally, when referring to the suggestive example above, there is
* one more question to ask...
*

4. Shall not even existing IPaddr2 (whether in CLUSTERIP-based mode
   or not) actually verify that
   /proc/sys/net/netfilter/nf_conntrack_tcp_loose
   gets cleared, at least until told not to through configuration?

   - looks like a good idea not to allow any after-cut packets
     interaction (would only apply to anything outside of the
     critical cluster infrastructure since it uses UDP), as a matter