On 5/7/21 7:35 AM, Han Zhou wrote: > Hi Ihar, > > Thanks for the patch, and it's great to see the significant performance > improvement! > Please see my comments below. > > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <[email protected]> wrote: >> >> For allow-stateless ACLs, bypass connection tracking by avoiding >> setting ct hints for matching traffic. Avoid sending all traffic to ct >> when a stateful ACL is present. >> >> === >> >> Reusing an existing 'allow' verb for stateless matching would have its >> drawbacks, specifically, when it comes to backwards incompatibility of >> the new behavior with existing environments. When using "allow" ACLs >> in mixed allow/allow-related environment, we still commit "allow" >> traffic to conntrack. This unnecessarily hits performance when mixed >> ACL action types were used for the same datapath. This is why we >> introduce a new action verb to describe stateless behavior. >> >> Another complexity to consider is the fact that with stateless >> matching, one would not be able to rely on 'related' magic that >> guarantees that reply traffic is passed through. Instead, the user >> would have to accurately define matching rules both for request and >> reply directions of a protocol session. Specifically, when allowing >> ICMP for a specific peer host, one has to define 'allow-stateless' >> rules that would match against ip.dst for request direction and ip.src >> for reply direction. Other protocols and scenarios will require their >> own fine grained matching approaches implemented by the user. >> >> === >> >> For performance measurements, qperf was used. Tests were executed on two >> setups: >> >> 1) ovn-fake-multinode virtual environment; >> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute >> nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE >> SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead. >> >> 1) ovn-fake-multinode: >> >> Performance measured between two virtual nodes, two ports >> that belong to different LSs connected via router. Using qperf, >> performance was measured for UDP, TCP, SCTP protocols (using >> <proto>_lat and <proto>_bw tests). The qperf version used: >> 0.4.9-16.fc31.x86_64. Each test scenario was executed five times and >> averages compared. >> >> Tests were executed with `allow-stateless` rules for the tested >> protocol and `allow-related` for another protocol set for both ports, >> both directions, e.g. for TCP scenario, the following ACLs were >> defined: >> >> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless >> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless >> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless >> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless >> >> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related >> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related >> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related >> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related >> >> In this particular environment, improvement was seen in send_bw, >> latency, and msg_rate measurements, where applicable, for all three >> protocols under test. >> >> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%) >> latency: 16 us => 14.08 us (-12%) >> msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%) >> >> for TCP, latency: 18.6 us => 14.88 us (-20%) >> msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%) >> >> for SCTP, latency: 21.98 us => 19.42 us (-11.65%) >> msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%) >> >> Interestingly, some performance improvement was also seen for the same >> scenarios with no ACLs set at all, albeit significantly more >> negligible. >> >> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%) >> latency: 13.74 us => 12.88 us (-6.68%) >> msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%) >> >> for TCP, latency: 15.62 us => 14.26 us (-9.54%) >> msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%) >> >> for SCTP, latency: 19.56 us => 18.16 us (-7.16%) >> msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%) >> >> 2) Supermicro RH-OSP cluster: >> >> Two scenarios executed: >> - connectivity between two ports on the same switch >> - connectivity between two ports on different switches connected via >> router >> >> For the same switch, improvements are as follows: >> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8% >> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29% >> >> For different switches, improvements are as follows: >> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2% >> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9% >> >> The effect is more noticeable in same switch scenario. > > This is interesting. Any idea why with different switches the result is so > different? I'd expect similar improvement just like in same switch, because > in this kind of test the flow should be cached in kernel datapath > (regardless of same/different logical switches) and in both cases the cost > of traversing conntrack is saved. > >> >> Comparable numbers can be captured with iperf. It may be useful to run >> more tests in a more elaborate (bare metal) environment. >> >> === >> >> The patch takes inspiration from a now abandoned patch: >> >> "ovn-northd: Support mixing stateless/stateful ACLs with >> Stateless_Filter." by Dumitru Ceara. >> >> The original patch assumed CMS doesn't require full flexibility of >> matching rules for stateless matching (for example, to be used by >> OpenShift). But other CMS interfaces may require the same >> customizability for stateless as well as stateful matching, like in >> OpenStack Neutron API. Which is why this patch reuses existing ACL >> object type to describe stateless rules. >> >> Signed-off-by: Ihar Hrachyshka <[email protected]> >> >> --- >> >> v1: initial version. >> v2: rebased after conflict. >> v3: added ddlog implementation. >> v3: apply stateless skip-hint-set rules to appropriate direction only. >> v3: added more background as to implementation in commit message. >> v3: test stateless scenarios with ddlog too. >> v3: rebased after conflict. >> v4: introduce a separate allow-stateless ACL match verb. >> v5: rebased. >> v6: updated docs for new allow-stateless approach. >> v6: removed no longer valid comments. >> v6: removed acl_is_stateless. >> v7: bump north db schema version to 5.31.0 -> 5.32.0. >> v8: fixed checkpatch failure on a too long line in dbschema. >> v8: added perf data on baremetal lab testing. >> v9: fixed ovsdb checksum. >> --- >> NEWS | 2 + >> northd/ovn-northd.8.xml | 8 +- >> northd/ovn-northd.c | 65 ++++++++- >> northd/ovn_northd.dl | 31 ++++ >> ovn-nb.ovsschema | 9 +- >> ovn-nb.xml | 9 +- >> tests/ovn-northd.at | 309 ++++++++++++++++++++++++++++++++++++++++ >> utilities/ovn-nbctl.c | 6 +- >> 8 files changed, 428 insertions(+), 11 deletions(-) >> >> diff --git a/NEWS b/NEWS >> index 1ddde15f8..d72139e76 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -11,6 +11,8 @@ Post-v21.03.0 >> 'use_parallel_build' to enable it. It is disabled by default. >> - Support vlan-passthru mode for tag=0 localnet ports. >> - Support custom 802.11ad EthType for localnet ports. >> + - Introduce a new "allow-stateless" ACL verb to always bypass > connection >> + tracking. The existing "allow" verb behavior is left intact. >> >> OVN v21.03.0 - 12 Mar 2021 >> ------------------------- >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >> index 54e88d3fa..08c4ea852 100644 >> --- a/northd/ovn-northd.8.xml >> +++ b/northd/ovn-northd.8.xml >> @@ -419,7 +419,9 @@ >> before eventually advancing to ingress table <code>ACLs</code>. If >> special ports such as route ports or localnet ports can't use > ct(), a >> priority-110 flow is added to skip over stateful ACLs. IPv6 > Neighbor >> - Discovery and MLD traffic also skips stateful ACLs. >> + Discovery and MLD traffic also skips stateful ACLs. For > "allow-stateless" >> + ACLs, a flow is added to bypass setting the hint for connection > tracker >> + processing. > > This approach clearly achieves the goal of bypassing conntrack and there is > great dataplane performance gain, but it also doubles the ACL logical flows > which in turn could significantly increase the control plane cost, > especially in ovn-controller. In large scale environments where lots of > ACLs referencing large port-group/address-sets are used, it is already a > bottleneck of ovn-controller flow computing. It would be better if we could > avoid doubling this cost. Instead of directly add the ACL match here, can > we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage > pre-stateful add a higher priority flow, if this hint is matched then > directly "next" without going through CT? I hope this can achieve the same > goal but not increasing control plane cost. What do you think? >
Wouldn't this break ACL priority? E.g., with the following ACLs: ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless ACL2: from-lport, prio=50, ip, drop If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic will be dropped by the lflow added for ACL2. On the other hand, IIUC, this feature is designed to be used in combination with "allow-related" ACLs defined on the same switch, replacing "allow" ACLs in that case. Before this change an "allow" ACL would've been treated as "allow-related" if there was at least one other "allow-related" ACL applied on the logical switch. Essentially generating 2 logical flows per "allow" ACL: lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then "REGBIT_CONNTRACK_COMMIT=1, next" lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next" If I'm not wrong, changing this ACL to "allow-stateless" would also generate 2 logical flows, so at least from a number of logical flows perspective, the result is the same. Regards, Dumitru _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
