On Fri, May 7, 2021 at 7:11 PM Han Zhou <[email protected]> wrote: > > > > On Fri, May 7, 2021 at 12:54 AM Dumitru Ceara <[email protected]> wrote: > > > > On 5/7/21 7:35 AM, Han Zhou wrote: > > > Hi Ihar, > > > > > > Thanks for the patch, and it's great to see the significant performance > > > improvement! > > > Please see my comments below. > > > > > > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <[email protected]> > > > wrote: > > >> > > >> For allow-stateless ACLs, bypass connection tracking by avoiding > > >> setting ct hints for matching traffic. Avoid sending all traffic to ct > > >> when a stateful ACL is present. > > >> > > >> === > > >> > > >> Reusing an existing 'allow' verb for stateless matching would have its > > >> drawbacks, specifically, when it comes to backwards incompatibility of > > >> the new behavior with existing environments. When using "allow" ACLs > > >> in mixed allow/allow-related environment, we still commit "allow" > > >> traffic to conntrack. This unnecessarily hits performance when mixed > > >> ACL action types were used for the same datapath. This is why we > > >> introduce a new action verb to describe stateless behavior. > > >> > > >> Another complexity to consider is the fact that with stateless > > >> matching, one would not be able to rely on 'related' magic that > > >> guarantees that reply traffic is passed through. Instead, the user > > >> would have to accurately define matching rules both for request and > > >> reply directions of a protocol session. Specifically, when allowing > > >> ICMP for a specific peer host, one has to define 'allow-stateless' > > >> rules that would match against ip.dst for request direction and ip.src > > >> for reply direction. Other protocols and scenarios will require their > > >> own fine grained matching approaches implemented by the user. > > >> > > >> === > > >> > > >> For performance measurements, qperf was used. Tests were executed on two > > >> setups: > > >> > > >> 1) ovn-fake-multinode virtual environment; > > >> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute > > >> nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE > > >> SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead. > > >> > > >> 1) ovn-fake-multinode: > > >> > > >> Performance measured between two virtual nodes, two ports > > >> that belong to different LSs connected via router. Using qperf, > > >> performance was measured for UDP, TCP, SCTP protocols (using > > >> <proto>_lat and <proto>_bw tests). The qperf version used: > > >> 0.4.9-16.fc31.x86_64. Each test scenario was executed five times and > > >> averages compared. > > >> > > >> Tests were executed with `allow-stateless` rules for the tested > > >> protocol and `allow-related` for another protocol set for both ports, > > >> both directions, e.g. for TCP scenario, the following ACLs were > > >> defined: > > >> > > >> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless > > >> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless > > >> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless > > >> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless > > >> > > >> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related > > >> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related > > >> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related > > >> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related > > >> > > >> In this particular environment, improvement was seen in send_bw, > > >> latency, and msg_rate measurements, where applicable, for all three > > >> protocols under test. > > >> > > >> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%) > > >> latency: 16 us => 14.08 us (-12%) > > >> msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%) > > >> > > >> for TCP, latency: 18.6 us => 14.88 us (-20%) > > >> msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%) > > >> > > >> for SCTP, latency: 21.98 us => 19.42 us (-11.65%) > > >> msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%) > > >> > > >> Interestingly, some performance improvement was also seen for the same > > >> scenarios with no ACLs set at all, albeit significantly more > > >> negligible. > > >> > > >> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%) > > >> latency: 13.74 us => 12.88 us (-6.68%) > > >> msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%) > > >> > > >> for TCP, latency: 15.62 us => 14.26 us (-9.54%) > > >> msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%) > > >> > > >> for SCTP, latency: 19.56 us => 18.16 us (-7.16%) > > >> msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%) > > >> > > >> 2) Supermicro RH-OSP cluster: > > >> > > >> Two scenarios executed: > > >> - connectivity between two ports on the same switch > > >> - connectivity between two ports on different switches connected via > > >> router > > >> > > >> For the same switch, improvements are as follows: > > >> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8% > > >> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29% > > >> > > >> For different switches, improvements are as follows: > > >> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2% > > >> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9% > > >> > > >> The effect is more noticeable in same switch scenario. > > > > > > This is interesting. Any idea why with different switches the result is so > > > different? I'd expect similar improvement just like in same switch, > > > because > > > in this kind of test the flow should be cached in kernel datapath > > > (regardless of same/different logical switches) and in both cases the cost > > > of traversing conntrack is saved. > > > > > >> > > >> Comparable numbers can be captured with iperf. It may be useful to run > > >> more tests in a more elaborate (bare metal) environment. > > >> > > >> === > > >> > > >> The patch takes inspiration from a now abandoned patch: > > >> > > >> "ovn-northd: Support mixing stateless/stateful ACLs with > > >> Stateless_Filter." by Dumitru Ceara. > > >> > > >> The original patch assumed CMS doesn't require full flexibility of > > >> matching rules for stateless matching (for example, to be used by > > >> OpenShift). But other CMS interfaces may require the same > > >> customizability for stateless as well as stateful matching, like in > > >> OpenStack Neutron API. Which is why this patch reuses existing ACL > > >> object type to describe stateless rules. > > >> > > >> Signed-off-by: Ihar Hrachyshka <[email protected]> > > >> > > >> --- > > >> > > >> v1: initial version. > > >> v2: rebased after conflict. > > >> v3: added ddlog implementation. > > >> v3: apply stateless skip-hint-set rules to appropriate direction only. > > >> v3: added more background as to implementation in commit message. > > >> v3: test stateless scenarios with ddlog too. > > >> v3: rebased after conflict. > > >> v4: introduce a separate allow-stateless ACL match verb. > > >> v5: rebased. > > >> v6: updated docs for new allow-stateless approach. > > >> v6: removed no longer valid comments. > > >> v6: removed acl_is_stateless. > > >> v7: bump north db schema version to 5.31.0 -> 5.32.0. > > >> v8: fixed checkpatch failure on a too long line in dbschema. > > >> v8: added perf data on baremetal lab testing. > > >> v9: fixed ovsdb checksum. > > >> --- > > >> NEWS | 2 + > > >> northd/ovn-northd.8.xml | 8 +- > > >> northd/ovn-northd.c | 65 ++++++++- > > >> northd/ovn_northd.dl | 31 ++++ > > >> ovn-nb.ovsschema | 9 +- > > >> ovn-nb.xml | 9 +- > > >> tests/ovn-northd.at | 309 ++++++++++++++++++++++++++++++++++++++++ > > >> utilities/ovn-nbctl.c | 6 +- > > >> 8 files changed, 428 insertions(+), 11 deletions(-) > > >> > > >> diff --git a/NEWS b/NEWS > > >> index 1ddde15f8..d72139e76 100644 > > >> --- a/NEWS > > >> +++ b/NEWS > > >> @@ -11,6 +11,8 @@ Post-v21.03.0 > > >> 'use_parallel_build' to enable it. It is disabled by default. > > >> - Support vlan-passthru mode for tag=0 localnet ports. > > >> - Support custom 802.11ad EthType for localnet ports. > > >> + - Introduce a new "allow-stateless" ACL verb to always bypass > > > connection > > >> + tracking. The existing "allow" verb behavior is left intact. > > >> > > >> OVN v21.03.0 - 12 Mar 2021 > > >> ------------------------- > > >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > > >> index 54e88d3fa..08c4ea852 100644 > > >> --- a/northd/ovn-northd.8.xml > > >> +++ b/northd/ovn-northd.8.xml > > >> @@ -419,7 +419,9 @@ > > >> before eventually advancing to ingress table <code>ACLs</code>. If > > >> special ports such as route ports or localnet ports can't use > > > ct(), a > > >> priority-110 flow is added to skip over stateful ACLs. IPv6 > > > Neighbor > > >> - Discovery and MLD traffic also skips stateful ACLs. > > >> + Discovery and MLD traffic also skips stateful ACLs. For > > > "allow-stateless" > > >> + ACLs, a flow is added to bypass setting the hint for connection > > > tracker > > >> + processing. > > > > > > This approach clearly achieves the goal of bypassing conntrack and there > > > is > > > great dataplane performance gain, but it also doubles the ACL logical > > > flows > > > which in turn could significantly increase the control plane cost, > > > especially in ovn-controller. In large scale environments where lots of > > > ACLs referencing large port-group/address-sets are used, it is already a > > > bottleneck of ovn-controller flow computing. It would be better if we > > > could > > > avoid doubling this cost. Instead of directly add the ACL match here, can > > > we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage > > > pre-stateful add a higher priority flow, if this hint is matched then > > > directly "next" without going through CT? I hope this can achieve the same > > > goal but not increasing control plane cost. What do you think? > > > > > > > Wouldn't this break ACL priority? E.g., with the following ACLs: > > > > ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless > > ACL2: from-lport, prio=50, ip, drop > > > > If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic > > will be dropped by the lflow added for ACL2. > > > You are right. Actually I was thinking about matching the hint again in acl > stage and just move next, but it would skip all the stateful ACLs that could > have a higher priority. This would work only if the stateless ACLs can always > be treated as higher priority than stateful ACLs, but it seems unreasonable. > It seems all the ACL matches have to be evaluated at the same stage to ensure > the priorities are followed. So, forget about this proposal. > > > On the other hand, IIUC, this feature is designed to be used in > > combination with "allow-related" ACLs defined on the same switch, > > replacing "allow" ACLs in that case. > > > > Before this change an "allow" ACL would've been treated as > > "allow-related" if there was at least one other "allow-related" ACL > > applied on the logical switch. Essentially generating 2 logical flows > > per "allow" ACL: > > > > lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then > > "REGBIT_CONNTRACK_COMMIT=1, next" > > lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next" > > > > If I'm not wrong, changing this ACL to "allow-stateless" would also > > generate 2 logical flows, so at least from a number of logical flows > > perspective, the result is the same. > > > You are right. But this also means that when there is no stateful rules, > "allow-stateless" is worse than "allow" because "allow" would generate only a > single lflow in that case, while "allow-stateless" always generates doubled > lflows. So, I think at least we could do the simple improvement to make > "allow-stateless" perform the same as "allow" when there are NO stateful > ACLs. (Otherwise, my comment on the documentation regarding > '"allow-stateless" is equivalent to "allow" when there are no stateful ACLs' > would be wrong). We can move the logic of adding the stateless matching flows > of pre-acl stage into a if-block (only when has_stateful is true). Would this > work?
For the record, I looked again into that, and I don't think double lflows concern doesn't apply because we already check for od->has_stateful_acl before introducing any flows that set DEFRAG bit, as well as the new flows that omit doing it. So I don't think there is any optimization as described applicable. Let me know if that's not correct. Ihar _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
