On Fri, May 7, 2021 at 7:11 PM Han Zhou <[email protected]> wrote:
>
>
>
> On Fri, May 7, 2021 at 12:54 AM Dumitru Ceara <[email protected]> wrote:
> >
> > On 5/7/21 7:35 AM, Han Zhou wrote:
> > > Hi Ihar,
> > >
> > > Thanks for the patch, and it's great to see the significant performance
> > > improvement!
> > > Please see my comments below.
> > >
> > > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <[email protected]> 
> > > wrote:
> > >>
> > >> For allow-stateless ACLs, bypass connection tracking by avoiding
> > >> setting ct hints for matching traffic. Avoid sending all traffic to ct
> > >> when a stateful ACL is present.
> > >>
> > >> ===
> > >>
> > >> Reusing an existing 'allow' verb for stateless matching would have its
> > >> drawbacks, specifically, when it comes to backwards incompatibility of
> > >> the new behavior with existing environments. When using "allow" ACLs
> > >> in mixed allow/allow-related environment, we still commit "allow"
> > >> traffic to conntrack. This unnecessarily hits performance when mixed
> > >> ACL action types were used for the same datapath. This is why we
> > >> introduce a new action verb to describe stateless behavior.
> > >>
> > >> Another complexity to consider is the fact that with stateless
> > >> matching, one would not be able to rely on 'related' magic that
> > >> guarantees that reply traffic is passed through. Instead, the user
> > >> would have to accurately define matching rules both for request and
> > >> reply directions of a protocol session. Specifically, when allowing
> > >> ICMP for a specific peer host, one has to define 'allow-stateless'
> > >> rules that would match against ip.dst for request direction and ip.src
> > >> for reply direction. Other protocols and scenarios will require their
> > >> own fine grained matching approaches implemented by the user.
> > >>
> > >> ===
> > >>
> > >> For performance measurements, qperf was used. Tests were executed on two
> > >> setups:
> > >>
> > >> 1) ovn-fake-multinode virtual environment;
> > >> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute
> > >>    nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE
> > >>    SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead.
> > >>
> > >> 1) ovn-fake-multinode:
> > >>
> > >> Performance measured between two virtual nodes, two ports
> > >> that belong to different LSs connected via router. Using qperf,
> > >> performance was measured for UDP, TCP, SCTP protocols (using
> > >> <proto>_lat and <proto>_bw tests). The qperf version used:
> > >> 0.4.9-16.fc31.x86_64.  Each test scenario was executed five times and
> > >> averages compared.
> > >>
> > >> Tests were executed with `allow-stateless` rules for the tested
> > >> protocol and `allow-related` for another protocol set for both ports,
> > >> both directions, e.g. for TCP scenario, the following ACLs were
> > >> defined:
> > >>
> > >> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless
> > >> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless
> > >> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless
> > >> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless
> > >>
> > >> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related
> > >> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related
> > >> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related
> > >> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related
> > >>
> > >> In this particular environment, improvement was seen in send_bw,
> > >> latency, and msg_rate measurements, where applicable, for all three
> > >> protocols under test.
> > >>
> > >> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%)
> > >>          latency: 16 us => 14.08 us (-12%)
> > >>          msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%)
> > >>
> > >> for TCP, latency: 18.6 us => 14.88 us (-20%)
> > >>          msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%)
> > >>
> > >> for SCTP, latency: 21.98 us => 19.42 us (-11.65%)
> > >>           msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%)
> > >>
> > >> Interestingly, some performance improvement was also seen for the same
> > >> scenarios with no ACLs set at all, albeit significantly more
> > >> negligible.
> > >>
> > >> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%)
> > >>          latency: 13.74 us => 12.88 us (-6.68%)
> > >>          msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%)
> > >>
> > >> for TCP, latency: 15.62 us => 14.26 us (-9.54%)
> > >>          msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%)
> > >>
> > >> for SCTP, latency: 19.56 us => 18.16 us (-7.16%)
> > >>           msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%)
> > >>
> > >> 2) Supermicro RH-OSP cluster:
> > >>
> > >> Two scenarios executed:
> > >> - connectivity between two ports on the same switch
> > >> - connectivity between two ports on different switches connected via
> > >>   router
> > >>
> > >> For the same switch, improvements are as follows:
> > >> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8%
> > >> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29%
> > >>
> > >> For different switches, improvements are as follows:
> > >> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2%
> > >> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9%
> > >>
> > >> The effect is more noticeable in same switch scenario.
> > >
> > > This is interesting. Any idea why with different switches the result is so
> > > different? I'd expect similar improvement just like in same switch, 
> > > because
> > > in this kind of test the flow should be cached in kernel datapath
> > > (regardless of same/different logical switches) and in both cases the cost
> > > of traversing conntrack is saved.
> > >
> > >>
> > >> Comparable numbers can be captured with iperf. It may be useful to run
> > >> more tests in a more elaborate (bare metal) environment.
> > >>
> > >> ===
> > >>
> > >> The patch takes inspiration from a now abandoned patch:
> > >>
> > >> "ovn-northd: Support mixing stateless/stateful ACLs with
> > >> Stateless_Filter." by Dumitru Ceara.
> > >>
> > >> The original patch assumed CMS doesn't require full flexibility of
> > >> matching rules for stateless matching (for example, to be used by
> > >> OpenShift). But other CMS interfaces may require the same
> > >> customizability for stateless as well as stateful matching, like in
> > >> OpenStack Neutron API. Which is why this patch reuses existing ACL
> > >> object type to describe stateless rules.
> > >>
> > >> Signed-off-by: Ihar Hrachyshka <[email protected]>
> > >>
> > >> ---
> > >>
> > >> v1: initial version.
> > >> v2: rebased after conflict.
> > >> v3: added ddlog implementation.
> > >> v3: apply stateless skip-hint-set rules to appropriate direction only.
> > >> v3: added more background as to implementation in commit message.
> > >> v3: test stateless scenarios with ddlog too.
> > >> v3: rebased after conflict.
> > >> v4: introduce a separate allow-stateless ACL match verb.
> > >> v5: rebased.
> > >> v6: updated docs for new allow-stateless approach.
> > >> v6: removed no longer valid comments.
> > >> v6: removed acl_is_stateless.
> > >> v7: bump north db schema version to 5.31.0 -> 5.32.0.
> > >> v8: fixed checkpatch failure on a too long line in dbschema.
> > >> v8: added perf data on baremetal lab testing.
> > >> v9: fixed ovsdb checksum.
> > >> ---
> > >>  NEWS                    |   2 +
> > >>  northd/ovn-northd.8.xml |   8 +-
> > >>  northd/ovn-northd.c     |  65 ++++++++-
> > >>  northd/ovn_northd.dl    |  31 ++++
> > >>  ovn-nb.ovsschema        |   9 +-
> > >>  ovn-nb.xml              |   9 +-
> > >>  tests/ovn-northd.at     | 309 ++++++++++++++++++++++++++++++++++++++++
> > >>  utilities/ovn-nbctl.c   |   6 +-
> > >>  8 files changed, 428 insertions(+), 11 deletions(-)
> > >>
> > >> diff --git a/NEWS b/NEWS
> > >> index 1ddde15f8..d72139e76 100644
> > >> --- a/NEWS
> > >> +++ b/NEWS
> > >> @@ -11,6 +11,8 @@ Post-v21.03.0
> > >>      'use_parallel_build' to enable it.  It is disabled by default.
> > >>    - Support vlan-passthru mode for tag=0 localnet ports.
> > >>    - Support custom 802.11ad EthType for localnet ports.
> > >> +  - Introduce a new "allow-stateless" ACL verb to always bypass
> > > connection
> > >> +    tracking. The existing "allow" verb behavior is left intact.
> > >>
> > >>  OVN v21.03.0 - 12 Mar 2021
> > >>  -------------------------
> > >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > >> index 54e88d3fa..08c4ea852 100644
> > >> --- a/northd/ovn-northd.8.xml
> > >> +++ b/northd/ovn-northd.8.xml
> > >> @@ -419,7 +419,9 @@
> > >>        before eventually advancing to ingress table <code>ACLs</code>. If
> > >>        special ports such as route ports or localnet ports can't use
> > > ct(), a
> > >>        priority-110 flow is added to skip over stateful ACLs. IPv6
> > > Neighbor
> > >> -      Discovery and MLD traffic also skips stateful ACLs.
> > >> +      Discovery and MLD traffic also skips stateful ACLs. For
> > > "allow-stateless"
> > >> +      ACLs, a flow is added to bypass setting the hint for connection
> > > tracker
> > >> +      processing.
> > >
> > > This approach clearly achieves the goal of bypassing conntrack and there 
> > > is
> > > great dataplane performance gain, but it also doubles the ACL logical 
> > > flows
> > > which in turn could significantly increase the control plane cost,
> > > especially in ovn-controller. In large scale environments where lots of
> > > ACLs referencing large port-group/address-sets are used, it is already a
> > > bottleneck of ovn-controller flow computing. It would be better if we 
> > > could
> > > avoid doubling this cost. Instead of directly add the ACL match here, can
> > > we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage
> > > pre-stateful add a higher priority flow, if this hint is matched then
> > > directly "next" without going through CT? I hope this can achieve the same
> > > goal but not increasing control plane cost. What do you think?
> > >
> >
> > Wouldn't this break ACL priority?  E.g., with the following ACLs:
> >
> > ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless
> > ACL2: from-lport, prio=50, ip, drop
> >
> > If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic
> > will be dropped by the lflow added for ACL2.
> >
> You are right. Actually I was thinking about matching the hint again in acl 
> stage and just move next, but it would skip all the stateful ACLs that could 
> have a higher priority. This would work only if the stateless ACLs can always 
> be treated as higher priority than stateful ACLs, but it seems unreasonable. 
> It seems all the ACL matches have to be evaluated at the same stage to ensure 
> the priorities are followed. So, forget about this proposal.
>
> > On the other hand, IIUC, this feature is designed to be used in
> > combination with "allow-related" ACLs defined on the same switch,
> > replacing "allow" ACLs in that case.
> >
> > Before this change an "allow" ACL would've been treated as
> > "allow-related" if there was at least one other "allow-related" ACL
> > applied on the logical switch.  Essentially generating 2 logical flows
> > per "allow" ACL:
> >
> > lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then
> > "REGBIT_CONNTRACK_COMMIT=1, next"
> > lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next"
> >
> > If I'm not wrong, changing this ACL to "allow-stateless" would also
> > generate 2 logical flows, so at least from a number of logical flows
> > perspective, the result is the same.
> >
> You are right. But this also means that when there is no stateful rules, 
> "allow-stateless" is worse than "allow" because "allow" would generate only a 
> single lflow in that case, while "allow-stateless" always generates doubled 
> lflows. So, I think at least we could do the simple improvement to make 
> "allow-stateless" perform the same as "allow" when there are NO stateful 
> ACLs. (Otherwise, my comment on the documentation regarding 
> '"allow-stateless" is equivalent to "allow" when there are no stateful ACLs' 
> would be wrong). We can move the logic of adding the stateless matching flows 
> of pre-acl stage into a if-block (only when has_stateful is true). Would this 
> work?

For the record, I looked again into that, and I don't think double
lflows concern doesn't apply because we already check for
od->has_stateful_acl before introducing any flows that set DEFRAG bit,
as well as the new flows that omit doing it. So I don't think there is
any optimization as described applicable. Let me know if that's not
correct.

Ihar

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to