> pod -> pod (directly to the other Pod IP) shouldn't go through any load
balancer related flows though, right?

It doesn't match the final VIP and ct_lb action. But when an LB rule
exists, all packets are first sent to conntrack, which leads to
recirculation with an OVS clone, and that hurts performance.
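For reference, the logical flows behind this look roughly like the
following sketch (schematic and reconstructed from memory, not literal
lflow-list output; exact stage names, register bits, and priorities vary
across OVN versions):

```
ls_in_pre_lb:       match=(ip), action=(reg0[0] = 1; next;)
ls_in_pre_stateful: match=(reg0[0] == 1), action=(ct_next;)
ls_in_lb:           match=(ct.new && ip4.dst == <VIP>), action=(ct_lb(backends=...);)
```

The pre-LB flow matches every IP packet on the switch once any LB is
attached, so even plain pod-to-pod traffic takes the ct_next
recirculation before it ever reaches the VIP match.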

And I found that the initial commit that sends all traffic to conntrack,
https://github.com/ovn-org/ovn/commit/64cc065e2c59c0696edeef738180989d993ceceb
was made to fix a bug.

Even if we bypass the conntrack action in the ingress pipeline with a
customized OVN, we still cannot bypass conntrack in the egress pipeline:
all egress packets still have to be sent to conntrack to check whether
they match a NAT session.

I cannot find the full performance test data at the moment. What I can
find is that, with LB rules present, the patch that bypasses ingress
conntrack dropped the pod-to-pod qperf latency from 118us to 97us. If no
LB rules exist, the pod-to-pod latency drops further to 88us.

On Thu, 9 Jun 2022 at 01:52, Dan Williams <[email protected]> wrote:

> On Thu, 2022-06-09 at 00:41 +0800, 刘梦馨 wrote:
> > > Could you tell roughly how many packets were sent in a single test?
> > > Was the latency measured for all the UDP packets on average?
> >
> > Let me describe my test method more clearly. In fact, we only tested
> > pod-to-pod performance, *not* pod-to-service, and then profiled with a
> > flame graph and found that the load-balancer processing took about 30%
> > of CPU usage.
>
> pod -> pod (directly to the other Pod IP) shouldn't go through any load
> balancer related flows though, right? That seems curious to me... It
> might hit OVN's load balancer stages but (I think!) shouldn't be
> matching any rules in them, because the packet's destination IP
> wouldn't be a LB VIP.
>
> Did you do an ofproto/trace to see what OVS flows the packet was
> hitting and if any were OVN LB related?
>
> Dan
>
> >
> > Run two Pods on two different nodes; one runs the qperf server and the
> > other runs the qperf client to test UDP latency and bandwidth with the
> > command `qperf {another Pod IP} -ub -oo msg_size:1 -vu udp_lat udp_bw`.
> >
> > In the first test, we used the kube-ovn default setup, which uses the
> > OVN load balancer to replace kube-proxy, and got a latency of 25.7us
> > and a bandwidth of 2.8Mb/s.
> >
> > Then we manually deleted all OVN load-balancer rules bound to the
> > logical switch and got a much better result: 18.5us and 6Mb/s.
> >
> > > Was it clear why the total datapath cannot be offloaded to HW?
> > The issue we met with hw-offload is that Mellanox CX5/CX6 didn't
> > support dp_hash and hash at the moment, and these two methods are used
> > by the group table to select a backend.
> > What makes things worse is that when any LB is bound to an LS, all
> > packets go through the LB pipeline, even those not destined to a
> > service. So the whole LS datapath cannot be offloaded.
> >
> > We have a customized patch to bypass the LB pipeline if traffic is not
> > destined to a service:
> >
> > https://github.com/kubeovn/ovn/commit/d26ae4de0ab070f6b602688ba808c8963f69d5c4.patch
> >
> > > I am sorry that I am confused by OVN "L2" LB. I think you might mean
> > > OVN "L3/L4" LB?
> > I mean the load balancers added to an LS by ls-lb-add; kube-ovn uses
> > them to replace kube-proxy.
> >
> > > I am asking because if the packets hit megaflows in the kernel cache,
> > > it shouldn't be slower than kube-proxy, which also uses conntrack. If
> > > it is HW offloaded it should be faster.
> >
> > In my previous profiling it seemed unrelated to the megaflow cache. The
> > flame graph shows extra OVS clone and reprocessing steps compared to
> > the flame graph without LB. I have previously presented how to profile
> > and optimize kube-ovn performance, with more detail about the LB
> > performance issue at the beginning of this video (in Chinese); hope it
> > can provide more help:
> > https://www.youtube.com/watch?v=eqKHs05NUlg&t=27s
> >
> > On Wed, 8 Jun 2022 at 23:53, Han Zhou <[email protected]> wrote:
> >
> > >
> > >
> > > On Wed, Jun 8, 2022 at 8:08 AM Numan Siddique <[email protected]>
> > > wrote:
> > > >
> > > > On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <[email protected]>
> > > > wrote:
> > > > >
> > > > > Just give some input about eBPF/XDP support.
> > > > >
> > > > > We used to use the OVN L2 LB to replace kube-proxy in Kubernetes,
> > > > > but found that the L2 LB uses conntrack and an OVS clone, which
> > > > > hurts performance badly. The latency for a 1-byte UDP packet
> > > > > jumps from 18.5us to 25.7us and the bandwidth drops from 6Mb/s to
> > > > > 2.8Mb/s.
> > > > >
> > > Thanks for the input!
> > > Could you tell roughly how many packets were sent in a single test?
> > > Was the latency measured for all the UDP packets on average? I am
> > > asking because if the packets hit megaflows in the kernel cache, it
> > > shouldn't be slower than kube-proxy, which also uses conntrack. If it
> > > is HW offloaded it should be faster.
> > >
> > > > > Even traffic that does not target the LB VIPs sees the same
> > > > > performance drop, and it also means the whole datapath cannot be
> > > > > offloaded to hardware.
> > > > >
> > >
> > > Was it clear why the total datapath cannot be offloaded to HW? There
> > > might have been problems with HW offloading in earlier versions of
> > > OVN; there have been improvements to make it more HW-offload
> > > friendly.
> > >
> > > > > And finally we turned to Cilium's chaining mode, replacing the
> > > > > OVN L2 LB as the kube-proxy implementation, to resolve the above
> > > > > issues. We hope to see LB optimization via eBPF/XDP on the OVN
> > > > > side.
> > > > >
> > > >
> > > > Thanks for your comments and inputs.  I think we should definitely
> > > > explore optimizing this use case and see if it's possible to
> > > > leverage eBPF/XDP for this.
> > > >
> > >
> > > I am sorry that I am confused by OVN "L2" LB. I think you might mean
> > > OVN "L3/L4" LB?
> > >
> > > Some general thoughts on this: OVN primarily programs OVS (or
> > > another OpenFlow-based datapath) to implement SDN. OVS OpenFlow is a
> > > data-driven approach (as mentioned by Ben in several talks). The
> > > advantage is that it uses caches to accelerate the datapath,
> > > regardless of the number of pipeline stages in the forwarding logic;
> > > the disadvantage is of course that when a packet misses the cache,
> > > it will be slow. So I would think the direction of using eBPF/XDP is
> > > better pursued within OVS itself, instead of adding an extra stage
> > > that cannot be cached within the OVS framework, because even if the
> > > extra stage is very fast, it is still extra.
> > >
> > > I would consider such an extra eBPF/XDP stage in OVN directly only
> > > for cases where we know packets are likely to miss the OVS/HW flow
> > > caches. One example may be DoS attacks that always trigger
> > > unestablished CT entries, which is not HW-offload friendly. (But I
> > > don't have concrete use cases/scenarios.)
> > >
> > > In the case of OVN LB, I don't see a reason why it would miss the
> > > cache
> > > except for the first packets. Adding an extra eBPF/XDP stage on top
> > > of the
> > > OVS cached pipeline doesn't seem to improve the performance.
> > >
> > > > > On Wed, 8 Jun 2022 at 14:43, Han Zhou <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > On Mon, May 30, 2022 at 5:46 PM <[email protected]> wrote:
> > > > > > >
> > > > > > > From: Numan Siddique <[email protected]>
> > > > > > >
> > > > > > > The XDP program ovn_xdp.c added in this RFC patch series
> > > > > > > implements basic port security and drops any packet if the
> > > > > > > port-security check fails.
> > > > > > > There are still a few TODOs in the port-security checks, like:
> > > > > > >       - Make OVN XDP configurable.
> > > > > > >       - Removing the ingress OpenFlow rules from tables 73
> > > > > > >         and 74 if OVN XDP is enabled.
> > > > > > >       - Add IPv6 support.
> > > > > > >       - Enhance the port-security XDP program for ARP/IPv6 ND
> > > > > > >         checks.
> > > > > > >
> > > > > > > This patch adds basic XDP support to OVN, and in the future
> > > > > > > we can leverage more eBPF/XDP features.
> > > > > > >
> > > > > > > I'm not sure how much value this RFC patch adds by using
> > > > > > > eBPF/XDP just for port security.  Submitting as an RFC to get
> > > > > > > some feedback and start a conversation on eBPF/XDP in OVN.
> > > > > > >
> > > > > > Hi Numan,
> > > > > >
> > > > > > This is really cool. It demonstrates how OVN could leverage
> > > > > > eBPF/XDP.
> > > > > >
> > > > > > On the other hand, for the port-security feature in XDP, I
> > > > > > keep thinking about the scenarios and they are still not very
> > > > > > clear to me. One advantage I can think of is preventing DoS
> > > > > > attacks from a VM/Pod when an invalid IP/MAC is used: XDP may
> > > > > > perform better and drop packets at lower CPU cost (compared
> > > > > > with the OVS kernel datapath). However, I am also wondering why
> > > > > > an attacker would use an invalid IP/MAC for DoS attacks. Do you
> > > > > > have some more thoughts about the use cases?
> > > >
> > > > My idea was to demonstrate the use of eBPF/XDP, and port-security
> > > > checks were easy to do before the packet hits the OVS pipeline.
> > > >
> > > Understand. It is indeed a great demonstration.
> > >
> > > > If we were to move the port-security check to XDP, then the only
> > > > advantage we would get, in my opinion, is removing the
> > > > corresponding ingress port-security OF rules from ovs-vswitchd,
> > > > thereby reducing some lookups during flow translation.
> > > >
> > > For the slow path, it might save lookups in two tables, but
> > > considering that we have tens of tables, this cost may be negligible?
> > > For the fast path, there is no impact on the megaflow cache.
> > >
> > > > I'm not sure why an attacker would use an invalid IP/MAC for DoS
> > > > attacks. But from what I know, ovn-kubernetes does want to restrict
> > > > each Pod to its assigned IP/MAC.
> > > >
> > > Yes, restricting pods to their assigned IP/MAC is port security,
> > > which is implemented by the port-security flows. I was talking about
> > > DoS attacks just to imagine a use case that utilizes the performance
> > > advantage of XDP. If it is just to detect and drop a regular amount
> > > of packets that try to use a fake IP/MAC to circumvent security
> > > policies (ACLs), it doesn't reflect the benefit of XDP.
> > >
> > > > > > And do you have any performance results comparing with the
> > > > > > current OVS implementation?
> > > >
> > > > I didn't do any scale/performance related tests.
> > > >
> > > > If we were to move the port-security feature to XDP in OVN, then I
> > > > think we need to:
> > > >    - Complete the TODOs, like adding IPv6 and ARP/ND related
> > > >      checks.
> > > >    - Do some scale testing and see whether it reduces the memory
> > > >      footprint of ovs-vswitchd and ovn-controller because of the
> > > >      reduction in OF rules.
> > > >
> > >
> > > Maybe I am wrong, but I think port-security flows are only related
> > > to local LSPs on each node, which doesn't contribute much to the
> > > OVS/ovn-controller memory footprint, and thanks to your patches that
> > > move port-security flow generation from northd to ovn-controller,
> > > the central components are already out of the picture for
> > > port-security related costs. So I guess we won't see obvious
> > > differences in scale tests.
> > >
> > > > > >
> > > > > > Another question is, would it work with smart NIC HW offload,
> > > > > > where VF representor ports are added to OVS on the smart NIC?
> > > > > > I guess XDP doesn't support representor ports, right?
> > > >
> > > > I think so. I don't have much experience/knowledge on this.  From
> > > > what I understand, if datapath flows are offloaded, and since XDP
> > > > is not offloaded, the XDP checks will be totally missed.
> > > > So if XDP is to be used, then offloading should be disabled.
> > > >
> > >
> > > Agreed, although I did hope it could help in HW-offload-enabled
> > > environments by mitigating scenarios where packets miss the HW flow
> > > cache.
> > >
> > > Thanks,
> > > Han
> > >
> > > > Thanks
> > > > Numan
> > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Han
> > > > > >
> > > > > > > In order to attach and detach XDP programs, libxdp [1] and
> > > > > > > libbpf are used.
> > > > > > >
> > > > > > > To test it out locally, please install libxdp-devel and
> > > > > > > libbpf-devel, then compile OVN first and compile ovn_xdp by
> > > > > > > running "make bpf".  Copy ovn_xdp.o to either /usr/share/ovn/
> > > > > > > or /usr/local/share/ovn/
> > > > > > >
> > > > > > >
> > > > > > > Numan Siddique (2):
> > > > > > >   RFC: Add basic xdp/eBPF support in OVN.
> > > > > > >   RFC: ovn-controller: Attach XDP progs to the VIFs of the
> > > > > > > logical
> > > > > > >     ports.
> > > > > > >
> > > > > > >  Makefile.am                 |   6 +-
> > > > > > >  bpf/.gitignore              |   5 +
> > > > > > >  bpf/automake.mk             |  23 +++
> > > > > > >  bpf/ovn_xdp.c               | 156 +++++++++++++++
> > > > > > >  configure.ac                |   2 +
> > > > > > >  controller/automake.mk      |   4 +-
> > > > > > >  controller/binding.c        |  45 +++--
> > > > > > >  controller/binding.h        |   7 +
> > > > > > >  controller/ovn-controller.c |  79 +++++++-
> > > > > > >  controller/xdp.c            | 389
> > > ++++++++++++++++++++++++++++++++++++
> > > > > > >  controller/xdp.h            |  41 ++++
> > > > > > >  m4/ovn.m4                   |  20 ++
> > > > > > >  tests/automake.mk           |   1 +
> > > > > > >  13 files changed, 753 insertions(+), 25 deletions(-)
> > > > > > >  create mode 100644 bpf/.gitignore
> > > > > > >  create mode 100644 bpf/automake.mk
> > > > > > >  create mode 100644 bpf/ovn_xdp.c
> > > > > > >  create mode 100644 controller/xdp.c
> > > > > > >  create mode 100644 controller/xdp.h
> > > > > > >
> > > > > > > --
> > > > > > > 2.35.3
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > >
> >
> >
>
>

-- 
刘梦馨
Blog: http://oilbeater.com
Weibo: @oilbeater <http://weibo.com/oilbeater>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
