> pod -> pod (directly to the other Pod IP) shouldn't go through any load balancer related flows though, right?
It didn't match the final VIP and ct_lb action. But when an LB rule exists, all packets are first sent to conntrack, which leads to recirculation with an OVS clone, and that hurts performance. I found that the initial commit that sends all traffic to conntrack, https://github.com/ovn-org/ovn/commit/64cc065e2c59c0696edeef738180989d993ceceb, was made to fix a bug. Even if we bypass the conntrack action in the ingress pipeline with a customized OVN, we still cannot bypass conntrack in the egress pipeline: all egress packets still need to go through conntrack to check whether they match a NAT session.

I cannot find the full performance test data at the moment. What I can find is that with the patch to bypass ingress conntrack, and with LB rules present, the latency of the pod-to-pod qperf test dropped from 118us to 97us. If no LB rules exist at all, the pod-to-pod latency drops to 88us.
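Something like the following can be used to check whether plain pod-to-pod traffic still passes the pre_lb/lb stages and picks up the extra conntrack/recirculation actions (the logical switch name, port name, MACs and IPs below are placeholders, not the ones from our test):

  # Check whether the logical switch still has LB/conntrack stages programmed.
  ovn-sbctl lflow-list ovn-default | grep -E 'ls_in_pre_lb|ls_in_lb|ls_out_pre_lb|ct_lb'

  # Trace a pod-to-pod UDP packet and look for ct()/recirculation actions
  # added by those stages even though the destination is not a VIP.
  ovs-appctl ofproto/trace br-int 'in_port=pod-a,udp,dl_src=0a:00:00:00:00:01,dl_dst=0a:00:00:00:00:02,nw_src=10.16.0.5,nw_dst=10.16.0.6,tp_src=40000,tp_dst=9999'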
On Thu, 9 Jun 2022 at 01:52, Dan Williams <[email protected]> wrote:
> On Thu, 2022-06-09 at 00:41 +0800, 刘梦馨 wrote:
> > > Could you tell roughly how many packets were sent in a single test? Was the latency measured for all the UDP packets in average?
> >
> > Let me describe my test method more clearly. In fact, we only tested pod-to-pod performance, *not* pod-to-service, and then did a profile with flame graphs and found that the load balancer processing took about 30% of CPU usage.
>
> pod -> pod (directly to the other Pod IP) shouldn't go through any load balancer related flows though, right? That seems curious to me... It might hit OVN's load balancer stages but (I think!) shouldn't be matching any rules in them, because the packet's destination IP wouldn't be a LB VIP.
>
> Did you do an ofproto/trace to see what OVS flows the packet was hitting and if any were OVN LB related?
>
> Dan
>
> > Run two Pods on two different nodes, one running the qperf server and the other running the qperf client, to test UDP latency and bandwidth with the command `qperf {another Pod IP} -ub -oo msg_size:1 -vu udp_lat udp_bw`.
> >
> > In the first test, we used the Kube-OVN default setup, which uses the OVN load balancer to replace kube-proxy, and got a latency of 25.7us and a bandwidth of 2.8Mb/s.
> >
> > Then we manually deleted all OVN load balancer rules bound to the logical switch and got a much better result: 18.5us and 6Mb/s.
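For reference, the second test was done with something like the following (the logical switch and pod names here are illustrative, not the exact ones from our setup):

  # List the load balancers attached to the logical switch, then detach them all.
  ovn-nbctl ls-lb-list ovn-default
  ovn-nbctl clear logical_switch ovn-default load_balancer

  # Re-run the same qperf measurement between the two pods.
  kubectl exec pod-a -- qperf {another Pod IP} -ub -oo msg_size:1 -vu udp_lat udp_bw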
> > > Was it clear why the total datapath cannot be offloaded to HW?
> > The issue we meet with hw-offload is that Mellanox CX5/CX6 didn't support dp_hash and hash at the moment, and these two methods are used by the group table to select a backend.
> > What makes things worse is that when any LB is bound to a logical switch, all packets go through the LB pipeline even if they are not destined to a service. So the whole logical switch datapath cannot be offloaded.
> >
> > We have a customized patch to bypass the LB pipeline if traffic is not destined to a service, here:
> > https://github.com/kubeovn/ovn/commit/d26ae4de0ab070f6b602688ba808c8963f69d5c4.patch
> >
> > > I am sorry that I am confused by OVN "L2" LB. I think you might mean OVN "L3/L4" LB?
> > I mean the load balancers added to a logical switch by ls-lb-add; Kube-OVN uses them to replace kube-proxy.
> >
> > > I am asking because if the packets hit mega flows in the kernel cache, it shouldn't be slower than kube-proxy which also uses conntrack. If it is HW offloaded it should be faster.
> >
> > In my previous profile it seems unrelated to the megaflow cache. The flame graph shows that there is extra ovs clone and reprocessing compared to the flame graph without LB. I have introduced how to profile and optimize Kube-OVN performance before, and I give more detail about the LB performance issue at the beginning of this video in Chinese: https://www.youtube.com/watch?v=eqKHs05NUlg&t=27s. Hope it can provide more help.
> >
> > On Wed, 8 Jun 2022 at 23:53, Han Zhou <[email protected]> wrote:
> >
> > > On Wed, Jun 8, 2022 at 8:08 AM Numan Siddique <[email protected]> wrote:
> > > >
> > > > On Wed, Jun 8, 2022 at 6:34 AM 刘梦馨 <[email protected]> wrote:
> > > > >
> > > > > Just give some input about eBPF/XDP support.
> > > > >
> > > > > We used to use the OVN L2 LB to replace kube-proxy in Kubernetes, but found that the L2 LB will use conntrack and ovs clone, which hurts performance badly. The latency for a 1-byte UDP packet jumps from 18.5us to 25.7us and the bandwidth drops from 6Mb/s to 2.8Mb/s.
> > > >
> > > > Thanks for the input!
> > >
> > > Could you tell roughly how many packets were sent in a single test? Was the latency measured for all the UDP packets in average? I am asking because if the packets hit mega flows in the kernel cache, it shouldn't be slower than kube-proxy which also uses conntrack. If it is HW offloaded it should be faster.
> > >
> > > > > Even if the traffic does not target LB VIPs it has the same performance drop, and it also means the total datapath cannot be offloaded to hardware.
> > > > >
> > > Was it clear why the total datapath cannot be offloaded to HW? There might be problems of supporting HW offloading in earlier versions of OVN. There have been improvements to make it more HW offload friendly.
> > >
> > > > > And finally we turned to using Cilium's chaining mode to replace the OVN L2 LB to implement kube-proxy to resolve the above issues. We hope to see the LB optimization by eBPF/XDP on the OVN side.
> > > >
> > > > Thanks for your comments and inputs. I think we should definitely explore optimizing this use case and see if it's possible to leverage eBPF/XDP for this.
> > >
> > > I am sorry that I am confused by OVN "L2" LB. I think you might mean OVN "L3/L4" LB?
> > >
> > > Some general thoughts on this: OVN is primarily there to program OVS (or another OpenFlow based datapath) to implement SDN. OVS OpenFlow is a data-driven approach (as mentioned by Ben in several talks). The advantage is that it uses caches to accelerate the datapath, regardless of the number of pipeline stages in the forwarding logic; the disadvantage is of course that when a packet has a cache miss, it will be slow. So I would think the direction of using eBPF/XDP is better to be within OVS itself, instead of adding an extra stage that cannot be cached within the OVS framework, because even if the extra stage is very fast, it is still extra.
> > >
> > > I would consider such an extra eBPF/XDP stage in OVN directly only for the cases that we know are likely to miss the OVS/HW flow caches. One example may be DOS attacks that always trigger CT unestablished entries, which is not HW offload friendly. (But I don't have concrete use cases/scenarios.)
> > >
> > > In the case of OVN LB, I don't see a reason why it would miss the cache except for the first packets. Adding an extra eBPF/XDP stage on top of the OVS cached pipeline doesn't seem to improve the performance.
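To be clear about what we saw on the datapath side: the traffic does hit the megaflow cache, but with LB rules attached the cached entries for plain pod-to-pod traffic still carry the conntrack/recirculation steps, which is the extra clone and reprocessing that shows up in the flame graph. Something like this makes it visible (the pod IP is a placeholder):

  # Datapath flow count and revalidator statistics.
  ovs-appctl upcall/show

  # Cached megaflows for the test traffic; with LB rules attached they
  # include ct(...) actions and recirc_id(...) entries.
  ovs-appctl dpctl/dump-flows | grep 10.16.0.6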
> > >
> > > > > On Wed, 8 Jun 2022 at 14:43, Han Zhou <[email protected]> wrote:
> > > > > > On Mon, May 30, 2022 at 5:46 PM <[email protected]> wrote:
> > > > > > >
> > > > > > > From: Numan Siddique <[email protected]>
> > > > > > >
> > > > > > > The XDP program - ovn_xdp.c - added in this RFC patch series implements basic port security and drops any packet if the port security check fails.
> > > > > > > There are still a few TODOs in the port security checks, like:
> > > > > > >   - Make ovn xdp configurable.
> > > > > > >   - Removing the ingress OpenFlow rules from tables 73 and 74 if ovn xdp is enabled.
> > > > > > >   - Add IPv6 support.
> > > > > > >   - Enhance the port security xdp program for ARP/IPv6 ND checks.
> > > > > > >
> > > > > > > This patch adds basic XDP support to OVN, and in the future we can leverage more eBPF/XDP features.
> > > > > > >
> > > > > > > I'm not sure how much value this RFC patch adds to make use of eBPF/XDP just for port security. Submitting as RFC to get some feedback and start some conversation on eBPF/XDP in OVN.
> > > > > >
> > > > > > Hi Numan,
> > > > > >
> > > > > > This is really cool. It demonstrates how OVN could leverage eBPF/XDP.
> > > > > >
> > > > > > On the other hand, for the port-security feature in XDP, I keep thinking about the scenarios and it is still not very clear to me. One advantage I can think of is to prevent DOS attacks from a VM/Pod when invalid IP/MAC are used: XDP may perform better and drop packets with lower CPU cost (compared with the OVS kernel datapath). However, I am also wondering why an attacker would use an invalid IP/MAC for DOS attacks. Do you have some more thoughts about the use cases?
> > > >
> > > > My idea was to demonstrate the use of eBPF/XDP, and port security checks were easy to do before the packet hits the OVS pipeline.
> > >
> > > Understood. It is indeed a great demonstration.
> > >
> > > > If we were to move the port security check to XDP, then the only advantage we would be getting, in my opinion, is to remove the corresponding ingress port security check related OF rules from ovs-vswitchd, thereby decreasing some lookups during flow translation.
> > >
> > > For the slow path, it might reduce the lookups in two tables, but considering that we have tens of tables, this cost may be negligible? For the fast path, there is no impact on the megaflow cache.
> > >
> > > > I'm not sure why an attacker would use an invalid IP/MAC for DOS attacks. But from what I know, ovn-kubernetes does want to restrict each POD to its assigned IP/MAC.
> > >
> > > Yes, restricting pods to their assigned IP/MAC is port security, which is implemented by the port-security flows. I was talking about DOS attacks just to imagine a use case that utilizes the performance advantage of XDP. If it is just to detect and drop a regular amount of packets that try to use a fake IP/MAC to circumvent security policies (ACLs), it doesn't reflect the benefit of XDP.
> > >
> > > > > > And do you have any performance results comparing with the current OVS implementation?
> > > >
> > > > I didn't do any scale/performance related tests.
> > > >
> > > > If we were to move the port security feature to XDP in OVN, then I think we need to
> > > >   - Complete the TODOs like adding IPv6 and ARP/ND related checks
> > > >   - Do some scale testing and see whether it reduces the memory footprint of ovs-vswitchd and ovn-controller because of the reduction in OF rules
> > >
> > > Maybe I am wrong, but I think port-security flows are only related to local LSPs on each node, which doesn't contribute much to the OVS/ovn-controller memory footprint, and thanks to your patches that move port-security flow generation from northd to ovn-controller, the central components are already out of the picture of the port-security related costs. So I guess we won't see obvious differences in scale tests.
> > >
> > > > > > Another question is, would it work with smart NIC HW-offload, where VF representer ports are added to OVS on the smart NIC? I guess XDP doesn't support representer ports, right?
> > > >
> > > > I think so. I don't have much experience/knowledge on this. From what I understand, if datapath flows are offloaded, then since XDP is not offloaded, the XDP checks will be totally missed. So if XDP is to be used, then offloading should be disabled.
> > >
> > > Agree, although I did hope it could help for HW offload enabled environments to mitigate the scenarios when packets would miss the HW flow cache.
> > >
> > > Thanks,
> > > Han
> > >
> > > > Thanks
> > > > Numan
> > > >
> > > > > > Thanks,
> > > > > > Han
> > > > > >
> > > > > > > In order to attach and detach XDP programs, libxdp [1] and libbpf are used.
> > > > > > >
> > > > > > > To test it out locally, please install libxdp-devel and libbpf-devel, compile OVN first, and then compile ovn_xdp by running "make bpf". Copy ovn_xdp.o to either /usr/share/ovn/ or /usr/local/share/ovn/.
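For anyone who wants to try this locally, the steps above amount to roughly the following; the package names, configure flags and copy destination are assumptions, so adjust them to your environment:

  dnf install -y libxdp-devel libbpf-devel
  ./boot.sh && ./configure --with-ovs-source=../ovs && make   # build OVN itself first
  make bpf                                                    # then build the XDP object (bpf/ovn_xdp.c)
  cp bpf/ovn_xdp.o /usr/local/share/ovn/                      # one of the locations mentioned above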
> > > > > > >
> > > > > > > Numan Siddique (2):
> > > > > > >   RFC: Add basic xdp/eBPF support in OVN.
> > > > > > >   RFC: ovn-controller: Attach XDP progs to the VIFs of the logical ports.
> > > > > > >
> > > > > > >  Makefile.am                 |   6 +-
> > > > > > >  bpf/.gitignore              |   5 +
> > > > > > >  bpf/automake.mk             |  23 +++
> > > > > > >  bpf/ovn_xdp.c               | 156 +++++++++++++++
> > > > > > >  configure.ac                |   2 +
> > > > > > >  controller/automake.mk      |   4 +-
> > > > > > >  controller/binding.c        |  45 +++--
> > > > > > >  controller/binding.h        |   7 +
> > > > > > >  controller/ovn-controller.c |  79 +++++++-
> > > > > > >  controller/xdp.c            | 389 ++++++++++++++++++++++++++++++++++++
> > > > > > >  controller/xdp.h            |  41 ++++
> > > > > > >  m4/ovn.m4                   |  20 ++
> > > > > > >  tests/automake.mk           |   1 +
> > > > > > >  13 files changed, 753 insertions(+), 25 deletions(-)
> > > > > > >  create mode 100644 bpf/.gitignore
> > > > > > >  create mode 100644 bpf/automake.mk
> > > > > > >  create mode 100644 bpf/ovn_xdp.c
> > > > > > >  create mode 100644 controller/xdp.c
> > > > > > >  create mode 100644 controller/xdp.h
> > > > > > >
> > > > > > > --
> > > > > > > 2.35.3

--
刘梦馨
Blog: http://oilbeater.com
Weibo: @oilbeater <http://weibo.com/oilbeater>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
