Re: [ovs-dev] [PATCH v3 3/3] ovn: Send GARP for router port IPs of a router port connected to bridged logical switch
On Tue, Jul 9, 2019 at 5:46 PM Numan Siddique wrote: > > > On Tue, Jul 9, 2019 at 4:37 PM Ilya Maximets > wrote: > >> On 01.07.2019 10:43, nusid...@redhat.com wrote: >> > From: Numan Siddique >> > >> > This patch handles sending GARPs for >> > >> > - router port IPs of a distributed router port >> > >> > - router port IPs of a router port which belongs to gateway router >> >(with the option - redirect-chassis set in Logical_Router.options) >> > >> > Signed-off-by: Numan Siddique >> > Acked-by: Dumitru Ceara >> > --- >> > ovn/northd/ovn-northd.c | 44 >> > tests/ovn.at| 89 +++-- >> > 2 files changed, 105 insertions(+), 28 deletions(-) >> > >> >> Hi. >> This patch triggers frequent TravisCI failures: >> https://travis-ci.org/openvswitch/ovs/jobs/556015141 >> >> checking packets in ext1/vif1-tx.pcap against ext1-vif1.expected: >> ovn.at:12: waiting until $PYTHON "$top_srcdir/utilities/ovs-pcap.in" >> $rcv_pcap > $rcv_text >> rcv_n=`wc -l < "$rcv_text"` >> echo "rcv_n=$rcv_n exp_n=$exp_n" >> test $rcv_n -ge $exp_n... >> rcv_n=1 exp_n=2 >> rcv_n=1 exp_n=2 >> rcv_n=1 exp_n=2 >> rcv_n=2 exp_n=2 >> ovn.at:12: wait succeeded after 2 seconds >> ../../tests/ovn.at:8593: sort $rcv_text >> --- expout 2019-07-05 19:09:16.471288908 + >> +++ >> /home/travis/build/openvswitch/ovs/openvswitch-2.11.90/_build/tests/testsuite.dir/at-groups/2662/stdout >>2019-07-05 19:09:16.475288910 + >> @@ -1,2 +1,2 @@ >> >> -f0010204020102030800451c3f110100c0a80102ac10010300350008 >> >> +020102030806000108000604000102010203ac100101ac100101 >> >> >> 020102030806000108000604000102010203ac100101ac100101 >> 2662. ovn.at:8422: 2662. ovn -- 4 HV, 1 LS, 1 LR, packet test with HA >> distributed router gateway port (ovn.at:8422): FAILED (ovn.at:8593) >> >> Could you, please, take a look? 
>> Some other tests are affected too, but re-check usually succeeds for them:
>> 2664: ovn -- 4 HV, 3 LS, 2 LR, packet test with HA distributed router gateway port
>> 2691: ovn -- router - check packet length - icmp defrag
>> It'll be good to fix them too.
>
> Hi Ilya,
>
> I will take a look at it.
> Dumitru also mentioned about the same errors. I thought those are timing related.
> Let me take a look.

I have submitted the patch to fix these test failures - https://patchwork.ozlabs.org/patch/1130867/

Thanks
Numan

> Thanks
> Numan
>
>> Best regards, Ilya Maximets.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] OVN SFC Support
On Thu, Jul 11, 2019 at 10:36 PM Pradipta Kumar Sahoo wrote:

> Hi Numan,
>
> Can you please share the list OVN SFC patch details for review?

Hi Pradipta,

I don't think there are any active OVN SFC patches for review.
You can take a look into the github repo below to see the last status of the patches.

Thanks
Numan

> Thank You,
> *Pradipta Sahoo*
>
> On Thu, Jul 11, 2019 at 3:16 PM Numan Siddique wrote:
>
>> On Thu, Jul 11, 2019 at 12:26 PM Sood, Ritu wrote:
>>
>> > Hi
>> > Is there any plan to integrate SFC support in OVN?
>> > There was some work done and there are some demos/presentations like one
>> > below but it seems like there is no recent activity:
>> > http://www.openvswitch.org/support/ovscon2016/7/1400-fourie.pdf
>> > Is there any repo being maintained for SFC patches?
>> > https://github.com/doonhammer/ovs repo has the patches but doesn't seem
>> > to be synced with the mainline code currently.
>>
>> I am not sure if some one is actively working on it.
>> If some one wants to pick those patches, that would be great :)
>>
>> Thanks
>> Numan
>>
>> > Regards,
>> > -Ritu
[ovs-dev] Re: Re: Why is ovs DPDK much worse than ovs in my test case?
Ilya, you're right: with ovs-kernel I captured 64K packets even though the MTU is 1500, but with ovs-DPDK the packet size is <1500 in most cases.

00:34:33.331360 IP 192.168.200.101.48968 > 192.168.230.101.5201: Flags [.], seq 17462881:17528041, ack 0, win 229, options [nop,nop,TS val 148218621 ecr 148145855], length 65160
00:34:33.332064 IP 192.168.200.101.48968 > 192.168.230.101.5201: Flags [.], seq 17528041:17588857, ack 0, win 229, options [nop,nop,TS val 148218621 ecr 148145855], length 60816

Thank you so much, I will use e1000 for this. It would be great if OVS DPDK could handle this the same way the kernel does; otherwise it will defy people's expectations of OVS DPDK. It shocked me, at least.

-----Original Message-----
From: Ilya Maximets [mailto:i.maxim...@samsung.com]
Sent: July 11, 2019 15:35
To: Yi Yang (杨燚)-云服务集团 ; ovs-dev@openvswitch.org
Subject: Re: Re: [ovs-dev] Why is ovs DPDK much worse than ovs in my test case?

On 11.07.2019 3:27, Yi Yang (杨燚)-云服务集团 wrote:
> BTW, offload features are on in my test client1 and server1 (iperf server)
> ...
> -----Original Message-----
> From: Yi Yang (杨燚)-云服务集团
> Sent: July 11, 2019 8:22
> To: i.maxim...@samsung.com; ovs-dev@openvswitch.org
> Cc: Yi Yang (杨燚)-云服务集团
> Subject: Re: [ovs-dev] Why is ovs DPDK much worse than ovs in my test case?
> Importance: High
>
> Ilya, thank you so much, using 9K MTU for all the virtio interfaces in
> transport path does help (including DPDK port), the data is here.

8K usually works a bit better for me than 9K. Probably, because of the page size.
Have you configured MTU for the tap interfaces on host side too just in case that host kernel doesn't negotiate the MTU with guest?
> vagrant@client1:~$ iperf -t 60 -i 10 -c 192.168.230.101
>
> Client connecting to 192.168.230.101, TCP port 5001
> TCP window size: 325 KByte (default)
>
> [  3] local 192.168.200.101 port 53956 connected with 192.168.230.101 port 5001
> [ ID] Interval        Transfer      Bandwidth
> [  3]  0.0-10.0 sec   315 MBytes    264 Mbits/sec
> [  3] 10.0-20.0 sec   333 MBytes    280 Mbits/sec
> [  3] 20.0-30.0 sec   300 MBytes    252 Mbits/sec
> [  3] 30.0-40.0 sec   307 MBytes    258 Mbits/sec
> [  3] 40.0-50.0 sec   322 MBytes    270 Mbits/sec
> [  3] 50.0-60.0 sec   316 MBytes    265 Mbits/sec
> [  3]  0.0-60.0 sec   1.85 GBytes   265 Mbits/sec
> vagrant@client1:~$
>
> But it is still much worse than ovs kernel. In my test case, I used
> VirtualBox network, the whole transport path traverses several different VMs,
> every VM has turned on offload features except ovs DPDK VM, I understand tso
> offload should be done on send side, so when the packet is sent out from the
> send side or receive side, it has been segmented by tso to adapt to path MTU,
> so in ovs kernel VM/ovs DPDK VM, the packet size has been MTU of ovs
> port/DPDK port, so it needn't do tso work, right?

Not sure if I understand the question correctly, but I'll try to clarify.

I assume that all your VMs are located on the same physical host.
The Linux kernel is smart and will not segment packets until it is unavoidable.
If all the interfaces on a packet path support TSO, the kernel will never segment packets and will always pass 64K packets all the way from the iperf client to the iperf server.
In the case of OVS with DPDK, its VM doesn't support TSO, so packets will be split into segments that fit the MTU before being sent to that VM.

The key point here is the virtio interfaces you're using for the VMs.
virtio-net is a para-virtual network interface. This means that the guest knows that the interface is virtual, and it knows that the host is able to receive packets larger than the MTU if offloading was negotiated.
At the same time, the host knows that the guest is able to receive packets larger than the MTU too. So, nothing will be segmented.
In the case of OVS with DPDK, the host knows that the guest is not able to receive packets larger than the MTU and splits them before sending.

You can't send packets larger than the MTU to a physical network, but you are able to do that on a virtual network if it was negotiated.

Best regards, Ilya Maximets.
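The two tcpdump samples quoted earlier in this thread (length 65160 and 60816) fit this explanation. As a rough sketch, assuming plain IPv4 and TCP headers plus the 12-byte timestamp option visible in the capture, the following Python estimates how many MTU-sized segments such a buffer becomes once TSO is not available on the path. The helper names are illustrative only and are not part of OVS or DPDK.

```python
# Sketch under stated assumptions: IPv4 without options, TCP with only
# the timestamp option (nop,nop,TS = 12 bytes), as seen in the capture.
IPV4_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12


def mss_for_mtu(mtu, tcp_options=TCP_TIMESTAMP_OPTION):
    """Payload bytes that fit in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER - tcp_options


def segments_needed(payload_len, mtu):
    """Number of MTU-sized segments needed for one large TSO buffer."""
    mss = mss_for_mtu(mtu)
    return -(-payload_len // mss)  # integer ceiling division


if __name__ == "__main__":
    for length in (65160, 60816):
        print(length, segments_needed(length, 1500), segments_needed(length, 9000))
```

Under these assumptions both captured lengths are exact multiples of the 1448-byte payload that fits in a 1500-byte MTU, which is consistent with the sender handing whole TSO buffers to the kernel datapath.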
Re: [ovs-dev] [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom
On 7/11/2019 2:07 PM, Pravin Shelar wrote:
> I was bit busy for last couple of days. I will finish review by EOD today.
>
> Thanks,
> Pravin.

net-next is closed anyway so no rush, but thanks!

- Greg

> On Mon, Jul 8, 2019 at 4:22 PM Gregory Rose wrote:
>> On 7/8/2019 4:18 PM, Gregory Rose wrote:
>>> On 7/8/2019 4:08 PM, David Miller wrote:
>>>> From: Taehee Yoo
>>>> Date: Sat, 6 Jul 2019 01:08:09 +0900
>>>>
>>>>> When a vport is deleted, the maximum headroom size would be changed.
>>>>> If the vport which has the largest headroom is deleted,
>>>>> the new max_headroom would be set.
>>>>> But, if the new headroom size is equal to the old headroom size,
>>>>> updating routine is unnecessary.
>>>>>
>>>>> Signed-off-by: Taehee Yoo
>>>> I'm not so sure about the logic here and I'd therefore like an OVS expert
>>>> to review this.
>>>
>>> I'll review and test it and get back. Pravin may have input as well.
>>
>> Err, adding Pravin.
>>
>> - Greg
>>
>>> Thanks,
>>>
>>> - Greg
>>>
>>>> Thanks.
Re: [ovs-dev] [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom
I was bit busy for last couple of days. I will finish review by EOD today.

Thanks,
Pravin.

On Mon, Jul 8, 2019 at 4:22 PM Gregory Rose wrote:
>
> On 7/8/2019 4:18 PM, Gregory Rose wrote:
> > On 7/8/2019 4:08 PM, David Miller wrote:
> >> From: Taehee Yoo
> >> Date: Sat, 6 Jul 2019 01:08:09 +0900
> >>
> >>> When a vport is deleted, the maximum headroom size would be changed.
> >>> If the vport which has the largest headroom is deleted,
> >>> the new max_headroom would be set.
> >>> But, if the new headroom size is equal to the old headroom size,
> >>> updating routine is unnecessary.
> >>>
> >>> Signed-off-by: Taehee Yoo
> >> I'm not so sure about the logic here and I'd therefore like an OVS
> >> expert
> >> to review this.
> >
> > I'll review and test it and get back. Pravin may have input as well.
> >
>
> Err, adding Pravin.
>
> - Greg
>
> > Thanks,
> >
> > - Greg
> >
> >> Thanks.
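The logic described in the quoted commit message can be sketched as follows. This is a simplified Python model written for illustration, not the kernel code under review: the function name, the dictionary of per-vport headrooms, and the returned flag are all assumptions made for the sketch.

```python
def on_vport_delete(headrooms, deleted, old_max):
    """Model of the max_headroom decision when a vport is removed.

    headrooms: hypothetical dict mapping vport name -> needed headroom.
    Returns (new_max, needs_update); needs_update is False when the
    update routine can be skipped because the maximum did not change.
    """
    remaining = {name: h for name, h in headrooms.items() if name != deleted}

    # A vport below the current maximum cannot change the maximum.
    if headrooms[deleted] < old_max:
        return old_max, False

    # The deleted vport held the maximum: recompute over what is left.
    new_max = max(remaining.values(), default=0)

    # The point of the patch: skip the update if nothing changed.
    return new_max, new_max != old_max


if __name__ == "__main__":
    print(on_vport_delete({"a": 64, "b": 64, "c": 32}, "a", 64))
    print(on_vport_delete({"a": 64, "c": 32}, "a", 64))
```

In the first call another vport still needs the same headroom, so the maximum is unchanged and no update is needed; in the second the maximum really drops and the update has to run.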
Re: [ovs-dev] [PATCH 2/2] ovn-performance.at: Fix syntax error in ACL.
On Tue, Jul 09, 2019 at 09:23:11PM -0700, Han Zhou wrote:
> From: Han Zhou
>
> This doesn't impact the effectiveness of the test but just fix an
> obvious error in ACL syntax which was noticed when looking at test
> logs.
>
> Signed-off-by: Han Zhou

Thanks for the fixes! I applied these to master.
Re: [ovs-dev] OVN SFC Support
Hi Numan,

Can you please share the list OVN SFC patch details for review?

Thank You,
*Pradipta Sahoo*

On Thu, Jul 11, 2019 at 3:16 PM Numan Siddique wrote:

> On Thu, Jul 11, 2019 at 12:26 PM Sood, Ritu wrote:
>
> > Hi
> > Is there any plan to integrate SFC support in OVN?
> > There was some work done and there are some demos/presentations like one
> > below but it seems like there is no recent activity:
> > http://www.openvswitch.org/support/ovscon2016/7/1400-fourie.pdf
> > Is there any repo being maintained for SFC patches?
> > https://github.com/doonhammer/ovs repo has the patches but doesn't seem
> > to be synced with the mainline code currently.
>
> I am not sure if some one is actively working on it.
> If some one wants to pick those patches, that would be great :)
>
> Thanks
> Numan
>
> > Regards,
> > -Ritu
Re: [ovs-dev] [PATCH v2 1/3] OVN: introduce Controller_Event table
Bleep bloop. Greetings Lorenzo Bianconi, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. build: mv -f $depbase.Tpo $depbase.Po depbase=`echo ovn/controller/ha-chassis.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\ gcc -std=gnu99 -DHAVE_CONFIG_H -I.-I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -MT ovn/controller/ha-chassis.o -MD -MP -MF $depbase.Tpo -c -o ovn/controller/ha-chassis.o ovn/controller/ha-chassis.c &&\ mv -f $depbase.Tpo $depbase.Po depbase=`echo ovn/controller/lflow.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\ gcc -std=gnu99 -DHAVE_CONFIG_H -I.-I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -MT ovn/controller/lflow.o -MD -MP -MF $depbase.Tpo -c -o ovn/controller/lflow.o ovn/controller/lflow.c &&\ mv -f $depbase.Tpo $depbase.Po depbase=`echo ovn/controller/lport.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\ gcc -std=gnu99 -DHAVE_CONFIG_H -I.-I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -MT ovn/controller/lport.o -MD -MP -MF $depbase.Tpo -c -o ovn/controller/lport.o 
ovn/controller/lport.c &&\ mv -f $depbase.Tpo $depbase.Po depbase=`echo ovn/controller/ofctrl.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\ gcc -std=gnu99 -DHAVE_CONFIG_H -I.-I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -MT ovn/controller/ofctrl.o -MD -MP -MF $depbase.Tpo -c -o ovn/controller/ofctrl.o ovn/controller/ofctrl.c &&\ mv -f $depbase.Tpo $depbase.Po depbase=`echo ovn/controller/pinctrl.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\ gcc -std=gnu99 -DHAVE_CONFIG_H -I.-I ./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Werror -g -O2 -MT ovn/controller/pinctrl.o -MD -MP -MF $depbase.Tpo -c -o ovn/controller/pinctrl.o ovn/controller/pinctrl.c &&\ mv -f $depbase.Tpo $depbase.Po ovn/controller/pinctrl.c:289:1: error: ‘pinctrl_find_empty_lb_backends_event’ defined but not used [-Werror=unused-function] pinctrl_find_empty_lb_backends_event(char *vip, char *protocol, ^ cc1: all warnings being treated as errors make[2]: *** [ovn/controller/pinctrl.o] Error 1 make[2]: Leaving directory `/var/lib/jenkins/jobs/upstream_build_from_pw/workspace' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/var/lib/jenkins/jobs/upstream_build_from_pw/workspace' make: *** [all] Error 2 Please check this out. If you feel there has been an error, please email acon...@bytheb.org Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] ovn: Fix the test failures in travis CI.
From: Numan Siddique After the commit [1], below test cases are failing repeatedly in travis CI. 2663: ovn -- 4 HV, 1 LS, 1 LR, packet test with HA distributed router gateway port FAILED (ovn.at:8597) 2664: ovn -- 4 HV, 3 LS, 2 LR, packet test with HA distributed router gateway port FAILED (ovn.at:8844) 2667: ovn -- vlan traffic for external network with distributed router gateway port FAILED (ovn.at:9580) 2691: ovn -- router - check packet length - icmp defrag FAILED (ovn.at:13624) With the commit [1], ovn-controller sends GARPs for the IPs of the distributed router ports. The failing tests did not handle the situation if multiple GARPs are sent. The failures are mostly timing related. This patch fixes these issues. [1] - d65586b6fa97 ("ovn: Send GARP for router port IPs of a router port connected to bridged logical switch") Fixes: d65586b6fa97 ("ovn: Send GARP for router port IPs of a router port connected to bridged logical switch") CC: Ilya Maximets Signed-off-by: Numan Siddique --- tests/ovn.at | 53 ++-- 1 file changed, 35 insertions(+), 18 deletions(-) diff --git a/tests/ovn.at b/tests/ovn.at index 4da7059b3..95980f2f1 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -8593,7 +8593,9 @@ grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1 OVN_CHECK_PACKETS([ext1/vif1-tx.pcap], [ext1-vif1.expected]) $PYTHON "$top_srcdir/utilities/ovs-pcap.in" $active_gw/br-phys_n1-tx.pcap > packets cat packets | grep $expected > exp -cat packets | grep $exp_gw_ip_garp >> exp +# Its possible that $active_gw/br-phys_n1-tx.pcap may have received multiple +# garp packets. So consider only the first packet. 
+cat packets | grep $exp_gw_ip_garp | head -1 >> exp AT_CHECK([cat exp], [0], [expout]) rm -f expout if test $backup_vswitchd_dead != 1; then @@ -8840,7 +8842,7 @@ grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1 OVN_CHECK_PACKETS([ext1/vif1-tx.pcap], [ext1-vif1.expected]) $PYTHON "$top_srcdir/utilities/ovs-pcap.in" $active_gw/br-phys_n1-tx.pcap > packets cat packets | grep $expected > exp -cat packets | grep $exp_gw_ip_garp >> exp +cat packets | grep $exp_gw_ip_garp | head -1 >> exp AT_CHECK([cat exp], [0], [expout]) $PYTHON "$top_srcdir/utilities/ovs-pcap.in" $backup_gw/br-phys_n1-tx.pcap > packets @@ -9567,20 +9569,9 @@ options:rxq_pcap=${pcap_file}-rx.pcap as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2 as hv3 reset_pcap_file hv3-vif1 hv3/vif1 -sleep 2 -# Take note of how many packets arrived on the VLAN switch before generating -# further traffic -n_packets=`as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" | grep actions=clone | sed 's/.*n_packets=\([[0-9]]*\),.*/\1/'` as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet sleep 2 -# On hv1, the packet should not go from vlan switch pipleline to router -# pipeline -as hv1 ovs-ofctl dump-flows br-int -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \ -| grep actions=clone | grep -v n_packets=$n_packets | wc -l], [0], [[0 -]]) - # On hv1, table 32 check that no packet goes via the tunnel port AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \ | grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0 @@ -9624,21 +9615,38 @@ echo $exp_garp_on_foo1 > foo1.expout # ovn-controller on hv2 should send garp with VLAN tag sent_garp="0101020381020806000108000604000101010203c0a80101c0a80101" -echo $sent_garp > br-ex_n2.expout OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [foo1.expout]) -OVN_CHECK_PACKETS([hv2/br-ex_n2-tx.pcap], [br-ex_n2.expout]) +# Wait until we receive atleast 1 packet +OVS_WAIT_UNTIL([test 1=`$PYTHON 
"$top_srcdir/utilities/ovs-pcap.in" hv2/br-ex_n2-tx.pcap | wc -l`]) +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/br-ex_n2-tx.pcap | head -1 > packets +echo $sent_garp > expout +AT_CHECK([cat packets], [0], [expout]) $PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv4/br-ex_n2-tx.pcap > empty AT_CHECK([cat empty], [0], []) # Make hv4 master as hv1 reset_pcap_file hv1-vif1 hv1/vif1 -as hv2 reset_pcap_file br-ex_n2 hv2/br-ex_n2 as hv4 reset_pcap_file br-ex_n2 hv4/br-ex_n2 ovn-nbctl --wait=sb ha-chassis-group-add-chassis hagrp1 hv4 40 +# Wait till cr-alice is claimed by hv4 +hv4_chassis=$(ovn-sbctl --bare --columns=_uuid find Chassis name=hv4) +# check that the chassis redirect port has been claimed by the gw1 chassis +OVS_WAIT_UNTIL([ovn-sbctl --columns chassis --bare find Port_Binding \ +logical_port=cr-alice | grep $hv4_chassis | wc -l], [0],[[1 +]]) + +# Reset the pcap file for hv2/br-ex_n2. From now on ovn-controller in hv2 +# should not send GARPs for the router ports. +as hv2 reset_pcap_file br-ex_n2 hv2/br-ex_n2 + +echo $sent_garp > br-ex_n2.expout OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [foo1.expout]) OVN_CHECK_PACKETS([hv4/br-ex_n2-tx.pca
[ovs-dev] [PATCH v2 3/3] OVN: use trigger_event action to report 'empty_lb_rule' events
Add northd logical flows in order to reports that the controller received an IP packet for LB rule witn no backends. This configuration is used by OpenShift to spin up a idle POD Signed-off-by: Mark Michelson Co-authored-by: Mark Michelson Signed-off-by: Lorenzo Bianconi --- ovn/northd/ovn-northd.c | 33 + ovn/ovn-nb.xml | 11 +++ tests/ovn.at| 65 + 3 files changed, 109 insertions(+) diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index ce382ac89..4929fb666 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -70,6 +70,8 @@ static const char *unixctl_path; static struct hmap macam = HMAP_INITIALIZER(&macam); static struct eth_addr mac_prefix; +static bool controller_event_en; + #define MAX_OVN_TAGS 4096 /* Pipeline stages. */ @@ -3626,6 +3628,34 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows) sset_add(&all_ips, ip_address); } +if (controller_event_en && !node->value[0]) { +struct ds match = DS_EMPTY_INITIALIZER; +char *action; + +if (addr_family == AF_INET) { +ds_put_format(&match, "ip4.dst == %s && %s", + ip_address, lb->protocol); +} else { +ds_put_format(&match, "ip6.dst == %s && %s", + ip_address, lb->protocol); +} +if (port) { +ds_put_format(&match, " && %s.dst == %u", lb->protocol, + port); +} +action = xasprintf("trigger_event(event = \"%s\", " + "vip = \"%s\", protocol = \"%s\", " + "load_balancer = \"" UUID_FMT "\");", + event_to_string(OVN_EVENT_EMPTY_LB_BACKENDS), + node->key, lb->protocol, + UUID_ARGS(&lb->header_.uuid)); +ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, 120, + ds_cstr(&match), action); +ds_destroy(&match); +free(action); +continue; +} + free(ip_address); /* Ignore L4 port information in the key because fragmented packets @@ -8115,6 +8145,9 @@ ovnnb_db_run(struct northd_context *ctx, smap_destroy(&options); } +controller_event_en = smap_get_bool(&nb->options, +"controller_event", false); + cleanup_macam(&macam); } diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml index 318379c1f..b0287563b 100644 --- 
a/ovn/ovn-nb.xml +++ b/ovn/ovn-nb.xml @@ -107,6 +107,17 @@ Configure a given OUI to be used as prefix when L2 address is dynamically assigned, e.g. 00:11:22 + + +Value set by the CMS to enable/disable ovn-controller event reporting. +Traffic into OVS can raise a 'controller' event that results in a +Controller_Event being written to the +table in SBDB. When the CMS has seen the event and taken appropriate +action, it can remove the correponding row in + table. +The intention is for a CMS to see the events and take some sort of +action. Please see the table in SBDB. + diff --git a/tests/ovn.at b/tests/ovn.at index e9ec715df..d2823d77a 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -14344,3 +14344,68 @@ AT_CHECK([ovn-nbctl ls-add sw1], [1], [ignore], ]) AT_CLEANUP + +AT_SETUP([ovn -- controller event]) +AT_KEYWORDS([ovn_controller_event]) +ovn_start + +# Create hypervisors hv[12]. +# Add vif1[12] to hv1, vif2[12] to hv2 +# Add all of the vifs to a single logical switch sw0. + +net_add n1 +ovn-nbctl ls-add sw0 +for i in 1 2; do +sim_add hv$i +as hv$i +ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.$i + +for j in 1 2; do +ovn-nbctl lsp-add sw0 sw0-p$i$j -- \ +lsp-set-addresses sw0-p$i$j "00:00:00:00:00:$i$j 192.168.1.$i$j" + +ovs-vsctl -- add-port br-int vif$i$j -- \ +set interface vif$i$j \ +external-ids:iface-id=sw0-p$i$j \ +options:tx_pcap=hv$i/vif$i$j-tx.pcap \ +options:rxq_pcap=hv$i/vif$i$j-rx.pcap \ +ofport-request=$i$j +done +done + +ovn-nbctl --wait=hv set NB_Global . options:controller_event=true +ovn-nbctl lb-add lb0 192.168.1.100:80 "" +ovn-nbctl ls-lb-add sw0 lb0 +uuid_lb=$(ovn-nbctl --bare --columns=_uuid find load_balancer name=lb0) + +OVN_POPULATE_ARP +ovn-nbctl --timeout=3 --wait=hv sync +ovn-sbctl lflow-list +as hv1 ovs-ofctl dump-flows br-int + +packet="inport==\"sw0-p11\" && eth.src==00:00:00:00:00:11 && eth.dst==00:00:00:00:00:21 && + ip4 && ip.ttl==64 && ip4.src==192.168.1.11 && ip4.dst
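For readers following the build_pre_lb() hunk in this patch, the match and action strings it formats can be mirrored in a few lines of Python. This is only an illustrative sketch: the function name and parameters are hypothetical, and the event name "empty_lb_backends" is assumed to be what event_to_string(OVN_EVENT_EMPTY_LB_BACKENDS) returns, which the quoted diff does not show.

```python
def empty_lb_flow(ip_address, protocol, port, vip_key, lb_uuid, ipv6=False):
    """Mirror of the ds_put_format()/xasprintf() calls in the hunk.

    Returns the (match, action) strings for a VIP with no backends.
    All argument names are illustrative, not taken from ovn-northd.c.
    """
    dst_field = "ip6.dst" if ipv6 else "ip4.dst"
    match = f"{dst_field} == {ip_address} && {protocol}"
    if port:
        # The L4 port is only matched when the VIP specifies one.
        match += f" && {protocol}.dst == {port}"

    # Assumption: the event type is spelled "empty_lb_backends".
    action = (
        'trigger_event(event = "empty_lb_backends", '
        f'vip = "{vip_key}", protocol = "{protocol}", '
        f'load_balancer = "{lb_uuid}");'
    )
    return match, action


if __name__ == "__main__":
    m, a = empty_lb_flow("192.168.1.100", "tcp", 80, "192.168.1.100:80",
                         "00000000-0000-0000-0000-000000000000")
    print(m)
    print(a)
```

The VIP 192.168.1.100:80 is the one the new test adds with `ovn-nbctl lb-add lb0 192.168.1.100:80 ""`; the all-zero UUID is a placeholder.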
[ovs-dev] [PATCH v2 2/3] OVN: introduce trigger_event() action
Add trigger_event() ovn action in order to allow ovs-vswitchd to report CMS related events. This commit introduces a new event, empty_lb_backends. This event is raised if a received packet is destined for a load balancer VIP that has no configured backend destinations. For this event, the event info includes the load balancer VIP, the load balancer UUID, and the transport protocol. The use case for this particular event is for the CMS to supply backend resources to handle this traffic. For example, in Openshift, this event can be used to spin up new containers to handle the incoming traffic. Signed-off-by: Mark Michelson Co-authored-by: Mark Michelson Signed-off-by: Lorenzo Bianconi --- include/ovn/actions.h | 18 +++- ovn/controller/lflow.c| 26 +- ovn/controller/pinctrl.c | 114 ovn/lib/actions.c | 176 ++ ovn/lib/ovn-l7.h | 46 ++ ovn/ovn-sb.xml| 21 + ovn/utilities/ovn-trace.c | 3 + tests/ovn.at | 10 +++ tests/test-ovn.c | 11 ++- 9 files changed, 419 insertions(+), 6 deletions(-) diff --git a/include/ovn/actions.h b/include/ovn/actions.h index f42bbc277..5ed70e798 100644 --- a/include/ovn/actions.h +++ b/include/ovn/actions.h @@ -83,7 +83,8 @@ struct ovn_extend_table; OVNACT(ND_NS, ovnact_nest)\ OVNACT(SET_METER, ovnact_set_meter) \ OVNACT(OVNFIELD_LOAD, ovnact_load)\ -OVNACT(CHECK_PKT_LARGER, ovnact_check_pkt_larger) +OVNACT(CHECK_PKT_LARGER, ovnact_check_pkt_larger) \ +OVNACT(TRIGGER_EVENT, ovnact_controller_event) /* enum ovnact_type, with a member OVNACT_ for each action. */ enum OVS_PACKED_ENUM ovnact_type { @@ -318,6 +319,14 @@ struct ovnact_check_pkt_larger { struct expr_field dst; /* 1-bit destination field. */ }; +/* OVNACT_EVENT. */ +struct ovnact_controller_event { +struct ovnact ovnact; +int event_type; /* controller event type */ +struct ovnact_gen_option *options; +size_t n_options; +}; + /* Internal use by the helpers below. 
*/ void ovnact_init(struct ovnact *, enum ovnact_type, size_t len); void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len); @@ -486,6 +495,9 @@ enum action_opcode { * The actions, in OpenFlow 1.3 format, follow the action_header. */ ACTION_OPCODE_ICMP4_ERROR, + +/* "trigger_event (event_type)" */ +ACTION_OPCODE_EVENT, }; /* Header. */ @@ -515,6 +527,10 @@ struct ovnact_parse_params { /* hmap of 'struct gen_opts_map' to support 'put_nd_ra_opts' action */ const struct hmap *nd_ra_opts; +/* Array of hmap of 'struct gen_opts_map' to support 'trigger_event' + * action */ +const struct controller_event_options *controller_event_opts; + /* Each OVN flow exists in a logical table within a logical pipeline. * These parameters express this context for a set of OVN actions being * parsed: diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index feb8f8ff7..1aafafb33 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -70,6 +70,7 @@ static bool consider_logical_flow( struct hmap *dhcp_opts, struct hmap *dhcpv6_opts, struct hmap *nd_ra_opts, +struct controller_event_options *controller_event_opts, const struct shash *addr_sets, const struct shash *port_groups, const struct sset *active_tunnels, @@ -297,12 +298,16 @@ add_logical_flows( struct hmap nd_ra_opts = HMAP_INITIALIZER(&nd_ra_opts); nd_ra_opts_init(&nd_ra_opts); +struct controller_event_options controller_event_opts; +controller_event_opts_init(&controller_event_opts); + SBREC_LOGICAL_FLOW_TABLE_FOR_EACH (lflow, logical_flow_table) { if (!consider_logical_flow(sbrec_multicast_group_by_name_datapath, sbrec_port_binding_by_name, lflow, local_datapaths, chassis, &dhcp_opts, &dhcpv6_opts, - &nd_ra_opts, addr_sets, port_groups, + &nd_ra_opts, &controller_event_opts, + addr_sets, port_groups, active_tunnels, local_lport_ids, flow_table, group_table, meter_table, lfrr, conj_id_ofs)) { @@ -315,6 +320,7 @@ add_logical_flows( dhcp_opts_destroy(&dhcp_opts); dhcp_opts_destroy(&dhcpv6_opts); 
nd_ra_opts_destroy(&nd_ra_opts); +controller_event_opts_destroy(&controller_event_opts); } bool @@ -371,6 +377,10 @@ lflow_handle_changed_flows( lflow_resource_destroy_lflow(lfrr, &lflow->header_.uuid); } } + +struct controller_event_options controller_event_opts; +controller_event_opts_init(&controller_event_opts); + SBREC_LOGICAL_
[ovs-dev] [PATCH v2 1/3] OVN: introduce Controller_Event table
Add Controller_Event table to OVN SBDB in order to report CMS related event. Introduce event_table hashmap array and controller_event related structures to ovn-controller in order to track pending events forwarded by ovs-vswitchd. Moreover integrate event_table hashmap array with event_table ovn-sbdb table Signed-off-by: Mark Michelson Co-authored-by: Mark Michelson Signed-off-by: Lorenzo Bianconi --- include/ovn/logical-fields.h| 7 ++ ovn/controller/ovn-controller.c | 10 +++ ovn/controller/pinctrl.c| 151 ovn/controller/pinctrl.h| 2 + ovn/lib/logical-fields.c| 21 + ovn/ovn-sb.ovsschema| 20 - ovn/ovn-sb.xml | 40 + 7 files changed, 248 insertions(+), 3 deletions(-) diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h index 164b338b5..9bac8e027 100644 --- a/include/ovn/logical-fields.h +++ b/include/ovn/logical-fields.h @@ -20,6 +20,11 @@ struct shash; +enum ovn_controller_event { +OVN_EVENT_EMPTY_LB_BACKENDS = 0, +OVN_EVENT_MAX, +}; + /* Logical fields. * * These values are documented in ovn-architecture(7), please update the @@ -118,6 +123,8 @@ ovn_field_from_id(enum ovn_field_id id) return &ovn_fields[id]; } +const char *event_to_string(enum ovn_controller_event event); +int string_to_event(const char *s); const struct ovn_field *ovn_field_from_name(const char *name); void ovn_destroy_ovnfields(void); #endif /* ovn/lib/logical-fields.h */ diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c index c4883aa6d..1a90b702e 100644 --- a/ovn/controller/ovn-controller.c +++ b/ovn/controller/ovn-controller.c @@ -133,6 +133,8 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, * Monitor Logical_Flow, MAC_Binding, Multicast_Group, and DNS tables for * local datapaths. * + * Monitor Controller_Event rows for local chassis. + * * We always monitor patch ports because they allow us to see the linkages * between related logical datapaths. 
That way, when we know that we have * a VIF on a particular logical switch, we immediately know to monitor all @@ -142,6 +144,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, struct ovsdb_idl_condition mb = OVSDB_IDL_CONDITION_INIT(&mb); struct ovsdb_idl_condition mg = OVSDB_IDL_CONDITION_INIT(&mg); struct ovsdb_idl_condition dns = OVSDB_IDL_CONDITION_INIT(&dns); +struct ovsdb_idl_condition ce = OVSDB_IDL_CONDITION_INIT(&ce); sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "patch"); /* XXX: We can optimize this, if we find a way to only monitor * ports that have a Gateway_Chassis that point's to our own @@ -165,6 +168,9 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, sbrec_port_binding_add_clause_options(&pb, OVSDB_F_INCLUDES, &l2); const struct smap l3 = SMAP_CONST1(&l3, "l3gateway-chassis", id); sbrec_port_binding_add_clause_options(&pb, OVSDB_F_INCLUDES, &l3); + +sbrec_controller_event_add_clause_chassis(&ce, OVSDB_F_EQ, + &chassis->header_.uuid); } if (local_ifaces) { const char *name; @@ -191,11 +197,13 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, sbrec_mac_binding_set_condition(ovnsb_idl, &mb); sbrec_multicast_group_set_condition(ovnsb_idl, &mg); sbrec_dns_set_condition(ovnsb_idl, &dns); +sbrec_controller_event_set_condition(ovnsb_idl, &ce); ovsdb_idl_condition_destroy(&pb); ovsdb_idl_condition_destroy(&lf); ovsdb_idl_condition_destroy(&mb); ovsdb_idl_condition_destroy(&mg); ovsdb_idl_condition_destroy(&dns); +ovsdb_idl_condition_destroy(&ce); } static const char * @@ -1981,6 +1989,8 @@ main(int argc, char *argv[]) sbrec_port_binding_by_name, sbrec_mac_binding_by_lport_ip, sbrec_dns_table_get(ovnsb_idl_loop.idl), +sbrec_controller_event_table_get( +ovnsb_idl_loop.idl), br_int, chassis, &ed_runtime_data.local_datapaths, &ed_runtime_data.active_tunnels); diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c index a442738a0..b8ed375fe 100644 --- a/ovn/controller/pinctrl.c +++ b/ovn/controller/pinctrl.c @@ -226,6 +226,153 @@ static 
bool may_inject_pkts(void); COVERAGE_DEFINE(pinctrl_drop_put_mac_binding); COVERAGE_DEFINE(pinctrl_drop_buffered_packets_map); +COVERAGE_DEFINE(pinctrl_drop_controller_event); + +struct empty_lb_backends_event { +struct hmap_node hmap_node; +long long int timestamp; + +char *vip; +char *protocol; +char *load_balancer; +}; + +static struct hmap event_table[OVN_EVENT_MAX]; +static int64_t event_seq_num; + +static void +
[ovs-dev] [PATCH v2 0/3] OVN: add Controller Events
There are situations where arrival of certain types of traffic into OVS does not warrant a "typical" action, such as output to a specific port or dropping. Rather, the decision about what to do needs to be left to a CMS. The series here introduces a new table, Controller_Event, for this purpose. Traffic into OVS can raise a 'controller' event that results in a Controller_Event being written to the southbound database. The intention is for a CMS to see the events and take some sort of action. When the CMS has seen the event and taken appropriate action, it can remove the corresponding row in the Controller_Event table. Controller events are only added to the southbound database if the CMS enables the feature by setting options:controller_event to true in the NB nb_global table. This series introduces a new event, empty_lb_backends. This event is raised if a received packet is destined for a load balancer VIP that has no configured backend destinations. For this event, the event info includes the load balancer VIP, the load balancer UUID, and the transport protocol. The use case for this particular event is for the CMS to supply backend resources to handle this traffic. For example, in Openshift, this event can be used to spin up new containers to handle the incoming traffic.
Changes since v1:
- improve documentation
- fix code style
- moved event_to_string and string_to_event routines in ovn/lib/logical-fields.c
- set GC timeout to 10s
- removed unnecessary ip parameter in 'empty_lb_backends' logical flow definition
- substituted ovs_assert() with a warning log in pinctrl_handle_empty_lb_backends_opts()
- fixed commit message of patch 2/3
- remove 'handled' column in Controller_Event table

Changes since RFCv2:
- introduce event sequence number
- improve documentation

Changes since RFCv1:
- added garbage collector for event hash table
- rename send_event in trigger_event
- modify event_type from int to string in trigger_event action
- added chassis column in Controller_Event as weak reference to Chassis table
- added monitoring to 'local' rows in Controller_Event table
- fix typos

Lorenzo Bianconi (3):
  OVN: introduce Controller_Event table
  OVN: introduce trigger_event() action
  OVN: use trigger_event action to report 'empty_lb_rule' events

 include/ovn/actions.h           |  18 ++-
 include/ovn/logical-fields.h    |   7 +
 ovn/controller/lflow.c          |  26 +++-
 ovn/controller/ovn-controller.c |  10 ++
 ovn/controller/pinctrl.c        | 265
 ovn/controller/pinctrl.h        |   2 +
 ovn/lib/actions.c               | 176 +
 ovn/lib/logical-fields.c        |  21 +++
 ovn/lib/ovn-l7.h                |  46 ++
 ovn/northd/ovn-northd.c         |  33
 ovn/ovn-nb.xml                  |  11 ++
 ovn/ovn-sb.ovsschema            |  20 ++-
 ovn/ovn-sb.xml                  |  61
 ovn/utilities/ovn-trace.c       |   3 +
 tests/ovn.at                    |  75 +
 tests/test-ovn.c                |  11 +-
 16 files changed, 776 insertions(+), 9 deletions(-)

--
2.21.0
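To make the intended CMS workflow from the cover letter a bit more concrete, here is a minimal sketch of a consumer loop, written in Python for brevity. Everything in it is an assumption for illustration: rows are modelled as plain dicts, the keys 'event_type' and 'event_info' and the info keys 'vip', 'protocol' and 'load_balancer' are hypothetical names mirroring the description above, and scale_up is a callback the CMS would supply. It is not the API of any OVSDB client library.

```python
def handle_controller_events(rows, scale_up):
    """Return the Controller_Event-like rows that were handled.

    rows: iterable of dicts with hypothetical keys 'event_type' and
          'event_info' (itself a dict with 'vip', 'protocol' and
          'load_balancer').
    scale_up: hypothetical CMS callback; returns True once backends
          have been provisioned for the VIP.
    """
    handled = []
    for row in rows:
        # Only the event type introduced by this series is understood here;
        # anything else is left in place for another consumer.
        if row.get("event_type") != "empty_lb_backends":
            continue
        info = row.get("event_info", {})
        if scale_up(info.get("vip"), info.get("protocol"),
                    info.get("load_balancer")):
            # The CMS may delete the row only after it has acted on it.
            handled.append(row)
    return handled
```

The caller would then delete the returned rows from the southbound database; rows for which scale_up failed stay in place so they can be retried.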
[ovs-dev] [PATCH] ovs-macros: An option to suspend test execution on error
The origins of this patch are captured at https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048923.html. To summarize: when a test fails, it would be good to pause test execution and let the developer poke around the system to see its current state.

This patch makes a small tweak to ovs-macros.at so that the ovs_on_exit() function is called when the test suite fails. That function checks whether the environment variable OVS_PAUSE_TEST is set. If it is, the test suite pauses and waits for the user to press Ctrl-D. Meanwhile the user can poke around the system to see why the test case failed. Once done with the investigation, the user can press Ctrl-D to clean up the test suite.

For example, to re-run test case 139:

    export OVS_PAUSE_TEST=1
    cd tests/system-userspace-testsuite.dir/139
    sudo -E ./run

When an error occurs, the above command displays something like this:

    =
    Set environment variable to use various ovs utilities
    export OVS_RUNDIR=/opt/vdasari/Developer/ovs/_build-gcc/tests/system-userspace-testsuite.dir/139
    Press ctrl-d to continue:
    =

And from another window, one can execute ovs-xxx commands like:

    export OVS_RUNDIR=/opt/vdasari/Developer/ovs/_build-gcc/tests/system-userspace-testsuite.dir/139
    $ ovs-ofctl dump-ports br0
    .
    .

Signed-off-by: Vasu Dasari
---
 tests/ovs-macros.at | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at
index 10593429d..57617a410 100644
--- a/tests/ovs-macros.at
+++ b/tests/ovs-macros.at
@@ -35,11 +35,36 @@ m4_divert_push([PREPARE_TESTS])
 # directory.
 ovs_init() {
     ovs_base=`pwd`
-    trap '. "$ovs_base/cleanup"' 0
+    trap ovs_on_exit 0
     : > cleanup
     ovs_setenv
 }
 
+# Catch testsuite error condition and cleanup test environment by tearing down
+# all interfaces and processes spawned.
+# User has an option to leave the test environment in error state so that system
+# can be poked around to get more information. User can enable this option by setting
+# environment variable OVS_PAUSE_TEST=1. User needs to press CTRL-D to resume the
+# cleanup operation.
+ovs_pause() {
+    echo "="
+    echo "Set environment variable to use various ovs utilities"
+    echo "export OVS_RUNDIR=$ovs_base"
+    echo "Press ctrl-d to continue:"
+    while read -s -n 1 key; do
+        printf -v keycode "%d" "'$key"
+        [ $keycode -ne 4 ] || break
+    done
+}
+
+ovs_on_exit () {
+    if [ ! -z "${OVS_PAUSE_TEST}" ]; then
+        trap '' INT
+        ovs_pause
+    fi
+    . "$ovs_base/cleanup"
+}
+
 # With no parameter or an empty parameter, sets the OVS_*DIR
 # environment variables to point to $ovs_base, the base directory in
 # which the test is running.
--
2.17.2 (Apple Git-113)
Re: [ovs-dev] [PATCH v3 3/3] OVN: Add ovn-northd IGMP support
On Thu, Jul 11, 2019 at 10:57 AM Dumitru Ceara wrote: > > New IP Multicast Snooping Options are added to the Northbound DB > Logical_Switch:other_config column. These allow enabling IGMP snooping and > querier on the logical switch and get translated by ovn-northd to rows in > the IP_Multicast Southbound DB table. > > ovn-northd monitors for changes done by ovn-controllers in the Southbound DB > IGMP_Group table. Based on the entries in IGMP_Group ovn-northd creates > Multicast_Group entries in the Southbound DB, one per IGMP_Group address X, > containing the list of logical switch ports (aggregated from all controllers) > that have IGMP_Group entries for that datapath and address X. ovn-northd > also creates a logical flow that matches on IP multicast traffic destined > to address X and outputs it on the tunnel key of the corresponding > Multicast_Group entry. > > Signed-off-by: Dumitru Ceara > Acked-by: Mark Michelson > --- > ovn/northd/ovn-northd.c | 460 > --- > ovn/ovn-nb.xml | 54 ++ > tests/ovn.at| 270 > tests/system-ovn.at | 119 > 4 files changed, 871 insertions(+), 32 deletions(-) > > + > +static void > +build_mcast_groups(struct northd_context *ctx, > + struct hmap *datapaths, struct hmap *ports, > + struct hmap *mcast_groups, > + struct hmap *igmp_groups) > +{ > +struct ovn_port *op; > + > +hmap_init(mcast_groups); > +hmap_init(igmp_groups); > + > +HMAP_FOR_EACH (op, key_node, ports) { > +if (!op->nbsp) { > +continue; > +} > + > +if (lsp_is_enabled(op->nbsp)) { > +ovn_multicast_add(mcast_groups, &mc_flood, op); > +} > +} > + > +const struct sbrec_igmp_group *sb_igmp, *sb_igmp_next; > + > +SBREC_IGMP_GROUP_FOR_EACH_SAFE (sb_igmp, sb_igmp_next, ctx->ovnsb_idl) { > +/* If this is a stale group (e.g., controller had crashed, > + * purge it). 
> + */ > +if (!sb_igmp->chassis || !sb_igmp->datapath) { > +sbrec_igmp_group_delete(sb_igmp); > +continue; > +} > + > +struct ovn_datapath *od = > +ovn_datapath_from_sbrec(datapaths, sb_igmp->datapath); > +if (!od) { > +sbrec_igmp_group_delete(sb_igmp); > +continue; > +} > + > +ovn_igmp_group_add(mcast_groups, igmp_groups, od, ports, > + sb_igmp->address, sb_igmp->ports, > sb_igmp->n_ports); > +} > +} Hi Ben, Mark, While doing some scale testing I realized that walking the rows of the IGMP_Group table in ovn-northd in the order we get them from the database might create an issue: ovn_igmp_group_add will create a new multicast_group for every unique IGMP group address and allocate a tunnel-id for it. However, because rows are not processed in the order they were added to the database, it can happen that multicast groups that didn't actually change will get a different tunnel-id triggering a change in the associated logical flows. In order to avoid this I would need to reuse the tunnel-ids of multicast groups that didn't change between different runs of the ovn-northd loop. Until now I thought of two different approaches (both with advantages and disadvantages): 1. Force ovn-northd to walk the IGMP table in a way that ensures that IGMP groups are processed in the order they were added to the database: Add a column to the IGMP_Group table storing a free running counter value (unique per ovn-controller instance) and add another compound index [datapath + address + counter]. Every time an ovn-controller adds an IGMP group it increments its own counter. Then have ovn-northd walk the IGMP_Group table with SBREC_IGMP_GROUP_FOR_EACH_BYINDEX which would give us a stable ordering of the entries. Advantages: - relatively straightforward to code and maintain Disadvantages: - extra column in SB DB - populating the index in ovn-northd will take N log(N) operations if i understand correctly the IDL index implementation (N = number of IGMP_Group entries in the DB) 2. 
Maintain a cache (hashtable) of allocated multicast group tunnel-ids between subsequent runs of the ovn-northd loop: - Once all IGMP_Group entries are processed and their corresponding Multicast_Group entries are collected we'd need to store a mapping (per datapath) between IGMP group address and multicast group tunnel-id. - Next time ovn-northd walks the IGMP_Group table, before allocating a new tunnel-id for a multicast group entry it would check the "cache" from the previous run. If there's already an entry it would reuse the tunnel-id. If not, it will have to allocate a tunnel-id. Store the (IGMP group address, tunnel-id) mapping for next run. Advantages: - Changes are all local to ovn-northd, no need to store additional information in the DB. - Should be faster on average when p
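
For discussion, the cache idea in option 2 can be sketched roughly as follows. This is Python rather than ovn-northd C, purely to keep it short, and all names (TunnelIdCache, run, min_key, max_key) are hypothetical. The key range is passed in by the caller rather than tied to the real OVN_MIN_IP_MULTICAST / OVN_MAX_IP_MULTICAST values, and groups are modelled as (datapath, address) tuples.

```python
class TunnelIdCache:
    """Remembers (datapath, group address) -> tunnel-id between runs."""

    def __init__(self, min_key, max_key):
        self.min_key = min_key
        self.max_key = max_key
        self.prev = {}  # mapping built by the previous run

    def run(self, groups):
        """Assign tunnel-ids for one pass over the IGMP groups.

        groups: iterable of (datapath, address) tuples, in whatever order
        the database returns them. Returns the mapping for this run and
        stores it for the next one.
        """
        groups = list(dict.fromkeys(groups))  # drop duplicates, keep order
        current = {}
        in_use = {}  # datapath -> set of tunnel-ids already taken

        # Pass 1: groups seen in the previous run keep their tunnel-id, so
        # row ordering cannot make an unchanged group move to a new id.
        for key in groups:
            if key in self.prev:
                current[key] = self.prev[key]
                in_use.setdefault(key[0], set()).add(self.prev[key])

        # Pass 2: only genuinely new groups get a fresh id, taken from
        # whatever is still free on their datapath.
        for key in groups:
            if key in current:
                continue
            taken = in_use.setdefault(key[0], set())
            free = next((k for k in range(self.min_key, self.max_key + 1)
                         if k not in taken), None)
            if free is None:
                continue  # key space exhausted on this datapath
            taken.add(free)
            current[key] = free

        self.prev = current
        return current
```

Doing the reuse pass before the allocation pass matters: otherwise a new group that happens to be walked first could grab the id that an existing group held in the previous run.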
Re: [ovs-dev] [PATCH v10 0/5] dpcls func ptrs & optimizations
> -Original Message- > From: Ilya Maximets [mailto:i.maxim...@samsung.com] > Sent: Thursday, July 11, 2019 3:14 PM > To: Van Haaren, Harry ; d...@openvswitch.org > Cc: malvika.gu...@arm.com; Stokes, Ian ; Michal Orsák > > Subject: Re: [PATCH v10 0/5] dpcls func ptrs & optimizations > > On 09.07.2019 15:34, Harry van Haaren wrote: > > Hey All, > > > > > > Here a v10 of the DPCLS Function Pointer patchset, as has been > > presented at OVS Conf in Nov '18, and discussed on the ML since then. > > I'm aware of the soft-freeze for 2.12, I feel this patchset has had > > enough reviews/versions/testing to be merged in 2.12. > > > > Thanks Ilya and Ian for review comments on v9, they should all be addressed > > in this v10. > > > > Thanks Malvika Gupta for testing (Tested-by tag added to patches) and also > > for reporting ARM performance gains, see here for details: > > https://mail.openvswitch.org/pipermail/ovs-dev/2019-June/360088.html > > > > > > Regards, -Harry > > Hi, Harry. > Thanks for working on this. My pleasure - it’s a nice part of OVS. And there's lots more to do :) > I performed some tests with this version in my usual PVP with bonded PHY > setup and here are some observations: > > * Bug that redirected packets to wrong rules is gone. At least I can't > catch it in my testing anymore. Assuming it's fixed now. > > * dpcls performance boost for 512B packets is around 12% in compare with > current master. Ah great! Glad to hear its giving you performance. > Few remarks about the test scenario: > All packets mostly goes through the NORMAL action with vlan push/pop. > Packets that goes from VM to balanced-tcp bonded PHY goes through > recirculation. 
Datapath flows for them looks like this: > > Before recirculation: > recirc_id=0,eth,ip,vlan_tci=0x/0x1fff,dl_src=aa:16:3e:24:30:dd,dl_dst=aa:b > b:cc:dd:ee:11,nw_frag=no > > After recirculation: > recirc_id=0x1,dp_hash=0xf5/0xff,eth,ip,dl_vlan=42,dl_vlan_pcp=0,nw_frag=no > > I have 256 flows in datapath for different 'dp_hash'es. > > So, even if the number of ipv4 flows is as high as 256K, I have about ~270 > datapath > flows in dpcls. (This gives a huge advantage to dpcls over EMC and SMC). Right - I'm a big fan of the consistent performance characteristic of DPCLS, which is due to its wildcarding capabilities and lack of caching concepts. > All the flows fits into 5+1 case, i.e. optimized function > dpcls_subtable_lookup_mf_u0w5_u1w1 used. > > Most interesting observation: > > * New version of dpcls lookup outperforms SMC in this setup even on > relatively small number of flows. With 8K flows dpcls faster than SMC > by 1.5% and by 5.7% with 256K flows. > Of course, SMC is 10% faster than dpcls with 8 flows, but it's not very > interesting because no-one can beat EMC in this area. > > I'd like to read the code more carefully tomorrow and probably give some > more feedback. > > Best regards, Ilya Maximets. Thanks for your comments - please do prioritize feedback ASAP, because as you know the 2.12 soft-freeze is already in effect. I'll work on Ian's comments on v10, but hold off sending v11 until there is some feedback from you too :) Thanks again, -Harry ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v10 0/5] dpcls func ptrs & optimizations
On 09.07.2019 15:34, Harry van Haaren wrote: > Hey All, > > > Here a v10 of the DPCLS Function Pointer patchset, as has been > presented at OVS Conf in Nov '18, and discussed on the ML since then. > I'm aware of the soft-freeze for 2.12, I feel this patchset has had > enough reviews/versions/testing to be merged in 2.12. > > Thanks Ilya and Ian for review comments on v9, they should all be addressed > in this v10. > > Thanks Malvika Gupta for testing (Tested-by tag added to patches) and also > for reporting ARM performance gains, see here for details: > https://mail.openvswitch.org/pipermail/ovs-dev/2019-June/360088.html > > > Regards, -Harry Hi, Harry. Thanks for working on this. I performed some tests with this version in my usual PVP with bonded PHY setup and here are some observations: * Bug that redirected packets to wrong rules is gone. At least I can't catch it in my testing anymore. Assuming it's fixed now. * dpcls performance boost for 512B packets is around 12% in compare with current master. Few remarks about the test scenario: All packets mostly goes through the NORMAL action with vlan push/pop. Packets that goes from VM to balanced-tcp bonded PHY goes through recirculation. Datapath flows for them looks like this: Before recirculation: recirc_id=0,eth,ip,vlan_tci=0x/0x1fff,dl_src=aa:16:3e:24:30:dd,dl_dst=aa:bb:cc:dd:ee:11,nw_frag=no After recirculation: recirc_id=0x1,dp_hash=0xf5/0xff,eth,ip,dl_vlan=42,dl_vlan_pcp=0,nw_frag=no I have 256 flows in datapath for different 'dp_hash'es. So, even if the number of ipv4 flows is as high as 256K, I have about ~270 datapath flows in dpcls. (This gives a huge advantage to dpcls over EMC and SMC). All the flows fits into 5+1 case, i.e. optimized function dpcls_subtable_lookup_mf_u0w5_u1w1 used. Most interesting observation: * New version of dpcls lookup outperforms SMC in this setup even on relatively small number of flows. With 8K flows dpcls faster than SMC by 1.5% and by 5.7% with 256K flows. 
Of course, SMC is 10% faster than dpcls with 8 flows, but it's not very interesting because no-one can beat EMC in this area. I'd like to read the code more carefully tomorrow and probably give some more feedback. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCHv15 2/2] netdev-afxdp: add new netdev type for AF_XDP.
On 09.07.2019 22:35, William Tu wrote: > The patch introduces experimental AF_XDP support for OVS netdev. > AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket > type built upon the eBPF and XDP technology. It is aims to have comparable > performance to DPDK but cooperate better with existing kernel's networking > stack. An AF_XDP socket receives and sends packets from an eBPF/XDP program > attached to the netdev, by-passing a couple of Linux kernel's subsystems > As a result, AF_XDP socket shows much better performance than AF_PACKET > For more details about AF_XDP, please see linux kernel's > Documentation/networking/af_xdp.rst. Note that by default, this feature is > not compiled in. > > Signed-off-by: William Tu > --- > v14: > * Mainly address issue reported by Ilya > > https://protect2.fireeye.com/url?k=0b6c291c248670fb.0b6da253-6021601b254970fd&u=https://patchwork.ozlabs.org/patch/1118972/ >when doing 'make check-afxdp' > * Fix xdp frame headroom issue > * Fix vlan test cases by disabling txvlan offload > * Skip cvlan > * Document TCP limitation (currently all tcp tests fail due to >kernel veth driver) > * Fix tunnel test cases due to --disable-system (another patch) > * Switch to use pthread_spin_lock, suggested by Ben > * Add coverage counter for debugging > * Fix buffer starvation issue at batch_send reported by Eelco >when using tap device with type=afxdp > > v15: > * address review feedback from Ilay > > https://protect2.fireeye.com/url?k=ceb755d3074c79a5.ceb6de9c-b1b2f6a490a479b8&u=https://patchwork.ozlabs.org/patch/1125476/ > * skip TCP related test cases > * reclaim all CONS_NUM_DESC at complete tx > * add retries to kick_tx > * increase memory pool size > * remove redundant xdp flag and bind flag > * remove unused rx_dropped var > * make tx_dropped counter atomic > * refactor dp_packet_init_afxdp using dp_packet_init__ > * rebase to ovs master, test with latest bpf-next kernel commit > b14a260e33ddb4 >Ilya's kernel patches are 
required >commit 455302d1c9ae ("xdp: fix hang while unregistering device bound to > xdp socket") >commit 162c820ed896 ("xdp: hold device for umem regardless of zero-copy > mode") > Possible issues: > * still lots of afxdp_cq_skip (ovs-appctl coverage/show) > afxdp_cq_skip 44325273.6/sec 34362312.683/sec 572705.2114/sec total: > 2106010377 > * TODO: >'make check-afxdp' still not all pass >IP fragmentation expiry test not fix yet, need to implement >deferral memory free, s.t like dpdk_mp_sweep. Currently hit >some missing umem descs when reclaiming. Hi. Regarding this issue: We don't need to reclaim everything from the rings. We only need to count the number of descriptors that are currently in the rings. When we close the XDP socket, the kernel stops processing the rings, and all the buffers in the rings are buffers from the current umem. So, we could just count them and wait for the number of elements in the umem pool to become (size - n_packets_in_rings). 'outstanding_tx' already counts all the packets that are in TX and CQ or in the middle of processing in the kernel. If we count the packets in RX and FQ the same way, we'll know the total number of buffers currently in the kernel. It might be hard or even impossible to reclaim all the packets from the rings, because the kernel does not update the consumer/producer heads for every packet, and the state the rings are left in after the socket is closed depends on the kernel implementation.
Suggesting following incremental that works for me: diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c index fe9d5300a..4b9262189 100644 --- a/lib/netdev-afxdp.c +++ b/lib/netdev-afxdp.c @@ -34,7 +34,9 @@ #include "coverage.h" #include "dp-packet.h" #include "dpif-netdev.h" +#include "fatal-signal.h" #include "openvswitch/dynamic-string.h" +#include "openvswitch/list.h" #include "openvswitch/vlog.h" #include "packets.h" #include "socket-util.h" @@ -65,6 +67,57 @@ static void xsk_destroy(struct xsk_socket_info *xsk); static int xsk_configure_all(struct netdev *netdev); static void xsk_destroy_all(struct netdev *netdev); +struct unused_pool { +struct xsk_umem_info *umem_info; +int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */ +struct ovs_list list_node; +}; + +static struct ovs_mutex unused_pools_mutex = OVS_MUTEX_INITIALIZER; +static struct ovs_list unused_pools OVS_GUARDED_BY(unused_pools_mutex) = +OVS_LIST_INITIALIZER(&unused_pools); + +static void +netdev_afxdp_cleanup_unused_pool(struct unused_pool *pool) +{ +/* free the packet buffer */ +free_pagealign(pool->umem_info->buffer); + +/* cleanup umem pool */ +umem_pool_cleanup(&pool->umem_info->mpool); + +/* cleanup metadata pool */ +xpacket_pool_cleanup(&pool->umem_info->xpool); + +free(pool->umem_info); +} + +static void +netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED) +{ +struct unused_pool *pool, *next; +unsi
Re: [ovs-dev] [PATCH] Shutdown SSL connection before closing socket
Sorry about that. The dangers of multiple windows and multiple ovs directories. "Why is this passing for me?!" Oh... The new patch just ignores all SSL errors like lib/stream-ssl.c's ssl_close() instead of just the want read/write. On Wed, Jul 10, 2019 at 2:59 PM Ben Pfaff wrote: > On Wed, Jul 10, 2019 at 11:07:16AM -0500, Terry Wilson wrote: > > Without shutting down the SSL connection, log messages like: > > > > stream_ssl|WARN|SSL_read: unexpected SSL connection close > > jsonrpc|WARN|ssl:127.0.0.1:47052: receive error: Protocol error > > reconnect|WARN|ssl:127.0.0.1:47052: connection dropped (Protocol error) > > > > would occur whenever the socket is closed. This just adds an > > SSLStream.close() that calls shutdown() and ignores read/write > > errors. > > > > Signed-off-by: Terry Wilson > > Thanks for the patch. > > With this applied, I get two test failures, details below. > > ## ## > ## Summary of the failures. ## > ## ## > Failed tests: > openvswitch 2.11.90 test suite test groups: > > NUM: FILE-NAME:LINE TEST-GROUP-NAME > KEYWORDS > > 2108: ovsdb-idl.at:351 simple idl, initially empty, various ops - > Python2 - SSL > ovsdb server idl positive python with ssl socket > 2439: ovsdb-idl.at:1452 simple idl verify notify - Python2 - SSL > ovsdb server idl positive python with ssl socket notify > > ## -- ## > ## Detailed failed tests. ## > ## -- ## > > # -*- compilation -*- > 2108. ovsdb-idl.at:351: testing simple idl, initially empty, various ops > - Python2 - SSL ... 
> ../../tests/ovsdb-idl.at:351: ovsdb-tool create db > $abs_srcdir/idltest.ovsschema > stderr: > stdout: > ../../tests/ovsdb-idl.at:351: ovsdb-server -vconsole:warn --log-file > --detach --no-chdir \ > --pidfile \ > --private-key=$PKIDIR/testpki-privkey2.pem \ > --certificate=$PKIDIR/testpki-cert2.pem \ > --ca-cert=$PKIDIR/testpki-cacert.pem \ > --remote=pssl:0:127.0.0.1 db > ovsdb-idl.at:351: waiting until TCP_PORT=`sed -n 's/.*0:.*: listening on > port \([0-9]*\)$/\1/p' "ovsdb-server.log"` && test X != X"$TCP_PORT"... > ovsdb-idl.at:351: wait succeeded immediately > ../../tests/ovsdb-idl.at:351: $PYTHON $srcdir/test-ovsdb.py -t10 idl > $srcdir/idltest.ovsschema \ > ssl:127.0.0.1:$TCP_PORT $PKIDIR/testpki-privkey.pem \ > $PKIDIR/testpki-cert.pem $PKIDIR/testpki-cacert.pem > '["idltest", > {"op": "insert", >"table": "simple", >"row": {"i": 1, >"r": 2.0, >"b": true, >"s": "mystring", >"u": ["uuid", "84f5c8f5-ac76-4dbc-a24f-8860eb407fc1"], >"ia": ["set", [1, 2, 3]], >"ra": ["set", [-0.5]], >"ba": ["set", [true]], >"sa": ["set", ["abc", "def"]], >"ua": ["set", [["uuid", > "69443985-7806-45e2-b35f-574a04e720f9"], > ["uuid", > "aad11ef0-816a-4b01-93e6-03b8b4256b98"]]]}}, > {"op": "insert", >"table": "simple", >"row": {}}]' \ > '["idltest", > {"op": "update", >"table": "simple", >"where": [], >"row": {"b": true}}]' \ > '["idltest", > {"op": "update", >"table": "simple", >"where": [], >"row": {"r": 123.5}}]' \ > '["idltest", > {"op": "insert", >"table": "simple", >"row": {"i": -1, >"r": 125, >"b": false, >"s": "", >"ia": ["set", [1]], >"ra": ["set", [1.5]], >"ba": ["set", [false]], >"sa": ["set", []], >"ua": ["set", []]}}]' \ > '["idltest", > {"op": "update", >"table": "simple", >"where": [["i", "<", 1]], >"row": {"s": "newstring"}}]' \ > '["idltest", > {"op": "delete", >"table": "simple", >"where": [["i", "==", 0]]}]' \ > 'reconnect' > stderr: > 2019-07-10T19:57:50Z | 0 | reconnect | DBG | ssl:127.0.0.1:38627: > entering BACKOFF > 2019-07-10T19:57:50Z | 1 | poller | DBG 
| [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 2 | poller | DBG | [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 3 | poller | DBG | [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 4 | poller | DBG | [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 5 | poller | DBG | [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 6 | poller | DBG | [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 7 | poller | DBG | [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 8 | poller | DBG | [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 9 | poller | DBG | [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 10 | poller | DBG | [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 11 | poller | DBG | [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 12 | poller | DBG | [POLLOUT] on fd 3 > 2019-07-10T19:57:50Z | 13
Re: [ovs-dev] [PATCH V2 1/1] dpif-netlink: Log eth type 0x1234 not offloadable
On 2019-07-03 9:11 PM, Ben Pfaff wrote: > On Wed, Jul 03, 2019 at 04:58:06AM +, Eli Britstein wrote: >> Ethernet type 0x1234 is used for testing and not being offloadable. For >> testing offloadable features, log about using this value. >> >> Signed-off-by: Eli Britstein >> Acked-by: Roi Dayan >> Signed-off-by: Eli Britstein > > Acked-by: Ben Pfaff > ping. can we merge this? thanks ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2 1/1] netdev-vport: Make ip6gre netdev type to use TC rules
On 2019-07-04 10:36 AM, Eli Britstein wrote: > The offload api functions already assigned to every tunnel class. > For ip6gre tunnel class only need to also assign the get_ifindex > function, similarly as done in commit 5e63eaa969a3 ("netdev-vport: Make > gre netdev type to use TC rules"). > > Signed-off-by: Eli Britstein > Reviewed-by: Roi Dayan > --- > lib/netdev-vport.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c > index 92a256af1..b57d21ff8 100644 > --- a/lib/netdev-vport.c > +++ b/lib/netdev-vport.c > @@ -1218,7 +1218,8 @@ netdev_vport_tunnel_register(void) >.type = "ip6gre", >.build_header = netdev_gre_build_header, >.push_header = netdev_gre_push_header, > - .pop_header = netdev_gre_pop_header > + .pop_header = netdev_gre_pop_header, > + .get_ifindex = NETDEV_VPORT_GET_IFINDEX, >}, >{{NULL, NULL, 0, 0}} > }, > ping. can we merge this? thanks ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v2] Shutdown SSL connection before closing socket
Without shutting down the SSL connection, log messages like:

    stream_ssl|WARN|SSL_read: unexpected SSL connection close
    jsonrpc|WARN|ssl:127.0.0.1:47052: receive error: Protocol error
    reconnect|WARN|ssl:127.0.0.1:47052: connection dropped (Protocol error)

would occur whenever the socket is closed. This just adds an SSLStream.close() that calls shutdown() and ignores SSL errors, the same way that lib/stream-ssl.c does in ssl_close().

Signed-off-by: Terry Wilson
---
 python/ovs/stream.py | 8
 1 file changed, 8 insertions(+)

diff --git a/python/ovs/stream.py b/python/ovs/stream.py
index c15be4b..a98057e 100644
--- a/python/ovs/stream.py
+++ b/python/ovs/stream.py
@@ -825,6 +825,14 @@ class SSLStream(Stream):
         except SSL.SysCallError as e:
             return -ovs.socket_util.get_exception_errno(e)
 
+    def close(self):
+        if self.socket:
+            try:
+                self.socket.shutdown()
+            except SSL.Error:
+                pass
+        return super(SSLStream, self).close()
+
 
 if SSL:
     # Register SSL only if the OpenSSL module is available
--
1.8.3.1
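
The pattern the patch relies on (attempt an orderly TLS shutdown, swallow any SSL-level error, then always close) can be illustrated in isolation. The sketch below deliberately avoids importing pyOpenSSL so that it runs anywhere: SSLError and FakeConnection are stand-ins invented for this example, not the real OpenSSL.SSL.Error or OpenSSL.SSL.Connection, and close_quietly is a hypothetical helper name.

```python
class SSLError(Exception):
    """Stand-in for OpenSSL.SSL.Error so the sketch runs without pyOpenSSL."""


class FakeConnection:
    """Stand-in for an SSL connection; records which methods were called."""

    def __init__(self, fail_shutdown):
        self.fail_shutdown = fail_shutdown
        self.calls = []

    def shutdown(self):
        self.calls.append("shutdown")
        if self.fail_shutdown:
            # E.g. the peer has already gone away.
            raise SSLError("shutdown failed")

    def close(self):
        self.calls.append("close")


def close_quietly(conn):
    """Best-effort TLS shutdown followed by an unconditional close."""
    if conn is None:
        return
    try:
        conn.shutdown()
    except SSLError:
        # A failed shutdown must not prevent the socket from being closed.
        pass
    conn.close()
```

Catching only the SSL error class, rather than a bare except, keeps unrelated programming errors visible while still guaranteeing that close() runs.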
Re: [ovs-dev] OVN SFC Support
On Thu, Jul 11, 2019 at 12:26 PM Sood, Ritu wrote: > Hi > Is there any plan to integrate SFC support in OVN? > There was some work done and there are some demos/presentations like one > below but it seems like there is no recent activity: > http://www.openvswitch.org/support/ovscon2016/7/1400-fourie.pdf > Is there any repo being maintained for SFC patches? > https://github.com/doonhammer/ovs repo has the patches but doesn't seem > to be synced with the mainline code currently. > I am not sure if someone is actively working on it. If someone wants to pick up those patches, that would be great :) Thanks Numan > Regards, > -Ritu
[ovs-dev] [PATCH v3 3/3] OVN: Add ovn-northd IGMP support
New IP Multicast Snooping Options are added to the Northbound DB Logical_Switch:other_config column. These allow enabling IGMP snooping and querier on the logical switch and get translated by ovn-northd to rows in the IP_Multicast Southbound DB table. ovn-northd monitors for changes done by ovn-controllers in the Southbound DB IGMP_Group table. Based on the entries in IGMP_Group ovn-northd creates Multicast_Group entries in the Southbound DB, one per IGMP_Group address X, containing the list of logical switch ports (aggregated from all controllers) that have IGMP_Group entries for that datapath and address X. ovn-northd also creates a logical flow that matches on IP multicast traffic destined to address X and outputs it on the tunnel key of the corresponding Multicast_Group entry. Signed-off-by: Dumitru Ceara Acked-by: Mark Michelson --- ovn/northd/ovn-northd.c | 460 --- ovn/ovn-nb.xml | 54 ++ tests/ovn.at| 270 tests/system-ovn.at | 119 4 files changed, 871 insertions(+), 32 deletions(-) diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index ce382ac..2b71526 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -29,6 +29,7 @@ #include "openvswitch/json.h" #include "ovn/lex.h" #include "ovn/lib/chassis-index.h" +#include "ovn/lib/ip-mcast-index.h" #include "ovn/lib/ovn-l7.h" #include "ovn/lib/ovn-nb-idl.h" #include "ovn/lib/ovn-sb-idl.h" @@ -57,6 +58,7 @@ struct northd_context { struct ovsdb_idl_txn *ovnnb_txn; struct ovsdb_idl_txn *ovnsb_txn; struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name; +struct ovsdb_idl_index *sbrec_ip_mcast_by_dp; }; static const char *ovnnb_db; @@ -424,6 +426,33 @@ struct ipam_info { bool mac_only; }; +#define OVN_MIN_MULTICAST 32768 +#define OVN_MAX_MULTICAST OVN_MCAST_FLOOD_TUNNEL_KEY +BUILD_ASSERT_DECL(OVN_MIN_MULTICAST < OVN_MAX_MULTICAST); + +#define OVN_MIN_IP_MULTICAST OVN_MIN_MULTICAST +#define OVN_MAX_IP_MULTICAST (OVN_MCAST_UNKNOWN_TUNNEL_KEY - 1) +BUILD_ASSERT_DECL(OVN_MAX_IP_MULTICAST >= 
OVN_MIN_MULTICAST); + +/* + * Multicast snooping and querier per datapath configuration. + */ +struct mcast_info { +bool enabled; +bool querier; +bool flood_unregistered; + +int64_t table_size; +int64_t idle_timeout; +int64_t query_interval; +char *eth_src; +char *ipv4_src; +int64_t query_max_response; + +uint32_t group_key_next; +uint32_t active_flows; +}; + /* The 'key' comes from nbs->header_.uuid or nbr->header_.uuid or * sb->external_ids:logical-switch. */ struct ovn_datapath { @@ -448,6 +477,9 @@ struct ovn_datapath { /* IPAM data. */ struct ipam_info ipam_info; +/* Multicast data. */ +struct mcast_info mcast_info; + /* OVN northd only needs to know about the logical router gateway port for * NAT on a distributed router. This "distributed gateway port" is * populated only when there is a "redirect-chassis" specified for one of @@ -522,6 +554,8 @@ ovn_datapath_destroy(struct hmap *datapaths, struct ovn_datapath *od) hmap_remove(datapaths, &od->key_node); destroy_tnlids(&od->port_tnlids); bitmap_free(od->ipam_info.allocated_ipv4s); +free(od->mcast_info.eth_src); +free(od->mcast_info.ipv4_src); free(od->router_ports); ovn_ls_port_group_destroy(&od->nb_pgs); free(od); @@ -659,6 +693,85 @@ init_ipam_info_for_datapath(struct ovn_datapath *od) } static void +init_mcast_info_for_datapath(struct ovn_datapath *od) +{ +if (!od->nbs) { +return; +} + +struct mcast_info *mcast_info = &od->mcast_info; + +mcast_info->enabled = +smap_get_bool(&od->nbs->other_config, "mcast_snoop", false); +mcast_info->querier = +smap_get_bool(&od->nbs->other_config, "mcast_querier", true); +mcast_info->flood_unregistered = +smap_get_bool(&od->nbs->other_config, "mcast_flood_unregistered", + false); + +mcast_info->table_size = +smap_get_ullong(&od->nbs->other_config, "mcast_table_size", +OVN_MCAST_DEFAULT_MAX_ENTRIES); + +uint32_t idle_timeout = +smap_get_ullong(&od->nbs->other_config, "mcast_idle_timeout", +OVN_MCAST_DEFAULT_IDLE_TIMEOUT_S); +if (idle_timeout < OVN_MCAST_MIN_IDLE_TIMEOUT_S) { 
+idle_timeout = OVN_MCAST_MIN_IDLE_TIMEOUT_S; +} else if (idle_timeout > OVN_MCAST_MAX_IDLE_TIMEOUT_S) { +idle_timeout = OVN_MCAST_MAX_IDLE_TIMEOUT_S; +} +mcast_info->idle_timeout = idle_timeout; + +uint32_t query_interval = +smap_get_ullong(&od->nbs->other_config, "mcast_query_interval", +mcast_info->idle_timeout / 2); +if (query_interval < OVN_MCAST_MIN_QUERY_INTERVAL_S) { +query_interval = OVN_MCAST_MIN_QUERY_INTERVAL_S; +} else if (query
[ovs-dev] [PATCH v3 2/3] OVN: Add IGMP SB definitions and ovn-controller support
A new IP_Multicast table is added to Southbound DB. This table stores the
multicast related configuration for each datapath. Each row will be
populated by ovn-northd and will control:
- if IGMP Snooping is enabled or not, the snooping table size and multicast
  group idle timeout.
- if IGMP Querier is enabled or not (only if snooping is enabled too), query
  interval, query source addresses (Ethernet and IP) and the max-response
  field to be stored in outgoing queries.
- an additional "seq_no" column is added such that ovn-sbctl or if needed a
  CMS can flush currently learned groups. This can be achieved by
  incrementing the "seq_no" value.

A new IGMP_Group table is added to Southbound DB. This table stores all the
multicast groups learned by ovn-controllers. The table is indexed by
datapath, group address and chassis. For a learned multicast group on a
specific datapath each ovn-controller will store its own row in this table.
Each row contains the list of chassis-local ports on which the group was
learned. Rows in the IGMP_Group table are updated or deleted only by the
ovn-controllers that created them.

A new action ("igmp") is added to punt IGMP packets on a specific logical
switch datapath to ovn-controller if IGMP snooping is enabled.

Per datapath IGMP multicast snooping support is added to pinctrl:
- incoming IGMP reports are processed and multicast groups are maintained
  (using the OVS mcast-snooping library).
- each OVN controller syncs its in-memory IGMP groups to the Southbound DB
  in the IGMP_Group table.
- pinctrl also sends periodic IGMPv3 general queries for all datapaths where
  querier is enabled.

Signed-off-by: Mark Michelson Co-authored-by: Mark Michelson Signed-off-by: Dumitru Ceara Acked-by: Mark Michelson --- include/ovn/actions.h |7 ovn/controller/automake.mk |2 ovn/controller/ip-mcast.c | 164 ovn/controller/ip-mcast.h | 52 +++ ovn/controller/ovn-controller.c | 23 + ovn/controller/pinctrl.c| 786 +++ ovn/controller/pinctrl.h|2 ovn/lib/actions.c | 16 + ovn/lib/automake.mk |2 ovn/lib/ip-mcast-index.c| 40 ++ ovn/lib/ip-mcast-index.h| 39 ++ ovn/lib/logical-fields.c|2 ovn/ovn-sb.ovsschema| 43 ++ ovn/ovn-sb.xml | 80 ovn/utilities/ovn-sbctl.c | 53 +++ ovn/utilities/ovn-trace.c |4 tests/ovn.at|4 17 files changed, 1314 insertions(+), 5 deletions(-) create mode 100644 ovn/controller/ip-mcast.c create mode 100644 ovn/controller/ip-mcast.h create mode 100644 ovn/lib/ip-mcast-index.c create mode 100644 ovn/lib/ip-mcast-index.h diff --git a/include/ovn/actions.h b/include/ovn/actions.h index f42bbc2..fe19424 100644 --- a/include/ovn/actions.h +++ b/include/ovn/actions.h @@ -67,6 +67,7 @@ struct ovn_extend_table; OVNACT(ICMP4, ovnact_nest)\ OVNACT(ICMP4_ERROR, ovnact_nest)\ OVNACT(ICMP6, ovnact_nest)\ +OVNACT(IGMP, ovnact_null)\ OVNACT(TCP_RESET, ovnact_nest)\ OVNACT(ND_NA, ovnact_nest)\ OVNACT(ND_NA_ROUTER, ovnact_nest)\ @@ -486,6 +487,12 @@ enum action_opcode { * The actions, in OpenFlow 1.3 format, follow the action_header. */ ACTION_OPCODE_ICMP4_ERROR, + +/* "igmp()". + * + * Snoop IGMP, learn the multicast participants + */ +ACTION_OPCODE_IGMP, }; /* Header. 
*/ diff --git a/ovn/controller/automake.mk b/ovn/controller/automake.mk index fcdf7a4..193ea69 100644 --- a/ovn/controller/automake.mk +++ b/ovn/controller/automake.mk @@ -10,6 +10,8 @@ ovn_controller_ovn_controller_SOURCES = \ ovn/controller/encaps.h \ ovn/controller/ha-chassis.c \ ovn/controller/ha-chassis.h \ + ovn/controller/ip-mcast.c \ + ovn/controller/ip-mcast.h \ ovn/controller/lflow.c \ ovn/controller/lflow.h \ ovn/controller/lport.c \ diff --git a/ovn/controller/ip-mcast.c b/ovn/controller/ip-mcast.c new file mode 100644 index 000..ef36be2 --- /dev/null +++ b/ovn/controller/ip-mcast.c @@ -0,0 +1,164 @@ +/* Copyright (c) 2019, Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include + +#include "ip-mcast.h" +#include "lport.
[ovs-dev] [PATCH v3 1/3] packets: Add IGMPv3 query packet definitions
Signed-off-by: Dumitru Ceara
Acked-by: Mark Michelson
---
 lib/packets.c | 44
 lib/packets.h | 19
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/lib/packets.c b/lib/packets.c
index a8fd61f..ab0b1a3 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -1281,6 +1281,50 @@ packet_set_icmp(struct dp_packet *packet, uint8_t type, uint8_t code)
     }
 }
 
+/* Sets the IGMP type to IGMP_HOST_MEMBERSHIP_QUERY and populates the
+ * v3 query header fields in 'packet'. 'packet' must be a valid IGMPv3
+ * query packet with its l4 offset properly populated.
+ */
+void
+packet_set_igmp3_query(struct dp_packet *packet, uint8_t max_resp,
+                       ovs_be32 group, bool srs, uint8_t qrv, uint8_t qqic)
+{
+    struct igmpv3_query_header *igh = dp_packet_l4(packet);
+    ovs_be16 orig_type_max_resp =
+        htons(igh->type << 8 | igh->max_resp);
+    ovs_be16 new_type_max_resp =
+        htons(IGMP_HOST_MEMBERSHIP_QUERY << 8 | max_resp);
+
+    if (orig_type_max_resp != new_type_max_resp) {
+        igh->type = IGMP_HOST_MEMBERSHIP_QUERY;
+        igh->max_resp = max_resp;
+        igh->csum = recalc_csum16(igh->csum, orig_type_max_resp,
+                                  new_type_max_resp);
+    }
+
+    ovs_be32 old_group = get_16aligned_be32(&igh->group);
+
+    if (old_group != group) {
+        put_16aligned_be32(&igh->group, group);
+        igh->csum = recalc_csum32(igh->csum, old_group, group);
+    }
+
+    /* See RFC 3376 4.1.6. */
+    if (qrv > 7) {
+        qrv = 0;
+    }
+
+    ovs_be16 orig_srs_qrv_qqic = htons(igh->srs_qrv << 8 | igh->qqic);
+    ovs_be16 new_srs_qrv_qqic = htons(srs << 11 | qrv << 8 | qqic);
+
+    if (orig_srs_qrv_qqic != new_srs_qrv_qqic) {
+        igh->srs_qrv = (srs << 3 | qrv);
+        igh->qqic = qqic;
+        igh->csum = recalc_csum16(igh->csum, orig_srs_qrv_qqic,
+                                  new_srs_qrv_qqic);
+    }
+}
+
 void
 packet_set_nd_ext(struct dp_packet *packet, const ovs_16aligned_be32 rso_flags,
                   const uint8_t opt_type)
diff --git a/lib/packets.h b/lib/packets.h
index d293b35..4124490 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -681,6 +681,7 @@ char *ip_parse_cidr_len(const char *s, int *n, ovs_be32 *ip,
 #define IP_ECN_ECT_0 0x02
 #define IP_ECN_CE 0x03
 #define IP_ECN_MASK 0x03
+#define IP_DSCP_CS6 0xc0
 #define IP_DSCP_MASK 0xfc
 
 static inline int
@@ -763,6 +764,20 @@ struct igmpv3_header {
 };
 BUILD_ASSERT_DECL(IGMPV3_HEADER_LEN == sizeof(struct igmpv3_header));
 
+#define IGMPV3_QUERY_HEADER_LEN 12
+struct igmpv3_query_header {
+    uint8_t type;
+    uint8_t max_resp;
+    ovs_be16 csum;
+    ovs_16aligned_be32 group;
+    uint8_t srs_qrv;
+    uint8_t qqic;
+    ovs_be16 nsrcs;
+};
+BUILD_ASSERT_DECL(
+    IGMPV3_QUERY_HEADER_LEN == sizeof(struct igmpv3_query_header
+));
+
 #define IGMPV3_RECORD_LEN 8
 struct igmpv3_record {
     uint8_t type;
@@ -1543,7 +1558,9 @@ void packet_set_nd(struct dp_packet *, const struct in6_addr *target,
 void packet_set_nd_ext(struct dp_packet *packet,
                        const ovs_16aligned_be32 rso_flags,
                        const uint8_t opt_type);
-
+void packet_set_igmp3_query(struct dp_packet *, uint8_t max_resp,
+                            ovs_be32 group, bool srs, uint8_t qrv,
+                            uint8_t qqic);
 void packet_format_tcp_flags(struct ds *, uint16_t);
 const char *packet_tcp_flag_to_string(uint32_t flag);
 void compose_arp__(struct dp_packet *);
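
[Editorial sketch, not part of the patch.] The function above relies on two ideas: packing the S flag and QRV into a single byte (with QRV values above 7 sent as 0, per RFC 3376 section 4.1.6), and patching the 16-bit Internet checksum incrementally when one 16-bit word of the header changes. The standalone C below illustrates both under stated assumptions. The names pack_srs_qrv, csum_full, csum_update16 and set_srs_qrv_qqic are hypothetical and are not OVS APIs (the patch itself uses recalc_csum16()), and the query is held as a raw 12-byte array rather than as struct igmpv3_query_header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack the S flag and QRV into the srs_qrv byte.
 * A QRV above 7 is transmitted as 0 (RFC 3376, section 4.1.6). */
static uint8_t
pack_srs_qrv(int srs, uint8_t qrv)
{
    if (qrv > 7) {
        qrv = 0;
    }
    return (uint8_t) ((srs ? 1 : 0) << 3 | qrv);
}

/* Full Internet checksum over 'len' bytes; 'len' must be even. */
static uint16_t
csum_full(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2) {
        sum += (uint32_t) (p[i] << 8 | p[i + 1]);
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Incremental update in one's complement arithmetic:
 * HC' = ~(~HC + ~m + m'), where m is the old word and m' the new one. */
static uint16_t
csum_update16(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t) ~old_csum;

    sum += (uint16_t) ~old_word;
    sum += new_word;
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Rewrites bytes 8-9 (srs_qrv, qqic) of a 12-byte query held in 'q' and
 * patches the checksum in bytes 2-3 without re-summing the whole header. */
static void
set_srs_qrv_qqic(uint8_t q[12], int srs, uint8_t qrv, uint8_t qqic)
{
    uint16_t old_word = (uint16_t) (q[8] << 8 | q[9]);
    uint16_t old_csum = (uint16_t) (q[2] << 8 | q[3]);
    uint16_t new_word = (uint16_t) (pack_srs_qrv(srs, qrv) << 8 | qqic);
    uint16_t new_csum = csum_update16(old_csum, old_word, new_word);

    q[8] = (uint8_t) (new_word >> 8);
    q[9] = (uint8_t) new_word;
    q[2] = (uint8_t) (new_csum >> 8);
    q[3] = (uint8_t) new_csum;
}
```

A header whose checksum field is correct sums to zero under csum_full(), so one way to check the incremental path is to confirm that csum_full() still returns 0 after set_srs_qrv_qqic() has modified the buffer.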
[ovs-dev] [PATCH v3 0/3] OVN: Add IGMP support
This series introduces support for IGMP Snooping and IGMP Querier. IGMP
versions v1-v3 are supported for snooping and IGMP queries originated by
ovn-controller are general IGMPv3 queries. The rationale behind using v3 for
querier is that it's backward compatible with v1-v2.

The majority of the code is IP version independent with the thought in mind
that support for MLD snooping for IPv6 will be added next.

Dumitru Ceara (3):
  packets: Add IGMPv3 query packet definitions
  OVN: Add IGMP SB definitions and ovn-controller support
  OVN: Add ovn-northd IGMP support

 include/ovn/actions.h           |   7
 lib/packets.c                   |  44
 lib/packets.h                   |  19
 ovn/controller/automake.mk      |   2
 ovn/controller/ip-mcast.c       | 164
 ovn/controller/ip-mcast.h       |  52
 ovn/controller/ovn-controller.c |  23
 ovn/controller/pinctrl.c        | 786
 ovn/controller/pinctrl.h        |   2
 ovn/lib/actions.c               |  16
 ovn/lib/automake.mk             |   2
 ovn/lib/ip-mcast-index.c        |  40
 ovn/lib/ip-mcast-index.h        |  39
 ovn/lib/logical-fields.c        |   2
 ovn/northd/ovn-northd.c         | 460
 ovn/ovn-nb.xml                  |  54
 ovn/ovn-sb.ovsschema            |  43
 ovn/ovn-sb.xml                  |  80
 ovn/utilities/ovn-sbctl.c       |  53
 ovn/utilities/ovn-trace.c       |   4
 tests/ovn.at                    | 274
 tests/system-ovn.at             | 119
 22 files changed, 2247 insertions(+), 38 deletions(-)
 create mode 100644 ovn/controller/ip-mcast.c
 create mode 100644 ovn/controller/ip-mcast.h
 create mode 100644 ovn/lib/ip-mcast-index.c
 create mode 100644 ovn/lib/ip-mcast-index.h

---
v3:
- add acks from Mark Michelson
- fix action "igmp": no need for nested actions
- fix overlap between unknown multicast tunnel key and IP multicast tunnel
  keys
v2:
- address reviewer comments.
- fix a memory corruption when reallocating multicast ports in
  ovn_multicast_add_ports.
- add missing NULL checks in pinctrl.c for the case when a logical switch
  configuration gets deleted by ovn-northd and controllers are updating
  IGMP_Group entries.
- Fix allocation of multicast group IDs in ovn-northd. The multicast group
  IDs were allocated globally instead of per-datapath which was limiting
  the total number of IGMP Groups. At most 32K IGMP groups can be learned
  per datapath regardless of how many datapaths are configured.
- add system-ovn.at test.
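
[Editorial sketch, not part of the series.] The v2 changelog item about allocating multicast group IDs per datapath instead of globally can be illustrated with a small allocator. This is not the ovn-northd implementation: struct ex_datapath, the ex_* functions and the key range are all assumptions made for illustration, and the range is deliberately tiny (four keys) so that exhaustion is easy to see, whereas the series describes a range of about 32K keys per datapath.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed, illustrative key range (4 keys); not the real OVN values. */
#define EX_KEY_MIN 100u
#define EX_KEY_MAX 103u
#define EX_KEY_COUNT (EX_KEY_MAX - EX_KEY_MIN + 1)

/* Hypothetical per-datapath state: each datapath owns its own key space. */
struct ex_datapath {
    uint8_t used[EX_KEY_COUNT];  /* 1 if key (EX_KEY_MIN + i) is taken. */
    uint32_t next;               /* Index to try first on next allocation. */
};

static void
ex_datapath_init(struct ex_datapath *dp)
{
    memset(dp, 0, sizeof *dp);
}

/* Returns a free key for 'dp', or 0 if every key in the range is in use. */
static uint32_t
ex_alloc_key(struct ex_datapath *dp)
{
    for (uint32_t n = 0; n < EX_KEY_COUNT; n++) {
        uint32_t i = (dp->next + n) % EX_KEY_COUNT;

        if (!dp->used[i]) {
            dp->used[i] = 1;
            dp->next = (i + 1) % EX_KEY_COUNT;
            return EX_KEY_MIN + i;
        }
    }
    return 0;
}

/* Releases 'key' so that a later allocation on 'dp' can reuse it. */
static void
ex_free_key(struct ex_datapath *dp, uint32_t key)
{
    assert(key >= EX_KEY_MIN && key <= EX_KEY_MAX);
    dp->used[key - EX_KEY_MIN] = 0;
}
```

Because the bookkeeping lives in each ex_datapath, two datapaths can both hand out key 100, and filling one datapath's range does not reduce what another can allocate. With a single global allocator the same range would be shared by every datapath, which is the limitation the changelog describes.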
Re: [ovs-dev] [PATCH v2 2/3] OVN: Add IGMP SB definitions and ovn-controller support
On Wed, Jul 10, 2019 at 10:44 PM Ben Pfaff wrote: > > On Tue, Jul 09, 2019 at 02:49:18PM +0200, Dumitru Ceara wrote: > > A new action ("igmp") is added to punt IGMP packets on a specific logical > > switch datapath to ovn-controller if IGMP snooping is enabled. > > I do not fully understand the new action. It is defined to take a set > of nested actions inside {}, but I don't see what it actually does with > those actions. I do not think that it executes them, and the > documentation does not mention them. Maybe this means that it is a > mistake to have it take them at all? You're right, it's a mistake. It should be "igmp;" with no nested actions. It will be fixed in v3. Thanks, Dumitru
Re: [ovs-dev] [PATCH v2 0/3] OVN: Add IGMP support
On Wed, Jul 10, 2019 at 9:56 PM Mark Michelson wrote: > > For the series: > > Acked-by: Mark Michelson Thanks for reviewing the patches! > > On 7/9/19 8:48 AM, Dumitru Ceara wrote: > > This series introduces support for IGMP Snooping and IGMP Querier. IGMP > > versions v1-v3 are supported for snooping and IGMP queries originated by > > ovn-controller are general IGMPv3 queries. The rationale behind using v3 for > > querier is that it's backward compatible with v1-v2. > > > > The majority of the code is IP version independent with the thought in mind > > that support for MLD snooping for IPv6 will be added next. > > > > Dumitru Ceara (3): > >packets: Add IGMPv3 query packet definitions > >OVN: Add IGMP SB definitions and ovn-controller support > >OVN: Add ovn-northd IGMP support > > > > > > include/ovn/actions.h |6 > > lib/packets.c | 44 ++ > > lib/packets.h | 19 + > > ovn/controller/automake.mk |2 > > ovn/controller/ip-mcast.c | 164 > > ovn/controller/ip-mcast.h | 52 +++ > > ovn/controller/ovn-controller.c | 23 + > > ovn/controller/pinctrl.c| 787 > > +++ > > ovn/controller/pinctrl.h|2 > > ovn/lib/actions.c | 22 + > > ovn/lib/automake.mk |2 > > ovn/lib/ip-mcast-index.c| 40 ++ > > ovn/lib/ip-mcast-index.h| 38 ++ > > ovn/lib/logical-fields.c|2 > > ovn/northd/ovn-northd.c | 454 +- > > ovn/ovn-nb.xml | 54 +++ > > ovn/ovn-sb.ovsschema| 43 ++ > > ovn/ovn-sb.xml | 80 > > ovn/utilities/ovn-sbctl.c | 53 +++ > > ovn/utilities/ovn-trace.c | 14 + > > tests/ovn.at| 276 ++ > > tests/system-ovn.at | 119 ++ > > 22 files changed, 2260 insertions(+), 36 deletions(-) > > create mode 100644 ovn/controller/ip-mcast.c > > create mode 100644 ovn/controller/ip-mcast.h > > create mode 100644 ovn/lib/ip-mcast-index.c > > create mode 100644 ovn/lib/ip-mcast-index.h > > > > > > --- > > v2: > > - address reviewer comments. > > - fix a memory corruption when reallocating multicast ports in > >ovn_multicast_add_ports. 
> > - add missing NULL checks in pinctrl.c for the case when a logical switch > >configuration gets deleted by ovn-northd and controllers are updating > >IGMP_Group entries. > > - Fix allocation of multicast group IDs in ovn-northd. The multicast group > >IDs were allocated globally instead of per-datapath which was limiting > >the total number of IGMP Groups. At most 32K IGMP groups can be learned > >per datapath regardless of how many datapaths are configured. > > - add system-ovn.at test.
Re: [ovs-dev] Reply: Why is ovs DPDK much worse than ovs in my test case?
On 11.07.2019 3:27, Yi Yang (杨燚)-云服务集团 wrote: > BTW, offload features are on in my test client1 and server1 (iperf server) > > vagrant@client1:~$ ethtool -k enp0s8 > Features for enp0s8: > rx-checksumming: on [fixed] > tx-checksumming: on > tx-checksum-ipv4: off [fixed] > tx-checksum-ip-generic: on > tx-checksum-ipv6: off [fixed] > tx-checksum-fcoe-crc: off [fixed] > tx-checksum-sctp: off [fixed] > scatter-gather: on > tx-scatter-gather: on > tx-scatter-gather-fraglist: off [fixed] > tcp-segmentation-offload: on > tx-tcp-segmentation: on > tx-tcp-ecn-segmentation: off [fixed] > tx-tcp6-segmentation: on > udp-fragmentation-offload: on > generic-segmentation-offload: on > generic-receive-offload: on > large-receive-offload: off [fixed] > rx-vlan-offload: off [fixed] > tx-vlan-offload: off [fixed] > ntuple-filters: off [fixed] > receive-hashing: off [fixed] > highdma: on [fixed] > rx-vlan-filter: on [fixed] > vlan-challenged: off [fixed] > tx-lockless: off [fixed] > netns-local: off [fixed] > tx-gso-robust: on [fixed] > tx-fcoe-segmentation: off [fixed] > tx-gre-segmentation: off [fixed] > tx-ipip-segmentation: off [fixed] > tx-sit-segmentation: off [fixed] > tx-udp_tnl-segmentation: off [fixed] > fcoe-mtu: off [fixed] > tx-nocache-copy: off > loopback: off [fixed] > rx-fcs: off [fixed] > rx-all: off [fixed] > tx-vlan-stag-hw-insert: off [fixed] > rx-vlan-stag-hw-parse: off [fixed] > rx-vlan-stag-filter: off [fixed] > l2-fwd-offload: off [fixed] > busy-poll: on [fixed] > hw-tc-offload: off [fixed] > vagrant@client1:~$ > > vagrant@server1:~$ ifconfig enp0s8 > enp0s8Link encap:Ethernet HWaddr 08:00:27:c0:a6:0b > inet addr:192.168.230.101 Bcast:192.168.230.255 Mask:255.255.255.0 > inet6 addr: fe80::a00:27ff:fec0:a60b/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1 > RX packets:4228443 errors:0 dropped:0 overruns:0 frame:0 > TX packets:2484988 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:34527894301 (34.5 GB) TX 
bytes:528944799 (528.9 MB) > > vagrant@server1:~$ ethtool -k enp0s8 > Features for enp0s8: > rx-checksumming: on [fixed] > tx-checksumming: on > tx-checksum-ipv4: off [fixed] > tx-checksum-ip-generic: on > tx-checksum-ipv6: off [fixed] > tx-checksum-fcoe-crc: off [fixed] > tx-checksum-sctp: off [fixed] > scatter-gather: on > tx-scatter-gather: on > tx-scatter-gather-fraglist: off [fixed] > tcp-segmentation-offload: on > tx-tcp-segmentation: on > tx-tcp-ecn-segmentation: off [fixed] > tx-tcp6-segmentation: on > udp-fragmentation-offload: on > generic-segmentation-offload: on > generic-receive-offload: on > large-receive-offload: off [fixed] > rx-vlan-offload: off [fixed] > tx-vlan-offload: off [fixed] > ntuple-filters: off [fixed] > receive-hashing: off [fixed] > highdma: on [fixed] > rx-vlan-filter: on [fixed] > vlan-challenged: off [fixed] > tx-lockless: off [fixed] > netns-local: off [fixed] > tx-gso-robust: on [fixed] > tx-fcoe-segmentation: off [fixed] > tx-gre-segmentation: off [fixed] > tx-ipip-segmentation: off [fixed] > tx-sit-segmentation: off [fixed] > tx-udp_tnl-segmentation: off [fixed] > fcoe-mtu: off [fixed] > tx-nocache-copy: off > loopback: off [fixed] > rx-fcs: off [fixed] > rx-all: off [fixed] > tx-vlan-stag-hw-insert: off [fixed] > rx-vlan-stag-hw-parse: off [fixed] > rx-vlan-stag-filter: off [fixed] > l2-fwd-offload: off [fixed] > busy-poll: on [fixed] > hw-tc-offload: off [fixed] > vagrant@server1:~$ > > -----Original Message----- > From: Yi Yang (杨燚)-云服务集团 > Sent: July 11, 2019 8:22 > To: i.maxim...@samsung.com; ovs-dev@openvswitch.org > Cc: Yi Yang (杨燚)-云服务集团 > Subject: Reply: [ovs-dev] Why is ovs DPDK much worse than ovs in my test case? > Importance: High > > Ilya, thank you so much, using 9K MTU for all the virtio interfaces in > transport path does help (including DPDK port), the data is here.
8K usually works a bit better for me than 9K. Probably, because of the page size.
Have you also configured the MTU for the tap interfaces on the host side, in case the host kernel doesn't negotiate the MTU with the guest? > > vagrant@client1:~$ iperf -t 60 -i 10 -c 192.168.230.101 > > Client connecting to 192.168.230.101, TCP port 5001 > TCP window size: 325 KByte (default) > > [ 3] local 192.168.200.101 port 53956 connected with 192.168.230.101 port > 5001 > [ ID] Interval Transfer Bandwidth > [ 3] 0.0-10.0 sec 315 MBytes 264 Mbits/sec > [ 3] 10.0-20.0 sec 333 MBytes 280 Mbits/sec > [ 3] 20.0-30.0 sec 300 MBytes 252 Mbits/sec > [ 3] 30.0-40.0 sec 307 MBytes 258 Mbits/sec > [ 3] 40.0-50.0 sec 322 MBytes 270 Mbits/sec > [ 3] 50.0-60.0 sec 316 MBytes 265 Mbits/sec > [