On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique <[email protected]> wrote:
> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique <[email protected]> wrote:
>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou <[email protected]> wrote:
>>> On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique <[email protected]> wrote:
>>>> On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez <[email protected]> wrote:

>>>>> Thanks Numan for running these tests outside OpenStack!

>>>>> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique <[email protected]> wrote:
>>>>>> On Tue, Jul 9, 2019 at 11:05 AM Han Zhou <[email protected]> wrote:
>>>>>>> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou <[email protected]> wrote:
>>>>>>>> On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <[email protected]> wrote:
>>>>>>>>> On Fri, Jun 21, 2019, 11:47 AM Han Zhou <[email protected]> wrote:
>>>>>>>>>> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <[email protected]> wrote:

>>>>>>>>>>> Thanks a lot Han for the answer!

>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:57 PM Han Zhou <[email protected]> wrote:
>>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <[email protected]> wrote:
>>>>>>>>>>>>> On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez <[email protected]> wrote:

>>>>>>>>>>>>>> Hi Han, all,

>>>>>>>>>>>>>> Lucas, Numan and I have been doing some 'scale' testing of OpenStack using OVN and wanted to present some results and issues that we've found with the Incremental Processing feature in ovn-controller. Below is the scenario that we executed:

>>>>>>>>>>>>>> * 7 baremetal nodes setup: 3 controllers (running ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS 2.10.
>>>>>>>>>>>>>> * The test consists of:
>>>>>>>>>>>>>>   - Create an OpenStack network (OVN LS), subnet and router
>>>>>>>>>>>>>>   - Attach the subnet to the router and set the gateway to the external network
>>>>>>>>>>>>>>   - Create an OpenStack port and apply a Security Group (ACLs to allow UDP, SSH and ICMP)
>>>>>>>>>>>>>>   - Bind the port to one of the 4 compute nodes (randomly) by attaching it to a network namespace (roughly as in the sketch right after this list)
>>>>>>>>>>>>>>   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
>>>>>>>>>>>>>>   - Wait until the test can ping the port
>>>>>>>>>>>>>> * Running browbeat/rally with 16 simultaneous processes to execute the test above 150 times.
>>>>>>>>>>>>>> * When all the 150 'fake VMs' are created, browbeat will delete all the OpenStack/OVN resources.
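>>>>>>>>>>>>>> (The binding step is roughly the usual fake-VM recipe with an OVS internal port moved into a namespace; in the sketch below, br-int is the standard integration bridge, while the port/namespace names and the <...> values are just placeholders:

>>>>>>>>>>>>>>   # create a namespace and an OVS internal port bound to the OVN logical port
>>>>>>>>>>>>>>   ip netns add ns1
>>>>>>>>>>>>>>   ovs-vsctl add-port br-int vm1 -- set Interface vm1 type=internal external_ids:iface-id=<neutron-port-uuid>
>>>>>>>>>>>>>>   # move the port into the namespace and configure the Neutron port's MAC/IP
>>>>>>>>>>>>>>   ip link set vm1 netns ns1
>>>>>>>>>>>>>>   ip netns exec ns1 ip link set vm1 address <port-mac>
>>>>>>>>>>>>>>   ip netns exec ns1 ip addr add <port-ip>/<prefix-len> dev vm1
>>>>>>>>>>>>>>   ip netns exec ns1 ip link set vm1 up)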
>>>>>>>>>>>>>> We first tried with OVS/OVN 2.10 and pulled some results which showed 100% success, but ovn-controller is quite loaded (as expected) in all the nodes, especially during the deletion phase:

>>>>>>>>>>>>>> - Compute node: https://imgur.com/a/tzxfrIR
>>>>>>>>>>>>>> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF

>>>>>>>>>>>>>> After conducting the tests above, we replaced ovn-controller in all 7 nodes with the one from the current master branch (actually from last week). We also replaced ovn-northd and the ovsdb-servers, but ovs-vswitchd has been left untouched (still on 2.10). The expected results were to get less ovn-controller CPU usage and also better times due to the Incremental Processing feature introduced recently. However, the results don't look very good:

>>>>>>>>>>>>>> - Compute node: https://imgur.com/a/wuq87F1
>>>>>>>>>>>>>> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp

>>>>>>>>>>>>>> One thing that we can tell from the ovs-vswitchd CPU consumption is that it's much less in the Incremental Processing (IP) case, which apparently doesn't make much sense. This led us to think that perhaps ovn-controller was not installing the necessary flows in the switch, and we confirmed this hypothesis by looking into the dataplane results. Out of the 150 VMs, 10% of them were unreachable via ping when using ovn-controller from master.

>>>>>>>>>>>>>> @Han, others, do you have any ideas as to what could be happening here? We'll be able to use this setup for a few more days, so let me know if you want us to pull some other data/traces, ...

>>>>>>>>>>>>>> Some other interesting things:
>>>>>>>>>>>>>> On each of the compute nodes (with an almost evenly distributed number of logical ports bound to them), the max number of logical flows in br-int is ~90K (by the end of the test, right before deleting the resources).

>>>>>>>>>>>>>> It looks like with the IP version, ovn-controller leaks some memory: https://imgur.com/a/trQrhWd
>>>>>>>>>>>>>> While with OVS 2.10, it remains pretty flat during the test: https://imgur.com/a/KCkIT4O

>>>>>>>>>>>>> Hi Daniel, Han,

>>>>>>>>>>>>> I just sent a small patch for the ovn-controller memory leak: https://patchwork.ozlabs.org/patch/1113758/

>>>>>>>>>>>>> At least on my setup this is what valgrind was pointing at.
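>>>>>>>>>>>>> (For reference, ovn-controller can be run under valgrind along these lines; the log path and the OVS DB socket below are only examples:

>>>>>>>>>>>>>   # run ovn-controller under valgrind and write the leak report to a file
>>>>>>>>>>>>>   valgrind --leak-check=full --log-file=/tmp/ovn-controller.valgrind ovn-controller unix:/run/openvswitch/db.sock)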
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Dumitru

>>>>>>>>>>>>>> Looking forward to hearing back :)
>>>>>>>>>>>>>> Daniel

>>>>>>>>>>>>>> PS. Sorry for my previous email, I sent it by mistake without the subject

>>>>>>>>>>>> Thanks Daniel for the testing and reporting, and thanks Dumitru for fixing the memory leak.

>>>>>>>>>>>> Currently ovn-controller incremental processing only handles the SB changes below incrementally:
>>>>>>>>>>>> - logical_flow
>>>>>>>>>>>> - port_binding (for regular VIF bindings NOT on the current chassis)
>>>>>>>>>>>> - mc_group
>>>>>>>>>>>> - address_set
>>>>>>>>>>>> - port_group
>>>>>>>>>>>> - mac_binding

>>>>>>>>>>>> So, in the test scenario you described, since each iteration creates a network (SB datapath changes) and router ports (port_binding changes for non-VIF ports), the incremental processing would not help much, because most steps in your test should trigger a recompute. It would help if you created more fake VMs in each iteration, e.g. 10 VMs or more on each LS. Secondly, when a VIF port binding happens on the current chassis, ovn-controller will still do a recompute, and because you have only 4 compute nodes, roughly 1/4 of the bindings will still trigger a recompute on the binding chassis even though they are regular VIF ports. With more compute nodes you would see incremental processing being more effective.

>>>>>>>>>>> Got it, it makes sense (although then, in the worst case, it should be at least what we had before and not worse; but it can also be because we're mixing versions here: 2.10 vs master).

>>>>>>>>>>>> However, what really worries me is the 10% of VMs being unreachable. I have one confusion here on the test steps. The last step you described was: "Wait until the test can ping the port". So if the VM is not pingable, the test won't continue?

>>>>>>>>>>> Sorry, I should've explained it better. We wait up to 2 minutes for the port to respond to pings; if it's not reachable then we continue with the next port (16 rally processes are running simultaneously, so the rest of the processes may be doing stuff at the same time).

>>>>>>>>>>>> To debug the problem, the first thing is to identify which flows are missing for the VMs that are unreachable. Could you do ovs-appctl ofproto/trace for the ICMP flow of any VM with a ping failure? And then, please enable the debug log for ovn-controller with ovs-appctl -t ovn-controller vlog/set file:dbg. There may be too many logs, so please enable it only for as short a time as needed to reproduce a VM with a ping failure. If the last step "wait until the test can ping the port" is there, then it should be able to detect the first occurrence if the VM is not reachable within, e.g., 30 sec.
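>>>>>>>>>>>> For example, something along these lines on the chassis hosting a failing VM (the OVS port name and the MAC/IP values below are placeholders for the failing VM's actual values):

>>>>>>>>>>>>   # trace an ICMP packet entering br-int from the VM's OVS port
>>>>>>>>>>>>   ovs-appctl ofproto/trace br-int 'in_port=<vm-ovs-port>,icmp,dl_src=<vm-mac>,dl_dst=<router-mac>,nw_src=<vm-ip>,nw_dst=<remote-ip>'
>>>>>>>>>>>>   # turn on ovn-controller debug logging while reproducing the failure
>>>>>>>>>>>>   ovs-appctl -t ovn-controller vlog/set file:dbg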
>>>>>>>>>>> We'll need to hack a bit here but let's see :)

>>>>>>>>>>>> In ovn-scale-test we didn't have a data plane test, but this problem was not seen in our live environment either, with a far larger scale. The major differences between your test and our environment are:
>>>>>>>>>>>> - We are running an older version, so there might be some rebase/refactor problem that caused this. To eliminate this, I'd suggest trying a branch I created for 2.10 (https://github.com/hzhou8/ovs/tree/ip12_rebase_on_2.10), which matches the baseline test you did, which is also 2.10. It would also eliminate any compatibility problem, if there is one, between the OVN master branch and OVS 2.10, which you mentioned is used in the test.
>>>>>>>>>>>> - We don't use Security Groups (I guess the ~90k OVS flows you mentioned were mainly introduced by the Security Group use, if all ports were put in the same group). Incremental processing is expected to be correct for security groups, and to handle them incrementally thanks to the address_set and port_group incremental processing. However, since the testing only relied on the regression tests, I am not 100% sure the test coverage was sufficient. So could you try disabling Security Groups to rule out that problem?

>>>>>>>>>>> Ok, will try to repeat the tests without the SGs.

>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Han

>>>>>>>>>>> Thanks once again!
>>>>>>>>>>> Daniel

>>>>>>>>>> Hi Daniel,

>>>>>>>>>> Any updates? Do you still see the 10% of VMs unreachable?

>>>>>>>>>> Thanks,
>>>>>>>>>> Han

>>>>>>>>> Hi Han,

>>>>>>>>> As such there is no datapath impact. After increasing the ping wait timeout value from 120 seconds to 180 seconds it's 100% now.

>>>>>>>>> But the time taken to program the flows is much higher compared to OVN master without the IP patches.
>>>>>>>>> Here is some data - http://paste.openstack.org/show/753224/ . I am still investigating it. I will update my findings in some time.

>>>>>>>>> Please see the times for the action - vm.wait_for_ping

>>>>>>>> Thanks Numan for the investigation and update. Glad to hear there is no correctness issue, but sorry for the slowness in your test scenario. I expect that the operations in your test trigger recomputing, and the worst case should be performance similar to without I-P. It is weird that it turned out so much slower in your test. There can be some extra overhead when it tries to do incremental processing and then falls back to a full recompute, but it shouldn't cause that big a difference. It might be that for some reason the main loop iteration is triggered more times than necessary. I'd suggest comparing the coverage counter "lflow_run" between the tests, and also checking a perf report to see if the hotspot is somewhere else. (Sorry that I can't provide full-time help now since I am still on vacation, but I will try to be useful if things are blocked.)

>>>>>>> Hi Numan/Daniel, do you have any new findings on why I-P got worse results in your test?
>>>>>>> The extremely long latency (2 - 3 min) shown in your report reminds me of a similar problem I reported before: https://mail.openvswitch.org/pipermail/ovs-dev/2018-April/346321.html

>>>>>>> The root cause of that problem was still not clear. In that report, the extremely long latency (7 min) was observed without I-P and it didn't happen with I-P. If it is the same problem, then I suspect it is not related to I-P or non-I-P, but some problem related to ovsdb monitor condition change. To confirm whether it is the same problem, could you:
>>>>>>> 1. pause the test when the scale is big enough (e.g. when the test is almost completed), and then
>>>>>>> 2. enable the ovn-controller debug log, and then
>>>>>>> 3. run one more iteration of the test, and see if the time was spent waiting for the SB DB update notification.

>>>>>>> Please ignore my speculation above if you already found the root cause, and it would be great if you could share it :)

>>>>>> Thanks for sharing this Han.

>>>>>> I do not have any new findings. Yesterday I ran ovn-scale-test comparing OVN with IP vs without IP (using the master branch).
>>>>>> The test creates a new logical switch, adds it to a router, adds a few ACLs, and creates 2 logical ports and pings between them.
>>>>>> I am using a physical deployment which creates actual namespaces instead of sandboxes.

>>>>>> The results don't show any huge difference between the two.

>>>>> 2300 vs 2900 seconds total time or 44 vs 56 seconds for the 95%ile? It is not negligible IMHO. It's a >25% penalty with the IP. Maybe I missed something from the results?

>>>> Initially I ran with ovn-nbctl running commands as one batch (i.e. combining commands with "--"). The results were very similar.
>>>> Like this one:

>>>> *******

>>>> With non IP - ovn-nbctl NO daemon mode

>>>> Response Times (sec)
>>>> | action | min | median | 90%ile | 95%ile | max | avg | success | count |
>>>> | ovn_network.create_routers | 0.288 | 0.429 | 5.454 | 5.538 | 20.531 | 1.523 | 100.0% | 1000 |
>>>> | ovn.create_lswitch | 0.046 | 0.139 | 0.202 | 5.084 | 10.259 | 0.441 | 100.0% | 1000 |
>>>> | ovn_network.connect_network_to_router | 0.164 | 0.411 | 5.307 | 5.491 | 15.636 | 1.128 | 100.0% | 1000 |
>>>> | ovn.create_lport | 0.11 | 0.272 | 0.478 | 5.284 | 15.496 | 0.835 | 100.0% | 1000 |
>>>> | ovn_network.bind_port | 1.302 | 2.367 | 2.834 | 3.24 | 12.409 | 2.527 | 100.0% | 1000 |
>>>> | ovn_network.wait_port_up | 0.0 | 0.001 | 0.001 | 0.001 | 0.002 | 0.001 | 100.0% | 1000 |
>>>> | ovn_network.ping_ports | 0.04 | 10.24 | 10.397 | 10.449 | 10.82 | 6.767 | 100.0% | 1000 |
>>>> | total | 2.219 | 13.903 | 23.068 | 24.538 | 49.437 | 13.222 | 100.0% | 1000 |

>>>> With IP - ovn-nbctl NO daemon mode

>>>> concurrency - 10

>>>> Response Times (sec)
>>>> | action | min | median | 90%ile | 95%ile | max | avg | success | count |
>>>> | ovn_network.create_routers | 0.274 | 0.402 | 0.493 | 0.51 | 0.584 | 0.408 | 100.0% | 1000 |
>>>> | ovn.create_lswitch | 0.064 | 0.137 | 0.213 | 0.244 | 0.33 | 0.146 | 100.0% | 1000 |
>>>> | ovn_network.connect_network_to_router | 0.203 | 0.395 | 0.677 | 0.766 | 0.912 | 0.427 | 100.0% | 1000 |
>>>> | ovn.create_lport | 0.13 | 0.261 | 0.437 | 0.497 | 0.604 | 0.283 | 100.0% | 1000 |
>>>> | ovn_network.bind_port | 1.307 | 2.374 | 2.816 | 2.904 | 3.401 | 2.325 | 100.0% | 1000 |
>>>> | ovn_network.wait_port_up | 0.0 | 0.001 | 0.001 | 0.001 | 0.002 | 0.001 | 100.0% | 1000 |
>>>> | ovn_network.ping_ports | 0.028 | 10.237 | 10.422 | 10.474 | 11.281 | 6.453 | 100.0% | 1000 |
>>>> | total | 2.251 | 13.631 | 14.822 | 15.008 | 15.901 | 10.044 | 100.0% | 1000 |

>>>> *****************

>>>> The results I shared in the previous email were with ACLs added and ovn-nbctl batch mode disabled.

>>>> I agree with you. Let me do a few more runs to be sure that the results are consistent.

>>>> Thanks
>>>> Numan

>>>>>> I will test with OVN 2.9 vs 2.11 master along with what you have suggested above and see if there are any problems related to ovsdb monitor condition change.
>>>>>> Thanks
>>>>>> Numan

>>>>>> Below are the results

>>>>>> With IP master - nbctl daemon mode - No batch mode
>>>>>> concurrency - 10

>>>>>> Response Times (sec)
>>>>>> | action | min | median | 90%ile | 95%ile | max | avg | success | count |
>>>>>> | ovn_network.create_routers | 0.269 | 0.661 | 10.426 | 15.422 | 37.259 | 3.721 | 100.0% | 1000 |
>>>>>> | ovn.create_lswitch | 0.313 | 0.45 | 12.107 | 15.373 | 30.405 | 4.185 | 100.0% | 1000 |
>>>>>> | ovn_network.connect_network_to_router | 0.163 | 0.255 | 10.121 | 10.64 | 20.475 | 2.655 | 100.0% | 1000 |
>>>>>> | ovn.create_lport | 0.351 | 0.514 | 12.255 | 15.511 | 34.74 | 4.621 | 100.0% | 1000 |
>>>>>> | ovn_network.bind_port | 1.362 | 2.447 | 7.34 | 7.651 | 17.651 | 3.146 | 100.0% | 1000 |
>>>>>> | ovn_network.wait_port_up | 0.086 | 2.734 | 5.272 | 7.827 | 22.717 | 2.957 | 100.0% | 1000 |
>>>>>> | ovn_network.ping_ports | 0.038 | 10.196 | 20.285 | 20.39 | 40.74 | 7.52 | 100.0% | 1000 |
>>>>>> | total | 2.862 | 27.267 | 49.956 | 56.39 | 90.884 | 28.808 | 100.0% | 1000 |
>>>>>> Load duration: 2950.4133141
>>>>>> Full duration: 2951.58845997 seconds

>>>>>> ***********

>>>>>> With non IP - nbctl daemon mode - ACLs - No batch mode
>>>>>> concurrency - 10

>>>>>> Response Times (sec)
>>>>>> | action | min | median | 90%ile | 95%ile | max | avg | success | count |
>>>>>> | ovn_network.create_routers | 0.267 | 0.421 | 10.395 | 10.735 | 25.501 | 3.09 | 100.0% | 1000 |
>>>>>> | ovn.create_lswitch | 0.314 | 0.408 | 10.331 | 10.483 | 25.357 | 3.049 | 100.0% | 1000 |
>>>>>> | ovn_network.connect_network_to_router | 0.153 | 0.249 | 6.552 | 10.268 | 20.545 | 2.236 | 100.0% | 1000 |
>>>>>> | ovn.create_lport | 0.344 | 0.49 | 10.566 | 15.428 | 25.542 | 3.906 | 100.0% | 1000 |
>>>>>> | ovn_network.bind_port | 1.372 | 2.409 | 7.437 | 7.665 | 17.518 | 3.192 | 100.0% | 1000 |
>>>>>> | ovn_network.wait_port_up | 0.086 | 1.323 | 5.157 | 7.769 | 20.166 | 2.291 | 100.0% | 1000 |
>>>>>> | ovn_network.ping_ports | 0.034 | 2.077 | 10.347 | 10.427 | 20.307 | 5.123 | 100.0% | 1000 |
>>>>>> | total | 3.109 | 21.26 | 39.245 | 44.495 | 70.197 | 22.889 | 100.0% | 1000 |
>>>>>> Load duration: 2328.11378407
>>>>>> Full duration: 2334.43504095 seconds

>>> Hi Numan/Daniel,

>>> I spent some time investigating this problem you reported.
>>> Thanks Numan for the offline help sharing the details.

>>> Although I still didn't reproduce the slowness in my current single-node testing env with almost the same steps and ACLs shared by Numan, I think I may have figured out a highly probable cause of what you have seen.

>>> Here is my theory: there is a difference between the I-P and non-I-P versions in the main loop. The non-I-P version checks ofctrl_can_put() before doing any flow computation (which was introduced to solve a serious performance problem when there are many OVS flows on a single node, see [1]). When I worked out the I-P version, I found this may not be the best approach, since there can be new incremental changes coming and we want to process them in the current iteration incrementally, so that we don't need to fall back to a recompute in the next iteration. So this logic was changed so that we always prioritize computing new changes and keeping the desired flow table up to date, while the in-flight messages to ovs-vswitchd may still be pending for an older version of the desired state. In the end the final desired state will be synced again to ovs-vswitchd. If there are new changes that trigger a recompute again, the recompute (which is always slow) will slow down ofctrl_run(), which keeps sending the old pending messages to ovs-vswitchd from the same main thread. (But it won't cause the original performance problem any more, because the incremental processing engine will not recompute when there is no input change.)

>>> However, when the test scenario triggers recomputes frequently, each single change may take longer to be enforced in OVS because of this new approach. The later recompute iterations would slow down the installation of previously computed OVS flows. In your test you used a parallelism of 10, which means at any point there might be new changes from one client, such as creating a new router, that trigger recomputing, which can block the OVS flow installation triggered earlier for another client. So overall you will see much bigger latency for each individual test iteration.

>>> This can also explain why I didn't reproduce the problem in my single-client single-node environment, since each iteration is serialized.

>>> [1] https://github.com/openvswitch/ovs/commit/74c760c8fe99d554b94423d49d13d5ca3dea0d9e

>>> To prove this theory, could you help with two tests reusing your environment? Thanks a lot!

>> Thanks Han. I will try these and come back to you with the results.

>> Numan

>>> 1. Instead of parallelism of 10, try 1, to make sure the test is serialized. I'd expect the result to be similar with and without I-P.

>>> 2. Try the patch below on the I-P version you are testing, to see if the problem is gone.
>>> ----8><--------------------------------------------><8---------------
>>> diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
>>> index 043abd6..0fcaa72 100644
>>> --- a/ovn/controller/ofctrl.c
>>> +++ b/ovn/controller/ofctrl.c
>>> @@ -985,7 +985,7 @@ add_meter(struct ovn_extend_table_info *m_desired,
>>>   * in the correct state and not backlogged with existing flow_mods. (Our
>>>   * criteria for being backlogged appear very conservative, but the socket
>>>   * between ovn-controller and OVS provides some buffering.) */
>>> -static bool
>>> +bool
>>>  ofctrl_can_put(void)
>>>  {
>>>      if (state != S_UPDATE_FLOWS
>>> diff --git a/ovn/controller/ofctrl.h b/ovn/controller/ofctrl.h
>>> index ed8918a..2b21c11 100644
>>> --- a/ovn/controller/ofctrl.h
>>> +++ b/ovn/controller/ofctrl.h
>>> @@ -51,6 +51,7 @@ void ofctrl_put(struct ovn_desired_flow_table *,
>>>                  const struct sbrec_meter_table *,
>>>                  int64_t nb_cfg,
>>>                  bool flow_changed);
>>> +bool ofctrl_can_put(void);
>>>  void ofctrl_wait(void);
>>>  void ofctrl_destroy(void);
>>>  int64_t ofctrl_get_cur_cfg(void);
>>> diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
>>> index c4883aa..c85c6fa 100644
>>> --- a/ovn/controller/ovn-controller.c
>>> +++ b/ovn/controller/ovn-controller.c
>>> @@ -1954,7 +1954,7 @@ main(int argc, char *argv[])
>>>
>>>          stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>>>                          time_msec());
>>> -        if (ovnsb_idl_txn) {
>>> +        if (ovnsb_idl_txn && ofctrl_can_put()) {
>>>              engine_run(&en_flow_output, ++engine_run_id);
>>>          }
>>>          stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,

> Hi Han,

> So far I could do just one run after applying your suggested patch on top of the I-P version, and the results look promising.
> It seems to me the problem is gone.

> Response Times (sec)
> | action | min | median | 90%ile | 95%ile | max | avg | success | count |
> | ovn_network.ping_ports | 0.037 | 10.236 | 10.392 | 10.462 | 20.455 | 7.15 | 100.0% | 1000 |
> | ovn_network.ping_ports | 0.036 | 10.255 | 10.448 | 11.323 | 20.791 | 7.83 | 100.0% | 1000 |

> The first row represents non-IP and the 2nd row represents IP + your suggested patch.
> The values are comparable and a lot better than without your patch.

> On Monday I will do more runs to be sure that the data is consistent and get back to you.

> If the results are consistent, I will try to run the tests which Daniel and Lucas ran on an OpenStack deployment.

> Thanks
> Numan

Glad to see the test results improved! Thanks a lot and looking forward to more data. Once it is finally confirmed, we can discuss whether this should be submitted as a formal patch considering real-world scenarios.
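For the additional runs, it may also be worth sampling the "lflow_run" coverage counter mentioned earlier on a compute node before and after an iteration, to see how often ovn-controller still falls back to a full recompute. Assuming the usual ovn-controller control socket is in place, something like:

  # number of full logical-flow recomputes since ovn-controller started
  ovs-appctl -t ovn-controller coverage/show | grep lflow_run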
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
