Neat! Thanks folks :) I'll try to get an OSP setup where we can patch this and re-run the same tests as last time to confirm, but it looks promising.
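For anyone wanting to try the same thing, a rough sketch of how Han's inline diff (quoted at the bottom of this thread) could be applied to an OVS tree of that era and picked up by ovn-controller. The file name, working directory and ovn-ctl path are assumptions about the deployment, not details taken from the thread:

# Save the inline diff from the end of this thread to a file, e.g. ofctrl-can-put.diff
cd ovs                                    # source tree that still carries ovn/controller
git apply ofctrl-can-put.diff
./boot.sh && ./configure && make -j"$(nproc)"
sudo make install
# Restart ovn-controller so the patched binary is used (packaging-specific; shown via ovn-ctl):
sudo /usr/share/openvswitch/scripts/ovn-ctl restart_controller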
On Fri, Jul 19, 2019 at 11:12 PM Han Zhou <[email protected]> wrote: > > > > On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique <[email protected]> wrote: >> >> >> >> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique <[email protected]> wrote: >>> >>> >>> >>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou <[email protected]> wrote: >>>> >>>> >>>> >>>> On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique <[email protected]> wrote: >>>> > >>>> > >>>> > >>>> > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez >>>> > <[email protected]> wrote: >>>> >> >>>> >> Thanks Numan for running these tests outside OpenStack! >>>> >> >>>> >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique <[email protected]> >>>> >> wrote: >>>> >> > >>>> >> > >>>> >> > >>>> >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou <[email protected]> wrote: >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou <[email protected]> wrote: >>>> >> >> > >>>> >> >> > >>>> >> >> > >>>> >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique >>>> >> >> > <[email protected]> wrote: >>>> >> >> > > >>>> >> >> > > >>>> >> >> > > >>>> >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou <[email protected]> >>>> >> >> > > wrote: >>>> >> >> > >> >>>> >> >> > >> >>>> >> >> > >> >>>> >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez >>>> >> >> > >> <[email protected]> wrote: >>>> >> >> > >> > >>>> >> >> > >> > Thanks a lot Han for the answer! >>>> >> >> > >> > >>>> >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou <[email protected]> >>>> >> >> > >> > wrote: >>>> >> >> > >> > > >>>> >> >> > >> > > >>>> >> >> > >> > > >>>> >> >> > >> > > >>>> >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara >>>> >> >> > >> > > <[email protected]> wrote: >>>> >> >> > >> > > > >>>> >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez >>>> >> >> > >> > > > <[email protected]> wrote: >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > Hi Han, all, >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing >>>> >> >> > >> > > > > of OpenStack >>>> >> >> > >> > > > > using OVN and wanted to present some results and issues >>>> >> >> > >> > > > > that we've >>>> >> >> > >> > > > > found with the Incremental Processing feature in >>>> >> >> > >> > > > > ovn-controller. Below >>>> >> >> > >> > > > > is the scenario that we executed: >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running >>>> >> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 >>>> >> >> > >> > > > > compute nodes. OVS >>>> >> >> > >> > > > > 2.10. >>>> >> >> > >> > > > > * The test consists on: >>>> >> >> > >> > > > > - Create openstack network (OVN LS), subnet and router >>>> >> >> > >> > > > > - Attach subnet to the router and set gw to the >>>> >> >> > >> > > > > external network >>>> >> >> > >> > > > > - Create an OpenStack port and apply a Security Group >>>> >> >> > >> > > > > (ACLs to allow >>>> >> >> > >> > > > > UDP, SSH and ICMP). >>>> >> >> > >> > > > > - Bind the port to one of the 4 compute nodes >>>> >> >> > >> > > > > (randomly) by >>>> >> >> > >> > > > > attaching it to a network namespace. >>>> >> >> > >> > > > > - Wait for the port to be ACTIVE in Neutron ('up == >>>> >> >> > >> > > > > True' in NB) >>>> >> >> > >> > > > > - Wait until the test can ping the port >>>> >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous process >>>> >> >> > >> > > > > to execute the >>>> >> >> > >> > > > > test above 150 times. 
>>>> >> >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat >>>> >> >> > >> > > > > will delete all >>>> >> >> > >> > > > > the OpenStack/OVN resources. >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some >>>> >> >> > >> > > > > results which showed >>>> >> >> > >> > > > > 100% success but ovn-controller is quite loaded (as >>>> >> >> > >> > > > > expected) in all >>>> >> >> > >> > > > > the nodes especially during the deletion phase: >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR >>>> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): >>>> >> >> > >> > > > > https://imgur.com/a/8ffKKYF >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > After conducting the tests above, we replaced >>>> >> >> > >> > > > > ovn-controller in all 7 >>>> >> >> > >> > > > > nodes by the one with the current master branch >>>> >> >> > >> > > > > (actually from last >>>> >> >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers >>>> >> >> > >> > > > > but the >>>> >> >> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). >>>> >> >> > >> > > > > The expected >>>> >> >> > >> > > > > results were to get less ovn-controller CPU usage and >>>> >> >> > >> > > > > also better >>>> >> >> > >> > > > > times due to the Incremental Processing feature >>>> >> >> > >> > > > > introduced recently. >>>> >> >> > >> > > > > However, the results don't look very good: >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1 >>>> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): >>>> >> >> > >> > > > > https://imgur.com/a/99kiyDp >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU >>>> >> >> > >> > > > > consumption is >>>> >> >> > >> > > > > that it's much less in the Incremental Processing (IP) >>>> >> >> > >> > > > > case which >>>> >> >> > >> > > > > apparently doesn't make much sense. This led us to >>>> >> >> > >> > > > > think that perhaps >>>> >> >> > >> > > > > ovn-controller was not installing the necessary flows >>>> >> >> > >> > > > > in the switch >>>> >> >> > >> > > > > and we confirmed this hypothesis by looking into the >>>> >> >> > >> > > > > dataplane >>>> >> >> > >> > > > > results. Out of the 150 VMs, 10% of them were >>>> >> >> > >> > > > > unreachable via ping >>>> >> >> > >> > > > > when using ovn-controller from master. >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > @Han, others, do you have any ideas as of what could be >>>> >> >> > >> > > > > happening >>>> >> >> > >> > > > > here? We'll be able to use this setup for a few more >>>> >> >> > >> > > > > days so let me >>>> >> >> > >> > > > > know if you want us to pull some other data/traces, ... >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > Some other interesting things: >>>> >> >> > >> > > > > On each of the compute nodes, (with an almost evenly >>>> >> >> > >> > > > > distributed >>>> >> >> > >> > > > > number of logical ports bound to them), the max amount >>>> >> >> > >> > > > > of logical >>>> >> >> > >> > > > > flows in br-int is ~90K (by the end of the test, right >>>> >> >> > >> > > > > before deleting >>>> >> >> > >> > > > > the resources). 
>>>> >> >> > >> > > > > >>>> >> >> > >> > > > > It looks like with the IP version, ovn-controller leaks >>>> >> >> > >> > > > > some memory: >>>> >> >> > >> > > > > https://imgur.com/a/trQrhWd >>>> >> >> > >> > > > > While with OVS 2.10, it remains pretty flat during the >>>> >> >> > >> > > > > test: >>>> >> >> > >> > > > > https://imgur.com/a/KCkIT4O >>>> >> >> > >> > > > >>>> >> >> > >> > > > Hi Daniel, Han, >>>> >> >> > >> > > > >>>> >> >> > >> > > > I just sent a small patch for the ovn-controller memory >>>> >> >> > >> > > > leak: >>>> >> >> > >> > > > https://patchwork.ozlabs.org/patch/1113758/ >>>> >> >> > >> > > > >>>> >> >> > >> > > > At least on my setup this is what valgrind was pointing >>>> >> >> > >> > > > at. >>>> >> >> > >> > > > >>>> >> >> > >> > > > Cheers, >>>> >> >> > >> > > > Dumitru >>>> >> >> > >> > > > >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > Looking forward to hearing back :) >>>> >> >> > >> > > > > Daniel >>>> >> >> > >> > > > > >>>> >> >> > >> > > > > PS. Sorry for my previous email, I sent it by mistake >>>> >> >> > >> > > > > without the subject >>>> >> >> > >> > > > > _______________________________________________ >>>> >> >> > >> > > > > discuss mailing list >>>> >> >> > >> > > > > [email protected] >>>> >> >> > >> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss >>>> >> >> > >> > > >>>> >> >> > >> > > Thanks Daniel for the testing and reporting, and thanks >>>> >> >> > >> > > Dumitru for fixing the memory leak. >>>> >> >> > >> > > >>>> >> >> > >> > > Currently ovn-controller incremental processing only >>>> >> >> > >> > > handles below SB changes incrementally: >>>> >> >> > >> > > - logical_flow >>>> >> >> > >> > > - port_binding (for regular VIF binding NOT on current >>>> >> >> > >> > > chassis) >>>> >> >> > >> > > - mc_group >>>> >> >> > >> > > - address_set >>>> >> >> > >> > > - port_group >>>> >> >> > >> > > - mac_binding >>>> >> >> > >> > > >>>> >> >> > >> > > So, in test scenario you described, since each iteration >>>> >> >> > >> > > creates network (SB datapath changes) and router ports >>>> >> >> > >> > > (port_binding changes for non VIF), the incremental >>>> >> >> > >> > > processing would not help much, because most steps in your >>>> >> >> > >> > > test should trigger recompute. It would help if you create >>>> >> >> > >> > > more Fake VMs in each iteration, e.g. create 10 VMs or more >>>> >> >> > >> > > on each LS. Secondly, when VIF port-binding happens on >>>> >> >> > >> > > current chassis, the ovn-controller will still do >>>> >> >> > >> > > re-compute, and because you have only 4 compute nodes, so >>>> >> >> > >> > > 1/4 of the compute node will still recompute even when >>>> >> >> > >> > > binding a regular VIF port. When you have more compute >>>> >> >> > >> > > nodes you would see incremental processing more effective. >>>> >> >> > >> > >>>> >> >> > >> > Got it, it makes sense (although then worst case, it should >>>> >> >> > >> > be at >>>> >> >> > >> > least what we had before and not worse but it can also be >>>> >> >> > >> > because >>>> >> >> > >> > we're mixing version here: 2.10 vs master). >>>> >> >> > >> > > >>>> >> >> > >> > > However, what really worries me is the 10% VM unreachable. >>>> >> >> > >> > > I have one confusion here on the test steps. The last step >>>> >> >> > >> > > you described was: - Wait until the test can ping the port. >>>> >> >> > >> > > So if the VM is not pingable the test won't continue? >>>> >> >> > >> > >>>> >> >> > >> > Sorry I should've explained it better. 
We wait for 2 minutes >>>> >> >> > >> > for the >>>> >> >> > >> > port to respond to pings; if it's not reachable then we >>>> >> >> > >> > continue with >>>> >> >> > >> > the next port (16 rally processes are running simultaneously >>>> >> >> > >> > so the >>>> >> >> > >> > rest of the processes may be doing stuff at the same time). >>>> >> >> > >> > >>>> >> >> > >> > > >>>> >> >> > >> > > To debug the problem, the first thing is to identify what >>>> >> >> > >> > > flows are missing for the VMs that are unreachable. Could >>>> >> >> > >> > > you do ovs-appctl ofproto/trace for the ICMP flow of any VM >>>> >> >> > >> > > with ping failure? And then, please enable debug log for >>>> >> >> > >> > > ovn-controller with ovs-appctl -t ovn-controller vlog/set >>>> >> >> > >> > > file:dbg. There may be too many logs so please enable it >>>> >> >> > >> > > only for as short a time as needed to reproduce a VM with >>>> >> >> > >> > > ping failure. If the last step "wait until the test can ping >>>> >> >> > >> > > the port" is there then it should be able to detect the >>>> >> >> > >> > > first occurrence if the VM is not reachable within e.g. 30 sec. >>>> >> >> > >> > >>>> >> >> > >> > We'll need to hack a bit here but let's see :) >>>> >> >> > >> > > >>>> >> >> > >> > > In ovn-scale-test we didn't have a data plane test, but >>>> >> >> > >> > > this problem was not seen in our live environment either, >>>> >> >> > >> > > with a far larger scale. The major differences in your test >>>> >> >> > >> > > vs. our environment are: >>>> >> >> > >> > > - We are running an older version. So there might be >>>> >> >> > >> > > some rebase/refactor problem that caused this. To eliminate >>>> >> >> > >> > > this, I'd suggest trying a branch I created for 2.10 >>>> >> >> > >> > > (https://github.com/hzhou8/ovs/tree/ip12_rebase_on_2.10), >>>> >> >> > >> > > which matches the 2.10 baseline you tested. It >>>> >> >> > >> > > may also eliminate any compatibility problem, if there is one, >>>> >> >> > >> > > between the OVN master branch and OVS 2.10, which you mentioned is >>>> >> >> > >> > > used in the test. >>>> >> >> > >> > > - We don't use Security Groups (I guess the ~90k OVS flows >>>> >> >> > >> > > you mentioned were mainly introduced by the Security Group >>>> >> >> > >> > > use, if all ports were put in the same group). The incremental >>>> >> >> > >> > > processing is expected to be correct for security groups, >>>> >> >> > >> > > handling them incrementally via the address_set and >>>> >> >> > >> > > port_group incremental processing. However, since the >>>> >> >> > >> > > testing only relied on the regression tests, I am not 100% >>>> >> >> > >> > > sure if the test coverage was sufficient. So could you try >>>> >> >> > >> > > disabling Security Groups to rule out the problem? >>>> >> >> > >> > >>>> >> >> > >> > OK, will try to repeat the tests without the SGs. >>>> >> >> > >> > > >>>> >> >> > >> > > Thanks, >>>> >> >> > >> > > Han >>>> >> >> > >> > >>>> >> >> > >> > Thanks once again! >>>> >> >> > >> > Daniel >>>> >> >> > >> >>>> >> >> > >> Hi Daniel, >>>> >> >> > >> >>>> >> >> > >> Any updates? Do you still see the 10% of VMs unreachable? >>>> >> >> > >> >>>> >> >> > >> >>>> >> >> > >> Thanks, >>>> >> >> > >> Han >>>> >> >> > > >>>> >> >> > > >>>> >> >> > > Hi Han, >>>> >> >> > > >>>> >> >> > > As such there is no datapath impact. After increasing the ping >>>> >> >> > > wait timeout value from 120 seconds to 180 seconds it's 100% now. 
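As a concrete illustration of the debugging steps suggested above, the commands below trace an ICMP packet through br-int for one of the unreachable fake VMs and toggle ovn-controller debug logging. The OpenFlow port number, MAC and IP addresses are placeholders, not values from this test:

# Find the OpenFlow port number of the fake VM's interface, then trace an ICMP
# packet the way br-int would handle it (all values below are made up):
ovs-ofctl show br-int
ovs-appctl ofproto/trace br-int \
    'in_port=5,icmp,dl_src=fa:16:3e:00:00:01,dl_dst=fa:16:3e:00:00:02,nw_src=10.0.0.10,nw_dst=10.0.0.1'

# Turn ovn-controller debug logging on, reproduce the ping failure, then lower it again:
ovs-appctl -t ovn-controller vlog/set file:dbg
ovs-appctl -t ovn-controller vlog/set file:info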
>>>> >> >> > > >>>> >> >> > > But the time taken to program the flows is much higher when >>>> >> >> > > compared to OVN master without the IP patches. >>>> >> >> > > Here is some data - http://paste.openstack.org/show/753224/ . >>>> >> >> > > I am still investigating it. I will update my findings in some >>>> >> >> > > time. >>>> >> >> > > >>>> >> >> > > Please see the times for the action - vm.wait_for_ping >>>> >> >> > > >>>> >> >> > >>>> >> >> > Thanks Numan for the investigation and update. Glad to hear there >>>> >> >> > is no correctness issue, but sorry for the slowness in your test >>>> >> >> > scenario. I expect that the operations in your test trigger >>>> >> >> > recomputing, and the worst case should be performance similar to >>>> >> >> > without I-P. It is weird that it turned out so much slower in your >>>> >> >> > test. There can be some extra overhead when it tries to do >>>> >> >> > incremental processing and then falls back to full recompute, but it >>>> >> >> > shouldn't cause that big a difference. It might be that for some >>>> >> >> > reason the main loop iteration is triggered more times than >>>> >> >> > necessary. I'd suggest comparing the coverage counter >>>> >> >> > "lflow_run" between the tests, and also checking a perf report to see >>>> >> >> > if the hotspot is somewhere else. (Sorry that I can't provide >>>> >> >> > full-time help now since I am still on vacation but I will try to >>>> >> >> > be useful if things are blocked) >>>> >> >> >>>> >> >> Hi Numan/Daniel, do you have any new findings on why I-P got worse >>>> >> >> results in your test? The extremely long latency (2 - 3 min) shown in >>>> >> >> your report reminds me of a similar problem I reported before: >>>> >> >> https://mail.openvswitch.org/pipermail/ovs-dev/2018-April/346321.html >>>> >> >> >>>> >> >> The root cause of that problem was still not clear. In that report, >>>> >> >> the extremely long latency (7 min) was observed without I-P and it >>>> >> >> didn't happen with I-P. If it is the same problem, then I suspect it >>>> >> >> is not related to I-P or non-I-P, but to some problem related to ovsdb >>>> >> >> monitor condition changes. To confirm whether it is the same problem, could >>>> >> >> you: >>>> >> >> 1. pause the test when the scale is big enough (e.g. when the test >>>> >> >> is almost completed), and then >>>> >> >> 2. enable ovn-controller debug log, and then >>>> >> >> 3. run one more iteration of the test, and see if the time was spent >>>> >> >> on waiting for the SB DB update notification. >>>> >> >> >>>> >> >> Please ignore my speculation above if you already found the root >>>> >> >> cause and it would be great if you could share it :) >>>> >> > >>>> >> > >>>> >> > Thanks for sharing this Han. >>>> >> > >>>> >> > I do not have any new findings. Yesterday I ran ovn-scale-test >>>> >> > comparing OVN with IP vs without IP (using the master branch). >>>> >> > The test creates a new logical switch, adds it to a router, adds a few ACLs, >>>> >> > and creates 2 logical ports and pings between them. >>>> >> > I am using a physical deployment which creates actual namespaces >>>> >> > instead of sandboxes. >>>> >> > >>>> >> > The results don't show any huge difference between the two. >>>> >> 2300 vs 2900 seconds total time or 44 vs 56 seconds for the 95%ile? >>>> >> It is not negligible IMHO. It's a >25% penalty with the IP. Maybe I >>>> >> missed something from the results? >>>> >> >>>> > >>>> > Initially I ran with ovn-nbctl running commands as one batch (i.e. >>>> > combining commands with "--"). 
The results were very similar. Like this >>>> > one >>>> > >>>> > ******* >>>> > >>>> > With non IP - ovn-nbctl NO daemon mode >>>> > >>>> > +--------------------------------------------------------------------------------------------------------------+ >>>> > | Response Times (sec) >>>> > | >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> > | action | min | median | 90%ile | >>>> > 95%ile | max | avg | success | count | >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> > | ovn_network.create_routers | 0.288 | 0.429 | 5.454 | >>>> > 5.538 | 20.531 | 1.523 | 100.0% | 1000 | >>>> > | ovn.create_lswitch | 0.046 | 0.139 | 0.202 | >>>> > 5.084 | 10.259 | 0.441 | 100.0% | 1000 | >>>> > | ovn_network.connect_network_to_router | 0.164 | 0.411 | 5.307 | >>>> > 5.491 | 15.636 | 1.128 | 100.0% | 1000 | >>>> > | ovn.create_lport | 0.11 | 0.272 | 0.478 | >>>> > 5.284 | 15.496 | 0.835 | 100.0% | 1000 | >>>> > | ovn_network.bind_port | 1.302 | 2.367 | 2.834 | 3.24 >>>> > | 12.409 | 2.527 | 100.0% | 1000 | >>>> > | ovn_network.wait_port_up | 0.0 | 0.001 | 0.001 | >>>> > 0.001 | 0.002 | 0.001 | 100.0% | 1000 | >>>> > | ovn_network.ping_ports | 0.04 | 10.24 | 10.397 | >>>> > 10.449 | 10.82 | 6.767 | 100.0% | 1000 | >>>> > | total | 2.219 | 13.903 | 23.068 | >>>> > 24.538 | 49.437 | 13.222 | 100.0% | 1000 | >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> > >>>> > >>>> > With IP - ovn-nbctl NO daemon mode >>>> > >>>> > concurrency - 10 >>>> > >>>> > +--------------------------------------------------------------------------------------------------------------+ >>>> > | Response Times (sec) >>>> > | >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> > | action | min | median | 90%ile | >>>> > 95%ile | max | avg | success | count | >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> > | ovn_network.create_routers | 0.274 | 0.402 | 0.493 | 0.51 >>>> > | 0.584 | 0.408 | 100.0% | 1000 | >>>> > | ovn.create_lswitch | 0.064 | 0.137 | 0.213 | >>>> > 0.244 | 0.33 | 0.146 | 100.0% | 1000 | >>>> > | ovn_network.connect_network_to_router | 0.203 | 0.395 | 0.677 | >>>> > 0.766 | 0.912 | 0.427 | 100.0% | 1000 | >>>> > | ovn.create_lport | 0.13 | 0.261 | 0.437 | >>>> > 0.497 | 0.604 | 0.283 | 100.0% | 1000 | >>>> > | ovn_network.bind_port | 1.307 | 2.374 | 2.816 | >>>> > 2.904 | 3.401 | 2.325 | 100.0% | 1000 | >>>> > | ovn_network.wait_port_up | 0.0 | 0.001 | 0.001 | >>>> > 0.001 | 0.002 | 0.001 | 100.0% | 1000 | >>>> > | ovn_network.ping_ports | 0.028 | 10.237 | 10.422 | >>>> > 10.474 | 11.281 | 6.453 | 100.0% | 1000 | >>>> > | total | 2.251 | 13.631 | 14.822 | >>>> > 15.008 | 15.901 | 10.044 | 100.0% | 1000 | >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> > >>>> > ***************** >>>> > >>>> > The results I shared in the previous email were with ACLs added and >>>> > ovn-nbctl - batch mode disabled. >>>> > >>>> > I agree with you. Let me do few more runs to be sure that the results >>>> > are consistent. 
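Following up on Han's earlier suggestion to compare the "lflow_run" coverage counter and to check a perf report, something along these lines should do it (assuming the usual coverage/show appctl command is available in the build; the 60-second sampling window is arbitrary):

# How many full logical-flow recomputes ovn-controller has done so far; compare
# this counter between the I-P and non-I-P runs:
ovs-appctl -t ovn-controller coverage/show | grep lflow_run

# Profile ovn-controller during a test iteration and look for hotspots:
perf record -g -p "$(pidof ovn-controller)" -- sleep 60
perf report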
>>>> > >>>> > Thanks >>>> > Numan >>>> > >>>> > >>>> >> > I will test with OVN 2.9 vs 2.11 master along with what you have >>>> >> > suggested above and see if there are any problems related to ovsdb >>>> >> > monitor condition change. >>>> >> > >>>> >> > Thanks >>>> >> > Numan >>>> >> > >>>> >> > Below are the results >>>> >> > >>>> >> > >>>> >> > With IP master - nbctl daemon node - No batch mode >>>> >> > concurrency - 10 >>>> >> > >>>> >> > +--------------------------------------------------------------------------------------------------------------+ >>>> >> > | Response Times (sec) >>>> >> > | >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> >> > | action | min | median | 90%ile | >>>> >> > 95%ile | max | avg | success | count | >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> >> > | ovn_network.create_routers | 0.269 | 0.661 | 10.426 | >>>> >> > 15.422 | 37.259 | 3.721 | 100.0% | 1000 | >>>> >> > | ovn.create_lswitch | 0.313 | 0.45 | 12.107 | >>>> >> > 15.373 | 30.405 | 4.185 | 100.0% | 1000 | >>>> >> > | ovn_network.connect_network_to_router | 0.163 | 0.255 | 10.121 | >>>> >> > 10.64 | 20.475 | 2.655 | 100.0% | 1000 | >>>> >> > | ovn.create_lport | 0.351 | 0.514 | 12.255 | >>>> >> > 15.511 | 34.74 | 4.621 | 100.0% | 1000 | >>>> >> > | ovn_network.bind_port | 1.362 | 2.447 | 7.34 | >>>> >> > 7.651 | 17.651 | 3.146 | 100.0% | 1000 | >>>> >> > | ovn_network.wait_port_up | 0.086 | 2.734 | 5.272 | >>>> >> > 7.827 | 22.717 | 2.957 | 100.0% | 1000 | >>>> >> > | ovn_network.ping_ports | 0.038 | 10.196 | 20.285 | >>>> >> > 20.39 | 40.74 | 7.52 | 100.0% | 1000 | >>>> >> > | total | 2.862 | 27.267 | 49.956 | >>>> >> > 56.39 | 90.884 | 28.808 | 100.0% | 1000 | >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> >> > Load duration: 2950.4133141 >>>> >> > Full duration: 2951.58845997 seconds >>>> >> > >>>> >> > *********** >>>> >> > With non IP - nbctl daemin node -ACLs - No batch mode >>>> >> > >>>> >> > concurrency - 10 >>>> >> > >>>> >> > +--------------------------------------------------------------------------------------------------------------+ >>>> >> > | Response Times (sec) >>>> >> > | >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> >> > | action | min | median | 90%ile | >>>> >> > 95%ile | max | avg | success | count | >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> >> > | ovn_network.create_routers | 0.267 | 0.421 | 10.395 | >>>> >> > 10.735 | 25.501 | 3.09 | 100.0% | 1000 | >>>> >> > | ovn.create_lswitch | 0.314 | 0.408 | 10.331 | >>>> >> > 10.483 | 25.357 | 3.049 | 100.0% | 1000 | >>>> >> > | ovn_network.connect_network_to_router | 0.153 | 0.249 | 6.552 | >>>> >> > 10.268 | 20.545 | 2.236 | 100.0% | 1000 | >>>> >> > | ovn.create_lport | 0.344 | 0.49 | 10.566 | >>>> >> > 15.428 | 25.542 | 3.906 | 100.0% | 1000 | >>>> >> > | ovn_network.bind_port | 1.372 | 2.409 | 7.437 | >>>> >> > 7.665 | 17.518 | 3.192 | 100.0% | 1000 | >>>> >> > | ovn_network.wait_port_up | 0.086 | 1.323 | 5.157 | >>>> >> > 7.769 | 20.166 | 2.291 | 100.0% | 1000 | >>>> >> > | ovn_network.ping_ports | 0.034 | 2.077 | 10.347 | >>>> >> > 10.427 | 20.307 | 5.123 | 100.0% | 1000 | >>>> >> > | total | 3.109 | 21.26 | 39.245 | >>>> 
>> > 44.495 | 70.197 | 22.889 | 100.0% | 1000 | >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+ >>>> >> > Load duration: 2328.11378407 >>>> >> > Full duration: 2334.43504095 seconds >>>> >> > >>>> >> >>>> >>>> Hi Numan/Daniel, >>>> >>>> I spent some time investigating this problem you reported. Thanks Numan >>>> for the offline help sharing the details. >>>> >>>> Although I still didn't reproduce the slowness in my current single-node >>>> testing env with almost the same steps and ACLs shared by Numan, I think I may >>>> have figured out a highly probable cause of what you have seen. >>>> >>>> Here is my theory: there is a difference between the I-P and non-I-P versions in >>>> the main loop. The non-I-P version checks ofctrl_can_put() before doing >>>> any flow computation (which was introduced to solve a serious performance >>>> problem when there are many OVS flows on a single node, see [1]). When >>>> I worked out the I-P version, I found this may not be the best approach, >>>> since there can be new incremental changes coming and we want to process >>>> them in the current iteration incrementally, so that we don't need to fall back >>>> to recompute in the next iteration. So this logic was changed so that we always >>>> prioritize computing new changes and keeping the desired flow table up to >>>> date, while the in-flight messages to ovs-vswitchd may still be pending for >>>> an older version of the desired state. In the end the final desired state will >>>> be synced again to ovs-vswitchd. If there are new changes that trigger >>>> recompute again, the recompute (which is always slow) will slow down >>>> ofctrl_run(), which keeps sending the old pending messages to ovs-vswitchd from the same main thread. (But it won't cause the original performance problem any more, because the incremental processing engine will not recompute when there is no input change.) >>>> >>>> However, when the test scenario triggers recompute frequently, each single >>>> change may take longer to be enforced in OVS because of this new >>>> approach. The later recompute iterations would slow down installation of the previously >>>> computed OVS flows. In your test you used a parallelism of 10, which >>>> means at any point there might be new changes from one client, such as >>>> creating a new router, that trigger recomputing, which can block the OVS >>>> flow installation triggered earlier for another client. So overall you >>>> will see a much bigger latency for each individual test iteration. >>>> >>>> This can also explain why I didn't reproduce the problem in my >>>> single-client single-node environment, since each iteration is serialized. >>>> >>>> [1] >>>> https://github.com/openvswitch/ovs/commit/74c760c8fe99d554b94423d49d13d5ca3dea0d9e >>>> >>>> To prove this theory, could you help with two tests reusing your >>>> environment? Thanks a lot! >>>> >>> >>> Thanks Han. I will try these and come back to you with the results. >>> >>> Numan >>> >>>> >>>> 1. Instead of parallelism of 10, try 1, to make sure the test is >>>> serialized. I'd expect the results to be similar w/ vs. w/o I-P. >>>> >>>> 2. Try the patch below on the I-P version you are testing, to see if the >>>> problem is gone. 
>>>> ----8><--------------------------------------------><8--------------- >>>> diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c >>>> index 043abd6..0fcaa72 100644 >>>> --- a/ovn/controller/ofctrl.c >>>> +++ b/ovn/controller/ofctrl.c >>>> @@ -985,7 +985,7 @@ add_meter(struct ovn_extend_table_info *m_desired, >>>> * in the correct state and not backlogged with existing flow_mods. (Our >>>> * criteria for being backlogged appear very conservative, but the socket >>>> * between ovn-controller and OVS provides some buffering.) */ >>>> -static bool >>>> +bool >>>> ofctrl_can_put(void) >>>> { >>>> if (state != S_UPDATE_FLOWS >>>> diff --git a/ovn/controller/ofctrl.h b/ovn/controller/ofctrl.h >>>> index ed8918a..2b21c11 100644 >>>> --- a/ovn/controller/ofctrl.h >>>> +++ b/ovn/controller/ofctrl.h >>>> @@ -51,6 +51,7 @@ void ofctrl_put(struct ovn_desired_flow_table *, >>>> const struct sbrec_meter_table *, >>>> int64_t nb_cfg, >>>> bool flow_changed); >>>> +bool ofctrl_can_put(void); >>>> void ofctrl_wait(void); >>>> void ofctrl_destroy(void); >>>> int64_t ofctrl_get_cur_cfg(void); >>>> diff --git a/ovn/controller/ovn-controller.c >>>> b/ovn/controller/ovn-controller.c >>>> index c4883aa..c85c6fa 100644 >>>> --- a/ovn/controller/ovn-controller.c >>>> +++ b/ovn/controller/ovn-controller.c >>>> @@ -1954,7 +1954,7 @@ main(int argc, char *argv[]) >>>> >>>> stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME, >>>> time_msec()); >>>> - if (ovnsb_idl_txn) { >>>> + if (ovnsb_idl_txn && ofctrl_can_put()) { >>>> engine_run(&en_flow_output, ++engine_run_id); >>>> } >>>> stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME, >> >> >> >> Hi Han, >> >> So far I could do just one run after applying your above suggested patch >> with the I-P version and results look promising. >> It seems to me the problem is gone. >> >> +--------------------------------------------------------------------------------------------------------------------------+ >> | Response Times (sec) >> | >> +----------------------------------+--------+----------+----------+----------+---------+---------+------------+-------+ >> | action | min | median | 90%ile | 95%ile | >> max | avg | success | count | >> +----------------------------------+--------+----------+----------+----------+---------+---------+------------+-------+ >> | ovn_network.ping_ports | 0.037 | 10.236 | 10.392 | 10.462 | 20.455 | >> 7.15 | 100.0% | 1000 | >> +----------------------------------+--------+----------+----------+----------+---------+---------+------------+-------+ >> | ovn_network.ping_ports | 0.036 | 10.255 | 10.448 | 11.323 | 20.791 | >> 7.83 | 100.0% | 1000 | >> +----------------------------------+--------+----------+----------+----------+---------+---------+------------+-------+ >> >> The first row represents Non IP and the 2nd row represents IP + your >> suggested patch. >> The values are comparable and lot better compared to without your patch. >> >> On monday I will do more runs to be sure that the data is consistent and get >> back to you. >> >> If the results are consistent, I would try to run the tests which Daniel and >> Lucas ran on an openstack deployment. >> >> Thanks >> Numan >> > > Glad to see the test result improved! Thanks a lot and looking forward to > more data. Once it is finally confirmed, we can discuss whether this should > be submitted as a formal patch considering real world scenarios. _______________________________________________ discuss mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
