Hi Han,
> On 21 Jun 2019, at 08:16, Han Zhou <[email protected]> wrote:
>
> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <[email protected]> wrote:
> >
> > Thanks a lot Han for the answer!
> >
> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou <[email protected]> wrote:
> > >
> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <[email protected]> wrote:
> > > >
> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez <[email protected]> wrote:
> > > > >
> > > > > Hi Han, all,
> > > > >
> > > > > Lucas, Numan and I have been doing some 'scale' testing of OpenStack using OVN and wanted to present some results and issues that we've found with the Incremental Processing feature in ovn-controller. Below is the scenario that we executed:
> > > > >
> > > > > * 7-node baremetal setup: 3 controllers (running ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS 2.10.
> > > > > * The test consists of:
> > > > >   - Create an OpenStack network (OVN LS), subnet and router
> > > > >   - Attach the subnet to the router and set the gateway to the external network
> > > > >   - Create an OpenStack port and apply a Security Group (ACLs to allow UDP, SSH and ICMP)
> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by attaching it to a network namespace
> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> > > > >   - Wait until the test can ping the port
> > > > > * Running browbeat/rally with 16 simultaneous processes to execute the test above 150 times.
> > > > > * When all 150 'fake VMs' are created, browbeat deletes all the OpenStack/OVN resources.
> > > > >
> > > > > We first tried with OVS/OVN 2.10 and pulled some results which showed 100% success, but ovn-controller is quite loaded (as expected) on all the nodes, especially during the deletion phase:
> > > > >
> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> > > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
> > > > >
> > > > > After conducting the tests above, we replaced ovn-controller on all 7 nodes with the one from the current master branch (actually from last week). We also replaced ovn-northd and the ovsdb-servers, but ovs-vswitchd was left untouched (still on 2.10). The expected result was lower ovn-controller CPU usage and also better times thanks to the recently introduced Incremental Processing feature. However, the results don't look very good:
> > > > >
> > > > > - Compute node: https://imgur.com/a/wuq87F1
> > > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp
> > > > >
> > > > > One thing we can tell from the ovs-vswitchd CPU consumption is that it's much lower in the Incremental Processing (IP) case, which apparently doesn't make much sense. This led us to think that perhaps ovn-controller was not installing the necessary flows in the switch, and we confirmed this hypothesis by looking into the dataplane results. Out of the 150 VMs, 10% of them were unreachable via ping when using ovn-controller from master.
> > > > >
> > > > > @Han, others, do you have any ideas as to what could be happening here? We'll be able to use this setup for a few more days, so let me know if you want us to pull some other data/traces, ...
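(To make the scenario above easier to reproduce outside of browbeat/rally, each iteration boils down to roughly the commands below. The names, the 10.1.0.0/24 range and the 'public' external network are illustrative placeholders only; the real test drives all of this through rally plugins.)

    # Create the OpenStack/OVN resources for one iteration (illustrative names).
    openstack network create net_0
    openstack subnet create --network net_0 --subnet-range 10.1.0.0/24 subnet_0
    openstack router create router_0
    openstack router add subnet router_0 subnet_0
    openstack router set --external-gateway public router_0

    openstack security group create sg_0
    openstack security group rule create --protocol icmp sg_0
    openstack security group rule create --protocol udp sg_0
    openstack security group rule create --protocol tcp --dst-port 22 sg_0
    openstack port create --network net_0 --security-group sg_0 port_0

    # "Bind" the port on a random compute node as a fake VM: an OVS internal
    # port in a network namespace whose iface-id matches the Neutron port UUID.
    PORT_ID=$(openstack port show port_0 -f value -c id)
    MAC=$(openstack port show port_0 -f value -c mac_address)
    ip netns add fake_vm_0
    ovs-vsctl add-port br-int fake_vm_0 -- \
        set Interface fake_vm_0 type=internal external_ids:iface-id=$PORT_ID
    ip link set fake_vm_0 netns fake_vm_0
    ip netns exec fake_vm_0 ip link set fake_vm_0 address $MAC
    ip netns exec fake_vm_0 ip link set fake_vm_0 up
    # ...then configure the Neutron-allocated fixed IP in the namespace, wait
    # for the port to go ACTIVE, and wait until it answers pings.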
> > > > >
> > > > > Some other interesting things:
> > > > > On each of the compute nodes (with an almost evenly distributed number of logical ports bound to them), the max number of logical flows in br-int is ~90K (by the end of the test, right before deleting the resources).
> > > > >
> > > > > It looks like with the IP version, ovn-controller leaks some memory: https://imgur.com/a/trQrhWd
> > > > > While with OVS 2.10, it remains pretty flat during the test: https://imgur.com/a/KCkIT4O
> > > >
> > > > Hi Daniel, Han,
> > > >
> > > > I just sent a small patch for the ovn-controller memory leak: https://patchwork.ozlabs.org/patch/1113758/
> > > >
> > > > At least on my setup this is what valgrind was pointing at.
> > > >
> > > > Cheers,
> > > > Dumitru
> > > > >
> > > > > Looking forward to hearing back :)
> > > > > Daniel
> > > > >
> > > > > PS. Sorry for my previous email, I sent it by mistake without the subject
> > >
> > > Thanks Daniel for the testing and reporting, and thanks Dumitru for fixing the memory leak.
> > >
> > > Currently ovn-controller incremental processing only handles the SB changes below incrementally:
> > > - logical_flow
> > > - port_binding (for regular VIF bindings NOT on the current chassis)
> > > - mc_group
> > > - address_set
> > > - port_group
> > > - mac_binding
> > >
> > > So, in the test scenario you described, since each iteration creates a network (SB datapath changes) and router ports (port_binding changes for non-VIF ports), incremental processing would not help much, because most steps in your test trigger a recompute. It would help if you created more fake VMs in each iteration, e.g. 10 VMs or more on each LS. Secondly, when a VIF port binding happens on the current chassis, ovn-controller will still do a recompute, and because you have only 4 compute nodes, 1/4 of the compute nodes will still recompute even when binding a regular VIF port. With more compute nodes you would see incremental processing being more effective.
> >
> > Got it, it makes sense (although then, in the worst case, it should be at least what we had before and not worse; but it can also be because we're mixing versions here: 2.10 vs master).
> >
> > > However, what really worries me is the 10% of VMs being unreachable. I have one point of confusion here on the test steps. The last step you described was "Wait until the test can ping the port". So if the VM is not pingable, the test won't continue?
> >
> > Sorry, I should've explained it better. We wait for 2 minutes for the port to respond to pings; if it's not reachable, we continue with the next port (16 rally processes are running simultaneously, so the rest of the processes may be doing stuff at the same time).
> >
> > > To debug the problem, the first thing is to identify what flows are missing for the VMs that are unreachable. Could you do ovs-appctl ofproto/trace for the ICMP flow of any VM with a ping failure? And then, please enable debug logging for ovn-controller with "ovs-appctl -t ovn-controller vlog/set file:dbg". There may be too many logs, so please enable it only for as short a time as it takes to reproduce a VM with a ping failure.
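Sure. For the record, on the compute node hosting an unreachable fake VM that would look something like the commands below (the interface name, MACs and IPs are placeholders for the actual port's values):

    # Find the OpenFlow port number of the fake VM's OVS internal port.
    ofport=$(ovs-vsctl get Interface fake_vm_0 ofport)

    # Trace an ICMP echo from the fake VM towards its gateway through br-int.
    ovs-appctl ofproto/trace br-int \
        "in_port=$ofport,icmp,dl_src=fa:16:3e:11:22:33,dl_dst=fa:16:3e:44:55:66,nw_src=10.1.0.10,nw_dst=10.1.0.1"

    # Turn ovn-controller debug logging on only while reproducing the failure,
    # then back to info to keep the log size manageable.
    ovs-appctl -t ovn-controller vlog/set file:dbg
    # ... reproduce the ping failure ...
    ovs-appctl -t ovn-controller vlog/set file:info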
> > > If the last step "wait until the test can ping the port" is there, then it should be able to detect the first occurrence if the VM is not reachable in e.g. 30 sec.
> >
> > We'll need to hack a bit here but let's see :)
> >
> > > In ovn-scale-test we didn't have a data plane test, but this problem was not seen in our live environment either, at a far larger scale. The major differences between your test and our environment are:
> > > - We are running with an older version, so there might be some rebase/refactor problem that caused this. To eliminate this, I'd suggest trying a branch I created for 2.10 (https://github.com/hzhou8/ovs/tree/ip12_rebase_on_2.10), which matches the base test you did, which is also 2.10. It would also eliminate any compatibility problem, if there is one, between the OVN master branch and the OVS 2.10 you mentioned is used in the test.
> > > - We don't use Security Groups (I guess the ~90k OVS flows you mentioned were mainly introduced by the Security Group use, if all ports were put in the same group). Incremental processing is expected to be correct for security groups, handling them incrementally thanks to the address_set and port_group incremental processing. However, since the testing relied only on the regression tests, I am not 100% sure the test coverage was sufficient. So could you try disabling Security Groups to rule out the problem?
> >
> > Ok, will try to repeat the tests without the SGs.
> >
> > > Thanks,
> > > Han
> >
> > Thanks once again!
> > Daniel
>
> Hi Daniel,
>
> Any updates? Do you still see the 10% VM unreachable?
>
I've been working with Numan on this for the past few days and he will send a more detailed update. The thing is that what I reported as unreachable was measured with a 2-minute timeout waiting for the ping. However, increasing the timeout further showed that all the VMs were in fact reachable, but some of them took 3-4 minutes to respond to ping from the moment they became active. This is what we used for testing: https://github.com/danalsan/browbeat/commit/0ff72da52ddf17aa9f7269f191eebd890899bdad

Numan will update soon with the great findings he has made over the past few days.

Thanks!
Daniel

> Thanks,
> Han
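PS. To make the measurement above concrete: the idea is to keep pinging the port with a configurable timeout and record how long it takes to answer, rather than giving up at 2 minutes. A rough bash sketch of that wait step (with $VM_IP standing in for the fake VM's fixed IP):

    # Wait for the fake VM to answer pings, up to TIMEOUT seconds.
    TIMEOUT=${TIMEOUT:-240}          # the earlier runs gave up at 120s
    start=$SECONDS
    until ping -c 1 -W 1 "$VM_IP" >/dev/null 2>&1; do
        if [ $((SECONDS - start)) -ge "$TIMEOUT" ]; then
            echo "port $VM_IP still unreachable after ${TIMEOUT}s" >&2
            exit 1
        fi
        sleep 1
    done
    echo "port $VM_IP became reachable after $((SECONDS - start))s"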
