Thanks Numan for running these tests outside OpenStack!

On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique <nusid...@redhat.com> wrote:
>
>
>
> On Tue, Jul 9, 2019 at 11:05 AM Han Zhou <zhou...@gmail.com> wrote:
>>
>>
>>
>> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou <zhou...@gmail.com> wrote:
>> >
>> >
>> >
>> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <nusid...@redhat.com> 
>> > wrote:
>> > >
>> > >
>> > >
>> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou <zhou...@gmail.com> wrote:
>> > >>
>> > >>
>> > >>
>> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez 
>> > >> <dalva...@redhat.com> wrote:
>> > >> >
>> > >> > Thanks a lot Han for the answer!
>> > >> >
>> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou <zhou...@gmail.com> wrote:
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <dce...@redhat.com> 
>> > >> > > wrote:
>> > >> > > >
>> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>> > >> > > > <dalva...@redhat.com> wrote:
>> > >> > > > >
>> > >> > > > > Hi Han, all,
>> > >> > > > >
>> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of 
>> > >> > > > > OpenStack
>> > >> > > > > using OVN and wanted to present some results and issues that 
>> > >> > > > > we've
>> > >> > > > > found with the Incremental Processing feature in 
>> > >> > > > > ovn-controller. Below
>> > >> > > > > is the scenario that we executed:
>> > >> > > > >
>> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute 
>> > >> > > > > nodes. OVS
>> > >> > > > > 2.10.
>> > >> > > > > * The test consists of:
>> > >> > > > >   - Create openstack network (OVN LS), subnet and router
>> > >> > > > >   - Attach subnet to the router and set gw to the external 
>> > >> > > > > network
>> > >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs 
>> > >> > > > > to allow
>> > >> > > > > UDP, SSH and ICMP).
>> > >> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
>> > >> > > > > attaching it to a network namespace.
>> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in 
>> > >> > > > > NB)
>> > >> > > > >   - Wait until the test can ping the port
>> > >> > > > > * Running browbeat/rally with 16 simultaneous processes to
>> > >> > > > > execute the test above 150 times.
>> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete 
>> > >> > > > > all
>> > >> > > > > the OpenStack/OVN resources.
>> > >> > > > >
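For readers outside OpenStack, here is a rough sketch of what each iteration above amounts to, using plain OpenStack/OVS CLI (all names, the CIDR and the external network name are illustrative; the actual browbeat/rally plugin does the equivalent through the APIs):

    # Network, subnet, router, external gateway
    openstack network create net-1
    openstack subnet create --network net-1 --subnet-range 10.0.1.0/24 subnet-1
    openstack router create router-1
    openstack router add subnet router-1 subnet-1
    openstack router set --external-gateway public router-1

    # Security group allowing UDP, SSH and ICMP
    openstack security group create sg-1
    openstack security group rule create --protocol icmp sg-1
    openstack security group rule create --protocol tcp --dst-port 22 sg-1
    openstack security group rule create --protocol udp sg-1

    # Port with the security group applied
    PORT_ID=$(openstack port create --network net-1 --security-group sg-1 \
              -f value -c id port-1)

    # "Fake VM" binding on a compute node: attach an internal OVS interface
    # to a namespace and set iface-id to the Neutron port UUID.
    ip netns add ns-1
    ovs-vsctl add-port br-int vm-1 -- set Interface vm-1 type=internal \
        external_ids:iface-id="$PORT_ID"
    ip link set vm-1 netns ns-1
    ip netns exec ns-1 ip link set vm-1 up
    # (assign the port's MAC/IP inside the namespace, wait for the LSP to
    # report up=true in the NB DB, then ping the port from the test node)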
>> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which 
>> > >> > > > > showed
>> > >> > > > > 100% success but ovn-controller is quite loaded (as expected) 
>> > >> > > > > in all
>> > >> > > > > the nodes especially during the deletion phase:
>> > >> > > > >
>> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): 
>> > >> > > > > https://imgur.com/a/8ffKKYF
>> > >> > > > >
>> > >> > > > > After conducting the tests above, we replaced ovn-controller in 
>> > >> > > > > all 7
>> > >> > > > > nodes by the one with the current master branch (actually from 
>> > >> > > > > last
>> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
>> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The 
>> > >> > > > > expected
>> > >> > > > > results were to get less ovn-controller CPU usage and also 
>> > >> > > > > better
>> > >> > > > > times due to the Incremental Processing feature introduced 
>> > >> > > > > recently.
>> > >> > > > > However, the results don't look very good:
>> > >> > > > >
>> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1
>> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): 
>> > >> > > > > https://imgur.com/a/99kiyDp
>> > >> > > > >
>> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU 
>> > >> > > > > consumption is
>> > >> > > > > that it's much less in the Incremental Processing (IP) case 
>> > >> > > > > which
>> > >> > > > > apparently doesn't make much sense. This led us to think that 
>> > >> > > > > perhaps
>> > >> > > > > ovn-controller was not installing the necessary flows in the 
>> > >> > > > > switch
>> > >> > > > > and we confirmed this hypothesis by looking into the dataplane
>> > >> > > > > results. Out of the 150 VMs, 10% of them were unreachable via 
>> > >> > > > > ping
>> > >> > > > > when using ovn-controller from master.
>> > >> > > > >
>> > >> > > > > @Han, others, do you have any ideas as of what could be 
>> > >> > > > > happening
>> > >> > > > > here? We'll be able to use this setup for a few more days so 
>> > >> > > > > let me
>> > >> > > > > know if you want us to pull some other data/traces, ...
>> > >> > > > >
>> > >> > > > > Some other interesting things:
>> > >> > > > > On each of the compute nodes (with an almost evenly distributed
>> > >> > > > > number of logical ports bound to them), the maximum number of
>> > >> > > > > logical flows in br-int is ~90K (by the end of the test, right
>> > >> > > > > before deleting the resources).
>> > >> > > > >
>> > >> > > > > It looks like with the IP version, ovn-controller leaks some 
>> > >> > > > > memory:
>> > >> > > > > https://imgur.com/a/trQrhWd
>> > >> > > > > While with OVS 2.10, it remains pretty flat during the test:
>> > >> > > > > https://imgur.com/a/KCkIT4O
>> > >> > > >
>> > >> > > > Hi Daniel, Han,
>> > >> > > >
>> > >> > > > I just sent a small patch for the ovn-controller memory leak:
>> > >> > > > https://patchwork.ozlabs.org/patch/1113758/
>> > >> > > >
>> > >> > > > At least on my setup this is what valgrind was pointing at.
>> > >> > > >
>> > >> > > > Cheers,
>> > >> > > > Dumitru
>> > >> > > >
>> > >> > > > >
>> > >> > > > > Looking forward to hearing back :)
>> > >> > > > > Daniel
>> > >> > > > >
>> > >> > > > > PS. Sorry for my previous email, I sent it by mistake without 
>> > >> > > > > the subject
>> > >> > >
>> > >> > > Thanks Daniel for the testing and reporting, and thanks Dumitru for 
>> > >> > > fixing the memory leak.
>> > >> > >
>> > >> > > Currently ovn-controller incremental processing only handles the
>> > >> > > following SB changes incrementally:
>> > >> > > - logical_flow
>> > >> > > - port_binding (for regular VIF binding NOT on current chassis)
>> > >> > > - mc_group
>> > >> > > - address_set
>> > >> > > - port_group
>> > >> > > - mac_binding
>> > >> > >
>> > >> > > So, in the test scenario you described, since each iteration creates
>> > >> > > a network (SB datapath changes) and router ports (port_binding
>> > >> > > changes for non-VIF ports), incremental processing would not help
>> > >> > > much, because most steps in your test trigger a recompute. It would
>> > >> > > help if you created more fake VMs in each iteration, e.g. 10 VMs or
>> > >> > > more on each LS. Secondly, when a VIF port binding happens on the
>> > >> > > current chassis, ovn-controller still does a full recompute, and
>> > >> > > because you have only 4 compute nodes, roughly 1/4 of the bindings
>> > >> > > will still trigger a recompute on the binding node even for a
>> > >> > > regular VIF port. With more compute nodes you would see incremental
>> > >> > > processing become more effective.
>> > >> >
>> > >> > Got it, that makes sense (although even in the worst case it should
>> > >> > perform at least as well as before, not worse; it could also be
>> > >> > because we're mixing versions here: 2.10 vs master).
>> > >> > >
>> > >> > > However, what really worries me is the 10% of VMs being
>> > >> > > unreachable. One thing I'm confused about in the test steps: the
>> > >> > > last step you described was "Wait until the test can ping the
>> > >> > > port". So if the VM is not pingable, the test won't continue?
>> > >> >
>> > >> > Sorry, I should've explained it better. We wait up to 2 minutes for
>> > >> > the port to respond to pings; if it's not reachable, we continue
>> > >> > with the next port (16 rally processes are running simultaneously,
>> > >> > so the other processes may be doing work at the same time).
>> > >> >
>> > >> > >
>> > >> > > To debug the problem, the first thing is to identify which flows
>> > >> > > are missing for the VMs that are unreachable. Could you run
>> > >> > > ovs-appctl ofproto/trace for the ICMP flow of any VM with a ping
>> > >> > > failure? Then please enable debug logging for ovn-controller with
>> > >> > > "ovs-appctl -t ovn-controller vlog/set file:dbg". The logs can be
>> > >> > > very verbose, so please enable it only for as short a time as it
>> > >> > > takes to reproduce a ping failure. If the last step "wait until the
>> > >> > > test can ping the port" is in place, it should catch the first
>> > >> > > occurrence where a VM is not reachable within, e.g., 30 sec.
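For reference, the commands Han suggests could look roughly like this on the compute node hosting an unreachable port (interface name and the addresses in the trace are illustrative placeholders):

    # Trace the ICMP flow through br-int for the suspect port.
    OFPORT=$(ovs-vsctl get Interface vm-1 ofport)
    ovs-appctl ofproto/trace br-int \
        "in_port=$OFPORT,icmp,dl_src=<vm-mac>,dl_dst=<gw-mac>,nw_src=<vm-ip>,nw_dst=<ping-src-ip>"

    # Enable ovn-controller debug logging only around the failure window,
    # then turn it back down to keep the log size manageable.
    ovs-appctl -t ovn-controller vlog/set file:dbg
    # ... reproduce the ping failure ...
    ovs-appctl -t ovn-controller vlog/set file:info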
>> > >> >
>> > >> > We'll need to hack a bit here but let's see :)
>> > >> > >
>> > >> > > In ovn-scale-test we didn't have a data plane test, but this
>> > >> > > problem was not seen in our live environment either, at a far
>> > >> > > larger scale. The major differences between your test and our
>> > >> > > environment are:
>> > >> > > - We are running an older version, so a rebase/refactor problem
>> > >> > > might have caused this. To rule that out, I'd suggest trying a
>> > >> > > branch I created for 2.10
>> > >> > > (https://github.com/hzhou8/ovs/tree/ip12_rebase_on_2.10), which
>> > >> > > matches the 2.10 base you tested. It would also eliminate any
>> > >> > > compatibility problem, if there is one, between the OVN master
>> > >> > > branch and OVS 2.10, which you mentioned is used in the test.
>> > >> > > - We don't use Security Groups (I guess the ~90k OVS flows you
>> > >> > > mentioned were mainly introduced by the Security Group, if all
>> > >> > > ports were put in the same group). Incremental processing is
>> > >> > > expected to be correct for security groups, and to handle them
>> > >> > > incrementally thanks to the address_set and port_group incremental
>> > >> > > processing. However, since the testing relied only on the
>> > >> > > regression tests, I am not 100% sure the coverage was sufficient.
>> > >> > > So could you try disabling the Security Group to rule out that
>> > >> > > problem?
>> > >> >
>> > >> > Ok will try to repeat the tests without the SGs.
>> > >> > >
>> > >> > > Thanks,
>> > >> > > Han
>> > >> >
>> > >> > Thanks once again!
>> > >> > Daniel
>> > >>
>> > >> Hi Daniel,
>> > >>
>> > >> Any updates? Do you still see the 10% of VMs unreachable?
>> > >>
>> > >>
>> > >> Thanks,
>> > >> Han
>> > >
>> > >
>> > > Hi Han,
>> > >
>> > > As such there is no datapath impact. After increasing the ping wait
>> > > timeout from 120 seconds to 180 seconds, it's 100% now.
>> > >
>> > > But the time taken to program the flows is much higher compared to OVN
>> > > master without the IP patches.
>> > > Here is some data - http://paste.openstack.org/show/753224/ . I am
>> > > still investigating it and will share my findings soon.
>> > >
>> > > Please see the times for the action - vm.wait_for_ping
>> > >
>> >
>> > Thanks Numan for the investigation and update. Glad to hear there is no
>> > correctness issue, but sorry for the slowness in your test scenario. I
>> > expect the operations in your test to trigger recomputes, so the worst
>> > case should be similar performance to running without I-P. It is weird
>> > that it turned out so much slower in your test. There can be some extra
>> > overhead when it tries to do incremental processing and then falls back
>> > to a full recompute, but that shouldn't cause such a big difference. It
>> > might be that for some reason the main loop iteration is triggered more
>> > times than necessary. I'd suggest comparing the coverage counter
>> > "lflow_run" between the tests, and also checking a perf report to see if
>> > the hotspot is somewhere else. (Sorry that I can't provide full-time
>> > help now since I am still on vacation, but I will try to be useful if
>> > things are blocked.)
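A rough sketch of that comparison (assuming perf is available on the compute nodes; the coverage counter name is the one Han mentions):

    # How many full logical-flow recomputes ovn-controller has done; compare
    # the delta over the same test window on the I-P and non-I-P builds.
    ovs-appctl -t ovn-controller coverage/show | grep lflow_run

    # Look for hotspots in ovn-controller while the test is running.
    perf record -g -p "$(pidof ovn-controller)" -- sleep 60
    perf report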
>>
>> Hi Numan/Daniel, do you have any new findings on why I-P got worse results
>> in your test? The extremely long latency (2-3 min) shown in your report
>> reminds me of a similar problem I reported before:
>> https://mail.openvswitch.org/pipermail/ovs-dev/2018-April/346321.html
>>
>> The root cause of that problem was still not clear. In that report, the
>> extremely long latency (7 min) was observed without I-P and it didn't
>> happen with I-P. If it is the same problem, then I suspect it is not
>> related to I-P vs. non-I-P, but to some issue with ovsdb monitor condition
>> changes. To confirm whether it is the same problem, could you:
>> 1. pause the test when the scale is big enough (e.g. when the test is
>> almost completed), and then
>> 2. enable the ovn-controller debug log, and then
>> 3. run one more iteration of the test, and see whether the time is spent
>> waiting for the SB DB update notification.
>>
>> Please ignore my speculation above if you already found the root cause and 
>> it would be great if you could share it :)
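A minimal sketch of that check (the log path varies by distro, e.g. /var/log/openvswitch/ovn-controller.log, and enabling the jsonrpc module is optional):

    # With the scale built up and the test paused, raise ovn-controller
    # logging before running a single extra iteration.
    ovs-appctl -t ovn-controller vlog/set file:dbg
    ovs-appctl -t ovn-controller vlog/set jsonrpc:file:dbg

    # ... run one more iteration of the test ...

    # Inspect the timestamps in the ovn-controller log to see how much time
    # passes between the NB-side change and the SB update notification
    # arriving, then turn logging back down.
    ovs-appctl -t ovn-controller vlog/set file:info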
>
>
> Thanks for sharing this Han.
>
> I do not have any new findings. Yesterday I ran ovn-scale-test comparing OVN 
> with IP vs without IP (using the master branch).
> The test creates a new logical switch, adds it to a router, adds a few
> ACLs, creates 2 logical ports, and pings between them.
> I am using a physical deployment which creates actual namespaces instead
> of sandboxes.
>
> The results don't show any huge difference between the two.
~2334 vs ~2952 seconds full duration, or 44.5 vs 56.4 seconds for the
95%ile? That is not negligible IMHO; it's a >25% penalty with IP in both
metrics. Maybe I missed something in the results?

> I will test with OVN 2.9 vs 2.11 master along with what you have suggested 
> above and see if there are any problems related to ovsdb monitor condition 
> change.
>
> Thanks
> Numan
>
> Below are the results
>
>
> With IP master - nbctl daemon node - No batch mode
> concurrency - 10
>
> +--------------------------------------------------------------------------------------------------------------+
> |                                             Response Times (sec)                                              |
> +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
> | action                                | min   | median | 90%ile | 95%ile | max    | avg    | success | count |
> +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
> | ovn_network.create_routers            | 0.269 | 0.661  | 10.426 | 15.422 | 37.259 | 3.721  | 100.0%  | 1000  |
> | ovn.create_lswitch                    | 0.313 | 0.45   | 12.107 | 15.373 | 30.405 | 4.185  | 100.0%  | 1000  |
> | ovn_network.connect_network_to_router | 0.163 | 0.255  | 10.121 | 10.64  | 20.475 | 2.655  | 100.0%  | 1000  |
> | ovn.create_lport                      | 0.351 | 0.514  | 12.255 | 15.511 | 34.74  | 4.621  | 100.0%  | 1000  |
> | ovn_network.bind_port                 | 1.362 | 2.447  | 7.34   | 7.651  | 17.651 | 3.146  | 100.0%  | 1000  |
> | ovn_network.wait_port_up              | 0.086 | 2.734  | 5.272  | 7.827  | 22.717 | 2.957  | 100.0%  | 1000  |
> | ovn_network.ping_ports                | 0.038 | 10.196 | 20.285 | 20.39  | 40.74  | 7.52   | 100.0%  | 1000  |
> | total                                 | 2.862 | 27.267 | 49.956 | 56.39  | 90.884 | 28.808 | 100.0%  | 1000  |
> +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
> Load duration: 2950.4133141
> Full duration: 2951.58845997 seconds
>
> ***********
> With non IP - nbctl daemon node - ACLs - No batch mode
>
> concurrency - 10
>
> +--------------------------------------------------------------------------------------------------------------+
> |                                             Response Times (sec)                                              |
> +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
> | action                                | min   | median | 90%ile | 95%ile | max    | avg    | success | count |
> +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
> | ovn_network.create_routers            | 0.267 | 0.421  | 10.395 | 10.735 | 25.501 | 3.09   | 100.0%  | 1000  |
> | ovn.create_lswitch                    | 0.314 | 0.408  | 10.331 | 10.483 | 25.357 | 3.049  | 100.0%  | 1000  |
> | ovn_network.connect_network_to_router | 0.153 | 0.249  | 6.552  | 10.268 | 20.545 | 2.236  | 100.0%  | 1000  |
> | ovn.create_lport                      | 0.344 | 0.49   | 10.566 | 15.428 | 25.542 | 3.906  | 100.0%  | 1000  |
> | ovn_network.bind_port                 | 1.372 | 2.409  | 7.437  | 7.665  | 17.518 | 3.192  | 100.0%  | 1000  |
> | ovn_network.wait_port_up              | 0.086 | 1.323  | 5.157  | 7.769  | 20.166 | 2.291  | 100.0%  | 1000  |
> | ovn_network.ping_ports                | 0.034 | 2.077  | 10.347 | 10.427 | 20.307 | 5.123  | 100.0%  | 1000  |
> | total                                 | 3.109 | 21.26  | 39.245 | 44.495 | 70.197 | 22.889 | 100.0%  | 1000  |
> +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
> Load duration: 2328.11378407
> Full duration: 2334.43504095 seconds
>
>
>>
>>
>> Thanks,
>> Han