Neat! Thanks folks :)
I'll try to get an OSP setup where we can patch this and re-run the
same tests as last time to confirm, but this looks promising.

On Fri, Jul 19, 2019 at 11:12 PM Han Zhou <[email protected]> wrote:
>
>
>
> On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique <[email protected]> wrote:
>>
>>
>>
>> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique <[email protected]> wrote:
>>>
>>>
>>>
>>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique <[email protected]> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez 
>>>> > <[email protected]> wrote:
>>>> >>
>>>> >> Thanks Numan for running these tests outside OpenStack!
>>>> >>
>>>> >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique <[email protected]> 
>>>> >> wrote:
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou <[email protected]> wrote:
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou <[email protected]> wrote:
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique 
>>>> >> >> > <[email protected]> wrote:
>>>> >> >> > >
>>>> >> >> > >
>>>> >> >> > >
>>>> >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou <[email protected]> 
>>>> >> >> > > wrote:
>>>> >> >> > >>
>>>> >> >> > >>
>>>> >> >> > >>
>>>> >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez 
>>>> >> >> > >> <[email protected]> wrote:
>>>> >> >> > >> >
>>>> >> >> > >> > Thanks a lot Han for the answer!
>>>> >> >> > >> >
>>>> >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou <[email protected]> 
>>>> >> >> > >> > wrote:
>>>> >> >> > >> > >
>>>> >> >> > >> > >
>>>> >> >> > >> > >
>>>> >> >> > >> > >
>>>> >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara 
>>>> >> >> > >> > > <[email protected]> wrote:
>>>> >> >> > >> > > >
>>>> >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>>>> >> >> > >> > > > <[email protected]> wrote:
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > Hi Han, all,
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing 
>>>> >> >> > >> > > > > of OpenStack
>>>> >> >> > >> > > > > using OVN and wanted to present some results and issues 
>>>> >> >> > >> > > > > that we've
>>>> >> >> > >> > > > > found with the Incremental Processing feature in 
>>>> >> >> > >> > > > > ovn-controller. Below
>>>> >> >> > >> > > > > is the scenario that we executed:
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>>>> >> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 
>>>> >> >> > >> > > > > compute nodes. OVS
>>>> >> >> > >> > > > > 2.10.
>>>> >> >> > >> > > > > * The test consists of:
>>>> >> >> > >> > > > >   - Create openstack network (OVN LS), subnet and router
>>>> >> >> > >> > > > >   - Attach subnet to the router and set gw to the 
>>>> >> >> > >> > > > > external network
>>>> >> >> > >> > > > >   - Create an OpenStack port and apply a Security Group 
>>>> >> >> > >> > > > > (ACLs to allow
>>>> >> >> > >> > > > > UDP, SSH and ICMP).
>>>> >> >> > >> > > > >   - Bind the port to one of the 4 compute nodes 
>>>> >> >> > >> > > > > (randomly) by
>>>> >> >> > >> > > > > attaching it to a network namespace.
>>>> >> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == 
>>>> >> >> > >> > > > > True' in NB)
>>>> >> >> > >> > > > >   - Wait until the test can ping the port
>>>> >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous processes 
>>>> >> >> > >> > > > > to execute the
>>>> >> >> > >> > > > > test above 150 times.
>>>> >> >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat 
>>>> >> >> > >> > > > > will delete all
>>>> >> >> > >> > > > > the OpenStack/OVN resources.
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some 
>>>> >> >> > >> > > > > results which showed
>>>> >> >> > >> > > > > 100% success but ovn-controller is quite loaded (as 
>>>> >> >> > >> > > > > expected) in all
>>>> >> >> > >> > > > > the nodes especially during the deletion phase:
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>>>> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): 
>>>> >> >> > >> > > > > https://imgur.com/a/8ffKKYF
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > After conducting the tests above, we replaced 
>>>> >> >> > >> > > > > ovn-controller in all 7
>>>> >> >> > >> > > > > nodes by the one with the current master branch 
>>>> >> >> > >> > > > > (actually from last
>>>> >> >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers 
>>>> >> >> > >> > > > > but the
>>>> >> >> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). 
>>>> >> >> > >> > > > > The expected
>>>> >> >> > >> > > > > results were to get less ovn-controller CPU usage and 
>>>> >> >> > >> > > > > also better
>>>> >> >> > >> > > > > times due to the Incremental Processing feature 
>>>> >> >> > >> > > > > introduced recently.
>>>> >> >> > >> > > > > However, the results don't look very good:
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1
>>>> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): 
>>>> >> >> > >> > > > > https://imgur.com/a/99kiyDp
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU 
>>>> >> >> > >> > > > > consumption is
>>>> >> >> > >> > > > > that it's much less in the Incremental Processing (IP) 
>>>> >> >> > >> > > > > case which
>>>> >> >> > >> > > > > apparently doesn't make much sense. This led us to 
>>>> >> >> > >> > > > > think that perhaps
>>>> >> >> > >> > > > > ovn-controller was not installing the necessary flows 
>>>> >> >> > >> > > > > in the switch
>>>> >> >> > >> > > > > and we confirmed this hypothesis by looking into the 
>>>> >> >> > >> > > > > dataplane
>>>> >> >> > >> > > > > results. Out of the 150 VMs, 10% of them were 
>>>> >> >> > >> > > > > unreachable via ping
>>>> >> >> > >> > > > > when using ovn-controller from master.
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > @Han, others, do you have any ideas as to what could be 
>>>> >> >> > >> > > > > happening
>>>> >> >> > >> > > > > here? We'll be able to use this setup for a few more 
>>>> >> >> > >> > > > > days so let me
>>>> >> >> > >> > > > > know if you want us to pull some other data/traces, ...
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > Some other interesting things:
>>>> >> >> > >> > > > > On each of the compute nodes, (with an almost evenly 
>>>> >> >> > >> > > > > distributed
>>>> >> >> > >> > > > > number of logical ports bound to them), the max amount 
>>>> >> >> > >> > > > > of logical
>>>> >> >> > >> > > > > flows in br-int is ~90K (by the end of the test, right 
>>>> >> >> > >> > > > > before deleting
>>>> >> >> > >> > > > > the resources).
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > It looks like with the IP version, ovn-controller leaks 
>>>> >> >> > >> > > > > some memory:
>>>> >> >> > >> > > > > https://imgur.com/a/trQrhWd
>>>> >> >> > >> > > > > While with OVS 2.10, it remains pretty flat during the 
>>>> >> >> > >> > > > > test:
>>>> >> >> > >> > > > > https://imgur.com/a/KCkIT4O
>>>> >> >> > >> > > >
>>>> >> >> > >> > > > Hi Daniel, Han,
>>>> >> >> > >> > > >
>>>> >> >> > >> > > > I just sent a small patch for the ovn-controller memory 
>>>> >> >> > >> > > > leak:
>>>> >> >> > >> > > > https://patchwork.ozlabs.org/patch/1113758/
>>>> >> >> > >> > > >
>>>> >> >> > >> > > > At least on my setup this is what valgrind was pointing 
>>>> >> >> > >> > > > at.
>>>> >> >> > >> > > >
>>>> >> >> > >> > > > Cheers,
>>>> >> >> > >> > > > Dumitru
>>>> >> >> > >> > > >
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > Looking forward to hearing back :)
>>>> >> >> > >> > > > > Daniel
>>>> >> >> > >> > > > >
>>>> >> >> > >> > > > > PS. Sorry for my previous email, I sent it by mistake 
>>>> >> >> > >> > > > > without the subject
>>>> >> >> > >> > > > > _______________________________________________
>>>> >> >> > >> > > > > discuss mailing list
>>>> >> >> > >> > > > > [email protected]
>>>> >> >> > >> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>> >> >> > >> > >
>>>> >> >> > >> > > Thanks Daniel for the testing and reporting, and thanks 
>>>> >> >> > >> > > Dumitru for fixing the memory leak.
>>>> >> >> > >> > >
>>>> >> >> > >> > > Currently ovn-controller incremental processing only 
>>>> >> >> > >> > > handles below SB changes incrementally:
>>>> >> >> > >> > > - logical_flow
>>>> >> >> > >> > > - port_binding (for regular VIF binding NOT on current 
>>>> >> >> > >> > > chassis)
>>>> >> >> > >> > > - mc_group
>>>> >> >> > >> > > - address_set
>>>> >> >> > >> > > - port_group
>>>> >> >> > >> > > - mac_binding
>>>> >> >> > >> > >
>>>> >> >> > >> > > So, in the test scenario you described, since each 
>>>> >> >> > >> > > iteration creates a network (SB datapath changes) and 
>>>> >> >> > >> > > router ports (port_binding changes for non-VIF ports), 
>>>> >> >> > >> > > incremental processing would not help much, because most 
>>>> >> >> > >> > > steps in your test trigger a recompute. It would help if 
>>>> >> >> > >> > > you created more fake VMs in each iteration, e.g. 10 VMs 
>>>> >> >> > >> > > or more on each LS. Secondly, when a VIF port binding 
>>>> >> >> > >> > > happens on the current chassis, ovn-controller still 
>>>> >> >> > >> > > recomputes, and because you have only 4 compute nodes, 
>>>> >> >> > >> > > each regular VIF binding still triggers a recompute on 1 
>>>> >> >> > >> > > of the 4 nodes. With more compute nodes you would see 
>>>> >> >> > >> > > incremental processing become more effective.
>>>> >> >> > >> >
>>>> >> >> > >> > Got it, that makes sense (although then, in the worst case, 
>>>> >> >> > >> > it should be at least what we had before and not worse; it 
>>>> >> >> > >> > can also be because we're mixing versions here: 2.10 vs 
>>>> >> >> > >> > master).
>>>> >> >> > >> > >
>>>> >> >> > >> > > However, what really worries me is the 10% VM unreachable. 
>>>> >> >> > >> > > I have one confusion here on the test steps. The last step 
>>>> >> >> > >> > > you described was: - Wait until the test can ping the port. 
>>>> >> >> > >> > > So if the VM is not pingable the test won't continue?
>>>> >> >> > >> >
>>>> >> >> > >> > Sorry, I should've explained it better. We wait for 2 minutes 
>>>> >> >> > >> > for the port to respond to pings; if it's not reachable, we 
>>>> >> >> > >> > continue with the next port (16 rally processes are running 
>>>> >> >> > >> > simultaneously, so the rest of the processes may be doing 
>>>> >> >> > >> > stuff at the same time).
>>>> >> >> > >> >
>>>> >> >> > >> > >
>>>> >> >> > >> > > To debug the problem, the first thing is to identify what 
>>>> >> >> > >> > > flows are missing for the VMs that is unreachable. Could 
>>>> >> >> > >> > > you do ovs-appctl ofproto/trace for the ICMP flow of any VM 
>>>> >> >> > >> > > with ping failure? And then, please enable debug log for 
>>>> >> >> > >> > > ovn-controller with ovs-appctl -t ovn-controller vlog/set 
>>>> >> >> > >> > > file:dbg. There may be too many logs, so please enable it 
>>>> >> >> > >> > > only for as short a time as needed to reproduce a VM with 
>>>> >> >> > >> > > ping failure. If the last step "wait until the test can 
>>>> >> >> > >> > > ping the port" is in place, it should detect the first 
>>>> >> >> > >> > > occurrence if the VM is not reachable within e.g. 30 sec.
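[Editor's note: for reference, the two debugging commands suggested above could
look roughly like the sketch below. The bridge name, in_port, and all addresses
are placeholders; substitute the real values from a failing VM.]

```shell
# Trace a hypothetical ICMP packet from a failing VM's port through br-int;
# in_port=5 and the MAC/IP addresses below are made-up example values.
ovs-appctl ofproto/trace br-int \
    'in_port=5,icmp,dl_src=50:54:00:00:00:01,dl_dst=50:54:00:00:00:02,nw_src=10.0.0.5,nw_dst=10.0.0.6'

# Turn on debug logging in ovn-controller for a short window ...
ovs-appctl -t ovn-controller vlog/set file:dbg

# ... and restore the default level once the failure is reproduced.
ovs-appctl -t ovn-controller vlog/set file:info
```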
>>>> >> >> > >> >
>>>> >> >> > >> > We'll need to hack a bit here but let's see :)
>>>> >> >> > >> > >
>>>> >> >> > >> > > In the ovn-scale-test we didn't have data plane test, but 
>>>> >> >> > >> > > this problem was not seen in our live environment either, 
>>>> >> >> > >> > > with a far larger scale. The major difference in your test 
>>>> >> >> > >> > > v.s. our environment are:
>>>> >> >> > >> > > - We are running with an older version, so there might be 
>>>> >> >> > >> > > some rebase/refactor problem causing this. To eliminate 
>>>> >> >> > >> > > this, I'd suggest trying a branch I created for 2.10 
>>>> >> >> > >> > > (https://github.com/hzhou8/ovs/tree/ip12_rebase_on_2.10), 
>>>> >> >> > >> > > which matches the base test you did which is also 2.10. It 
>>>> >> >> > >> > > may also eliminate compatibility problem, if there is any, 
>>>> >> >> > >> > > between OVN master branch and OVS 2.10 as you mentioned is 
>>>> >> >> > >> > > used in the test.
>>>> >> >> > >> > > - We don't use Security Group (I guess the  ~90k OVS flows 
>>>> >> >> > >> > > you mentioned were mainly introduced by the Security Group 
>>>> >> >> > >> > > use, if all ports were put in the same group). Incremental 
>>>> >> >> > >> > > processing is expected to be correct for security groups, 
>>>> >> >> > >> > > handling them incrementally via the address_set and 
>>>> >> >> > >> > > port_group inputs. However, since the 
>>>> >> >> > >> > > testing only relied on the regression tests, I am not 100% 
>>>> >> >> > >> > > sure if the test coverage was sufficient. So could you try 
>>>> >> >> > >> > > disabling Security Group to rule out the problem?
>>>> >> >> > >> >
>>>> >> >> > >> > Ok will try to repeat the tests without the SGs.
>>>> >> >> > >> > >
>>>> >> >> > >> > > Thanks,
>>>> >> >> > >> > > Han
>>>> >> >> > >> >
>>>> >> >> > >> > Thanks once again!
>>>> >> >> > >> > Daniel
>>>> >> >> > >>
>>>> >> >> > >> Hi Daniel,
>>>> >> >> > >>
>>>> >> >> > >> Any updates? Do you still see the 10% VM unreachable issue?
>>>> >> >> > >>
>>>> >> >> > >>
>>>> >> >> > >> Thanks,
>>>> >> >> > >> Han
>>>> >> >> > >
>>>> >> >> > >
>>>> >> >> > > Hi Han,
>>>> >> >> > >
>>>> >> >> > > As such there is no datapath impact. After increasing the ping 
>>>> >> >> > > wait timeout from 120 seconds to 180 seconds, it's 100% now.
>>>> >> >> > >
>>>> >> >> > > But the time taken to program the flows is much higher when 
>>>> >> >> > > compared to OVN master without the I-P patches.
>>>> >> >> > > Here is some data -  http://paste.openstack.org/show/753224/ .  
>>>> >> >> > > I am still investigating it. I will update my findings in some 
>>>> >> >> > > time.
>>>> >> >> > >
>>>> >> >> > > Please see the times for the action - vm.wait_for_ping
>>>> >> >> > >
>>>> >> >> >
>>>> >> >> > Thanks Numan for the investigation and update. Glad to hear there 
>>>> >> >> > is no correctness issue, but sorry for the slowness in your test 
>>>> >> >> > scenario. I expect that the operations in your test trigger 
>>>> >> >> > recomputing and the worst case should be similar performance as 
>>>> >> >> > without I-P. It is weird that it turned out so much slower in your 
>>>> >> >> > test. There can be some extra overhead when it tries to do 
>>>> >> >> > incremental processing and then fallback to full recompute, but it 
>>>> >> >> > shouldn't cause that big difference. It might be that for some 
>>>> >> >> > reason the main loop iteration is triggered more times 
>>>> >> >> > unnecessarily. I'd suggest comparing the coverage counter 
>>>> >> >> > "lflow_run" between the tests, and also check perf report to see 
>>>> >> >> > if the hotspot is somewhere else. (Sorry that I can't provide 
>>>> >> >> > full-time help now since I am still on vacation but I will try to 
>>>> >> >> > be useful if things are blocked)
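[Editor's note: a sketch of checking the "lflow_run" coverage counter and
profiling the hotspot, as suggested above. It assumes perf is installed and a
single ovn-controller process is running; the 60-second window is arbitrary.]

```shell
# lflow_run counts full logical-flow recomputes; compare it across test runs.
ovs-appctl -t ovn-controller coverage/show | grep lflow_run

# Sample ovn-controller for 60 seconds to see where CPU time is actually spent.
perf record -g -p "$(pidof ovn-controller)" -- sleep 60
perf report --stdio | head -n 40
```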
>>>> >> >>
>>>> >> >> Hi Numan/Daniel, do you have any new findings on why I-P got worse 
>>>> >> >> result in your test? The extremely long latency (2 - 3 min) shown in 
>>>> >> >> your report reminds me of a similar problem I reported before: 
>>>> >> >> https://mail.openvswitch.org/pipermail/ovs-dev/2018-April/346321.html
>>>> >> >>
>>>> >> >> The root cause of that problem was still not clear. In that report, 
>>>> >> >> the extremely long latency (7 min) was observed without I-P and it 
>>>> >> >> didn't happen with I-P. If it is the same problem, then I suspect it 
>>>> >> >> is not related to I-P or non I-P, but some problem related to ovsdb 
>>>> >> >> monitor condition change. To confirm if it is same problem, could 
>>>> >> >> you:
>>>> >> >> 1. pause the test when the scale is big enough (e.g. when the test 
>>>> >> >> is almost completed), and then
>>>> >> >> 2. enable ovn-controller debug log, and then
>>>> >> >> 3. run one more iteration of the test, and see if the time was spent 
>>>> >> >> on waiting for SB DB update notification.
>>>> >> >>
>>>> >> >> Please ignore my speculation above if you already found the root 
>>>> >> >> cause and it would be great if you could share it :)
>>>> >> >
>>>> >> >
>>>> >> > Thanks for sharing this Han.
>>>> >> >
>>>> >> > I do not have any new findings. Yesterday I ran ovn-scale-test 
>>>> >> > comparing OVN with IP vs without IP (using the master branch).
>>>> >> > The test creates a new logical switch, adds it to a router, adds a 
>>>> >> > few ACLs, creates 2 logical ports, and pings between them.
>>>> >> > I am using physical deployment which creates actual namespaces 
>>>> >> > instead of sandboxes.
>>>> >> >
>>>> >> > The results don't show any huge difference between the two.
>>>> >> 2300 vs 2900 seconds total time, or 44 vs 56 seconds for the 95%ile?
>>>> >> That is not negligible IMHO; it's a >25% penalty with I-P. Maybe I
>>>> >> missed something from the results?
>>>> >>
>>>> >
>>>> > Initially I ran with ovn-nbctl running commands as one batch (i.e. 
>>>> > combining commands with "--"). The results were very similar. Like this 
>>>> > one
>>>> >
>>>> > *******
>>>> >
>>>> > With non IP - ovn-nbctl NO daemon mode
>>>> >
>>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> > |                                             Response Times (sec)                                            |
>>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> > | action                                | min   | median | 90%ile | 95%ile | max    | avg    | success | count |
>>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> > | ovn_network.create_routers            | 0.288 | 0.429  | 5.454  | 5.538  | 20.531 | 1.523  | 100.0%  | 1000  |
>>>> > | ovn.create_lswitch                    | 0.046 | 0.139  | 0.202  | 5.084  | 10.259 | 0.441  | 100.0%  | 1000  |
>>>> > | ovn_network.connect_network_to_router | 0.164 | 0.411  | 5.307  | 5.491  | 15.636 | 1.128  | 100.0%  | 1000  |
>>>> > | ovn.create_lport                      | 0.11  | 0.272  | 0.478  | 5.284  | 15.496 | 0.835  | 100.0%  | 1000  |
>>>> > | ovn_network.bind_port                 | 1.302 | 2.367  | 2.834  | 3.24   | 12.409 | 2.527  | 100.0%  | 1000  |
>>>> > | ovn_network.wait_port_up              | 0.0   | 0.001  | 0.001  | 0.001  | 0.002  | 0.001  | 100.0%  | 1000  |
>>>> > | ovn_network.ping_ports                | 0.04  | 10.24  | 10.397 | 10.449 | 10.82  | 6.767  | 100.0%  | 1000  |
>>>> > | total                                 | 2.219 | 13.903 | 23.068 | 24.538 | 49.437 | 13.222 | 100.0%  | 1000  |
>>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> >
>>>> >
>>>> > With IP - ovn-nbctl NO daemon mode
>>>> >
>>>> > concurrency - 10
>>>> >
>>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> > |                                             Response Times (sec)                                            |
>>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> > | action                                | min   | median | 90%ile | 95%ile | max    | avg    | success | count |
>>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> > | ovn_network.create_routers            | 0.274 | 0.402  | 0.493  | 0.51   | 0.584  | 0.408  | 100.0%  | 1000  |
>>>> > | ovn.create_lswitch                    | 0.064 | 0.137  | 0.213  | 0.244  | 0.33   | 0.146  | 100.0%  | 1000  |
>>>> > | ovn_network.connect_network_to_router | 0.203 | 0.395  | 0.677  | 0.766  | 0.912  | 0.427  | 100.0%  | 1000  |
>>>> > | ovn.create_lport                      | 0.13  | 0.261  | 0.437  | 0.497  | 0.604  | 0.283  | 100.0%  | 1000  |
>>>> > | ovn_network.bind_port                 | 1.307 | 2.374  | 2.816  | 2.904  | 3.401  | 2.325  | 100.0%  | 1000  |
>>>> > | ovn_network.wait_port_up              | 0.0   | 0.001  | 0.001  | 0.001  | 0.002  | 0.001  | 100.0%  | 1000  |
>>>> > | ovn_network.ping_ports                | 0.028 | 10.237 | 10.422 | 10.474 | 11.281 | 6.453  | 100.0%  | 1000  |
>>>> > | total                                 | 2.251 | 13.631 | 14.822 | 15.008 | 15.901 | 10.044 | 100.0%  | 1000  |
>>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> >
>>>> > *****************
>>>> >
>>>> > The results I shared in the previous email were with  ACLs added and 
>>>> > ovn-nbctl - batch mode disabled.
>>>> >
>>>> > I agree with you. Let me do a few more runs to be sure that the results 
>>>> > are consistent.
>>>> >
>>>> > Thanks
>>>> > Numan
>>>> >
>>>> >
>>>> >> > I will test with OVN 2.9 vs 2.11 master along with what you have 
>>>> >> > suggested above and see if there are any problems related to ovsdb 
>>>> >> > monitor condition change.
>>>> >> >
>>>> >> > Thanks
>>>> >> > Numan
>>>> >> >
>>>> >> > Below are the results
>>>> >> >
>>>> >> >
>>>> >> > With IP master - nbctl daemon mode - No batch mode
>>>> >> > concurrency - 10
>>>> >> >
>>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> >> > |                                             Response Times (sec)                                            |
>>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> >> > | action                                | min   | median | 90%ile | 95%ile | max    | avg    | success | count |
>>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> >> > | ovn_network.create_routers            | 0.269 | 0.661  | 10.426 | 15.422 | 37.259 | 3.721  | 100.0%  | 1000  |
>>>> >> > | ovn.create_lswitch                    | 0.313 | 0.45   | 12.107 | 15.373 | 30.405 | 4.185  | 100.0%  | 1000  |
>>>> >> > | ovn_network.connect_network_to_router | 0.163 | 0.255  | 10.121 | 10.64  | 20.475 | 2.655  | 100.0%  | 1000  |
>>>> >> > | ovn.create_lport                      | 0.351 | 0.514  | 12.255 | 15.511 | 34.74  | 4.621  | 100.0%  | 1000  |
>>>> >> > | ovn_network.bind_port                 | 1.362 | 2.447  | 7.34   | 7.651  | 17.651 | 3.146  | 100.0%  | 1000  |
>>>> >> > | ovn_network.wait_port_up              | 0.086 | 2.734  | 5.272  | 7.827  | 22.717 | 2.957  | 100.0%  | 1000  |
>>>> >> > | ovn_network.ping_ports                | 0.038 | 10.196 | 20.285 | 20.39  | 40.74  | 7.52   | 100.0%  | 1000  |
>>>> >> > | total                                 | 2.862 | 27.267 | 49.956 | 56.39  | 90.884 | 28.808 | 100.0%  | 1000  |
>>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> >> > Load duration: 2950.4133141
>>>> >> > Full duration: 2951.58845997 seconds
>>>> >> >
>>>> >> > ***********
>>>> >> > With non IP - nbctl daemon mode - ACLs - No batch mode
>>>> >> >
>>>> >> > concurrency - 10
>>>> >> >
>>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> >> > |                                             Response Times (sec)                                            |
>>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> >> > | action                                | min   | median | 90%ile | 95%ile | max    | avg    | success | count |
>>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> >> > | ovn_network.create_routers            | 0.267 | 0.421  | 10.395 | 10.735 | 25.501 | 3.09   | 100.0%  | 1000  |
>>>> >> > | ovn.create_lswitch                    | 0.314 | 0.408  | 10.331 | 10.483 | 25.357 | 3.049  | 100.0%  | 1000  |
>>>> >> > | ovn_network.connect_network_to_router | 0.153 | 0.249  | 6.552  | 10.268 | 20.545 | 2.236  | 100.0%  | 1000  |
>>>> >> > | ovn.create_lport                      | 0.344 | 0.49   | 10.566 | 15.428 | 25.542 | 3.906  | 100.0%  | 1000  |
>>>> >> > | ovn_network.bind_port                 | 1.372 | 2.409  | 7.437  | 7.665  | 17.518 | 3.192  | 100.0%  | 1000  |
>>>> >> > | ovn_network.wait_port_up              | 0.086 | 1.323  | 5.157  | 7.769  | 20.166 | 2.291  | 100.0%  | 1000  |
>>>> >> > | ovn_network.ping_ports                | 0.034 | 2.077  | 10.347 | 10.427 | 20.307 | 5.123  | 100.0%  | 1000  |
>>>> >> > | total                                 | 3.109 | 21.26  | 39.245 | 44.495 | 70.197 | 22.889 | 100.0%  | 1000  |
>>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>>>> >> > Load duration: 2328.11378407
>>>> >> > Full duration: 2334.43504095 seconds
>>>> >> >
>>>> >>
>>>>
>>>> Hi Numan/Daniel,
>>>>
>>>> I spent some time investigating this problem you reported. Thanks Numan 
>>>> for the offline help sharing the details.
>>>>
>>>> Although I still didn't reproduce the slowness in my current single-node 
>>>> testing env with almost the same steps and ACLs shared by Numan, I think I 
>>>> may have figured out a highly probable cause of what you have seen.
>>>>
>>>> Here is my theory: there is a difference between the I-P and non-I-P in 
>>>> the main loop. The non-I-P version checks ofctrl_can_put() before doing 
>>>> any flow computation (which is introduced to solve a serious performance 
>>>> problem when there are many OVS flows on a single node, see [1]). When 
>>>> working out the I-P version, I found this may not be the best approach, 
>>>> since there can be new incremental changes coming and we want to process 
>>>> them in current iteration incrementally, so that we don't need to fallback 
>>>> to recompute in next iteration. So this logic is changed so that we always 
>>>> prioritize computing new changes and keeping the desired flow table up to 
>>>> date, while the in-flight messages to ovs-vswitchd may still be pending for 
>>>> an older version of desired state. In the end the final desired state will 
>>>> be synced again to ovs-vswitchd. If there are new changes that triggers 
>>>> recompute again, the recompute (which is always slow) will slow down the 
>>>> ofctrl_run(), which keeps sending old pending messages to ovs-vswitchd 
>>>> from the same main thread. (But it won't cause the original performance 
>>>> problem any more, because the incremental processing engine will not 
>>>> recompute when there is no input change.)
>>>>
>>>> However, when the test scenario triggers recompute frequently, each single 
>>>> change may take longer to be enforced in OVS, because of this new 
>>>> approach. Later recompute iterations slow down the installation of 
>>>> previously computed OVS flows. In your test you used a parallelism of 10, 
>>>> which means at any point there may be new changes from one client, such 
>>>> as creating a new router, that trigger recomputing, which can block the OVS 
>>>> flow installation triggered earlier for another client. So overall you 
>>>> will see much bigger latency for each individual test iteration.
>>>>
>>>> This can also explain why I didn't reproduce the problem in my 
>>>> single-client single-node environment, since each iteration is serialized.
>>>>
>>>> [1] 
>>>> https://github.com/openvswitch/ovs/commit/74c760c8fe99d554b94423d49d13d5ca3dea0d9e
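[Editor's note: the theory above can be illustrated with a toy model — this is
not actual ovn-controller code. A single-threaded loop that always prioritizes
recompute over draining the backlog of already-computed flow-mods delays the
first flow installation by every queued recompute, while an
ofctrl_can_put()-style backlog check lets flows reach the switch immediately.]

```python
def first_install_time(prioritize_recompute, n_changes=5, recompute_cost=10,
                       pending=20):
    """Toy model of a single-threaded main loop.

    Each step either runs a full recompute for one outstanding change
    (costing recompute_cost ticks) or drains one already-computed
    flow-mod to the switch (costing 1 tick).  Returns the tick at which
    the first pending flow-mod actually reaches the switch.
    """
    t = 0
    changes = n_changes   # changes still waiting on a recompute
    queue = pending       # flow-mods computed but not yet sent
    first_install = None
    while queue > 0 or changes > 0:
        if changes > 0 and (prioritize_recompute or queue == 0):
            t += recompute_cost   # full recompute blocks the loop
            changes -= 1
            queue += 1            # recompute produces more flow-mods
        else:
            t += 1                # drain one flow-mod to the switch
            queue -= 1
            if first_install is None:
                first_install = t
    return first_install

# Recompute-first (the I-P main loop as described): the backlog waits.
print(first_install_time(True))   # -> 51
# Backlog-first (the ofctrl_can_put() check): flows install right away.
print(first_install_time(False))  # -> 1
```

Under these made-up costs, prioritizing recompute delays the first install by
all five queued recomputes, matching the multi-client latency pattern described.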
>>>>
>>>> To prove this theory, could you help with two tests reusing your 
>>>> environment? Thanks a lot!
>>>>
>>>
>>> Thanks Han. I will try these and come back to you with the results.
>>>
>>> Numan
>>>
>>>>
>>>> 1. Instead of parallelism of 10, try 1, to make sure the test is 
>>>> serialized. I'd expect the results to be similar w/ vs. w/o I-P.
>>>>
>>>> 2. Try below patch on the I-P version you are testing, to see if the 
>>>> problem is gone.
>>>> ----8><--------------------------------------------><8---------------
>>>> diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
>>>> index 043abd6..0fcaa72 100644
>>>> --- a/ovn/controller/ofctrl.c
>>>> +++ b/ovn/controller/ofctrl.c
>>>> @@ -985,7 +985,7 @@ add_meter(struct ovn_extend_table_info *m_desired,
>>>>   * in the correct state and not backlogged with existing flow_mods.  (Our
>>>>   * criteria for being backlogged appear very conservative, but the socket
>>>>   * between ovn-controller and OVS provides some buffering.) */
>>>> -static bool
>>>> +bool
>>>>  ofctrl_can_put(void)
>>>>  {
>>>>      if (state != S_UPDATE_FLOWS
>>>> diff --git a/ovn/controller/ofctrl.h b/ovn/controller/ofctrl.h
>>>> index ed8918a..2b21c11 100644
>>>> --- a/ovn/controller/ofctrl.h
>>>> +++ b/ovn/controller/ofctrl.h
>>>> @@ -51,6 +51,7 @@ void ofctrl_put(struct ovn_desired_flow_table *,
>>>>                  const struct sbrec_meter_table *,
>>>>                  int64_t nb_cfg,
>>>>                  bool flow_changed);
>>>> +bool ofctrl_can_put(void);
>>>>  void ofctrl_wait(void);
>>>>  void ofctrl_destroy(void);
>>>>  int64_t ofctrl_get_cur_cfg(void);
>>>> diff --git a/ovn/controller/ovn-controller.c 
>>>> b/ovn/controller/ovn-controller.c
>>>> index c4883aa..c85c6fa 100644
>>>> --- a/ovn/controller/ovn-controller.c
>>>> +++ b/ovn/controller/ovn-controller.c
>>>> @@ -1954,7 +1954,7 @@ main(int argc, char *argv[])
>>>>
>>>>                      stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>>>>                                      time_msec());
>>>> -                    if (ovnsb_idl_txn) {
>>>> +                    if (ovnsb_idl_txn && ofctrl_can_put()) {
>>>>                          engine_run(&en_flow_output, ++engine_run_id);
>>>>                      }
>>>>                      stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>>
>>
>>
>> Hi Han,
>>
>> So far I could do just one run after applying your above suggested patch 
>> with the I-P version, and the results look promising.
>> It seems to me the problem is gone.
>>
>> +------------------------+-------+--------+--------+--------+--------+-------+---------+-------+
>> |                                      Response Times (sec)                                    |
>> +------------------------+-------+--------+--------+--------+--------+-------+---------+-------+
>> | action                 | min   | median | 90%ile | 95%ile | max    | avg   | success | count |
>> +------------------------+-------+--------+--------+--------+--------+-------+---------+-------+
>> | ovn_network.ping_ports | 0.037 | 10.236 | 10.392 | 10.462 | 20.455 | 7.15  | 100.0%  | 1000  |
>> +------------------------+-------+--------+--------+--------+--------+-------+---------+-------+
>> | ovn_network.ping_ports | 0.036 | 10.255 | 10.448 | 11.323 | 20.791 | 7.83  | 100.0%  | 1000  |
>> +------------------------+-------+--------+--------+--------+--------+-------+---------+-------+
>>
>> The first row represents Non IP and the 2nd row represents IP + your 
>> suggested patch.
>> The values are comparable and lot better compared to without your patch.
>>
>> On Monday I will do more runs to be sure that the data is consistent and get 
>> back to you.
>>
>> If the results are consistent, I would try to run the tests which Daniel and 
>> Lucas ran on an openstack deployment.
>>
>> Thanks
>> Numan
>>
>
> Glad to see the test result improved! Thanks a lot and looking forward to 
> more data. Once it is finally confirmed, we can discuss whether this should 
> be submitted as a formal patch considering real world scenarios.
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
