On 8/26/20 5:11 PM, Dumitru Ceara wrote:
> On 8/25/20 7:46 PM, Ben Pfaff wrote:
>> On Tue, Aug 25, 2020 at 06:43:51PM +0200, Dumitru Ceara wrote:
>>> On 8/25/20 6:01 PM, Ben Pfaff wrote:
>>>> On Mon, Aug 24, 2020 at 04:28:22PM -0700, Han Zhou wrote:
>>>>> As I remember, you were working on the new ovn-northd that utilizes
>>>>> DDlog for incremental processing. Could you share the current status?
>>>>>
>>>>> Now that some more improvements have been made in ovn-controller and
>>>>> OVSDB, ovn-northd becomes the more obvious bottleneck for OVN use in
>>>>> large-scale environments. Since you were not in the OVN meetings for
>>>>> the last couple of weeks, could you share here the status and the plan
>>>>> moving forward?
>>>>
>>>> The status is basically that I haven't yet succeeded at getting Red
>>>> Hat's recommended benchmarks running. I'm told that is important before
>>>> we merge it. I find them super difficult to set up. I tried a few
>>>> weeks ago and basically gave up. Piles and piles of repos all linked
>>>> together in tricky ways, making it really difficult to substitute my
>>>> own branches. I intend to try again soon, though. I have a new
>>>> computer that should be arriving soon, which should also allow it to
>>>> proceed more quickly.
>>>
>>> Hi Ben,
>>>
>>> I can try to help with setting up ovn-heater. In theory it should be
>>> enough to export OVS_REPO, OVS_BRANCH, OVN_REPO, OVN_BRANCH, make them
>>> point to your repos and branches, and then run "do.sh install"; it
>>> should take care of installing all the dependencies and repos.
>>>
>>> I can also try to run the scale tests on our downstream if that helps.
>>
>> It's probably better if I come up with something locally, because I
>> expect to have to run it multiple times, maybe many times, since I will
>> presumably discover bottlenecks.
>>
>> This time around, I'll speak up when I run into problems.
>>
>
> Sorry in advance for the long email.
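The env-var customization suggested in the quoted text above could look like the following. This is a hypothetical sketch: the variable names are the ones quoted above from ovn-heater, and the repo/branch names are the ddlog branches mentioned later in this thread; substitute your own forks as needed.

```shell
# Hypothetical sketch: point ovn-heater at custom OVS/OVN branches before
# installing.  The variable names come from the suggestion above; the repo
# and branch values below are the ddlog branches mentioned later in this
# thread and are assumptions -- substitute your own.
export OVS_REPO=https://github.com/blp/ovs-reviews
export OVS_BRANCH=ovs-for-ddlog
export OVN_REPO=https://github.com/blp/ovs-reviews
export OVN_BRANCH=ddlog4

# Then, from the ovn-heater checkout:
#   ./do.sh install
echo "OVS: $OVS_REPO @ $OVS_BRANCH"
echo "OVN: $OVN_REPO @ $OVN_BRANCH"
```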
>
> I went ahead and added a new test scenario to ovn-heater that I think
> might be relevant in the context of ovn-northd incremental processing:
>
> https://github.com/dceara/ovn-heater#example-run-scenario-3---scale-up-number-of-pods---stress-ovn-northd
>
> On my test machine:
> Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> 2 NUMA nodes - 28 cores each.
>
> I did:
>
> $ cd
> $ git clone https://github.com/dceara/ovn-heater
> $ cd ovn-heater
> $ cat > physical-deployments/physical-deployment.yml << EOF
> registry-node: localhost
> internal-iface: none
>
> central-node:
>     name: localhost
>
> worker-nodes:
>   - localhost
> EOF
>
> # Install all the required repos and make everything work together using
> # the latest OVS and OVN code from github. This generates
> # ~/ovn-heater/runtime, where all the repos are cloned and the test suite
> # is run. This step also generates the container image with OVS/OVN
> # compiled from sources. It has to be redone every time we need to test
> # with a different version of OVS/OVN and can be customized with the
> # OVS/OVN_REPO and OVS/OVN_BRANCH env vars.
> $ ./do.sh install
# Missed a step here: $ ./do.sh rally-deploy
>
> # Start the test:
> # This brings up 30 "fake" OVN nodes and then simulates the addition of
> # 1000 pods (lsps) and associated policies (port_group/address_set/acl).
> $ ./do.sh browbeat-run \
>       browbeat-scenarios/switch-per-node-30-node-1000-pods.yml debug-dceara-pods
>
> # This takes quite a long time, ~1 hr on my system.
> # Results are stored at:
> # ls -l ~/ovn-heater/test_results/debug-dceara-pods-20200826-080650/20200826-120718/rally/plugin-workloads/all-rally-run-0.html
>
> What I noticed was that, while the test was running (the execution can be
> monitored by tailing ~/ovn-heater/runtime/browbeat/*.log), ovn-northd's
> CPU usage increased constantly and was above 70-80% after ~500 iterations.
>
> ovn-northd logs:
> 2020-08-26T14:24:25.989Z|02119|poll_loop|INFO|wakeup due to [POLLIN] on
> fd 12 (192.16.0.1:53642<->192.16.0.1:6642) at lib/stream-ssl.c:832 (97%
> CPU usage)
> 2020-08-26T14:24:31.985Z|02120|poll_loop|INFO|Dropped 54 log messages in
> last 5 seconds (most recently, 0 seconds ago) due to excessive rate
> 2020-08-26T14:24:31.985Z|02121|poll_loop|INFO|wakeup due to [POLLIN] on
> fd 11 (192.16.0.1:56340<->192.16.0.1:6641) at lib/stream-ssl.c:832 (99%
> CPU usage)
>
> For troubleshooting/profiling, the easiest way I can think of to rerun
> the sequence of commands without running the whole test suite is to
> extract them from the ovn-nbctl daemon logs. We start it on node
> ovn-central-1.
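The log-scraping approach described above (and the exact sed invocation used for it later in this thread) can be sanity-checked on a synthetic log line first. The log line below is only a guess at what the ovn-nbctl daemon writes; the real timestamp/module fields may differ.

```shell
# Build a fake ovn-nbctl.log entry and apply the same sed substitution
# used in this thread to turn "Running command run ..." records into
# replayable ovn-nbctl invocations.  The log fields here are made up
# for illustration.
cat > /tmp/sample-nbctl.log << 'EOF'
2020-08-26T14:24:25.989Z|00042|nbctl|INFO|Running command run -- lsp-add sw0 lsp1
EOF

sed -ne 's/.*Running command run\(.*\)/ovn-nbctl\1; sleep 0.01/p' \
    /tmp/sample-nbctl.log > /tmp/commands.sh
cat /tmp/commands.sh
# prints: ovn-nbctl -- lsp-add sw0 lsp1; sleep 0.01
```

Each extracted line becomes a standalone ovn-nbctl call followed by a short sleep, so ovn-northd sees the NB changes one at a time rather than batched.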
> I also added a short sleep to avoid NB changes being batched before
> ovn-northd processes them:
>
> $ docker exec ovn-central-1 grep "Running command" /var/log/openvswitch/ovn-nbctl.log \
>       | sed -ne 's/.*Running command run\(.*\)/ovn-nbctl\1; sleep 0.01/p' > commands.sh
>
> # Now we can just run ovn-northd locally:
> $ ovn-ctl start_northd
> # Start an ovn-nbctl daemon locally:
> $ export OVN_NB_DAEMON=$(ovn-nbctl --detach)
> # Replay the commands:
> $ chmod +x commands.sh
> $ ./commands.sh
>
> Regarding the ddlog compilation, I suspect that we need to add support
> for it in ovn-fake-multinode, which builds and runs the fake nodes'
> images. I can take care of that and add the rust compiler and ddlog
> binaries to the docker files.
>
> I assume these are the branches I should use for testing:
> https://github.com/blp/ovs-reviews/tree/ovs-for-ddlog
> https://github.com/blp/ovs-reviews/tree/ddlog4
>
> Hope this helps.
>
> Regards,
> Dumitru

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss