From: Leonid Ryzhyk <[email protected]> This is another instance of a performance bug when a change to a router object forces a cascade of changes to relations that reference the object. This time around the problem was caused by the `Router.static_routes` field, which is copied from `nb::Logical_Router`. Luckily, this field was only used in one rule and was easy to remove.
Here is how we diagnosed the issue (this may be useful for posterity): - We started with a benchmark that executed several hundreds of similar transactions (in this case, these transactions were adding new router ports). We recorded execution of the benchmark in a DDlog command file (replay.txt) and added `timestamp;` commands after each transaction in the file. - Run `make NORTHD_CLI=1` to generate the ovn_northd_cli executable and use it to execute the command file: ``` ./ovn_northd_ddlog/target/release/ovn_northd_cli -w 1 < replay.txt > replay.dump ``` - Extract only the timestamps from replay.dump and plot differences between successive timestamps (i.e., individual transaction times). I use gnumeric. It would be nice to have some automation for this in the future. We observe that one of the transactions in the benchmark loop slows down linearly as the size of the network topology grows: https://gist.github.com/ryzhyk/16a5607b280ed9cd09b176d6816cb4f0 Clearly some of the rules in the program are getting more expensive as the number of ports goes up. Another interesting observation is that the size of the delta output at each iteration of the benchmark remains constant (the delta mostly consists of new network flows). This suggests that whatever extra work DDlog is doing at each iteration might be redundant. - To identify where the wasted work happens, we re-compile the program passing the `--output-internal-relations` flag to DDlog, which tells it to dump changes to all intermediate relations, not just output relation. We replay the trace again. We locate the expensive transaction in the log and compare its output from one of the first iterations vs one from the end of the log. We now see a whole bunch of intermediate relations that only had a few modified records in the first transaction versus hundreds in the second one. We further observe that all of these changes simply update the `static_routes` field (as explained above). - After removing the field, we observe that changes to intermediate relations no longer grow with the topology, and transaction timing increases much more slowly: https://gist.github.com/ryzhyk/d02784b9088d82f8549ea1b2ebdf095e Signed-off-by: Leonid Ryzhyk <[email protected]> Signed-off-by: Ben Pfaff <[email protected]> --- northd/lrouter.dl | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/northd/lrouter.dl b/northd/lrouter.dl index a85e12d1b3df..db24ee64636c 100644 --- a/northd/lrouter.dl +++ b/northd/lrouter.dl @@ -444,7 +444,6 @@ typedef Router = Router { /* Fields copied from nb::Logical_Router. */ _uuid: uuid, name: string, - static_routes: Set<uuid>, policies: Set<uuid>, enabled: Option<bool>, nat: Set<uuid>, @@ -469,7 +468,6 @@ relation Router[Intern<Router>] Router[Router{ ._uuid = lr._uuid, .name = lr.name, - .static_routes = lr.static_routes, .policies = lr.policies, .enabled = lr.enabled, .nat = lr.nat, @@ -740,7 +738,8 @@ RouterStaticRoute_(.router = router, .nexthop = route.nexthop, .output_port = route.output_port, .ecmp_symmetric_reply = route.ecmp_symmetric_reply) :- - router in &Router(.static_routes = routes), + router in &Router(), + nb::Logical_Router(._uuid = router._uuid, .static_routes = routes), var route_id = FlatMap(routes), route in &StaticRoute(.lrsr = nb::Logical_Router_Static_Route{._uuid = route_id}). -- 2.31.1 _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
