On Fri, Jul 16, 2021 at 2:53 PM Han Zhou <[email protected]> wrote: > > On Fri, Jul 16, 2021 at 7:15 AM Numan Siddique <[email protected]> wrote: > > > > On Mon, Jul 12, 2021 at 11:09 AM Mark Michelson <[email protected]> > wrote: > > > > > > Full disclosure: I have not looked at all the details in this patch, > > > since it is quite large. However, I felt I should comment on the idea. > > > > Hi Mark, > > > > Thanks for the review and the comments. > > > > > > > > The memory savings in ovn-northd and the southbound database are quite > > > nice. It's to be expected since the southbound database has fewer > > > logical flows, and ovn-northd doesn't have to calculate logical flows. I > > > think it's telling that the scale tests didn't show any noticeable > > > improvement in performance. > > > > > > I think one thing that could help would be to just skip logical flow > > > creation altogether in ovn-controller. It doesn't make much sense to > > > take the source data, translate it into logical flow syntax, just to run > > > it through the expression parser and translate it into OpenFlow. You > > > could presumably create OpenFlow matches and actions directly, saving > > > lots of processing. Of course, this takes this > > > semi-backwards-incompatible change and makes it wholly > > > backwards-incompatible :) > > > > > > > > > > I thought about skipping logical flow generation altogether. But it has > its own > > challenges > > - We will break backward compatibility and many tools - ovn-sbctl, > > ovn-trace/ovn-detrace > > - And it is a lot of effort. > > > > If breaking those tools is Ok, I guess we can explore this option. > > So far ovsdb-server > > has been a bottleneck in the scale environments and the goal of this > > approach was > > to relieve that pressure from ovsdb-servers. Since Ilya's relay > > feature is merged we > > can definitely test that feature and see if it solves all the scale > > concerns. If it > > solves, then I think we can drop the idea of removing the logical > > flows from the > > Southbound db. If not, I think we may have to revisit this idea. In > > my free cycles > > I'd like to experiment a bit and see if we can generate OF flows > > directly in ovn-controller. > > > > I think it is still good to discuss in this thread if generating OF > > flows directly can be a potential > > re-architecture or not. > > > > @Han - You mentioned in the irc meeting that you have some comments > > too. It would be > > great if you can share your thoughts. > > Sorry for my slow response. My comments regarding this RFC patch are > primarily the ones we discussed offline after the irc meeting. In fact most > of them are already covered by Mark's comments in his reply. Let me put it > here with some additional points: > > 1) I agree with Mark that this patch achieves its goal of reducing the size > of SB DB, without obvious e2e performance gains which is also expected. > While the SB DB reduction is significant, the benefit might not justify > this big architecture change. And I agree with you that let's see how the > OVSDB relay approach helps for the SB DB scale, and revisit this patch if > necessary. > > 2) I also mentioned the same idea a while ago for removing the logical flow > phase. If we remove logical flows from SB DB, it is more natural and > efficient to not generate them at all. However, I also thought about that > again recently and I feel not so sure now if it is the right way to > proceed. The open issues are: > - First of all, as you mentioned, this is an even bigger change, > probably close to rewriting all the components, and the tools. (We might > look at directly using DDlog to achieve that, and hopefully some code in > the northd-ddlog can be reused.) Most importantly, it took a great amount > of effort for OVN to stabilize (well, to some extent) for production, so > I'd expect such a big change at this level, if implemented, would require a > really long period of time, probably very painful, to stabilize again. > - We lose the benefit of logical-physical separation, which could make > trouble-shooting much more difficult. Nevertheless, the debuggability may > be achieved by other ways, and nothing prevents us from implementing tools > that generate human friendly logical flows for debugging purposes only, if > it is helpful. > - It seems ovn-controller should connect to NB DB directly, but at the > same time maintain SB DB for port-bindings and chassis information. (This > is not a problem, but something to think about) > - Although it avoids the cost of logical flow parsing, it doesn't > completely avoid the cost of parsing "match" strings, because the "match" > fields in ACL (and also Policy Routing, QoS) are still flexible strings > specified by users. If these objects are too many, specifically, if ACLs > contribute to a significant part of today's logical flow handling, then > even if logical flows are removed, parsing ACL matches could still be a > bottleneck, at least for some use cases. So we may need to lower our > expectations for the e2e performance gains after all the great efforts. > > So I am still open to the idea of rearchitecting to remove logical flows. > It would be great to have a POC to evaluate the e2e performance gains, > which itself can take a really big effort, at the risk of being completely > discarded. Since there is an ongoing effort of rewriting ovn-controller > using DDlog, which I believe is a big effort, too, perhaps this can be > considered as an option together. @Ben Pfaff <[email protected]> > > 3) Regarding the potential performance gains of this patch by using > incremental processing in ovn-controller, I don't think it is a strong > argument, because I believe our goal for ovn-northd is still making > ovn-northd-ddlog mature and replace the C version, so when comparing e2e > performance we should consider ovn-northd-ddlog which inherently does > incremental processing, instead of comparing with the current C version of > ovn-northd. (In case the direction of using ovn-northd-ddlog is somehow > changed, hopefully not, we need to raise it clearly in the community and > discuss it seriously.) > > 4) It seems we (the community) put more focus on scalability again recently > and a lot of improvements have been achieved, and more ongoing. Although > none of them are architectural changes, they turned out to be effective (or > with great potential). We could re-evaluate the longer term plan after we > complete these non-architectural improvements.
Hi Han, Thanks a lot for your comments. I agree with your comments. Let's see how it goes and if we still need to revisit this RFC patch. Numan > > Thanks, > Han > > > > > Few comments below > > > > > > > > I'm concerned about a few things in the patch as presented. > > > > > > First, moving logical flow generation from ovn-northd to ovn-controller > > > seems like it's just shifting the work from one place to another. And in > > > the case where datapaths have ports bound on many nodes, it means a lot > > > of the same work is being done on multiple ovn-controller instances. I-P > > > in ovn-controller could reduce the amount of work ovn-controller is > > > performing per iteration, but it also introduces error potential. > > > > It definitely adds some load to ovn-controllers. But I don't think it > > is significant. > > There are many logical flows which are generic (which datapath groups > > has already > > leveraged) and the proposed RFC patch generates those at start up. > > > > I do agree that there is potential for errors and adding new features > > could be a bit > > challenging. But it should not be so hard. > > > > > > > > > > Second, is there a danger in moving logical flows out of the southbound > > > database? You modified `ovn-sbctl lflow-list` to do some magic to list > > > the logical flows, but are there any CMS's that access the Logical_Flow > > > table directly in the SB DB for any reason? > > > > I don't think CMS should rely on the logical_flow table. I'm not > > aware of any CMS > > doing it. Even if any CMS is doing it, I think it probably shouldn't. > > > > > > > > Third, does ovn-trace still work? > > > > In the proposed patch, ovn-trace doesn't work but it should be > straightforward > > to make it work. In this proposed patch, a new command is added in > ovn-sbctl > > called - "ctrl-lflow-list" and it calls the same functions to > > generate logical flows > > as ovn-controller. ovn-trace can do the same. > > > > > > > > Overall, I'm fine with the idea of re-architecting things in OVN, but I > > > think it requires a bit more of a plan than this. A change like this is > > > doing its best to hide that anything has changed, when behind the scenes > > > things have changed a lot. I think a true rearchitecting will require > > > user-facing changes, too. If we still used major-minor versioning for > > > OVN, this would be the sort of thing that would result in a major > > > version bump. > > > > I agree. > > > > Thanks > > Numan > > > > > > > > > > > On 6/25/21 7:31 PM, [email protected] wrote: > > > > From: Numan Siddique <[email protected]> > > > > > > > > This is a an RFC patch to move the logical flow generation from > > > > ovn-northd to ovn-controller. > > > > > > > > This patch doesn't move all the generation to ovn-controller. > > > > ovn-northd still does the flow generation for > > > > - ACLs/Port groups. > > > > - DHCP options > > > > - Multicast groups > > > > - And few others. > > > > > > > > Other than ACLs and Port groups it would be possible to move > > > > flow generation for the above mentioned things. But before doing > > > > all that, it is worth to evaluate if the proposed RFC makes sense. > > > > > > > > The main motivation for this RFC effort is to > > > > - Address scale issues seen. For large scale deployments > > > > ovn-northd takes lot of CPU for the computation (ovn-northd-ddlog > > > > should help here) and memory and so does Southbound > ovsdb-servers. > > > > > > > > - Having a very huge southbound DB and logical flows affects the > > > > raft consenses and it requires increasing the raft election > timers. > > > > > > > > - Logical flows contributes majorly to the overall south bound DB. > > > > > > > > This RFC demonstrates that it is possible for each ovn-controller > > > > to generate logical flows. > > > > > > > > These are some of the findings with my general and scale testing. > > > > > > > > Below are the test findings with a huge pre-existing Northbound > database with > > > > datapath groups enabled with a size of 13 M > > > > - Southbound DB size is: > > > > * with ovn-northd-master - 35 M > > > > * with ovn-northd-proposed-rfc - 12M > > > > > > > > - Number of logical flows: > > > > * with ovn-northd-master - 78581 > > > > * with ovn-northd-proposed-rfc - 7933 > > > > > > > > - RSS Memory consumption of > > > > * ovn-northd-master - 441368 KiB > > > > * ovn-northd-proposed-rfc - 115540 KiB > > > > > > > > * ovn-controller-master - 1267716 KiB > > > > * ovn-controller-proposed-rfc - 915876 KiB > > > > > > > > * SB ovsdb-server-with-ovn-master - 612296 KiB > > > > * SB ovsdb-server-with-proposed-rfc-ovn - 134680 KiB > > > > > > > > With the scale testing of 500 fake multinode nodes and each node > > > > creating having a few port bindings claimed, the end result is > > > > almost similar. No signifcant improvements seen with the proposed > > > > RFC patch. The results are identical. > > > > > > > > I think more scale testing needs to be done to determine if > > > > the CPU usage and memory usage reduction in the ovn-northd and > > > > ovsdb-servers will have a major impact or not. Testing with > > > > a real Kubernetes/Openstack deployments would help. > > > > > > > > Few Observations > > > > - It is possible to move the flow generation to each > ovn-controller. > > > > > > > > - Each ovn-controller only generates the logical flows if required > > > > i.e if the datapath is in the 'local_datapaths'. > > > > > > > > - This RFC patch do complicate ovn-controller code which has > already > > > > many complicated bits. > > > > > > > > - I was expecting the scale test results to improve and the > > > > end-to-end time of a pod/VM creation would be quicker. But that > is > > > > not the case, which is a disappointment. > > > > > > > > Submitting this RFC patch to get a feed back and have conversation > > > > if it is worth the effort. > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > dev mailing list > > > [email protected] > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
