Hi Han, On 9/21/22 23:06, Han Zhou wrote: > Thanks Dumitru for this promising optimization! >
Thanks for checking it out! > On Thu, Aug 11, 2022 at 1:03 AM Dumitru Ceara <[email protected]> wrote: >> >> On 8/10/22 19:54, Mark Michelson wrote: >>> Hi Dumitru, >>> >> >> Hi Mark, >> >>> I read the patch series, and I think the idea of chassis-specific >>> variables is a good idea to reduce the number of DB records for certain >>> things. Aside from load balancers, I suspect this could have a positive >>> impact for other structures as well. >>> >> >> Thanks for taking a look! Yes, I think this might be applicable to >> other structures too. >> > > I think it is a good idea to make it more generic, but for my understanding > this template concept is for anything that's "node/chassis" specific, and > supposed to be instantiated at chassis level. Probably we should name the > DB tables as something like chassis_template_var. > If we decide to go for a simple "chassis" column (instead of a generic "match" column) then naming the table Chassis_Template_Var makes sense indeed. >>> Rather than criticizing the individual lines of code, I'll focus instead >>> on some higher-level questions/ideas. >>> >> >> Sure, thanks! :) >> >>> First, one question I had was what happens when a template variable name >>> is used in a load balancer, but there is no appropriate value to >>> substitute? For instance, what if a load balancer applies to chassis-3, >>> but you only have template variables for chassis-1 and chassis-2? This >>> might be addressed in the code but I didn't notice if it was. >>> >> >> There are actually two things to consider here: >> >> 1. there might be a logical flow that uses a template variable: in this >> case if template expansion/instantiation fails we currently leave the >> token untouched (e.g., '^variable' stays '^variable'). That will cause >> the flow action/match parsing to fail and currently logs a warning. The >> flow itself is skipped, as it should be. We probably need to avoid >> logging a warning though. >> >> 2. like you pointed out, there might be a load balancer using templates >> in its backends/vips: if some of those templates cannot be instantiated >> locally the backend/vip where they're added is skipped. Unless I missed >> something, the code should already do that. >> >>> Second, it seems like template variables are a natural extension of >>> existing concepts like address sets and port groups. In those cases, >>> they were an unconditional collection of IP addresses or ports. For >> >> You're right to some extent template variables are similar to port >> groups. The southbound database port group table splits the northbound >> port group per datapath though not per chassis like template variables. >> >>> template variables, they're a collection of untyped values with the >>> condition of only applying on certain Chassis. I wonder if this could >>> all be reconciled with a single table that uses untyped values with >>> user-specified conditions. Right now template variables have a "Chassis" >>> column, but maybe this could be replaced with a broader "condition", >>> "when", or "match" column. To get something integrated quickly, this >>> column could just accept the syntax of "chassis.name == <blah>" or >>> "chassis.uuid == <blah>" to allow for chassis-specific application of >>> the values. With this foundation, we could eventually allow >>> unconditional application of the value, or more complex conditions (e.g. >>> only apply to logical switch ports that are connected to a router with a >>> distributed gateway port). Doing this, we could deprecate address sets >>> and port groups eventually in favor of template variables. >> >> This sounds like a good idea to me. I wasn't too happy with the >> "chassis" string column of the Template_Var table anyway. A generic >> condition field makes more sense. >> > If it is chassis-specific template, a column "chassis" seems to be > straightforward. With a "match" column it is another burden of parsing I have a generic implementation (with a "predicate" column) almost ready for review. I agree it's a bit more work to parse and maintain references. I think it's probably best to discuss these details once I post v1. It's no problem for me to go back to the "chassis" column version if we decide to use that approach. > (which is costly and error prone). In addition, the LB object (or other > structures) is not a logical-flow, and it doesn't directly map to > logical-flows (unlike ACLs), so I didn't understand how would a match > string be applied to the template. Is there a more detailed example of > this? Maybe I am missing something, and hope we will see more details in > the formal patch. > Load balancer VIPs are a collection of key-value pairs (vip:backends). These are currently IPs (and optionally L4 ports). We can allow LBs to support "templated" VIPs/backends. For example (this is with the WIP v1 version of the code): # Create a template LB. ovn-nbctl --template lb-add lb-test "^vip:^vport" "^backends" tcp \ -- ls-lb-add ls1 lb-test \ -- lr-lb-add rtr lb-test # Instantiate the LB template variables, for the chassis where port # vm1 is bound: ovn-nbctl \ -- create template_var name=vip value=66.66.66.66 \ predicate="is_chassis_resident(\"vm1\")" \ -- create template_var name=vport value=666 \ predicate="is_chassis_resident(\"vm1\")" \ -- create template_var name=backends value=\"42.42.42.2:4242\" \ predicate="is_chassis_resident(\"vm1\")" This is equivalent to: ovn-nbctl lb-add lb-test 66.66.66.66:666 42.42.42.2:4242 tcp ovn-nbctl ls-lb-add ls1 lb-test ovn-nbctl lr-lb-add lr1 lb-test >> Regarding deprecating and replacing address sets and port groups, I'm >> not sure how easy that would be but we can try it when we get to that > point. >> > > Address sets and port groups are something different in my view. Although > they can be treated as variables in a template, they are not really > chassis-specific, and each variable needs to be instantiated to a big > number of instances (sometimes huge). For this reason, fine-grained I-P > embedded in the expression parsing (for Address set) was introduced for the > performance of ovn-controller. Maybe we can say there are still some > similarities of templates, but I am not really sure if it is really helpful > to generalize them and how difficult it would be. > It doesn't look straight forward indeed. If everyone agrees, I'd suggest discussing this option (replacing AS and PG) only after/if the template support is accepted. > Thanks, > Han > Thanks, Dumitru >>> >>> Third, I was wondering if there could be some layer that exists between >>> the IDL and the application that expands the template variables as early >>> as possible. I'm thinking the application could inject some callback in >>> the IDL layer that might allow for the values to be substituted. This >>> way, the variable substitution is taken care of in a single place, and >>> by the time the application gets the data, it knows that all >>> substitutions have been made and there is no need to special case >>> template variable names vs. plain tokens. They should all be plain >>> tokens. I don't think construction of such a layer should be a barrier >>> to merging the code, but it's something worth considering as a later >>> improvement. >> >> This could work. I'll think more about it. But like you said, it's >> probably a longer term goal. It will need some significant changes in >> the IDL layer (e.g., to re-evaluate some records when template >> instantiations change). >> >>> >>> Anyway, those were my high-level thoughts on the topic. Let me know what >>> you think. >>> >> >> I can work on changing the Template_Var schema to add a broader way of >> specifying conditions (when/match/etc). I'm already working on adding >> proper nbctl support for templated load balancers and trying to tackle >> the rest of the todos. I can probably send a v1 sometime in the first >> half of next week. Do you want to share any specific code related >> comments that I should already integrate or shall we start a proper >> review when v1 gets posted? >> >> Thanks again for your input on this RFC! >> >> Regards, >> Dumitru >> >>> On 8/5/22 12:26, Dumitru Ceara wrote: >>>> Sometimes network components are compute node-specific. Sometimes such >>>> components are replicated, almost identically, for multiple nodes >>>> in the cluster. >>>> >>>> One such example is the case of Kubernetes NodePort services which >>>> translate (in the ovn-kubernetes case) to Load_Balancer >>>> objects being applied to each and every node's logical gateway router. >>>> These load balancers are almost identical, the main difference being >>>> the fact that they use different VIPs (the node's IP). >>>> >>>> With the current OVN load balancer design, this becomes a problem at >>>> scale because the number of load balancers that must be configured is >>>> N x M (N nodes times M services). >>>> >>>> This series proposes a new concept in OVN: virtual network component >>>> templates. The goal of the templates is to help reduce resource >>>> consumption in the OVN central components in specific cases like the > one >>>> described above. >>>> >>>> To achieve that, the CMS will instead configure a "templated" load >>>> balancer for every service and apply that single template record to >>>> the cluster-wide load balancer group. This template is then >>>> instantiated differently on different compute nodes. This translation >>>> is controlled through per-chassis "template variables" configured by >>>> the CMS in the new NB.Template_Var table. >>>> >>>> A syntetic benchmark simulating what an OpenShift router (using Node >>>> Port services) scale test would do shows the following preliminary >>>> results: >>>> A. 120 node, 2K NodePort services: >>>> - before: >>>> - Southbound DB size on disk (compacted): ~385MB >>>> - Southbound DB memory usage (RSS): ~3GB >>>> - Southbound DB logical flows: 720K >>>> >>>> - after: >>>> - Southbound DB size on disk (compacted): ~100MB >>>> - Southbound DB memory usage (RSS): ~250MB >>>> - Southbound DB logical flows: 6K >>>> >>>> B. 250 node, 2K NodePort services: >>>> - after (didn't run the "before" test as it was taking way too long): >>>> - Southbound DB size on disk (compacted): ~155MB >>>> - Southbound DB memory usage (RSS): ~760MB >>>> - Southbound DB logical flows: 6K >>>> >>>> The series is sent as RFC because there's still the need to add >>>> some template specific unit tests and the "ovn-nbctl lb-*" helper >>>> utilities need to be adapted to support templated load balancers. >>>> >>>> With these two items addressed the code is self can likely qualify >>>> for acceptance as a new feature in the upcoming release. >>>> >>>> There also exists a more extensive TODO list (also listed in the commit >>>> log of every patch in the series for now) but these are mainly load >>>> balancer related functionalities that are not yet implemented for >>>> templated load balancers but can definitely be implemented as follow > ups: >>>> - No support for LB health check if the LB is templated. >>>> - No support for VIP ARP responder if the LB is templated. >>>> - No support for routed VIPs if the LB is templated. >>>> - Figure out a way to deal with templates in ovn-trace >>>> - Determine if we need to allow Template_Var to match against chassis >>>> hostname or other IDs. >>>> - Make ofctrl_inject_pkt() work with template_vars. >>>> - Make test-ovn work with template_vars. >>>> >>>> A basic example of how to configure a templated load balancer follows: >>>> $ ovn-nbctl create load_balancer name=lb-test \ >>>> protocol=tcp options:template=true \ >>>> vips:\"^vip:4200\"="^backends" >>>> >>>> $ ovn-nbctl ls-add ls >>>> $ ovn-nbctl ls-lb-add ls lb-test >>>> >>>> # Instantiate the load balancer on chassis-1 >>>> $ ovn-nbctl create template_var name=vip value=80.80.80.1 >>>> chassis=chassis-1 >>>> $ ovn-nbctl create template_var name=backends >>>> value='"42.42.42.1:1000"' chassis=chassis-1 >>>> >>>> # Instantiate the load balancer on chassis-2 >>>> $ ovn-nbctl create template_var name=vip value=80.80.80.2 >>>> chassis=chassis-2 >>>> $ ovn-nbctl create template_var name=backends >>>> value='"42.42.42.2:1000"' chassis=chassis-2 >>>> >>>> Dumitru Ceara (5): >>>> Add NB and SB Template_Var tables. >>>> controller: Add support for templated actions and matches. >>>> controller: Make resource references more generic. >>>> lb: Support using templates. >>>> controller: Add Template_Var <- LB references. >>>> >>>> >>>> controller/lflow.c | 248 ++++++++++++++++++++++++-------- >>>> controller/lflow.h | 98 +++++++------ >>>> controller/ofctrl.c | 9 +- >>>> controller/ofctrl.h | 3 +- >>>> controller/ovn-controller.c | 277 > ++++++++++++++++++++++++++++++++++-- >>>> include/ovn/expr.h | 4 +- >>>> include/ovn/lex.h | 14 +- >>>> lib/actions.c | 9 +- >>>> lib/expr.c | 14 +- >>>> lib/lb.c | 201 ++++++++++++++++++++++---- >>>> lib/lb.h | 36 +++-- >>>> lib/lex.c | 55 +++++++ >>>> northd/northd.c | 64 +++++---- >>>> tests/ovn.at | 2 +- >>>> tests/test-ovn.c | 16 ++- >>>> utilities/ovn-trace.c | 26 +++- >>>> 16 files changed, 869 insertions(+), 207 deletions(-) >>>> >>>> _______________________________________________ >>>> dev mailing list >>>> [email protected] >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> >>> >> >> _______________________________________________ >> dev mailing list >> [email protected] >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
