Hi Han,

On 9/21/22 23:06, Han Zhou wrote:
> Thanks Dumitru for this promising optimization!
> 

Thanks for checking it out!

> On Thu, Aug 11, 2022 at 1:03 AM Dumitru Ceara <[email protected]> wrote:
>>
>> On 8/10/22 19:54, Mark Michelson wrote:
>>> Hi Dumitru,
>>>
>>
>> Hi Mark,
>>
>>> I read the patch series, and I think the idea of chassis-specific
>>> variables is a good idea to reduce the number of DB records for certain
>>> things. Aside from load balancers, I suspect this could have a positive
>>> impact for other structures as well.
>>>
>>
>> Thanks for taking a look!  Yes, I think this might be applicable to
>> other structures too.
>>
> 
> I think it is a good idea to make it more generic, but for my understanding
> this template concept is for anything that's "node/chassis" specific, and
> supposed to be instantiated at chassis level. Probably we should name the
> DB tables as something like chassis_template_var.
> 

If we decide to go for a simple "chassis" column (instead of a generic
"match" column) then naming the table Chassis_Template_Var makes sense
indeed.

>>> Rather than criticizing the individual lines of code, I'll focus instead
>>> on some higher-level questions/ideas.
>>>
>>
>> Sure, thanks! :)
>>
>>> First, one question I had was what happens when a template variable name
>>> is used in a load balancer, but there is no appropriate value to
>>> substitute? For instance, what if a load balancer applies to chassis-3,
>>> but you only have template variables for chassis-1 and chassis-2? This
>>> might be addressed in the code but I didn't notice if it was.
>>>
>>
>> There are actually two things to consider here:
>>
>> 1. there might be a logical flow that uses a template variable: in this
>> case if template expansion/instantiation fails we currently leave the
>> token untouched (e.g., '^variable' stays '^variable').  That will cause
>> the flow action/match parsing to fail and currently logs a warning.  The
>> flow itself is skipped, as it should be.  We probably need to avoid
>> logging a warning though.
>>
>> 2. like you pointed out, there might be a load balancer using templates
>> in its backends/vips: if some of those templates cannot be instantiated
>> locally the backend/vip where they're added is skipped.  Unless I missed
>> something, the code should already do that.
>>
>>> Second, it seems like template variables are a natural extension of
>>> existing concepts like address sets and port groups. In those cases,
>>> they were an unconditional collection of IP addresses or ports. For
>>
>> You're right to some extent template variables are similar to port
>> groups.  The southbound database port group table splits the northbound
>> port group per datapath though not per chassis like template variables.
>>
>>> template variables, they're a collection of untyped values with the
>>> condition of only applying on certain Chassis. I wonder if this could
>>> all be reconciled with a single table that uses untyped values with
>>> user-specified conditions. Right now template variables have a "Chassis"
>>> column, but maybe this could be replaced with a broader "condition",
>>> "when", or "match" column. To get something integrated quickly, this
>>> column could just accept the syntax of "chassis.name == <blah>" or
>>> "chassis.uuid == <blah>" to allow for chassis-specific application of
>>> the values. With this foundation, we could eventually allow
>>> unconditional application of the value, or more complex conditions (e.g.
>>> only apply to logical switch ports that are connected to a router with a
>>> distributed gateway port). Doing this, we could deprecate address sets
>>> and port groups eventually in favor of template variables.
>>
>> This sounds like a good idea to me.  I wasn't too happy with the
>> "chassis" string column of the Template_Var table anyway.  A generic
>> condition field makes more sense.
>>
> If it is chassis-specific template, a column "chassis" seems to be
> straightforward. With a "match" column it is another burden of parsing

I have a generic implementation (with a "predicate" column) almost ready
for review.  I agree it's a bit more work to parse and maintain
references.  I think it's probably best to discuss these details once I
post v1.  It's no problem for me to go back to the "chassis" column
version if we decide to use that approach.

> (which is costly and error prone). In addition, the LB object (or other
> structures) is not a logical-flow, and it doesn't directly map to
> logical-flows (unlike ACLs), so I didn't understand how would a match
> string be applied to the template. Is there a more detailed example of
> this? Maybe I am missing something, and hope we will see more details in
> the formal patch.
> 

Load balancer VIPs are a collection of key-value pairs (vip:backends).
These are currently IPs (and optionally L4 ports).  We can allow LBs to
support "templated" VIPs/backends.  For example (this is with the WIP v1
version of the code):

# Create a template LB.
ovn-nbctl --template lb-add lb-test "^vip:^vport" "^backends" tcp \
    -- ls-lb-add ls1 lb-test                                      \
    -- lr-lb-add rtr lb-test

# Instantiate the LB template variables, for the chassis where port
# vm1 is bound:
ovn-nbctl                                                          \
    -- create template_var name=vip value=66.66.66.66              \
              predicate="is_chassis_resident(\"vm1\")"             \
    -- create template_var name=vport value=666                    \
              predicate="is_chassis_resident(\"vm1\")"             \
    -- create template_var name=backends value=\"42.42.42.2:4242\" \
              predicate="is_chassis_resident(\"vm1\")"

This is equivalent to:
ovn-nbctl lb-add lb-test 66.66.66.66:666 42.42.42.2:4242 tcp
ovn-nbctl ls-lb-add ls1 lb-test
ovn-nbctl lr-lb-add lr1 lb-test

>> Regarding deprecating and replacing address sets and port groups, I'm
>> not sure how easy that would be but we can try it when we get to that
> point.
>>
> 
> Address sets and port groups are something different in my view. Although
> they can be treated as variables in a template, they are not really
> chassis-specific, and each variable needs to be instantiated to a big
> number of instances (sometimes huge). For this reason, fine-grained I-P
> embedded in the expression parsing (for Address set) was introduced for the
> performance of ovn-controller. Maybe we can say there are still some
> similarities of templates, but I am not really sure if it is really helpful
> to generalize them and how difficult it would be.
> 

It doesn't look straight forward indeed.  If everyone agrees, I'd
suggest discussing this option (replacing AS and PG) only after/if the
template support is accepted.

> Thanks,
> Han
> 

Thanks,
Dumitru

>>>
>>> Third, I was wondering if there could be some layer that exists between
>>> the IDL and the application that expands the template variables as early
>>> as possible. I'm thinking the application could inject some callback in
>>> the IDL layer that might allow for the values to be substituted. This
>>> way, the variable substitution is taken care of in a single place, and
>>> by the time the application gets the data, it knows that all
>>> substitutions have been made and there is no need to special case
>>> template variable names vs. plain tokens. They should all be plain
>>> tokens. I don't think construction of such a layer should be a barrier
>>> to merging the code, but it's something worth considering as a later
>>> improvement.
>>
>> This could work.  I'll think more about it.  But like you said, it's
>> probably a longer term goal.  It will need some significant changes in
>> the IDL layer (e.g., to re-evaluate some records when template
>> instantiations change).
>>
>>>
>>> Anyway, those were my high-level thoughts on the topic. Let me know what
>>> you think.
>>>
>>
>> I can work on changing the Template_Var schema to add a broader way of
>> specifying conditions (when/match/etc).  I'm already working on adding
>> proper nbctl support for templated load balancers and trying to tackle
>> the rest of the todos.  I can probably send a v1 sometime in the first
>> half of next week.  Do you want to share any specific code related
>> comments that I should already integrate or shall we start a proper
>> review when v1 gets posted?
>>
>> Thanks again for your input on this RFC!
>>
>> Regards,
>> Dumitru
>>
>>> On 8/5/22 12:26, Dumitru Ceara wrote:
>>>> Sometimes network components are compute node-specific.  Sometimes such
>>>> components are replicated, almost identically, for multiple nodes
>>>> in the cluster.
>>>>
>>>> One such example is the case of Kubernetes NodePort services which
>>>> translate (in the ovn-kubernetes case) to Load_Balancer
>>>> objects being applied to each and every node's logical gateway router.
>>>> These load balancers are almost identical, the main difference being
>>>> the fact that they use different VIPs (the node's IP).
>>>>
>>>> With the current OVN load balancer design, this becomes a problem at
>>>> scale because the number of load balancers that must be configured is
>>>> N x M (N nodes times M services).
>>>>
>>>> This series proposes a new concept in OVN: virtual network component
>>>> templates.  The goal of the templates is to help reduce resource
>>>> consumption in the OVN central components in specific cases like the
> one
>>>> described above.
>>>>
>>>> To achieve that, the CMS will instead configure a "templated" load
>>>> balancer for every service and apply that single template record to
>>>> the cluster-wide load balancer group.  This template is then
>>>> instantiated differently on different compute nodes.  This translation
>>>> is controlled through per-chassis "template variables" configured by
>>>> the CMS in the new NB.Template_Var table.
>>>>
>>>> A syntetic benchmark simulating what an OpenShift router (using Node
>>>> Port services) scale test would do shows the following preliminary
>>>> results:
>>>> A. 120 node, 2K NodePort services:
>>>> - before:
>>>>    - Southbound DB size on disk (compacted): ~385MB
>>>>    - Southbound DB memory usage (RSS): ~3GB
>>>>    - Southbound DB logical flows: 720K
>>>>
>>>> - after:
>>>>    - Southbound DB size on disk (compacted): ~100MB
>>>>    - Southbound DB memory usage (RSS): ~250MB
>>>>    - Southbound DB logical flows: 6K
>>>>
>>>> B. 250 node, 2K NodePort services:
>>>> - after (didn't run the "before" test as it was taking way too long):
>>>>    - Southbound DB size on disk (compacted): ~155MB
>>>>    - Southbound DB memory usage (RSS): ~760MB
>>>>    - Southbound DB logical flows: 6K
>>>>
>>>> The series is sent as RFC because there's still the need to add
>>>> some template specific unit tests and the "ovn-nbctl lb-*" helper
>>>> utilities need to be adapted to support templated load balancers.
>>>>
>>>> With these two items addressed the code is self can likely qualify
>>>> for acceptance as a new feature in the upcoming release.
>>>>
>>>> There also exists a more extensive TODO list (also listed in the commit
>>>> log of every patch in the series for now) but these are mainly load
>>>> balancer related functionalities that are not yet implemented for
>>>> templated load balancers but can definitely be implemented as follow
> ups:
>>>> - No support for LB health check if the LB is templated.
>>>> - No support for VIP ARP responder if the LB is templated.
>>>> - No support for routed VIPs if the LB is templated.
>>>> - Figure out a way to deal with templates in ovn-trace
>>>> - Determine if we need to allow Template_Var to match against chassis
>>>>    hostname or other IDs.
>>>> - Make ofctrl_inject_pkt() work with template_vars.
>>>> - Make test-ovn work with template_vars.
>>>>
>>>> A basic example of how to configure a templated load balancer follows:
>>>>    $ ovn-nbctl create load_balancer name=lb-test \
>>>>        protocol=tcp options:template=true \
>>>>        vips:\"^vip:4200\"="^backends"
>>>>
>>>>    $ ovn-nbctl ls-add ls
>>>>    $ ovn-nbctl ls-lb-add ls lb-test
>>>>
>>>>    # Instantiate the load balancer on chassis-1
>>>>    $ ovn-nbctl create template_var name=vip value=80.80.80.1
>>>> chassis=chassis-1
>>>>    $ ovn-nbctl create template_var name=backends
>>>> value='"42.42.42.1:1000"' chassis=chassis-1
>>>>
>>>>    # Instantiate the load balancer on chassis-2
>>>>    $ ovn-nbctl create template_var name=vip value=80.80.80.2
>>>> chassis=chassis-2
>>>>    $ ovn-nbctl create template_var name=backends
>>>> value='"42.42.42.2:1000"' chassis=chassis-2
>>>>
>>>> Dumitru Ceara (5):
>>>>        Add NB and SB Template_Var tables.
>>>>        controller: Add support for templated actions and matches.
>>>>        controller: Make resource references more generic.
>>>>        lb: Support using templates.
>>>>        controller: Add Template_Var <- LB references.
>>>>
>>>>
>>>>   controller/lflow.c          | 248 ++++++++++++++++++++++++--------
>>>>   controller/lflow.h          |  98 +++++++------
>>>>   controller/ofctrl.c         |   9 +-
>>>>   controller/ofctrl.h         |   3 +-
>>>>   controller/ovn-controller.c | 277
> ++++++++++++++++++++++++++++++++++--
>>>>   include/ovn/expr.h          |   4 +-
>>>>   include/ovn/lex.h           |  14 +-
>>>>   lib/actions.c               |   9 +-
>>>>   lib/expr.c                  |  14 +-
>>>>   lib/lb.c                    | 201 ++++++++++++++++++++++----
>>>>   lib/lb.h                    |  36 +++--
>>>>   lib/lex.c                   |  55 +++++++
>>>>   northd/northd.c             |  64 +++++----
>>>>   tests/ovn.at                |   2 +-
>>>>   tests/test-ovn.c            |  16 ++-
>>>>   utilities/ovn-trace.c       |  26 +++-
>>>>   16 files changed, 869 insertions(+), 207 deletions(-)
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> [email protected]
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>>>
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to