On Fri, Sep 23, 2022 at 8:10 AM Dumitru Ceara <[email protected]> wrote:
>
> On 9/22/22 19:55, Han Zhou wrote:
> > On Thu, Sep 22, 2022 at 10:38 AM Han Zhou <[email protected]> wrote:
> >>
> >>
> >>
> >> On Thu, Sep 22, 2022 at 1:00 AM Dumitru Ceara <[email protected]>
wrote:
> >>>
> >>> Hi Han,
> >>>
> >>> On 9/21/22 23:06, Han Zhou wrote:
> >>>> Thanks Dumitru for this promising optimization!
> >>>>
> >>>
> >>> Thanks for checking it out!
> >>>
> >>>> On Thu, Aug 11, 2022 at 1:03 AM Dumitru Ceara <[email protected]>
> > wrote:
> >>>>>
> >>>>> On 8/10/22 19:54, Mark Michelson wrote:
> >>>>>> Hi Dumitru,
> >>>>>>
> >>>>>
> >>>>> Hi Mark,
> >>>>>
> >>>>>> I read the patch series, and I think the idea of chassis-specific
> >>>>>> variables is a good idea to reduce the number of DB records for
> > certain
> >>>>>> things. Aside from load balancers, I suspect this could have a
> > positive
> >>>>>> impact for other structures as well.
> >>>>>>
> >>>>>
> >>>>> Thanks for taking a look!  Yes, I think this might be applicable to
> >>>>> other structures too.
> >>>>>
> >>>>
> >>>> I think it is a good idea to make it more generic, but for my
> > understanding
> >>>> this template concept is for anything that's "node/chassis" specific,
> > and
> >>>> supposed to be instantiated at chassis level. Probably we should name
> > the
> >>>> DB tables as something like chassis_template_var.
> >>>>
> >>>
> >>> If we decide to go for a simple "chassis" column (instead of a generic
> >>> "match" column) then naming the table Chassis_Template_Var makes sense
> >>> indeed.
> >>>
> >>>>>> Rather than criticizing the individual lines of code, I'll focus
> > instead
> >>>>>> on some higher-level questions/ideas.
> >>>>>>
> >>>>>
> >>>>> Sure, thanks! :)
> >>>>>
> >>>>>> First, one question I had was what happens when a template variable
> > name
> >>>>>> is used in a load balancer, but there is no appropriate value to
> >>>>>> substitute? For instance, what if a load balancer applies to
> > chassis-3,
> >>>>>> but you only have template variables for chassis-1 and chassis-2?
> > This
> >>>>>> might be addressed in the code but I didn't notice if it was.
> >>>>>>
> >>>>>
> >>>>> There are actually two things to consider here:
> >>>>>
> >>>>> 1. there might be a logical flow that uses a template variable: in
> > this
> >>>>> case if template expansion/instantiation fails we currently leave
the
> >>>>> token untouched (e.g., '^variable' stays '^variable').  That will
> > cause
> >>>>> the flow action/match parsing to fail and currently logs a warning.
> > The
> >>>>> flow itself is skipped, as it should be.  We probably need to avoid
> >>>>> logging a warning though.
> >>>>>
> >>>>> 2. like you pointed out, there might be a load balancer using
> > templates
> >>>>> in its backends/vips: if some of those templates cannot be
> > instantiated
> >>>>> locally the backend/vip where they're added is skipped.  Unless I
> > missed
> >>>>> something, the code should already do that.
> >>>>>
> >>>>>> Second, it seems like template variables are a natural extension of
> >>>>>> existing concepts like address sets and port groups. In those
cases,
> >>>>>> they were an unconditional collection of IP addresses or ports. For
> >>>>>
> >>>>> You're right to some extent template variables are similar to port
> >>>>> groups.  The southbound database port group table splits the
> > northbound
> >>>>> port group per datapath though not per chassis like template
> > variables.
> >>>>>
> >>>>>> template variables, they're a collection of untyped values with the
> >>>>>> condition of only applying on certain Chassis. I wonder if this
> > could
> >>>>>> all be reconciled with a single table that uses untyped values with
> >>>>>> user-specified conditions. Right now template variables have a
> > "Chassis"
> >>>>>> column, but maybe this could be replaced with a broader
"condition",
> >>>>>> "when", or "match" column. To get something integrated quickly,
this
> >>>>>> column could just accept the syntax of "chassis.name == <blah>" or
> >>>>>> "chassis.uuid == <blah>" to allow for chassis-specific application
> > of
> >>>>>> the values. With this foundation, we could eventually allow
> >>>>>> unconditional application of the value, or more complex conditions
> > (e.g.
> >>>>>> only apply to logical switch ports that are connected to a router
> > with a
> >>>>>> distributed gateway port). Doing this, we could deprecate address
> > sets
> >>>>>> and port groups eventually in favor of template variables.
> >>>>>
> >>>>> This sounds like a good idea to me.  I wasn't too happy with the
> >>>>> "chassis" string column of the Template_Var table anyway.  A generic
> >>>>> condition field makes more sense.
> >>>>>
> >>>> If it is chassis-specific template, a column "chassis" seems to be
> >>>> straightforward. With a "match" column it is another burden of
parsing
> >>>
> >>> I have a generic implementation (with a "predicate" column) almost
ready
> >>> for review.  I agree it's a bit more work to parse and maintain
> >>> references.  I think it's probably best to discuss these details once
I
> >>> post v1.  It's no problem for me to go back to the "chassis" column
> >>> version if we decide to use that approach.
> >>>
> >>>> (which is costly and error prone). In addition, the LB object (or
> > other
> >>>> structures) is not a logical-flow, and it doesn't directly map to
> >>>> logical-flows (unlike ACLs), so I didn't understand how would a match
> >>>> string be applied to the template. Is there a more detailed example
of
> >>>> this? Maybe I am missing something, and hope we will see more details
> > in
> >>>> the formal patch.
> >>>>
> >>>
> >>> Load balancer VIPs are a collection of key-value pairs (vip:backends).
> >>> These are currently IPs (and optionally L4 ports).  We can allow LBs
to
> >>> support "templated" VIPs/backends.  For example (this is with the WIP
v1
> >>> version of the code):
> >>>
> >>> # Create a template LB.
> >>> ovn-nbctl --template lb-add lb-test "^vip:^vport" "^backends" tcp \
> >>>     -- ls-lb-add ls1 lb-test                                      \
> >>>     -- lr-lb-add rtr lb-test
> >>>
> >>> # Instantiate the LB template variables, for the chassis where port
> >>> # vm1 is bound:
> >>> ovn-nbctl                                                          \
> >>>     -- create template_var name=vip value=66.66.66.66              \
> >>>               predicate="is_chassis_resident(\"vm1\")"             \
> >>>     -- create template_var name=vport value=666                    \
> >>>               predicate="is_chassis_resident(\"vm1\")"             \
> >>>     -- create template_var name=backends value=\"42.42.42.2:4242\" \
> >>>               predicate="is_chassis_resident(\"vm1\")"
> >>>
> >>> This is equivalent to:
> >>> ovn-nbctl lb-add lb-test 66.66.66.66:666 42.42.42.2:4242 tcp
> >>> ovn-nbctl ls-lb-add ls1 lb-test
> >>> ovn-nbctl lr-lb-add lr1 lb-test
> >>>
> >>
> >> Thanks, this helps. I guess we wouldn't want to use backends as
variables
> > in the real use cases, right? Otherwise, creating backends value for
each
> > chassis sounds not helpful for the scale. So, if I understand correctly,
> > what we would do is:
> >>
> >> # Create a template LB.
> >> ovn-nbctl --template lb-add lb-test "^vip:<the real port>" "<real
> > backends>" tcp \
> >>     -- ls-lb-add ls1 lb-test                                      \
> >>     -- lr-lb-add rtr lb-test
> >>
> >> # Instantiate the LB template variables, for the chassis where DGPs are
> > bound:
> >> ovn-nbctl                                                          \
> >>     -- create template_var name=vip value=66.66.66.66              \
> >>               predicate="is_chassis_resident(\"lrp1\")"             \
> >>     -- create template_var name=vip value=66.66.66.67              \
> >>               predicate="is_chassis_resident(\"lrp2\")"             \
> >>     -- create template_var name=vip value=66.66.66.68              \
> >>               predicate="is_chassis_resident(\"lrp3\")"
> >>
>
> It can be that we need to use backends as variables too.  In the worst
> case, we still probably end up using less resources than with explicit
> load balancers.  But yes, very likely in a lot of cases with ovn-k8s
> deployments we'll be able to define load balancers like you did above.
>
> >> Now that in ovn-k8s since we use DGP to pin LS to chassis, the lrp1,
2, 3
> > resides on different nodes, so the "vip" var is instantiated to the
> > chassis-specific value. So this effectively binds the LB VIP to chassis
> > without explicit "chassis" column. Did I understand the magic correctly?
>
> That's the idea, yes.
>
> >> If so, the predicate needs to be in logical flow match format, and the
> > string will be directly used as match condition incorporated to related
> > logical flows, right? One problem though, is that for DGPs,
> > is_chassis_resident needs to evaluate the CR port "cr-xxx", which is
> > internally constructed by OVN, and is not exposed to CMS. Once the
> > implementation changes, the predicate wouldn't work.
>
> True.  I was thinking of using the logical router port of the GW router
> in ovn-k8s though.
>
> >> (I am looking forward to seeing your v1, but I think this discussion
> > should help me understand the concept before going to the review :)
> >>
> >
> > Sorry, one more point. With the "chassis" column, it is easy for
> > conditional monitoring, considering that only a small number of values
are
> > useful for a specific chassis. The predicate alone wouldn't tell this
>
> This is a very good point.  Conditional monitoring will be
hard/impossible.
>
> > information. If we go with the predicate approach, I think it may be
> > helpful to add another column to tell the datapath information (in NB,
use
> > LS/LR), so that it can be converted to DP in SB and be conditionally
> > monitored (reusing the existing ovn-controller "local datapath"
mechanism).
> > It can be "any", to indicate something that applies to multiple/all
DPs, if
> > useful for the future use cases.
>
> I wonder if this is not too complex.  Maybe we should just go to the
> "chassis" column approach instead.  To argue against chassis not being a
> NB OVN entity, we already have NB tables that refer to chassis:
> HA_Chassis_Group, HA_Chassis, Gateway_Chassis.
>
I am not too worried about the chassis not being a NB entity either. I
think it is more of a tradeoff between flexibility and simplicity. The
predicate approach can support something not related to chassis, such as:
"instantiate the backends to value abc for ip.src == xxx". Maybe it is not
a meaningful example, and I am not really sure if it is something useful in
the future at all. So, just my 2 cents - for the purpose of this series, if
we don't want to make it too complex, maybe "chassis" is a better option.
If in the future we do need more flexibility, we can still introduce the
predicate, although we may also need to take care of the backward
compatibility/upgrade, if that's not too bad. Of course maybe I am
short-sighted. I'd like to hear if anyone comes up with some realistic use
cases that require the flexible approach, and then we will need to figure
out a way for cond-monitoring.

Thanks,
Han

> >
> >> Thanks,
> >> Han
> >>
> >>>>> Regarding deprecating and replacing address sets and port groups,
I'm
> >>>>> not sure how easy that would be but we can try it when we get to
that
> >>>> point.
> >>>>>
> >>>>
> >>>> Address sets and port groups are something different in my view.
> > Although
> >>>> they can be treated as variables in a template, they are not really
> >>>> chassis-specific, and each variable needs to be instantiated to a big
> >>>> number of instances (sometimes huge). For this reason, fine-grained
> > I-P
> >>>> embedded in the expression parsing (for Address set) was introduced
> > for the
> >>>> performance of ovn-controller. Maybe we can say there are still some
> >>>> similarities of templates, but I am not really sure if it is really
> > helpful
> >>>> to generalize them and how difficult it would be.
> >>>>
> >>>
> >>> It doesn't look straight forward indeed.  If everyone agrees, I'd
> >>> suggest discussing this option (replacing AS and PG) only after/if the
> >>> template support is accepted.
> >>>
> >>>> Thanks,
> >>>> Han
> >>>>
> >>>
> >>> Thanks,
> >>> Dumitru
> >>>
> >>>>>>
> >>>>>> Third, I was wondering if there could be some layer that exists
> > between
> >>>>>> the IDL and the application that expands the template variables as
> > early
> >>>>>> as possible. I'm thinking the application could inject some
> > callback in
> >>>>>> the IDL layer that might allow for the values to be substituted.
> > This
> >>>>>> way, the variable substitution is taken care of in a single place,
> > and
> >>>>>> by the time the application gets the data, it knows that all
> >>>>>> substitutions have been made and there is no need to special case
> >>>>>> template variable names vs. plain tokens. They should all be plain
> >>>>>> tokens. I don't think construction of such a layer should be a
> > barrier
> >>>>>> to merging the code, but it's something worth considering as a
later
> >>>>>> improvement.
> >>>>>
> >>>>> This could work.  I'll think more about it.  But like you said, it's
> >>>>> probably a longer term goal.  It will need some significant changes
> > in
> >>>>> the IDL layer (e.g., to re-evaluate some records when template
> >>>>> instantiations change).
> >>>>>
> >>>>>>
> >>>>>> Anyway, those were my high-level thoughts on the topic. Let me know
> > what
> >>>>>> you think.
> >>>>>>
> >>>>>
> >>>>> I can work on changing the Template_Var schema to add a broader way
> > of
> >>>>> specifying conditions (when/match/etc).  I'm already working on
> > adding
> >>>>> proper nbctl support for templated load balancers and trying to
> > tackle
> >>>>> the rest of the todos.  I can probably send a v1 sometime in the
> > first
> >>>>> half of next week.  Do you want to share any specific code related
> >>>>> comments that I should already integrate or shall we start a proper
> >>>>> review when v1 gets posted?
> >>>>>
> >>>>> Thanks again for your input on this RFC!
> >>>>>
> >>>>> Regards,
> >>>>> Dumitru
> >>>>>
> >>>>>> On 8/5/22 12:26, Dumitru Ceara wrote:
> >>>>>>> Sometimes network components are compute node-specific.  Sometimes
> > such
> >>>>>>> components are replicated, almost identically, for multiple nodes
> >>>>>>> in the cluster.
> >>>>>>>
> >>>>>>> One such example is the case of Kubernetes NodePort services which
> >>>>>>> translate (in the ovn-kubernetes case) to Load_Balancer
> >>>>>>> objects being applied to each and every node's logical gateway
> > router.
> >>>>>>> These load balancers are almost identical, the main difference
> > being
> >>>>>>> the fact that they use different VIPs (the node's IP).
> >>>>>>>
> >>>>>>> With the current OVN load balancer design, this becomes a problem
> > at
> >>>>>>> scale because the number of load balancers that must be configured
> > is
> >>>>>>> N x M (N nodes times M services).
> >>>>>>>
> >>>>>>> This series proposes a new concept in OVN: virtual network
> > component
> >>>>>>> templates.  The goal of the templates is to help reduce resource
> >>>>>>> consumption in the OVN central components in specific cases like
> > the
> >>>> one
> >>>>>>> described above.
> >>>>>>>
> >>>>>>> To achieve that, the CMS will instead configure a "templated" load
> >>>>>>> balancer for every service and apply that single template record
to
> >>>>>>> the cluster-wide load balancer group.  This template is then
> >>>>>>> instantiated differently on different compute nodes.  This
> > translation
> >>>>>>> is controlled through per-chassis "template variables" configured
> > by
> >>>>>>> the CMS in the new NB.Template_Var table.
> >>>>>>>
> >>>>>>> A syntetic benchmark simulating what an OpenShift router (using
> > Node
> >>>>>>> Port services) scale test would do shows the following preliminary
> >>>>>>> results:
> >>>>>>> A. 120 node, 2K NodePort services:
> >>>>>>> - before:
> >>>>>>>    - Southbound DB size on disk (compacted): ~385MB
> >>>>>>>    - Southbound DB memory usage (RSS): ~3GB
> >>>>>>>    - Southbound DB logical flows: 720K
> >>>>>>>
> >>>>>>> - after:
> >>>>>>>    - Southbound DB size on disk (compacted): ~100MB
> >>>>>>>    - Southbound DB memory usage (RSS): ~250MB
> >>>>>>>    - Southbound DB logical flows: 6K
> >>>>>>>
> >>>>>>> B. 250 node, 2K NodePort services:
> >>>>>>> - after (didn't run the "before" test as it was taking way too
> > long):
> >>>>>>>    - Southbound DB size on disk (compacted): ~155MB
> >>>>>>>    - Southbound DB memory usage (RSS): ~760MB
> >>>>>>>    - Southbound DB logical flows: 6K
> >>>>>>>
> >>>>>>> The series is sent as RFC because there's still the need to add
> >>>>>>> some template specific unit tests and the "ovn-nbctl lb-*" helper
> >>>>>>> utilities need to be adapted to support templated load balancers.
> >>>>>>>
> >>>>>>> With these two items addressed the code is self can likely qualify
> >>>>>>> for acceptance as a new feature in the upcoming release.
> >>>>>>>
> >>>>>>> There also exists a more extensive TODO list (also listed in the
> > commit
> >>>>>>> log of every patch in the series for now) but these are mainly
load
> >>>>>>> balancer related functionalities that are not yet implemented for
> >>>>>>> templated load balancers but can definitely be implemented as
> > follow
> >>>> ups:
> >>>>>>> - No support for LB health check if the LB is templated.
> >>>>>>> - No support for VIP ARP responder if the LB is templated.
> >>>>>>> - No support for routed VIPs if the LB is templated.
> >>>>>>> - Figure out a way to deal with templates in ovn-trace
> >>>>>>> - Determine if we need to allow Template_Var to match against
> > chassis
> >>>>>>>    hostname or other IDs.
> >>>>>>> - Make ofctrl_inject_pkt() work with template_vars.
> >>>>>>> - Make test-ovn work with template_vars.
> >>>>>>>
> >>>>>>> A basic example of how to configure a templated load balancer
> > follows:
> >>>>>>>    $ ovn-nbctl create load_balancer name=lb-test \
> >>>>>>>        protocol=tcp options:template=true \
> >>>>>>>        vips:\"^vip:4200\"="^backends"
> >>>>>>>
> >>>>>>>    $ ovn-nbctl ls-add ls
> >>>>>>>    $ ovn-nbctl ls-lb-add ls lb-test
> >>>>>>>
> >>>>>>>    # Instantiate the load balancer on chassis-1
> >>>>>>>    $ ovn-nbctl create template_var name=vip value=80.80.80.1
> >>>>>>> chassis=chassis-1
> >>>>>>>    $ ovn-nbctl create template_var name=backends
> >>>>>>> value='"42.42.42.1:1000"' chassis=chassis-1
> >>>>>>>
> >>>>>>>    # Instantiate the load balancer on chassis-2
> >>>>>>>    $ ovn-nbctl create template_var name=vip value=80.80.80.2
> >>>>>>> chassis=chassis-2
> >>>>>>>    $ ovn-nbctl create template_var name=backends
> >>>>>>> value='"42.42.42.2:1000"' chassis=chassis-2
> >>>>>>>
> >>>>>>> Dumitru Ceara (5):
> >>>>>>>        Add NB and SB Template_Var tables.
> >>>>>>>        controller: Add support for templated actions and matches.
> >>>>>>>        controller: Make resource references more generic.
> >>>>>>>        lb: Support using templates.
> >>>>>>>        controller: Add Template_Var <- LB references.
> >>>>>>>
> >>>>>>>
> >>>>>>>   controller/lflow.c          | 248
> > ++++++++++++++++++++++++--------
> >>>>>>>   controller/lflow.h          |  98 +++++++------
> >>>>>>>   controller/ofctrl.c         |   9 +-
> >>>>>>>   controller/ofctrl.h         |   3 +-
> >>>>>>>   controller/ovn-controller.c | 277
> >>>> ++++++++++++++++++++++++++++++++++--
> >>>>>>>   include/ovn/expr.h          |   4 +-
> >>>>>>>   include/ovn/lex.h           |  14 +-
> >>>>>>>   lib/actions.c               |   9 +-
> >>>>>>>   lib/expr.c                  |  14 +-
> >>>>>>>   lib/lb.c                    | 201 ++++++++++++++++++++++----
> >>>>>>>   lib/lb.h                    |  36 +++--
> >>>>>>>   lib/lex.c                   |  55 +++++++
> >>>>>>>   northd/northd.c             |  64 +++++----
> >>>>>>>   tests/ovn.at                |   2 +-
> >>>>>>>   tests/test-ovn.c            |  16 ++-
> >>>>>>>   utilities/ovn-trace.c       |  26 +++-
> >>>>>>>   16 files changed, 869 insertions(+), 207 deletions(-)
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> dev mailing list
> >>>>>>> [email protected]
> >>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> dev mailing list
> >>>>> [email protected]
> >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>>
> >>>
> >
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to