On Thu, May 2, 2024 at 10:11 AM Numan Siddique <[email protected]>
wrote:

> On Fri, Mar 22, 2024 at 7:51 AM Roberto Bartzen Acosta
> <[email protected]> wrote:
> >
> > Hi Mark,
> >
> > Thanks for your feedback.
> >
> > On Mon, Mar 18, 2024 at 11:53 AM Mark Michelson
> > <[email protected]> wrote:
> >
> > > Hi Roberto,
> > >
> > > I have some concerns about this patch. Let's use the test case you
> added
> > > as an example network. Let's bind the vms and DGPs to hypervisors:
> > >
> > > * vm1 and lr1-ts1 are bound to hypervisor hv1
> > > * vm2 and lr1-ts2 are bound to hypervisor hv2
> > >
> > > Now imagine a packet arrives on lr1-ts1. The packet gets load balanced
> > > and sent to vm2. In this case, since lr1-ts1 is on hv1, the
> ct_lb_mark()
> > > action runs there, creating a conntrack entry on hv1. However, the
> > > packet eventually is tunneled to hv2 so that it can be output to vm2.
> > >
> > > Now vm2 replies to the packet. Is there anything that ensures that the
> > > reply from vm2 gets sent to hv1 for the egress pipeline of lr1? If so,
> > > then this should work fine since the packet will be unDNATted as
> > > expected on hv1. However, if the packet runs all of lr1's pipelines on
> > > hv2, then there is no conntrack entry present, and the attempt to
> unDNAT
> > > will not succeed. The packet will either be dropped because of invalid
> > > CT, or the packet will have an incorrect source IP and port. Either
> way,
> > > this isn't what is desired.
> > >
> >
> > Yep, you're right, that makes sense.
> > If the reply packet arrives on a chassis that does not have the LB
> > conntrack entry corresponding to the initial request, it will miss in
> > the pipeline.
> >
> > I tried to disable conntrack for the LB by configuring this parameter on
> > the router, but I still wasn't successful. E.g.:
> > options:lb_force_snat_ip="172.16.200.201"
> >
> > Even after changing the lb_force_snat_ip option on the router, or creating
> > a "stateless" SNAT rule, I still see this action in the output pipeline in
> > the highest priority table (table=1):
> >
> >   table=1 (lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src ==
> > 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src ==
> > 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src ==
> > 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src ==
> > 8000)) && outport == "incus-net40-lr-lrp-01" &&
> > is_chassis_resident("cr-incus-net40-lr-lrp-01")), action=(ct_dnat_in_czone;)
> >   table=1 (lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src ==
> > 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src ==
> > 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src ==
> > 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src ==
> > 8000)) && outport == "incus-net40-lr-lrp-02" &&
> > is_chassis_resident("cr-incus-net40-lr-lrp-02")), action=(ct_dnat_in_czone;)
> >   table=1 (lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src ==
> > 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src ==
> > 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src ==
> > 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src ==
> > 8000)) && outport == "incus-net40-lr-lrp-03" &&
> > is_chassis_resident("cr-incus-net40-lr-lrp-03")), action=(ct_dnat_in_czone;)
> >   table=1 (lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src ==
> > 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src ==
> > 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src ==
> > 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src ==
> > 8000)) && outport == "incus-net40-lr-lrp-04" &&
> > is_chassis_resident("cr-incus-net40-lr-lrp-04")), action=(ct_dnat_in_czone;)
> >   table=1 (lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src ==
> > 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src ==
> > 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src ==
> > 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src ==
> > 8000)) && outport == "incus-net40-lr-lrp-ext" &&
> > is_chassis_resident("cr-incus-net40-lr-lrp-ext")), action=(ct_dnat_in_czone;)
> >
> >
> > Considering that the return path with multiple DGPs is probably subject to
> > ECMP, any idea how to change the rules so that the LB output rules are
> > 'stateless' (not conntrack dependent) in this case? I imagine this would
> > solve the problem and guarantee a correct return path via any chassis.
>
> I think the only way to solve your problem is to add NAT support for a
> router having multiple DGPs.
> I'm not sure how easy that is or if it is even possible to support.
> We need to explore this.
>


Hi Numan, thanks for your feedback.

I'm not convinced that adding stateful NAT support to a distributed port
would work, or that it would be a good solution. It seems to me that for
this use case the NAT rules should be stateless, right?
In addition, I believe stateless rules are already supported for load
balancer/router configurations; I just need to understand how to use them
with multiple DGPs (the configuration I tried doesn't seem to have worked).
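For reference, here is a rough sketch of the kind of configuration I tried
(router name and addresses below are illustrative, not from my real setup):

```shell
# A stateless dnat_and_snat rule; the --stateless flag only applies to
# dnat_and_snat NAT rules.
ovn-nbctl --stateless lr-nat-add lr1 dnat_and_snat 172.16.200.201 10.0.0.10

# Optionally pin the NAT rule to a specific DGP via the gateway_port column.
ovn-nbctl --stateless --gateway-port=lr1-ts1 lr-nat-add lr1 \
    dnat_and_snat 172.16.200.202 10.0.0.11

# The lb_force_snat_ip router option mentioned earlier in the thread.
ovn-nbctl set logical_router lr1 options:lb_force_snat_ip="172.16.200.201"
```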

Do you think stateless NAT would be a better option to use with ECMP? I
mean, traffic that can enter or leave via any chassis is usually not
handled well by stateful NAT rules (and stateful NAT doesn't make much
sense there, in my opinion).
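For reference, this is roughly how I verified the conntrack miss (a sketch,
assuming shell access to each hypervisor; the VIP address is illustrative):

```shell
# On each chassis, look for the LB conntrack entry matching the VIP
# (30.0.0.1 is an illustrative address, not from my real setup).
# The entry exists only on the chassis that processed the initial request;
# on any other chassis the reply traffic finds no entry and misses.
ovs-appctl dpctl/dump-conntrack | grep 30.0.0.1
```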

@Ilya Maximets <[email protected]> , do you have any ideas here?

Kind regards,
Roberto


> Numan
>
> >
> > Thanks,
> > Roberto
> >
> > PS: I have a complex load balancer scenario for the use case with multiple
> > DGPs, 'internal' VIP addresses, and OVN interconnect. I can explain the
> > general context if you are interested :)
> >
> >
> > > On 2/19/24 16:33, Roberto Bartzen Acosta wrote:
> > > > This commit fixes the build_distr_lrouter_nat_flows_for_lb function
> > > > to include one NAT flow entry for each DGP in use. Since we have
> > > > added support to create multiple gateway ports per logical router,
> > > > it's necessary to include in the LR NAT rules pipeline a specific
> > > > entry for each attached DGP. Otherwise, the ingress traffic is only
> > > > redirected when the incoming LRP matches the chassis_resident field.
> > > >
> > > > Considering that DNAT rules for DGPs were implemented with the need to
> > > > configure the DGP-related gateway_port column, the load-balancer NAT
> > > > rule configuration can use a similar idea. In this case, we don't know
> > > > which LRP is responsible for the incoming traffic, so we must apply the
> > > > automatically created load-balancer NAT rule to all DGPs to allow the
> > > > incoming traffic.
> > > >
> > > > Reported-at: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2054322
> > > > Fixes: 15348b7b806f ("ovn-northd: Multiple distributed gateway port support.")
> > > > Signed-off-by: Roberto Bartzen Acosta <[email protected]>
> > > > ---
> > > >   northd/en-lr-stateful.c | 12 ------
> > > >   northd/northd.c         | 14 ++++---
> > > >   tests/ovn-northd.at     | 92 +++++++++++++++++++++++++++++++++++++++++
> > > >   3 files changed, 100 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
> > > > index 6d0192487..7ffa4a690 100644
> > > > --- a/northd/en-lr-stateful.c
> > > > +++ b/northd/en-lr-stateful.c
> > > > @@ -537,18 +537,6 @@ lr_stateful_record_create(struct lr_stateful_table *table,
> > > >
> > > >       table->array[od->index] = lr_stateful_rec;
> > > >
> > > > -    /* Load balancers are not supported (yet) if a logical router has multiple
> > > > -     * distributed gateway port.  Log a warning. */
> > > > -    if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od)) {
> > > > -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> > > > -        VLOG_WARN_RL(&rl, "Load-balancers are configured on logical "
> > > > -                     "router %s, which has %"PRIuSIZE" distributed "
> > > > -                     "gateway ports. Load-balancer is not supported "
> > > > -                     "yet when there is more than one distributed "
> > > > -                     "gateway port on the router.",
> > > > -                     od->nbr->name, od->n_l3dgw_ports);
> > > > -    }
> > > > -
> > > >       return lr_stateful_rec;
> > > >   }
> > > >
> > > > diff --git a/northd/northd.c b/northd/northd.c
> > > > index 2c3560ce2..7eb943d2f 100644
> > > > --- a/northd/northd.c
> > > > +++ b/northd/northd.c
> > > > @@ -10919,10 +10919,9 @@ static void
> > > >   build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
> > > >                                        enum lrouter_nat_lb_flow_type type,
> > > >                                        struct ovn_datapath *od,
> > > > -                                     struct lflow_ref *lflow_ref)
> > > > +                                     struct lflow_ref *lflow_ref,
> > > > +                                     struct ovn_port *dgp)
> > > >   {
> > > > -    struct ovn_port *dgp = od->l3dgw_ports[0];
> > > > -
> > > >       const char *undnat_action;
> > > >
> > > >       switch (type) {
> > > > @@ -10953,7 +10952,7 @@ build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
> > > >
> > > >       if (ctx->lb_vip->n_backends || !ctx->lb_vip->empty_backend_rej) {
> > > >           ds_put_format(ctx->new_match, " && is_chassis_resident(%s)",
> > > > -                      od->l3dgw_ports[0]->cr_port->json_key);
> > > > +                      dgp->cr_port->json_key);
> > > >       }
> > > >
> > > >       ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT, ctx->prio,
> > > > @@ -11164,8 +11163,11 @@ build_lrouter_nat_flows_for_lb(
> > > >           if (!od->n_l3dgw_ports) {
> > > >               bitmap_set1(gw_dp_bitmap[type], index);
> > > >           } else {
> > > > -            build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
> > > > -                                                 lb_dps->lflow_ref);
> > > > +            for (size_t i = 0; i < od->n_l3dgw_ports; i++) {
> > > > +                struct ovn_port *dgp = od->l3dgw_ports[i];
> > > > +                build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
> > > > +                                                     lb_dps->lflow_ref, dgp);
> > > > +            }
> > > >           }
> > > >
> > > >           if (lb->affinity_timeout) {
> > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > > > index 6fdd761da..fa24935e1 100644
> > > > --- a/tests/ovn-northd.at
> > > > +++ b/tests/ovn-northd.at
> > > > @@ -12313,3 +12313,95 @@ check_engine_stats northd recompute nocompute
> > > >   check_engine_stats lflow recompute nocompute
> > > >
> > > >   AT_CLEANUP
> > > > +
> > > > +OVN_FOR_EACH_NORTHD_NO_HV([
> > > > +AT_SETUP([Load balancer with Distributed Gateway Ports (DGP)])
> > > > +ovn_start
> > > > +
> > > > +check ovn-nbctl ls-add public
> > > > +check ovn-nbctl lr-add lr1
> > > > +
> > > > +# lr1 DGP ts1
> > > > +check ovn-nbctl ls-add ts1
> > > > +check ovn-nbctl lrp-add lr1 lr1-ts1 00:00:01:02:03:04 172.16.10.1/24
> > > > +check ovn-nbctl lrp-set-gateway-chassis lr1-ts1 chassis-2
> > > > +
> > > > +# lr1 DGP ts2
> > > > +check ovn-nbctl ls-add ts2
> > > > +check ovn-nbctl lrp-add lr1 lr1-ts2 00:00:01:02:03:05 172.16.20.1/24
> > > > +check ovn-nbctl lrp-set-gateway-chassis lr1-ts2 chassis-3
> > > > +
> > > > +# lr1 DGP public
> > > > +check ovn-nbctl lrp-add lr1 lr1_public 00:de:ad:ff:00:01 173.16.0.1/16
> > > > +check ovn-nbctl lrp-add lr1 lr1_s1 00:de:ad:fe:00:02 172.16.0.1/24
> > > > +check ovn-nbctl lrp-set-gateway-chassis lr1_public chassis-1
> > > > +
> > > > +check ovn-nbctl ls-add s1
> > > > +# s1 - lr1
> > > > +check ovn-nbctl lsp-add s1 s1_lr1
> > > > +check ovn-nbctl lsp-set-type s1_lr1 router
> > > > +check ovn-nbctl lsp-set-addresses s1_lr1 "00:de:ad:fe:00:02 172.16.0.1"
> > > > +check ovn-nbctl lsp-set-options s1_lr1 router-port=lr1_s1
> > > > +
> > > > +# s1 - backend vm1
> > > > +check ovn-nbctl lsp-add s1 vm1
> > > > +check ovn-nbctl lsp-set-addresses vm1 "00:de:ad:01:00:01 172.16.0.101"
> > > > +
> > > > +# s1 - backend vm2
> > > > +check ovn-nbctl lsp-add s1 vm2
> > > > +check ovn-nbctl lsp-set-addresses vm2 "00:de:ad:01:00:02 172.16.0.102"
> > > > +
> > > > +# s1 - backend vm3
> > > > +check ovn-nbctl lsp-add s1 vm3
> > > > +check ovn-nbctl lsp-set-addresses vm3 "00:de:ad:01:00:03 172.16.0.103"
> > > > +
> > > > +# Add the lr1 DGP ts1 to the public switch
> > > > +check ovn-nbctl lsp-add public public_lr1_ts1
> > > > +check ovn-nbctl lsp-set-type public_lr1_ts1 router
> > > > +check ovn-nbctl lsp-set-addresses public_lr1_ts1 router
> > > > +check ovn-nbctl lsp-set-options public_lr1_ts1 router-port=lr1-ts1 nat-addresses=router
> > > > +
> > > > +# Add the lr1 DGP ts2 to the public switch
> > > > +check ovn-nbctl lsp-add public public_lr1_ts2
> > > > +check ovn-nbctl lsp-set-type public_lr1_ts2 router
> > > > +check ovn-nbctl lsp-set-addresses public_lr1_ts2 router
> > > > +check ovn-nbctl lsp-set-options public_lr1_ts2 router-port=lr1-ts2 nat-addresses=router
> > > > +
> > > > +# Add the lr1 DGP public to the public switch
> > > > +check ovn-nbctl lsp-add public public_lr1
> > > > +check ovn-nbctl lsp-set-type public_lr1 router
> > > > +check ovn-nbctl lsp-set-addresses public_lr1 router
> > > > +check ovn-nbctl lsp-set-options public_lr1 router-port=lr1_public nat-addresses=router
> > > > +
> > > > +# Create the Load Balancer lb1
> > > > +check ovn-nbctl --wait=sb lb-add lb1 "30.0.0.1" "172.16.0.103,172.16.0.102,172.16.0.101"
> > > > +
> > > > +# Associate load balancer to s1
> > > > +check ovn-nbctl ls-lb-add s1 lb1
> > > > +check ovn-nbctl --wait=sb sync
> > > > +
> > > > +ovn-sbctl dump-flows s1 > s1flows
> > > > +AT_CAPTURE_FILE([s1flows])
> > > > +
> > > > +AT_CHECK([grep "ls_in_pre_stateful" s1flows | ovn_strip_lflows | grep "30.0.0.1"], [0], [dnl
> > > > +  table=??(ls_in_pre_stateful ), priority=120  , match=(reg0[[2]] == 1 && ip4.dst == 30.0.0.1), action=(reg1 = 30.0.0.1; ct_lb_mark;)
> > > > +])
> > > > +AT_CHECK([grep "ls_in_lb" s1flows | ovn_strip_lflows | grep "30.0.0.1"], [0], [dnl
> > > > +  table=??(ls_in_lb           ), priority=110  , match=(ct.new && ip4.dst == 30.0.0.1), action=(reg0[[1]] = 0; ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);)
> > > > +])
> > > > +
> > > > +# Associate load balancer to lr1 with DGP
> > > > +check ovn-nbctl lr-lb-add lr1 lb1
> > > > +check ovn-nbctl --wait=sb sync
> > > > +
> > > > +ovn-sbctl dump-flows lr1 > lr1flows
> > > > +AT_CAPTURE_FILE([lr1flows])
> > > > +
> > > > +AT_CHECK([grep "lr_in_dnat" lr1flows | ovn_strip_lflows | grep "30.0.0.1"], [0], [dnl
> > > > +  table=??(lr_in_dnat         ), priority=110  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 30.0.0.1 && is_chassis_resident("cr-lr1-ts1")), action=(ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);)
> > > > +  table=??(lr_in_dnat         ), priority=110  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 30.0.0.1 && is_chassis_resident("cr-lr1-ts2")), action=(ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);)
> > > > +  table=??(lr_in_dnat         ), priority=110  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 30.0.0.1 && is_chassis_resident("cr-lr1_public")), action=(ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);)
> > > > +])
> > > > +
> > > > +AT_CLEANUP
> > > > +])
> > >
> > >
> >
> > --
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

-- 






