On Thu, May 2, 2024 at 10:11 AM, Numan Siddique <[email protected]> wrote:
> On Fri, Mar 22, 2024 at 7:51 AM Roberto Bartzen Acosta
> <[email protected]> wrote:
> >
> > Hi Mark,
> >
> > Thanks for your feedback.
> >
> > On Mon, Mar 18, 2024 at 11:53 AM, Mark Michelson <[email protected]> wrote:
> >
> > > Hi Roberto,
> > >
> > > I have some concerns about this patch. Let's use the test case you added
> > > as an example network. Let's bind the vms and DGPs to hypervisors:
> > >
> > > * vm1 and lr1-ts1 are bound to hypervisor hv1
> > > * vm2 and lr1-ts2 are bound to hypervisor hv2
> > >
> > > Now imagine a packet arrives on lr1-ts1. The packet gets load balanced
> > > and sent to vm2. In this case, since lr1-ts1 is on hv1, the ct_lb_mark()
> > > action runs there, creating a conntrack entry on hv1. However, the
> > > packet eventually is tunneled to hv2 so that it can be output to vm2.
> > >
> > > Now vm2 replies to the packet. Is there anything that ensures that the
> > > reply from vm2 gets sent to hv1 for the egress pipeline of lr1? If so,
> > > then this should work fine since the packet will be unDNATted as
> > > expected on hv1. However, if the packet runs all of lr1's pipelines on
> > > hv2, then there is no conntrack entry present, and the attempt to unDNAT
> > > will not succeed. The packet will either be dropped because of invalid
> > > CT, or the packet will have an incorrect source IP and port. Either way,
> > > this isn't what is desired.
> >
> > Yep, you're right, that makes sense.
> > If the reply comes back through a chassis that does not have the LB
> > conntrack entry corresponding to the initial request, this will trigger
> > a miss in the pipeline.
> >
> > I tried to disable ct tracking for the LB by configuring the parameter
> > on the router, but I still wasn't successful. E.g.:
> >
> > options:lb_force_snat_ip="172.16.200.201"
> >
> > Even changing lb_force_snat_ip on the router, or creating a "stateless"
> > SNAT rule, I still see this ct-based action in the egress pipeline in the
> > highest-priority table (table=1):
> >
> > table=1 (lr_out_undnat ), priority=120 , match=(ip4 && ((ip4.src == 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src == 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src == 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src == 8000)) && outport == "incus-net40-lr-lrp-01" && is_chassis_resident("cr-incus-net40-lr-lrp-01")), action=(ct_dnat_in_czone;)
> > table=1 (lr_out_undnat ), priority=120 , match=(ip4 && ((ip4.src == 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src == 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src == 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src == 8000)) && outport == "incus-net40-lr-lrp-02" && is_chassis_resident("cr-incus-net40-lr-lrp-02")), action=(ct_dnat_in_czone;)
> > table=1 (lr_out_undnat ), priority=120 , match=(ip4 && ((ip4.src == 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src == 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src == 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src == 8000)) && outport == "incus-net40-lr-lrp-03" && is_chassis_resident("cr-incus-net40-lr-lrp-03")), action=(ct_dnat_in_czone;)
> > table=1 (lr_out_undnat ), priority=120 , match=(ip4 && ((ip4.src == 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src == 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src == 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src == 8000)) && outport == "incus-net40-lr-lrp-04" && is_chassis_resident("cr-incus-net40-lr-lrp-04")), action=(ct_dnat_in_czone;)
> > table=1 (lr_out_undnat ), priority=120 , match=(ip4 && ((ip4.src == 10.195.185.2 && tcp.src == 8000) || (ip4.src == 10.195.185.3 && tcp.src == 8000) || (ip4.src == 10.195.185.4 && tcp.src == 8000) || (ip4.src == 10.195.185.5 && tcp.src == 8000) || (ip4.src == 10.195.185.6 && tcp.src == 8000)) && outport == "incus-net40-lr-lrp-ext" && is_chassis_resident("cr-incus-net40-lr-lrp-ext")), action=(ct_dnat_in_czone;)
> >
> > Considering that the return path when using multiple DGPs is probably
> > spread by ECMP, any idea on how to change the rules so that the LB output
> > rules are 'stateless' (not ct-dependent) in this case with multiple DGPs?
> > I imagine this solves the problem and guarantees a working return path
> > through any chassis.
>
> I think the only way to solve your problem is to add NAT support for a
> router having multiple DGPs.
> I'm not sure how easy that is or if it is even possible to support.
> We need to explore this.

Hi Numan, thanks for your feedback.

I'm not convinced that adding stateful NAT support to a distributed port
would work and/or be a good solution. It seems to me that for this use case
the NAT rules should be stateless, right?

In addition, I believe that stateless rules are already supported for load
balancer/router configs; I just need to understand how to use them together
with multiple DGPs (the configuration I tried doesn't seem to have worked
well). Do you think this (stateless NAT) would be a better option to use
with ECMP? I mean, traffic that can go in/out via any chassis is usually
not well handled by stateful NAT rules (and stateful NAT doesn't make much
sense there, in my opinion).

@Ilya Maximets <[email protected]>, do you have any ideas here?

Kind regards,
Roberto

> Numan
>
> >
> > Thanks,
> > Roberto
> >
> > PS: I have a complex load balancer scenario for the use case with
> > multiple DGPs, and 'internal' VIP addresses + OVN interconnect.
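As a side note on the stateless NAT option raised above: ovn-nbctl can request statelessness per NAT entry via its --stateless flag (accepted for dnat_and_snat entries). The following is only a rough sketch; the router name lr1 and the addresses are illustrative and not taken from the setup in this thread, and whether this interacts correctly with load-balancer flows on multiple DGPs is exactly the open question here.

```shell
# Sketch with illustrative names/addresses: request a stateless
# dnat_and_snat entry, so that northd emits plain ip4.src/ip4.dst
# rewrites instead of ct_dnat/ct_snat and no per-chassis conntrack
# state is required on the return path.
ovn-nbctl --stateless lr-nat-add lr1 dnat_and_snat 172.16.200.201 10.0.0.2

# Inspect the resulting NAT table entry on the router.
ovn-nbctl lr-nat-list lr1
```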
> > I can explain the general context if you are interested :)
> >
> > > On 2/19/24 16:33, Roberto Bartzen Acosta wrote:
> > > > This commit fixes the build_distr_lrouter_nat_flows_for_lb function to
> > > > include one NAT flow entry for each DGP in use. Since we have added support
> > > > to create multiple gateway ports per logical router, it's necessary to
> > > > include in the LR nat rules pipeline a specific entry for each attached
> > > > DGP. Otherwise, the ingress traffic is only redirected when the incoming
> > > > LRP matches the chassis_resident field.
> > > >
> > > > Considering that DNAT rules for DGPs were implemented with the need to
> > > > configure the DGP-related gateway-port column, the load-balancer NAT rule
> > > > configuration can use a similar idea. In this case, we don't know the LRP
> > > > responsible for the incoming traffic, and therefore we must apply the
> > > > load-balancer automatically created NAT rule in all DGPs to allow the
> > > > incoming traffic.
> > > >
> > > > Reported-at: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2054322
> > > > Fixes: 15348b7b806f ("ovn-northd: Multiple distributed gateway port support.")
> > > > Signed-off-by: Roberto Bartzen Acosta <[email protected]>
> > > > ---
> > > >  northd/en-lr-stateful.c | 12 ------
> > > >  northd/northd.c         | 14 ++++---
> > > >  tests/ovn-northd.at     | 92 +++++++++++++++++++++++++++++++++++++++++
> > > >  3 files changed, 100 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
> > > > index 6d0192487..7ffa4a690 100644
> > > > --- a/northd/en-lr-stateful.c
> > > > +++ b/northd/en-lr-stateful.c
> > > > @@ -537,18 +537,6 @@ lr_stateful_record_create(struct lr_stateful_table *table,
> > > >
> > > >      table->array[od->index] = lr_stateful_rec;
> > > >
> > > > -    /* Load balancers are not supported (yet) if a logical router has multiple
> > > > -     * distributed gateway port. Log a warning. */
> > > > -    if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od)) {
> > > > -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> > > > -        VLOG_WARN_RL(&rl, "Load-balancers are configured on logical "
> > > > -                     "router %s, which has %"PRIuSIZE" distributed "
> > > > -                     "gateway ports. Load-balancer is not supported "
> > > > -                     "yet when there is more than one distributed "
> > > > -                     "gateway port on the router.",
> > > > -                     od->nbr->name, od->n_l3dgw_ports);
> > > > -    }
> > > > -
> > > >      return lr_stateful_rec;
> > > >  }
> > > >
> > > > diff --git a/northd/northd.c b/northd/northd.c
> > > > index 2c3560ce2..7eb943d2f 100644
> > > > --- a/northd/northd.c
> > > > +++ b/northd/northd.c
> > > > @@ -10919,10 +10919,9 @@ static void
> > > >  build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
> > > >                                       enum lrouter_nat_lb_flow_type type,
> > > >                                       struct ovn_datapath *od,
> > > > -                                     struct lflow_ref *lflow_ref)
> > > > +                                     struct lflow_ref *lflow_ref,
> > > > +                                     struct ovn_port *dgp)
> > > >  {
> > > > -    struct ovn_port *dgp = od->l3dgw_ports[0];
> > > > -
> > > >      const char *undnat_action;
> > > >
> > > >      switch (type) {
> > > > @@ -10953,7 +10952,7 @@ build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
> > > >
> > > >      if (ctx->lb_vip->n_backends || !ctx->lb_vip->empty_backend_rej) {
> > > >          ds_put_format(ctx->new_match, " && is_chassis_resident(%s)",
> > > > -                      od->l3dgw_ports[0]->cr_port->json_key);
> > > > +                      dgp->cr_port->json_key);
> > > >      }
> > > >
> > > >      ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT, ctx->prio,
> > > > @@ -11164,8 +11163,11 @@ build_lrouter_nat_flows_for_lb(
> > > >      if (!od->n_l3dgw_ports) {
> > > >          bitmap_set1(gw_dp_bitmap[type], index);
> > > >      } else {
> > > > -        build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
> > > > -                                             lb_dps->lflow_ref);
> > > > +        for (size_t i = 0; i < od->n_l3dgw_ports; i++) {
> > > > +            struct ovn_port *dgp = od->l3dgw_ports[i];
> > > > +            build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
> > > > +                                                 lb_dps->lflow_ref, dgp);
> > > > +        }
> > > >      }
> > > >
> > > >      if (lb->affinity_timeout) {
> > > >
> > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > > > index 6fdd761da..fa24935e1 100644
> > > > --- a/tests/ovn-northd.at
> > > > +++ b/tests/ovn-northd.at
> > > > @@ -12313,3 +12313,95 @@ check_engine_stats northd recompute nocompute
> > > >  check_engine_stats lflow recompute nocompute
> > > >
> > > >  AT_CLEANUP
> > > > +
> > > > +OVN_FOR_EACH_NORTHD_NO_HV([
> > > > +AT_SETUP([Load balancer with Distributed Gateway Ports (DGP)])
> > > > +ovn_start
> > > > +
> > > > +check ovn-nbctl ls-add public
> > > > +check ovn-nbctl lr-add lr1
> > > > +
> > > > +# lr1 DGP ts1
> > > > +check ovn-nbctl ls-add ts1
> > > > +check ovn-nbctl lrp-add lr1 lr1-ts1 00:00:01:02:03:04 172.16.10.1/24
> > > > +check ovn-nbctl lrp-set-gateway-chassis lr1-ts1 chassis-2
> > > > +
> > > > +# lr1 DGP ts2
> > > > +check ovn-nbctl ls-add ts2
> > > > +check ovn-nbctl lrp-add lr1 lr1-ts2 00:00:01:02:03:05 172.16.20.1/24
> > > > +check ovn-nbctl lrp-set-gateway-chassis lr1-ts2 chassis-3
> > > > +
> > > > +# lr1 DGP public
> > > > +check ovn-nbctl lrp-add lr1 lr1_public 00:de:ad:ff:00:01 173.16.0.1/16
> > > > +check ovn-nbctl lrp-add lr1 lr1_s1 00:de:ad:fe:00:02 172.16.0.1/24
> > > > +check ovn-nbctl lrp-set-gateway-chassis lr1_public chassis-1
> > > > +
> > > > +check ovn-nbctl ls-add s1
> > > > +# s1 - lr1
> > > > +check ovn-nbctl lsp-add s1 s1_lr1
> > > > +check ovn-nbctl lsp-set-type s1_lr1 router
> > > > +check ovn-nbctl lsp-set-addresses s1_lr1 "00:de:ad:fe:00:02 172.16.0.1"
> > > > +check ovn-nbctl lsp-set-options s1_lr1 router-port=lr1_s1
> > > > +
> > > > +# s1 - backend vm1
> > > > +check ovn-nbctl lsp-add s1 vm1
> > > > +check ovn-nbctl lsp-set-addresses vm1 "00:de:ad:01:00:01 172.16.0.101"
> > > > +
> > > > +# s1 - backend vm2
> > > > +check ovn-nbctl lsp-add s1 vm2
> > > > +check ovn-nbctl lsp-set-addresses vm2 "00:de:ad:01:00:02 172.16.0.102"
> > > > +
> > > > +# s1 - backend vm3
> > > > +check ovn-nbctl lsp-add s1 vm3
> > > > +check ovn-nbctl lsp-set-addresses vm3 "00:de:ad:01:00:03 172.16.0.103"
> > > > +
> > > > +# Add the lr1 DGP ts1 to the public switch
> > > > +check ovn-nbctl lsp-add public public_lr1_ts1
> > > > +check ovn-nbctl lsp-set-type public_lr1_ts1 router
> > > > +check ovn-nbctl lsp-set-addresses public_lr1_ts1 router
> > > > +check ovn-nbctl lsp-set-options public_lr1_ts1 router-port=lr1-ts1 nat-addresses=router
> > > > +
> > > > +# Add the lr1 DGP ts2 to the public switch
> > > > +check ovn-nbctl lsp-add public public_lr1_ts2
> > > > +check ovn-nbctl lsp-set-type public_lr1_ts2 router
> > > > +check ovn-nbctl lsp-set-addresses public_lr1_ts2 router
> > > > +check ovn-nbctl lsp-set-options public_lr1_ts2 router-port=lr1-ts2 nat-addresses=router
> > > > +
> > > > +# Add the lr1 DGP public to the public switch
> > > > +check ovn-nbctl lsp-add public public_lr1
> > > > +check ovn-nbctl lsp-set-type public_lr1 router
> > > > +check ovn-nbctl lsp-set-addresses public_lr1 router
> > > > +check ovn-nbctl lsp-set-options public_lr1 router-port=lr1_public nat-addresses=router
> > > > +
> > > > +# Create the Load Balancer lb1
> > > > +check ovn-nbctl --wait=sb lb-add lb1 "30.0.0.1" "172.16.0.103,172.16.0.102,172.16.0.101"
> > > > +
> > > > +# Associate load balancer to s1
> > > > +check ovn-nbctl ls-lb-add s1 lb1
> > > > +check ovn-nbctl --wait=sb sync
> > > > +
> > > > +ovn-sbctl dump-flows s1 > s1flows
> > > > +AT_CAPTURE_FILE([s1flows])
> > > > +
> > > > +AT_CHECK([grep "ls_in_pre_stateful" s1flows | ovn_strip_lflows | grep "30.0.0.1"], [0], [dnl
> > > > +  table=??(ls_in_pre_stateful ), priority=120 , match=(reg0[[2]] == 1 && ip4.dst == 30.0.0.1), action=(reg1 = 30.0.0.1; ct_lb_mark;)
> > > > +])
> > > > +AT_CHECK([grep "ls_in_lb" s1flows | ovn_strip_lflows | grep "30.0.0.1"], [0], [dnl
> > > > +  table=??(ls_in_lb ), priority=110 , match=(ct.new && ip4.dst == 30.0.0.1), action=(reg0[[1]] = 0; ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);)
> > > > +])
> > > > +
> > > > +# Associate load balancer to lr1 with DGP
> > > > +check ovn-nbctl lr-lb-add lr1 lb1
> > > > +check ovn-nbctl --wait=sb sync
> > > > +
> > > > +ovn-sbctl dump-flows lr1 > lr1flows
> > > > +AT_CAPTURE_FILE([lr1flows])
> > > > +
> > > > +AT_CHECK([grep "lr_in_dnat" lr1flows | ovn_strip_lflows | grep "30.0.0.1"], [0], [dnl
> > > > +  table=??(lr_in_dnat ), priority=110 , match=(ct.new && !ct.rel && ip4 && ip4.dst == 30.0.0.1 && is_chassis_resident("cr-lr1-ts1")), action=(ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);)
> > > > +  table=??(lr_in_dnat ), priority=110 , match=(ct.new && !ct.rel && ip4 && ip4.dst == 30.0.0.1 && is_chassis_resident("cr-lr1-ts2")), action=(ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);)
> > > > +  table=??(lr_in_dnat ), priority=110 , match=(ct.new && !ct.rel && ip4 && ip4.dst == 30.0.0.1 && is_chassis_resident("cr-lr1_public")), action=(ct_lb_mark(backends=172.16.0.103,172.16.0.102,172.16.0.101);)
> > > > +])
> > > > +
> > > > +AT_CLEANUP
> > > > +])

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
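As a diagnostic footnote to the chassis-asymmetry scenario Mark describes earlier in the thread, the presence or absence of the LB conntrack entry can be checked per hypervisor. A sketch, assuming the VIP 30.0.0.1 and the cr-lr1-ts1 port name from the test case; the commands are run on each hypervisor and against the southbound DB as noted:

```shell
# On each hypervisor: look for the conntrack entry that ct_lb_mark()
# created for the VIP. If the reply path lands on a chassis where this
# comes back empty, the unDNAT lookup in lr_out_undnat will miss.
ovs-appctl dpctl/dump-conntrack | grep 30.0.0.1

# Against the southbound DB: see which chassis a DGP's chassisredirect
# port is currently bound to.
ovn-sbctl --columns=logical_port,chassis find Port_Binding logical_port=cr-lr1-ts1
```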
