On 8/13/20 3:08 AM, Ankur Sharma wrote: > Hi Dumitru, > > Thanks a lot for looking at the change. > It my mistake, LB use case totally slipped through my mind. > > Cross product based approach is not generic enough to address LB as well (and > future use cases for CT ZONE). > > Please ignore this patch, i will submit a new one with a more generic > approach soon. >
Looking forward to it! > There is one change in this patch that can be taken, i.e replacing > CT_ZONE_DB_QUEUED with CT_ZONE_OF_QUEUED, > but i will submit a separate patch for it as well. > Sounds good, this is something we should definitely address and it can be indeed a separate patch. Thanks, Dumitru > > Thanks > > Regards, > Ankur > > > > > From: Dumitru Ceara <[email protected]> > Sent: Friday, August 7, 2020 1:58 AM > To: svc.mail.git <[email protected]>; [email protected] > <[email protected]>; Ankur Sharma <[email protected]> > Subject: Re: [ovs-dev] [PATCH v1] ovn-controller: Fix the CT zone assignment > logic for logical routers > > On 8/4/20 5:55 AM, Ankur Sharma wrote: >> From: Ankur Sharma <[email protected]> >> >> BACKGROUND: >> a. ovn-controller assigns CT ZONES for local ports and datapaths. >> b. If a local port/datapath is cleaned up from a chassis, then >> corresponding CT ZONE is "unassigned"/"freed" up. >> >> ISSUE: >> Above logic and implementations leaves stale CT entries in the >> datapath, which may get reused unexpectedly, thereby causing >> issues like, packets going through ct_nat(SNAT_IP_NEW) and getting >> a stale IP as SNAT IP etc. >> >> a. As a part of CT Zone unassign, implementation should FLUSH the >> corresponding CT entries, i.e it should do FLUSH by ZONE. >> As os now, implementation avoids the flushing, thereby leaving >> stale CT entries. >> >> b. Similarly, since the implementation relies on datapath existence >> for assign/unassign of CT ZONEs. Hence, simple operations like >> moving the logical router from one external logical switch to >> another, may not cause any CT ZONE reassignment and thereby >> stale CT entries might get consumed, when they should not have >> been. >> >> c. a. and b. combined causes following: >> i. Start a to be SNATed traffic from internal endpoint to an >> external endpoint. Let us say internal endpoint IP is >> 50.0.0.10 and external endpoint ip is 8.8.8.8 and >> logical router port ip (and hence SNAT ip) is 100.0.0.10. >> >> ii. Detach the logical router from old external logical switch >> and attach to new external logical switch. As a result >> of this operation, new router port ip becomes 200.0.0.10 >> , which also becomes the new SNAT ip. >> >> iii. The observation has been that traffic initiated in i. above >> still ends up using OLD SNAT IP, i.e 100.0.0.10, rather than >> 200.0.0.10 >> >> iv. iii. above happened, because although from OVS DP, the IP for >> NAT action is 200.0.0.10, however, since its an ongoing traffic, >> hence the CT entries come in use and end up NATing to old SNAT >> ip 100.0.0.10. For example: >> >> OVS DP STATE >> recirc_id(0),in_port(16),....ct(commit,zone=1,nat(src=200.0.0.10)) >> >> CT STATE >> icmp,orig=(src=50.0.0.10,dst=8.8.8.8,id=2288,type=8,code=0), >> reply=(src=8.8.8.8,dst=100.0.0.10,id=2288,type=0,code=0),zone=1 >> >> FIX: >> This patch improves the overall CT ZONE management by doing following: >> a. Do a FLUSH by CT ZONE, once we identify that a zone has to be freed up. >> b. From datapath perspective, restrict the CT ZONE assignment ONLY >> to logical routers that has NAT rules enabled. > > Hi Ankur, > > Maybe I'm missing something but doesn't this cause stateful ACLs, load > balancers applied to logical_switches and load balancers applied to > logical routers that don't have nat configured to use the default > conntrack zone (0)? I don't think that's ok. > > As a matter of fact, most of the system-ovn.at tests related to > conntrack fail because conntrack entries are created in the default > zone, e.g.: > > $ make check-kernel > 1: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT FAILED > (system-ovn.at:116) > > +++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/1/stdout > 2020-08-07 04:50:11.167836393 -0400 > @@ -1,2 +1,2 @@ > -icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=<cleared>,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.1.2,id=<cleared>,type=0,code=0),zone=<cleared> > +icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=<cleared>,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.1.2,id=<cleared>,type=0,code=0) > > 2: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT - IPv6 > FAILED (system-ovn.at:296) > 3: ovn -- 2 LRs connected via LS, gateway router, easy SNAT FAILED > (system-ovn.at:448) > 4: ovn -- 2 LRs connected via LS, gateway router, easy SNAT - IPv6 > FAILED (system-ovn.at:560) > 5: ovn -- multiple gateway routers, SNAT and DNAT FAILED > (system-ovn.at:730) > 6: ovn -- multiple gateway routers, SNAT and DNAT - IPv6 FAILED > (system-ovn.at:958) > 7: ovn -- load-balancing FAILED > (system-ovn.at:1153) > 8: ovn -- load-balancing - IPv6 ok > 9: ovn -- load-balancing - same subnet. ok > 10: ovn -- load-balancing - same subnet. - IPv6 ok > 11: ovn -- load balancing in gateway router FAILED > (system-ovn.at:1829) > > ./system-ovn.at:1829: ovs-appctl dpctl/dump-conntrack | grep > "dst=30.0.0.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e > 's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' | > sort | uniq | > sed -e 's/zone=[0-9]*/zone=<cleared>/' > --- - 2020-08-07 04:46:41.414957151 -0400 > +++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/11/stdout > 2020-08-07 04:46:41.412798625 -0400 > @@ -1,3 +1,3 @@ > -tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>) > -tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>) > +tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>) > +tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>) > > > 12: ovn -- load balancing in gateway router - IPv6 FAILED > (system-ovn.at:2033) > 13: ovn -- multiple gateway routers, load-balancing FAILED > (system-ovn.at:2209) > 14: ovn -- multiple gateway routers, load-balancing - IPv6 FAILED > (system-ovn.at:2376) > > [...] > > 30: ovn -- ECMP symmetric reply FAILED > (system-ovn.at:4684) > > ./system-ovn.at:4684: ovs-appctl dpctl/dump-conntrack | grep > "dst=172.16.0.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e > 's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' | > sort | uniq | \ > sed -e 's/zone=[0-9]*/zone=<cleared>/' | > sed -e > 's/labels=0x[0-9a-f]*00000401020400000000/labels=0x00000401020400000000/' > --- - 2020-08-07 04:56:22.403499981 -0400 > +++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/30/stdout > 2020-08-07 04:56:22.401442917 -0400 > @@ -1,2 +1,2 @@ > -icmp,orig=(src=172.16.0.1,dst=10.0.0.2,id=<cleared>,type=8,code=0),reply=(src=10.0.0.2,dst=172.16.0.1,id=<cleared>,type=0,code=0),zone=<cleared>,labels=0x00000401020400000000 > +icmp,orig=(src=172.16.0.1,dst=10.0.0.2,id=<cleared>,type=8,code=0),reply=(src=10.0.0.2,dst=172.16.0.1,id=<cleared>,type=0,code=0),labels=0x00000401020400000000 > >> c. Instead of using logical router uuid as ct zone key, use crossproduct >> of logical router and logical router port that connects to external >> logical switch. >> >> Signed-off-by: Ankur Sharma <[email protected]> >> --- >> controller/ovn-controller.c | 37 +++++++++++++++++++++++++++---------- >> controller/physical.c | 18 ++++++++++++------ >> lib/ovn-util.c | 10 ++++++---- >> lib/ovn-util.h | 3 ++- >> 4 files changed, 47 insertions(+), 21 deletions(-) >> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >> index 5ca32ac..9a6746e 100644 >> --- a/controller/ovn-controller.c >> +++ b/controller/ovn-controller.c >> @@ -521,17 +521,34 @@ update_ct_zones(const struct sset *lports, const >> struct hmap *local_datapaths, >> sset_add(&all_users, user); >> } >> >> - /* Local patched datapath (gateway routers) need zones assigned. */ >> + /* Local patched datapath (gateway routers) need zones assigned. >> + * Only local logical routers with atleast one NAT rule are considered >> for >> + * CT zone assignment.*/ >> const struct local_datapath *ld; >> HMAP_FOR_EACH (ld, hmap_node, local_datapaths) { >> - /* XXX Add method to limit zone assignment to logical router >> - * datapaths with NAT */ >> - char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, >> "dnat"); >> - char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, >> "snat"); >> - sset_add(&all_users, dnat); >> - sset_add(&all_users, snat); >> - free(dnat); >> - free(snat); >> + const char *dp_nblr = smap_get(&ld->datapath->external_ids, >> + "logical-router"); >> + if (dp_nblr) { >> + for (size_t iter = 0; iter < ld->n_peer_ports; iter++) { >> + const struct sbrec_port_binding *peer_binding = >> + ld->peer_ports[iter].remote; >> + const struct sbrec_port_binding *local_binding = >> + ld->peer_ports[iter].local; >> + >> + if (peer_binding->nat_addresses) { >> + char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, >> + >> &local_binding->header_.uuid, >> + "dnat"); >> + char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, >> + >> &local_binding->header_.uuid, >> + "snat"); >> + sset_add(&all_users, dnat); >> + sset_add(&all_users, snat); >> + free(dnat); >> + free(snat); >> + } >> + } >> + } >> } >> >> /* Delete zones that do not exist in above sset. */ >> @@ -541,7 +558,7 @@ update_ct_zones(const struct sset *lports, const struct >> hmap *local_datapaths, >> ct_zone->data, ct_zone->name); >> >> struct ct_zone_pending_entry *pending = xmalloc(sizeof >> *pending); >> - pending->state = CT_ZONE_DB_QUEUED; /* Skip flushing zone. */ >> + pending->state = CT_ZONE_OF_QUEUED; >> pending->zone = ct_zone->data; >> pending->add = false; >> shash_add(pending_ct_zones, ct_zone->name, pending); >> diff --git a/controller/physical.c b/controller/physical.c >> index 535c777..cc497e0 100644 >> --- a/controller/physical.c >> +++ b/controller/physical.c >> @@ -218,18 +218,24 @@ static struct zone_ids >> get_zone_ids(const struct sbrec_port_binding *binding, >> const struct simap *ct_zones) >> { >> - struct zone_ids zone_ids; >> + struct zone_ids zone_ids = {0}; >> >> zone_ids.ct = simap_get(ct_zones, binding->logical_port); >> >> - const struct uuid *key = &binding->datapath->header_.uuid; >> + const struct uuid *key1 = &binding->datapath->header_.uuid; >> + const struct uuid *key2 = &binding->header_.uuid; >> >> - char *dnat = alloc_nat_zone_key(key, "dnat"); >> - zone_ids.dnat = simap_get(ct_zones, dnat); >> + char *dnat = alloc_nat_zone_key(key1, key2, "dnat"); >> + >> + if (simap_contains(ct_zones, dnat)) { >> + zone_ids.dnat = simap_get(ct_zones, dnat); >> + } > > Both simap_get() and simap_contains() call simap_find() internally. If > simap_get() can't find the key it will return a default value of 0. > There's no need for the "if (simap_contains(ct_zones, dnat)) {". We can > just directly: > > zone_ids.dnat = simap_get(ct_zones, dnat); > >> free(dnat); >> >> - char *snat = alloc_nat_zone_key(key, "snat"); >> - zone_ids.snat = simap_get(ct_zones, snat); >> + char *snat = alloc_nat_zone_key(key1, key2, "snat"); >> + if (simap_contains(ct_zones, snat)) { >> + zone_ids.snat = simap_get(ct_zones, snat); >> + } > > Same here. > > Thanks, > Dumitru > >> free(snat); >> >> return zone_ids; >> diff --git a/lib/ovn-util.c b/lib/ovn-util.c >> index cdb5e18..cba7355 100644 >> --- a/lib/ovn-util.c >> +++ b/lib/ovn-util.c >> @@ -327,14 +327,16 @@ destroy_lport_addresses(struct lport_addresses *laddrs) >> free(laddrs->ipv6_addrs); >> } >> >> -/* Allocates a key for NAT conntrack zone allocation for a provided >> - * 'key' record and a 'type'. >> +/* Allocates a key for NAT conntrack zone allocation for provided >> + * 'keys' and a 'type'. >> * >> * It is the caller's responsibility to free the allocated memory. */ >> char * >> -alloc_nat_zone_key(const struct uuid *key, const char *type) >> +alloc_nat_zone_key(const struct uuid *key1, const struct uuid *key2, >> + const char *type) >> { >> - return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type); >> + return xasprintf(UUID_FMT"_"UUID_FMT"_%s", UUID_ARGS(key1), >> + UUID_ARGS(key2), type); >> } >> >> const char * >> diff --git a/lib/ovn-util.h b/lib/ovn-util.h >> index 0f7b501..fe86bf8 100644 >> --- a/lib/ovn-util.h >> +++ b/lib/ovn-util.h >> @@ -77,7 +77,8 @@ bool extract_sbrec_binding_first_mac(const struct >> sbrec_port_binding *binding, >> >> void destroy_lport_addresses(struct lport_addresses *); >> >> -char *alloc_nat_zone_key(const struct uuid *key, const char *type); >> +char *alloc_nat_zone_key(const struct uuid *key1, const struct uuid *key2, >> + const char *type); >> >> const char *default_nb_db(void); >> const char *default_sb_db(void); >> > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
