Re: [ovs-dev] [PATCH ovn v5 08/16] northd: Refactor lflow management into a separate module.

2024-01-23 Thread Han Zhou
On Fri, Jan 19, 2024 at 2:50 AM Dumitru Ceara  wrote:
>
> On 1/11/24 16:31, num...@ovn.org wrote:
> > +
> > +void
> > +lflow_table_add_lflow(struct lflow_table *lflow_table,
> > +  const struct ovn_datapath *od,
> > +  const unsigned long *dp_bitmap, size_t dp_bitmap_len,
> > +  enum ovn_stage stage, uint16_t priority,
> > +  const char *match, const char *actions,
> > +  const char *io_port, const char *ctrl_meter,
> > +  const struct ovsdb_idl_row *stage_hint,
> > +  const char *where,
> > +  struct lflow_ref *lflow_ref)
> > +OVS_EXCLUDED(fake_hash_mutex)
> > +{
> > +struct ovs_mutex *hash_lock;
> > +uint32_t hash;
> > +
> > +ovs_assert(!od ||
> > +   ovn_stage_to_datapath_type(stage) == ovn_datapath_get_type(od));
> > +
> > +hash = ovn_logical_flow_hash(ovn_stage_get_table(stage),
> > + ovn_stage_get_pipeline(stage),
> > + priority, match,
> > + actions);
> > +
> > +hash_lock = lflow_hash_lock(&lflow_table->entries, hash);
> > +struct ovn_lflow *lflow =
> > +do_ovn_lflow_add(lflow_table, od, dp_bitmap,
> > + dp_bitmap_len, hash, stage,
> > + priority, match, actions,
> > + io_port, ctrl_meter, stage_hint, where);
> > +
> > +if (lflow_ref) {
> > +/* lflow referencing is only supported if 'od' is not NULL. */
> > +ovs_assert(od);
> > +
> > +struct lflow_ref_node *lrn =
> > +lflow_ref_node_find(&lflow_ref->lflow_ref_nodes, lflow, hash);
> > +if (!lrn) {
> > +lrn = xzalloc(sizeof *lrn);
> > +lrn->lflow = lflow;
> > +lrn->dp_index = od->index;
> > +ovs_list_insert(&lflow_ref->lflows_ref_list,
> > +&lrn->lflow_list_node);
> > +inc_dp_refcnt(&lflow->dp_refcnts_map, lrn->dp_index);
> > +ovs_list_insert(&lflow->referenced_by, &lrn->ref_list_node);
> > +
> > +hmap_insert(&lflow_ref->lflow_ref_nodes, &lrn->ref_node, hash);
> > +}
> > +
> > +lrn->linked = true;
> > +}
> > +
> > +lflow_hash_unlock(hash_lock);
> > +
> > +}
> > +
>
> This part is not thread safe.
>
> If two threads try to add logical flows that have different hashes and
> lflow_ref is not NULL we're going to have a race condition when
> inserting to the lflow_ref->lflow_ref_nodes hash map because the two
> threads will take different locks.
>

I think it is safe because a lflow_ref is always associated with an object,
e.g. port, datapath, lb, etc., and lflow generation for a single such
object is never executed in parallel, which is how the parallel lflow build
is designed.
Does it make sense?
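
(For illustration only, a rough sketch of the hash-bucket locking being
discussed; the mask value and helper names below are assumptions, not the
actual lflow-mgr code.)

    #include <stdint.h>
    #include <pthread.h>

    /* One mutex per hash bucket: insertions that hash to different buckets
     * can run in parallel, insertions that hash to the same bucket serialize
     * on the same lock.  The shared lflow table is protected this way; a
     * per-object lflow_ref is only ever touched by the single thread that
     * builds flows for that object, which is the point made above. */
    #define HASH_LOCK_MASK 0xFFFF            /* assumed bucket count - 1 */
    static pthread_mutex_t hash_locks[HASH_LOCK_MASK + 1];

    static void
    hash_locks_init(void)
    {
        for (uint32_t i = 0; i <= HASH_LOCK_MASK; i++) {
            pthread_mutex_init(&hash_locks[i], NULL);
        }
    }

    static pthread_mutex_t *
    hash_lock_acquire(uint32_t hash)
    {
        pthread_mutex_t *lock = &hash_locks[hash & HASH_LOCK_MASK];
        pthread_mutex_lock(lock);
        return lock;
    }

    static void
    hash_lock_release(pthread_mutex_t *lock)
    {
        pthread_mutex_unlock(lock);
    }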

Thanks,
Han

> That might corrupt the map.
>
> I guess, if we don't want to cause more performance degradation we
> should maintain as many lflow_ref instances as we do hash_locks, i.e.,
> LFLOW_HASH_LOCK_MASK + 1.  Will that even be possible?
>
> Wdyt?
>
> Regards,
> Dumitru
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH OVN] DHCP Relay Agent support for overlay subnets

2024-01-23 Thread Numan Siddique
On Tue, Jan 23, 2024 at 8:02 PM Naveen Yerramneni
 wrote:
>
>
>
> > On 16-Jan-2024, at 2:30 AM, Numan Siddique  wrote:
> >
> > On Tue, Dec 12, 2023 at 1:05 PM Naveen Yerramneni
> >  wrote:
> >>
> >>This patch contains changes to enable DHCP Relay Agent support for 
> >> overlay subnets.
> >>
> >>USE CASE:
> >>--
> >>  - Enable IP address assignment for overlay subnets from the 
> >> centralized DHCP server present in the underlay network.
> >>
> >>PREREQUISITES
> >>--
> >>  - Logical Router Port IP should be assigned (statically) from the 
> >> same overlay subnet which is managed by DHCP server.
> >>  - LRP IP is used for GIADDR field when relaying the DHCP packets and
> >> also same IP needs to be configured as default gateway for the overlay 
> >> subnet.
> >>  - Overlay subnets managed by external DHCP server are expected to be 
> >> directly reachable from the underlay network.
> >>
> >>EXPECTED PACKET FLOW:
> >>--
> >>Following is the expected packet flow in order to support DHCP relay
> >> functionality in OVN.
> >>  1. DHCP client originates DHCP discovery (broadcast).
> >>  2. DHCP relay (running on the OVN) receives the broadcast and 
> >> forwards the packet to the DHCP server by converting it to unicast. While 
> >> forwarding the packet, it updates the GIADDR in DHCP header to its
> >> interface IP on which DHCP packet is received.
> >>  3. DHCP server uses GIADDR field to decide the IP address pool from 
> >> which IP has to be assigned and DHCP offer is sent to the same IP (GIADDR).
> >>  4. DHCP relay agent forwards the offer to the client, it resets the 
> >> GIADDR field when forwarding the offer to the client.
> >>  5. DHCP client sends DHCP request (broadcast) packet.
> >>  6. DHCP relay (running on the OVN) receives the broadcast and 
> >> forwards the packet to the DHCP server by converting it to unicast. While 
> >> forwarding the packet, it updates the GIADDR in DHCP header to its
> >> interface IP on which DHCP packet is received.
> >>  7. DHCP Server sends the ACK packet.
> >>  8. DHCP relay agent forwards the ACK packet to the client, it resets 
> >> the GIADDR field when forwarding the ACK to the client.
> >>  9. All the future renew/release packets are directly exchanged 
> >> between DHCP client and DHCP server.
> >>
> >>OVN DHCP RELAY PACKET FLOW:
> >>
> >>To add DHCP Relay support on OVN, we need to replicate all the behavior 
> >> described above using distributed logical switch and logical router.
> >>At a high level, packet flow is distributed among Logical Switch and
> >> Logical Router on source node (where VM is deployed) and redirect
> >> chassis (RC) node.
> >>  1. Request packet gets processed on the source node where VM is 
> >> deployed and relays the packet to DHCP server.
> >>  2. Response packet is first processed on RC node (which first 
> >> receives the packet from underlay network). RC node forwards the packet to
> >> the right node by filling in the dest MAC and IP.
> >>
> >>OVN Packet flow with DHCP relay is explained below.
> >>  1. DHCP client (VM) sends the DHCP discover packet (broadcast).
> >>  2. Logical switch converts the packet to L2 unicast by setting the 
> >> destination MAC to LRP's MAC
> >>  3. Logical Router receives the packet and redirects it to the OVN 
> >> controller.
> >>  4. OVN controller updates the required information (GIADDR) in the
> >> DHCP payload after doing the required checks. If any check fails, packet 
> >> is dropped.
> >>  5. Logical Router converts the packet to L3 unicast and forwards it 
> >> to the server. This packet gets routed like any other packet (via RC
> >> node).
> >>  6. Server replies with DHCP offer.
> >>  7. RC node processes the DHCP offer and forwards it to the OVN 
> >> controller.
> >>  8. OVN controller does sanity checks and  updates the destination MAC 
> >> (available in DHCP header), destination IP (available in DHCP header), 
> >> resets GIADDR  and reinjects the packet to datapath.
> >> If any check fails, packet is dropped.
> >>  9. Logical router updates the source IP and port and forwards the 
> >> packet to logical switch.
> >>  10. Logical switch delivers the packet to the DHCP client.
> >>  11. Similar steps are performed for Request and Ack packets.
> >>  12. All the future renew/release packets are directly exchanged 
> >> between DHCP client and DHCP server
> >>
> >>NEW OVN ACTIONS
> >>---
> >>
> >>  1. dhcp_relay_req(, )
> >>  - This action executes on the source node on which the DHCP 
> >> request originated.
> >>  - This action relays the DHCP request coming from client to the 
> >> server. Relay-ip is used to update GIADDR in the DHCP header.
> >>  2. dhcp_relay_resp_fwd(, )
> >>   

Re: [ovs-dev] [PATCH OVN] DHCP Relay Agent support for overlay subnets

2024-01-23 Thread Naveen Yerramneni


> On 16-Jan-2024, at 2:30 AM, Numan Siddique  wrote:
> 
> On Tue, Dec 12, 2023 at 1:05 PM Naveen Yerramneni
>  wrote:
>> 
>>This patch contains changes to enable DHCP Relay Agent support for 
>> overlay subnets.
>> 
>>USE CASE:
>>--
>>  - Enable IP address assignment for overlay subnets from the centralized 
>> DHCP server present in the underlay network.
>> 
>>PREREQUISITES
>>--
>>  - Logical Router Port IP should be assigned (statically) from the same 
>> overlay subnet which is managed by DHCP server.
>>  - LRP IP is used for GIADDR field when relaying the DHCP packets and
>> also same IP needs to be configured as default gateway for the overlay 
>> subnet.
>>  - Overlay subnets managed by external DHCP server are expected to be 
>> directly reachable from the underlay network.
>> 
>>EXPECTED PACKET FLOW:
>>--
>>Following is the expected packet flow in order to support DHCP relay
>> functionality in OVN.
>>  1. DHCP client originates DHCP discovery (broadcast).
>>  2. DHCP relay (running on the OVN) receives the broadcast and forwards 
>> the packet to the DHCP server by converting it to unicast. While forwarding 
>> the packet, it updates the GIADDR in DHCP header to its
>> interface IP on which DHCP packet is received.
>>  3. DHCP server uses GIADDR field to decide the IP address pool from 
>> which IP has to be assigned and DHCP offer is sent to the same IP (GIADDR).
>>  4. DHCP relay agent forwards the offer to the client, it resets the 
>> GIADDR field when forwarding the offer to the client.
>>  5. DHCP client sends DHCP request (broadcast) packet.
>>  6. DHCP relay (running on the OVN) receives the broadcast and forwards 
>> the packet to the DHCP server by converting it to unicast. While forwarding 
>> the packet, it updates the GIADDR in DHCP header to its
>> interface IP on which DHCP packet is received.
>>  7. DHCP Server sends the ACK packet.
>>  8. DHCP relay agent forwards the ACK packet to the client, it resets 
>> the GIADDR field when forwarding the ACK to the client.
>>  9. All the future renew/release packets are directly exchanged between 
>> DHCP client and DHCP server.
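
(For illustration, a minimal sketch of the GIADDR handling described in
steps 2-4 and 8 above; field names follow RFC 2131, and the helpers are
hypothetical, not part of this patch.)

    #include <stdint.h>

    /* Hypothetical, simplified view of the fixed DHCP/BOOTP header fields
     * (RFC 2131); trailing fields such as chaddr/sname/file are omitted. */
    struct dhcp_header {
        uint8_t  op;          /* 1 = BOOTREQUEST, 2 = BOOTREPLY */
        uint8_t  htype, hlen, hops;
        uint32_t xid;
        uint16_t secs, flags;
        uint32_t ciaddr;      /* client IP */
        uint32_t yiaddr;      /* "your" (offered) IP */
        uint32_t siaddr;      /* next server IP */
        uint32_t giaddr;      /* relay agent IP, network byte order */
    };

    /* Client -> server: stamp GIADDR with the IP of the interface (the LRP
     * IP here) the request arrived on, so the server picks the right pool
     * and knows where to send the reply. */
    static void
    relay_request_to_server(struct dhcp_header *dhcp, uint32_t lrp_ip)
    {
        if (dhcp->giaddr == 0) {
            dhcp->giaddr = lrp_ip;
        }
    }

    /* Server -> client: reset GIADDR before handing the reply back to the
     * client, as in steps 4 and 8. */
    static void
    forward_reply_to_client(struct dhcp_header *dhcp)
    {
        dhcp->giaddr = 0;
    }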
>> 
>>OVN DHCP RELAY PACKET FLOW:
>>
>>To add DHCP Relay support on OVN, we need to replicate all the behavior 
>> described above using distributed logical switch and logical router.
>>At a high level, packet flow is distributed among Logical Switch and Logical
>> Router on source node (where VM is deployed) and redirect chassis (RC) node.
>>  1. Request packet gets processed on the source node where VM is 
>> deployed and relays the packet to DHCP server.
>>  2. Response packet is first processed on RC node (which first receives
>> the packet from underlay network). RC node forwards the packet to the right 
>> node by filling in the dest MAC and IP.
>> 
>>OVN Packet flow with DHCP relay is explained below.
>>  1. DHCP client (VM) sends the DHCP discover packet (broadcast).
>>  2. Logical switch converts the packet to L2 unicast by setting the 
>> destination MAC to LRP's MAC
>>  3. Logical Router receives the packet and redirects it to the OVN 
>> controller.
>>  4. OVN controller updates the required information (GIADDR) in the DHCP
>> payload after doing the required checks. If any check fails, packet is 
>> dropped.
>>  5. Logical Router converts the packet to L3 unicast and forwards it to 
>> the server. This packet gets routed like any other packet (via RC node).
>>  6. Server replies with DHCP offer.
>>  7. RC node processes the DHCP offer and forwards it to the OVN 
>> controller.
>>  8. OVN controller does sanity checks and  updates the destination MAC 
>> (available in DHCP header), destination IP (available in DHCP header), 
>> resets GIADDR  and reinjects the packet to datapath.
>> If any check fails, packet is dropped.
>>  9. Logical router updates the source IP and port and forwards the 
>> packet to logical switch.
>>  10. Logical switch delivers the packet to the DHCP client.
>>  11. Similar steps are performed for Request and Ack packets.
>>  12. All the future renew/release packets are directly exchanged between 
>> DHCP client and DHCP server
>> 
>>NEW OVN ACTIONS
>>---
>> 
>>  1. dhcp_relay_req(, )
>>  - This action executes on the source node on which the DHCP request 
>> originated.
>>  - This action relays the DHCP request coming from client to the 
>> server. Relay-ip is used to update GIADDR in the DHCP header.
>>  2. dhcp_relay_resp_fwd(, )
>>  - This action executes on the first node (RC node) which processes 
>> the DHCP response from the server.
>>  - This action updates  the destination MAC and destination IP so 
>> that the response can be forwarded to the appropriate node 

Re: [ovs-dev] [PATCH ovn] pinctrl: dns: Ignore additional additional records.

2024-01-23 Thread Mark Michelson

Hi Dumitru.

Two questions:

1) Did you mean to title this as "additional additional records?" Or did 
you just mean "additional records?"


2) Should this include a test? I'm thinking you could construct a DNS 
query that includes an EDNS AR and ensure that OVN responds to it.
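
(For reference, a minimal sketch of the header such a query would carry;
field names follow RFC 1035, and this is only an illustration, not a
proposed test.)

    #include <stdint.h>

    /* DNS header (RFC 1035).  A query carrying an EDNS OPT pseudo-RR has
     * qdcount = 1 and arcount = 1; the OPT record sits after the question
     * section, so a responder that copies only the question section and
     * zeroes arcount in its reply effectively ignores it. */
    struct dns_header {
        uint16_t id;
        uint16_t flags;      /* QR, OPCODE, RD, ... */
        uint16_t qdcount;    /* 1: a single question */
        uint16_t ancount;    /* 0 in the query */
        uint16_t nscount;    /* 0 in the query */
        uint16_t arcount;    /* 1: the EDNS OPT RR */
    };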


Thanks,
Mark Michelson

On 1/23/24 09:15, Dumitru Ceara wrote:

EDNS is backwards compatible so it's safe to just ignore additional ARs.

Reported-at: https://github.com/ovn-org/ovn/issues/228
Reported-at: https://issues.redhat.com/browse/FDP-222
Signed-off-by: Dumitru Ceara 
---
  controller/pinctrl.c | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 4992eab089..0be77701ec 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -2885,6 +2885,7 @@ dns_build_ptr_answer(
  free(encoded);
  }
  
+#define DNS_QUERY_TYPE_CLASS_LEN (2 * sizeof(ovs_be16))

  #define DNS_RCODE_SERVER_REFUSE 0x5
  
  /* Called with in the pinctrl_handler thread context. */

@@ -2949,18 +2950,13 @@ pinctrl_handle_dns_lookup(
  goto exit;
  }
  
-/* Check if there is an additional record present, which is unsupported */

-if (in_dns_header->arcount) {
-VLOG_DBG_RL(&rl, "Received DNS query with additional records, which"
-" is unsupported");
-goto exit;
-}
-
  struct udp_header *in_udp = dp_packet_l4(pkt_in);
  size_t udp_len = ntohs(in_udp->udp_len);
  size_t l4_len = dp_packet_l4_size(pkt_in);
+uint8_t *l4_start = (uint8_t *) in_udp;
  uint8_t *end = (uint8_t *)in_udp + MIN(udp_len, l4_len);
  uint8_t *in_dns_data = (uint8_t *)(in_dns_header + 1);
+uint8_t *in_dns_data_start = in_dns_data;
  uint8_t *in_queryname = in_dns_data;
  uint16_t idx = 0;
  struct ds query_name;
@@ -2984,7 +2980,7 @@ pinctrl_handle_dns_lookup(
  in_dns_data += idx;
  
  /* Query should have TYPE and CLASS fields */

-if (in_dns_data + (2 * sizeof(ovs_be16)) > end) {
+if (in_dns_data + DNS_QUERY_TYPE_CLASS_LEN > end) {
  ds_destroy(&query_name);
  goto exit;
  }
@@ -2998,6 +2994,10 @@ pinctrl_handle_dns_lookup(
  goto exit;
  }
  
+uint8_t *rest = in_dns_data + DNS_QUERY_TYPE_CLASS_LEN;

+uint32_t query_size = rest - in_dns_data_start;
+uint32_t query_l4_size = rest - l4_start;
+
  uint64_t dp_key = ntohll(pin->flow_metadata.flow.metadata);
  const char *answer_data = NULL;
  bool ovn_owned = false;
@@ -3080,7 +3080,7 @@ pinctrl_handle_dns_lookup(
  goto exit;
  }
  
-uint16_t new_l4_size = ntohs(in_udp->udp_len) +  dns_answer.size;

+uint16_t new_l4_size = query_l4_size + dns_answer.size;
  size_t new_packet_size = pkt_in->l4_ofs + new_l4_size;
  struct dp_packet pkt_out;
  dp_packet_init(&pkt_out, new_packet_size);
@@ -3117,7 +3117,7 @@ pinctrl_handle_dns_lookup(
  out_dns_header->arcount = 0;
  
  /* Copy the Query section. */

-dp_packet_put(&pkt_out, dp_packet_data(pkt_in), dp_packet_size(pkt_in));
+dp_packet_put(&pkt_out, dp_packet_data(pkt_in), query_size);
  
  /* Copy the answer sections. */

  dp_packet_put(&pkt_out, dns_answer.data, dns_answer.size);


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] actions: Use random port selection for SNAT with external_port_range.

2024-01-23 Thread Mark Michelson

Hi Dumitru,

Acked-by: Mark Michelson 

I think this change is good. I looked through the documentation for the 
"external_port_range" column in the NAT table. This change appears to 
actually make the documentation *more* accurate rather than to introduce 
any potentially undesirable behavior changes.


What I find more telling is that the only tests you had to update were 
the actions tests. We apparently don't have any system tests that use 
NAT with an external port range. That's not great :(


On 1/22/24 09:54, Dumitru Ceara wrote:

This is to avoid unexpected behavior changes due to the underlying
datapath (e.g., kernel) changing defaults.  If we don't explicitly
request a port selection algorithm, OVS leaves it up to the
datapath to decide how to do the port selection.  Currently that means
that source port allocation is not random if the original source port
fits in the requested range.

Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-dev/2024-January/410847.html
Reported-at: https://issues.redhat.com/browse/FDP-301
Fixes: 60bdc8ea78d7 ("NAT: Provide port range in input")
Signed-off-by: Dumitru Ceara 
---
  lib/actions.c | 16 ++--
  tests/ovn.at  |  8 
  2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/lib/actions.c b/lib/actions.c
index 38cf4642d6..fdc0529de6 100644
--- a/lib/actions.c
+++ b/lib/actions.c
@@ -1131,8 +1131,20 @@ encode_ct_nat(const struct ovnact_ct_nat *cn,
  }
  
  if (cn->port_range.exists) {

-   nat->range.proto.min = cn->port_range.port_lo;
-   nat->range.proto.max = cn->port_range.port_hi;
+nat->range.proto.min = cn->port_range.port_lo;
+nat->range.proto.max = cn->port_range.port_hi;
+
+/* Explicitly set the port selection algorithm to "random".  Otherwise
+ * it's up to the datapath to choose how to select the port and that
+ * might create unexpected behavior changes when the datapath defaults
+ * change.
+ *
+ * NOTE: for the userspace datapath the "random" function doesn't
+ * really generate random ports, it uses "hash" under the hood:
+ * https://issues.redhat.com/browse/FDP-269. */
+if (nat->range.proto.min && nat->range.proto.max) {
+nat->flags |= NX_NAT_F_PROTO_RANDOM;
+}
  }
  
  ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);

diff --git a/tests/ovn.at b/tests/ovn.at
index 8cc4c311c1..e8cef83cb5 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -1368,7 +1368,7 @@ ct_dnat(fd11::2);
  has prereqs ip
  ct_dnat(192.168.1.2, 1-3000);
  formats as ct_dnat(192.168.1.2,1-3000);
-encodes as 
ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(dst=192.168.1.2:1-3000))
+encodes as 
ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(dst=192.168.1.2:1-3000,random))
  has prereqs ip
  
  ct_dnat(192.168.1.2, 192.168.1.3);

@@ -1402,7 +1402,7 @@ ct_dnat_in_czone(fd11::2);
  has prereqs ip
  ct_dnat_in_czone(192.168.1.2, 1-3000);
  formats as ct_dnat_in_czone(192.168.1.2,1-3000);
-encodes as 
ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(dst=192.168.1.2:1-3000))
+encodes as 
ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(dst=192.168.1.2:1-3000,random))
  has prereqs ip
  
  ct_dnat_in_czone(192.168.1.2, 192.168.1.3);

@@ -1436,7 +1436,7 @@ ct_snat(fd11::2);
  has prereqs ip
  ct_snat(192.168.1.2, 1-3000);
  formats as ct_snat(192.168.1.2,1-3000);
-encodes as 
ct(commit,table=19,zone=NXM_NX_REG12[0..15],nat(src=192.168.1.2:1-3000))
+encodes as 
ct(commit,table=19,zone=NXM_NX_REG12[0..15],nat(src=192.168.1.2:1-3000,random))
  has prereqs ip
  
  ct_snat(192.168.1.2, 192.168.1.3);

@@ -1470,7 +1470,7 @@ ct_snat_in_czone(fd11::2);
  has prereqs ip
  ct_snat_in_czone(192.168.1.2, 1-3000);
  formats as ct_snat_in_czone(192.168.1.2,1-3000);
-encodes as 
ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(src=192.168.1.2:1-3000))
+encodes as 
ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(src=192.168.1.2:1-3000,random))
  has prereqs ip
  
  ct_snat_in_czone(192.168.1.2, 192.168.1.3);


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 ovn] ovn: Add geneve PMTUD support.

2024-01-23 Thread Lorenzo Bianconi
Introduce specific flows for E/W ICMPv{4,6} packets if tunnelled packets
do not fit the path MTU. This patch enables PMTUD for East/West Geneve traffic.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2241711
Tested-at: 
https://github.com/LorenzoBianconi/ovn/actions/runs/7627345562/job/20777962659
Signed-off-by: Lorenzo Bianconi 
---
Changes since v3:
- fix flaky ovn multinode test
- rebase on top of ovn main branch
Changes since v2:
- fix openshift regression for regular icmp traffic received from a remote node
- add gw router selftest
Changes since v1:
- add fix for vxlan and stt tunnels
---
 .../workflows/ovn-fake-multinode-tests.yml|   6 +-
 NEWS  |   1 +
 controller/physical.c |  30 +-
 include/ovn/logical-fields.h  |   3 +
 lib/logical-fields.c  |   2 +
 northd/northd.c   |  83 ++
 northd/ovn-northd.8.xml   |  41 +-
 tests/multinode.at| 813 +-
 tests/ovn-northd.at   |  12 +
 9 files changed, 985 insertions(+), 6 deletions(-)

diff --git a/.github/workflows/ovn-fake-multinode-tests.yml 
b/.github/workflows/ovn-fake-multinode-tests.yml
index 25610df53..b3ba82a30 100644
--- a/.github/workflows/ovn-fake-multinode-tests.yml
+++ b/.github/workflows/ovn-fake-multinode-tests.yml
@@ -76,8 +76,8 @@ jobs:
   fail-fast: false
   matrix:
 cfg:
-- { branch: "main" }
-- { branch: "branch-22.03" }
+- { branch: "main", testsuiteflags: ""}
+- { branch: "branch-22.03", testsuiteflags: "-k 'basic test'" }
 name: multinode tests ${{ join(matrix.cfg.*, ' ') }}
 env:
   RUNC_CMD: podman
@@ -176,7 +176,7 @@ jobs:
 
 - name: Run fake-multinode system tests
   run: |
-if ! sudo make check-multinode; then
+if ! sudo make check-multinode TESTSUITEFLAGS="${{ 
matrix.cfg.testsuiteflags }}"; then
   sudo podman exec -it ovn-central ovn-nbctl show || :
   sudo podman exec -it ovn-central ovn-sbctl show || :
   sudo podman exec -it ovn-chassis-1 ovs-vsctl show || :
diff --git a/NEWS b/NEWS
index 5f267b4c6..72c40df82 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,7 @@ Post v23.09.0
   - ovn-northd-ddlog has been removed.
   - A new LSP option "enable_router_port_acl" has been added to enable
 conntrack for the router port whose peer is l3dgw_port if set it true.
+  - Enable PMTU discovery on geneve tunnels for E/W traffic.
 
 OVN v23.09.0 - 15 Sep 2023
 --
diff --git a/controller/physical.c b/controller/physical.c
index eda085441..c3f8769bb 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -2452,9 +2452,37 @@ physical_run(struct physical_ctx *p_ctx,
 }
 
 put_resubmit(OFTABLE_LOCAL_OUTPUT, &ofpacts);
-
 ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 100, 0, &match,
 &ofpacts, hc_uuid);
+
+/* Set allow rx from tunnel bit. */
+put_load(1, MFF_LOG_FLAGS, MLF_RX_FROM_TUNNEL_BIT, 1, &ofpacts);
+
+/* Add specific flows for E/W ICMPv{4,6} packets if tunnelled packets
+ * do not fit path MTU.
+ */
+put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, &ofpacts);
+
+/* IPv4 */
+match_init_catchall(&match);
+match_set_in_port(&match, tun->ofport);
+match_set_dl_type(&match, htons(ETH_TYPE_IP));
+match_set_nw_proto(&match, IPPROTO_ICMP);
+match_set_icmp_type(&match, 3);
+match_set_icmp_code(&match, 4);
+
+ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 120, 0, &match,
+&ofpacts, hc_uuid);
+/* IPv6 */
+match_init_catchall(&match);
+match_set_in_port(&match, tun->ofport);
+match_set_dl_type(&match, htons(ETH_TYPE_IPV6));
+match_set_nw_proto(&match, IPPROTO_ICMPV6);
+match_set_icmp_type(&match, 2);
+match_set_icmp_code(&match, 0);
+
+ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 120, 0, &match,
+&ofpacts, hc_uuid);
 }
 
 /* Add VXLAN specific rules to transform port keys
diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
index 272277ec4..e8e0e6d33 100644
--- a/include/ovn/logical-fields.h
+++ b/include/ovn/logical-fields.h
@@ -78,6 +78,7 @@ enum mff_log_flags_bits {
 MLF_LOOKUP_COMMIT_ECMP_NH_BIT = 13,
 MLF_USE_LB_AFF_SESSION_BIT = 14,
 MLF_LOCALNET_BIT = 15,
+MLF_RX_FROM_TUNNEL_BIT = 16,
 };
 
 /* MFF_LOG_FLAGS_REG flag assignments */
@@ -129,6 +130,8 @@ enum mff_log_flags {
 /* Indicate that the port is localnet. */
 MLF_LOCALNET = (1 << MLF_LOCALNET_BIT),
 
+/* Indicate the packet has been received from the tunnel. */
+MLF_RX_FROM_TUNNEL = (1 << MLF_RX_FROM_TUNNEL_BIT),
 };
 
 /* OVN logical fields
diff --git a/lib/logical-fields.c b/lib/logical-fields.c
index 7a1e66d0a..662c1ef0e 100644
--- a/lib/logical-fields.c
+++ b/lib/logical-fields.c
@@ -133,6 +133,8 @@ ovn_init_symtab(struct shash 

[ovs-dev] [PATCH ovn] northd: Make sure that affinity flows match on VIP.

2024-01-23 Thread Ales Musil
Consider the two following LBs:
1) 192.168.0.100:8080 -> 192.168.0.10:80
   affinity_timeout=120, skip_snat=true
2) 172.10.0.100:8080  -> 192.168.0.10:80
   affinity_timeout=120, skip_snat=false

Connected to the same LR with "lb_force_snat_ip" option set.
This combination would produce two flows with the same match
but different action:
1) priority=150, match=(reg9[6] == 1 && ct.new && ip4 && reg4 == 192.168.0.10 
&& reg8[0..15] == 80)
   action=(reg0 = 192.168.0.100; flags.skip_snat_for_lb = 1; 
ct_lb_mark(backends=192.168.0.10:80; skip_snat);)

2) priority=150, match=(reg9[6] == 1 && ct.new && ip4 && reg4 == 192.168.0.10 
&& reg8[0..15] == 80)
   action=(reg0 = 172.10.0.100; flags.force_snat_for_lb = 1; 
ct_lb_mark(backends=192.168.0.10:80; force_snat);)

This causes issues because it's not defined which flow will take
priority, since they have the same match. So it might happen that
the traffic undergoes SNAT when it shouldn't, or vice versa.

To make sure that the correct action is taken, differentiate the match by
adding a VIP match (ip.dst == VIP).
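
With the VIP in the match, the two flows above become distinguishable, e.g.
(derived from the flows quoted above, register names unchanged):

1) priority=150, match=(reg9[6] == 1 && ct.new && ip4.dst == 192.168.0.100
   && reg4 == 192.168.0.10 && reg8[0..15] == 80)
2) priority=150, match=(reg9[6] == 1 && ct.new && ip4.dst == 172.10.0.100
   && reg4 == 192.168.0.10 && reg8[0..15] == 80)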

Fixes: d3926b433e44 ("northd: rely on new actions for lb affinity")
Reported-at: https://issues.redhat.com/browse/FDP-290
Signed-off-by: Ales Musil 
---
 northd/northd.c  | 14 +++-
 tests/ofproto-macros.at  |  4 ++--
 tests/ovn-northd.at  | 48 +++-
 tests/system-ovn-kmod.at |  8 +++
 4 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index de15ca101..5763a1b8b 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -8404,12 +8404,12 @@ build_lb_rules_pre_stateful(struct hmap *lflows,
  *
  * - load balancing:
  *   table=lr_in_dnat, priority=150
- *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4
+ *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4.dst == V
  * && REG_LB_AFF_BACKEND_IP4 == B1 && REG_LB_AFF_MATCH_PORT == BP1)
  *  action=(REG_NEXT_HOP_IPV4 = V; lb_action;
  *  ct_lb_mark(backends=B1:BP1; ct_flag);)
  *   table=lr_in_dnat, priority=150
- *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4
+ *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4.dst == V
  * && REG_LB_AFF_BACKEND_IP4 == B2 && REG_LB_AFF_MATCH_PORT == BP2)
  *  action=(REG_NEXT_HOP_IPV4 = V; lb_action;
  *  ct_lb_mark(backends=B2:BP2; ct_flag);)
@@ -8508,7 +8508,8 @@ build_lb_affinity_lr_flows(struct hmap *lflows, const 
struct ovn_northd_lb *lb,
 
 /* Prepare common part of affinity match. */
 ds_put_format(_match, REGBIT_KNOWN_LB_SESSION" == 1 && "
-  "ct.new && %s && %s == ", ip_match, reg_backend);
+  "ct.new && %s.dst == %s && %s == ", ip_match,
+  lb_vip->vip_str, reg_backend);
 
 /* Store the common part length. */
 size_t aff_action_len = aff_action.length;
@@ -8587,13 +8588,13 @@ build_lb_affinity_lr_flows(struct hmap *lflows, const 
struct ovn_northd_lb *lb,
  *
  * - load balancing:
  *   table=ls_in_lb, priority=150
- *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4
+ *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4.dst == V
  * && REG_LB_AFF_BACKEND_IP4 == B1 && REG_LB_AFF_MATCH_PORT == BP1)
  *  action=(REGBIT_CONNTRACK_COMMIT = 0;
  *  REG_ORIG_DIP_IPV4 = V; REG_ORIG_TP_DPORT = VP;
  *  ct_lb_mark(backends=B1:BP1);)
  *   table=ls_in_lb, priority=150
- *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4
+ *  match=(REGBIT_KNOWN_LB_SESSION == 1 && ct.new && ip4.dst == V
  * && REG_LB_AFF_BACKEND_IP4 == B2 && REG_LB_AFF_MATCH_PORT == BP2)
  *  action=(REGBIT_CONNTRACK_COMMIT = 0;
  *  REG_ORIG_DIP_IPV4 = V;
@@ -8696,7 +8697,8 @@ build_lb_affinity_ls_flows(struct hmap *lflows,
 
 /* Prepare common part of affinity match. */
 ds_put_format(_match, REGBIT_KNOWN_LB_SESSION" == 1 && "
-  "ct.new && %s && %s == ", ip_match, reg_backend);
+  "ct.new && %s.dst == %s && %s == ", ip_match,
+  lb_vip->vip_str, reg_backend);
 
 /* Store the common part length. */
 size_t aff_action_len = aff_action.length;
diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
index 07ef1d092..bccedbaf7 100644
--- a/tests/ofproto-macros.at
+++ b/tests/ofproto-macros.at
@@ -200,14 +200,14 @@ m4_define([_OVS_VSWITCHD_START],
AT_CHECK([[sed < stderr '
 /vlog|INFO|opened log file/d
 /ovsdb_server|INFO|ovsdb-server (Open vSwitch)/d']])
-   AT_CAPTURE_FILE([ovsdb-server.log])
+  # AT_CAPTURE_FILE([ovsdb-server.log])
 
dnl Initialize database.
AT_CHECK([ovs-vsctl --no-wait init $2])
 
dnl Start ovs-vswitchd.
AT_CHECK([ovs-vswitchd $1 --detach --no-chdir --pidfile --log-file -vvconn 
-vofproto_dpif -vunixctl], [0], [], [stderr])
-   AT_CAPTURE_FILE([ovs-vswitchd.log])
+   #AT_CAPTURE_FILE([ovs-vswitchd.log])
on_exit "kill_ovs_vswitchd `cat 

Re: [ovs-dev] [PATCH ovn 2/2] OVN-SB: Exposes igmp group protocol version through IGMP table.

2024-01-23 Thread Mohammad Heib
On Tue, Jan 23, 2024 at 4:10 PM Dumitru Ceara  wrote:

> On 1/23/24 14:58, Mohammad Heib wrote:
> >>>  static struct pinctrl pinctrl;
> >>> @@ -3586,11 +3587,21 @@ pinctrl_update(const struct ovsdb_idl *idl,
> >> const char *br_int_name)
> >>>  if (dns_supports_ovn_owned != pinctrl.dns_supports_ovn_owned) {
> >>>  pinctrl.dns_supports_ovn_owned = dns_supports_ovn_owned;
> >>>
> >>> -/* Notify pinctrl_handler that fdb timestamp column
> >>> +/* Notify pinctrl_handler that dns ovn_owned column
> >>> * availability has changed. */
> >>>  notify_pinctrl_handler();
> >>>  }
> >>>
> >>> +bool igmp_support_proto =
> >>> +sbrec_server_has_igmp_group_table_col_protocol(idl);
> >>> +if (igmp_support_proto != pinctrl.igmp_support_protocol) {
> >>> +pinctrl.igmp_support_protocol = igmp_support_proto;
> >> We only use this in the main thread, when updating the SB, why can't we
> >> just directly check the column support there instead?
> >>
> > like you mean to call
> > sbrec_server_has_igmp_group_table_col_protocol(idl); inside the
> > ip_mcast_sync function?
> > something like this:
> >
> > /* Set Group protocol. */
> > if (sbrec_server_has_igmp_group_table_col_protocol(idl)) {
> >     igmp_group_set_protocol(sbrec_igmp, mc_group->protocol_version);
> > }
>
> Yes, I think that would be better.  Or even inside
> igmp_group_update_ports() but then we should also pass the IDL pointer;
> the latter seems like a better option to me.
>
> Please let me know what you think.
>
Sounds good to me :),
I will send v2 addressing your comments once I can bump the OVS submodule to
the latest stable.

Thank you so much :)

>
> Regards,
> Dumitru
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] pinctrl: dns: Ignore additional additional records.

2024-01-23 Thread Dumitru Ceara
EDNS is backwards compatible so it's safe to just ignore additional ARs.

Reported-at: https://github.com/ovn-org/ovn/issues/228
Reported-at: https://issues.redhat.com/browse/FDP-222
Signed-off-by: Dumitru Ceara 
---
 controller/pinctrl.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 4992eab089..0be77701ec 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -2885,6 +2885,7 @@ dns_build_ptr_answer(
 free(encoded);
 }
 
+#define DNS_QUERY_TYPE_CLASS_LEN (2 * sizeof(ovs_be16))
 #define DNS_RCODE_SERVER_REFUSE 0x5
 
 /* Called with in the pinctrl_handler thread context. */
@@ -2949,18 +2950,13 @@ pinctrl_handle_dns_lookup(
 goto exit;
 }
 
-/* Check if there is an additional record present, which is unsupported */
-if (in_dns_header->arcount) {
-VLOG_DBG_RL(&rl, "Received DNS query with additional records, which"
-" is unsupported");
-goto exit;
-}
-
 struct udp_header *in_udp = dp_packet_l4(pkt_in);
 size_t udp_len = ntohs(in_udp->udp_len);
 size_t l4_len = dp_packet_l4_size(pkt_in);
+uint8_t *l4_start = (uint8_t *) in_udp;
 uint8_t *end = (uint8_t *)in_udp + MIN(udp_len, l4_len);
 uint8_t *in_dns_data = (uint8_t *)(in_dns_header + 1);
+uint8_t *in_dns_data_start = in_dns_data;
 uint8_t *in_queryname = in_dns_data;
 uint16_t idx = 0;
 struct ds query_name;
@@ -2984,7 +2980,7 @@ pinctrl_handle_dns_lookup(
 in_dns_data += idx;
 
 /* Query should have TYPE and CLASS fields */
-if (in_dns_data + (2 * sizeof(ovs_be16)) > end) {
+if (in_dns_data + DNS_QUERY_TYPE_CLASS_LEN > end) {
 ds_destroy(&query_name);
 goto exit;
 }
@@ -2998,6 +2994,10 @@ pinctrl_handle_dns_lookup(
 goto exit;
 }
 
+uint8_t *rest = in_dns_data + DNS_QUERY_TYPE_CLASS_LEN;
+uint32_t query_size = rest - in_dns_data_start;
+uint32_t query_l4_size = rest - l4_start;
+
 uint64_t dp_key = ntohll(pin->flow_metadata.flow.metadata);
 const char *answer_data = NULL;
 bool ovn_owned = false;
@@ -3080,7 +3080,7 @@ pinctrl_handle_dns_lookup(
 goto exit;
 }
 
-uint16_t new_l4_size = ntohs(in_udp->udp_len) +  dns_answer.size;
+uint16_t new_l4_size = query_l4_size + dns_answer.size;
 size_t new_packet_size = pkt_in->l4_ofs + new_l4_size;
 struct dp_packet pkt_out;
 dp_packet_init(&pkt_out, new_packet_size);
@@ -3117,7 +3117,7 @@ pinctrl_handle_dns_lookup(
 out_dns_header->arcount = 0;
 
 /* Copy the Query section. */
-dp_packet_put(&pkt_out, dp_packet_data(pkt_in), dp_packet_size(pkt_in));
+dp_packet_put(&pkt_out, dp_packet_data(pkt_in), query_size);
 
 /* Copy the answer sections. */
 dp_packet_put(&pkt_out, dns_answer.data, dns_answer.size);
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/2] OVN-SB: Exposes igmp group protocol version through IGMP table.

2024-01-23 Thread Dumitru Ceara
On 1/23/24 14:58, Mohammad Heib wrote:
>>>  static struct pinctrl pinctrl;
>>> @@ -3586,11 +3587,21 @@ pinctrl_update(const struct ovsdb_idl *idl,
>> const char *br_int_name)
>>>  if (dns_supports_ovn_owned != pinctrl.dns_supports_ovn_owned) {
>>>  pinctrl.dns_supports_ovn_owned = dns_supports_ovn_owned;
>>>
>>> -/* Notify pinctrl_handler that fdb timestamp column
>>> +/* Notify pinctrl_handler that dns ovn_owned column
>>> * availability has changed. */
>>>  notify_pinctrl_handler();
>>>  }
>>>
>>> +bool igmp_support_proto =
>>> +sbrec_server_has_igmp_group_table_col_protocol(idl);
>>> +if (igmp_support_proto != pinctrl.igmp_support_protocol) {
>>> +pinctrl.igmp_support_protocol = igmp_support_proto;
>> We only use this in the main thread, when updating the SB, why can't we
>> just directly check the column support there instead?
>>
> like you mean to call
> sbrec_server_has_igmp_group_table_col_protocol(idl); inside the
> ip_mcast_sync function?
> something like this:
>
> /* Set Group protocol. */
> if (sbrec_server_has_igmp_group_table_col_protocol(idl)) {
>     igmp_group_set_protocol(sbrec_igmp, mc_group->protocol_version);
> }

Yes, I think that would be better.  Or even inside
igmp_group_update_ports() but then we should also pass the IDL pointer;
the latter seems like a better option to me.

Please let me know what you think.
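
(For illustration, a rough sketch of the shape being suggested; the exact
signature and the mc_group type are assumptions, not the final patch.)

    /* Sketch only, assuming the OVN SB IDL and OVS mcast-snooping headers
     * are included.  Check column support at the point of use instead of
     * caching it in struct pinctrl. */
    static void
    igmp_group_update(const struct sbrec_igmp_group *sbrec_igmp,
                      const struct ovsdb_idl *idl,
                      const struct mcast_snooping_group *mc_group)
    {
        if (sbrec_server_has_igmp_group_table_col_protocol(idl)) {
            sbrec_igmp_group_set_protocol(
                sbrec_igmp,
                mcast_snooping_group_protocol_str(mc_group->protocol_version));
        }
        /* ... then update the group's ports as igmp_group_update_ports()
         * does today ... */
    }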

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/2] OVN-SB: Exposes igmp group protocol version through IGMP table.

2024-01-23 Thread Mohammad Heib
On Mon, Jan 22, 2024 at 5:26 PM Dumitru Ceara  wrote:

> On 1/22/24 15:14, Mohammad Heib wrote:
> > Expose the igmp/mld group protocol version through the
> > IGMP_GROUP table in SBDB.
> >
> > This patch can be used by ovn consumer for debuggability purposes, user
> > now can  match between the protocol version used in the OVN logical
> > switches and the uplink ports.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2160476
> > Signed-off-by: Mohammad Heib 
> > ---
>
Hi Dumitru,
Thank you for your review :), I have a small question, please see below.


> Hi Mohammad,
>
> I have a few (minor) comments, please see below.
>
> >  NEWS  |  2 ++
> >  controller/ip-mcast.c |  8 
> >  controller/ip-mcast.h |  3 +++
> >  controller/pinctrl.c  | 19 ++-
> >  northd/ovn-northd.c   |  2 +-
> >  ovn-sb.ovsschema  |  5 +++--
> >  ovn-sb.xml|  4 
> >  tests/ovn.at  |  3 +++
> >  8 files changed, 42 insertions(+), 4 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 5f267b4c6..9075e7d80 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -14,6 +14,8 @@ Post v23.09.0
> >- ovn-northd-ddlog has been removed.
> >- A new LSP option "enable_router_port_acl" has been added to enable
> >  conntrack for the router port whose peer is l3dgw_port if set it
> true.
> > +  - IGMP_Group have a new "protocol" column that displays the the group
>
> Nit: s/have/has
>
> > +protocol version.
> >
> >  OVN v23.09.0 - 15 Sep 2023
> >  --
> > diff --git a/controller/ip-mcast.c b/controller/ip-mcast.c
> > index a870fb29e..83e41c81f 100644
> > --- a/controller/ip-mcast.c
> > +++ b/controller/ip-mcast.c
> > @@ -226,6 +226,14 @@ igmp_group_cleanup(struct ovsdb_idl_txn
> *ovnsb_idl_txn,
> >  return true;
> >  }
> >
> > +
> > +void igmp_group_set_protocol(const struct sbrec_igmp_group *group,
> > + mcast_group_proto protocol)
> > +{
> > +sbrec_igmp_group_set_protocol(group,
> > +  mcast_snooping_group_protocol_str(protocol));
> > +}
> > +
>
> Does it make more sense to add the protocol as argument to
> igmp_group_update_ports() and rename that function to igmp_group_update()?
>
Yes, that makes sense and will make the code cleaner.

>
> >  static const struct sbrec_igmp_group *
> >  igmp_group_lookup_(struct ovsdb_idl_index *igmp_groups,
> > const char *addr_str,
> > diff --git a/controller/ip-mcast.h b/controller/ip-mcast.h
> > index 326f39db1..f0c34343f 100644
> > --- a/controller/ip-mcast.h
> > +++ b/controller/ip-mcast.h
> > @@ -63,4 +63,7 @@ void igmp_group_delete(const struct sbrec_igmp_group
> *g);
> >  bool igmp_group_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >  struct ovsdb_idl_index *igmp_groups);
> >
> > +void igmp_group_set_protocol(const struct sbrec_igmp_group *group,
> > + mcast_group_proto protocol);
> > +
> >  #endif /* controller/ip-mcast.h */
> > diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> > index 77bf67e58..6379b7afb 100644
> > --- a/controller/pinctrl.c
> > +++ b/controller/pinctrl.c
> > @@ -180,6 +180,7 @@ struct pinctrl {
> >  bool mac_binding_can_timestamp;
> >  bool fdb_can_timestamp;
> >  bool dns_supports_ovn_owned;
> > +bool igmp_support_protocol;
> >  };
> >
> >  static struct pinctrl pinctrl;
> > @@ -3586,11 +3587,21 @@ pinctrl_update(const struct ovsdb_idl *idl,
> const char *br_int_name)
> >  if (dns_supports_ovn_owned != pinctrl.dns_supports_ovn_owned) {
> >  pinctrl.dns_supports_ovn_owned = dns_supports_ovn_owned;
> >
> > -/* Notify pinctrl_handler that fdb timestamp column
> > +/* Notify pinctrl_handler that dns ovn_owned column
> > * availability has changed. */
> >  notify_pinctrl_handler();
> >  }
> >
> > +bool igmp_support_proto =
> > +sbrec_server_has_igmp_group_table_col_protocol(idl);
> > +if (igmp_support_proto != pinctrl.igmp_support_protocol) {
> > +pinctrl.igmp_support_protocol = igmp_support_proto;
>
> We only use this in the main thread, when updating the SB, why can't we
> just directly check the column support there instead?
>
like you mean to call
sbrec_server_has_igmp_group_table_col_protocol(idl); inside the ip_mcast_sync
function?
something like this:

/* Set Group protocol. */
if (sbrec_server_has_igmp_group_table_col_protocol(idl)) {
    igmp_group_set_protocol(sbrec_igmp, mc_group->protocol_version);
}

>
> > +
> > +/* Notify pinctrl_handler that igmp protocol column
> > + * availability has changed. */
> > +notify_pinctrl_handler();
> > +}
> > +
> >  ovs_mutex_unlock(_mutex);
> >  }
> >
> > @@ -5400,6 +5411,12 @@ ip_mcast_sync(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > local_dp->datapath,
> chassis);
> >  }
> >
> > +/* 

Re: [ovs-dev] [PATCH ovn 1/2] ovs: Bump submodule to include igmp protocol version.

2024-01-23 Thread Mohammad Heib
Hi Dumitru,

Thank you for the review,
Yes, sure, I will bump the submodule to the latest stable once they fix the issue.
Thanks

On Mon, Jan 22, 2024 at 5:01 PM Dumitru Ceara  wrote:

> On 1/22/24 15:14, Mohammad Heib wrote:
> > Specifically the following commit:
> >   077d0bad0436 ("mcast-snooping: Store IGMP/MLD protocol version.")
> >
> > Also fix a small compilation error due to prototype change.
> >
> > Signed-off-by: Mohammad Heib 
> > ---
>
> Hi Mohammad,
>
> Thanks for the patch!
>
> >  controller/pinctrl.c | 6 +-
> >  ovs  | 2 +-
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> > index 4992eab08..77bf67e58 100644
> > --- a/controller/pinctrl.c
> > +++ b/controller/pinctrl.c
> > @@ -5474,9 +5474,13 @@ pinctrl_ip_mcast_handle_igmp(struct rconn *swconn,
> >  switch (ntohs(ip_flow->tp_src)) {
> >  case IGMP_HOST_MEMBERSHIP_REPORT:
> >  case IGMPV2_HOST_MEMBERSHIP_REPORT:
> > +mcast_group_proto grp_proto =
> > +(ntohs(ip_flow->tp_src) == IGMP_HOST_MEMBERSHIP_REPORT)
> > +? MCAST_GROUP_IGMPV1
> > +: MCAST_GROUP_IGMPV2;
> >  group_change =
> >  mcast_snooping_add_group4(ip_ms->ms, ip4, IP_MCAST_VLAN,
> > -  port_key_data);
> > +  port_key_data, grp_proto);
> >  break;
> >  case IGMP_HOST_LEAVE_MESSAGE:
> >  group_change =
> > diff --git a/ovs b/ovs
> > index 4102674b3..b222593bc 16
> > --- a/ovs
> > +++ b/ovs
> > @@ -1 +1 @@
> > -Subproject commit 4102674b3ecadb0e20e512cc661cddbbc4b3d1f6
> > +Subproject commit b222593bc69b5d82849d18eb435564f5f93449d3
>
> However, it's probably desirable to bump the submodule to the tip of the
> latest stable branch, i.e. branch-3.3:
>
>
> https://github.com/ovn-org/ovn/blob/main/Documentation/internals/ovs_submodule.rst#submodules-for-releases
>
> That would also fix our scheduled CI runs:
>
> https://github.com/ovn-org/ovn/actions/runs/7597582431/job/20692570222#step:10:3531
>
> However, there's a crash in OVS on branch-3.3 that needs to be fixed first:
> https://issues.redhat.com/browse/FDP-300
>
> I'd wait with bumping the submodule until then.
>
> Regards,
> Dumitru
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v2] ovn-ic: Handle NB:name updates properly.

2024-01-23 Thread Mohammad Heib
When the user updates the NB_GLOBAL.name after registering to the IC
databases, and the user has already defined a chassis as a gateway,
the ovn-ic instance runs in an infinite loop trying to update the
gateways and insert the current gateway into the SB chassis table as
a remote chassis (we match on the new AZ), which fails since we
already have this chassis with is-interconn in the local SB.

This patch aims to fix the above issues by updating the AZ.name only
when the user updates the NB.name locally.

Signed-off-by: Mohammad Heib 
Acked-by: Dumitru Ceara 
---
 ic/ovn-ic.c | 10 +++---
 tests/ovn-ic.at | 24 
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
index 8ceb34d7c..12e2729ce 100644
--- a/ic/ovn-ic.c
+++ b/ic/ovn-ic.c
@@ -132,14 +132,18 @@ az_run(struct ic_context *ctx)
 return NULL;
 }
 
-/* Delete old AZ if name changes.  Note: if name changed when ovn-ic
- * is not running, one has to manually delete the old AZ with:
+/* Update old AZ if name changes.  Note: if name changed when ovn-ic
+ * is not running, one has to manually delete/update the old AZ with:
  * "ovn-ic-sbctl destroy avail ". */
 static char *az_name;
 const struct icsbrec_availability_zone *az;
 if (az_name && strcmp(az_name, nb_global->name)) {
 ICSBREC_AVAILABILITY_ZONE_FOR_EACH (az, ctx->ovnisb_idl) {
-if (!strcmp(az->name, az_name)) {
+/* AZ name update locally need to update az in ISB. */
+if (nb_global->name[0] && !strcmp(az->name, az_name)) {
+icsbrec_availability_zone_set_name(az, nb_global->name);
+break;
+} else if (!nb_global->name[0] && !strcmp(az->name, az_name)) {
 icsbrec_availability_zone_delete(az);
 break;
 }
diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
index d4c436f84..6061d054c 100644
--- a/tests/ovn-ic.at
+++ b/tests/ovn-ic.at
@@ -28,7 +28,31 @@ availability-zone az3
 ])
 
 OVN_CLEANUP_IC([az1], [az2])
+AT_CLEANUP
+])
+
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn-ic -- AZ update in GW])
+ovn_init_ic_db
+net_add n1
 
+ovn_start az1
+sim_add gw-az1
+as gw-az1
+
+check ovs-vsctl add-br br-phys
+ovn_az_attach az1 n1 br-phys 192.168.1.1
+check ovs-vsctl set open . external-ids:ovn-is-interconn=true
+
+az_uuid=$(fetch_column ic-sb:availability-zone _uuid name="az1")
+ovn_as az1 ovn-nbctl set NB_Global . name="az2"
+wait_column "$az_uuid" ic-sb:availability-zone _uuid name="az2"
+
+# make sure that gateway still point to the same AZ with new name
+wait_column "$az_uuid" ic-sb:gateway availability_zone name="gw-az1"
+
+OVN_CLEANUP_IC([az1])
 AT_CLEANUP
 ])
 
-- 
2.34.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] userspace: Check for inner L3 while preparing encapsulated packets.

2024-01-23 Thread Mike Pattrick
On Tue, Jan 23, 2024 at 7:41 AM Ilya Maximets  wrote:
>
> On 1/22/24 23:24, Mike Pattrick wrote:
> > On Mon, Jan 22, 2024 at 4:10 PM Ilya Maximets  wrote:
> >>
> >> On 1/22/24 21:33, Mike Pattrick wrote:
> >>> On Mon, Jan 22, 2024 at 1:47 PM Ilya Maximets  wrote:
> 
>  On 1/22/24 18:51, Mike Pattrick wrote:
> > The OVN test suite identified a bug in dp_packet_ol_send_prepare() where
> > a double encapsulated BFD packet would trigger a seg fault. This
> > happened because we had assumed that if IP checksumming was marked as
> > offloaded, that checksum would occur in the innermost packet.
> >
> > This change will check that the inner packet has an L3 and L4 before
> > checksumming those parts. And if missing, will resume checksumming the
> > outer components.
> 
>  Hrm.  This looks like a workaround rather than a fix.  If the inner 
>  packet
>  doesn't have L3 header, dp-packet must not have a flag for L3 
>  checksumming
>  set in the first place.  And if it does have inner L3, the offset must be
>  initialized.  I guess, some of the offsets can be not initialized, 
>  because
>  the packet is never parsed by either miniflow_extract() or 
>  parse_tcp_flags().
>  And the bfd_put_packet() doesn't seem to set them.
> >>>
> >>> I think you're right, I stepped through the problem in GDB and it was
> >>> clear that the flags weren't getting reset properly between BFD
> >>> packets. I'll send an updated patch.
> >>
> >> Yeah, flags preservation is one thing, the other is that l3/l4_ofs are just
> >> not populated in these packets, which doesn't sound right.
> >>
> >> It's also not clear why the packet is marked for inner checksum offload if
> >> the inner checksum is actually fully calculated and correct.  I understand
> >> that the flags are carried over from a previous packet, but why did that
> >> previous packet have this flag set?
> >
> > Here's an example:
> >
> > monitor_run()
> > \_ dp_packet_use_stub(&packet, stub, sizeof stub); <--- initializes packet
> > once
> > \_ while (!heap_is_empty(&monitor_heap) <-- loop where the same packet
> > can be reused
> > \_ monitor_mport_run()
> > \_ dp_packet_clear() <- Data cleared, but flags are not cleared
> > \_ cfm_compose_ccm() / bfd_put_packet() / lldp_put_packet() <-
> > one or more can run
> > \_ Note that cfm_compose_ccm and lldp_put_packet call
> > eth_compose, but bfd_put_packet doesn't. bfd_put_packet doesn't reset
> > the offsets like eth_compose does.
>
> Sounds like a bug to me.  bfd_put_packet() should set correct offsets
> in the packet, otherwise we'll get random failures with different
> actions that may depend on these offsets to be populated.

Agreed. Do you want me to homologate this as part of this series?

>
> > \_ ofproto_dpif_send_packet()
> >    non-relevant stack trace ...
> >  \_ netdev_push_header()
> >   \_ First run, push geneve header and toggle geneve flag
> >   \_ Second run, detect geneve header flag, call 
> > dp_packet_ol_send_prepare()
> >
> > Given the above, I think it makes sense to clear the offload flags in
> > dp_packet_clear().
>
> I agree with that.  But it still doesn't explain why the 
> DP_PACKET_OL_TX_IP_CKSUM
> is set after the first run.  The inner checksum is fully calculated and 
> correct.
> There should be no Tx offloading for it set.  Only the 
> DP_PACKET_OL_TX_OUTER_IP_CKSUM
> should be set in this packet.  Or am I missing something?

bfd_put_packet() calculates the checksum, so it will always be
correct. miniflow_extract() sets DP_PACKET_OL_TX_IP_CKSUM if the
previous packet had DP_PACKET_OL_RX_IP_CKSUM_GOOD, which it would have
gotten from dp_packet_ol_send_prepare().
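
(For illustration, a minimal sketch of the idea of clearing offload state
when a packet buffer is reused; the struct and field names below are
hypothetical, simplified stand-ins for struct dp_packet, not a proposed
patch.)

    #include <stdint.h>

    /* Hypothetical, simplified packet metadata -- just enough to show the
     * point; the real struct dp_packet carries much more. */
    struct pkt_meta {
        uint64_t tx_offload_flags;  /* e.g. "IP/L4 checksum requested" bits */
        uint16_t l3_ofs;            /* UINT16_MAX means "not set" */
        uint16_t l4_ofs;
    };

    /* Reset offload state when a buffer is cleared for reuse, so flags and
     * offsets left over from a previously built packet (e.g. an earlier BFD
     * frame) cannot leak into the next one. */
    static void
    pkt_meta_clear_offloads(struct pkt_meta *m)
    {
        m->tx_offload_flags = 0;
        m->l3_ofs = UINT16_MAX;
        m->l4_ofs = UINT16_MAX;
    }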


>
> >
> > -M
> >
> >>
> >>>
>  BTW, is there actually a double encapsulation in the original OVN test?
>  Sounds strange.
> >>>
> >>> The problem was exposed in netdev_push_header() in
> >>> dp_packet_ol_send_prepare(packet, 0);
> >>
> >> That's true, but the packet dump in gdb showed a plain BFD packet.
> >> So, it was a first encapsulation, not double.
> >>
> >>>
> >>> -M
> >>>
> 
> >
> > Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
> > Reported-by: Dumitru Ceara 
> > Reported-at: https://issues.redhat.com/browse/FDP-300
> > Signed-off-by: Mike Pattrick 
> > ---
> >  lib/dp-packet.h | 10 --
> >  lib/packets.c   |  6 +++---
> >  2 files changed, 11 insertions(+), 5 deletions(-)
> 
>  Best regards, Ilya Maximets.
> 
> >>>
> >>
> >
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 3/3] ovn-controller: Support VIF-based local encap IPs selection.

2024-01-23 Thread Ales Musil
On Wed, Jan 17, 2024 at 6:48 AM Han Zhou  wrote:

> Commit dd527a283cd8 partially supported multiple encap IPs. It supported
> remote encap IP selection based on the destination VIF's encap_ip
> configuration. This patch adds the support for selecting local encap IP
> based on the source VIF's encap_ip configuration.
>
> Co-authored-by: Lei Huang 
> Signed-off-by: Lei Huang 
> Signed-off-by: Han Zhou 
> ---
>

Hi Han and Lei,

thank you for the patch, I have a couple of comments/questions down below.

>  NEWS|   3 +
>  controller/chassis.c|   2 +-
>  controller/local_data.c |   2 +-
>  controller/local_data.h |   2 +-
>  controller/ovn-controller.8.xml |  30 ++-
>  controller/ovn-controller.c |  49 
>  controller/physical.c   | 134 ++--
>  controller/physical.h   |   2 +
>  include/ovn/logical-fields.h|   4 +-
>  ovn-architecture.7.xml  |  18 -
>  tests/ovn.at|  51 +++-
>  11 files changed, 243 insertions(+), 54 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 5f267b4c64cc..5a3eed608617 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -14,6 +14,9 @@ Post v23.09.0
>- ovn-northd-ddlog has been removed.
>- A new LSP option "enable_router_port_acl" has been added to enable
>  conntrack for the router port whose peer is l3dgw_port if set it true.
> +  - Support selecting encapsulation IP based on the source/destination
> VIF's
> +setting. See ovn-controller(8) 'external_ids:ovn-encap-ip' for more
> +details.
>
>  OVN v23.09.0 - 15 Sep 2023
>  --
> diff --git a/controller/chassis.c b/controller/chassis.c
> index a6f13ccc42d5..55f2beb37674 100644
> --- a/controller/chassis.c
> +++ b/controller/chassis.c
> @@ -61,7 +61,7 @@ struct ovs_chassis_cfg {
>
>  /* Set of encap types parsed from the 'ovn-encap-type' external-id. */
>  struct sset encap_type_set;
> -/* Set of encap IPs parsed from the 'ovn-encap-type' external-id. */
> +/* Set of encap IPs parsed from the 'ovn-encap-ip' external-id. */
>  struct sset encap_ip_set;
>  /* Interface type list formatted in the OVN-SB Chassis required
> format. */
>  struct ds iface_types;
> diff --git a/controller/local_data.c b/controller/local_data.c
> index a9092783958f..8606414f8728 100644
> --- a/controller/local_data.c
> +++ b/controller/local_data.c
> @@ -514,7 +514,7 @@ chassis_tunnels_destroy(struct hmap *chassis_tunnels)
>   */
>  struct chassis_tunnel *
>  chassis_tunnel_find(const struct hmap *chassis_tunnels, const char
> *chassis_id,
> -char *remote_encap_ip, char *local_encap_ip)
> +char *remote_encap_ip, const char *local_encap_ip)
>

nit: Unrelated change.

 {
>  /*
>   * If the specific encap_ip is given, look for the chassisid_ip entry,
> diff --git a/controller/local_data.h b/controller/local_data.h
> index bab95bcc3824..ca3905bd20e6 100644
> --- a/controller/local_data.h
> +++ b/controller/local_data.h
> @@ -150,7 +150,7 @@ bool local_nonvif_data_handle_ovs_iface_changes(
>  struct chassis_tunnel *chassis_tunnel_find(const struct hmap
> *chassis_tunnels,
> const char *chassis_id,
> char *remote_encap_ip,
> -   char *local_encap_ip);
> +   const char *local_encap_ip);
>

Same as above.


>  bool get_chassis_tunnel_ofport(const struct hmap *chassis_tunnels,
> const char *chassis_name,
> diff --git a/controller/ovn-controller.8.xml
> b/controller/ovn-controller.8.xml
> index efa65e3fd927..5ebef048d721 100644
> --- a/controller/ovn-controller.8.xml
> +++ b/controller/ovn-controller.8.xml
> @@ -176,10 +176,32 @@
>
>external_ids:ovn-encap-ip
>
> -The IP address that a chassis should use to connect to this node
> -using encapsulation types specified by
> -external_ids:ovn-encap-type. Multiple encapsulation
> IPs
> -may be specified with a comma-separated list.
> +
> +  The IP address that a chassis should use to connect to this node
> +  using encapsulation types specified by
> +  external_ids:ovn-encap-type. Multiple
> encapsulation IPs
> +  may be specified with a comma-separated list.
> +
> +
> +  In scenarios where multiple encapsulation IPs are present,
> distinct
> +  tunnels are established for each remote chassis. These tunnels
> are
> +  differentiated by setting unique options:local_ip
> and
> +  options:remote_ip values in the tunnel interface.
> When
> +  transmitting a packet to a remote chassis, the selection of
> local_ip
> +  is guided by the Interface:external_ids:encap-ip
> from
> +  the local OVSDB, corresponding to the VIF 

Re: [ovs-dev] [PATCH ovn 2/3] encaps: Create separate tunnels for multiple local encap IPs.

2024-01-23 Thread Ales Musil
On Wed, Jan 17, 2024 at 6:48 AM Han Zhou  wrote:

> In commit dd527a283cd8, it created separate tunnels for different remote
> encap IPs of the same remote chassis, but when the current chassis is
> the one that has multiple encap IPs configured, it only uses the first
> encap IP. This patch creates separate tunnels taking into consideration
> the multiple encap IPs in current chassis and sets corresponding
> local_ip for each tunnel interface in such cases.
>
> Co-authored-by: Lei Huang 
> Signed-off-by: Lei Huang 
> Signed-off-by: Han Zhou 
> ---
>  controller/bfd.c|   4 +-
>  controller/encaps.c | 158 ++--
>  controller/encaps.h |   9 ++-
>  controller/lflow.c  |   2 +-
>  controller/local_data.c |  14 ++--
>  controller/local_data.h |   5 +-
>  controller/physical.c   |  28 +++
>  controller/pinctrl.c|   2 +-
>  tests/ovn-ipsec.at  |  49 -
>  tests/ovn.at|  88 +-
>  10 files changed, 192 insertions(+), 167 deletions(-)
>
> diff --git a/controller/bfd.c b/controller/bfd.c
> index cf011e382c6c..f24bfd063888 100644
> --- a/controller/bfd.c
> +++ b/controller/bfd.c
> @@ -75,7 +75,7 @@ bfd_calculate_active_tunnels(const struct ovsrec_bridge
> *br_int,
>  char *chassis_name = NULL;
>
>  if (encaps_tunnel_id_parse(id, &chassis_name,
> -   NULL)) {
> +   NULL, NULL)) {
>  if (!sset_contains(active_tunnels,
> chassis_name)) {
>  sset_add(active_tunnels,
> chassis_name);
> @@ -204,7 +204,7 @@ bfd_run(const struct ovsrec_interface_table
> *interface_table,
>
>  sset_add(&tunnels, port_name);
>
> -if (encaps_tunnel_id_parse(tunnel_id, &chassis_name, NULL)) {
> +if (encaps_tunnel_id_parse(tunnel_id, &chassis_name, NULL,
> NULL)) {
>  if (sset_contains(&bfd_chassis, chassis_name)) {
>  sset_add(&bfd_ifaces, port_name);
>  }
> diff --git a/controller/encaps.c b/controller/encaps.c
> index 1f6e667a606c..28237f6191c8 100644
> --- a/controller/encaps.c
> +++ b/controller/encaps.c
> @@ -32,10 +32,9 @@ VLOG_DEFINE_THIS_MODULE(encaps);
>  /*
>   * Given there could be multiple tunnels with different IPs to the same
>   * chassis we annotate the external_ids:ovn-chassis-id in tunnel port with
> - * OVN_MVTEP_CHASSISID_DELIM. The external_id key
> + * @%. The external_id key
>   * "ovn-chassis-id" is kept for backward compatibility.
>   */
> -#defineOVN_MVTEP_CHASSISID_DELIM '@'
>  #define OVN_TUNNEL_ID "ovn-chassis-id"
>
>  static char *current_br_int_name = NULL;
> @@ -95,72 +94,93 @@ tunnel_create_name(struct tunnel_ctx *tc, const char
> *chassis_id)
>  }
>
>  /*
> - * Returns a tunnel-id of the form 'chassis_id'-delimiter-'encap_ip'.
> + * Returns a tunnel-id of the form chassis_id@remote_encap_ip%local_encap_ip.
>   */
>  char *
> -encaps_tunnel_id_create(const char *chassis_id, const char *encap_ip)
> +encaps_tunnel_id_create(const char *chassis_id, const char
> *remote_encap_ip,
> +const char *local_encap_ip)
>  {
> -return xasprintf("%s%c%s", chassis_id, OVN_MVTEP_CHASSISID_DELIM,
> - encap_ip);
> +return xasprintf("%s%c%s%c%s", chassis_id, '@', remote_encap_ip,
> + '%', local_encap_ip);
>  }
>
>  /*
> - * Parses a 'tunnel_id' of the form .
> + * Parses a 'tunnel_id' of the form @%.
>   * If the 'chassis_id' argument is not NULL the function will allocate memory
>   * and store the chassis_name part of the tunnel-id at '*chassis_id'.
> - * If the 'encap_ip' argument is not NULL the function will allocate memory
> - * and store the encapsulation IP part of the tunnel-id at '*encap_ip'.
> + * Same for remote_encap_ip and local_encap_ip.
>   */
>  bool
>  encaps_tunnel_id_parse(const char *tunnel_id, char **chassis_id,
> -   char **encap_ip)
> +   char **remote_encap_ip, char **local_encap_ip)
>  {
> -/* Find the delimiter.  Fail if there is no delimiter or if
> 
> - * or  is the empty string.*/
> -const char *d = strchr(tunnel_id, OVN_MVTEP_CHASSISID_DELIM);
> +/* Find the @.  Fail if there is no @ or if any part is empty. */
> +const char *d = strchr(tunnel_id, '@');
>  if (d == tunnel_id || !d || !d[1]) {
>  return false;
>  }
>
> +/* Find the %.  Fail if there is no % or if any part is empty. */
> +const char *d2 = strchr(d + 1, '%');
> +if (d2 == d + 1 || !d2 || !d2[1]) {
> +return false;
> +}
> +
>  if (chassis_id) {
>  *chassis_id = xmemdup0(tunnel_id, d - tunnel_id);
>  }
> -if (encap_ip) {
> -*encap_ip = xstrdup(d + 1);
> +
> +if (remote_encap_ip) {
> +
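
For readers following along, the new tunnel-id format above is simply
"chassis_id@remote_encap_ip%local_encap_ip".  Below is a minimal,
self-contained sketch of composing and splitting such an id using only
the C standard library; split_tunnel_id() is a made-up helper for
illustration, not the OVS/OVN implementation quoted above.

    /* Illustrative only: toy version of the tunnel-id compose/parse
     * logic discussed in this thread. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static bool
    split_tunnel_id(const char *tunnel_id, char **chassis,
                    char **remote, char **local)
    {
        const char *at = strchr(tunnel_id, '@');
        if (!at || at == tunnel_id || !at[1]) {
            return false;
        }
        const char *pct = strchr(at + 1, '%');
        if (!pct || pct == at + 1 || !pct[1]) {
            return false;
        }
        *chassis = strndup(tunnel_id, at - tunnel_id);
        *remote = strndup(at + 1, pct - (at + 1));
        *local = strdup(pct + 1);
        return true;
    }

    int
    main(void)
    {
        /* Compose an id the same way encaps_tunnel_id_create() now does:
         * chassis '@' remote_encap_ip '%' local_encap_ip. */
        char tunnel_id[128];
        snprintf(tunnel_id, sizeof tunnel_id, "%s@%s%%%s",
                 "hv1", "172.16.0.2", "172.16.0.1");

        char *chassis, *remote, *local;
        if (split_tunnel_id(tunnel_id, &chassis, &remote, &local)) {
            printf("chassis=%s remote=%s local=%s\n",
                   chassis, remote, local);
            free(chassis);
            free(remote);
            free(local);
        }
        return 0;
    }

Note that an id without a '%' (the old single-IP format) fails to parse
in this sketch, mirroring the stricter check in the quoted
encaps_tunnel_id_parse().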

Re: [ovs-dev] [PATCH ovn 1/3] encaps: Refactor the naming related to tunnels.

2024-01-23 Thread Ales Musil
On Wed, Jan 17, 2024 at 6:48 AM Han Zhou  wrote:

> Rename vars and structs to reflect the fact that there can be multiple
> tunnels for each individual chassis. Also update the documentation of
> external_ids:ovn-encap-ip.
>
> Signed-off-by: Han Zhou 
> ---
>  controller/encaps.c | 77 -
>  controller/ovn-controller.8.xml |  3 +-
>  2 files changed, 40 insertions(+), 40 deletions(-)
>
> diff --git a/controller/encaps.c b/controller/encaps.c
> index b69d725843e9..1f6e667a606c 100644
> --- a/controller/encaps.c
> +++ b/controller/encaps.c
> @@ -31,10 +31,12 @@ VLOG_DEFINE_THIS_MODULE(encaps);
>
>  /*
>   * Given there could be multiple tunnels with different IPs to the same
> - * chassis we annotate the ovn-chassis-id with
> - * OVN_MVTEP_CHASSISID_DELIM.
> + * chassis we annotate the external_ids:ovn-chassis-id in tunnel port with
> + * OVN_MVTEP_CHASSISID_DELIM. The external_id key
> + * "ovn-chassis-id" is kept for backward compatibility.
>   */
>  #defineOVN_MVTEP_CHASSISID_DELIM '@'
> +#define OVN_TUNNEL_ID "ovn-chassis-id"
>
>  static char *current_br_int_name = NULL;
>
> @@ -55,8 +57,9 @@ encaps_register_ovs_idl(struct ovsdb_idl *ovs_idl)
>
>  /* Enough context to create a new tunnel, using tunnel_add(). */
>  struct tunnel_ctx {
> -/* Maps from a chassis name to "struct chassis_node *". */
> -struct shash chassis;
> +/* Maps from a tunnel-id (stored in external_ids:ovn-chassis-id) to
> + * "struct tunnel_node *". */
> +struct shash tunnel;
>
>  /* Names of all ports in the bridge, to allow checking uniqueness when
>   * adding a new tunnel. */
> @@ -68,7 +71,7 @@ struct tunnel_ctx {
>  const struct sbrec_chassis *this_chassis;
>  };
>
> -struct chassis_node {
> +struct tunnel_node {
>  const struct ovsrec_port *port;
>  const struct ovsrec_bridge *bridge;
>  };
> @@ -104,7 +107,7 @@ encaps_tunnel_id_create(const char *chassis_id, const
> char *encap_ip)
>  /*
>   * Parses a 'tunnel_id' of the form .
>   * If the 'chassis_id' argument is not NULL the function will allocate
> memory
> - * and store the chassis-id part of the tunnel-id at '*chassis_id'.
> + * and store the chassis_name part of the tunnel-id at '*chassis_id'.
>   * If the 'encap_ip' argument is not NULL the function will allocate
> memory
>   * and store the encapsulation IP part of the tunnel-id at '*encap_ip'.
>   */
> @@ -169,7 +172,7 @@ tunnel_add(struct tunnel_ctx *tc, const struct
> sbrec_sb_global *sbg,
>
>  /*
>   * Since a chassis may have multiple encap-ip, we can't just add the
> - * chassis name as as the "ovn-chassis-id" for the port; we use the
> + * chassis name as the OVN_TUNNEL_ID for the port; we use the
>   * combination of the chassis_name and the encap-ip to identify
>   * a specific tunnel to the chassis.
>   */
> @@ -260,25 +263,25 @@ tunnel_add(struct tunnel_ctx *tc, const struct
> sbrec_sb_global *sbg,
>  }
>  }
>
> -/* If there's an existing chassis record that does not need any
> change,
> +/* If there's an existing tunnel record that does not need any change,
>   * keep it.  Otherwise, create a new record (if there was an existing
>   * record, the new record will supplant it and encaps_run() will
> delete
>   * it). */
> -struct chassis_node *chassis = shash_find_data(&tc->chassis,
> -   tunnel_entry_id);
> -if (chassis
> -&& chassis->port->n_interfaces == 1
> -&& !strcmp(chassis->port->interfaces[0]->type, encap->type)
> -&& smap_equal(&chassis->port->interfaces[0]->options, &options)) {
> -shash_find_and_delete(&tc->chassis, tunnel_entry_id);
> -free(chassis);
> +struct tunnel_node *tunnel = shash_find_data(&tc->tunnel,
> + tunnel_entry_id);
> +if (tunnel
> +&& tunnel->port->n_interfaces == 1
> +&& !strcmp(tunnel->port->interfaces[0]->type, encap->type)
> +&& smap_equal(&tunnel->port->interfaces[0]->options, &options)) {
> +shash_find_and_delete(&tc->tunnel, tunnel_entry_id);
> +free(tunnel);
>  goto exit;
>  }
>
>  /* Choose a name for the new port.  If we're replacing an old port,
> reuse
>   * its name, otherwise generate a new, unique name. */
> -char *port_name = (chassis
> -   ? xstrdup(chassis->port->name)
> +char *port_name = (tunnel
> +   ? xstrdup(tunnel->port->name)
> : tunnel_create_name(tc, new_chassis_id));
>  if (!port_name) {
>  VLOG_WARN("Unable to allocate unique name for '%s' tunnel",
> @@ -294,7 +297,7 @@ tunnel_add(struct tunnel_ctx *tc, const struct
> sbrec_sb_global *sbg,
>  struct ovsrec_port *port = ovsrec_port_insert(tc->ovs_txn);
>  ovsrec_port_set_name(port, port_name);
>  ovsrec_port_set_interfaces(port, , 1);
> -const struct smap id = SMAP_CONST1(&id, "ovn-chassis-id",
> 

[ovs-dev] [PATCH ovn] Controller: Handle unconditional IDL messages while paused.

2024-01-23 Thread Mohammad Heib
If the user pauses ovn-controller, the current implementation only
services the unixctl server and ignores every other component.

This works fine as long as inactivity_probe is not set on the SB
database connection, but once the user sets it, the SB DB drops the
connection while the controller is paused. When the controller resumes,
it tries to commit changes to the SB DB, but the transaction fails
because the connection was lost and the controller has to reconnect
before the transaction can be committed again.

To avoid this scenario, keep handling unconditional IDL messages while
paused so that the controller does not need to reconnect to the SB DB.

Signed-off-by: Mohammad Heib 
---
 controller/ovn-controller.c | 16 +---
 ovn-sb.xml  |  2 +-
 tests/ovn-controller.at | 51 +
 3 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 856e5e270..d2c8f66d9 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -5534,12 +5534,22 @@ main(int argc, char *argv[])
 simap_destroy();
 }
 
-/* If we're paused just run the unixctl server and skip most of the
- * processing loop.
+/* If we're paused just run the unixctl-server/unconditional IDL and
+ *  skip most of the processing loop.
  */
 if (paused) {
 unixctl_server_run(unixctl);
+int ovnsb_seq = ovsdb_idl_get_seqno(ovnsb_idl_loop.idl);
+ovsdb_idl_run(ovnsb_idl_loop.idl);
+int new_ovnsb_seq = ovsdb_idl_get_seqno(ovnsb_idl_loop.idl);
+/* If the IDL content has changed while the controller is
+ * in pause state, trigger a recompute.
+ */
+if (new_ovnsb_seq != ovnsb_seq) {
+engine_set_force_recompute(true);
+}
 unixctl_server_wait(unixctl);
+ovsdb_idl_wait(ovnsb_idl_loop.idl);
 goto loop_done;
 }
 
@@ -6009,7 +6019,6 @@ main(int argc, char *argv[])
 OVS_NOT_REACHED();
 }
 
-ovsdb_idl_track_clear(ovnsb_idl_loop.idl);
 ovsdb_idl_track_clear(ovs_idl_loop.idl);
 
 lflow_cache_run(ctrl_engine_ctx.lflow_cache);
@@ -6017,6 +6026,7 @@ main(int argc, char *argv[])
 
 loop_done:
 memory_wait();
+ovsdb_idl_track_clear(ovnsb_idl_loop.idl);
 poll_block();
 if (should_service_stop()) {
 exit_args.exiting = true;
diff --git a/ovn-sb.xml b/ovn-sb.xml
index e393f92b3..43c13f23c 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -4308,7 +4308,7 @@ tcp.flags = RST;
   
 Maximum number of milliseconds of idle time on connection to the client
 before sending an inactivity probe message.  If Open vSwitch does not
-communicate with the client for the specified number of seconds, it
+communicate with the client for the specified number of milliseconds, it
 will send a probe.  If a response is not received for the same
 additional amount of time, Open vSwitch assumes the connection has been
 broken and attempts to reconnect.  Default is implementation-specific.
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 9d2a37c72..04e4c52e7 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -352,6 +352,57 @@ OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 AT_CLEANUP
 ])
 
+# Check that the connection to the Southbound database
+# is not dropped when probe-interval is set and the controller
+# is in pause state.
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn-controller - check sbdb connection while pause])
+AT_KEYWORDS([ovn])
+ovn_start
+
+net_add n1
+sim_add hv
+as hv
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+
+ovs-vsctl -- add-port br-int hv1-vif1 -- \
+set interface hv1-vif1 external-ids:iface-id=sw0-p1 \
+options:tx_pcap=hv1/vif1-tx.pcap \
+options:rxq_pcap=hv1/vif1-rx.pcap \
+ofport-request=1
+
+ovs-vsctl set open . external_ids:ovn-remote-probe-interval=10
+ovn-sbctl set connection . inactivity_probe=1
+
+ovn-nbctl ls-add sw0
+check ovn-nbctl lsp-add sw0 sw0-p1
+check ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3"
+check ovn-nbctl lsp-set-port-security sw0-p1 "50:54:00:00:00:03 10.0.0.3"
+ovn-nbctl --wait=hv sync
+
+sleep_controller hv
+# Trigger DB change to make SBDB connect to controller.
+check ovn-nbctl lsp-del sw0-p1
+
+# Wait 2 seconds to give the SB DB enough time to drop the connection
+# if there is no answer from the controller.  The connection should not
+# be dropped because we keep handling the IDL messages from the SB DB
+# even while in the pause state.
+sleep 2
+wake_up_controller hv
+
+check ovn-nbctl lsp-add sw0 sw0-p1
+check ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3"
+check ovn-nbctl lsp-set-port-security sw0-p1 
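
The design choice worth calling out in the patch above: while paused,
the controller does not track what changed in the SB DB, only whether
the IDL sequence number moved, and if it did, a full recompute is
forced on resume.  A tiny, self-contained illustration of that pattern
follows; the toy_db/force_recompute names are stand-ins, not the
ovsdb_idl or incremental-engine API.

    /* Toy version of the "compare seqnos, force a recompute if they
     * moved" pattern from the patch above. */
    #include <stdbool.h>
    #include <stdio.h>

    struct toy_db {
        unsigned int seqno;     /* Bumped whenever any content changes. */
    };

    static bool force_recompute;

    /* Run once per paused main-loop iteration: service the connection,
     * remember only *whether* something changed. */
    static void
    paused_iteration(const struct toy_db *db, unsigned int *last_seen)
    {
        if (db->seqno != *last_seen) {
            force_recompute = true;   /* Catch up fully after resume. */
            *last_seen = db->seqno;
        }
    }

    int
    main(void)
    {
        struct toy_db db = { .seqno = 7 };
        unsigned int last_seen = db.seqno;

        paused_iteration(&db, &last_seen);   /* Nothing changed yet. */
        db.seqno++;                          /* Server sent an update. */
        paused_iteration(&db, &last_seen);

        printf("force recompute on resume: %s\n",
               force_recompute ? "yes" : "no");
        return 0;
    }

This trades fine-grained incremental handling while paused for one full
recompute when the controller wakes up, while still answering the
server's inactivity probes in the meantime.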

Re: [ovs-dev] [PATCH ovn] actions: Use random port selection for SNAT with external_port_range.

2024-01-23 Thread Ales Musil
On Mon, Jan 22, 2024 at 3:55 PM Dumitru Ceara  wrote:

> This is to avoid unexpected behavior changes due to the underlying
> datapath (e.g., kernel) changing defaults.  If we don't explicitly
> request a port selection algorithm, OVS leaves it up to the
> datapath to decide how to do the port selection.  Currently that means
> that source port allocation is not random if the original source port
> fits in the requested range.
>
> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2024-January/410847.html
> Reported-at: https://issues.redhat.com/browse/FDP-301
> Fixes: 60bdc8ea78d7 ("NAT: Provide port range in input")
> Signed-off-by: Dumitru Ceara 
> ---
>  lib/actions.c | 16 ++--
>  tests/ovn.at  |  8 
>  2 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/lib/actions.c b/lib/actions.c
> index 38cf4642d6..fdc0529de6 100644
> --- a/lib/actions.c
> +++ b/lib/actions.c
> @@ -1131,8 +1131,20 @@ encode_ct_nat(const struct ovnact_ct_nat *cn,
>  }
>
>  if (cn->port_range.exists) {
> -   nat->range.proto.min = cn->port_range.port_lo;
> -   nat->range.proto.max = cn->port_range.port_hi;
> +nat->range.proto.min = cn->port_range.port_lo;
> +nat->range.proto.max = cn->port_range.port_hi;
> +
> +/* Explicitly set the port selection algorithm to "random".  Otherwise
> + * it's up to the datapath to choose how to select the port and that
> + * might create unexpected behavior changes when the datapath defaults
> + * change.
> + *
> + * NOTE: for the userspace datapath the "random" function doesn't
> + * really generate random ports, it uses "hash" under the hood:
> + * https://issues.redhat.com/browse/FDP-269. */
> +if (nat->range.proto.min && nat->range.proto.max) {
> +nat->flags |= NX_NAT_F_PROTO_RANDOM;
> +}
>  }
>
>  ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 8cc4c311c1..e8cef83cb5 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -1368,7 +1368,7 @@ ct_dnat(fd11::2);
>  has prereqs ip
>  ct_dnat(192.168.1.2, 1-3000);
>  formats as ct_dnat(192.168.1.2,1-3000);
> -encodes as ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(dst=192.168.1.2:1-3000))
> +encodes as ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(dst=192.168.1.2:1-3000,random))
>  has prereqs ip
>
>  ct_dnat(192.168.1.2, 192.168.1.3);
> @@ -1402,7 +1402,7 @@ ct_dnat_in_czone(fd11::2);
>  has prereqs ip
>  ct_dnat_in_czone(192.168.1.2, 1-3000);
>  formats as ct_dnat_in_czone(192.168.1.2,1-3000);
> -encodes as ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(dst=192.168.1.2:1-3000))
> +encodes as ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(dst=192.168.1.2:1-3000,random))
>  has prereqs ip
>
>  ct_dnat_in_czone(192.168.1.2, 192.168.1.3);
> @@ -1436,7 +1436,7 @@ ct_snat(fd11::2);
>  has prereqs ip
>  ct_snat(192.168.1.2, 1-3000);
>  formats as ct_snat(192.168.1.2,1-3000);
> -encodes as ct(commit,table=19,zone=NXM_NX_REG12[0..15],nat(src=192.168.1.2:1-3000))
> +encodes as ct(commit,table=19,zone=NXM_NX_REG12[0..15],nat(src=192.168.1.2:1-3000,random))
>  has prereqs ip
>
>  ct_snat(192.168.1.2, 192.168.1.3);
> @@ -1470,7 +1470,7 @@ ct_snat_in_czone(fd11::2);
>  has prereqs ip
>  ct_snat_in_czone(192.168.1.2, 1-3000);
>  formats as ct_snat_in_czone(192.168.1.2,1-3000);
> -encodes as ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(src=192.168.1.2:1-3000))
> +encodes as ct(commit,table=19,zone=NXM_NX_REG11[0..15],nat(src=192.168.1.2:1-3000,random))
>  has prereqs ip
>
>  ct_snat_in_czone(192.168.1.2, 192.168.1.3);
> --
> 2.39.3
>
>
>
Looks good to me, thanks.

Acked-by: Ales Musil 

-- 
Ales Musil
Senior Software Engineer - OVN Core
Red Hat EMEA
amu...@redhat.com



Re: [ovs-dev] [PATCH] userspace: Check for inner L3 while preparing encapsulated packets.

2024-01-23 Thread Ilya Maximets
On 1/22/24 23:24, Mike Pattrick wrote:
> On Mon, Jan 22, 2024 at 4:10 PM Ilya Maximets  wrote:
>>
>> On 1/22/24 21:33, Mike Pattrick wrote:
>>> On Mon, Jan 22, 2024 at 1:47 PM Ilya Maximets  wrote:

 On 1/22/24 18:51, Mike Pattrick wrote:
> The OVN test suite identified a bug in dp_packet_ol_send_prepare() where
> a double encapsulated BFD packet would trigger a seg fault. This
> happened because we had assumed that if IP checksumming was marked as
> offloaded, that checksum would occur in the innermost packet.
>
> This change will check that the inner packet has an L3 and L4 before
> checksumming those parts. And if missing, will resume checksumming the
> outer components.

 Hrm.  This looks like a workaround rather than a fix.  If the inner packet
 doesn't have L3 header, dp-packet must not have a flag for L3 checksumming
 set in the first place.  And if it does have inner L3, the offset must be
 initialized.  I guess, some of the offsets can be not initialized, because
 the packet is never parsed by either miniflow_extract() or 
 parse_tcp_flags().
 And the bfd_put_packet() doesn't seem to set them.
>>>
>>> I think you're right, I stepped through the problem in GDB and it was
>>> clear that the flags weren't getting reset properly between BFD
>>> packets. I'll send an updated patch.
>>
>> Yeah, flags preservation is one thing, the other is that l3/l4_ofs are just
>> not populated in these packets, which doesn't sound right.
>>
>> It's also not clear why the packet is marked for inner checksum offload if
>> the inner checksum is actually fully calculated and correct.  I understand
>> that the flags are carried over from a previous packet, but why did that
>> previous packet have this flag set?
> 
> Here's an example:
> 
> monitor_run()
> \_ dp_packet_use_stub(, stub, sizeof stub); <--- initializes packet 
> once
> \_ while(!heap_is_empty(_heap) <-- loop where the same packet
> can be reused
> \_ monitor_mport_run()
> \_ dp_packet_clear() <- Data cleared, but flags are not cleared
> \_ cfm_compose_ccm() / bfd_put_packet() / lldp_put_packet() <-
> one or more can run
> \_ Note that cfm_compose_ccm and lldp_put_packet call
> eth_compose, but bfd_put_packet doesn't. bfd_put_packet doesn't reset
> the offsets like eth_compose does.

Sounds like a bug to me.  bfd_put_packet() should set correct offsets
in the packet, otherwise we'll get random failures with different
actions that may depend on these offsets to be populated.

> \_ ofproto_dpif_send_packet()
>    non-relevant stack trace ...
>  \_ netdev_push_header()
>   \_ First run, push geneve header and toggle geneve flag
>   \_ Second run, detect geneve header flag, call 
> dp_packet_ol_send_prepare()
> 
> Given the above, I think it makes sense to clear the offload flags in
> dp_packet_clear().

I agree with that.  But it still doesn't explain why the 
DP_PACKET_OL_TX_IP_CKSUM
is set after the first run.  The inner checksum is fully calculated and correct.
There should be no Tx offloading for it set.  Only the 
DP_PACKET_OL_TX_OUTER_IP_CKSUM
should be set in this packet.  Or am I missing something?

> 
> -M
> 
>>
>>>
 BTW, is there actually a double encapsulation in the original OVN test?
 Sounds strange.
>>>
>>> The problem was exposed in netdev_push_header() in
>>> dp_packet_ol_send_prepare(packet, 0);
>>
>> That's true, but the packet dump in gdb showed a plain BFD packet.
>> So, it was a first encapsulation, not double.
>>
>>>
>>> -M
>>>

>
> Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
> Reported-by: Dumitru Ceara 
> Reported-at: https://issues.redhat.com/browse/FDP-300
> Signed-off-by: Mike Pattrick 
> ---
>  lib/dp-packet.h | 10 --
>  lib/packets.c   |  6 +++---
>  2 files changed, 11 insertions(+), 5 deletions(-)

 Best regards, Ilya Maximets.

>>>
>>
> 

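
The reuse hazard walked through above boils down to a struct whose
clear() resets the payload but not the per-packet metadata.  Here is a
distilled, self-contained illustration of the bug pattern; the toy
types are stand-ins, not the dp_packet code.

    /* Distilled illustration of the packet-reuse hazard: clearing the
     * data while letting flags from the previous use survive. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define FLAG_TUNNEL_HDR_PUSHED (1 << 0)

    struct toy_packet {
        char data[64];
        size_t size;
        uint32_t flags;      /* Offload/encap state from the *last* use. */
    };

    /* Buggy clear: payload goes away, flags silently survive. */
    static void
    toy_packet_clear_buggy(struct toy_packet *p)
    {
        p->size = 0;
    }

    /* Fixed clear: also drop per-packet state between uses. */
    static void
    toy_packet_clear_fixed(struct toy_packet *p)
    {
        p->size = 0;
        p->flags = 0;
    }

    int
    main(void)
    {
        struct toy_packet pkt;
        memset(&pkt, 0, sizeof pkt);

        /* First use: some path (e.g. a tunnel push) marks the packet. */
        pkt.size = 42;
        pkt.flags |= FLAG_TUNNEL_HDR_PUSHED;

        /* Reuse the same stack object for an unrelated packet. */
        toy_packet_clear_buggy(&pkt);
        printf("after buggy clear, tunnel flag still set: %s\n",
               pkt.flags & FLAG_TUNNEL_HDR_PUSHED ? "yes" : "no");

        toy_packet_clear_fixed(&pkt);
        printf("after fixed clear, tunnel flag still set: %s\n",
               pkt.flags & FLAG_TUNNEL_HDR_PUSHED ? "yes" : "no");
        return 0;
    }

The dp-packet patch later in this digest takes the second approach:
dp_packet_clear() also resets the offload flags, so a reused stack
packet cannot inherit state from its previous life.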


Re: [ovs-dev] [PATCH ovn v3] controller: fixed potential segfault when changing tunnel_key and deleting ls

2024-01-23 Thread Dumitru Ceara
On 1/4/24 11:43, Dumitru Ceara wrote:
> On 12/8/23 13:44, Xavier Simonart wrote:
>> When a tunnel_key for a datapath was changed, the local_datapaths hmap was 
>> not properly updated
>> as the old/initial entry was not removed.
>> - If the datapath was not deleted at the same time, a new entry (for the 
>> same dp) was created
>>   in the local_datapaths as the previous entry was not found (wrong hash).
>> - If the datapath was deleted at the same time, the former entry also 
>> remained (as, again, the hash
>>   was wrong). So, we kept a deleted (dp) entry in the hmap, and might crash 
>> when we used it later.
>>
>> When tunnel_key is updated for an existing datapath, we now clean the 
>> local_datapaths,
>> removing bad entries (i.e entries for which the hash is not the tunnel_key).
>>
>> This issue was identified by flaky failures of test "ovn-ic -- route 
>> deletion upon TS deletion".
>>
>> Backtrace:
>> 0  0x00504a9a in hmap_first_with_hash (hmap=0x20f4d90, 
>> hmap@entry=0x5d5634, hmap=0x20f4d90, hmap@entry=0x5d5634, hash=1956414673) 
>> at ./include/openvswitch/hmap.h:360
>> 1  smap_find__ (smap=smap@entry=0x20f4d90, key=key@entry=0x5d5634 
>> "snat-ct-zone", key_len=key_len@entry=12, hash=1956414673) at lib/smap.c:421
>> 2  0x00505073 in smap_get_node (key=0x5d5634 "snat-ct-zone", 
>> smap=0x20f4d90) at lib/smap.c:217
>> 3  smap_get_def (def=0x0, key=0x5d5634 "snat-ct-zone", smap=0x20f4d90) at 
>> lib/smap.c:208
>> 4  smap_get (smap=0x20f4d90, key=key@entry=0x5d5634 "snat-ct-zone") at 
>> lib/smap.c:200
>> 5  0x00419898 in get_common_nat_zone (ldp=0x20a8c40, ldp=0x20a8c40) 
>> at controller/lflow.c:831
>> 6  add_matches_to_flow_table (lflow=lflow@entry=0x21ddf90, 
>> ldp=ldp@entry=0x20a8c40, matches=matches@entry=0x211bb50, 
>> ptable=ptable@entry=10 '\n', output_ptable=output_ptable@entry=37 '%', 
>> ovnacts=ovnacts@entry=0x7ffd19b99de0,
>> ingress=true, l_ctx_in=0x7ffd19b9a300, l_ctx_out=0x7ffd19b9a2c0) at 
>> controller/lflow.c:892
>> 7  0x0041a29b in consider_logical_flow__ 
>> (lflow=lflow@entry=0x21ddf90, dp=0x20eb720, 
>> l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, 
>> l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:1207
>> 8  0x0041a587 in consider_logical_flow (lflow=lflow@entry=0x21ddf90, 
>> is_recompute=is_recompute@entry=false, 
>> l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, 
>> l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:1276
>> 9  0x0041e30f in lflow_handle_changed_flows 
>> (l_ctx_in=l_ctx_in@entry=0x7ffd19b9a300, 
>> l_ctx_out=l_ctx_out@entry=0x7ffd19b9a2c0) at controller/lflow.c:271
>> 10 0x00439fc7 in lflow_output_sb_logical_flow_handler 
>> (node=0x7ffd19ba5b70, data=) at 
>> controller/ovn-controller.c:4045
>> 11 0x004682fe in engine_compute (recompute_allowed=, 
>> node=) at lib/inc-proc-eng.c:441
>> 12 engine_run_node (recompute_allowed=true, node=0x7ffd19ba5b70) at 
>> lib/inc-proc-eng.c:503
>> 13 engine_run (recompute_allowed=recompute_allowed@entry=true) at 
>> lib/inc-proc-eng.c:528
>> 14 0x0040ade2 in main (argc=, argv=) 
>> at controller/ovn-controller.c:5709
>>
>> Signed-off-by: Xavier Simonart 
>>
>> ---
>> v2: Fixed potential memory leak.
>> v3: - Updated based on Dumitru's feedback: style & fixed another potential 
>> memory leak.
>> - Modify test case to detect potential memory leak.
>> - Rebased on latest main.
>> ---
>>  controller/local_data.c | 14 
>>  controller/local_data.h |  4 
>>  controller/ovn-controller.c | 23 
>>  tests/ovn.at| 43 +
>>  4 files changed, 84 insertions(+)
>>
>> diff --git a/controller/local_data.c b/controller/local_data.c
>> index 3a7d3afeb..8f0890a14 100644
>> --- a/controller/local_data.c
>> +++ b/controller/local_data.c
>> @@ -56,6 +56,20 @@ static bool datapath_is_transit_switch(const struct 
>> sbrec_datapath_binding *);
>>  
>>  static uint64_t local_datapath_usage;
>>  
>> +/* To be used when hmap_node.hash might be wrong e.g. tunnel_key got 
>> updated */
>> +struct local_datapath *
>> +get_local_datapath_no_hash(const struct hmap *local_datapaths,
>> +uint32_t tunnel_key)
> 
> Nit: indentation.
> 
>> +{
>> +struct local_datapath *ld;
>> +HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>> +if (ld->datapath->tunnel_key == tunnel_key) {
>> +return ld;
>> +}
>> +}
>> +return NULL;
>> +}
>> +
>>  struct local_datapath *
>>  get_local_datapath(const struct hmap *local_datapaths, uint32_t tunnel_key)
>>  {
>> diff --git a/controller/local_data.h b/controller/local_data.h
>> index f6d8f725f..1586a76da 100644
>> --- a/controller/local_data.h
>> +++ b/controller/local_data.h
>> @@ -69,6 +69,10 @@ struct local_datapath *local_datapath_alloc(
>>  const struct sbrec_datapath_binding *);
>>  struct local_datapath *get_local_datapath(const struct hmap *,
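
The underlying failure mode here is generic to any hash-indexed
container: if the field that fed the hash is changed in place, lookups
by the new value go to the wrong bucket and the old entry lingers.  A
self-contained toy table (not the OVS hmap) showing both the miss and
the full-scan fallback that get_local_datapath_no_hash() relies on:

    /* Toy bucketed table illustrating the stale-hash problem from this
     * thread.  Illustration only, not the OVS hmap. */
    #include <stdint.h>
    #include <stdio.h>

    #define N_BUCKETS 8

    struct entry {
        uint32_t tunnel_key;
        struct entry *next;
    };

    static struct entry *buckets[N_BUCKETS];

    static void
    table_insert(struct entry *e)
    {
        uint32_t b = e->tunnel_key % N_BUCKETS;
        e->next = buckets[b];
        buckets[b] = e;
    }

    static struct entry *
    table_find(uint32_t tunnel_key)
    {
        for (struct entry *e = buckets[tunnel_key % N_BUCKETS]; e;
             e = e->next) {
            if (e->tunnel_key == tunnel_key) {
                return e;
            }
        }
        return NULL;
    }

    /* Fallback in the spirit of get_local_datapath_no_hash(): scan every
     * bucket, ignoring where the entry is filed. */
    static struct entry *
    table_find_no_hash(uint32_t tunnel_key)
    {
        for (int b = 0; b < N_BUCKETS; b++) {
            for (struct entry *e = buckets[b]; e; e = e->next) {
                if (e->tunnel_key == tunnel_key) {
                    return e;
                }
            }
        }
        return NULL;
    }

    int
    main(void)
    {
        struct entry dp = { .tunnel_key = 3 };
        table_insert(&dp);

        dp.tunnel_key = 12;   /* Key changed in place; bucket is stale. */

        printf("hashed lookup of 12:    %s\n",
               table_find(12) ? "found" : "miss");
        printf("full-scan lookup of 12: %s\n",
               table_find_no_hash(12) ? "found" : "miss");
        return 0;
    }

Hence the cleanup pass in the patch: entries whose stored hash no longer
matches their tunnel_key have to be dropped (or re-inserted), otherwise
stale pointers survive in the map.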

Re: [ovs-dev] [PATCH ovn] controller: Don't release LRP until its claimed

2024-01-23 Thread Dumitru Ceara
On 1/23/24 11:55, Dumitru Ceara wrote:

[...]

>> +static void *
>> +en_lrp_claims_init(struct engine_node *node OVS_UNUSED, struct engine_arg 
>> *arg OVS_UNUSED)
> 
> Checkpatch reports "WARNING: Line is 90 characters long (recommended
> limit is 79)".
> 
>> +{
>> +struct ed_type_lrp_claims *data = xzalloc(sizeof *data);
>> +data->lrp_claims = xzalloc(sizeof *(data->lrp_claims));
>> +
>> +data->lrp_claims->claim = NULL;
>> +data->lrp_claims->size = 0;
>> +
>> +return data;
>> +}

Running this through CI shows a memleak:

==18945==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x558463f97e88 in __interceptor_calloc 
(/workspace/ovn-tmp/controller/ovn-controller+0x304e88) (BuildId: 
b928f7b59bd18b4fec080b5a3a9f0098e471ec2a)
#1 0x55846437e94e in xcalloc__ /workspace/ovn-tmp/ovs/lib/util.c:124:31
#2 0x55846437e94e in xzalloc__ /workspace/ovn-tmp/ovs/lib/util.c:134:12
#3 0x55846437e94e in xzalloc /workspace/ovn-tmp/ovs/lib/util.c:168:12
#4 0x5584640d1ec7 in en_lrp_claims_init 
/workspace/ovn-tmp/controller/ovn-controller.c:1470:24
#5 0x5584641dd636 in engine_init 
/workspace/ovn-tmp/lib/inc-proc-eng.c:206:17
#6 0x5584640c73fe in main 
/workspace/ovn-tmp/controller/ovn-controller.c:5447:5
#7 0x7f84fdd8fd8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 
c289da5071a3399de893d2af81d6a30c62646e1e)

One way to trigger CI is to push the patch to your GitHub fork.

For example, my run:
https://github.com/dceara/ovn/actions/runs/7624514129

Hope this helps,

Regards,
Dumitru



Re: [ovs-dev] [PATCH ovn] controller: Don't release LRP until its claimed

2024-01-23 Thread Dumitru Ceara
Hi Ihtisham ul Haq,

Thanks for the patch and sorry for the delay in reviewing it!

On 12/19/23 17:46, Ihtisham ul Haq via dev wrote:
> Previously, the LRP would be released as soon as the BFD session to
> another chassis was established, even if that chassis was unable to
> claim the LRP for whatever reason.

But if the BFD session went down and the new ha_chassis leader can't
claim the port, networking will not work anyway.  So what's the point of
keeping the current "claim"?  Or am I missing something?

> This patch introduces a cache to keep track of LRPs claimed by the
> current chassis and releases an LRP only when an update event showing
> a claim from another chassis is received.

IIUC there's a risk we never get that claim and then we just don't
release the LRP for ever.  That sounds wrong.

> 
> Signed-off-by: Ihtisham ul Haq 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-September/052688.html

From your report there:

"We observed this issue when a GTW Chassis gets rebooted without a
proper termination i.e `ovn-appctl exit()`, this results in higher
priority LRPs persisting for that GTW Chassis in the Southbound(not
rescheduled by CMS). And when the node comes back up again, the
lower priority GTW Chassis releases the LRP, but the restarted node
doesn't reclaim it, causing router outages. Below is a case we came
across in our environment, and at the bottom is a patch to remedy
this issue."

I don't really understand why not releasing the lrp fixes your issue.
Shouldn't we actually fix the fact that "the restarted node
doesn't reclaim it"?

> ---
> 
> diff --git a/controller/automake.mk b/controller/automake.mk
> --- a/controller/automake.mk
> +++ b/controller/automake.mk
> @@ -22,6 +22,7 @@ controller_ovn_controller_SOURCES = \
> controller/lflow-conj-ids.h \
> controller/lport.c \
> controller/lport.h \
> +   controller/lrp-claim.h \

There's no lrp-claim.h file added by this patch.  Is this an accidental
leftover?

> controller/ofctrl.c \
> controller/ofctrl.h \
> controller/ofctrl-seqno.c \
> diff --git a/controller/binding.c b/controller/binding.c
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -63,6 +63,58 @@ static struct shash _claimed_ports = 
> SHASH_INITIALIZER(&_claimed_ports);
>  static struct sset _postponed_ports = SSET_INITIALIZER(&_postponed_ports);
>  static struct shash _qos_ports = SHASH_INITIALIZER(&_qos_ports);
> 
> +
> +void add_lrp_claim(struct binding_ctx_out *ctx, const char *lrp_id) {
> +if (find_lrp_claim(ctx, lrp_id)) {
> +return;
> +}
> +
> +struct lrp_claim *new_claim = xmalloc(sizeof *new_claim);
> +new_claim->lrp_id = xstrdup(lrp_id);
> +new_claim->next = ctx->lrp_claims->claim;
> +ctx->lrp_claims->claim = new_claim;
> +ctx->lrp_claims->size++;
> +}
> +
> +void remove_lrp_claim(struct binding_ctx_out *ctx, char *lrp_id) {
> +struct lrp_claim *prev = NULL;
> +struct lrp_claim *current = ctx->lrp_claims->claim;
> +
> +while (current) {
> +if (current->lrp_id && !strcmp(current->lrp_id, lrp_id)) {
> +if (prev) {
> +prev->next = current->next;
> +} else {
> +ctx->lrp_claims->claim = current->next;
> +}
> +
> +free(current->lrp_id);
> +free(current);
> +
> +ctx->lrp_claims->size--;
> +
> +return;
> +}
> +
> +prev = current;
> +current = current->next;
> +}
> +}

On a closer look, what you should probably use here is a sset (set of
strings, i.e., set of lrp ids).  Please check 'postponed_ports' for a
similar example.

> +
> +bool find_lrp_claim(struct binding_ctx_out *ctx, const char *lrp_id) {
> +if (!ctx->lrp_claims || !lrp_id) {
> +return false;
> +}
> +struct lrp_claim *current = ctx->lrp_claims->claim;
> +while (current) {
> +if (current->lrp_id && strcmp(current->lrp_id, lrp_id) == 0) {
> +return true;
> +}
> +current = current->next;
> +}
> +return false;
> +}
> +
>  static void
>  remove_additional_chassis(const struct sbrec_port_binding *pb,
>const struct sbrec_chassis *chassis_rec);
> @@ -1821,7 +1873,8 @@ static bool
>  consider_nonvif_lport_(const struct sbrec_port_binding *pb,
> bool our_chassis,
> struct binding_ctx_in *b_ctx_in,
> -   struct binding_ctx_out *b_ctx_out)
> +   struct binding_ctx_out *b_ctx_out,
> +   bool should_release)
>  {
>  if (our_chassis) {
>  update_local_lports(pb->logical_port, b_ctx_out);
> @@ -1831,8 +1884,8 @@ consider_nonvif_lport_(const struct sbrec_port_binding 
> *pb,
> pb->datapath, b_ctx_in->chassis_rec,
> b_ctx_out->local_datapaths,
> 
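
To make the sset suggestion earlier in this review concrete, here is a
rough sketch of what the claim-tracking helpers could collapse to,
modeled on the _postponed_ports set at the top of binding.c.  It assumes
the OVS sset API already pulled in there (sset_add(), sset_contains(),
sset_find_and_delete()) and is untested illustration, not the submitted
patch.

    /* Sketch only: LRP claim tracking as a set of lrp_id strings,
     * assuming the sset API used elsewhere in binding.c. */
    static struct sset _claimed_lrps = SSET_INITIALIZER(&_claimed_lrps);

    static void
    add_lrp_claim(const char *lrp_id)
    {
        sset_add(&_claimed_lrps, lrp_id);    /* No-op if already present. */
    }

    static bool
    find_lrp_claim(const char *lrp_id)
    {
        return sset_contains(&_claimed_lrps, lrp_id);
    }

    static void
    remove_lrp_claim(const char *lrp_id)
    {
        sset_find_and_delete(&_claimed_lrps, lrp_id);
    }

Compared with the hand-rolled linked list in the patch, this drops the
manual size accounting and per-node allocations, and lookups stop being
linear scans.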

Re: [ovs-dev] [PATCH ovn] ovn-ic: handle NB:name updates properly

2024-01-23 Thread Dumitru Ceara
On 12/18/23 15:25, Mohammad Heib wrote:
> If the user updates NB_Global.name after registering with the IC
> databases, and the user has already defined a chassis as a gateway,
> the ovn-ic instance runs in an infinite loop: it tries to update the
> gateways and to insert the current gateway into the SB Chassis table
> as a remote chassis (we match on the new AZ), which fails because we
> already have this chassis with is-interconn set in the local SB.
> 
> This patch fixes the above issue by updating the AZ name in place when
> the user updates the NB name locally.
> 
> Signed-off-by: Mohammad Heib 
> ---

Thanks for the patch, Mohammad, and sorry for the delay in reviewing it.

>  ic/ovn-ic.c | 10 +++---
>  tests/ovn-ic.at | 23 +++
>  2 files changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> index 8ceb34d7c..12e2729ce 100644
> --- a/ic/ovn-ic.c
> +++ b/ic/ovn-ic.c
> @@ -132,14 +132,18 @@ az_run(struct ic_context *ctx)
>  return NULL;
>  }
>  
> -/* Delete old AZ if name changes.  Note: if name changed when ovn-ic
> - * is not running, one has to manually delete the old AZ with:
> +/* Update old AZ if name changes.  Note: if name changed when ovn-ic
> + * is not running, one has to manually delete/update the old AZ with:
>   * "ovn-ic-sbctl destroy avail ". */
>  static char *az_name;
>  const struct icsbrec_availability_zone *az;
>  if (az_name && strcmp(az_name, nb_global->name)) {
>  ICSBREC_AVAILABILITY_ZONE_FOR_EACH (az, ctx->ovnisb_idl) {
> -if (!strcmp(az->name, az_name)) {
> +/* AZ name update locally need to update az in ISB. */
> +if (nb_global->name[0] && !strcmp(az->name, az_name)) {
> +icsbrec_availability_zone_set_name(az, nb_global->name);
> +break;
> +} else if (!nb_global->name[0] && !strcmp(az->name, az_name)) {
>  icsbrec_availability_zone_delete(az);
>  break;
>  }
> diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
> index d4c436f84..5cc504e17 100644
> --- a/tests/ovn-ic.at
> +++ b/tests/ovn-ic.at
> @@ -28,7 +28,30 @@ availability-zone az3
>  ])
>  
>  OVN_CLEANUP_IC([az1], [az2])
> +AT_CLEANUP
> +])
> +
> +

We're missing a "OVN_FOR_EACH_NORTHD([" here.

> +AT_SETUP([ovn-ic -- AZ update in GW])
> +ovn_init_ic_db
> +net_add n1
>  
> +ovn_start az1
> +sim_add gw-az1
> +as gw-az1
> +
> +check ovs-vsctl add-br br-phys
> +ovn_az_attach az1 n1 br-phys 192.168.1.1
> +check ovs-vsctl set open . external-ids:ovn-is-interconn=true
> +
> +az_uuid=$(fetch_column ic-sb:availability-zone _uuid name="az1")
> +ovn_as az1 ovn-nbctl set NB_Global . name="az2"

Nit: missing "check".

> +wait_column "$az_uuid" ic-sb:availability-zone _uuid name="az2"
> +
> +# make sure that gateway still point to the same AZ with new name
> +wait_column "$az_uuid" ic-sb:gateway availability_zone name="gw-az1"
> +
> +OVN_CLEANUP_IC([az1])
>  AT_CLEANUP
>  ])
>  

The rest looks correct to me, feel free to add my ack to v2 if you
address the two comments above.

Acked-by: Dumitru Ceara 

Thanks,
Dumitru



Re: [ovs-dev] [PATCH] dp-packet: Reset offload flags when clearing a packet.

2024-01-23 Thread Dumitru Ceara
On 1/23/24 00:11, Mike Pattrick wrote:
> The OVN test suite identified a bug in dp_packet_ol_send_prepare() where
> a BFD packet flagged as double encapsulated would trigger a seg fault.
> The problem surfaced because bfd_put_packet was reusing a packet
> allocated on the stack that wasn't having its flags reset between calls.
> 

Thanks for tracking this one down, Mike!

> This change will reset the OL flags in dp_packet_clear(), which should
> fix this type of packet reuse issue in general as long as
> dp_packet_clear() is called in between uses. This change also includes
> a tangentially related check in dp_packet_inner_l4_size(), where the
> correct offset was not being checked.

Up to maintainers but should this tangential fix be a separate patch?

> 
> Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
> Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
> Reported-by: Dumitru Ceara 
> Reported-at: https://issues.redhat.com/browse/FDP-300
> Signed-off-by: Mike Pattrick 
> ---

I'm no expert in this code but the change itself looks correct to me and
OVN tests pass with this applied:

Reviewed-by: Dumitru Ceara 

>  lib/dp-packet.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 939bec5c8..f328a6637 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -207,6 +207,7 @@ void *dp_packet_resize_l2(struct dp_packet *, int 
> increment);
>  void *dp_packet_resize_l2_5(struct dp_packet *, int increment);
>  static inline void *dp_packet_eth(const struct dp_packet *);
>  static inline void dp_packet_reset_offsets(struct dp_packet *);
> +static inline void dp_packet_reset_offload(struct dp_packet *);
>  static inline uint16_t dp_packet_l2_pad_size(const struct dp_packet *);
>  static inline void dp_packet_set_l2_pad_size(struct dp_packet *, uint16_t);
>  static inline void *dp_packet_l2_5(const struct dp_packet *);
> @@ -380,6 +381,7 @@ dp_packet_clear(struct dp_packet *b)
>  {
>  dp_packet_set_data(b, dp_packet_base(b));
>  dp_packet_set_size(b, 0);
> +dp_packet_reset_offload(b);
>  }
>  
>  /* Removes 'size' bytes from the head end of 'b', which must contain at least
> @@ -537,7 +539,7 @@ dp_packet_inner_l4(const struct dp_packet *b)
>  static inline size_t
>  dp_packet_inner_l4_size(const struct dp_packet *b)
>  {
> -return OVS_LIKELY(b->l4_ofs != UINT16_MAX)
> +return OVS_LIKELY(b->inner_l4_ofs != UINT16_MAX)
> ? (const char *) dp_packet_tail(b)
> - (const char *) dp_packet_inner_l4(b)
> - dp_packet_l2_pad_size(b)

Regards,
Dumitru
