On Fri, Nov 20, 2020 at 9:08 PM Dumitru Ceara <[email protected]> wrote:
>
> When a port binding of type "l3gateway" is claimed its remote peer
> port_binding is also stored in local_datapath.peer_ports[].remote.
>
> If the remote peer port_binding is deleted first (i.e., before the local
> "l3gateway" one) then we need to remove the complete
> local_datapath.peer_ports[] entry in order to avoid ending up using
> dangling pointers to already freed port bindings.
>
> Also, properly reset local_datapath->has_local_l3gateway in
> remove_pb_from_local_datapath().
>
> Ilya reported this issue found by AddressSanitizer during his testing:
>
> ==1816017==ERROR: AddressSanitizer: heap-use-after-free on address
> 0x6140000cb170 at pc 0x0000005ab574 bp 0x7fff68925a30 sp 0x7fff68925a28
> READ of size 8 at 0x6140000cb170 thread T0
> #0 0x5ab573 in put_replace_chassis_mac_flows
> git/ovn/controller/physical.c:550:9
> #1 0x5a65eb in consider_port_binding git/ovn/controller/physical.c:1168:13
> #2 0x5a8764 in physical_run git/ovn/controller/physical.c:1607:9
> #3 0x5a0064 in flow_output_physical_flow_changes_handler
> git/ovn/controller/ovn-controller.c:2127:9
> #4 0x5db423 in engine_compute git/ovn/lib/inc-proc-eng.c:306:18
> #5 0x5dae1f in engine_run_node git/ovn/lib/inc-proc-eng.c:352:14
> #6 0x5dac74 in engine_run git/ovn/lib/inc-proc-eng.c:377:9
> #7 0x59ad64 in main git/ovn/controller/ovn-controller.c
> #8 0x7f39fa6491a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
> #9 0x480b2d in _start (git/ovn/controller/ovn-controller+0x480b2d)
>
> 0x6140000cb170 is located 304 bytes inside of 408-byte region
> [0x6140000cb040,0x6140000cb1d8)
> freed by thread T0 here:
> #0 0x520d07 in free (git/ovn/controller/ovn-controller+0x520d07)
> #1 0x712de7 in ovsdb_idl_db_track_clear git/ovs/lib/ovsdb-idl.c:1984:21
> #2 0x59b5cd in main git/ovn/controller/ovn-controller.c:2762:9
> #3 0x7f39fa6491a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
>
> Reported-by: Ilya Maximets <[email protected]>
> Fixes: 354bdba51abf ("ovn-controller: I-P for SB port binding and OVS
> interface in runtime_data.")
> Signed-off-by: Dumitru Ceara <[email protected]>
Hi Dumitru,
Please see below for few comments.
> ---
> controller/binding.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 42 insertions(+), 5 deletions(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index eb92679..cb60c5d 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -1523,21 +1523,41 @@ binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> }
>
> static const struct sbrec_port_binding *
> -get_peer_lport(const struct sbrec_port_binding *pb,
> - struct binding_ctx_in *b_ctx_in)
> +get_peer_lport__(const struct sbrec_port_binding *pb,
> + struct binding_ctx_in *b_ctx_in)
> {
> const char *peer_name = smap_get(&pb->options, "peer");
> - if (strcmp(pb->type, "patch") || !peer_name) {
> +
> + if (!peer_name) {
> return NULL;
> }
>
> const struct sbrec_port_binding *peer;
> peer = lport_lookup_by_name(b_ctx_in->sbrec_port_binding_by_name,
> peer_name);
> -
> return (peer && peer->datapath) ? peer : NULL;
> }
>
> +static const struct sbrec_port_binding *
> +get_l3gw_peer_lport(const struct sbrec_port_binding *pb,
> + struct binding_ctx_in *b_ctx_in)
> +{
> + if (strcmp(pb->type, "l3gateway")) {
> + return NULL;
> + }
> + return get_peer_lport__(pb, b_ctx_in);
> +}
> +
> +static const struct sbrec_port_binding *
> +get_peer_lport(const struct sbrec_port_binding *pb,
> + struct binding_ctx_in *b_ctx_in)
> +{
> + if (strcmp(pb->type, "patch")) {
> + return NULL;
> + }
> + return get_peer_lport__(pb, b_ctx_in);
> +}
> +
> /* This function adds the local datapath of the 'peer' of
> * lport 'pb' to the local datapaths if it is not yet added.
> */
> @@ -1654,7 +1674,9 @@ remove_pb_from_local_datapath(const struct
> sbrec_port_binding *pb,
> pb->logical_port)) {
> ld->localnet_port = NULL;
> }
> - } else if (!strcmp(pb->type, "l3gateway")) {
> + }
This makes sense. Thanks for spotting this.
> +
> + if (!strcmp(pb->type, "l3gateway")) {
> const char *chassis_id = smap_get(&pb->options,
> "l3gateway-chassis");
> if (chassis_id && !strcmp(chassis_id, chassis_rec->name)) {
> @@ -1956,12 +1978,27 @@ handle_deleted_lport(const struct sbrec_port_binding
> *pb,
> struct binding_ctx_in *b_ctx_in,
> struct binding_ctx_out *b_ctx_out)
> {
> + /* If the binding is local, remove it. */
> struct local_datapath *ld =
> get_local_datapath(b_ctx_out->local_datapaths,
> pb->datapath->tunnel_key);
> if (ld) {
> remove_pb_from_local_datapath(pb, b_ctx_in->chassis_rec,
> b_ctx_out, ld);
> + return;
> + }
> +
> + /* If the binding is not local, if 'pb' is a L3 gateway port, we should
> + * remove its peer, if that one is local.
> + */
Hi Dumitru,
Is this issue seen wiith this Ilya's patch series applied ?
https://patchwork.ozlabs.org/project/ovn/list/?series=214426
I have not looked into that series yet. But I fail to understand how
can we encounter this issue with the present master ?
Because
- If a logical router is a gateway router, then all the logical
router ports of that router and the peer logical switch ports
are claimed by the same chassis where the options:chassis=<ch> is set.
- So when a logical router port of l3gateway is deleted or its peer
(of type l3gatewayport) is deleted, then
if that port_binding is not in the local datapath, then its peer
cannot be in the local datapath too.
Do you see a scenarion where this could happen with the present master ?
Thanks
Numan
> + pb = get_l3gw_peer_lport(pb, b_ctx_in);
> + if (pb) {
> + ld = get_local_datapath(b_ctx_out->local_datapaths,
> + pb->datapath->tunnel_key);
> + if (ld) {
> + remove_pb_from_local_datapath(pb, b_ctx_in->chassis_rec,
> b_ctx_out,
> + ld);
> + }
> }
> }
>
> --
> 1.8.3.1
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev