On Sun, Feb 19, 2017 at 2:41 AM,  <[email protected]> wrote:
> From: Huanle Han <[email protected]>
>
> The recirc rule_op of tcp-balance bond is still referenced
> though its memory is freed. And the freed object will be
> written invalidly on following bond_update_post_recirc_rules.
> The commit releases the reference to the rule_ops when
> tcp-balance bond destroy or balance type change.
>
Thanks for the bug report and for working on it!

> Thread 23 handler69:
> Invalid write of size 8
>    update_recirc_rules (bond.c:385)
>    bond_update_post_recirc_rules__ (bond.c:952)
>    bond_update_post_recirc_rules (bond.c:960)
>    output_normal (ofproto-dpif-xlate.c:2102)
>    xlate_normal (ofproto-dpif-xlate.c:2858)
>    xlate_output_action (ofproto-dpif-xlate.c:4407)
>    do_xlate_actions (ofproto-dpif-xlate.c:5335)
>    xlate_actions (ofproto-dpif-xlate.c:6198)
>    upcall_xlate (ofproto-dpif-upcall.c:1129)
>    process_upcall (ofproto-dpif-upcall.c:1271)
>    recv_upcalls (ofproto-dpif-upcall.c:822)
>    udpif_upcall_handler (ofproto-dpif-upcall.c:740)
>  Address 0x18630490 is 1,904 bytes inside a block of size 12,288 free'd
>     free (vg_replace_malloc.c:529)
>    bond_entry_reset (bond.c:1635)
>    bond_reconfigure (bond.c:457)
>    bundle_set (ofproto-dpif.c:2896)
>    ofproto_bundle_register (ofproto.c:1343)
>    port_configure (bridge.c:1159)
>    bridge_reconfigure (bridge.c:785)
>    bridge_run (bridge.c:3099)
>    main (ovs-vswitchd.c:111)
>  Block was alloc'd at
>     malloc (vg_replace_malloc.c:298)
>    xmalloc (util.c:110)
>    bond_entry_reset (bond.c:1629)
>    bond_reconfigure (bond.c:457)
>    bond_create (bond.c:245)
>    bundle_set (ofproto-dpif.c:2900)
>    ofproto_bundle_register (ofproto.c:1343)
>    port_configure (bridge.c:1159)
>    bridge_reconfigure (bridge.c:785)
>    bridge_run (bridge.c:3099)
>    main (ovs-vswitchd.c:111)

It is not clear to me how the fix proposed by the patch explains the
cause of the backtrace.

To me, the backtrace suggests a race condition between revalidate
thread and the main thread.
I think I found the race condition, and proposed a fix at:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/329073.html

>
> Signed-off-by: Huanle Han <[email protected]>
> ---
>  ofproto/bond.c | 38 +++++++++++++++++++++++++++++++++-----
>  1 file changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/ofproto/bond.c b/ofproto/bond.c
> index 260023e..1c5ae43 100644
> --- a/ofproto/bond.c
> +++ b/ofproto/bond.c
> @@ -275,6 +275,22 @@ bond_unref(struct bond *bond)
>      hmap_remove(all_bonds, &bond->hmap_node);
>      ovs_rwlock_unlock(&rwlock);
>
> +    HMAP_FOR_EACH_POP (pr_op, hmap_node, &bond->pr_rule_ops) {
> +        int error = ofproto_dpif_delete_internal_flow(bond->ofproto,
> +                &pr_op->match, RECIRC_RULE_PRIORITY);
> +        if (error) {
> +            char *err_s = match_to_string(&pr_op->match,
> +                    RECIRC_RULE_PRIORITY);
> +
> +            VLOG_ERR("failed to remove post recirculation flow %s when 
> bond_unref", err_s);
> +            free(err_s);
> +        }
> +
> +        *pr_op->pr_rule = NULL;
> +        free(pr_op);
> +    }
> +    hmap_destroy(&bond->pr_rule_ops);
> +
>      HMAP_FOR_EACH_POP (slave, hmap_node, &bond->slaves) {
>          /* Client owns 'slave->netdev'. */
>          free(slave->name);
> @@ -284,13 +300,9 @@ bond_unref(struct bond *bond)
>
>      ovs_mutex_destroy(&bond->mutex);
>      free(bond->hash);
> +    bond->hash = NULL;
>      free(bond->name);
>
> -    HMAP_FOR_EACH_POP (pr_op, hmap_node, &bond->pr_rule_ops) {
> -        free(pr_op);
> -    }
> -    hmap_destroy(&bond->pr_rule_ops);
> -
>      if (bond->recirc_id) {
>          recirc_free_id(bond->recirc_id);
>      }
> @@ -1635,6 +1647,22 @@ bond_entry_reset(struct bond *bond)
>
>          bond->next_rebalance = time_msec() + bond->rebalance_interval;
>      } else {
> +        struct bond_pr_rule_op *pr_op;
> +        HMAP_FOR_EACH_POP (pr_op, hmap_node, &bond->pr_rule_ops) {
> +            int error = ofproto_dpif_delete_internal_flow(bond->ofproto,
> +                    &pr_op->match, RECIRC_RULE_PRIORITY);
> +            if (error) {
> +                char *err_s = match_to_string(&pr_op->match,
> +                        RECIRC_RULE_PRIORITY);
> +
> +                VLOG_ERR("failed to remove post recirculation flow %s when 
> bond reset", err_s);
> +                free(err_s);
> +            }
> +
> +            hmap_remove(&bond->pr_rule_ops, &pr_op->hmap_node);
> +            *pr_op->pr_rule = NULL;
> +            free(pr_op);
> +        }
>          free(bond->hash);
>          bond->hash = NULL;
>      }
> --

The patch seems mostly about fixing post recirculation rule leak.
However, there are some
logic duplications, and it seems we can reuse the existing logics
update_recirc_rules().
How about the following patch instead:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/329074.html
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to