Thanks, Andy. Your fix looks better than mine.
And, don't forget to apply the fix in ofproto_dpif_delete_internal_flow to
fix the post recirc rule leak. :)

2017年2月24日星期五,Andy Zhou <[email protected]> 写道:
> On Sun, Feb 19, 2017 at 2:41 AM,  <[email protected]> wrote:
>> From: Huanle Han <[email protected]>
>>
>> The recirc rule_op of tcp-balance bond is still referenced
>> though its memory is freed. And the freed object will be
>> written invalidly on following bond_update_post_recirc_rules.
>> The commit releases the reference to the rule_ops when
>> tcp-balance bond destroy or balance type change.
>>
> Thanks for the bug report and for working on it!
>
>> Thread 23 handler69:
>> Invalid write of size 8
>>    update_recirc_rules (bond.c:385)
>>    bond_update_post_recirc_rules__ (bond.c:952)
>>    bond_update_post_recirc_rules (bond.c:960)
>>    output_normal (ofproto-dpif-xlate.c:2102)
>>    xlate_normal (ofproto-dpif-xlate.c:2858)
>>    xlate_output_action (ofproto-dpif-xlate.c:4407)
>>    do_xlate_actions (ofproto-dpif-xlate.c:5335)
>>    xlate_actions (ofproto-dpif-xlate.c:6198)
>>    upcall_xlate (ofproto-dpif-upcall.c:1129)
>>    process_upcall (ofproto-dpif-upcall.c:1271)
>>    recv_upcalls (ofproto-dpif-upcall.c:822)
>>    udpif_upcall_handler (ofproto-dpif-upcall.c:740)
>>  Address 0x18630490 is 1,904 bytes inside a block of size 12,288 free'd
>>     free (vg_replace_malloc.c:529)
>>    bond_entry_reset (bond.c:1635)
>>    bond_reconfigure (bond.c:457)
>>    bundle_set (ofproto-dpif.c:2896)
>>    ofproto_bundle_register (ofproto.c:1343)
>>    port_configure (bridge.c:1159)
>>    bridge_reconfigure (bridge.c:785)
>>    bridge_run (bridge.c:3099)
>>    main (ovs-vswitchd.c:111)
>>  Block was alloc'd at
>>     malloc (vg_replace_malloc.c:298)
>>    xmalloc (util.c:110)
>>    bond_entry_reset (bond.c:1629)
>>    bond_reconfigure (bond.c:457)
>>    bond_create (bond.c:245)
>>    bundle_set (ofproto-dpif.c:2900)
>>    ofproto_bundle_register (ofproto.c:1343)
>>    port_configure (bridge.c:1159)
>>    bridge_reconfigure (bridge.c:785)
>>    bridge_run (bridge.c:3099)
>>    main (ovs-vswitchd.c:111)
>
> It is not clear to me how the fix proposed by the patch explains the
> cause of the backtrace.
>
> To me, the backtrace suggests a race condition between revalidate
> thread and the main thread.
> I think I found the race condition, and proposed a fix at:
>
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/329073.html
>
>>
>> Signed-off-by: Huanle Han <[email protected]>
>> ---
>>  ofproto/bond.c | 38 +++++++++++++++++++++++++++++++++-----
>>  1 file changed, 33 insertions(+), 5 deletions(-)
>>
>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>> index 260023e..1c5ae43 100644
>> --- a/ofproto/bond.c
>> +++ b/ofproto/bond.c
>> @@ -275,6 +275,22 @@ bond_unref(struct bond *bond)
>>      hmap_remove(all_bonds, &bond->hmap_node);
>>      ovs_rwlock_unlock(&rwlock);
>>
>> +    HMAP_FOR_EACH_POP (pr_op, hmap_node, &bond->pr_rule_ops) {
>> +        int error = ofproto_dpif_delete_internal_flow(bond->ofproto,
>> +                &pr_op->match, RECIRC_RULE_PRIORITY);
>> +        if (error) {
>> +            char *err_s = match_to_string(&pr_op->match,
>> +                    RECIRC_RULE_PRIORITY);
>> +
>> +            VLOG_ERR("failed to remove post recirculation flow %s when
bond_unref", err_s);
>> +            free(err_s);
>> +        }
>> +
>> +        *pr_op->pr_rule = NULL;
>> +        free(pr_op);
>> +    }
>> +    hmap_destroy(&bond->pr_rule_ops);
>> +
>>      HMAP_FOR_EACH_POP (slave, hmap_node, &bond->slaves) {
>>          /* Client owns 'slave->netdev'. */
>>          free(slave->name);
>> @@ -284,13 +300,9 @@ bond_unref(struct bond *bond)
>>
>>      ovs_mutex_destroy(&bond->mutex);
>>      free(bond->hash);
>> +    bond->hash = NULL;
>>      free(bond->name);
>>
>> -    HMAP_FOR_EACH_POP (pr_op, hmap_node, &bond->pr_rule_ops) {
>> -        free(pr_op);
>> -    }
>> -    hmap_destroy(&bond->pr_rule_ops);
>> -
>>      if (bond->recirc_id) {
>>          recirc_free_id(bond->recirc_id);
>>      }
>> @@ -1635,6 +1647,22 @@ bond_entry_reset(struct bond *bond)
>>
>>          bond->next_rebalance = time_msec() + bond->rebalance_interval;
>>      } else {
>> +        struct bond_pr_rule_op *pr_op;
>> +        HMAP_FOR_EACH_POP (pr_op, hmap_node, &bond->pr_rule_ops) {
>> +            int error = ofproto_dpif_delete_internal_flow(bond->ofproto,
>> +                    &pr_op->match, RECIRC_RULE_PRIORITY);
>> +            if (error) {
>> +                char *err_s = match_to_string(&pr_op->match,
>> +                        RECIRC_RULE_PRIORITY);
>> +
>> +                VLOG_ERR("failed to remove post recirculation flow %s
when bond reset", err_s);
>> +                free(err_s);
>> +            }
>> +
>> +            hmap_remove(&bond->pr_rule_ops, &pr_op->hmap_node);
>> +            *pr_op->pr_rule = NULL;
>> +            free(pr_op);
>> +        }
>>          free(bond->hash);
>>          bond->hash = NULL;
>>      }
>> --
>
> The patch seems mostly about fixing post recirculation rule leak.
> However, there are some
> logic duplications, and it seems we can reuse the existing logics
> update_recirc_rules().
> How about the following patch instead:
>
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/329074.html
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to