Ilya Maximets <[email protected]> writes: > On 6/7/21 3:59 PM, Ilya Maximets wrote: >> On 6/7/21 3:09 PM, Aaron Conole wrote: >>> Ilya Maximets <[email protected]> writes: >>> >>>>> Here is a patch with both a test and a fix. >>> >>> Thanks so much! It's nice to get fixes, but I think it's really great >>> when test cases come along with them. >>> >>>> Hi. Thanks for working n this! >>>> >>>> CC: ovs-dev >>>> >>>>> Not submitting as a formal >>>>> patch because I would like some feedback on whether 1) maintainers feel >>>>> this is worth fixing and >>>> >>>> I can reproduce the crash with your test. Basically, actions in userspace >>>> datapath may drop packets if something goes wrong. 'meter' action just >>>> seems to be the most explicit variant. So, I think, this is definitely >>>> worth fixing as some other condition might trigger this crash on packet-out >>>> as well. >>>> >>>> ==2568112==ERROR: AddressSanitizer: heap-use-after-free >>>> on address 0x61600000699c at pc 0x000000573860 bp 0x7ffebc6cc880 sp >>>> 0x7ffebc6cc878 >>>> READ of size 1 at 0x61600000699c thread T0 >>>> #0 0x57385f in dp_packet_delete lib/dp-packet.h:242:16 >>>> #1 0x57372c in ofproto_packet_out_uninit ofproto/ofproto.c:3562:5 >>>> #2 0x585e77 in handle_packet_out ofproto/ofproto.c:3722:5 >>>> #3 0x583801 in handle_single_part_openflow ofproto/ofproto.c:8499:16 >>>> #4 0x570c9c in handle_openflow ofproto/ofproto.c:8686:21 >>>> #5 0x611781 in ofconn_run ofproto/connmgr.c:1329:13 >>>> #6 0x6112ed in connmgr_run ofproto/connmgr.c:356:9 >>>> #7 0x56fdf4 in ofproto_run ofproto/ofproto.c:1891:5 >>>> #8 0x545ec0 in bridge_run__ vswitchd/bridge.c:3251:9 >>>> #9 0x5456a5 in bridge_run vswitchd/bridge.c:3310:5 >>>> #10 0x55f5b1 in main vswitchd/ovs-vswitchd.c:127:9 >>>> #11 0x7f85bfe09081 in __libc_start_main (/lib64/libc.so.6+0x27081) >>>> #12 0x46d00d in _start (vswitchd/ovs-vswitchd+0x46d00d) >>>> >>>>> 2) whether this is the way to fix it. >>>> >>>> See inline. >>>> >>>>> >>>>> I have tried to make the most minimal change possible, but this means that >>>>> there might be paths through the code that give unexpected behaviour >>>>> (which >>>>> in the worst case would be a memory leak I suppose). >>>>> >>>>> Tony >>>>> >>>>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h >>>>> index 246be14d0..5e0dabe67 100644 >>>>> --- a/lib/dp-packet.h >>>>> +++ b/lib/dp-packet.h >>>>> @@ -739,6 +739,7 @@ struct dp_packet_batch { >>>>> size_t count; >>>>> bool trunc; /* true if the batch needs truncate. */ >>>>> bool do_not_steal; /* Indicate that the packets should not be stolen. >>>>> */ >>>>> + bool packet_out; /* Indicate single packet is PACKET_OUT */ >>>>> struct dp_packet *packets[NETDEV_MAX_BURST]; >>>>> }; >>>>> >>>>> @@ -748,6 +749,7 @@ dp_packet_batch_init(struct dp_packet_batch *batch) >>>>> batch->count = 0; >>>>> batch->trunc = false; >>>>> batch->do_not_steal = false; >>>>> + batch->packet_out = false; >>>>> } >>>>> >>>>> static inline void >>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >>>>> index 650e67ab3..deba4a94a 100644 >>>>> --- a/lib/dpif-netdev.c >>>>> +++ b/lib/dpif-netdev.c >>>>> @@ -4170,6 +4170,7 @@ dpif_netdev_execute(struct dpif *dpif, struct >>>>> dpif_execute *execute) >>>>> >>>>> dp_packet_batch_init_packet(&pp, execute->packet); >>>>> pp.do_not_steal = true; >>>>> + pp.packet_out = execute->packet_out; >>>>> dp_netdev_execute_actions(pmd, &pp, false, execute->flow, >>>>> execute->actions, execute->actions_len); >>>> >>>> There is already a dirty hack named "do_not_steal" that was introduced, >>>> I guess, exactly to avoid crash in the conntrack code that could drop/steal >>>> the packet just like meter action. And it seems that here in >>>> dpif_netdev_execute() is the only problematic entry point as all other >>>> normal paths expects that packet might be destroyed. >>>> >>>> The problem was, I suppose, introduced when we tried to unify semantics >>>> of "may_steal" flag by turning it into "should_steal". But it seems that >>>> in this function we really need to prohibit stealing of the packet since >>>> ofproto layer still owns it regardless of the result of execution. >>>> >>>> I don't think that we need one more flag here, but we have several options >>>> how to fix the crash: >>>> >>>> 1. Start honoring batch->do_not_steal flag in all actions that may result >>>> in packet drops. As the original idea of having 'do_not_steal' flag for >>>> a batch is very hacky, I'd like to not do that. >>> >>> +1 >>> >>>> 2. Try to propagate information that packet was deleted up to ofproto >>>> layer, >>>> i.e. make handle_packet_out() aware of that. Will, probably, be not >>>> that >>>> easy to do. >>> >>> I had a look at doing this, but as you note it is quite intrusive, and >>> we need to make changes all over. >>> >>>> 3. This function (dpif_netdev_execute) is not on a hot path in userspace >>>> datapath, IIUC. It might be that it's just easier to remove the >>>> 'do_not_steal' flag entirely, clone the packet here and call the >>>> dp_netdev_execute_actions() with should_steal=true. >>>> This sounds like the best solution for me, unless I overlooked some >>>> scenario, where this code is on a hot path. >>> >>> I like this approach. >>> >>>> Thoughts? >>>> >>>> Aaron, you have a patch[1] to remove 'do_not_steal' flag, so fix for this >>>> issue >>>> will, likely, touch the same parts of the code. What do you think about >>>> this >>>> issue and possible solutions? >>> >>> I guess we should do the same thing we do in other places, ie: default >>> assume that the packet cannot be 'stolen' and we should clone our own >>> copies. >> >> I'm not sure that I understood this correctly, but the idea was that default >> assumption is that packet can be stolen at any point of datapath processing >> and higher layers should deal with this.
Re-reading what I wrote, I don't either. :-/ I agree with what you've written. >> For the IP fragmentation handling this will mean that ipf will just take a >> packet directly from the original batch without copying, so the original >> batch will not have this packet anymore and ipf is allowed to free it at >> any point in time, because now it owns this packet. >> This aligns with the "should_steal" semantics, as any function called with >> "should_steal=true" must take the ownership of the packet. If the function >> called with "should_steal=false" it still allowed to take ownership of some >> packets from the batch and caller must be prepared for that. >> If some function in datapath has no "should_steal" argument, it should be >> treated as a function with "should_steal=false". This applies to both >> conntrack_execute() and dp_netdev_run_meter(), also to netdev_pop_header() >> and so on. > > Hmm, OVS_ACTION_ATTR_TUNNEL_POP implies recirculation, so netdev_pop_header() > is not a fully valid example here. Actions that implies recirculation or > recirculates packets in any other way (e.g. OVS_ACTION_ATTR_USERSPACE) should > clone packets before doing that if not asked to take ownership. The ownership of the dp_packet object needs to be well established. Use of should_steal could be okay, but as an example, we have something in the packet batch but it doesn't actually get honored (which is why I proposed removing it). >>> If we are worried about the time it takes to copy the dp_packet >>> structure and buffer, we can always introduce a reference counting >>> mechanism later as an optimization. >>> >>> I would just prefer to do clone, and then the functional area which >>> needs to hold a reference to a valid packet buffer can delete when it >>> makes sense. >>> >>> Hope it helps. >>> >>>> Best regards, Ilya Maximets. >>>> >>>> >>>> [1] >>>> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ >>>> >>>> Non-line-wrapped version of the test for convenience: >>>> >>>> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at >>>> index 31064ed95..d01f438b8 100644 >>>> --- a/tests/ofproto-dpif.at >>>> +++ b/tests/ofproto-dpif.at >>>> @@ -2159,6 +2159,27 @@ meter:controller flow_count:0 packet_in_count:8 >>>> byte_in_count:112 duration:0.0s >>>> OVS_VSWITCHD_STOP >>>> AT_CLEANUP >>>> >>>> +AT_SETUP([ofproto-dpif packet-out table meter drop qwe]) >>>> +OVS_VSWITCHD_START >>>> +add_of_ports br0 1 2 >>>> + >>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-meter br0 'meter=1 pktps >>>> bands=type=drop rate=1']) >>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 'in_port=1 >>>> action=meter:1,output:2']) >>>> + >>>> +ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 >>>> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000400080000 >>>> actions=resubmit(,0)" >>>> +ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 >>>> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000400080000 >>>> actions=resubmit(,0)" >>>> + >>>> +# Check that vswitchd hasn't crashed by dumping the meter added above >>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 dump-meters br0 | ofctl_strip], [0], >>>> [dnl >>>> +OFPST_METER_CONFIG reply (OF1.3): >>>> +meter=1 pktps bands= >>>> +type=drop rate=1 >>>> +]) >>>> + >>>> +OVS_VSWITCHD_STOP >>>> +AT_CLEANUP >>>> + >>>> + >>>> AT_SETUP([ofproto-dpif - MPLS handling]) >>>> OVS_VSWITCHD_START([dnl >>>> add-port br0 p1 -- set Interface p1 type=dummy >>>> --- >>> >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
