Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT is metered

Ilya Maximets Fri, 13 Aug 2021 10:26:54 -0700

On 8/13/21 6:25 AM, Tony van der Peet wrote:
> Hi Ilya et al
> 
> I can do that, but it may take a few days to get to it. Thanks for the 
> detailed analysis, I'll try to get my head around that too and contribute to 
> a fix.


Thanks!  I'm planning to make a series of stable releases in a couple
of weeks after the 2.16 release (which is on Monday). It would be great
to have a fix for this crash there.  So, we have a bit of time, no rush.
In any case, if you have any questions feel free to ask.

BTW, here is the test for the tunnel push issue:

diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
index 48c5de9d1..2ba39cff5 100644
--- a/tests/tunnel-push-pop.at
+++ b/tests/tunnel-push-pop.at
@@ -595,6 +595,62 @@ OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep 
50540000000a5054000000091235 | wc
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([tunnel_push_pop - packet_out debug_slow])
+
+OVS_VSWITCHD_START(
+    [add-port br0 p0 dnl
+     -- set Interface p0 type=dummy ofport_request=1 dnl
+                         other-config:hwaddr=aa:55:aa:55:00:00])
+AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg])
+AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy])
+AT_CHECK([ovs-vsctl add-port int-br t2 dnl
+          -- set Interface t2 type=geneve options:remote_ip=1.1.2.92 dnl
+                              options:key=123 ofport_request=2])
+
+dnl First setup dummy interface IP address, then add the route
+dnl so that tnl-port table can get valid IP address for the device.
+AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 br0], [0], [OK
+])
+AT_CHECK([ovs-ofctl add-flow br0 action=normal])
+
+dnl This ARP reply from p0 has two effects:
+dnl 1. The ARP cache will learn that 1.1.2.92 is at f8:bc:12:44:34:b6.
+dnl 2. The br0 mac learning will learn that f8:bc:12:44:34:b6 is on p0.
+AT_CHECK([
+  ovs-appctl netdev-dummy/receive p0 dnl
+      'recirc_id(0),in_port(2),dnl
+       eth(src=f8:bc:12:44:34:b6,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),dnl
+       
arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'
+])
+
+AT_CHECK([ovs-vsctl -- set Interface p0 options:tx_pcap=p0.pcap])
+
+packet=50540000000a505400000009123
+encap=f8bc124434b6aa55aa5500000800450000320000400040113406010102580101025c83a917c1001e00000000655800007b00
+
+dnl Output to tunnel from a int-br internal port.
+dnl Checking that the packet arrived and it was correctly encapsulated.
+AT_CHECK([ovs-ofctl add-flow int-br 
"in_port=LOCAL,actions=debug_slow,output:2"])
+AT_CHECK([ovs-appctl netdev-dummy/receive int-br "${packet}4"])
+OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep "${encap}${packet}4" | wc -l` 
-ge 1])
+dnl Sending again to exercise the non-miss upcall path.
+AT_CHECK([ovs-appctl netdev-dummy/receive int-br "${packet}4"])
+OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep "${encap}${packet}4" | wc -l` 
-ge 2])
+
+dnl Output to tunnel from the controller.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out int-br CONTROLLER 
"debug_slow,output:2" "${packet}5"])
+OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep "${encap}${packet}5" | wc -l` 
-ge 1])
+
+dnl Datapath actions should not have tunnel push action.
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep -q tnl_push], [1])
+dnl There should be slow_path action instead.
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep -q 'slow_path(action)'], [0])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([tunnel_push_pop - underlay bridge match])
 
 OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy 
ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
---

Basically, it triggers the following PACKET_OUT:
OFPT_PACKET_OUT (OF1.3) (xid=0x6): in_port=CONTROLLER 
actions=debug_slow,output:2 data_len=14
vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,dl_type=0x1235

And it results in sending a packet through the tunnel with the datapath actions:
  tnl_push(tnl_port(6081),header(...),out_port(100)),1

The trick is that 'debug_slow' forces slow path execution, i.e. 
dpif_execute_with_help().
Therefore dpif_execute() called twice, first time with actions:tnl_push(...) 
and the
second time with actions:1 expecting that packet got modified between the calls.

If we will clone the packet inside the dpif_netdev_execute() and not copy the
data back, packet will not be modified and we will have original packet without
tunnel headers in the pcap file.

Interestingly, this happens only for PACKET_OUT packets and never happens
for the packets originated inside the datapath (received from the actual port).
This is because on upcall (both miss and actions upcalls), dp->upcall_cb()
receives both actual 'actions' and 'put_actions'.  In this case, for the first
packet entering the datapath handle_packet_upcall will trigger DPIF_UC_MISS
upcall and it will receive following:

actions    :  tnl_push(tnl_port(6081),header(...),out_port(100)),1
put_actions:  userspace(pid=0,slow_path(action))

It will execute the 'actions' inside the datapath avoiding the problem, but
will install 'put_actions' into the classifier.

The second packet in the userspace datapath will have a match in the datapath
classifier and datapath will execute 'userspace(pid=0,slow_path(action))'
actions.  It will do that by triggering DPIF_UC_ACTIONS upcall and it will
receive 'tnl_push(tnl_port(6081),header(...),out_port(100)),1' as 'actions'.
Again they will be executed inside the datapath avoiding the problem.

But the test above exercises this path too just in case.

Best regards, Ilya Maximets.

> 
> Tony
> 
> On Thu, Aug 12, 2021 at 10:51 AM Ilya Maximets <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     Hi.
> 
>     It's been some time since the last email in this thread.  I spent more
>     time debugging and analyzing the code and, I think, I got to the bottom
>     of this.  First of all, I had a question why the set of actions in that
>     particular unit test in OVN needs help for execution, so it hits the
>     dpif_execute_with_help() path?  The answer is the action set that looks
>     like this:
> 
>     actions:set(eth(dst=ff:ff:ff:ff:ff:ff)),set(arp(sip=172.16.1.1)),
>             clone(tnl_push(tnl_port(6081),...,out_port(1)),101)
> 
>     Here the 'set(arp())' action is always marked as SLOW_ACTION and leads
>     to 'needs_help == true'.  Changes of arp fields is not supported by
>     kernel.  We may argue that userspace datapath supports it and we should
>     be able to execute these actions inside the datapath avoiding the
>     slow path execution process, but there are other cases where slow path
>     execution can be requested.  One of the most obvious is a direct request
>     of OFPACT_DEBUG_SLOW.  So, it's hard to completely avoid slow path
>     execution even for a userspace datapath.
> 
>     Anyway, dpif_execute_with_help() or, more precisely, the callback
>     dpif_execute_helper_cb() seems to be the source of all issues.  The
>     overall design of the actions execution seems to be flawed here.
>     This function makes several opposite assumptions:
> 
>     1. It assumes that dpif_execute() can't modify packets and that
>        there is no way to get result of the execution from the datapath.
> 
>     2. It assumes that dpif_execute() can and will modify the packet.
> 
>     Assumption 1 leads to a workaround for 'meter' actions which are
>     accumulated from several calls and pushed along with the first terminal
>     action (output, userspace or ct).  This assumption is true for the
>     kernel datapath, because the packet basically gets copied to the kernel.
> 
>     Assumption 2 at the same time allows to implement tunneling for the
>     userpsace datapath.  dpif_execute_helper_cb() relies on the fact
>     that dpif_execute(OVS_ACTION_ATTR_TUNNEL_PUSH) will modify the packet
>     by pushing the tunnel header.
> 
>     So, this inconsistent behavior for different actions breaks the
>     dpif abstraction and doesn't allow us to correctly implement the
>     dpif_execute() interface consistently across different datapath
>     interfaces.
> 
>     Even though I see what is going on here, I don't currently see a way
>     to fix this architectural/design flaw.  Definitely packet reference
>     counters will still help in correct tracking of dp_packets and we
>     need that.  I know that Aaron started working on this, but it will
>     take a while before we will have a working solution.  I'll try
>     to think how to actually re-work the actions execution path so we
>     will not have this kind of self-contradicting functions and APIs
>     (these inconsistencies are really the main thing that complicates
>     the code and doesn't allow to understand it without a deep dive).
>     If someone has ideas on that front, I'd like to hear.
> 
>     Meanwhile, I still think that we need to get the "bandaid" workaround
>     (clone + data copy) in, so we can fix the actual crash.  As far as
>     I understand now, change like this should not have any unexpected
>     side effect.
> 
>     Tony, do you want to sent the workaround patch?  Or else I can do that.
> 
>     Bets regards, Ilya Maximets.
> 
>     On 6/21/21 10:58 PM, Ilya Maximets wrote:
>     > On 6/19/21 4:11 AM, Tony van der Peet wrote:
>     >> Hi all
>     >>
>     >> I am in favour of a better fix, bandaids tend to come back and bite 
> you in the end anyway. So, will be happy to help with this effort, though I 
> am probably going to have to defer to the rest of you when it comes to 
> actually knowing what to do in this area.
>     >>
>     >
>     > It's great to have a full correct solution, but unfortunately, I don't
>     > think that we can come up with something like a full and correct
>     > reference counting in a short period of time.  But we still need to fix
>     > a crash that is pretty easy to trigger.  I'm also nervous that OVN is
>     > using meters more and more (e.g. control plane protection patch set)
>     > and they might actually hit this issue at the high load.  Another point
>     > is that we need a fix that we can backport to LTS, and I don't think 
> that
>     > reference counting would be a small change that we can easily backport
>     > without worrying about breaking something else.
>     >
>     > All in all, I think that we need to come up with a "bandaid" solution
>     > and work further on correct reference counting after that.
>     >
>     > And we also need to create a unit test that will mimic packet-out that
>     > I encountered in OVN tests, so we can test this kind of behavior in OVS.
>     >
>     > For the reference, the packet-out generated by OVN controller had a few
>     > set() actions and the resubmit() to a different table. And this
>     > table had rules leading to packet output to a tunnel port, resulting
>     > in a tunnel push + output datapath actions.
>     >
>     >> Cheers
>     >> Tony
>     >> ________________________________________
>     >> From: Aaron Conole <[email protected] <mailto:[email protected]>>
>     >> Sent: Saturday, 19 June 2021 2:50 a.m.
>     >> To: Ilya Maximets
>     >> Cc: Ben Pfaff; Tony van der Peet; [email protected] 
> <mailto:[email protected]>; Tony van der Peet
>     >> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT 
> is metered
>     >>
>     >> Ilya Maximets <[email protected] <mailto:[email protected]>> writes:
>     >>
>     >>> On 6/17/21 11:59 PM, Ben Pfaff wrote:
>     >>>> All these flags for stealing, allowing stealing, blah blah, are just
>     >>>> ways to do some kind of dumb reference counting without actually 
> have a
>     >>>> reference count.  When it gets super complex like this, maybe
>     >>>> introducing a reference count is the way to go.  It would be a bigger
>     >>>> change, but perhaps more maintainable over time.
>     >>>>
>     >>>
>     >>> Yes, I absolutely agree.  We just removed one of these hacky flags 
> from
>     >>> the struct dp_packet_batch and we need to get rid of the rest of them.
>     >>
>     >> +1 from me - it's a bigger scoped change, but over-all, I think it's a
>     >> better one, and makes the most sense.
>     >>
>     >>> One thing that bothers me is that some parts of the code treats
>     >>> 'should_steal=false' as "do not modify".  For example, I don't really
>     >>> understand why these functions are carrying cutlen around and clones
>     >>> the packet before truncating if 'should_steal=false'.
>     >>>
>     >>> Some action executors also have optimizations that allows to not clone
>     >>> the packet before the fatal action if this action is the last one.
>     >>>
>     >>> So, in general, yes, we need to get rid of these flags and accurately
>     >>> re-work all the packet paths.  And yes, this would be not a small
>     >>> change.
>     >>>
>     >>> For now, I think, we need to find a less ugly as possible solution for
>     >>> existing crash (maybe the one that I suggested), so we will be able to
>     >>> backport it and continue working on correct reference counting.
>     >>>
>     >>> What do you think?  Tony?  Aaron?
>     >>
>     >> That makes sense to me, and I hope we will actually work on the
>     >> ref-count stuff.  I had started taking a look a few weeks back, but the
>     >> idea of 'steal' is pretty ingrained throughout the code, so getting a
>     >> ref-count semantic correct is a big effort.  As an example, the
>     >> odp-execute area expects that each sub-action will have its own copy to
>     >> modify (and this is documented with the 'should_steal' semantics).
>     >> Taking that out will be a big rework in that area.  I think it makes
>     >> the most sense, and we probably could implement something like COW to
>     >> cover those cases where we really need to modify a packet without
>     >> propagating those changes back up.
>     >>
>     >>> Best regards, Ilya Maximets.
>     >>
>     >
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT is metered

Reply via email to