On 8/13/21 6:25 AM, Tony van der Peet wrote:
> Hi Ilya et al
>
> I can do that, but it may take a few days to get to it. Thanks for the
> detailed analysis, I'll try to get my head around that too and contribute to
> a fix.
Thanks! I'm planning to make a series of stable releases in a couple
of weeks after the 2.16 release (which is on Monday). It would be great
to have a fix for this crash there. So, we have a bit of time, no rush.
In any case, if you have any questions feel free to ask.
BTW, here is the test for the tunnel push issue:
diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
index 48c5de9d1..2ba39cff5 100644
--- a/tests/tunnel-push-pop.at
+++ b/tests/tunnel-push-pop.at
@@ -595,6 +595,62 @@ OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep
50540000000a5054000000091235 | wc
OVS_VSWITCHD_STOP
AT_CLEANUP
+AT_SETUP([tunnel_push_pop - packet_out debug_slow])
+
+OVS_VSWITCHD_START(
+ [add-port br0 p0 dnl
+ -- set Interface p0 type=dummy ofport_request=1 dnl
+ other-config:hwaddr=aa:55:aa:55:00:00])
+AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg])
+AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy])
+AT_CHECK([ovs-vsctl add-port int-br t2 dnl
+ -- set Interface t2 type=geneve options:remote_ip=1.1.2.92 dnl
+ options:key=123 ofport_request=2])
+
+dnl First setup dummy interface IP address, then add the route
+dnl so that tnl-port table can get valid IP address for the device.
+AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 br0], [0], [OK
+])
+AT_CHECK([ovs-ofctl add-flow br0 action=normal])
+
+dnl This ARP reply from p0 has two effects:
+dnl 1. The ARP cache will learn that 1.1.2.92 is at f8:bc:12:44:34:b6.
+dnl 2. The br0 mac learning will learn that f8:bc:12:44:34:b6 is on p0.
+AT_CHECK([
+ ovs-appctl netdev-dummy/receive p0 dnl
+ 'recirc_id(0),in_port(2),dnl
+ eth(src=f8:bc:12:44:34:b6,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),dnl
+
arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'
+])
+
+AT_CHECK([ovs-vsctl -- set Interface p0 options:tx_pcap=p0.pcap])
+
+packet=50540000000a505400000009123
+encap=f8bc124434b6aa55aa5500000800450000320000400040113406010102580101025c83a917c1001e00000000655800007b00
+
+dnl Output to tunnel from a int-br internal port.
+dnl Checking that the packet arrived and it was correctly encapsulated.
+AT_CHECK([ovs-ofctl add-flow int-br
"in_port=LOCAL,actions=debug_slow,output:2"])
+AT_CHECK([ovs-appctl netdev-dummy/receive int-br "${packet}4"])
+OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep "${encap}${packet}4" | wc -l`
-ge 1])
+dnl Sending again to exercise the non-miss upcall path.
+AT_CHECK([ovs-appctl netdev-dummy/receive int-br "${packet}4"])
+OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep "${encap}${packet}4" | wc -l`
-ge 2])
+
+dnl Output to tunnel from the controller.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out int-br CONTROLLER
"debug_slow,output:2" "${packet}5"])
+OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep "${encap}${packet}5" | wc -l`
-ge 1])
+
+dnl Datapath actions should not have tunnel push action.
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep -q tnl_push], [1])
+dnl There should be slow_path action instead.
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep -q 'slow_path(action)'], [0])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
AT_SETUP([tunnel_push_pop - underlay bridge match])
OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy
ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
---
Basically, it triggers the following PACKET_OUT:
OFPT_PACKET_OUT (OF1.3) (xid=0x6): in_port=CONTROLLER
actions=debug_slow,output:2 data_len=14
vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,dl_type=0x1235
And it results in sending a packet through the tunnel with the datapath actions:
tnl_push(tnl_port(6081),header(...),out_port(100)),1
The trick is that 'debug_slow' forces slow path execution, i.e.
dpif_execute_with_help().
Therefore dpif_execute() called twice, first time with actions:tnl_push(...)
and the
second time with actions:1 expecting that packet got modified between the calls.
If we will clone the packet inside the dpif_netdev_execute() and not copy the
data back, packet will not be modified and we will have original packet without
tunnel headers in the pcap file.
Interestingly, this happens only for PACKET_OUT packets and never happens
for the packets originated inside the datapath (received from the actual port).
This is because on upcall (both miss and actions upcalls), dp->upcall_cb()
receives both actual 'actions' and 'put_actions'. In this case, for the first
packet entering the datapath handle_packet_upcall will trigger DPIF_UC_MISS
upcall and it will receive following:
actions : tnl_push(tnl_port(6081),header(...),out_port(100)),1
put_actions: userspace(pid=0,slow_path(action))
It will execute the 'actions' inside the datapath avoiding the problem, but
will install 'put_actions' into the classifier.
The second packet in the userspace datapath will have a match in the datapath
classifier and datapath will execute 'userspace(pid=0,slow_path(action))'
actions. It will do that by triggering DPIF_UC_ACTIONS upcall and it will
receive 'tnl_push(tnl_port(6081),header(...),out_port(100)),1' as 'actions'.
Again they will be executed inside the datapath avoiding the problem.
But the test above exercises this path too just in case.
Best regards, Ilya Maximets.
>
> Tony
>
> On Thu, Aug 12, 2021 at 10:51 AM Ilya Maximets <[email protected]
> <mailto:[email protected]>> wrote:
>
> Hi.
>
> It's been some time since the last email in this thread. I spent more
> time debugging and analyzing the code and, I think, I got to the bottom
> of this. First of all, I had a question why the set of actions in that
> particular unit test in OVN needs help for execution, so it hits the
> dpif_execute_with_help() path? The answer is the action set that looks
> like this:
>
> actions:set(eth(dst=ff:ff:ff:ff:ff:ff)),set(arp(sip=172.16.1.1)),
> clone(tnl_push(tnl_port(6081),...,out_port(1)),101)
>
> Here the 'set(arp())' action is always marked as SLOW_ACTION and leads
> to 'needs_help == true'. Changes of arp fields is not supported by
> kernel. We may argue that userspace datapath supports it and we should
> be able to execute these actions inside the datapath avoiding the
> slow path execution process, but there are other cases where slow path
> execution can be requested. One of the most obvious is a direct request
> of OFPACT_DEBUG_SLOW. So, it's hard to completely avoid slow path
> execution even for a userspace datapath.
>
> Anyway, dpif_execute_with_help() or, more precisely, the callback
> dpif_execute_helper_cb() seems to be the source of all issues. The
> overall design of the actions execution seems to be flawed here.
> This function makes several opposite assumptions:
>
> 1. It assumes that dpif_execute() can't modify packets and that
> there is no way to get result of the execution from the datapath.
>
> 2. It assumes that dpif_execute() can and will modify the packet.
>
> Assumption 1 leads to a workaround for 'meter' actions which are
> accumulated from several calls and pushed along with the first terminal
> action (output, userspace or ct). This assumption is true for the
> kernel datapath, because the packet basically gets copied to the kernel.
>
> Assumption 2 at the same time allows to implement tunneling for the
> userpsace datapath. dpif_execute_helper_cb() relies on the fact
> that dpif_execute(OVS_ACTION_ATTR_TUNNEL_PUSH) will modify the packet
> by pushing the tunnel header.
>
> So, this inconsistent behavior for different actions breaks the
> dpif abstraction and doesn't allow us to correctly implement the
> dpif_execute() interface consistently across different datapath
> interfaces.
>
> Even though I see what is going on here, I don't currently see a way
> to fix this architectural/design flaw. Definitely packet reference
> counters will still help in correct tracking of dp_packets and we
> need that. I know that Aaron started working on this, but it will
> take a while before we will have a working solution. I'll try
> to think how to actually re-work the actions execution path so we
> will not have this kind of self-contradicting functions and APIs
> (these inconsistencies are really the main thing that complicates
> the code and doesn't allow to understand it without a deep dive).
> If someone has ideas on that front, I'd like to hear.
>
> Meanwhile, I still think that we need to get the "bandaid" workaround
> (clone + data copy) in, so we can fix the actual crash. As far as
> I understand now, change like this should not have any unexpected
> side effect.
>
> Tony, do you want to sent the workaround patch? Or else I can do that.
>
> Bets regards, Ilya Maximets.
>
> On 6/21/21 10:58 PM, Ilya Maximets wrote:
> > On 6/19/21 4:11 AM, Tony van der Peet wrote:
> >> Hi all
> >>
> >> I am in favour of a better fix, bandaids tend to come back and bite
> you in the end anyway. So, will be happy to help with this effort, though I
> am probably going to have to defer to the rest of you when it comes to
> actually knowing what to do in this area.
> >>
> >
> > It's great to have a full correct solution, but unfortunately, I don't
> > think that we can come up with something like a full and correct
> > reference counting in a short period of time. But we still need to fix
> > a crash that is pretty easy to trigger. I'm also nervous that OVN is
> > using meters more and more (e.g. control plane protection patch set)
> > and they might actually hit this issue at the high load. Another point
> > is that we need a fix that we can backport to LTS, and I don't think
> that
> > reference counting would be a small change that we can easily backport
> > without worrying about breaking something else.
> >
> > All in all, I think that we need to come up with a "bandaid" solution
> > and work further on correct reference counting after that.
> >
> > And we also need to create a unit test that will mimic packet-out that
> > I encountered in OVN tests, so we can test this kind of behavior in OVS.
> >
> > For the reference, the packet-out generated by OVN controller had a few
> > set() actions and the resubmit() to a different table. And this
> > table had rules leading to packet output to a tunnel port, resulting
> > in a tunnel push + output datapath actions.
> >
> >> Cheers
> >> Tony
> >> ________________________________________
> >> From: Aaron Conole <[email protected] <mailto:[email protected]>>
> >> Sent: Saturday, 19 June 2021 2:50 a.m.
> >> To: Ilya Maximets
> >> Cc: Ben Pfaff; Tony van der Peet; [email protected]
> <mailto:[email protected]>; Tony van der Peet
> >> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT
> is metered
> >>
> >> Ilya Maximets <[email protected] <mailto:[email protected]>> writes:
> >>
> >>> On 6/17/21 11:59 PM, Ben Pfaff wrote:
> >>>> All these flags for stealing, allowing stealing, blah blah, are just
> >>>> ways to do some kind of dumb reference counting without actually
> have a
> >>>> reference count. When it gets super complex like this, maybe
> >>>> introducing a reference count is the way to go. It would be a bigger
> >>>> change, but perhaps more maintainable over time.
> >>>>
> >>>
> >>> Yes, I absolutely agree. We just removed one of these hacky flags
> from
> >>> the struct dp_packet_batch and we need to get rid of the rest of them.
> >>
> >> +1 from me - it's a bigger scoped change, but over-all, I think it's a
> >> better one, and makes the most sense.
> >>
> >>> One thing that bothers me is that some parts of the code treats
> >>> 'should_steal=false' as "do not modify". For example, I don't really
> >>> understand why these functions are carrying cutlen around and clones
> >>> the packet before truncating if 'should_steal=false'.
> >>>
> >>> Some action executors also have optimizations that allows to not clone
> >>> the packet before the fatal action if this action is the last one.
> >>>
> >>> So, in general, yes, we need to get rid of these flags and accurately
> >>> re-work all the packet paths. And yes, this would be not a small
> >>> change.
> >>>
> >>> For now, I think, we need to find a less ugly as possible solution for
> >>> existing crash (maybe the one that I suggested), so we will be able to
> >>> backport it and continue working on correct reference counting.
> >>>
> >>> What do you think? Tony? Aaron?
> >>
> >> That makes sense to me, and I hope we will actually work on the
> >> ref-count stuff. I had started taking a look a few weeks back, but the
> >> idea of 'steal' is pretty ingrained throughout the code, so getting a
> >> ref-count semantic correct is a big effort. As an example, the
> >> odp-execute area expects that each sub-action will have its own copy to
> >> modify (and this is documented with the 'should_steal' semantics).
> >> Taking that out will be a big rework in that area. I think it makes
> >> the most sense, and we probably could implement something like COW to
> >> cover those cases where we really need to modify a packet without
> >> propagating those changes back up.
> >>
> >>> Best regards, Ilya Maximets.
> >>
> >
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev