> On Jan 27, 2017, at 7:40 AM, Thomas Morin <thomas.mo...@orange.com> wrote:
> 
> Hi Jarno, all,
> 
> Sorry for being slow to react.
> I'd be glad to test a fix, but I'm unclear if there is one ready to be tested.
> 
> Perhasp "xlate: Recirculate also when MPLS POP is implicit." but I'm not sure 
> there has been a follow-up to the discuss between you and Ben?
> Or is the new clone action possibly taking care of this issue (does not seem 
> so, since datapath dependent, but I might have misunderstood)?
> 

Cleanest solution would be to use the new clone action for the datapath, which, 
if important, we could also backport to OVS 2.5.x.

Datapath CLONE is yet to be upstreamed, though, may take a little time.

  Jarno

> -Thomas
> 
> 2017-01-06, Jarno Rajahalme:
>> I sent a patch to do this fix on OVS master. IMO we should also make the 
>> datapath more flexible so that it would be able to POP back to IP after PUSH 
>> actions on an IP packet in the same action list. But that will not make it 
>> to OVS 2.7.
>> 
>> I would appreciate if Thomas could apply the and test!
>> 
>>   Jarno
>> 
>>> On Dec 14, 2016, at 1:52 PM, Jarno Rajahalme <ja...@ovn.org 
>>> <mailto:ja...@ovn.org>> wrote:
>>> 
>>>> 
>>>> On Dec 13, 2016, at 8:44 PM, Takashi YAMAMOTO <yamam...@ovn.org 
>>>> <mailto:yamam...@ovn.org>> wrote:
>>>> 
>>>> 
>>>> 
>>>> On Wed, Dec 14, 2016 at 9:02 AM, Jarno Rajahalme <ja...@ovn.org 
>>>> <mailto:ja...@ovn.org>> wrote:
>>>> 
>>>>> On Dec 13, 2016, at 3:32 PM, Takashi YAMAMOTO <yamam...@ovn.org 
>>>>> <mailto:yamam...@ovn.org>> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Dec 13, 2016 at 6:49 PM, Thomas Morin <thomas.mo...@orange.com 
>>>>> <mailto:thomas.mo...@orange.com>> wrote:
>>>>> Hi Jarno,
>>>>> 
>>>>> 2016-12-10, Jarno Rajahalme:
>>>>>> On Dec 9, 2016, at 7:04 AM, Thomas Morin <thomas.mo...@orange.com 
>>>>>> <mailto:thomas.mo...@orange.com>> wrote:
>>>>>>> 
>>>>>>> 2016-12-09, Thomas Morin:
>>>>>>>> In the same setup as the one on which the bug was observed, [...]
>>>>>>> 
>>>>>>> I was confused, I in fact tested the patch (branch-2.5) on a 
>>>>>>> /different/ setup as the one on which we hit the bug, using MPLS over a 
>>>>>>> GRE tunnel port, rather than plain MPLS over an eth port.
>>>>>>> Sorry if any confusion arised... I can test on the first setup if 
>>>>>>> relevant.
>>>>>>> 
>>>>>> 
>>>>>> Maybe the kernel datapath does not support MPLS over a GRE tunnel port. 
>>>>>> Having ‘dmesg’ output for the test run might reveal why the actions 
>>>>>> validation fails.
>>>>> 
>>>>> The dmesg output was the following: 
>>>>> 
>>>>> [171295.258939] openvswitch: netlink: Flow actions may not be safe on all 
>>>>> matching packets.
>>>>> 
>>>>> I've tested the patch on the platform on which the bug was initially hit 
>>>>> (*not* using MPLS/GRE), and I have the following a few times in the logs 
>>>>> right after I do an "ovs-appctl fdb/flush":
>>>>> 
>>>>> 2016-12-13T09:44:08.449Z|00001|dpif(handler68)|WARN|Dropped 3 log 
>>>>> messages in last 1 seconds (most recently, 1 seconds ago) due to 
>>>>> excessive rate
>>>>> 2016-12-13T09:44:08.449Z|00002|dpif(handler68)|WARN|system@ovs-system: 
>>>>> failed to put[create] (Invalid argument) 
>>>>> ufid:f046c4c4-b97f-436d-bd7c-91ed307275ac 
>>>>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(9),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=fa:16:3e:61:c0:b5,dst=00:00:5e:00:43:64),eth_type(0x0800),ipv4(src=10.0.1.29,dst=10.0.0.3,proto=6,tos=0/0xfc,ttl=64,frag=no),tcp(src=54253,dst=8080),tcp_flags(0/0),
>>>>>  
>>>>> actions:set(ipv4(src=10.0.1.29,dst=10.0.0.3,ttl=63)),set(eth(src=b8:2a:72:de:1b:e3,dst=00:17:cb:79:2c:01)),push_mpls(label=433680,tc=0,ttl=63,bos=1,eth_type=0x8847),7,set(eth(src=fa:16:3e:61:c0:b5,dst=00:00:5e:00:43:64)),pop_mpls(eth_type=0x800),set(ipv4(src=10.0.1.29,dst=10.0.0.3,tos=0/0xfc,ttl=64)),push_vlan(vid=1,pcp=0),3,8,pop_vlan,13
>>>>> 
>>>>> And dmesg:
>>>>> [926833.612372] openvswitch: netlink: Flow actions may not be safe on all 
>>>>> matching packets.
>>>>> 
>>>>> it's complaining about set ipv4 after pop_mpls because it doesn't know 
>>>>> about the "restored" l3.
>>>>> while we can improve the kernel, i guess we need to stick with recirc for 
>>>>> now.
>>>>>  
>>>> 
>>>> This should do it? Does not break any existing tests on branch-2.5, but I 
>>>> did not create a test for this yet.
>>>> 
>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>>> index fb25f63..6ee2a72 100644
>>>> --- a/ofproto/ofproto-dpif-xlate.c
>>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>>> @@ -2899,6 +2899,15 @@ xlate_commit_actions(struct xlate_ctx *ctx)
>>>>  {
>>>>      bool use_masked = ctx->xbridge->support.masked_set_action;
>>>>  
>>>> +    /* An MPLS packet can be implicitly popped back to a non-MPLS packet, 
>>>> if a
>>>> +     * patch port peer or a group bucket pushed MPLS.  Set the 'was_mpls' 
>>>> flag
>>>> +     * also in that case. */
>>>> +    if (eth_type_mpls(ctx->base_flow.dl_type)
>>>> +        && !eth_type_mpls(ctx->xin->flow.dl_type)
>>>> +        && ctx->xbridge->support.odp.recirc) {
>>>> +        ctx->was_mpls = true;
>>>> +    }
>>>> +
>>>> 
>>>> i guess this check needs to be somewhere around "ctx->was_mpls = 
>>>> old_was_mpls" in
>>>> affected functions. 
>>>> 
>>> 
>>> Right, as that is where the implicit MPLS POP action happens, when the 
>>> ‘ctx->xin->flow’ is restored.
>>> 
>>>   Jarno
>>> 
>>>>      ctx->xout->slow |= commit_odp_actions(&ctx->xin->flow, 
>>>> &ctx->base_flow,
>>>>                                            ctx->odp_actions, ctx->wc,
>>>>                                            use_masked);
>>>> 
>>>>   Jarno
>>>> 
>>>> 
>>>>> 
>>>>> 
>>>>> -Thomas
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>>>> 
>>>>>>>>>> On Dec 1, 2016, at 5:57 PM, Jarno Rajahalme <ja...@ovn.org 
>>>>>>>>>> <mailto:ja...@ovn.org>> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Nov 30, 2016, at 8:50 PM, Ben Pfaff <b...@ovn.org 
>>>>>>>>>>> <mailto:b...@ovn.org>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Nov 30, 2016 at 06:58:57PM -0800, Jarno Rajahalme wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 30, 2016, at 8:41 AM, Ben Pfaff <b...@ovn.org 
>>>>>>>>>>>>> <mailto:b...@ovn.org>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Nov 30, 2016 at 12:05:46PM +0100, Thomas Morin wrote:
>>>>>>>>>>>>>> Hi Ben,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 2016-11-30, Ben Pfaff:
>>>>>>>>>>>>>>> Do you have any idea what in your OpenFlow pipeline might do 
>>>>>>>>>>>>>>> that,
>>>>>>>>>>>>>>> i.e. is there anything especially tricky in the OpenFlow flows?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Are you willing to show us your OpenFlow flow table?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The setup involves three OVS bridges connected with patch-ports: 
>>>>>>>>>>>>>> br-int --
>>>>>>>>>>>>>> br-tun -- br-mpls, with the traffic that triggers the assert 
>>>>>>>>>>>>>> being processed
>>>>>>>>>>>>>> by br-int with a NORMAL action (ie. MAC learning).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The flows in this setup aren't particularly tricky, I think, 
>>>>>>>>>>>>>> although I'm
>>>>>>>>>>>>>> not sure what qualifies as tricky or non-tricky :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Anyway, since yesterday I managed to identify the event that 
>>>>>>>>>>>>>> trigger the
>>>>>>>>>>>>>> assert, by adding more logging before the assert and displaying 
>>>>>>>>>>>>>> the actions
>>>>>>>>>>>>>> taken:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 2016-11-29T14:44:40.126Z|00001|odp_util(revalidator45)|WARN|commit_set_ipv4_action
>>>>>>>>>>>>>> assert would fail....
>>>>>>>>>>>>>> 2016-11-29T14:44:40.126Z|00002|odp_util(revalidator45)|WARN|  
>>>>>>>>>>>>>> base_flow: 
>>>>>>>>>>>>>> ip,in_port=5,dl_vlan=3,dl_vlan_pcp=0,dl_src=fa:16:3e:33:f7:fe,dl_dst=00:00:5e:00:43:64,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
>>>>>>>>>>>>>> 2016-11-29T14:44:40.126Z|00003|odp_util(revalidator45)|WARN|  
>>>>>>>>>>>>>> flow: 
>>>>>>>>>>>>>> tcp,in_port=5,dl_vlan=3,dl_vlan_pcp=0,dl_src=fa:16:3e:33:f7:fe,dl_dst=00:00:5e:00:43:64,nw_src=10.0.1.22,nw_dst=10.0.0.3,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=53295,tp_dst=8080,tcp_flags=psh|ack
>>>>>>>>>>>>>> 2016-11-29T14:44:40.126Z|00004|odp_util(revalidator45)|WARN|  
>>>>>>>>>>>>>> masks: 
>>>>>>>>>>>>>> recirc_id=0xffffffff,reg0=0xffffffff,in_port=4294967295,dl_vlan=4095,dl_vlan_pcp=7,dl_src=ff:ff:ff:ff:ff:ff,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0xffff
>>>>>>>>>>>>>> 2016-11-29T14:44:40.126Z|00005|odp_util(revalidator45)|WARN|  
>>>>>>>>>>>>>> actions: 
>>>>>>>>>>>>>> set(ipv4(src=10.0.1.22,dst=10.0.0.3,ttl=63)),set(eth(src=b8:2a:72:de:1b:e3,dst=00:17:cb:79:2c:01)),push_mpls(label=410384,tc=0,ttl=63,bos=1,eth_type=0x8847),9,set(eth(src=fa:16:3e:33:f7:fe,dst=00:00:5e:00:43:64)),pop_mpls(eth_type=0x800),push_vlan(vid=3,pcp=0),1
>>>>>>>>>>>> 
>>>>>>>>>>>> push_mpls clears L3/L4, while pop_mpls re-populates them, and then 
>>>>>>>>>>>> processing the output to port 1 hits the assert?
>>>>>>>>>>> 
>>>>>>>>>>> That's what I'm thinking too.
>>>>>>>>>>> 
>>>>>>>>>>> Jarno, is this something you have time to look into?  It'd be 
>>>>>>>>>>> great, if
>>>>>>>>>>> you do.  I'm way behind.
>>>>>>>>>> 
>>>>>>>>>> I’m looking at this.
>>>>>>>>>> 
>>>>>>>>>> Based on the trace given it seems that:
>>>>>>>>>> 1. Packet is received on br-int port 32, which outputs it via NORMAL 
>>>>>>>>>> action over a patch port to another bridge. The only patch-port on 
>>>>>>>>>> br-int is 2 (patch-tun). The NORMAL action adds dl_vlan=1.
>>>>>>>>>> 2. br-tun receives the packet on in_port 1 (patch-int), and outputs 
>>>>>>>>>> it on it’s port 2 (patch-to-mpls)
>>>>>>>>>> 3. br-mpls receives the packet on it’s in_port 2 (patch-to-tun), 
>>>>>>>>>> does pop_vlan, and outputs to it’s port 21 (ipvpn-pp-out), which is 
>>>>>>>>>> also an patch port.
>>>>>>>>>> 4. br-mpls (?) receives the packet on it’s in_port 20 (ipvpn-pp-in), 
>>>>>>>>>> does 
>>>>>>>>>> dec_ttl,push_mpls:0x8847,load:0x644c0->OXM_OF_MPLS_LABEL[],set_field:b8:2a:72:de:1b:e3->eth_src,set_field:00:17:cb:79:2c:01->eth_dst,output:1
>>>>>>>>>> 
>>>>>>>>>> All this generates a megaflow: 
>>>>>>>>>> set(ipv4(src=10.0.1.23,dst=10.0.0.3,ttl=63)),set(eth(src=b8:2a:72:de:1b:e3,dst=00:17:cb:79:2c:01)),push_mpls(label=410816,tc=0,ttl=63,bos=1,eth_type=0x8847),9
>>>>>>>>>> 
>>>>>>>>>> This is only the beginning part of the trouble-some megaflow, in 
>>>>>>>>>> which br-int sends the packet also to another port (vlan 3), and as 
>>>>>>>>>> part of that pops the MPLS and restores the original ethernet 
>>>>>>>>>> addresses. Maybe this would happen with the trace too, if you 
>>>>>>>>>> flushed MACs before the trace?
>>>>>>>>>> 
>>>>>>>>>> The patch ports 21 and 20 appear to be in the same bridge and 
>>>>>>>>>> patched to each other. Is this the case?
>>>>>>>>>> 
>>>>>>>>>> The crashing megaflow has in_port=5,dl_vlan=3. Is this also on 
>>>>>>>>>> br-int?
>>>>>>>>>> 
>>>>>>>>>> Also, OVS 2.6 is a little bit less aggressive about avoiding 
>>>>>>>>>> recirculation after mpls operations, and I’d be interested to know 
>>>>>>>>>> if your case fails the same way with OVS 2.6?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> 
>>>>>>>>>>   Jarno
>> 
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to