Jarno,

2017-01-27, Jarno Rajahalme:
Cleanest solution would be to use the new clone action for the datapath, which, if important, we could also backport to OVS 2.5.x.

Yes, clone is indeed much more elegant.


Datapath CLONE is yet to be upstreamed, though, may take a little time.

Even before net-dev upstreaming, there doesn't seem to even be a clone implementation in ovs/datapath yet, right ?

-Thomas




2017-01-06, Jarno Rajahalme:
I sent a patch to do this fix on OVS master. IMO we should also make the datapath more flexible so that it would be able to POP back to IP after PUSH actions on an IP packet in the same action list. But that will not make it to OVS 2.7.

I would appreciate if Thomas could apply the and test!

  Jarno

On Dec 14, 2016, at 1:52 PM, Jarno Rajahalme <[email protected] <mailto:[email protected]>> wrote:


On Dec 13, 2016, at 8:44 PM, Takashi YAMAMOTO <[email protected] <mailto:[email protected]>> wrote:



On Wed, Dec 14, 2016 at 9:02 AM, Jarno Rajahalme<[email protected] <mailto:[email protected]>>wrote:


    On Dec 13, 2016, at 3:32 PM, Takashi YAMAMOTO
    <[email protected] <mailto:[email protected]>> wrote:



    On Tue, Dec 13, 2016 at 6:49 PM, Thomas
    Morin<[email protected]
    <mailto:[email protected]>>wrote:

        Hi Jarno,

        2016-12-10, Jarno Rajahalme:
        On Dec 9, 2016, at 7:04 AM, Thomas Morin
        <[email protected]
        <mailto:[email protected]>> wrote:

        2016-12-09, Thomas Morin:
        In the same setup as the one on which the bug was
        observed, [...]

        I was confused, I in fact tested the patch (branch-2.5)
        on a /different/ setup as the one on which we hit the
        bug, using MPLS over a GRE tunnel port, rather than
        plain MPLS over an eth port.
        Sorry if any confusion arised... I can test on the
        first setup if relevant.


        Maybe the kernel datapath does not support MPLS over a
        GRE tunnel port. Having ‘dmesg’ output for the test run
        might reveal why the actions validation fails.

        The dmesg output was the following:

        [171295.258939] openvswitch: netlink: Flow actions may
        not be safe on all matching packets.

        I've tested the patch on the platform on which the bug
        was initially hit (*not* using MPLS/GRE), and I have the
        following a few times in the logs right after I do an
        "ovs-appctl fdb/flush":

        2016-12-13T09:44:08.449Z|00001|dpif(handler68)|WARN|Dropped
        3 log messages in last 1 seconds (most recently, 1
        seconds ago) due to excessive rate
        2016-12-13T09:44:08.449Z|00002|dpif(handler68)|WARN|system@ovs-system:
        failed to put[create] (Invalid argument)
        ufid:f046c4c4-b97f-436d-bd7c-91ed307275ac
        
recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(9),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=fa:16:3e:61:c0:b5,dst=00:00:5e:00:43:64),eth_type(0x0800),ipv4(src=10.0.1.29,dst=10.0.0.3,proto=6,tos=0/0xfc,ttl=64,frag=no),tcp(src=54253,dst=8080),tcp_flags(0/0),
        
actions:set(ipv4(src=10.0.1.29,dst=10.0.0.3,ttl=63)),set(eth(src=b8:2a:72:de:1b:e3,dst=00:17:cb:79:2c:01)),push_mpls(label=433680,tc=0,ttl=63,bos=1,eth_type=0x8847),7,set(eth(src=fa:16:3e:61:c0:b5,dst=00:00:5e:00:43:64)),pop_mpls(eth_type=0x800),set(ipv4(src=10.0.1.29,dst=10.0.0.3,tos=0/0xfc,ttl=64)),push_vlan(vid=1,pcp=0),3,8,pop_vlan,13

        And dmesg:
        [926833.612372] openvswitch: netlink: Flow actions may
        not be safe on all matching packets.


    it's complaining about set ipv4 after pop_mpls because it
    doesn't know about the "restored" l3.
    while we can improve the kernel, i guess we need to stick
    with recirc for now.

    This should do it? Does not break any existing tests on
    branch-2.5, but I did not create a test for this yet.

    diff --git a/ofproto/ofproto-dpif-xlate.c
    b/ofproto/ofproto-dpif-xlate.c
    index fb25f63..6ee2a72 100644
    --- a/ofproto/ofproto-dpif-xlate.c
    +++ b/ofproto/ofproto-dpif-xlate.c
    @@ -2899,6 +2899,15 @@ xlate_commit_actions(struct xlate_ctx *ctx)
     {
         bool use_masked = ctx->xbridge->support.masked_set_action;
    +    /* An MPLS packet can be implicitly popped back to a
    non-MPLS packet, if a
    +     * patch port peer or a group bucket pushed MPLS.  Set
    the 'was_mpls' flag
    +     * also in that case. */
    +    if (eth_type_mpls(ctx->base_flow.dl_type)
    +        && !eth_type_mpls(ctx->xin->flow.dl_type)
    +        && ctx->xbridge->support.odp.recirc) {
    +  ctx->was_mpls = true;
    +    }
    +


i guess this check needs to be somewhere around "ctx->was_mpls = old_was_mpls" in
affected functions.


Right, as that is where the implicit MPLS POP action happens, when the ‘ctx->xin->flow’ is restored.

  Jarno

     ctx->xout->slow |= commit_odp_actions(&ctx->xin->flow,
    &ctx->base_flow,
     ctx->odp_actions, ctx->wc,
     use_masked);

    Jarno




        -Thomas





        On Dec 1, 2016, at 5:57 PM, Jarno Rajahalme
        <[email protected] <mailto:[email protected]>> wrote:


        On Nov 30, 2016, at 8:50 PM, Ben Pfaff <[email protected]
        <mailto:[email protected]>> wrote:

        On Wed, Nov 30, 2016 at 06:58:57PM -0800, Jarno
        Rajahalme wrote:

        On Nov 30, 2016, at 8:41 AM, Ben Pfaff
        <[email protected] <mailto:[email protected]>> wrote:

        On Wed, Nov 30, 2016 at 12:05:46PM +0100, Thomas
        Morin wrote:
        Hi Ben,

        2016-11-30, Ben Pfaff:
        Do you have any idea what in your OpenFlow
        pipeline might do that,
        i.e. is there anything especially tricky in the
        OpenFlow flows?

        Are you willing to show us your OpenFlow flow
        table?

        The setup involves three OVS bridges connected
        with patch-ports: br-int --
        br-tun -- br-mpls, with the traffic that
        triggers the assert being processed
        by br-int with a NORMAL action (ie. MAC learning).

        The flows in this setup aren't particularly
        tricky, I think, although I'm
        not sure what qualifies as tricky or non-tricky :)

        Anyway, since yesterday I managed to identify
        the event that trigger the
        assert, by adding more logging before the assert
        and displaying the actions
        taken:

        
2016-11-29T14:44:40.126Z|00001|odp_util(revalidator45)|WARN|commit_set_ipv4_action
        assert would fail....
        2016-11-29T14:44:40.126Z|00002|odp_util(revalidator45)|WARN|
         base_flow:
        
ip,in_port=5,dl_vlan=3,dl_vlan_pcp=0,dl_src=fa:16:3e:33:f7:fe,dl_dst=00:00:5e:00:43:64,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
        2016-11-29T14:44:40.126Z|00003|odp_util(revalidator45)|WARN|
         flow:
        
tcp,in_port=5,dl_vlan=3,dl_vlan_pcp=0,dl_src=fa:16:3e:33:f7:fe,dl_dst=00:00:5e:00:43:64,nw_src=10.0.1.22,nw_dst=10.0.0.3,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=53295,tp_dst=8080,tcp_flags=psh|ack
        2016-11-29T14:44:40.126Z|00004|odp_util(revalidator45)|WARN|
         masks:
        
recirc_id=0xffffffff,reg0=0xffffffff,in_port=4294967295,dl_vlan=4095,dl_vlan_pcp=7,dl_src=ff:ff:ff:ff:ff:ff,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0xffff
        2016-11-29T14:44:40.126Z|00005|odp_util(revalidator45)|WARN|
         actions:
        
set(ipv4(src=10.0.1.22,dst=10.0.0.3,ttl=63)),set(eth(src=b8:2a:72:de:1b:e3,dst=00:17:cb:79:2c:01)),push_mpls(label=410384,tc=0,ttl=63,bos=1,eth_type=0x8847),9,set(eth(src=fa:16:3e:33:f7:fe,dst=00:00:5e:00:43:64)),pop_mpls(eth_type=0x800),push_vlan(vid=3,pcp=0),1

        push_mpls clears L3/L4, while pop_mpls
        re-populates them, and then processing the output
        to port 1 hits the assert?

        That's what I'm thinking too.

        Jarno, is this something you have time to look
        into? It'd be great, if
        you do.  I'm way behind.

        I’m looking at this.

        Based on the trace given it seems that:
        1. Packet is received on br-int port 32, which
        outputs it via NORMAL action over a patch port to
        another bridge. The only patch-port on br-int is 2
        (patch-tun). The NORMAL action adds dl_vlan=1.
        2. br-tun receives the packet on in_port 1
        (patch-int), and outputs it on it’s port 2
        (patch-to-mpls)
        3. br-mpls receives the packet on it’s in_port 2
        (patch-to-tun), does pop_vlan, and outputs to it’s
        port 21 (ipvpn-pp-out), which is also an patch port.
        4. br-mpls (?) receives the packet on it’s in_port
        20 (ipvpn-pp-in), does
        
dec_ttl,push_mpls:0x8847,load:0x644c0->OXM_OF_MPLS_LABEL[],set_field:b8:2a:72:de:1b:e3->eth_src,set_field:00:17:cb:79:2c:01->eth_dst,output:1

        All this generates a megaflow:
        
set(ipv4(src=10.0.1.23,dst=10.0.0.3,ttl=63)),set(eth(src=b8:2a:72:de:1b:e3,dst=00:17:cb:79:2c:01)),push_mpls(label=410816,tc=0,ttl=63,bos=1,eth_type=0x8847),9

        This is only the beginning part of the trouble-some
        megaflow, in which br-int sends the packet also to
        another port (vlan 3), and as part of that pops the
        MPLS and restores the original ethernet addresses.
        Maybe this would happen with the trace too, if you
        flushed MACs before the trace?

        The patch ports 21 and 20 appear to be in the same
        bridge and patched to each other. Is this the case?

        The crashing megaflow has in_port=5,dl_vlan=3. Is
        this also on br-int?

        Also, OVS 2.6 is a little bit less aggressive about
        avoiding recirculation after mpls operations, and
        I’d be interested to know if your case fails the
        same way with OVS 2.6?

        Thanks,

        Jarno





_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to