Re: [ovs-dev] [PATCH v5] tunneling: Avoid recirculation on datapath by computing the recirculate actions at translate time.

Joe Stringer Thu, 11 May 2017 09:39:48 -0700

On 11 May 2017 at 09:04, Zoltán Balogh <zoltan.bal...@ericsson.com> wrote:
> Hi Joe,
>
> Thank you for your comments! Please, find my answers below.
>
>> I had trouble trying to review this, in part because I felt like
>> changes to fix multiple issues are being placed into one patch. If
>> there are separate issues being addressed, each issue should be fixed
>> in a separate patch, each with a clear description of the intent of
>> the change (with links to patches that introduced the breakage if
>> possible!). If there is refactoring to do, consider separating the
>> non-functional changes into a separate patch---when looking at the
>> original patch that started this discussion, I found it difficult to
>> review post-merge because the refactoring was included and I couldn't
>> tell if there were actually functional changes.
>>
>> Given that this is taking a bit longer than I thought and we need to
>> take a holistic approach to how all of these interactions should work,
>> I plan to go ahead and apply the revert patch. I look forward to
>> seeing a fresh series proposal that will bring back the benefits of
>> the original patch.
>>
>> More feedback below.
>>
>> > Here comes the patch:
>> >
>> >
>> > Since tunneling and forwarding via patch port uses clone action, the 
>> > tunneled
>>
>> Where does forwarding via patch port use clone action?
>
> Sorry, that was not correct. Patch port does not use clone action. Both patch
> port and native tunneling use the apply_netsed_clone_action(). That's what I
> wanted to express.


OK. I find that function name a bit confusing given it doesn't relate to clone.

>> apply_nested_clone_action() also takes copies of the flow and
>> base_flow, do we need to do it twice in this path? I suspect that this
>> work could benefit from some further refactoring, taking note to have
>> separate patches to refactor code and different patches for functional
>> changes.
>
> Thank you for indicating this. There is no need to create a backup copy of
> flow. We need to store some fields of base_flow. I could use a function like
> this to create backup:
>
> static void
> copy_flow_L2L3_data(struct flow *dst, const struct flow *src)
> {
>     dst->packet_type = src->packet_type;
>     dst->dl_dst = src->dl_dst;
>     dst->dl_src = src->dl_src;
>     dst->dl_type = src->dl_type;
>     dst->nw_dst = src->nw_dst;
>     dst->nw_src = src->nw_src;
>     dst->ipv6_dst = src->ipv6_dst;
>     dst->ipv6_src = src->ipv6_src;
>     dst->nw_tos = src->nw_tos;
>     dst->nw_ttl = src->nw_ttl;
>     dst->nw_proto = src->nw_proto;
>     dst->tp_dst = src->tp_dst;
>     dst->tp_src = src->tp_src;
> }

Ah, that's not quite what I meant. Copying the whole flow seems
generically correct as per before. The above may be more efficient,
but it also seems easier to screw up if 'struct flow' fields are
changed. I think we can start with copying the whole flow, and
consider an optimization like this separately.

My concern was that both build_tunnel_send() and
apply_nested_clone_action() was doing the store/restore of the flows.

> ...
>
> /* Backup. */
> copy_flow_L2L3_data(&old_base_flow, &ctx->base_flow);
>
> ...
>
> /* Restore. */
> copy_flow_L2L3_data(&ctx->base_flow, &old_base_flow);
>
>> >
>> >      err = tnl_route_lookup_flow(flow, &d_ip6, &s_ip6, &out_dev);
>> >      if (err) {
>> > @@ -3216,16 +3248,57 @@ build_tunnel_send(struct xlate_ctx *ctx, const 
>> > struct xport *xport,
>> >      tnl_push_data.tnl_port = odp_to_u32(tunnel_odp_port);
>> >      tnl_push_data.out_port = odp_to_u32(out_dev->odp_port);
>> >
>> > +    /* After tunnel header has been added, MAC and IP data of flow and
>> > +     * base_flow need to be set properly, since there is not recirculation
>> > +     * any more when sending packet to tunnel. */
>> > +    if (!eth_addr_is_zero(ctx->xin->flow.dl_dst)) {
>> > +        ctx->xin->flow.dl_dst = dmac;
>> > +        ctx->base_flow.dl_dst = ctx->xin->flow.dl_dst;
>> > +    }
>>
>> What's the expectation if ctx->xin->flow.dl_dst is zero? Is that for
>> L3 tunneled packets?
>
> No, it's not for L3 tunneled packets. Without this push_pop_tunnel and
> push_pop_tunnel_ipv6 unit tests do fail. This needs to be fixed.
>
> 783: tunnel_push_pop - action                        FAILED 
> (tunnel-push-pop.at:109)
> 785: tunnel_push_pop_ipv6 - action                   FAILED 
> (tunnel-push-pop-ipv6.at:92)

OK, I still don't quite understand how we get into this state of
dl_dst is zero in the first place. If it's unexpected, let's fix the
bug that causes this state. If it's expected, maybe we can add a short
comment above the if statement which describes why the dl_dst may be
zero and why it makes sense to leave the flow's mac alone for
tunneling.

> ./tunnel-push-pop.at:108: ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(2),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.
> 3.112,proto=47,tos=0,ttl=64,frag=no)'
> stdout:
> Flow: 
> ip,in_port=LOCAL,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=1.1.3.88,nw_dst=1.1.
> 3.112,nw_proto=47,nw_tos=0,nw_ecn=0,nw_ttl=64
>
> bridge("int-br")
> ----------------
>  0. priority 32768
>     output:2
>      -> output to native tunnel
>      -> tunneling to 1.1.2.92 via br0
>      -> tunneling from aa:55:aa:55:00:00 1.1.2.88 to f8:bc:12:44:34:b6 
> 1.1.2.92
>
> bridge("br0")
> -------------
>  0. priority 32768
>     NORMAL
>      -> learned port is input port, dropping
>
> Final flow: unchanged
> Megaflow: 
> recirc_id=0,ip,in_port=LOCAL,vlan_tci=0x0000/0x1fff,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_e
> cn=0,nw_frag=no
> Datapath actions: 
> clone(drop),tnl_push(tnl_port(4789),header(size=50,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:5
> 5:00:00,dl_type=0x0800),ipv4(src=1.1.2.88,dst=1.1.2.92,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x
> 0),vxlan(flags=0x8000000,vni=0x7b)),out_port(100))
> ./tunnel-push-pop.at:109: tail -1 stdout
> --- -   2017-05-11 14:40:42.205776583 +0200
> +++ 
> /home/ezolbal/work/general_L3_tunneling/ovs/tests/testsuite.dir/at-groups/783/stdout
>         2017-05-11 14:40:42.200387095 +0200
> @@ -1,2 +1,2 @@
> -Datapath actions: 
> clone(tnl_push(tnl_port(4789),header(size=50,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.2.88,dst=1.1.2.92,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x7b)),out_port(100)),1)
> +Datapath actions: 
> clone(drop),tnl_push(tnl_port(4789),header(size=50,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.2.88,dst=1.1.2.92,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x7b)),out_port(100))
>
>
>> > +
>> > +    ctx->xin->flow.dl_src = smac;
>> > +    ctx->base_flow.dl_src = ctx->xin->flow.dl_src;
>> > +
>> > +    propagate_tunnel_data_to_flow(&ctx->xin->flow, &ctx->base_flow);
>> > +
>> > +    if (!tnl_params.is_ipv6) {
>> > +        ctx->xin->flow.dl_type = htons(ETH_TYPE_IP);
>> > +    } else {
>> > +        ctx->xin->flow.dl_type = htons(ETH_TYPE_IPV6);
>> > +    }
>>
>> When writing if (!condition) { ... } else { ... }, please consider
>> whether you can swap the conditions and remove the negation. It's less
>> cognitive load if we don't have to negate a negative condition to
>> consider which condition the "else" applies to. (This otherwise looks
>> correct, so just a style request.)
>
> Ok. I'm going to change this.

Thanks.

>> > +
>> > +    if (tnl_push_data.tnl_type == OVS_VPORT_TYPE_GRE) {
>> > +        ctx->xin->flow.nw_proto = IPPROTO_GRE;
>> > +    } else {
>> > +        ctx->xin->flow.nw_proto = IPPROTO_UDP;
>> > +    }
>> > +    ctx->base_flow.nw_proto = ctx->xin->flow.nw_proto;
>>
>> Should we use a switch () statement here to ensure that newer tunnel
>> ports are properly handled in this case if/when they are added? What
>> about things like LISP or STT?
>
> I can use a switch case:
>
>     switch (tnl_push_data.tnl_type) {
>     case OVS_VPORT_TYPE_GRE:
>         ctx->xin->flow.nw_proto = IPPROTO_GRE;
>         break;
>     case OVS_VPORT_TYPE_VXLAN:
>     case OVS_VPORT_TYPE_GENEVE:
>         ctx->xin->flow.nw_proto = IPPROTO_UDP;
>         break;
>     case OVS_VPORT_TYPE_LISP:
>          /* XXX: UDP? */
>     case OVS_VPORT_TYPE_STT:
>          /* XXX: TCP? */
>     default:
>         break;
>     }
>
> But, how to handle LISP and STT? I have not seen any build_header() function
> for LISP and STT in netdev-vport.c:
>
>     static const struct vport_class vport_classes[] = {
>         TUNNEL_CLASS("geneve", "genev_sys", netdev_geneve_build_header,
>                                             netdev_tnl_push_udp_header,
>                                             netdev_geneve_pop_header),
>         TUNNEL_CLASS("gre", "gre_sys", netdev_gre_build_header,
>                                        netdev_gre_push_header,
>                                        netdev_gre_pop_header),
>         TUNNEL_CLASS("vxlan", "vxlan_sys", netdev_vxlan_build_header,
>                                            netdev_tnl_push_udp_header,
>                                            netdev_vxlan_pop_header),
>         TUNNEL_CLASS("lisp", "lisp_sys", NULL, NULL, NULL),
>         TUNNEL_CLASS("stt", "stt_sys", NULL, NULL, NULL),
>     };
>
> Are LISP and STT valid tunnel types for native tunneling? If I'm not wrong,
> build_tunnel_send() is invoked when native tunneling is used.

Ah! Sugesh pointed this out to me yesterday. I though there was STT
but you're right.

For LISP/STT it can just call OVS_NOT_REACHED(). Then it can fail in a
pretty obvious way if someone attempts to use those tunnel types in
future.

>> > +
>> >      size_t push_action_size = 0;
>> >      size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
>> >                                             OVS_ACTION_ATTR_CLONE);
>> >      odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
>> >      push_action_size = ctx->odp_actions->size;
>> > +
>> > +    if (ctx->rule) {
>> > +        old_ths = ctx->rule->tunnel_header_size;
>> > +        ctx->rule->tunnel_header_size = tnl_push_data.header_len;
>> > +    }
>> > +
>> >      apply_nested_clone_actions(ctx, xport, out_dev);
>> >      if (ctx->odp_actions->size > push_action_size) {
>> >          /* Update the CLONE action only when combined */
>> >          nl_msg_end_nested(ctx->odp_actions, clone_ofs);
>> >      }
>>
>> If there are no actions on the underlay bridge (ie drop), the else
>> condition here should probably reset the nl_msg nesting so that we
>> don't generate any actions to the datapath at all. You may also
>> consider adding a xlate_report() call to explain why the drop is
>> occuring - this should help debugging later on using ofproto/trace.
>
> I'm not sure about this.
> Sugesh, how should this be resolved?
>
>> >      if (credit_counts) {
>> > +        uint64_t stats_n_bytes = 0;
>> > +
>> > +        if (rule->truncated_packet_size) {
>> > +            stats_n_bytes = MIN(rule->truncated_packet_size, 
>> > stats->n_bytes);
>> > +        } else {
>> > +            stats_n_bytes = stats->n_bytes;
>> > +        }
>>
>> Is this fixing a separate issue? I'm confused about whether truncated
>> packet stats are being correctly attributed today on master. If so, I
>> think this should split out into a separate patch. You might also
>> consider whether it makes more sense to modify 'stats' earlier in the
>> path so that each rule doesn't need to individually apply the stats
>> adjustment. I could imagine a store/restore of the stats plus
>> modifying for truncation during the xlate_output_trunc_action()
>> processing rather than pushing this complexity all the way into the
>> rule stats attribution.
>
> Incorrect bytes statistics on the underlay bridge is an unwanted side-effect 
> of
> the original "Avoid recirculate" commit. By exluding recirculation, there is 
> no
> upcall for the tunnelled packets, and the packet size is not set by
> upcall_xlate() again:
>
>   static void
>   upcall_xlate(struct udpif *udpif, struct upcall *upcall,
>                struct ofpbuf *odp_actions, struct flow_wildcards *wc)
>   {
>       struct dpif_flow_stats stats;
>       struct xlate_in xin;
>
>       stats.n_packets = 1;
>       stats.n_bytes = dp_packet_size(upcall->packet);
>       stats.used = time_msec();
>       stats.tcp_flags = ntohs(upcall->flow->tcp_flags);
>
>       xlate_in_init(&xin, upcall->ofproto,
>                     ofproto_dpif_get_tables_version(upcall->ofproto),
>                     upcall->flow, upcall->in_port, NULL,
>                     stats.tcp_flags, upcall->packet, wc, odp_actions);
>
>       if (upcall->type == DPIF_UC_MISS) {
>           xin.resubmit_stats = &stats;
>
> Here upcall_xlate() sets xin.resubmint_stats which will be used for 
> calculating
> statistic. These are const values, that's why I have not updated them earlier.
>
>     /* If nonnull, flow translation credits the specified statistics to each
>      * rule reached through a resubmit or OFPP_TABLE action.
>      *
>      * This is normally null so the client has to set it manually after
>      * calling xlate_in_init(). */
>     const struct dpif_flow_stats *resubmit_stats;
>
> If it does not harm, then the const modifier could be removed and stats 
> updated
> at earlier stage.

I see, thanks for the explanation. Mostly I'm just trying to push the
question of "how can we make this code easier to read and
understand?". Const is nice because it tells you these fields won't
change. But I think it's probably a little easier if the truncation of
the statistics is performed where the truncation occurs, so that the
core stats attribution logic can be oblivious of this particular
feature.

Cheers,
Joe
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v5] tunneling: Avoid recirculation on datapath by computing the recirculate actions at translate time.

Reply via email to