Hi Joe,
Thank you for your comments! Please, find my answers below.
> I had trouble trying to review this, in part because I felt like
> changes to fix multiple issues are being placed into one patch. If
> there are separate issues being addressed, each issue should be fixed
> in a separate patch, each with a clear description of the intent of
> the change (with links to patches that introduced the breakage if
> possible!). If there is refactoring to do, consider separating the
> non-functional changes into a separate patch---when looking at the
> original patch that started this discussion, I found it difficult to
> review post-merge because the refactoring was included and I couldn't
> tell if there were actually functional changes.
>
> Given that this is taking a bit longer than I thought and we need to
> take a holistic approach to how all of these interactions should work,
> I plan to go ahead and apply the revert patch. I look forward to
> seeing a fresh series proposal that will bring back the benefits of
> the original patch.
>
> More feedback below.
>
> > Here comes the patch:
> >
> >
> > Since tunneling and forwarding via patch port uses clone action, the
> > tunneled
>
> Where does forwarding via patch port use clone action?
Sorry, that was not correct. Patch port does not use clone action. Both patch
port and native tunneling use the apply_netsed_clone_action(). That's what I
wanted to express.
> apply_nested_clone_action() also takes copies of the flow and
> base_flow, do we need to do it twice in this path? I suspect that this
> work could benefit from some further refactoring, taking note to have
> separate patches to refactor code and different patches for functional
> changes.
Thank you for indicating this. There is no need to create a backup copy of
flow. We need to store some fields of base_flow. I could use a function like
this to create backup:
static void
copy_flow_L2L3_data(struct flow *dst, const struct flow *src)
{
dst->packet_type = src->packet_type;
dst->dl_dst = src->dl_dst;
dst->dl_src = src->dl_src;
dst->dl_type = src->dl_type;
dst->nw_dst = src->nw_dst;
dst->nw_src = src->nw_src;
dst->ipv6_dst = src->ipv6_dst;
dst->ipv6_src = src->ipv6_src;
dst->nw_tos = src->nw_tos;
dst->nw_ttl = src->nw_ttl;
dst->nw_proto = src->nw_proto;
dst->tp_dst = src->tp_dst;
dst->tp_src = src->tp_src;
}
...
/* Backup. */
copy_flow_L2L3_data(&old_base_flow, &ctx->base_flow);
...
/* Restore. */
copy_flow_L2L3_data(&ctx->base_flow, &old_base_flow);
> >
> > err = tnl_route_lookup_flow(flow, &d_ip6, &s_ip6, &out_dev);
> > if (err) {
> > @@ -3216,16 +3248,57 @@ build_tunnel_send(struct xlate_ctx *ctx, const
> > struct xport *xport,
> > tnl_push_data.tnl_port = odp_to_u32(tunnel_odp_port);
> > tnl_push_data.out_port = odp_to_u32(out_dev->odp_port);
> >
> > + /* After tunnel header has been added, MAC and IP data of flow and
> > + * base_flow need to be set properly, since there is not recirculation
> > + * any more when sending packet to tunnel. */
> > + if (!eth_addr_is_zero(ctx->xin->flow.dl_dst)) {
> > + ctx->xin->flow.dl_dst = dmac;
> > + ctx->base_flow.dl_dst = ctx->xin->flow.dl_dst;
> > + }
>
> What's the expectation if ctx->xin->flow.dl_dst is zero? Is that for
> L3 tunneled packets?
No, it's not for L3 tunneled packets. Without this push_pop_tunnel and
push_pop_tunnel_ipv6 unit tests do fail. This needs to be fixed.
783: tunnel_push_pop - action FAILED
(tunnel-push-pop.at:109)
785: tunnel_push_pop_ipv6 - action FAILED
(tunnel-push-pop-ipv6.at:92)
./tunnel-push-pop.at:108: ovs-appctl ofproto/trace ovs-dummy
'in_port(2),eth_type(0x0800),ipv4(src=1.1.3.88,dst=1.1.
3.112,proto=47,tos=0,ttl=64,frag=no)'
stdout:
Flow:
ip,in_port=LOCAL,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=1.1.3.88,nw_dst=1.1.
3.112,nw_proto=47,nw_tos=0,nw_ecn=0,nw_ttl=64
bridge("int-br")
----------------
0. priority 32768
output:2
-> output to native tunnel
-> tunneling to 1.1.2.92 via br0
-> tunneling from aa:55:aa:55:00:00 1.1.2.88 to f8:bc:12:44:34:b6 1.1.2.92
bridge("br0")
-------------
0. priority 32768
NORMAL
-> learned port is input port, dropping
Final flow: unchanged
Megaflow:
recirc_id=0,ip,in_port=LOCAL,vlan_tci=0x0000/0x1fff,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_e
cn=0,nw_frag=no
Datapath actions:
clone(drop),tnl_push(tnl_port(4789),header(size=50,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:5
5:00:00,dl_type=0x0800),ipv4(src=1.1.2.88,dst=1.1.2.92,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x
0),vxlan(flags=0x8000000,vni=0x7b)),out_port(100))
./tunnel-push-pop.at:109: tail -1 stdout
--- - 2017-05-11 14:40:42.205776583 +0200
+++
/home/ezolbal/work/general_L3_tunneling/ovs/tests/testsuite.dir/at-groups/783/stdout
2017-05-11 14:40:42.200387095 +0200
@@ -1,2 +1,2 @@
-Datapath actions:
clone(tnl_push(tnl_port(4789),header(size=50,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.2.88,dst=1.1.2.92,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x7b)),out_port(100)),1)
+Datapath actions:
clone(drop),tnl_push(tnl_port(4789),header(size=50,type=4,eth(dst=f8:bc:12:44:34:b6,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.2.88,dst=1.1.2.92,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x7b)),out_port(100))
> > +
> > + ctx->xin->flow.dl_src = smac;
> > + ctx->base_flow.dl_src = ctx->xin->flow.dl_src;
> > +
> > + propagate_tunnel_data_to_flow(&ctx->xin->flow, &ctx->base_flow);
> > +
> > + if (!tnl_params.is_ipv6) {
> > + ctx->xin->flow.dl_type = htons(ETH_TYPE_IP);
> > + } else {
> > + ctx->xin->flow.dl_type = htons(ETH_TYPE_IPV6);
> > + }
>
> When writing if (!condition) { ... } else { ... }, please consider
> whether you can swap the conditions and remove the negation. It's less
> cognitive load if we don't have to negate a negative condition to
> consider which condition the "else" applies to. (This otherwise looks
> correct, so just a style request.)
Ok. I'm going to change this.
> > +
> > + if (tnl_push_data.tnl_type == OVS_VPORT_TYPE_GRE) {
> > + ctx->xin->flow.nw_proto = IPPROTO_GRE;
> > + } else {
> > + ctx->xin->flow.nw_proto = IPPROTO_UDP;
> > + }
> > + ctx->base_flow.nw_proto = ctx->xin->flow.nw_proto;
>
> Should we use a switch () statement here to ensure that newer tunnel
> ports are properly handled in this case if/when they are added? What
> about things like LISP or STT?
I can use a switch case:
switch (tnl_push_data.tnl_type) {
case OVS_VPORT_TYPE_GRE:
ctx->xin->flow.nw_proto = IPPROTO_GRE;
break;
case OVS_VPORT_TYPE_VXLAN:
case OVS_VPORT_TYPE_GENEVE:
ctx->xin->flow.nw_proto = IPPROTO_UDP;
break;
case OVS_VPORT_TYPE_LISP:
/* XXX: UDP? */
case OVS_VPORT_TYPE_STT:
/* XXX: TCP? */
default:
break;
}
But, how to handle LISP and STT? I have not seen any build_header() function
for LISP and STT in netdev-vport.c:
static const struct vport_class vport_classes[] = {
TUNNEL_CLASS("geneve", "genev_sys", netdev_geneve_build_header,
netdev_tnl_push_udp_header,
netdev_geneve_pop_header),
TUNNEL_CLASS("gre", "gre_sys", netdev_gre_build_header,
netdev_gre_push_header,
netdev_gre_pop_header),
TUNNEL_CLASS("vxlan", "vxlan_sys", netdev_vxlan_build_header,
netdev_tnl_push_udp_header,
netdev_vxlan_pop_header),
TUNNEL_CLASS("lisp", "lisp_sys", NULL, NULL, NULL),
TUNNEL_CLASS("stt", "stt_sys", NULL, NULL, NULL),
};
Are LISP and STT valid tunnel types for native tunneling? If I'm not wrong,
build_tunnel_send() is invoked when native tunneling is used.
> > +
> > size_t push_action_size = 0;
> > size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
> > OVS_ACTION_ATTR_CLONE);
> > odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
> > push_action_size = ctx->odp_actions->size;
> > +
> > + if (ctx->rule) {
> > + old_ths = ctx->rule->tunnel_header_size;
> > + ctx->rule->tunnel_header_size = tnl_push_data.header_len;
> > + }
> > +
> > apply_nested_clone_actions(ctx, xport, out_dev);
> > if (ctx->odp_actions->size > push_action_size) {
> > /* Update the CLONE action only when combined */
> > nl_msg_end_nested(ctx->odp_actions, clone_ofs);
> > }
>
> If there are no actions on the underlay bridge (ie drop), the else
> condition here should probably reset the nl_msg nesting so that we
> don't generate any actions to the datapath at all. You may also
> consider adding a xlate_report() call to explain why the drop is
> occuring - this should help debugging later on using ofproto/trace.
I'm not sure about this.
Sugesh, how should this be resolved?
> > if (credit_counts) {
> > + uint64_t stats_n_bytes = 0;
> > +
> > + if (rule->truncated_packet_size) {
> > + stats_n_bytes = MIN(rule->truncated_packet_size,
> > stats->n_bytes);
> > + } else {
> > + stats_n_bytes = stats->n_bytes;
> > + }
>
> Is this fixing a separate issue? I'm confused about whether truncated
> packet stats are being correctly attributed today on master. If so, I
> think this should split out into a separate patch. You might also
> consider whether it makes more sense to modify 'stats' earlier in the
> path so that each rule doesn't need to individually apply the stats
> adjustment. I could imagine a store/restore of the stats plus
> modifying for truncation during the xlate_output_trunc_action()
> processing rather than pushing this complexity all the way into the
> rule stats attribution.
Incorrect bytes statistics on the underlay bridge is an unwanted side-effect of
the original "Avoid recirculate" commit. By exluding recirculation, there is no
upcall for the tunnelled packets, and the packet size is not set by
upcall_xlate() again:
static void
upcall_xlate(struct udpif *udpif, struct upcall *upcall,
struct ofpbuf *odp_actions, struct flow_wildcards *wc)
{
struct dpif_flow_stats stats;
struct xlate_in xin;
stats.n_packets = 1;
stats.n_bytes = dp_packet_size(upcall->packet);
stats.used = time_msec();
stats.tcp_flags = ntohs(upcall->flow->tcp_flags);
xlate_in_init(&xin, upcall->ofproto,
ofproto_dpif_get_tables_version(upcall->ofproto),
upcall->flow, upcall->in_port, NULL,
stats.tcp_flags, upcall->packet, wc, odp_actions);
if (upcall->type == DPIF_UC_MISS) {
xin.resubmit_stats = &stats;
Here upcall_xlate() sets xin.resubmint_stats which will be used for calculating
statistic. These are const values, that's why I have not updated them earlier.
/* If nonnull, flow translation credits the specified statistics to each
* rule reached through a resubmit or OFPP_TABLE action.
*
* This is normally null so the client has to set it manually after
* calling xlate_in_init(). */
const struct dpif_flow_stats *resubmit_stats;
If it does not harm, then the const modifier could be removed and stats updated
at earlier stage.
Best regards,
Zoltan
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev