Hi Joe,

Thank you for the comments. Please find my answers below.

Regards
_Sugesh
> -----Original Message-----
> From: Zoltán Balogh [mailto:[email protected]]
> Sent: Monday, July 17, 2017 3:38 PM
> To: Joe Stringer <[email protected]>; Chandran, Sugesh <[email protected]>
> Cc: ovs dev <[email protected]>; Andy Zhou <[email protected]>
> Subject: RE: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining
> recirc actions at xlate.
>
> Hi Joe,
>
> I used the setup below to measure Tx performance using a single core and
> 2 x 10G Ethernet links. I generated traffic of 64-byte packets to utilize
> the 10G bandwidth and measured the Tx processing cost per transmitted
> packet in nsec.
>
>          +-------------+
>    dpdk0 |             |
>       -->o    br-in    |
>          |             o--> gre0
>          +-------------+
>
>                  --> LOCAL
>          +-----------o-+
>          |             | dpdk1
>          |    br-p1    o-->
>          |             |
>          +-------------+
>
> This is the result of OVS master with DPDK 16.11.2:
>
> # dpdk0
>
>  RX packets         : 7037641.60 / sec
>  RX packet errors   : 0 / sec
>  RX packets dropped : 7730632.90 / sec
>  RX rate            : 402.69 MB/sec
>
> # dpdk1
>
>  TX packets         : 7037641.60 / sec
>  TX packet errors   : 0 / sec
>  TX packets dropped : 0 / sec
>  TX rate            : 657.73 MB/sec
>  TX processing cost per TX packet in nsec : 142.09
>
> This is the result of OVS master + patch with DPDK 16.11.2:
>
> # dpdk0
>
>  RX packets         : 9386809.60 / sec
>  RX packet errors   : 0 / sec
>  RX packets dropped : 5381496.40 / sec
>  RX rate            : 537.11 MB/sec
>
> # dpdk1
>
>  TX packets         : 9386809.60 / sec
>  TX packet errors   : 0 / sec
>  TX packets dropped : 0 / sec
>  TX rate            : 877.29 MB/sec
>  TX processing cost per TX packet in nsec : 106.53
>
> As you can see, the number of transmitted packets per second increased
> from 7.0M to 9.4M; the gain is above 30%.
>
> Best regards,
> Zoltan
>
> > -----Original Message-----
> > From: Joe Stringer [mailto:[email protected]]
> > Sent: Saturday, July 15, 2017 1:46 AM
> > To: Sugesh Chandran <[email protected]>
> > Cc: ovs dev <[email protected]>; Andy Zhou <[email protected]>; Zoltán
> > Balogh <[email protected]>
> > Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by
> > combining recirc actions at xlate.
> >
> > On 4 July 2017 at 02:46, Sugesh Chandran <[email protected]> wrote:
> > > This patch set removes the recirculation of encapsulated tunnel
> > > packets if possible. It is done by computing the post-tunnel actions
> > > at the time of translation. The combined nested action set is
> > > programmed in the datapath using the CLONE action.
> > >
> > > Signed-off-by: Sugesh Chandran <[email protected]>
> > > Signed-off-by: Zoltán Balogh <[email protected]>
> > > Co-authored-by: Zoltán Balogh <[email protected]>
> > > ---
> >
> > Hi Sugesh, Zoltán,
> >
> > Thanks for working on this, it's subtle code but it seems like you've
> > found a useful optimization here. Hopefully we can move this forward
> > soon.

[Sugesh] Thank you, Joe! We will try our best to cover most of the corner
cases.

> > Would you be able to put your performance numbers into the commit
> > message here? They could be useful to refer back to in the future.

[Sugesh] Sure. Zoltan has already run the performance benchmarking and
shared the results. I will add them in the next version of the patch.

> > This patch needs rebasing.

[Sugesh] Will do that in the next version.

> > More feedback below.
> > >  lib/dpif-netdev.c                  |  18 +--
> > >  ofproto/ofproto-dpif-xlate-cache.c |  14 +-
> > >  ofproto/ofproto-dpif-xlate-cache.h |  12 +-
> > >  ofproto/ofproto-dpif-xlate.c       | 295 ++++++++++++++++++++++++++++++++++++-
> > >  ofproto/ofproto-dpif-xlate.h       |   1 +
> > >  ofproto/ofproto-dpif.c             |   3 +-
> > >  tests/packet-type-aware.at         |  24 +--
> > >  7 files changed, 324 insertions(+), 43 deletions(-)
> > >
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index 4e29085..4d996c1 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
> > >
> > >      case OVS_ACTION_ATTR_TUNNEL_PUSH:
> > >          if (*depth < MAX_RECIRC_DEPTH) {
> > > -            struct dp_packet_batch tnl_pkt;
> > > -            struct dp_packet_batch *orig_packets_ = packets_;
> > > -            int err;
> > > -
> > > -            if (!may_steal) {
> > > -                dp_packet_batch_clone(&tnl_pkt, packets_);
> > > -                packets_ = &tnl_pkt;
> > > -                dp_packet_batch_reset_cutlen(orig_packets_);
> > > -            }
> > > -
> > >              dp_packet_batch_apply_cutlen(packets_);
> > > -
> > > -            err = push_tnl_action(pmd, a, packets_);
> > > -            if (!err) {
> > > -                (*depth)++;
> > > -                dp_netdev_recirculate(pmd, packets_);
> > > -                (*depth)--;
> > > -            }
> > > +            push_tnl_action(pmd, a, packets_);
> >
> > If I follow, this was the previous datapath-level code to ensure that,
> > when packets are output to a tunnel, only the copy of the packet that
> > goes to the tunnel gets the cutlen applied. With the new version of
> > the code, we replace this by making sure that the ofproto layer
> > generates a clone to wrap the push_tunnel and all subsequent bridge
> > translation, so it results in the equivalent behaviour: the cutlen is
> > only ever applied to a clone of the packets. Is that right?

[Sugesh] Hmm, that's a good catch. It looks like the packets are not
cloned when combine_tunnel_action is found to be impossible in the
ofproto layer. As a fix, we should keep the 'may_steal' code block to
handle the non-tunnel-combine case; see the sketch below. Any other
suggestion for applying the cutlen in ofproto?
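A rough, untested sketch of keeping that block in dp_execute_cb() -- this
is just the code removed by this patch, minus the recirculation:

    case OVS_ACTION_ATTR_TUNNEL_PUSH:
        if (*depth < MAX_RECIRC_DEPTH) {
            struct dp_packet_batch tnl_pkt;
            struct dp_packet_batch *orig_packets_ = packets_;

            if (!may_steal) {
                /* The caller keeps ownership of 'packets_', so apply the
                 * cutlen only to a clone of the batch. */
                dp_packet_batch_clone(&tnl_pkt, packets_);
                packets_ = &tnl_pkt;
                dp_packet_batch_reset_cutlen(orig_packets_);
            }

            dp_packet_batch_apply_cutlen(packets_);
            push_tnl_action(pmd, a, packets_);
        }
        break;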
> > <snip>
> >
> > > +/* Validate whether the translated combined actions are OK to proceed.
> > > + * If the actions consist of a TRUNC action, it is not allowed to do
> > > + * the tunnel_push combine, as it cannot update stats correctly.
> > > + */
> > > +static bool
> > > +is_tunnel_actions_clone_ready(struct ofpbuf *actions)
> > > +{
> > > +    struct nlattr *tnl_actions;
> > > +    const struct nlattr *a;
> > > +    unsigned int left;
> > > +    size_t actions_len;
> > > +
> > > +    if (!actions) {
> > > +        /* No actions, no harm in doing the combine. */
> > > +        return true;
> > > +    }
> > > +    actions_len = actions->size;
> > > +
> > > +    tnl_actions = (struct nlattr *) (actions->data);
> > > +    NL_ATTR_FOR_EACH_UNSAFE (a, left, tnl_actions, actions_len) {
> > > +        int type = nl_attr_type(a);
> > > +        if (type == OVS_ACTION_ATTR_TRUNC) {
> > > +            VLOG_DBG("Cannot do tunnel action-combine on trunc action");
> > > +            return false;
> > > +        }
> > > +    }
> > > +    return true;
> > > +}
> > > +
> > > +/*
> > > + * Copy the xc entry and update the references accordingly.
> > > + */
> > > +static void
> > > +copy_xc_entry(struct xc_entry *xc_dst, struct xc_entry *xc_src)
> >
> > I think that this whole function should probably reside in
> > ofproto/ofproto-dpif-xlate-cache.c. However, I have an alternative
> > proposal detailed below that would not require us to also take
> > expensive refcounts due to this approach.

[Sugesh] Will introduce a new function in xlate-cache to handle it.

> > > +{
> > > +    memcpy(xc_dst, xc_src, sizeof *xc_dst);
> > > +    switch (xc_src->type) {
> > > +    case XC_RULE:
> > > +        ofproto_rule_ref(&xc_dst->rule->up);
> > > +        break;
> > > +    case XC_BOND:
> > > +        bond_ref(xc_dst->bond.bond);
> > > +        xc_dst->bond.flow = xmemdup(xc_src->bond.flow,
> > > +                                    sizeof *xc_src->bond.flow);
> > > +        break;
> > > +    case XC_NETDEV:
> > > +        if (xc_src->dev.tx) {
> > > +            netdev_ref(xc_dst->dev.tx);
> > > +        }
> > > +        if (xc_src->dev.rx) {
> > > +            netdev_ref(xc_dst->dev.rx);
> > > +        }
> > > +        if (xc_src->dev.bfd) {
> > > +            bfd_ref(xc_dst->dev.bfd);
> > > +        }
> > > +        break;
> > > +    case XC_NETFLOW:
> > > +        netflow_ref(xc_dst->nf.netflow);
> > > +        xc_dst->nf.flow = xmemdup(xc_src->nf.flow, sizeof *xc_src->nf.flow);
> > > +        break;
> > > +    case XC_MIRROR:
> > > +        mbridge_ref(xc_dst->mirror.mbridge);
> > > +        break;
> > > +    case XC_LEARN:
> > > +    case XC_TABLE:
> > > +    case XC_NORMAL:
> > > +    case XC_FIN_TIMEOUT:
> > > +    case XC_GROUP:
> > > +    case XC_TNL_NEIGH:
> > > +    case XC_CONTROLLER:
> > > +    case XC_TUNNEL_HEADER:
> > > +        break;
> > > +    default:
> > > +        OVS_NOT_REACHED();
> > > +    }
> > > +}
> > > +
> > > +static bool
> > > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,
> > > +                                      const struct xport *xport,
> > > +                                      struct xport *out_dev,
> > > +                                      struct ovs_action_push_tnl tnl_push_data)
> > > +{
> > > +    const struct dpif_flow_stats *backup_resubmit_stats;
> > > +    struct xlate_cache *backup_xcache;
> > > +    bool nested_act_flag = false;
> > > +    struct flow_wildcards tmp_flow_wc;
> > > +    struct flow_wildcards *backup_flow_wc_ptr;
> > > +    bool backup_side_effects;
> > > +    const struct dp_packet *backup_pkt;
> > > +
> > > +    memset(&tmp_flow_wc, 0, sizeof tmp_flow_wc);
> > > +    backup_flow_wc_ptr = ctx->wc;
> > > +    ctx->wc = &tmp_flow_wc;
> > > +    ctx->xin->wc = NULL;
> > > +    backup_resubmit_stats = ctx->xin->resubmit_stats;
> > > +    backup_xcache = ctx->xin->xcache;
> > > +    backup_side_effects = ctx->xin->allow_side_effects;
> > > +    backup_pkt = ctx->xin->packet;
> > > +
> > > +    size_t push_action_size = 0;
> > > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
> > > +                                           OVS_ACTION_ATTR_CLONE);
> > > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
> > > +    push_action_size = ctx->odp_actions->size;
> > > +
> > > +    ctx->xin->resubmit_stats = NULL;
> > > +    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary cache. */
> > > +    ctx->xin->allow_side_effects = false;
> > > +    ctx->xin->packet = NULL;
> >
> > I realised that you asked about this after the previous patch, but I
> > didn't understand the implications of it. I believe that this packet
> > is how, for instance, the second bridge can forward packets to the
> > controller. If you reset it to NULL here, then I don't think that the
> > controller action is executed correctly during the second bridge
> > translation. (That's a test we could add.)

[Sugesh] Yes, that's a good test case for it.

> > I suspect that some multicast snooping stuff and some other pieces
> > would be affected by this.

[Sugesh] This is interesting. So does the controller expect encapsulated
packets in that case? Do you think the controller action should also be
an exception case, like TRUNC? Do we have a case where we send the
encapsulated traffic to the controller?

> > > +
> > > +    if (ctx->xin->xcache) {
> >
> > A few lines above, this is unconditionally set. Could be removed?

[Sugesh] I think it is newly created, and it's possible to get NULL in
case of malloc failure. That's why I added the check.

> > > +        /* Push the cache entry for the tunnel first. */
> > > +        struct xc_entry *entry;
> > > +        entry = xlate_cache_add_entry(ctx->xin->xcache, XC_TUNNEL_HEADER);
> > > +        entry->tunnel_hdr.hdr_size = tnl_push_data.header_len;
> > > +        entry->tunnel_hdr.operation = ADD;
> > > +    }
> > > +
> > > +    apply_nested_clone_actions(ctx, xport, out_dev);
> > > +    nested_act_flag = is_tunnel_actions_clone_ready(ctx->odp_actions);
> > > +
> > > +    if (nested_act_flag) {
> > > +        struct dpif_flow_stats tmp_resubmit_stats;
> > > +        struct xc_entry *entry;
> > > +        struct ofpbuf entries;
> > > +
> > > +        memset(&tmp_resubmit_stats, 0, sizeof tmp_resubmit_stats);
> >
> > I think that tmp_resubmit_stats is only ever used in the if condition
> > below, which will initialize the entire structure anyway, so the
> > declaration can be moved below and this memset line can be removed.

[Sugesh] Sure. Will move it down; something like the sketch below.
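An untested sketch, using only what is already in this function:

    if (backup_resubmit_stats) {
        struct dpif_flow_stats tmp_resubmit_stats = *backup_resubmit_stats;

        /* The struct is fully initialized from the backup, so the memset
         * is no longer needed. */
        xlate_push_stats(ctx->xin->xcache, &tmp_resubmit_stats);
    }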
> > > +        /* Similar to the stats update in revalidation, the xcache
> > > +         * entries populated by the previous translation are used to
> > > +         * update the stats correctly.
> > > +         */
> > > +        ovs_mutex_lock(&ctx->xin->resubmit_stats_mutex);
> >
> > Which piece of shared memory is this mutex protecting?

[Sugesh] Will remove it.

> > > +        if (backup_resubmit_stats) {
> > > +            memcpy(&tmp_resubmit_stats, backup_resubmit_stats,
> > > +                   sizeof tmp_resubmit_stats);
> > > +            xlate_push_stats(ctx->xin->xcache, &tmp_resubmit_stats);
> > > +        }
> > > +        entries = ctx->xin->xcache->entries;
> > > +        XC_ENTRY_FOR_EACH (entry, &entries) {
> > > +            struct xc_entry *xc_copy_entry;
> > > +            if (!backup_xcache) {
> > > +                break;
> >
> > Do we need to do this for every single XC entry?

[Sugesh] Will remove it in the next version.

> > > +            }
> > > +            xc_copy_entry = xlate_cache_add_entry(backup_xcache, entry->type);
> > > +            copy_xc_entry(xc_copy_entry, entry);
> >
> > I wonder if, instead of open-coding this stuff here, there was a
> > function defined in ofproto-dpif-xlate-cache.c which would append a
> > second xlate_cache onto a first. A function prototype like this,
> > perhaps?
> >
> > /* Iterates through 'src' and removes the xc_entry elements from it,
> >  * and places them inside 'dst'. */
> > void xlate_cache_steal_entries(struct xlate_cache *dst,
> >                                struct xlate_cache *src);

[Sugesh] Makes sense. Will create a function to append the cache entries
and clear off the src once the copying is completed; perhaps something
like the sketch below.
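An untested sketch of such a helper in ofproto-dpif-xlate-cache.c,
assuming the raw entry data can be appended wholesale because the current
translation owns both caches (so no extra references need to be taken):

    void
    xlate_cache_steal_entries(struct xlate_cache *dst, struct xlate_cache *src)
    {
        if (!src || !dst) {
            return;
        }
        /* Move the xc_entry elements wholesale; 'dst' takes over ownership
         * of the references held by the entries. */
        ofpbuf_put(&dst->entries, src->entries.data, src->entries.size);
        /* Drop the entries from 'src' without releasing those references. */
        ofpbuf_clear(&src->entries);
    }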
> > The current execution should own both xlate_caches, so I think that it
> > should be safe to more or less just append the latter xlate_cache onto
> > the original one.
> >
> > > +        }
> > > +        ovs_mutex_unlock(&ctx->xin->resubmit_stats_mutex);
> > > +    }
> > > +    else {
> >
> > Please place the else on the same line as the close bracket.

[Sugesh] OK.

> > > +        /* Combine is not valid. */
> > > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);
> > > +        goto out;
> > > +    }
> > > +    if (ctx->odp_actions->size > push_action_size) {
> > > +        /* Update the CLONE action only when combined. */
> > > +        nl_msg_end_nested(ctx->odp_actions, clone_ofs);
> > > +    }
> > > +    else {
> >
> > Same style request.
> >
> > > +        /* No actions after the tunnel, no need of clone. */
> > > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);
> > > +        odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
> >
> > If there are no actions after the tunnel, then do we need the tunnel
> > header at all?

[Sugesh] There are test cases that do it. I kept it to cover those.

> > Do we have a test case which outputs to a tunnel that goes to another
> > bridge that drops the packet, while the first bridge then continues to
> > execute some more actions, to make sure this path works in the way we
> > expect?
