Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless optimization

2019-08-30 Thread Yanqin Wei (Arm Technology China)
Hi 

I run a basic nic2nic case for compiler branch stripping. The improvement is 
around 0.8%.
It is observed that 0.37% branch miss in case of directly call 
miniflow_extract__(but it is not inline here)  and 0.27% branch miss in case of 
enable multi-instance for miniflow_extract.

But downside is code size gets bigger and is possible to increase I-cache miss. 
In this test, it is not observed because only miniflow_extract_firstpass is 
invoked. For some cases such as tunnel or connection track, it might has some 
impact.  But in these cases, the branch prediction failure should also increase 
if not enable branch strip.

So the impact should be platform(cache size, branch prediction, compiler) and 
deployment(L3 fwding/tunnel/...) dependency. 
On the whole, the greater difference between first pass and recirculation, the 
more benefit to enable multi-instance.

Best Regards,
Wei Yanqin

-Original Message-
From: Ilya Maximets  
Sent: Thursday, August 29, 2019 9:46 PM
To: Yanqin Wei (Arm Technology China) ; Ben Pfaff 

Cc: d...@openvswitch.org; nd ; Gavin Hu (Arm Technology China) 

Subject: Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless 
optimization

On 29.08.2019 12:21, Yanqin Wei (Arm Technology China) wrote:
> Hi Ben,
> Thanks for the feedback.  It is indeed related with userspace datapath. 
> 
> Hi Ilya,
> Could you take a look at this patch when you have time?> I knew 
> first-pass and recirculating traffic share the same packet handling. It makes 
> code common and maintainable.
> But if we can introduce some data-oriented and well-defined flag to bypass 
> some time-consuming handling, it can improve performance a lot.

Hi. I had a quick look at the patch.
Few thoughts:
* 'md' is actually always valid inside 'miniflow_extract', so the
   variable 'md_valid' should be renamed to not be misleading.
   Maybe something like 'md_is_full'? I'm not sure about the name.

* How much is the performance benefit of the compiler code stripping?
  I mean, what is the difference between direct call
 miniflow_extract__(packet, dst, md_is_valid);
  where 'md_is_valid == false' and the call to
 miniflow_extract_firstpass(packet, dst);
  ?
  Asking because this complicates dfc_processing() function.
  I'm, actually have a patch locally to combine rss_hash
  calculation functions to reduce code duplication, so I'm trying
  to figure out what are the possibilities here.

Best regards, Ilya Maximets.

> 
> Best Regards,
> Wei Yanqin
> 
> -Original Message-
> From: Ben Pfaff 
> Sent: Thursday, August 29, 2019 6:11 AM
> To: Yanqin Wei (Arm Technology China) ; Ilya 
> Maximets 
> Cc: d...@openvswitch.org; nd ; Gavin Hu (Arm Technology 
> China) 
> Subject: Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata 
> branchless optimization
> 
> This fails to apply to current master.
> 
> Whereas most of the time I'd be comfortable with reviewing 'flow'
> patches myself, this one is closed related to the dpif-netdev code and I'd 
> prefer to have someone who understands that code well, as well as the 
> tradeoffs between performance and maintainability, review it.  Ilya (added to 
> the To line) is a good choice.
> 
> On Thu, Aug 22, 2019 at 06:09:16PM +0800, Yanqin Wei wrote:
>> "miniflow_extract" is a branch heavy implementation for packet header 
>> and metadata parsing. There is a lot of meta data handling for all traffic.
>> But this should not be applicable for packets from interface.
>> This patch adds a layer of inline encapsulation to miniflow_extract 
>> and introduces constant "md_valid" input parameter as a branch condition.
>> The new branch will be removed by the compiler at compile time. Two 
>> instances of miniflow_extract with different branches will be generated.
>>
>> This patch is tested on an arm64 platform. It improves more than 3.5% 
>> performance in P2P forwarding cases.
>>
>> Reviewed-by: Gavin Hu 
>> Signed-off-by: Yanqin Wei 
>> ---
>>  lib/dpif-netdev.c |  13 +++---
>>  lib/flow.c| 116 
>> --
>>  lib/flow.h|   2 +
>>  3 files changed, 79 insertions(+), 52 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
>> d0a1c58..6686b14 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -6508,12 +6508,15 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>  }
>>  }
>>  
>> -miniflow_extract(packet, >mf);
>> +if (!md_is_valid) {
>> +miniflow_extract_firstpass(packet, >mf);
>> +key->hash =
>> +dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf);
>&g

Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless optimization

2019-08-29 Thread Ilya Maximets
On 29.08.2019 12:21, Yanqin Wei (Arm Technology China) wrote:
> Hi Ben,
> Thanks for the feedback.  It is indeed related with userspace datapath. 
> 
> Hi Ilya, 
> Could you take a look at this patch when you have time?> 
> I knew first-pass and recirculating traffic share the same packet handling. 
> It makes code common and maintainable. 
> But if we can introduce some data-oriented and well-defined flag to bypass 
> some time-consuming handling, it can improve performance a lot.

Hi. I had a quick look at the patch.
Few thoughts:
* 'md' is actually always valid inside 'miniflow_extract', so the
   variable 'md_valid' should be renamed to not be misleading.
   Maybe something like 'md_is_full'? I'm not sure about the name.

* How much is the performance benefit of the compiler code stripping?
  I mean, what is the difference between direct call
 miniflow_extract__(packet, dst, md_is_valid);
  where 'md_is_valid == false' and the call to
 miniflow_extract_firstpass(packet, dst);
  ?
  Asking because this complicates dfc_processing() function.
  I'm, actually have a patch locally to combine rss_hash
  calculation functions to reduce code duplication, so I'm trying
  to figure out what are the possibilities here.

Best regards, Ilya Maximets.

> 
> Best Regards,
> Wei Yanqin
> 
> -Original Message-
> From: Ben Pfaff  
> Sent: Thursday, August 29, 2019 6:11 AM
> To: Yanqin Wei (Arm Technology China) ; Ilya Maximets 
> 
> Cc: d...@openvswitch.org; nd ; Gavin Hu (Arm Technology China) 
> 
> Subject: Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless 
> optimization
> 
> This fails to apply to current master.
> 
> Whereas most of the time I'd be comfortable with reviewing 'flow'
> patches myself, this one is closed related to the dpif-netdev code and I'd 
> prefer to have someone who understands that code well, as well as the 
> tradeoffs between performance and maintainability, review it.  Ilya (added to 
> the To line) is a good choice.
> 
> On Thu, Aug 22, 2019 at 06:09:16PM +0800, Yanqin Wei wrote:
>> "miniflow_extract" is a branch heavy implementation for packet header 
>> and metadata parsing. There is a lot of meta data handling for all traffic.
>> But this should not be applicable for packets from interface.
>> This patch adds a layer of inline encapsulation to miniflow_extract 
>> and introduces constant "md_valid" input parameter as a branch condition.
>> The new branch will be removed by the compiler at compile time. Two 
>> instances of miniflow_extract with different branches will be generated.
>>
>> This patch is tested on an arm64 platform. It improves more than 3.5% 
>> performance in P2P forwarding cases.
>>
>> Reviewed-by: Gavin Hu 
>> Signed-off-by: Yanqin Wei 
>> ---
>>  lib/dpif-netdev.c |  13 +++---
>>  lib/flow.c| 116 
>> --
>>  lib/flow.h|   2 +
>>  3 files changed, 79 insertions(+), 52 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 
>> d0a1c58..6686b14 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -6508,12 +6508,15 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>  }
>>  }
>>  
>> -miniflow_extract(packet, >mf);
>> +if (!md_is_valid) {
>> +miniflow_extract_firstpass(packet, >mf);
>> +key->hash =
>> +dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf);
>> +} else {
>> +miniflow_extract(packet, >mf);
>> +key->hash = dpif_netdev_packet_get_rss_hash(packet, >mf);
>> +}
>>  key->len = 0; /* Not computed yet. */
>> -key->hash =
>> -(md_is_valid == false)
>> -? dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf)
>> -: dpif_netdev_packet_get_rss_hash(packet, >mf);
>>  
>>  /* If EMC is disabled skip emc_lookup */
>>  flow = (cur_min != 0) ? emc_lookup(>emc_cache, key) : 
>> NULL; diff --git a/lib/flow.c b/lib/flow.c index e54fd2e..e5b554b 
>> 100644
>> --- a/lib/flow.c
>> +++ b/lib/flow.c
>> @@ -707,7 +707,8 @@ ipv6_sanity_check(const struct 
>> ovs_16aligned_ip6_hdr *nh, size_t size)  }
>>  
>>  /* Initializes 'dst' from 'packet' and 'md', taking the packet type 
>> into
>> - * account.  'dst' must have enough space for FLOW_U64S * 8 bytes.
>> + * account.  'dst' must have enough space for FLOW_U64S * 8 bytes. 
>> + Metadata
>> + * initialization sho

Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless optimization

2019-08-29 Thread Yanqin Wei (Arm Technology China)
Hi Ben,
Thanks for the feedback.  It is indeed related with userspace datapath. 

Hi Ilya, 
Could you take a look at this patch when you have time? 

I knew first-pass and recirculating traffic share the same packet handling. It 
makes code common and maintainable. 
But if we can introduce some data-oriented and well-defined flag to bypass some 
time-consuming handling, it can improve performance a lot.

Best Regards,
Wei Yanqin

-Original Message-
From: Ben Pfaff  
Sent: Thursday, August 29, 2019 6:11 AM
To: Yanqin Wei (Arm Technology China) ; Ilya Maximets 

Cc: d...@openvswitch.org; nd ; Gavin Hu (Arm Technology China) 

Subject: Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless 
optimization

This fails to apply to current master.

Whereas most of the time I'd be comfortable with reviewing 'flow'
patches myself, this one is closed related to the dpif-netdev code and I'd 
prefer to have someone who understands that code well, as well as the tradeoffs 
between performance and maintainability, review it.  Ilya (added to the To 
line) is a good choice.

On Thu, Aug 22, 2019 at 06:09:16PM +0800, Yanqin Wei wrote:
> "miniflow_extract" is a branch heavy implementation for packet header 
> and metadata parsing. There is a lot of meta data handling for all traffic.
> But this should not be applicable for packets from interface.
> This patch adds a layer of inline encapsulation to miniflow_extract 
> and introduces constant "md_valid" input parameter as a branch condition.
> The new branch will be removed by the compiler at compile time. Two 
> instances of miniflow_extract with different branches will be generated.
> 
> This patch is tested on an arm64 platform. It improves more than 3.5% 
> performance in P2P forwarding cases.
> 
> Reviewed-by: Gavin Hu 
> Signed-off-by: Yanqin Wei 
> ---
>  lib/dpif-netdev.c |  13 +++---
>  lib/flow.c| 116 
> --
>  lib/flow.h|   2 +
>  3 files changed, 79 insertions(+), 52 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 
> d0a1c58..6686b14 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -6508,12 +6508,15 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>  }
>  }
>  
> -miniflow_extract(packet, >mf);
> +if (!md_is_valid) {
> +miniflow_extract_firstpass(packet, >mf);
> +key->hash =
> +dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf);
> +} else {
> +miniflow_extract(packet, >mf);
> +key->hash = dpif_netdev_packet_get_rss_hash(packet, >mf);
> +}
>  key->len = 0; /* Not computed yet. */
> -key->hash =
> -(md_is_valid == false)
> -? dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf)
> -: dpif_netdev_packet_get_rss_hash(packet, >mf);
>  
>  /* If EMC is disabled skip emc_lookup */
>  flow = (cur_min != 0) ? emc_lookup(>emc_cache, key) : 
> NULL; diff --git a/lib/flow.c b/lib/flow.c index e54fd2e..e5b554b 
> 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -707,7 +707,8 @@ ipv6_sanity_check(const struct 
> ovs_16aligned_ip6_hdr *nh, size_t size)  }
>  
>  /* Initializes 'dst' from 'packet' and 'md', taking the packet type 
> into
> - * account.  'dst' must have enough space for FLOW_U64S * 8 bytes.
> + * account.  'dst' must have enough space for FLOW_U64S * 8 bytes. 
> + Metadata
> + * initialization should be bypassed if "md_valid" is false.
>   *
>   * Initializes the layer offsets as follows:
>   *
> @@ -732,8 +733,9 @@ ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, 
> size_t size)
>   *  present and the packet has at least the content used for the fields
>   *  of interest for the flow, otherwise UINT16_MAX.
>   */
> -void
> -miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
> +static inline ALWAYS_INLINE void
> +miniflow_extract__(struct dp_packet *packet, struct miniflow *dst,
> +const bool md_valid)
>  {
>  /* Add code to this function (or its callees) to extract new fields. */
>  BUILD_ASSERT_DECL(FLOW_WC_SEQ == 41); @@ -752,54 +754,60 @@ 
> miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
>  ovs_be16 ct_tp_src = 0, ct_tp_dst = 0;
>  
>  /* Metadata. */
> -if (flow_tnl_dst_is_set(>tunnel)) {
> -miniflow_push_words(mf, tunnel, >tunnel,
> -offsetof(struct flow_tnl, metadata) /
> -sizeof(uint64_t));
> -
> -if (!(md->tunnel.flags & FLOW_TNL_

Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless optimization

2019-08-28 Thread Ben Pfaff
This fails to apply to current master.

Whereas most of the time I'd be comfortable with reviewing 'flow'
patches myself, this one is closed related to the dpif-netdev code and
I'd prefer to have someone who understands that code well, as well as
the tradeoffs between performance and maintainability, review it.  Ilya
(added to the To line) is a good choice.

On Thu, Aug 22, 2019 at 06:09:16PM +0800, Yanqin Wei wrote:
> "miniflow_extract" is a branch heavy implementation for packet header and
> metadata parsing. There is a lot of meta data handling for all traffic.
> But this should not be applicable for packets from interface.
> This patch adds a layer of inline encapsulation to miniflow_extract and
> introduces constant "md_valid" input parameter as a branch condition.
> The new branch will be removed by the compiler at compile time. Two
> instances of miniflow_extract with different branches will be generated.
> 
> This patch is tested on an arm64 platform. It improves more than 3.5%
> performance in P2P forwarding cases.
> 
> Reviewed-by: Gavin Hu 
> Signed-off-by: Yanqin Wei 
> ---
>  lib/dpif-netdev.c |  13 +++---
>  lib/flow.c| 116 
> --
>  lib/flow.h|   2 +
>  3 files changed, 79 insertions(+), 52 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index d0a1c58..6686b14 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -6508,12 +6508,15 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>  }
>  }
>  
> -miniflow_extract(packet, >mf);
> +if (!md_is_valid) {
> +miniflow_extract_firstpass(packet, >mf);
> +key->hash =
> +dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf);
> +} else {
> +miniflow_extract(packet, >mf);
> +key->hash = dpif_netdev_packet_get_rss_hash(packet, >mf);
> +}
>  key->len = 0; /* Not computed yet. */
> -key->hash =
> -(md_is_valid == false)
> -? dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf)
> -: dpif_netdev_packet_get_rss_hash(packet, >mf);
>  
>  /* If EMC is disabled skip emc_lookup */
>  flow = (cur_min != 0) ? emc_lookup(>emc_cache, key) : NULL;
> diff --git a/lib/flow.c b/lib/flow.c
> index e54fd2e..e5b554b 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -707,7 +707,8 @@ ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, 
> size_t size)
>  }
>  
>  /* Initializes 'dst' from 'packet' and 'md', taking the packet type into
> - * account.  'dst' must have enough space for FLOW_U64S * 8 bytes.
> + * account.  'dst' must have enough space for FLOW_U64S * 8 bytes. Metadata
> + * initialization should be bypassed if "md_valid" is false.
>   *
>   * Initializes the layer offsets as follows:
>   *
> @@ -732,8 +733,9 @@ ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, 
> size_t size)
>   *  present and the packet has at least the content used for the fields
>   *  of interest for the flow, otherwise UINT16_MAX.
>   */
> -void
> -miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
> +static inline ALWAYS_INLINE void
> +miniflow_extract__(struct dp_packet *packet, struct miniflow *dst,
> +const bool md_valid)
>  {
>  /* Add code to this function (or its callees) to extract new fields. */
>  BUILD_ASSERT_DECL(FLOW_WC_SEQ == 41);
> @@ -752,54 +754,60 @@ miniflow_extract(struct dp_packet *packet, struct 
> miniflow *dst)
>  ovs_be16 ct_tp_src = 0, ct_tp_dst = 0;
>  
>  /* Metadata. */
> -if (flow_tnl_dst_is_set(>tunnel)) {
> -miniflow_push_words(mf, tunnel, >tunnel,
> -offsetof(struct flow_tnl, metadata) /
> -sizeof(uint64_t));
> -
> -if (!(md->tunnel.flags & FLOW_TNL_F_UDPIF)) {
> -if (md->tunnel.metadata.present.map) {
> -miniflow_push_words(mf, tunnel.metadata, 
> >tunnel.metadata,
> -sizeof md->tunnel.metadata /
> -sizeof(uint64_t));
> -}
> -} else {
> -if (md->tunnel.metadata.present.len) {
> -miniflow_push_words(mf, tunnel.metadata.present,
> ->tunnel.metadata.present, 1);
> -miniflow_push_words(mf, tunnel.metadata.opts.gnv,
> -md->tunnel.metadata.opts.gnv,
> -
> DIV_ROUND_UP(md->tunnel.metadata.present.len,
> - sizeof(uint64_t)));
> +if (md_valid) {
> +if (flow_tnl_dst_is_set(>tunnel)) {
> +miniflow_push_words(mf, tunnel, >tunnel,
> +offsetof(struct flow_tnl, metadata) /
> +sizeof(uint64_t));
> +
> +if 

[ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless optimization

2019-08-22 Thread Yanqin Wei
"miniflow_extract" is a branch heavy implementation for packet header and
metadata parsing. There is a lot of meta data handling for all traffic.
But this should not be applicable for packets from interface.
This patch adds a layer of inline encapsulation to miniflow_extract and
introduces constant "md_valid" input parameter as a branch condition.
The new branch will be removed by the compiler at compile time. Two
instances of miniflow_extract with different branches will be generated.

This patch is tested on an arm64 platform. It improves more than 3.5%
performance in P2P forwarding cases.

Reviewed-by: Gavin Hu 
Signed-off-by: Yanqin Wei 
---
 lib/dpif-netdev.c |  13 +++---
 lib/flow.c| 116 --
 lib/flow.h|   2 +
 3 files changed, 79 insertions(+), 52 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d0a1c58..6686b14 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -6508,12 +6508,15 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 }
 }
 
-miniflow_extract(packet, >mf);
+if (!md_is_valid) {
+miniflow_extract_firstpass(packet, >mf);
+key->hash =
+dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf);
+} else {
+miniflow_extract(packet, >mf);
+key->hash = dpif_netdev_packet_get_rss_hash(packet, >mf);
+}
 key->len = 0; /* Not computed yet. */
-key->hash =
-(md_is_valid == false)
-? dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf)
-: dpif_netdev_packet_get_rss_hash(packet, >mf);
 
 /* If EMC is disabled skip emc_lookup */
 flow = (cur_min != 0) ? emc_lookup(>emc_cache, key) : NULL;
diff --git a/lib/flow.c b/lib/flow.c
index e54fd2e..e5b554b 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -707,7 +707,8 @@ ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, 
size_t size)
 }
 
 /* Initializes 'dst' from 'packet' and 'md', taking the packet type into
- * account.  'dst' must have enough space for FLOW_U64S * 8 bytes.
+ * account.  'dst' must have enough space for FLOW_U64S * 8 bytes. Metadata
+ * initialization should be bypassed if "md_valid" is false.
  *
  * Initializes the layer offsets as follows:
  *
@@ -732,8 +733,9 @@ ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, 
size_t size)
  *  present and the packet has at least the content used for the fields
  *  of interest for the flow, otherwise UINT16_MAX.
  */
-void
-miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
+static inline ALWAYS_INLINE void
+miniflow_extract__(struct dp_packet *packet, struct miniflow *dst,
+const bool md_valid)
 {
 /* Add code to this function (or its callees) to extract new fields. */
 BUILD_ASSERT_DECL(FLOW_WC_SEQ == 41);
@@ -752,54 +754,60 @@ miniflow_extract(struct dp_packet *packet, struct 
miniflow *dst)
 ovs_be16 ct_tp_src = 0, ct_tp_dst = 0;
 
 /* Metadata. */
-if (flow_tnl_dst_is_set(>tunnel)) {
-miniflow_push_words(mf, tunnel, >tunnel,
-offsetof(struct flow_tnl, metadata) /
-sizeof(uint64_t));
-
-if (!(md->tunnel.flags & FLOW_TNL_F_UDPIF)) {
-if (md->tunnel.metadata.present.map) {
-miniflow_push_words(mf, tunnel.metadata, >tunnel.metadata,
-sizeof md->tunnel.metadata /
-sizeof(uint64_t));
-}
-} else {
-if (md->tunnel.metadata.present.len) {
-miniflow_push_words(mf, tunnel.metadata.present,
->tunnel.metadata.present, 1);
-miniflow_push_words(mf, tunnel.metadata.opts.gnv,
-md->tunnel.metadata.opts.gnv,
-
DIV_ROUND_UP(md->tunnel.metadata.present.len,
- sizeof(uint64_t)));
+if (md_valid) {
+if (flow_tnl_dst_is_set(>tunnel)) {
+miniflow_push_words(mf, tunnel, >tunnel,
+offsetof(struct flow_tnl, metadata) /
+sizeof(uint64_t));
+
+if (!(md->tunnel.flags & FLOW_TNL_F_UDPIF)) {
+if (md->tunnel.metadata.present.map) {
+miniflow_push_words(mf, tunnel.metadata,
+>tunnel.metadata,
+sizeof md->tunnel.metadata /
+sizeof(uint64_t));
+}
+} else {
+if (md->tunnel.metadata.present.len) {
+miniflow_push_words(mf, tunnel.metadata.present,
+>tunnel.metadata.present, 1);
+miniflow_push_words(mf,