On 8/17/17, 6:48 AM, "[email protected] on behalf of Jan 
Scheurich" <[email protected] on behalf of 
[email protected]> wrote:

    Hi Darrell,
    
    Please find my specific comments below.
    
    I have a general concern for the pmd-stats-show counters: The output refers 
to the number of processed packets (e.g. processing cycles/pkt etc). But the 
data actually presented is the total number of *passes of packets through the 
datapath*. 
    
    In the early OVS days there was little difference, but with increasing 
amount of recirculations ( for tunnel decap, pop_mpls, bond selection, 
conntrack), nowadays each packet passes the datapath several times and the 
output becomes confusing for users, especially as it does not contain any 
indication of the average number of datapath passes per packet.
    
    I would strongly suggest that we add a new PMD stats counter for the 
actually received packets, which allows us to calculate and include the average 
number of passes per packet and true average processing cost per pkt. 
    
    What do you think?


[Darrell] This is an excellent proposal. We have had this level of 
recirculations for 3+ years, so it is nothing new, but better late to fix this 
than never.
                 Processing cycles is mainly useful for checking single flows 
and comparing relative numbers for code changes, however it is definitely 
                 better to use the actual packets received, rather than packet 
passes. 
                 I can do this enhancement/fix as part of this patch.
                 I will add the counter and go one step further and display 
both received packets and total recirculations. The processing cycles will be
                 based on received packets, as you pointed out.

    
    BR, Jan
    
    > The per packets stats are presently overlapping in that miss stats include
    > lost stats; make these stats non-overlapping for clarity and make this 
clear
    > in the dp_stat_type enum.  This also eliminates the need to increment two
    > 'miss' stats for a single packet.
    > 
    > The subtable lookup stats is renamed to make it clear that it relates to
    > masked lookups.
    > The stats that total to the number of packets seen are defined in
    > dp_stat_type and an api is created to total the stats in case these stats 
are
    > further divided in the future.
    > 
    > The pmd stats test is enhanced to include megaflow stats counting and
    > checking.
    > Also, miss and lost stats are annotated to make it clear what they mean.
    > 
    > Signed-off-by: Darrell Ball <[email protected]>
    > ---
    >  lib/dpif-netdev.c | 78 ++++++++++++++++++++++++++++++++++--------------
    > -------
    >  tests/pmd.at      | 31 +++++++++++++++++-----
    >  2 files changed, 74 insertions(+), 35 deletions(-)
    > 
    > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 17e1666..dfc6684
    > 100644
    > --- a/lib/dpif-netdev.c
    > +++ b/lib/dpif-netdev.c
    > @@ -323,12 +323,19 @@ static struct dp_netdev_port
    > *dp_netdev_lookup_port(const struct dp_netdev *dp,
    >      OVS_REQUIRES(dp->port_mutex);
    > 
    >  enum dp_stat_type {
    > -    DP_STAT_EXACT_HIT,          /* Packets that had an exact match (emc).
    > */
    > -    DP_STAT_MASKED_HIT,         /* Packets that matched in the flow 
table.
    > */
    > -    DP_STAT_MISS,               /* Packets that did not match. */
    > -    DP_STAT_LOST,               /* Packets not passed up to the client. 
*/
    > -    DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow
    > table
    > -                                   hits */
    > +    DP_STAT_EXACT_HIT,  /* Packets that had an exact match (emc). */
    > +    DP_STAT_MASKED_HIT, /* Packets that matched in the flow table. */
    > +    DP_STAT_MISS,       /* Packets that did not match and upcall was
    > +                           done. */
    > +    DP_STAT_LOST,       /* Packets that did not match and upcall was
    > +                           not done. */
    
    If I understand the implementation correctly, this counts packets for which 
an upcall was done but failed so that the packet was dropped. There are no 
packets with a miss for which we currently do not do an upcall. Why do you 
suggest to change the meaning of the counter?


[Darrell] The usage now is:

miss: means an upcall was actually done; meaning we processed at the ofproto 
layer successfully. This brings this counter in sync with the counter in
the ofproto upcall code itself (n_upcalls++;), which counts ‘successful 
upcalls’. This is also what the user expects by OVS “miss upcall”; the user 
expects
a miss upcall to mean the packet was processed by the ofproto layer; the user 
really does not care that an ‘attempt was made’ to process at the ofproto layer.

lost: now means specifically a failed upcall; I actually had no idea what it 
might mean (based on the o/p of pmd-stats-show) before I looked at the actual 
code,
so, I also made this clear in the o/p.

The 2 numbers are non-overlapping now, which is an added bonus.

    
    > +    DP_N_PER_PKT_CNT,   /* The above statistics account for the total
    > +                           number of packets seen and should not be
    > +                           overlapping with each other. */
    
    I find this definition very obscure.

[Darrell] yeah, I agree; I had trouble with the naming as well and was not 
happy with it either.

 I think it was much clearer (and less likely to break again in the future) to 
explicitly sum up the above four non-overlapping counters to obtain the total 
number of packets as suggested by Ilya. It would be good to keep the comment, 
though.

[Darrell] This refers to the next comment below; let me add my response there 
instead.


    > +    DP_STAT_MASKED_LOOKUP = DP_N_PER_PKT_CNT,  /* Number of
    > subtable lookups
    > +                                                  for flow table hits. 
Each
    > +                                                  MASKED_HIT hit will 
have
    > +                                                  >= 1 MASKED_LOOKUP(s). 
*/
    >      DP_N_STATS
    >  };
    
    +1 for the clarification of the comment.
    
    > +static unsigned long long
    > +dp_netdev_calcul_total_packets(unsigned long long *stats) {
    > +    unsigned long long total_packets = 0;
    > +    for (uint8_t i = 0; i < DP_N_PER_PKT_CNT; i++) {
    > +        total_packets += stats[i];
    > +    }
    > +    return total_packets;
    > +}
    
    Please compute the sum explicitly. Much clearer and better maintainable.


[Darrell] The idea here is that if the number of counters change, then the sum 
is automatically adjusted – meaning this
                 code would not need to be maintained at all. We do this in 
many places in OVS code.

                 However, this now becomes a moot point. Since this number is 
packet passes and we want packets received,
                 as discussed above, which will be a single counter, this 
function is no longer needed.


    
    >      ds_put_format(reply,
    >                    "\temc hits:%llu\n\tmegaflow hits:%llu\n"
    > -                  "\tavg. subtable lookups per hit:%.2f\n"
    > -                  "\tmiss:%llu\n\tlost:%llu\n",
    > +                  "\tavg. subtable lookups per megaflow hit:%.2f\n"
    > +                  "\tmiss(upcall done):%llu\n\tlost(upcall not
    > + done):%llu\n",
    >                    stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
    >                    stats[DP_STAT_MASKED_HIT] > 0
    > -                  ?
    > (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
    > +                  ? (1.0 * stats[DP_STAT_MASKED_LOOKUP])
    > +                     / stats[DP_STAT_MASKED_HIT]
    >                    : 0,
    >                    stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
    
    We should not lightly break backward compatibility in the output. There are 
plenty of scripts out there in test labs and production deployments that parse 
this output for performance monitoring/trouble-shooting purposes.
    
    It is perhaps OK to add new lines, but please do not unnecessarily change 
existing lines!

[Darrell] I had considered the impact on scripts and also the relative newness 
of pmd-stats-show.
                The scripts that I know look for a keyword, which all remain 
the same and a delimiter, in this case, a colon.
                 Adding additional lines makes the o/p longer and less succinct 
and in future, the keywords will change anyways.
                 Do you really think that if the keywords remain the same and 
the delimiter remains the same, that scripts will generally break ?
                 If so, I can add a legend instead; it would not be ideal, but 
would avoid the discussion about which scripts would break, whether scripts
                 should already handle this and so on…. 


    
    > -static inline void
    > +static inline int
    >  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
    >                       struct dp_packet *packet,
    >                       const struct netdev_flow_key *key,
    >                       struct ofpbuf *actions, struct ofpbuf *put_actions,
    > -                     int *lost_cnt, long long now)
    > +                     int *miss_no_upcall, long long now)
    >  {
    >      struct ofpbuf *add_actions;
    >      struct dp_packet_batch b;
    >      struct match match;
    >      ovs_u128 ufid;
    > -    int error;
    > +    int error = 0;
    > 
    >      match.tun_md.valid = false;
    >      miniflow_expand(&key->mf, &match.flow); @@ -4749,8 +4765,8 @@
    > handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
    >                               put_actions);
    >      if (OVS_UNLIKELY(error && error != ENOSPC)) {
    >          dp_packet_delete(packet);
    > -        (*lost_cnt)++;
    > -        return;
    > +        (*miss_no_upcall)++;
    > +        return error;
    >      }
    
    If you return the result of the upcall explicitly, there is no need to also 
increase the lost_cnt passed by reference. Better step the appropriate counter 
in the calling function.

[Darrell] Sure


    
    > 
    >      /* The Netlink encoding of datapath flow keys cannot express @@ -
    > 4790,6 +4806,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread
    > *pmd,
    >          ovs_mutex_unlock(&pmd->flow_mutex);
    >          emc_probabilistic_insert(pmd, key, netdev_flow);
    >      }
    > +    return error;
    >  }
    > 
    >  static inline void
    > @@ -4811,7 +4828,7 @@ fast_path_processing(struct
    > dp_netdev_pmd_thread *pmd,
    >      struct dpcls *cls;
    >      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
    >      struct dp_netdev *dp = pmd->dp;
    > -    int miss_cnt = 0, lost_cnt = 0;
    > +    int miss_upcall_cnt = 0, miss_no_upcall_cnt = 0;
    
    Better upcall_ok_cnt and upcall_drop_cnt?

[Darrell] Sure; these counter names are better
    
    > @@ -4853,9 +4870,12 @@ fast_path_processing(struct
    > dp_netdev_pmd_thread *pmd,
    >                  continue;
    >              }
    > 
    > -            miss_cnt++;
    > -            handle_packet_upcall(pmd, packets[i], &keys[i], &actions,
    > -                                 &put_actions, &lost_cnt, now);
    > +            int error = handle_packet_upcall(pmd, packets[i], &keys[i],
    > +                           &actions, &put_actions, &miss_no_upcall_cnt,
    > + now);
    > +
    > +            if (OVS_LIKELY(!error)) {
    > +                miss_upcall_cnt++;
    > +            }
    
    Better:
    if (OVS_LIKELY(!error)) {
          upcall_ok_cnt++;
    } else {
         upcall_drop_cnt++;
    }

[Darrell] sure


    
    >              if (OVS_UNLIKELY(!rules[i])) {
    >                  dp_packet_delete(packets[i]);
    > -                lost_cnt++;
    > -                miss_cnt++;
    > +                miss_no_upcall_cnt++;
    
    upcall_drop_cnt++;

[Darrell] This is a repeated comment about the counter name.

    
    >              }
    >          }
    >      }
    > @@ -4885,10 +4904,11 @@ fast_path_processing(struct
    > dp_netdev_pmd_thread *pmd,
    >          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches,
    > n_batches);
    >      }
    > 
    > -    dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
    > -    dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
    > -    dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
    > -    dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
    > +    dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt -
    > miss_upcall_cnt -
    > +                           miss_no_upcall_cnt);
    > +    dp_netdev_count_packet(pmd, DP_STAT_MASKED_LOOKUP,
    > lookup_cnt);
    > +    dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_upcall_cnt);
    > +    dp_netdev_count_packet(pmd, DP_STAT_LOST, miss_no_upcall_cnt);
    >  }
    
    dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - upcall_ok_cnt - 
upcall_drop_cnt);
    dp_netdev_count_packet(pmd, DP_STAT_MISS, upcall_ok_cnt);
    dp_netdev_count_packet(pmd, DP_STAT_LOST, upcall_drop_cnt);


[Darrell] This is a repeated comment about the counter name.

    
    _______________________________________________
    dev mailing list
    [email protected]
    
https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=Lv3Gpn8nXejG_K9ooDT-_u4KUJveZKC9Vgjopu6O8mk&s=_ZBrNbm1KWhHi3FMz-keJb2ZIHgI9iz1wRx_jWUtzIU&e=
 
    

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to