Re: [ovs-dev] [RFCv2] revalidator: add a USDT probe after evaluation when flows are deleted.

Eelco Chaudron Tue, 26 Jul 2022 05:21:36 -0700


On 11 Jul 2022, at 18:01, Kevin Sprague wrote:


> During normal operations, it is useful to understand when a particular flow
> gets removed from the system. This can be useful when debugging performance
> issues tied to ofproto flow changes, trying to determine deployed traffic
> patterns, or while debugging dynamic systems where ports come and go.
>
> Prior to this change, there was a lack of visibility around flow expiration.
> The existing debugging infrastructure could tell us when a flow was added to
> the datapath, but not when it was removed or why.
>
> This change introduces a USDT probe at the point where the revalidator
> determines that the flow should be removed.  Additionally, we track the
> reason for the flow eviction and provide that information as well.  With
> this change, we can track the complete flow lifecycle for the netlink datapath
> by hooking the upcall tracepoint in kernel, the flow put USDT, and the
> revaldiator USDT, letting us watch as flows are added and removed from the
> kernel datapath.
>
> This change only enables this information via USDT probe, so it won't be
> possible to access this information any other way (see:
> Documentation/topics/usdt-probes.rst).
>
> Also included are two scripts (utilities/usdt-scripts/filter_probe.py and
> utilities/usdt-scripts/watch_flows.bt) that serve as demonstrations of how
> the new USDT probes might be used going forward.
>
> Signed-off-by: Kevin Sprague <[email protected]>

See my comments below, and to continue I would fix the comments and remove the 
python script (as it seems to add no additional value in the current state). 
And submit this patch as a non-RFC version.

Cheers,

Eelco

> ---
>  Documentation/topics/usdt-probes.rst   |  20 ++
>  ofproto/ofproto-dpif-upcall.c          |  43 +++-
>  utilities/automake.mk                  |   5 +-
>  utilities/usdt-scripts/filter_probe.py | 301 +++++++++++++++++++++++++
>  utilities/usdt-scripts/watch_flows.bt  | 156 +++++++++++++
>  5 files changed, 517 insertions(+), 8 deletions(-)
>  create mode 100755 utilities/usdt-scripts/filter_probe.py
>  create mode 100755 utilities/usdt-scripts/watch_flows.bt
>
> diff --git a/Documentation/topics/usdt-probes.rst 
> b/Documentation/topics/usdt-probes.rst
> index 7ce19aaed..ef4764684 100644
> --- a/Documentation/topics/usdt-probes.rst
> +++ b/Documentation/topics/usdt-probes.rst
> @@ -212,6 +212,7 @@ Available probes in ``ovs_vswitchd``:
>  - dpif_netlink_operate\_\_:op_flow_get
>  - dpif_netlink_operate\_\_:op_flow_put
>  - dpif_recv:recv_upcall
> +- revalidate:flow_results

Please keep the list in alphabetical order.

>  - main:poll_block
>  - main:run_start
>
> @@ -322,6 +323,25 @@ sent to ``ovs-vswitchd``.
>  - ``utilities/usdt-scripts/upcall_cost.py``
>  - ``utilities/usdt-scripts/upcall_monitor.py``
>
> +probe revalidate:flow_results

I think the name should be flow_result, as it’s the result for a single flow.

> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Same here, add this section in alphabetical order of probe names.

> +
> +**Description**:
> +This probe is triggered when the revalidator decides whether or not to
> +revalidate a flow. ``reason`` is an enum that denotes that either the flow
> +is being kept, or the reason why the flow is being deleted. The
> +``filter_probe.py`` and ``watch_flows.bt`` scripts use this probe to notify
> +users when flows matching user-provided criteria (to implement) are deleted.

Remove the (to implement) section, as we can currently filter on the key 
attribute type.

> +**Arguments**:
> +
> +- *arg0*: ``(enum) reason``

Include the enum type, so ``(enum flow_del_reason) reason``

> +- *arg1*: ``(struct udpif *) udpif``
> +- *arg2*: ``(struct udpif_key *) ukey``
> +
> +**Script references**:
> +
> +- ``utilities/usdt-scripts/filter_probe.py``
> +- ``utilities/usdt-scripts/watch_flow.bt``
>
>  probe main:run_start
>  ~~~~~~~~~~~~~~~~~~~~
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 57f94df54..42df2bbfd 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -31,6 +31,7 @@
>  #include "openvswitch/list.h"
>  #include "netlink.h"
>  #include "openvswitch/ofpbuf.h"
> +#include "openvswitch/usdt-probes.h"
>  #include "ofproto-dpif-ipfix.h"
>  #include "ofproto-dpif-sflow.h"
>  #include "ofproto-dpif-xlate.h"
> @@ -260,6 +261,17 @@ enum ukey_state {
>  };
>  #define N_UKEY_STATES (UKEY_DELETED + 1)
>
> +enum flow_del_reason {
> +    FLOW_LIVE = 0,
> +    FLOW_TIME_OUT,      /* the flow went unused and was deleted. */
> +    TOO_EXPENSIVE,
> +    FLOW_WILDCARDED,
> +    BAD_ODP_FIT,
> +    ASSOCIATED_OFPROTO,
> +    XLATION_ERROR,
> +    AVOID_CACHING,
> +};

In OVS enum values seem to get some common denominator. So in your case, I 
would add FDR_ to all of the enums, so they are unique.

   FDR_FLOW_LIVE = 0,
   FDR_FLOW_TIME_OUT,      /* the flow went unused and was deleted. */
   FDR_TOO_EXPENSIVE,
   etc., etc.

> +
>  /* 'udpif_key's are responsible for tracking the little bit of state udpif
>   * needs to do flow expiration which can't be pulled directly from the
>   * datapath.  They may be created by any handler or revalidator thread at any
> @@ -2202,7 +2214,8 @@ populate_xcache(struct udpif *udpif, struct udpif_key 
> *ukey,
>  static enum reval_result
>  revalidate_ukey__(struct udpif *udpif, const struct udpif_key *ukey,
>                    uint16_t tcp_flags, struct ofpbuf *odp_actions,
> -                  struct recirc_refs *recircs, struct xlate_cache *xcache)
> +                  struct recirc_refs *recircs, struct xlate_cache *xcache,
> +                  enum flow_del_reason *reason)
>  {
>      struct xlate_out *xoutp;
>      struct netflow *netflow;
> @@ -2215,16 +2228,20 @@ revalidate_ukey__(struct udpif *udpif, const struct 
> udpif_key *ukey,
>          .wc = &wc,
>      };
>
> +    int error;

Add a CR here, and remove it above the definition, so all definitions are 
grouped.

>      result = UKEY_DELETE;
>      xoutp = NULL;
>      netflow = NULL;
>
> -    if (xlate_ukey(udpif, ukey, tcp_flags, &ctx)) {
> +    error = xlate_ukey(udpif, ukey, tcp_flags, &ctx);
> +    if (error) {

Actually, is error really needed here? Can we keep the original code?

> +        *reason = XLATION_ERROR;
>          goto exit;
>      }
>      xoutp = &ctx.xout;
>
>      if (xoutp->avoid_caching) {
> +        *reason = AVOID_CACHING;
>          goto exit;
>      }
>
> @@ -2238,6 +2255,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
> udpif_key *ukey,
>          ofpbuf_clear(odp_actions);
>
>          if (!ofproto) {
> +            *reason = ASSOCIATED_OFPROTO;
>              goto exit;
>          }
>
> @@ -2249,6 +2267,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
> udpif_key *ukey,
>      if (odp_flow_key_to_mask(ukey->mask, ukey->mask_len, &dp_mask, &ctx.flow,
>                               NULL)
>          == ODP_FIT_ERROR) {
> +        *reason = BAD_ODP_FIT;
>          goto exit;
>      }
>
> @@ -2258,6 +2277,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
> udpif_key *ukey,
>       * down.  Note that we do not know if the datapath has ignored any of the
>       * wildcarded bits, so we may be overly conservative here. */
>      if (flow_wildcards_has_extra(&dp_mask, ctx.wc)) {
> +        *reason = FLOW_WILDCARDED;
>          goto exit;
>      }
>
> @@ -2303,7 +2323,8 @@ static enum reval_result
>  revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>                  const struct dpif_flow_stats *stats,
>                  struct ofpbuf *odp_actions, uint64_t reval_seq,
> -                struct recirc_refs *recircs, bool offloaded)
> +                struct recirc_refs *recircs, bool offloaded,
> +                enum flow_del_reason *reason)
>      OVS_REQUIRES(ukey->mutex)
>  {
>      bool need_revalidate = ukey->reval_seq != reval_seq;
> @@ -2329,8 +2350,11 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key 
> *ukey,
>                  xlate_cache_clear(ukey->xcache);
>              }
>              result = revalidate_ukey__(udpif, ukey, push.tcp_flags,
> -                                       odp_actions, recircs, ukey->xcache);
> -        } /* else delete; too expensive to revalidate */
> +                                       odp_actions, recircs, ukey->xcache,
> +                                       reason);
> +        } /* else delete; too expensive to revalidate */ else {
> +           *reason = TOO_EXPENSIVE;
> +        }

This is ugly, can we move the comment to the else clause?

        } else {
           /* Delete flow as it's too expensive to revalidate. */
           *reason = TOO_EXPENSIVE;
        }


>      } else if (!push.n_packets || ukey->xcache
>                 || !populate_xcache(udpif, ukey, push.tcp_flags)) {
>          result = UKEY_KEEP;
> @@ -2720,6 +2744,7 @@ revalidate(struct revalidator *revalidator)
>              struct recirc_refs recircs = RECIRC_REFS_EMPTY_INITIALIZER;
>              struct dpif_flow_stats stats = f->stats;
>              enum reval_result result;
> +            enum flow_del_reason reason = FLOW_LIVE;
>              struct udpif_key *ukey;
>              bool already_dumped;
>              int error;
> @@ -2767,10 +2792,11 @@ revalidate(struct revalidator *revalidator)
>              }
>              if (kill_them_all || (used && used < now - max_idle)) {
>                  result = UKEY_DELETE;
> +                reason = FLOW_TIME_OUT;
>              } else {
>                  result = revalidate_ukey(udpif, ukey, &stats, &odp_actions,
>                                           reval_seq, &recircs,
> -                                         f->attrs.offloaded);
> +                                         f->attrs.offloaded, &reason);
>              }
>              ukey->dump_seq = dump_seq;
>
> @@ -2779,6 +2805,7 @@ revalidate(struct revalidator *revalidator)
>                  udpif_update_flow_pps(udpif, ukey, f);
>              }
>
> +            OVS_USDT_PROBE(revalidate, flow_results, reason, udpif, ukey);

The name should be “flow_result”

>              if (result != UKEY_KEEP) {
>                  /* Takes ownership of 'recircs'. */
>                  reval_op_init(&ops[n_ops++], result, udpif, ukey, &recircs,
> @@ -2829,6 +2856,7 @@ revalidator_sweep__(struct revalidator *revalidator, 
> bool purge)
>          struct udpif_key *ukey;
>          struct umap *umap = &udpif->ukeys[i];
>          size_t n_ops = 0;
> +        enum flow_del_reason reason = FLOW_LIVE;
>
>          CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>              enum ukey_state ukey_state;
> @@ -2855,7 +2883,8 @@ revalidator_sweep__(struct revalidator *revalidator, 
> bool purge)
>                      COVERAGE_INC(revalidate_missed_dp_flow);
>                      memset(&stats, 0, sizeof stats);
>                      result = revalidate_ukey(udpif, ukey, &stats, 
> &odp_actions,
> -                                             reval_seq, &recircs, false);
> +                                             reval_seq, &recircs, false,
> +                                             &reason);
>                  }
>                  if (result != UKEY_KEEP) {
>                      /* Clears 'recircs' if filled by revalidate_ukey(). */
> diff --git a/utilities/automake.mk b/utilities/automake.mk
> index eb57653a1..e42950c49 100644
> --- a/utilities/automake.mk
> +++ b/utilities/automake.mk
> @@ -63,8 +63,10 @@ EXTRA_DIST += \
>       utilities/docker/debian/Dockerfile \
>       utilities/docker/debian/build-kernel-modules.sh \
>       utilities/usdt-scripts/bridge_loop.bt \
> +     utilities/usdt-scripts/filter_probe.py \
>       utilities/usdt-scripts/upcall_cost.py \
> -     utilities/usdt-scripts/upcall_monitor.py
> +     utilities/usdt-scripts/upcall_monitor.py \
> +     utilities/usdt-scripts/watch_flows.bt
>  MAN_ROOTS += \
>       utilities/ovs-testcontroller.8.in \
>       utilities/ovs-dpctl.8.in \
> @@ -133,6 +135,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \
>       utilities/ovs-check-dead-ifs.in \
>       utilities/ovs-tcpdump.in \
>       utilities/ovs-pipegen.py \
> +     utilities/usdt-scripts/filter_probe.py \
>       utilities/usdt-scripts/upcall_monitor.py \
>       utilities/usdt-scripts/upcall_cost.py
>
> diff --git a/utilities/usdt-scripts/filter_probe.py 
> b/utilities/usdt-scripts/filter_probe.py
> new file mode 100755
> index 000000000..dfd393736
> --- /dev/null
> +++ b/utilities/usdt-scripts/filter_probe.py
> @@ -0,0 +1,301 @@
> +#!/usr/bin/env python3

Missing help, the scripts in this directory should include details instructions 
on how to use them, and there purpose.
Please take a look at upcall_monitor.py and re-use it here.

> +from bcc import BPF
> +from bcc import USDT
> +from bcc import USDTException
> +
> +import argparse
> +import psutil
> +import struct
> +import sys
> +import time
> +
> +#
> +# eBPF source code
> +#
> +bpf_src = """
> +#include <linux/sched.h>
> +#include <linux/types.h>
> +#include <uapi/linux/ptrace.h>
> +
> +#define MAX_KEY     2048
> +#define FLOW_FILTER  <FILTER_BOOL>
> +
> +
> +enum probe { PUT, REVAL };
> +union u_ufid {
> +    u32 ufid32[4];
> +    u64 ufid64[2];
> +};


Use the same names for structures as in OVS as this makes understanding the 
code easier.

typedef union ovs_u128 ?

<CR>

> +struct netlink_attr {
> +    u16 len;
> +    u16 type;
> +};

100  struct nlattr {
101      uint16_t nla_len;
102      uint16_t nla_type;
103  };

<CR>

> +struct flow_put {

Use the same names for structures as in OVS as this makes understanding the 
code easier, dpif_flow_put

> +    int flags;
> +    u64 key_ptr;
> +    size_t key_len;
> +    u64 mask_ptr;
> +    size_t mask_len;
> +    u64 action_ptr;b
> +    size_t action_len;
> +    u64 ufid_loc;
> +};
> +struct ukey {

This struct is called “udpif_key” in OVS so maybe call it the same.

> +    u64 cmap_node; // ???????????????????? maybe?

What does this mean? cmap node is a struct with a pointer.

> +    u64 key_ptr;

This is a pointer, so void *key_ptr? You might need to fix this in all your 
structures. And make the verifier happy ;)


I stopped reviewing the python script, as this script needs a lot of cleanups, 
and in the current state, it makes no sense to add, as it does exactly the same 
as the .bt script below.

Or maybe I miss something that adding the help will clear up. Also for code 
styling (both C and Python) take a look at upcall_monitor.py and see what you 
can match.

> +    u64 key_len;
> +    u64 mask_ptr;
> +    u64 mask_len;
> +    union u_ufid ufid;
> +};
> +
> +struct event_t {
> +    u64 ts;
> +    u32 reason;
> +    u32 ufid[4];

As you have defined a union for this, why not use it here?

> +    u64 key_size;
> +    unsigned char key[MAX_KEY];
> +    enum probe probe;
> +};
> +
> +BPF_HASH(watchlist, union u_ufid);
> +BPF_RINGBUF_OUTPUT(events, <BUFFER_PAGE_COUNT>);
> +
> +int watch_reval(struct pt_regs *ctx) {
> +    u64 *ufid_present = NULL;
> +    struct ukey u;
> +    bpf_usdt_readarg_p(3, ctx, &u, sizeof(struct ukey));
> +    union u_ufid ufid = u.ufid;
> +    ufid_present = watchlist.lookup(&ufid);
> +    if(FLOW_FILTER && !ufid_present)
> +        return 0; // return, since this is not the droid we're looking for.
> +    struct event_t *data = events.ringbuf_reserve(sizeof(struct event_t));
> +    /* If we can't reserve the space we need for the ring buffer, return 1 */
> +    if(!data)
> +        return 1;
> +    data->probe = REVAL;
> +    data->ts = bpf_ktime_get_ns();
> +    bpf_probe_read(&data->ufid, sizeof(ufid), &ufid);
> +    bpf_usdt_readarg(1, ctx, &data->reason);
> +    events.ringbuf_submit(data, 0);
> +    return 0;
> +};
> +
> +
> +int watch_put(struct pt_regs *ctx) {
> +    struct event_t *data = events.ringbuf_reserve(sizeof(struct event_t));
> +    struct flow_put f;
> +    struct netlink_attr nla;
> +    union u_ufid ufid;
> +    if(!data)
> +        return 1;
> +    data->probe = PUT;
> +    data->ts = bpf_ktime_get_ns();
> +    bpf_usdt_readarg_p(2, ctx, &f, sizeof(struct flow_put));
> +    bpf_probe_read(&data->ufid, sizeof(data->ufid), (void *) f.ufid_loc);
> +    bpf_probe_read(&ufid, sizeof(ufid), &data->ufid); // maybe a better way?
> +    if (f.key_len > MAX_KEY) // verifier fails without this check.
> +        f.key_len = MAX_KEY;
> +    data->key_size = f.key_len;
> +    bpf_probe_read(&data->key, f.key_len, (void*)f.key_ptr);
> +    watchlist.increment(ufid);
> +    data->reason = 0;
> +    events.ringbuf_submit(data, 0);
> +    return 0;
> +};
> +"""
> +
> +
> +def format_ufid(ufid):
> +    result = "ufid:%08x-%04x-%04x-%04x-%04x%08x" \
> +             % (ufid[0], ufid[1] >> 16, ufid[1] & 0xffff,
> +                ufid[2] >> 16, ufid[2] & 0, ufid[3])
> +    return result
> +
> +
> +def find_and_delete_from_watchlist(event):
> +    for k, _ in b['watchlist'].items():
> +        key_ufid = struct.unpack("=IIII", k)
> +        if key_ufid == tuple(event.ufid):
> +            key = (b['watchlist'].Key * 1)(k)
> +            b['watchlist'].items_delete_batch(key)
> +            break
> +
> +
> +def handle_flow_put(event):
> +    if args.filter_flows is not None:
> +        key = decode_key(bytes(event.key)[:event.key_size])
> +        # for each attribute that we're watching
> +        for attr in target:
> +            # if that attribute isn't in our current key
> +            if attr not in key:
> +                # find and delete matching key
> +                find_and_delete_from_watchlist(event)
> +                return
> +    print("At time: {:<18.9f} a flow with ufid: {} was upcalled".
> +          format(event.ts / 1000000000, format_ufid(event.ufid)))
> +
> +
> +def print_expiration(event):
> +    ufid_str = format_ufid(event.ufid)
> +    reasons = ["flow timed out", "flow was too expensive",
> +               "flow wildcards", "bad odp fit", "associated ofproto",
> +               "translation error", "cache avoidance", "ERR"]
> +    print("At time: {:<18.9f} a flow with ufid: {} was deleted for reason: 
> {}".
> +          format(event.ts / 1000000000, ufid_str, reasons[event.reason - 1]))
> +
> +
> +def decode_key(msg, dump=True):
> +    dump = args.print_flow_keys
> +    bytes_left = len(msg)
> +    result = {}
> +    while bytes_left:
> +        if bytes_left < 4:
> +            if dump:
> +                print("{}WARN: decode truncated; cannot read header".format(
> +                    ' ' * 4))
> +            break
> +        nla_len, nla_type = struct.unpack("=HH", msg[:4])
> +        if nla_len < 4:
> +            if dump:
> +                print("{}WARN: decode truncated; nla_len < 4".format(' ' * 
> 4))
> +            break
> +        nla_data = msg[4:nla_len]
> +        trunc = ""
> +        if nla_len > bytes_left:
> +            trunc = "..."
> +            nla_data = nla_data[:(bytes_left - 4)]
> +        else:
> +            result[get_ovs_key_attr_str(nla_type)] = nla_data
> +        if dump:
> +            print("{}nla_len {}, nla_type {}[{}], data: {}{}".format(
> +                ' ' * 4, nla_len, get_ovs_key_attr_str(nla_type),
> +                nla_type,
> +                "".join("{:02x} ".format(b) for b in nla_data), trunc))
> +        if trunc != "":
> +            if dump:
> +                print("{}WARN: decode truncated; nla_len > msg_len[{}] ".
> +                      format(" " * 4, bytes_left))
> +            break
> +        next_offset = (nla_len + 3) & (~3)
> +        msg = msg[next_offset:]
> +        bytes_left -= next_offset
> +    return result
> +
> +
> +def get_ovs_key_attr_str(attr):
> +    ovs_key_attr = ["OVS_KEY_ATTR_UNSPEC",
> +                    "OVS_KEY_ATTR_ENCAP",
> +                    "OVS_KEY_ATTR_PRIORITY",
> +                    "OVS_KEY_ATTR_IN_PORT",
> +                    "OVS_KEY_ATTR_ETHERNET",
> +                    "OVS_KEY_ATTR_VLAN",
> +                    "OVS_KEY_ATTR_ETHERTYPE",
> +                    "OVS_KEY_ATTR_IPV4",
> +                    "OVS_KEY_ATTR_IPV6",
> +                    "OVS_KEY_ATTR_TCP",
> +                    "OVS_KEY_ATTR_UDP",
> +                    "OVS_KEY_ATTR_ICMP",
> +                    "OVS_KEY_ATTR_ICMPV6",
> +                    "OVS_KEY_ATTR_ARP",
> +                    "OVS_KEY_ATTR_ND",
> +                    "OVS_KEY_ATTR_SKB_MARK",
> +                    "OVS_KEY_ATTR_TUNNEL",
> +                    "OVS_KEY_ATTR_SCTP",
> +                    "OVS_KEY_ATTR_TCP_FLAGS",
> +                    "OVS_KEY_ATTR_DP_HASH",
> +                    "OVS_KEY_ATTR_RECIRC_ID",
> +                    "OVS_KEY_ATTR_MPLS",
> +                    "OVS_KEY_ATTR_CT_STATE",
> +                    "OVS_KEY_ATTR_CT_ZONE",
> +                    "OVS_KEY_ATTR_CT_MARK",
> +                    "OVS_KEY_ATTR_CT_LABELS",
> +                    "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4",
> +                    "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6",
> +                    "OVS_KEY_ATTR_NSH"]
> +    if attr < 0 or attr > len(ovs_key_attr):
> +        return "<UNKNOWN>"
> +    return ovs_key_attr[attr]
> +
> +
> +def handle_event(ctx, data, size):
> +    """Determine the probe event and what to do about it.
> +
> +    Once we grab the event, we have three cases.
> +    1. It's a revalidator probe and the reason is nonzero: A flow is expiring
> +    2. It's a revalidator probe and the reason is zero: flow revalidated
> +    3. It's a flow_put probe.
> +    """
> +    event = b["events"].event(data)
> +    if event.probe and event.reason:
> +        print_expiration(event)
> +    if not event.probe:
> +        handle_flow_put(event)
> +
> +
> +def main():
> +    global b
> +    global args
> +    # TODO(Kevin Sprague): Parser for user input flow attribute.
> +    global target
> +    target = ["OVS_KEY_ATTR_IPV4"]
> +    parser = argparse.ArgumentParser()
> +    parser.add_argument("--buffer-page-count",
> +                        help="Number of BPF ring buffer pages, default 1024",
> +                        type=int, default=1024, metavar="NUMBER")
> +    parser.add_argument("-k", "--print-flow-keys",
> +                        help="Print flow keys captured?",
> +                        type=bool, const=True, default=False, nargs="?")
> +    parser.add_argument("--pid", "-p", metavar="VSWITCHD_PID",
> +                        help="ovs-vswitchd's PID", type=int, default=None)
> +    parser.add_argument("-D", "--debug", help="debug eBPF",
> +                        type=int, const=0x3f, default=0, nargs="?")
> +    # right now, this is active if given a string, but it does nothing with
> +    # with the string. This should pass into a function that turns it to a 
> list
> +    # of attributes
> +    parser.add_argument("-f", "--filter-flows",
> +                        help="Filter flows based on conditions (to 
> implement)",
> +                        type=str, default=None, nargs="*")
> +    args = parser.parse_args()
> +    vswitch_pid = args.pid
> +    if vswitch_pid is None:
> +        for proc in psutil.process_iter():
> +            if "ovs-vswitchd" in proc.name():
> +                if vswitch_pid is not None:
> +                    print("Error: Multiple ovs-vswitchd daemons running. "
> +                          "Use the -p option to specify one to track.")
> +                    sys.exit(-1)
> +                vswitch_pid = proc.pid
> +    if vswitch_pid is None:
> +        print("Error: is ovs-vswitchd running?")
> +        sys.exit(-1)
> +    u = USDT(pid=int(vswitch_pid))
> +    try:
> +        u.enable_probe(probe="op_flow_put", fn_name="watch_put")
> +    except USDTException as e:
> +        print("Error attaching flow_put probe.")
> +        print(str(e))
> +        sys.exit(-1)
> +    try:
> +        u.enable_probe(probe="flow_results", fn_name="watch_reval")
> +    except USDTException as e:
> +        print("Error attaching revalidator_deletion probe.")
> +        print(str(e))
> +        sys.exit(-1)
> +    filter_bool = 1 if args.filter_flows is not None else 0
> +    source = bpf_src.replace("<BUFFER_PAGE_COUNT>",
> +                             str(args.buffer_page_count))
> +    source = source.replace("<FILTER_BOOL>", str(filter_bool))
> +    b = BPF(text=source, usdt_contexts=[u], debug=args.debug)
> +    b["events"].open_ring_buffer(handle_event)
> +    print("Watching for events")
> +    while 1:
> +        try:
> +            b.ring_buffer_poll()
> +            time.sleep(0.5)
> +        except KeyboardInterrupt:
> +            break
> +
> +
> +if __name__ == "__main__":
> +    main()
> diff --git a/utilities/usdt-scripts/watch_flows.bt 
> b/utilities/usdt-scripts/watch_flows.bt
> new file mode 100755
> index 000000000..cb6adad49
> --- /dev/null
> +++ b/utilities/usdt-scripts/watch_flows.bt
> @@ -0,0 +1,156 @@
> +#!/usr/bin/env bpftrace
> +/*
> +* usage: sudo bpftrace -p $(pidof ovs-vswitchd) ./watch_flows.bt
> +* The following traits can be watched for with the following arguments:
> +* "OVS_KEY_ATTR_UNSPEC",                     0
> +* "OVS_KEY_ATTR_ENCAP",                      1
> +* "OVS_KEY_ATTR_PRIORITY",                   2
> +* "OVS_KEY_ATTR_IN_PORT",                    3
> +* "OVS_KEY_ATTR_ETHERNET",                   4
> +* "OVS_KEY_ATTR_VLAN",                       5
> +* "OVS_KEY_ATTR_ETHERTYPE",                  6
> +* "OVS_KEY_ATTR_IPV4",                       7
> +* "OVS_KEY_ATTR_IPV6",                       8
> +* "OVS_KEY_ATTR_TCP",                        9
> +* "OVS_KEY_ATTR_UDP",                       10
> +* "OVS_KEY_ATTR_ICMP",                      11
> +* "OVS_KEY_ATTR_ICMPV6",                    12
> +* "OVS_KEY_ATTR_ARP",                       13
> +* "OVS_KEY_ATTR_ND",                        14
> +* "OVS_KEY_ATTR_SKB_MARK",                  15
> +* "OVS_KEY_ATTR_TUNNEL",                    16
> +* "OVS_KEY_ATTR_SCTP",                      17
> +* "OVS_KEY_ATTR_TCP_FLAGS",                 18
> +* "OVS_KEY_ATTR_DP_HASH",                   19
> +* "OVS_KEY_ATTR_RECIRC_ID",                 20
> +* "OVS_KEY_ATTR_MPLS",                      21
> +* "OVS_KEY_ATTR_CT_STATE",                  22
> +* "OVS_KEY_ATTR_CT_ZONE",                   23
> +* "OVS_KEY_ATTR_CT_MARK",                   24
> +* "OVS_KEY_ATTR_CT_LABELS",                 25
> +* "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4",        26
> +* "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6",        27
> +* "OVS_KEY_ATTR_NSH"                        28
> +* for example: sudo bpftrace -p $(pidof ovs-vswitchd) ./watch_flows.bt 9 will
> +* track all flows that have OVS_KEY_ATTR_TCP.
> +* In order to track all flows, do not give an argument after ./watch_flows.bt
> +*/

For someone not knowing what this script is supposed to do, the help is a bit 
scarce. What about adding the following, which is more in line with the 
existing scripts:


#!/usr/bin/env bpftrace
/*
 * Copyright (c) 2022 Red Hat, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at:
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 *
 * Script information:
 * -------------------
 * watch_flows.bt uses the USDTs probes to determine all flows put (sent)
 * the kernel, and filters out the ones containing a specific key attribute
 * (enum ovs_key_attr). For these entries, it will display the invalidate
 * reasons.
 *
 * The following is an example of how to use the script on the running
 * ovs-vswitchd process caputing all flow that have a OVS_KEY_ATTR_TCP
 * attribute key:
 *
 *   # ./watch_flows.bt -p `pgrep -n ovs-vswitchd` 9
 *   --------------------------------------------------------------
 *   |                   Tracking flow lifecycles                 |
 *   --------------------------------------------------------------
 *   Target traffic was spotted: ufid:208e36c4-9b28-4a5b-a4ed-d6585e18f6ca
 *   Target traffic was spotted: ufid:82108fe4-34a4-4758-b871-91f68c2c0ae8
 *   ufid:208e36c4-9b28-4a5b-a4ed-d6585e18f6ca was invalidated because it timed 
out.
 *   ufid:82108fe4-34a4-4758-b871-91f68c2c0ae8 was invalidated because it timed 
out.
 *
 * If you would like to trace all flow events, ommit the ovs_key_attr paramer.
 *
 * NOTE: The attribute values for the OVS_KET_ATTR_* can be found in
 *       include/linux/openvswitch.h
 *
 */


> +#include <linux/sched.h>
> +
> +#define MAX_ATTRS       32
> +#define MAX_KEY         2048
> +
> +union ufid {
> +    u32 ufid32[4];
> +    u64 ufid64[2];
> +}
> +
> +struct flowput {
> +    u32 flags;
> +    u64 key_ptr;
> +    u64 key_len;
> +    u64 mask_ptr;
> +    u64 mask_len;
> +    u64 action_ptr;
> +    u64 action_len;
> +    u64 ufid_loc;
> +}
> +
> +struct ukey {
> +    u64 cmap_node;
> +    u64 key_ptr;
> +    u64 key_len;
> +    u64 mask_ptr;
> +    u64 mask_len;
> +    union ufid ufid;
> +}
> +
> +struct attr {
> +    u16 len;
> +    u16 type;
> +}
> +
> +BEGIN
> +{
> +    
> printf("--------------------------------------------------------------\n");
> +    printf("|                   Tracking flow lifecycles                 
> |\n");
> +    
> printf("--------------------------------------------------------------\n");
> +}
> +
> +usdt::dpif_netlink_operate__:op_flow_put
> +{
> +    /*
> +     * read the flow_put operation and get the ufid from it.

Capital R for Read.

> +     * If the trait that we're watching is in the flow key, then

I think you should change trait here to attribute.

> +     * increment the watchlist entry with a key equal to the ufid.
> +     */
> +    $ptr = (struct flowput *) arg1;
> +    $pArr = (union ufid *) $ptr->ufid_loc;
> +    $num_attrs = (uint64) 0;
> +    $key_posn = (uint64) 0;
> +    while($num_attrs < MAX_ATTRS && $key_posn < MAX_KEY) {
> +        if((uint64) $key_posn >=  $ptr->key_len) {
> +            break;
> +        }
> +        $pAttr = (struct attr *) ((uint8*) $ptr->key_ptr + $key_posn);
> +        if ($# == 0) {
> +            printf("Now watching flow:\t");
> +            printf("ufid:%08x-%04x-%04x-%04x-%04x%08x\n",$pArr->ufid32[0],
> +                   $pArr->ufid32[1] >> 16, $pArr->ufid32[1] & 0xffff,
> +                   $pArr->ufid32[2] >> 16, $pArr->ufid32[2] & 0xffff,
> +                   $pArr->ufid32[3]);
> +            @watchlist[$pArr->ufid64[0],$pArr->ufid64[1]]++;
> +            break;
> +        } else if ($# > 0 && $pAttr->type == $1) {
> +            printf("Target traffic was spotted.\t");

Keep the format the same as before, something like:

    printf("Target traffic was spotted: \t”);

I would also remove the \t and just add a single space to save on screen real 
estate (same for the previous “Now watching flow:”).

> +            printf("ufid:%08x-%04x-%04x-%04x-%04x%08x\n",$pArr->ufid32[0],
> +                   $pArr->ufid32[1] >> 16, $pArr->ufid32[1] & 0xffff,
> +                   $pArr->ufid32[2] >> 16, $pArr->ufid32[2] & 0xffff,
> +                   $pArr->ufid32[3]);
> +                @watchlist[$pArr->ufid64[0],$pArr->ufid64[1]]++;
> +                break;
> +        }

We can de-duplicate traffic here, i.e. do we really need two different 
messages? I would assume the user know if he adds a filter option?
Something like (not tested):

if ($# == 0 || ($# > 0 && $pAttr->type == $1)) {
    printf("Now watching flow: ");
    printf("ufid:%08x-%04x-%04x-%04x-%04x%08x\n",$pArr->ufid32[0],
           $pArr->ufid32[1] >> 16, $pArr->ufid32[1] & 0xffff,
           $pArr->ufid32[2] >> 16, $pArr->ufid32[2] & 0xffff,
           $pArr->ufid32[3]);
    @watchlist[$pArr->ufid64[0],$pArr->ufid64[1]]++;
    break;
}

> +        $num_attrs++;
> +        $key_posn = ($key_posn + $pAttr->len + 3) & (0xffffff ^ 3);
> +    }
> +}
> +
> +usdt::revalidate:flow_results
> +{
> +    if (arg0 == 0) {
> +        return;
> +    }
> +    $ukey = (struct ukey *) arg2;
> +    if(!@watchlist[$ukey->ufid.ufid64[0],$ukey->ufid.ufid64[1]]) {
> +        return;
> +    }

As the flow now is cleaned up, do we also want to remove if from our @watchlist?

> +    printf("ufid:%08x-%04x-%04x-%04x-%04x%08x was invalidated because ",
> +           $ukey->ufid.ufid32[0],
> +           $ukey->ufid.ufid32[1] >> 16, $ukey->ufid.ufid32[1] & 0xffff,
> +           $ukey->ufid.ufid32[2] >> 16, $ukey->ufid.ufid32[2] & 0xffff,
> +           $ukey->ufid.ufid32[3]);
> +    if (arg0 == 1)  {
> +        printf("it timed out.\n");
> +    } else if (arg0 == 2) {
> +        printf("it was too expensive to revalidate.\n");
> +    } else if (arg0 == 3) {
> +        printf("there was a change in the openflow wildcards.\n");
> +    } else if (arg0 == 4) {
> +        printf("the odp fit was bad.\n");
> +    } else if (arg0 == 5) {
> +        printf("associated ofproto.\n");
> +    } else if (arg0 == 6) {
> +        printf("there was an error translating the openflow rule to ovs.\n");
> +    } else if (arg0 == 7) {
> +        printf("avoiding caching.\n");
> +    } else {
> +        printf("The value of arg0 is %d\n",arg0);

This does not create a good-looking sentence when issued.
Maybe something like:
   printf(“of an unknown reason, %d\n",arg0);

> +    }
> +}
> +
> +END
> +{
> +    clear(@watchlist);
> +    printf("\n");
> +}
> -- 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFCv2] revalidator: add a USDT probe after evaluation when flows are deleted.

Reply via email to