On 11 Jul 2022, at 18:01, Kevin Sprague wrote:
> During normal operations, it is useful to understand when a particular flow > gets removed from the system. This can be useful when debugging performance > issues tied to ofproto flow changes, trying to determine deployed traffic > patterns, or while debugging dynamic systems where ports come and go. > > Prior to this change, there was a lack of visibility around flow expiration. > The existing debugging infrastructure could tell us when a flow was added to > the datapath, but not when it was removed or why. > > This change introduces a USDT probe at the point where the revalidator > determines that the flow should be removed. Additionally, we track the > reason for the flow eviction and provide that information as well. With > this change, we can track the complete flow lifecycle for the netlink datapath > by hooking the upcall tracepoint in kernel, the flow put USDT, and the > revaldiator USDT, letting us watch as flows are added and removed from the > kernel datapath. > > This change only enables this information via USDT probe, so it won't be > possible to access this information any other way (see: > Documentation/topics/usdt-probes.rst). > > Also included are two scripts (utilities/usdt-scripts/filter_probe.py and > utilities/usdt-scripts/watch_flows.bt) that serve as demonstrations of how > the new USDT probes might be used going forward. > > Signed-off-by: Kevin Sprague <[email protected]> See my comments below, and to continue I would fix the comments and remove the python script (as it seems to add no additional value in the current state). And submit this patch as a non-RFC version. Cheers, Eelco > --- > Documentation/topics/usdt-probes.rst | 20 ++ > ofproto/ofproto-dpif-upcall.c | 43 +++- > utilities/automake.mk | 5 +- > utilities/usdt-scripts/filter_probe.py | 301 +++++++++++++++++++++++++ > utilities/usdt-scripts/watch_flows.bt | 156 +++++++++++++ > 5 files changed, 517 insertions(+), 8 deletions(-) > create mode 100755 utilities/usdt-scripts/filter_probe.py > create mode 100755 utilities/usdt-scripts/watch_flows.bt > > diff --git a/Documentation/topics/usdt-probes.rst > b/Documentation/topics/usdt-probes.rst > index 7ce19aaed..ef4764684 100644 > --- a/Documentation/topics/usdt-probes.rst > +++ b/Documentation/topics/usdt-probes.rst > @@ -212,6 +212,7 @@ Available probes in ``ovs_vswitchd``: > - dpif_netlink_operate\_\_:op_flow_get > - dpif_netlink_operate\_\_:op_flow_put > - dpif_recv:recv_upcall > +- revalidate:flow_results Please keep the list in alphabetical order. > - main:poll_block > - main:run_start > > @@ -322,6 +323,25 @@ sent to ``ovs-vswitchd``. > - ``utilities/usdt-scripts/upcall_cost.py`` > - ``utilities/usdt-scripts/upcall_monitor.py`` > > +probe revalidate:flow_results I think the name should be flow_result, as it’s the result for a single flow. > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Same here, add this section in alphabetical order of probe names. > + > +**Description**: > +This probe is triggered when the revalidator decides whether or not to > +revalidate a flow. ``reason`` is an enum that denotes that either the flow > +is being kept, or the reason why the flow is being deleted. The > +``filter_probe.py`` and ``watch_flows.bt`` scripts use this probe to notify > +users when flows matching user-provided criteria (to implement) are deleted. Remove the (to implement) section, as we can currently filter on the key attribute type. > +**Arguments**: > + > +- *arg0*: ``(enum) reason`` Include the enum type, so ``(enum flow_del_reason) reason`` > +- *arg1*: ``(struct udpif *) udpif`` > +- *arg2*: ``(struct udpif_key *) ukey`` > + > +**Script references**: > + > +- ``utilities/usdt-scripts/filter_probe.py`` > +- ``utilities/usdt-scripts/watch_flow.bt`` > > probe main:run_start > ~~~~~~~~~~~~~~~~~~~~ > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 57f94df54..42df2bbfd 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -31,6 +31,7 @@ > #include "openvswitch/list.h" > #include "netlink.h" > #include "openvswitch/ofpbuf.h" > +#include "openvswitch/usdt-probes.h" > #include "ofproto-dpif-ipfix.h" > #include "ofproto-dpif-sflow.h" > #include "ofproto-dpif-xlate.h" > @@ -260,6 +261,17 @@ enum ukey_state { > }; > #define N_UKEY_STATES (UKEY_DELETED + 1) > > +enum flow_del_reason { > + FLOW_LIVE = 0, > + FLOW_TIME_OUT, /* the flow went unused and was deleted. */ > + TOO_EXPENSIVE, > + FLOW_WILDCARDED, > + BAD_ODP_FIT, > + ASSOCIATED_OFPROTO, > + XLATION_ERROR, > + AVOID_CACHING, > +}; In OVS enum values seem to get some common denominator. So in your case, I would add FDR_ to all of the enums, so they are unique. FDR_FLOW_LIVE = 0, FDR_FLOW_TIME_OUT, /* the flow went unused and was deleted. */ FDR_TOO_EXPENSIVE, etc., etc. > + > /* 'udpif_key's are responsible for tracking the little bit of state udpif > * needs to do flow expiration which can't be pulled directly from the > * datapath. They may be created by any handler or revalidator thread at any > @@ -2202,7 +2214,8 @@ populate_xcache(struct udpif *udpif, struct udpif_key > *ukey, > static enum reval_result > revalidate_ukey__(struct udpif *udpif, const struct udpif_key *ukey, > uint16_t tcp_flags, struct ofpbuf *odp_actions, > - struct recirc_refs *recircs, struct xlate_cache *xcache) > + struct recirc_refs *recircs, struct xlate_cache *xcache, > + enum flow_del_reason *reason) > { > struct xlate_out *xoutp; > struct netflow *netflow; > @@ -2215,16 +2228,20 @@ revalidate_ukey__(struct udpif *udpif, const struct > udpif_key *ukey, > .wc = &wc, > }; > > + int error; Add a CR here, and remove it above the definition, so all definitions are grouped. > result = UKEY_DELETE; > xoutp = NULL; > netflow = NULL; > > - if (xlate_ukey(udpif, ukey, tcp_flags, &ctx)) { > + error = xlate_ukey(udpif, ukey, tcp_flags, &ctx); > + if (error) { Actually, is error really needed here? Can we keep the original code? > + *reason = XLATION_ERROR; > goto exit; > } > xoutp = &ctx.xout; > > if (xoutp->avoid_caching) { > + *reason = AVOID_CACHING; > goto exit; > } > > @@ -2238,6 +2255,7 @@ revalidate_ukey__(struct udpif *udpif, const struct > udpif_key *ukey, > ofpbuf_clear(odp_actions); > > if (!ofproto) { > + *reason = ASSOCIATED_OFPROTO; > goto exit; > } > > @@ -2249,6 +2267,7 @@ revalidate_ukey__(struct udpif *udpif, const struct > udpif_key *ukey, > if (odp_flow_key_to_mask(ukey->mask, ukey->mask_len, &dp_mask, &ctx.flow, > NULL) > == ODP_FIT_ERROR) { > + *reason = BAD_ODP_FIT; > goto exit; > } > > @@ -2258,6 +2277,7 @@ revalidate_ukey__(struct udpif *udpif, const struct > udpif_key *ukey, > * down. Note that we do not know if the datapath has ignored any of the > * wildcarded bits, so we may be overly conservative here. */ > if (flow_wildcards_has_extra(&dp_mask, ctx.wc)) { > + *reason = FLOW_WILDCARDED; > goto exit; > } > > @@ -2303,7 +2323,8 @@ static enum reval_result > revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, > const struct dpif_flow_stats *stats, > struct ofpbuf *odp_actions, uint64_t reval_seq, > - struct recirc_refs *recircs, bool offloaded) > + struct recirc_refs *recircs, bool offloaded, > + enum flow_del_reason *reason) > OVS_REQUIRES(ukey->mutex) > { > bool need_revalidate = ukey->reval_seq != reval_seq; > @@ -2329,8 +2350,11 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key > *ukey, > xlate_cache_clear(ukey->xcache); > } > result = revalidate_ukey__(udpif, ukey, push.tcp_flags, > - odp_actions, recircs, ukey->xcache); > - } /* else delete; too expensive to revalidate */ > + odp_actions, recircs, ukey->xcache, > + reason); > + } /* else delete; too expensive to revalidate */ else { > + *reason = TOO_EXPENSIVE; > + } This is ugly, can we move the comment to the else clause? } else { /* Delete flow as it's too expensive to revalidate. */ *reason = TOO_EXPENSIVE; } > } else if (!push.n_packets || ukey->xcache > || !populate_xcache(udpif, ukey, push.tcp_flags)) { > result = UKEY_KEEP; > @@ -2720,6 +2744,7 @@ revalidate(struct revalidator *revalidator) > struct recirc_refs recircs = RECIRC_REFS_EMPTY_INITIALIZER; > struct dpif_flow_stats stats = f->stats; > enum reval_result result; > + enum flow_del_reason reason = FLOW_LIVE; > struct udpif_key *ukey; > bool already_dumped; > int error; > @@ -2767,10 +2792,11 @@ revalidate(struct revalidator *revalidator) > } > if (kill_them_all || (used && used < now - max_idle)) { > result = UKEY_DELETE; > + reason = FLOW_TIME_OUT; > } else { > result = revalidate_ukey(udpif, ukey, &stats, &odp_actions, > reval_seq, &recircs, > - f->attrs.offloaded); > + f->attrs.offloaded, &reason); > } > ukey->dump_seq = dump_seq; > > @@ -2779,6 +2805,7 @@ revalidate(struct revalidator *revalidator) > udpif_update_flow_pps(udpif, ukey, f); > } > > + OVS_USDT_PROBE(revalidate, flow_results, reason, udpif, ukey); The name should be “flow_result” > if (result != UKEY_KEEP) { > /* Takes ownership of 'recircs'. */ > reval_op_init(&ops[n_ops++], result, udpif, ukey, &recircs, > @@ -2829,6 +2856,7 @@ revalidator_sweep__(struct revalidator *revalidator, > bool purge) > struct udpif_key *ukey; > struct umap *umap = &udpif->ukeys[i]; > size_t n_ops = 0; > + enum flow_del_reason reason = FLOW_LIVE; > > CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) { > enum ukey_state ukey_state; > @@ -2855,7 +2883,8 @@ revalidator_sweep__(struct revalidator *revalidator, > bool purge) > COVERAGE_INC(revalidate_missed_dp_flow); > memset(&stats, 0, sizeof stats); > result = revalidate_ukey(udpif, ukey, &stats, > &odp_actions, > - reval_seq, &recircs, false); > + reval_seq, &recircs, false, > + &reason); > } > if (result != UKEY_KEEP) { > /* Clears 'recircs' if filled by revalidate_ukey(). */ > diff --git a/utilities/automake.mk b/utilities/automake.mk > index eb57653a1..e42950c49 100644 > --- a/utilities/automake.mk > +++ b/utilities/automake.mk > @@ -63,8 +63,10 @@ EXTRA_DIST += \ > utilities/docker/debian/Dockerfile \ > utilities/docker/debian/build-kernel-modules.sh \ > utilities/usdt-scripts/bridge_loop.bt \ > + utilities/usdt-scripts/filter_probe.py \ > utilities/usdt-scripts/upcall_cost.py \ > - utilities/usdt-scripts/upcall_monitor.py > + utilities/usdt-scripts/upcall_monitor.py \ > + utilities/usdt-scripts/watch_flows.bt > MAN_ROOTS += \ > utilities/ovs-testcontroller.8.in \ > utilities/ovs-dpctl.8.in \ > @@ -133,6 +135,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \ > utilities/ovs-check-dead-ifs.in \ > utilities/ovs-tcpdump.in \ > utilities/ovs-pipegen.py \ > + utilities/usdt-scripts/filter_probe.py \ > utilities/usdt-scripts/upcall_monitor.py \ > utilities/usdt-scripts/upcall_cost.py > > diff --git a/utilities/usdt-scripts/filter_probe.py > b/utilities/usdt-scripts/filter_probe.py > new file mode 100755 > index 000000000..dfd393736 > --- /dev/null > +++ b/utilities/usdt-scripts/filter_probe.py > @@ -0,0 +1,301 @@ > +#!/usr/bin/env python3 Missing help, the scripts in this directory should include details instructions on how to use them, and there purpose. Please take a look at upcall_monitor.py and re-use it here. > +from bcc import BPF > +from bcc import USDT > +from bcc import USDTException > + > +import argparse > +import psutil > +import struct > +import sys > +import time > + > +# > +# eBPF source code > +# > +bpf_src = """ > +#include <linux/sched.h> > +#include <linux/types.h> > +#include <uapi/linux/ptrace.h> > + > +#define MAX_KEY 2048 > +#define FLOW_FILTER <FILTER_BOOL> > + > + > +enum probe { PUT, REVAL }; > +union u_ufid { > + u32 ufid32[4]; > + u64 ufid64[2]; > +}; Use the same names for structures as in OVS as this makes understanding the code easier. typedef union ovs_u128 ? <CR> > +struct netlink_attr { > + u16 len; > + u16 type; > +}; 100 struct nlattr { 101 uint16_t nla_len; 102 uint16_t nla_type; 103 }; <CR> > +struct flow_put { Use the same names for structures as in OVS as this makes understanding the code easier, dpif_flow_put > + int flags; > + u64 key_ptr; > + size_t key_len; > + u64 mask_ptr; > + size_t mask_len; > + u64 action_ptr;b > + size_t action_len; > + u64 ufid_loc; > +}; > +struct ukey { This struct is called “udpif_key” in OVS so maybe call it the same. > + u64 cmap_node; // ???????????????????? maybe? What does this mean? cmap node is a struct with a pointer. > + u64 key_ptr; This is a pointer, so void *key_ptr? You might need to fix this in all your structures. And make the verifier happy ;) I stopped reviewing the python script, as this script needs a lot of cleanups, and in the current state, it makes no sense to add, as it does exactly the same as the .bt script below. Or maybe I miss something that adding the help will clear up. Also for code styling (both C and Python) take a look at upcall_monitor.py and see what you can match. > + u64 key_len; > + u64 mask_ptr; > + u64 mask_len; > + union u_ufid ufid; > +}; > + > +struct event_t { > + u64 ts; > + u32 reason; > + u32 ufid[4]; As you have defined a union for this, why not use it here? > + u64 key_size; > + unsigned char key[MAX_KEY]; > + enum probe probe; > +}; > + > +BPF_HASH(watchlist, union u_ufid); > +BPF_RINGBUF_OUTPUT(events, <BUFFER_PAGE_COUNT>); > + > +int watch_reval(struct pt_regs *ctx) { > + u64 *ufid_present = NULL; > + struct ukey u; > + bpf_usdt_readarg_p(3, ctx, &u, sizeof(struct ukey)); > + union u_ufid ufid = u.ufid; > + ufid_present = watchlist.lookup(&ufid); > + if(FLOW_FILTER && !ufid_present) > + return 0; // return, since this is not the droid we're looking for. > + struct event_t *data = events.ringbuf_reserve(sizeof(struct event_t)); > + /* If we can't reserve the space we need for the ring buffer, return 1 */ > + if(!data) > + return 1; > + data->probe = REVAL; > + data->ts = bpf_ktime_get_ns(); > + bpf_probe_read(&data->ufid, sizeof(ufid), &ufid); > + bpf_usdt_readarg(1, ctx, &data->reason); > + events.ringbuf_submit(data, 0); > + return 0; > +}; > + > + > +int watch_put(struct pt_regs *ctx) { > + struct event_t *data = events.ringbuf_reserve(sizeof(struct event_t)); > + struct flow_put f; > + struct netlink_attr nla; > + union u_ufid ufid; > + if(!data) > + return 1; > + data->probe = PUT; > + data->ts = bpf_ktime_get_ns(); > + bpf_usdt_readarg_p(2, ctx, &f, sizeof(struct flow_put)); > + bpf_probe_read(&data->ufid, sizeof(data->ufid), (void *) f.ufid_loc); > + bpf_probe_read(&ufid, sizeof(ufid), &data->ufid); // maybe a better way? > + if (f.key_len > MAX_KEY) // verifier fails without this check. > + f.key_len = MAX_KEY; > + data->key_size = f.key_len; > + bpf_probe_read(&data->key, f.key_len, (void*)f.key_ptr); > + watchlist.increment(ufid); > + data->reason = 0; > + events.ringbuf_submit(data, 0); > + return 0; > +}; > +""" > + > + > +def format_ufid(ufid): > + result = "ufid:%08x-%04x-%04x-%04x-%04x%08x" \ > + % (ufid[0], ufid[1] >> 16, ufid[1] & 0xffff, > + ufid[2] >> 16, ufid[2] & 0, ufid[3]) > + return result > + > + > +def find_and_delete_from_watchlist(event): > + for k, _ in b['watchlist'].items(): > + key_ufid = struct.unpack("=IIII", k) > + if key_ufid == tuple(event.ufid): > + key = (b['watchlist'].Key * 1)(k) > + b['watchlist'].items_delete_batch(key) > + break > + > + > +def handle_flow_put(event): > + if args.filter_flows is not None: > + key = decode_key(bytes(event.key)[:event.key_size]) > + # for each attribute that we're watching > + for attr in target: > + # if that attribute isn't in our current key > + if attr not in key: > + # find and delete matching key > + find_and_delete_from_watchlist(event) > + return > + print("At time: {:<18.9f} a flow with ufid: {} was upcalled". > + format(event.ts / 1000000000, format_ufid(event.ufid))) > + > + > +def print_expiration(event): > + ufid_str = format_ufid(event.ufid) > + reasons = ["flow timed out", "flow was too expensive", > + "flow wildcards", "bad odp fit", "associated ofproto", > + "translation error", "cache avoidance", "ERR"] > + print("At time: {:<18.9f} a flow with ufid: {} was deleted for reason: > {}". > + format(event.ts / 1000000000, ufid_str, reasons[event.reason - 1])) > + > + > +def decode_key(msg, dump=True): > + dump = args.print_flow_keys > + bytes_left = len(msg) > + result = {} > + while bytes_left: > + if bytes_left < 4: > + if dump: > + print("{}WARN: decode truncated; cannot read header".format( > + ' ' * 4)) > + break > + nla_len, nla_type = struct.unpack("=HH", msg[:4]) > + if nla_len < 4: > + if dump: > + print("{}WARN: decode truncated; nla_len < 4".format(' ' * > 4)) > + break > + nla_data = msg[4:nla_len] > + trunc = "" > + if nla_len > bytes_left: > + trunc = "..." > + nla_data = nla_data[:(bytes_left - 4)] > + else: > + result[get_ovs_key_attr_str(nla_type)] = nla_data > + if dump: > + print("{}nla_len {}, nla_type {}[{}], data: {}{}".format( > + ' ' * 4, nla_len, get_ovs_key_attr_str(nla_type), > + nla_type, > + "".join("{:02x} ".format(b) for b in nla_data), trunc)) > + if trunc != "": > + if dump: > + print("{}WARN: decode truncated; nla_len > msg_len[{}] ". > + format(" " * 4, bytes_left)) > + break > + next_offset = (nla_len + 3) & (~3) > + msg = msg[next_offset:] > + bytes_left -= next_offset > + return result > + > + > +def get_ovs_key_attr_str(attr): > + ovs_key_attr = ["OVS_KEY_ATTR_UNSPEC", > + "OVS_KEY_ATTR_ENCAP", > + "OVS_KEY_ATTR_PRIORITY", > + "OVS_KEY_ATTR_IN_PORT", > + "OVS_KEY_ATTR_ETHERNET", > + "OVS_KEY_ATTR_VLAN", > + "OVS_KEY_ATTR_ETHERTYPE", > + "OVS_KEY_ATTR_IPV4", > + "OVS_KEY_ATTR_IPV6", > + "OVS_KEY_ATTR_TCP", > + "OVS_KEY_ATTR_UDP", > + "OVS_KEY_ATTR_ICMP", > + "OVS_KEY_ATTR_ICMPV6", > + "OVS_KEY_ATTR_ARP", > + "OVS_KEY_ATTR_ND", > + "OVS_KEY_ATTR_SKB_MARK", > + "OVS_KEY_ATTR_TUNNEL", > + "OVS_KEY_ATTR_SCTP", > + "OVS_KEY_ATTR_TCP_FLAGS", > + "OVS_KEY_ATTR_DP_HASH", > + "OVS_KEY_ATTR_RECIRC_ID", > + "OVS_KEY_ATTR_MPLS", > + "OVS_KEY_ATTR_CT_STATE", > + "OVS_KEY_ATTR_CT_ZONE", > + "OVS_KEY_ATTR_CT_MARK", > + "OVS_KEY_ATTR_CT_LABELS", > + "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4", > + "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6", > + "OVS_KEY_ATTR_NSH"] > + if attr < 0 or attr > len(ovs_key_attr): > + return "<UNKNOWN>" > + return ovs_key_attr[attr] > + > + > +def handle_event(ctx, data, size): > + """Determine the probe event and what to do about it. > + > + Once we grab the event, we have three cases. > + 1. It's a revalidator probe and the reason is nonzero: A flow is expiring > + 2. It's a revalidator probe and the reason is zero: flow revalidated > + 3. It's a flow_put probe. > + """ > + event = b["events"].event(data) > + if event.probe and event.reason: > + print_expiration(event) > + if not event.probe: > + handle_flow_put(event) > + > + > +def main(): > + global b > + global args > + # TODO(Kevin Sprague): Parser for user input flow attribute. > + global target > + target = ["OVS_KEY_ATTR_IPV4"] > + parser = argparse.ArgumentParser() > + parser.add_argument("--buffer-page-count", > + help="Number of BPF ring buffer pages, default 1024", > + type=int, default=1024, metavar="NUMBER") > + parser.add_argument("-k", "--print-flow-keys", > + help="Print flow keys captured?", > + type=bool, const=True, default=False, nargs="?") > + parser.add_argument("--pid", "-p", metavar="VSWITCHD_PID", > + help="ovs-vswitchd's PID", type=int, default=None) > + parser.add_argument("-D", "--debug", help="debug eBPF", > + type=int, const=0x3f, default=0, nargs="?") > + # right now, this is active if given a string, but it does nothing with > + # with the string. This should pass into a function that turns it to a > list > + # of attributes > + parser.add_argument("-f", "--filter-flows", > + help="Filter flows based on conditions (to > implement)", > + type=str, default=None, nargs="*") > + args = parser.parse_args() > + vswitch_pid = args.pid > + if vswitch_pid is None: > + for proc in psutil.process_iter(): > + if "ovs-vswitchd" in proc.name(): > + if vswitch_pid is not None: > + print("Error: Multiple ovs-vswitchd daemons running. " > + "Use the -p option to specify one to track.") > + sys.exit(-1) > + vswitch_pid = proc.pid > + if vswitch_pid is None: > + print("Error: is ovs-vswitchd running?") > + sys.exit(-1) > + u = USDT(pid=int(vswitch_pid)) > + try: > + u.enable_probe(probe="op_flow_put", fn_name="watch_put") > + except USDTException as e: > + print("Error attaching flow_put probe.") > + print(str(e)) > + sys.exit(-1) > + try: > + u.enable_probe(probe="flow_results", fn_name="watch_reval") > + except USDTException as e: > + print("Error attaching revalidator_deletion probe.") > + print(str(e)) > + sys.exit(-1) > + filter_bool = 1 if args.filter_flows is not None else 0 > + source = bpf_src.replace("<BUFFER_PAGE_COUNT>", > + str(args.buffer_page_count)) > + source = source.replace("<FILTER_BOOL>", str(filter_bool)) > + b = BPF(text=source, usdt_contexts=[u], debug=args.debug) > + b["events"].open_ring_buffer(handle_event) > + print("Watching for events") > + while 1: > + try: > + b.ring_buffer_poll() > + time.sleep(0.5) > + except KeyboardInterrupt: > + break > + > + > +if __name__ == "__main__": > + main() > diff --git a/utilities/usdt-scripts/watch_flows.bt > b/utilities/usdt-scripts/watch_flows.bt > new file mode 100755 > index 000000000..cb6adad49 > --- /dev/null > +++ b/utilities/usdt-scripts/watch_flows.bt > @@ -0,0 +1,156 @@ > +#!/usr/bin/env bpftrace > +/* > +* usage: sudo bpftrace -p $(pidof ovs-vswitchd) ./watch_flows.bt > +* The following traits can be watched for with the following arguments: > +* "OVS_KEY_ATTR_UNSPEC", 0 > +* "OVS_KEY_ATTR_ENCAP", 1 > +* "OVS_KEY_ATTR_PRIORITY", 2 > +* "OVS_KEY_ATTR_IN_PORT", 3 > +* "OVS_KEY_ATTR_ETHERNET", 4 > +* "OVS_KEY_ATTR_VLAN", 5 > +* "OVS_KEY_ATTR_ETHERTYPE", 6 > +* "OVS_KEY_ATTR_IPV4", 7 > +* "OVS_KEY_ATTR_IPV6", 8 > +* "OVS_KEY_ATTR_TCP", 9 > +* "OVS_KEY_ATTR_UDP", 10 > +* "OVS_KEY_ATTR_ICMP", 11 > +* "OVS_KEY_ATTR_ICMPV6", 12 > +* "OVS_KEY_ATTR_ARP", 13 > +* "OVS_KEY_ATTR_ND", 14 > +* "OVS_KEY_ATTR_SKB_MARK", 15 > +* "OVS_KEY_ATTR_TUNNEL", 16 > +* "OVS_KEY_ATTR_SCTP", 17 > +* "OVS_KEY_ATTR_TCP_FLAGS", 18 > +* "OVS_KEY_ATTR_DP_HASH", 19 > +* "OVS_KEY_ATTR_RECIRC_ID", 20 > +* "OVS_KEY_ATTR_MPLS", 21 > +* "OVS_KEY_ATTR_CT_STATE", 22 > +* "OVS_KEY_ATTR_CT_ZONE", 23 > +* "OVS_KEY_ATTR_CT_MARK", 24 > +* "OVS_KEY_ATTR_CT_LABELS", 25 > +* "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4", 26 > +* "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6", 27 > +* "OVS_KEY_ATTR_NSH" 28 > +* for example: sudo bpftrace -p $(pidof ovs-vswitchd) ./watch_flows.bt 9 will > +* track all flows that have OVS_KEY_ATTR_TCP. > +* In order to track all flows, do not give an argument after ./watch_flows.bt > +*/ For someone not knowing what this script is supposed to do, the help is a bit scarce. What about adding the following, which is more in line with the existing scripts: #!/usr/bin/env bpftrace /* * Copyright (c) 2022 Red Hat, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at: * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. * * * Script information: * ------------------- * watch_flows.bt uses the USDTs probes to determine all flows put (sent) * the kernel, and filters out the ones containing a specific key attribute * (enum ovs_key_attr). For these entries, it will display the invalidate * reasons. * * The following is an example of how to use the script on the running * ovs-vswitchd process caputing all flow that have a OVS_KEY_ATTR_TCP * attribute key: * * # ./watch_flows.bt -p `pgrep -n ovs-vswitchd` 9 * -------------------------------------------------------------- * | Tracking flow lifecycles | * -------------------------------------------------------------- * Target traffic was spotted: ufid:208e36c4-9b28-4a5b-a4ed-d6585e18f6ca * Target traffic was spotted: ufid:82108fe4-34a4-4758-b871-91f68c2c0ae8 * ufid:208e36c4-9b28-4a5b-a4ed-d6585e18f6ca was invalidated because it timed out. * ufid:82108fe4-34a4-4758-b871-91f68c2c0ae8 was invalidated because it timed out. * * If you would like to trace all flow events, ommit the ovs_key_attr paramer. * * NOTE: The attribute values for the OVS_KET_ATTR_* can be found in * include/linux/openvswitch.h * */ > +#include <linux/sched.h> > + > +#define MAX_ATTRS 32 > +#define MAX_KEY 2048 > + > +union ufid { > + u32 ufid32[4]; > + u64 ufid64[2]; > +} > + > +struct flowput { > + u32 flags; > + u64 key_ptr; > + u64 key_len; > + u64 mask_ptr; > + u64 mask_len; > + u64 action_ptr; > + u64 action_len; > + u64 ufid_loc; > +} > + > +struct ukey { > + u64 cmap_node; > + u64 key_ptr; > + u64 key_len; > + u64 mask_ptr; > + u64 mask_len; > + union ufid ufid; > +} > + > +struct attr { > + u16 len; > + u16 type; > +} > + > +BEGIN > +{ > + > printf("--------------------------------------------------------------\n"); > + printf("| Tracking flow lifecycles > |\n"); > + > printf("--------------------------------------------------------------\n"); > +} > + > +usdt::dpif_netlink_operate__:op_flow_put > +{ > + /* > + * read the flow_put operation and get the ufid from it. Capital R for Read. > + * If the trait that we're watching is in the flow key, then I think you should change trait here to attribute. > + * increment the watchlist entry with a key equal to the ufid. > + */ > + $ptr = (struct flowput *) arg1; > + $pArr = (union ufid *) $ptr->ufid_loc; > + $num_attrs = (uint64) 0; > + $key_posn = (uint64) 0; > + while($num_attrs < MAX_ATTRS && $key_posn < MAX_KEY) { > + if((uint64) $key_posn >= $ptr->key_len) { > + break; > + } > + $pAttr = (struct attr *) ((uint8*) $ptr->key_ptr + $key_posn); > + if ($# == 0) { > + printf("Now watching flow:\t"); > + printf("ufid:%08x-%04x-%04x-%04x-%04x%08x\n",$pArr->ufid32[0], > + $pArr->ufid32[1] >> 16, $pArr->ufid32[1] & 0xffff, > + $pArr->ufid32[2] >> 16, $pArr->ufid32[2] & 0xffff, > + $pArr->ufid32[3]); > + @watchlist[$pArr->ufid64[0],$pArr->ufid64[1]]++; > + break; > + } else if ($# > 0 && $pAttr->type == $1) { > + printf("Target traffic was spotted.\t"); Keep the format the same as before, something like: printf("Target traffic was spotted: \t”); I would also remove the \t and just add a single space to save on screen real estate (same for the previous “Now watching flow:”). > + printf("ufid:%08x-%04x-%04x-%04x-%04x%08x\n",$pArr->ufid32[0], > + $pArr->ufid32[1] >> 16, $pArr->ufid32[1] & 0xffff, > + $pArr->ufid32[2] >> 16, $pArr->ufid32[2] & 0xffff, > + $pArr->ufid32[3]); > + @watchlist[$pArr->ufid64[0],$pArr->ufid64[1]]++; > + break; > + } We can de-duplicate traffic here, i.e. do we really need two different messages? I would assume the user know if he adds a filter option? Something like (not tested): if ($# == 0 || ($# > 0 && $pAttr->type == $1)) { printf("Now watching flow: "); printf("ufid:%08x-%04x-%04x-%04x-%04x%08x\n",$pArr->ufid32[0], $pArr->ufid32[1] >> 16, $pArr->ufid32[1] & 0xffff, $pArr->ufid32[2] >> 16, $pArr->ufid32[2] & 0xffff, $pArr->ufid32[3]); @watchlist[$pArr->ufid64[0],$pArr->ufid64[1]]++; break; } > + $num_attrs++; > + $key_posn = ($key_posn + $pAttr->len + 3) & (0xffffff ^ 3); > + } > +} > + > +usdt::revalidate:flow_results > +{ > + if (arg0 == 0) { > + return; > + } > + $ukey = (struct ukey *) arg2; > + if(!@watchlist[$ukey->ufid.ufid64[0],$ukey->ufid.ufid64[1]]) { > + return; > + } As the flow now is cleaned up, do we also want to remove if from our @watchlist? > + printf("ufid:%08x-%04x-%04x-%04x-%04x%08x was invalidated because ", > + $ukey->ufid.ufid32[0], > + $ukey->ufid.ufid32[1] >> 16, $ukey->ufid.ufid32[1] & 0xffff, > + $ukey->ufid.ufid32[2] >> 16, $ukey->ufid.ufid32[2] & 0xffff, > + $ukey->ufid.ufid32[3]); > + if (arg0 == 1) { > + printf("it timed out.\n"); > + } else if (arg0 == 2) { > + printf("it was too expensive to revalidate.\n"); > + } else if (arg0 == 3) { > + printf("there was a change in the openflow wildcards.\n"); > + } else if (arg0 == 4) { > + printf("the odp fit was bad.\n"); > + } else if (arg0 == 5) { > + printf("associated ofproto.\n"); > + } else if (arg0 == 6) { > + printf("there was an error translating the openflow rule to ovs.\n"); > + } else if (arg0 == 7) { > + printf("avoiding caching.\n"); > + } else { > + printf("The value of arg0 is %d\n",arg0); This does not create a good-looking sentence when issued. Maybe something like: printf(“of an unknown reason, %d\n",arg0); > + } > +} > + > +END > +{ > + clear(@watchlist); > + printf("\n"); > +} > -- _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
