On Tue, May 05, 2020 at 11:43:58AM +0900, Toshiaki Makita wrote: > On 2020/05/04 1:22, William Tu wrote: > >On Tue, Apr 21, 2020 at 11:47:04PM +0900, Toshiaki Makita wrote: > >>This adds a reference program, flowtable_afxdp.o, which can be used to > >>offload flows to XDP through netdev-offload-xdp. > >>The program will be compiled when using --enable-bpf switch. > >> > >>Signed-off-by: Toshiaki Makita <[email protected]> > > > >Hi Toshiaki, > > > >Thanks for your patch, I haven't tried to run it but I have > >questions about miniflow. > > > >>--- > ... > >>+SEC("xdp") int > >>+flowtable_afxdp(struct xdp_md *ctx) > >>+{ > >>+ struct xdp_miniflow *pkt_mf; > >>+ struct xdp_subtable_mask *subtable_mask; > >>+ int *head; > >>+ struct xdp_flow_actions *xdp_actions = NULL; > >>+ struct nlattr *a; > >>+ unsigned int left; > >>+ int cnt, idx, zero = 0; > >>+ > >>+ account_debug(0); > >>+ > >>+ head = bpf_map_lookup_elem(&subtbl_masks_hd, &zero); > >>+ if (!head) { > >>+ return XDP_ABORTED; > >>+ } > >>+ if (*head == XDP_SUBTABLES_TAIL) { > >>+ /* Offload not enabled */ > >>+ goto upcall; > >>+ } > >>+ > >>+ /* Get temporary storage for storing packet miniflow */ > >>+ pkt_mf = bpf_map_lookup_elem(&pkt_mf_tbl, &zero); > >>+ if (!pkt_mf) { > >>+ return XDP_ABORTED; > >>+ } > >>+ > > > >I start to wonder what's the benefit of using miniflow in XDP? > >miniflow tries to compress the large flow key into smaller memory, > >and with a flowmap indicating which offset in bits are valid. > >And when adding a new subtable at flow_put, the subtable's key > >size can be smaller with only the needed fields. > > > >But in the case of XDP/eBPF, we have to statically allocated fixed > >key size (the subtbl_template has key as struct xdp_flow_key), so > >each subtable is always having key with full key fields. (not saving > >memory space) Moreover with miniflow, it makes the 'mask_key' function > >below pretty complicated. > > Fixed sized subtable is restriction of map-in-map. > The benefit to use miniflow I envisioned initially was > - compress the key > - use existing function to convert flows for XDP > > The first one is actually not doable due to map-in-map. I hope someday the > restriction gets loosen...
On my second thought this might be a good idea. One problem I hit in my previous prototype is that the flow key is too big. ex: struct sw_flow_key in ovs kernel is more than 500B, and moving it to bpf some BPF limitation. If we continue adding more fields to struct xdp_flow, ex: ipv6, vxlan tunnel, then with miniflow extract, the actually memory usage in key is smaller. The key size in subtbl_template can be a value smaller the the size of struct xdp_flow. But when adding more fields, I'm not sure whether we will hit some limitation of BPF at xdp_miniflow_extract. So miniflow can save key size but complicate the key extraction. Without miniflow, key size is large but key extract and match are simpler. What do you think? > > The second one is doable and it's good for userspace offload driver. > But I ended up with rewriting most helper functions of miniflow for BPF due > to some restriction. > > >What if we don't compress the flow key? > >unlike to OVS userspace, more like the kernel's implementation of OVS > >megaflow (ex: masked_flow_lookup in net/openvswitch/flow_table.c) > > Without miniflow the BPF program will be simple like the original xdp_flow > patch for Linux kernel. > I can try it. > > >Where it has a list of masks (like your 'subtbl_masks' map) but > >only use 1 hash table (so we don't need map-in-map). > >We lookup the hash table multiple times, each > >time applied different masks in the mask list until finding a match. > > For HW offload one big hash table may help. Fow software it will be slower > as key size and entries are larger. > Maybe we can support both types, one big hash table and multiple subtables, > in the future? > For one big hash table, do you mean adding another cache before megaflow like EMC (exact match cache)? William > As I don't have BPF offloadable NICs I prefer to keep current multiple > subtables for now. > > Toshiaki Makita > > > > >However, I'm not sure whether this will hit some limitations of verifier. > > > >Please correct me if I understand wrong here. > >Thanks > >William _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
