On 2020/05/06 0:18, William Tu wrote:
On Tue, May 05, 2020 at 11:43:58AM +0900, Toshiaki Makita wrote:
On 2020/05/04 1:22, William Tu wrote:
On Tue, Apr 21, 2020 at 11:47:04PM +0900, Toshiaki Makita wrote:
This adds a reference program, flowtable_afxdp.o, which can be used to
offload flows to XDP through netdev-offload-xdp.
The program will be compiled when using --enable-bpf switch.

Signed-off-by: Toshiaki Makita <[email protected]>

Hi Toshiaki,

Thanks for your patch, I haven't tried to run it but I have
questions about miniflow.

---
...
+SEC("xdp") int
+flowtable_afxdp(struct xdp_md *ctx)
+{
+    struct xdp_miniflow *pkt_mf;
+    struct xdp_subtable_mask *subtable_mask;
+    int *head;
+    struct xdp_flow_actions *xdp_actions = NULL;
+    struct nlattr *a;
+    unsigned int left;
+    int cnt, idx, zero = 0;
+
+    account_debug(0);
+
+    head = bpf_map_lookup_elem(&subtbl_masks_hd, &zero);
+    if (!head) {
+        return XDP_ABORTED;
+    }
+    if (*head == XDP_SUBTABLES_TAIL) {
+        /* Offload not enabled */
+        goto upcall;
+    }
+
+    /* Get temporary storage for storing packet miniflow */
+    pkt_mf = bpf_map_lookup_elem(&pkt_mf_tbl, &zero);
+    if (!pkt_mf) {
+        return XDP_ABORTED;
+    }
+

I start to wonder what's the benefit of using miniflow in XDP?
miniflow tries to compress the large flow key into smaller memory,
and with a flowmap indicating which offset in bits are valid.
And when adding a new subtable at flow_put, the subtable's key
size can be smaller with only the needed fields.

But in the case of XDP/eBPF, we have to statically allocated fixed
key size (the subtbl_template has key as struct xdp_flow_key), so
each subtable is always having key with full key fields. (not saving
memory space) Moreover with miniflow, it makes the 'mask_key' function
below pretty complicated.

Fixed sized subtable is restriction of map-in-map.
The benefit to use miniflow I envisioned initially was
- compress the key
- use existing function to convert flows for XDP

The first one is actually not doable due to map-in-map. I hope someday the
restriction gets loosen...

On my second thought this might be a good idea.

One problem I hit in my previous prototype is that the flow key is too big.
ex: struct sw_flow_key in ovs kernel is more than 500B, and moving it to bpf
some BPF limitation. If we continue adding more fields to struct xdp_flow,
ex: ipv6, vxlan tunnel, then with miniflow extract, the actually memory usage
in key is smaller. The key size in subtbl_template can be a value smaller the
the size of struct xdp_flow.

Nice idea!
So you mean we can have less sized subtbl_template key because when adding more and more keys, we probably don't use *all* of keys at the same time?

But when adding more fields, I'm not sure whether we will hit some limitation
of BPF at xdp_miniflow_extract.
So miniflow can save key size but complicate the key extraction.
Without miniflow, key size is large but key extract and match are simpler.
What do you think?

I cannot think of any limitation xdp_miniflow_extract will hit. Do you have insns limit in mind? My main concern for miniflow is insns limit in mask_key(), but if we can have smaller key size for subtbl_template then insns will be less too. Although I should check how large key is acceptable for mask_key, I guess miniflow is better considering we have more keys in the future.

The second one is doable and it's good for userspace offload driver.
But I ended up with rewriting most helper functions of miniflow for BPF due
to some restriction.

What if we don't compress the flow key?
unlike to OVS userspace, more like the kernel's implementation of OVS
megaflow (ex: masked_flow_lookup in net/openvswitch/flow_table.c)

Without miniflow the BPF program will be simple like the original xdp_flow
patch for Linux kernel.
I can try it.

Where it has a list of masks (like your 'subtbl_masks' map) but
only use 1 hash table (so we don't need map-in-map).
We lookup the hash table multiple times, each
time applied different masks in the mask list until finding a match.

For HW offload one big hash table may help. Fow software it will be slower
as key size and entries are larger.
Maybe we can support both types, one big hash table and multiple subtables,
in the future?


For one big hash table, do you mean adding another cache before megaflow
like EMC (exact match cache)?

No, I meant replacing combination of current flow_table and multiple subtbls with one big hash table, like ovs kernel module.

Toshiaki Makita
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to