On Thu, May 7, 2020 at 7:56 AM Toshiaki Makita
<[email protected]> wrote:
>
> On 2020/05/06 0:18, William Tu wrote:
> > On Tue, May 05, 2020 at 11:43:58AM +0900, Toshiaki Makita wrote:
> >> On 2020/05/04 1:22, William Tu wrote:
> >>> On Tue, Apr 21, 2020 at 11:47:04PM +0900, Toshiaki Makita wrote:
> >>>> This adds a reference program, flowtable_afxdp.o, which can be used to
> >>>> offload flows to XDP through netdev-offload-xdp.
> >>>> The program will be compiled when using --enable-bpf switch.
> >>>>
> >>>> Signed-off-by: Toshiaki Makita <[email protected]>
> >>>
> >>> Hi Toshiaki,
> >>>
> >>> Thanks for your patch, I haven't tried to run it but I have
> >>> questions about miniflow.
> >>>
> >>>> ---
> >> ...
> >>>> +SEC("xdp") int
> >>>> +flowtable_afxdp(struct xdp_md *ctx)
> >>>> +{
> >>>> + struct xdp_miniflow *pkt_mf;
> >>>> + struct xdp_subtable_mask *subtable_mask;
> >>>> + int *head;
> >>>> + struct xdp_flow_actions *xdp_actions = NULL;
> >>>> + struct nlattr *a;
> >>>> + unsigned int left;
> >>>> + int cnt, idx, zero = 0;
> >>>> +
> >>>> + account_debug(0);
> >>>> +
> >>>> + head = bpf_map_lookup_elem(&subtbl_masks_hd, &zero);
> >>>> + if (!head) {
> >>>> + return XDP_ABORTED;
> >>>> + }
> >>>> + if (*head == XDP_SUBTABLES_TAIL) {
> >>>> + /* Offload not enabled */
> >>>> + goto upcall;
> >>>> + }
> >>>> +
> >>>> + /* Get temporary storage for storing packet miniflow */
> >>>> + pkt_mf = bpf_map_lookup_elem(&pkt_mf_tbl, &zero);
> >>>> + if (!pkt_mf) {
> >>>> + return XDP_ABORTED;
> >>>> + }
> >>>> +
> >>>
> >>> I start to wonder what's the benefit of using miniflow in XDP?
> >>> miniflow tries to compress the large flow key into smaller memory,
> >>> and with a flowmap indicating which offset in bits are valid.
> >>> And when adding a new subtable at flow_put, the subtable's key
> >>> size can be smaller with only the needed fields.
> >>>
> >>> But in the case of XDP/eBPF, we have to statically allocated fixed
> >>> key size (the subtbl_template has key as struct xdp_flow_key), so
> >>> each subtable is always having key with full key fields. (not saving
> >>> memory space) Moreover with miniflow, it makes the 'mask_key' function
> >>> below pretty complicated.
> >>
> >> Fixed sized subtable is restriction of map-in-map.
> >> The benefit to use miniflow I envisioned initially was
> >> - compress the key
> >> - use existing function to convert flows for XDP
> >>
> >> The first one is actually not doable due to map-in-map. I hope someday the
> >> restriction gets loosen...
> >
> > On my second thought this might be a good idea.
> >
> > One problem I hit in my previous prototype is that the flow key is too big.
> > ex: struct sw_flow_key in ovs kernel is more than 500B, and moving it to bpf
> > some BPF limitation. If we continue adding more fields to struct xdp_flow,
> > ex: ipv6, vxlan tunnel, then with miniflow extract, the actually memory
> > usage
> > in key is smaller. The key size in subtbl_template can be a value smaller
> > the
> > the size of struct xdp_flow.
>
> Nice idea!
> So you mean we can have less sized subtbl_template key because when adding
> more and
> more keys, we probably don't use *all* of keys at the same time?
Now I'm a little confused.
If you see 'struct flow' in ovs, all the fields have its own space
(ex: no union is used for
ipv4 and ipv6). So using miniflow extract saves space by only use
fields on the packet header.
But if you see 'struct sw_flow_key' in ovs kernel, same layer of
protocols are using union,
(ex: ipv4 and ipv6 are in a union). So no extra space is wasted.
So how we define the struct xdp_flow_key makes difference.
Or here is another idea.
We can also do on-demand parsing/key_extract here.
1) keep a list of key_fields = {empty}
2) when xdp_flow_put, we know which fields are needed to extract
ex: key_fields = {src_mac, dst_mac}
3) then at the xdp_miniflow_extract we can skip/return early after
the L2 fields are parsed.
>
> > But when adding more fields, I'm not sure whether we will hit some
> > limitation
> > of BPF at xdp_miniflow_extract.
> > So miniflow can save key size but complicate the key extraction.
> > Without miniflow, key size is large but key extract and match are simpler.
> > What do you think?
>
> I cannot think of any limitation xdp_miniflow_extract will hit. Do you have
> insns
> limit in mind?
yes.
> My main concern for miniflow is insns limit in mask_key(), but if we can have
> smaller key size for subtbl_template then insns will be less too.
> Although I should check how large key is acceptable for mask_key, I guess
> miniflow
> is better considering we have more keys in the future.
>
> >> The second one is doable and it's good for userspace offload driver.
> >> But I ended up with rewriting most helper functions of miniflow for BPF due
> >> to some restriction.
> >>
> >>> What if we don't compress the flow key?
> >>> unlike to OVS userspace, more like the kernel's implementation of OVS
> >>> megaflow (ex: masked_flow_lookup in net/openvswitch/flow_table.c)
> >>
> >> Without miniflow the BPF program will be simple like the original xdp_flow
> >> patch for Linux kernel.
> >> I can try it.
> >>
> >>> Where it has a list of masks (like your 'subtbl_masks' map) but
> >>> only use 1 hash table (so we don't need map-in-map).
> >>> We lookup the hash table multiple times, each
> >>> time applied different masks in the mask list until finding a match.
> >>
> >> For HW offload one big hash table may help. Fow software it will be slower
> >> as key size and entries are larger.
> >> Maybe we can support both types, one big hash table and multiple subtables,
> >> in the future?
> >>
> >
> > For one big hash table, do you mean adding another cache before megaflow
> > like EMC (exact match cache)?
>
> No, I meant replacing combination of current flow_table and multiple subtbls
> with
> one big hash table, like ovs kernel module.
I see, thanks
William
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev