On Fri, Aug 3, 2018 at 5:43 PM, Stefano Brivio <[email protected]> wrote:

> On Fri, 3 Aug 2018 16:01:08 -0700
> Ben Pfaff <[email protected]> wrote:
>
> > I think that a simple mechanism for fairness is fine.  The direction
> > of extensibility that makes me anxious is how to decide what matters
> > for fairness.  So far, we've talked about per-vport fairness.  That
> > works pretty well for packets coming in from virtual interfaces where
> > each vport represents a separate VM.
>
> Yes, right, that's the case where we have significant issues currently.
>
> > It does not work well if the traffic filling your queues all comes
> > from a single physical port because some source of traffic is sending
> > traffic at a high rate.  In that case, you'll do a lot better if you
> > do fairness based on the source 5-tuple. But if you're doing network
> > virtualization, then the outer source 5-tuples won't necessarily vary
> > much and you'd be better off looking at the VNI and maybe some Geneve
> > TLV options and maybe the inner 5-tuple...
>
> Sure, I see what you mean now. That looks entirely doable if we
> abstract the round-robin bucket selection out of the current patch.
>
> > I would be very pleased if we could integrate a simple mechanism for
> > fairness, based for now on some simple criteria like the source port,
> > but thinking ahead to how we could later make it gracefully extensible
> > to consider more general and possibly customizable criteria.
>
> We could change the patch so that instead of just using the vport for
> round-robin queue insertion, we generalise that and use "buckets"
> instead of vports, and have a set of possible functions that are called
> instead of using port_no directly in ovs_dp_upcall_queue_roundrobin(),
> making this configurable via netlink, per datapath.
>
> We could implement selection based on source port or a hash on the
> source 5-tuple, and the relevant bits of
> ovs_dp_upcall_queue_roundrobin() would look like this:
>
> static int ovs_dp_upcall_queue_roundrobin(struct datapath *dp,
>                                           struct dp_upcall_info *upcall)
> {
>
> [...]
>
>         list_for_each_entry(pos, head, list) {
>                 int bucket = dp->rr_select(pos);
>
>                 /* Count per-bucket upcalls. */
>                 if (dp->upcalls.count[bucket] == U8_MAX) {
>                         err = -ENOSPC;
>                         goto out_clear;
>                 }
>                 dp->upcalls.count[bucket]++;
>
>                 if (bucket == upcall->bucket) {
>                         /* Another upcall for the same bucket: move the
>                          * insertion point here, and keep checking that the
>                          * insertion condition is still met further on.
>                          */
>                         find_next = true;
>                         here = pos;
>                         continue;
>                 }
>
>                 if (find_next &&
>                     dp->upcalls.count[bucket] >=
>                     dp->upcalls.count[upcall->bucket]) {
>                         /* Insertion condition met: no need to look further,
>                          * unless another upcall for the same bucket occurs
>                          * later.
>                          */
>                         find_next = false;
>                         here = pos;
>                 }
>         }
>
> [...]
>
> }
>
> and implementations for dp->rr_select() would look like:
>
> int rr_select_vport(struct dp_upcall_info *upcall)
> {
>         return upcall->port_no;
> }
>
> int rr_select_srcport(struct dp_upcall_info *upcall)
> {
>         /* look up source port from upcall->skb... */
> }
>
> And we could then easily extend this to use BPF with maps one day.
>
>
Hi Stefano,

If you want to experiment with BPF, Joe and I have a prototype.
We implemented the upcall mechanism using the BPF perf event helper:
https://github.com/williamtu/ovs-ebpf/blob/master/bpf/datapath.c#L62

And there are threads polling the perf ring buffer to receive packets
from BPF:
https://github.com/williamtu/ovs-ebpf/blob/master/lib/perf-event.c#L232

If I follow the discussion correctly, before the upcall you need to
queue packets based on different configurations (vport/hash/vni/5-tuple/...)
and queue them to different buckets when congestion happens. In this
case, you probably need a BPF map to enqueue/dequeue the packets. A BPF
queue map is not supported yet, but there is a patch available:
[iovisor-dev] [RFC PATCH 1/3] bpf: add bpf queue map

So how to enqueue and dequeue packets depends on the user's BPF
implementation. This allows the fairness scheme to be extensible.

Regards,
William



> This is for clarity, by the way, but I guess we should avoid indirect
> calls in the final implementation.
>
> What do you think?
>
> --
> Stefano
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev