Re: [PATCH net-next v1 0/1] net/sched: Introduce the taprio scheduler

2018-10-01 Thread Vinicius Costa Gomes
Hi,

Just a small correction, one link on the cover letter is wrong.

Vinicius Costa Gomes writes:

[...]

>
>
> [1] https://patchwork.ozlabs.org/cover/938991/
>
> [2] https://patchwork.ozlabs.org/cover/808504/
>
> [3] github doesn't make it clear, but the gist can be cloned like this:
> $ git clone https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f taprio-test
>
> [4] https://github.com/vcgomes/linux/tree/taprio-v1

The correct link is:

[4] https://github.com/vcgomes/net-next

>
> [5] https://github.com/vcgomes/iproute2/tree/taprio-v1
>
>
> Vinicius Costa Gomes (1):
>   tc: Add support for configuring the taprio scheduler
>
>  include/uapi/linux/pkt_sched.h |  46 ++
>  net/sched/Kconfig  |  11 +
>  net/sched/Makefile |   1 +
>  net/sched/sch_taprio.c | 962 +
>  4 files changed, 1020 insertions(+)
>  create mode 100644 net/sched/sch_taprio.c
>
> -- 
> 2.19.0


Cheers,
--
Vinicius


[PATCH net-next v1 0/1] net/sched: Introduce the taprio scheduler

2018-09-28 Thread Vinicius Costa Gomes
Hi,

Changes from the RFC:
  - Moved some fields from the per-qdisc data structure to the per
    schedule entry one, mainly "expires" (now called "close_time",
    when an entry ends) and "budget" (how many bytes can be sent
    during an entry);

  - Removed support for the schedule file, in favour of using iproute2
    batch mode (only affects the iproute2 patches) (Jiri Pirko,
    Stephen Hemminger);

  - Removed support for manually setting a cycle-time (it will be
    added in a later series);


Original cover letter
=====================
(lightly edited, updated references and usage)


This series provides a set of interfaces that can be used by
applications that require (time-based) Scheduled Transmission of
packets. It comprises three new kernel components:

  - etf: the per-queue TxTime-Based scheduling qdisc;
  - taprio: the per-port Time-Aware scheduler qdisc;
  - SO_TXTIME: a socket option + cmsg APIs.

ETF and SO_TXTIME are already applied[1] into the net-next tree. This
is the remaining piece.

Overview
========

The CBS qdisc proposal RFC [2] included some rough ideas about the
design and API of a "taprio" (Time Aware Priority) qdisc. The idea of
presenting the taprio ideas at that point (almost one year ago!) was
to show our vision of how things would fit together going forward.
From that concept stage to this (almost) realised stage the main
differences are:

  - As of now, taprio is a software-only implementation of a schedule
    executor;
  - Instead of taprio centralising all the time-based decisions, taprio
    can work together with ETF (Earliest TxTime First), a qdisc
    meant to use the LaunchTime (or similar) feature of various network
    controllers;

In a nutshell, taprio is a root qdisc that can execute a pre-defined
schedule, etf is a qdisc that provides time based admission control
and "earliest deadline first" dequeue mode, and SO_TXTIME is a socket
option that is used for enabling a socket to be used for time-based
Tx and configuring its parameters.

taprio
======

This scheduler allows the network administrator to configure schedules
for classes of traffic. The configuration interface is similar to what
IEEE 802.1Q-2018 defines.

Example configuration:

$ tc qdisc add dev enp2s0 parent root handle 100 taprio \
      num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      queues 1@0 1@1 2@2 \
      sched-entry S 01 30 \
      sched-entry S 02 30 \
      sched-entry S 04 30 \
      base-time 1528743495910289987 \
      clockid CLOCK_TAI

This qdisc borrows a few concepts from mqprio, so most of the
parameters are similar to mqprio's. The main difference is the sequence
of 'sched-entry' parameters, which together constitute one schedule:

  sched-entry S 01 30
  sched-entry S 02 30
  sched-entry S 04 30

The format of each entry is:
  sched-entry <command> <gate mask> <interval>

The only supported <command> is "S", which means "SetGateStates",
following the IEEE 802.1Q-2018 definition (Table 8-7). <gate mask>
is a bit-mask where each bit is associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry. <interval> is a time duration
in nanoseconds that specifies how long the state defined by <command>
and <gate mask> should be held before moving to the next entry.
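To make the entry format concrete, here is a rough Python sketch (not
part of the patch) that parses one entry and lists the traffic classes
whose gate is open. The names SchedEntry, parse_sched_entry and
open_classes are made up for illustration, and reading the mask as
hexadecimal is an assumption.

```python
from typing import List, NamedTuple


class SchedEntry(NamedTuple):
    command: str      # only "S" (SetGateStates) is described above
    gate_mask: int    # bit i set => traffic class i is "active"
    interval_ns: int  # how long the entry is held, in nanoseconds


def parse_sched_entry(text: str) -> SchedEntry:
    """Parse a line such as 'sched-entry S 01 30' (hypothetical helper)."""
    keyword, command, mask, interval = text.split()
    if keyword != "sched-entry" or command != "S":
        raise ValueError(f"unsupported entry: {text!r}")
    # Assumption: the gate mask is written in hexadecimal.
    return SchedEntry(command, int(mask, 16), int(interval))


def open_classes(gate_mask: int, num_tc: int) -> List[int]:
    """Return the traffic classes whose bit is set in gate_mask."""
    return [tc for tc in range(num_tc) if gate_mask & (1 << tc)]


entry = parse_sched_entry("sched-entry S 04 30")
print(open_classes(entry.gate_mask, num_tc=3))  # [2]
```

With the three entries from the example, masks 01, 02 and 04 open
traffic classes 0, 1 and 2 respectively, one at a time.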

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.
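The circular behaviour can be sketched as a lookup: given a point in
time, find which entry's gate mask applies. This is only an
illustration in Python, not the kernel code. It assumes the cycle
length is the sum of the entry intervals and that the schedule starts
at a known base time; active_gate_mask is a made-up name.

```python
from typing import List, Tuple

# (gate_mask, interval_ns) pairs taken from the example configuration.
SCHEDULE: List[Tuple[int, int]] = [(0x01, 30), (0x02, 30), (0x04, 30)]


def active_gate_mask(schedule: List[Tuple[int, int]],
                     base_time_ns: int, now_ns: int) -> int:
    """Return the gate mask in effect at now_ns (hypothetical helper)."""
    if now_ns < base_time_ns:
        raise ValueError("the schedule has not started yet")
    # Assumption: one cycle lasts exactly the sum of all intervals.
    cycle_ns = sum(interval for _, interval in schedule)
    offset = (now_ns - base_time_ns) % cycle_ns
    # Walk the entries until the offset falls inside one of them.
    for gate_mask, interval in schedule:
        if offset < interval:
            return gate_mask
        offset -= interval
    raise AssertionError("unreachable: offset is always below cycle_ns")


print(active_gate_mask(SCHEDULE, base_time_ns=1000, now_ns=1045))  # 2
```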

The other parameters can be defined as follows:
  - base-time: specifies the instant at which the schedule starts,
    which allows multiple systems to run synchronised schedules;
  - clockid: specifies the reference clock to be used;
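One way to see why a shared base-time keeps systems in step: every
system can derive the same cycle boundaries from it, even if it
configures the qdisc after base-time has passed. The Python sketch
below is an assumption about how that could be computed, not a
description of what the patch does; first_cycle_start is a made-up
name and the 90 ns cycle is simply the sum of the three example
intervals.

```python
def first_cycle_start(base_time_ns: int, cycle_ns: int, now_ns: int) -> int:
    """Return the first cycle boundary at or after now_ns (hypothetical)."""
    if base_time_ns >= now_ns:
        # base-time is still in the future: start exactly there.
        return base_time_ns
    # Round the elapsed time up to a whole number of cycles.
    elapsed_cycles = -(-(now_ns - base_time_ns) // cycle_ns)
    return base_time_ns + elapsed_cycles * cycle_ns


# Two systems that start at different moments still agree on boundaries.
print(first_cycle_start(base_time_ns=100, cycle_ns=90, now_ns=101))  # 190
print(first_cycle_start(base_time_ns=100, cycle_ns=90, now_ns=150))  # 190
```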

A more complete example can be found here, with instructions on how to
test it:

https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f [3]

The basic design of the scheduler is simple: after we calculate the
first expiration of the hrtimer, we set each subsequent expiration to
be the previous one plus the current entry's interval. Each time the
timer callback runs, we set the current_entry, which has a gate_mask
(that controls which traffic classes are allowed to "go out" during
each interval), and we reuse this callback to "kick" the qdisc (this
is the reason that the usual qdisc watchdog isn't used).
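The timer logic described above can be simulated in a few lines of
Python. This is a simplified model, not the kernel implementation: it
only tracks expiry times and gate masks, and it leaves out the hrtimer
itself and the qdisc "kick". The name timer_expirations and the start
time of 1000 are made up.

```python
from itertools import cycle, islice
from typing import Iterator, List, Tuple


def timer_expirations(schedule: List[Tuple[int, int]],
                      first_expiry_ns: int) -> Iterator[Tuple[int, int]]:
    """Yield (expiry_ns, gate_mask) for each simulated timer callback."""
    expiry = first_expiry_ns
    for gate_mask, interval in cycle(schedule):
        # The callback fires and this entry becomes the current one.
        yield expiry, gate_mask
        # Next expiration = previous one + the current entry's interval.
        expiry += interval


# (gate_mask, interval_ns) pairs from the example configuration.
schedule = [(0x01, 30), (0x02, 30), (0x04, 30)]
for expiry, mask in islice(timer_expirations(schedule, 1000), 4):
    print(expiry, f"{mask:02x}")
```

Because each expiry is derived from the previous one rather than from
the current time, a late callback does not shift the later entries.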


Future work
===========

  - Add support for multiple schedules, so something like the Admin
    and Oper schedules from IEEE 802.1Q-2018 can be implemented;
    "cycle-time" will probably be re-implemented at that point;

  - Add support for HW offloading;

  - Add support for Frame Preemption related commands (formerly
    802.1Qbu, now part of 802.1Q);

Known Issues
============

  - As taprio is a software only implementation, and there's another
layer of queuing in the network controller, packets can still
leave the controller outside their "correct" windows. This happens
mostly for low-priority classes, and only if they are