> This continues the breakup of the huge DPDK "howto" into smaller
> components. There are a couple of related changes included, such as using
> "Rx queue" instead of "rxq" and noting how Tx queues cannot be configured.
> 
> We enable the TODO directive, so we can actually start calling out some
> TODOs.
> 
> Signed-off-by: Stephen Finucane <step...@that.guru>
> ---
>  Documentation/conf.py                    |   2 +-
>  Documentation/howto/dpdk.rst             |  86 -------------------
>  Documentation/topics/dpdk/index.rst      |   1 +
>  Documentation/topics/dpdk/phy.rst        |  10 +++
>  Documentation/topics/dpdk/pmd.rst        | 139 +++++++++++++++++++++++++++++++
>  Documentation/topics/dpdk/vhost-user.rst |  17 ++--
>  6 files changed, 159 insertions(+), 96 deletions(-)
>  create mode 100644 Documentation/topics/dpdk/pmd.rst
> 
> diff --git a/Documentation/conf.py b/Documentation/conf.py
> index 6ab144c5d..babda21de 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -32,7 +32,7 @@ needs_sphinx = '1.1'
>  # Add any Sphinx extension module names here, as strings. They can be
>  # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
>  # ones.
> -extensions = []
> +extensions = ['sphinx.ext.todo']
> 
>  # Add any paths that contain templates here, relative to this directory.
>  templates_path = ['_templates']
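
One thing worth double-checking here (an assumption on my part, based on the
Sphinx defaults rather than anything in this patch): sphinx.ext.todo only
renders ``.. todo::`` blocks in the output when ``todo_include_todos`` is
enabled, e.g.:

    extensions = ['sphinx.ext.todo']
    todo_include_todos = True

If the intent is to keep the TODOs out of the published docs, ignore this.
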
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index d717d2ebe..c2324118d 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run::
>      $ ovs-appctl -t ovsdb-server exit
>      $ ovs-vsctl del-br br0
> 
> -PMD Thread Statistics
> ----------------------
> -
> -To show current stats::
> -
> -    $ ovs-appctl dpif-netdev/pmd-stats-show
> -
> -To clear previous stats::
> -
> -    $ ovs-appctl dpif-netdev/pmd-stats-clear
> -
> -Port/RXQ Assigment to PMD Threads
> ----------------------------------
> -
> -To show port/rxq assignment::
> -
> -    $ ovs-appctl dpif-netdev/pmd-rxq-show
> -
> -To change default rxq assignment to pmd threads, rxqs may be manually pinned to
> -desired cores using::
> -
> -    $ ovs-vsctl set Interface <iface> \
> -        other_config:pmd-rxq-affinity=<rxq-affinity-list>
> -
> -where:
> -
> -- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
> -
> -For example::
> -
> -    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> -        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> -
> -This will ensure:
> -
> -- Queue #0 pinned to core 3
> -- Queue #1 pinned to core 7
> -- Queue #2 not pinned
> -- Queue #3 pinned to core 8
> -
> -After that PMD threads on cores where RX queues was pinned will become
> -``isolated``. This means that this thread will poll only pinned RX queues.
> -
> -.. warning::
> -  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues will
> -  not be polled. Also, if provided ``core_id`` is not available (ex. this
> -  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by any PMD
> -  thread.
> -
> -If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
> -automatically. The processing cycles that have been stored for each rxq
> -will be used where known to assign rxqs to pmd based on a round robin of the
> -sorted rxqs.
> -
> -For example, in the case where here there are 5 rxqs and 3 cores (e.g. 3,7,8)
> -available, and the measured usage of core cycles per rxq over the last
> -interval is seen to be:
> -
> -- Queue #0: 30%
> -- Queue #1: 80%
> -- Queue #3: 60%
> -- Queue #4: 70%
> -- Queue #5: 10%
> -
> -The rxqs will be assigned to cores 3,7,8 in the following order:
> -
> -Core 3: Q1 (80%) |
> -Core 7: Q4 (70%) | Q5 (10%)
> -core 8: Q3 (60%) | Q0 (30%)
> -
> -To see the current measured usage history of pmd core cycles for each rxq::
> -
> -    $ ovs-appctl dpif-netdev/pmd-rxq-show
> -
> -.. note::
> -
> -  A history of one minute is recorded and shown for each rxq to allow for
> -  traffic pattern spikes. An rxq's pmd core cycles usage changes due to traffic
> -  pattern or reconfig changes will take one minute before they are fully
> -  reflected in the stats.
> -
> -Rxq to pmds assignment takes place whenever there are configuration changes
> -or can be triggered by using::
> -
> -    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
> -
>  QoS
>  ---
> 
> diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
> index 5f836a6e9..dfde88377 100644
> --- a/Documentation/topics/dpdk/index.rst
> +++ b/Documentation/topics/dpdk/index.rst
> @@ -31,3 +31,4 @@ The DPDK Datapath
>     phy
>     vhost-user
>     ring
> +   pmd
> diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
> index 1c18e4e3d..222fa3e9f 100644
> --- a/Documentation/topics/dpdk/phy.rst
> +++ b/Documentation/topics/dpdk/phy.rst
> @@ -109,3 +109,13 @@ tool::
>  For more information, refer to the `DPDK documentation <dpdk-drivers>`__.
> 
>  .. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html
> +
> +Multiqueue
> +----------
> +
> +Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
> +the DPDK datapath. Correct configuration of PMD threads and the Rx queues they
> +utilize is a requirement in order to deliver the high performance possible with
> +the DPDK datapath. It is possible to configure multiple Rx queues for ``dpdk``
> +ports, thus ensuring this is not a bottleneck for performance. For information
> +on configuring PMD threads, refer to :doc:`pmd`.
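
On the multiqueue blurb above: it might be worth showing how the number of Rx
queues is actually set for a phy port, even if the detail lives in the PMD doc.
Something like (a sketch only, reusing the port name from the example later in
the patch)::

    $ ovs-vsctl set Interface dpdk-p0 options:n_rxq=4
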
> diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
> new file mode 100644
> index 000000000..e15e8cc3b
> --- /dev/null
> +++ b/Documentation/topics/dpdk/pmd.rst
> @@ -0,0 +1,139 @@
> +..
> +      Licensed under the Apache License, Version 2.0 (the "License"); you may
> +      not use this file except in compliance with the License. You may obtain
> +      a copy of the License at
> +
> +          http://www.apache.org/licenses/LICENSE-2.0
> +
> +      Unless required by applicable law or agreed to in writing, software
> +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> +      License for the specific language governing permissions and limitations
> +      under the License.
> +
> +      Convention for heading levels in Open vSwitch documentation:
> +
> +      =======  Heading 0 (reserved for the title in a document)
> +      -------  Heading 1
> +      ~~~~~~~  Heading 2
> +      +++++++  Heading 3
> +      '''''''  Heading 4
> +
> +      Avoid deeper levels because they do not render well.
> +
> +===========
> +PMD Threads
> +===========
> +
> +Poll Mode Driver (PMD) threads are the threads that do the heavy
> +lifting for the DPDK datapath and perform tasks such as continuous
> +polling of input ports for packets, classifying packets once received,
> +and executing actions on the packets once they are classified.
> +
> +PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly known as
> +*rxq*\s and *txq*\s. While Tx queue configuration happens automatically, Rx
> +queues can be configured by the user. This can happen in one of two ways:

Just on the above: this could be a to-do, but it's also a good opportunity to
add a note on the "automatic" behavior of Tx queues, i.e. how many are created
and how that relates to the number of PMDs, etc. It could be a separate section
in the PMD doc.
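
As a starting point, even a stub with a TODO might do (a sketch only; the
details of how many Tx queues get created would need to be confirmed against
the code)::

    Tx Queue Configuration
    ----------------------

    .. todo::

       Document the automatic Tx queue behavior: how many queues are created
       and how that relates to the number of PMD threads.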

> +
> +- For physical interfaces, configuration is done using the
> +  :program:`ovs-appctl` utility.
> +
> +- For virtual interfaces, configuration is done using the :program:`ovs-appctl`
> +  utility, but this configuration must be reflected in the guest configuration
> +  (e.g. QEMU command line arguments).
> +
> +The :program:`ovs-appctl` utility also provides a number of commands
> +for querying PMD threads and their respective queues. This, and all of
> +the above, is discussed here.
> +
> +PMD Thread Statistics
> +---------------------
> +
> +To show current stats::
> +
> +    $ ovs-appctl dpif-netdev/pmd-stats-show
> +
> +To clear previous stats::
> +
> +    $ ovs-appctl dpif-netdev/pmd-stats-clear
> +
> +Port/Rx Queue Assignment to PMD Threads
> +---------------------------------------
> +
> +.. todo::
> +
> +   This needs a more detailed overview of *why* this should be done, along
> +   with the impact on things like NUMA affinity.
> +
> +To show port/Rx queue assignment::
> +
> +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> +
> +Rx queues may be manually pinned to cores. This will change the default
> +Rx queue assignment to PMD threads::
> +
> +    $ ovs-vsctl set Interface <iface> \
> +        other_config:pmd-rxq-affinity=<rxq-affinity-list>
> +
> +where:
> +
> +- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
> +
> +For example::
> +
> +    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> +        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> +
> +This will ensure there are *4* Rx queues and that these queues are
> +configured like so:
> +
> +- Queue #0 pinned to core 3
> +- Queue #1 pinned to core 7
> +- Queue #2 not pinned
> +- Queue #3 pinned to core 8
> +
> +PMD threads on cores where Rx queues are *pinned* will become *isolated*.
> +This means that these threads will only poll the *pinned* Rx queues.
> +
> +.. warning::
> +
> +  If there are no *non-isolated* PMD threads, *non-pinned* Rx queues will not
> +  be polled. Also, if the provided ``<core-id>`` is not available (e.g. the
> +  ``<core-id>`` is not in ``pmd-cpu-mask``), the Rx queue will not be polled
> +  by any PMD thread.
> +
> +If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to
> +PMDs (cores) automatically. Where known, the processing cycles that have been
> +stored for each Rx queue will be used to assign Rx queues to PMDs based on a
> +round robin of the sorted Rx queues. Take the following example, where there
> +are five Rx queues and three cores - 3, 7, and 8 - available, and the measured
> +usage of core cycles per Rx queue over the last interval is seen to be:
> +
> +- Queue #0: 30%
> +- Queue #1: 80%
> +- Queue #3: 60%
> +- Queue #4: 70%
> +- Queue #5: 10%
> +
> +The Rx queues will be assigned to the cores in the following order:
> +
> +Core 3: Q1 (80%) |
> +Core 7: Q4 (70%) | Q5 (10%)
> +Core 8: Q3 (60%) | Q0 (30%)
> +

This functionality was introduced in OVS 2.8.
Do we need to warn the user with a versionchanged:: 2.8.0 and that it's 
unavailable prior to this?
The behavior in that case was round robin without taking processing cycles into 
consideration.
There would also be no history tracking for the stats and no pmd rebalance 
command.
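
Something along these lines might do (a sketch only, assuming 2.8.0 is the
right version string to use)::

    .. versionchanged:: 2.8.0

       Prior to OVS 2.8, Rx queues were assigned to PMDs round-robin without
       taking processing cycles into account; there was also no usage history
       for the stats and no ``pmd-rxq-rebalance`` command.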

> +To see the current measured usage history of PMD core cycles for each Rx
> +queue::
> +
> +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> +
> +.. note::
> +
> +  A history of one minute is recorded and shown for each Rx queue to allow
> +  for traffic pattern spikes. Any changes in the Rx queue's PMD core cycles
> +  usage, due to traffic pattern or reconfig changes, will take one minute to
> +  be fully reflected in the stats.
> +
> +Rx queue to PMD assignment takes place whenever there are configuration
> +changes or can be triggered by using::
> +
> +    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance

We should probably flag to users the PMD and multiqueue considerations specific
to phy and vhost ports.

Perhaps a link to the specific documents below along with the heads up:

Documentation/topics/dpdk/vhost-user.rst
Documentation/topics/dpdk/phy.rst
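
Something like the following at the end of the new doc might be enough (a
sketch)::

    For considerations specific to physical and vhost-user ports when using
    multiple Rx queues with PMD threads, refer to :doc:`phy` and
    :doc:`vhost-user`.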

Ian

> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> index 95517a676..d84d99246 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -127,11 +127,10 @@ an additional set of parameters::
>      -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
>      -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
> 
> -In addition,       QEMU must allocate the VM's memory on hugetlbfs. vhost-user
> -ports access a virtio-net device's virtual rings and packet buffers mapping the
> -VM's physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
> -memory into their process address space, pass the following parameters to
> -QEMU::
> +In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user
> +ports access a virtio-net device's virtual rings and packet buffers mapping
> +the VM's physical memory on hugetlbfs. To enable vhost-user ports to map the
> +VM's memory into their process address space, pass the following parameters
> +to QEMU::
> 
>      -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
>      -numa node,memdev=mem -mem-prealloc
> @@ -151,18 +150,18 @@ where:
>    The number of vectors, which is ``$q`` * 2 + 2
> 
>  The vhost-user interface will be automatically reconfigured with required
> -number of rx and tx queues after connection of virtio device.  Manual
> +number of Rx and Tx queues after connection of virtio device.  Manual
>  configuration of ``n_rxq`` is not supported because OVS will work properly only
>  if ``n_rxq`` will match number of queues configured in QEMU.
> 
> -A least 2 PMDs should be configured for the vswitch when using multiqueue.
> +At least two PMDs should be configured for the vswitch when using multiqueue.
>  Using a single PMD will cause traffic to be enqueued to the same vhost queue
>  rather than being distributed among different vhost queues for a vhost-user
>  interface.
> 
>  If traffic destined for a VM configured with multiqueue arrives to the vswitch
> -via a physical DPDK port, then the number of rxqs should also be set to at
> -least 2 for that physical DPDK port. This is required to increase the
> +via a physical DPDK port, then the number of Rx queues should also be set to
> +at least two for that physical DPDK port. This is required to increase the
>  probability that a different PMD will handle the multiqueue transmission to the
>  guest using a different vhost queue.
> 
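
On the two requirements above (two or more PMDs, and at least two Rx queues on
the phy port), it might help to show the knobs inline, e.g. (a sketch; the mask
value is just an example giving two PMD cores, and the port name is made up)::

    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6
    $ ovs-vsctl set Interface dpdk-p0 options:n_rxq=2
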
> --
> 2.14.3
> 
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev