> Previously if there is no available (non-isolated) pmd on the numa node
> for a port then the port is not polled at all. This can result in a non-
> operational system until such time as nics are physically repositioned. It
> is preferable to operate with a pmd on the 'wrong' numa node albeit with
> lower performance. Local pmds are still chosen when available.
>
> Signed-off-by: Billy O'Mahony <[email protected]>
Thanks for the patch Billy, few comments below.
> ---
> Documentation/intro/install/dpdk.rst | 10 ++++++++++
> lib/dpif-netdev.c | 36
> ++++++++++++++++++++++++++++++++----
> 2 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/intro/install/dpdk.rst
> b/Documentation/intro/install/dpdk.rst
> index b947bd5..0e0b9f0 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -450,6 +450,16 @@ affinitized accordingly.
> pmd thread on a NUMA node is only created if there is at least one
> DPDK
> interface from that NUMA node added to OVS.
>
> + .. note::
> + On NUMA systems PCI devices are also local to a NUMA node. Rx queues
> for
> + PCI device will assigned to a pmd on it's local NUMA node if pmd-cpu-
> mask
> + has created a pmd thread on that NUMA node. If not the queue will be
> + assigned to a pmd on a remote NUMA node. This will result in reduced
> + maximum throughput on that device. In the case such a queue
> assingment
Typo in assignment.
> + is made a warning message will be logged: "There's no available
> + (non isolated) pmd thread on numa node N. Queue Q on port P will be
> assigned
> + to a pmd on numa node M. Expect reduced performance."
> +
> - QEMU vCPU thread Affinity
>
> A VM performing simple packet forwarding or running complex packet
> pipelines diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> a14a2eb..c6570ba 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int
> numa_id) }
>
> static void
> -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
> + int *all_numa_ids, unsigned all_numa_ids_sz,
> + int *num_ids_written)
> {
> struct dp_netdev_pmd_thread *pmd;
> struct rr_numa *numa;
> + unsigned idx = 0;
>
> hmap_init(&rr->numas);
>
> @@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp, struct
> rr_numa_list *rr)
> numa->n_pmds++;
> numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa-
> >pmds);
> numa->pmds[numa->n_pmds - 1] = pmd;
> +
> + all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
> + idx++;
> }
> + *num_ids_written = idx;
> }
>
> static struct dp_netdev_pmd_thread *
> @@ -3202,8 +3209,15 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned)
> OVS_REQUIRES(dp->port_mutex) {
> struct dp_netdev_port *port;
> struct rr_numa_list rr;
> + int all_numa_ids [64];
> + int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
> + unsigned all_numa_ids_idx = 0;
> + int all_numa_ids_max_idx = 0;
> + int num_numa_ids = 0;
>
> - rr_numa_list_populate(dp, &rr);
> + rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz,
> + &num_numa_ids);
> + all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz - 1);
>
> HMAP_FOR_EACH (port, node, &dp->ports) {
> struct rr_numa *numa;
> @@ -3234,10 +3248,24 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned)
> OVS_REQUIRES(dp->port_mutex)
> }
> } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> if (!numa) {
> + if (all_numa_ids_max_idx < 0) {
> + VLOG_ERR("There are no pmd threads. "
> + "Is pmd-cpu-mask set to zero?");
The VLOG_ERR is a bit general and misleading. A user could set a PMD on a numa
node, add a DPDK device local to that node, isolate the PMD for that DPDK port,
then add a second DPDK port that is local to the same numa node. This will
trigger the ERR above.
Strictly speaking there is a PMD thread set on the numa node but it has been
isolated. I would change the message above to include isolation. I would also
add the name of the port searching for a PMD to the message so it is clear
which port could not find a thread. Something like below
if (all_numa_ids_max_idx < 0) {
VLOG_ERR("There are no available pmd threads for %s. "
"Is pmd-cpu-mask set to zero or isolated?",
netdev_get_name(port->netdev));
> + continue;
> + }
> VLOG_WARN("There's no available (non isolated) pmd
> thread "
> "on numa node %d. Queue %d on port \'%s\'
> will "
> - "not be polled.",
> - numa_id, qid, netdev_get_name(port-
> >netdev));
> + "be assigned to a pmd on numa node %d.
> Expect "
> + "reduced performance.",
> + numa_id, qid, netdev_get_name(port-
> >netdev),
> + all_numa_ids[all_numa_ids_idx]);
The warning above will identify queue 0 will be assigned to a pmd on the other
numa node.
However it's not just queue 0 that will be assigned but all the unpinned queues
associated with the port (see 'q->pmd = rr_numa_get_pmd(numa)' below).
I'm thinking the warning should reflect this so users don't think only queue 0
has been assigned to the non local numa pmd.
> + numa_id = all_numa_ids[all_numa_ids_idx];
> + numa = rr_numa_list_lookup(&rr, numa_id);
> + q->pmd = rr_numa_get_pmd(numa);
> + all_numa_ids_idx++;
> + if (all_numa_ids_idx > all_numa_ids_max_idx) {
> + all_numa_ids_idx = 0;
> + }
> } else {
> q->pmd = rr_numa_get_pmd(numa);
> }
> --
> 2.7.4
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev