> Previously if there is no available (non-isolated) pmd on the numa node
> for a port then the port is not polled at all. This can result in a non-
> operational system until such time as nics are physically repositioned. It
> is preferable to operate with a pmd on the 'wrong' numa node albeit with
> lower performance. Local pmds are still chosen when available.
> 
> Signed-off-by: Billy O'Mahony <[email protected]>

Thanks for the patch Billy, few comments below.

> ---
>  Documentation/intro/install/dpdk.rst | 10 ++++++++++
>  lib/dpif-netdev.c                    | 36
> ++++++++++++++++++++++++++++++++----
>  2 files changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/intro/install/dpdk.rst
> b/Documentation/intro/install/dpdk.rst
> index b947bd5..0e0b9f0 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -450,6 +450,16 @@ affinitized accordingly.
>      pmd thread on a NUMA node is only created if there is at least one
> DPDK
>      interface from that NUMA node added to OVS.
> 
> +  .. note::
> +   On NUMA systems PCI devices are also local to a NUMA node.  Rx queues
> for
> +   PCI device will assigned to a pmd on it's local NUMA node if pmd-cpu-
> mask
> +   has created a pmd thread on that NUMA node.  If not the queue will be
> +   assigned to a pmd on a remote NUMA node.  This will result in reduced
> +   maximum throughput on that device.  In the case such a queue
> assingment

Typo in assignment.

> +   is made a warning message will be logged: "There's no available
> +   (non isolated) pmd thread on numa node N. Queue Q on port P will be
> assigned
> +   to a pmd on numa node M. Expect reduced performance."
> +
>  - QEMU vCPU thread Affinity
> 
>    A VM performing simple packet forwarding or running complex packet
> pipelines diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> a14a2eb..c6570ba 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int
> numa_id)  }
> 
>  static void
> -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
> +                      int *all_numa_ids, unsigned all_numa_ids_sz,
> +                      int *num_ids_written)
>  {
>      struct dp_netdev_pmd_thread *pmd;
>      struct rr_numa *numa;
> +    unsigned idx = 0;
> 
>      hmap_init(&rr->numas);
> 
> @@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp, struct
> rr_numa_list *rr)
>          numa->n_pmds++;
>          numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa-
> >pmds);
>          numa->pmds[numa->n_pmds - 1] = pmd;
> +
> +        all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
> +        idx++;
>      }
> +    *num_ids_written = idx;
>  }
> 
>  static struct dp_netdev_pmd_thread *
> @@ -3202,8 +3209,15 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned)
> OVS_REQUIRES(dp->port_mutex)  {
>      struct dp_netdev_port *port;
>      struct rr_numa_list rr;
> +    int all_numa_ids [64];
> +    int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
> +    unsigned all_numa_ids_idx = 0;
> +    int all_numa_ids_max_idx = 0;
> +    int num_numa_ids = 0;
> 
> -    rr_numa_list_populate(dp, &rr);
> +    rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz,
> +                          &num_numa_ids);
> +    all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz - 1);
> 
>      HMAP_FOR_EACH (port, node, &dp->ports) {
>          struct rr_numa *numa;
> @@ -3234,10 +3248,24 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned)
> OVS_REQUIRES(dp->port_mutex)
>                  }
>              } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
>                  if (!numa) {
> +                    if (all_numa_ids_max_idx < 0) {
> +                        VLOG_ERR("There are no pmd threads. "
> +                                 "Is pmd-cpu-mask set to zero?");

The VLOG_ERR is a bit general and misleading. A user could set a PMD on a numa 
node, add a DPDK device local to that node, isolate the PMD for that DPDK port, 
then add a second DPDK port that is local to the same numa node. This will 
trigger the ERR above.

Strictly speaking there is a PMD thread set on the numa node but it has been 
isolated. I would change the message above to include isolation. I would also 
add the name of the port searching for a PMD to the message so it is clear 
which port could not find a thread. Something like below

if (all_numa_ids_max_idx < 0) {
    VLOG_ERR("There are no available pmd threads for %s. "
             "Is pmd-cpu-mask set to zero or isolated?", 
             netdev_get_name(port->netdev));

> +                        continue;
> +                    }
>                      VLOG_WARN("There's no available (non isolated) pmd
> thread "
>                                "on numa node %d. Queue %d on port \'%s\'
> will "
> -                              "not be polled.",
> -                              numa_id, qid, netdev_get_name(port-
> >netdev));
> +                              "be assigned to a pmd on numa node %d.
> Expect "
> +                              "reduced performance.",
> +                              numa_id, qid, netdev_get_name(port-
> >netdev),
> +                              all_numa_ids[all_numa_ids_idx]);

The warning above will identify queue 0 will be assigned to a pmd on the other 
numa node.
However it's not just queue 0 that will be assigned but all the unpinned queues 
associated with the port (see 'q->pmd = rr_numa_get_pmd(numa)' below).

I'm thinking the warning should reflect this so users don't think only queue 0 
has been assigned to the non local numa pmd.

> +                    numa_id = all_numa_ids[all_numa_ids_idx];
> +                    numa = rr_numa_list_lookup(&rr, numa_id);
> +                    q->pmd = rr_numa_get_pmd(numa);
> +                    all_numa_ids_idx++;
> +                    if (all_numa_ids_idx > all_numa_ids_max_idx) {
> +                        all_numa_ids_idx = 0;
> +                    }
>                  } else {
>                      q->pmd = rr_numa_get_pmd(numa);
>                  }
> --
> 2.7.4
> 
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to