> On 08.07.2017 22:09, Stokes, Ian wrote:
> >> Previously if there is no available (non-isolated) pmd on the numa
> >> node for a port then the port is not polled at all. This can result
> >> in a non-operational system until such time as nics are physically
> >> repositioned. It is preferable to operate with a pmd on the 'wrong'
> >> numa node albeit with lower performance. Local pmds are still chosen
> >> when available.
> >>
> >> Signed-off-by: Billy O'Mahony <[email protected]>
> >> Signed-off-by: Ilya Maximets <[email protected]>
> >> Co-authored-by: Ilya Maximets <[email protected]>
> >> ---
> >> v9: v8 missed some comments on v7
> >> v8: Some coding style issues; doc tweak
> >> v7: Incorporate review comments on docs and implementation
> >> v6: Change 'port' to 'queue' in a warning msg
> >> v5: Fix warning msg; Update same in docs
> >> v4: Fix a checkpatch error
> >> v3: Fix warning messages not appearing when using multiqueue
> >> v2: Add details of warning messages into docs
> >>
> >> Documentation/intro/install/dpdk.rst | 21 +++++++++++++++---
> >>  lib/dpif-netdev.c                    | 41 +++++++++++++++++++++++++++++++++---
> >> 2 files changed, 56 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> >> index e83f852..89775d6 100644
> >> --- a/Documentation/intro/install/dpdk.rst
> >> +++ b/Documentation/intro/install/dpdk.rst
> >> @@ -449,7 +449,7 @@ affinitized accordingly.
> >>
> >>  A poll mode driver (pmd) thread handles the I/O of all DPDK interfaces
> >>  assigned to it. A pmd thread shall poll the ports for incoming packets,
> >> -switch the packets and send to tx port. pmd thread is CPU bound, and needs
> >> +switch the packets and send to tx port. A pmd thread is CPU bound, and needs
> >>  to be affinitized to isolated cores for optimum performance.
> >>
> >>  By setting a bit in the mask, a pmd thread is created and pinned to the
> >> @@ -458,8 +458,23 @@ affinitized accordingly.
> >> $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x4
> >>
> >> .. note::
> >> -  pmd thread on a NUMA node is only created if there is at least one DPDK
> >> -  interface from that NUMA node added to OVS.
> >> +  A pmd thread on a NUMA node is only created if there is at least one DPDK
> >> +  interface from that NUMA node added to OVS. A pmd thread is created by
> >> +  default on a core of a NUMA node or when a specified pmd-cpu-mask has
> >> +  indicated so. Even though a PMD thread may exist, the thread only starts
> >> +  consuming CPU cycles if there is least one receive queue assigned to
> >> +  the pmd.
> >> +
> >> +  .. note::
> >> +    On NUMA systems PCI devices are also local to a NUMA node.  Unbound rx
> >> +    queues for a PCI device will assigned to a pmd on it's local NUMA node if a
> >
> > Minor point but should read 'will be assigned'
> >> +    non-isolated PMD exists on that NUMA node. If not, the queue will be
> >> +    assigned to a non-isolated pmd on a remote NUMA node. This will result in
> >> +    reduced maximum throughput on that device and possibly on other devices
> >> +    assigned to that pmd thread. In the case such, a queue assignment is made a
> >> +    warning message will be logged: "There's no available (non-isolated) pmd
> >
> > Above should read 'In the case where such a queue assignment is made, a
> > warning message will be logged'
> >> +    thread on numa node N. Queue Q on port P will be assigned to the pmd on
> >> +    core C (numa node N'). Expect reduced performance."
> >>
> >> - QEMU vCPU thread Affinity
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index 4e29085..7557f32 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -3195,6 +3195,23 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
> >>      return NULL;
> >>  }
> >>
> >> +/* Returns next NUMA from rr list in round-robin fashion. Returns the first
> >> + * NUMA node if 'NULL' or the last node passed, and 'NULL' if list is empty. */
> >> +static struct rr_numa *
> >> +rr_numa_list_next(struct rr_numa_list *rr, const struct rr_numa *numa)
> >> +{
> >
> > The comment above can be tidied up a little to better clarify the
> > behavior of this function. I ended up reading the comments for hmap_next()
> > and hmap_first() before it made sense, and even then it's a bit ambiguous;
> > it ends up being the code that explains the comments.
> >
> > You could clarify the following 2 statements:
> >
> > (1) "Returns the first NUMA node if 'NULL'" - If what is NULL? I assume
> you mean the function parameter 'const struct rr_numa *numa' but it's not
> clear on first reading.
> >
> > (2) " or the last node passed" - again this makes sense only when you
> look into the behavior of the call 'hmap_next(&rr->numas, &numa->node)'.
> >
> > You could say something like:
> >
> > "Attempt to return the next NUMA from a numa list in a round robin
> fashion. Return the first NUMA node if the struct rr_numa *numa argument
> passed to the function is NULL or if the numa node argument passed to
> hmap_next is already the last node. Return NULL if the numa list is
> empty."
>
> I'm not sure that references to the implementation are a good way to write
> comments (I mean the 'passed to hmap_next' part).
>
> How about this:
> """
> /* Returns the next node in numa list following 'numa' in round-robin fashion.
>  * Returns first node if 'numa' is a null pointer or the last node in 'rr'. */
> """
>
> or
>
> """
> /* The next node in numa list following 'numa' in round-robin fashion.
> * Returns:
> * - 'NULL' if 'rr' is an empty numa list.
> * - First node in 'rr' if 'numa' is a null pointer.
> * - First node in 'rr' if 'numa' is the last node in 'rr'.
> * - Otherwise, the next node in numa list following 'numa'. */
> """
>
> ?
I'm happy with the first option you provided above. Could you append a note
about returning NULL if the list is empty? Then I think we're good:
/* Returns the next node in numa list following 'numa' in round-robin fashion.
* Returns first node if 'numa' is a null pointer or the last node in 'rr'.
* Returns NULL if 'rr' numa list is empty. */
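
Just to confirm I'm reading the round-robin behaviour right, here is a minimal
usage sketch (illustrative only; the loop and 'n_orphan_queues' are made up,
the real caller is rxq_scheduling() in the patch below):

    /* Illustrative sketch, not part of the patch: cycle over the NUMA
     * nodes that have pmds, wrapping back to the first node. */
    struct rr_numa *numa = NULL;

    for (int i = 0; i < n_orphan_queues; i++) {  /* 'n_orphan_queues' is made up. */
        numa = rr_numa_list_next(&rr, numa);     /* NULL input returns first node. */
        if (!numa) {
            break;                               /* 'rr' is empty: no pmds at all. */
        }
        /* ... assign queue 'i' via rr_numa_get_pmd(numa) ... */
    }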
Thanks
Ian
>
> >
> >> +    struct hmap_node *node = NULL;
> >> +
> >> +    if (numa) {
> >> +        node = hmap_next(&rr->numas, &numa->node);
> >> +    }
> >> +    if (!node) {
> >> +        node = hmap_first(&rr->numas);
> >> +    }
> >> +
> >> +    return (node) ? CONTAINER_OF(node, struct rr_numa, node) : NULL;
> >> +}
> >> +
> >>  static void
> >>  rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> >>  {
> >> @@ -3249,6 +3266,7 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >>  {
> >>      struct dp_netdev_port *port;
> >>      struct rr_numa_list rr;
> >> +    struct rr_numa *non_local_numa = NULL;
> >>
> >>      rr_numa_list_populate(dp, &rr);
> >>
> >> @@ -3281,11 +3299,28 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >>                  }
> >>              } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> >>                  if (!numa) {
> >> -                    VLOG_WARN("There's no available (non isolated) pmd thread "
> >> +                    /* There are no pmds on the queue's local NUMA node.
> >> +                       Round-robin on the NUMA nodes that do have pmds. */
> >> +                    non_local_numa = rr_numa_list_next(&rr, non_local_numa);
> >> +                    if (!non_local_numa) {
> >> +                        VLOG_ERR("There is no available (non-isolated) pmd "
> >> +                                 "thread for port \'%s\' queue %d. This queue "
> >> +                                 "will not be polled. Is pmd-cpu-mask set to "
> >> +                                 "zero? Or are all PMDs isolated to other "
> >> +                                 "queues?", netdev_get_name(port->netdev),
> >> +                                 qid);
> >> +                        continue;
> >> +                    }
> >> +                    q->pmd = rr_numa_get_pmd(non_local_numa);
> >> +                    VLOG_WARN("There's no available (non-isolated) pmd thread "
> >>                                "on numa node %d. Queue %d on port \'%s\' will "
> >> -                              "not be polled.",
> >> -                              numa_id, qid, netdev_get_name(port->netdev));
> >> +                              "be assigned to the pmd on core %d "
> >> +                              "(numa node %d). Expect reduced performance.",
> >> +                              numa_id, qid, netdev_get_name(port->netdev),
> >> +                              q->pmd->core_id, q->pmd->numa_id);
> >>                  } else {
> >> +                    /* Assign queue to the next (round-robin) PMD on it's local
> >> +                       NUMA node. */
> >>                      q->pmd = rr_numa_get_pmd(numa);
> >>                  }
> >>              }
> >> --
> >> 2.7.4
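
For anyone skimming the diff, the assignment logic above boils down to roughly
the following (a condensed, illustrative sketch only; 'choose_pmd_for_rxq' is a
made-up name, the real code sits inline in rxq_scheduling() and also emits the
warning/error messages):

    /* Illustrative sketch, not part of the patch. */
    static struct dp_netdev_pmd_thread *
    choose_pmd_for_rxq(struct rr_numa *local_numa, struct rr_numa_list *rr,
                       struct rr_numa **non_local_numa)
    {
        if (local_numa) {
            /* Preferred: round-robin among pmds on the queue's local NUMA node. */
            return rr_numa_get_pmd(local_numa);
        }
        /* Fallback: round-robin across the NUMA nodes that do have pmds. */
        *non_local_numa = rr_numa_list_next(rr, *non_local_numa);
        if (!*non_local_numa) {
            return NULL;   /* No non-isolated pmds anywhere; queue stays unpolled. */
        }
        return rr_numa_get_pmd(*non_local_numa);
    }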
> > This tested fine for me: I tested with multiple rxqs distributed and
> > isolated over pmds on 2 different numa nodes with varying pmd masks. It
> > also passed sanity checks (clang, sparse compilation, etc.).
> >
> > You can add the Tested-by tag for me, but I'd like to see the changes to
> > the documentation and function comments above before acking.
> >
> > Tested-by: Ian Stokes <[email protected]>
> >>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev