> On 08.07.2017 22:09, Stokes, Ian wrote:
> >> Previously if there is no available (non-isolated) pmd on the numa
> >> node for a port then the port is not polled at all. This can result
> >> in a non-operational system until such time as nics are physically
> >> repositioned. It is preferable to operate with a pmd on the 'wrong'
> >> numa node albeit with lower performance. Local pmds are still chosen
> >> when available.
> >>
> >> Signed-off-by: Billy O'Mahony <billy.o.mah...@intel.com>
> >> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
> >> Co-authored-by: Ilya Maximets <i.maxim...@samsung.com>
> >> ---
> >> v9: v8 missed some comments on v7
> >> v8: Some coding style issues; doc tweak
> >> v7: Incorporate review comments on docs and implementation
> >> v6: Change 'port' to 'queue' in a warning msg
> >> v5: Fix warning msg; Update same in docs
> >> v4: Fix a checkpatch error
> >> v3: Fix warning messages not appearing when using multiqueue
> >> v2: Add details of warning messages into docs
> >>
> >>  Documentation/intro/install/dpdk.rst | 21 +++++++++++++++---
> >>  lib/dpif-netdev.c                    | 41 +++++++++++++++++++++++++++++++++---
> >>  2 files changed, 56 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> >> index e83f852..89775d6 100644
> >> --- a/Documentation/intro/install/dpdk.rst
> >> +++ b/Documentation/intro/install/dpdk.rst
> >> @@ -449,7 +449,7 @@ affinitized accordingly.
> >>
> >>    A poll mode driver (pmd) thread handles the I/O of all DPDK interfaces
> >>    assigned to it. A pmd thread shall poll the ports for incoming packets,
> >> -  switch the packets and send to tx port. pmd thread is CPU bound, and needs
> >> +  switch the packets and send to tx port. A pmd thread is CPU bound, and needs
> >>    to be affinitized to isolated cores for optimum performance.
> >>
> >>    By setting a bit in the mask, a pmd thread is created and pinned to the
> >> @@ -458,8 +458,23 @@ affinitized accordingly.
> >>
> >>      $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x4
> >>
> >>  .. note::
> >> -    pmd thread on a NUMA node is only created if there is at least one DPDK
> >> -    interface from that NUMA node added to OVS.
> >> +    A pmd thread on a NUMA node is only created if there is at least one DPDK
> >> +    interface from that NUMA node added to OVS. A pmd thread is created by
> >> +    default on a core of a NUMA node or when a specified pmd-cpu-mask has
> >> +    indicated so. Even though a PMD thread may exist, the thread only starts
> >> +    consuming CPU cycles if there is least one receive queue assigned to
> >> +    the pmd.
> >> +
> >> +  .. note::
> >> +    On NUMA systems PCI devices are also local to a NUMA node. Unbound rx
> >> +    queues for a PCI device will assigned to a pmd on it's local NUMA node if a
> >
> > Minor point but should read 'will be assigned'
> >
> >> +    non-isolated PMD exists on that NUMA node. If not, the queue will be
> >> +    assigned to a non-isolated pmd on a remote NUMA node. This will result in
> >> +    reduced maximum throughput on that device and possibly on other devices
> >> +    assigned to that pmd thread. In the case such a queue assignment is made a
> >> +    warning message will be logged: "There's no available (non-isolated) pmd
> >
> > Above should read 'In the case where such a queue assignment is made, a
> > warning message will be logged'
> >
> >> +    thread on numa node N. Queue Q on port P will be assigned to the pmd on
> >> +    core C (numa node N'). Expect reduced performance."
> >>
> >>  - QEMU vCPU thread Affinity
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index 4e29085..7557f32 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -3195,6 +3195,23 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
> >>      return NULL;
> >>  }
> >>
> >> +/* Returns next NUMA from rr list in round-robin fashion. Returns the first
> >> + * NUMA node if 'NULL' or the last node passed, and 'NULL' if list is empty. */
> >> +static struct rr_numa *
> >> +rr_numa_list_next(struct rr_numa_list *rr, const struct rr_numa *numa)
> >> +{
> >
> > The comment above can be tidied up a little to better clarify the
> > behavior of this function.
> >
> > I ended up reading the comments for hmap_next() and hmap_first() before
> > it made sense, and even then it's a bit ambiguous, it ends up being the
> > code that explains the comments.
> >
> > You could clarify the following 2 statements:
> >
> > (1) "Returns the first NUMA node if 'NULL'" - If what is NULL? I assume
> > you mean the function parameter 'const struct rr_numa *numa' but it's not
> > clear on first reading.
> >
> > (2) " or the last node passed" - again this makes sense only when you
> > look into the behavior of the call 'hmap_next(&rr->numas, &numa->node)'.
> >
> > You could say something like:
> >
> > "Attempt to return the next NUMA from a numa list in a round robin
> > fashion. Return the first NUMA node if the struct rr_numa *numa argument
> > passed to the function is NULL or if the numa node argument passed to
> > hmap_next is already the last node. Return NULL if the numa list is
> > empty."
>
> I'm not sure that references to implementation is a good way to write
> comments (I mean 'passed to hmap_next' part).
>
> How about this:
> """
> /* Returns the next node in numa list following 'numa' in round-robin fashion.
>  * Returns first node if 'numa' is a null pointer or the last node in 'rr'. */
> """
>
> or
>
> """
> /* The next node in numa list following 'numa' in round-robin fashion.
>  * Returns:
>  *     - 'NULL' if 'rr' is an empty numa list.
>  *     - First node in 'rr' if 'numa' is a null pointer.
>  *     - First node in 'rr' if 'numa' is the last node in 'rr'.
>  *     - Otherwise, the next node in numa list following 'numa'. */
> """
>
> ?

I'm happy with the first option you provided above. Could you also append the
'returns NULL if the list is empty' case? Then I think we're good:

/* Returns the next node in numa list following 'numa' in round-robin fashion.
 * Returns first node if 'numa' is a null pointer or the last node in 'rr'.
 * Returns NULL if 'rr' numa list is empty. */
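Just so we are all describing the same wrap-around behaviour, here is a tiny
standalone sketch of what that comment is saying. Illustration only: the
'toy_numa' and 'toy_numa_next' names below are simplified stand-ins, not the
rr_numa/hmap code in the patch.

#include <stddef.h>
#include <stdio.h>

struct toy_numa {
    int numa_id;
};

/* Returns the next node in 'nodes' following 'cur' in round-robin fashion.
 * Returns the first node if 'cur' is a null pointer or the last node.
 * Returns NULL if 'n_nodes' is zero. */
static struct toy_numa *
toy_numa_next(struct toy_numa *nodes, size_t n_nodes, struct toy_numa *cur)
{
    if (!n_nodes) {
        return NULL;
    }
    if (!cur || cur == &nodes[n_nodes - 1]) {
        return &nodes[0];
    }
    return cur + 1;
}

int
main(void)
{
    struct toy_numa nodes[] = { { 0 }, { 1 } };
    struct toy_numa *cur = NULL;
    int q;

    /* Six queues with no local pmd get spread 0, 1, 0, 1, 0, 1. */
    for (q = 0; q < 6; q++) {
        cur = toy_numa_next(nodes, 2, cur);
        printf("queue %d -> numa %d\n", q, cur->numa_id);
    }
    return 0;
}

That cycling is what rr_numa_list_next() gives rxq_scheduling() when a port's
own NUMA node has no usable pmds.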
Thanks,
Ian

> >
> >> +    struct hmap_node *node = NULL;
> >> +
> >> +    if (numa) {
> >> +        node = hmap_next(&rr->numas, &numa->node);
> >> +    }
> >> +    if (!node) {
> >> +        node = hmap_first(&rr->numas);
> >> +    }
> >> +
> >> +    return (node) ? CONTAINER_OF(node, struct rr_numa, node) : NULL;
> >> +}
> >> +
> >>  static void
> >>  rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> >>  {
> >> @@ -3249,6 +3266,7 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >>  {
> >>      struct dp_netdev_port *port;
> >>      struct rr_numa_list rr;
> >> +    struct rr_numa *non_local_numa = NULL;
> >>
> >>      rr_numa_list_populate(dp, &rr);
> >>
> >> @@ -3281,11 +3299,28 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >>                  }
> >>              } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> >>                  if (!numa) {
> >> -                    VLOG_WARN("There's no available (non isolated) pmd thread "
> >> +                    /* There are no pmds on the queue's local NUMA node.
> >> +                       Round-robin on the NUMA nodes that do have pmds. */
> >> +                    non_local_numa = rr_numa_list_next(&rr, non_local_numa);
> >> +                    if (!non_local_numa) {
> >> +                        VLOG_ERR("There is no available (non-isolated) pmd "
> >> +                                 "thread for port \'%s\' queue %d. This queue "
> >> +                                 "will not be polled. Is pmd-cpu-mask set to "
> >> +                                 "zero? Or are all PMDs isolated to other "
> >> +                                 "queues?", netdev_get_name(port->netdev),
> >> +                                 qid);
> >> +                        continue;
> >> +                    }
> >> +                    q->pmd = rr_numa_get_pmd(non_local_numa);
> >> +                    VLOG_WARN("There's no available (non-isolated) pmd thread "
> >>                                "on numa node %d. Queue %d on port \'%s\' will "
> >> -                              "not be polled.",
> >> -                              numa_id, qid, netdev_get_name(port->netdev));
> >> +                              "be assigned to the pmd on core %d "
> >> +                              "(numa node %d). Expect reduced performance.",
> >> +                              numa_id, qid, netdev_get_name(port->netdev),
> >> +                              q->pmd->core_id, q->pmd->numa_id);
> >>                  } else {
> >> +                    /* Assign queue to the next (round-robin) PMD on it's local
> >> +                       NUMA node. */
> >>                      q->pmd = rr_numa_get_pmd(numa);
> >>                  }
> >>              }
> >> --
> >> 2.7.4
> >
> > This tested fine for me, tested with multiple rxqs distributed and
> > isolated over pmds on 2 different numa nodes with varying pmd masks. Also
> > passed sanity checks (clang, sparse compilation etc.).
> >
> > You can add the tested by tag for me but I'd like to see the changes for
> > the documentation and function comments above before acking.
> >
> > Tested-by: Ian Stokes <ian.sto...@intel.com>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
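As an aside for anyone reproducing the kind of setup Ian describes above, the
resulting queue-to-pmd mapping can be inspected from the switch itself. The
mask, interface name and pinning below are placeholders, not values taken from
this thread.

    # Create pmd threads on cores of both NUMA nodes (placeholder mask).
    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x50005

    # Pinning an rx queue to a core isolates that pmd from the round-robin
    # assignment (placeholder port, queue and core).
    $ ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:1"

    # Show which pmd (core and numa node) each rx queue was assigned to.
    $ ovs-appctl dpif-netdev/pmd-rxq-show

A queue that fell back to a remote node shows up under a pmd whose numa_id
differs from the port's NUMA node, and the new warning appears in the
vswitchd log.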