On 08/08/2017 07:15 PM, Greg Rose wrote:
> On 08/01/2017 08:58 AM, Kevin Traynor wrote:
>> Previously rxqs were assigned to pmds by round robin in
>> port/queue order.
>>
>> Now that we have the processing cycles used for existing rxqs,
>> use that information to try and produced a better balanced
>> distribution of rxqs across pmds. i.e. given multiple pmds, the
>> rxqs which have consumed the largest amount of processing cycles
>> will be placed on different pmds.
>>
>> The rxqs are sorted by their processing cycles and assigned (in
>> sorted order) round robin across pmds.
>>
>> Signed-off-by: Kevin Traynor <[email protected]>
>> ---
>>  Documentation/howto/dpdk.rst |  7 +++++
>>  lib/dpif-netdev.c            | 73
>> +++++++++++++++++++++++++++++++++++---------
>>  2 files changed, 66 insertions(+), 14 deletions(-)
>>
>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>> index af01d3e..a969285 100644
>> --- a/Documentation/howto/dpdk.rst
>> +++ b/Documentation/howto/dpdk.rst
>> @@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues
>> was pinned will become
>>  thread.
>>
>> +If pmd-rxq-affinity is not set for rxqs, they will be assigned to
>> pmds (cores)
>> +automatically. The processing cycles that have been required for each
>> rxq
>> +will be used where known to assign rxqs with the highest consumption of
>> +processing cycles to different pmds.
>> +
>> +Rxq to pmds assignment takes place whenever there are configuration
>> changes.
>> +
>>  QoS
>>  ---
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 25a521a..a05e586 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
>>  }
>>
>> +/* Sort Rx Queues by the processing cycles they are consuming. */
>> +static int
>> +rxq_cycle_sort(const void *a, const void *b)
>> +{
>> +    struct dp_netdev_rxq * qa;
>> +    struct dp_netdev_rxq * qb;
>> +
>> +    qa = *(struct dp_netdev_rxq **) a;
>> +    qb = *(struct dp_netdev_rxq **) b;
>> +
>> +    if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
>> +            dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
>> +        return -1;
>> +    }
>> +
>> +    return 1;
>> +}
>> +
>>  /* Assign pmds to queues. If 'pinned' is true, assign pmds to pinned
>>   * queues and marks the pmds as isolated. Otherwise, assign non
>> isolated
>>   * pmds to unpinned queues.
>>   *
>> + * If 'pinned' is false queues will be sorted by processing cycles
>> they are
>> + * consuming and then assigned to pmds in round robin order.
>> + *
>>   * The function doesn't touch the pmd threads, it just stores the
>> assignment
>>   * in the 'pmd' member of each rxq. */
>> @@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool
>> pinned) OVS_REQUIRES(dp->port_mutex)
>>      struct dp_netdev_port *port;
>>      struct rr_numa_list rr;
>> -
>> -    rr_numa_list_populate(dp, &rr);
>> +    struct dp_netdev_rxq ** rxqs = NULL;
>> +    int i, n_rxqs = 0;
>> +    struct rr_numa *numa = NULL;
>> +    int numa_id;
>>
>>      HMAP_FOR_EACH (port, node, &dp->ports) {
>> -        struct rr_numa *numa;
>> -        int numa_id;
>> -
>>          if (!netdev_is_pmd(port->netdev)) {
>>              continue;
>>          }
>>
>> -        numa_id = netdev_get_numa_id(port->netdev);
>> -        numa = rr_numa_list_lookup(&rr, numa_id);
>> -
>>          for (int qid = 0; qid < port->n_rxq; qid++) {
>>              struct dp_netdev_rxq *q = &port->rxqs[qid];
>> @@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool
>> pinned) OVS_REQUIRES(dp->port_mutex)
>>                  }
>>              } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
>> -                if (!numa) {
>> -                    VLOG_WARN("There's no available (non isolated)
>> pmd thread "
>> -                              "on numa node %d. Queue %d on port
>> \'%s\' will "
>> -                              "not be polled.",
>> -                              numa_id, qid,
>> netdev_get_name(port->netdev));
>> +                if (n_rxqs == 0) {
>> +                    rxqs = xmalloc(sizeof *rxqs);
>>                  } else {
>> -                    q->pmd = rr_numa_get_pmd(numa);
>> +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
>>                  }
>> +                /* Store the queue. */
>> +                rxqs[n_rxqs++] = q;
>>              }
>>          }
>>      }
>>
>> +    if (n_rxqs > 1) {
>> +        /* Sort the queues in order of the processing cycles
>> +         * they consumed during their last pmd interval. */
>> +        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
>> +    }
>> +
>> +    rr_numa_list_populate(dp, &rr);
>> +    /* Assign the sorted queues to pmds in round robin. */
>> +    for (i = 0; i < n_rxqs; i++) {
>> +        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
>> +        numa = rr_numa_list_lookup(&rr, numa_id);
>> +        if (!numa) {
>> +            VLOG_WARN("There's no available (non isolated) pmd thread "
>> +                      "on numa node %d. Queue %d on port \'%s\' will "
>> +                      "not be polled.",
>> +                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
>> +                      netdev_get_name(rxqs[i]->port->netdev));
>> +            continue;
>> +        }
>> +        rxqs[i]->pmd = rr_numa_get_pmd(numa);
>> +        VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
>> +                  "rx queue %d (measured processing cycles %"PRIu64").",
>> +                  rxqs[i]->pmd->core_id, numa_id,
>> +                  netdev_rxq_get_name(rxqs[i]->rx),
>> +                  netdev_rxq_get_queue_id(rxqs[i]->rx),
>> +                  dp_netdev_rxq_get_cycles(rxqs[i],
>> RXQ_CYCLES_PROC_LAST));
>
> Kevin,
>
> I've been reviewing and testing this code and found something odd. The
> measured processing cycles are
> always zero in my setup.
>
> sample log output:
>
> 2017-08-08T12:48:25.871Z|00417|dpif_netdev|INFO|Core 6 on numa node 0
> assigned port 'port-em2' rx queue 5 (measured processing cycles
> 10011304791).
> 2017-08-08T12:48:25.871Z|00418|dpif_netdev|INFO|Core 6 on numa node 0
> assigned port 'port-em2' rx queue 4 (measured processing cycles 0).
>
> Initially I configure my setup with 16 rxq's and a PMD CPU mask of
> 0x1FFFE. Then I've been testing by running
> iperf traffic with multiple ports 8 or 16 (-P option) to allow
> 'processing cycles' to count up. Or at least I think that's
> what should be happening. But when I reconfigure the rxq's and cpu mask
> the processing cycles is always
> zero.
>
Hi Greg,

Thanks for trying it out. I see that rxq 5 has measured cycles, so it
appears to be zero on just some queues. The stat shown is the processing
cycles counted for the rxq during its last 1 sec run while it was on a
pmd. "Processing cycles" counts the time spent fetching packets and
processing them, but it does not count time spent polling when there are
no rx packets.

There are a few reasons it could be 0:
- The queue is newly added
- There is no rx traffic on that interface
- The interface has not distributed traffic to that particular rxq, so
  no "processing cycles" are counted for that queue.

Given the rxq number in the log, I would hazard a guess that it's the
last issue. You could confirm this by setting pmds > total rxqs, so that
each pmd has a max of 1 rxq. The pmd stats can then indicate whether
packets are being received on that pmd, and hence on that rxq. You can
check that setup with:

ovs-appctl dpif-netdev/pmd-rxq-show
ovs-appctl dpif-netdev/pmd-stats-clear
ovs-appctl dpif-netdev/pmd-stats-show

If you increase the number of flows so that the RSS in the NIC (IIRC it
relies on the 5-tuple) can split them across the full range of rxq's, it
should solve that issue. Of course there could always be a bug somewhere
too!

> How are you testing this? Perhaps it's just my test harness or
> something else.
>

I'm using 2 dpdk ports with flows added to send between them. Externally
I have pktgen-dpdk connected and sending 1K flows so I hit all queues.
Then I vary traffic rates, pmds and queue numbers, and also use
ovs-appctl dpif-netdev/pmd-rxq-rebalance from patch 6/6.

> Initial setup:
>
> ovs-vsctl set Interface port-em2 options:n_rxq=16
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1FFFE
>
> (Note that I do not set affinity - I have read your patch to infer that
> this is for cases without affinitization.)
>

That's correct, and manual affinitization takes precedence (I need to
add that to the docs if I haven't). The patchset only changes how the
non-affinitized rxqs are distributed.

> After getting traffic I then run this setup:
>
> ovs-vsctl set Interface port-em2 options:n_rxq=4
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1E
>
> Any advice or comment?
>
> Thanks,
>
> - Greg
>

I just sent a v4 as well, rebased onto the head of master.

thanks,
Kevin.

>> +    }
>> +
>>      rr_numa_list_destroy(&rr);
>> +    free(rxqs);
>>  }
>>
>>
>
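For anyone following the thread, the sort-then-round-robin idea discussed
above can be sketched as a small standalone C fragment. This is only an
illustration under stated assumptions, not the OVS code: struct sketch_rxq,
sketch_cycle_cmp() and sketch_assign() are hypothetical names, NUMA
locality and pinned queues are ignored, and pmds are reduced to plain
indexes. Unlike the quoted comparator, this one returns 0 for equal cycle
counts so that the ordering reported to qsort() is the same in both
directions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an rx queue; not the OVS struct. */
struct sketch_rxq {
    int queue_id;       /* Queue number on its port. */
    uint64_t cycles;    /* Processing cycles measured in the last interval. */
    int pmd;            /* Index of the pmd chosen for this queue, or -1. */
};

/* Order queues by measured cycles, busiest first. */
int
sketch_cycle_cmp(const void *a, const void *b)
{
    const struct sketch_rxq *qa = *(const struct sketch_rxq *const *) a;
    const struct sketch_rxq *qb = *(const struct sketch_rxq *const *) b;

    if (qa->cycles > qb->cycles) {
        return -1;
    }
    if (qa->cycles < qb->cycles) {
        return 1;
    }
    return 0;
}

/* Sort 'rxqs' by cycles, then hand them out to 'n_pmds' pmds in round
 * robin, so the busiest queues land on different pmds. */
void
sketch_assign(struct sketch_rxq **rxqs, size_t n_rxqs, int n_pmds)
{
    assert(n_pmds > 0);

    if (n_rxqs > 1) {
        qsort(rxqs, n_rxqs, sizeof *rxqs, sketch_cycle_cmp);
    }
    for (size_t i = 0; i < n_rxqs; i++) {
        rxqs[i]->pmd = (int) (i % (size_t) n_pmds);
    }
}
```

In this sketch, queues whose measured cycles are 0 (for example because RSS
never steered traffic to them) simply sort to the end and are still spread
across the pmds in round robin order.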
