On 08/16/2017 04:48 PM, Darrell Ball wrote:
> Seems we are in sync on most points then.
>
> *From: *Jan Scheurich <[email protected]>
> *Date: *Wednesday, August 16, 2017 at 8:03 AM
> *To: *Darrell Ball <[email protected]>, Sugesh Chandran
> <[email protected]>, "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>
> *Subject: *RE: [ovs-dev] [RFC PATCH 1/1] dpif-netdev : Include Rxq
> processing cycles
>
> I believe that queue-related cycles stats should be part of the
> pmd-rxq-show output, possibly guarded by some command option.
>
> We are in sync on having an option for pmd-rxq-show, then.
>
> The queue statistics should be cumulative in the same sense as with
> pmd-stats-show, so that a user can reset them with the pmd-stats-clear
> command at any time and choose the observation period themselves.
> Statistics over a shorter interval than a second are rarely useful for
> analysis.
>
> Clearing the overall stats is certainly needed.
>
> I suggested that the collection of multiple intervals is needed to see
> trends; meaning that if the traffic patterns are varying wildly over
> multiple intervals, the user may not want to rebalance. I think to make a
Ah ok, I thought you just wanted a longer history to base the rebalance
on, but now I think you specifically want to display multiple intervals
to help the user understand whether there is spiky behavior. Let me know
if I got it now.

> rebalance decision, stats over longer total periods, like an hour,
> would be more reliable.

Originally I had it as 10 secs, but Jan had suggested that was too
long :s

Display length and the data used for a rebalance do not necessarily need
to be the same. For example, a user might want to look at the interval
trend over a longer period of time to decide whether they should do a
rebalance, while the actual rebalance could be based on more recent
data.

> Of course, the user may want to observe more.
>
> To be of value for the decision whether to manually trigger a
> load-based queue redistribution, the pmd-rxq-show command should not
> (only) display the distribution of polling and processing cycles among
> the Rx queues of a single PMD but rather the total load of each PMD
> (to see imbalance between PMDs)

The idle/processing cycles and % for a pmd are already displayed in the
pmd stats. Are you suggesting they should be repeated in the rxq stats?

> We are in sync on having totals per pmd, then.
>
> and the share of the cycles of each queue among all Rx queues.
>
> Adding percentages is needed and simple; computing them manually would
> be unpleasant for the user.
>
> Only that allows one to identify the elephant queues and judge the
> chances for improvement through rebalancing.
>
> Otherwise it is impossible to compare the load on queues polled by
> different PMDs.
>
> BR, Jan
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Darrell Ball
>> Sent: Wednesday, 16 August, 2017 09:13
>> To: Sugesh Chandran <[email protected]>;
>> [email protected]; [email protected]
>> Subject: Re: [ovs-dev] [RFC PATCH 1/1] dpif-netdev : Include Rxq
>> processing cycles
>>
>> I know there is some debate about where this o/p fits: pmd-rxq-show
>> or pmd-stats-show.
>>
>> If this is meant to support Kevin's series, "OVS-DPDK rxq to pmd
>> assignment improvements"?, then maybe pmd-rxq-show might be easier,
>> since pmd-stats-show has lots of other info.
>> Maybe an 'option' to pmd-rxq-show would address concerns?
>>
>> The o/p below does not work perfectly for Kevin's need, though, I
>> think?
>> A listing of pmds with non-pinned queues, with processing cycles over
>> several time intervals, might be better?
>> Maybe totals per PMD of the above processing cycles as well?
>>
>> Darrell
>>
>>
>> -----Original Message-----
>> From: <[email protected]> on behalf of Sugesh
>> Chandran <[email protected]>
>> Date: Tuesday, May 30, 2017 at 11:46 AM
>> To: "[email protected]" <[email protected]>,
>> "[email protected]" <[email protected]>
>> Subject: [ovs-dev] [RFC PATCH 1/1] dpif-netdev : Include Rxq
>> processing cycles
>>
>> PMD threads poll and process packets from Rxqs in round-robin
>> fashion. This does not guarantee an optimal allocation of Rxqs across
>> the PMD threads at all times. The patch series in the following link
>> is trying to offer uniform distribution of rxqs across the PMD
>> threads.
> >>
> >> https://mail.openvswitch.org/pipermail/ovs-dev/2017-May/332734.html
> >>
> >> However, first it is necessary for the user to know about the
> >> distribution of PMD cycles across its Rx queues. This patch extends
> >> the existing command 'ovs-appctl dpif-netdev/pmd-rxq-show' to show
> >> the percentage of total PMD thread cycles spent on each port's rx
> >> queues.
> >>
> >> A sample 'ovs-appctl dpif-netdev/pmd-rxq-show' with pmd cycle
> >> percentage:
> >>
> >> pmd thread numa_id 0 core_id 2:
> >>         isolated : false
> >>         port: dpdk0
> >>                 queue-id: 0
> >>                 polling cycles(% of total polling cycles):88687867260 (24.40%)
> >>                 processing cycles(% of total processing cycles):741227393476 (65.87%)
> >>
> >>         port: dpdkvhostuser1
> >>                 queue-id: 0
> >>                 polling cycles(% of total polling cycles):274003203688 (75.38%)
> >>                 processing cycles(% of total processing cycles):384127428352 (34.13%)
> >>
> >> pmd thread numa_id 0 core_id 3:
> >>         isolated : false
> >>         port: dpdk1
> >>                 queue-id: 0
> >>                 polling cycles(% of total polling cycles):96022088320 (26.01%)
> >>                 processing cycles(% of total processing cycles):716284951804 (64.07%)
> >>
> >>         port: dpdkvhostuser0
> >>                 queue-id: 0
> >>                 polling cycles(% of total polling cycles):273103986760 (73.98%)
> >>                 processing cycles(% of total processing cycles):401626635996 (35.93%)
> >>
> >> Signed-off-by: Sugesh Chandran <[email protected]>
> >> ---
> >>  lib/dpif-netdev.c | 155 +++++++++++++++++++++++++++++++++++++++++++++---------
> >>  1 file changed, 131 insertions(+), 24 deletions(-)
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index b50164b..e4fcb2e 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -332,8 +332,16 @@ enum pmd_cycles_counter_type {
> >>
> >>  #define XPS_TIMEOUT_MS 500LL
> >>
> >> +/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. */
> >> +struct dp_netdev_pmd_cycles {
> >> +    /* Indexed by PMD_CYCLES_*. */
> >> +    atomic_ullong n[PMD_N_CYCLES];
> >> +};
> >> +
> >>  /* Contained by struct dp_netdev_port's 'rxqs' member. */
> >>  struct dp_netdev_rxq {
> >> +    /* Cycles counters for each queue. */
> >> +    struct dp_netdev_pmd_cycles cycles;
> >>      struct dp_netdev_port *port;
> >>      struct netdev_rxq *rx;
> >>      unsigned core_id;                  /* Core to which this queue should be
> >> @@ -341,6 +349,7 @@ struct dp_netdev_rxq {
> >>                                            queue doesn't need to be pinned to a
> >>                                            particular core. */
> >>      struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
> >> +    uint64_t cycles_zero[PMD_N_CYCLES]; /* cycles to zero out */
> >>  };
> >>
> >>  /* A port in a netdev-based datapath. */
> >> @@ -469,14 +478,8 @@ struct dp_netdev_pmd_stats {
> >>      atomic_ullong n[DP_N_STATS];
> >>  };
> >>
> >> -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. */
> >> -struct dp_netdev_pmd_cycles {
> >> -    /* Indexed by PMD_CYCLES_*. */
> >> -    atomic_ullong n[PMD_N_CYCLES];
> >> -};
> >> -
> >>  struct polled_queue {
> >> -    struct netdev_rxq *rx;
> >> +    struct dp_netdev_rxq *rxq;
> >>      odp_port_t port_no;
> >>  };
> >>
> >> @@ -676,6 +679,10 @@ static inline bool emc_entry_alive(struct emc_entry *ce);
> >>  static void emc_clear_entry(struct emc_entry *ce);
> >>
> >>  static void
> >> +pmd_reset_cycles(struct dp_netdev_pmd_thread *pmd,
> >> +                 uint64_t cycles[PMD_N_CYCLES]);
> >> +
> >> +static void
> >>  emc_cache_init(struct emc_cache *flow_cache)
> >>  {
> >>      int i;
> >> @@ -842,9 +849,9 @@ pmd_info_clear_stats(struct ds *reply OVS_UNUSED,
> >>      for (i = 0; i < DP_N_STATS; i++) {
> >>          pmd->stats_zero[i] = stats[i];
> >>      }
> >> -    for (i = 0; i < PMD_N_CYCLES; i++) {
> >> -        pmd->cycles_zero[i] = cycles[i];
> >> -    }
> >> +
> >> +    /* Clear the PMD cycles stats */
> >> +    pmd_reset_cycles(pmd, cycles);
> >>  }
> >>
> >>  static int
> >> @@ -896,7 +903,9 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
> >>      if (pmd->core_id != NON_PMD_CORE_ID) {
> >>          const char *prev_name = NULL;
> >>          struct rxq_poll *list;
> >> -        size_t i, n;
> >> +        size_t i, j, n;
> >> +        uint64_t pmd_cycles[PMD_N_CYCLES];
> >> +        uint64_t port_cycles[PMD_N_CYCLES];
> >>
> >>          ds_put_format(reply,
> >>                        "pmd thread numa_id %d core_id %u:\n\tisolated : %s\n",
> >> @@ -905,6 +914,19 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
> >>
> >>          ovs_mutex_lock(&pmd->port_mutex);
> >>          sorted_poll_list(pmd, &list, &n);
> >> +
> >> +        for (i = 0; i < ARRAY_SIZE(pmd_cycles); i++) {
> >> +            /* Read the PMD CPU cycles */
> >> +            atomic_read_relaxed(&pmd->cycles.n[i], &pmd_cycles[i]);
> >> +        }
> >> +        for (i = 0; i < PMD_N_CYCLES; i++) {
> >> +            if (pmd_cycles[i] > pmd->cycles_zero[i]) {
> >> +                pmd_cycles[i] -= pmd->cycles_zero[i];
> >> +            } else {
> >> +                pmd_cycles[i] = 0;
> >> +            }
> >> +        }
> >> +
> >>          for (i = 0; i < n; i++) {
> >>              const char *name = netdev_rxq_get_name(list[i].rxq->rx);
> >>
> >> @@ -912,11 +934,38 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
> >>                  if (prev_name) {
> >>                      ds_put_cstr(reply, "\n");
> >>                  }
> >> -                ds_put_format(reply, "\tport: %s\tqueue-id:", name);
> >> +                ds_put_format(reply, "\tport: %s\n", name);
> >>              }
> >> -            ds_put_format(reply, " %d",
> >> +            ds_put_format(reply, "\t\tqueue-id: %d\n",
> >>                            netdev_rxq_get_queue_id(list[i].rxq->rx));
> >>              prev_name = name;
> >> +            if (!pmd_cycles[PMD_CYCLES_POLLING] ||
> >> +                !pmd_cycles[PMD_CYCLES_PROCESSING]) {
> >> +                continue;
> >> +            }
> >> +            for (j = 0; j < ARRAY_SIZE(port_cycles); j++) {
> >> +                /* Read the Rx queue CPU cycles */
> >> +                atomic_read_relaxed(&list[i].rxq->cycles.n[j],
> >> +                                    &port_cycles[j]);
> >> +            }
> >> +            for (j = 0; j < PMD_N_CYCLES; j++) {
> >> +                if (port_cycles[j] > list[i].rxq->cycles_zero[j]) {
> >> +                    port_cycles[j] -= list[i].rxq->cycles_zero[j];
> >> +                } else {
> >> +                    port_cycles[j] = 0;
> >> +                }
> >> +            }
> >> +            ds_put_format(reply,
> >> +                          "\t\tpolling cycles(%% of total polling cycles):"
> >> +                          "%"PRIu64" (%.02f%%)\n"
> >> +                          "\t\tprocessing cycles(%% of total processing "
> >> +                          "cycles):%"PRIu64" (%.02f%%)\n",
> >> +                          port_cycles[PMD_CYCLES_POLLING],
> >> +                          port_cycles[PMD_CYCLES_POLLING] /
> >> +                          (double) pmd_cycles[PMD_CYCLES_POLLING] * 100,
> >> +                          port_cycles[PMD_CYCLES_PROCESSING],
> >> +                          port_cycles[PMD_CYCLES_PROCESSING] /
> >> +                          (double) pmd_cycles[PMD_CYCLES_PROCESSING] * 100);
> >>          }
> >>          ovs_mutex_unlock(&pmd->port_mutex);
> >>          ds_put_cstr(reply, "\n");
> >> @@ -3076,6 +3125,7 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd)
> >>  /* Stop counting cycles and add them to the counter 'type' */
> >>  static inline void
> >>  cycles_count_end(struct dp_netdev_pmd_thread *pmd,
> >> +                 struct dp_netdev_rxq *rx,
> >>                   enum pmd_cycles_counter_type type)
> >>      OVS_RELEASES(&cycles_counter_fake_mutex)
> >>      OVS_NO_THREAD_SAFETY_ANALYSIS
> >> @@ -3083,11 +3133,13 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
> >>      unsigned long long interval = cycles_counter() - pmd->last_cycles;
> >>
> >>      non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> >> +    /* Update the cycles for the specific queue as well */
> >> +    non_atomic_ullong_add(&rx->cycles.n[type], interval);
> >>  }
> >>
> >>  static void
> >>  dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
> >> -                           struct netdev_rxq *rx,
> >> +                           struct dp_netdev_rxq *rx,
> >>                             odp_port_t port_no)
> >>  {
> >>      struct dp_packet_batch batch;
> >> @@ -3095,19 +3147,19 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
> >>
> >>      dp_packet_batch_init(&batch);
> >>      cycles_count_start(pmd);
> >> -    error = netdev_rxq_recv(rx, &batch);
> >> -    cycles_count_end(pmd, PMD_CYCLES_POLLING);
> >> +    error = netdev_rxq_recv(rx->rx, &batch);
> >> +    cycles_count_end(pmd, rx, PMD_CYCLES_POLLING);
> >>      if (!error) {
> >>          *recirc_depth_get() = 0;
> >>
> >>          cycles_count_start(pmd);
> >>          dp_netdev_input(pmd, &batch, port_no);
> >> -        cycles_count_end(pmd, PMD_CYCLES_PROCESSING);
> >> +        cycles_count_end(pmd, rx, PMD_CYCLES_PROCESSING);
> >>      } else if (error != EAGAIN && error != EOPNOTSUPP) {
> >>          static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> >>
> >>          VLOG_ERR_RL(&rl, "error receiving data from %s: %s",
> >> -                    netdev_rxq_get_name(rx), ovs_strerror(error));
> >> +                    netdev_rxq_get_name(rx->rx), ovs_strerror(error));
> >>      }
> >>  }
> >>
> >> @@ -3129,7 +3181,7 @@ static int
> >>  port_reconfigure(struct dp_netdev_port *port)
> >>  {
> >>      struct netdev *netdev = port->netdev;
> >> -    int i, err;
> >> +    int i, err, j;
> >>
> >>      port->need_reconfigure = false;
> >>
> >> @@ -3162,6 +3214,11 @@ port_reconfigure(struct dp_netdev_port *port)
> >>          if (err) {
> >>              return err;
> >>          }
> >> +        /* Initialize the rxq stats */
> >> +        for (j = 0 ; j < PMD_N_CYCLES; j++) {
> >> +            atomic_init(&port->rxqs[i].cycles.n[j], 0);
> >> +            port->rxqs[i].cycles_zero[j] = 0;
> >> +        }
> >>          port->n_rxq++;
> >>      }
> >>
> >> @@ -3580,7 +3637,7 @@ dpif_netdev_run(struct dpif *dpif)
> >>              int i;
> >>
> >>              for (i = 0; i < port->n_rxq; i++) {
> >> -                dp_netdev_process_rxq_port(non_pmd, port->rxqs[i].rx,
> >> +                dp_netdev_process_rxq_port(non_pmd, &port->rxqs[i],
> >>                                             port->port_no);
> >>              }
> >>          }
> >> @@ -3687,7 +3744,7 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
> >>
> >>      i = 0;
> >>      HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
> >> -        poll_list[i].rx = poll->rxq->rx;
> >> +        poll_list[i].rxq = poll->rxq;
> >>          poll_list[i].port_no = poll->rxq->port->port_no;
> >>          i++;
> >>      }
> >> @@ -3700,6 +3757,54 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
> >>      return i;
> >>  }
> >>
> >> +static void
> >> +pmd_reset_rx_queue_cycles(struct dp_netdev_pmd_thread *pmd)
> >> +{
> >> +    int i, j;
> >> +    struct dp_netdev_rxq *rxq;
> >> +    struct rxq_poll *poll_list;
> >> +    size_t n;
> >> +    uint64_t q_cycles[PMD_N_CYCLES];
> >> +
> >> +    sorted_poll_list(pmd, &poll_list, &n);
> >> +
> >> +    for (i = 0; i < n; i++) {
> >> +        rxq = poll_list[i].rxq;
> >> +        for (j = 0; j < ARRAY_SIZE(q_cycles); j++) {
> >> +            /* Read the Rx queue CPU cycles */
> >> +            atomic_read_relaxed(&rxq->cycles.n[j], &q_cycles[j]);
> >> +            rxq->cycles_zero[j] = q_cycles[j];
> >> +        }
> >> +    }
> >> +
> >> +    free(poll_list);
> >> +}
> >> +
> >> +static void
> >> +pmd_reset_cycles(struct dp_netdev_pmd_thread *pmd,
> >> +                 uint64_t cycles[PMD_N_CYCLES])
> >> +{
> >> +    int i;
> >> +
> >> +    for (i = 0; i < PMD_N_CYCLES; i++) {
> >> +        pmd->cycles_zero[i] = cycles[i];
> >> +    }
> >> +    /* Clear the Rx queue stats */
> >> +    pmd_reset_rx_queue_cycles(pmd);
> >> +}
> >> +
> >> +static void
> >> +pmd_reset_cycles_stat(struct dp_netdev_pmd_thread *pmd)
> >> +{
> >> +    uint64_t cycles[PMD_N_CYCLES];
> >> +    int i;
> >> +
> >> +    for (i = 0; i < ARRAY_SIZE(cycles); i++) {
> >> +        atomic_read_relaxed(&pmd->cycles.n[i], &cycles[i]);
> >> +    }
> >> +    pmd_reset_cycles(pmd, cycles);
> >> +}
> >> +
> >>  static void *
> >>  pmd_thread_main(void *f_)
> >>  {
> >> @@ -3723,8 +3828,8 @@ reload:
> >>      /* List port/core affinity */
> >>      for (i = 0; i < poll_cnt; i++) {
> >>          VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
> >> -                 pmd->core_id, netdev_rxq_get_name(poll_list[i].rx),
> >> -                 netdev_rxq_get_queue_id(poll_list[i].rx));
> >> +                 pmd->core_id, netdev_rxq_get_name(poll_list[i].rxq->rx),
> >> +                 netdev_rxq_get_queue_id(poll_list[i].rxq->rx));
> >>      }
> >>
> >>      if (!poll_cnt) {
> >> @@ -3737,7 +3842,7 @@ reload:
> >>
> >>      for (;;) {
> >>          for (i = 0; i < poll_cnt; i++) {
> >> -            dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
> >> +            dp_netdev_process_rxq_port(pmd, poll_list[i].rxq,
> >>                                         poll_list[i].port_no);
> >>          }
> >>
> >> @@ -4326,6 +4431,7 @@ dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd,
> >>      poll->rxq = rxq;
> >>      hmap_insert(&pmd->poll_list, &poll->node, hash);
> >>
> >> +    pmd_reset_cycles_stat(pmd);
> >>      pmd->need_reload = true;
> >>  }
> >>
> >> @@ -4339,6 +4445,7 @@ dp_netdev_del_rxq_from_pmd(struct dp_netdev_pmd_thread *pmd,
> >>      free(poll);
> >>
> >>      pmd->need_reload = true;
> >> +    pmd_reset_cycles_stat(pmd);
> >>  }
> >>
> >>  /* Add 'port' to the tx port cache of 'pmd', which must be reloaded for the
> >> --
> >> 2.7.4
> >>
> >> _______________________________________________
> >> dev mailing list
> >> [email protected]
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
