> -----Original Message-----
> From: Kevin Traynor [mailto:[email protected]]
> Sent: Friday, January 04, 2019 1:48 AM
> To: Nitin Katiyar <[email protected]>; [email protected]
> Cc: Rohith Basavaraja <[email protected]>; Venkatesan Pradeep <[email protected]>
> Subject: Re: [PATCH v2] Adding support for PMD auto load balancing
>
> On 01/03/2019 12:36 PM, Nitin Katiyar wrote:
> > Port rx queues that have not been statically assigned to PMDs are
> > currently assigned based on periodically sampled load measurements.
> > The assignment is performed at specific instances - port addition,
> > port deletion, upon reassignment request via CLI etc.
> >
> > Due to changes in traffic pattern over time, the load can become
> > uneven among the PMDs, resulting in lower overall throughput.
> >
> > This patch adds support for automatic load balancing of PMDs based
> > on the measured load of their RX queues. Each PMD measures the
> > processing load for each of its associated queues every 10 seconds.
> > If the aggregated PMD load reaches 95% for 6 consecutive intervals
> > then the PMD considers itself to be overloaded.
> >
> > If any PMD is overloaded, a dry run of the PMD assignment algorithm
> > is performed by the OVS main thread. The dry run does NOT change the
> > existing queue-to-PMD assignments.
> >
> > If the resulting mapping of the dry run indicates an improved
> > distribution of the load then the actual reassignment will be
> > performed.
> >
> > Automatic rebalancing is disabled by default and has to be enabled
> > via a configuration option. The interval (in minutes) between two
> > consecutive rebalancing operations can also be configured via CLI;
> > the default is 1 min.
> >
> > The following example commands can be used to set the auto-lb
> > parameters:
> > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
> >
>
> Hi Nitin, thanks for v2. Not full review yet but sending some comments
> below.
>
> Maybe you can put some of the above into a new section below this
> http://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#port-rx-queue-assigment-to-pmd-threads

Sure, I will update that too.

> I also think this feature should be experimental until it has been
> road tested a bit more.
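> Also, to check my understanding of the scheme, each PMD is doing
> roughly the following once per interval (my own sketch in pseudo-C,
> not the patch code):
>
>     /* Sketch only - names are illustrative. */
>     pmd_load = busy_cycles * 100 / (busy_cycles + idle_cycles);
>     if (pmd_load >= 95) {
>         /* Count consecutive overloaded intervals
>          * (pmd->pmd_overloaded in the patch). */
>         overloaded_intervals++;
>     } else {
>         overloaded_intervals = 0;
>     }
>
>     /* Main thread, at most once per rebalance interval: if any PMD
>      * has been overloaded for 6 consecutive intervals, do a dry run
>      * and only reassign if the predicted variance of the PMD loads
>      * improves by at least 25%. */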
> > Co-authored-by: Rohith Basavaraja <[email protected]>
> > Co-authored-by: Venkatesan Pradeep <[email protected]>
> > Signed-off-by: Rohith Basavaraja <[email protected]>
> > Signed-off-by: Venkatesan Pradeep <[email protected]>
> > Signed-off-by: Nitin Katiyar <[email protected]>
> > ---
> >  lib/dpif-netdev.c    | 403 +++++++++++++++++++++++++++++++++++++++++++++++++--
> >  vswitchd/vswitch.xml |  30 ++++
> >  2 files changed, 424 insertions(+), 9 deletions(-)
> >
>
> There seems to be windows style line endings in the patch? or
> something else has gone wrong for me.
>
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 1564db9..8db106f 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -80,6 +80,12 @@
> >
> >  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
> >
> > +/* Auto Load Balancing Defaults */
> > +#define ACCEPT_IMPROVE_DEFAULT   (25)
> > +#define PMD_LOAD_THRE_DEFAULT    (95)
>
> Probably you should remove the brackets above to be consistent with
> the others below and in the rest of the file.
>
> > +#define PMD_REBALANCE_POLL_INTERVAL 1 /* 1 Min */
> > +#define MIN_TO_MSEC                 60000
> > +
> >  #define FLOW_DUMP_MAX_BATCH 50
> >  /* Use per thread recirc_depth to prevent recirculation loop. */
> >  #define MAX_RECIRC_DEPTH 6
> > @@ -288,6 +294,13 @@ struct dp_meter {
> >      struct dp_meter_band bands[];
> >  };
> >
> > +struct pmd_auto_lb {
> > +    bool auto_lb_conf;        /* enable-disable auto load balancing */
>
> I'm not sure what '_conf' is short for? maybe it should be something
> like 'auto_lb_requested'

Sure

> > +    bool is_enabled;          /* auto_lb current status */
>
> Comments should be of style /* Sentence case. */
> http://docs.openvswitch.org/en/latest/internals/contributing/coding-style/#comments

Thanks for providing the link. I will update in next version

> > +    uint64_t rebalance_intvl;
> > +    uint64_t rebalance_poll_timer;
> > +};
> > +
> >  /* Datapath based on the network device interface from netdev.h.
> >   *
> >   *
> > @@ -368,6 +381,7 @@ struct dp_netdev {
> >      uint64_t last_tnl_conf_seq;
> >
> >      struct conntrack conntrack;
> > +    struct pmd_auto_lb pmd_alb;
> >  };
> >
> >  static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
> > @@ -682,6 +696,7 @@ struct dp_netdev_pmd_thread {
> >      struct ovs_mutex port_mutex;    /* Mutex for 'poll_list' and 'tx_ports'. */
> >      /* List of rx queues to poll. */
> >      struct hmap poll_list OVS_GUARDED;
> > +
>
> Unrelated newline should be removed
>
> >      /* Map of 'tx_port's used for transmission.  Written by the main thread,
> >       * read by the pmd thread. */
> >      struct hmap tx_ports OVS_GUARDED;
> > @@ -702,6 +717,11 @@ struct dp_netdev_pmd_thread {
> >      /* Keep track of detailed PMD performance statistics. */
> >      struct pmd_perf_stats perf_stats;
> >
> > +    /* Some stats from previous iteration used by automatic pmd
> > +       load balance logic. */
>
> Nit, but see coding stds. and other multi-line comments wrt style
>
> > +    uint64_t prev_stats[PMD_N_STATS];
> > +    atomic_count pmd_overloaded;
> > +
> >      /* Set to true if the pmd thread needs to be reloaded. */
> >      bool need_reload;
> >  };
> > @@ -792,9 +812,11 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
> >                           enum rxq_cycles_counter_type type);
> >  static void
> >  dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx,
> > -                                unsigned long long cycles);
> > +                                unsigned long long cycles,
> > +                                unsigned idx);
> >  static uint64_t
> > -dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, unsigned idx);
> > +dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx,
> > +                                unsigned idx);
>
> no need to change dp_netdev_rxq_get_intrvl_cycles()
>
> >  static void
> >  dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
> >                                 bool purge);
> > @@ -3734,6 +3756,51 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t n_ops,
> >      }
> >  }
> >
> > +/* Enable/Disable PMD auto load balancing */
> > +static void
> > +set_pmd_auto_lb(struct dp_netdev *dp)
> > +{
> > +    unsigned int cnt = 0;
> > +    struct dp_netdev_pmd_thread *pmd;
> > +    struct pmd_auto_lb * pmd_alb = &dp->pmd_alb;
> > +
> > +    bool enable = false;
> > +    bool pmd_rxq_assign_cyc = dp->pmd_rxq_assign_cyc;
> > +
> > +    /* Ensure that there is at least 2 non-isolated PMDs and
> > +     * one of the PMD is polling more than one rxq
> > +     */
> > +    CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> > +        if (pmd->core_id == NON_PMD_CORE_ID || pmd->isolated) {
> > +            continue;
> > +        }
> > +
> > +        cnt++;
> > +        if (hmap_count(&pmd->poll_list) > 1) {
> > +            if (enable && (cnt > 1)) {
> > +                break;
> > +            } else {
> > +                enable = true;
> > +            }
> > +        }
> > +    }
> > +
>
> Won't this give the wrong result if there is one pmd with multiple
> rxq's? How about something in the loop like,
>
>     if (hmap_count(&pmd->poll_list) > 1) {
>         multirxq = true;
>     }
>     if (cnt && multirxq) {
>         enable = true;
>         break;
>     }
>     cnt++;

Yes, you are right. Thanks for catching this.
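I will rework the loop for v3 along these lines (untested sketch, using
your snippet):

    enable = false;
    multirxq = false;
    cnt = 0;
    CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
        if (pmd->core_id == NON_PMD_CORE_ID || pmd->isolated) {
            continue;
        }
        if (hmap_count(&pmd->poll_list) > 1) {
            multirxq = true;
        }
        if (cnt && multirxq) {
            enable = true;
            break;
        }
        cnt++;
    }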
> > +    /* Enable auto LB if it is configured and cycle based assignment is true */
> > +    enable = enable && pmd_rxq_assign_cyc && pmd_alb->auto_lb_conf;
> > +
> > +    if (pmd_alb->is_enabled != enable) {
> > +        pmd_alb->is_enabled = enable;
> > +        if (pmd_alb->is_enabled) {
> > +            VLOG_INFO("PMD auto lb is enabled, rebalance intvl:%lu(msec)\n",
> > +                       pmd_alb->rebalance_intvl);
> > +        } else {
> > +            pmd_alb->rebalance_poll_timer = 0;
> > +            VLOG_INFO("PMD auto lb is disabled\n");
> > +        }
> > +    }
> > +
> > +}
> > +
> >  /* Applies datapath configuration from the database. Some of the changes are
> >   * actually applied in dpif_netdev_run(). */
> >  static int
> > @@ -3748,6 +3815,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
> >                            DEFAULT_EM_FLOW_INSERT_INV_PROB);
> >      uint32_t insert_min, cur_min;
> >      uint32_t tx_flush_interval, cur_tx_flush_interval;
> > +    uint64_t rebalance_intvl;
> >
> >      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
> >                                       DEFAULT_TX_FLUSH_INTERVAL);
> > @@ -3819,6 +3887,23 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
> >                    pmd_rxq_assign);
> >          dp_netdev_request_reconfigure(dp);
> >      }
> > +
> > +    struct pmd_auto_lb * pmd_alb = &dp->pmd_alb;
> > +    pmd_alb->auto_lb_conf = smap_get_bool(other_config, "pmd-auto-lb",
> > +                                          false);
> > +
> > +    rebalance_intvl = smap_get_int(other_config, "pmd-auto-lb-rebalance-intvl",
> > +                                   PMD_REBALANCE_POLL_INTERVAL);
> > +
> > +    /* Input is in min, convert it to msec */
> > +    rebalance_intvl =
> > +        rebalance_intvl ? rebalance_intvl * MIN_TO_MSEC : MIN_TO_MSEC;
> > +
>
> This is creating a default when the user sets 0 - that needs to be
> documented.
>
> With current values, this could overflow rebalance_intvl. The user
> value is in minutes, so suggest to limit user input to some reasonable
> value like 1 week in minutes, and then the min to msec can be safe.
> See tx-flush-interval as an example where the range is limited.
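> i.e. something roughly like this (just a sketch, the exact limits are
> examples):
>
>     rebalance_intvl = smap_get_int(other_config,
>                                    "pmd-auto-lb-rebalance-intvl",
>                                    PMD_REBALANCE_POLL_INTERVAL);
>     /* Clamp the user value (minutes) to [1 min, 1 week] so that the
>      * min-to-msec conversion below cannot overflow. */
>     rebalance_intvl = MAX(rebalance_intvl, 1);
>     rebalance_intvl = MIN(rebalance_intvl, 7 * 24 * 60);
>     rebalance_intvl *= MIN_TO_MSEC;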
Thanks, I will update the documentation.

> > +    if (pmd_alb->rebalance_intvl != rebalance_intvl) {
> > +        pmd_alb->rebalance_intvl = rebalance_intvl;
> > +    }
> > +
> > +    set_pmd_auto_lb(dp);
> >      return 0;
> >  }
> >
> > @@ -3974,9 +4059,9 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
> >
> >  static void
> >  dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx,
> > -                                unsigned long long cycles)
> > +                                unsigned long long cycles,
> > +                                unsigned idx)
> >  {
> > -    unsigned int idx = rx->intrvl_idx++ % PMD_RXQ_INTERVAL_MAX;
> >      atomic_store_relaxed(&rx->cycles_intrvl[idx], cycles);
> >  }
> >
> > @@ -4762,6 +4847,9 @@ reconfigure_datapath(struct dp_netdev *dp)
> >
> >      /* Reload affected pmd threads. */
> >      reload_affected_pmds(dp);
> > +
> > +    /* Check if PMD Auto LB is to be enabled */
> > +    set_pmd_auto_lb(dp);
> >  }
> >
> >  /* Returns true if one of the netdevs in 'dp' requires a reconfiguration */
> > @@ -4780,6 +4868,228 @@ ports_require_restart(const struct dp_netdev *dp)
> >      return false;
> >  }
> >
> > +/* Function for calculating variance */
> > +static uint64_t
> > +variance(uint64_t a[], int n)
> > +{
> > +    /* Compute mean (average of elements) */
> > +    uint64_t sum = 0;
> > +    uint64_t mean = 0;
> > +    uint64_t sqDiff = 0;
> > +
> > +    if (!n) {
> > +        return 0;
> > +    }
> > +
> > +    for (int i = 0; i < n; i++) {
> > +        sum += a[i];
> > +    }
> > +
> > +    if (sum) {
> > +        mean = sum / n;
> > +
> > +        /* Compute sum squared differences with mean. */
> > +        for (int i = 0; i < n; i++) {
> > +            sqDiff += (a[i] - mean)*(a[i] - mean);
> > +        }
> > +    }
> > +    return (sqDiff ? (sqDiff / n) : 0);
> > +}
> > +
> > +static uint64_t
> > +get_dry_run_variance(struct dp_netdev *dp, uint32_t *core_list,
> > +                     uint32_t num)
>
> I think this function would require the port_mutex.

I will check and add in next version.

> > +{
> > +    struct dp_netdev_port *port;
> > +    struct dp_netdev_pmd_thread *pmd;
> > +    struct dp_netdev_rxq ** rxqs = NULL;
> > +    struct rr_numa *numa = NULL;
> > +    struct rr_numa_list rr;
> > +    int n_rxqs = 0;
> > +    uint64_t ret = 0;
> > +    uint64_t *pmd_usage;
> > +
> > +    pmd_usage = xcalloc(num, sizeof(uint64_t));
> > +
> > +    HMAP_FOR_EACH (port, node, &dp->ports) {
> > +        if (!netdev_is_pmd(port->netdev)) {
> > +            continue;
> > +        }
> > +
> > +        for (int qid = 0; qid < port->n_rxq; qid++) {
> > +            struct dp_netdev_rxq *q = &port->rxqs[qid];
> > +            uint64_t cycle_hist = 0;
> > +
> > +            if (q->pmd->isolated) {
> > +                continue;
> > +            }
> > +
> > +            if (n_rxqs == 0) {
> > +                rxqs = xmalloc(sizeof *rxqs);
> > +            } else {
> > +                rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
> > +            }
> > +
> > +            /* Sum the queue intervals and store the cycle history. */
> > +            for (unsigned i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) {
> > +                cycle_hist += dp_netdev_rxq_get_intrvl_cycles(q, i);
> > +            }
> > +            /* Do we need to add intrvl_cycles in history??
>
> If you want to use compare_rxq_cycles() then you have to put them in
> history, but it's only used for that and re-written every time, so I
> don't think it is harmful.

Yeah, that is the objective of adding it.

> > +             * but then we should clear interval cycles also */
>
> I don't think you should be clearing interval cycles in a dry run,
> otherwise they will be reset if the real rebalance occurs.

Thanks for clarifying it. I will remove the comment.

> > +            dp_netdev_rxq_set_cycles(q, RXQ_CYCLES_PROC_HIST,
> > +                                     cycle_hist);
> > +            /* Store the queue. */
> > +            rxqs[n_rxqs++] = q;
> > +        }
> > +    }
> > +    if (n_rxqs > 1) {
> > +        /* Sort the queues in order of the processing cycles
> > +         * they consumed during their last pmd interval. */
> > +        qsort(rxqs, n_rxqs, sizeof *rxqs, compare_rxq_cycles);
> > +    }
> > +    rr_numa_list_populate(dp, &rr);
> > +
> > +    for (int i = 0; i < n_rxqs; i++) {
> > +        int numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
> > +        numa = rr_numa_list_lookup(&rr, numa_id);
> > +        if (!numa) {
> > +            /* Don't consider queues across NUMA ???*/
>
> I think you should abort the whole dry run process if this is
> happening

Okay.
> > +            continue;
> > +        }
> > +
> > +        pmd = rr_numa_get_pmd(numa, true);
> > +        VLOG_DBG("PMD AUTO_LB:Core %d on numa node %d assigned port \'%s\' "
> > +                 "rx queue %d "
> > +                 "(measured processing cycles %"PRIu64").",
> > +                 pmd->core_id, numa_id,
> > +                 netdev_rxq_get_name(rxqs[i]->rx),
> > +                 netdev_rxq_get_queue_id(rxqs[i]->rx),
> > +                 dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_HIST));
> > +
> > +        for (int id = 0; id < num; id++) {
> > +            if (pmd->core_id == core_list[id]) {
> > +                /* Add the processing cycles of rxq to pmd polling it */
> > +                uint64_t proc_cycles = 0;
> > +                for (unsigned idx = 0; idx < PMD_RXQ_INTERVAL_MAX; idx++) {
> > +                    proc_cycles += dp_netdev_rxq_get_intrvl_cycles(rxqs[i],
> > +                                                                   idx);
> > +                }
> > +                pmd_usage[id] += proc_cycles;
> > +            }
> > +        }
> > +    }
> > +
> > +    CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> > +        uint64_t total_cycles = 0;
> > +
> > +        if ((pmd->core_id == NON_PMD_CORE_ID) || pmd->isolated) {
> > +            continue;
> > +        }
> > +
> > +        /* Get the total pmd cycles for an interval. */
> > +        atomic_read_relaxed(&pmd->intrvl_cycles, &total_cycles);
> > +        /* Estimate the cycles to cover all intervals. */
> > +        total_cycles *= PMD_RXQ_INTERVAL_MAX;
> > +        for (int id = 0; id < num; id++) {
> > +            if (pmd->core_id == core_list[id]) {
> > +                if (pmd_usage[id]) {
> > +                    pmd_usage[id] = (pmd_usage[id] * 100) / total_cycles;
> > +                }
> > +                VLOG_DBG("Core_id:%d, usage:%"PRIu64"\n",
> > +                         pmd->core_id, pmd_usage[id]);
> > +            }
> > +        }
> > +    }
> > +    ret = variance(pmd_usage, num);
> > +
> > +    rr_numa_list_destroy(&rr);
> > +    free(rxqs);
> > +    free(pmd_usage);
> > +    return ret;
> > +}
> > +
> > +static bool
> > +pmd_rebalance_dry_run(struct dp_netdev *dp)
> > +{
> > +    struct dp_netdev_pmd_thread *pmd;
> > +    uint64_t *curr_pmd_usage;
> > +
> > +    uint64_t curr_variance;
> > +    uint64_t new_variance;
> > +    uint64_t improvement = 0;
> > +    uint32_t num_pmds;
> > +    uint32_t *pmd_corelist;
> > +    struct rxq_poll *poll, *poll_next;
> > +
> > +    num_pmds = cmap_count(&dp->poll_threads);
> > +
> > +    if (num_pmds > 1) {
> > +        curr_pmd_usage = xcalloc(num_pmds, sizeof(uint64_t));
> > +        pmd_corelist = xcalloc(num_pmds, sizeof(uint32_t));
> > +    } else {
> > +        return false;
> > +    }
> > +
> > +    num_pmds = 0;
> > +    CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> > +        uint64_t total_cycles = 0;
> > +        uint64_t total_proc = 0;
> > +
> > +        if ((pmd->core_id == NON_PMD_CORE_ID) || pmd->isolated) {
> > +            continue;
> > +        }
> > +
> > +        /* Get the total pmd cycles for an interval. */
> > +        atomic_read_relaxed(&pmd->intrvl_cycles, &total_cycles);
> > +        /* Estimate the cycles to cover all intervals. */
> > +        total_cycles *= PMD_RXQ_INTERVAL_MAX;
> > +
> > +        HMAP_FOR_EACH_SAFE (poll, poll_next, node, &pmd->poll_list) {
> > +            uint64_t proc_cycles = 0;
> > +            for (unsigned i = 0; i < PMD_RXQ_INTERVAL_MAX; i++) {
> > +                proc_cycles += dp_netdev_rxq_get_intrvl_cycles(poll->rxq, i);
> > +            }
> > +            total_proc += proc_cycles;
> > +        }
> > +        if (total_proc) {
> > +            curr_pmd_usage[num_pmds] = (total_proc * 100) / total_cycles;
> > +        }
> > +
> > +        VLOG_DBG("PMD_AUTO_LB_MON curr_pmd_usage(%d):%"PRIu64"",
> > +                 pmd->core_id, curr_pmd_usage[num_pmds]);
> > +
> > +        if (atomic_count_get(&pmd->pmd_overloaded)) {
> > +            atomic_count_set(&pmd->pmd_overloaded, 0);
> > +        }
> > +
> > +        pmd_corelist[num_pmds] = pmd->core_id;
> > +        num_pmds++;
> > +    }
> > +
> > +    curr_variance = variance(curr_pmd_usage, num_pmds);
> > +
> > +    new_variance = get_dry_run_variance(dp, pmd_corelist, num_pmds);
> > +    VLOG_DBG("PMD_AUTO_LB_MON new variance: %"PRIu64","
> > +             " curr_variance: %"PRIu64"",
> > +             new_variance, curr_variance);
> > +
> > +    if (new_variance && (new_variance < curr_variance)) {
> > +        improvement =
> > +            ((curr_variance - new_variance) * 100) / curr_variance;
> > +
> > +        VLOG_DBG("PMD_AUTO_LB_MON improvement %"PRIu64"", improvement);
> > +    }
> > +
> > +    free(curr_pmd_usage);
> > +    free(pmd_corelist);
> > +
> > +    if (improvement >= ACCEPT_IMPROVE_DEFAULT) {
> > +        return true;
> > +    }
> > +
> > +    return false;
> > +}
> > +
> > +
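> Just to make the threshold concrete for anyone reading along: with two
> PMDs loaded {80%, 40%}, curr_variance is 400; if the dry run predicts
> {65%, 55%}, new_variance is 25, so improvement is
> ((400 - 25) * 100) / 400 = 93, well above the default of 25, and the
> rebalance goes ahead.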
> >  /* Return true if needs to revalidate datapath flows. */
> >  static bool
> >  dpif_netdev_run(struct dpif *dpif)
> > @@ -4789,6 +5099,9 @@ dpif_netdev_run(struct dpif *dpif)
> >      struct dp_netdev_pmd_thread *non_pmd;
> >      uint64_t new_tnl_seq;
> >      bool need_to_flush = true;
> > +    bool pmd_rebalance = false;
> > +    long long int now = time_msec();
> > +    struct dp_netdev_pmd_thread *pmd;
> >
> >      ovs_mutex_lock(&dp->port_mutex);
> >      non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
> > @@ -4821,6 +5134,38 @@ dpif_netdev_run(struct dpif *dpif)
> >          dp_netdev_pmd_unref(non_pmd);
> >      }
> >
> > +    struct pmd_auto_lb * pmd_alb = &dp->pmd_alb;
> > +    if (pmd_alb->is_enabled) {
> > +        if (!pmd_alb->rebalance_poll_timer) {
> > +            pmd_alb->rebalance_poll_timer = now;
> > +        } else if ((pmd_alb->rebalance_poll_timer +
> > +                    pmd_alb->rebalance_intvl) < now) {
> > +            pmd_alb->rebalance_poll_timer = now;
> > +            CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> > +                if (atomic_count_get(&pmd->pmd_overloaded) >=
> > +                    PMD_RXQ_INTERVAL_MAX) {
> > +                    pmd_rebalance = true;
> > +                    break;
> > +                }
> > +            }
> > +            VLOG_DBG("PMD_AUTO_LB_MON periodic check:pmd rebalance:%d",
> > +                     pmd_rebalance);
> > +
> > +            if (pmd_rebalance &&
> > +                !dp_netdev_is_reconf_required(dp) &&
> > +                !ports_require_restart(dp) &&
> > +                pmd_rebalance_dry_run(dp)) {
>
> Don't you need the dp_netdev_mutex for call to pmd_rebalance_dry_run,
> or at least for some parts of it?

I will check it.

> > +
> > +                ovs_mutex_unlock(&dp->port_mutex);
>
> It seems odd to be unlocking this and then taking it again, is there
> a reason?

Need to check it again if we can have both locks at the same time.
> > +                ovs_mutex_lock(&dp_netdev_mutex);
> > +                VLOG_DBG("PMD_AUTO_LB_MON Invoking PMD RECONFIGURE");
> > +                dp_netdev_request_reconfigure(dp);
> > +                ovs_mutex_unlock(&dp_netdev_mutex);
> > +                ovs_mutex_lock(&dp->port_mutex);
> > +            }
> > +        }
> > +    }
> > +
> >      if (dp_netdev_is_reconf_required(dp) || ports_require_restart(dp)) {
> >          reconfigure_datapath(dp);
> >      }
> > @@ -4979,6 +5324,8 @@ pmd_thread_main(void *f_)
> >  reload:
> >      pmd_alloc_static_tx_qid(pmd);
> >
> > +    atomic_count_init(&pmd->pmd_overloaded, 0);
> > +
> >      /* List port/core affinity */
> >      for (i = 0; i < poll_cnt; i++) {
> >          VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
> > @@ -4986,6 +5333,10 @@ reload:
> >                   netdev_rxq_get_queue_id(poll_list[i].rxq->rx));
> >          /* Reset the rxq current cycles counter. */
> >          dp_netdev_rxq_set_cycles(poll_list[i].rxq, RXQ_CYCLES_PROC_CURR, 0);
> > +
> > +        for (unsigned j = 0; j < PMD_RXQ_INTERVAL_MAX; j++) {
> > +            dp_netdev_rxq_set_intrvl_cycles(poll_list[i].rxq, 0, j);
> > +        }
>
> is it needed for this patch? won't all the values have been refreshed
> by the time the next check is performed anyway

I thought it is safe to reset it. If the PMD is reset in the middle of a
cycle then it may have stale information when the dry run is executed.

> >      }
> >
> >      if (!poll_cnt) {
> > @@ -7188,17 +7539,51 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
> >                              struct polled_queue *poll_list, int poll_cnt)
> >  {
> >      struct dpcls *cls;
> > +    uint64_t tot_idle = 0, tot_proc = 0;
> > +    unsigned int idx;
> > +    unsigned int pmd_load = 0;
> >
> >      if (pmd->ctx.now > pmd->rxq_next_cycle_store) {
> >          uint64_t curr_tsc;
> > +        struct pmd_auto_lb * pmd_alb = &pmd->dp->pmd_alb;
> > +        if (pmd_alb->is_enabled && !pmd->isolated) {
> > +            tot_idle = pmd->perf_stats.counters.n[PMD_CYCLES_ITER_IDLE] -
> > +                       pmd->prev_stats[PMD_CYCLES_ITER_IDLE];
> > +            tot_proc = pmd->perf_stats.counters.n[PMD_CYCLES_ITER_BUSY] -
> > +                       pmd->prev_stats[PMD_CYCLES_ITER_BUSY];
> > +
> > +            if (tot_proc) {
> > +                pmd_load = ((tot_proc * 100) / (tot_idle + tot_proc));
> > +            }
> > +
> > +            if (pmd_load >= PMD_LOAD_THRE_DEFAULT) {
> > +                atomic_count_inc(&pmd->pmd_overloaded);
> > +
> > +                VLOG_DBG("PMD_AUTO_LB_MON PMD OVERLOAD DETECT iter %d",
> > +                         atomic_count_get(&pmd->pmd_overloaded));
>
> Better to remove this log

Sure

> > +            } else {
> > +                atomic_count_set(&pmd->pmd_overloaded, 0);
> > +            }
> > +        }
> > +
> > +        pmd->prev_stats[PMD_CYCLES_ITER_IDLE] =
> > +                        pmd->perf_stats.counters.n[PMD_CYCLES_ITER_IDLE];
> > +        pmd->prev_stats[PMD_CYCLES_ITER_BUSY] =
> > +                        pmd->perf_stats.counters.n[PMD_CYCLES_ITER_BUSY];
> > +
>
> These have been used earlier - are they initialized somewhere for the
> first use? you could just skip the above if() on the first call, so
> they get initialized, or else init them at the start of
> pmd_thread_main()

It would have been initialized to 0 when the pmd structure is created.
We are not resetting it to 0 whenever the PMD is reloaded.
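To illustrate how these deltas are used with numbers: if, since the
previous check, a PMD spent 9.5e9 cycles busy and 0.5e9 cycles idle,
then pmd_load = (9.5e9 * 100) / 1.0e10 = 95 and pmd_overloaded is
incremented. A single interval below the threshold (e.g. 94%) resets
the counter to 0, which is what makes the "6 consecutive intervals"
condition strict.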
> > */ > > for (unsigned i = 0; i < poll_cnt; i++) { > > - uint64_t rxq_cyc_curr = > > dp_netdev_rxq_get_cycles(poll_list[i].rxq, > > - > > RXQ_CYCLES_PROC_CURR); > > - dp_netdev_rxq_set_intrvl_cycles(poll_list[i].rxq, > > rxq_cyc_curr); > > - dp_netdev_rxq_set_cycles(poll_list[i].rxq, > RXQ_CYCLES_PROC_CURR, > > - 0); > > + uint64_t rxq_cyc_curr; > > + struct dp_netdev_rxq *rxq; > > + > > + rxq = poll_list[i].rxq; > > + idx = rxq->intrvl_idx++ % PMD_RXQ_INTERVAL_MAX; > > + > > + rxq_cyc_curr = dp_netdev_rxq_get_cycles(rxq, > RXQ_CYCLES_PROC_CURR); > > + dp_netdev_rxq_set_intrvl_cycles(rxq, rxq_cyc_curr, idx); > > + dp_netdev_rxq_set_cycles(rxq, RXQ_CYCLES_PROC_CURR, 0); > > } > > + > > curr_tsc = cycles_counter_update(&pmd->perf_stats); > > if (pmd->intrvl_tsc_prev) { > > /* There is a prev timestamp, store a new intrvl cycle > > count. */ diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > > index 2160910..ff3589c 100644 > > --- a/vswitchd/vswitch.xml > > +++ b/vswitchd/vswitch.xml > > @@ -574,6 +574,36 @@ > > be set to 'skip_sw'. > > </p> > > </column> > > + <column name="other_config" key="pmd-auto-lb" > > + type='{"type": "boolean"}'> > > + <p> > > + Configures PMD Auto Load Balancing that allows automatic > assignment of > > + RX queues to PMDs if any of PMDs is overloaded (i.e. processing > cycles > > + > 95%). > > + </p> > > + <p> > > + It uses current scheme of cycle based assignment of RX queues that > > + are not statically pinned to PMDs. > > + </p> > > + <p> > > + Set this value to <code>true</code> to enable this option. > > + </p> > > + <p> > > + The default value is <code>false</code>. > > + </p> > > + <p> > > + This only comes in effect if cycle based assignment is enabled and > > + there are more than one non-isolated PMDs present and atleast one > of > > + it polls more than one queue. > > + </p> > > + </column> > > + <column name="other_config" key="pmd-auto-lb-rebalance-intvl" > > + type='{"type": "integer", "minInteger": 1}'> > > + <p> > > + The minimum time (in minutes) 2 consecutive PMD Auto Load > Balancing > > + iterations. > > + </p> > > + </column> > > </group> > > <group title="Status"> > > <column name="next_cfg"> > >
