On 01/16/2019 01:13 PM, Stokes, Ian wrote:
>> Conditional EMC insertion helps a lot in scenarios with high numbers of
>> parallel flows, but in the current implementation this option affects all
>> the threads and ports at once. There are scenarios where we have a
>> different number of flows on different ports. For example, if one of the
>> VMs encapsulates traffic using additional headers, it will receive a
>> large number of flows but only a few flows will come out of this VM. In
>> this scenario it's much faster to use the EMC instead of the classifier
>> for traffic from the VM, but it's better to disable the EMC for the
>> traffic which flows to the VM.
>>
>> To handle the above issue, this introduces an 'emc-enable' configurable
>> to enable/disable the EMC on a per-port basis. Ex.:
>>
>>     ovs-vsctl set interface dpdk0 other_config:emc-enable=false
>>
>> EMC probability is kept as is and it works for all the ports with
>> 'emc-enable=true'.
>>
>
> Hi Ilya,
>
> Thanks for this. It sounds like a useful use case.
>
> I'm concentrating on testing/debugging the port representor patch at the
> moment. If I get some spare cycles later today I'll take a look at this.
>
I'll have a look also and send comments tomorrow

> (I spotted the changes to the NEWS doc below, good catch, regardless of this
> patch, those changes can be added to master).
>
> Ian
>
>> Signed-off-by: Ilya Maximets <[email protected]>
>> ---
>>
>> Version 3:
>>   * Minor rebase on current master.
>>
>> Version 2:
>>   * The main concern was about backward compatibility. Also, there
>>     is no real profit having the per-port probability value.
>>     So, per-port probability switched to per-port 'emc-enable'
>>     configurable.
>>     Global probability kept back and can be used without any changes.
>>
>> It's been a while since the first version. It's available here:
>> https://patchwork.ozlabs.org/patch/800277/
>>
>>  Documentation/topics/dpdk/bridge.rst |  4 ++
>>  NEWS                                 |  5 +-
>>  lib/dpif-netdev.c                    | 79 +++++++++++++++++++++++++---
>>  vswitchd/vswitch.xml                 | 19 +++++++
>>  4 files changed, 98 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
>> index 82bdad840..2fcd86607 100644
>> --- a/Documentation/topics/dpdk/bridge.rst
>> +++ b/Documentation/topics/dpdk/bridge.rst
>> @@ -101,6 +101,10 @@ observed with pmd stats::
>>  For certain traffic profiles with many parallel flows, it's recommended
>>  to set ``N`` to '0' to achieve higher forwarding performance.
>>  
>> +It is also possible to enable/disable EMC on per-port basis using::
>> +
>> +    $ ovs-vsctl set interface <iface> other_config:emc-enable={true,false}
>> +
>>  For more information on the EMC refer to :doc:`/intro/install/dpdk` .
>>  
>>  
>> diff --git a/NEWS b/NEWS
>> index 4618cc0a0..7826578d3 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -24,10 +24,13 @@ Post-v2.10.0
>>          allocated dynamically using the following syntax:
>>          ovn-nbctl lsp-set-addresses <port> "dynamic <IP>"
>>     - DPDK:
>> +     * Add support for DPDK 18.11
>> +   - Userspace datapath:
>>       * Add option for simple round-robin based Rxq to PMD assignment.
>>         It can be set with pmd-rxq-assign.
>> -     * Add support for DPDK 18.11
>>       * Add support for Auto load balancing of PMDs (experimental)
>> +     * Added new per-port configurable option to manage EMC:
>> +       'other_config:emc-enable'.
>>     - Add 'symmetric_l3' hash function.
>>     - OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
>>   - ovs-vswitchd:
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index be529b6b0..6704be400 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -474,6 +474,7 @@ struct dp_netdev_port {
>>      unsigned n_rxq;             /* Number of elements in 'rxqs' */
>>      unsigned *txq_used;         /* Number of threads that use each tx queue. */
>>      struct ovs_mutex txq_used_mutex;
>> +    bool emc_enabled;           /* If true EMC will be used. */
>>      char *type;                 /* Port type as requested by user. */
>>      char *rxq_affinity_list;    /* Requested affinity of rx queues. */
>>  };
>> @@ -588,6 +589,7 @@ static void dp_netdev_actions_free(struct dp_netdev_actions *);
>>  struct polled_queue {
>>      struct dp_netdev_rxq *rxq;
>>      odp_port_t port_no;
>> +    bool emc_enabled;
>>  };
>>  
>>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
>> @@ -617,6 +619,8 @@ struct dp_netdev_pmd_thread_ctx {
>>      long long now;
>>      /* RX queue from which last packet was received. */
>>      struct dp_netdev_rxq *last_rxq;
>> +    /* EMC insertion probability context for the current processing cycle. */
>> +    uint32_t emc_insert_min;
>>  };
>>  
>>  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
>> @@ -1798,6 +1802,7 @@ port_create(const char *devname, const char *type,
>>      port->netdev = netdev;
>>      port->type = xstrdup(type);
>>      port->sf = sf;
>> +    port->emc_enabled = true;
>>      port->need_reconfigure = true;
>>      ovs_mutex_init(&port->txq_used_mutex);
>>  
>> @@ -2830,9 +2835,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
>>       * default the value is UINT32_MAX / 100 which yields an insertion
>>       * probability of 1/100 ie. 1% */
>>  
>> -    uint32_t min;
>> -
>> -    atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
>> +    uint32_t min = pmd->ctx.emc_insert_min;
>>  
>>      if (min && random_uint32() <= min) {
>>          emc_insert(&(pmd->flow_cache).emc_cache, key, flow);
>> @@ -3698,7 +3701,8 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
>>          ovs_mutex_lock(&dp->non_pmd_mutex);
>>      }
>>  
>> -    /* Update current time in PMD context. */
>> +    /* Update current time in PMD context. We don't care about EMC insertion
>> +     * probability, because we are on a slow path. */
>>      pmd_thread_ctx_time_update(pmd);
>>  
>>      /* The action processing expects the RSS hash to be valid, because
>> @@ -3842,7 +3846,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>>      if (insert_min != cur_min) {
>>          atomic_store_relaxed(&dp->emc_insert_min, insert_min);
>>          if (insert_min == 0) {
>> -            VLOG_INFO("EMC has been disabled");
>> +            VLOG_INFO("EMC insertion probability changed to zero");
>>          } else {
>>              VLOG_INFO("EMC insertion probability changed to 1/%llu (~%.2f%%)",
>>                        insert_prob, (100 / (float)insert_prob));
>> @@ -3965,6 +3969,27 @@ exit:
>>      return error;
>>  }
>>  
>> +/* Returns 'true' if one of the 'port's RX queues exists in 'poll_list'
>> + * of given PMD thread. */
>> +static bool
>> +dpif_netdev_pmd_polls_port(struct dp_netdev_pmd_thread *pmd,
>> +                           struct dp_netdev_port *port)
>> +    OVS_EXCLUDED(pmd->port_mutex)
>> +{
>> +    struct rxq_poll *poll;
>> +    bool found = false;
>> +
>> +    ovs_mutex_lock(&pmd->port_mutex);
>> +    HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
>> +        if (port == poll->rxq->port) {
>> +            found = true;
>> +            break;
>> +        }
>> +    }
>> +    ovs_mutex_unlock(&pmd->port_mutex);
>> +    return found;
>> +}
>> +
>>  /* Changes the affinity of port's rx queues.  The changes are actually applied
>>   * in dpif_netdev_run(). */
>>  static int
>> @@ -3975,10 +4000,33 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
>>      struct dp_netdev_port *port;
>>      int error = 0;
>>      const char *affinity_list = smap_get(cfg, "pmd-rxq-affinity");
>> +    bool emc_enabled = smap_get_bool(cfg, "emc-enable", true);
>>  
>>      ovs_mutex_lock(&dp->port_mutex);
>>      error = get_port_by_number(dp, port_no, &port);
>> -    if (error || !netdev_is_pmd(port->netdev)
>> +    if (error) {
>> +        goto unlock;
>> +    }
>> +
>> +    if (emc_enabled != port->emc_enabled) {
>> +        struct dp_netdev_pmd_thread *pmd;
>> +
>> +        port->emc_enabled = emc_enabled;
>> +        /* Mark for reload all the threads that polls this port and request
>> +         * for reconfiguration for the actual reloading of threads. */
>> +        CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
>> +            if (dpif_netdev_pmd_polls_port(pmd, port)) {
>> +                pmd->need_reload = true;
>> +            }
>> +        }
>> +        dp_netdev_request_reconfigure(dp);
>> +
>> +        VLOG_INFO("%s: EMC has been %s", netdev_get_name(port->netdev),
>> +                  (emc_enabled) ? "enabled" : "disabled");
>> +    }
>> +
>> +    /* Checking for RXq affinity changes. */
>> +    if (!netdev_is_pmd(port->netdev)
>>          || nullable_string_is_equal(affinity_list, port->rxq_affinity_list)) {
>>          goto unlock;
>>      }
>> @@ -5123,6 +5171,13 @@ dpif_netdev_run(struct dpif *dpif)
>>          if (!netdev_is_pmd(port->netdev)) {
>>              int i;
>>  
>> +            if (port->emc_enabled) {
>> +                atomic_read_relaxed(&dp->emc_insert_min,
>> +                                    &non_pmd->ctx.emc_insert_min);
>> +            } else {
>> +                non_pmd->ctx.emc_insert_min = 0;
>> +            }
>> +
>>              for (i = 0; i < port->n_rxq; i++) {
>>                  if (dp_netdev_process_rxq_port(non_pmd,
>>                                                 &port->rxqs[i],
>> @@ -5296,6 +5351,7 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
>>      HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
>>          poll_list[i].rxq = poll->rxq;
>>          poll_list[i].port_no = poll->rxq->port->port_no;
>> +        poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
>>          i++;
>>      }
>>  
>> @@ -5360,6 +5416,14 @@ reload:
>>          pmd_perf_start_iteration(s);
>>  
>>          for (i = 0; i < poll_cnt; i++) {
>> +
>> +            if (poll_list[i].emc_enabled) {
>> +                atomic_read_relaxed(&pmd->dp->emc_insert_min,
>> +                                    &pmd->ctx.emc_insert_min);
>> +            } else {
>> +                pmd->ctx.emc_insert_min = 0;
>> +            }
>> +
>>              process_packets =
>>                      dp_netdev_process_rxq_port(pmd, poll_list[i].rxq,
>>                                                 poll_list[i].port_no);
>> @@ -6301,7 +6365,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>      struct dfc_cache *cache = &pmd->flow_cache;
>>      struct dp_packet *packet;
>>      const size_t cnt = dp_packet_batch_size(packets_);
>> -    uint32_t cur_min;
>> +    uint32_t cur_min = pmd->ctx.emc_insert_min;
>>      int i;
>>      uint16_t tcp_flags;
>>      bool smc_enable_db;
>> @@ -6309,7 +6373,6 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>      bool batch_enable = true;
>>  
>>      atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db);
>> -    atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
>>      pmd_perf_update_counter(&pmd->perf_stats,
>>                              md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
>>                              cnt);
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index d58f63228..88edb5d35 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -3101,6 +3101,25 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>>          </column>
>>        </group>
>>  
>> +      <group title="EMC (Exact Match Cache) Configuration">
>> +        <p>
>> +          These settings controls behaviour of EMC lookups/insertions for packets
>> +          received from the interface.
>> +        </p>
>> +
>> +        <column name="other_config" key="emc-enable" type='{"type": "boolean"}'>
>> +          <p>
>> +            Specifies if Exact Match Cache (EMC) should be used while processing
>> +            packets received from this interface.
>> +            If true, <ref table="Open_vSwitch" column="other_config"
>> +            key="emc-insert-inv-prob"/> will have effect on this interface.
>> +          </p>
>> +          <p>
>> +            Defaults to true.
>> +          </p>
>> +        </column>
>> +      </group>
>> +
>>        <group title="MTU">
>>          <p>
>>            The MTU (maximum transmission unit) is the largest amount of data
>> --
>> 2.17.1
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
