Hi Kevin and Ian.  Comments inline.

Best regards, Ilya Maximets.
On 17.01.2019 19:50, Stokes, Ian wrote:
>> On 01/16/2019 01:05 PM, Ilya Maximets wrote:
>>> Conditional EMC insert helps a lot in scenarios with high numbers of
>>> parallel flows, but in the current implementation this option affects all
>>> the threads and ports at once.  There are scenarios where we have
>>> different numbers of flows on different ports.  For example, if one of
>>> the VMs encapsulates traffic using additional headers, it will receive
>>> a large number of flows but only a few flows will come out of this VM.
>>> In this scenario it's much faster to use the EMC instead of the
>>> classifier for traffic from the VM, but it's better to disable the EMC
>>> for the traffic which flows to the VM.
>>>
>>> To handle the above issue, this patch introduces an 'emc-enable'
>>> configurable to enable/disable EMC on a per-port basis.  Ex.:
>>>
>>>   ovs-vsctl set interface dpdk0 other_config:emc-enable=false
>>>
>>> The EMC probability is kept as is and works for all the ports with
>>> 'emc-enable=true'.
>>>
>>
>> Hi Ilya, just a couple of minor comments below,
>
> Hi Ilya, I agree with what Kevin flagged below; one other comment from
> myself.
>
> (BTW I still have to do some testing on this but the code looks good so far.)
>
>>
>>> Signed-off-by: Ilya Maximets <[email protected]>
>>> ---
>>>
>>> Version 3:
>>>   * Minor rebase on current master.
>>>
>>> Version 2:
>>>   * The main concern was about backward compatibility.  Also, there
>>>     is no real profit in having a per-port probability value, so the
>>>     per-port probability was switched to a per-port 'emc-enable'
>>>     configurable.  The global probability is kept and can be used
>>>     without any changes.
>>>
>>> It's been a while since the first version.
>>> It's available here:
>>>   https://patchwork.ozlabs.org/patch/800277/
>>>
>>>  Documentation/topics/dpdk/bridge.rst |  4 ++
>>>  NEWS                                 |  5 +-
>>>  lib/dpif-netdev.c                    | 79 +++++++++++++++++++++++++---
>>>  vswitchd/vswitch.xml                 | 19 +++++++
>>>  4 files changed, 98 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
>>> index 82bdad840..2fcd86607 100644
>>> --- a/Documentation/topics/dpdk/bridge.rst
>>> +++ b/Documentation/topics/dpdk/bridge.rst
>>> @@ -101,6 +101,10 @@ observed with pmd stats::
>>>  For certain traffic profiles with many parallel flows, it's recommended
>>>  to set ``N`` to '0' to achieve higher forwarding performance.
>>>
>>> +It is also possible to enable/disable EMC on per-port basis using::
>>> +
>>> +    $ ovs-vsctl set interface <iface> other_config:emc-enable={true,false}
>>> +
>>>  For more information on the EMC refer to :doc:`/intro/install/dpdk` .
>
> It would be good to give a short description of why/when this feature
> should be used.  The description of the use case you give in the commit
> message would even do.
>
> As more 'tunable' features like this are added, I think it's good for
> reference, so users understand the motivation for the feature, as they
> may not check the commit message.

OK.

>
> Ian
>
>>>
>>>
>>> diff --git a/NEWS b/NEWS
>>> index 4618cc0a0..7826578d3 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -24,10 +24,13 @@ Post-v2.10.0
>>>         allocated dynamically using the following syntax:
>>>         ovn-nbctl lsp-set-addresses <port> "dynamic <IP>"
>>>     - DPDK:
>>> +     * Add support for DPDK 18.11
>>> +   - Userspace datapath:
>>>       * Add option for simple round-robin based Rxq to PMD assignment.
>>>         It can be set with pmd-rxq-assign.
>>> -     * Add support for DPDK 18.11
>>>       * Add support for Auto load balancing of PMDs (experimental)
>>> +     * Added new per-port configurable option to manage EMC:
>>> +       'other_config:emc-enable'.
>>>     - Add 'symmetric_l3' hash function.
>>>     - OVS now honors 'updelay' and 'downdelay' for bonds with LACP
>>>       configured.
>>>   - ovs-vswitchd:
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index be529b6b0..6704be400 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -474,6 +474,7 @@ struct dp_netdev_port {
>>>      unsigned n_rxq;             /* Number of elements in 'rxqs' */
>>>      unsigned *txq_used;         /* Number of threads that use each tx queue. */
>>>      struct ovs_mutex txq_used_mutex;
>>> +    bool emc_enabled;           /* If true EMC will be used. */
>>>      char *type;                 /* Port type as requested by user. */
>>>      char *rxq_affinity_list;    /* Requested affinity of rx queues. */
>>>  };
>>> @@ -588,6 +589,7 @@ static void dp_netdev_actions_free(struct dp_netdev_actions *);
>>>  struct polled_queue {
>>>      struct dp_netdev_rxq *rxq;
>>>      odp_port_t port_no;
>>> +    bool emc_enabled;
>>>  };
>>>
>>>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
>>> @@ -617,6 +619,8 @@ struct dp_netdev_pmd_thread_ctx {
>>>      long long now;
>>>      /* RX queue from which last packet was received. */
>>>      struct dp_netdev_rxq *last_rxq;
>>> +    /* EMC insertion probability context for the current processing cycle. */
>>> +    uint32_t emc_insert_min;
>>>  };
>>>
>>>  /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate
>>> @@ -1798,6 +1802,7 @@ port_create(const char *devname, const char *type,
>>>      port->netdev = netdev;
>>>      port->type = xstrdup(type);
>>>      port->sf = sf;
>>> +    port->emc_enabled = true;
>>>      port->need_reconfigure = true;
>>>      ovs_mutex_init(&port->txq_used_mutex);
>>>
>>> @@ -2830,9 +2835,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
>>>       * default the value is UINT32_MAX / 100 which yields an insertion
>>>       * probability of 1/100 ie.
>>>       * 1% */
>>>
>>> -    uint32_t min;
>>> -
>>> -    atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
>>> +    uint32_t min = pmd->ctx.emc_insert_min;
>>>
>>>      if (min && random_uint32() <= min) {
>>>          emc_insert(&(pmd->flow_cache).emc_cache, key, flow);
>>> @@ -3698,7 +3701,8 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
>>>          ovs_mutex_lock(&dp->non_pmd_mutex);
>>>      }
>>>
>>> -    /* Update current time in PMD context. */
>>> +    /* Update current time in PMD context. We don't care about EMC insertion
>>> +     * probability, because we are on a slow path. */
>>>      pmd_thread_ctx_time_update(pmd);
>>>
>>>      /* The action processing expects the RSS hash to be valid, because
>>> @@ -3842,7 +3846,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>>>          if (insert_min != cur_min) {
>>>              atomic_store_relaxed(&dp->emc_insert_min, insert_min);
>>>              if (insert_min == 0) {
>>> -                VLOG_INFO("EMC has been disabled");
>>> +                VLOG_INFO("EMC insertion probability changed to zero");
>>>              } else {
>>>                  VLOG_INFO("EMC insertion probability changed to 1/%llu (~%.2f%%)",
>>>                            insert_prob, (100 / (float)insert_prob));
>>> @@ -3965,6 +3969,27 @@ exit:
>>>      return error;
>>>  }
>>>
>>> +/* Returns 'true' if one of the 'port's RX queues exists in 'poll_list'
>>> + * of given PMD thread. */
>>> +static bool
>>> +dpif_netdev_pmd_polls_port(struct dp_netdev_pmd_thread *pmd,
>>> +                           struct dp_netdev_port *port)
>>> +    OVS_EXCLUDED(pmd->port_mutex)
>>> +{
>>> +    struct rxq_poll *poll;
>>> +    bool found = false;
>>> +
>>> +    ovs_mutex_lock(&pmd->port_mutex);
>>> +    HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
>>> +        if (port == poll->rxq->port) {
>>> +            found = true;
>>> +            break;
>>> +        }
>>> +    }
>>> +    ovs_mutex_unlock(&pmd->port_mutex);
>>> +    return found;
>>> +}
>>> +
>>>  /* Changes the affinity of port's rx queues.  The changes are actually applied
>>>   * in dpif_netdev_run(). */
>>
>> The comment is a bit stale now.

Oh.  Thanks.  I'll update it.
>>
>>>  static int
>>> @@ -3975,10 +4000,33 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
>>>      struct dp_netdev_port *port;
>>>      int error = 0;
>>>      const char *affinity_list = smap_get(cfg, "pmd-rxq-affinity");
>>> +    bool emc_enabled = smap_get_bool(cfg, "emc-enable", true);
>>>
>>>      ovs_mutex_lock(&dp->port_mutex);
>>>      error = get_port_by_number(dp, port_no, &port);
>>> -    if (error || !netdev_is_pmd(port->netdev)
>>> +    if (error) {
>>> +        goto unlock;
>>> +    }
>>> +
>>> +    if (emc_enabled != port->emc_enabled) {
>>> +        struct dp_netdev_pmd_thread *pmd;
>>> +
>>> +        port->emc_enabled = emc_enabled;
>>> +        /* Mark for reload all the threads that polls this port and request
>>> +         * for reconfiguration for the actual reloading of threads. */
>>> +        CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
>>> +            if (dpif_netdev_pmd_polls_port(pmd, port)) {
>>> +                pmd->need_reload = true;
>>> +            }
>>> +        }
>>> +        dp_netdev_request_reconfigure(dp);
>>> +
>>> +        VLOG_INFO("%s: EMC has been %s", netdev_get_name(port->netdev),
>>> +                  (emc_enabled) ? "enabled" : "disabled");
>>
>> I think it would be a good idea to add a similar log as
>> dpif_netdev_set_config() does for 'emc-insert-inv-prob' when emc_enabled
>> gets enabled here, as a reminder for the user.  Otherwise, a user will
>> see 'EMC has been enabled', which is not incorrect, but perhaps a long
>> time ago 'emc-insert-inv-prob' was set to 0, and there is confusion
>> about whether the EMC will actually be used now or not.

OK.  I'm thinking about adding something like:

    port: EMC has been enabled. Current insertion probability is %{value}.

The disabled case will remain the same:

    port: EMC has been disabled.

I'll update the patch tomorrow morning.  It's already too late today.

>>
>>> +    }
>>> +
>>> +    /* Checking for RXq affinity changes.
>>> +     */
>>> +    if (!netdev_is_pmd(port->netdev)
>>>          || nullable_string_is_equal(affinity_list, port->rxq_affinity_list)) {
>>>          goto unlock;
>>>      }
>>> @@ -5123,6 +5171,13 @@ dpif_netdev_run(struct dpif *dpif)
>>>          if (!netdev_is_pmd(port->netdev)) {
>>>              int i;
>>>
>>> +            if (port->emc_enabled) {
>>> +                atomic_read_relaxed(&dp->emc_insert_min,
>>> +                                    &non_pmd->ctx.emc_insert_min);
>>> +            } else {
>>> +                non_pmd->ctx.emc_insert_min = 0;
>>> +            }
>>> +
>>>              for (i = 0; i < port->n_rxq; i++) {
>>>                  if (dp_netdev_process_rxq_port(non_pmd,
>>>                                                 &port->rxqs[i],
>>> @@ -5296,6 +5351,7 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
>>>      HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
>>>          poll_list[i].rxq = poll->rxq;
>>>          poll_list[i].port_no = poll->rxq->port->port_no;
>>> +        poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
>>>          i++;
>>>      }
>>>
>>> @@ -5360,6 +5416,14 @@ reload:
>>>          pmd_perf_start_iteration(s);
>>>
>>>          for (i = 0; i < poll_cnt; i++) {
>>> +
>>> +            if (poll_list[i].emc_enabled) {
>>> +                atomic_read_relaxed(&pmd->dp->emc_insert_min,
>>> +                                    &pmd->ctx.emc_insert_min);
>>> +            } else {
>>> +                pmd->ctx.emc_insert_min = 0;
>>> +            }
>>> +
>>>              process_packets =
>>>                      dp_netdev_process_rxq_port(pmd, poll_list[i].rxq,
>>>                                                 poll_list[i].port_no);
>>> @@ -6301,7 +6365,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>>      struct dfc_cache *cache = &pmd->flow_cache;
>>>      struct dp_packet *packet;
>>>      const size_t cnt = dp_packet_batch_size(packets_);
>>> -    uint32_t cur_min;
>>> +    uint32_t cur_min = pmd->ctx.emc_insert_min;
>>>      int i;
>>>      uint16_t tcp_flags;
>>>      bool smc_enable_db;
>>> @@ -6309,7 +6373,6 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>>      bool batch_enable = true;
>>>
>>>      atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db);
>>> -    atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
>>>      pmd_perf_update_counter(&pmd->perf_stats,
>>>                              md_is_valid ?
>>>                              PMD_STAT_RECIRC : PMD_STAT_RECV,
>>>                              cnt);
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index d58f63228..88edb5d35 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -3101,6 +3101,25 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>>>        </column>
>>>      </group>
>>>
>>> +    <group title="EMC (Exact Match Cache) Configuration">
>>> +      <p>
>>> +        These settings controls behaviour of EMC lookups/insertions for
>>> +        packets received from the interface.
>>> +      </p>
>>> +
>>> +      <column name="other_config" key="emc-enable" type='{"type": "boolean"}'>
>>> +        <p>
>>> +          Specifies if Exact Match Cache (EMC) should be used while
>>> +          processing packets received from this interface.
>>> +          If true, <ref table="Open_vSwitch" column="other_config"
>>> +          key="emc-insert-inv-prob"/> will have effect on this interface.
>>> +        </p>
>>> +        <p>
>>> +          Defaults to true.
>>> +        </p>
>>> +      </column>
>>> +    </group>
>>> +
>>>      <group title="MTU">
>>>        <p>
>>>          The MTU (maximum transmission unit) is the largest amount of data
>>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
