On 01/16/2019 01:13 PM, Stokes, Ian wrote:
>> Conditional EMC insertion helps a lot in scenarios with high numbers of
>> parallel flows, but in the current implementation this option affects all
>> threads and ports at once. There are scenarios where we have different
>> numbers of flows on different ports. For example, if one of the VMs
>> encapsulates traffic using additional headers, it will receive a large
>> number of flows, but only a few flows will come out of this VM. In this
>> scenario it's much faster to use the EMC instead of the classifier for
>> traffic from the VM, but it's better to disable the EMC for the traffic
>> that flows to the VM.
>>
>> To handle the above issue, this patch introduces an 'emc-enable'
>> configurable to enable/disable the EMC on a per-port basis. Ex.:
>>
>>   ovs-vsctl set interface dpdk0 other_config:emc-enable=false
>>
>> The EMC insertion probability is kept as-is and applies to all ports with
>> 'emc-enable=true'.
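Picking this up inline: the asymmetric setup described above would be
configured along these lines (interface names are only examples; dpdk0 here
stands for the port carrying the many-flow direction):

```shell
# Traffic *from* the VM carries only a few flows: leave the EMC enabled
# on the vhost port (this is the default anyway).
ovs-vsctl set interface vhostuser0 other_config:emc-enable=true

# Traffic *to* the VM arrives with a large number of parallel flows:
# skip the EMC for packets received on the physical port.
ovs-vsctl set interface dpdk0 other_config:emc-enable=false
```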
>>
> 
> Hi Ilya,
> 
> Thanks for this. It sounds like a useful use case.
> 
> I'm concentrating on testing/debugging the port representor patch at the 
> moment. If I get some spare cycles later today, I'll take a look at this.
> 

I'll have a look as well and send comments tomorrow.

> (I spotted the changes to the NEWS doc below; good catch. Regardless of this 
> patch, those changes can be added to master.)
> 
> Ian 
> 
>> Signed-off-by: Ilya Maximets <[email protected]>
>> ---
>>
>> Version 3:
>>     * Minor rebase on current master.
>>
>> Version 2:
>>     * The main concern was about backward compatibility. Also, there
>>       is no real benefit to having a per-port probability value.
>>       So, the per-port probability was switched to a per-port
>>       'emc-enable' configurable.
>>       The global probability is kept and can be used without any changes.
>>
>> It's been a while since the first version. It's available here:
>>     https://patchwork.ozlabs.org/patch/800277/
>>
>>  Documentation/topics/dpdk/bridge.rst |  4 ++
>>  NEWS                                 |  5 +-
>>  lib/dpif-netdev.c                    | 79 +++++++++++++++++++++++++---
>>  vswitchd/vswitch.xml                 | 19 +++++++
>>  4 files changed, 98 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
>> index 82bdad840..2fcd86607 100644
>> --- a/Documentation/topics/dpdk/bridge.rst
>> +++ b/Documentation/topics/dpdk/bridge.rst
>> @@ -101,6 +101,10 @@ observed with pmd stats::
>>
>>  For certain traffic profiles with many parallel flows, it's recommended to
>>  set ``N`` to '0' to achieve higher forwarding performance.
>>
>> +It is also possible to enable/disable EMC on a per-port basis using::
>> +
>> +    $ ovs-vsctl set interface <iface> other_config:emc-enable={true,false}
>> +
>>  For more information on the EMC refer to :doc:`/intro/install/dpdk` .
>>
>>
>> diff --git a/NEWS b/NEWS
>> index 4618cc0a0..7826578d3 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -24,10 +24,13 @@ Post-v2.10.0
>>           allocated dynamically using the following syntax:
>>             ovn-nbctl lsp-set-addresses <port> "dynamic <IP>"
>>     - DPDK:
>> +     * Add support for DPDK 18.11
>> +   - Userspace datapath:
>>       * Add option for simple round-robin based Rxq to PMD assignment.
>>         It can be set with pmd-rxq-assign.
>> -     * Add support for DPDK 18.11
>>       * Add support for Auto load balancing of PMDs (experimental)
>> +     * Add new per-port configurable option to manage EMC:
>> +       'other_config:emc-enable'.
>>     - Add 'symmetric_l3' hash function.
>>     - OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
>>     - ovs-vswitchd:
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index be529b6b0..6704be400 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -474,6 +474,7 @@ struct dp_netdev_port {
>>      unsigned n_rxq;             /* Number of elements in 'rxqs' */
>>      unsigned *txq_used;         /* Number of threads that use each tx queue. */
>>      struct ovs_mutex txq_used_mutex;
>> +    bool emc_enabled;           /* If true, EMC will be used. */
>>      char *type;                 /* Port type as requested by user. */
>>      char *rxq_affinity_list;    /* Requested affinity of rx queues. */
>>  };
>> @@ -588,6 +589,7 @@ static void dp_netdev_actions_free(struct dp_netdev_actions *);
>>  struct polled_queue {
>>      struct dp_netdev_rxq *rxq;
>>      odp_port_t port_no;
>> +    bool emc_enabled;
>>  };
>>
>>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
>> @@ -617,6 +619,8 @@ struct dp_netdev_pmd_thread_ctx {
>>      long long now;
>>      /* RX queue from which last packet was received. */
>>      struct dp_netdev_rxq *last_rxq;
>> +    /* EMC insertion probability context for the current processing cycle. */
>> +    uint32_t emc_insert_min;
>>  };
>>
>>  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
>> @@ -1798,6 +1802,7 @@ port_create(const char *devname, const char *type,
>>      port->netdev = netdev;
>>      port->type = xstrdup(type);
>>      port->sf = sf;
>> +    port->emc_enabled = true;
>>      port->need_reconfigure = true;
>>      ovs_mutex_init(&port->txq_used_mutex);
>>
>> @@ -2830,9 +2835,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
>>       * default the value is UINT32_MAX / 100 which yields an insertion
>>       * probability of 1/100 ie. 1% */
>>
>> -    uint32_t min;
>> -
>> -    atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
>> +    uint32_t min = pmd->ctx.emc_insert_min;
>>
>>      if (min && random_uint32() <= min) {
>>          emc_insert(&(pmd->flow_cache).emc_cache, key, flow);
>> @@ -3698,7 +3701,8 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
>>          ovs_mutex_lock(&dp->non_pmd_mutex);
>>      }
>>
>> -    /* Update current time in PMD context. */
>> +    /* Update current time in PMD context. We don't care about EMC insertion
>> +     * probability, because we are on a slow path. */
>>      pmd_thread_ctx_time_update(pmd);
>>
>>      /* The action processing expects the RSS hash to be valid, because
>> @@ -3842,7 +3846,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>>      if (insert_min != cur_min) {
>>          atomic_store_relaxed(&dp->emc_insert_min, insert_min);
>>          if (insert_min == 0) {
>> -            VLOG_INFO("EMC has been disabled");
>> +            VLOG_INFO("EMC insertion probability changed to zero");
>>          } else {
>>              VLOG_INFO("EMC insertion probability changed to 1/%llu (~%.2f%%)",
>>                        insert_prob, (100 / (float)insert_prob));
>> @@ -3965,6 +3969,27 @@ exit:
>>      return error;
>>  }
>>
>> +/* Returns 'true' if one of the 'port's RX queues exists in the 'poll_list'
>> + * of the given PMD thread. */
>> +static bool
>> +dpif_netdev_pmd_polls_port(struct dp_netdev_pmd_thread *pmd,
>> +                           struct dp_netdev_port *port)
>> +    OVS_EXCLUDED(pmd->port_mutex)
>> +{
>> +    struct rxq_poll *poll;
>> +    bool found = false;
>> +
>> +    ovs_mutex_lock(&pmd->port_mutex);
>> +    HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
>> +        if (port == poll->rxq->port) {
>> +            found = true;
>> +            break;
>> +        }
>> +    }
>> +    ovs_mutex_unlock(&pmd->port_mutex);
>> +    return found;
>> +}
>> +
>>  /* Changes the affinity of port's rx queues.  The changes are actually applied
>>   * in dpif_netdev_run(). */
>>  static int
>> @@ -3975,10 +4000,33 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
>>      struct dp_netdev_port *port;
>>      int error = 0;
>>      const char *affinity_list = smap_get(cfg, "pmd-rxq-affinity");
>> +    bool emc_enabled = smap_get_bool(cfg, "emc-enable", true);
>>
>>      ovs_mutex_lock(&dp->port_mutex);
>>      error = get_port_by_number(dp, port_no, &port);
>> -    if (error || !netdev_is_pmd(port->netdev)
>> +    if (error) {
>> +        goto unlock;
>> +    }
>> +
>> +    if (emc_enabled != port->emc_enabled) {
>> +        struct dp_netdev_pmd_thread *pmd;
>> +
>> +        port->emc_enabled = emc_enabled;
>> +        /* Mark all the threads that poll this port for reload and request
>> +         * a datapath reconfiguration to actually reload them. */
>> +        CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
>> +            if (dpif_netdev_pmd_polls_port(pmd, port)) {
>> +                pmd->need_reload = true;
>> +            }
>> +        }
>> +        dp_netdev_request_reconfigure(dp);
>> +
>> +        VLOG_INFO("%s: EMC has been %s", netdev_get_name(port->netdev),
>> +                  (emc_enabled) ? "enabled" : "disabled");
>> +    }
>> +
>> +    /* Checking for RXq affinity changes. */
>> +    if (!netdev_is_pmd(port->netdev)
>>         || nullable_string_is_equal(affinity_list, port->rxq_affinity_list)) {
>>          goto unlock;
>>      }
>> @@ -5123,6 +5171,13 @@ dpif_netdev_run(struct dpif *dpif)
>>              if (!netdev_is_pmd(port->netdev)) {
>>                  int i;
>>
>> +                if (port->emc_enabled) {
>> +                    atomic_read_relaxed(&dp->emc_insert_min,
>> +                                        &non_pmd->ctx.emc_insert_min);
>> +                } else {
>> +                    non_pmd->ctx.emc_insert_min = 0;
>> +                }
>> +
>>                  for (i = 0; i < port->n_rxq; i++) {
>>                      if (dp_netdev_process_rxq_port(non_pmd,
>>                                                     &port->rxqs[i],
>> @@ -5296,6 +5351,7 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
>>      HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
>>          poll_list[i].rxq = poll->rxq;
>>          poll_list[i].port_no = poll->rxq->port->port_no;
>> +        poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
>>          i++;
>>      }
>>
>> @@ -5360,6 +5416,14 @@ reload:
>>          pmd_perf_start_iteration(s);
>>
>>          for (i = 0; i < poll_cnt; i++) {
>> +
>> +            if (poll_list[i].emc_enabled) {
>> +                atomic_read_relaxed(&pmd->dp->emc_insert_min,
>> +                                    &pmd->ctx.emc_insert_min);
>> +            } else {
>> +                pmd->ctx.emc_insert_min = 0;
>> +            }
>> +
>>              process_packets =
>>                  dp_netdev_process_rxq_port(pmd, poll_list[i].rxq,
>>                                             poll_list[i].port_no);
>> @@ -6301,7 +6365,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>      struct dfc_cache *cache = &pmd->flow_cache;
>>      struct dp_packet *packet;
>>      const size_t cnt = dp_packet_batch_size(packets_);
>> -    uint32_t cur_min;
>> +    uint32_t cur_min = pmd->ctx.emc_insert_min;
>>      int i;
>>      uint16_t tcp_flags;
>>      bool smc_enable_db;
>> @@ -6309,7 +6373,6 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>      bool batch_enable = true;
>>
>>      atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db);
>> -    atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
>>      pmd_perf_update_counter(&pmd->perf_stats,
>>                              md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
>>                              cnt);
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index d58f63228..88edb5d35 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -3101,6 +3101,25 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>>        </column>
>>      </group>
>>
>> +    <group title="EMC (Exact Match Cache) Configuration">
>> +      <p>
>> +        These settings control the behaviour of EMC lookups/insertions for
>> +        packets received from the interface.
>> +      </p>
>> +
>> +      <column name="other_config" key="emc-enable" type='{"type": "boolean"}'>
>> +        <p>
>> +          Specifies whether the Exact Match Cache (EMC) should be used while
>> +          processing packets received from this interface.
>> +          If true, <ref table="Open_vSwitch" column="other_config"
>> +          key="emc-insert-inv-prob"/> will have effect on this interface.
>> +        </p>
>> +        <p>
>> +          Defaults to true.
>> +        </p>
>> +      </column>
>> +    </group>
>> +
>>      <group title="MTU">
>>        <p>
>>          The MTU (maximum transmission unit) is the largest amount of data
>> --
>> 2.17.1
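One more note while I'm here: per the vswitch.xml text above, the global
probability and the per-port knob compose, so a setup like the following
(values illustrative) keeps a 1/100 insertion probability everywhere except a
port where the EMC is switched off outright:

```shell
# Global EMC insertion probability of 1/100 for all ports...
ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=100

# ...while dpdk1 bypasses the EMC entirely.
ovs-vsctl set interface dpdk1 other_config:emc-enable=false
```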
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
