Hi Kevin and Ian.
Comments inline.

Best regards, Ilya Maximets.

On 17.01.2019 19:50, Stokes, Ian wrote:
>> On 01/16/2019 01:05 PM, Ilya Maximets wrote:
>>> Conditional EMC insert helps a lot in scenarios with high numbers of
>>> parallel flows, but in the current implementation this option affects
>>> all threads and ports at once.  There are scenarios where we have a
>>> different number of flows on different ports.  For example, if one of
>>> the VMs encapsulates traffic using additional headers, it will receive
>>> a large number of flows but only a few flows will come out of this VM.
>>> In this scenario it's much faster to use EMC instead of the classifier
>>> for traffic from the VM, but it's better to disable EMC for the
>>> traffic which flows to the VM.
>>>
>>> To handle the above issue, this patch introduces an 'emc-enable'
>>> option to enable/disable EMC on a per-port basis. Ex.:
>>>
>>>   ovs-vsctl set interface dpdk0 other_config:emc-enable=false
>>>
>>> The EMC insertion probability is kept as is and applies to all ports
>>> with 'emc-enable=true'.
>>>
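(For reference, here is an illustration of how the global probability knob maps to the insertion threshold used by emc_probabilistic_insert(). This is not OVS code; the helper names are made up, and OVS's random_uint32() is replaced by a plain argument so the logic is testable:)

```c
/* Illustration (not OVS code) of how 'emc-insert-inv-prob' (N) maps to
 * the uint32 comparison threshold used by emc_probabilistic_insert().
 * N == 0 disables insertions entirely; otherwise a packet is inserted
 * when a random uint32 falls at or below UINT32_MAX / N, i.e. with
 * probability ~1/N. */
#include <stdbool.h>
#include <stdint.h>

static uint32_t
emc_insert_threshold(uint32_t inv_prob)
{
    return inv_prob ? UINT32_MAX / inv_prob : 0;
}

static bool
emc_should_insert(uint32_t inv_prob, uint32_t random_u32)
{
    uint32_t min = emc_insert_threshold(inv_prob);

    return min && random_u32 <= min;
}
```

With 'emc-enable=false' on a port, the per-port patch simply forces the threshold to 0 for packets from that port, so insertions never happen regardless of the global setting.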
>>
>> Hi Ilya, just a couple of minor comments below,
> 
> Hi Ilya, I agree with what Kevin flagged below; one other comment from 
> myself.
> 
> (BTW I still have to do some testing on this but the code looks good so far.)
> 
>>
>>> Signed-off-by: Ilya Maximets <[email protected]>
>>> ---
>>>
>>> Version 3:
>>>     * Minor rebase on current master.
>>>
>>> Version 2:
>>>     * The main concern was about backward compatibility.  Also, there
>>>       is no real benefit in having a per-port probability value.
>>>       So, the per-port probability was replaced with a per-port
>>>       'emc-enable' option.
>>>       The global probability is kept and can be used without any changes.
>>>
>>> It's been a while since the first version. It's available here:
>>>     https://patchwork.ozlabs.org/patch/800277/
>>>
>>>  Documentation/topics/dpdk/bridge.rst |  4 ++
>>>  NEWS                                 |  5 +-
>>>  lib/dpif-netdev.c                    | 79 +++++++++++++++++++++++++---
>>>  vswitchd/vswitch.xml                 | 19 +++++++
>>>  4 files changed, 98 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
>>> index 82bdad840..2fcd86607 100644
>>> --- a/Documentation/topics/dpdk/bridge.rst
>>> +++ b/Documentation/topics/dpdk/bridge.rst
>>> @@ -101,6 +101,10 @@ observed with pmd stats::
>>>  For certain traffic profiles with many parallel flows, it's recommended to
>>>  set ``N`` to '0' to achieve higher forwarding performance.
>>>
>>> +It is also possible to enable/disable EMC on a per-port basis using::
>>> +
>>> +    $ ovs-vsctl set interface <iface> other_config:emc-enable={true,false}
>>> +
>>>  For more information on the EMC refer to :doc:`/intro/install/dpdk` .
> 
> It would be good to give a short description of why/when this feature
> should be used.  The description of the use case you give in the commit
> message would even do.
> 
> As more 'tunable' features like this are added, I think it's good for
> reference, so users understand the motivation for the feature, as they
> may not check the commit message.

OK.
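Something along these lines for bridge.rst, maybe (draft wording only, taken from the commit message):

```rst
Per-port control may be useful, for example, when one of the VMs
encapsulates traffic using additional headers: such a port receives a
large number of parallel flows while only a few flows come out of the
VM.  In that case it makes sense to keep EMC enabled for traffic from
the VM, but to disable it for traffic towards the VM.
```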

> 
> Ian
> 
>>>
>>>
>>> diff --git a/NEWS b/NEWS
>>> index 4618cc0a0..7826578d3 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -24,10 +24,13 @@ Post-v2.10.0
>>>           allocated dynamically using the following syntax:
>>>             ovn-nbctl lsp-set-addresses <port> "dynamic <IP>"
>>>     - DPDK:
>>> +     * Add support for DPDK 18.11
>>> +   - Userspace datapath:
>>>       * Add option for simple round-robin based Rxq to PMD assignment.
>>>         It can be set with pmd-rxq-assign.
>>> -     * Add support for DPDK 18.11
>>>       * Add support for Auto load balancing of PMDs (experimental)
>>> +     * Add a new per-port configurable option to manage EMC:
>>> +       'other_config:emc-enable'.
>>>     - Add 'symmetric_l3' hash function.
>>>     - OVS now honors 'updelay' and 'downdelay' for bonds with LACP
>> configured.
>>>     - ovs-vswitchd:
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index be529b6b0..6704be400 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -474,6 +474,7 @@ struct dp_netdev_port {
>>>      unsigned n_rxq;             /* Number of elements in 'rxqs' */
>>>      unsigned *txq_used;         /* Number of threads that use each tx queue. */
>>>      struct ovs_mutex txq_used_mutex;
>>> +    bool emc_enabled;           /* If true EMC will be used. */
>>>      char *type;                 /* Port type as requested by user. */
>>>      char *rxq_affinity_list;    /* Requested affinity of rx queues. */
>>>  };
>>> @@ -588,6 +589,7 @@ static void dp_netdev_actions_free(struct
>>> dp_netdev_actions *);  struct polled_queue {
>>>      struct dp_netdev_rxq *rxq;
>>>      odp_port_t port_no;
>>> +    bool emc_enabled;
>>>  };
>>>
>>>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
>>> @@ -617,6 +619,8 @@ struct dp_netdev_pmd_thread_ctx {
>>>      long long now;
>>>      /* RX queue from which last packet was received. */
>>>      struct dp_netdev_rxq *last_rxq;
>>> +    /* EMC insertion probability context for the current processing
>>> +     * cycle. */
>>> +    uint32_t emc_insert_min;
>>>  };
>>>
>>>  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
>>> @@ -1798,6 +1802,7 @@ port_create(const char *devname, const char *type,
>>>      port->netdev = netdev;
>>>      port->type = xstrdup(type);
>>>      port->sf = sf;
>>> +    port->emc_enabled = true;
>>>      port->need_reconfigure = true;
>>>      ovs_mutex_init(&port->txq_used_mutex);
>>>
>>> @@ -2830,9 +2835,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
>>>       * default the value is UINT32_MAX / 100 which yields an insertion
>>>       * probability of 1/100 ie. 1% */
>>>
>>> -    uint32_t min;
>>> -
>>> -    atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
>>> +    uint32_t min = pmd->ctx.emc_insert_min;
>>>
>>>      if (min && random_uint32() <= min) {
>>>          emc_insert(&(pmd->flow_cache).emc_cache, key, flow);
>>> @@ -3698,7 +3701,8 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
>>>          ovs_mutex_lock(&dp->non_pmd_mutex);
>>>      }
>>>
>>> -    /* Update current time in PMD context. */
>>> +    /* Update current time in PMD context. We don't care about EMC insertion
>>> +     * probability, because we are on a slow path. */
>>>      pmd_thread_ctx_time_update(pmd);
>>>
>>>      /* The action processing expects the RSS hash to be valid, because
>>> @@ -3842,7 +3846,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>>>      if (insert_min != cur_min) {
>>>          atomic_store_relaxed(&dp->emc_insert_min, insert_min);
>>>          if (insert_min == 0) {
>>> -            VLOG_INFO("EMC has been disabled");
>>> +            VLOG_INFO("EMC insertion probability changed to zero");
>>>          } else {
>>>              VLOG_INFO("EMC insertion probability changed to 1/%llu (~%.2f%%)",
>>>                        insert_prob, (100 / (float)insert_prob));
>>> @@ -3965,6 +3969,27 @@ exit:
>>>      return error;
>>>  }
>>>
>>> +/* Returns 'true' if one of the 'port's RX queues exists in 'poll_list'
>>> + * of given PMD thread. */
>>> +static bool
>>> +dpif_netdev_pmd_polls_port(struct dp_netdev_pmd_thread *pmd,
>>> +                           struct dp_netdev_port *port)
>>> +    OVS_EXCLUDED(pmd->port_mutex)
>>> +{
>>> +    struct rxq_poll *poll;
>>> +    bool found = false;
>>> +
>>> +    ovs_mutex_lock(&pmd->port_mutex);
>>> +    HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
>>> +        if (port == poll->rxq->port) {
>>> +            found = true;
>>> +            break;
>>> +        }
>>> +    }
>>> +    ovs_mutex_unlock(&pmd->port_mutex);
>>> +    return found;
>>> +}
>>> +
>>>  /* Changes the affinity of port's rx queues.  The changes are actually
>>>   * applied in dpif_netdev_run(). */
>>
>> The comment is a bit stale now.

Oh. Thanks. I'll update it.

>>
>>>  static int
>>> @@ -3975,10 +4000,33 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
>>>      struct dp_netdev_port *port;
>>>      int error = 0;
>>>      const char *affinity_list = smap_get(cfg, "pmd-rxq-affinity");
>>> +    bool emc_enabled = smap_get_bool(cfg, "emc-enable", true);
>>>
>>>      ovs_mutex_lock(&dp->port_mutex);
>>>      error = get_port_by_number(dp, port_no, &port);
>>> -    if (error || !netdev_is_pmd(port->netdev)
>>> +    if (error) {
>>> +        goto unlock;
>>> +    }
>>> +
>>> +    if (emc_enabled != port->emc_enabled) {
>>> +        struct dp_netdev_pmd_thread *pmd;
>>> +
>>> +        port->emc_enabled = emc_enabled;
>>> +        /* Mark for reload all the threads that poll this port and request
>>> +         * reconfiguration for the actual reloading of threads. */
>>> +        CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
>>> +            if (dpif_netdev_pmd_polls_port(pmd, port)) {
>>> +                pmd->need_reload = true;
>>> +            }
>>> +        }
>>> +        dp_netdev_request_reconfigure(dp);
>>> +
>>> +        VLOG_INFO("%s: EMC has been %s", netdev_get_name(port->netdev),
>>> +                  (emc_enabled) ? "enabled" : "disabled");
>>
>> I think it would be a good idea to add a log similar to the one in
>> dpif_netdev_set_config() for 'emc-insert-inv-prob' when emc_enabled gets
>> enabled here, as a reminder for the user.  Otherwise, a user will see 'EMC
>> has been enabled', which is not incorrect, but perhaps 'emc-insert-inv-prob'
>> was set to 0 a long time ago, and there is confusion about whether the EMC
>> will actually be used now or not.

OK. I'm thinking about adding something like:
   port: EMC has been enabled. Current insertion probability is %{value}.

Disabled case will remain the same:
   port: EMC has been disabled.

I'll update the patch tomorrow morning. It's already too late today.
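To make the wording concrete, an untested sketch of the message formatting (snprintf() standing in for VLOG_INFO() so it can be checked standalone; the helper name is made up):

```c
/* Untested sketch of the log messages I have in mind; in the actual
 * patch this would be a VLOG_INFO() in dpif_netdev_port_set_config().
 * The helper name and snprintf() usage are illustrative only. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static void
emc_state_message(char *buf, size_t size, const char *port,
                  bool enabled, unsigned long long insert_prob)
{
    if (!enabled) {
        snprintf(buf, size, "%s: EMC has been disabled.", port);
    } else if (!insert_prob) {
        snprintf(buf, size, "%s: EMC has been enabled, but current "
                 "insertion probability is zero.", port);
    } else {
        snprintf(buf, size, "%s: EMC has been enabled. Current insertion "
                 "probability is 1/%llu.", port, insert_prob);
    }
}
```

This way, enabling EMC on a port always reminds the user what the current global insertion probability is.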

>>
>>> +    }
>>> +
>>> +    /* Checking for RXq affinity changes. */
>>> +    if (!netdev_is_pmd(port->netdev)
>>>          || nullable_string_is_equal(affinity_list, port->rxq_affinity_list)) {
>>>          goto unlock;
>>>      }
>>> @@ -5123,6 +5171,13 @@ dpif_netdev_run(struct dpif *dpif)
>>>              if (!netdev_is_pmd(port->netdev)) {
>>>                  int i;
>>>
>>> +                if (port->emc_enabled) {
>>> +                    atomic_read_relaxed(&dp->emc_insert_min,
>>> +                                        &non_pmd->ctx.emc_insert_min);
>>> +                } else {
>>> +                    non_pmd->ctx.emc_insert_min = 0;
>>> +                }
>>> +
>>>                  for (i = 0; i < port->n_rxq; i++) {
>>>                      if (dp_netdev_process_rxq_port(non_pmd,
>>>                                                     &port->rxqs[i],
>>> @@ -5296,6 +5351,7 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
>>>      HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
>>>          poll_list[i].rxq = poll->rxq;
>>>          poll_list[i].port_no = poll->rxq->port->port_no;
>>> +        poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
>>>          i++;
>>>      }
>>>
>>> @@ -5360,6 +5416,14 @@ reload:
>>>          pmd_perf_start_iteration(s);
>>>
>>>          for (i = 0; i < poll_cnt; i++) {
>>> +
>>> +            if (poll_list[i].emc_enabled) {
>>> +                atomic_read_relaxed(&pmd->dp->emc_insert_min,
>>> +                                    &pmd->ctx.emc_insert_min);
>>> +            } else {
>>> +                pmd->ctx.emc_insert_min = 0;
>>> +            }
>>> +
>>>              process_packets =
>>>                  dp_netdev_process_rxq_port(pmd, poll_list[i].rxq,
>>>                                             poll_list[i].port_no);
>>> @@ -6301,7 +6365,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>>      struct dfc_cache *cache = &pmd->flow_cache;
>>>      struct dp_packet *packet;
>>>      const size_t cnt = dp_packet_batch_size(packets_);
>>> -    uint32_t cur_min;
>>> +    uint32_t cur_min = pmd->ctx.emc_insert_min;
>>>      int i;
>>>      uint16_t tcp_flags;
>>>      bool smc_enable_db;
>>> @@ -6309,7 +6373,6 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>>      bool batch_enable = true;
>>>
>>>      atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db);
>>> -    atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
>>>      pmd_perf_update_counter(&pmd->perf_stats,
>>>                              md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
>>>                              cnt);
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index d58f63228..88edb5d35 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -3101,6 +3101,25 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>>>        </column>
>>>      </group>
>>>
>>> +    <group title="EMC (Exact Match Cache) Configuration">
>>> +      <p>
>>> +        These settings control the behaviour of EMC lookups/insertions for
>>> +        packets received from the interface.
>>> +      </p>
>>> +
>>> +      <column name="other_config" key="emc-enable" type='{"type": "boolean"}'>
>>> +        <p>
>>> +          Specifies if Exact Match Cache (EMC) should be used while
>>> +          processing packets received from this interface.
>>> +          If true, <ref table="Open_vSwitch" column="other_config"
>>> +          key="emc-insert-inv-prob"/> will have effect on this interface.
>>> +        </p>
>>> +        <p>
>>> +          Defaults to true.
>>> +        </p>
>>> +      </column>
>>> +    </group>
>>> +
>>>      <group title="MTU">
>>>        <p>
>>>          The MTU (maximum transmission unit) is the largest amount of data
>>>
> 