Jan,

Thanx for your comments. Please see inline. I will send a v4 patch with your
comments addressed.

Thanx
Manu

On 09/05/18, 5:26 AM, "Jan Scheurich" <[email protected]> wrote:

    Hi Manu,
    
    Thanks for working on this. Two general comments:
    
    1. Is there a chance to add unit test cases for this feature? I know it 
might be difficult due to the real-time character, but perhaps using very low 
parameter values?

[manu] I looked at it but felt it won't be needed, as the feature is rate-based
and a unit test would not be that useful.
    
    2. I believe the number of RL-dropped packets must be accounted for in the
    function dpif_netdev_get_stats() as stats->n_missed, otherwise the overall
    number of reported packets may not match the total number of packets processed.

[manu] Thanx. Will fix it.
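Roughly, I plan to add the rate-limit drops to the miss count in the per-PMD
loop of dpif_netdev_get_stats(), along these lines (just a sketch; the counter
may still be renamed per your comment further down):

    /* dpif_netdev_get_stats(): inside the loop over dp->poll_threads. */
    pmd_perf_read_counters(&pmd->perf_stats, pmd_stats);
    stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT];
    stats->n_hit += pmd_stats[PMD_STAT_MASKED_HIT];
    stats->n_missed += pmd_stats[PMD_STAT_MISS];
    /* Count upcalls dropped by the rate limiter as misses so that the
     * reported totals still add up to the number of processed packets. */
    stats->n_missed += pmd_stats[PMD_STAT_RATELIMIT_DROP];
    stats->n_lost += pmd_stats[PMD_STAT_LOST];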
    
    Other comments in-line.
    
    Regards, Jan
    
    > From: Manohar Krishnappa Chidambaraswamy
    > Sent: Monday, 07 May, 2018 12:45
    > 
    > Hi
    > 
    > Rebased to master and adapted to the new dpif-netdev-perf counters.
    > As explained in v2 thread, OFPM_SLOWPATH meters cannot be used as is
    > for rate-limiting upcalls, hence reverted back to the simpler method
    > using token bucket.
    
    I guess the question was not whether to use meter actions in the datapath
    to implement the upcall rate limiter in dpif-netdev, but whether to allow
    configuration of the upcall rate limiter through the OpenFlow Meter Mod
    command for the special pre-defined meter OFPM_SLOWPATH. In any case, the
    decision was to drop that idea.
    
    > 
    > Could you please review this patch?
    > 
    > Thanx
    > Manu
    > 
    > v2: https://patchwork.ozlabs.org/patch/860687/
    > v1: https://patchwork.ozlabs.org/patch/836737/
    
    Please add the list of main changes between the current and the previous
    version in the next revision of the patch. Put it below the '---' separator
    so that it is not part of the commit message.
[manu] Sure ok.
    
    > 
    > Signed-off-by: Manohar K C
    > <[email protected]>
    > CC: Jan Scheurich <[email protected]>
    > ---
    >  Documentation/howto/dpdk.rst | 21 +++++++++++
    >  lib/dpif-netdev-perf.h       |  1 +
    >  lib/dpif-netdev.c            | 83 
++++++++++++++++++++++++++++++++++++++++----
    >  vswitchd/vswitch.xml         | 47 +++++++++++++++++++++++++
    >  4 files changed, 146 insertions(+), 6 deletions(-)
    > 
    > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
    > index 79b626c..bd1eaac 100644
    > --- a/Documentation/howto/dpdk.rst
    > +++ b/Documentation/howto/dpdk.rst
    > @@ -739,3 +739,24 @@ devices to bridge ``br0``. Once complete, follow the 
below steps:
    >     Check traffic on multiple queues::
    > 
    >         $ cat /proc/interrupts | grep virtio
    > +
    > +Upcall rate limiting
    > +--------------------
    > +ovs-vsctl can be used to enable and configure upcall rate limit 
parameters.
    > +There are 2 configurable values ``upcall-rate`` and ``upcall-burst`` 
which
    > +take effect when global enable knob ``upcall-rl`` is set to true.
    
    Please explain why upcall rate limiting may be relevant in the context of
    the DPDK datapath (upcalls are executed in the context of the PMD and affect
    datapath forwarding capacity). It is perhaps worth noting here that this
    rate limiting applies independently per PMD and is not a global limit.
    
    Replace "knob" by "configuration parameter"  and put the command to enable 
rate limiting:
     $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
    before the commands to tune the token bucket parameters. Mention the 
default parameter values?
[manu] Done.
    
    > +
    > +Upcall rate should be set using ``upcall-rate`` in packets-per-sec. For
    > +example::
    > +
    > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
    > +
    > +Upcall burst should be set using ``upcall-burst`` in packets-per-sec. For
    > +example::
    > +
    > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
    > +
    > +Upcall ratelimit feature should be globally enabled using ``upcall-rl``. 
For
    > +example::
    > +
    > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
    
    > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
    > index 5993c25..189213c 100644
    > --- a/lib/dpif-netdev-perf.h
    > +++ b/lib/dpif-netdev-perf.h
    > @@ -64,6 +64,7 @@ enum pmd_stat_type {
    >                               * recirculation. */
    >      PMD_STAT_SENT_PKTS,     /* Packets that have been sent. */
    >      PMD_STAT_SENT_BATCHES,  /* Number of batches sent. */
    > +    PMD_STAT_RATELIMIT_DROP,/* Packets dropped due to upcall policer. */
    
    Name it PMD_STAT_RL_DROP and move it up in the list, after PMD_STAT_LOST.
    Add a space before the comment.
[manu] Done.
    
    >      PMD_CYCLES_ITER_IDLE,   /* Cycles spent in idle iterations. */
    >      PMD_CYCLES_ITER_BUSY,   /* Cycles spent in busy iterations. */
    >      PMD_N_STATS
    > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
    > index be31fd0..eebab89 100644
    > --- a/lib/dpif-netdev.c
    > +++ b/lib/dpif-netdev.c
    > @@ -101,6 +101,16 @@ static struct shash dp_netdevs 
OVS_GUARDED_BY(dp_netdev_mutex)
    > 
    >  static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
    > 
    > +/* Upcall rate-limit parameters */
    > +static bool upcall_ratelimit;
    > +static unsigned int upcall_rate;
    > +static unsigned int upcall_burst;
    
    I suggest storing these datapath config parameters in struct dp_netdev,
    analogous to emc_insert_min and tx_flush_interval. Use "atomic" reads when
    accessing the parameter values in dp_netdev.
[manu] Done.
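The parameters now live in struct dp_netdev, roughly like this (sketch; the
exact member names may still change in v4):

    /* In struct dp_netdev, next to emc_insert_min and tx_flush_interval. */
    atomic_bool upcall_ratelimit;   /* Upcall rate limiting enabled? */
    atomic_uint32_t upcall_rate;    /* Sustained rate in upcalls per second. */
    atomic_uint32_t upcall_burst;   /* Maximum burst size in upcalls. */

and they are read with relaxed atomics wherever the datapath needs them, e.g.:

    uint32_t rate;
    atomic_read_relaxed(&dp->upcall_rate, &rate);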
    
    > +
    > +/* These default values may be tuned based on upcall performance */
    > +#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
    > +#define UPCALL_RATE_DEFAULT      1000  /* pps */
    > +#define UPCALL_BURST_DEFAULT     1000  /* pps */
    
    The rate is in "upcalls per second". The burst size is not in PPS but in
    "upcalls".
    
    Have you tested these default parameters? What is the performance
    degradation if a PMD handles 1000 upcalls per second? I have seen typical
    upcall durations of 75K cycles (30 us at a 2.5 GHz clock), so 1000 upcalls
    would mean 75 million cycles per second spent processing upcalls (not
    including any degradation due to the increased risk of lock contention in
    the slow path if multiple PMDs have similar upcall rates). So the minimum
    impact should be about 3%, probably more.
    
    With a burst of 1000 upcalls, we have previously seen packet drops when
    running at forwarding rates close to the limit.
    
    Without having tested this specifically, I'd consider a rate of 500 upcalls
    per second and a burst size of 500 upcalls slightly more appropriate as
    default values.
[manu] OK. I did some functionality tests. Since upcall processing times depend
on the SDN pipeline (translation complexity) and the rate of new flows, I didn’t
spend much effort fine-tuning the default values; we will need to evaluate/tune
the rate-limit parameters separately for each deployment where this feature is
enabled. I will change the default values to 500. I tried to measure the
degradation of the upcall path due to the additional token-bucket check, and it
was not noticeable.
    
    > +
    >  #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | 
CS_RELATED \
    >                                       | CS_INVALID | CS_REPLY_DIR | 
CS_TRACKED \
    >                                       | CS_SRC_NAT | CS_DST_NAT)
    > @@ -615,6 +625,9 @@ struct dp_netdev_pmd_thread {
    >      /* Keep track of detailed PMD performance statistics. */
    >      struct pmd_perf_stats perf_stats;
    > 
    > +    /* Policer to rate limit slow-path */
    > +    struct token_bucket upcall_tb;
    > +
    >      /* Set to true if the pmd thread needs to be reloaded. */
    >      bool need_reload;
    >  };
    > @@ -856,12 +869,13 @@ pmd_info_show_stats(struct ds *reply,
    >                    "\tavg. subtable lookups per megaflow hit: %.02f\n"
    >                    "\tmiss with success upcall: %"PRIu64"\n"
    >                    "\tmiss with failed upcall: %"PRIu64"\n"
    > +                  "\tmiss with ratelimit dropped upcall: %"PRIu64"\n"
    
    "miss with upcall dropped by rate limiter" or "miss with rate-limited 
upcall"?
[manu] OK. I changed it to "drops due to upcall ratelimit".
    
    >                    "\tavg. packets per output batch: %.02f\n",
    >                    total_packets, stats[PMD_STAT_RECIRC],
    >                    passes_per_pkt, stats[PMD_STAT_EXACT_HIT],
    >                    stats[PMD_STAT_MASKED_HIT], lookups_per_hit,
    >                    stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
    > -                  packets_per_batch);
    > +                  stats[PMD_STAT_RATELIMIT_DROP], packets_per_batch);
    > 
    >      if (total_cycles == 0) {
    >          return;
    > @@ -2987,6 +3001,8 @@ dpif_netdev_set_config(struct dpif *dpif, const 
struct smap *other_config)
    >                          DEFAULT_EM_FLOW_INSERT_INV_PROB);
    >      uint32_t insert_min, cur_min;
    >      uint32_t tx_flush_interval, cur_tx_flush_interval;
    > +    unsigned int rate, burst;
    > +    bool ratelimit;
    > 
    >      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
    >                                       DEFAULT_TX_FLUSH_INTERVAL);
    > @@ -3021,6 +3037,32 @@ dpif_netdev_set_config(struct dpif *dpif, const 
struct smap *other_config)
    >          }
    >      }
    > 
    > +    /* Handle upcall policer params */
    > +    ratelimit = smap_get_bool(other_config, "upcall-rl",
    > +                              UPCALL_RATELIMIT_DEFAULT);
    > +    rate = smap_get_int(other_config, "upcall-rate",
    > +                        UPCALL_RATE_DEFAULT);
    > +    burst = smap_get_int(other_config, "upcall-burst",
    > +                         UPCALL_BURST_DEFAULT);
    > +
    > +    if ((rate != upcall_rate) || (burst != upcall_burst)) {
    > +        VLOG_INFO("Upcall ratelimit params changed : Old - rate=%d 
burst=%d "
    > +                  ": New - rate=%d burst=%d\n", upcall_rate, 
upcall_burst,
    > +                  rate, burst);
    
    Better: "rate limiter parameters"? For all other config parameters only the 
new value is logged here.
[manu] OK.
    
    > +
    > +        upcall_rate = rate;
    > +        upcall_burst = burst;
    > +
    > +        dp_netdev_request_reconfigure(dp);
    
    Reconfiguring the datapath is a heavy operation with potential traffic
    impact, as all PMDs must be stopped. The other parameters, emc_insert_min
    and tx_flush_interval, are updated without stopping PMDs. But I guess the
    token bucket per PMD must be re-initialized atomically while the PMD is not
    polling.
    
    Furthermore, I think it is not sufficient to signal 
dp_netdev_request_reconfigure(). In reconfigure_datapath() you would also have 
to ensure that all PMDs are actually reloaded at least once when any of the 
token bucket parameters are changed. It would be sufficient to simply mark all 
PMDs with pmd->need_reload at the beginning of the function when needed. The 
first reload_affected_pmds() would then bring the RL config change into effect.
    
    The non-PMD thread requires special handling. But since
    reconfigure_datapath() is executed in the main thread, there is no need for
    a reload.
[manu] Ok. I had taken care of this in my first patch. I see that the
reload/re-config code has changed since then; I didn’t notice. Thanx. Will fix
this.
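The plan is to mark the PMDs in reconfigure_datapath() roughly as below
(sketch; it assumes a flag such as dp->upcall_rl_changed that
dpif_netdev_set_config() sets when rate or burst change):

    /* reconfigure_datapath(): before the first reload_affected_pmds(), make
     * sure every PMD picks up the new token bucket parameters. */
    if (dp->upcall_rl_changed) {
        CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
            if (pmd->core_id != NON_PMD_CORE_ID) {
                pmd->need_reload = true;
            }
        }
        dp->upcall_rl_changed = false;
    }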
    
    > +    }
    > +
    > +    if (ratelimit != upcall_ratelimit) {
    > +        upcall_ratelimit = ratelimit;
    > +
    > +        VLOG_INFO("Upcall ratelimit changed to %s\n",
    > +                  (upcall_ratelimit ? "Enabled" : "Disabled"));
    > +    }
    
    Better "Upcall rate limiter enabled|disabled"? I think it is worth to also 
log the current parameter values.
[manu] OK.
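Something along these lines (sketch):

    VLOG_INFO("Upcall rate limiter %s (rate %u upcalls/s, burst %u upcalls)",
              ratelimit ? "enabled" : "disabled", rate, burst);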
    
    > +
    >      return 0;
    >  }
    > 
    > @@ -3932,6 +3974,12 @@ dpif_netdev_run(struct dpif *dpif)
    >      ovs_mutex_lock(&dp->port_mutex);
    >      non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
    >      if (non_pmd) {
    > +        /* Reconfig the upcall policer if params have changed */
    > +        if ((upcall_rate != non_pmd->upcall_tb.rate) ||
    > +            (upcall_burst != non_pmd->upcall_tb.burst)) {
    > +            token_bucket_init(&non_pmd->upcall_tb, upcall_rate, 
upcall_burst);
    > +        }
    > +
    
    The update of the non-PMD token bucket parameters should also be done as
    part of reconfigure_datapath(), not here in the dpif_netdev forwarding code
    of the main thread.
[manu] OK. Moved to dp_netdev_reload_pmd__().
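i.e. roughly (sketch; reading the parameters now stored in struct dp_netdev):

    /* dp_netdev_reload_pmd__(): re-initialize the upcall policer with the
     * currently configured parameters. */
    uint32_t rate, burst;

    atomic_read_relaxed(&pmd->dp->upcall_rate, &rate);
    atomic_read_relaxed(&pmd->dp->upcall_burst, &burst);
    token_bucket_init(&pmd->upcall_tb, rate, burst);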
    
    >          ovs_mutex_lock(&dp->non_pmd_mutex);
    >          HMAP_FOR_EACH (port, node, &dp->ports) {
    >              if (!netdev_is_pmd(port->netdev)) {
    > @@ -4135,6 +4183,9 @@ reload:
    >          lc = UINT_MAX;
    >      }
    > 
    > +    /* Initialize upcall policer token bucket with configured params */
    > +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
    > +
    >      pmd->intrvl_tsc_prev = 0;
    >      atomic_store_relaxed(&pmd->intrvl_cycles, 0);
    >      cycles_counter_update(s);
    > @@ -4627,6 +4678,10 @@ dp_netdev_configure_pmd(struct 
dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
    >          emc_cache_init(&pmd->flow_cache);
    >          pmd_alloc_static_tx_qid(pmd);
    >      }
    > +
    > +    /* Initialize upcall policer token bucket with configured params */
    > +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
    > +
    >      pmd_perf_stats_init(&pmd->perf_stats);
    >      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, 
&pmd->node),
    >                  hash_int(core_id, 0));
    > @@ -5145,6 +5200,7 @@ fast_path_processing(struct dp_netdev_pmd_thread 
*pmd,
    >      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
    >      struct dp_netdev *dp = pmd->dp;
    >      int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
    > +    int upcall_rl_drop_cnt = 0;
    >      int lookup_cnt = 0, add_lookup_cnt;
    >      bool any_miss;
    > 
    > @@ -5185,13 +5241,26 @@ fast_path_processing(struct dp_netdev_pmd_thread 
*pmd,
    >                  continue;
    >              }
    > 
    > -            int error = handle_packet_upcall(pmd, packet, &keys[i],
    > -                                             &actions, &put_actions);
    > +            /*
    > +             * Grab a token from the upcall policer to enter slowpath. 
If token
    > +             * is not available, drop and account the packet. This is to
    > +             * rate-limit packets getting into slowpath.
    > +             */
    > +            if (upcall_ratelimit &&
    > +                !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
    > +                dp_packet_delete(packet);
    > 
    > -            if (OVS_UNLIKELY(error)) {
    > -                upcall_fail_cnt++;
    > +                upcall_rl_drop_cnt++;
    >              } else {
    > -                upcall_ok_cnt++;
    > +
    > +                int error = handle_packet_upcall(pmd, packet, &keys[i],
    > +                                                 &actions, &put_actions);
    > +
    > +                if (OVS_UNLIKELY(error)) {
    > +                    upcall_fail_cnt++;
    > +                } else {
    > +                    upcall_ok_cnt++;
    > +                }
    >              }
    >          }
    > 
    > @@ -5228,6 +5297,8 @@ fast_path_processing(struct dp_netdev_pmd_thread 
*pmd,
    >                              upcall_ok_cnt);
    >      pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
    >                              upcall_fail_cnt);
    > +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RATELIMIT_DROP,
    > +                            upcall_rl_drop_cnt);
    >  }
    > 
    >  /* Packets enter the datapath from a port (or from recirculation) here.
    > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
    > index 9c2a826..e0098ad 100644
    > --- a/vswitchd/vswitch.xml
    > +++ b/vswitchd/vswitch.xml
    > @@ -428,6 +428,53 @@
    >          </p>
    >        </column>
    > 
    > +      <column name="other_config" key="upcall-rl"
    > +              type='{"type": "boolean"}'>
    > +        <p>
    > +          Set this value to <code>true</code> to enable upcall 
rate-limiting.
    > +          The upcall parameters like rate and burst will be ignored, if 
this is
    > +          not set.
    > +        </p>
    > +        <p>
    > +          The default value is <code>false</code> and upcall 
rate-limiting will
    > +          be disabled.
    > +        </p>
    > +      </column>
    > +
    > +      <column name="other_config" key="upcall-rate"
    > +        type='{"type": "integer", "minInteger": 0, "maxInteger": 
4294967295}'>
    > +        <p>
    > +          Specifies the rate of upcalls in packets-per-second that is to 
be
    > +          allowed. For example, if the value is 10000, then those many 
upcalls
    > +          (for packets) are allowed per second in each of the packet 
polling
    > +          thread (PMD or non-PMD).
    > +        </p>
    > +        <p>
    > +          A value of <code>0</code> means, no upcalls would be allowed 
i.e,
    > +          upcall will be disabled. This is mainly for debugging.
    > +        </p>
    > +        <p>
    > +          The default value is 1000.
    > +        </p>
    
    Proposal:
    Specifies the sustained rate of upcalls, in upcalls per second, that is
    admitted per packet-polling thread (PMD or non-PMD) in the netdev datapath
    when upcall rate limiting is enabled. The default value is 500 upcalls per
    second.

    Setting both upcall-rate and upcall-burst to <code>0</code> disables upcalls
    in the netdev datapath. This can be used for debugging purposes.
[manu] OK.
    
    > +      </column>
    > +
    > +      <column name="other_config" key="upcall-burst"
    > +        type='{"type": "integer", "minInteger": 0, "maxInteger": 
4294967295}'>
    > +        <p>
    > +          Specifies the maximum burst of upcalls in packets-per-second 
that is
    > +          to be allowed. For example, if the value is 15000, then a 
maximum
    > +          burst of 15000 upcalls (for packets) are allowed per second in 
each
    > +          of the packet polling thread (PMD or non-PMD).
    > +        </p>
    > +        <p>
    > +          A value of <code>0</code> means, no upcalls would be allowed 
i.e,
    > +          upcall will be disabled. This is mainly for debugging.
    > +        </p>
    > +        <p>
    > +          The default value is 1000.
    > +        </p>
    
    Proposal:
    When upcall rate limiting is enabled, this specifies the maximum burst of
    upcalls that is admitted per packet-polling thread (PMD or non-PMD) in the
    netdev datapath, independently of the packet arrival rate.
    The default value is a burst of 500 upcalls.

    Setting both upcall-rate and upcall-burst to <code>0</code> disables upcalls
    in the netdev datapath. This can be used for debugging purposes.
[manu] OK.
    
    > +      </column>
    > +
    >        <column name="other_config" key="vlan-limit"
    >                type='{"type": "integer", "minInteger": 0}'>
    >          <p>
    > --
    > 1.9.1
    > 
    
    
