Hi,

2nd attempt. Does anyone have any comments on this add-on to upcalls?

Thanx
Manu

On 17/11/17, 11:17 AM, "Manohar Krishnappa Chidambaraswamy" 
<[email protected]> wrote:

    Hi,
    
    Does anyone have any comments on this add-on to upcalls?
    
    Thanx
    Manu
    
    On 10/11/17, 7:06 PM, "[email protected] on behalf of Manohar 
Krishnappa Chidambaraswamy" <[email protected] on behalf of 
[email protected]> wrote:
    
        In OVS-DPDK, both fast-path and slow-path execute in the context of a
        common thread (i.e., the PMD thread), without any partitioning of CPU
        cycles between the two. When a burst of new flows enters the data-path,
        packets are punted to slow-path in the order they are received, and the
        PMD is busy for the duration of each upcall. Slow-path processing of a
        packet consumes 100-200 times the cycles of fast-path handling. As a
        result, the forwarding performance of a PMD degrades significantly
        during an upcall burst. If the PMD was already highly loaded, it becomes
        temporarily overloaded and its rx queues start filling up. If the upcall
        burst is long enough, packets are dropped once the rx queues are full.
        This happens even if the new flows are unexpected and the slow-path
        decides to drop their packets.
        
        It is likely that most of the packets dropped due to rx queue overflow
        belong to established flows that should have been processed by the
        fast-path. Hence, the current OVS-DPDK architecture favors the handling
        of new flows over the forwarding of established flows. This is generally
        a sub-optimal approach.
        
        Without a limit on the rate of upcalls, OVS-DPDK is vulnerable to DoS
        attacks. But even sporadic bursts of, e.g., unexpected multicast packets
        have been shown to cause such packet drops.
        
        Proposed solution:
        ------------------
        This patch implements a mechanism to limit the rate of packets going
        from fast-path into slow-path in OVS-DPDK mode. A simple token bucket
        policer per packet processing thread (either PMD or non-PMD thread)
        restricts the flow of packets funneling from fast-path into slow-path.
        So for each PMD (or non-PMD) thread, a limited number of cycles is
        carved out for its slow-path. This upcall policer allows only a
        configured number of packets per second (pps) into
        handle_packet_upcall(), which marks the start of slow-path in OVS-DPDK.
        A packet has to take a token to enter slow-path; if no tokens are
        available, the packet is dropped and accounted per thread (PMD or
        non-PMD) under "ovs-appctl dpif-netdev/pmd-stats-show".
        
        pmd thread numa_id 0 core_id 2:
                emc hits:0
                megaflow hits:0
                avg. subtable lookups per hit:0.00
                miss:287572
                lost:0
                ratelimit drops:xxxxx <<<<<<<<<<<<<
                idle cycles:14925072116 (43.81%)
                processing cycles:19140112904 (56.19%)
                avg cycles per packet: 118457.93 (34065185020/287572)
                avg processing cycles per packet: 66557.64 (19140112904/287572)
        NOTE: This is a sample output and may need adaptation based on new
        changes/proposals (if any) in progress.
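
The per-thread policer described above can be pictured with the following minimal, self-contained sketch. It is not the code in this patch (which reuses the existing OVS token bucket); the struct and function names here are hypothetical and chosen only for illustration. It assumes a millisecond clock supplied by the caller, a rate in packets per second and a burst in packets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread upcall policer; names are illustrative only. */
struct upcall_policer {
    uint32_t rate_pps;      /* Tokens (packets) added per second. */
    uint32_t burst;         /* Maximum number of tokens the bucket holds. */
    uint64_t tokens_milli;  /* Current credit, in 1/1000ths of a token. */
    uint64_t last_fill_ms;  /* Time of the last refill, in milliseconds. */
};

static void
policer_init(struct upcall_policer *p, uint32_t rate_pps, uint32_t burst,
             uint64_t now_ms)
{
    p->rate_pps = rate_pps;
    p->burst = burst;
    p->tokens_milli = (uint64_t) burst * 1000;  /* Start with a full bucket. */
    p->last_fill_ms = now_ms;
}

/* Returns true if one packet may enter the slow path at time 'now_ms',
 * false if the caller should drop it and count a rate-limit drop. */
static bool
policer_admit(struct upcall_policer *p, uint64_t now_ms)
{
    if (now_ms > p->last_fill_ms) {
        uint64_t cap = (uint64_t) p->burst * 1000;
        uint64_t elapsed_ms = now_ms - p->last_fill_ms;
        uint64_t add;

        /* Clamp so that the multiplication below cannot overflow. */
        if (elapsed_ms > UINT32_MAX) {
            elapsed_ms = UINT32_MAX;
        }
        /* rate_pps tokens per second equals rate_pps milli-tokens per ms. */
        add = elapsed_ms * p->rate_pps;
        p->tokens_milli = add >= cap - p->tokens_milli
                          ? cap : p->tokens_milli + add;
        p->last_fill_ms = now_ms;
    }

    if (p->tokens_milli < 1000) {
        return false;           /* No whole token left: drop the packet. */
    }
    p->tokens_milli -= 1000;    /* Consume one token for this packet. */
    return true;
}
```

With a rate of 1000 pps and a burst of 2, two packets arriving at the same instant are admitted, a third is refused, and one more is admitted a millisecond later. Keeping the credit in thousandths of a token is just one way to avoid losing fractional refills between calls.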
        
        The upcall policer can be enabled and configured with the following
        parameters, added as new items under other_config in the "Open_vSwitch"
        table. These values are common to all packet processing threads (PMD or
        non-PMD), with each thread applying the same configured values
        independently.
        
        1. ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
            - Global knob to enable/disable upcall rate limiting for all
              PMD and non-PMD threads.
        2. ovs-vsctl set Open_vSwitch . other_config:upcall-rate=xxx
            - xxx is in packets per second (pps). This determines the token
              bucket's fill rate.
        3. ovs-vsctl set Open_vSwitch . other_config:upcall-burst=xxx
            - xxx is the maximum burst of packets allowed at any time. This
              determines the token bucket's burst size.
        
        By default, this feature is disabled for backward compatibility and
        needs to be explicitly enabled (via the global knob upcall-rl shown
        above). When enabled, default values (derived from typical slow-path
        and fast-path performance measurements) will be used during init and
        can be overridden with the commands above. Configured values for rate
        and burst take effect only when the feature is enabled.
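
As a rough aid for choosing upcall-rate, the sketch below estimates what share of one core's cycles a given rate hands to the slow-path. The helper name is hypothetical, and both the per-upcall cycle cost and the core frequency are inputs the reader has to measure or assume; nothing here is part of the patch.

```c
#include <stdint.h>

/* Estimated fraction of one core's cycles consumed by upcalls when the
 * policer admits 'rate_pps' packets per second into the slow path.
 * 'cycles_per_upcall' and 'core_hz' are caller-supplied assumptions. */
static double
upcall_cycle_share(uint32_t rate_pps, double cycles_per_upcall, double core_hz)
{
    if (core_hz <= 0.0) {
        return 0.0;             /* No meaningful answer without a clock rate. */
    }
    return ((double) rate_pps * cycles_per_upcall) / core_hz;
}
```

For instance, taking the roughly 66557 processing cycles per packet from the sample pmd-stats-show output above (where every packet was a miss) and assuming a 2 GHz core, upcall_cycle_share(1000, 66557.64, 2.0e9) comes to about 0.033, i.e. a 1000 pps limit would reserve on the order of 3% of the thread's cycles for upcalls.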
        
        The patch is based on an existing token bucket implementation in OVS.
        Signed-off-by: Manohar K C <[email protected]>
        CC: Jan Scheurich [email protected]
        ---
         Documentation/howto/dpdk.rst | 21 +++++++++++
         lib/dpif-netdev.c            | 85 
+++++++++++++++++++++++++++++++++++++++++---
         vswitchd/vswitch.xml         | 47 ++++++++++++++++++++++++
         3 files changed, 148 insertions(+), 5 deletions(-)
        
        diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
        index d123819..2cd0209 100644
        --- a/Documentation/howto/dpdk.rst
        +++ b/Documentation/howto/dpdk.rst
        @@ -709,3 +709,24 @@ devices to bridge ``br0``. Once complete, follow the below steps:
            Check traffic on multiple queues::
        
                $ cat /proc/interrupts | grep virtio
        +
        +Upcall rate limiting
        +--------------------
        +ovs-vsctl can be used to enable and configure upcall rate limit parameters.
        +There are two configurable values, ``upcall-rate`` and ``upcall-burst``, which
        +take effect when the global enable knob ``upcall-rl`` is set to true.
        +
        +Upcall rate should be set using ``upcall-rate`` in packets-per-sec. For
        +example::
        +
        +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
        +
        +Upcall burst should be set using ``upcall-burst`` in packets. For
        +example::
        +
        +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
        +
        +Upcall ratelimit feature should be globally enabled using ``upcall-rl``. For
        +example::
        +
        +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
        diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
        index 599308d..b26fbbb 100644
        --- a/lib/dpif-netdev.c
        +++ b/lib/dpif-netdev.c
        @@ -100,6 +100,16 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
        
         static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
        
        +/* Upcall rate-limit parameters */
        +static bool upcall_ratelimit;
        +static unsigned int upcall_rate;
        +static unsigned int upcall_burst;
        +
        +/* TODO: Tune these default values based on upcall perf test */
        +#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
        +#define UPCALL_RATE_DEFAULT      1000  /* pps */
        +#define UPCALL_BURST_DEFAULT     1000  /* packets */
        +
         #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
                                              | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
                                              | CS_SRC_NAT | CS_DST_NAT)
        @@ -337,6 +347,7 @@ enum dp_stat_type {
             DP_STAT_LOST,               /* Packets not passed up to the client. */
             DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table
                                            hits */
        +    DP_STAT_RATELIMIT_DROP,     /* Packets dropped due to upcall policer */
             DP_N_STATS
         };
        
        @@ -651,6 +662,11 @@ struct dp_netdev_pmd_thread {
                 uint64_t cycles_zero[PMD_N_CYCLES];
                 /* 8 pad bytes. */
             );
        +
        +    PADDED_MEMBERS(CACHE_LINE_SIZE,
        +        /* Policer to rate limit slow-path */
        +        struct token_bucket upcall_tb;
        +    );
         };
        
         /* Interface to netdev-based datapath. */
        @@ -864,12 +880,13 @@ pmd_info_show_stats(struct ds *reply,
             ds_put_format(reply,
                           "\temc hits:%llu\n\tmegaflow hits:%llu\n"
                           "\tavg. subtable lookups per hit:%.2f\n"
        -                  "\tmiss:%llu\n\tlost:%llu\n",
        +                  "\tmiss:%llu\n\tlost:%llu\n\tratelimit drops:%llu\n",
                           stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
                           stats[DP_STAT_MASKED_HIT] > 0
                           ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
                           : 0,
        -                  stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
        +                  stats[DP_STAT_MISS], stats[DP_STAT_LOST],
        +                  stats[DP_STAT_RATELIMIT_DROP]);
        
             if (total_cycles == 0) {
                 return;
        @@ -2996,6 +3013,8 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
                 smap_get_ullong(other_config, "emc-insert-inv-prob",
                                 DEFAULT_EM_FLOW_INSERT_INV_PROB);
             uint32_t insert_min, cur_min;
        +    unsigned int rate, burst;
        +    bool ratelimit;
        
             if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
                 free(dp->pmd_cmask);
        @@ -3021,6 +3040,36 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
                 }
             }
        
        +    /* Handle upcall policer params */
        +    ratelimit = smap_get_bool(other_config, "upcall-rl",
        +                              UPCALL_RATELIMIT_DEFAULT);
        +    rate = smap_get_int(other_config, "upcall-rate",
        +                        UPCALL_RATE_DEFAULT);
        +    burst = smap_get_int(other_config, "upcall-burst",
        +                         UPCALL_BURST_DEFAULT);
        +
        +    if ((rate != upcall_rate) || (burst != upcall_burst)) {
        +        VLOG_INFO("Upcall ratelimit params changed : Old - rate=%u burst=%u "
        +                  ": New - rate=%u burst=%u\n", upcall_rate, upcall_burst,
        +                  rate, burst);
        +
        +        upcall_rate = rate;
        +        upcall_burst = burst;
        +
        +        /*
        +         * TODO: See if there is a way to reconfig only the policer
        +         * in each PMD.
        +         */
        +        dp_netdev_request_reconfigure(dp);
        +    }
        +
        +    if (ratelimit != upcall_ratelimit) {
        +        upcall_ratelimit = ratelimit;
        +
        +        VLOG_INFO("Upcall ratelimit changed to %s\n",
        +                  (upcall_ratelimit ? "Enabled" : "Disabled"));
        +    }
        +
             return 0;
         }
        @@ -3879,6 +3928,12 @@ dpif_netdev_run(struct dpif *dpif)
             ovs_mutex_lock(&dp->port_mutex);
             non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
             if (non_pmd) {
        +        /* Reconfig the upcall policer if params have changed */
        +        if ((upcall_rate != non_pmd->upcall_tb.rate) ||
        +            (upcall_burst != non_pmd->upcall_tb.burst)) {
        +            token_bucket_init(&non_pmd->upcall_tb, upcall_rate, upcall_burst);
        +        }
        +
                 ovs_mutex_lock(&dp->non_pmd_mutex);
                 cycles_count_start(non_pmd);
                 HMAP_FOR_EACH (port, node, &dp->ports) {
        @@ -4074,6 +4129,9 @@ reload:
                 lc = UINT_MAX;
             }
        
        +    /* Initialize upcall policer token bucket with configured params */
        +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
        +
             cycles_count_start(pmd);
             for (;;) {
                 for (i = 0; i < poll_cnt; i++) {
        @@ -4554,6 +4612,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
                 emc_cache_init(&pmd->flow_cache);
                 pmd_alloc_static_tx_qid(pmd);
             }
        +
        +    /* Initialize upcall policer token bucket with configured params */
        +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
        +
             cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
                         hash_int(core_id, 0));
         }
        @@ -4992,7 +5054,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                              struct dp_packet *packet,
                              const struct netdev_flow_key *key,
                              struct ofpbuf *actions, struct ofpbuf *put_actions,
        -                     int *lost_cnt, long long now)
        +                     int *lost_cnt, int *rl_drop_cnt, long long now)
         {
             struct ofpbuf *add_actions;
             struct dp_packet_batch b;
        @@ -5000,6 +5062,18 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
             ovs_u128 ufid;
             int error;
        
        +    /*
        +     * Grab a token from the upcall policer to enter slowpath. If token
        +     * is not available, drop and account the packet. This is to
        +     * rate-limit packets getting into slowpath.
        +     */
        +    if (upcall_ratelimit && !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
        +        dp_packet_delete(packet);
        +        (*rl_drop_cnt)++;
        +
        +        return;
        +    }
        +
             match.tun_md.valid = false;
             miniflow_expand(&key->mf, &match.flow);
        
        @@ -5074,7 +5148,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
             struct dpcls *cls;
             struct dpcls_rule *rules[PKT_ARRAY_SIZE];
             struct dp_netdev *dp = pmd->dp;
        -    int miss_cnt = 0, lost_cnt = 0;
        +    int miss_cnt = 0, lost_cnt = 0, rl_drop_cnt = 0;
             int lookup_cnt = 0, add_lookup_cnt;
             bool any_miss;
             size_t i;
        @@ -5118,7 +5192,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
        
                     miss_cnt++;
                     handle_packet_upcall(pmd, packet, &keys[i], &actions,
        -                                 &put_actions, &lost_cnt, now);
        +                                 &put_actions, &lost_cnt, &rl_drop_cnt, now);
                 }
        
                 ofpbuf_uninit(&actions);
        @@ -5151,6 +5225,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
             dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
             dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
             dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
        +    dp_netdev_count_packet(pmd, DP_STAT_RATELIMIT_DROP, rl_drop_cnt);
         }
        
         /* Packets enter the datapath from a port (or from recirculation) here.
        diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
        index c145e1a..3d86367 100644
        --- a/vswitchd/vswitch.xml
        +++ b/vswitchd/vswitch.xml
        @@ -397,6 +397,53 @@
                 </p>
               </column>
        
        +      <column name="other_config" key="upcall-rl"
        +              type='{"type": "boolean"}'>
        +        <p>
        +          Set this value to <code>true</code> to enable upcall rate-limiting.
        +          The upcall parameters like rate and burst will be ignored, if this is
        +          not set.
        +        </p>
        +        <p>
        +          The default value is <code>false</code> and upcall rate-limiting will
        +          be disabled.
        +        </p>
        +      </column>
        +
        +      <column name="other_config" key="upcall-rate"
        +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
        +        <p>
        +          Specifies the rate of upcalls in packets-per-second that is to be
        +          allowed. For example, if the value is 10000, then that many upcalls
        +          (for packets) are allowed per second in each of the packet polling
        +          threads (PMD or non-PMD).
        +        </p>
        +        <p>
        +          A value of <code>0</code> means no upcalls are allowed, i.e.,
        +          upcalls are disabled. This is mainly for debugging.
        +        </p>
        +        <p>
        +          The default value is 1000.
        +        </p>
        +      </column>
        +
        +      <column name="other_config" key="upcall-burst"
        +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
        +        <p>
        +          Specifies the maximum burst of upcalls, in packets, that is
        +          to be allowed. For example, if the value is 15000, then a maximum
        +          burst of 15000 upcalls (for packets) is allowed at any time in each
        +          of the packet polling threads (PMD or non-PMD).
        +        </p>
        +        <p>
        +          A value of <code>0</code> means no upcalls are allowed, i.e.,
        +          upcalls are disabled. This is mainly for debugging.
        +        </p>
        +        <p>
        +          The default value is 1000.
        +        </p>
        +      </column>
        +
               <column name="other_config" key="vlan-limit"
                       type='{"type": "integer", "minInteger": 0}'>
                 <p>
        --
        1.9.1
        
        _______________________________________________
        dev mailing list
        [email protected]
        https://mail.openvswitch.org/mailman/listinfo/ovs-dev
        
    
    

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
