Hi,

This was one of the items discussed in the last OVS-DPDK public meeting
(29-Nov). Considering that this change is passive (disabled by default and
does not touch the fast-path) and, when enabled, makes OVS more resilient to
potential DoS attacks, could it be considered for OVS 2.9?

In any case, could someone please review the patch?

Thanx
Manu

    On 17/11/17, 11:17 AM, "Manohar Krishnappa Chidambaraswamy" <manohar.krishnappa.chidambarasw...@ericsson.com> wrote:
    
        Hi,
        
        Does anyone have any comments on this add-on to upcalls?
        
        Thanx
        Manu
        
        On 10/11/17, 7:06 PM, "ovs-dev-boun...@openvswitch.org on behalf of Manohar Krishnappa Chidambaraswamy" <ovs-dev-boun...@openvswitch.org on behalf of manohar.krishnappa.chidambarasw...@ericsson.com> wrote:
        
            In OVS-DPDK, both fast-path and slow-path execute in the context
            of a common thread (i.e., the PMD thread), without any
            partitioning of CPU cycles between the two. When a burst of new
            flows arrives at the data-path, packets are punted to the
            slow-path in the order they are received, and the PMD is busy for
            the duration of each upcall. Slow-path processing of a packet
            consumes 100-200 times the cycles of fast-path handling. As a
            result, the forwarding performance of a PMD degrades
            significantly during an upcall burst. If the PMD was already
            highly loaded, it becomes temporarily overloaded and its rx
            queues start filling up. If the upcall burst lasts long enough,
            packets are dropped when the rx queues are full. This happens
            even if the new flows are unexpected and the slow-path decides to
            drop their packets.
            
            It is likely that most of the packets dropped due to rx queue
            overflow belong to established flows that should have been
            processed by the fast-path. Hence, the current OVS-DPDK
            architecture favors the handling of new flows over the forwarding
            of established flows, which is generally a sub-optimal approach.
            
            Without a limit on the rate of upcalls, OVS-DPDK is vulnerable to
            DoS attacks. But even sporadic bursts of, e.g., unexpected
            multicast packets have been shown to cause such packet drops.
            
            Proposed solution:
            ------------------
            This patch implements a mechanism to limit the rate of packets
            going from fast-path into slow-path in OVS-DPDK mode. A simple
            token-bucket policer per packet processing thread (either PMD or
            non-PMD thread) restricts the flow of packets funneling from
            fast-path into slow-path, so that each PMD (or non-PMD) thread
            has a limited amount of cycles carved out for its slow-path.
            This upcall policer allows only a configured number of packets
            per second (pps) into handle_packet_upcall(), which marks the
            start of the slow-path in OVS-DPDK. A packet entering the
            slow-path has to withdraw a token; if no tokens are available,
            the packet is dropped and accounted per-PMD (or per thread) under
            "ovs-appctl dpif-netdev/pmd-stats-show".
            
            pmd thread numa_id 0 core_id 2:
                    emc hits:0
                    megaflow hits:0
                    avg. subtable lookups per hit:0.00
                    miss:287572
                    rate limit drops:xxxxx <<<<<<<<<<<<<
                    lost:0
                    idle cycles:14925072116 (43.81%)
                    processing cycles:19140112904 (56.19%)
                    avg cycles per packet: 118457.93 (34065185020/287572)
                    avg processing cycles per packet: 66557.64 (19140112904/287572)
            NOTE: This is a sample output and may need adaptation based on
            new changes/proposals (if any) in progress.
            
            The upcall policer can be enabled and configured with the
            following parameters in the "Open_vSwitch" table as new items
            under other_config. These values are common for all packet
            processing threads (PMD or non-PMD), with each thread using the
            same configured value independently.
            
            1. ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
                - Global knob to enable/disable upcall rate-limiting for all
                  non-PMD/PMD threads.
            2. ovs-vsctl set Open_vSwitch . other_config:upcall-rate=xxx
                - xxx is in packets per second (pps). This determines the
                  token bucket's fill rate.
            3. ovs-vsctl set Open_vSwitch . other_config:upcall-burst=xxx
                - xxx is the maximum burst of packets allowed at any time.
                  This determines the token bucket's burst size.
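            Taken together, a typical enablement sequence looks like the
            following (the rate and burst values are illustrative, not
            recommendations):

```shell
# Enable the upcall policer globally, then set its fill rate (pps) and
# burst size (packets).  The values here are only examples.
ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000

# Per-thread drops by the policer show up in the PMD stats.
ovs-appctl dpif-netdev/pmd-stats-show
```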
            
            By default, this feature is disabled for backward compatibility
            and needs to be explicitly enabled via the global knob upcall-rl
            shown above. When enabled, default values (derived from typical
            slow-path and fast-path performance measurements) are used during
            init and can be overridden by the knobs above. Configured values
            for rate and burst take effect only when the feature is enabled.
            
            The patch is based on an existing token bucket implementation in
            OVS.
            Signed-off-by: Manohar K C <manohar.krishnappa.chidambarasw...@ericsson.com>
            CC: Jan Scheurich jan.scheur...@ericsson.com
            ---
             Documentation/howto/dpdk.rst | 21 +++++++++++
             lib/dpif-netdev.c            | 85 +++++++++++++++++++++++++++++++++++++++++---
             vswitchd/vswitch.xml         | 47 ++++++++++++++++++++++++
             3 files changed, 148 insertions(+), 5 deletions(-)
            
            diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
            index d123819..2cd0209 100644
            --- a/Documentation/howto/dpdk.rst
            +++ b/Documentation/howto/dpdk.rst
            @@ -709,3 +709,24 @@ devices to bridge ``br0``. Once complete, follow the below steps:
                Check traffic on multiple queues::
            
                    $ cat /proc/interrupts | grep virtio
            +
            +Upcall rate limiting
            +--------------------
            +ovs-vsctl can be used to enable and configure upcall rate limit parameters.
            +There are 2 configurable values ``upcall-rate`` and ``upcall-burst`` which
            +take effect when the global enable knob ``upcall-rl`` is set to true.
            +
            +The upcall rate should be set using ``upcall-rate`` in packets-per-sec. For
            +example::
            +
            +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
            +
            +The upcall burst should be set using ``upcall-burst`` in packets. For
            +example::
            +
            +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
            +
            +The upcall rate limit feature should be globally enabled using ``upcall-rl``. For
            +example::
            +
            +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
            diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
            index 599308d..b26fbbb 100644
            --- a/lib/dpif-netdev.c
            +++ b/lib/dpif-netdev.c
            @@ -100,6 +100,16 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
            
             static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
            
            +/* Upcall rate-limit parameters */
            +static bool upcall_ratelimit;
            +static unsigned int upcall_rate;
            +static unsigned int upcall_burst;
            +
            +/* TODO: Tune these default values based on upcall perf test */
            +#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
            +#define UPCALL_RATE_DEFAULT      1000  /* pps */
            +#define UPCALL_BURST_DEFAULT     1000  /* packets */
            +
             #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
                                                  | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
                                                  | CS_SRC_NAT | CS_DST_NAT)
            @@ -337,6 +347,7 @@ enum dp_stat_type {
                 DP_STAT_LOST,               /* Packets not passed up to the client. */
                 DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table
                                                hits */
            +    DP_STAT_RATELIMIT_DROP,     /* Packets dropped due to upcall policer */
                 DP_N_STATS
             };
            
            @@ -651,6 +662,11 @@ struct dp_netdev_pmd_thread {
                     uint64_t cycles_zero[PMD_N_CYCLES];
                     /* 8 pad bytes. */
                 );
            +
            +    PADDED_MEMBERS(CACHE_LINE_SIZE,
            +        /* Policer to rate limit slow-path */
            +        struct token_bucket upcall_tb;
            +    );
             };
            
             /* Interface to netdev-based datapath. */
            @@ -864,12 +880,13 @@ pmd_info_show_stats(struct ds *reply,
                 ds_put_format(reply,
                               "\temc hits:%llu\n\tmegaflow hits:%llu\n"
                               "\tavg. subtable lookups per hit:%.2f\n"
            -                  "\tmiss:%llu\n\tlost:%llu\n",
            +                  "\tmiss:%llu\n\tlost:%llu\n\tratelimit drops:%llu\n",
                               stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
                               stats[DP_STAT_MASKED_HIT] > 0
                               ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
                               : 0,
            -                  stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
            +                  stats[DP_STAT_MISS], stats[DP_STAT_LOST],
            +                  stats[DP_STAT_RATELIMIT_DROP]);
            
                 if (total_cycles == 0) {
                     return;
            @@ -2996,6 +3013,8 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
                     smap_get_ullong(other_config, "emc-insert-inv-prob",
                                     DEFAULT_EM_FLOW_INSERT_INV_PROB);
                 uint32_t insert_min, cur_min;
            +    unsigned int rate, burst;
            +    bool ratelimit;
            
                 if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
                     free(dp->pmd_cmask);
            @@ -3021,6 +3040,36 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
                     }
                 }
            
            +    /* Handle upcall policer params */
            +    ratelimit = smap_get_bool(other_config, "upcall-rl",
            +                              UPCALL_RATELIMIT_DEFAULT);
            +    rate = smap_get_int(other_config, "upcall-rate",
            +                        UPCALL_RATE_DEFAULT);
            +    burst = smap_get_int(other_config, "upcall-burst",
            +                         UPCALL_BURST_DEFAULT);
            +
            +    if ((rate != upcall_rate) || (burst != upcall_burst)) {
            +        VLOG_INFO("Upcall ratelimit params changed : Old - rate=%d burst=%d "
            +                  ": New - rate=%d burst=%d\n", upcall_rate, upcall_burst,
            +                  rate, burst);
            +
            +        upcall_rate = rate;
            +        upcall_burst = burst;
            +
            +        /*
            +         * TODO: See if there is a way to reconfig only the policer
            +         * in each PMD.
            +         */
            +        dp_netdev_request_reconfigure(dp);
            +    }
            +
            +    if (ratelimit != upcall_ratelimit) {
            +        upcall_ratelimit = ratelimit;
            +
            +        VLOG_INFO("Upcall ratelimit changed to %s\n",
            +                  (upcall_ratelimit ? "Enabled" : "Disabled"));
            +    }
            +
                 return 0;
             }
            @@ -3879,6 +3928,12 @@ dpif_netdev_run(struct dpif *dpif)
                 ovs_mutex_lock(&dp->port_mutex);
                 non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
                 if (non_pmd) {
            +        /* Reconfig the upcall policer if params have changed */
            +        if ((upcall_rate != non_pmd->upcall_tb.rate) ||
            +            (upcall_burst != non_pmd->upcall_tb.burst)) {
            +            token_bucket_init(&non_pmd->upcall_tb, upcall_rate, upcall_burst);
            +        }
            +
                     ovs_mutex_lock(&dp->non_pmd_mutex);
                     cycles_count_start(non_pmd);
                     HMAP_FOR_EACH (port, node, &dp->ports) {
            @@ -4074,6 +4129,9 @@ reload:
                     lc = UINT_MAX;
                 }
            
            +    /* Initialize upcall policer token bucket with configured params */
            +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
            +
                 cycles_count_start(pmd);
                 for (;;) {
                     for (i = 0; i < poll_cnt; i++) {
            @@ -4554,6 +4612,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
                     emc_cache_init(&pmd->flow_cache);
                     pmd_alloc_static_tx_qid(pmd);
                 }
            +
            +    /* Initialize upcall policer token bucket with configured params */
            +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
            +
                 cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
                             hash_int(core_id, 0));
             }
            @@ -4992,7 +5054,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                                  struct dp_packet *packet,
                                  const struct netdev_flow_key *key,
                                  struct ofpbuf *actions, struct ofpbuf *put_actions,
            -                     int *lost_cnt, long long now)
            +                     int *lost_cnt, int *rl_drop_cnt, long long now)
             {
                 struct ofpbuf *add_actions;
                 struct dp_packet_batch b;
            @@ -5000,6 +5062,18 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                 ovs_u128 ufid;
                 int error;
            
            +    /*
            +     * Grab a token from the upcall policer to enter slowpath. If token
            +     * is not available, drop and account the packet. This is to
            +     * rate-limit packets getting into slowpath.
            +     */
            +    if (upcall_ratelimit && !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
            +        dp_packet_delete(packet);
            +        (*rl_drop_cnt)++;
            +
            +        return;
            +    }
            +
                 match.tun_md.valid = false;
                 miniflow_expand(&key->mf, &match.flow);
            
            @@ -5074,7 +5148,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
                 struct dpcls *cls;
                 struct dpcls_rule *rules[PKT_ARRAY_SIZE];
                 struct dp_netdev *dp = pmd->dp;
            -    int miss_cnt = 0, lost_cnt = 0;
            +    int miss_cnt = 0, lost_cnt = 0, rl_drop_cnt = 0;
                 int lookup_cnt = 0, add_lookup_cnt;
                 bool any_miss;
                 size_t i;
            @@ -5118,7 +5192,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
            
                         miss_cnt++;
                         handle_packet_upcall(pmd, packet, &keys[i], &actions,
            -                                 &put_actions, &lost_cnt, now);
            +                                 &put_actions, &lost_cnt, &rl_drop_cnt, now);
                     }
            
                     ofpbuf_uninit(&actions);
            @@ -5151,6 +5225,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
                 dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
                 dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
                 dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
            +    dp_netdev_count_packet(pmd, DP_STAT_RATELIMIT_DROP, rl_drop_cnt);
             }
            
             /* Packets enter the datapath from a port (or from recirculation) here.
            diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
            index c145e1a..3d86367 100644
            --- a/vswitchd/vswitch.xml
            +++ b/vswitchd/vswitch.xml
            @@ -397,6 +397,53 @@
                     </p>
                   </column>
            
            +      <column name="other_config" key="upcall-rl"
            +              type='{"type": "boolean"}'>
            +        <p>
            +          Set this value to <code>true</code> to enable upcall rate-limiting.
            +          The upcall parameters rate and burst are ignored if this is not
            +          set.
            +        </p>
            +        <p>
            +          The default value is <code>false</code>, i.e. upcall rate-limiting
            +          is disabled.
            +        </p>
            +      </column>
            +
            +      <column name="other_config" key="upcall-rate"
            +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
            +        <p>
            +          Specifies the rate of upcalls in packets-per-second that is to be
            +          allowed. For example, if the value is 10000, then that many upcalls
            +          (for packets) are allowed per second in each packet polling
            +          thread (PMD or non-PMD).
            +        </p>
            +        <p>
            +          A value of <code>0</code> means no upcalls are allowed, i.e.
            +          upcalls are disabled. This is mainly for debugging.
            +        </p>
            +        <p>
            +          The default value is 1000.
            +        </p>
            +      </column>
            +
            +      <column name="other_config" key="upcall-burst"
            +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
            +        <p>
            +          Specifies the maximum burst of upcalls, in packets, that is to be
            +          allowed at any time. For example, if the value is 15000, then a
            +          maximum burst of 15000 upcalls (for packets) is allowed in each
            +          packet polling thread (PMD or non-PMD).
            +        </p>
            +        <p>
            +          A value of <code>0</code> means no upcalls are allowed, i.e.
            +          upcalls are disabled. This is mainly for debugging.
            +        </p>
            +        <p>
            +          The default value is 1000.
            +        </p>
            +      </column>
            +
                   <column name="other_config" key="vlan-limit"
                           type='{"type": "integer", "minInteger": 0}'>
                     <p>
            --
            1.9.1
            
            _______________________________________________
            dev mailing list
            d...@openvswitch.org
            https://mail.openvswitch.org/mailman/listinfo/ovs-dev
            
        
        
    
    
