Jan,

I have addressed your comments and suggestions. Please take a look and let me
know if you have any further comments.

Signed-off-by: Manohar K C
<manohar.krishnappa.chidambarasw...@ericsson.com>
CC: Jan Scheurich <jan.scheur...@ericsson.com>
---
v3 : v2 rebased to master and adapted to dpif-netdev-perf counters.
     https://patchwork.ozlabs.org/patch/909676/

v2 : v1 rebased to master.
     https://patchwork.ozlabs.org/patch/860687/

v1 : Initial patch for upcall rate-limiter based on token-bucket.
     https://patchwork.ozlabs.org/patch/836737/
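
As a side note for reviewers, the sketch below shows the intended admission
logic of the per-thread upcall policer in isolation.  The patch itself reuses
OVS's existing struct token_bucket via token_bucket_init() and
token_bucket_withdraw(); this standalone version (upcall_policer,
policer_admit and the values in main() are made up for illustration) only
demonstrates the rate/burst semantics behind upcall-rate and upcall-burst:

    /*
     * Standalone illustration only: the patch reuses OVS's struct
     * token_bucket via token_bucket_init() and token_bucket_withdraw().
     * The names below are invented for this sketch.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct upcall_policer {
        uint32_t rate;     /* Tokens (upcalls) replenished per second. */
        uint32_t burst;    /* Maximum tokens the bucket can hold. */
        uint64_t tokens;   /* Tokens currently available. */
        uint64_t last_ms;  /* Time of the last refill, in milliseconds. */
    };

    static void
    policer_init(struct upcall_policer *p, uint32_t rate, uint32_t burst,
                 uint64_t now_ms)
    {
        p->rate = rate;
        p->burst = burst;
        p->tokens = burst;   /* Start full so an initial burst is admitted. */
        p->last_ms = now_ms;
    }

    /* Returns true if one upcall may enter the slow path at 'now_ms'.
     * On false the caller drops the packet and bumps the RL drop counter. */
    static bool
    policer_admit(struct upcall_policer *p, uint64_t now_ms)
    {
        uint64_t refill = (now_ms - p->last_ms) * p->rate / 1000;

        if (refill > 0) {
            p->tokens = p->tokens + refill > p->burst
                        ? p->burst : p->tokens + refill;
            p->last_ms = now_ms;
        }
        if (p->tokens > 0) {
            p->tokens--;
            return true;
        }
        return false;
    }

    int
    main(void)
    {
        struct upcall_policer p;
        int admitted = 0, dropped = 0;
        int i;

        /* rate=500/s, burst=500: 1000 simultaneous new flows admit roughly
         * the first 500 upcalls and drop the rest. */
        policer_init(&p, 500, 500, 0);
        for (i = 0; i < 1000; i++) {
            if (policer_admit(&p, 0)) {
                admitted++;
            } else {
                dropped++;
            }
        }
        printf("admitted=%d dropped=%d\n", admitted, dropped);
        return 0;
    }

Note that the rate and burst apply per PMD (and non-PMD) thread rather than
via one shared bucket, so the aggregate upcall budget of the datapath scales
with the number of threads executing upcalls.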

 Documentation/howto/dpdk.rst |  53 ++++++++++++++++++
 lib/dpif-netdev-perf.h       |   1 +
 lib/dpif-netdev.c            | 130 ++++++++++++++++++++++++++++++++++++++++---
 lib/dpif.h                   |   1 +
 vswitchd/vswitch.xml         |  42 ++++++++++++++
 5 files changed, 220 insertions(+), 7 deletions(-)
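
For verification, once rate limiting is enabled and a burst of new flows is
sent through the datapath, the counter added to pmd_info_show_stats() is
expected to appear in the existing PMD stats output (abbreviated below; the
surrounding lines are elided):

    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
    $ ovs-appctl dpif-netdev/pmd-stats-show
    ...
            drops due to upcall ratelimiter: <non-zero under overload>
    ...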

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 380181d..b717804 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -358,6 +358,59 @@ devices to bridge ``br0``. Once complete, follow the below steps:
 
        $ cat /proc/interrupts | grep virtio
 
+Upcall rate limiting
+--------------------
+In OVS, when an incoming packet does not match any flow in the flow
+table maintained in the fast-path (either in userspace or in the kernel),
+the packet is processed in the slow-path via a mechanism known as an
+upcall. In OVS-DPDK, both the fast-path and the slow-path execute in the
+context of a common poll-mode-driver (PMD) thread, without any
+partitioning of CPU cycles between the two.
+
+When there is a burst of new flows coming into the data-path, packets
+are punted to the slow-path in the order they are received and the PMD
+is busy for the duration of the upcall. Slow-path processing of a packet
+consumes 100-200 times the cycles of fast-path handling. As a result,
+the forwarding performance of a PMD degrades significantly during an
+upcall burst. If the PMD was already highly loaded, it becomes temporarily
+overloaded and its rx queues start filling up. If the upcall burst lasts
+long enough, packets are dropped once the rx queues are full. This happens
+even if the new flows are unexpected and the slow-path decides to drop
+the packets.
+
+It is likely that most of the packets dropped due to rx queue overflow
+belong to established flows that should have been processed by the
+fast-path. Hence, the current OVS-DPDK architecture favors the handling
+of new flows over the forwarding of established flows. This is generally
+a sub-optimal approach.
+
+Without a limit on the rate of upcalls, OVS-DPDK is vulnerable to DoS
+attacks. Even sporadic bursts of, for example, unexpected multicast
+packets have been shown to cause such packet drops.
+
+ovs-vsctl can be used to enable and configure the upcall rate-limit
+parameters. There are two configurable parameters, ``upcall-rate`` and
+``upcall-burst``, which take effect when ``upcall-rl`` is set to ``true``.
+
+Upcall rate-limiting is applied per thread. The configured values for
+``upcall-rate`` and ``upcall-burst`` are used independently by each PMD
+(and non-PMD) thread that executes upcalls.
+
+The upcall rate is set using ``upcall-rate``, in upcalls per second. If not
+configured, a default of 500 upcalls per second is used. For example::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
+
+The upcall burst size is set using ``upcall-burst``, in upcalls. If not
+configured, a default burst of 500 upcalls is used. For example::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
+
+The upcall rate-limit feature must be enabled globally using ``upcall-rl``. For
+example::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
+
 Further Reading
 ---------------
 
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 5993c25..e4b945a 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -55,6 +55,7 @@ enum pmd_stat_type {
                              * number of packet passes through the datapath
                              * pipeline and should not be overlapping with each
                              * other. */
+    PMD_STAT_RL_DROP,       /* Packets dropped due to upcall rate-limit. */
     PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
                                hits. Each MASKED_HIT hit will have >= 1
                                MASKED_LOOKUP(s). */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index be31fd0..dba8629 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -101,6 +101,11 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
 
 static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
 
+/* These default values may be tuned based on upcall performance */
+#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
+#define UPCALL_RATE_DEFAULT      500  /* upcalls-per-sec */
+#define UPCALL_BURST_DEFAULT     500  /* maximum burst of upcalls */
+
 #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
                                      | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
                                      | CS_SRC_NAT | CS_DST_NAT)
@@ -316,6 +321,12 @@ struct dp_netdev {
     uint64_t last_tnl_conf_seq;
 
     struct conntrack conntrack;
+
+    /* Upcall rate-limiter parameters */
+    atomic_bool upcall_ratelimit;
+    atomic_uint32_t upcall_rate;
+    atomic_uint32_t upcall_burst;
+    bool upcall_params_changed;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -615,6 +626,11 @@ struct dp_netdev_pmd_thread {
     /* Keep track of detailed PMD performance statistics. */
     struct pmd_perf_stats perf_stats;
 
+    /* Policer to rate limit upcalls */
+    struct token_bucket upcall_tb;  /* Cache of rate and burst parameters. */
+    bool upcall_ratelimit;          /* Cache of global enable/disable
+                                       parameter. */
+
     /* Set to true if the pmd thread needs to be reloaded. */
     bool need_reload;
 };
@@ -856,12 +872,13 @@ pmd_info_show_stats(struct ds *reply,
                   "\tavg. subtable lookups per megaflow hit: %.02f\n"
                   "\tmiss with success upcall: %"PRIu64"\n"
                   "\tmiss with failed upcall: %"PRIu64"\n"
+                  "\tdrops due to upcall ratelimiter: %"PRIu64"\n"
                   "\tavg. packets per output batch: %.02f\n",
                   total_packets, stats[PMD_STAT_RECIRC],
                   passes_per_pkt, stats[PMD_STAT_EXACT_HIT],
                   stats[PMD_STAT_MASKED_HIT], lookups_per_hit,
                   stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
-                  packets_per_batch);
+                  stats[PMD_STAT_RL_DROP], packets_per_batch);
 
     if (total_cycles == 0) {
         return;
@@ -1479,6 +1496,7 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
         stats->n_hit += pmd_stats[PMD_STAT_MASKED_HIT];
         stats->n_missed += pmd_stats[PMD_STAT_MISS];
         stats->n_lost += pmd_stats[PMD_STAT_LOST];
+        stats->n_rl_dropped += pmd_stats[PMD_STAT_RL_DROP];
     }
     stats->n_masks = UINT32_MAX;
     stats->n_mask_hit = UINT64_MAX;
@@ -1489,11 +1507,24 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
 static void
 dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd)
 {
+    uint32_t rate, burst;
+
     if (pmd->core_id == NON_PMD_CORE_ID) {
         ovs_mutex_lock(&pmd->dp->non_pmd_mutex);
         ovs_mutex_lock(&pmd->port_mutex);
         pmd_load_cached_ports(pmd);
         ovs_mutex_unlock(&pmd->port_mutex);
+
+        /* Reconfig the upcall policer if params have changed */
+        atomic_read_relaxed(&pmd->dp->upcall_rate, &rate);
+        atomic_read_relaxed(&pmd->dp->upcall_burst, &burst);
+        if ((rate != pmd->upcall_tb.rate) ||
+            (burst != pmd->upcall_tb.burst)) {
+            token_bucket_init(&pmd->upcall_tb, rate, burst);
+        }
+        atomic_read_relaxed(&pmd->dp->upcall_ratelimit,
+                            &pmd->upcall_ratelimit);
+
         ovs_mutex_unlock(&pmd->dp->non_pmd_mutex);
         return;
     }
@@ -2987,6 +3018,9 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
                         DEFAULT_EM_FLOW_INSERT_INV_PROB);
     uint32_t insert_min, cur_min;
     uint32_t tx_flush_interval, cur_tx_flush_interval;
+    uint32_t rate, cur_rate;
+    uint32_t burst, cur_burst;
+    bool ratelimit, cur_ratelimit;
 
     tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
                                      DEFAULT_TX_FLUSH_INTERVAL);
@@ -3021,6 +3055,44 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
         }
     }
 
+    /* Handle upcall policer params */
+    ratelimit = smap_get_bool(other_config, "upcall-rl",
+                              UPCALL_RATELIMIT_DEFAULT);
+    rate = smap_get_int(other_config, "upcall-rate",
+                        UPCALL_RATE_DEFAULT);
+    burst = smap_get_int(other_config, "upcall-burst",
+                         UPCALL_BURST_DEFAULT);
+
+    atomic_read_relaxed(&dp->upcall_ratelimit, &cur_ratelimit);
+    atomic_read_relaxed(&dp->upcall_rate, &cur_rate);
+    atomic_read_relaxed(&dp->upcall_burst, &cur_burst);
+
+    if ((rate != cur_rate) || (burst != cur_burst)) {
+        VLOG_INFO("Upcall ratelimiter params changed : Old - rate=%d burst=%d "
+                  ": New - rate=%d burst=%d\n", cur_rate, cur_burst,
+                  rate, burst);
+
+        atomic_store_relaxed(&dp->upcall_rate, rate);
+        atomic_store_relaxed(&dp->upcall_burst, burst);
+
+        dp->upcall_params_changed = true;
+    }
+
+    if (ratelimit != cur_ratelimit) {
+        VLOG_INFO("Upcall ratelimiter changed to %s\n",
+                  (ratelimit ? "Enabled" : "Disabled"));
+        VLOG_INFO("Upcall ratelimiter params : rate=%d burst=%d\n",
+                  rate, burst);
+
+        atomic_store_relaxed(&dp->upcall_ratelimit, ratelimit);
+
+        dp->upcall_params_changed = true;
+    }
+
+    if (dp->upcall_params_changed) {
+        dp_netdev_request_reconfigure(dp);
+    }
+
     return 0;
 }
 
@@ -3653,6 +3725,7 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
     struct hmapx_node *node;
     bool changed = false;
     bool need_to_adjust_static_tx_qids = false;
+    bool need_to_reinit_upcall_ratelimiter = false;
 
     /* The pmd threads should be started only if there's a pmd port in the
      * datapath.  If the user didn't provide any "pmd-cpu-mask", we start
@@ -3674,6 +3747,13 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
         need_to_adjust_static_tx_qids = true;
     }
 
+    /* Check if upcall ratelimiter parameters have changed */
+    if (dp->upcall_params_changed) {
+        need_to_reinit_upcall_ratelimiter = true;
+
+        dp->upcall_params_changed = false;
+    }
+
     /* Check for unwanted pmd threads */
     CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
         if (pmd->core_id == NON_PMD_CORE_ID) {
@@ -3682,7 +3762,8 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
         if (!ovs_numa_dump_contains_core(pmd_cores, pmd->numa_id,
                                                     pmd->core_id)) {
             hmapx_add(&to_delete, pmd);
-        } else if (need_to_adjust_static_tx_qids) {
+        } else if (need_to_adjust_static_tx_qids ||
+                   need_to_reinit_upcall_ratelimiter) {
             pmd->need_reload = true;
         }
     }
@@ -4106,6 +4187,7 @@ pmd_thread_main(void *f_)
     int poll_cnt;
     int i;
     int process_packets = 0;
+    uint32_t rate, burst;
 
     poll_list = NULL;
 
@@ -4135,6 +4217,15 @@ reload:
         lc = UINT_MAX;
     }
 
+    /* Reconfig upcall policer token bucket with configured params. */
+    atomic_read_relaxed(&pmd->dp->upcall_rate, &rate);
+    atomic_read_relaxed(&pmd->dp->upcall_burst, &burst);
+    if ((rate != pmd->upcall_tb.rate) ||
+        (burst != pmd->upcall_tb.burst)) {
+        token_bucket_init(&pmd->upcall_tb, rate, burst);
+    }
+    atomic_read_relaxed(&pmd->dp->upcall_ratelimit, &pmd->upcall_ratelimit);
+
     pmd->intrvl_tsc_prev = 0;
     atomic_store_relaxed(&pmd->intrvl_cycles, 0);
     cycles_counter_update(s);
@@ -4596,6 +4687,8 @@ static void
 dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
                         unsigned core_id, int numa_id)
 {
+    uint32_t rate, burst;
+
     pmd->dp = dp;
     pmd->core_id = core_id;
     pmd->numa_id = numa_id;
@@ -4627,6 +4720,13 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
         emc_cache_init(&pmd->flow_cache);
         pmd_alloc_static_tx_qid(pmd);
     }
+
+    /* Initialize upcall policer token bucket with configured params */
+    atomic_read_relaxed(&dp->upcall_rate, &rate);
+    atomic_read_relaxed(&dp->upcall_burst, &burst);
+    token_bucket_init(&pmd->upcall_tb, rate, burst);
+    atomic_read_relaxed(&dp->upcall_ratelimit, &pmd->upcall_ratelimit);
+
     pmd_perf_stats_init(&pmd->perf_stats);
     cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
                 hash_int(core_id, 0));
@@ -5145,6 +5245,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
     struct dpcls_rule *rules[PKT_ARRAY_SIZE];
     struct dp_netdev *dp = pmd->dp;
     int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
+    int upcall_rl_drop_cnt = 0;
     int lookup_cnt = 0, add_lookup_cnt;
     bool any_miss;
 
@@ -5185,13 +5286,26 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
                 continue;
             }
 
-            int error = handle_packet_upcall(pmd, packet, &keys[i],
-                                             &actions, &put_actions);
+            /*
+             * Grab a token from the upcall policer to enter slowpath. If token
+             * is not available, drop and account the packet. This is to
+             * rate-limit packets getting into slowpath.
+             */
+            if (pmd->upcall_ratelimit &&
+                !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
+                dp_packet_delete(packet);
 
-            if (OVS_UNLIKELY(error)) {
-                upcall_fail_cnt++;
+                upcall_rl_drop_cnt++;
             } else {
-                upcall_ok_cnt++;
+
+                int error = handle_packet_upcall(pmd, packet, &keys[i],
+                                                 &actions, &put_actions);
+
+                if (OVS_UNLIKELY(error)) {
+                    upcall_fail_cnt++;
+                } else {
+                    upcall_ok_cnt++;
+                }
             }
         }
 
@@ -5228,6 +5342,8 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
                             upcall_ok_cnt);
     pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
                             upcall_fail_cnt);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RL_DROP,
+                            upcall_rl_drop_cnt);
 }
 
 /* Packets enter the datapath from a port (or from recirculation) here.
diff --git a/lib/dpif.h b/lib/dpif.h
index 94f89ec..708e798 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -438,6 +438,7 @@ struct dpif_dp_stats {
     uint64_t n_hit;             /* Number of flow table matches. */
     uint64_t n_missed;          /* Number of flow table misses. */
     uint64_t n_lost;            /* Number of misses not sent to userspace. */
+    uint64_t n_rl_dropped;      /* Number of drops due to upcall rate-limit. */
     uint64_t n_flows;           /* Number of flows present. */
     uint64_t n_mask_hit;        /* Number of mega flow masks visited for
                                    flow table matches. */
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 94a64dd..858ba4e 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -428,6 +428,48 @@
         </p>
       </column>
 
+      <column name="other_config" key="upcall-rl"
+              type='{"type": "boolean"}'>
+        <p>
+          Set this value to <code>true</code> to enable upcall rate-limiting.
+          The <code>upcall-rate</code> and <code>upcall-burst</code> parameters
+          are ignored if this is not set.
+        </p>
+        <p>
+          The default value is <code>false</code>, i.e. upcall rate-limiting
+          is disabled.
+        </p>
+      </column>
+
+      <column name="other_config" key="upcall-rate"
+        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
+        <p>
+          When upcall rate limiting is enabled, this specifies the
+          sustained rate of upcalls, in upcalls per second, that are
+          admitted per packet-polling thread (PMD or non-PMD) in the
+          netdev datapath. The default value is 500 upcalls per
+          second.
+
+          Setting both upcall-rate and upcall-burst to <code>0</code> disables
+          upcalls in the netdev datapath. This can be used for debugging
+          purposes.
+        </p>
+      </column>
+
+      <column name="other_config" key="upcall-burst"
+        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
+        <p>
+          When upcall rate limiting is enabled, this specifies the maximum
+          burst of upcalls that are admitted per packet-polling thread
+          (PMD or non-PMD) in the netdev datapath, independently of the
+          packet arrival rate. The default value is a burst of 500 upcalls.
+
+          Setting both upcall-rate and upcall-burst to <code>0</code> disables
+          upcalls in the netdev datapath. This can be used for debugging
+          purposes.
+        </p>
+      </column>
+
       <column name="other_config" key="vlan-limit"
               type='{"type": "integer", "minInteger": 0}'>
         <p>
-- 
1.9.1

