Hi,

This was one of the items discussed in the last OVS-DPDK public meeting
(29-Nov). Considering that this change is passive (disabled by default and
adds nothing to the fast-path) and, when enabled, makes OVS more resilient
to potential DoS attacks, could it be considered for OVS 2.9?

In any case, could someone please review the patch?

Thanx
Manu
On 17/11/17, 11:17 AM, "Manohar Krishnappa Chidambaraswamy"
<[email protected]> wrote:
Hi,
Does anyone have any comments on this add-on to upcalls?
Thanx
Manu
On 10/11/17, 7:06 PM, "[email protected] on behalf of Manohar
Krishnappa Chidambaraswamy" <[email protected] on behalf of
[email protected]> wrote:
In OVS-DPDK, both fast-path and slow-path execute in the context of a common
thread (i.e., the PMD thread), without any partitioning of CPU cycles between
the two. When a burst of new flows comes into the data-path, packets are
punted to the slow-path in the order they are received, and the PMD is busy
for the duration of the upcall. Slow-path processing of a packet consumes
100-200 times the cycles of fast-path handling. As a result, the forwarding
performance of a PMD degrades significantly during an upcall burst. If the
PMD was already highly loaded, it becomes temporarily overloaded and its rx
queues start filling up. If the upcall burst is long enough, packets are
dropped when the rx queues are full. This happens even if the new flows are
unexpected and the slow-path decides to drop the packets.

It is likely that most of the packets dropped due to rx queue overflow belong
to established flows that should have been processed by the fast-path. Hence,
the current OVS-DPDK architecture favors the handling of new flows over the
forwarding of established flows, which is generally a sub-optimal approach.
Without a limit on the rate of upcalls, OVS-DPDK is vulnerable to DoS
attacks, but even sporadic bursts of e.g. unexpected multicast packets have
been shown to cause such packet drops.
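
For a rough sense of scale (assumed round numbers, not measurements): if
fast-path handling costs ~300 cycles per packet and the slow-path 100x that,
a burst of 1000 new flows ties up ~30M cycles, about 10 ms on a 3 GHz core.
At an ingress rate of just 3 Mpps, some 30000 packets arrive in that window,
far more than a typical rx queue of a few thousand descriptors can absorb.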
Proposed solution:
------------------

This patch implements a mechanism to limit the rate of packets going into
the slow-path from the fast-path in OVS-DPDK mode. A simple token bucket
policer per packet processing thread (either PMD or non-PMD thread)
restricts the flow of packets funneling from fast-path into slow-path, as
shown below. So for each PMD (or non-PMD) thread, there is a limited amount
of cycles carved out for its slow-path. This upcall policer allows only a
configured number of packets per second (pps) into handle_packet_upcall(),
which marks the start of the slow-path in OVS-DPDK. A packet entering the
slow-path has to take a token, and if no tokens are available, the packet is
dropped and accounted per-PMD (or thread) under
"ovs-appctl dpif-netdev/pmd-stats-show":
    pmd thread numa_id 0 core_id 2:
            emc hits:0
            megaflow hits:0
            avg. subtable lookups per hit:0.00
            miss:287572
            rate limit drops:xxxxx            <<<<<<<<<<<<<
            lost:0
            idle cycles:14925072116 (43.81%)
            processing cycles:19140112904 (56.19%)
            avg cycles per packet: 118457.93 (34065185020/287572)
            avg processing cycles per packet: 66557.64 (19140112904/287572)

NOTE: This is a sample output and may need adaptation based on new
changes/proposals (if any) in progress.
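
To make the rate/burst semantics concrete, below is a minimal standalone
sketch of such a policer in C. It is illustrative only (the struct and
function names here are made up; the actual patch reuses OVS's existing
token bucket, as noted further down): tokens accrue at the configured fill
rate up to a cap of 'burst', and each packet headed for the slow-path must
withdraw one token or be dropped.

    /* Illustrative token-bucket policer, not the patch's code: tokens
     * refill at 'rate' per second up to 'burst'; one token admits one
     * upcall. */
    #include <stdbool.h>
    #include <stdint.h>

    struct tb {
        unsigned int rate;      /* Fill rate, tokens (packets) per second. */
        unsigned int burst;     /* Bucket depth, maximum saved-up tokens. */
        double tokens;          /* Tokens currently available. */
        uint64_t last_fill_us;  /* Time of the last refill, microseconds. */
    };

    /* Refill for the elapsed time, then try to take one token. Returns
     * false when the packet should be dropped instead of upcalled. */
    static bool
    tb_withdraw_one(struct tb *tb, uint64_t now_us)
    {
        tb->tokens += (now_us - tb->last_fill_us) * (double) tb->rate / 1e6;
        if (tb->tokens > tb->burst) {
            tb->tokens = tb->burst;  /* Cap saved-up credit at 'burst'. */
        }
        tb->last_fill_us = now_us;

        if (tb->tokens >= 1.0) {
            tb->tokens -= 1.0;       /* Token granted: enter slow-path. */
            return true;
        }
        return false;                /* Bucket empty: drop and account. */
    }

With the defaults below (rate=1000, burst=1000), a thread that has been idle
for a second can absorb a one-off burst of 1000 upcalls and sustains 1000
upcalls per second thereafter.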
The upcall policer can be enabled and configured with the following
parameters in the "Open_vSwitch" table, as new items under other_config.
These values are common to all packet processing threads (PMD or non-PMD),
with each thread applying the same configured values independently.

1. ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
   - Global knob to enable/disable upcall rate-limiting for all
     non-PMD/PMD threads.

2. ovs-vsctl set Open_vSwitch . other_config:upcall-rate=xxx
   - xxx is in packets per second (pps). This determines the token
     bucket's fill rate.

3. ovs-vsctl set Open_vSwitch . other_config:upcall-burst=xxx
   - xxx is the maximum burst of packets allowed at any time. This
     determines the token bucket's burst size.

By default, this feature is disabled for backward compatibility and needs to
be explicitly enabled via the global knob upcall-rl shown above. When
enabled, default values (derived from typical slow-path and fast-path
performance measurements) are used during init and can be overridden with
the above knobs. Configured values for rate and burst take effect only when
the feature is enabled.
The patch is based on an existing token bucket implementation in OVS.
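
For reviewers' context, these are the pieces of that existing implementation
(lib/token-bucket.h) that the hunks below rely on; the struct is abridged
here to just the fields the patch reads:

    /* From OVS lib/token-bucket.h (abridged). */
    struct token_bucket {
        unsigned int rate;     /* Configured fill rate. */
        unsigned int burst;    /* Configured burst size. */
        /* ... internal refill state ... */
    };

    /* (Re)arms the bucket with a new rate and burst. */
    void token_bucket_init(struct token_bucket *tb,
                           unsigned int rate, unsigned int burst);

    /* Tries to take 'n' tokens without blocking; returns false when not
     * enough tokens are available, which this patch maps to a drop. */
    bool token_bucket_withdraw(struct token_bucket *tb, unsigned int n);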
Signed-off-by: Manohar K C <[email protected]>
CC: Jan Scheurich <[email protected]>
---
 Documentation/howto/dpdk.rst | 21 +++++++++++
 lib/dpif-netdev.c            | 85 +++++++++++++++++++++++++++++++++++++++++---
 vswitchd/vswitch.xml         | 47 ++++++++++++++++++++++++
 3 files changed, 148 insertions(+), 5 deletions(-)
diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index d123819..2cd0209 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -709,3 +709,24 @@ devices to bridge ``br0``. Once complete, follow the below steps:
 Check traffic on multiple queues::

   $ cat /proc/interrupts | grep virtio
+
+Upcall rate limiting
+--------------------
+ovs-vsctl can be used to enable and configure upcall rate limit parameters.
+There are two configurable values, ``upcall-rate`` and ``upcall-burst``,
+which take effect when the global enable knob ``upcall-rl`` is set to true.
+
+The upcall rate should be set using ``upcall-rate``, in packets-per-second.
+For example::
+
+   $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
+
+The upcall burst should be set using ``upcall-burst``, in packets. For
+example::
+
+   $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
+
+The upcall rate limit feature should be globally enabled using
+``upcall-rl``. For example::
+
+   $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 599308d..b26fbbb 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -100,6 +100,16 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
 static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
+/* Upcall rate-limit parameters */
+static bool upcall_ratelimit;
+static unsigned int upcall_rate;
+static unsigned int upcall_burst;
+
+/* TODO: Tune these default values based on upcall perf test */
+#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
+#define UPCALL_RATE_DEFAULT      1000  /* pps */
+#define UPCALL_BURST_DEFAULT     1000  /* packets */
+
 #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
                                      | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
                                      | CS_SRC_NAT | CS_DST_NAT)
@@ -337,6 +347,7 @@ enum dp_stat_type {
     DP_STAT_LOST,               /* Packets not passed up to the client. */
     DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table
                                    hits */
+    DP_STAT_RATELIMIT_DROP,     /* Packets dropped due to upcall policer */
     DP_N_STATS
 };
@@ -651,6 +662,11 @@ struct dp_netdev_pmd_thread {
         uint64_t cycles_zero[PMD_N_CYCLES];
         /* 8 pad bytes. */
     );
+
+    PADDED_MEMBERS(CACHE_LINE_SIZE,
+        /* Policer to rate limit slow-path */
+        struct token_bucket upcall_tb;
+    );
 };

 /* Interface to netdev-based datapath. */
@@ -864,12 +880,13 @@ pmd_info_show_stats(struct ds *reply,
     ds_put_format(reply,
                   "\temc hits:%llu\n\tmegaflow hits:%llu\n"
                   "\tavg. subtable lookups per hit:%.2f\n"
-                  "\tmiss:%llu\n\tlost:%llu\n",
+                  "\tmiss:%llu\n\tlost:%llu\n\tratelimit drops:%llu\n",
                   stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
                   stats[DP_STAT_MASKED_HIT] > 0
                   ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
                   : 0,
-                  stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
+                  stats[DP_STAT_MISS], stats[DP_STAT_LOST],
+                  stats[DP_STAT_RATELIMIT_DROP]);

     if (total_cycles == 0) {
         return;
@@ -2996,6 +3013,8 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
         smap_get_ullong(other_config, "emc-insert-inv-prob",
                         DEFAULT_EM_FLOW_INSERT_INV_PROB);
     uint32_t insert_min, cur_min;
+    unsigned int rate, burst;
+    bool ratelimit;

     if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
         free(dp->pmd_cmask);
@@ -3021,6 +3040,36 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
         }
     }
+    /* Handle upcall policer params */
+    ratelimit = smap_get_bool(other_config, "upcall-rl",
+                              UPCALL_RATELIMIT_DEFAULT);
+    rate = smap_get_int(other_config, "upcall-rate",
+                        UPCALL_RATE_DEFAULT);
+    burst = smap_get_int(other_config, "upcall-burst",
+                         UPCALL_BURST_DEFAULT);
+
+    if ((rate != upcall_rate) || (burst != upcall_burst)) {
+        VLOG_INFO("Upcall ratelimit params changed : Old - rate=%d burst=%d "
+                  ": New - rate=%d burst=%d\n", upcall_rate, upcall_burst,
+                  rate, burst);
+
+        upcall_rate = rate;
+        upcall_burst = burst;
+
+        /*
+         * TODO: See if there is a way to reconfig only the policer
+         * in each PMD.
+         */
+        dp_netdev_request_reconfigure(dp);
+    }
+
+    if (ratelimit != upcall_ratelimit) {
+        upcall_ratelimit = ratelimit;
+
+        VLOG_INFO("Upcall ratelimit changed to %s\n",
+                  (upcall_ratelimit ? "Enabled" : "Disabled"));
+    }
+
     return 0;
 }
@@ -3879,6 +3928,12 @@ dpif_netdev_run(struct dpif *dpif)
     ovs_mutex_lock(&dp->port_mutex);
     non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
     if (non_pmd) {
+        /* Reconfig the upcall policer if params have changed */
+        if ((upcall_rate != non_pmd->upcall_tb.rate) ||
+            (upcall_burst != non_pmd->upcall_tb.burst)) {
+            token_bucket_init(&non_pmd->upcall_tb, upcall_rate, upcall_burst);
+        }
+
         ovs_mutex_lock(&dp->non_pmd_mutex);
         cycles_count_start(non_pmd);
         HMAP_FOR_EACH (port, node, &dp->ports) {
@@ -4074,6 +4129,9 @@ reload:
         lc = UINT_MAX;
     }

+    /* Initialize upcall policer token bucket with configured params */
+    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
+
     cycles_count_start(pmd);
     for (;;) {
         for (i = 0; i < poll_cnt; i++) {
@@ -4554,6 +4612,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
         emc_cache_init(&pmd->flow_cache);
         pmd_alloc_static_tx_qid(pmd);
     }
+
+    /* Initialize upcall policer token bucket with configured params */
+    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
+
     cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
                 hash_int(core_id, 0));
 }
@@ -4992,7 +5054,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                      struct dp_packet *packet,
                      const struct netdev_flow_key *key,
                      struct ofpbuf *actions, struct ofpbuf *put_actions,
-                     int *lost_cnt, long long now)
+                     int *lost_cnt, int *rl_drop_cnt, long long now)
 {
     struct ofpbuf *add_actions;
     struct dp_packet_batch b;
@@ -5000,6 +5062,18 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
     ovs_u128 ufid;
     int error;

+    /*
+     * Grab a token from the upcall policer to enter the slow-path. If a
+     * token is not available, drop and account the packet. This is to
+     * rate-limit packets getting into the slow-path.
+     */
+    if (upcall_ratelimit && !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
+        dp_packet_delete(packet);
+        (*rl_drop_cnt)++;
+
+        return;
+    }
+
     match.tun_md.valid = false;
     miniflow_expand(&key->mf, &match.flow);
@@ -5074,7 +5148,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
     struct dpcls *cls;
     struct dpcls_rule *rules[PKT_ARRAY_SIZE];
     struct dp_netdev *dp = pmd->dp;
-    int miss_cnt = 0, lost_cnt = 0;
+    int miss_cnt = 0, lost_cnt = 0, rl_drop_cnt = 0;
     int lookup_cnt = 0, add_lookup_cnt;
     bool any_miss;
     size_t i;
@@ -5118,7 +5192,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
             miss_cnt++;
             handle_packet_upcall(pmd, packet, &keys[i], &actions,
-                                 &put_actions, &lost_cnt, now);
+                                 &put_actions, &lost_cnt, &rl_drop_cnt, now);
         }

         ofpbuf_uninit(&actions);
@@ -5151,6 +5225,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
     dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
     dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
     dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
+    dp_netdev_count_packet(pmd, DP_STAT_RATELIMIT_DROP, rl_drop_cnt);
 }

 /* Packets enter the datapath from a port (or from recirculation) here.
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index c145e1a..3d86367 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -397,6 +397,53 @@
         </p>
       </column>

+      <column name="other_config" key="upcall-rl"
+              type='{"type": "boolean"}'>
+        <p>
+          Set this value to <code>true</code> to enable upcall rate-limiting.
+          The upcall parameters rate and burst are ignored if this is not
+          set.
+        </p>
+        <p>
+          The default value is <code>false</code>, i.e. upcall rate-limiting
+          is disabled.
+        </p>
+      </column>
+
+      <column name="other_config" key="upcall-rate"
+              type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
+        <p>
+          Specifies the rate of upcalls to be allowed, in packets-per-second.
+          For example, if the value is 10000, then up to 10000 upcalls (one
+          per packet) are allowed per second in each packet polling thread
+          (PMD or non-PMD).
+        </p>
+        <p>
+          A value of <code>0</code> means no upcalls are allowed, i.e.
+          upcalls are disabled. This is mainly for debugging.
+        </p>
+        <p>
+          The default value is 1000.
+        </p>
+      </column>
+
+      <column name="other_config" key="upcall-burst"
+              type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
+        <p>
+          Specifies the maximum burst of upcalls to be allowed, in packets.
+          For example, if the value is 15000, then a burst of up to 15000
+          upcalls (one per packet) is allowed at any one time in each packet
+          polling thread (PMD or non-PMD).
+        </p>
+        <p>
+          A value of <code>0</code> means no upcalls are allowed, i.e.
+          upcalls are disabled. This is mainly for debugging.
+        </p>
+        <p>
+          The default value is 1000.
+        </p>
+      </column>
+
       <column name="other_config" key="vlan-limit"
               type='{"type": "integer", "minInteger": 0}'>
         <p>
--
1.9.1
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev