Jan, I have addressed your comments/suggestions. Please take a look and let me know if you have any further comments.
Signed-off-by: Manohar K C <manohar.krishnappa.chidambarasw...@ericsson.com>
CC: Jan Scheurich <jan.scheur...@ericsson.com>
---
v3 : v2 rebased to master and adapted to dpif-netdev-perf counters.
     https://patchwork.ozlabs.org/patch/909676/
v2 : v1 rebased to master.
     https://patchwork.ozlabs.org/patch/860687/
v1 : Initial patch for upcall rate-limiter based on token-bucket.
     https://patchwork.ozlabs.org/patch/836737/

 Documentation/howto/dpdk.rst |  53 ++++++++++++++++++
 lib/dpif-netdev-perf.h       |   1 +
 lib/dpif-netdev.c            | 130 ++++++++++++++++++++++++++++++++++++++++---
 lib/dpif.h                   |   1 +
 vswitchd/vswitch.xml         |  42 ++++++++++++++
 5 files changed, 220 insertions(+), 7 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 380181d..b717804 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -358,6 +358,59 @@ devices to bridge ``br0``. Once complete, follow the below steps:

     $ cat /proc/interrupts | grep virtio

+Upcall rate limiting
+--------------------
+In OVS, when an incoming packet does not match any flow in the flow
+table maintained in the fast-path (either in userspace or kernel), the
+packet is processed in the slow-path via a mechanism known as upcall.
+In OVS-DPDK, both fast-path and slow-path execute in the context of a
+common poll-mode-driver (PMD) thread, without any partitioning of CPU
+cycles between the two.
+
+When there is a burst of new flows coming into the data-path, packets
+are punted to the slow-path in the order they are received, and the PMD
+is busy for the duration of each upcall. Slow-path processing of a
+packet consumes 100-200 times the cycles of fast-path handling. As a
+result, the forwarding performance of a PMD degrades significantly
+during an upcall burst. If the PMD was already highly loaded, it becomes
+temporarily overloaded and its rx queues start filling up. If the upcall
+burst is long enough, packets will be dropped when rx queues are full.
+This happens even if the new flows are unexpected and the slow-path
+decides to drop the packets.
+
+It is likely that most of the packets dropped due to rx queue overflow
+belong to established flows that should have been processed by the
+fast-path. Hence, the current OVS-DPDK architecture favors the handling
+of new flows over the forwarding of established flows. This is generally
+a sub-optimal approach.
+
+Without a limit on the rate of upcalls, OVS-DPDK is vulnerable to DoS
+attacks. But even sporadic bursts of e.g. unexpected multicast packets
+have been shown to cause such packet drops.
+
+ovs-vsctl can be used to enable and configure the upcall rate limit
+parameters. There are two configurable parameters, ``upcall-rate`` and
+``upcall-burst``, which take effect when the configuration parameter
+``upcall-rl`` is set to true.
+
+Upcall rate limiting is applied per thread. The configured values of
+``upcall-rate`` and ``upcall-burst`` are used independently in each PMD
+(and non-PMD) thread that executes upcalls.
+
+The upcall rate should be set using ``upcall-rate``, in upcalls per
+second. If not configured, a default value of 500 upcalls per second is
+used. For example::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
+
+The upcall burst size should be set using ``upcall-burst``, in upcalls.
+If not configured, a default value of 500 upcalls is used. For example::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
+
+The upcall rate limiting feature should be globally enabled using
+``upcall-rl``. For example::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
+
 Further Reading
 ---------------

diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 5993c25..e4b945a 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -55,6 +55,7 @@ enum pmd_stat_type {
      * number of packet passes through the datapath
      * pipeline and should not be overlapping with each
      * other.
      */
+    PMD_STAT_RL_DROP,       /* Packets dropped due to upcall rate-limit. */
     PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
                                hits. Each MASKED_HIT hit will have
                                >= 1 MASKED_LOOKUP(s). */

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index be31fd0..dba8629 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -101,6 +101,11 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
 static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);

+/* These default values may be tuned based on upcall performance. */
+#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default. */
+#define UPCALL_RATE_DEFAULT 500        /* Upcalls per second. */
+#define UPCALL_BURST_DEFAULT 500       /* Maximum burst of upcalls. */
+
 #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
                                      | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
                                      | CS_SRC_NAT | CS_DST_NAT)
@@ -316,6 +321,12 @@ struct dp_netdev {
     uint64_t last_tnl_conf_seq;

     struct conntrack conntrack;
+
+    /* Upcall rate-limiter parameters. */
+    atomic_bool upcall_ratelimit;
+    atomic_uint32_t upcall_rate;
+    atomic_uint32_t upcall_burst;
+    bool upcall_params_changed;
 };

 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -615,6 +626,11 @@ struct dp_netdev_pmd_thread {
     /* Keep track of detailed PMD performance statistics. */
     struct pmd_perf_stats perf_stats;

+    /* Policer to rate limit upcalls. */
+    struct token_bucket upcall_tb; /* Cache of rate and burst parameters. */
+    bool upcall_ratelimit;         /* Cache of global enable/disable
+                                    * parameter. */
+
     /* Set to true if the pmd thread needs to be reloaded. */
     bool need_reload;
 };
@@ -856,12 +872,13 @@ pmd_info_show_stats(struct ds *reply,
                   "\tavg. subtable lookups per megaflow hit: %.02f\n"
                   "\tmiss with success upcall: %"PRIu64"\n"
                   "\tmiss with failed upcall: %"PRIu64"\n"
+                  "\tdrops due to upcall ratelimiter: %"PRIu64"\n"
                   "\tavg. packets per output batch: %.02f\n",
                   total_packets, stats[PMD_STAT_RECIRC],
                   passes_per_pkt, stats[PMD_STAT_EXACT_HIT],
                   stats[PMD_STAT_MASKED_HIT], lookups_per_hit,
                   stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
-                  packets_per_batch);
+                  stats[PMD_STAT_RL_DROP], packets_per_batch);

     if (total_cycles == 0) {
         return;
@@ -1479,6 +1496,7 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
         stats->n_hit += pmd_stats[PMD_STAT_MASKED_HIT];
         stats->n_missed += pmd_stats[PMD_STAT_MISS];
         stats->n_lost += pmd_stats[PMD_STAT_LOST];
+        stats->n_rl_dropped += pmd_stats[PMD_STAT_RL_DROP];
     }
     stats->n_masks = UINT32_MAX;
     stats->n_mask_hit = UINT64_MAX;
@@ -1489,11 +1507,24 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
 static void
 dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd)
 {
+    uint32_t rate, burst;
+
     if (pmd->core_id == NON_PMD_CORE_ID) {
         ovs_mutex_lock(&pmd->dp->non_pmd_mutex);
         ovs_mutex_lock(&pmd->port_mutex);
         pmd_load_cached_ports(pmd);
         ovs_mutex_unlock(&pmd->port_mutex);
+
+        /* Reconfigure the upcall policer if the parameters have changed. */
+        atomic_read_relaxed(&pmd->dp->upcall_rate, &rate);
+        atomic_read_relaxed(&pmd->dp->upcall_burst, &burst);
+        if ((rate != pmd->upcall_tb.rate) ||
+            (burst != pmd->upcall_tb.burst)) {
+            token_bucket_init(&pmd->upcall_tb, rate, burst);
+        }
+        atomic_read_relaxed(&pmd->dp->upcall_ratelimit,
+                            &pmd->upcall_ratelimit);
+
         ovs_mutex_unlock(&pmd->dp->non_pmd_mutex);
         return;
     }
@@ -2987,6 +3018,9 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
                                      DEFAULT_EM_FLOW_INSERT_INV_PROB);
     uint32_t insert_min, cur_min;
     uint32_t tx_flush_interval, cur_tx_flush_interval;
+    uint32_t rate, cur_rate;
+    uint32_t burst, cur_burst;
+    bool ratelimit, cur_ratelimit;

     tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
                                      DEFAULT_TX_FLUSH_INTERVAL);
@@ -3021,6 +3055,44 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
         }
     }

+    /* Handle the upcall policer parameters.
+     */
+    ratelimit = smap_get_bool(other_config, "upcall-rl",
+                              UPCALL_RATELIMIT_DEFAULT);
+    rate = smap_get_int(other_config, "upcall-rate",
+                        UPCALL_RATE_DEFAULT);
+    burst = smap_get_int(other_config, "upcall-burst",
+                         UPCALL_BURST_DEFAULT);
+
+    atomic_read_relaxed(&dp->upcall_ratelimit, &cur_ratelimit);
+    atomic_read_relaxed(&dp->upcall_rate, &cur_rate);
+    atomic_read_relaxed(&dp->upcall_burst, &cur_burst);
+
+    if ((rate != cur_rate) || (burst != cur_burst)) {
+        VLOG_INFO("Upcall ratelimiter params changed : Old - rate=%d burst=%d "
+                  ": New - rate=%d burst=%d\n", cur_rate, cur_burst,
+                  rate, burst);
+
+        atomic_store_relaxed(&dp->upcall_rate, rate);
+        atomic_store_relaxed(&dp->upcall_burst, burst);
+
+        dp->upcall_params_changed = true;
+    }
+
+    if (ratelimit != cur_ratelimit) {
+        VLOG_INFO("Upcall ratelimiter changed to %s\n",
+                  (ratelimit ? "Enabled" : "Disabled"));
+        VLOG_INFO("Upcall ratelimiter params : rate=%d burst=%d\n",
+                  rate, burst);
+
+        atomic_store_relaxed(&dp->upcall_ratelimit, ratelimit);
+
+        dp->upcall_params_changed = true;
+    }
+
+    if (dp->upcall_params_changed) {
+        dp_netdev_request_reconfigure(dp);
+    }
+
     return 0;
 }

@@ -3653,6 +3725,7 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
     struct hmapx_node *node;
     bool changed = false;
     bool need_to_adjust_static_tx_qids = false;
+    bool need_to_reinit_upcall_ratelimiter = false;

     /* The pmd threads should be started only if there's a pmd port in the
      * datapath.  If the user didn't provide any "pmd-cpu-mask", we start
@@ -3674,6 +3747,13 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
         need_to_adjust_static_tx_qids = true;
     }

+    /* Check if the upcall ratelimiter parameters have changed. */
+    if (dp->upcall_params_changed) {
+        need_to_reinit_upcall_ratelimiter = true;
+
+        dp->upcall_params_changed = false;
+    }
+
     /* Check for unwanted pmd threads */
     CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
         if (pmd->core_id == NON_PMD_CORE_ID) {
@@ -3682,7 +3762,8 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
         if (!ovs_numa_dump_contains_core(pmd_cores, pmd->numa_id,
                                          pmd->core_id)) {
             hmapx_add(&to_delete, pmd);
-        } else if (need_to_adjust_static_tx_qids) {
+        } else if (need_to_adjust_static_tx_qids ||
+                   need_to_reinit_upcall_ratelimiter) {
             pmd->need_reload = true;
         }
     }
@@ -4106,6 +4187,7 @@ pmd_thread_main(void *f_)
     int poll_cnt;
     int i;
     int process_packets = 0;
+    uint32_t rate, burst;

     poll_list = NULL;

@@ -4135,6 +4217,15 @@ reload:
         lc = UINT_MAX;
     }

+    /* Reconfigure the upcall policer token bucket with configured parameters.
+     */
+    atomic_read_relaxed(&pmd->dp->upcall_rate, &rate);
+    atomic_read_relaxed(&pmd->dp->upcall_burst, &burst);
+    if ((rate != pmd->upcall_tb.rate) ||
+        (burst != pmd->upcall_tb.burst)) {
+        token_bucket_init(&pmd->upcall_tb, rate, burst);
+    }
+    atomic_read_relaxed(&pmd->dp->upcall_ratelimit, &pmd->upcall_ratelimit);
+
     pmd->intrvl_tsc_prev = 0;
     atomic_store_relaxed(&pmd->intrvl_cycles, 0);
     cycles_counter_update(s);
@@ -4596,6 +4687,8 @@ static void
 dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
                         unsigned core_id, int numa_id)
 {
+    uint32_t rate, burst;
+
     pmd->dp = dp;
     pmd->core_id = core_id;
     pmd->numa_id = numa_id;
@@ -4627,6 +4720,13 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
         emc_cache_init(&pmd->flow_cache);
         pmd_alloc_static_tx_qid(pmd);
     }
+
+    /* Initialize the upcall policer token bucket with configured params. */
+    atomic_read_relaxed(&dp->upcall_rate, &rate);
+    atomic_read_relaxed(&dp->upcall_burst, &burst);
+    token_bucket_init(&pmd->upcall_tb, rate, burst);
+    atomic_read_relaxed(&dp->upcall_ratelimit, &pmd->upcall_ratelimit);
+
     pmd_perf_stats_init(&pmd->perf_stats);
     cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
                 hash_int(core_id, 0));
@@ -5145,6 +5245,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
     struct dpcls_rule *rules[PKT_ARRAY_SIZE];
     struct dp_netdev *dp = pmd->dp;
     int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
+    int upcall_rl_drop_cnt = 0;
     int lookup_cnt = 0, add_lookup_cnt;
     bool any_miss;

@@ -5185,13 +5286,26 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
             continue;
         }

-        int error = handle_packet_upcall(pmd, packet, &keys[i],
-                                         &actions, &put_actions);
+        /*
+         * Grab a token from the upcall policer to enter the slow-path. If
+         * a token is not available, drop and account the packet. This is
+         * to rate-limit packets getting into the slow-path.
+         */
+        if (pmd->upcall_ratelimit &&
+            !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
+            dp_packet_delete(packet);

-        if (OVS_UNLIKELY(error)) {
-            upcall_fail_cnt++;
+            upcall_rl_drop_cnt++;
         } else {
-            upcall_ok_cnt++;
+
+            int error = handle_packet_upcall(pmd, packet, &keys[i],
+                                             &actions, &put_actions);
+
+            if (OVS_UNLIKELY(error)) {
+                upcall_fail_cnt++;
+            } else {
+                upcall_ok_cnt++;
+            }
         }
     }

@@ -5228,6 +5342,8 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
                             upcall_ok_cnt);
     pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
                             upcall_fail_cnt);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RL_DROP,
+                            upcall_rl_drop_cnt);
 }

 /* Packets enter the datapath from a port (or from recirculation) here.

diff --git a/lib/dpif.h b/lib/dpif.h
index 94f89ec..708e798 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -438,6 +438,7 @@ struct dpif_dp_stats {
     uint64_t n_hit;        /* Number of flow table matches. */
     uint64_t n_missed;     /* Number of flow table misses. */
     uint64_t n_lost;       /* Number of misses not sent to userspace. */
+    uint64_t n_rl_dropped; /* Number of drops due to upcall rate-limit. */
     uint64_t n_flows;      /* Number of flows present. */
     uint64_t n_mask_hit;   /* Number of mega flow masks visited for
                               flow table matches. */

diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 94a64dd..858ba4e 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -428,6 +428,48 @@
         </p>
       </column>

+      <column name="other_config" key="upcall-rl"
+              type='{"type": "boolean"}'>
+        <p>
+          Set this value to <code>true</code> to enable upcall rate limiting.
+          The upcall rate and burst parameters are ignored if this is not
+          set.
+        </p>
+        <p>
+          The default value is <code>false</code>, and upcall rate limiting
+          is then disabled.
+        </p>
+      </column>
+
+      <column name="other_config" key="upcall-rate"
+              type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
+        <p>
+          When upcall rate limiting is enabled, this specifies the sustained
+          rate of upcalls, in upcalls per second, that are admitted per
+          packet-polling thread (PMD or non-PMD) in the netdev datapath.
+          The default value is 500 upcalls per second.
+
+          Setting both upcall-rate and upcall-burst to <code>0</code>
+          disables upcalls in the netdev datapath. This can be used for
+          debugging purposes.
+        </p>
+      </column>
+
+      <column name="other_config" key="upcall-burst"
+              type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
+        <p>
+          When upcall rate limiting is enabled, this specifies the maximum
+          burst of upcalls that are admitted per packet-polling thread
+          (PMD or non-PMD) in the netdev datapath, independent of their
+          packet arrival rate. The default value is a burst of 500 upcalls.
+
+          Setting both upcall-rate and upcall-burst to <code>0</code>
+          disables upcalls in the netdev datapath. This can be used for
+          debugging purposes.
+        </p>
+      </column>
+
       <column name="other_config" key="vlan-limit"
               type='{"type": "integer", "minInteger": 0}'>
         <p>
-- 
1.9.1

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev