Hi Ben,

This patch is about rate-limiting miss upcalls. Could you elaborate on how you suggest using the metering datapath action to realize that?
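For context, v2 of the patch below realizes the limit with a per-PMD token
bucket that is checked before each miss upcall. Stripped to its essence (a
simplified excerpt of the diff further down, with stats and locking details
omitted):

    /* Per PMD thread, sized by the upcall-rate/upcall-burst knobs. */
    struct token_bucket upcall_tb;
    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);

    /* In handle_packet_upcall(): withdraw one token per packet that would
     * enter the slow path; if the bucket is empty, drop instead of upcall. */
    if (upcall_ratelimit && !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
        dp_packet_delete(packet);
        (*rl_drop_cnt)++;    /* Accounted as DP_STAT_RATELIMIT_DROP. */
        return;
    }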
I also think this is an issue specific to the DPDK datapath, as only there are miss upcalls executed in-line in the PMD thread, so that mass upcalls directly impact datapath performance. The kernel datapath (not sure about Windows) doesn't have the same issue, as upcalls are transferred to user space through netlink.
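For reference, the limiter is configured through three new other_config keys,
e.g. (values as in the patch's documentation below):

    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000

The burst value bounds how many upcalls may pass back-to-back when the bucket
is full; beyond that, upcalls are limited to the sustained rate.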
Thanks,
Jan

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Ben Pfaff
> Sent: Thursday, 15 February, 2018 19:13
> To: Manohar Krishnappa Chidambaraswamy <[email protected]>
> Cc: [email protected]
> Subject: Re: [ovs-dev] [PATCH v2] Upcall/Slowpath rate limiter for OVS
>
> Metering is implemented as a datapath action so I think you can share
> the mechanism instead of inventing a new one.
>
> On Thu, Feb 15, 2018 at 05:57:58PM +0000, Manohar Krishnappa Chidambaraswamy wrote:
> > [Adding to my previous reply]
> >
> > Ben,
> >
> > “[manu] Do you mean using meters in dp-netdev instead of token-bucket?
> > I don’t know much about meters. Will check.”
> >
> > I checked about meters. For limiting flows hitting the slow path, meters
> > can’t be used. Since a meter is an action, new flows would still hit the
> > slow-path before the meter action can be executed.
> >
> > Thanx
> > Manu
> >
> > On 25/01/18, 11:52 AM, "[email protected] on behalf of
> > Manohar Krishnappa Chidambaraswamy" <[email protected] on
> > behalf of [email protected]> wrote:
> >
> >     Ben,
> >
> >     On 23/01/18, 2:01 AM, "Ben Pfaff" <[email protected]> wrote:
> >
> >         I guess I'm probably the wrong one to ultimately review this
> >         patch, since it's really a DPDK patch.
> >
> >         It's difficult to say for sure that this is really an
> >         improvement, since it also causes traffic to be dropped (that
> >         is, the traffic that needs to be slow-pathed).
> >
> >     [manu] Thanx for your response. It's true that this also results in
> >     packet drops, but these packets belong to new flows that can
> >     potentially take a lot more PMD cycles. Already established flows
> >     will *not* see any drops due to this rate-limiting. If implemented
> >     in DPDK, we can’t differentiate packets belonging to already
> >     established vs. new flows. To do this, we would need something in
> >     OVS only.
> >
> >     Sorry I didn’t include the description in my v2 patch. Here is the
> >     main intent behind this patch (from v1). Do you think this is
> >     reasonable?
> >
> >     “In OVS-DPDK, both fast-path and slow-path execute in the context
> >     of a common thread (i.e., the PMD thread), without any partitioning
> >     of CPU cycles between the two. When there is a burst of new flows
> >     coming into the data-path, packets are punted to the slow-path in
> >     the order they are received and the PMD is busy for the duration of
> >     the upcall. Slow-path processing of a packet consumes 100-200 times
> >     the cycles of fast-path handling. As a result, the forwarding
> >     performance of a PMD degrades significantly during an upcall burst.
> >     If the PMD was highly loaded already, it becomes temporarily
> >     overloaded and its rx queues start filling up. If the upcall burst
> >     is long enough, packets will be dropped when rx queues are full.
> >     This happens even if the new flows are unexpected and the slow-path
> >     decides to drop the packets.
> >
> >     It is likely that most of the packets dropped due to rx queue
> >     overflow belong to established flows that should have been
> >     processed by the fast-path. Hence, the current OVS-DPDK
> >     architecture favors the handling of new flows over the forwarding
> >     of established flows. This is generally a sub-optimal approach.
> >
> >     Without a limit to the rate of upcalls, OVS-DPDK is vulnerable to
> >     DoS attacks. But even sporadic bursts of e.g. unexpected multicast
> >     packets have been shown to cause such packet drops.”
> >
> >         I think that this could be handled in a datapath-independent
> >         way using meters.
> >
> >     [manu] Do you mean using meters in dp-netdev instead of
> >     token-bucket? I don’t know much about meters. Will check.
> >
> >     Thanx
> >     Manu
> >
> > On Mon, Jan 15, 2018 at 08:33:31AM +0000, Manohar Krishnappa Chidambaraswamy wrote:
> > > Rebased to master.
> > >
> > > Could you please review this patch?
> > >
> > > v1 of this patch has been sitting idle for quite some time. I believe
> > > this feature is important. Without this feature, an attacker sending
> > > packets belonging to different (unestablished) flows can result in
> > > PMDs running only upcalls, reducing the throughput drastically. Huge
> > > traffic loss due to this is more like a DoS attack.
> > >
> > > Please let me know your thoughts/comments. This is my first patch
> > > series in OVS. Sorry if naming this patch v2 is not appropriate.
> > >
> > > Thanx
> > > Manu
> > >
> > > v1 of this patch: https://patchwork.ozlabs.org/patch/836737/
> > >
> > > Signed-off-by: Manohar K C <[email protected]>
> > > cc: Jan Scheurich <[email protected]>
> > > cc: Ben Pfaff <[email protected]>
> > >
> > > ---
> > >  Documentation/howto/dpdk.rst      | 21 ++++++++++
> > >  debian/libopenvswitch-dev.install |  1 -
> > >  debian/libopenvswitch.install     |  1 -
> > >  debian/rules                      |  4 +-
> > >  lib/dpif-netdev.c                 | 81 +++++++++++++++++++++++++++++++++++++--
> > >  vswitchd/vswitch.xml              | 47 +++++++++++++++++++++++
> > >  6 files changed, 148 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> > > index 587aaed..1db1562 100644
> > > --- a/Documentation/howto/dpdk.rst
> > > +++ b/Documentation/howto/dpdk.rst
> > > @@ -716,3 +716,24 @@ devices to bridge ``br0``. Once complete, follow the below steps:
> > >  Check traffic on multiple queues::
> > >
> > >    $ cat /proc/interrupts | grep virtio
> > > +
> > > +Upcall rate limiting
> > > +--------------------
> > > +ovs-vsctl can be used to enable and configure upcall rate limit parameters.
> > > +There are 2 configurable values ``upcall-rate`` and ``upcall-burst`` which
> > > +take effect when global enable knob ``upcall-rl`` is set to true.
> > > +
> > > +Upcall rate should be set using ``upcall-rate`` in packets-per-sec. For
> > > +example::
> > > +
> > > +  $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
> > > +
> > > +Upcall burst should be set using ``upcall-burst`` in packets-per-sec. For
> > > +example::
> > > +
> > > +  $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
> > > +
> > > +Upcall ratelimit feature should be globally enabled using ``upcall-rl``. For
> > > +example::
> > > +
> > > +  $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index c7d157a..e2fd92c 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -101,6 +101,16 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
> > >
> > >  static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
> > >
> > > +/* Upcall rate-limit parameters */
> > > +static bool upcall_ratelimit;
> > > +static unsigned int upcall_rate;
> > > +static unsigned int upcall_burst;
> > > +
> > > +/* TODO: Tune these default values based on upcall perf test */
> > > +#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
> > > +#define UPCALL_RATE_DEFAULT 1000       /* pps */
> > > +#define UPCALL_BURST_DEFAULT 1000      /* pps */
> > > +
> > >  #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
> > >                                       | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
> > >                                       | CS_SRC_NAT | CS_DST_NAT)
> > > @@ -340,6 +350,7 @@ enum dp_stat_type {
> > >                                    hits */
> > >      DP_STAT_SENT_PKTS,          /* Packets that has been sent. */
> > >      DP_STAT_SENT_BATCHES,       /* Number of batches sent. */
> > > +    DP_STAT_RATELIMIT_DROP,     /* Packets dropped due to upcall policer */
> > >      DP_N_STATS
> > >  };
> > > @@ -645,6 +656,9 @@ struct dp_netdev_pmd_thread {
> > >      unsigned long long stats_zero[DP_N_STATS];
> > >      uint64_t cycles_zero[PMD_N_CYCLES];
> > >
> > > +    /* Policer to rate limit slow-path */
> > > +    struct token_bucket upcall_tb;
> > > +
> > >      /* Set to true if the pmd thread needs to be reloaded. */
> > >      bool need_reload;
> > >  };
> > > @@ -894,10 +908,11 @@ pmd_info_show_stats(struct ds *reply,
> > >      ds_put_format(reply,
> > >                    "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> > >                    "\tavg. subtable lookups per hit:%.2f\n"
> > > -                  "\tmiss:%llu\n\tlost:%llu\n"
> > > +                  "\tmiss:%llu\n\tlost:%llu\n\tratelimit drops:%llu\n"
> > >                    "\tavg. packets per output batch: %.2f\n",
> > >                    stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
> > >                    lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST],
> > > +                  stats[DP_STAT_RATELIMIT_DROP],
> > >                    packets_per_batch);
> > >
> > >      if (total_cycles == 0) {
> > > @@ -3031,6 +3046,8 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
> > >          smap_get_ullong(other_config, "emc-insert-inv-prob",
> > >                          DEFAULT_EM_FLOW_INSERT_INV_PROB);
> > >      uint32_t insert_min, cur_min;
> > > +    unsigned int rate, burst;
> > > +    bool ratelimit;
> > >
> > >      if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
> > >          free(dp->pmd_cmask);
> > > @@ -3056,6 +3073,36 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
> > >          }
> > >      }
> > >
> > > +    /* Handle upcall policer params */
> > > +    ratelimit = smap_get_bool(other_config, "upcall-rl",
> > > +                              UPCALL_RATELIMIT_DEFAULT);
> > > +    rate = smap_get_int(other_config, "upcall-rate",
> > > +                        UPCALL_RATE_DEFAULT);
> > > +    burst = smap_get_int(other_config, "upcall-burst",
> > > +                         UPCALL_BURST_DEFAULT);
> > > +
> > > +    if ((rate != upcall_rate) || (burst != upcall_burst)) {
> > > +        VLOG_INFO("Upcall ratelimit params changed : Old - rate=%d burst=%d "
> > > +                  ": New - rate=%d burst=%d\n", upcall_rate, upcall_burst,
> > > +                  rate, burst);
> > > +
> > > +        upcall_rate = rate;
> > > +        upcall_burst = burst;
> > > +
> > > +        /*
> > > +         * TODO: See if there is a way to reconfig only the policer
> > > +         * in each PMD.
> > > +         */
> > > +        dp_netdev_request_reconfigure(dp);
> > > +    }
> > > +
> > > +    if (ratelimit != upcall_ratelimit) {
> > > +        upcall_ratelimit = ratelimit;
> > > +
> > > +        VLOG_INFO("Upcall ratelimit changed to %s\n",
> > > +                  (upcall_ratelimit ? "Enabled" : "Disabled"));
> > > +    }
> > > +
> > >      return 0;
> > >  }
> > >
> > > @@ -3958,6 +4005,12 @@ dpif_netdev_run(struct dpif *dpif)
> > >      ovs_mutex_lock(&dp->port_mutex);
> > >      non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
> > >      if (non_pmd) {
> > > +        /* Reconfig the upcall policer if params have changed */
> > > +        if ((upcall_rate != non_pmd->upcall_tb.rate) ||
> > > +            (upcall_burst != non_pmd->upcall_tb.burst)) {
> > > +            token_bucket_init(&non_pmd->upcall_tb, upcall_rate, upcall_burst);
> > > +        }
> > > +
> > >          ovs_mutex_lock(&dp->non_pmd_mutex);
> > >          cycles_count_start(non_pmd);
> > >          HMAP_FOR_EACH (port, node, &dp->ports) {
> > > @@ -4154,6 +4207,9 @@ reload:
> > >          lc = UINT_MAX;
> > >      }
> > >
> > > +    /* Initialize upcall policer token bucket with configured params */
> > > +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
> > > +
> > >      cycles_count_start(pmd);
> > >      for (;;) {
> > >          for (i = 0; i < poll_cnt; i++) {
> > > @@ -4638,6 +4694,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> > >          emc_cache_init(&pmd->flow_cache);
> > >          pmd_alloc_static_tx_qid(pmd);
> > >      }
> > > +
> > > +    /* Initialize upcall policer token bucket with configured params */
> > > +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
> > > +
> > >      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
> > >                  hash_int(core_id, 0));
> > >  }
> > > @@ -5076,7 +5136,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> > >                       struct dp_packet *packet,
> > >                       const struct netdev_flow_key *key,
> > >                       struct ofpbuf *actions, struct ofpbuf *put_actions,
> > > -                     int *lost_cnt)
> > > +                     int *lost_cnt, int *rl_drop_cnt)
> > >  {
> > >      struct ofpbuf *add_actions;
> > >      struct dp_packet_batch b;
> > > @@ -5084,6 +5144,18 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> > >      ovs_u128 ufid;
> > >      int error;
> > >
> > > +    /*
> > > +     * Grab a token from the upcall policer to enter slowpath. If token
> > > +     * is not available, drop and account the packet. This is to
> > > +     * rate-limit packets getting into slowpath.
> > > +     */
> > > +    if (upcall_ratelimit && !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
> > > +        dp_packet_delete(packet);
> > > +        (*rl_drop_cnt)++;
> > > +
> > > +        return;
> > > +    }
> > > +
> > >      match.tun_md.valid = false;
> > >      miniflow_expand(&key->mf, &match.flow);
> > >
> > > @@ -5158,7 +5230,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> > >      struct dpcls *cls;
> > >      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
> > >      struct dp_netdev *dp = pmd->dp;
> > > -    int miss_cnt = 0, lost_cnt = 0;
> > > +    int miss_cnt = 0, lost_cnt = 0, rl_drop_cnt = 0;
> > >      int lookup_cnt = 0, add_lookup_cnt;
> > >      bool any_miss;
> > >      size_t i;
> > > @@ -5202,7 +5274,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> > >
> > >          miss_cnt++;
> > >          handle_packet_upcall(pmd, packet, &keys[i], &actions,
> > > -                             &put_actions, &lost_cnt);
> > > +                             &put_actions, &lost_cnt, &rl_drop_cnt);
> > >      }
> > >
> > >      ofpbuf_uninit(&actions);
> > > @@ -5235,6 +5307,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> > >      dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
> > >      dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
> > >      dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
> > > +    dp_netdev_count_packet(pmd, DP_STAT_RATELIMIT_DROP, rl_drop_cnt);
> > >  }
> > >
> > >  /* Packets enter the datapath from a port (or from recirculation) here.
> > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> > > index 58c0ebd..bc77c35 100644
> > > --- a/vswitchd/vswitch.xml
> > > +++ b/vswitchd/vswitch.xml
> > > @@ -412,6 +412,53 @@
> > >          </p>
> > >        </column>
> > >
> > > +      <column name="other_config" key="upcall-rl"
> > > +              type='{"type": "boolean"}'>
> > > +        <p>
> > > +          Set this value to <code>true</code> to enable upcall rate-limiting.
> > > +          The upcall parameters like rate and burst will be ignored, if this is
> > > +          not set.
> > > +        </p>
> > > +        <p>
> > > +          The default value is <code>false</code> and upcall rate-limiting will
> > > +          be disabled.
> > > +        </p>
> > > +      </column>
> > > +
> > > +      <column name="other_config" key="upcall-rate"
> > > +              type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
> > > +        <p>
> > > +          Specifies the rate of upcalls in packets-per-second that is to be
> > > +          allowed. For example, if the value is 10000, then those many upcalls
> > > +          (for packets) are allowed per second in each of the packet polling
> > > +          thread (PMD or non-PMD).
> > > +        </p>
> > > +        <p>
> > > +          A value of <code>0</code> means, no upcalls would be allowed i.e,
> > > +          upcall will be disabled. This is mainly for debugging.
> > > +        </p>
> > > +        <p>
> > > +          The default value is 1000.
> > > +        </p>
> > > +      </column>
> > > +
> > > +      <column name="other_config" key="upcall-burst"
> > > +              type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
> > > +        <p>
> > > +          Specifies the maximum burst of upcalls in packets-per-second that is
> > > +          to be allowed. For example, if the value is 15000, then a maximum
> > > +          burst of 15000 upcalls (for packets) are allowed per second in each
> > > +          of the packet polling thread (PMD or non-PMD).
> > > +        </p>
> > > +        <p>
> > > +          A value of <code>0</code> means, no upcalls would be allowed i.e,
> > > +          upcall will be disabled. This is mainly for debugging.
> > > +        </p>
> > > +        <p>
> > > +          The default value is 1000.
> > > +        </p>
> > > +      </column>
> > > +
> > >        <column name="other_config" key="vlan-limit"
> > >                type='{"type": "integer", "minInteger": 0}'>
> > >          <p>
> > > --
> > > 1.9.1

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
