Metering is implemented as a datapath action so I think you can share the mechanism instead of inventing a new one.
On Thu, Feb 15, 2018 at 05:57:58PM +0000, Manohar Krishnappa Chidambaraswamy wrote: > [Adding to my previous reply] > > Ben, > > “[manu] Do you mean using meters in dp-netdev instead of token-bucket. I > don’t know much about meters. Will check.” > > I checked about meters. For limiting flows hitting slowpath, meters can’t be > used. Since meter is an action, new flows would still hit the slow-path > before meter-action can be executed. > > Thanx > Manu > > On 25/01/18, 11:52 AM, "[email protected] on behalf of Manohar > Krishnappa Chidambaraswamy" <[email protected] on behalf of > [email protected]> wrote: > > Ben, > > On 23/01/18, 2:01 AM, "Ben Pfaff" <[email protected]> wrote: > > I guess I'm probably the wrong one to ultimately review this patch, > since it's really a DPDK patch. > > It's difficult to say for sure that this is really an improvement, > since > it also causes traffic to be dropped (that is, the traffic that needs > to > be slow-pathed). > > [manu] Thanx for your response. Its true that this also results in > packet-drops, but these packets belong to new flows that can potentially > take lot more cycles of PMD. Already established flows will *not* see any > drops > due to this rate-limiting. If implemented in DPDK, we can’t differentiate > packets > belonging to already established vs new flows. To do this, we would need > something > in OVS only. > > Sorry I didn’t include the description in my v2 patch. Here is the main > intent > behind this patch (from v1). Do you think this is reasonable? > > “In OVS-DPDK, both fast-path and slow-path execute in the context of a > common > thread (i.e, PMD thread), without any partitioning of CPU cycles between > the > two. When there is a burst of new flows coming into the data-path, packets > are punted to slow-path in the order they are received and the PMD is busy > for the duration of the upcall. Slow-path processing of a packet consumes > 100-200 times the cycles of fast-path handling. As a result, the > forwarding > performance of a PMD degrades significantly during an upcall burst. If the > PMD was highly loaded already, it becomes temporarily overloaded and its > rx > queues start filling up. If the upcall burst is long enough, packets will > be > dropped when rx queues are full. This happens even if the new flows are > unexpected and the slow-path decides to drop the packets. > > It is likely that most of the packets dropped due to rx queue overflow > belong > to established flows that should have been processed by the fast-path. > Hence, > the current OVS-DPDK architecture favors the handling of new flows over > the > forwarding of established flows. This is generally a sub-optimal approach. > > Without a limit to the rate of upcalls, OVS-DPDK is vulnerable for DoS > attacks. > But even sporadic bursts of e.g. unexpected multicast packets have shown > to > cause such packet drops.” > > I think that this could be handled in a datapath independent way using > meters. > > [manu] Do you mean using meters in dp-netdev instead of token-bucket. I > don’t > know much about meters. Will check. > > Thanx > Manu > > On Mon, Jan 15, 2018 at 08:33:31AM +0000, Manohar Krishnappa > Chidambaraswamy wrote: > > Rebased to master. > > > > Could you please review this patch? > > > > v1 of this patch has been sitting idle for quite sometime. I > believe this > > feature is important. Without this feature, an attacker sending > packets > > belonging to different (unestablished) flows can result in PMDs > running only > > upcalls, reducing the throughput drastically. Huge traffic loss due > to this > > is more like a DOS attack. > > > > Please let me know your thoughts/comments. This is my first patch > series in > > OVS. Sorry if naming of this patch as v2 is not appropriate. > > > > Thanx > > Manu > > > > v1 of this patch: https://patchwork.ozlabs.org/patch/836737/ > > > > Signed-off-by: Manohar K C > <[email protected]> > > cc: Jan Scheurich <[email protected]> > > cc: Ben Pfaff <[email protected]> > > > > --- > > Documentation/howto/dpdk.rst | 21 ++++++++++ > > debian/libopenvswitch-dev.install | 1 - > > debian/libopenvswitch.install | 1 - > > debian/rules | 4 +- > > lib/dpif-netdev.c | 81 > +++++++++++++++++++++++++++++++++++++-- > > vswitchd/vswitch.xml | 47 +++++++++++++++++++++++ > > 6 files changed, 148 insertions(+), 7 deletions(-) > > > > diff --git a/Documentation/howto/dpdk.rst > b/Documentation/howto/dpdk.rst > > index 587aaed..1db1562 100644 > > --- a/Documentation/howto/dpdk.rst > > +++ b/Documentation/howto/dpdk.rst > > @@ -716,3 +716,24 @@ devices to bridge ``br0``. Once complete, > follow the below steps: > > Check traffic on multiple queues:: > > > > $ cat /proc/interrupts | grep virtio > > + > > +Upcall rate limiting > > +-------------------- > > +ovs-vsctl can be used to enable and configure upcall rate limit > parameters. > > +There are 2 configurable values ``upcall-rate`` and > ``upcall-burst`` which > > +take effect when global enable knob ``upcall-rl`` is set to true. > > + > > +Upcall rate should be set using ``upcall-rate`` in > packets-per-sec. For > > +example:: > > + > > + $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000 > > + > > +Upcall burst should be set using ``upcall-burst`` in > packets-per-sec. For > > +example:: > > + > > + $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000 > > + > > +Upcall ratelimit feature should be globally enabled using > ``upcall-rl``. For > > +example:: > > + > > + $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > > index c7d157a..e2fd92c 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -101,6 +101,16 @@ static struct shash dp_netdevs > OVS_GUARDED_BY(dp_netdev_mutex) > > > > static struct vlog_rate_limit upcall_rl = > VLOG_RATE_LIMIT_INIT(600, 600); > > > > +/* Upcall rate-limit parameters */ > > +static bool upcall_ratelimit; > > +static unsigned int upcall_rate; > > +static unsigned int upcall_burst; > > + > > +/* TODO: Tune these default values based on upcall perf test */ > > +#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */ > > +#define UPCALL_RATE_DEFAULT 1000 /* pps */ > > +#define UPCALL_BURST_DEFAULT 1000 /* pps */ > > + > > #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | > CS_RELATED \ > > | CS_INVALID | CS_REPLY_DIR | > CS_TRACKED \ > > | CS_SRC_NAT | CS_DST_NAT) > > @@ -340,6 +350,7 @@ enum dp_stat_type { > > hits */ > > DP_STAT_SENT_PKTS, /* Packets that has been sent. */ > > DP_STAT_SENT_BATCHES, /* Number of batches sent. */ > > + DP_STAT_RATELIMIT_DROP, /* Packets dropped due to upcall > policer */ > > DP_N_STATS > > }; > > > > @@ -645,6 +656,9 @@ struct dp_netdev_pmd_thread { > > unsigned long long stats_zero[DP_N_STATS]; > > uint64_t cycles_zero[PMD_N_CYCLES]; > > > > + /* Policer to rate limit slow-path */ > > + struct token_bucket upcall_tb; > > + > > /* Set to true if the pmd thread needs to be reloaded. */ > > bool need_reload; > > }; > > @@ -894,10 +908,11 @@ pmd_info_show_stats(struct ds *reply, > > ds_put_format(reply, > > "\temc hits:%llu\n\tmegaflow hits:%llu\n" > > "\tavg. subtable lookups per hit:%.2f\n" > > - "\tmiss:%llu\n\tlost:%llu\n" > > + "\tmiss:%llu\n\tlost:%llu\n\tratelimit > drops:%llu\n" > > "\tavg. packets per output batch: %.2f\n", > > stats[DP_STAT_EXACT_HIT], > stats[DP_STAT_MASKED_HIT], > > lookups_per_hit, stats[DP_STAT_MISS], > stats[DP_STAT_LOST], > > + stats[DP_STAT_RATELIMIT_DROP], > > packets_per_batch); > > > > if (total_cycles == 0) { > > @@ -3031,6 +3046,8 @@ dpif_netdev_set_config(struct dpif *dpif, > const struct smap *other_config) > > smap_get_ullong(other_config, "emc-insert-inv-prob", > > DEFAULT_EM_FLOW_INSERT_INV_PROB); > > uint32_t insert_min, cur_min; > > + unsigned int rate, burst; > > + bool ratelimit; > > > > if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) { > > free(dp->pmd_cmask); > > @@ -3056,6 +3073,36 @@ dpif_netdev_set_config(struct dpif *dpif, > const struct smap *other_config) > > } > > } > > > > + /* Handle upcall policer params */ > > + ratelimit = smap_get_bool(other_config, "upcall-rl", > > + UPCALL_RATELIMIT_DEFAULT); > > + rate = smap_get_int(other_config, "upcall-rate", > > + UPCALL_RATE_DEFAULT); > > + burst = smap_get_int(other_config, "upcall-burst", > > + UPCALL_BURST_DEFAULT); > > + > > + if ((rate != upcall_rate) || (burst != upcall_burst)) { > > + VLOG_INFO("Upcall ratelimit params changed : Old - rate=%d > burst=%d " > > + ": New - rate=%d burst=%d\n", upcall_rate, > upcall_burst, > > + rate, burst); > > + > > + upcall_rate = rate; > > + upcall_burst = burst; > > + > > + /* > > + * TODO: See if there is a way to reconfig only the policer > > + * in each PMD. > > + */ > > + dp_netdev_request_reconfigure(dp); > > + } > > + > > + if (ratelimit != upcall_ratelimit) { > > + upcall_ratelimit = ratelimit; > > + > > + VLOG_INFO("Upcall ratelimit changed to %s\n", > > + (upcall_ratelimit ? "Enabled" : "Disabled")); > > + } > > + > > return 0; > > } > > > > @@ -3958,6 +4005,12 @@ dpif_netdev_run(struct dpif *dpif) > > ovs_mutex_lock(&dp->port_mutex); > > non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID); > > if (non_pmd) { > > + /* Reconfig the upcall policer if params have changed */ > > + if ((upcall_rate != non_pmd->upcall_tb.rate) || > > + (upcall_burst != non_pmd->upcall_tb.burst)) { > > + token_bucket_init(&non_pmd->upcall_tb, upcall_rate, > upcall_burst); > > + } > > + > > ovs_mutex_lock(&dp->non_pmd_mutex); > > cycles_count_start(non_pmd); > > HMAP_FOR_EACH (port, node, &dp->ports) { > > @@ -4154,6 +4207,9 @@ reload: > > lc = UINT_MAX; > > } > > > > + /* Initialize upcall policer token bucket with configured > params */ > > + token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst); > > + > > cycles_count_start(pmd); > > for (;;) { > > for (i = 0; i < poll_cnt; i++) { > > @@ -4638,6 +4694,10 @@ dp_netdev_configure_pmd(struct > dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, > > emc_cache_init(&pmd->flow_cache); > > pmd_alloc_static_tx_qid(pmd); > > } > > + > > + /* Initialize upcall policer token bucket with configured > params */ > > + token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst); > > + > > cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, > &pmd->node), > > hash_int(core_id, 0)); > > } > > @@ -5076,7 +5136,7 @@ handle_packet_upcall(struct > dp_netdev_pmd_thread *pmd, > > struct dp_packet *packet, > > const struct netdev_flow_key *key, > > struct ofpbuf *actions, struct ofpbuf > *put_actions, > > - int *lost_cnt) > > + int *lost_cnt, int *rl_drop_cnt) > > { > > struct ofpbuf *add_actions; > > struct dp_packet_batch b; > > @@ -5084,6 +5144,18 @@ handle_packet_upcall(struct > dp_netdev_pmd_thread *pmd, > > ovs_u128 ufid; > > int error; > > > > + /* > > + * Grab a token from the upcall policer to enter slowpath. If > token > > + * is not available, drop and account the packet. This is to > > + * rate-limit packets getting into slowpath. > > + */ > > + if (upcall_ratelimit && > !token_bucket_withdraw(&pmd->upcall_tb, 1)) { > > + dp_packet_delete(packet); > > + (*rl_drop_cnt)++; > > + > > + return; > > + } > > + > > match.tun_md.valid = false; > > miniflow_expand(&key->mf, &match.flow); > > > > @@ -5158,7 +5230,7 @@ fast_path_processing(struct > dp_netdev_pmd_thread *pmd, > > struct dpcls *cls; > > struct dpcls_rule *rules[PKT_ARRAY_SIZE]; > > struct dp_netdev *dp = pmd->dp; > > - int miss_cnt = 0, lost_cnt = 0; > > + int miss_cnt = 0, lost_cnt = 0, rl_drop_cnt = 0; > > int lookup_cnt = 0, add_lookup_cnt; > > bool any_miss; > > size_t i; > > @@ -5202,7 +5274,7 @@ fast_path_processing(struct > dp_netdev_pmd_thread *pmd, > > > > miss_cnt++; > > handle_packet_upcall(pmd, packet, &keys[i], &actions, > > - &put_actions, &lost_cnt); > > + &put_actions, &lost_cnt, > &rl_drop_cnt); > > } > > > > ofpbuf_uninit(&actions); > > @@ -5235,6 +5307,7 @@ fast_path_processing(struct > dp_netdev_pmd_thread *pmd, > > dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt); > > dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt); > > dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt); > > + dp_netdev_count_packet(pmd, DP_STAT_RATELIMIT_DROP, > rl_drop_cnt); > > } > > > > /* Packets enter the datapath from a port (or from recirculation) > here. > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > > index 58c0ebd..bc77c35 100644 > > --- a/vswitchd/vswitch.xml > > +++ b/vswitchd/vswitch.xml > > @@ -412,6 +412,53 @@ > > </p> > > </column> > > > > + <column name="other_config" key="upcall-rl" > > + type='{"type": "boolean"}'> > > + <p> > > + Set this value to <code>true</code> to enable upcall > rate-limiting. > > + The upcall parameters like rate and burst will be > ignored, if this is > > + not set. > > + </p> > > + <p> > > + The default value is <code>false</code> and upcall > rate-limiting will > > + be disabled. > > + </p> > > + </column> > > + > > + <column name="other_config" key="upcall-rate" > > + type='{"type": "integer", "minInteger": 0, "maxInteger": > 4294967295}'> > > + <p> > > + Specifies the rate of upcalls in packets-per-second that > is to be > > + allowed. For example, if the value is 10000, then those > many upcalls > > + (for packets) are allowed per second in each of the > packet polling > > + thread (PMD or non-PMD). > > + </p> > > + <p> > > + A value of <code>0</code> means, no upcalls would be > allowed i.e, > > + upcall will be disabled. This is mainly for debugging. > > + </p> > > + <p> > > + The default value is 1000. > > + </p> > > + </column> > > + > > + <column name="other_config" key="upcall-burst" > > + type='{"type": "integer", "minInteger": 0, "maxInteger": > 4294967295}'> > > + <p> > > + Specifies the maximum burst of upcalls in > packets-per-second that is > > + to be allowed. For example, if the value is 15000, then > a maximum > > + burst of 15000 upcalls (for packets) are allowed per > second in each > > + of the packet polling thread (PMD or non-PMD). > > + </p> > > + <p> > > + A value of <code>0</code> means, no upcalls would be > allowed i.e, > > + upcall will be disabled. This is mainly for debugging. > > + </p> > > + <p> > > + The default value is 1000. > > + </p> > > + </column> > > + > > <column name="other_config" key="vlan-limit" > > type='{"type": "integer", "minInteger": 0}'> > > <p> > > -- > > 1.9.1 > > > > > > > > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
