OK.  We do need some kind of special case here.

I think I was thinking of something different: metering for datapath
flows that need to go to the slowpath.  For those, we have a set of
datapath actions, and those actions can include a meter action; that's
what we're planning to do for all dpif-based datapaths.

I guess that for miss upcalls we don't have a set of datapath actions,
only a list of Netlink upcall pids.  That's a shame; otherwise it would
be easy to use meters here too.

At any rate, we should make the slowpath ratelimiting configurable via
the standard OFPM_SLOWPATH meter available in OpenFlow.
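
For readers following the thread: the gating mechanism the patch below
implements boils down to a classic token bucket consulted before each
upcall.  Here is a minimal, self-contained C sketch; the upcall_tb_*
names and the millisecond clock are illustrative only, not the actual
OVS token_bucket API that the patch uses.

```c
/* Minimal token-bucket sketch of the upcall policer discussed below.
 * Names (upcall_tb_*) and the ms clock are illustrative; the real
 * patch uses the OVS token_bucket helpers instead. */
#include <stdbool.h>
#include <stdint.h>

struct upcall_tb {
    unsigned int rate;     /* Tokens (packets) added per second. */
    unsigned int burst;    /* Bucket capacity, in packets. */
    uint64_t tokens;       /* Tokens currently available. */
    uint64_t last_fill;    /* Time of last refill, in ms. */
};

static void
upcall_tb_init(struct upcall_tb *tb, unsigned int rate, unsigned int burst,
               uint64_t now_ms)
{
    tb->rate = rate;
    tb->burst = burst;
    tb->tokens = burst;    /* Start full so an initial burst is admitted. */
    tb->last_fill = now_ms;
}

/* Refills the bucket for the time elapsed since the last refill, then
 * tries to withdraw one token.  Returns false when the caller should
 * drop the packet instead of executing the upcall. */
static bool
upcall_tb_admit(struct upcall_tb *tb, uint64_t now_ms)
{
    uint64_t add = (now_ms - tb->last_fill) * tb->rate / 1000;
    if (add > 0) {
        tb->tokens = tb->tokens + add > tb->burst ? tb->burst
                                                  : tb->tokens + add;
        tb->last_fill = now_ms;
    }
    if (tb->tokens > 0) {
        tb->tokens--;
        return true;
    }
    return false;
}
```

In the patch below this corresponds to token_bucket_init() at PMD
(re)configuration time and token_bucket_withdraw() in
handle_packet_upcall().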

Thanks,

ben.

On Thu, Feb 15, 2018 at 10:42:02PM +0000, Jan Scheurich wrote:
> Hi Ben,
> 
> This patch is about rate-limiting miss upcalls. Could you elaborate on how 
> you suggest using the metering datapath action to realize that? 
> 
> I also think this issue is specific to the DPDK datapath, as only there 
> are the miss upcalls executed in-line in the PMD thread, so that miss 
> upcalls directly impact datapath performance. The kernel datapath (not 
> sure about Windows) doesn't have the same issue, as upcalls are handed 
> off to user space through Netlink.
> 
> Thanks, Jan
> 
> > -----Original Message-----
> > From: [email protected] 
> > [mailto:[email protected]] On Behalf Of Ben Pfaff
> > Sent: Thursday, 15 February, 2018 19:13
> > To: Manohar Krishnappa Chidambaraswamy 
> > <[email protected]>
> > Cc: [email protected]
> > Subject: Re: [ovs-dev] [PATCH v2] Upcall/Slowpath rate limiter for OVS
> > 
> > Metering is implemented as a datapath action so I think you can share
> > the mechanism instead of inventing a new one.
> > 
> > On Thu, Feb 15, 2018 at 05:57:58PM +0000, Manohar Krishnappa 
> > Chidambaraswamy wrote:
> > > [Adding to my previous reply]
> > >
> > > Ben,
> > >
> > > “[manu] Do you mean using meters in dp-netdev instead of token-bucket. I 
> > > don’t know much about meters. Will check.”
> > >
> > > I checked about meters. For limiting flows hitting the slowpath, meters
> > > can’t be used: since a meter is an action, new flows would still hit the
> > > slow-path before the meter action could be executed.
> > >
> > > Thanx
> > > Manu
> > >
> > > On 25/01/18, 11:52 AM, "[email protected] on behalf of
> > > Manohar Krishnappa Chidambaraswamy"
> > > <[email protected] on behalf of
> > > [email protected]> wrote:
> > >
> > >     Ben,
> > >
> > >     On 23/01/18, 2:01 AM, "Ben Pfaff" <[email protected]> wrote:
> > >
> > >         I guess I'm probably the wrong one to ultimately review this
> > >         patch, since it's really a DPDK patch.
> > >
> > >         It's difficult to say for sure that this is really an improvement,
> > >         since it also causes traffic to be dropped (that is, the traffic
> > >         that needs to be slow-pathed).
> > >
> > >     [manu] Thanx for your response. It's true that this also results in
> > >     packet drops, but these packets belong to new flows that can
> > >     potentially take a lot more PMD cycles. Already established flows
> > >     will *not* see any drops due to this rate-limiting. If implemented
> > >     in DPDK, we can't differentiate between packets of already
> > >     established flows and new ones. To do that, we would need something
> > >     in OVS itself.
> > >
> > >     Sorry I didn’t include the description in my v2 patch. Here is the
> > >     main intent behind this patch (from v1). Do you think this is
> > >     reasonable?
> > >
> > >     “In OVS-DPDK, both fast-path and slow-path execute in the context of
> > >     a common thread (i.e., the PMD thread), without any partitioning of
> > >     CPU cycles between the two. When there is a burst of new flows coming
> > >     into the data-path, packets are punted to the slow-path in the order
> > >     they are received, and the PMD is busy for the duration of the
> > >     upcall. Slow-path processing of a packet consumes 100-200 times the
> > >     cycles of fast-path handling. As a result, the forwarding performance
> > >     of a PMD degrades significantly during an upcall burst. If the PMD
> > >     was already highly loaded, it becomes temporarily overloaded and its
> > >     rx queues start filling up. If the upcall burst is long enough,
> > >     packets will be dropped when the rx queues are full. This happens
> > >     even if the new flows are unexpected and the slow-path decides to
> > >     drop the packets.
> > >
> > >     It is likely that most of the packets dropped due to rx queue
> > >     overflow belong to established flows that should have been processed
> > >     by the fast-path. Hence, the current OVS-DPDK architecture favors the
> > >     handling of new flows over the forwarding of established flows. This
> > >     is generally a sub-optimal approach.
> > >
> > >     Without a limit on the rate of upcalls, OVS-DPDK is vulnerable to DoS
> > >     attacks. But even sporadic bursts of, e.g., unexpected multicast
> > >     packets have been shown to cause such packet drops.”
> > >
> > >         I think that this could be handled in a datapath-independent way
> > >         using meters.
> > >
> > >     [manu] Do you mean using meters in dp-netdev instead of a token
> > >     bucket? I don’t know much about meters. Will check.
> > >
> > >     Thanx
> > >     Manu
> > >
> > >         On Mon, Jan 15, 2018 at 08:33:31AM +0000, Manohar Krishnappa
> > >         Chidambaraswamy wrote:
> > >         > Rebased to master.
> > >         >
> > >         > Could you please review this patch?
> > >         >
> > >         > v1 of this patch has been sitting idle for quite some time. I
> > >         > believe this feature is important. Without it, an attacker
> > >         > sending packets belonging to different (unestablished) flows
> > >         > can cause the PMDs to run only upcalls, reducing throughput
> > >         > drastically. The resulting traffic loss amounts to a DoS
> > >         > attack.
> > >         >
> > >         > Please let me know your thoughts/comments. This is my first
> > >         > patch series in OVS. Sorry if naming this patch v2 is not
> > >         > appropriate.
> > >         >
> > >         > Thanx
> > >         > Manu
> > >         >
> > >         > v1 of this patch: https://patchwork.ozlabs.org/patch/836737/
> > >         >
> > >         > Signed-off-by: Manohar K C <[email protected]>
> > >         > cc: Jan Scheurich <[email protected]>
> > >         > cc: Ben Pfaff <[email protected]>
> > >         >
> > >         > ---
> > >         >  Documentation/howto/dpdk.rst      | 21 ++++++++++
> > >         >  debian/libopenvswitch-dev.install |  1 -
> > >         >  debian/libopenvswitch.install     |  1 -
> > >         >  debian/rules                      |  4 +-
> > >         >  lib/dpif-netdev.c                 | 81 +++++++++++++++++++++++++++++++++++++--
> > >         >  vswitchd/vswitch.xml              | 47 +++++++++++++++++++++++
> > >         >  6 files changed, 148 insertions(+), 7 deletions(-)
> > >         >
> > >         > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> > >         > index 587aaed..1db1562 100644
> > >         > --- a/Documentation/howto/dpdk.rst
> > >         > +++ b/Documentation/howto/dpdk.rst
> > >         > @@ -716,3 +716,24 @@ devices to bridge ``br0``. Once complete, follow the below steps:
> > >         >     Check traffic on multiple queues::
> > >         >
> > >         >         $ cat /proc/interrupts | grep virtio
> > >         > +
> > >         > +Upcall rate limiting
> > >         > +--------------------
> > >         > +ovs-vsctl can be used to enable and configure upcall rate limit parameters.
> > >         > +There are two configurable values, ``upcall-rate`` and ``upcall-burst``, which
> > >         > +take effect when the global enable knob ``upcall-rl`` is set to true.
> > >         > +
> > >         > +The upcall rate should be set using ``upcall-rate`` in packets per second. For
> > >         > +example::
> > >         > +
> > >         > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
> > >         > +
> > >         > +The upcall burst should be set using ``upcall-burst`` in packets. For
> > >         > +example::
> > >         > +
> > >         > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
> > >         > +
> > >         > +The upcall ratelimit feature should be globally enabled using ``upcall-rl``. For
> > >         > +example::
> > >         > +
> > >         > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
> > >         > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > >         > index c7d157a..e2fd92c 100644
> > >         > --- a/lib/dpif-netdev.c
> > >         > +++ b/lib/dpif-netdev.c
> > >         > @@ -101,6 +101,16 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
> > >         >
> > >         >  static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
> > >         >
> > >         > +/* Upcall rate-limit parameters */
> > >         > +static bool upcall_ratelimit;
> > >         > +static unsigned int upcall_rate;
> > >         > +static unsigned int upcall_burst;
> > >         > +
> > >         > +/* TODO: Tune these default values based on upcall perf test */
> > >         > +#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
> > >         > +#define UPCALL_RATE_DEFAULT      1000  /* pps */
> > >         > +#define UPCALL_BURST_DEFAULT     1000  /* packets */
> > >         > +
> > >         >  #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
> > >         >                                       | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
> > >         >                                       | CS_SRC_NAT | CS_DST_NAT)
> > >         > @@ -340,6 +350,7 @@ enum dp_stat_type {
> > >         >                                     hits */
> > >         >      DP_STAT_SENT_PKTS,          /* Packets that has been sent. */
> > >         >      DP_STAT_SENT_BATCHES,       /* Number of batches sent. */
> > >         > +    DP_STAT_RATELIMIT_DROP,     /* Packets dropped due to upcall policer */
> > >         >      DP_N_STATS
> > >         >  };
> > >         >
> > >         > @@ -645,6 +656,9 @@ struct dp_netdev_pmd_thread {
> > >         >      unsigned long long stats_zero[DP_N_STATS];
> > >         >      uint64_t cycles_zero[PMD_N_CYCLES];
> > >         >
> > >         > +    /* Policer to rate limit slow-path */
> > >         > +    struct token_bucket upcall_tb;
> > >         > +
> > >         >      /* Set to true if the pmd thread needs to be reloaded. */
> > >         >      bool need_reload;
> > >         >  };
> > >         > @@ -894,10 +908,11 @@ pmd_info_show_stats(struct ds *reply,
> > >         >      ds_put_format(reply,
> > >         >                    "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> > >         >                    "\tavg. subtable lookups per hit:%.2f\n"
> > >         > -                  "\tmiss:%llu\n\tlost:%llu\n"
> > >         > +                  "\tmiss:%llu\n\tlost:%llu\n\tratelimit drops:%llu\n"
> > >         >                    "\tavg. packets per output batch: %.2f\n",
> > >         >                    stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
> > >         >                    lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST],
> > >         > +                  stats[DP_STAT_RATELIMIT_DROP],
> > >         >                    packets_per_batch);
> > >         >
> > >         >      if (total_cycles == 0) {
> > >         > @@ -3031,6 +3046,8 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
> > >         >          smap_get_ullong(other_config, "emc-insert-inv-prob",
> > >         >                          DEFAULT_EM_FLOW_INSERT_INV_PROB);
> > >         >      uint32_t insert_min, cur_min;
> > >         > +    unsigned int rate, burst;
> > >         > +    bool ratelimit;
> > >         >
> > >         >      if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
> > >         >          free(dp->pmd_cmask);
> > >         > @@ -3056,6 +3073,36 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
> > >         >          }
> > >         >      }
> > >         >
> > >         > +    /* Handle upcall policer params */
> > >         > +    ratelimit = smap_get_bool(other_config, "upcall-rl",
> > >         > +                              UPCALL_RATELIMIT_DEFAULT);
> > >         > +    rate = smap_get_int(other_config, "upcall-rate",
> > >         > +                        UPCALL_RATE_DEFAULT);
> > >         > +    burst = smap_get_int(other_config, "upcall-burst",
> > >         > +                         UPCALL_BURST_DEFAULT);
> > >         > +
> > >         > +    if ((rate != upcall_rate) || (burst != upcall_burst)) {
> > >         > +        VLOG_INFO("Upcall ratelimit params changed : Old - rate=%d burst=%d "
> > >         > +                  ": New - rate=%d burst=%d\n", upcall_rate, upcall_burst,
> > >         > +                  rate, burst);
> > >         > +
> > >         > +        upcall_rate = rate;
> > >         > +        upcall_burst = burst;
> > >         > +
> > >         > +        /*
> > >         > +         * TODO: See if there is a way to reconfig only the policer
> > >         > +         * in each PMD.
> > >         > +         */
> > >         > +        dp_netdev_request_reconfigure(dp);
> > >         > +    }
> > >         > +
> > >         > +    if (ratelimit != upcall_ratelimit) {
> > >         > +        upcall_ratelimit = ratelimit;
> > >         > +
> > >         > +        VLOG_INFO("Upcall ratelimit changed to %s\n",
> > >         > +                  (upcall_ratelimit ? "Enabled" : "Disabled"));
> > >         > +    }
> > >         > +
> > >         >      return 0;
> > >         >  }
> > >         >
> > >         > @@ -3958,6 +4005,12 @@ dpif_netdev_run(struct dpif *dpif)
> > >         >      ovs_mutex_lock(&dp->port_mutex);
> > >         >      non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
> > >         >      if (non_pmd) {
> > >         > +        /* Reconfig the upcall policer if params have changed */
> > >         > +        if ((upcall_rate != non_pmd->upcall_tb.rate) ||
> > >         > +            (upcall_burst != non_pmd->upcall_tb.burst)) {
> > >         > +            token_bucket_init(&non_pmd->upcall_tb, upcall_rate, upcall_burst);
> > >         > +        }
> > >         > +
> > >         >          ovs_mutex_lock(&dp->non_pmd_mutex);
> > >         >          cycles_count_start(non_pmd);
> > >         >          HMAP_FOR_EACH (port, node, &dp->ports) {
> > >         > @@ -4154,6 +4207,9 @@ reload:
> > >         >          lc = UINT_MAX;
> > >         >      }
> > >         >
> > >         > +    /* Initialize upcall policer token bucket with configured params */
> > >         > +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
> > >         > +
> > >         >      cycles_count_start(pmd);
> > >         >      for (;;) {
> > >         >          for (i = 0; i < poll_cnt; i++) {
> > >         > @@ -4638,6 +4694,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> > >         >          emc_cache_init(&pmd->flow_cache);
> > >         >          pmd_alloc_static_tx_qid(pmd);
> > >         >      }
> > >         > +
> > >         > +    /* Initialize upcall policer token bucket with configured params */
> > >         > +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
> > >         > +
> > >         >      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
> > >         >                  hash_int(core_id, 0));
> > >         >  }
> > >         > @@ -5076,7 +5136,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> > >         >                       struct dp_packet *packet,
> > >         >                       const struct netdev_flow_key *key,
> > >         >                       struct ofpbuf *actions, struct ofpbuf *put_actions,
> > >         > -                     int *lost_cnt)
> > >         > +                     int *lost_cnt, int *rl_drop_cnt)
> > >         >  {
> > >         >      struct ofpbuf *add_actions;
> > >         >      struct dp_packet_batch b;
> > >         > @@ -5084,6 +5144,18 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> > >         >      ovs_u128 ufid;
> > >         >      int error;
> > >         >
> > >         > +    /*
> > >         > +     * Grab a token from the upcall policer to enter slowpath. If token
> > >         > +     * is not available, drop and account the packet. This is to
> > >         > +     * rate-limit packets getting into slowpath.
> > >         > +     */
> > >         > +    if (upcall_ratelimit && !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
> > >         > +        dp_packet_delete(packet);
> > >         > +        (*rl_drop_cnt)++;
> > >         > +
> > >         > +        return;
> > >         > +    }
> > >         > +
> > >         >      match.tun_md.valid = false;
> > >         >      miniflow_expand(&key->mf, &match.flow);
> > >         >
> > >         > @@ -5158,7 +5230,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> > >         >      struct dpcls *cls;
> > >         >      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
> > >         >      struct dp_netdev *dp = pmd->dp;
> > >         > -    int miss_cnt = 0, lost_cnt = 0;
> > >         > +    int miss_cnt = 0, lost_cnt = 0, rl_drop_cnt = 0;
> > >         >      int lookup_cnt = 0, add_lookup_cnt;
> > >         >      bool any_miss;
> > >         >      size_t i;
> > >         > @@ -5202,7 +5274,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> > >         >
> > >         >              miss_cnt++;
> > >         >              handle_packet_upcall(pmd, packet, &keys[i], &actions,
> > >         > -                                 &put_actions, &lost_cnt);
> > >         > +                                 &put_actions, &lost_cnt, &rl_drop_cnt);
> > >         >          }
> > >         >
> > >         >          ofpbuf_uninit(&actions);
> > >         > @@ -5235,6 +5307,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> > >         >      dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
> > >         >      dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
> > >         >      dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
> > >         > +    dp_netdev_count_packet(pmd, DP_STAT_RATELIMIT_DROP, rl_drop_cnt);
> > >         >  }
> > >         >
> > >         >  /* Packets enter the datapath from a port (or from recirculation) here.
> > >         > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> > >         > index 58c0ebd..bc77c35 100644
> > >         > --- a/vswitchd/vswitch.xml
> > >         > +++ b/vswitchd/vswitch.xml
> > >         > @@ -412,6 +412,53 @@
> > >         >          </p>
> > >         >        </column>
> > >         >
> > >         > +      <column name="other_config" key="upcall-rl"
> > >         > +              type='{"type": "boolean"}'>
> > >         > +        <p>
> > >         > +          Set this value to <code>true</code> to enable upcall
> > >         > +          rate-limiting. The upcall parameters like rate and burst
> > >         > +          are ignored if this is not set.
> > >         > +        </p>
> > >         > +        <p>
> > >         > +          The default value is <code>false</code>, i.e. upcall
> > >         > +          rate-limiting is disabled.
> > >         > +        </p>
> > >         > +      </column>
> > >         > +
> > >         > +      <column name="other_config" key="upcall-rate"
> > >         > +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
> > >         > +        <p>
> > >         > +          Specifies the allowed rate of upcalls in packets per
> > >         > +          second. For example, if the value is 10000, then up to
> > >         > +          10000 upcalls are allowed per second in each packet
> > >         > +          polling thread (PMD or non-PMD).
> > >         > +        </p>
> > >         > +        <p>
> > >         > +          A value of <code>0</code> means that no upcalls are
> > >         > +          allowed, i.e. upcalls are disabled. This is mainly for
> > >         > +          debugging.
> > >         > +        </p>
> > >         > +        <p>
> > >         > +          The default value is 1000.
> > >         > +        </p>
> > >         > +      </column>
> > >         > +
> > >         > +      <column name="other_config" key="upcall-burst"
> > >         > +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
> > >         > +        <p>
> > >         > +          Specifies the maximum allowed burst of upcalls, in
> > >         > +          packets. For example, if the value is 15000, then a
> > >         > +          burst of up to 15000 upcalls is allowed in each packet
> > >         > +          polling thread (PMD or non-PMD).
> > >         > +        </p>
> > >         > +        <p>
> > >         > +          A value of <code>0</code> means that no upcalls are
> > >         > +          allowed, i.e. upcalls are disabled. This is mainly for
> > >         > +          debugging.
> > >         > +        </p>
> > >         > +        <p>
> > >         > +          The default value is 1000.
> > >         > +        </p>
> > >         > +      </column>
> > >         > +
> > >         >        <column name="other_config" key="vlan-limit"
> > >         >                type='{"type": "integer", "minInteger": 0}'>
> > >         >          <p>
> > >         > --
> > >         > 1.9.1
> > >         >
> > >         >
> > >         >
> > >
> > >
> > >     _______________________________________________
> > >     dev mailing list
> > >     [email protected]
> > >     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> > >
