On Tue, Apr 27, 2021 at 2:38 AM Ilya Maximets <i.maxim...@ovn.org> wrote: > > On 3/15/20 7:43 AM, xiangxia.m....@gmail.com wrote: > > From: Tonghao Zhang <xiangxia.m....@gmail.com> > > > > For now, ovs-vswitchd use the array of the dp_meter struct > > to store meter's data, and at most, there are only 65536 > > (defined by MAX_METERS) meters that can be used. But in some > > case, for example, in the edge gateway, we should use 200,000+, > > at least, meters for IP address bandwidth limitation. > > Every one IP address will use two meters for its rx and tx > > path[1]. In other way, ovs-vswitchd should support meter-offload > > (rte_mtr_xxx api introduced by dpdk.), but there are more than > > 65536 meters in the hardware, such as Mellanox ConnectX-6. > > > > This patch use cmap to manage the meter, instead of the array. > > Hi. Here is a around of review as discussed in review of v3 of this series. > > It's a bit confusing that there was two versions of v2 of this patch set. > Some comments might be same as I already had for another v2... > > > > > * Insertion performance, ovs-ofctl add-meter 1000+ meters, > > the cmap takes abount 4000ms, as same as previous implementation. > > * Lookup performance in datapath, we add 1000+ meter which rate is > > 10G (the NIC cards are 10Gb, so netdev-datapath will not drop the > > packets.), and a flow which only forwarding the packets from p0 > > to p1, with meter action[2]. On other server, the pktgen-dpdk > > will generate 64B packets to p0. > > The forwarding performance is 4,814,400 pps. Without this path, > > 4,935,584 pps. There are about 1% performance loss. For addressing > > this issue, next patch add a meter cache. > > To get back some of the performance you may consider switching form > mutex to spin lock for meters themselves. (Note that fallback for > systems that has no spin locks will be needed in this case, so you > might want to keep meter_lock/unclock functions.) > > Also, this test case looks very synthetic and in a more real-world > setup perfromance diference should be lower, so, likely, doesn't > worth efforts to create caches and stuff. > > > > > [1]. > > $ in_port=p0,ip,ip_dst=1.1.1.x action=meter:n,output:p1 > > $ in_port=p1,ip,ip_src=1.1.1.x action=meter:m,output:p0 > > > > [2]. > > $ in_port=p0 action=meter:100,output:p1 > > > > Cc: Ben Pfaff <b...@ovn.org> > > Cc: Jarno Rajahalme <ja...@ovn.org> > > Cc: Ilya Maximets <i.maxim...@ovn.org> > > Cc: Andy Zhou <az...@ovn.org> > > Signed-off-by: Tonghao Zhang <xiangxia.m....@gmail.com> > > ---> lib/dpif-netdev.c | 199 > > +++++++++++++++++++++++++++++++++++------------------- > > 1 file changed, 130 insertions(+), 69 deletions(-) > > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > > index d393aab..5474d52 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -98,9 +98,11 @@ DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0) > > > > /* Configuration parameters. */ > > enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. > > */ > > -enum { MAX_METERS = 65536 }; /* Maximum number of meters. */ > > -enum { MAX_BANDS = 8 }; /* Maximum number of bands / meter. */ > > -enum { N_METER_LOCKS = 64 }; /* Maximum number of meters. */ > > + > > +/* Maximum number of meters in the table. */ > > +#define METER_ENTRY_MAX (1 << 19) > > +/* Maximum number of bands / meter. */ > > +#define METER_BAND_MAX (8) > > It's unclear why enums converted to macros and why we need to change names. > Please, keep them as is. Hi Ilya v3 is posted, please review. http://patchwork.ozlabs.org/project/openvswitch/patch/20210512091736.48172-1-xiangxia.m....@gmail.com/ > > > > COVERAGE_DEFINE(datapath_drop_meter); > > COVERAGE_DEFINE(datapath_drop_upcall_error); > > @@ -280,6 +282,9 @@ struct dp_meter_band { > > }; > > > > struct dp_meter { > > + struct cmap_node node; > > + struct ovs_mutex lock; > > + uint32_t id; > > uint16_t flags; > > uint16_t n_bands; > > uint32_t max_delta_t; > > @@ -289,6 +294,12 @@ struct dp_meter { > > struct dp_meter_band bands[]; > > }; > > > > +struct dp_netdev_meter { > > + struct cmap table OVS_GUARDED; > > + struct ovs_mutex lock; /* Used for meter table. */ > > + uint32_t hash_basis; > > Not sure if we need a hash basis, especially because you're choosing > it randomly. Basis for hash functions used to choose a beeter hashing > schema if you know that some particular base value will provide better > hash distribution for particular type of data, or if you need ot hash > two equal values and you want them to be hashed differently. Both cases > are not applicable here, so, we can simply use 0 always as a hash basis. > Also hashing might be a bit faster if basis is a compile-time constant. > > > +}; > > + > > struct pmd_auto_lb { > > bool auto_lb_requested; /* Auto load balancing requested by user. > > */ > > bool is_enabled; /* Current status of Auto load balancing. > > */ > > @@ -329,8 +340,7 @@ struct dp_netdev { > > atomic_uint32_t tx_flush_interval; > > > > /* Meters. */ > > - struct ovs_mutex meter_locks[N_METER_LOCKS]; > > - struct dp_meter *meters[MAX_METERS]; /* Meter bands. */ > > + struct dp_netdev_meter *meter; > > > We're not going to create more than one instance of this structure > per datapath, so , I think, to reduce naming confusion, it's better > to just expand it right here: > > struct cmap meters OVS_GUARDED; > struct ovs_mutex meters_lock; > > > > > /* Probability of EMC insertions is a factor of 'emc_insert_min'.*/ > > OVS_ALIGNED_VAR(CACHE_LINE_SIZE) atomic_uint32_t emc_insert_min; > > @@ -378,19 +388,6 @@ struct dp_netdev { > > struct pmd_auto_lb pmd_alb; > > }; > > > > -static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id) > > - OVS_ACQUIRES(dp->meter_locks[meter_id % N_METER_LOCKS]) > > -{ > > - ovs_mutex_lock(&dp->meter_locks[meter_id % N_METER_LOCKS]); > > -} > > - > > -static void meter_unlock(const struct dp_netdev *dp, uint32_t meter_id) > > - OVS_RELEASES(dp->meter_locks[meter_id % N_METER_LOCKS]) > > -{ > > - ovs_mutex_unlock(&dp->meter_locks[meter_id % N_METER_LOCKS]); > > -} > > - > > - > > static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev > > *dp, > > odp_port_t) > > OVS_REQUIRES(dp->port_mutex); > > @@ -1523,6 +1520,84 @@ choose_port(struct dp_netdev *dp, const char *name) > > return ODPP_NONE; > > } > > > > +static uint32_t > > +dp_meter_hash(uint32_t meter_id, uint32_t basis) > > +{ > > + return hash_bytes32(&meter_id, sizeof meter_id, basis); > > Instead of hash_bytes32 it might be better to use simple hash_int(). > Benefits: no need for the function at all, callers might just use > hash_int() with zero basis (basis is likely not needed as I explained > above). Also, more compilers will fully inline hash_int() unlike > hash_bytes32(). > > > > +} > > + > > +static void > > +dp_netdev_meter_init(struct dp_netdev *dp) > > Without the memory allocation and calculation of a basis this > function could be fully inlined into create_dp_netdev(). > > > +{ > > + struct dp_netdev_meter *dp_meter; > > + > > + dp_meter = xmalloc(sizeof *dp_meter); > > + > > + cmap_init(&dp_meter->table); > > + ovs_mutex_init(&dp_meter->lock); > > + dp_meter->hash_basis = random_uint32(); > > Random is not better than 0. > > > + > > + dp->meter = dp_meter; > > +} > > + > > +static void > > +dp_netdev_meter_destroy(struct dp_netdev *dp) > > +{ > > + struct dp_netdev_meter *dp_meter = dp->meter; > > + struct dp_meter *meter; > > + > > + ovs_mutex_lock(&dp_meter->lock); > > + CMAP_FOR_EACH (meter, node, &dp_meter->table) { > > + cmap_remove(&dp_meter->table, &meter->node, > > + dp_meter_hash(meter->id, dp_meter->hash_basis)); > > + > > + ovsrcu_postpone(free, meter); > > Overindented. 4 spaces, please. > > > + } > > + > > + cmap_destroy(&dp_meter->table); > > + ovs_mutex_unlock(&dp_meter->lock); > > + > > + ovs_mutex_destroy(&dp_meter->lock); > > + free(dp_meter); > > +} > > + > > +static struct dp_meter* > > +dp_meter_lookup(struct dp_netdev_meter *dp_meter, uint32_t meter_id) > > +{ > > + uint32_t hash = dp_meter_hash(meter_id, dp_meter->hash_basis); > > hash_int(meter_id, 0); > > > + struct dp_meter *meter; > > + > > + CMAP_FOR_EACH_WITH_HASH (meter, node, hash, &dp_meter->table) { > > + if (meter->id == meter_id) { > > + return meter; > > + } > > + } > > + > > + return NULL; > > +} > > + > > +static void > > +dp_meter_detach_free(struct dp_netdev_meter *dp_meter, uint32_t meter_id) > > + OVS_REQUIRES(dp_meter->lock) > > Overindented. > > > +{ > > + struct dp_meter *meter; > > + > > + meter = dp_meter_lookup(dp_meter, meter_id); > > + if (meter) { > > + cmap_remove(&dp_meter->table, &meter->node, > > + dp_meter_hash(meter_id, dp_meter->hash_basis)); > > + ovsrcu_postpone(free, meter); > > ditto. > > > + } > > +} > > + > > +static void > > +dp_meter_attach(struct dp_netdev_meter *dp_meter, struct dp_meter *meter) > > + OVS_REQUIRES(dp_meter->lock) > > ditto. > > > +{ > > + cmap_insert(&dp_meter->table, &meter->node, > > + dp_meter_hash(meter->id, dp_meter->hash_basis));> +} > > + > > static int > > create_dp_netdev(const char *name, const struct dpif_class *class, > > struct dp_netdev **dpp) > > @@ -1556,9 +1631,8 @@ create_dp_netdev(const char *name, const struct > > dpif_class *class, > > dp->reconfigure_seq = seq_create(); > > dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq); > > > > - for (int i = 0; i < N_METER_LOCKS; ++i) { > > - ovs_mutex_init_adaptive(&dp->meter_locks[i]); > > - } > > + /* Init meter resource. */ > > + dp_netdev_meter_init(dp); > > > > /* Disable upcalls by default. */ > > dp_netdev_disable_upcall(dp); > > @@ -1647,16 +1721,6 @@ dp_netdev_destroy_upcall_lock(struct dp_netdev *dp) > > fat_rwlock_destroy(&dp->upcall_rwlock); > > } > > > > -static void > > -dp_delete_meter(struct dp_netdev *dp, uint32_t meter_id) > > - OVS_REQUIRES(dp->meter_locks[meter_id % N_METER_LOCKS]) > > -{ > > - if (dp->meters[meter_id]) { > > - free(dp->meters[meter_id]); > > - dp->meters[meter_id] = NULL; > > - } > > -} > > - > > /* Requires dp_netdev_mutex so that we can't get a new reference to 'dp' > > * through the 'dp_netdevs' shash while freeing 'dp'. */ > > static void > > @@ -1694,16 +1758,7 @@ dp_netdev_free(struct dp_netdev *dp) > > /* Upcalls must be disabled at this point */ > > dp_netdev_destroy_upcall_lock(dp); > > > > - int i; > > - > > - for (i = 0; i < MAX_METERS; ++i) { > > - meter_lock(dp, i); > > - dp_delete_meter(dp, i); > > - meter_unlock(dp, i); > > - } > > - for (i = 0; i < N_METER_LOCKS; ++i) { > > - ovs_mutex_destroy(&dp->meter_locks[i]); > > - } > > + dp_netdev_meter_destroy(dp); > > > > free(dp->pmd_cmask); > > free(CONST_CAST(char *, dp->name)); > > @@ -5708,10 +5763,10 @@ static void > > dpif_netdev_meter_get_features(const struct dpif * dpif OVS_UNUSED, > > struct ofputil_meter_features *features) > > { > > - features->max_meters = MAX_METERS; > > + features->max_meters = METER_ENTRY_MAX; > > features->band_types = DP_SUPPORTED_METER_BAND_TYPES; > > features->capabilities = DP_SUPPORTED_METER_FLAGS_MASK; > > - features->max_bands = MAX_BANDS; > > + features->max_bands = METER_BAND_MAX; > > features->max_color = 0; > > } > > > > @@ -5732,14 +5787,13 @@ dp_netdev_run_meter(struct dp_netdev *dp, struct > > dp_packet_batch *packets_, > > uint32_t exceeded_rate[NETDEV_MAX_BURST]; > > int exceeded_pkt = cnt; /* First packet that exceeded a band rate. */ > > > > - if (meter_id >= MAX_METERS) { > > + if (meter_id >= METER_ENTRY_MAX) { > > return; > > } > > > > - meter_lock(dp, meter_id); > > - meter = dp->meters[meter_id]; > > + meter = dp_meter_lookup(dp->meter, meter_id); > > if (!meter) { > > - goto out; > > + return; > > } > > > > /* Initialize as negative values. */ > > @@ -5747,6 +5801,7 @@ dp_netdev_run_meter(struct dp_netdev *dp, struct > > dp_packet_batch *packets_, > > /* Initialize as zeroes. */ > > memset(exceeded_rate, 0, cnt * sizeof *exceeded_rate); > > > > + ovs_mutex_lock(&meter->lock); > > /* All packets will hit the meter at the same time. */ > > long_delta_t = now / 1000 - meter->used / 1000; /* msec */ > > > > @@ -5860,8 +5915,8 @@ dp_netdev_run_meter(struct dp_netdev *dp, struct > > dp_packet_batch *packets_, > > dp_packet_batch_refill(packets_, packet, j); > > } > > } > > - out: > > - meter_unlock(dp, meter_id); > > + > > + ovs_mutex_unlock(&meter->lock); > > } > > > > /* Meter set/get/del processing is still single-threaded. */ > > @@ -5870,11 +5925,12 @@ dpif_netdev_meter_set(struct dpif *dpif, > > ofproto_meter_id meter_id, > > struct ofputil_meter_config *config) > > { > > struct dp_netdev *dp = get_dp_netdev(dpif); > > + struct dp_netdev_meter *dp_meter = dp->meter; > > uint32_t mid = meter_id.uint32; > > struct dp_meter *meter; > > int i; > > > > - if (mid >= MAX_METERS) { > > + if (mid >= METER_ENTRY_MAX) { > > return EFBIG; /* Meter_id out of range. */ > > } > > > > @@ -5882,7 +5938,7 @@ dpif_netdev_meter_set(struct dpif *dpif, > > ofproto_meter_id meter_id, > > return EBADF; /* Unsupported flags set */ > > } > > > > - if (config->n_bands > MAX_BANDS) { > > + if (config->n_bands > METER_BAND_MAX) { > > return EINVAL; > > } > > > > @@ -5903,6 +5959,8 @@ dpif_netdev_meter_set(struct dpif *dpif, > > ofproto_meter_id meter_id, > > meter->n_bands = config->n_bands; > > meter->max_delta_t = 0; > > meter->used = time_usec(); > > + meter->id = mid; > > + ovs_mutex_init(&meter->lock); > > > > /* set up bands */ > > for (i = 0; i < config->n_bands; ++i) { > > @@ -5928,10 +5986,12 @@ dpif_netdev_meter_set(struct dpif *dpif, > > ofproto_meter_id meter_id, > > } > > } > > > > - meter_lock(dp, mid); > > - dp_delete_meter(dp, mid); /* Free existing meter, if any */ > > - dp->meters[mid] = meter; > > - meter_unlock(dp, mid); > > + ovs_mutex_lock(&dp_meter->lock); > > + > > + dp_meter_detach_free(dp_meter, mid); /* Free existing meter, if any */ > > + dp_meter_attach(dp_meter, meter); > > + > > + ovs_mutex_unlock(&dp_meter->lock); > > > > return 0; > > } > > @@ -5941,22 +6001,23 @@ dpif_netdev_meter_get(const struct dpif *dpif, > > ofproto_meter_id meter_id_, > > struct ofputil_meter_stats *stats, uint16_t n_bands) > > { > > - const struct dp_netdev *dp = get_dp_netdev(dpif); > > + struct dp_netdev *dp = get_dp_netdev(dpif); > > uint32_t meter_id = meter_id_.uint32; > > - int retval = 0; > > + const struct dp_meter *meter; > > > > - if (meter_id >= MAX_METERS) { > > + if (meter_id >= METER_ENTRY_MAX) { > > return EFBIG; > > } > > > > - meter_lock(dp, meter_id); > > - const struct dp_meter *meter = dp->meters[meter_id]; > > + meter = dp_meter_lookup(dp->meter, meter_id); > > if (!meter) { > > - retval = ENOENT; > > - goto done; > > + return ENOENT; > > } > > + > > if (stats) { > > - int i = 0; > > + int i; > > This change is not necessary. > > > + > > + ovs_mutex_lock(&meter->lock); > > > > stats->packet_in_count = meter->packet_count; > > stats->byte_in_count = meter->byte_count; > > @@ -5966,12 +6027,11 @@ dpif_netdev_meter_get(const struct dpif *dpif, > > stats->bands[i].byte_count = meter->bands[i].byte_count; > > } > > > > + ovs_mutex_unlock(&meter->lock); > > stats->n_bands = i; > > } > > > > -done: > > - meter_unlock(dp, meter_id); > > - return retval; > > + return 0; > > } > > > > static int > > @@ -5980,15 +6040,16 @@ dpif_netdev_meter_del(struct dpif *dpif, > > struct ofputil_meter_stats *stats, uint16_t n_bands) > > { > > struct dp_netdev *dp = get_dp_netdev(dpif); > > + struct dp_netdev_meter *dp_meter = dp->meter; > > int error; > > > > error = dpif_netdev_meter_get(dpif, meter_id_, stats, n_bands); > > if (!error) { > > uint32_t meter_id = meter_id_.uint32; > > > > - meter_lock(dp, meter_id); > > - dp_delete_meter(dp, meter_id); > > - meter_unlock(dp, meter_id); > > + ovs_mutex_lock(&dp_meter->lock); > > + dp_meter_detach_free(dp_meter, meter_id); > > + ovs_mutex_unlock(&dp_meter->lock); > > } > > return error; > > } > > >
-- Best regards, Tonghao _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev