> Acked-by: Billy O'Mahony <billy.o.mah...@intel.com> > Thanks to all for the work on this.
I've applied this to dpdk_merge branch, it will be part of this week's pull request. I rolled patch 2 of the series into the same commit as I don't think it makes sense to have a broken unit test for the 1st commit. Thanks Ian > > -----Original Message----- > > From: Wang, Yipeng1 > > Sent: Tuesday, July 10, 2018 11:14 AM > > To: d...@openvswitch.org; jan.scheur...@ericsson.com; O Mahony, Billy > > <billy.o.mah...@intel.com> > > Cc: Wang, Yipeng1 <yipeng1.w...@intel.com>; Stokes, Ian > > <ian.sto...@intel.com>; b...@ovn.org > > Subject: [PATCH v5 1/2] dpif-netdev: Add SMC cache after EMC cache > > > > This patch adds a signature match cache (SMC) after exact match cache > (EMC). > > The difference between SMC and EMC is SMC only stores a signature of a > > flow thus it is much more memory efficient. With same memory space, > > EMC can store 8k flows while SMC can store 1M flows. It is generally > > beneficial to turn on SMC but turn off EMC when traffic flow count is > much larger than EMC size. > > > > SMC cache will map a signature to an dp_netdev_flow index in > > flow_table. Thus, we add two new APIs in cmap for lookup key by index > and lookup index by key. > > > > For now, SMC is an experimental feature that it is turned off by > > default. One can turn it on using ovsdb options. > > > > Signed-off-by: Yipeng Wang <yipeng1.w...@intel.com> > > Co-authored-by: Jan Scheurich <jan.scheur...@ericsson.com> > > Signed-off-by: Jan Scheurich <jan.scheur...@ericsson.com> > > --- > > Documentation/topics/dpdk/bridge.rst | 15 ++ > > NEWS | 2 + > > lib/cmap.c | 74 ++++++++ > > lib/cmap.h | 11 ++ > > lib/dpif-netdev-perf.h | 1 + > > lib/dpif-netdev.c | 329 > +++++++++++++++++++++++++++++++---- > > tests/pmd.at | 1 + > > vswitchd/vswitch.xml | 13 ++ > > 8 files changed, 409 insertions(+), 37 deletions(-) > > > > diff --git a/Documentation/topics/dpdk/bridge.rst > > b/Documentation/topics/dpdk/bridge.rst > > index 63f8a62..df74c02 100644 > > --- a/Documentation/topics/dpdk/bridge.rst > > +++ b/Documentation/topics/dpdk/bridge.rst > > @@ -102,3 +102,18 @@ For certain traffic profiles with many parallel > > flows, it's recommended to set ``N`` to '0' to achieve higher > forwarding performance. > > > > For more information on the EMC refer to :doc:`/intro/install/dpdk` . > > + > > + > > +SMC cache (experimental) > > +------------------------- > > + > > +SMC cache or signature match cache is a new cache level after EMC > cache. > > +The difference between SMC and EMC is SMC only stores a signature of > > +a flow thus it is much more memory efficient. With same memory space, > > +EMC can store 8k flows while SMC can store 1M flows. When traffic > > +flow count is much larger than EMC size, it is generally beneficial > > +to turn off EMC and turn on SMC. It is currently turned off by > > +default and an > > experimental feature. > > + > > +To turn on SMC:: > > + > > + $ ovs-vsctl --no-wait set Open_vSwitch . > > + other_config:smc-enable=true > > diff --git a/NEWS b/NEWS > > index 92e9b92..f30a1e0 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -44,6 +44,8 @@ Post-v2.9.0 > > ovs-appctl dpif-netdev/pmd-perf-show > > * Supervision of PMD performance metrics and logging of suspicious > > iterations > > + * Add signature match cache (SMC) as experimental feature. When > > + turned > > on, > > + it improves throughput when traffic has many more flows than EMC > size. > > - ERSPAN: > > * Implemented ERSPAN protocol (draft-foschiano-erspan-00.txt) for > > both kernel datapath and userspace datapath. > > diff --git a/lib/cmap.c b/lib/cmap.c > > index 07719a8..cb9cd32 100644 > > --- a/lib/cmap.c > > +++ b/lib/cmap.c > > @@ -373,6 +373,80 @@ cmap_find(const struct cmap *cmap, uint32_t hash) > > hash); > > } > > > > +/* Find a node by the index of the entry of cmap. Index N means the > > +N/CMAP_K > > + * bucket and N%CMAP_K entry in that bucket. > > + * Notice that it is not protected by the optimistic lock > > +(versioning) because > > + * it does not compare the hashes. Currently it is only used by the > > +datapath > > + * SMC cache. > > + * > > + * Return node for the entry of index or NULL if the index beyond > > +boundary */ const struct cmap_node * cmap_find_by_index(const struct > > +cmap *cmap, uint32_t index) { > > + const struct cmap_impl *impl = cmap_get_impl(cmap); > > + > > + uint32_t b = index / CMAP_K; > > + uint32_t e = index % CMAP_K; > > + > > + if (b > impl->mask) { > > + return NULL; > > + } > > + > > + const struct cmap_bucket *bucket = &impl->buckets[b]; > > + > > + return cmap_node_next(&bucket->nodes[e]); > > +} > > + > > +/* Find the index of certain hash value. Currently only used by the > > +datapath > > + * SMC cache. > > + * > > + * Return the index of the entry if found, or UINT32_MAX if not found. > > +The > > + * function assumes entry index cannot be larger than UINT32_MAX. */ > > +uint32_t cmap_find_index(const struct cmap *cmap, uint32_t hash) { > > + const struct cmap_impl *impl = cmap_get_impl(cmap); > > + uint32_t h1 = rehash(impl, hash); > > + uint32_t h2 = other_hash(h1); > > + > > + uint32_t b_index1 = h1 & impl->mask; > > + uint32_t b_index2 = h2 & impl->mask; > > + > > + uint32_t c1, c2; > > + uint32_t index = UINT32_MAX; > > + > > + const struct cmap_bucket *b1 = &impl->buckets[b_index1]; > > + const struct cmap_bucket *b2 = &impl->buckets[b_index2]; > > + > > + do { > > + do { > > + c1 = read_even_counter(b1); > > + for (int i = 0; i < CMAP_K; i++) { > > + if (b1->hashes[i] == hash) { > > + index = b_index1 * CMAP_K + i; > > + } > > + } > > + } while (OVS_UNLIKELY(counter_changed(b1, c1))); > > + if (index != UINT32_MAX) { > > + break; > > + } > > + do { > > + c2 = read_even_counter(b2); > > + for (int i = 0; i < CMAP_K; i++) { > > + if (b2->hashes[i] == hash) { > > + index = b_index2 * CMAP_K + i; > > + } > > + } > > + } while (OVS_UNLIKELY(counter_changed(b2, c2))); > > + > > + if (index != UINT32_MAX) { > > + break; > > + } > > + } while (OVS_UNLIKELY(counter_changed(b1, c1))); > > + > > + return index; > > +} > > + > > /* Looks up multiple 'hashes', when the corresponding bit in 'map' is > 1, > > * and sets the corresponding pointer in 'nodes', if the hash value was > > * found from the 'cmap'. In other cases the 'nodes' values are not > > changed, diff --git a/lib/cmap.h b/lib/cmap.h index 8bfb6c0..d9db3c9 > > 100644 > > --- a/lib/cmap.h > > +++ b/lib/cmap.h > > @@ -145,6 +145,17 @@ size_t cmap_replace(struct cmap *, struct > > cmap_node *old_node, const struct cmap_node *cmap_find(const struct > > cmap *, uint32_t hash); struct cmap_node *cmap_find_protected(const > > struct cmap *, uint32_t hash); > > > > +/* Find node by index or find index by hash. The 'index' of a cmap > > +entry is a > > + * way to combine the specific bucket and the entry of the bucket > > +into a > > + * convenient single integer value. In other words, it is the index > > +of the > > + * entry and each entry has an unique index. It is not used > > +internally by > > + * cmap. > > + * Currently the functions assume index will not be larger than > > +uint32_t. In > > + * OvS table size is usually much smaller than this size.*/ const > > +struct cmap_node * cmap_find_by_index(const struct cmap *, > > + uint32_t index); uint32_t > > +cmap_find_index(const struct cmap *, uint32_t hash); > > + > > /* Looks up multiple 'hashes', when the corresponding bit in 'map' is > 1, > > * and sets the corresponding pointer in 'nodes', if the hash value was > > * found from the 'cmap'. In other cases the 'nodes' values are not > > changed, diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h > > index b8aa4e3..299d52a 100644 > > --- a/lib/dpif-netdev-perf.h > > +++ b/lib/dpif-netdev-perf.h > > @@ -56,6 +56,7 @@ extern "C" { > > > > enum pmd_stat_type { > > PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). > */ > > + PMD_STAT_SMC_HIT, /* Packets that had a sig match hit (SMC). > */ > > PMD_STAT_MASKED_HIT, /* Packets that matched in the flow table. > */ > > PMD_STAT_MISS, /* Packets that did not match and upcall > was ok. */ > > PMD_STAT_LOST, /* Packets that did not match and upcall > failed. */ > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index > > 8b3556d..13a20f0 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -130,7 +130,9 @@ struct netdev_flow_key { > > uint64_t buf[FLOW_MAX_PACKET_U64S]; }; > > > > -/* Exact match cache for frequently used flows > > +/* EMC cache and SMC cache compose the datapath flow cache (DFC) > > + * > > + * Exact match cache for frequently used flows > > * > > * The cache uses a 32-bit hash of the packet (which can be the RSS > hash) to > > * search its entries for a miniflow that matches exactly the > > miniflow of the @@ > > -144,6 +146,17 @@ struct netdev_flow_key { > > * value is the index of a cache entry where the miniflow could be. > > * > > * > > + * Signature match cache (SMC) > > + * > > + * This cache stores a 16-bit signature for each flow without storing > > + keys, and > > + * stores the corresponding 16-bit flow_table index to the > 'dp_netdev_flow'. > > + * Each flow thus occupies 32bit which is much more memory efficient > > + than > > EMC. > > + * SMC uses a set-associative design that each bucket contains > > + * SMC_ENTRY_PER_BUCKET number of entries. > > + * Since 16-bit flow_table index is used, if there are more than 2^16 > > + * dp_netdev_flow, SMC will miss them that cannot be indexed by a > > + 16-bit > > value. > > + * > > + * > > * Thread-safety > > * ============= > > * > > @@ -156,6 +169,14 @@ struct netdev_flow_key { #define > > EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1) #define > > EM_FLOW_HASH_SEGS 2 > > > > +/* SMC uses a set-associative design. A bucket contains a set of > > +entries that > > + * a flow item can occupy. For now, it uses one hash function rather > > +than two > > + * as for the EMC design. */ > > +#define SMC_ENTRY_PER_BUCKET 4 > > +#define SMC_ENTRIES (1u << 20) > > +#define SMC_BUCKET_CNT (SMC_ENTRIES / SMC_ENTRY_PER_BUCKET) > > #define > > +SMC_MASK (SMC_BUCKET_CNT - 1) > > + > > /* Default EMC insert probability is 1 / > > DEFAULT_EM_FLOW_INSERT_INV_PROB */ #define > DEFAULT_EM_FLOW_INSERT_INV_PROB 100 > > #define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ > > @@ -171,6 +192,21 @@ struct emc_cache { > > int sweep_idx; /* For emc_cache_slow_sweep(). */ > > }; > > > > +struct smc_bucket { > > + uint16_t sig[SMC_ENTRY_PER_BUCKET]; > > + uint16_t flow_idx[SMC_ENTRY_PER_BUCKET]; }; > > + > > +/* Signature match cache, differentiate from EMC cache */ struct > > +smc_cache { > > + struct smc_bucket buckets[SMC_BUCKET_CNT]; }; > > + > > +struct dfc_cache { > > + struct emc_cache emc_cache; > > + struct smc_cache smc_cache; > > +}; > > + > > /* Iterate in the exact match cache through every entry that might > contain a > > * miniflow with hash 'HASH'. */ > > #define EMC_FOR_EACH_POS_WITH_HASH(EMC, CURRENT_ENTRY, HASH) \ @@ > > -215,10 +251,11 @@ static void dpcls_insert(struct dpcls *, struct > > dpcls_rule *, > > const struct netdev_flow_key *mask); static > > void dpcls_remove(struct dpcls *, struct dpcls_rule *); static bool > > dpcls_lookup(struct dpcls *cls, > > - const struct netdev_flow_key keys[], > > + const struct netdev_flow_key *keys[], > > struct dpcls_rule **rules, size_t cnt, > > int *num_lookups_p); > > - > > > +static bool dpcls_rule_matches_key(const struct dpcls_rule *rule, > > + const struct netdev_flow_key *target); > > /* Set of supported meter flags */ > > #define DP_SUPPORTED_METER_FLAGS_MASK \ > > (OFPMF13_STATS | OFPMF13_PKTPS | OFPMF13_KBPS | OFPMF13_BURST) @@ > > -285,6 +322,8 @@ struct dp_netdev { > > OVS_ALIGNED_VAR(CACHE_LINE_SIZE) atomic_uint32_t emc_insert_min; > > /* Enable collection of PMD performance metrics. */ > > atomic_bool pmd_perf_metrics; > > + /* Enable the SMC cache from ovsdb config */ > > + atomic_bool smc_enable_db; > > > > /* Protects access to ofproto-dpif-upcall interface during > revalidator > > * thread synchronization. */ > > @@ -587,7 +626,7 @@ struct dp_netdev_pmd_thread { > > * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly > > * need to be protected by 'non_pmd_mutex'. Every other instance > > * will only be accessed by its own pmd thread. */ > > - struct emc_cache flow_cache; > > + OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct dfc_cache flow_cache; > > > > /* Flow-Table and classifiers > > * > > @@ -755,6 +794,7 @@ static int dpif_netdev_xps_get_tx_qid(const struct > > dp_netdev_pmd_thread *pmd, > > > > static inline bool emc_entry_alive(struct emc_entry *ce); static > > void emc_clear_entry(struct emc_entry *ce); > > +static void smc_clear_entry(struct smc_bucket *b, int idx); > > > > static void dp_netdev_request_reconfigure(struct dp_netdev *dp); > > static inline bool @@ -777,6 +817,24 @@ emc_cache_init(struct > > emc_cache *flow_cache) } > > > > static void > > +smc_cache_init(struct smc_cache *smc_cache) { > > + int i, j; > > + for (i = 0; i < SMC_BUCKET_CNT; i++) { > > + for (j = 0; j < SMC_ENTRY_PER_BUCKET; j++) { > > + smc_cache->buckets[i].flow_idx[j] = UINT16_MAX; > > + } > > + } > > +} > > + > > +static void > > +dfc_cache_init(struct dfc_cache *flow_cache) { > > + emc_cache_init(&flow_cache->emc_cache); > > + smc_cache_init(&flow_cache->smc_cache); > > +} > > + > > +static void > > emc_cache_uninit(struct emc_cache *flow_cache) { > > int i; > > @@ -786,6 +844,25 @@ emc_cache_uninit(struct emc_cache *flow_cache) > > } > > } > > > > +static void > > +smc_cache_uninit(struct smc_cache *smc) { > > + int i, j; > > + > > + for (i = 0; i < SMC_BUCKET_CNT; i++) { > > + for (j = 0; j < SMC_ENTRY_PER_BUCKET; j++) { > > + smc_clear_entry(&(smc->buckets[i]), j); > > + } > > + } > > +} > > + > > +static void > > +dfc_cache_uninit(struct dfc_cache *flow_cache) { > > + smc_cache_uninit(&flow_cache->smc_cache); > > + emc_cache_uninit(&flow_cache->emc_cache); > > +} > > + > > /* Check and clear dead flow references slowly (one entry at each > > * invocation). */ > > static void > > @@ -897,6 +974,7 @@ pmd_info_show_stats(struct ds *reply, > > " packet recirculations: %"PRIu64"\n" > > " avg. datapath passes per packet: %.02f\n" > > " emc hits: %"PRIu64"\n" > > + " smc hits: %"PRIu64"\n" > > " megaflow hits: %"PRIu64"\n" > > " avg. subtable lookups per megaflow hit: %.02f\n" > > " miss with success upcall: %"PRIu64"\n" > > @@ -904,6 +982,7 @@ pmd_info_show_stats(struct ds *reply, > > " avg. packets per output batch: %.02f\n", > > total_packets, stats[PMD_STAT_RECIRC], > > passes_per_pkt, stats[PMD_STAT_EXACT_HIT], > > + stats[PMD_STAT_SMC_HIT], > > stats[PMD_STAT_MASKED_HIT], lookups_per_hit, > > stats[PMD_STAT_MISS], stats[PMD_STAT_LOST], > > packets_per_batch); @@ -1617,6 +1696,7 @@ > > dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats > > *stats) > > stats->n_flows += cmap_count(&pmd->flow_table); > > pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); > > stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; > > + stats->n_hit += pmd_stats[PMD_STAT_SMC_HIT]; > > stats->n_hit += pmd_stats[PMD_STAT_MASKED_HIT]; > > stats->n_missed += pmd_stats[PMD_STAT_MISS]; > > stats->n_lost += pmd_stats[PMD_STAT_LOST]; @@ -2721,10 > > +2801,11 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd, > > * probability of 1/100 ie. 1% */ > > > > uint32_t min; > > + > > atomic_read_relaxed(&pmd->dp->emc_insert_min, &min); > > > > if (min && random_uint32() <= min) { > > - emc_insert(&pmd->flow_cache, key, flow); > > + emc_insert(&(pmd->flow_cache).emc_cache, key, flow); > > } > > } > > > > @@ -2746,6 +2827,86 @@ emc_lookup(struct emc_cache *cache, const > > struct netdev_flow_key *key) > > return NULL; > > } > > > > +static inline const struct cmap_node * smc_entry_get(struct > > +dp_netdev_pmd_thread *pmd, const uint32_t hash) { > > + struct smc_cache *cache = &(pmd->flow_cache).smc_cache; > > + struct smc_bucket *bucket = &cache->buckets[hash & SMC_MASK]; > > + uint16_t sig = hash >> 16; > > + uint16_t index = UINT16_MAX; > > + > > + for (int i = 0; i < SMC_ENTRY_PER_BUCKET; i++) { > > + if (bucket->sig[i] == sig) { > > + index = bucket->flow_idx[i]; > > + break; > > + } > > + } > > + if (index != UINT16_MAX) { > > + return cmap_find_by_index(&pmd->flow_table, index); > > + } > > + return NULL; > > +} > > + > > +static void > > +smc_clear_entry(struct smc_bucket *b, int idx) { > > + b->flow_idx[idx] = UINT16_MAX; > > +} > > + > > +/* Insert the flow_table index into SMC. Insertion may fail when 1) > > +SMC is > > + * turned off, 2) the flow_table index is larger than uint16_t can > handle. > > + * If there is already an SMC entry having same signature, the index > > +will be > > + * updated. If there is no existing entry, but an empty entry is > > +available, > > + * the empty entry will be taken. If no empty entry or existing same > > +signature, > > + * a random entry from the hashed bucket will be picked. */ static > > +inline void smc_insert(struct dp_netdev_pmd_thread *pmd, > > + const struct netdev_flow_key *key, > > + uint32_t hash) > > +{ > > + struct smc_cache *smc_cache = &(pmd->flow_cache).smc_cache; > > + struct smc_bucket *bucket = &smc_cache->buckets[key->hash & > > SMC_MASK]; > > + uint16_t index; > > + uint32_t cmap_index; > > + bool smc_enable_db; > > + int i; > > + > > + atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db); > > + if (!smc_enable_db) { > > + return; > > + } > > + > > + cmap_index = cmap_find_index(&pmd->flow_table, hash); > > + index = (cmap_index >= UINT16_MAX) ? UINT16_MAX : > > + (uint16_t)cmap_index; > > + > > + /* If the index is larger than SMC can handle (uint16_t), we don't > > + * insert */ > > + if (index == UINT16_MAX) { > > + return; > > + } > > + > > + /* If an entry with same signature already exists, update the index > */ > > + uint16_t sig = key->hash >> 16; > > + for (i = 0; i < SMC_ENTRY_PER_BUCKET; i++) { > > + if (bucket->sig[i] == sig) { > > + bucket->flow_idx[i] = index; > > + return; > > + } > > + } > > + /* If there is an empty entry, occupy it. */ > > + for (i = 0; i < SMC_ENTRY_PER_BUCKET; i++) { > > + if (bucket->flow_idx[i] == UINT16_MAX) { > > + bucket->sig[i] = sig; > > + bucket->flow_idx[i] = index; > > + return; > > + } > > + } > > + /* Otherwise, pick a random entry. */ > > + i = random_uint32() % SMC_ENTRY_PER_BUCKET; > > + bucket->sig[i] = sig; > > + bucket->flow_idx[i] = index; > > +} > > + > > static struct dp_netdev_flow * > > dp_netdev_pmd_lookup_flow(struct dp_netdev_pmd_thread *pmd, > > const struct netdev_flow_key *key, @@ > > -2759,7 +2920,7 @@ dp_netdev_pmd_lookup_flow(struct > > dp_netdev_pmd_thread *pmd, > > > > cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port); > > if (OVS_LIKELY(cls)) { > > - dpcls_lookup(cls, key, &rule, 1, lookup_num_p); > > + dpcls_lookup(cls, &key, &rule, 1, lookup_num_p); > > netdev_flow = dp_netdev_flow_cast(rule); > > } > > return netdev_flow; > > @@ -3606,6 +3767,17 @@ dpif_netdev_set_config(struct dpif *dpif, const > > struct smap *other_config) > > } > > } > > > > + bool smc_enable = smap_get_bool(other_config, "smc-enable", false); > > + bool cur_smc; > > + atomic_read_relaxed(&dp->smc_enable_db, &cur_smc); > > + if (smc_enable != cur_smc) { > > + atomic_store_relaxed(&dp->smc_enable_db, smc_enable); > > + if (smc_enable) { > > + VLOG_INFO("SMC cache is enabled"); > > + } else { > > + VLOG_INFO("SMC cache is disabled"); > > + } > > + } > > return 0; > > } > > > > @@ -4740,7 +4912,7 @@ pmd_thread_main(void *f_) > > ovs_numa_thread_setaffinity_core(pmd->core_id); > > dpdk_set_lcore_id(pmd->core_id); > > poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list); > > - emc_cache_init(&pmd->flow_cache); > > + dfc_cache_init(&pmd->flow_cache); > > reload: > > pmd_alloc_static_tx_qid(pmd); > > > > @@ -4794,7 +4966,7 @@ reload: > > coverage_try_clear(); > > dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt); > > if (!ovsrcu_try_quiesce()) { > > - emc_cache_slow_sweep(&pmd->flow_cache); > > + emc_cache_slow_sweep(&((pmd->flow_cache).emc_cache)); > > } > > > > atomic_read_relaxed(&pmd->reload, &reload); @@ -4819,7 > > +4991,7 @@ reload: > > goto reload; > > } > > > > - emc_cache_uninit(&pmd->flow_cache); > > + dfc_cache_uninit(&pmd->flow_cache); > > free(poll_list); > > pmd_free_cached_ports(pmd); > > return NULL; > > @@ -5255,7 +5427,7 @@ dp_netdev_configure_pmd(struct > > dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, > > /* init the 'flow_cache' since there is no > > * actual thread created for NON_PMD_CORE_ID. */ > > if (core_id == NON_PMD_CORE_ID) { > > - emc_cache_init(&pmd->flow_cache); > > + dfc_cache_init(&pmd->flow_cache); > > pmd_alloc_static_tx_qid(pmd); > > } > > pmd_perf_stats_init(&pmd->perf_stats); > > @@ -5298,7 +5470,7 @@ dp_netdev_del_pmd(struct dp_netdev *dp, struct > > dp_netdev_pmd_thread *pmd) > > * but extra cleanup is necessary */ > > if (pmd->core_id == NON_PMD_CORE_ID) { > > ovs_mutex_lock(&dp->non_pmd_mutex); > > - emc_cache_uninit(&pmd->flow_cache); > > + dfc_cache_uninit(&pmd->flow_cache); > > pmd_free_cached_ports(pmd); > > pmd_free_static_tx_qid(pmd); > > ovs_mutex_unlock(&dp->non_pmd_mutex); > > @@ -5602,10 +5774,72 @@ dp_netdev_queue_batches(struct dp_packet *pkt, > > packet_batch_per_flow_update(batch, pkt, tcp_flags); } > > > > -/* Try to process all ('cnt') the 'packets' using only the exact > > match cache > > +/* SMC lookup function for a batch of packets. > > + * By doing batching SMC lookup, we can use prefetch > > + * to hide memory access latency. > > + */ > > +static inline void > > +smc_lookup_batch(struct dp_netdev_pmd_thread *pmd, > > + struct netdev_flow_key *keys, > > + struct netdev_flow_key **missed_keys, > > + struct dp_packet_batch *packets_, > > + struct packet_batch_per_flow batches[], > > + size_t *n_batches, const int cnt) { > > + int i; > > + struct dp_packet *packet; > > + size_t n_smc_hit = 0, n_missed = 0; > > + struct dfc_cache *cache = &pmd->flow_cache; > > + struct smc_cache *smc_cache = &cache->smc_cache; > > + const struct cmap_node *flow_node; > > + > > + /* Prefetch buckets for all packets */ > > + for (i = 0; i < cnt; i++) { > > + OVS_PREFETCH(&smc_cache->buckets[keys[i].hash & SMC_MASK]); > > + } > > + > > + DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) { > > + struct dp_netdev_flow *flow = NULL; > > + flow_node = smc_entry_get(pmd, keys[i].hash); > > + bool hit = false; > > + > > + if (OVS_LIKELY(flow_node != NULL)) { > > + CMAP_NODE_FOR_EACH (flow, node, flow_node) { > > + /* Since we dont have per-port megaflow to check the > port > > + * number, we need to verify that the input ports > match. */ > > + if (OVS_LIKELY(dpcls_rule_matches_key(&flow->cr, > &keys[i]) && > > + flow->flow.in_port.odp_port == packet- > >md.in_port.odp_port)) { > > + /* SMC hit and emc miss, we insert into EMC */ > > + emc_probabilistic_insert(pmd, &keys[i], flow); > > + keys[i].len = > > + > netdev_flow_key_size(miniflow_n_values(&keys[i].mf)); > > + dp_netdev_queue_batches(packet, flow, > > + miniflow_get_tcp_flags(&keys[i].mf), batches, > n_batches); > > + n_smc_hit++; > > + hit = true; > > + break; > > + } > > + } > > + if (hit) { > > + continue; > > + } > > + } > > + > > + /* SMC missed. Group missed packets together at > > + * the beginning of the 'packets' array. */ > > + dp_packet_batch_refill(packets_, packet, i); > > + /* Put missed keys to the pointer arrays return to the caller > */ > > + missed_keys[n_missed++] = &keys[i]; > > + } > > + > > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, > > +n_smc_hit); } > > + > > +/* Try to process all ('cnt') the 'packets' using only the datapath > > +flow cache > > * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', > the > > * miniflow is copied into 'keys' and the packet pointer is moved at > > the > > - * beginning of the 'packets' array. > > + * beginning of the 'packets' array. The pointers of missed keys are > > + put in the > > + * missed_keys pointer array for future processing. > > * > > * The function returns the number of packets that needs to be > processed in the > > * 'packets' array (they have been moved to the beginning of the > vector). > > @@ -5617,21 +5851,24 @@ dp_netdev_queue_batches(struct dp_packet *pkt, > > * will be ignored. > > */ > > static inline size_t > > -emc_processing(struct dp_netdev_pmd_thread *pmd, > > +dfc_processing(struct dp_netdev_pmd_thread *pmd, > > struct dp_packet_batch *packets_, > > struct netdev_flow_key *keys, > > + struct netdev_flow_key **missed_keys, > > struct packet_batch_per_flow batches[], size_t > *n_batches, > > bool md_is_valid, odp_port_t port_no) { > > - struct emc_cache *flow_cache = &pmd->flow_cache; > > struct netdev_flow_key *key = &keys[0]; > > - size_t n_missed = 0, n_dropped = 0; > > + size_t n_missed = 0, n_emc_hit = 0; > > + struct dfc_cache *cache = &pmd->flow_cache; > > struct dp_packet *packet; > > const size_t cnt = dp_packet_batch_size(packets_); > > uint32_t cur_min; > > int i; > > uint16_t tcp_flags; > > + bool smc_enable_db; > > > > + atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db); > > atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min); > > pmd_perf_update_counter(&pmd->perf_stats, > > md_is_valid ? PMD_STAT_RECIRC : > > PMD_STAT_RECV, @@ - > > 5643,7 +5880,6 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, > > > > if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) { > > dp_packet_delete(packet); > > - n_dropped++; > > continue; > > } > > > > @@ -5671,15 +5907,17 @@ emc_processing(struct dp_netdev_pmd_thread > > *pmd, > > > > miniflow_extract(packet, &key->mf); > > key->len = 0; /* Not computed yet. */ > > - /* If EMC is disabled skip hash computation and emc_lookup */ > > - if (cur_min) { > > + /* If EMC and SMC disabled skip hash computation */ > > + if (smc_enable_db == true || cur_min != 0) { > > if (!md_is_valid) { > > key->hash = > dpif_netdev_packet_get_rss_hash_orig_pkt(packet, > > &key->mf); > > } else { > > key->hash = dpif_netdev_packet_get_rss_hash(packet, > &key->mf); > > } > > - flow = emc_lookup(flow_cache, key); > > + } > > + if (cur_min) { > > + flow = emc_lookup(&cache->emc_cache, key); > > } else { > > flow = NULL; > > } > > @@ -5687,19 +5925,30 @@ emc_processing(struct dp_netdev_pmd_thread > > *pmd, > > tcp_flags = miniflow_get_tcp_flags(&key->mf); > > dp_netdev_queue_batches(packet, flow, tcp_flags, batches, > > n_batches); > > + n_emc_hit++; > > } else { > > /* Exact match cache missed. Group missed packets together > at > > * the beginning of the 'packets' array. */ > > dp_packet_batch_refill(packets_, packet, i); > > /* 'key[n_missed]' contains the key of the current packet > and it > > - * must be returned to the caller. The next key should be > extracted > > - * to 'keys[n_missed + 1]'. */ > > + * will be passed to SMC lookup. The next key should be > extracted > > + * to 'keys[n_missed + 1]'. > > + * We also maintain a pointer array to keys missed both SMC > and EMC > > + * which will be returned to the caller for future > processing. */ > > + missed_keys[n_missed] = key; > > key = &keys[++n_missed]; > > } > > } > > > > - pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, > > - cnt - n_dropped - n_missed); > > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, > > + n_emc_hit); > > + > > + if (!smc_enable_db) { > > + return dp_packet_batch_size(packets_); > > + } > > + > > + /* Packets miss EMC will do a batch lookup in SMC if enabled */ > > + smc_lookup_batch(pmd, keys, missed_keys, packets_, batches, > > + n_batches, n_missed); > > > > return dp_packet_batch_size(packets_); } @@ -5767,6 +6016,8 @@ > > handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, > > add_actions->size); > > } > > ovs_mutex_unlock(&pmd->flow_mutex); > > + uint32_t hash = dp_netdev_flow_hash(&netdev_flow->ufid); > > + smc_insert(pmd, key, hash); > > emc_probabilistic_insert(pmd, key, netdev_flow); > > } > > if (pmd_perf_metrics_enabled(pmd)) { @@ -5783,7 +6034,7 @@ > > handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, static inline > > void fast_path_processing(struct dp_netdev_pmd_thread *pmd, > > struct dp_packet_batch *packets_, > > - struct netdev_flow_key *keys, > > + struct netdev_flow_key **keys, > > struct packet_batch_per_flow batches[], > > size_t *n_batches, > > odp_port_t in_port) @@ -5805,12 +6056,13 @@ > > fast_path_processing(struct dp_netdev_pmd_thread *pmd, > > > > for (size_t i = 0; i < cnt; i++) { > > /* Key length is needed in all the cases, hash computed on > demand. */ > > - keys[i].len = > netdev_flow_key_size(miniflow_n_values(&keys[i].mf)); > > + keys[i]->len = > > + netdev_flow_key_size(miniflow_n_values(&keys[i]->mf)); > > } > > /* Get the classifier for the in_port */ > > cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port); > > if (OVS_LIKELY(cls)) { > > - any_miss = !dpcls_lookup(cls, keys, rules, cnt, &lookup_cnt); > > + any_miss = !dpcls_lookup(cls, (const struct netdev_flow_key > **)keys, > > + rules, cnt, &lookup_cnt); > > } else { > > any_miss = true; > > memset(rules, 0, sizeof(rules)); @@ -5832,7 +6084,7 @@ > > fast_path_processing(struct dp_netdev_pmd_thread *pmd, > > /* It's possible that an earlier slow path execution > installed > > * a rule covering this flow. In this case, it's a lot > cheaper > > * to catch it here than execute a miss. */ > > - netdev_flow = dp_netdev_pmd_lookup_flow(pmd, &keys[i], > > + netdev_flow = dp_netdev_pmd_lookup_flow(pmd, keys[i], > > &add_lookup_cnt); > > if (netdev_flow) { > > lookup_cnt += add_lookup_cnt; @@ -5840,7 +6092,7 @@ > > fast_path_processing(struct dp_netdev_pmd_thread *pmd, > > continue; > > } > > > > - int error = handle_packet_upcall(pmd, packet, &keys[i], > > + int error = handle_packet_upcall(pmd, packet, keys[i], > > &actions, &put_actions); > > > > if (OVS_UNLIKELY(error)) { @@ -5870,10 +6122,12 @@ > > fast_path_processing(struct dp_netdev_pmd_thread *pmd, > > } > > > > flow = dp_netdev_flow_cast(rules[i]); > > + uint32_t hash = dp_netdev_flow_hash(&flow->ufid); > > + smc_insert(pmd, keys[i], hash); > > > > - emc_probabilistic_insert(pmd, &keys[i], flow); > > + emc_probabilistic_insert(pmd, keys[i], flow); > > dp_netdev_queue_batches(packet, flow, > > - miniflow_get_tcp_flags(&keys[i].mf), > > + miniflow_get_tcp_flags(&keys[i]->mf), > > batches, n_batches); > > } > > > > @@ -5904,17 +6158,18 @@ dp_netdev_input__(struct dp_netdev_pmd_thread > > *pmd, #endif > > OVS_ALIGNED_VAR(CACHE_LINE_SIZE) > > struct netdev_flow_key keys[PKT_ARRAY_SIZE]; > > + struct netdev_flow_key *missed_keys[PKT_ARRAY_SIZE]; > > struct packet_batch_per_flow batches[PKT_ARRAY_SIZE]; > > size_t n_batches; > > odp_port_t in_port; > > > > n_batches = 0; > > - emc_processing(pmd, packets, keys, batches, &n_batches, > > + dfc_processing(pmd, packets, keys, missed_keys, batches, > > + &n_batches, > > md_is_valid, port_no); > > if (!dp_packet_batch_is_empty(packets)) { > > /* Get ingress port from first packet's metadata. */ > > in_port = packets->packets[0]->md.in_port.odp_port; > > - fast_path_processing(pmd, packets, keys, > > + fast_path_processing(pmd, packets, missed_keys, > > batches, &n_batches, in_port); > > } > > > > @@ -6864,7 +7119,7 @@ dpcls_remove(struct dpcls *cls, struct > > dpcls_rule > > *rule) > > > > /* Returns true if 'target' satisfies 'key' in 'mask', that is, if each > 1-bit > > * in 'mask' the values in 'key' and 'target' are the same. */ > > -static inline bool > > +static bool > > dpcls_rule_matches_key(const struct dpcls_rule *rule, > > const struct netdev_flow_key *target) { @@ > > -6891,7 +7146,7 @@ dpcls_rule_matches_key(const struct dpcls_rule *rule, > > * > > * Returns true if all miniflows found a corresponding rule. */ > > static bool - dpcls_lookup(struct dpcls *cls, const struct > > netdev_flow_key keys[], > > +dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key *keys[], > > struct dpcls_rule **rules, const size_t cnt, > > int *num_lookups_p) > > { > > @@ -6930,7 +7185,7 @@ dpcls_lookup(struct dpcls *cls, const struct > > netdev_flow_key keys[], > > * masked with the subtable's mask to avoid hashing the > wildcarded > > * bits. */ > > ULLONG_FOR_EACH_1(i, keys_map) { > > - hashes[i] = netdev_flow_key_hash_in_mask(&keys[i], > > + hashes[i] = netdev_flow_key_hash_in_mask(keys[i], > > &subtable->mask); > > } > > /* Lookup. */ > > @@ -6944,7 +7199,7 @@ dpcls_lookup(struct dpcls *cls, const struct > > netdev_flow_key keys[], > > struct dpcls_rule *rule; > > > > CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) { > > - if (OVS_LIKELY(dpcls_rule_matches_key(rule, &keys[i]))) > { > > + if (OVS_LIKELY(dpcls_rule_matches_key(rule, > > + keys[i]))) { > > rules[i] = rule; > > /* Even at 20 Mpps the 32-bit hit_cnt cannot wrap > > * within one second optimization interval. */ > > diff --git a/tests/pmd.at b/tests/pmd.at index f3fac63..60452f5 100644 > > --- a/tests/pmd.at > > +++ b/tests/pmd.at > > @@ -185,6 +185,7 @@ CHECK_PMD_THREADS_CREATED() AT_CHECK([ovs- appctl > > vlog/set dpif_netdev:dbg]) AT_CHECK([ovs-ofctl add-flow br0 > > action=normal]) AT_CHECK([ovs-vsctl set Open_vSwitch . > > other_config:emc- > > insert-inv-prob=1]) > > +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:smc-enable=true]) > > > > sleep 1 > > > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index > > 63a3a2e..6342949 100644 > > --- a/vswitchd/vswitch.xml > > +++ b/vswitchd/vswitch.xml > > @@ -405,6 +405,19 @@ > > </p> > > </column> > > > > + <column name="other_config" key="smc-enable" > > + type='{"type": "boolean"}'> > > + <p> > > + Signature match cache or SMC is a cache between EMC and > megaflow > > + cache. It does not store the full key of the flow, so it is > more > > + memory efficient comparing to EMC cache. SMC is especially > useful > > + when flow count is larger than EMC capacity. > > + </p> > > + <p> > > + Defaults to false but can be changed at any time. > > + </p> > > + </column> > > + > > <column name="other_config" key="n-handler-threads" > > type='{"type": "integer", "minInteger": 1}'> > > <p> > > -- > > 2.7.4 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev