[ovs-dev] [ovs-dev v3 4/4] dpif-netdev: fix inconsistent processing between ukey and megaflow

2022-09-23 Thread Peng He
When PMDs perform upcalls, the newly generated ukey replaces the old
one; however, the newly generated megaflow is discarded and the old one
is reused, without checking whether the actions of the new and old
megaflows are equal.

This can happen, for example, when someone runs dpctl/add-flow and
installs a datapath flow whose actions are inconsistent with the ukey's
actions, which causes further confusion. This patch prevents the
inconsistency by comparing the actions and updating the old megaflow
when they differ.

Signed-off-by: Peng He 
---
 lib/dpif-netdev.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a45b46014..b316e59ef 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -8304,7 +8304,22 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
  * to be locking revalidators out of making flow modifications. */
 ovs_mutex_lock(&pmd->flow_mutex);
 netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
-if (OVS_LIKELY(!netdev_flow)) {
+if (OVS_UNLIKELY(netdev_flow)) {
+struct dp_netdev_actions *old_act =
+dp_netdev_flow_get_actions(netdev_flow);
+
+if ((add_actions->size != old_act->size) ||
+memcmp(old_act->actions, add_actions->data,
+ add_actions->size)) {
+
+   struct dp_netdev_actions *new_act =
+   dp_netdev_actions_create(add_actions->data,
+add_actions->size);
+
+   ovsrcu_set(&netdev_flow->actions, new_act);
+   ovsrcu_postpone(dp_netdev_actions_free, old_act);
+}
+} else {
 netdev_flow = dp_netdev_flow_add(pmd, &match, &ufid,
  add_actions->data,
  add_actions->size, orig_in_port);
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [ovs-dev v3 3/4] ofproto-dpif-upcall: new ukey needs to take the old ukey's dump seq

2022-09-23 Thread Peng He
The userspace datapath manages all megaflows in a cmap. The cmap data
structure grows/shrinks during datapath processing and re-positions
megaflows as it does so. As a result, two revalidator threads might
process the same megaflow during one dump stage.

Consider a situation in which revalidator 1 processes megaflow A and
decides to delete it from the datapath, while megaflow A is also queued
in the processing batch of revalidator 2. Normally it is fine for
revalidators to process the same megaflow multiple times, because its
dump_seq shows it has already been dumped and the stats will not be
contributed twice.

Assume that right after A is deleted, a PMD thread generates a new
megaflow B with the same match and actions as A. The ukey of megaflow B
replaces the ukey of megaflow A. Ukey B is new to the revalidator
system and its dump_seq is 0.

Since the dump_seq of ukey B is 0, revalidator 2, when processing
megaflow A, will not recognize that megaflow A has already been dumped
by revalidator 1 and will contribute the old megaflow A's stats again.
This results in inconsistent stats between ukeys and megaflows.

To fix this, the newly generated ukey B should take the dump_seq of the
replaced ukey A, so that the same megaflow is not revalidated twice in
one dump stage.
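
For reference, the dump_seq check that prevents double counting looks
roughly like the sketch below. This is a simplified illustration with
made-up type and helper names, not the actual OVS code; it only shows
how comparing ukey->dump_seq against the current dump's sequence number
keeps stats from being contributed twice.

    /* Minimal sketch (hypothetical names): why ukey->dump_seq prevents
     * double-counting within one dump round, and why a replacement ukey
     * whose dump_seq is 0 defeats that check. */
    #include <stdbool.h>
    #include <stdint.h>

    struct ukey_sketch {
        uint64_t dump_seq;  /* Last dump round in which this ukey was seen. */
    };

    static bool
    should_push_stats(struct ukey_sketch *ukey, uint64_t dump_seq)
    {
        if (ukey->dump_seq == dump_seq) {
            /* Already dumped in this round: skip, stats counted once. */
            return false;
        }
        ukey->dump_seq = dump_seq;  /* Mark as dumped for this round. */
        return true;
    }

With the change below, the replacement ukey inherits the old ukey's
dump_seq, so a check of this form still recognizes megaflow A as
already dumped.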

We observed in a production environment that the OpenFlow rules' stats
are sometimes amplified compared to the actual values. I believe this
is also why there is sometimes a mismatch between the ukey's and the
megaflow's stats. Eelco's patch
[ovs-dev] [PATCH v2 09/10] revalidator: Fix datapath statistics update
tried to fix this in the past.

Signed-off-by: Peng He 
---
 ofproto/ofproto-dpif-upcall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index e8bbcfeaf..89fad1bdf 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -1877,6 +1877,7 @@ try_ukey_replace(struct umap *umap, struct udpif_key *old_ukey,
 ovs_mutex_lock(&new_ukey->mutex);
 cmap_replace(&umap->cmap, &old_ukey->cmap_node,
  &new_ukey->cmap_node, new_ukey->hash);
+new_ukey->dump_seq = old_ukey->dump_seq;
 ovsrcu_postpone(ukey_delete__, old_ukey);
 transition_ukey(old_ukey, UKEY_DELETED);
 transition_ukey(new_ukey, UKEY_VISIBLE);
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [ovs-dev v3 2/4] ofproto-dpif-upcall: fix race condition

2022-09-23 Thread Peng He
There is a race condition between the revalidator threads and
the handler/pmd threads.

1. A revalidator, in push_dp_ops(), deletes a ukey and issues a delete
   operation for the corresponding datapath megaflow.

2. Meanwhile, a PMD thread handles an upcall for the same traffic,
   generates a new ukey and replaces the old one; the old ukey's state
   is now UKEY_DELETED.

3. The datapath delete operation succeeds and the revalidator tries to
   transition the old ukey's state to UKEY_EVICTED. However, the state
   is already UKEY_DELETED, so OVS aborts.

I did not observe this in a real environment, since it takes time for
PMDs to finish the upcall and replace the old ukey; normally the
revalidator changes the ukey state to UKEY_EVICTED first. It is still
better to cover this case.
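
The fix below relies on the ukey lifecycle states being defined as
increasing integer values, so that "state < UKEY_EVICTED" means "this
ukey has not yet been evicted or deleted". A rough, abridged sketch of
that idea is shown here; it is illustrative only, and the real enum in
ofproto-dpif-upcall.c may contain additional states.

    #include <stdbool.h>

    /* Abridged sketch of the ukey lifecycle ordering. */
    enum ukey_state_sketch {
        UKEY_CREATED = 0,
        UKEY_VISIBLE,       /* Ukey is installed in the umap. */
        UKEY_OPERATIONAL,   /* Ukey and datapath flow are in sync. */
        UKEY_EVICTING,      /* Datapath flow delete is queued. */
        UKEY_EVICTED,       /* Datapath flow delete has completed. */
        UKEY_DELETED,       /* Ukey removed from umap, free deferred. */
    };

    /* Only move forward to UKEY_EVICTED if another thread has not
     * already evicted or deleted the ukey (e.g. replaced it during an
     * upcall). */
    static inline bool
    may_evict(enum ukey_state_sketch state)
    {
        return state < UKEY_EVICTED;
    }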

Signed-off-by: Peng He 
---
 ofproto/ofproto-dpif-upcall.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 7ea2a30f5..e8bbcfeaf 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -2420,7 +2420,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops)
 if (op->dop.error) {
 if (op->ukey) {
 ovs_mutex_lock(&op->ukey->mutex);
-transition_ukey(op->ukey, UKEY_EVICTED);
+if (op->ukey->state < UKEY_EVICTED) {
+transition_ukey(op->ukey, UKEY_EVICTED);
+}
 ovs_mutex_unlock(&op->ukey->mutex);
 }
 /* if it's a flow_del error, 'stats' is unusable, it's ok
@@ -2441,7 +2443,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops)
 
 if (op->ukey) {
 ovs_mutex_lock(&op->ukey->mutex);
-transition_ukey(op->ukey, UKEY_EVICTED);
+if (op->ukey->state < UKEY_EVICTED) {
+transition_ukey(op->ukey, UKEY_EVICTED);
+}
 push->used = MAX(stats->used, op->ukey->stats.used);
 push->tcp_flags = stats->tcp_flags | op->ukey->stats.tcp_flags;
 push->n_packets = stats->n_packets - op->ukey->stats.n_packets;
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [ovs-dev v3 1/4] ofproto-dpif-upcall: fix push_dp_ops

2022-09-23 Thread Peng He
push_dp_ops() only handles errors from delete operations and ignores
the results of modify operations. It's better to handle all datapath
operation errors in a consistent way.

We observed in a production environment that a megaflow with wrong
actions sometimes stays in the datapath. The coverage command shows
that revalidators have dumped several times, yet the correct actions
are still not set. This implies that the ukey's actions do not equal
the megaflow's, i.e. revalidators think the underlying megaflow's
actions are correct when they are not.

We also checked the megaflow using the ofproto/trace command, and the
actions did not match the ones in the actual megaflow. After running
the revalidator/purge command, the right actions were set.

This patch prevents the inconsistency by making revalidators take
modify failures into account.

Signed-off-by: Peng He 
---
 ofproto/ofproto-dpif-upcall.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 7ad728adf..7ea2a30f5 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -2416,23 +2416,26 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops)
 
 for (i = 0; i < n_ops; i++) {
 struct ukey_op *op = &ops[i];
-struct dpif_flow_stats *push, *stats, push_buf;
-
-stats = op->dop.flow_del.stats;
-push = &push_buf;
-
-if (op->dop.type != DPIF_OP_FLOW_DEL) {
-/* Only deleted flows need their stats pushed. */
-continue;
-}
 
 if (op->dop.error) {
-/* flow_del error, 'stats' is unusable. */
 if (op->ukey) {
 ovs_mutex_lock(&op->ukey->mutex);
 transition_ukey(op->ukey, UKEY_EVICTED);
 ovs_mutex_unlock(&op->ukey->mutex);
 }
+/* if it's a flow_del error, 'stats' is unusable, it's ok
+ * to discard the stats.
+ */
+continue;
+}
+
+struct dpif_flow_stats *push, *stats, push_buf;
+
+stats = op->dop.flow_del.stats;
+push = &push_buf;
+
+if (op->dop.type != DPIF_OP_FLOW_DEL) {
+/* Only deleted flows need their stats pushed. */
 continue;
 }
 
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 02/16] slab: Introduce kmalloc_size_roundup()

2022-09-23 Thread 0-day Robot
Bleep bloop.  Greetings Kees Cook, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 01/16] slab: Remove __malloc attribute from realloc functions

2022-09-23 Thread 0-day Robot
Bleep bloop.  Greetings Kees Cook, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: sha1 information is lacking or useless (include/linux/compiler_types.h).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 slab: Remove __malloc attribute from realloc functions
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 15/16] mm: Make ksize() a reporting-only function

2022-09-23 Thread Kees Cook
With all "silently resizing" callers of ksize() refactored, remove the
logic in ksize() that would allow it to be used to effectively change
the size of an allocation (bypassing __alloc_size hints, etc). Users
wanting this feature need to either use kmalloc_size_roundup() before an
allocation, or call krealloc() directly.

For kfree_sensitive(), move the unpoisoning logic inline. Replace the
open-coded ksize() in __do_krealloc with ksize() now that it doesn't
perform unpoisoning.

Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Roman Gushchin 
Cc: Hyeonggon Yoo <42.hye...@gmail.com>
Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Cc: Vincenzo Frascino 
Cc: linux...@kvack.org
Cc: kasan-...@googlegroups.com
Signed-off-by: Kees Cook 
---
 mm/slab_common.c | 38 ++
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index d7420cf649f8..60b77bcdc2e3 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1160,13 +1160,8 @@ __do_krealloc(const void *p, size_t new_size, gfp_t 
flags)
void *ret;
size_t ks;
 
-   /* Don't use instrumented ksize to allow precise KASAN poisoning. */
-   if (likely(!ZERO_OR_NULL_PTR(p))) {
-   if (!kasan_check_byte(p))
-   return NULL;
-   ks = kfence_ksize(p) ?: __ksize(p);
-   } else
-   ks = 0;
+   /* How large is the allocation actually? */
+   ks = ksize(p);
 
/* If the object still fits, repoison it precisely. */
if (ks >= new_size) {
@@ -1232,8 +1227,10 @@ void kfree_sensitive(const void *p)
void *mem = (void *)p;
 
ks = ksize(mem);
-   if (ks)
+   if (ks) {
+   kasan_unpoison_range(mem, ks);
memzero_explicit(mem, ks);
+   }
kfree(mem);
 }
 EXPORT_SYMBOL(kfree_sensitive);
@@ -1242,10 +1239,11 @@ EXPORT_SYMBOL(kfree_sensitive);
  * ksize - get the actual amount of memory allocated for a given object
  * @objp: Pointer to the object
  *
- * kmalloc may internally round up allocations and return more memory
+ * kmalloc() may internally round up allocations and return more memory
  * than requested. ksize() can be used to determine the actual amount of
- * memory allocated. The caller may use this additional memory, even though
- * a smaller amount of memory was initially specified with the kmalloc call.
+ * allocated memory. The caller may NOT use this additional memory, unless
+ * it calls krealloc(). To avoid an alloc/realloc cycle, callers can use
+ * kmalloc_size_roundup() to find the size of the associated kmalloc bucket.
  * The caller must guarantee that objp points to a valid object previously
  * allocated with either kmalloc() or kmem_cache_alloc(). The object
  * must not be freed during the duration of the call.
@@ -1254,13 +1252,11 @@ EXPORT_SYMBOL(kfree_sensitive);
  */
 size_t ksize(const void *objp)
 {
-   size_t size;
-
/*
-* We need to first check that the pointer to the object is valid, and
-* only then unpoison the memory. The report printed from ksize() is
-* more useful, then when it's printed later when the behaviour could
-* be undefined due to a potential use-after-free or double-free.
+* We need to first check that the pointer to the object is valid.
+* The KASAN report printed from ksize() is more useful, then when
+* it's printed later when the behaviour could be undefined due to
+* a potential use-after-free or double-free.
 *
 * We use kasan_check_byte(), which is supported for the hardware
 * tag-based KASAN mode, unlike kasan_check_read/write().
@@ -1274,13 +1270,7 @@ size_t ksize(const void *objp)
if (unlikely(ZERO_OR_NULL_PTR(objp)) || !kasan_check_byte(objp))
return 0;
 
-   size = kfence_ksize(objp) ?: __ksize(objp);
-   /*
-* We assume that ksize callers could use whole allocated area,
-* so we need to unpoison this area.
-*/
-   kasan_unpoison_range(objp, size);
-   return size;
+   return kfence_ksize(objp) ?: __ksize(objp);
 }
 EXPORT_SYMBOL(ksize);
 
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 16/16] slab: Restore __alloc_size attribute to __kmalloc_track_caller

2022-09-23 Thread Kees Cook
With skbuff's post-allocation use of ksize() rearranged to use
kmalloc_size_roundup() prior to allocation, the compiler can correctly
reason about the size of these allocations. The prior mismatch had caused
buffer overflow mitigations to erroneously fire under CONFIG_UBSAN_BOUNDS,
requiring a partial revert of the __alloc_size attributes. Restore the
attribute that had been removed in commit 93dd04ab0b2b ("slab: remove
__alloc_size attribute from __kmalloc_track_caller").

Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Roman Gushchin 
Cc: Hyeonggon Yoo <42.hye...@gmail.com>
Cc: linux...@kvack.org
Signed-off-by: Kees Cook 
---
 include/linux/slab.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 727640173568..297b85ed2c29 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -693,7 +693,8 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, 
size_t size, gfp_t flag
  * allocator where we care about the real place the memory allocation
  * request comes from.
  */
-extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long 
caller);
+extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long 
caller)
+  __alloc_size(1);
 #define kmalloc_track_caller(size, flags) \
__kmalloc_track_caller(size, flags, _RET_IP_)
 
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 12/16] devres: Use kmalloc_size_roundup() to match ksize() usage

2022-09-23 Thread Kees Cook
Round up allocations with kmalloc_size_roundup() so that devres's use
of ksize() is always accurate and no special handling of the memory is
needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.

Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Kees Cook 
---
 drivers/base/devres.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index 864d0b3f566e..7db20ce7ea8a 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -101,6 +101,9 @@ static bool check_dr_size(size_t size, size_t *tot_size)
size, tot_size)))
return false;
 
+   /* Actually allocate the full kmalloc bucket size. */
+   *tot_size = kmalloc_size_roundup(*tot_size);
+
return true;
 }
 
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 14/16] kasan: Remove ksize()-related tests

2022-09-23 Thread Kees Cook
In preparation for no longer unpoisoning in ksize(), remove the behavioral
self-tests for ksize().

Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Cc: Vincenzo Frascino 
Cc: Andrew Morton 
Cc: kasan-...@googlegroups.com
Cc: linux...@kvack.org
Signed-off-by: Kees Cook 
---
 lib/test_kasan.c  | 42 --
 mm/kasan/shadow.c |  4 +---
 2 files changed, 1 insertion(+), 45 deletions(-)

diff --git a/lib/test_kasan.c b/lib/test_kasan.c
index 58c1b01ccfe2..bdd0ced8f8d7 100644
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -753,46 +753,6 @@ static void kasan_global_oob_left(struct kunit *test)
KUNIT_EXPECT_KASAN_FAIL(test, *(volatile char *)p);
 }
 
-/* Check that ksize() makes the whole object accessible. */
-static void ksize_unpoisons_memory(struct kunit *test)
-{
-   char *ptr;
-   size_t size = 123, real_size;
-
-   ptr = kmalloc(size, GFP_KERNEL);
-   KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
-   real_size = ksize(ptr);
-
-   OPTIMIZER_HIDE_VAR(ptr);
-
-   /* This access shouldn't trigger a KASAN report. */
-   ptr[size] = 'x';
-
-   /* This one must. */
-   KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[real_size]);
-
-   kfree(ptr);
-}
-
-/*
- * Check that a use-after-free is detected by ksize() and via normal accesses
- * after it.
- */
-static void ksize_uaf(struct kunit *test)
-{
-   char *ptr;
-   int size = 128 - KASAN_GRANULE_SIZE;
-
-   ptr = kmalloc(size, GFP_KERNEL);
-   KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
-   kfree(ptr);
-
-   OPTIMIZER_HIDE_VAR(ptr);
-   KUNIT_EXPECT_KASAN_FAIL(test, ksize(ptr));
-   KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[0]);
-   KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[size]);
-}
-
 static void kasan_stack_oob(struct kunit *test)
 {
char stack_array[10];
@@ -1392,8 +1352,6 @@ static struct kunit_case kasan_kunit_test_cases[] = {
KUNIT_CASE(kasan_stack_oob),
KUNIT_CASE(kasan_alloca_oob_left),
KUNIT_CASE(kasan_alloca_oob_right),
-   KUNIT_CASE(ksize_unpoisons_memory),
-   KUNIT_CASE(ksize_uaf),
KUNIT_CASE(kmem_cache_double_free),
KUNIT_CASE(kmem_cache_invalid_free),
KUNIT_CASE(kmem_cache_double_destroy),
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 0e3648b603a6..0895c73e9b69 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -124,9 +124,7 @@ void kasan_unpoison(const void *addr, size_t size, bool 
init)
addr = kasan_reset_tag(addr);
 
/*
-* Skip KFENCE memory if called explicitly outside of sl*b. Also note
-* that calls to ksize(), where size is not a multiple of machine-word
-* size, would otherwise poison the invalid portion of the word.
+* Skip KFENCE memory if called explicitly outside of sl*b.
 */
if (is_kfence_address(addr))
return;
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 13/16] mempool: Use kmalloc_size_roundup() to match ksize() usage

2022-09-23 Thread Kees Cook
Round up allocations with kmalloc_size_roundup() so that mempool's use
of ksize() is always accurate and no special handling of the memory is
needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.

Cc: Andrew Morton 
Cc: linux...@kvack.org
Signed-off-by: Kees Cook 
---
 mm/mempool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mempool.c b/mm/mempool.c
index 96488b13a1ef..0f3107b28e6b 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -526,7 +526,7 @@ EXPORT_SYMBOL(mempool_free_slab);
  */
 void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data)
 {
-   size_t size = (size_t)pool_data;
+   size_t size = kmalloc_size_roundup((size_t)pool_data);
return kmalloc(size, gfp_mask);
 }
 EXPORT_SYMBOL(mempool_kmalloc);
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 10/16] openvswitch: Use kmalloc_size_roundup() to match ksize() usage

2022-09-23 Thread Kees Cook
Round up allocations with kmalloc_size_roundup() so that openvswitch's
use of ksize() is always accurate and no special handling of the memory
is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.

Cc: Pravin B Shelar 
Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Jakub Kicinski 
Cc: Paolo Abeni 
Cc: net...@vger.kernel.org
Cc: d...@openvswitch.org
Signed-off-by: Kees Cook 
---
 net/openvswitch/flow_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 4c09cf8a0ab2..6621873abde2 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2309,7 +2309,7 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int 
size)
 
WARN_ON_ONCE(size > MAX_ACTIONS_BUFSIZE);
 
-   sfa = kmalloc(sizeof(*sfa) + size, GFP_KERNEL);
+   sfa = kmalloc(kmalloc_size_roundup(sizeof(*sfa) + size), GFP_KERNEL);
if (!sfa)
return ERR_PTR(-ENOMEM);
 
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 11/16] bpf: Use kmalloc_size_roundup() to match ksize() usage

2022-09-23 Thread Kees Cook
Round up allocations with kmalloc_size_roundup() so that the verifier's
use of ksize() is always accurate and no special handling of the memory
is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE. Pass the new size
information back up to callers so they can use the space immediately,
so that array resizing happens less frequently as well. Explicitly zero
any trailing bytes in new allocations.

Additionally fix a memory allocation leak: if krealloc() fails, "arr"
wasn't freed, but NULL was returned to the caller of realloc_array(),
which would then write NULL to the lvalue, losing the reference to the
original memory.

Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: John Fastabend 
Cc: Andrii Nakryiko 
Cc: Martin KaFai Lau 
Cc: Song Liu 
Cc: Yonghong Song 
Cc: KP Singh 
Cc: Stanislav Fomichev 
Cc: Hao Luo 
Cc: Jiri Olsa 
Cc: b...@vger.kernel.org
Signed-off-by: Kees Cook 
---
 kernel/bpf/verifier.c | 49 +++
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 096fdac70165..80531f8f0d36 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -978,42 +978,53 @@ static void print_insn_state(struct bpf_verifier_env *env,
  */
 static void *copy_array(void *dst, const void *src, size_t n, size_t size, 
gfp_t flags)
 {
-   size_t bytes;
+   size_t src_bytes, dst_bytes;
 
if (ZERO_OR_NULL_PTR(src))
goto out;
 
-   if (unlikely(check_mul_overflow(n, size, &bytes)))
+   if (unlikely(check_mul_overflow(n, size, &src_bytes)))
return NULL;
 
-   if (ksize(dst) < bytes) {
+   dst_bytes = kmalloc_size_roundup(src_bytes);
+   if (ksize(dst) < dst_bytes) {
kfree(dst);
-   dst = kmalloc_track_caller(bytes, flags);
+   dst = kmalloc_track_caller(dst_bytes, flags);
if (!dst)
return NULL;
}
 
-   memcpy(dst, src, bytes);
+   memcpy(dst, src, src_bytes);
+   memset(dst + src_bytes, 0, dst_bytes - src_bytes);
 out:
return dst ? dst : ZERO_SIZE_PTR;
 }
 
-/* resize an array from old_n items to new_n items. the array is reallocated 
if it's too
- * small to hold new_n items. new items are zeroed out if the array grows.
+/* Resize an array from old_n items to *new_n items. The array is reallocated 
if it's too
+ * small to hold *new_n items. New items are zeroed out if the array grows. 
Allocation
+ * is rounded up to next kmalloc bucket size to reduce frequency of resizing. 
*new_n
+ * contains the new total number of items that will fit.
  *
- * Contrary to krealloc_array, does not free arr if new_n is zero.
+ * Contrary to krealloc, does not free arr if new_n is zero.
  */
-static void *realloc_array(void *arr, size_t old_n, size_t new_n, size_t size)
+static void *realloc_array(void *arr, size_t old_n, size_t *new_n, size_t size)
 {
-   if (!new_n || old_n == new_n)
+   void *old_arr = arr;
+   size_t alloc_size;
+
+   if (!new_n || !*new_n || old_n == *new_n)
goto out;
 
-   arr = krealloc_array(arr, new_n, size, GFP_KERNEL);
-   if (!arr)
+   alloc_size = kmalloc_size_roundup(size_mul(*new_n, size));
+   arr = krealloc(old_arr, alloc_size, GFP_KERNEL);
+   if (!arr) {
+   kfree(old_arr);
return NULL;
+   }
 
-   if (new_n > old_n)
-   memset(arr + old_n * size, 0, (new_n - old_n) * size);
+   *new_n = alloc_size / size;
+   if (*new_n > old_n)
+   memset(arr + old_n * size, 0, (*new_n - old_n) * size);
 
 out:
return arr ? arr : ZERO_SIZE_PTR;
@@ -1045,7 +1056,7 @@ static int copy_stack_state(struct bpf_func_state *dst, 
const struct bpf_func_st
 
 static int resize_reference_state(struct bpf_func_state *state, size_t n)
 {
-   state->refs = realloc_array(state->refs, state->acquired_refs, n,
+   state->refs = realloc_array(state->refs, state->acquired_refs, &n,
sizeof(struct bpf_reference_state));
if (!state->refs)
return -ENOMEM;
@@ -1061,11 +1072,11 @@ static int grow_stack_state(struct bpf_func_state 
*state, int size)
if (old_n >= n)
return 0;
 
-   state->stack = realloc_array(state->stack, old_n, n, sizeof(struct 
bpf_stack_state));
+   state->stack = realloc_array(state->stack, old_n, &n, sizeof(struct 
bpf_stack_state));
if (!state->stack)
return -ENOMEM;
 
-   state->allocated_stack = size;
+   state->allocated_stack = n * BPF_REG_SIZE;
return 0;
 }
 
@@ -2472,9 +2483,11 @@ static int push_jmp_history(struct bpf_verifier_env *env,
 {
u32 cnt = cur->jmp_history_cnt;
struct bpf_idx_pair *p;
+   size_t size;
 
cnt++;
-   p = krealloc(cur->jmp_history, cnt * sizeof(*p), GFP_USER);
+   size = kmalloc_size_roundup(size_mul(cnt, sizeof(*p)));
+   p = 

[ovs-dev] [PATCH v2 09/16] coredump: Proactively round up to kmalloc bucket size

2022-09-23 Thread Kees Cook
Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

Cc: linux-fsde...@vger.kernel.org
Signed-off-by: Kees Cook 
---
 fs/coredump.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 9f4aae202109..0894b2c35d98 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -68,7 +68,10 @@ struct core_name {
 
 static int expand_corename(struct core_name *cn, int size)
 {
-   char *corename = krealloc(cn->corename, size, GFP_KERNEL);
+   char *corename;
+
+   size = kmalloc_size_roundup(size);
+   corename = krealloc(cn->corename, size, GFP_KERNEL);
 
if (!corename)
return -ENOMEM;
@@ -76,7 +79,7 @@ static int expand_corename(struct core_name *cn, int size)
if (size > core_name_size) /* racy but harmless */
core_name_size = size;
 
-   cn->size = ksize(corename);
+   cn->size = size;
cn->corename = corename;
return 0;
 }
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 08/16] dma-buf: Proactively round up to kmalloc bucket size

2022-09-23 Thread Kees Cook
Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

Cc: Sumit Semwal 
Cc: "Christian König" 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Kees Cook 
---
 drivers/dma-buf/dma-resv.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 205acb2c744d..5b0a4b8830ff 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -98,12 +98,17 @@ static void dma_resv_list_set(struct dma_resv_list *list,
 static struct dma_resv_list *dma_resv_list_alloc(unsigned int max_fences)
 {
struct dma_resv_list *list;
+   size_t size;
 
-   list = kmalloc(struct_size(list, table, max_fences), GFP_KERNEL);
+   /* Round up to the next kmalloc bucket size. */
+   size = kmalloc_size_roundup(struct_size(list, table, max_fences));
+
+   list = kmalloc(size, GFP_KERNEL);
if (!list)
return NULL;
 
-   list->max_fences = (ksize(list) - offsetof(typeof(*list), table)) /
+   /* Given the resulting bucket size, recalculated max_fences. */
+   list->max_fences = (size - offsetof(typeof(*list), table)) /
sizeof(*list->table);
 
return list;
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 05/16] net: ipa: Proactively round up to kmalloc bucket size

2022-09-23 Thread Kees Cook
Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Jakub Kicinski 
Cc: Paolo Abeni 
Cc: net...@vger.kernel.org
Reviewed-by: Alex Elder 
Link: https://lore.kernel.org/lkml/4d75a9fd-1b94-7208-9de8-5a0102223...@ieee.org
Signed-off-by: Kees Cook 
---
 drivers/net/ipa/gsi_trans.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ipa/gsi_trans.c b/drivers/net/ipa/gsi_trans.c
index 18e7e8c405be..eeec149b5d89 100644
--- a/drivers/net/ipa/gsi_trans.c
+++ b/drivers/net/ipa/gsi_trans.c
@@ -88,6 +88,7 @@ struct gsi_tre {
 int gsi_trans_pool_init(struct gsi_trans_pool *pool, size_t size, u32 count,
u32 max_alloc)
 {
+   size_t alloc_size;
void *virt;
 
if (!size)
@@ -104,13 +105,15 @@ int gsi_trans_pool_init(struct gsi_trans_pool *pool, 
size_t size, u32 count,
 * If there aren't enough entries starting at the free index,
 * we just allocate free entries from the beginning of the pool.
 */
-   virt = kcalloc(count + max_alloc - 1, size, GFP_KERNEL);
+   alloc_size = size_mul(count + max_alloc - 1, size);
+   alloc_size = kmalloc_size_roundup(alloc_size);
+   virt = kzalloc(alloc_size, GFP_KERNEL);
if (!virt)
return -ENOMEM;
 
pool->base = virt;
/* If the allocator gave us any extra memory, use it */
-   pool->count = ksize(pool->base) / size;
+   pool->count = alloc_size / size;
pool->free = 0;
pool->max_alloc = max_alloc;
pool->size = size;
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 07/16] btrfs: send: Proactively round up to kmalloc bucket size

2022-09-23 Thread Kees Cook
Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: linux-bt...@vger.kernel.org
Acked-by: David Sterba 
Link: https://lore.kernel.org/lkml/20220922133014.gi32...@suse.cz
Signed-off-by: Kees Cook 
---
 fs/btrfs/send.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index e7671afcee4f..d40d65598e8f 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -435,6 +435,11 @@ static int fs_path_ensure_buf(struct fs_path *p, int len)
path_len = p->end - p->start;
old_buf_len = p->buf_len;
 
+   /*
+* Allocate to the next largest kmalloc bucket size, to let
+* the fast path happen most of the time.
+*/
+   len = kmalloc_size_roundup(len);
/*
 * First time the inline_buf does not suffice
 */
@@ -448,11 +453,7 @@ static int fs_path_ensure_buf(struct fs_path *p, int len)
if (!tmp_buf)
return -ENOMEM;
p->buf = tmp_buf;
-   /*
-* The real size of the buffer is bigger, this will let the fast path
-* happen most of the time
-*/
-   p->buf_len = ksize(p->buf);
+   p->buf_len = len;
 
if (p->reversed) {
tmp_buf = p->buf + old_buf_len - path_len - 1;
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 03/16] skbuff: Proactively round up to kmalloc bucket size

2022-09-23 Thread Kees Cook
Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

This will allow for kernels built with CONFIG_UBSAN_BOUNDS or the
coming dynamic bounds checking under CONFIG_FORTIFY_SOURCE to gain
back the __alloc_size() hints that were temporarily reverted in commit
93dd04ab0b2b ("slab: remove __alloc_size attribute from __kmalloc_track_caller")

Additionally tries to normalize size variables to u32 from int. Most
interfaces are using "int", but notably __alloc_skb uses unsigned int.

Also fix some reverse Christmas tree and comments while touching nearby
code.

Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Jakub Kicinski 
Cc: Paolo Abeni 
Cc: net...@vger.kernel.org
Cc: Greg Kroah-Hartman 
Cc: Nick Desaulniers 
Cc: David Rientjes 
Cc: Vlastimil Babka 
Signed-off-by: Kees Cook 
---
 include/linux/skbuff.h |  5 +---
 net/core/skbuff.c  | 64 +-
 2 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ca8afa382bf2..5a16177f38b5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1234,7 +1234,7 @@ void kfree_skb_partial(struct sk_buff *skb, bool 
head_stolen);
 bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
  bool *fragstolen, int *delta_truesize);
 
-struct sk_buff *__alloc_skb(unsigned int size, gfp_t priority, int flags,
+struct sk_buff *__alloc_skb(unsigned int bytes, gfp_t priority, int flags,
int node);
 struct sk_buff *__build_skb(void *data, unsigned int frag_size);
 struct sk_buff *build_skb(void *data, unsigned int frag_size);
@@ -1870,9 +1870,6 @@ static inline int skb_unclone(struct sk_buff *skb, gfp_t 
pri)
 
 /* This variant of skb_unclone() makes sure skb->truesize
  * and skb_end_offset() are not changed, whenever a new skb->head is needed.
- *
- * Indeed there is no guarantee that ksize(kmalloc(X)) == ksize(kmalloc(X))
- * when various debugging features are in place.
  */
 int __skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri);
 static inline int skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 974e7138..0b30fbdbd0d0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -343,19 +343,23 @@ EXPORT_SYMBOL(napi_build_skb);
  * the caller if emergency pfmemalloc reserves are being used. If it is and
  * the socket is later found to be SOCK_MEMALLOC then PFMEMALLOC reserves
  * may be used. Otherwise, the packet data may be discarded until enough
- * memory is free
+ * memory is free.
  */
-static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
+static void *kmalloc_reserve(u32 *size, gfp_t flags, int node,
 bool *pfmemalloc)
 {
void *obj;
bool ret_pfmemalloc = false;
 
+   /* kmalloc(size) might give us more room than requested, so
+* allocate the true bucket size up front.
+*/
+   *size = kmalloc_size_roundup(*size);
/*
 * Try a regular allocation, when that fails and we're not entitled
 * to the reserves, fail.
 */
-   obj = kmalloc_node_track_caller(size,
+   obj = kmalloc_node_track_caller(*size,
flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
node);
if (obj || !(gfp_pfmemalloc_allowed(flags)))
@@ -363,7 +367,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int 
node,
 
/* Try again but now we are using pfmemalloc reserves */
ret_pfmemalloc = true;
-   obj = kmalloc_node_track_caller(size, flags, node);
+   obj = kmalloc_node_track_caller(*size, flags, node);
 
 out:
if (pfmemalloc)
@@ -380,7 +384,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int 
node,
 
 /**
  * __alloc_skb -   allocate a network buffer
- * @size: size to allocate
+ * @bytes: minimum bytes to allocate
  * @gfp_mask: allocation mask
  * @flags: If SKB_ALLOC_FCLONE is set, allocate from fclone cache
  * instead of head cache and allocate a cloned (child) skb.
@@ -395,12 +399,12 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, 
int node,
  * Buffers may only be allocated from interrupts using a @gfp_mask of
  * %GFP_ATOMIC.
  */
-struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
+struct sk_buff *__alloc_skb(unsigned int bytes, gfp_t gfp_mask,
int flags, int node)
 {
struct kmem_cache *cache;
struct sk_buff *skb;
-   unsigned int osize;
+   u32 size = bytes;
bool pfmemalloc;
u8 *data;
 
@@ -427,15 +431,13 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t 
gfp_mask,
   

[ovs-dev] [PATCH v2 04/16] skbuff: Phase out ksize() fallback for frag_size

2022-09-23 Thread Kees Cook
All callers of APIs that allowed a 0-sized frag_size appear to be
passing actual size information already, so this use of ksize() can
be removed. However, just in case something still depends on this
behavior, issue a WARN and fall back to ksize() as before, which means
we will also potentially still get KASAN warnings.

Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Jakub Kicinski 
Cc: Paolo Abeni 
Cc: net...@vger.kernel.org
Signed-off-by: Kees Cook 
---
 net/core/skbuff.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0b30fbdbd0d0..84ca89c781cd 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -195,7 +195,11 @@ static void __build_skb_around(struct sk_buff *skb, void 
*data,
   unsigned int frag_size)
 {
struct skb_shared_info *shinfo;
-   unsigned int size = frag_size ? : ksize(data);
+   unsigned int size = frag_size;
+
+   /* All callers should be setting frag size now? */
+   if (WARN_ON_ONCE(size == 0))
+   size = ksize(data);
 
size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
@@ -220,12 +224,10 @@ static void __build_skb_around(struct sk_buff *skb, void 
*data,
 /**
  * __build_skb - build a network buffer
  * @data: data buffer provided by caller
- * @frag_size: size of data, or 0 if head was kmalloced
+ * @frag_size: size of data
  *
  * Allocate a new _buff. Caller provides space holding head and
- * skb_shared_info. @data must have been allocated by kmalloc() only if
- * @frag_size is 0, otherwise data should come from the page allocator
- *  or vmalloc()
+ * skb_shared_info.
  * The return is the new skb buffer.
  * On a failure the return is %NULL, and @data is not freed.
  * Notes :
@@ -272,7 +274,7 @@ EXPORT_SYMBOL(build_skb);
  * build_skb_around - build a network buffer around provided skb
  * @skb: sk_buff provide by caller, must be memset cleared
  * @data: data buffer provided by caller
- * @frag_size: size of data, or 0 if head was kmalloced
+ * @frag_size: size of data
  */
 struct sk_buff *build_skb_around(struct sk_buff *skb,
 void *data, unsigned int frag_size)
@@ -294,7 +296,7 @@ EXPORT_SYMBOL(build_skb_around);
 /**
  * __napi_build_skb - build a network buffer
  * @data: data buffer provided by caller
- * @frag_size: size of data, or 0 if head was kmalloced
+ * @frag_size: size of data
  *
  * Version of __build_skb() that uses NAPI percpu caches to obtain
  * skbuff_head instead of inplace allocation.
@@ -318,7 +320,7 @@ static struct sk_buff *__napi_build_skb(void *data, 
unsigned int frag_size)
 /**
  * napi_build_skb - build a network buffer
  * @data: data buffer provided by caller
- * @frag_size: size of data, or 0 if head was kmalloced
+ * @frag_size: size of data
  *
  * Version of __napi_build_skb() that takes care of skb->head_frag
  * and skb->pfmemalloc when the data is a page or page fragment.
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 06/16] igb: Proactively round up to kmalloc bucket size

2022-09-23 Thread Kees Cook
In preparation for removing the "silently change allocation size"
users of ksize(), explicitly round up all q_vector allocations so that
allocations can be correctly compared to ksize().

Additionally fix potential use-after-free in the case of new allocation
failure: only free memory if the replacement allocation succeeds.

Cc: Jesse Brandeburg 
Cc: Tony Nguyen 
Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Jakub Kicinski 
Cc: Paolo Abeni 
Cc: intel-wired-...@lists.osuosl.org
Cc: net...@vger.kernel.org
Signed-off-by: Kees Cook 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 2796e81d2726..eb51e531c096 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1195,15 +1195,16 @@ static int igb_alloc_q_vector(struct igb_adapter 
*adapter,
return -ENOMEM;
 
ring_count = txr_count + rxr_count;
-   size = struct_size(q_vector, ring, ring_count);
+   size = kmalloc_size_roundup(struct_size(q_vector, ring, ring_count));
 
/* allocate q_vector and rings */
q_vector = adapter->q_vector[v_idx];
if (!q_vector) {
q_vector = kzalloc(size, GFP_KERNEL);
} else if (size > ksize(q_vector)) {
-   kfree_rcu(q_vector, rcu);
q_vector = kzalloc(size, GFP_KERNEL);
+   if (q_vector)
+   kfree_rcu(q_vector, rcu);
} else {
memset(q_vector, 0, size);
}
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 00/16] slab: Introduce kmalloc_size_roundup()

2022-09-23 Thread Kees Cook
Hi,

The main details on this series are in patch #2's commit log. It's long,
so I won't repeat it again here for the v2. As before, I've tried to
trim the CC list.

v2:
- _keep_ ksize(), but remove instrumentation (makes patch series smaller)
- reorganized skbuff logic to avoid yet more copy/paste code
- added a WARN to a separate skbuff ksize usage
- add new refactorings: bpf, openvswitch, devres, mempool, kasan
- dropped "independent" patches: iwlwifi, x86/microcode/AMD (sent separately)
v1: https://lore.kernel.org/lkml/20220922031013.2150682-1-keesc...@chromium.org

Notes:

Originally when I was going to entirely remove ksize(), there were a
handful of refactorings that just needed to do ksize -> __ksize. In
the end, it was cleaner to actually leave ksize() as a real function,
just without the kasan instrumentation. I wonder, however, if it should
be converted into a static inline now?

I dropped Jakub's Ack because I refactored that code a bunch more.

The 2 patches that didn't need to call kmalloc_size_roundup() don't need
to be part of this series. (One is already in -next, actually.)

I'd like to land at least the first two patches in the coming v6.1 merge
window so that the per-subsystem patches can be sent to their various
subsystems directly. Vlastimil, what do you think?

Thanks!

-Kees


Kees Cook (16):
  slab: Remove __malloc attribute from realloc functions
  slab: Introduce kmalloc_size_roundup()
  skbuff: Proactively round up to kmalloc bucket size
  skbuff: Phase out ksize() fallback for frag_size
  net: ipa: Proactively round up to kmalloc bucket size
  igb: Proactively round up to kmalloc bucket size
  btrfs: send: Proactively round up to kmalloc bucket size
  dma-buf: Proactively round up to kmalloc bucket size
  coredump: Proactively round up to kmalloc bucket size
  openvswitch: Use kmalloc_size_roundup() to match ksize() usage
  bpf: Use kmalloc_size_roundup() to match ksize() usage
  devres: Use kmalloc_size_roundup() to match ksize() usage
  mempool: Use kmalloc_size_roundup() to match ksize() usage
  kasan: Remove ksize()-related tests
  mm: Make ksize() a reporting-only function
  slab: Restore __alloc_size attribute to __kmalloc_track_caller

 drivers/base/devres.c |  3 +
 drivers/dma-buf/dma-resv.c|  9 ++-
 drivers/net/ethernet/intel/igb/igb_main.c |  5 +-
 drivers/net/ipa/gsi_trans.c   |  7 +-
 fs/btrfs/send.c   | 11 +--
 fs/coredump.c |  7 +-
 include/linux/compiler_types.h| 13 ++--
 include/linux/skbuff.h|  5 +-
 include/linux/slab.h  | 46 +++--
 kernel/bpf/verifier.c | 49 +-
 lib/test_kasan.c  | 42 
 mm/kasan/shadow.c |  4 +-
 mm/mempool.c  |  2 +-
 mm/slab.c |  9 ++-
 mm/slab_common.c  | 62 ++---
 net/core/skbuff.c | 82 ---
 net/openvswitch/flow_netlink.c|  2 +-
 17 files changed, 192 insertions(+), 166 deletions(-)

-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 02/16] slab: Introduce kmalloc_size_roundup()

2022-09-23 Thread Kees Cook
In the effort to help the compiler reason about buffer sizes, the
__alloc_size attribute was added to allocators. This improves the scope
of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
as the vast majority of callers are not expecting to use more memory
than what they asked for.

There is, however, one common exception to this: anticipatory resizing
of kmalloc allocations. These cases all use ksize() to determine the
actual bucket size of a given allocation (e.g. 128 when 126 was asked
for). This comes in two styles in the kernel:

1) An allocation has been determined to be too small, and needs to be
   resized. Instead of the caller choosing its own next best size, it
   wants to minimize the number of calls to krealloc(), so it just uses
   ksize() plus some additional bytes, forcing the realloc into the next
   bucket size, from which it can learn how large it is now. For example:

data = krealloc(data, ksize(data) + 1, gfp);
data_len = ksize(data);

2) The minimum size of an allocation is calculated, but since it may
   grow in the future, just use all the space available in the chosen
   bucket immediately, to avoid needing to reallocate later. A good
   example of this is skbuff's allocators:

data = kmalloc_reserve(size, gfp_mask, node, );
...
/* kmalloc(size) might give us more room than requested.
 * Put skb_shared_info exactly at the end of allocated zone,
 * to allow max possible filling before reallocation.
 */
osize = ksize(data);
size = SKB_WITH_OVERHEAD(osize);

In both cases, the "how much was actually allocated?" question is answered
_after_ the allocation, where the compiler hinting is not in an easy place
to make the association any more. This mismatch between the compiler's
view of the buffer length and the code's intention about how much it is
going to actually use has already caused problems[1]. It is possible to
fix this by reordering the use of the "actual size" information.

We can serve the needs of users of ksize() and still have accurate buffer
length hinting for the compiler by doing the bucket size calculation
_before_ the allocation. Code can instead ask "how large an allocation
would I get for a given size?".

Introduce kmalloc_size_roundup(), to serve this function so we can start
replacing the "anticipatory resizing" uses of ksize().

[1] https://github.com/ClangBuiltLinux/linux/issues/1599
https://github.com/KSPP/linux/issues/183
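
As a rough illustration (not part of this patch, and with made-up
function and variable names), a caller converting style 2 above would
round the requested size up front, allocate exactly that amount, and
remember the rounded size instead of querying ksize() afterwards:

    /* Sketch only: replaces the "ask ksize() afterwards" pattern. */
    #include <linux/slab.h>

    static void *alloc_rounded(size_t want, size_t *got)
    {
        size_t full = kmalloc_size_roundup(want); /* bucket size up front */
        void *p = kmalloc(full, GFP_KERNEL);      /* allocate full bucket */

        if (p)
            *got = full;  /* usable size known without calling ksize() */
        return p;
    }

This way the compiler's __alloc_size hint matches the size the caller
actually intends to use, so UBSAN_BOUNDS and FORTIFY_SOURCE remain
accurate.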

Cc: Vlastimil Babka 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Signed-off-by: Kees Cook 
---
 include/linux/slab.h | 31 +++
 mm/slab.c|  9 ++---
 mm/slab_common.c | 20 
 3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 41bd036e7551..727640173568 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -188,7 +188,21 @@ void * __must_check krealloc(const void *objp, size_t 
new_size, gfp_t flags) __r
 void kfree(const void *objp);
 void kfree_sensitive(const void *objp);
 size_t __ksize(const void *objp);
+
+/**
+ * ksize - Report actual allocation size of associated object
+ *
+ * @objp: Pointer returned from a prior kmalloc()-family allocation.
+ *
+ * This should not be used for writing beyond the originally requested
+ * allocation size. Either use krealloc() or round up the allocation size
+ * with kmalloc_size_roundup() prior to allocation. If this is used to
+ * access beyond the originally requested allocation size, UBSAN_BOUNDS
+ * and/or FORTIFY_SOURCE may trip, since they only know about the
+ * originally allocated size via the __alloc_size attribute.
+ */
 size_t ksize(const void *objp);
+
 #ifdef CONFIG_PRINTK
 bool kmem_valid_obj(void *object);
 void kmem_dump_obj(void *object);
@@ -779,6 +793,23 @@ extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);
 
 unsigned int kmem_cache_size(struct kmem_cache *s);
+
+/**
+ * kmalloc_size_roundup - Report allocation bucket size for the given size
+ *
+ * @size: Number of bytes to round up from.
+ *
+ * This returns the number of bytes that would be available in a kmalloc()
+ * allocation of @size bytes. For example, a 126 byte request would be
+ * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly
+ * for the general-purpose kmalloc()-based allocations, and is not for the
+ * pre-sized kmem_cache_alloc()-based allocations.)
+ *
+ * Use this to kmalloc() the full bucket size ahead of time instead of using
+ * ksize() to query the size after an allocation.
+ */
+size_t kmalloc_size_roundup(size_t size);
+
 void __init kmem_cache_init_late(void);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_SLAB)
diff --git a/mm/slab.c b/mm/slab.c
index 

[ovs-dev] [PATCH v2 01/16] slab: Remove __malloc attribute from realloc functions

2022-09-23 Thread Kees Cook
The __malloc attribute should not be applied to "realloc" functions, as
the returned pointer may alias the storage of the prior pointer. Instead
of splitting __malloc from __alloc_size, which would be a huge amount of
churn, just create __realloc_size for the few cases where it is needed.

Additionally removes the conditional test for __alloc_size__, which is
always defined now.

Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Roman Gushchin 
Cc: Hyeonggon Yoo <42.hye...@gmail.com>
Cc: Marco Elver 
Cc: linux...@kvack.org
Signed-off-by: Kees Cook 
---
 include/linux/compiler_types.h | 13 +
 include/linux/slab.h   | 12 ++--
 mm/slab_common.c   |  4 ++--
 3 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 4f2a819fd60a..f141a6f6b9f6 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -271,15 +271,12 @@ struct ftrace_likely_data {
 
 /*
  * Any place that could be marked with the "alloc_size" attribute is also
- * a place to be marked with the "malloc" attribute. Do this as part of the
- * __alloc_size macro to avoid redundant attributes and to avoid missing a
- * __malloc marking.
+ * a place to be marked with the "malloc" attribute, except those that may
+ * be performing a _reallocation_, as that may alias the existing pointer.
+ * For these, use __realloc_size().
  */
-#ifdef __alloc_size__
-# define __alloc_size(x, ...)  __alloc_size__(x, ## __VA_ARGS__) __malloc
-#else
-# define __alloc_size(x, ...)  __malloc
-#endif
+#define __alloc_size(x, ...)   __alloc_size__(x, ## __VA_ARGS__) __malloc
+#define __realloc_size(x, ...) __alloc_size__(x, ## __VA_ARGS__)
 
 #ifndef asm_volatile_goto
 #define asm_volatile_goto(x...) asm goto(x)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0fefdf528e0d..41bd036e7551 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -184,7 +184,7 @@ int kmem_cache_shrink(struct kmem_cache *s);
 /*
  * Common kmalloc functions provided by all allocators
  */
-void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) 
__alloc_size(2);
+void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) 
__realloc_size(2);
 void kfree(const void *objp);
 void kfree_sensitive(const void *objp);
 size_t __ksize(const void *objp);
@@ -647,10 +647,10 @@ static inline __alloc_size(1, 2) void 
*kmalloc_array(size_t n, size_t size, gfp_
  * @new_size: new size of a single member of the array
  * @flags: the type of memory to allocate (see kmalloc)
  */
-static inline __alloc_size(2, 3) void * __must_check krealloc_array(void *p,
-   size_t 
new_n,
-   size_t 
new_size,
-   gfp_t flags)
+static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
+ size_t 
new_n,
+ size_t 
new_size,
+ gfp_t 
flags)
 {
size_t bytes;
 
@@ -774,7 +774,7 @@ static inline __alloc_size(1, 2) void *kvcalloc(size_t n, 
size_t size, gfp_t fla
 }
 
 extern void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t 
flags)
- __alloc_size(3);
+ __realloc_size(3);
 extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 17996649cfe3..457671ace7eb 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1134,8 +1134,8 @@ module_init(slab_proc_init);
 
 #endif /* CONFIG_SLAB || CONFIG_SLUB_DEBUG */
 
-static __always_inline void *__do_krealloc(const void *p, size_t new_size,
-  gfp_t flags)
+static __always_inline __realloc_size(2) void *
+__do_krealloc(const void *p, size_t new_size, gfp_t flags)
 {
void *ret;
size_t ks;
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 01/12] slab: Introduce kmalloc_size_roundup()

2022-09-23 Thread Kees Cook
On Fri, Sep 23, 2022 at 09:17:25AM +0800, Feng Tang wrote:
> On Thu, Sep 22, 2022 at 07:12:21PM +0800, Hyeonggon Yoo wrote:
> > On Wed, Sep 21, 2022 at 08:10:02PM -0700, Kees Cook wrote:
> > > [...]
> > > Introduce kmalloc_size_roundup(), to serve this function so we can start
> > > replacing the "anticipatory resizing" uses of ksize().
> > [...]
> >
> > This looks okay.
> > [...]
> > Cc-ing Feng Tang who may welcome this series ;)
>  
> Indeed! This will help our work of extending slub redzone check,
> as we also ran into some trouble with ksize() users when extending
> the redzone support to this extra allocated space than requested
> size [1], and have to disable the redzone sanity for all ksize()
> users [2].
> 
> [1]. 
> https://lore.kernel.org/lkml/20220719134503.ga56...@shbuild999.sh.intel.com/
> [2]. https://lore.kernel.org/lkml/20220913065423.520159-5-feng.t...@intel.com/

Thanks for the feedback! I'll send my v2 series -- I'm hoping at least
this patch can land in v6.1 so the various other patches would be clear
to land via their separate trees, etc.

-- 
Kees Cook
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH ovn 0/5] Add OVN component templates.

2022-09-23 Thread Han Zhou
On Fri, Sep 23, 2022 at 8:10 AM Dumitru Ceara  wrote:
>
> On 9/22/22 19:55, Han Zhou wrote:
> > On Thu, Sep 22, 2022 at 10:38 AM Han Zhou  wrote:
> >>
> >>
> >>
> >> On Thu, Sep 22, 2022 at 1:00 AM Dumitru Ceara 
wrote:
> >>>
> >>> Hi Han,
> >>>
> >>> On 9/21/22 23:06, Han Zhou wrote:
>  Thanks Dumitru for this promising optimization!
> 
> >>>
> >>> Thanks for checking it out!
> >>>
>  On Thu, Aug 11, 2022 at 1:03 AM Dumitru Ceara 
> > wrote:
> >
> > On 8/10/22 19:54, Mark Michelson wrote:
> >> Hi Dumitru,
> >>
> >
> > Hi Mark,
> >
> >> I read the patch series, and I think the idea of chassis-specific
> >> variables is a good idea to reduce the number of DB records for
> > certain
> >> things. Aside from load balancers, I suspect this could have a
> > positive
> >> impact for other structures as well.
> >>
> >
> > Thanks for taking a look!  Yes, I think this might be applicable to
> > other structures too.
> >
> 
>  I think it is a good idea to make it more generic, but for my
> > understanding
>  this template concept is for anything that's "node/chassis" specific,
> > and
>  supposed to be instantiated at chassis level. Probably we should name
> > the
>  DB tables as something like chassis_template_var.
> 
> >>>
> >>> If we decide to go for a simple "chassis" column (instead of a generic
> >>> "match" column) then naming the table Chassis_Template_Var makes sense
> >>> indeed.
> >>>
> >> Rather than criticizing the individual lines of code, I'll focus
> > instead
> >> on some higher-level questions/ideas.
> >>
> >
> > Sure, thanks! :)
> >
> >> First, one question I had was what happens when a template variable
> > name
> >> is used in a load balancer, but there is no appropriate value to
> >> substitute? For instance, what if a load balancer applies to
> > chassis-3,
> >> but you only have template variables for chassis-1 and chassis-2?
> > This
> >> might be addressed in the code but I didn't notice if it was.
> >>
> >
> > There are actually two things to consider here:
> >
> > 1. there might be a logical flow that uses a template variable: in
> > this
> > case if template expansion/instantiation fails we currently leave
the
> > token untouched (e.g., '^variable' stays '^variable').  That will
> > cause
> > the flow action/match parsing to fail and currently logs a warning.
> > The
> > flow itself is skipped, as it should be.  We probably need to avoid
> > logging a warning though.
> >
> > 2. like you pointed out, there might be a load balancer using
> > templates
> > in its backends/vips: if some of those templates cannot be
> > instantiated
> > locally the backend/vip where they're added is skipped.  Unless I
> > missed
> > something, the code should already do that.
> >
> >> Second, it seems like template variables are a natural extension of
> >> existing concepts like address sets and port groups. In those
cases,
> >> they were an unconditional collection of IP addresses or ports. For
> >
> > You're right to some extent template variables are similar to port
> > groups.  The southbound database port group table splits the
> > northbound
> > port group per datapath though not per chassis like template
> > variables.
> >
> >> template variables, they're a collection of untyped values with the
> >> condition of only applying on certain Chassis. I wonder if this
> > could
> >> all be reconciled with a single table that uses untyped values with
> >> user-specified conditions. Right now template variables have a
> > "Chassis"
> >> column, but maybe this could be replaced with a broader
"condition",
> >> "when", or "match" column. To get something integrated quickly,
this
> >> column could just accept the syntax of "chassis.name == " or
> >> "chassis.uuid == " to allow for chassis-specific application
> > of
> >> the values. With this foundation, we could eventually allow
> >> unconditional application of the value, or more complex conditions
> > (e.g.
> >> only apply to logical switch ports that are connected to a router
> > with a
> >> distributed gateway port). Doing this, we could deprecate address
> > sets
> >> and port groups eventually in favor of template variables.
> >
> > This sounds like a good idea to me.  I wasn't too happy with the
> > "chassis" string column of the Template_Var table anyway.  A generic
> > condition field makes more sense.
> >
>  If it is chassis-specific template, a column "chassis" seems to be
>  straightforward. With a "match" column it is another burden of
parsing
> >>>
> >>> I have a generic implementation (with a "predicate" column) almost
ready
> >>> for review.  I agree it's a bit more work to parse and maintain
> >>> references.  I think it's probably 

Re: [ovs-dev] [来自外部的邮件]Re: [External] Re:[ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between ukey and megaflow

2022-09-23 Thread . 贺鹏
Hi, Zhike,


First, let me try to explain why this fix is not actually fixing it.

This lock-and-lookup code is actually for the case when megaflows were not
stored per PMD, according to the OVS commit history.
So it was necessary to check whether the megaflow was installed by another PMD.

Now megaflows are stored per PMD, so the only way a megaflow gets inserted
is an upcall from this PMD,
unless you are using dpctl/flow-add to insert a megaflow manually, but I
guess this is not the case.

So, this code is mostly dead code, meaning that it's mostly impossible for
the code to run into this place in the real environment.

Revalidators only modify and delete megaflows; they will not add one.

Upcalls only happen when the megaflow does not exist in the datapath. We
cannot expect a megaflow with a drop action to already reside in the cmap
while an upcall still happens, brings the correct actions, and then reuses
the old megaflow instead of installing the new one.

It's just impossible.

My new suspicion is that perhaps the ukey and the megaflow both hold a drop
action at this point.

The revalidator then tries to modify the actions, but somehow the modify
operation fails.
However, this failure is ignored, so the inconsistency now exists.

So my first patch actually fixes it...
"[ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops"



On Fri, Sep 23, 2022 at 9:39 PM 王志克  wrote:

> Hi Peng,
>
>
>
> Right, I also met this issue, and I am wondering about the sequence of this
> inconsistency, and would like to hear your “new cause”.
>
>
>
> Anyway I believe your below patch should fix this issue.
>
> [ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops
>
>
>
> Br,
>
> Zhike
>
>
>
> *From: *".贺鹏" 
> *Date: *Friday, September 23, 2022 at 8:59 PM
> *To: *王志克 
> *Cc: *"ovs-dev@openvswitch.org" , "
> d...@openvswitch.org" 
> *Subject: *[来自外部的邮件]Re: [External] Re:[ovs-dev,ovs-dev,v2,4/4]
> dpif-netdev: fix inconsistent processing between ukey and megaflow
>
>
>
> 京东安全提示:此封邮件来自公司外部,除非您能判断发件人和知道邮件内容安全,否则请勿打开链接或者附件。
> * JD Security Tips: Please do not click on links or open attachments
> unless you trust the sender and know the content is safe.*
>
>
>
> Hi, Zhike,
>
>
>
> After receiving your email, I was becoming curious about this code and did
> more investigation on it.
>
>
>
> I found some problems with the code, and now I believe this inconsistent
> processing is NOT the root cause of the inconsistent actions between the
> ukey and the datapath.
>
> I also found a new cause for that, but given this complex race between the
> PMD and the revalidator, I hope this time I am right.
>
>
>
> But before that, why are you interested in this patch? Have you found the
> same issue in your environment?
>
>
>
>
>
>
>
>
>
> On Thu, Sep 22, 2022 at 6:54 PM .贺鹏  wrote:
>
> Hi, zhike,
>
>
>
> It's difficult to give a very clear sequence of how this inconsistency
> happens, but I can give you more details.
>
>
>
> This is observed in our production environment. The correct megaflow
> should encap packets with vxlan header and send out, but the action is drop.
>
> This is usually because the neigh info is not available at the moment when
> the upcall happens.
>
>
>
> Normally, the drop action is ephemeral, and the revalidator will later modify
> the megaflow's action into the tnl_push.
>
>
>
> But there are a few cases, happening only 1~2 times a year, where the
> drop actions will never be replaced by tnl_push.
>
>
>
> just like in the commits mentioned,
>
>
>
> "The coverage command shows revalidators have dumped several times,
>
> however the correct actions are not set. This implies that the ukey's
>
> action does not equal the megaflow's, i.e. revalidators think the
> underlying
>
> megaflow's actions are correct however they are not."
>
>
>
> I do not know how this happened, but I do think this inconsistent processing 
> could be one of the reasons.
>
> Even if there is no such bug, I think keeping the processing consistent is
> necessary.
>
>
>
>
>
>
>
>
>
> On Wed, Sep 21, 2022 at 5:57 PM 王志克  wrote:
>
> Hi Hepeng,
>
>
>
> Can you please explain the sequence of how this inconsistency could
> happen? Why do you believe the current actions in the existing netdev_flow are old?
>
>
>
> Thanks.
>
>
>
> Br,
>
> wangzhike
>
>
>
>
>
>
>
>
>
>
> *
> [ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between
> ukey and megaflow
> Message ID: 20220604151857.66550-4-hepeng.0...@bytedance.com
> State: New
> Series: [ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops
> Checks: ovsrobot/apply-robot - warning (apply and check: warning)
>
> 

Re: [ovs-dev] [PATCH ovn 3/3] inc-proc-eng: Rename the 'clear_tracked_data' callback to 'init_run'.

2022-09-23 Thread Han Zhou
On Fri, Sep 23, 2022 at 1:42 AM Dumitru Ceara  wrote:
>
> On 9/23/22 01:07, Han Zhou wrote:
> > On Wed, Sep 14, 2022 at 6:10 AM Dumitru Ceara  wrote:
> >>
> >> This is actually more in line with how the callback is used.  It's called
> >> every time the incremental engine prepares for the next engine run.
> >>
> >> Signed-off-by: Dumitru Ceara 
> >
> > Thanks Dumitru. The name looks good to me, but why does the new function
> > require both the node and node->data as parameters?
> >
>
> Thanks for the review!  Considering that this is an initialization
> function that runs before every engine run for every node, users might
> find it interesting to do other things too.  For example, getting some
> OVSDB indexes from input nodes.
>
> This is an example from the not yet submitted components template code:
>
> static void
> en_template_vars_init_run(struct engine_node *node, void *data)
> {
> struct ed_type_template_vars *tv_data = data;
>
> tv_data->sbrec_template_var_table =
> EN_OVSDB_GET(engine_get_input("SB_template_var", node));
> tv_data->ovsrec_ovs_table =
> EN_OVSDB_GET(engine_get_input("OVS_open_vswitch", node));
> tv_data->sbrec_port_binding_by_name =
> engine_ovsdb_node_get_index(engine_get_input("SB_port_binding",
node),
> "name");
> tv_data->sbrec_chassis_by_name =
> engine_ovsdb_node_get_index(engine_get_input("SB_chassis", node),
> "name");
>
> sset_clear(&tv_data->new);
> sset_clear(&tv_data->deleted);
> sset_clear(&tv_data->updated);
> tv_data->change_tracked = false;
> }
>

I don't quite understand this example. It seems ed_type_template_vars
stores some of its input in its own data, but could you explain why? These
members should belong to the input nodes, and they can always be accessed
in the run() or handler functions.  If it requires more code to explain,
I'd suggest including this as part of your *template* series so that it is
easier to review together.

> >> ---
> >>  controller/ovn-controller.c |   41
> > -
> >>  lib/inc-proc-eng.c  |   19 +++
> >>  lib/inc-proc-eng.h  |   19 ++-
> >>  3 files changed, 41 insertions(+), 38 deletions(-)
> >>
> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >> index 18a01bbab..f26d6a9e0 100644
> >> --- a/controller/ovn-controller.c
> >> +++ b/controller/ovn-controller.c
> >> @@ -1058,7 +1058,7 @@ en_ofctrl_is_connected_run(struct engine_node
> > *node, void *data)
> >>   *processing to OVS_interface changes but simply mark the node
> > status as
> >>   *UPDATED (and so the run() and the change handler is the same).
> >>   * 2. The iface_table_external_ids_old is computed/updated in the
member
> >> - *clear_tracked_data(), because that is when the last round of
> > processing
> >> + *init_run(), because that is when the last round of processing
> >>   *has completed but the new IDL data is yet to refresh, so we
> > replace the
> >>   *old data with the current data. */
> >>  struct ed_type_ovs_interface_shadow {
> >> @@ -1096,7 +1096,8 @@ en_ovs_interface_shadow_cleanup(void *data_)
> >>  }
> >>
> >>  static void
> >> -en_ovs_interface_shadow_clear_tracked_data(void *data_)
> >> +en_ovs_interface_shadow_init_run(struct engine_node *node OVS_UNUSED,
> >> + void *data_)
> >>  {
> >>  struct ed_type_ovs_interface_shadow *data = data_;
> >>
> >
 iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
> >> @@ -1163,7 +1164,7 @@ en_activated_ports_cleanup(void *data_)
> >>  }
> >>
> >>  static void
> >> -en_activated_ports_clear_tracked_data(void *data)
> >> +en_activated_ports_init_run(struct engine_node *node OVS_UNUSED, void
> > *data)
> >>  {
> >>  en_activated_ports_cleanup(data);
> >>  }
> >> @@ -1311,7 +1312,7 @@ struct ed_type_runtime_data {
> >>   */
> >>
> >>  static void
> >> -en_runtime_data_clear_tracked_data(void *data_)
> >> +en_runtime_data_init_run(struct engine_node *node OVS_UNUSED, void
> > *data_)
> >>  {
> >>  struct ed_type_runtime_data *data = data_;
> >>
> >> @@ -1669,14 +1670,14 @@ en_addr_sets_init(struct engine_node *node
> > OVS_UNUSED,
> >>  }
> >>
> >>  static void
> >> -en_addr_sets_clear_tracked_data(void *data)
> >> +en_addr_sets_init_run(struct engine_node *node OVS_UNUSED, void *data)
> >>  {
> >>  struct ed_type_addr_sets *as = data;
> >>  sset_clear(&as->new);
> >>  sset_clear(&as->deleted);
> >> -struct shash_node *node;
> >> -SHASH_FOR_EACH_SAFE (node, &as->updated) {
> >> -struct addr_set_diff *asd = node->data;
> >> +struct shash_node *as_node;
> >> +SHASH_FOR_EACH_SAFE (as_node, &as->updated) {
> >> +struct addr_set_diff *asd = as_node->data;
> >>  expr_constant_set_destroy(asd->added);
> >>  free(asd->added);
> >>  

Re: [ovs-dev] [v2] odp-execute: Add ISA implementation of set_masked IPv6 action

2022-09-23 Thread Finn, Emma



> -Original Message-
> From: David Marchand 
> Sent: Wednesday 21 September 2022 11:26
> To: Finn, Emma 
> Cc: d...@openvswitch.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v2] odp-execute: Add ISA implementation of
> set_masked IPv6 action
> 
> On Tue, Sep 20, 2022 at 3:19 PM Emma Finn  wrote:
> >
> > This commit adds support for the AVX512 implementation of the
> > ipv6_set_addrs action as well as an AVX512 implementation of updating
> > the L4 checksums.
> >
> > Signed-off-by: Emma Finn 
> >
> > ---
> > v2:
> >   - Added check for availability of s6_addr32 field of struct in6_addr.
> >   - Fixed network headers for freebsd builds.
> > ---
> > ---
> >  lib/odp-execute-avx512.c | 172
> > +++
> >  1 file changed, 172 insertions(+)
> >
> > diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c index
> > 6c7713251..58d1746c9 100644
> > --- a/lib/odp-execute-avx512.c
> > +++ b/lib/odp-execute-avx512.c
> > @@ -20,6 +20,9 @@
> >
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> > +#include 
> >
> >  #include "csum.h"
> >  #include "dp-packet.h"
> > @@ -483,6 +486,172 @@ action_avx512_ipv4_set_addrs(struct
> dp_packet_batch *batch,
> >  }
> >  }
> >
> > +#if HAVE_AVX512VBMI
> > +static inline uint16_t ALWAYS_INLINE
> > +__attribute__((__target__("avx512vbmi")))
> > +avx512_ipv6_get_delta(__m512i ip6_header) {
> > +__m256i v_zeros = _mm256_setzero_si256();
> > +__m512i v_shuf_src_dst = _mm512_setr_epi64(0x01, 0x02, 0x03, 0x04,
> > +   0xFF, 0xFF, 0xFF,
> > +0xFF);
> > +
> > +__m512i v_header = _mm512_permutexvar_epi64(v_shuf_src_dst,
> ip6_header);
> > +__m256i v_ip6_src_dst =  _mm512_extracti64x4_epi64(v_header, 0);
> > +/* These two shuffle masks, v_swap16a and v_swap16b, are to shuffle
> the
> > + * src and dst fields and add padding after each 16-bit value for the
> > + * following carry over addition. */
> > +__m256i v_swap16a = _mm256_setr_epi16(0x0100, 0x, 0x0302,
> 0x,
> > +  0x0504, 0x, 0x0706, 0x,
> > +  0x0100, 0x, 0x0302, 0x,
> > +  0x0504, 0x, 0x0706, 0x);
> > +__m256i v_swap16b = _mm256_setr_epi16(0x0908, 0x, 0x0B0A,
> 0x,
> > +  0x0D0C, 0x, 0x0F0E, 0x,
> > +  0x0908, 0x, 0x0B0A, 0x,
> > +  0x0D0C, 0x, 0x0F0E, 0x);
> > +__m256i v_shuf_old1 = _mm256_shuffle_epi8(v_ip6_src_dst,
> v_swap16a);
> > +__m256i v_shuf_old2 = _mm256_shuffle_epi8(v_ip6_src_dst,
> > + v_swap16b);
> > +
> > +/* Add each part of the old and new headers together. */
> > +__m256i v_delta = _mm256_add_epi32(v_shuf_old1, v_shuf_old2);
> > +
> > +/* Perform horizontal add to go from 8x32-bits to 2x32-bits. */
> > +v_delta = _mm256_hadd_epi32(v_delta, v_zeros);
> > +v_delta = _mm256_hadd_epi32(v_delta, v_zeros);
> > +
> > +/* Shuffle 32-bit value from 3rd lane into first lane for final
> > + * horizontal add. */
> > +__m256i v_swap32a = _mm256_setr_epi32(0x0, 0x4, 0xF, 0xF,
> > +  0xF, 0xF, 0xF, 0xF);
> > +v_delta = _mm256_permutexvar_epi32(v_swap32a, v_delta);
> > +
> > +v_delta = _mm256_hadd_epi32(v_delta, v_zeros);
> > +v_delta = _mm256_hadd_epi16(v_delta, v_zeros);
> > +
> > +/* Extract delta value. */
> > +return _mm256_extract_epi16(v_delta, 0); }
> > +
> > +static inline uint16_t ALWAYS_INLINE
> > +__attribute__((__target__("avx512vbmi")))
> > +avx512_ipv6_addr_csum_delta(__m512i old_header, __m512i
> new_header) {
> > +uint16_t delta;
> > +uint16_t old_delta = avx512_ipv6_get_delta(old_header);
> > +uint16_t new_delta = avx512_ipv6_get_delta(new_header);
> > +old_delta = ~old_delta;
> > +uint32_t csum_delta = old_delta + new_delta;
> > +delta = csum_finish(csum_delta);
> > +
> > +return ~delta;
> > +}
> > +
> > +/* This function performs the same operation on each packet in the
> > +batch as
> > + * the scalar odp_set_ipv6() function. */ static void
> > +__attribute__((__target__("avx512vbmi")))
> > +action_avx512_ipv6_set_addrs(struct dp_packet_batch *batch,
> > + const struct nlattr *a) {
> > +const struct ovs_key_ipv6 *key, *mask;
> > +struct dp_packet *packet;
> > +a = nl_attr_get(a);
> > +key = nl_attr_get(a);
> > +mask = odp_get_key_mask(a, struct ovs_key_ipv6);
> > +
> > +/* Read the content of the key and mask in the respective registers. We
> > + * only load the size of the actual structure, which is only 40 bytes. 
> > */
> > +__m512i v_key = _mm512_maskz_loadu_epi64(0x1F, (void *) key);
> > +__m512i v_mask = _mm512_maskz_loadu_epi64(0x1F, (void *) mask);
> > +
> > +/* 

Re: [ovs-dev] [RFC PATCH ovn 0/5] Add OVN component templates.

2022-09-23 Thread Dumitru Ceara
On 9/22/22 19:55, Han Zhou wrote:
> On Thu, Sep 22, 2022 at 10:38 AM Han Zhou  wrote:
>>
>>
>>
>> On Thu, Sep 22, 2022 at 1:00 AM Dumitru Ceara  wrote:
>>>
>>> Hi Han,
>>>
>>> On 9/21/22 23:06, Han Zhou wrote:
 Thanks Dumitru for this promising optimization!

>>>
>>> Thanks for checking it out!
>>>
 On Thu, Aug 11, 2022 at 1:03 AM Dumitru Ceara 
> wrote:
>
> On 8/10/22 19:54, Mark Michelson wrote:
>> Hi Dumitru,
>>
>
> Hi Mark,
>
>> I read the patch series, and I think the idea of chassis-specific
>> variables is a good idea to reduce the number of DB records for
> certain
>> things. Aside from load balancers, I suspect this could have a
> positive
>> impact for other structures as well.
>>
>
> Thanks for taking a look!  Yes, I think this might be applicable to
> other structures too.
>

 I think it is a good idea to make it more generic, but for my
> understanding
 this template concept is for anything that's "node/chassis" specific,
> and
 supposed to be instantiated at chassis level. Probably we should name
> the
 DB tables as something like chassis_template_var.

>>>
>>> If we decide to go for a simple "chassis" column (instead of a generic
>>> "match" column) then naming the table Chassis_Template_Var makes sense
>>> indeed.
>>>
>> Rather than criticizing the individual lines of code, I'll focus
> instead
>> on some higher-level questions/ideas.
>>
>
> Sure, thanks! :)
>
>> First, one question I had was what happens when a template variable
> name
>> is used in a load balancer, but there is no appropriate value to
>> substitute? For instance, what if a load balancer applies to
> chassis-3,
>> but you only have template variables for chassis-1 and chassis-2?
> This
>> might be addressed in the code but I didn't notice if it was.
>>
>
> There are actually two things to consider here:
>
> 1. there might be a logical flow that uses a template variable: in
> this
> case if template expansion/instantiation fails we currently leave the
> token untouched (e.g., '^variable' stays '^variable').  That will
> cause
> the flow action/match parsing to fail and currently logs a warning.
> The
> flow itself is skipped, as it should be.  We probably need to avoid
> logging a warning though.
>
> 2. like you pointed out, there might be a load balancer using
> templates
> in its backends/vips: if some of those templates cannot be
> instantiated
> locally the backend/vip where they're added is skipped.  Unless I
> missed
> something, the code should already do that.
>
>> Second, it seems like template variables are a natural extension of
>> existing concepts like address sets and port groups. In those cases,
>> they were an unconditional collection of IP addresses or ports. For
>
> You're right to some extent template variables are similar to port
> groups.  The southbound database port group table splits the
> northbound
> port group per datapath though not per chassis like template
> variables.
>
>> template variables, they're a collection of untyped values with the
>> condition of only applying on certain Chassis. I wonder if this
> could
>> all be reconciled with a single table that uses untyped values with
>> user-specified conditions. Right now template variables have a
> "Chassis"
>> column, but maybe this could be replaced with a broader "condition",
>> "when", or "match" column. To get something integrated quickly, this
>> column could just accept the syntax of "chassis.name == " or
>> "chassis.uuid == " to allow for chassis-specific application
> of
>> the values. With this foundation, we could eventually allow
>> unconditional application of the value, or more complex conditions
> (e.g.
>> only apply to logical switch ports that are connected to a router
> with a
>> distributed gateway port). Doing this, we could deprecate address
> sets
>> and port groups eventually in favor of template variables.
>
> This sounds like a good idea to me.  I wasn't too happy with the
> "chassis" string column of the Template_Var table anyway.  A generic
> condition field makes more sense.
>
 If it is chassis-specific template, a column "chassis" seems to be
 straightforward. With a "match" column it is another burden of parsing
>>>
>>> I have a generic implementation (with a "predicate" column) almost ready
>>> for review.  I agree it's a bit more work to parse and maintain
>>> references.  I think it's probably best to discuss these details once I
>>> post v1.  It's no problem for me to go back to the "chassis" column
>>> version if we decide to use that approach.
>>>
 (which is costly and error prone). In addition, the LB object (or
> other
 structures) is not a logical-flow, and it doesn't directly map 

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Optimize datapath action set by removing last clone action.

2022-09-23 Thread Marcelo Ricardo Leitner
Hi,

On Mon, Sep 12, 2022 at 05:11:32PM +0200, Ilya Maximets wrote:
> On 8/29/22 12:04, Roi Dayan via dev wrote:
...
> > Hi,
> >
> > we are also running some ovs/ovn tests with this patch.
> > we didn't encounter breaking issues with current tests we have.
> >
> > Thanks,
> > Roi
>
> Thanks, Roi, Ales and Eelco!  Applied to master and 3.0.
> It doesn't apply to earlier branches due to missing test
> fixes from commit:
>   c8bff848aeca ("ofproto-dpif-xlate: No clone when tunnel push is last 
> action.")
> But I'm not sure if we actually need to backport further.
> Let me know if it is needed.

A backport to 2.17 is actually very welcome. Is it feasible?

Best,
Marcelo

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [branch-2.16, v2] dpdk: Use DPDK 20.11.6 release.

2022-09-23 Thread David Marchand
On Fri, Sep 23, 2022 at 2:43 PM Kevin Traynor  wrote:
>
> On 22/09/2022 13:40, Michael Phelan wrote:
> > Update OVS CLI and relevant documentation to use DPDK 20.11.6.
> >
> > A bug was introduced in DPDK 20.11.5 by the commit 33f2e3756186 ("vhost: 
> > fix unsafe vring addresses modifications").
> > This bug can cause a deadlock when vIOMMU is enabled and NUMA reallocation 
> > of the virtqueues happen.
> > A fix [1] has been posted and pushed to the DPDK 20.11 branch.
> > If a user wishes to avoid the issue then it is recommended to use DPDK 
> > 20.11.4 until the release of DPDK 20.11.7.
> > It should be noted that DPDK 20.11.4 does not benefit from the numerous bug 
> > fixes addressed since its release.
> > If a user wishes to benefit from these fixes it is recommended to use DPDK 
> > 20.11.6.
> >
> > [1] 
> > https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.march...@redhat.com/
> > Signed-off-by: Michael Phelan 
> >
>
> For branches 2.15 [0] and 2.16 [1] I ran github actions and it failed.
> For the 2.16 branch I removed the patch and it passed [2]. It seems like
> the meson used (0.47.1 - which is the min version for 20.11) does not
> like the 20.11.5/6 package, or there is some other github effect. It is
> working fine with 20.11.4.
>
> Afterwards, checking the ovs-build mailing list [4] I also see failures
> here and an additional failure for the 2.17 branch. So all these failures
> need to be checked.
>
> [1] https://github.com/kevintraynor/ovs/actions/runs/3111862351
> [2] https://github.com/kevintraynor/ovs/actions/runs/3111865180
> [3] https://github.com/kevintraynor/ovs/actions/runs/3112089634
> [4]
> https://mail.openvswitch.org/pipermail/ovs-build/2022-September/date.html

This looks like a regression in 20.11 LTS with older meson.
Adding 20.11 LTS maintainers to the thread.

Afaics, this is triggered by "build: fix warnings when running
external commands".
And reverting it is enough to fix the error with meson 0.47.1.
https://github.com/david-marchand/dpdk/commits/20.11
https://github.com/david-marchand/dpdk/actions/runs/3113099408


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 net-next 2/2] net: openvswitch: allow conntrack in non-initial user namespace

2022-09-23 Thread Michael Weiß
Similar to the previous commit, the Netlink interface of the OVS
conntrack module was restricted to global CAP_NET_ADMIN by using
GENL_ADMIN_PERM. This is changed to GENL_UNS_ADMIN_PERM to support
unprivileged containers in non-initial user namespace.

Signed-off-by: Michael Weiß 
---
 net/openvswitch/conntrack.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 48e8f5c29b67..cb255d8ed99a 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1982,7 +1982,8 @@ static int ovs_ct_limit_set_zone_limit(struct nlattr 
*nla_zone_limit,
} else {
struct ovs_ct_limit *ct_limit;
 
-   ct_limit = kmalloc(sizeof(*ct_limit), GFP_KERNEL);
+   ct_limit = kmalloc(sizeof(*ct_limit),
+  GFP_KERNEL_ACCOUNT);
if (!ct_limit)
return -ENOMEM;
 
@@ -2252,14 +2253,16 @@ static int ovs_ct_limit_cmd_get(struct sk_buff *skb, 
struct genl_info *info)
 static const struct genl_small_ops ct_limit_genl_ops[] = {
{ .cmd = OVS_CT_LIMIT_CMD_SET,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-   .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
-  * privilege. */
+   .flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+  * privilege.
+  */
.doit = ovs_ct_limit_cmd_set,
},
{ .cmd = OVS_CT_LIMIT_CMD_DEL,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-   .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
-  * privilege. */
+   .flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+  * privilege.
+  */
.doit = ovs_ct_limit_cmd_del,
},
{ .cmd = OVS_CT_LIMIT_CMD_GET,
-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 net-next 1/2] net: openvswitch: allow metering in non-initial user namespace

2022-09-23 Thread Michael Weiß
The Netlink interface for metering was restricted to global CAP_NET_ADMIN
by using GENL_ADMIN_PERM. To allow metering in a non-initial user namespace,
e.g., a container, this is changed to GENL_UNS_ADMIN_PERM.

Signed-off-by: Michael Weiß 
---
 net/openvswitch/meter.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
index 5a9009bd..6e38f68f88c2 100644
--- a/net/openvswitch/meter.c
+++ b/net/openvswitch/meter.c
@@ -343,7 +343,7 @@ static struct dp_meter *dp_meter_create(struct nlattr **a)
return ERR_PTR(-EINVAL);
 
/* Allocate and set up the meter before locking anything. */
-   meter = kzalloc(struct_size(meter, bands, n_bands), GFP_KERNEL);
+   meter = kzalloc(struct_size(meter, bands, n_bands), GFP_KERNEL_ACCOUNT);
if (!meter)
return ERR_PTR(-ENOMEM);
 
@@ -687,9 +687,9 @@ static const struct genl_small_ops dp_meter_genl_ops[] = {
},
{ .cmd = OVS_METER_CMD_SET,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-   .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
-  *  privilege.
-  */
+   .flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+  *  privilege.
+  */
.doit = ovs_meter_cmd_set,
},
{ .cmd = OVS_METER_CMD_GET,
@@ -699,9 +699,9 @@ static const struct genl_small_ops dp_meter_genl_ops[] = {
},
{ .cmd = OVS_METER_CMD_DEL,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-   .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
-  *  privilege.
-  */
+   .flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+  *  privilege.
+  */
.doit = ovs_meter_cmd_del
},
 };
-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [来自外部的邮件]Re: [External] Re:[ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between ukey and megaflow

2022-09-23 Thread 王志克
Hi Peng,

Right, I also met this issue, and I am wondering about the sequence of this
inconsistency, and would like to hear your “new cause”.

Anyway I believe your below patch should fix this issue.
[ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops

Br,
Zhike

From: ".贺鹏" 
Date: Friday, September 23, 2022 at 8:59 PM
To: 王志克 
Cc: "ovs-dev@openvswitch.org" , "d...@openvswitch.org" 

Subject: [来自外部的邮件]Re: [External] Re:[ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix 
inconsistent processing between ukey and megaflow

京东安全提示:此封邮件来自公司外部,除非您能判断发件人和知道邮件内容安全,否则请勿打开链接或者附件。
JD Security Tips: Please do not click on links or open attachments unless you 
trust the sender and know the content is safe.


Hi, Zhike,

After receiving your email, I was becoming curious about this code and did more 
investigation on it.

I found some problems with the code, and now I believe this inconsistent
processing is NOT the root cause of the inconsistent actions between the ukey
and the datapath.
I also found a new cause for that, but given this complex race between the PMD
and the revalidator, I hope this time I am right.

But before that, why are you interested in this patch? Have you found the same 
issue in your environment?




On Thu, Sep 22, 2022 at 6:54 PM .贺鹏 
mailto:hepeng.0...@bytedance.com>> wrote:
Hi, zhike,

It's difficult to give a very clear sequence of how this inconsistency
happens, but I can give you more details.

This is observed in our production environment. The correct megaflow should 
encap packets with vxlan header and send out, but the action is drop.
This is usually because the neigh info is not available at the moment when the 
upcall happens.

Normally, the drop action is ephemeral, and the revalidator will later modify
the megaflow's action into the tnl_push.

But there are a few cases, happening only 1~2 times a year, where the drop
actions will never be replaced by tnl_push.

just like in the commits mentioned,

"The coverage command shows revalidators have dumped several times,
however the correct actions are not set. This implies that the ukey's
action does not equal the megaflow's, i.e. revalidators think the underlying

megaflow's actions are correct however they are not."



I do not know how this happened, but I do think this inconsistent processing 
could be one of the reasons.

Even if there is no such bug, I think keeping the processing consistent is necessary.





On Wed, Sep 21, 2022 at 5:57 PM 王志克 mailto:wangzh...@jd.com>> 
wrote:
Hi Hepeng,

Can you please explain the sequence of how this inconsistency could happen?
Why do you believe the current actions in the existing netdev_flow are old?

Thanks.

Br,
wangzhike




*
[ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between ukey 
and megaflow
Message ID: 20220604151857.66550-4-hepeng.0...@bytedance.com
State: New
Series: [ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops
Checks:
  ovsrobot/apply-robot - warning (apply and check: warning)
  ovsrobot/github-robot-_Build_and_Test - success (github build: passed)
  ovsrobot/intel-ovs-compilation - success (test: success)

Commit Message (Peng He, June 4, 2022, 3:18 p.m. UTC)

When PMDs perform upcalls, the newly generated ukey will replace the old one;
however, the newly generated megaflow is discarded and the old one is reused
without checking whether the actions of the new and old flows are equal.

We observe in the production environment that sometimes a megaflow with wrong
actions keeps staying in the datapath. The coverage command shows revalidators
have dumped several times, however the correct actions are not set. This
implies that the ukey's actions do not equal the megaflow's, i.e. revalidators
think the underlying megaflow's actions are correct however they are not.

We also checked the megaflow using the ofproto/trace command, and the actions
do not match the ones in the actual megaflow. By performing a
revalidator/purge command, the right actions are set.

Signed-off-by: Peng He 

---
 lib/dpif-netdev.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

Comments (0-day Robot, June 4, 2022, 3:44 p.m. UTC | #1)

Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your patch.

Thanks for your 

[ovs-dev] [PATCH v3 net-next 0/2] net: openvswitch: metering and conntrack in userns

2022-09-23 Thread Michael Weiß
Currently using openvswitch in a non-initial user namespace, e.g., an
unprivileged container, is possible but without metering and conntrack
support. This is due to the restriction of the corresponding Netlink
interfaces to the global CAP_NET_ADMIN.

These simple patches switch from GENL_ADMIN_PERM to GENL_UNS_ADMIN_PERM
in several cases to allow this also for the unprivileged container
use case.

We tested this for unprivileged containers created by the container
manager of GyroidOS (gyroidos.github.io). However, for other container
managers such as LXC or systemd which provide unprivileged containers
this should apply equally.

Changes in v3:
- also changed GFP_KERNEL to GFP_KERNEL_ACCOUNT in
  ovs_ct_limit_set_zone_limit() as suggested by Jakub
- Rebased on net-next/main branch of networking tree

Changes in v2:
- changed GFP_KERNEL to GFP_KERNEL_ACCOUNT in dp_meter_create()
  as suggested by Paolo
- Rebased on net branch of networking tree

Michael Weiß (2):
  net: openvswitch: allow metering in non-initial user namespace
  net: openvswitch: allow conntrack in non-initial user namespace

 net/openvswitch/conntrack.c | 13 -
 net/openvswitch/meter.c | 14 +++---
 2 files changed, 15 insertions(+), 12 deletions(-)


base-commit: 3aba35bb201fd2481b3fd5794120d9d1b0734fe8
-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netlink: add revalidator for offload of meters

2022-09-23 Thread Eelco Chaudron


On 23 Sep 2022, at 10:35, Simon Horman wrote:

> From: Yifan Li 
>
> Allow revalidation of hardware-offloaded meters via OVS-TC.
> Without a revalidator, a tc meter action cannot be deleted while
> a flow exists. The revalidator fixes this bug by continuously
> checking existing meters and deleting the unneeded ones. The
> autotest cases for the revalidator are also added.
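
A rough sketch of the revalidation loop that commit message describes. Every
helper below is hypothetical and only illustrates the control flow, not the
actual OVS-TC offload API:

    #include <stdbool.h>
    #include <stdint.h>

    bool next_offloaded_meter(uint32_t *meter_id);          /* hypothetical */
    bool meter_referenced_by_any_flow(uint32_t meter_id);   /* hypothetical */
    void offloaded_meter_delete(uint32_t meter_id);         /* hypothetical */

    static void
    revalidate_offloaded_meters(void)
    {
        uint32_t id;

        /* Walk every meter currently offloaded via TC and remove the ones
         * that no datapath flow references any more. */
        while (next_offloaded_meter(&id)) {
            if (!meter_referenced_by_any_flow(id)) {
                offloaded_meter_delete(id);
            }
        }
    }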

This is not a review, but just some quick observations/questions.

- Please undo the n_handlers to _n_handlers renames, as it does not make sense 
for this patch.
- New TC related code was added to dpif-netdev.c this is not the place where 
such code should live.
- Log messages do not need a trailing \n, and please have them uniform, i.e. 
start with a capital for all new log messages.
- Can you explain why we need this expensive revalidate process and isn’t there 
a better way to make sure they are cleaned up properly?

Also including Jianbo who added the initial code.


//Eelco

> Signed-off-by: Yifan Li 
> Signed-off-by: Simon Horman 
> ---
>  lib/dpif-netdev.c|   1 +
>  lib/dpif-netlink.c   | 257 +++
>  lib/dpif-netlink.h   |   2 +
>  lib/dpif-provider.h  |   5 +
>  lib/dpif.c   |  15 +-
>  lib/id-pool.c|  13 ++
>  lib/id-pool.h|   4 +-
>  lib/netdev-linux.c   |   6 +
>  lib/netdev-offload-tc.c  |  11 +-
>  lib/tc.c |   5 -
>  lib/tc.h |  10 +-
>  ofproto/ofproto-dpif-upcall.c|   5 +
>  ofproto/ofproto-dpif.h   |   2 +
>  tests/system-offloads-traffic.at |   4 +
>  14 files changed, 301 insertions(+), 39 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index a45b460145c6..365aacadb03a 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -9583,6 +9583,7 @@ const struct dpif_class dpif_netdev_class = {
>  dpif_netdev_meter_set,
>  dpif_netdev_meter_get,
>  dpif_netdev_meter_del,
> +NULL,   /* meter_revalidate */
>  dpif_netdev_bond_add,
>  dpif_netdev_bond_del,
>  dpif_netdev_bond_stats_get,
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index a620a6ec52dd..4eee4761dc6b 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -48,6 +49,7 @@
>  #include "netlink.h"
>  #include "netnsid.h"
>  #include "odp-util.h"
> +#include "ofproto/ofproto-dpif.h"
>  #include "openvswitch/dynamic-string.h"
>  #include "openvswitch/flow.h"
>  #include "openvswitch/hmap.h"
> @@ -61,6 +63,7 @@
>  #include "packets.h"
>  #include "random.h"
>  #include "sset.h"
> +#include "tc.h"
>  #include "timeval.h"
>  #include "unaligned.h"
>  #include "util.h"
> @@ -2571,18 +2574,18 @@ static uint32_t
>  dpif_netlink_calculate_n_handlers(void)
>  {
>  uint32_t total_cores = count_total_cores();
> -uint32_t n_handlers = count_cpu_cores();
> +uint32_t _n_handlers = count_cpu_cores();
>  uint32_t next_prime_num;
>
>  /* If not all cores are available to OVS, create additional handler
>   * threads to ensure more fair distribution of load between them.
>   */
> -if (n_handlers < total_cores && total_cores > 2) {
> -next_prime_num = next_prime(n_handlers + 1);
> -n_handlers = MIN(next_prime_num, total_cores);
> +if (_n_handlers < total_cores && total_cores > 2) {
> +next_prime_num = next_prime(_n_handlers + 1);
> +_n_handlers = MIN(next_prime_num, total_cores);
>  }
>
> -return n_handlers;
> +return _n_handlers;
>  }
>
>  static int
> @@ -2591,17 +2594,17 @@ dpif_netlink_refresh_handlers_cpu_dispatch(struct 
> dpif_netlink *dpif)
>  {
>  int handler_id;
>  int error = 0;
> -uint32_t n_handlers;
>  uint32_t *upcall_pids;
> +uint32_t _n_handlers;
>
> -n_handlers = dpif_netlink_calculate_n_handlers();
> -if (dpif->n_handlers != n_handlers) {
> +_n_handlers = dpif_netlink_calculate_n_handlers();
> +if (dpif->n_handlers != _n_handlers) {
>  VLOG_DBG("Dispatch mode(per-cpu): initializing %d handlers",
> -   n_handlers);
> +   _n_handlers);
>  destroy_all_handlers(dpif);
> -upcall_pids = xzalloc(n_handlers * sizeof *upcall_pids);
> -dpif->handlers = xzalloc(n_handlers * sizeof *dpif->handlers);
> -for (handler_id = 0; handler_id < n_handlers; handler_id++) {
> +upcall_pids = xzalloc(_n_handlers * sizeof *upcall_pids);
> +dpif->handlers = xzalloc(_n_handlers * sizeof *dpif->handlers);
> +for (handler_id = 0; handler_id < _n_handlers; handler_id++) {
>  struct dpif_handler *handler = &dpif->handlers[handler_id];
>  error = create_nl_sock(dpif, &handler->sock);
>  if (error) {
> @@ -2615,9 +2618,9 @@ 

Re: [ovs-dev] [External] Re:[ovs-dev, ovs-dev, v2, 4/4] dpif-netdev: fix inconsistent processing between ukey and megaflow

2022-09-23 Thread . 贺鹏
Hi, Zhike,

After receiving your email, I was becoming curious about this code and did
more investigation on it.

I found some problems with the code, and now I believe this inconsistent
processing is NOT the root cause of the inconsistent actions between the ukey
and the datapath.
I also found a new cause for that, but given this complex race between the PMD
and the revalidator, I hope this time I am right.

But before that, why are you interested in this patch? Have you found the
same issue in your environment?




On Thu, Sep 22, 2022 at 6:54 PM .贺鹏  wrote:

> Hi, zhike,
>
> It's difficult to give a very clear sequence of how this inconsistency
> happens, but I can give you more details.
>
> This is observed in our production environment. The correct megaflow
> should encap packets with vxlan header and send out, but the action is drop.
> This is usually because the neigh info is not available at the moment when
> the upcall happens.
>
> Normally, the drop action is ephemeral, and the revalidator will later modify
> the megaflow's action into the tnl_push.
>
> But there are a few cases, happening only 1~2 times a year, where the
> drop actions will never be replaced by tnl_push.
>
> just like in the commits mentioned,
>
> "The coverage command shows revalidators have dumped several times,
> however the correct actions are not set. This implies that the ukey's
> action does not equal the megaflow's, i.e. revalidators think the
> underlying
>
> megaflow's actions are correct however they are not."
>
>
> I do not know how this happened, but I do think this inconsistent processing 
> could be one of the reasons.
>
> Even if there is no such bug, I think keeping the processing consistent is
> necessary.
>
>
>
>
>
> On Wed, Sep 21, 2022 at 5:57 PM 王志克  wrote:
>
>> Hi Hepeng,
>>
>>
>>
>> Can you please explain the sequence of how this inconsistency could
>> happen? Why do you believe the current actions in the existing netdev_flow are old?
>>
>>
>>
>> Thanks.
>>
>>
>>
>> Br,
>>
>> wangzhike
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *
>> [ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between
>> ukey and megaflow
>> Message ID: 20220604151857.66550-4-hepeng.0...@bytedance.com
>> State: New
>> Series: [ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops
>> Checks:
>>   ovsrobot/apply-robot - warning (apply and check: warning)
>>   ovsrobot/github-robot-_Build_and_Test - success (github build: passed)
>>   ovsrobot/intel-ovs-compilation - success (test: success)
>>
>> Commit Message (Peng He, June 4, 2022, 3:18 p.m. UTC)
>>
>> When PMDs perform upcalls, the newly generated ukey will replace the old
>> one; however, the newly generated megaflow is discarded and the old one is
>> reused without checking whether the actions of the new and old flows are
>> equal.
>>
>> We observe in the production environment that sometimes a megaflow with
>> wrong actions keeps staying in the datapath. The coverage command shows
>> revalidators have dumped several times, however the correct actions are not
>> set. This implies that the ukey's actions do not equal the megaflow's, i.e.
>> revalidators think the underlying megaflow's actions are correct however
>> they are not.
>>
>> We also checked the megaflow using the ofproto/trace command, and the
>> actions do not match the ones in the actual megaflow. By performing a
>> revalidator/purge command, the right actions are set.
>>
>>
>>
>> *Signed-off-by: Peng He > >*
>>
>> ---
>>
>>  lib/dpif-netdev.c | 17 -
>>
>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> Comments
>>
>> 0-day Robot
>> June
>> 4, 2022, 3:44 p.m. UTC | #1
>> 
>>
>> Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your 
>> patch.
>>
>> Thanks for your contribution.
>>
>>
>>
>> I encountered some error that I wasn't expecting.  See the details below.
>>
>>
>>
>>
>>
>> checkpatch:
>>
>> ERROR: Author Peng He  needs to sign off.
>>
>> WARNING: Unexpected sign-offs from developers who are not authors or 
>> co-authors or committers: Peng He 
>>
>> Lines checked: 58, Warnings: 1, Errors: 1
>>
>>
>>
>>
>>
>> Please check this out.  If you feel there has been an error, please email 
>> 

Re: [ovs-dev] [branch-2.16, v2] dpdk: Use DPDK 20.11.6 release.

2022-09-23 Thread Kevin Traynor

On 22/09/2022 13:40, Michael Phelan wrote:

Update OVS CLI and relevant documentation to use DPDK 20.11.6.

A bug was introduced in DPDK 20.11.5 by the commit 33f2e3756186 ("vhost: fix unsafe 
vring addresses modifications").
This bug can cause a deadlock when vIOMMU is enabled and NUMA reallocation of 
the virtqueues happen.
A fix [1] has been posted and pushed to the DPDK 20.11 branch.
If a user wishes to avoid the issue then it is recommended to use DPDK 20.11.4 
until the release of DPDK 20.11.7.
It should be noted that DPDK 20.11.4 does not benefit from the numerous bug 
fixes addressed since its release.
If a user wishes to benefit from these fixes it is recommended to use DPDK 
20.11.6.

[1] 
https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.march...@redhat.com/
Signed-off-by: Michael Phelan 



For branches 2.15 [0] and 2.16 [1] I ran github actions and it failed. 
For the 2.16 branch I removed the patch and it passed [2]. It seems like 
the meson used (0.47.1 - which is the min version for 20.11) does not 
like the 20.11.5/6 package, or there is some other github effect. It is 
working fine with 20.11.4.


Afterwards, checking the ovs-build mailing list [4] I also see failures 
here and an additional failure for the 2.17 branch. So all these failures 
need to be checked.


[1] https://github.com/kevintraynor/ovs/actions/runs/3111862351
[2] https://github.com/kevintraynor/ovs/actions/runs/3111865180
[3] https://github.com/kevintraynor/ovs/actions/runs/3112089634
[4] 
https://mail.openvswitch.org/pipermail/ovs-build/2022-September/date.html



---
v2:
   - Update recommended DPDK version for older OvS versions in Documentation.

---
---
  .ci/linux-build.sh   |  2 +-
  Documentation/faq/releases.rst   |  8 
  Documentation/intro/install/dpdk.rst |  8 
  NEWS | 15 +++
  4 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index c06e88c57..dd0a57850 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -216,7 +216,7 @@ fi
  
  if [ "$DPDK" ] || [ "$DPDK_SHARED" ]; then

  if [ -z "$DPDK_VER" ]; then
-DPDK_VER="20.11.4"
+DPDK_VER="20.11.6"
  fi
  install_dpdk $DPDK_VER
  if [ "$CC" = "clang" ]; then
diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index d62d575eb..977822984 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -205,10 +205,10 @@ Q: What DPDK version does each Open vSwitch release work 
with?
  2.10.x   17.11.10
  2.11.x   18.11.9
  2.12.x   18.11.9
-2.13.x   19.11.10
-2.14.x   19.11.10
-2.15.x   20.11.4
-2.16.x   20.11.4
+2.13.x   19.11.13
+2.14.x   19.11.13
+2.15.x   20.11.6
+2.16.x   20.11.6
   
  
  Q: Are all the DPDK releases that OVS versions work with maintained?

diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
index 9ce5285c5..8bc6043f7 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -42,7 +42,7 @@ Build requirements
  In addition to the requirements described in :doc:`general`, building Open
  vSwitch with DPDK will require the following:
  
-- DPDK 20.11.4

+- DPDK 20.11.6
  
  - A `DPDK supported NIC`_
  
@@ -73,9 +73,9 @@ Install DPDK

  #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
  
 $ cd /usr/src/

-   $ wget https://fast.dpdk.org/rel/dpdk-20.11.4.tar.xz
-   $ tar xf dpdk-20.11.4.tar.xz
-   $ export DPDK_DIR=/usr/src/dpdk-stable-20.11.4
+   $ wget https://fast.dpdk.org/rel/dpdk-20.11.6.tar.xz
+   $ tar xf dpdk-20.11.6.tar.xz
+   $ export DPDK_DIR=/usr/src/dpdk-stable-20.11.6
 $ cd $DPDK_DIR
  
  #. Configure and install DPDK using Meson

diff --git a/NEWS b/NEWS
index c6b9c2ca8..76ecb2b80 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,20 @@
  v2.16.5 - xx xxx 
  -
+   - DPDK:
+ * OVS validated with DPDK 20.11.6.
+   A bug was introduced in DPDK 20.11.5 by the commit
+   33f2e3756186 ("vhost: fix unsafe vring addresses modifications").
+   This bug can cause a deadlock when vIOMMU is enabled and NUMA
+   reallocation of the virtqueues happen.
+   A fix has been posted and pushed to the DPDK 20.11 branch.
+   It can be found here:
+   
https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.march...@redhat.com/.
+   If a user wishes to avoid the issue then it is recommended to use
+   DPDK 20.11.4 until the release of DPDK 20.11.7.
+   It should be noted that DPDK 20.11.4 does not benefit from the numerous
+   bug fixes addressed since its release.
+   If a user wishes to benefit from these fixes it is recommended to use
+   DPDK 20.11.6.
  
  v2.16.4 - 15 Jun 2022

  

Re: [ovs-dev] [PATCH v5 1/2] ofproto-dpif-xlate: Extract the freezing processing into a function

2022-09-23 Thread Eelco Chaudron


On 23 Sep 2022, at 12:35, Ales Musil wrote:

> On Fri, Sep 23, 2022 at 12:29 PM Eelco Chaudron  wrote:
>
>>
>>
>> On 7 Sep 2022, at 8:54, Ales Musil wrote:
>>
>>> Throughout the code the same pattern occurs with regard to
>>> finish_freezing when ctx->freezing=true or xlate_action_set when
>>> ctx->freezing=false. Extract it into a common function that is
>>> called from those places instead.
>>>
>>> Signed-off-by: Ales Musil 
>>
>> Thanks for this change, it looks good to me.
>>
>> Acked-by: Eelco Chaudron 
>>
>>
> Thank you for the review. Actually I think I have made a mistake.
> I did not realize that the xlate_action_set() can actually start freezing
> again.
> So the following diff should be applied to this patch set. If there is
> another version,
> I will apply the diff below:
>
>
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index e181e3089..c84d6c9d0 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3884,10 +3884,11 @@ xlate_flow_is_protected(const struct xlate_ctx
> *ctx, const struct flow *flow, co
>  static void
>  xlate_ctx_process_freezing(struct xlate_ctx *ctx)
>  {
> +if (!ctx->freezing) {
> +xlate_action_set(ctx);
> +}
>  if (ctx->freezing) {
>  finish_freezing(ctx);
> -} else {
> -xlate_action_set(ctx);
>  }
>  }
>

Yes you are right, I totally missed the “else” when reviewing :(

//Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 2/2] ofproto-dpif-xlate: Optimize the clone for patch ports

2022-09-23 Thread Eelco Chaudron


On 7 Sep 2022, at 8:54, Ales Musil wrote:

> When the packet was traveling through a patch port boundary,
> OVS would check whether all of the actions are reversible;
> if not, it would clone the packet. However, the check
> was done only at the first level of the second bridge.
> That caused some issues when the packet had gone
> through more actions, some of which might have been
> irreversible.
>
> In order to keep the semantics the same we might
> need to run the actions twice in the worst case
> scenario. During the clone there are 4 scenarios
> that can happen.
>
> 1) The action is the last one for that flow;
> in that case we just continue without a clone.
>
> 2) There is an irreversible action in the action
> set (first level). In this case we know that
> there is at least one irreversible action, which
> is enough to do a clone.
>
> 3) All actions in the first level are reversible;
> we can try to run all actions as if we don't
> need any clone and inspect the ofpbuf at the
> end. In the positive case there are no irreversible
> actions, so we just submit the buffer and continue.
>
> 4) This is the same as 3) with the difference that
> there is an irreversible action in the ofpbuf.
> To keep the semantics we need to re-run the actions
> and treat it as a clone. This requires restoration
> of the xlate_ctx.
>
> Add test cases for all irreversible actions
> to see if they are properly cloned.
>
> Signed-off-by: Ales Musil 
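
The four scenarios in the quoted commit message, as a compact C sketch. All
names below are invented for illustration and do not match the real xlate code:

    #include <stdbool.h>

    bool is_last_action(void);                      /* hypothetical */
    bool action_set_has_irreversible(void);         /* hypothetical */
    bool emitted_actions_have_irreversible(void);   /* hypothetical */
    void translate_in_place(void);                  /* hypothetical */
    void translate_as_clone(void);                  /* hypothetical */
    int  save_checkpoint(void);                     /* hypothetical */
    void restore_checkpoint(int checkpoint);        /* hypothetical */

    static void
    patch_port_boundary(void)
    {
        if (is_last_action()) {
            translate_in_place();            /* 1) nothing follows, no clone */
        } else if (action_set_has_irreversible()) {
            translate_as_clone();            /* 2) first level needs a clone */
        } else {
            int cp = save_checkpoint();      /* 3) try without a clone ...   */
            translate_in_place();
            if (emitted_actions_have_irreversible()) {
                restore_checkpoint(cp);      /* 4) ... roll back and clone   */
                translate_as_clone();
            }
        }
    }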

Some more comments on this v5, see inline below.

> ---
> v4: Rebase on top of current master.
> Address comments from Eelco.
> v5: Make the code more readable and reduce duplication.
> ---
>  lib/odp-util.c   | 148 +
>  lib/odp-util.h   |   1 +
>  ofproto/ofproto-dpif-trace.c |   2 +-
>  ofproto/ofproto-dpif-trace.h |   1 +
>  ofproto/ofproto-dpif-xlate.c | 177 +--
>  tests/ofproto-dpif.at| 105 +
>  6 files changed, 386 insertions(+), 48 deletions(-)
>
> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index ba5be4bb3..6526daafb 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> @@ -8768,3 +8768,151 @@ commit_odp_actions(const struct flow *flow, struct 
> flow *base,
>
>  return slow1 ? slow1 : slow2;
>  }
> +
> +static inline bool
> +nlattr_action_is_reversible(const uint16_t type)

The functions below are all named "irreversible", so it might confuse people 
looking at the code.
Maybe rename this function nlattr_action_is_irreversible(), so it's more 
consistent.

> +{
> +switch ((enum ovs_action_attr) type) {
> +case OVS_ACTION_ATTR_CT:
> +case OVS_ACTION_ATTR_CT_CLEAR:
> +case OVS_ACTION_ATTR_TRUNC:
> +case OVS_ACTION_ATTR_PUSH_ETH:
> +case OVS_ACTION_ATTR_POP_ETH:
> +case OVS_ACTION_ATTR_PUSH_NSH:
> +case OVS_ACTION_ATTR_POP_NSH:
> +case OVS_ACTION_ATTR_METER:
> +case OVS_ACTION_ATTR_TUNNEL_PUSH:
> +case OVS_ACTION_ATTR_TUNNEL_POP:
> +return false;
> +
> +case OVS_ACTION_ATTR_UNSPEC:
> +case OVS_ACTION_ATTR_OUTPUT:
> +case OVS_ACTION_ATTR_USERSPACE:
> +case OVS_ACTION_ATTR_SET:
> +case OVS_ACTION_ATTR_PUSH_VLAN:
> +case OVS_ACTION_ATTR_POP_VLAN:
> +case OVS_ACTION_ATTR_SAMPLE:
> +case OVS_ACTION_ATTR_RECIRC:
> +case OVS_ACTION_ATTR_HASH:
> +case OVS_ACTION_ATTR_SET_MASKED:
> +case OVS_ACTION_ATTR_CLONE:
> +case OVS_ACTION_ATTR_CHECK_PKT_LEN:
> +case OVS_ACTION_ATTR_LB_OUTPUT:
> +case OVS_ACTION_ATTR_ADD_MPLS:
> +case OVS_ACTION_ATTR_PUSH_MPLS:
> +case OVS_ACTION_ATTR_POP_MPLS:
> +case OVS_ACTION_ATTR_DROP:
> +case __OVS_ACTION_ATTR_MAX:

This is an unknown/unsupported action, I would suggest returning false for this.

> +return true;
> +}
> +return false;
> +}
> +
> +static bool
> +odp_cpl_contains_irreversible_actions(const struct nlattr *attr)
> +{
> +static const struct nl_policy ovs_cpl_policy[] = {
> +[OVS_CHECK_PKT_LEN_ATTR_PKT_LEN] = {.type = NL_A_U16},
> +[OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER] = {.type = NL_A_NESTED},
> +[OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL] = {.type = 
> NL_A_NESTED},
> +};
> +struct nlattr *a[ARRAY_SIZE(ovs_cpl_policy)];
> +
> +if (!nl_parse_nested(attr, ovs_cpl_policy, a, ARRAY_SIZE(a))) {
> +return false;
> +}
> +
> +const struct nlattr *greater =
> +a[OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER];
> +const struct nlattr *less =
> +a[OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL];
> +const void *greater_data = nl_attr_get(greater);
> +const void *less_data = nl_attr_get(less);
> +size_t greater_len = nl_attr_get_size(greater);
> +size_t less_len = nl_attr_get_size(less);
> +
> +return odp_contains_irreversible_action(greater_data, greater_len) ||
> +   odp_contains_irreversible_action(less_data, 

Re: [ovs-dev] [branch-2.17, v2] dpdk: Use DPDK 21.11.2 release.

2022-09-23 Thread Kevin Traynor

On 22/09/2022 10:39, Michael Phelan wrote:

Update OVS CLI and relevant documentation to use DPDK 21.11.2.

DPDK 21.11.2 contains fixes for the CVEs listed below:
CVE-2022-28199 [1]
CVE-2022-2132 [2]

A bug was introduced in DPDK 21.11.1 by the commit 01e3dee29c02 ("vhost: fix unsafe 
vring addresses modifications").
This bug can cause a deadlock when vIOMMU is enabled and NUMA reallocation of 
the virtqueues happen.
A fix [3] has been posted and pushed to the DPDK 21.11 branch.
If a user wishes to avoid the issue then it is recommended to use DPDK 21.11.0 
until the release of DPDK 21.11.3.
It should be noted that DPDK 21.11.0 does not benefit from the numerous bug and 
CVE fixes addressed since its release.
If a user wishes to benefit from these fixes it is recommended to use DPDK 
21.11.2.

[1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-28199
[2] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-2132
[3] 
https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.march...@redhat.com/

Signed-off-by: Michael Phelan 



branch-2.17, reviewed, tested basic PVP and ran github actions.

Acked-by: Kevin Traynor 


---
v2:
   - Update recommended DPDK version for older OvS versions in Documentation.

---
---
  .ci/linux-build.sh   |  2 +-
  Documentation/faq/releases.rst   | 10 +-
  Documentation/intro/install/dpdk.rst |  8 
  NEWS | 18 ++
  4 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 2dabd3d0a..392c7ee79 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -220,7 +220,7 @@ fi
  
  if [ "$DPDK" ] || [ "$DPDK_SHARED" ]; then

  if [ -z "$DPDK_VER" ]; then
-DPDK_VER="21.11.1"
+DPDK_VER="21.11.2"
  fi
  install_dpdk $DPDK_VER
  fi
diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index 33a0d5d2d..49895c595 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -206,11 +206,11 @@ Q: What DPDK version does each Open vSwitch release work 
with?
  2.10.x   17.11.10
  2.11.x   18.11.9
  2.12.x   18.11.9
-2.13.x   19.11.10
-2.14.x   19.11.10
-2.15.x   20.11.4
-2.16.x   20.11.4
-2.17.x   21.11.1
+2.13.x   19.11.13
+2.14.x   19.11.13
+2.15.x   20.11.6
+2.16.x   20.11.6
+2.17.x   21.11.2
   
  
  Q: Are all the DPDK releases that OVS versions work with maintained?

diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
index f8f01bfad..a284e6851 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -42,7 +42,7 @@ Build requirements
  In addition to the requirements described in :doc:`general`, building Open
  vSwitch with DPDK will require the following:
  
-- DPDK 21.11.1

+- DPDK 21.11.2
  
  - A `DPDK supported NIC`_
  
@@ -73,9 +73,9 @@ Install DPDK

  #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
  
 $ cd /usr/src/

-   $ wget https://fast.dpdk.org/rel/dpdk-21.11.1.tar.xz
-   $ tar xf dpdk-21.11.1.tar.xz
-   $ export DPDK_DIR=/usr/src/dpdk-stable-21.11
+   $ wget https://fast.dpdk.org/rel/dpdk-21.11.2.tar.xz
+   $ tar xf dpdk-21.11.2.tar.xz
+   $ export DPDK_DIR=/usr/src/dpdk-stable-21.11.2
 $ cd $DPDK_DIR
  
  #. Configure and install DPDK using Meson

diff --git a/NEWS b/NEWS
index 7c71284f9..36fcbb874 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,24 @@ v2.17.3 - xx xxx 
 configuration in a clustered databse independently for each server.
 E.g. for listening on unique addresses.  See the ovsdb.local-config.5
 manpage for schema details.
+   - DPDK:
+ * OVS validated with DPDK 21.11.2.
+   DPDK 21.11.2 contains fixes for the following CVEs:
+   CVE-2022-28199 cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-28199
+   CVE-2022-2132 cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-2132
+   A bug was introduced in DPDK 21.11.1 by the commit
+   01e3dee29c02 ("vhost: fix unsafe vring addresses modifications").
+   This bug can cause a deadlock when vIOMMU is enabled and NUMA
+   reallocation of the virtqueues happen.
+   A fix has been posted and pushed to the DPDK 21.11 branch.
+   It can be found here:
+   
https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.march...@redhat.com/.
+   If a user wishes to avoid the issue then it is recommended to use
+   DPDK 21.11.0 until the release of DPDK 21.11.3.
+   It should be noted that DPDK 21.11.0 does not benefit from the numerous
+   bug and CVE fixes addressed since its release.
+   If a user wishes to benefit from these fixes it is recommended to use
+   DPDK 21.11.2.
  
  v2.17.2 - 15 Jun 2022

  -



Re: [ovs-dev] [branch-3.0, v2] dpdk: Use DPDK 21.11.2 release.

2022-09-23 Thread Kevin Traynor

On 22/09/2022 10:35, Michael Phelan wrote:

Update OVS CLI and relevant documentation to use DPDK 21.11.2.

DPDK 21.11.2 contains fixes for the CVEs listed below:
CVE-2022-28199 [1]
CVE-2022-2132 [2]

A bug was introduced in DPDK 21.11.1 by the commit 01e3dee29c02 ("vhost: fix unsafe 
vring addresses modifications").
This bug can cause a deadlock when vIOMMU is enabled and NUMA reallocation of 
the virtqueues happens.
A fix [3] has been posted and pushed to the DPDK 21.11 branch.
If a user wishes to avoid the issue then it is recommended to use DPDK 21.11.0 
until the release of DPDK 21.11.3.
It should be noted that DPDK 21.11.0 does not benefit from the numerous bug and 
CVE fixes addressed since its release.
If a user wishes to benefit from these fixes it is recommended to use DPDK 
21.11.2.

[1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-28199
[2] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-2132
[3] 
https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.march...@redhat.com/

Signed-off-by: Michael Phelan 



Hi Michael,

For all patches, I think Ian will need to shorten the commit description 
line widths on commit as they are very long.


I didn't notice any copy and paste mistakes in the listing of DPDK 
versions, previous versions, future versions or the different commit ids 
with the different LTSs on OVS branches - you deserve a prize for that :-)


branch-3.0, reviewed, tested basic PVP and ran github actions.

Acked-by: Kevin Traynor 


---
v2:
   - Update recommended DPDK version for older OvS versions in Documentation.

---
---
  .ci/linux-build.sh   |  2 +-
  Documentation/faq/releases.rst   | 12 ++--
  Documentation/intro/install/dpdk.rst |  8 
  NEWS | 18 ++
  4 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 509314a07..23c8bbb7a 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -228,7 +228,7 @@ fi
  
  if [ "$DPDK" ] || [ "$DPDK_SHARED" ]; then

  if [ -z "$DPDK_VER" ]; then
-DPDK_VER="21.11.1"
+DPDK_VER="21.11.2"
  fi
  install_dpdk $DPDK_VER
  fi
diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index 1bc22a6ba..6ce0b4cd5 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -210,12 +210,12 @@ Q: What DPDK version does each Open vSwitch release work 
with?
  2.10.x   17.11.10
  2.11.x   18.11.9
  2.12.x   18.11.9
-2.13.x   19.11.10
-2.14.x   19.11.10
-2.15.x   20.11.4
-2.16.x   20.11.4
-2.17.x   21.11.1
-3.0.x21.11.1
+2.13.x   19.11.13
+2.14.x   19.11.13
+2.15.x   20.11.6
+2.16.x   20.11.6
+2.17.x   21.11.2
+3.0.x21.11.2
   
  
  Q: Are all the DPDK releases that OVS versions work with maintained?

diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
index 0f3712c79..a284e6851 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -42,7 +42,7 @@ Build requirements
  In addition to the requirements described in :doc:`general`, building Open
  vSwitch with DPDK will require the following:
  
-- DPDK 21.11.1

+- DPDK 21.11.2
  
  - A `DPDK supported NIC`_
  
@@ -73,9 +73,9 @@ Install DPDK

  #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
  
 $ cd /usr/src/

-   $ wget https://fast.dpdk.org/rel/dpdk-21.11.1.tar.xz
-   $ tar xf dpdk-21.11.1.tar.xz
-   $ export DPDK_DIR=/usr/src/dpdk-stable-21.11.1
+   $ wget https://fast.dpdk.org/rel/dpdk-21.11.2.tar.xz
+   $ tar xf dpdk-21.11.2.tar.xz
+   $ export DPDK_DIR=/usr/src/dpdk-stable-21.11.2
 $ cd $DPDK_DIR
  
  #. Configure and install DPDK using Meson

diff --git a/NEWS b/NEWS
index b26590e49..ad5ba021d 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,23 @@
  v3.0.1 - xx xxx 
  
+   - DPDK:
+ * OVS validated with DPDK 21.11.2.
+   DPDK 21.11.2 contains fixes for the following CVEs:
+   CVE-2022-28199 cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-28199
+   CVE-2022-2132 cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-2132
+   A bug was introduced in DPDK 21.11.1 by the commit
+   01e3dee29c02 ("vhost: fix unsafe vring addresses modifications").
+   This bug can cause a deadlock when vIOMMU is enabled and NUMA
+   reallocation of the virtqueues happen.
+   A fix has been posted and pushed to the DPDK 21.11 branch.
+   It can be found here:
+   
https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.march...@redhat.com/.
+   If a user wishes to avoid the issue then it is recommended to use
+   DPDK 21.11.0 until the release of DPDK 21.11.3.
+   It should be noted that DPDK 21.11.0 does not 

Re: [ovs-dev] [PATCH v5 1/2] ofproto-dpif-xlate: Extract the freezing processing into a function

2022-09-23 Thread Ales Musil
On Fri, Sep 23, 2022 at 12:29 PM Eelco Chaudron  wrote:

>
>
> On 7 Sep 2022, at 8:54, Ales Musil wrote:
>
> > Throughout the code the same pattern occurs: call finish_freezing
> > when ctx->freezing=true or xlate_action_set when ctx->freezing=false.
> > Extract it into a common function that is called from those places
> > instead.
> >
> > Signed-off-by: Ales Musil 
>
> Thanks for this change, it looks good to me.
>
> Acked-by: Eelco Chaudron 
>
>
Thank you for the review. Actually, I think I have made a mistake:
I did not realize that xlate_action_set() can itself start freezing
again. So the following diff should be applied to this patch set.
If there is another version, I will include the diff below:


diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index e181e3089..c84d6c9d0 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3884,10 +3884,11 @@ xlate_flow_is_protected(const struct xlate_ctx
*ctx, const struct flow *flow, co
 static void
 xlate_ctx_process_freezing(struct xlate_ctx *ctx)
 {
+if (!ctx->freezing) {
+xlate_action_set(ctx);
+}
 if (ctx->freezing) {
 finish_freezing(ctx);
-} else {
-xlate_action_set(ctx);
 }
 }
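
With the diff applied, the helper would read roughly as follows (just a
sketch for clarity, not the committed code):

static void
xlate_ctx_process_freezing(struct xlate_ctx *ctx)
{
    /* Translate the action set first; it may itself start freezing... */
    if (!ctx->freezing) {
        xlate_action_set(ctx);
    }
    /* ...and only then finish freezing, if needed. */
    if (ctx->freezing) {
        finish_freezing(ctx);
    }
}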

Thanks,
Ales

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA 

amu...@redhat.com    IM: amusil

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 1/2] ofproto-dpif-xlate: Extract the freezing processing into a function

2022-09-23 Thread Eelco Chaudron



On 7 Sep 2022, at 8:54, Ales Musil wrote:

> Throughout the code the same pattern occurs: call finish_freezing
> when ctx->freezing=true or xlate_action_set when ctx->freezing=false.
> Extract it into a common function that is called from those places
> instead.
>
> Signed-off-by: Ales Musil 

Thanks for this change, it looks good to me.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/3] controller: Avoid building dhcp/nd_ra/controller_event opt maps every time.

2022-09-23 Thread Dumitru Ceara
On 9/22/22 08:24, Han Zhou wrote:
> On Wed, Sep 14, 2022 at 6:09 AM Dumitru Ceara  wrote:
>>
> 
> Thanks Dumitru for the improvement.
> 
>> The nd_ra_opts and controller_event_opts are actually static; they never
>> change at runtime.  DHCP records can instead be computed when populating
>> the lflow "input context" to be used during incremental processing.  This
>> is likely more efficient than building the DHCP opts maps for every
>> logical flow recomputed incrementally.
> 
> Maybe a slight correction here. Before this patch it wasn't rebuilding the
> maps for *every logical flow*. For lflow changes, it was performed once for
> all the changed lflows. However, it was indeed inefficient for "reference"
> changes such as port-bindings, where it was performed for every
> port-binding.
> 

Sure, it's more clear like this.

>>
>> An even more efficient solution would be to introduce proper incremental
>> processing for the DHCP opt maps but that seems like too much complexity
>> without enough benefit.  It's probably OK to recompute the maps at every
>> ovn-controller iteration.
> 
> Well, every iteration for every handler, but still, I think it is totally
> OK in terms of performance (and already better than before).
> However, there may be a benefit of splitting the DHCP opt to a separate I-P
> node, as input to the lflow_output node. It would avoid introducing the
> destroy_lflow_ctx(). I don't see complexity here because the dependency
> looks quite clear. But I am ok either way for this patch.
> For the destroy_lflow_ctx(), if we want to keep it, I'd remove the
> l_ctx_out parameter, because the output is essentially the data of the
> lflow_output engine node, and naturally shouldn't contain anything to be
> destroyed here.

Ok, I'll try to just split the dhcp opt part.  Should be more clear, I
agree.

> 
> Please see one more finding below.
> 
>>
>> Signed-off-by: Dumitru Ceara 
>> ---
>>  controller/lflow.c  |  206
> +++
>>  controller/lflow.h  |9 +-
>>  controller/ovn-controller.c |  128 +++
>>  lib/ovn-l7.h|2
>>  4 files changed, 110 insertions(+), 235 deletions(-)
>>
>> diff --git a/controller/lflow.c b/controller/lflow.c
>> index eef44389f..de9f17b9a 100644
>> --- a/controller/lflow.c
>> +++ b/controller/lflow.c
>> @@ -90,9 +90,6 @@ add_matches_to_flow_table(const struct
> sbrec_logical_flow *,
>>struct lflow_ctx_out *);
>>  static void
>>  consider_logical_flow(const struct sbrec_logical_flow *lflow,
>> -  struct hmap *dhcp_opts, struct hmap *dhcpv6_opts,
>> -  struct hmap *nd_ra_opts,
>> -  struct controller_event_options
> *controller_event_opts,
>>bool is_recompute,
>>struct lflow_ctx_in *l_ctx_in,
>>struct lflow_ctx_out *l_ctx_out);
>> @@ -371,40 +368,9 @@ add_logical_flows(struct lflow_ctx_in *l_ctx_in,
>>struct lflow_ctx_out *l_ctx_out)
>>  {
>>  const struct sbrec_logical_flow *lflow;
>> -
>> -struct hmap dhcp_opts = HMAP_INITIALIZER(_opts);
>> -struct hmap dhcpv6_opts = HMAP_INITIALIZER(_opts);
>> -const struct sbrec_dhcp_options *dhcp_opt_row;
>> -SBREC_DHCP_OPTIONS_TABLE_FOR_EACH (dhcp_opt_row,
>> -   l_ctx_in->dhcp_options_table) {
>> -dhcp_opt_add(_opts, dhcp_opt_row->name, dhcp_opt_row->code,
>> - dhcp_opt_row->type);
>> -}
>> -
>> -
>> -const struct sbrec_dhcpv6_options *dhcpv6_opt_row;
>> -SBREC_DHCPV6_OPTIONS_TABLE_FOR_EACH (dhcpv6_opt_row,
>> - l_ctx_in->dhcpv6_options_table)
> {
>> -   dhcp_opt_add(_opts, dhcpv6_opt_row->name,
> dhcpv6_opt_row->code,
>> -dhcpv6_opt_row->type);
>> -}
>> -
>> -struct hmap nd_ra_opts = HMAP_INITIALIZER(_ra_opts);
>> -nd_ra_opts_init(_ra_opts);
>> -
>> -struct controller_event_options controller_event_opts;
>> -controller_event_opts_init(_event_opts);
>> -
>>  SBREC_LOGICAL_FLOW_TABLE_FOR_EACH (lflow,
> l_ctx_in->logical_flow_table) {
>> -consider_logical_flow(lflow, _opts, _opts,
>> -  _ra_opts, _event_opts, true,
>> -  l_ctx_in, l_ctx_out);
>> +consider_logical_flow(lflow, true, l_ctx_in, l_ctx_out);
>>  }
>> -
>> -dhcp_opts_destroy(_opts);
>> -dhcp_opts_destroy(_opts);
>> -nd_ra_opts_destroy(_ra_opts);
>> -controller_event_opts_destroy(_event_opts);
>>  }
>>
>>  bool
>> @@ -414,29 +380,6 @@ lflow_handle_changed_flows(struct lflow_ctx_in
> *l_ctx_in,
>>  bool ret = true;
>>  const struct sbrec_logical_flow *lflow;
>>
>> -struct hmap dhcp_opts = HMAP_INITIALIZER(_opts);
>> -struct hmap dhcpv6_opts = HMAP_INITIALIZER(_opts);
>> -const struct sbrec_dhcp_options *dhcp_opt_row;
>> - 

Re: [ovs-dev] [PATCH 01/12] slab: Introduce kmalloc_size_roundup()

2022-09-23 Thread Feng Tang
Thanks Hyeonggon for looping me in.

On Thu, Sep 22, 2022 at 07:12:21PM +0800, Hyeonggon Yoo wrote:
> On Wed, Sep 21, 2022 at 08:10:02PM -0700, Kees Cook wrote:
> > In the effort to help the compiler reason about buffer sizes, the
> > __alloc_size attribute was added to allocators. This improves the scope
> > of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
> > future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
> > as the vast majority of callers are not expecting to use more memory
> > than what they asked for.
> > 
> > There is, however, one common exception to this: anticipatory resizing
> > of kmalloc allocations. These cases all use ksize() to determine the
> > actual bucket size of a given allocation (e.g. 128 when 126 was asked
> > for). This comes in two styles in the kernel:
> > 
> > 1) An allocation has been determined to be too small, and needs to be
> >resized. Instead of the caller choosing its own next best size, it
> >wants to minimize the number of calls to krealloc(), so it just uses
> >ksize() plus some additional bytes, forcing the realloc into the next
> >bucket size, from which it can learn how large it is now. For example:
> > 
> > data = krealloc(data, ksize(data) + 1, gfp);
> > data_len = ksize(data);
> > 
> > 2) The minimum size of an allocation is calculated, but since it may
> >grow in the future, just use all the space available in the chosen
> >bucket immediately, to avoid needing to reallocate later. A good
> >example of this is skbuff's allocators:
> > 
> > data = kmalloc_reserve(size, gfp_mask, node, );
> > ...
> > /* kmalloc(size) might give us more room than requested.
> >  * Put skb_shared_info exactly at the end of allocated zone,
> >  * to allow max possible filling before reallocation.
> >  */
> > osize = ksize(data);
> > size = SKB_WITH_OVERHEAD(osize);
> > 
> > In both cases, the "how large is the allocation?" question is answered
> > _after_ the allocation, where the compiler hinting is not in an easy place
> > to make the association any more. This mismatch between the compiler's
> > view of the buffer length and the code's intention about how much it is
> > going to actually use has already caused problems[1]. It is possible to
> > fix this by reordering the use of the "actual size" information.
> > 
> > We can serve the needs of users of ksize() and still have accurate buffer
> > length hinting for the compiler by doing the bucket size calculation
> > _before_ the allocation. Code can instead ask "how large an allocation
> > would I get for a given size?".
> > 
> > Introduce kmalloc_size_roundup(), to serve this function so we can start
> > replacing the "anticipatory resizing" uses of ksize().
> >
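
(As an illustration -- not part of the patch -- the intended replacement
pattern, assuming the proposed helper keeps the signature
size_t kmalloc_size_roundup(size_t size), would look roughly like:

	/* Round the request up to the bucket size before allocating, so the
	 * compiler's __alloc_size hint covers everything we intend to use. */
	size = kmalloc_size_roundup(size);
	data = kmalloc(size, gfp);
	if (!data)
		return -ENOMEM;
	/* All 'size' bytes are now usable without calling ksize() afterwards. */

The variable names above are illustrative, not taken from any particular
caller.)
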
> 
> Cc-ing Feng Tang who may welcome this series ;)
 
Indeed! This will help our work of extending the slub redzone check,
as we also ran into some trouble with ksize() users when extending
the redzone support to cover the extra space allocated beyond the
requested size [1], and had to disable the redzone sanity check for
all ksize() users [2].

[1]. 
https://lore.kernel.org/lkml/20220719134503.ga56...@shbuild999.sh.intel.com/
[2]. https://lore.kernel.org/lkml/20220913065423.520159-5-feng.t...@intel.com/

Thanks,
Feng

> > [1] https://github.com/ClangBuiltLinux/linux/issues/1599
> > https://github.com/KSPP/linux/issues/183
> > 
> > Cc: Vlastimil Babka 
> > Cc: Pekka Enberg 
> > Cc: David Rientjes 
> > Cc: Joonsoo Kim 
> > Cc: Andrew Morton 
> > Cc: linux...@kvack.org
> > Signed-off-by: Kees Cook 
> > ---
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 00/12] slab: Introduce kmalloc_size_roundup()

2022-09-23 Thread Vlastimil Babka
On 9/22/22 23:49, Kees Cook wrote:
> On Thu, Sep 22, 2022 at 11:05:47PM +0200, Vlastimil Babka wrote:
>> On 9/22/22 17:55, Kees Cook wrote:
>> > On Thu, Sep 22, 2022 at 09:10:56AM +0200, Christian König wrote:
>> > [...]
>> > > So when this patch set is about to clean up this use case it should 
>> > > probably
>> > > also take care to remove ksize() or at least limit it so that it won't be
>> > > used for this use case in the future.
>> > 
>> > Yeah, my goal would be to eliminate ksize(), and it seems possible if
>> > other cases are satisfied with tracking their allocation sizes directly.
>> 
>> I think we could leave ksize() to determine the size without a need for
>> external tracking, but from now on forbid callers from using that hint to
>> overflow the allocation size they actually requested? Once we remove the
>> kasan/kfence hooks in ksize() that make the current kinds of usage possible,
>> we should be able to catch any offenders of the new semantics that would 
>> appear?
> 
> That's correct. I spent the morning working my way through the rest of
> the ksize() users I didn't clean up yesterday, and in several places I
> just swapped in __ksize(). But that wouldn't even be needed if we just
> removed the kasan unpoisoning from ksize(), etc.
> 
> I am tempted to leave it __ksize(), though, just to reinforce that it's
> not supposed to be used "normally". What do you think?

Sounds good. Note in linux-next there's now a series in slab.git planned for
6.1 that moves __ksize() declaration to mm/slab.h to make it more private.
But we don't want random users outside mm and related kasan/kfence
subsystems to include mm/slab.h, so we'll have to expose it again instead of
ksize().
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netlink: add revalidator for offload of meters

2022-09-23 Thread 0-day Robot
References:  <20220923083514.296638-1-simon.hor...@corigine.com>
 

Bleep bloop.  Greetings Simon Horman, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Simon Horman 
Lines checked: 715, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 3/3] inc-proc-eng: Rename the 'clear_tracked_data' callback to 'init_run'.

2022-09-23 Thread Dumitru Ceara
On 9/23/22 01:07, Han Zhou wrote:
> On Wed, Sep 14, 2022 at 6:10 AM Dumitru Ceara  wrote:
>>
>> This is actually more in line with how the callback is used.  It's called
>> every time the incremental engine prepares for the next engine run.
>>
>> Signed-off-by: Dumitru Ceara 
> 
> Thanks Dumitru. The name looks good to me, but why does the new function
> require both the node and node->data as parameters?
> 

Thanks for the review!  Considering that this is an initialization
function that runs before every engine run for every node, users might
find it interesting to do other things too.  For example, getting some
OVSDB indexes from input nodes.

This is an example from the not yet submitted components template code:

static void
en_template_vars_init_run(struct engine_node *node, void *data)
{
struct ed_type_template_vars *tv_data = data;

tv_data->sbrec_template_var_table =
EN_OVSDB_GET(engine_get_input("SB_template_var", node));
tv_data->ovsrec_ovs_table =
EN_OVSDB_GET(engine_get_input("OVS_open_vswitch", node));
tv_data->sbrec_port_binding_by_name =
engine_ovsdb_node_get_index(engine_get_input("SB_port_binding", node),
"name");
tv_data->sbrec_chassis_by_name =
engine_ovsdb_node_get_index(engine_get_input("SB_chassis", node),
"name");

sset_clear(&tv_data->new);
sset_clear(&tv_data->deleted);
sset_clear(&tv_data->updated);
tv_data->change_tracked = false;
}

>> ---
>>  controller/ovn-controller.c |   41
> -
>>  lib/inc-proc-eng.c  |   19 +++
>>  lib/inc-proc-eng.h  |   19 ++-
>>  3 files changed, 41 insertions(+), 38 deletions(-)
>>
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 18a01bbab..f26d6a9e0 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -1058,7 +1058,7 @@ en_ofctrl_is_connected_run(struct engine_node
> *node, void *data)
>>   *processing to OVS_interface changes but simply mark the node
> status as
>>   *UPDATED (and so the run() and the change handler is the same).
>>   * 2. The iface_table_external_ids_old is computed/updated in the member
>> - *clear_tracked_data(), because that is when the last round of
> processing
>> + *init_run(), because that is when the last round of processing
>>   *has completed but the new IDL data is yet to refresh, so we
> replace the
>>   *old data with the current data. */
>>  struct ed_type_ovs_interface_shadow {
>> @@ -1096,7 +1096,8 @@ en_ovs_interface_shadow_cleanup(void *data_)
>>  }
>>
>>  static void
>> -en_ovs_interface_shadow_clear_tracked_data(void *data_)
>> +en_ovs_interface_shadow_init_run(struct engine_node *node OVS_UNUSED,
>> + void *data_)
>>  {
>>  struct ed_type_ovs_interface_shadow *data = data_;
>>
>  iface_table_external_ids_old_destroy(>iface_table_external_ids_old);
>> @@ -1163,7 +1164,7 @@ en_activated_ports_cleanup(void *data_)
>>  }
>>
>>  static void
>> -en_activated_ports_clear_tracked_data(void *data)
>> +en_activated_ports_init_run(struct engine_node *node OVS_UNUSED, void
> *data)
>>  {
>>  en_activated_ports_cleanup(data);
>>  }
>> @@ -1311,7 +1312,7 @@ struct ed_type_runtime_data {
>>   */
>>
>>  static void
>> -en_runtime_data_clear_tracked_data(void *data_)
>> +en_runtime_data_init_run(struct engine_node *node OVS_UNUSED, void
> *data_)
>>  {
>>  struct ed_type_runtime_data *data = data_;
>>
>> @@ -1669,14 +1670,14 @@ en_addr_sets_init(struct engine_node *node
> OVS_UNUSED,
>>  }
>>
>>  static void
>> -en_addr_sets_clear_tracked_data(void *data)
>> +en_addr_sets_init_run(struct engine_node *node OVS_UNUSED, void *data)
>>  {
>>  struct ed_type_addr_sets *as = data;
>>  sset_clear(>new);
>>  sset_clear(>deleted);
>> -struct shash_node *node;
>> -SHASH_FOR_EACH_SAFE (node, >updated) {
>> -struct addr_set_diff *asd = node->data;
>> +struct shash_node *as_node;
>> +SHASH_FOR_EACH_SAFE (as_node, >updated) {
>> +struct addr_set_diff *asd = as_node->data;
>>  expr_constant_set_destroy(asd->added);
>>  free(asd->added);
>>  expr_constant_set_destroy(asd->deleted);
>> @@ -1689,8 +1690,6 @@ en_addr_sets_clear_tracked_data(void *data)
>>  static void
>>  en_addr_sets_cleanup(void *data)
>>  {
>> -en_addr_sets_clear_tracked_data(data);
>> -
>>  struct ed_type_addr_sets *as = data;
>>  expr_const_sets_destroy(>addr_sets);
>>  shash_destroy(>addr_sets);
>> @@ -1933,7 +1932,7 @@ port_groups_update(const struct
> sbrec_port_group_table *port_group_table,
>>  }
>>
>>  static void
>> -en_port_groups_clear_tracked_data(void *data_)
>> +en_port_groups_init_run(struct engine_node *node OVS_UNUSED, void *data_)
>>  {
>>  struct ed_type_port_groups *pg = data_;
>>  sset_clear(>new);
>> @@ -2078,7 +2077,7 @@ 

[ovs-dev] [PATCH] dpif-netlink: add revalidator for offload of meters

2022-09-23 Thread Simon Horman
From: Yifan Li 

Allow the revalidator to handle hardware offload of meters via OVS-TC.
Without the revalidator, a tc meter action cannot be deleted while a
flow still exists. The revalidator fixes this by continuously checking
existing meters and deleting the unneeded ones. Autotest cases for the
revalidator are also added.

Signed-off-by: Yifan Li 
Signed-off-by: Simon Horman 
---
 lib/dpif-netdev.c|   1 +
 lib/dpif-netlink.c   | 257 +++
 lib/dpif-netlink.h   |   2 +
 lib/dpif-provider.h  |   5 +
 lib/dpif.c   |  15 +-
 lib/id-pool.c|  13 ++
 lib/id-pool.h|   4 +-
 lib/netdev-linux.c   |   6 +
 lib/netdev-offload-tc.c  |  11 +-
 lib/tc.c |   5 -
 lib/tc.h |  10 +-
 ofproto/ofproto-dpif-upcall.c|   5 +
 ofproto/ofproto-dpif.h   |   2 +
 tests/system-offloads-traffic.at |   4 +
 14 files changed, 301 insertions(+), 39 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a45b460145c6..365aacadb03a 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -9583,6 +9583,7 @@ const struct dpif_class dpif_netdev_class = {
 dpif_netdev_meter_set,
 dpif_netdev_meter_get,
 dpif_netdev_meter_del,
+NULL,   /* meter_revalidate */
 dpif_netdev_bond_add,
 dpif_netdev_bond_del,
 dpif_netdev_bond_stats_get,
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index a620a6ec52dd..4eee4761dc6b 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -48,6 +49,7 @@
 #include "netlink.h"
 #include "netnsid.h"
 #include "odp-util.h"
+#include "ofproto/ofproto-dpif.h"
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/flow.h"
 #include "openvswitch/hmap.h"
@@ -61,6 +63,7 @@
 #include "packets.h"
 #include "random.h"
 #include "sset.h"
+#include "tc.h"
 #include "timeval.h"
 #include "unaligned.h"
 #include "util.h"
@@ -2571,18 +2574,18 @@ static uint32_t
 dpif_netlink_calculate_n_handlers(void)
 {
 uint32_t total_cores = count_total_cores();
-uint32_t n_handlers = count_cpu_cores();
+uint32_t _n_handlers = count_cpu_cores();
 uint32_t next_prime_num;
 
 /* If not all cores are available to OVS, create additional handler
  * threads to ensure more fair distribution of load between them.
  */
-if (n_handlers < total_cores && total_cores > 2) {
-next_prime_num = next_prime(n_handlers + 1);
-n_handlers = MIN(next_prime_num, total_cores);
+if (_n_handlers < total_cores && total_cores > 2) {
+next_prime_num = next_prime(_n_handlers + 1);
+_n_handlers = MIN(next_prime_num, total_cores);
 }
 
-return n_handlers;
+return _n_handlers;
 }
 
 static int
@@ -2591,17 +2594,17 @@ dpif_netlink_refresh_handlers_cpu_dispatch(struct 
dpif_netlink *dpif)
 {
 int handler_id;
 int error = 0;
-uint32_t n_handlers;
 uint32_t *upcall_pids;
+uint32_t _n_handlers;
 
-n_handlers = dpif_netlink_calculate_n_handlers();
-if (dpif->n_handlers != n_handlers) {
+_n_handlers = dpif_netlink_calculate_n_handlers();
+if (dpif->n_handlers != _n_handlers) {
 VLOG_DBG("Dispatch mode(per-cpu): initializing %d handlers",
-   n_handlers);
+   _n_handlers);
 destroy_all_handlers(dpif);
-upcall_pids = xzalloc(n_handlers * sizeof *upcall_pids);
-dpif->handlers = xzalloc(n_handlers * sizeof *dpif->handlers);
-for (handler_id = 0; handler_id < n_handlers; handler_id++) {
+upcall_pids = xzalloc(_n_handlers * sizeof *upcall_pids);
+dpif->handlers = xzalloc(_n_handlers * sizeof *dpif->handlers);
+for (handler_id = 0; handler_id < _n_handlers; handler_id++) {
 struct dpif_handler *handler = >handlers[handler_id];
 error = create_nl_sock(dpif, >sock);
 if (error) {
@@ -2615,9 +2618,9 @@ dpif_netlink_refresh_handlers_cpu_dispatch(struct 
dpif_netlink *dpif)
   handler_id, upcall_pids[handler_id]);
 }
 
-dpif->n_handlers = n_handlers;
+dpif->n_handlers = _n_handlers;
 error = dpif_netlink_set_handler_pids(>dpif, upcall_pids,
-  n_handlers);
+  _n_handlers);
 free(upcall_pids);
 }
 return error;
@@ -2629,7 +2632,7 @@ dpif_netlink_refresh_handlers_cpu_dispatch(struct 
dpif_netlink *dpif)
  * backing kernel vports. */
 static int
 dpif_netlink_refresh_handlers_vport_dispatch(struct dpif_netlink *dpif,
- uint32_t n_handlers)
+ uint32_t _n_handlers)
 OVS_REQ_WRLOCK(dpif->upcall_lock)
 {
 unsigned long int *keep_channels;
@@ 

Re: [ovs-dev] [PATCH v6] bond: Improve bond and lacp visibility

2022-09-23 Thread David Marchand
On Tue, Aug 30, 2022 at 6:05 PM Mike Pattrick  wrote:
>
> Add additional logging at debug and info levels with a focus on code
> related to bond members coming up, going down, and changing.
>
> Several existing log messages were modified to handle sub-1kB amounts
> more gracefully. Instead of reporting 0kB of traffic load shifting from
> one member to another, we now suppress these messages at the INFO level
> and display the exact byte count at the debug level.
>
> Signed-off-by: Mike Pattrick 
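
(As a rough sketch of the behaviour described above -- names are
illustrative, not the patch's actual code -- the rebalance logging
effectively becomes:

    if (delta < 1024) {
        /* Sub-1kB shifts are only interesting when debugging. */
        VLOG_DBG("shifting %"PRIu64" bytes from %s to %s",
                 delta, from->name, to->name);
    } else {
        VLOG_INFO("shifting %"PRIu64"kB from %s to %s",
                  delta / 1024, from->name, to->name);
    }
)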

[snip]

> @@ -1131,9 +1135,15 @@ log_bals(struct bond *bond, const struct ovs_list 
> *bals)
>  LIST_FOR_EACH (member, bal_node, bals) {
>  if (ds.length) {
>  ds_put_char(, ',');
> +ds_put_cstr(, ", ");

Hum, double ',' here.
This can probably be fixed while applying.


With this fixed, you can add:
Reviewed-by: David Marchand 


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev