The pipapo set is a specific nftables set which holds ip-ranges. When updating the set, the new rules are not visible (to the live ruleset) until the next generation, so we don't need to check the genbit on lookup.
Signed-off-by: Gabriel Goller <[email protected]> --- ...t_pipapo-don-t-check-genbit-from-pac.patch | 160 ++++++++++++++++++ 1 file changed, 160 insertions(+) create mode 100644 patches/kernel/0014-netfilter-nft_set_pipapo-don-t-check-genbit-from-pac.patch diff --git a/patches/kernel/0014-netfilter-nft_set_pipapo-don-t-check-genbit-from-pac.patch b/patches/kernel/0014-netfilter-nft_set_pipapo-don-t-check-genbit-from-pac.patch new file mode 100644 index 000000000000..be34c0efa7e9 --- /dev/null +++ b/patches/kernel/0014-netfilter-nft_set_pipapo-don-t-check-genbit-from-pac.patch @@ -0,0 +1,160 @@ +From da2f3da272763d4382e4c82d2a6d1358aed10123 Mon Sep 17 00:00:00 2001 +From: Gabriel Goller <[email protected]> +Date: Wed, 10 Sep 2025 12:08:09 +0200 +Subject: [PATCH 1/5] netfilter: nft_set_pipapo: don't check genbit from + packetpath lookups + +The pipapo set type is special in that it has two copies of its +datastructure: one live copy containing only valid elements and one +on-demand clone used during transaction where adds/deletes happen. + +This clone is not visible to the datapath. + +This is unlike all other set types in nftables, those all link new +elements into their live hlist/tree. + +For those sets, the lookup functions must skip the new elements while the +transaction is ongoing to ensure consistency. + +As the clone is shallow, removal does have an effect on the packet path: +once the transaction enters the commit phase the 'gencursor' bit that +determines which elements are active and which elements should be ignored +(because they are no longer valid) is flipped. + +This causes the datapath lookup to ignore these elements if they are found +during lookup. + +This opens up a small race window where pipapo has an inconsistent view of +the dataset from when the transaction-cpu flipped the genbit until the +transaction-cpu calls nft_pipapo_commit() to swap live/clone pointers: + +cpu0 cpu1 + has added new elements to clone + has marked elements as being + inactive in new generation + perform lookup in the set + enters commit phase: + +I) increments the genbit + A) observes new genbit + removes elements from the clone so + they won't be found anymore + B) lookup in datastructure + can't see new elements yet, + but old elements are ignored + -> Only matches elements that + were not changed in the + transaction +II) calls nft_pipapo_commit(), clone + and live pointers are swapped. + C New nft_lookup happening now + will find matching elements. + +Consider a packet matching range r1-r2: + +cpu0 processes following transaction: +1. remove r1-r2 +2. add r1-r3 + +P is contained in both ranges. Therefore, cpu1 should always find a match +for P. Due to above race, this is not the case: + +cpu1 does find r1-r2, but then ignores it due to the genbit indicating +the range has been removed. + +At the same time, r1-r3 is not visible yet, because it can only be found +in the clone. + +The situation persists for all lookups until after cpu0 hits II). + +The fix is easy: Don't check the genbit from pipapo lookup functions. +This is possible because unlike the other set types, the new elements are +not reachable from the live copy of the dataset. + +The clone/live pointer swap is enough to avoid matching on old elements +while at the same time all new elements are exposed in one go. + +After this change, step B above returns a match in r1-r2. +This is fine: r1-r2 only becomes truly invalid the moment they get freed. +This happens after a synchronize_rcu() call and rcu read lock is held +via netfilter hook traversal (nf_hook_slow()). + +Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") +Signed-off-by: Florian Westphal <[email protected]> +Signed-off-by: Gabriel Goller <[email protected]> +--- + net/netfilter/nft_set_pipapo.c | 21 +++++++++++++++++++-- + net/netfilter/nft_set_pipapo_avx2.c | 4 +--- + 2 files changed, 20 insertions(+), 5 deletions(-) + +diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c +index 0529e4ef7520..a386c3761cac 100644 +--- a/net/netfilter/nft_set_pipapo.c ++++ b/net/netfilter/nft_set_pipapo.c +@@ -405,6 +405,24 @@ int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules, + * + * For more details, see DOC: Theory of Operation. + * ++ * Unlike other set types, this uses NFT_GENMASK_ANY instead of ++ * nft_genmask_cur(). ++ * ++ * This is because new (future) elements are not reachable from ++ * priv->match, they get added to priv->clone instead. ++ * When the commit phase flips the generation bitmask, the ++ * 'now old' entries are skipped but without the 'now current' ++ * elements becoming visible. Using nft_genmask_cur() thus creates ++ * inconsistent state: matching old entries get skipped but thew ++ * newly matching entries are unreachable. ++ * ++ * GENMASK will still find the 'now old' entries which ensures consistent ++ * priv->match view. ++ * ++ * nft_pipapo_commit swaps ->clone and ->match shortly after the ++ * genbit flip. As ->clone doesn't contain the old entries in the first ++ * place, lookup will only find the now-current ones. ++ * + * Return: true on match, false otherwise. + */ + bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, +@@ -413,7 +431,6 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, + struct nft_pipapo *priv = nft_set_priv(set); + struct nft_pipapo_scratch *scratch; + unsigned long *res_map, *fill_map; +- u8 genmask = nft_genmask_cur(net); + const struct nft_pipapo_match *m; + const struct nft_pipapo_field *f; + const u8 *rp = (const u8 *)key; +@@ -471,7 +488,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, + if (last) { + *ext = &f->mt[b].e->ext; + if (unlikely(nft_set_elem_expired(*ext) || +- !nft_set_elem_active(*ext, genmask))) ++ !nft_set_elem_active(*ext, NFT_GENMASK_ANY))) + goto next_match; + + /* Last field: we're just returning the key without +diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c +index be7c16c79f71..675c0a9634b6 100644 +--- a/net/netfilter/nft_set_pipapo_avx2.c ++++ b/net/netfilter/nft_set_pipapo_avx2.c +@@ -1151,7 +1151,6 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, + { + struct nft_pipapo *priv = nft_set_priv(set); + struct nft_pipapo_scratch *scratch; +- u8 genmask = nft_genmask_cur(net); + const struct nft_pipapo_match *m; + const struct nft_pipapo_field *f; + const u8 *rp = (const u8 *)key; +@@ -1245,8 +1244,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, + + if (last) { + *ext = &f->mt[ret].e->ext; +- if (unlikely(nft_set_elem_expired(*ext) || +- !nft_set_elem_active(*ext, genmask))) { ++ if (unlikely(nft_set_elem_expired(*ext))) { + ret = 0; + goto next_match; + } +-- +2.47.3 + -- 2.47.3 _______________________________________________ pve-devel mailing list [email protected] https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
