On Mon, Oct 27, 2025 at 2:20 PM Dumitru Ceara <[email protected]> wrote:
> On 10/23/25 3:07 PM, Ales Musil via dev wrote: > > On Thu, Oct 23, 2025 at 12:45 PM Ilya Maximets <[email protected]> > wrote: > > > >> While converting inequality matches into sets of equality matches > >> we use the most simple and efficient method - create single bit masks > >> matching on the opposite value of each bit. For example, 'a != 0b0101' > >> where 'a' is a 4-bit field turns into (note, a[0] is the least > >> significant bit, a[3] is the most significant): > >> > >> a[0] != 1 || a[1] != 0 || a[2] != 1 || a[3] != 0 > >> or > >> a[0] == 0 || a[1] == 1 || a[2] == 0 || a[3] == 1 > >> > >> This corresponds to the following matches: > >> > >> LSB MSB > >> 0123 > >> original : 1010 > >> a[0] == 0: 0... > >> a[1] == 1: .1.. > >> a[2] == 0: ..0. > >> a[3] == 1: ...1 > >> > >> We can technically preserve part of the mask that is already covered > >> by previous matches like this: > >> > >> a[0..3] == 0b0100 || a[1..3] == 0b011 || a[2..3] == 0b00 || a[3] == 1 > >> > >> Which just corresponds to filling the upper half of the matrix with > >> the original values: > >> > >> LSB MSB LSB MSB > >> 0123 0123 > >> original : 1010 1010 > >> a[0] == 0: 0... a[0..3] == 0b0100: 0010 > >> a[1] == 1: .1.. a[1..3] == 0b011 : .110 > >> a[2] == 0: ..0. a[2..3] == 0b00 : ..00 > >> a[3] == 1: ...1 a[3] == 0b1 : ...1 > >> > >> The diagonal values are the same in both sets of matches. > >> > >> Both sets of matches are logically equivalent and fully represent the > >> original inequality match. > >> > >> This is true, because if the value 'a' equals 0b0101, then it will not > >> match any of the rules in the set, as each of them has at least one > >> bit different from 0b0101 (diagonal ones). And if the value 'a' is > >> different from 0b0101, then we can find the first bit that is different > >> starting from the most significant, all the higher ones will be the > >> same as in 0b0101. And we have a rule that matches on exactly that by > >> design. > >> > >> Now, since every match in the second set has a contiguous mask ending > >> at the most significant bit, we can re-write it using a CIDR-like > >> notation: > >> > >> a == 0b0100/4 > >> a == 0b0110/3 > >> a == 0b0000/2 > >> a == 0b1000/1 > >> > >> Or: a == 0b0100/4 || a == 0b0110/3 || a == 0b0000/2 || a == 0b1000/1 > >> > >> Why does it matter? > >> > >> Having larger masks allows much easier elimination of incompatible > >> sub-expressions during the normalization process. > >> > >> Let's say we have the following expression: > >> > >> a != 0b1000/2 && a != 0b0101/4 > >> > >> Let's normalize it by swapping the inequality with equality sets using > >> the single-bit method: > >> > >> (a[2] == 1 || a[3] == 0) && > >> (a[0] == 0 || a[1] == 1 || a[2] == 0 || a[3] == 1) > >> > >> Opening the parenthesis: > >> > >> a[2] == 1 && a[0] == 0 || > >> a[2] == 1 && a[1] == 1 || > >> a[2] == 1 && a[2] == 0 || < Inconsistent > >> a[2] == 1 && a[3] == 1 || > >> a[3] == 0 && a[0] == 0 || > >> a[3] == 0 && a[1] == 1 || > >> a[3] == 0 && a[2] == 0 || > >> a[3] == 0 && a[3] == 1 < Inconsistent > >> > >> We can see that two sub-expressions can be eliminated as they are > >> always false. Only the sub-expressions that match on the same bit > >> can be eliminated at this stage, the rest must be processed further. > >> > >> With the prefix method we get: > >> > >> (a[2..3] == 0b11 || a[3] == 0) && > >> (a[0..3] == 0b0100 || a[1..3] == 0b011 || a[2..3] == 0b00 || a[3] == 1) > >> > >> In a bit easier to read form: > >> > >> (a == ..11 || a == ...0) && > >> (a == 0010 || a == .110 || a == ..00 || a == ...1) > >> > >> Opening the parenthesis: > >> > >> LSB MSB LSB MSB > >> 0123 0123 > >> a == ..11 && a == 0010 || < Inconsistent > >> a == ..11 && a == .110 || < Inconsistent > >> a == ..11 && a == ..00 || < Inconsistent > >> a == ..11 && a == ...1 || > >> a == ...0 && a == 0010 || > >> a == ...0 && a == .110 || > >> a == ...0 && a == ..00 || > >> a == ...0 && a == ...1 < Inconsistent > >> > >> Now four sub-expressions are always false instead of just two. And we > >> can see that if we had more bits matched in the second inequality > >> match instead of 4, then all the rest of 'a == ..11 && ...' > >> sub-expressions would also be always false. By not dealing with all > >> possible bit masks and reducing the problem to a much smaller space > >> of prefix matches, we're significantly increasing chances of resulted > >> sub-expressions to end up internally inconsistent and discarded. > >> > >> This greatly reduces the amount of sub-expressions that we need to > >> process on the next steps. Removal of duplicates and crushing of > >> supersets will get rid of most of the unnecessary matches in the case > >> of single-bit method, but not all of them, since we do not have a lot > >> of context to track dependencies between different sub-expressions. > >> And it's also much more expensive to do so, since superset crushing > >> has quadratic complexity in the worst case, which grows pretty fast > >> with the amount of input sub-expressions. More we can eliminate > >> before this stage, the better. > >> > >> In practice, using the prefix method provides a huge performance > >> boost while processing inequality matches. For example, normalizing > >> the inequality match on the following set of 15 random IP addresses > >> takes about 28 minutes with a single bit approach, generating 60519 > >> OpenFlow rules: > >> > >> ip4.dst != { > >> 179.141.79.238/32, 23.87.193.64/32, 240.238.112.253/32, > >> 42.140.78.76/32, 243.50.68.23/32, 229.151.116.21/32, > >> 201.156.140.194/32, 94.49.119.92/32, 133.167.173.49/32, > >> 51.199.153.252/32, 214.80.176.168/32, 134.195.2.51/32, > >> 68.18.121.185/32, 118.53.215.77/32, 203.42.132.181/32, > >> } > >> > >> With the prefixes, the same task takes just about 4 milliseconds, > >> generating only 412 OpenFlow rules. Generated rules match the full > >> negation of the set performed with python netaddr library: > >> > >> from netaddr import IPSet > >> list((IPSet(['0.0.0.0/0']) - IPSet(exclusions)).iter_cidrs()) > >> > >> Additional bonus is that prefix matches are much friendlier to the > >> OpenFlow classifier in OVS that can now utilize the prefix match > >> optimizations for some of these rules, potentially reducing the amount > >> of datapath flows. > >> > >> The superset crushing logic is still needed to get rid of the extra > >> rules when processing sets with particularly diverse masks and > >> custom bit matches, so keeping it as it is. New tests added to > >> substitute the reduced test coverage for this functionality. > >> > >> The prefixes approach is still not a silver bullet and it will take > >> noticeable amount of time on very large sets of random IPs with > >> inequality matches, but we're moving the scalability pole way further > >> with a relatively minor change. For example, it will take 2.7 seconds > >> to normalize inequality match on a set with 200 random /32 addresses > >> and 38 seconds on a set with 500 random /32 addresses. In real world > >> applications the speed should be much higher as IPs are normally not > >> that random. With the single-bit method, normalization of such sets > >> is practically impossible. > >> > >> If we'll need further improvements for handling very large sets, we'll > >> likely need to special-case the inequality checks for constant sets in > >> curly braces that have CIDR-like masks. This way we could negate the > >> set as a whole, which is much simpler computationally, but a fair bit > >> heavier code-wise. This approach would not handle expressions without > >> curly-braced sets or with multiples of them and will not handle > >> different types of masks, while generic conversion of each negation > >> into a set of prefix matches doesn't require any specific format of > >> the original expression and works for any masks, even if they are not > >> contiguous. > >> > >> Reported-at: https://issues.redhat.com/browse/FDP-1952 > >> Signed-off-by: Ilya Maximets <[email protected]> > >> --- > >> > >> As usual, the fact that it's easy to break the cluster by configuring > >> a single not-that-large ACL is sort of a "performance bug", which are > >> always hard to classify. So I'll leave it up for maintainers to decide > >> if they want to backport this down to LTS. > >> > >> FWIW, in comparison with the previous superset crushing patch that was > >> in a similar position, this change is very simple in both logic and the > >> code, ignoring the fact that I had to explain why it works in 200 lines > >> of the commit message. :D > >> > >> lib/expr.c | 9 ++++--- > >> tests/ovn.at | 68 ++++++++++++++++++++++++++++++++++------------------ > >> 2 files changed, 51 insertions(+), 26 deletions(-) > >> > >> diff --git a/lib/expr.c b/lib/expr.c > >> index e01cd028f..288e245c6 100644 > >> --- a/lib/expr.c > >> +++ b/lib/expr.c > >> @@ -2139,9 +2139,12 @@ expr_simplify_ne(struct expr *expr) > >> e->type = EXPR_T_CMP; > >> e->cmp.symbol = expr->cmp.symbol; > >> e->cmp.relop = EXPR_R_EQ; > >> - bitwise_put_bit(&e->cmp.value, sizeof e->cmp.value, i, > >> - !bitwise_get_bit(value, sizeof *value, i)); > >> - bitwise_put1(&e->cmp.mask, sizeof e->cmp.mask, i); > >> + > >> + bitwise_copy(mask, sizeof *mask, i, > >> + &e->cmp.mask, sizeof e->cmp.mask, i, w - i); > >> + bitwise_copy(value, sizeof *value, i, > >> + &e->cmp.value, sizeof e->cmp.value, i, w - i); > >> + bitwise_toggle_bit(&e->cmp.value, sizeof e->cmp.value, i); > >> > >> new = expr_combine(EXPR_T_OR, new, e); > >> } > >> diff --git a/tests/ovn.at b/tests/ovn.at > >> index f8ad98917..df77477b9 100644 > >> --- a/tests/ovn.at > >> +++ b/tests/ovn.at > >> @@ -444,10 +444,10 @@ AT_CHECK([simplify 'tcp.dst <= 65535'], [0], > >> [ip.proto == 0x6 && (eth.type == 0x800 || eth.type == 0x86dd) > >> ]) > >> AT_CHECK([simplify 'tcp.dst > 0'], [0], > >> - [[(tcp.dst[0] || tcp.dst[1] || tcp.dst[2] || tcp.dst[3] || > tcp.dst[4] > >> || tcp.dst[5] || tcp.dst[6] || tcp.dst[7] || tcp.dst[8] || tcp.dst[9] || > >> tcp.dst[10] || tcp.dst[11] || tcp.dst[12] || tcp.dst[13] || tcp.dst[14] > || > >> tcp.dst[15]) && ip.proto == 0x6 && (eth.type == 0x800 || eth.type == > 0x86dd) > >> + [[(tcp.dst == 0x1 || tcp.dst[1..15] == 0x1 || tcp.dst[2..15] == 0x1 > >> || tcp.dst[3..15] == 0x1 || tcp.dst[4..15] == 0x1 || tcp.dst[5..15] == > 0x1 > >> || tcp.dst[6..15] == 0x1 || tcp.dst[7..15] == 0x1 || tcp.dst[8..15] == > 0x1 > >> || tcp.dst[9..15] == 0x1 || tcp.dst[10..15] == 0x1 || tcp.dst[11..15] == > >> 0x1 || tcp.dst[12..15] == 0x1 || tcp.dst[13..15] == 0x1 || > tcp.dst[14..15] > >> == 0x1 || tcp.dst[15]) && ip.proto == 0x6 && (eth.type == 0x800 || > eth.type > >> == 0x86dd) > >> ]]) > >> AT_CHECK([simplify 'tcp.dst < 65535'], [0], > >> - [[(!tcp.dst[0] || !tcp.dst[1] || !tcp.dst[2] || !tcp.dst[3] || > >> !tcp.dst[4] || !tcp.dst[5] || !tcp.dst[6] || !tcp.dst[7] || !tcp.dst[8] > || > >> !tcp.dst[9] || !tcp.dst[10] || !tcp.dst[11] || !tcp.dst[12] || > !tcp.dst[13] > >> || !tcp.dst[14] || !tcp.dst[15]) && ip.proto == 0x6 && (eth.type == > 0x800 > >> || eth.type == 0x86dd) > >> + [[(tcp.dst == 0xfffe || tcp.dst[1..15] == 0x7ffe || tcp.dst[2..15] > == > >> 0x3ffe || tcp.dst[3..15] == 0x1ffe || tcp.dst[4..15] == 0xffe || > >> tcp.dst[5..15] == 0x7fe || tcp.dst[6..15] == 0x3fe || tcp.dst[7..15] == > >> 0x1fe || tcp.dst[8..15] == 0xfe || tcp.dst[9..15] == 0x7e || > >> tcp.dst[10..15] == 0x3e || tcp.dst[11..15] == 0x1e || tcp.dst[12..15] == > >> 0xe || tcp.dst[13..15] == 0x6 || tcp.dst[14..15] == 0x2 || > !tcp.dst[15]) && > >> ip.proto == 0x6 && (eth.type == 0x800 || eth.type == 0x86dd) > >> ]]) > >> AT_CLEANUP > >> > >> @@ -569,6 +569,27 @@ AT_CHECK([expr_to_flow 'inport == "eth0" && inport > == > >> "eth1"'], [0], [dnl > >> ]) > >> AT_CLEANUP > >> > >> +AT_SETUP([converting expressions to flows -- OR supersets]) > >> +AT_KEYWORDS([expression]) > >> +expr_to_flow () { > >> + echo "$1" | ovstest test-ovn expr-to-flows | sort > >> +} > >> +AT_CHECK([expr_to_flow 'dnl > >> + (ip4.src[[8]] == 0 || ip4.src[[9]] == 1 || ip4.src[[10]] == 0) && > dnl > >> + (ip4.src[[8]] == 0 || ip4.src[[9]] == 0 || ip4.src[[10]] == 1)'] , > >> [0], [dnl > >> +ip,nw_src=0.0.0.0/0.0.1.0 > >> +ip,nw_src=0.0.0.0/0.0.6.0 > >> +ip,nw_src=0.0.6.0/0.0.6.0 > >> +]) > >> +AT_CHECK([expr_to_flow 'dnl > >> + (ip4.src[[8]] != 0 || ip4.src[[9]] != 1 || ip4.src[[10]] != 0) && > dnl > >> + (ip4.src[[8]] != 0 || ip4.src[[9]] != 0 || ip4.src[[10]] != 1)'] , > >> [0], [dnl > >> +ip,nw_src=0.0.0.0/0.0.6.0 > >> +ip,nw_src=0.0.1.0/0.0.1.0 > >> +ip,nw_src=0.0.6.0/0.0.6.0 > >> +]) > >> +AT_CLEANUP > >> + > >> AT_SETUP([converting expressions to flows -- address sets]) > >> AT_KEYWORDS([expression]) > >> expr_to_flow () { > >> @@ -637,24 +658,24 @@ AT_CHECK([expr_to_flow 'ip4.src != {$set4}'], [0], > >> [dnl > >> > >> ]) > >> AT_CHECK([expr_to_flow 'ip4.src != {1.0.0.0/8, $set4}'], [0], [dnl > >> -ip,nw_src=0.0.0.0/1.0.0.0 > >> +ip,nw_src=0.0.0.0/8 > >> ip,nw_src=128.0.0.0/1 > >> -ip,nw_src=16.0.0.0/16.0.0.0 > >> -ip,nw_src=2.0.0.0/2.0.0.0 > >> -ip,nw_src=32.0.0.0/32.0.0.0 > >> -ip,nw_src=4.0.0.0/4.0.0.0 > >> -ip,nw_src=64.0.0.0/64.0.0.0 > >> -ip,nw_src=8.0.0.0/8.0.0.0 > >> +ip,nw_src=16.0.0.0/4 > >> +ip,nw_src=2.0.0.0/7 > >> +ip,nw_src=32.0.0.0/3 > >> +ip,nw_src=4.0.0.0/6 > >> +ip,nw_src=64.0.0.0/2 > >> +ip,nw_src=8.0.0.0/5 > >> ]) > >> AT_CHECK([expr_to_flow 'ip4.src != 1.0.0.0/8 && ip4.src != {$set4}'], > >> [0], [dnl > >> -ip,nw_src=0.0.0.0/1.0.0.0 > >> +ip,nw_src=0.0.0.0/8 > >> ip,nw_src=128.0.0.0/1 > >> -ip,nw_src=16.0.0.0/16.0.0.0 > >> -ip,nw_src=2.0.0.0/2.0.0.0 > >> -ip,nw_src=32.0.0.0/32.0.0.0 > >> -ip,nw_src=4.0.0.0/4.0.0.0 > >> -ip,nw_src=64.0.0.0/64.0.0.0 > >> -ip,nw_src=8.0.0.0/8.0.0.0 > >> +ip,nw_src=16.0.0.0/4 > >> +ip,nw_src=2.0.0.0/7 > >> +ip,nw_src=32.0.0.0/3 > >> +ip,nw_src=4.0.0.0/6 > >> +ip,nw_src=64.0.0.0/2 > >> +ip,nw_src=8.0.0.0/5 > >> ]) > >> AT_CHECK([expr_to_flow 'ip4.dst == 172.27.0.65 && ip4.src == $set1 && > >> ip4.dst != 10.128.0.0/14'], [0], [dnl > >> ip,nw_src=10.0.0.1,nw_dst=172.27.0.65 > >> @@ -662,17 +683,18 @@ ip,nw_src=10.0.0.2,nw_dst=172.27.0.65 > >> ip,nw_src=10.0.0.3,nw_dst=172.27.0.65 > >> ]) > >> AT_CHECK([expr_to_flow 'ip4.src == 172.168.13.0/16 && ip4.src != { > >> 172.168.13.0/24, 172.168.14.0/24}' <http://172.168.14.0/24%7D'>], [0], > >> [dnl > >> -ip,nw_src=172.168.0.0/255.255.3.0 > >> -ip,nw_src=172.168.0.0/255.255.4.0 > >> -ip,nw_src=172.168.0.0/255.255.8.0 > >> +ip,nw_src=172.168.0.0/21 > >> +ip,nw_src=172.168.12.0/24 > >> ip,nw_src=172.168.128.0/17 > >> -ip,nw_src=172.168.16.0/255.255.16.0 > >> -ip,nw_src=172.168.3.0/255.255.3.0 > >> -ip,nw_src=172.168.32.0/255.255.32.0 > >> -ip,nw_src=172.168.64.0/255.255.64.0 > >> +ip,nw_src=172.168.15.0/24 > >> +ip,nw_src=172.168.16.0/20 > >> +ip,nw_src=172.168.32.0/19 > >> +ip,nw_src=172.168.64.0/18 > >> +ip,nw_src=172.168.8.0/22 > >> ]) > >> dnl Negative match flow explosion. > >> AT_CHECK([test $(expr_to_flow 'ip4.src != {172.168.13.0/24, > >> 172.168.14.0/24, 172.168.15.0/24}' | wc -l) -le 30]) > >> +AT_CHECK([test $(expr_to_flow 'ip4.dst != {179.141.79.238/32, > >> 23.87.193.64/32, 240.238.112.253/32}' | wc -l) -le 100]) > >> AT_CLEANUP > >> > >> AT_SETUP([converting expressions to flows -- port groups]) > >> -- > >> 2.51.0 > >> > >> _______________________________________________ > >> dev mailing list > >> [email protected] > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >> > >> > > Thank you Ilya, > > that's a really great improvement, and I would vote to backport this > change. > > > > Acked-by: Ales Musil <[email protected]> > > Hi Ilya, Ales, Lorenzo, > > I agree, this is a huge improvement! I read the commit message a bunch > of times to make sure I understand how it works and I couldn't find any > reason to think that the change is wrong. The patch in general looks > good to me: > > Acked-by: Dumitru Ceara <[email protected]> > > I completely agree with the backport too. > > Regards, > Dumitru > > Thank you Ilya, Lorenzo and Dumitru, I went ahead, merged this into main and backported all the way down to 24.03. Regards, Ales _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
