On Mon, Oct 27, 2025 at 2:20 PM Dumitru Ceara <[email protected]> wrote:

> On 10/23/25 3:07 PM, Ales Musil via dev wrote:
> > On Thu, Oct 23, 2025 at 12:45 PM Ilya Maximets <[email protected]>
> wrote:
> >
> >> While converting inequality matches into sets of equality matches
> >> we use the most simple and efficient method - create single bit masks
> >> matching on the opposite value of each bit.  For example, 'a != 0b0101'
> >> where 'a' is a 4-bit field turns into (note, a[0] is the least
> >> significant bit, a[3] is the most significant):
> >>
> >>   a[0] != 1 || a[1] != 0 || a[2] != 1 || a[3] != 0
> >> or
> >>   a[0] == 0 || a[1] == 1 || a[2] == 0 || a[3] == 1
> >>
> >> This corresponds to the following matches:
> >>
> >>            LSB    MSB
> >>               0123
> >>   original :  1010
> >>   a[0] == 0:  0...
> >>   a[1] == 1:  .1..
> >>   a[2] == 0:  ..0.
> >>   a[3] == 1:  ...1
> >>
> >> We can technically preserve part of the mask that is already covered
> >> by previous matches like this:
> >>
> >>   a[0..3] == 0b0100 || a[1..3] == 0b011 || a[2..3] == 0b00 || a[3] == 1
> >>
> >> Which just corresponds to filling the upper half of the matrix with
> >> the original values:
> >>
> >>            LSB    MSB                   LSB    MSB
> >>               0123                         0123
> >>   original :  1010                         1010
> >>   a[0] == 0:  0...     a[0..3] == 0b0100:  0010
> >>   a[1] == 1:  .1..     a[1..3] == 0b011 :  .110
> >>   a[2] == 0:  ..0.     a[2..3] == 0b00  :  ..00
> >>   a[3] == 1:  ...1     a[3]    == 0b1   :  ...1
> >>
> >> The diagonal values are the same in both sets of matches.
> >>
> >> Both sets of matches are logically equivalent and fully represent the
> >> original inequality match.
> >>
> >> This is true, because if the value 'a' equals 0b0101, then it will not
> >> match any of the rules in the set, as each of them has at least one
> >> bit different from 0b0101 (diagonal ones).  And if the value 'a' is
> >> different from 0b0101, then we can find the first bit that is different
> >> starting from the most significant, all the higher ones will be the
> >> same as in 0b0101.  And we have a rule that matches on exactly that by
> >> design.
> >>
> >> Now, since every match in the second set has a contiguous mask ending
> >> at the most significant bit, we can re-write it using a CIDR-like
> >> notation:
> >>
> >>   a == 0b0100/4
> >>   a == 0b0110/3
> >>   a == 0b0000/2
> >>   a == 0b1000/1
> >>
> >> Or: a == 0b0100/4 || a == 0b0110/3 || a == 0b0000/2 || a == 0b1000/1
> >>
> >> Why does it matter?
> >>
> >> Having larger masks allows much easier elimination of incompatible
> >> sub-expressions during the normalization process.
> >>
> >> Let's say we have the following expression:
> >>
> >>  a != 0b1000/2 && a != 0b0101/4
> >>
> >> Let's normalize it by swapping the inequality with equality sets using
> >> the single-bit method:
> >>
> >>  (a[2] == 1 || a[3] == 0) &&
> >>  (a[0] == 0 || a[1] == 1 || a[2] == 0 || a[3] == 1)
> >>
> >> Opening the parenthesis:
> >>
> >>  a[2] == 1 && a[0] == 0 ||
> >>  a[2] == 1 && a[1] == 1 ||
> >>  a[2] == 1 && a[2] == 0 ||   < Inconsistent
> >>  a[2] == 1 && a[3] == 1 ||
> >>  a[3] == 0 && a[0] == 0 ||
> >>  a[3] == 0 && a[1] == 1 ||
> >>  a[3] == 0 && a[2] == 0 ||
> >>  a[3] == 0 && a[3] == 1      < Inconsistent
> >>
> >> We can see that two sub-expressions can be eliminated as they are
> >> always false.  Only the sub-expressions that match on the same bit
> >> can be eliminated at this stage, the rest must be processed further.
> >>
> >> With the prefix method we get:
> >>
> >>  (a[2..3] == 0b11 || a[3] == 0) &&
> >>  (a[0..3] == 0b0100 || a[1..3] == 0b011 || a[2..3] == 0b00 || a[3] == 1)
> >>
> >> In a bit easier to read form:
> >>
> >>  (a == ..11 || a == ...0) &&
> >>  (a == 0010 || a == .110 || a == ..00 || a == ...1)
> >>
> >> Opening the parenthesis:
> >>
> >>    LSB    MSB     LSB    MSB
> >>       0123           0123
> >>  a == ..11  &&  a == 0010 ||  < Inconsistent
> >>  a == ..11  &&  a == .110 ||  < Inconsistent
> >>  a == ..11  &&  a == ..00 ||  < Inconsistent
> >>  a == ..11  &&  a == ...1 ||
> >>  a == ...0  &&  a == 0010 ||
> >>  a == ...0  &&  a == .110 ||
> >>  a == ...0  &&  a == ..00 ||
> >>  a == ...0  &&  a == ...1     < Inconsistent
> >>
> >> Now four sub-expressions are always false instead of just two.  And we
> >> can see that if we had more bits matched in the second inequality
> >> match instead of 4, then all the rest of 'a == ..11 && ...'
> >> sub-expressions would also be always false.  By not dealing with all
> >> possible bit masks and reducing the problem to a much smaller space
> >> of prefix matches, we're significantly increasing chances of resulted
> >> sub-expressions to end up internally inconsistent and discarded.
> >>
> >> This greatly reduces the amount of sub-expressions that we need to
> >> process on the next steps.  Removal of duplicates and crushing of
> >> supersets will get rid of most of the unnecessary matches in the case
> >> of single-bit method, but not all of them, since we do not have a lot
> >> of context to track dependencies between different sub-expressions.
> >> And it's also much more expensive to do so, since superset crushing
> >> has quadratic complexity in the worst case, which grows pretty fast
> >> with the amount of input sub-expressions.  More we can eliminate
> >> before this stage, the better.
> >>
> >> In practice, using the prefix method provides a huge performance
> >> boost while processing inequality matches.  For example, normalizing
> >> the inequality match on the following set of 15 random IP addresses
> >> takes about 28 minutes with a single bit approach, generating 60519
> >> OpenFlow rules:
> >>
> >>   ip4.dst != {
> >>     179.141.79.238/32,  23.87.193.64/32,   240.238.112.253/32,
> >>     42.140.78.76/32,    243.50.68.23/32,   229.151.116.21/32,
> >>     201.156.140.194/32, 94.49.119.92/32,   133.167.173.49/32,
> >>     51.199.153.252/32,  214.80.176.168/32, 134.195.2.51/32,
> >>     68.18.121.185/32,   118.53.215.77/32,  203.42.132.181/32,
> >>   }
> >>
> >> With the prefixes, the same task takes just about 4 milliseconds,
> >> generating only 412 OpenFlow rules.  Generated rules match the full
> >> negation of the set performed with python netaddr library:
> >>
> >>   from netaddr import IPSet
> >>   list((IPSet(['0.0.0.0/0']) - IPSet(exclusions)).iter_cidrs())
> >>
> >> Additional bonus is that prefix matches are much friendlier to the
> >> OpenFlow classifier in OVS that can now utilize the prefix match
> >> optimizations for some of these rules, potentially reducing the amount
> >> of datapath flows.
> >>
> >> The superset crushing logic is still needed to get rid of the extra
> >> rules when processing sets with particularly diverse masks and
> >> custom bit matches, so keeping it as it is.  New tests added to
> >> substitute the reduced test coverage for this functionality.
> >>
> >> The prefixes approach is still not a silver bullet and it will take
> >> noticeable amount of time on very large sets of random IPs with
> >> inequality matches, but we're moving the scalability pole way further
> >> with a relatively minor change.  For example, it will take 2.7 seconds
> >> to normalize inequality match on a set with 200 random /32 addresses
> >> and 38 seconds on a set with 500 random /32 addresses.  In real world
> >> applications the speed should be much higher as IPs are normally not
> >> that random.  With the single-bit method, normalization of such sets
> >> is practically impossible.
> >>
> >> If we'll need further improvements for handling very large sets, we'll
> >> likely need to special-case the inequality checks for constant sets in
> >> curly braces that have CIDR-like masks.  This way we could negate the
> >> set as a whole, which is much simpler computationally, but a fair bit
> >> heavier code-wise.  This approach would not handle expressions without
> >> curly-braced sets or with multiples of them and will not handle
> >> different types of masks, while generic conversion of each negation
> >> into a set of prefix matches doesn't require any specific format of
> >> the original expression and works for any masks, even if they are not
> >> contiguous.
> >>
> >> Reported-at: https://issues.redhat.com/browse/FDP-1952
> >> Signed-off-by: Ilya Maximets <[email protected]>
> >> ---
> >>
> >> As usual, the fact that it's easy to break the cluster by configuring
> >> a single not-that-large ACL is sort of a "performance bug", which are
> >> always hard to classify.  So I'll leave it up for maintainers to decide
> >> if they want to backport this down to LTS.
> >>
> >> FWIW, in comparison with the previous superset crushing patch that was
> >> in a similar position, this change is very simple in both logic and the
> >> code, ignoring the fact that I had to explain why it works in 200 lines
> >> of the commit message. :D
> >>
> >>  lib/expr.c   |  9 ++++---
> >>  tests/ovn.at | 68 ++++++++++++++++++++++++++++++++++------------------
> >>  2 files changed, 51 insertions(+), 26 deletions(-)
> >>
> >> diff --git a/lib/expr.c b/lib/expr.c
> >> index e01cd028f..288e245c6 100644
> >> --- a/lib/expr.c
> >> +++ b/lib/expr.c
> >> @@ -2139,9 +2139,12 @@ expr_simplify_ne(struct expr *expr)
> >>          e->type = EXPR_T_CMP;
> >>          e->cmp.symbol = expr->cmp.symbol;
> >>          e->cmp.relop = EXPR_R_EQ;
> >> -        bitwise_put_bit(&e->cmp.value, sizeof e->cmp.value, i,
> >> -                        !bitwise_get_bit(value, sizeof *value, i));
> >> -        bitwise_put1(&e->cmp.mask, sizeof e->cmp.mask, i);
> >> +
> >> +        bitwise_copy(mask, sizeof *mask, i,
> >> +                     &e->cmp.mask, sizeof e->cmp.mask, i, w - i);
> >> +        bitwise_copy(value, sizeof *value, i,
> >> +                     &e->cmp.value, sizeof e->cmp.value, i, w - i);
> >> +        bitwise_toggle_bit(&e->cmp.value, sizeof e->cmp.value, i);
> >>
> >>          new = expr_combine(EXPR_T_OR, new, e);
> >>      }
> >> diff --git a/tests/ovn.at b/tests/ovn.at
> >> index f8ad98917..df77477b9 100644
> >> --- a/tests/ovn.at
> >> +++ b/tests/ovn.at
> >> @@ -444,10 +444,10 @@ AT_CHECK([simplify 'tcp.dst <= 65535'], [0],
> >>      [ip.proto == 0x6 && (eth.type == 0x800 || eth.type == 0x86dd)
> >>  ])
> >>  AT_CHECK([simplify 'tcp.dst > 0'], [0],
> >> -    [[(tcp.dst[0] || tcp.dst[1] || tcp.dst[2] || tcp.dst[3] ||
> tcp.dst[4]
> >> || tcp.dst[5] || tcp.dst[6] || tcp.dst[7] || tcp.dst[8] || tcp.dst[9] ||
> >> tcp.dst[10] || tcp.dst[11] || tcp.dst[12] || tcp.dst[13] || tcp.dst[14]
> ||
> >> tcp.dst[15]) && ip.proto == 0x6 && (eth.type == 0x800 || eth.type ==
> 0x86dd)
> >> +    [[(tcp.dst == 0x1 || tcp.dst[1..15] == 0x1 || tcp.dst[2..15] == 0x1
> >> || tcp.dst[3..15] == 0x1 || tcp.dst[4..15] == 0x1 || tcp.dst[5..15] ==
> 0x1
> >> || tcp.dst[6..15] == 0x1 || tcp.dst[7..15] == 0x1 || tcp.dst[8..15] ==
> 0x1
> >> || tcp.dst[9..15] == 0x1 || tcp.dst[10..15] == 0x1 || tcp.dst[11..15] ==
> >> 0x1 || tcp.dst[12..15] == 0x1 || tcp.dst[13..15] == 0x1 ||
> tcp.dst[14..15]
> >> == 0x1 || tcp.dst[15]) && ip.proto == 0x6 && (eth.type == 0x800 ||
> eth.type
> >> == 0x86dd)
> >>  ]])
> >>  AT_CHECK([simplify 'tcp.dst < 65535'], [0],
> >> -    [[(!tcp.dst[0] || !tcp.dst[1] || !tcp.dst[2] || !tcp.dst[3] ||
> >> !tcp.dst[4] || !tcp.dst[5] || !tcp.dst[6] || !tcp.dst[7] || !tcp.dst[8]
> ||
> >> !tcp.dst[9] || !tcp.dst[10] || !tcp.dst[11] || !tcp.dst[12] ||
> !tcp.dst[13]
> >> || !tcp.dst[14] || !tcp.dst[15]) && ip.proto == 0x6 && (eth.type ==
> 0x800
> >> || eth.type == 0x86dd)
> >> +    [[(tcp.dst == 0xfffe || tcp.dst[1..15] == 0x7ffe || tcp.dst[2..15]
> ==
> >> 0x3ffe || tcp.dst[3..15] == 0x1ffe || tcp.dst[4..15] == 0xffe ||
> >> tcp.dst[5..15] == 0x7fe || tcp.dst[6..15] == 0x3fe || tcp.dst[7..15] ==
> >> 0x1fe || tcp.dst[8..15] == 0xfe || tcp.dst[9..15] == 0x7e ||
> >> tcp.dst[10..15] == 0x3e || tcp.dst[11..15] == 0x1e || tcp.dst[12..15] ==
> >> 0xe || tcp.dst[13..15] == 0x6 || tcp.dst[14..15] == 0x2 ||
> !tcp.dst[15]) &&
> >> ip.proto == 0x6 && (eth.type == 0x800 || eth.type == 0x86dd)
> >>  ]])
> >>  AT_CLEANUP
> >>
> >> @@ -569,6 +569,27 @@ AT_CHECK([expr_to_flow 'inport == "eth0" && inport
> ==
> >> "eth1"'], [0], [dnl
> >>  ])
> >>  AT_CLEANUP
> >>
> >> +AT_SETUP([converting expressions to flows -- OR supersets])
> >> +AT_KEYWORDS([expression])
> >> +expr_to_flow () {
> >> +    echo "$1" | ovstest test-ovn expr-to-flows | sort
> >> +}
> >> +AT_CHECK([expr_to_flow 'dnl
> >> +    (ip4.src[[8]] == 0 || ip4.src[[9]] == 1 || ip4.src[[10]] == 0) &&
> dnl
> >> +    (ip4.src[[8]] == 0 || ip4.src[[9]] == 0 || ip4.src[[10]] == 1)'] ,
> >> [0], [dnl
> >> +ip,nw_src=0.0.0.0/0.0.1.0
> >> +ip,nw_src=0.0.0.0/0.0.6.0
> >> +ip,nw_src=0.0.6.0/0.0.6.0
> >> +])
> >> +AT_CHECK([expr_to_flow 'dnl
> >> +    (ip4.src[[8]] != 0 || ip4.src[[9]] != 1 || ip4.src[[10]] != 0) &&
> dnl
> >> +    (ip4.src[[8]] != 0 || ip4.src[[9]] != 0 || ip4.src[[10]] != 1)'] ,
> >> [0], [dnl
> >> +ip,nw_src=0.0.0.0/0.0.6.0
> >> +ip,nw_src=0.0.1.0/0.0.1.0
> >> +ip,nw_src=0.0.6.0/0.0.6.0
> >> +])
> >> +AT_CLEANUP
> >> +
> >>  AT_SETUP([converting expressions to flows -- address sets])
> >>  AT_KEYWORDS([expression])
> >>  expr_to_flow () {
> >> @@ -637,24 +658,24 @@ AT_CHECK([expr_to_flow 'ip4.src != {$set4}'], [0],
> >> [dnl
> >>
> >>  ])
> >>  AT_CHECK([expr_to_flow 'ip4.src != {1.0.0.0/8, $set4}'], [0], [dnl
> >> -ip,nw_src=0.0.0.0/1.0.0.0
> >> +ip,nw_src=0.0.0.0/8
> >>  ip,nw_src=128.0.0.0/1
> >> -ip,nw_src=16.0.0.0/16.0.0.0
> >> -ip,nw_src=2.0.0.0/2.0.0.0
> >> -ip,nw_src=32.0.0.0/32.0.0.0
> >> -ip,nw_src=4.0.0.0/4.0.0.0
> >> -ip,nw_src=64.0.0.0/64.0.0.0
> >> -ip,nw_src=8.0.0.0/8.0.0.0
> >> +ip,nw_src=16.0.0.0/4
> >> +ip,nw_src=2.0.0.0/7
> >> +ip,nw_src=32.0.0.0/3
> >> +ip,nw_src=4.0.0.0/6
> >> +ip,nw_src=64.0.0.0/2
> >> +ip,nw_src=8.0.0.0/5
> >>  ])
> >>  AT_CHECK([expr_to_flow 'ip4.src != 1.0.0.0/8 && ip4.src != {$set4}'],
> >> [0], [dnl
> >> -ip,nw_src=0.0.0.0/1.0.0.0
> >> +ip,nw_src=0.0.0.0/8
> >>  ip,nw_src=128.0.0.0/1
> >> -ip,nw_src=16.0.0.0/16.0.0.0
> >> -ip,nw_src=2.0.0.0/2.0.0.0
> >> -ip,nw_src=32.0.0.0/32.0.0.0
> >> -ip,nw_src=4.0.0.0/4.0.0.0
> >> -ip,nw_src=64.0.0.0/64.0.0.0
> >> -ip,nw_src=8.0.0.0/8.0.0.0
> >> +ip,nw_src=16.0.0.0/4
> >> +ip,nw_src=2.0.0.0/7
> >> +ip,nw_src=32.0.0.0/3
> >> +ip,nw_src=4.0.0.0/6
> >> +ip,nw_src=64.0.0.0/2
> >> +ip,nw_src=8.0.0.0/5
> >>  ])
> >>  AT_CHECK([expr_to_flow 'ip4.dst == 172.27.0.65 && ip4.src == $set1 &&
> >> ip4.dst != 10.128.0.0/14'], [0], [dnl
> >>  ip,nw_src=10.0.0.1,nw_dst=172.27.0.65
> >> @@ -662,17 +683,18 @@ ip,nw_src=10.0.0.2,nw_dst=172.27.0.65
> >>  ip,nw_src=10.0.0.3,nw_dst=172.27.0.65
> >>  ])
> >>  AT_CHECK([expr_to_flow 'ip4.src == 172.168.13.0/16 && ip4.src != {
> >> 172.168.13.0/24, 172.168.14.0/24}' <http://172.168.14.0/24%7D'>], [0],
> >> [dnl
> >> -ip,nw_src=172.168.0.0/255.255.3.0
> >> -ip,nw_src=172.168.0.0/255.255.4.0
> >> -ip,nw_src=172.168.0.0/255.255.8.0
> >> +ip,nw_src=172.168.0.0/21
> >> +ip,nw_src=172.168.12.0/24
> >>  ip,nw_src=172.168.128.0/17
> >> -ip,nw_src=172.168.16.0/255.255.16.0
> >> -ip,nw_src=172.168.3.0/255.255.3.0
> >> -ip,nw_src=172.168.32.0/255.255.32.0
> >> -ip,nw_src=172.168.64.0/255.255.64.0
> >> +ip,nw_src=172.168.15.0/24
> >> +ip,nw_src=172.168.16.0/20
> >> +ip,nw_src=172.168.32.0/19
> >> +ip,nw_src=172.168.64.0/18
> >> +ip,nw_src=172.168.8.0/22
> >>  ])
> >>  dnl Negative match flow explosion.
> >>  AT_CHECK([test $(expr_to_flow 'ip4.src != {172.168.13.0/24,
> >> 172.168.14.0/24, 172.168.15.0/24}' | wc -l) -le 30])
> >> +AT_CHECK([test $(expr_to_flow 'ip4.dst != {179.141.79.238/32,
> >> 23.87.193.64/32, 240.238.112.253/32}' | wc -l) -le 100])
> >>  AT_CLEANUP
> >>
> >>  AT_SETUP([converting expressions to flows -- port groups])
> >> --
> >> 2.51.0
> >>
> >> _______________________________________________
> >> dev mailing list
> >> [email protected]
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >>
> > Thank you Ilya,
> > that's a really great improvement, and I would vote to backport this
> change.
> >
> > Acked-by: Ales Musil <[email protected]>
>
> Hi Ilya, Ales, Lorenzo,
>
> I agree, this is a huge improvement!  I read the commit message a bunch
> of times to make sure I understand how it works and I couldn't find any
> reason to think that the change is wrong.  The patch in general looks
> good to me:
>
> Acked-by: Dumitru Ceara <[email protected]>
>
> I completely agree with the backport too.
>
> Regards,
> Dumitru
>
>
Thank you Ilya, Lorenzo and Dumitru,

I went ahead, merged this into main and backported all the way down to
24.03.

Regards,
Ales
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to