While converting inequality matches into sets of equality matches
we use the most simple and efficient method - create single bit masks
matching on the opposite value of each bit. For example, 'a != 0b0101'
where 'a' is a 4-bit field turns into (note, a[0] is the least
significant bit, a[3] is the most significant):
a[0] != 1 || a[1] != 0 || a[2] != 1 || a[3] != 0
or
a[0] == 0 || a[1] == 1 || a[2] == 0 || a[3] == 1
This corresponds to the following matches:
LSB MSB
0123
original : 1010
a[0] == 0: 0...
a[1] == 1: .1..
a[2] == 0: ..0.
a[3] == 1: ...1
We can technically preserve part of the mask that is already covered
by previous matches like this:
a[0..3] == 0b0100 || a[1..3] == 0b011 || a[2..3] == 0b00 || a[3] == 1
Which just corresponds to filling the upper half of the matrix with
the original values:
LSB MSB LSB MSB
0123 0123
original : 1010 1010
a[0] == 0: 0... a[0..3] == 0b0100: 0010
a[1] == 1: .1.. a[1..3] == 0b011 : .110
a[2] == 0: ..0. a[2..3] == 0b00 : ..00
a[3] == 1: ...1 a[3] == 0b1 : ...1
The diagonal values are the same in both sets of matches.
Both sets of matches are logically equivalent and fully represent the
original inequality match.
This is true, because if the value 'a' equals 0b0101, then it will not
match any of the rules in the set, as each of them has at least one
bit different from 0b0101 (diagonal ones). And if the value 'a' is
different from 0b0101, then we can find the first bit that is different
starting from the most significant, all the higher ones will be the
same as in 0b0101. And we have a rule that matches on exactly that by
design.
Now, since every match in the second set has a contiguous mask ending
at the most significant bit, we can re-write it using a CIDR-like
notation:
a == 0b0100/4
a == 0b0110/3
a == 0b0000/2
a == 0b1000/1
Or: a == 0b0100/4 || a == 0b0110/3 || a == 0b0000/2 || a == 0b1000/1
Why does it matter?
Having larger masks allows much easier elimination of incompatible
sub-expressions during the normalization process.
Let's say we have the following expression:
a != 0b1000/2 && a != 0b0101/4
Let's normalize it by swapping the inequality with equality sets using
the single-bit method:
(a[2] == 1 || a[3] == 0) &&
(a[0] == 0 || a[1] == 1 || a[2] == 0 || a[3] == 1)
Opening the parenthesis:
a[2] == 1 && a[0] == 0 ||
a[2] == 1 && a[1] == 1 ||
a[2] == 1 && a[2] == 0 || < Inconsistent
a[2] == 1 && a[3] == 1 ||
a[3] == 0 && a[0] == 0 ||
a[3] == 0 && a[1] == 1 ||
a[3] == 0 && a[2] == 0 ||
a[3] == 0 && a[3] == 1 < Inconsistent
We can see that two sub-expressions can be eliminated as they are
always false. Only the sub-expressions that match on the same bit
can be eliminated at this stage, the rest must be processed further.
With the prefix method we get:
(a[2..3] == 0b11 || a[3] == 0) &&
(a[0..3] == 0b0100 || a[1..3] == 0b011 || a[2..3] == 0b00 || a[3] == 1)
In a bit easier to read form:
(a == ..11 || a == ...0) &&
(a == 0010 || a == .110 || a == ..00 || a == ...1)
Opening the parenthesis:
LSB MSB LSB MSB
0123 0123
a == ..11 && a == 0010 || < Inconsistent
a == ..11 && a == .110 || < Inconsistent
a == ..11 && a == ..00 || < Inconsistent
a == ..11 && a == ...1 ||
a == ...0 && a == 0010 ||
a == ...0 && a == .110 ||
a == ...0 && a == ..00 ||
a == ...0 && a == ...1 < Inconsistent
Now four sub-expressions are always false instead of just two. And we
can see that if we had more bits matched in the second inequality
match instead of 4, then all the rest of 'a == ..11 && ...'
sub-expressions would also be always false. By not dealing with all
possible bit masks and reducing the problem to a much smaller space
of prefix matches, we're significantly increasing chances of resulted
sub-expressions to end up internally inconsistent and discarded.
This greatly reduces the amount of sub-expressions that we need to
process on the next steps. Removal of duplicates and crushing of
supersets will get rid of most of the unnecessary matches in the case
of single-bit method, but not all of them, since we do not have a lot
of context to track dependencies between different sub-expressions.
And it's also much more expensive to do so, since superset crushing
has quadratic complexity in the worst case, which grows pretty fast
with the amount of input sub-expressions. More we can eliminate
before this stage, the better.
In practice, using the prefix method provides a huge performance
boost while processing inequality matches. For example, normalizing
the inequality match on the following set of 15 random IP addresses
takes about 28 minutes with a single bit approach, generating 60519
OpenFlow rules:
ip4.dst != {
179.141.79.238/32, 23.87.193.64/32, 240.238.112.253/32,
42.140.78.76/32, 243.50.68.23/32, 229.151.116.21/32,
201.156.140.194/32, 94.49.119.92/32, 133.167.173.49/32,
51.199.153.252/32, 214.80.176.168/32, 134.195.2.51/32,
68.18.121.185/32, 118.53.215.77/32, 203.42.132.181/32,
}
With the prefixes, the same task takes just about 4 milliseconds,
generating only 412 OpenFlow rules. Generated rules match the full
negation of the set performed with python netaddr library:
from netaddr import IPSet
list((IPSet(['0.0.0.0/0']) - IPSet(exclusions)).iter_cidrs())
Additional bonus is that prefix matches are much friendlier to the
OpenFlow classifier in OVS that can now utilize the prefix match
optimizations for some of these rules, potentially reducing the amount
of datapath flows.
The superset crushing logic is still needed to get rid of the extra
rules when processing sets with particularly diverse masks and
custom bit matches, so keeping it as it is. New tests added to
substitute the reduced test coverage for this functionality.
The prefixes approach is still not a silver bullet and it will take
noticeable amount of time on very large sets of random IPs with
inequality matches, but we're moving the scalability pole way further
with a relatively minor change. For example, it will take 2.7 seconds
to normalize inequality match on a set with 200 random /32 addresses
and 38 seconds on a set with 500 random /32 addresses. In real world
applications the speed should be much higher as IPs are normally not
that random. With the single-bit method, normalization of such sets
is practically impossible.
If we'll need further improvements for handling very large sets, we'll
likely need to special-case the inequality checks for constant sets in
curly braces that have CIDR-like masks. This way we could negate the
set as a whole, which is much simpler computationally, but a fair bit
heavier code-wise. This approach would not handle expressions without
curly-braced sets or with multiples of them and will not handle
different types of masks, while generic conversion of each negation
into a set of prefix matches doesn't require any specific format of
the original expression and works for any masks, even if they are not
contiguous.
Reported-at: https://issues.redhat.com/browse/FDP-1952
Signed-off-by: Ilya Maximets <[email protected]>
---
As usual, the fact that it's easy to break the cluster by configuring
a single not-that-large ACL is sort of a "performance bug", which are
always hard to classify. So I'll leave it up for maintainers to decide
if they want to backport this down to LTS.
FWIW, in comparison with the previous superset crushing patch that was
in a similar position, this change is very simple in both logic and the
code, ignoring the fact that I had to explain why it works in 200 lines
of the commit message. :D
lib/expr.c | 9 ++++---
tests/ovn.at | 68 ++++++++++++++++++++++++++++++++++------------------
2 files changed, 51 insertions(+), 26 deletions(-)
diff --git a/lib/expr.c b/lib/expr.c
index e01cd028f..288e245c6 100644
--- a/lib/expr.c
+++ b/lib/expr.c
@@ -2139,9 +2139,12 @@ expr_simplify_ne(struct expr *expr)
e->type = EXPR_T_CMP;
e->cmp.symbol = expr->cmp.symbol;
e->cmp.relop = EXPR_R_EQ;
- bitwise_put_bit(&e->cmp.value, sizeof e->cmp.value, i,
- !bitwise_get_bit(value, sizeof *value, i));
- bitwise_put1(&e->cmp.mask, sizeof e->cmp.mask, i);
+
+ bitwise_copy(mask, sizeof *mask, i,
+ &e->cmp.mask, sizeof e->cmp.mask, i, w - i);
+ bitwise_copy(value, sizeof *value, i,
+ &e->cmp.value, sizeof e->cmp.value, i, w - i);
+ bitwise_toggle_bit(&e->cmp.value, sizeof e->cmp.value, i);
new = expr_combine(EXPR_T_OR, new, e);
}
diff --git a/tests/ovn.at b/tests/ovn.at
index f8ad98917..df77477b9 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -444,10 +444,10 @@ AT_CHECK([simplify 'tcp.dst <= 65535'], [0],
[ip.proto == 0x6 && (eth.type == 0x800 || eth.type == 0x86dd)
])
AT_CHECK([simplify 'tcp.dst > 0'], [0],
- [[(tcp.dst[0] || tcp.dst[1] || tcp.dst[2] || tcp.dst[3] || tcp.dst[4] ||
tcp.dst[5] || tcp.dst[6] || tcp.dst[7] || tcp.dst[8] || tcp.dst[9] ||
tcp.dst[10] || tcp.dst[11] || tcp.dst[12] || tcp.dst[13] || tcp.dst[14] ||
tcp.dst[15]) && ip.proto == 0x6 && (eth.type == 0x800 || eth.type == 0x86dd)
+ [[(tcp.dst == 0x1 || tcp.dst[1..15] == 0x1 || tcp.dst[2..15] == 0x1 ||
tcp.dst[3..15] == 0x1 || tcp.dst[4..15] == 0x1 || tcp.dst[5..15] == 0x1 ||
tcp.dst[6..15] == 0x1 || tcp.dst[7..15] == 0x1 || tcp.dst[8..15] == 0x1 ||
tcp.dst[9..15] == 0x1 || tcp.dst[10..15] == 0x1 || tcp.dst[11..15] == 0x1 ||
tcp.dst[12..15] == 0x1 || tcp.dst[13..15] == 0x1 || tcp.dst[14..15] == 0x1 ||
tcp.dst[15]) && ip.proto == 0x6 && (eth.type == 0x800 || eth.type == 0x86dd)
]])
AT_CHECK([simplify 'tcp.dst < 65535'], [0],
- [[(!tcp.dst[0] || !tcp.dst[1] || !tcp.dst[2] || !tcp.dst[3] || !tcp.dst[4]
|| !tcp.dst[5] || !tcp.dst[6] || !tcp.dst[7] || !tcp.dst[8] || !tcp.dst[9] ||
!tcp.dst[10] || !tcp.dst[11] || !tcp.dst[12] || !tcp.dst[13] || !tcp.dst[14] ||
!tcp.dst[15]) && ip.proto == 0x6 && (eth.type == 0x800 || eth.type == 0x86dd)
+ [[(tcp.dst == 0xfffe || tcp.dst[1..15] == 0x7ffe || tcp.dst[2..15] ==
0x3ffe || tcp.dst[3..15] == 0x1ffe || tcp.dst[4..15] == 0xffe || tcp.dst[5..15]
== 0x7fe || tcp.dst[6..15] == 0x3fe || tcp.dst[7..15] == 0x1fe ||
tcp.dst[8..15] == 0xfe || tcp.dst[9..15] == 0x7e || tcp.dst[10..15] == 0x3e ||
tcp.dst[11..15] == 0x1e || tcp.dst[12..15] == 0xe || tcp.dst[13..15] == 0x6 ||
tcp.dst[14..15] == 0x2 || !tcp.dst[15]) && ip.proto == 0x6 && (eth.type ==
0x800 || eth.type == 0x86dd)
]])
AT_CLEANUP
@@ -569,6 +569,27 @@ AT_CHECK([expr_to_flow 'inport == "eth0" && inport ==
"eth1"'], [0], [dnl
])
AT_CLEANUP
+AT_SETUP([converting expressions to flows -- OR supersets])
+AT_KEYWORDS([expression])
+expr_to_flow () {
+ echo "$1" | ovstest test-ovn expr-to-flows | sort
+}
+AT_CHECK([expr_to_flow 'dnl
+ (ip4.src[[8]] == 0 || ip4.src[[9]] == 1 || ip4.src[[10]] == 0) && dnl
+ (ip4.src[[8]] == 0 || ip4.src[[9]] == 0 || ip4.src[[10]] == 1)'] , [0],
[dnl
+ip,nw_src=0.0.0.0/0.0.1.0
+ip,nw_src=0.0.0.0/0.0.6.0
+ip,nw_src=0.0.6.0/0.0.6.0
+])
+AT_CHECK([expr_to_flow 'dnl
+ (ip4.src[[8]] != 0 || ip4.src[[9]] != 1 || ip4.src[[10]] != 0) && dnl
+ (ip4.src[[8]] != 0 || ip4.src[[9]] != 0 || ip4.src[[10]] != 1)'] , [0],
[dnl
+ip,nw_src=0.0.0.0/0.0.6.0
+ip,nw_src=0.0.1.0/0.0.1.0
+ip,nw_src=0.0.6.0/0.0.6.0
+])
+AT_CLEANUP
+
AT_SETUP([converting expressions to flows -- address sets])
AT_KEYWORDS([expression])
expr_to_flow () {
@@ -637,24 +658,24 @@ AT_CHECK([expr_to_flow 'ip4.src != {$set4}'], [0], [dnl
])
AT_CHECK([expr_to_flow 'ip4.src != {1.0.0.0/8, $set4}'], [0], [dnl
-ip,nw_src=0.0.0.0/1.0.0.0
+ip,nw_src=0.0.0.0/8
ip,nw_src=128.0.0.0/1
-ip,nw_src=16.0.0.0/16.0.0.0
-ip,nw_src=2.0.0.0/2.0.0.0
-ip,nw_src=32.0.0.0/32.0.0.0
-ip,nw_src=4.0.0.0/4.0.0.0
-ip,nw_src=64.0.0.0/64.0.0.0
-ip,nw_src=8.0.0.0/8.0.0.0
+ip,nw_src=16.0.0.0/4
+ip,nw_src=2.0.0.0/7
+ip,nw_src=32.0.0.0/3
+ip,nw_src=4.0.0.0/6
+ip,nw_src=64.0.0.0/2
+ip,nw_src=8.0.0.0/5
])
AT_CHECK([expr_to_flow 'ip4.src != 1.0.0.0/8 && ip4.src != {$set4}'], [0], [dnl
-ip,nw_src=0.0.0.0/1.0.0.0
+ip,nw_src=0.0.0.0/8
ip,nw_src=128.0.0.0/1
-ip,nw_src=16.0.0.0/16.0.0.0
-ip,nw_src=2.0.0.0/2.0.0.0
-ip,nw_src=32.0.0.0/32.0.0.0
-ip,nw_src=4.0.0.0/4.0.0.0
-ip,nw_src=64.0.0.0/64.0.0.0
-ip,nw_src=8.0.0.0/8.0.0.0
+ip,nw_src=16.0.0.0/4
+ip,nw_src=2.0.0.0/7
+ip,nw_src=32.0.0.0/3
+ip,nw_src=4.0.0.0/6
+ip,nw_src=64.0.0.0/2
+ip,nw_src=8.0.0.0/5
])
AT_CHECK([expr_to_flow 'ip4.dst == 172.27.0.65 && ip4.src == $set1 && ip4.dst
!= 10.128.0.0/14'], [0], [dnl
ip,nw_src=10.0.0.1,nw_dst=172.27.0.65
@@ -662,17 +683,18 @@ ip,nw_src=10.0.0.2,nw_dst=172.27.0.65
ip,nw_src=10.0.0.3,nw_dst=172.27.0.65
])
AT_CHECK([expr_to_flow 'ip4.src == 172.168.13.0/16 && ip4.src !=
{172.168.13.0/24, 172.168.14.0/24}'], [0], [dnl
-ip,nw_src=172.168.0.0/255.255.3.0
-ip,nw_src=172.168.0.0/255.255.4.0
-ip,nw_src=172.168.0.0/255.255.8.0
+ip,nw_src=172.168.0.0/21
+ip,nw_src=172.168.12.0/24
ip,nw_src=172.168.128.0/17
-ip,nw_src=172.168.16.0/255.255.16.0
-ip,nw_src=172.168.3.0/255.255.3.0
-ip,nw_src=172.168.32.0/255.255.32.0
-ip,nw_src=172.168.64.0/255.255.64.0
+ip,nw_src=172.168.15.0/24
+ip,nw_src=172.168.16.0/20
+ip,nw_src=172.168.32.0/19
+ip,nw_src=172.168.64.0/18
+ip,nw_src=172.168.8.0/22
])
dnl Negative match flow explosion.
AT_CHECK([test $(expr_to_flow 'ip4.src != {172.168.13.0/24, 172.168.14.0/24,
172.168.15.0/24}' | wc -l) -le 30])
+AT_CHECK([test $(expr_to_flow 'ip4.dst != {179.141.79.238/32, 23.87.193.64/32,
240.238.112.253/32}' | wc -l) -le 100])
AT_CLEANUP
AT_SETUP([converting expressions to flows -- port groups])
--
2.51.0
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev