On 3/31/23 23:30, Han Zhou wrote:
>
>
> On Fri, Mar 31, 2023 at 5:34 AM Ilya Maximets <[email protected]
> <mailto:[email protected]>> wrote:
>>
>> On 3/31/23 06:39, Han Zhou wrote:
>> >
>> >
>> > On Wed, Mar 29, 2023 at 9:05 AM Ilya Maximets <[email protected]
>> > <mailto:[email protected]> <mailto:[email protected]
>> > <mailto:[email protected]>>> wrote:
>> >>
>> >> While crushing OR expressions, OVN removes exact replicas of sub
>> >> expressions. However, there could be many CMP expressions that are
>> >> supersets of each other. These are most likely to be created as a
>> >> result of cross-product while expanding brackets in the AND expression
>> >> in crush_and_numeric(), i.e. while converting
>> >> "x && (a0 || a1) && (b0 || b1)" into "xa0b0 || xa0b1 || xa1b0 || xa1b1".
>> >>
>> >> In addition to removal of exact duplicates introducing scan and removal
>> >> of supersets of other existing sub-expressions to reduce the amount of
>> >> generated flows. This adds extra computations, but should save time
>> >> later, since less flows will be generated.
>> >>
>> >> Example:
>> >>
>> >> "ip4.src == 172.168.0.0/16 <http://172.168.0.0/16>
>> >> <http://172.168.0.0/16 <http://172.168.0.0/16>> &&
>> >> ip4.src!={172.168.13.0/24 <http://172.168.13.0/24>
>> >> <http://172.168.13.0/24 <http://172.168.13.0/24>>, 172.168.15.0/24
>> >> <http://172.168.15.0/24> <http://172.168.15.0/24
>> >> <http://172.168.15.0/24>>}"
>> >>
>> >> Processing of this expression yields 42 flows:
>> >>
>> >> $ ./tests/ovstest test-ovn expr-to-flows <<< "$expr"
>> >>
>> >> ip,nw_src=172.168.0.0/255.255.1.0 <http://172.168.0.0/255.255.1.0>
>> >> <http://172.168.0.0/255.255.1.0 <http://172.168.0.0/255.255.1.0>>
>> >> ip,nw_src=172.168.0.0/255.255.10.0 <http://172.168.0.0/255.255.10.0>
>> >> <http://172.168.0.0/255.255.10.0 <http://172.168.0.0/255.255.10.0>>
>> >> ip,nw_src=172.168.0.0/255.255.12.0 <http://172.168.0.0/255.255.12.0>
>> >> <http://172.168.0.0/255.255.12.0 <http://172.168.0.0/255.255.12.0>>
>> >> ip,nw_src=172.168.0.0/255.255.3.0 <http://172.168.0.0/255.255.3.0>
>> >> <http://172.168.0.0/255.255.3.0 <http://172.168.0.0/255.255.3.0>>
>> >> ip,nw_src=172.168.0.0/255.255.4.0 <http://172.168.0.0/255.255.4.0>
>> >> <http://172.168.0.0/255.255.4.0 <http://172.168.0.0/255.255.4.0>>
>> >> ip,nw_src=172.168.0.0/255.255.5.0 <http://172.168.0.0/255.255.5.0>
>> >> <http://172.168.0.0/255.255.5.0 <http://172.168.0.0/255.255.5.0>>
>> >> ip,nw_src=172.168.0.0/255.255.6.0 <http://172.168.0.0/255.255.6.0>
>> >> <http://172.168.0.0/255.255.6.0 <http://172.168.0.0/255.255.6.0>>
>> >> ip,nw_src=172.168.0.0/255.255.8.0 <http://172.168.0.0/255.255.8.0>
>> >> <http://172.168.0.0/255.255.8.0 <http://172.168.0.0/255.255.8.0>>
>> >> ip,nw_src=172.168.0.0/255.255.9.0 <http://172.168.0.0/255.255.9.0>
>> >> <http://172.168.0.0/255.255.9.0 <http://172.168.0.0/255.255.9.0>>
>> >> ip,nw_src=172.168.128.0/17 <http://172.168.128.0/17>
>> >> <http://172.168.128.0/17 <http://172.168.128.0/17>>
>> >> <... 32 more flows ...>
>> >>
>> >> We can see that many flows above do overlap, e.g. 255.255.3.0
>> >> mask is a superset of 255.255.1.0. Everything that matches
>> >> 255.255.3.0, will match 255.255.1.0 as well (the value is the same).
>> >>
>> >> By removing all the unnecessary supersets, the set of flows can
>> >> be reduced from 42 down to 7:
>> >>
>> >> ip,nw_src=172.168.0.0/255.255.1.0 <http://172.168.0.0/255.255.1.0>
>> >> <http://172.168.0.0/255.255.1.0 <http://172.168.0.0/255.255.1.0>>
>> >> ip,nw_src=172.168.0.0/255.255.4.0 <http://172.168.0.0/255.255.4.0>
>> >> <http://172.168.0.0/255.255.4.0 <http://172.168.0.0/255.255.4.0>>
>> >> ip,nw_src=172.168.0.0/255.255.8.0 <http://172.168.0.0/255.255.8.0>
>> >> <http://172.168.0.0/255.255.8.0 <http://172.168.0.0/255.255.8.0>>
>> >> ip,nw_src=172.168.128.0/17 <http://172.168.128.0/17>
>> >> <http://172.168.128.0/17 <http://172.168.128.0/17>>
>> >> ip,nw_src=172.168.16.0/255.255.16.0 <http://172.168.16.0/255.255.16.0>
>> >> <http://172.168.16.0/255.255.16.0 <http://172.168.16.0/255.255.16.0>>
>> >> ip,nw_src=172.168.32.0/255.255.32.0 <http://172.168.32.0/255.255.32.0>
>> >> <http://172.168.32.0/255.255.32.0 <http://172.168.32.0/255.255.32.0>>
>> >> ip,nw_src=172.168.64.0/255.255.64.0 <http://172.168.64.0/255.255.64.0>
>> >> <http://172.168.64.0/255.255.64.0 <http://172.168.64.0/255.255.64.0>>
>> >>
>> >> This change should be particularly useful for expressions with
>> >> inequality checks, like the one above. Such expressions are
>> >> frequent among ACL rules.
>> >>
>> >> "ip4.src != {172.168.13.0/24 <http://172.168.13.0/24>
>> >> <http://172.168.13.0/24 <http://172.168.13.0/24>>, 172.168.14.0/24
>> >> <http://172.168.14.0/24> <http://172.168.14.0/24
>> >> <http://172.168.14.0/24>>, 172.168.15.0/24 <http://172.168.15.0/24>
>> >> <http://172.168.15.0/24 <http://172.168.15.0/24>>}"
>> >>
>> >> Brefore:
>> >> $ ./tests/ovstest test-ovn expr-to-flows <<< "$expr" | wc -l
>> >> 2894
>> >>
>> >> After:
>> >> $ ./tests/ovstest test-ovn expr-to-flows <<< "$expr" | wc -l
>> >> 23
>> >>
>> >> Superset lookups are performed only if there are expressions with
>> >> more bits in the mask than in the current one. So, there is no extra
>> >> cost for equality checks on normal address sets, like port group sets,
>> >> where all the IPs are exect matches or have the same prefix length
>> >> otherwise.
>> >>
>> >> Use of bitmaps instead of subvalue functions significantly speeds up
>> >> processing since most of the subvalue space is an all-zero empty space.
>> >>
>> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2177197
>> >> <https://bugzilla.redhat.com/show_bug.cgi?id=2177197>
>> >> <https://bugzilla.redhat.com/show_bug.cgi?id=2177197
>> >> <https://bugzilla.redhat.com/show_bug.cgi?id=2177197>>
>> >> Reported-by: Nadia Pinaeva <[email protected]
>> >> <mailto:[email protected]> <mailto:[email protected]
>> >> <mailto:[email protected]>>>
>> >> Signed-off-by: Ilya Maximets <[email protected]
>> >> <mailto:[email protected]> <mailto:[email protected]
>> >> <mailto:[email protected]>>>
>> >> ---
>> >> include/ovn/expr.h | 1 +
>> >> lib/expr.c | 238 ++++++++++++++++++++++++++++++++++-----------
>> >> tests/ovn.at <http://ovn.at> <http://ovn.at <http://ovn.at>> | 12
>> >> +++
>> >> 3 files changed, 196 insertions(+), 55 deletions(-)
>> >>
<snip>
>> > I think the algorithm is very smart. However, as you mentioned in the
>> > meeting today, the worst case of this nested loop can still be bad. For
>> > example, if we have an address set with 1000 addresses (not very big), but
>> > half of them are with the mask of /24, and the other half are with /32,
>> > without any overlapping, then it would be O(500 * 500) time complexity.
>> > This example may be too extreme, but in reality, I think we shouldn't be
>> > surprised to see someone using an address set with 10 CIDRs of /24
>> > together with another 1000 individual IPs, and the algorithm would make it
>> > 10 times slower than before. We should at least explicitly document this
>> > inefficient use of address sets and remind users to avoid the pitfalls,
>> > and in turn, document or optimize in CMS (k8s, openstack, ...) to avoid
>> > such usage.
>>
>> We could document, sure. One thing worth mentioning is that we're
>> still talking about sub-millisecond times here. I did some tests
>> and normalization of the 10 CIDRs of /24 + 1000 individual IPs case
>> takes about 11 microseconds.
>
> Based on the test result below I guess you are talking about 10 + 100
> (instead of 10 + 1000) here for 11 us.
Yep, sorry. I messed up the total number. The table below is
correct though (it says 10+100). I updated it with 10+1000 results.
>
>> The "too extreme" 500+500 case takes
>> 700 microseconds, i.e. 0.7 ms. In comparison, if we do not look
>> for supersets, the same operations take 8 and 100 microseconds
>> respectively.
>>
>> There is an obvious difference, especially in the 500+500 case.
>> I'm just not sure how much real impact we have as such big manually
>> created sets shouldn't really appear in way too many logical flows,
>> so we're probably talking about aggregated full recompute cost of
>> tens of milliseconds, maybe a hundred in extreme cases.
>>
>> Just for fun I also tested the first patch from v2 and it is
>> horrible. :) It scores 0.3 and 26 ms respectively.
>> v3 is 30-40 times faster.
>>
>> 10+100 500+500
>> ------- --------
>> main 8 us 100 us
>> v2 300 us 26000 us
>> v3 11 us 700 us
>>
>
> Yes, the result of v3 looks not too bad even in the worst case.
> I meant to say with the same total number of addresses, e.g. 1000, the
> distribution of masked and unmasked addresses (1 + 999 v.s. 10 + 990 v.s. 500
> + 500) would impact the result significantly.
The updated table with the intended sizes:
10+1000 250+750 500+500
-------- -------- --------
main 100 us 100 us 100 us
v2 18100 us 19500 us 26000 us
v3 140 us 600 us 700 us
> The worst case of main is O(n), v2 is O(n * n), and v3 is O(n/2 * n/2).
> However, I didn't understand why the test result showed a different ratio.
> v2's result is a little faster than O(n * n) but still in the order of
> magnitude expected, compared with main. But v3 is only 7 times slower than
> main, while I'd expect it to be 250 times slower than main, or 4 times faster
> than v2. Probably I missed something that could have made the v3
> significantly faster than I expected. Do you have any insight here?
The main difference, I think, is a switch from mf_subvalue_intersect()
in v2 to expr_bitmap_intersect_check() in v3. The latter is about 10x
faster in my testing. While it's true that we do a lot of iterations,
250K iterations is nothing for a modern CPU as long as operations inside
the loop are cheap. 700 * 4 * 10 = 28000, so more or less checks out,
at least for that particular case.
Comparisons with the main are also tricky, because we're comparing
fairly different workloads. Also, we might have some cache related
effects with different sizes and different order of operations.
>
> Consider a reasonably large scale address set which may be 10k addresses,
> among them (for whatever reason) 1000 of the addresses are masked CIDRs, the
> cost is still within tens of ms, which seems fine.
Yes, about 40 ms in my tests for a 1000+9000 case.
>
> However, I think a more important impact is that if an address set contains
> both CIDRs and individual IPs with some overlapping - it may happen in
> reality for a user managed address set when it is not very well managed,
> meaning the admin didn't take the effort to remove IPs that is covered by
> some of the CIDRs probably because they didn't know that matters. In this
> case, the I-P of address set won't work, and it is a much bigger cost (not
> because of the algorithm here but because of the other part of the lflow
> reprocessing).
Still a good point.
>
> So I still think it is better to contain the "superset" logic for negation
> expr handling only. But I don't have a very strong opinion on that, if you
> see other difficulties of doing that (in that case we will have to leave that
> burden to the users). What do you think?
OK, I'll restrict the superset searching for now as it may be a problem
in some weird corner cases. In general, I think, we need some better
address set tracking in I-P to avoid breaking it if the set changed.
Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev