On 3/22/21 1:10 PM, gmingchen(陈供明) wrote:
>
>
> On 2021/3/19, 9:02 PM, "Dumitru Ceara" <[email protected]> wrote:
>
> On 3/10/21 2:29 PM, gmingchen(陈供明) wrote:
> > From: Gongming Chen <[email protected]>
> >
> > This patch merges ipv4 addresses with wildcard masks, and replaces this
> > ipv4 addresses with the combined ip/mask. This will greatly reduce the
> > entries in the ovs security group flow table, especially when the host
> > size is large.
> >
> > Analysis in the simplest scenario, a network 1.1.1.0/24 network,create
> 253
> > ports(1.1.1.2-1.1.1.254).
> > Only focus on the number of ip addresses, the original 253 addresses
> will
> > be combined into 13 addresses.
> > 1.1.1.2/31
> > 1.1.1.4/30
> > 1.1.1.8/29
> > 1.1.1.16/28
> > 1.1.1.32/27
> > 1.1.1.64/26
> > 1.1.1.128/26
> > 1.1.1.192/27
> > 1.1.1.224/28
> > 1.1.1.240/29
> > 1.1.1.248/30
> > 1.1.1.252/31
> > 1.1.1.254
> >
> > Some scenes are similar to the following:
> > 1.1.1.2, 1.1.1.6
> > After the combine:
> > 1.1.1.2/255.255.255.251
> > You can use ovn-match-ip utility to match ip.
> > such as:
> > ovs-ofctl dump-flows br-int | ovn-match-ip 1.1.1.6
> > 1.1.1.2/255.255.255.251 will show.
> >
> > Simple description of the algorithm.
> > There are two situations
> > 1. Combine once
> > such as:
> > 1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1
> > Combined into: 1.1.1.0/31, 1.0.1.0/31
> > 2. Combine multiple times
> > 1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1
> > Combined into: 1.0.1.0/255.254.255.254
> >
> > Considering the actual scene and simplicity, the first case is used to
> > combine once.
> >
> > ...00...
> > ...01...
> > ...10...
> > ...11...
> > "..." means the same value omitted.
> > Obviously, the above value can be expressed as ...00.../11100111. This
> > continuous interval that can be represented by one or several wildcard
> > masks is called a segment.
> > Only if all 2<<n values from the minimum value 00...(n) to 11...(n)
> > exist, can they be combined into 00...(n)/00...( n)
> >
> > First sort all the values by size. Iterate through each value.
> > 1. Find a new segment, where two values differ only by 1 bit, such as
> > ...0... and ...1...
> > diff = ip_next ^ ip
> > if (diff & (diff-1)) == 0
> > new_segment = true
> > The first non-zero place in the high direction of ip is the end of the
> > segment(segment_end).
> > For example...100... and...101..., the segment_end is ...111...
> >
> > 2. Count the number of consecutive and less than continuous_size in the
> > segment.
> > diff = ip_next - ip
> > if (diff & (diff-1)) == 0 && ip_next <= segment_end
> > continuous_size++
> >
> > 3. Combine different ip intervals in the segment according to
> > continuous_size.
> > In continuous_size, from the highest bit of 1 to the lowest bit of 1, in
> > the order of segment start, each bit that is 1 is a different ip
> interval
> > that can be combined with a wildcard mask.
> > For example, 000, 001, 010:
> > continuous_size: 3 (binary 11), segment_start: 000
> > mask: ~(1 << 1 - 1) = 110; ~(1 << 0 - 1) = 111;
> > Combined to: 000/110, 010/111
> >
> > 4. The ip that cannot be recorded in a segment will not be combined.
> >
> > Signed-off-by: Gongming Chen <[email protected]>
> > ---
>
> Hi Gongming,
>
> Sorry for the delayed review.
>
> I have a few general remarks/concerns and some specific comments inline.
> First, the general remarks.
>
> I'm wondering if it would make more sense for this wildcard combination
> of IPs to be done outside OVN, in the CMS, for the following reasons:
>
> - it's IPv4 specific.
> - the CMS usually has more information about when it is matching on IP
> sets and can probably optimize the match it uses in the ACL.
> - the code in expr_const_sets_add() used to be a relatively direct
> translation from sets of constants to sets of port names/IPs; this
> changes with the optimization your proposing.
> - the new utility, ovn-match-ip, would be useful but I'm worried that
> users will often forget to use it.
>
> I'm curious about your and the maintainers' opinion on these points.
>
> Hi Dumitru,
> First, thanks for your review.
>
> On the one hand, I agree with your point of view. The cms side is more
> clear about the details of the ip, and the ip can be better optimized.
> However, on the other hand, I think that ovn, as a network provider, should
> not assume that upper layers such as cms have already made these
> optimizations.
> In fact, some cms do not have such a function, such as openstack, k8s,
> and they hope that this specific and complex work will be realized by
> the network provider.
> In terms of better serving the upper layer or upper layer use, this
> optimization function of ovn simplifies the upper layer work, which
> will just make users more satisfied.
> This is some of my views from the perspective of cms.
I see, I'll let Numan, Mark, and others share their opinions on this too.
>
> Another general remark is to increment the patch version when submitting
> a new revision. From what I can see this specific patch is at v7. When
> sending a v8 you could add "-v8" when generating the patch, e.g.:
>
> git format-patch --subject-prefix="PATCH ovn" -v8 -M origin/master
>
> Would generate a patch with subject:
>
> "Subject: [PATCH ovn v8] ...."
>
> This usually helps when reviewing patches that go through multiple
> iterations.
>
> Thanks.
>
> More specific comments inline.
>
> > debian/ovn-common.install | 1 +
> > lib/expr.c | 219 ++++++++++++++++++++++++++++++++++++++
> > rhel/ovn-fedora.spec.in | 1 +
> > tests/ovn.at | 136 +++++++++++------------
> > tests/test-ovn.c | 9 ++
> > utilities/automake.mk | 5 +-
> > utilities/ovn-match-ip.in | 78 ++++++++++++++
> > 7 files changed, 382 insertions(+), 67 deletions(-)
> > create mode 100755 utilities/ovn-match-ip.in
> >
> > diff --git a/debian/ovn-common.install b/debian/ovn-common.install
> > index 8e5915724..0095df918 100644
> > --- a/debian/ovn-common.install
> > +++ b/debian/ovn-common.install
> > @@ -5,6 +5,7 @@ usr/bin/ovn-ic-nbctl
> > usr/bin/ovn-ic-sbctl
> > usr/bin/ovn-trace
> > usr/bin/ovn-detrace
> > +usr/bin/ovn-match-ip
> > usr/share/ovn/scripts/ovn-ctl
> > usr/share/ovn/scripts/ovndb-servers.ocf
> > usr/share/ovn/scripts/ovn-lib
> > diff --git a/lib/expr.c b/lib/expr.c
> > index f061a8fbe..0a7203817 100644
> > --- a/lib/expr.c
> > +++ b/lib/expr.c
> > @@ -1060,6 +1060,200 @@ expr_constant_set_destroy(struct
> expr_constant_set *cs)
> > }
> > }
> >
> > +struct ip_in
> > +{
> > + uint32_t *ip;
> > + size_t size;
> > +};
>
> I don't think we really need this, we could just pass the IPv4 address
> array to combine_ipv4_in_mask(), along with the number of items in the
> array.
>
> Yes, in combine_ipv4_in_mask(), size of ipv4 array is not necessary.
> But I think that the structure composed of *ip and size is more helpful for
> expr_const_sets_add to initialize an array memory of n_values size instead
> of accurately malloc the entry size of LEX_F_IPV4.
> Because expr_const_sets_add does not know the number of LEX_F_IPV4
> when initializing the ipv4 array.
> What do you think?
In expr_const_sets_add() you currently allocate 'n_values' entries in
ip_in_data.ip and then update ip_in_data.size to the exact number of
non-masked IPv4 tokens.
I don't see a need for 'size' to be stored in 'ip_in_data', it can just
be local to expr_const_sets_add() and passed as argument to
combine_ipv4_in_mask().
>
> > +
> > +struct ip_out_entry
> > +{
> > + uint32_t ip;
> > + uint32_t mask;
> > + bool masked;
> > +};
> > +
> > +struct ip_out
> > +{
> > + struct ip_out_entry *entries;
> > + size_t used;
> > + size_t size;
> > +};
>
> AFAIU, we can never have more "out" entries than "in" entries. This
> means we could be conservative and always allocate as many output
> entries as ip_in entries and pass the preallocated struct ip_out_entry
> array to combine_ipv4_in_mask().
>
> This would also remove the need to resize the "out" array.
>
> Yes, good idea.
>
> > +
> > +static int
> > +compare_mask_ip(const void *a, const void *b)
>
> The implementation doesn't really use the fact that 'a' and 'b' are IPs.
> They're just compared as integers. We might want to use a different
> name and move this function to a common module as it could potentially
> be used in multiple places in the future.
>
> How about adding compare_uint32s(const void *a_, const void *b_) to
> lib/util.c?
Sounds good to me. I guess you mean lib/ovn-util.c though.
>
> > +{
> > + uint32_t a_ = *(uint32_t *)a;
> > + uint32_t b_ = *(uint32_t *)b;
> > +
> > + return a_ < b_ ? -1 : a_ > b_;
> > +}
> > +
> > +/* Function to check ip return data and xrealloc. */
> > +static void
> > +check_realloc_ip_out(struct ip_out *ip_out_data){
> > + if (ip_out_data->used >= ip_out_data->size) {
> > + ip_out_data->entries = x2nrealloc(ip_out_data->entries,
> > + &ip_out_data->size,
> > + sizeof *ip_out_data->entries);
> > + }
> > +}
>
> As mentioned in the comment earlier, we can probably avoid using this
> function.
>
> Yes.
>
> > +
> > +/* Simple description of the algorithm.
> > + * There are two situations
> > + * 1. Combine once
> > + * such as:
> > + * 1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1
> > + * Combined into: 1.1.1.0/31, 1.0.1.0/31
> > + * 2. Combine multiple times
> > + * 1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1
> > + * Combined into: 1.0.1.0/255.254.255.254
> > + *
> > + * Considering the actual scene and simplicity, the first case is used
> to
> > + * combine once.
> > + *
> > + * ...00...
> > + * ...01...
> > + * ...10...
> > + * ...11...
> > + * "..." means the same value omitted.
> > + * Obviously, the above value can be expressed as ...00.../11100111.
> This
> > + * continuous interval that can be represented by one or several
> wildcard
> > + * masks is called a segment.
> > + * Only if all 2<<n values from the minimum value 00...(n) to 11...(n)
> > + * exist, can they be combined into 00...(n)/00...( n)
> > + *
> > + * First sort all the values by size. Iterate through each value.
> > + * 1. Find a new segment, where two values differ only by 1 bit, such
> as
> > + * ...0... and ...1...
> > + * diff = ip_next ^ ip
> > + * if (diff & (diff-1)) == 0
> > + * new_segment = true
> > + * The first non-zero place in the high direction of ip is the end of
> the
> > + * segment(segment_end).
> > + * For example...100... and...101..., the segment_end is ...111...
> > + *
> > + * 2. Count the number of consecutive and less than continuous_size in
> the
> > + * segment.
> > + * diff = ip_next - ip
> > + * if (diff & (diff-1)) == 0 && ip_next <= segment_end
> > + * continuous_size++
> > + *
> > + * 3. Combine different ip intervals in the segment according to
> > + * continuous_size.
> > + * In continuous_size, from the highest bit of 1 to the lowest bit of
> 1, in
> > + * the order of segment start, each bit that is 1 is a different ip
> interval
> > + * that can be combined with a wildcard mask.
> > + * For example, 000, 001, 010:
> > + * continuous_size: 3 (binary 11), segment_start: 000
> > + * mask: ~(1 << 1 - 1) = 110; ~(1 << 0 - 1) = 111;
> > + * Combined to: 000/110, 010/111
> > + *
> > + * 4. The ip that cannot be recorded in a segment will not be
> combined. */
>
> Thanks for adding the detailed description here. However, it's a bit
> confusing because the names used in the description don't match the
> variable names used in the implementation below. Maybe it's better to
> not use names in the description?
>
> Ok, It would be better to replace it with a general description word.
Thanks.
>
> > +static void
> > +combine_ipv4_in_mask(struct ip_in *ip_in_data, struct ip_out
> *ip_out_data){
>
> Please move the curly brace on a new line.
>
> Ok, thanks.
>
> > + bool recording = false;
> > + uint32_t i, diff, connect, start, end, continuous_size, mask_base;
>
> Some of these can be moved in the specific blocks where they're used.
>
> Ok.
>
> > +
> > + start = 0;
> > + continuous_size = 0;
> > + mask_base = 0;
> > + end = 0;
> > +
> > + qsort(ip_in_data->ip, ip_in_data->size, sizeof(uint32_t),
>
> Please use "sizeof *ip_in_data->ip".
>
> Ok, thanks.
>
> > + compare_mask_ip);
> > + memset(ip_out_data, 0, sizeof(struct ip_out));
> > +
> > + for (i = 0; i < ip_in_data->size; i++) {
> > + if (i + 1 >= ip_in_data->size) {
> > + if (!recording) {
> > + goto end_when_not_recording;
> > + } else {
> > + goto end_when_recording;
> > + }
>
> IMO it makes the code very hard to read if we have gotos jumping inside
> if/else branches or inside loops. This is also usually a sign that
> we're trying to avoid code duplication. A potentially simpler and more
> readable way to do that is to add functions for the specific operations
> we don't want to duplicate.
>
> Ok, I will use function instead of goto.
Cool.
>
> Thanks,
> Gongming
Thanks,
Dumitru
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev