Re: [ovs-dev] [PATCH 2/2] openflow: Add extension to flush CT by generic match

2022-12-05 Thread Ales Musil
On Mon, Dec 5, 2022 at 10:38 PM Ilya Maximets  wrote:

> On 12/1/22 14:19, Ales Musil wrote:
> > +
> > +static enum ofperr
> > +handle_nxt_ct_flush(struct ofconn *ofconn, const struct ofp_header
> *oh)
> > +{
> > +struct ofproto *ofproto = ofconn_get_ofproto(ofconn);
> > +struct ofputil_ct_match match = {0};
> > +uint16_t zone_id;
> > +
> > +ofputil_ct_match_decode(, _id, oh);
> > +
> > +if (ofproto->ofproto_class->ct_flush) {
> > +ofproto->ofproto_class->ct_flush(ofproto, _id, );
> >
> >
> > I've realized that v1 doesn't allow zone_id being NULL. However I'm not
> sure how to put that information into the extension struct.
> > I'm open to any suggestion, I was thinking about flags field, which
> would grow the whole struct by 4 bytes.
>
>
> IIUC, you're talking about OpenFlow interface that you created
> requiring zone_id to be provided, right?
>

Yes.


>
> Optional arguments and multiple arguments with mixed order are
> usually solved by using TLVs.  You may shrink down the
> 'struct nx_ct_flush' structure to only mandatory elements and
> make all the 'match' fields including zone_id as TLVs with the
> help of include/openvswitch/ofp-prop.h.
>

That sounds good. I'll look into it, thanks.


>
> One more thing: I don't think we should add dpctl interface for
> the flushing.  We don't have such interface for ct-flush-zone
> so we should not have it for ct-flush.  Instead, what is missing
> in your implementation, is the native OpenFlow interface via
> ovs-ofctl, i.e. 'ovs-ofctl ct-flush' command.  And all the tests
> should use it instead.  This way we will also have test coverage
> for the code that will actually be used by OVN/CMS.
>

The problem is that the dpctl interface actually already exists, and we
shouldn't
just remove it, this wouldn't make sense IMO. The first patch just extends
that interface.
Adding the ovs-ofctl makes, I'll work on it for v2.


>
> Best regards, Ilya Maximets.
>
>
Thanks,
Ales.

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA 

amu...@redhat.comIM: amusil

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] openflow: Add extension to flush CT by generic match

2022-12-05 Thread Ilya Maximets
On 12/1/22 14:19, Ales Musil wrote:
> +
> +static enum ofperr
> +handle_nxt_ct_flush(struct ofconn *ofconn, const struct ofp_header *oh)
> +{
> +    struct ofproto *ofproto = ofconn_get_ofproto(ofconn);
> +    struct ofputil_ct_match match = {0};
> +    uint16_t zone_id;
> +
> +    ofputil_ct_match_decode(, _id, oh);
> +
> +    if (ofproto->ofproto_class->ct_flush) {
> +        ofproto->ofproto_class->ct_flush(ofproto, _id, );
> 
> 
> I've realized that v1 doesn't allow zone_id being NULL. However I'm not sure 
> how to put that information into the extension struct.
> I'm open to any suggestion, I was thinking about flags field, which would 
> grow the whole struct by 4 bytes.


IIUC, you're talking about OpenFlow interface that you created
requiring zone_id to be provided, right?

Optional arguments and multiple arguments with mixed order are
usually solved by using TLVs.  You may shrink down the
'struct nx_ct_flush' structure to only mandatory elements and
make all the 'match' fields including zone_id as TLVs with the
help of include/openvswitch/ofp-prop.h.

One more thing: I don't think we should add dpctl interface for
the flushing.  We don't have such interface for ct-flush-zone
so we should not have it for ct-flush.  Instead, what is missing
in your implementation, is the native OpenFlow interface via
ovs-ofctl, i.e. 'ovs-ofctl ct-flush' command.  And all the tests
should use it instead.  This way we will also have test coverage
for the code that will actually be used by OVN/CMS.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH v2] dpdk: Update to use v22.11.

2022-12-05 Thread Stokes, Ian
> > On Thu, Dec 1, 2022 at 3:24 PM Ilya Maximets  wrote:
>  Frode, do you know the approximate timeline on when we could expect
> >>>
> >>> As explained above, without landing this ... never :-)
> >>> But speaking of timeline - we'd prefer to get this resolved for
> >>> Debian/Ubuntu before most leave for the Christmas break.
> >>> Any chance to get this landed by the end of next  week (9th)?
> >>
> >> Not sure about next week.  I think we're waiting for the 22.11.1 release,
> >> which should be somewhere in the coming days, but I do not know when
> exactly
> >> that will happen.
> >>
> >> Kevin, David, is there an exact date available for that?
> >>
> >> Once the 22.11.1 is out, it should take at most a couple of days to make
> >> some final adjustments and get the patch in.
> >
> > The fix is reviewed and tested by Ferruh and Luca.
> > I'll prepare the 22.11.1 release and tag it tomorrow (unless there is
> > something new).
> 
> OK, sounds good.  I guess, it should be generally possible to apply
> the change in OVS next week, unless something unexpected will happen.
> 
> Ian, what do you think?

Sounds good, I've sent a v4 that uses DPDK 22.11.1 and removes the Debian 
shared DPDK compilation action.

http://patchwork.ozlabs.org/project/openvswitch/patch/20221205213110.19019-1-ian.sto...@intel.com/

Thanks
Ian
> 
> Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4] dpdk: Update to use v22.11.1.

2022-12-05 Thread Ian Stokes
This commit add support to for DPDK v22.11.1, it includes the following
changes.

1. ci: Reduce DPDK compilation time.
2. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=316528

3. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=311332

4. netdev-dpdk: Report device bus specific information.
5. netdev-dpdk: Drop reference to Rx header split.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=321808

In addition documentation was also updated in this commit for use with
DPDK v22.11.1.

The Debian shared DPDK compilation test is removed as part of this patch
due to a packaging requirement. Once DPDK v22.11.1 is available in Debian
repositories it should be re-enabled in OVS.

For credit all authors of the original commits to 'dpdk-latest' with the
above changes have been added as co-authors for this commit

Signed-off-by: David Marchand 
Co-authored-by: David Marchand 
Signed-off-by: Sunil Pai G 
Co-authored-by: Sunil Pai G 
Signed-off-by: Ian Stokes 

---
v3 -> v4
* Rebase to master.
* Update to use DPDK v22.11.1
* Update missed documentation mentioned by David.
* Remove Debian shared DPDK test in test matrix.

v2 -> v3
* Remove RFC status.
* Update debian control to use 22.11.

v1 -> v2
* Updated to use DPDK 22.11 rc4.

* Please Note: Although DPDK documentation has been updated in this patch
the resource has not been updated on the DPDK site as of yet, this will
be expected as part of DPDK 22.11 final release.

* The GitHub actions 'linux deb shared dpdk' is expected to fail with this
patch as DPDK 22.11 is not part of the package structure yet.
---
 .ci/linux-build.sh   |  7 ++-
 .github/workflows/build-and-test.yml |  1 -
 Documentation/faq/releases.rst   |  2 +-
 Documentation/intro/install/dpdk.rst | 16 +++
 Documentation/topics/dpdk/phy.rst|  8 ++--
 Documentation/topics/dpdk/vdev.rst   |  2 +-
 Documentation/topics/dpdk/vhost-user.rst |  2 +-
 Documentation/topics/testing.rst |  2 +-
 Documentation/topics/userspace-tso.rst   |  2 +-
 NEWS | 18 +---
 debian/control.in|  2 +-
 lib/netdev-dpdk.c| 24 --
 rhel/openvswitch-fedora.spec.in  |  2 +-
 tests/system-dpdk.at | 78 
 14 files changed, 73 insertions(+), 93 deletions(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 23c8bbb7a..485109672 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -160,6 +160,11 @@ function install_dpdk()
 # meson verbose outputs.
 DPDK_OPTS="$DPDK_OPTS -Ddeveloper_mode=disabled"
 
+# OVS compilation and "normal" unit tests (run in the CI) do not depend on
+# any DPDK driver being present.
+# We can disable all drivers to save compilation time.
+DPDK_OPTS="$DPDK_OPTS -Ddisable_drivers=*/*"
+
 # Install DPDK using prefix.
 DPDK_OPTS="$DPDK_OPTS --prefix=$(pwd)/build"
 
@@ -228,7 +233,7 @@ fi
 
 if [ "$DPDK" ] || [ "$DPDK_SHARED" ]; then
 if [ -z "$DPDK_VER" ]; then
-DPDK_VER="21.11.2"
+DPDK_VER="22.11.1"
 fi
 install_dpdk $DPDK_VER
 fi
diff --git a/.github/workflows/build-and-test.yml 
b/.github/workflows/build-and-test.yml
index 7baa91403..e08d7b1ba 100644
--- a/.github/workflows/build-and-test.yml
+++ b/.github/workflows/build-and-test.yml
@@ -213,7 +213,6 @@ jobs:
   matrix:
 include:
   - dpdk: no
-  - dpdk: shared
 
 steps:
 - name: checkout
diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index ac0001cd5..e19f54c8f 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -233,7 +233,7 @@ Q: Are all the DPDK releases that OVS versions work with 
maintained?
 The latest information about DPDK stable and LTS releases can be found
 at `DPDK stable`_.
 
-.. _DPDK stable: http://doc.dpdk.org/guides-21.11/contributing/stable.html
+.. _DPDK stable: http://doc.dpdk.org/guides-22.11/contributing/stable.html
 
 Q: I get an error like this when I configure Open vSwitch:
 
diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
index a284e6851..e360ee83d 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -42,7 +42,7 @@ Build requirements
 In addition to the requirements described in :doc:`general`, building Open
 vSwitch with DPDK will require the following:
 
-- DPDK 21.11.2
+- DPDK 22.11.1
 
 - A `DPDK supported NIC`_
 
@@ -59,8 +59,8 @@ vSwitch with DPDK will require the following:
 
 Detailed system requirements can be found at `DPDK requirements`_.
 
-.. _DPDK supported NIC: https://doc.dpdk.org/guides-21.11/nics/index.html
-.. _DPDK requirements: 

Re: [ovs-dev] [PATCH ovn 0/7] OVN IC bugfixes & proposals/questions

2022-12-05 Thread Ilya Maximets
On 12/5/22 17:40, Dumitru Ceara wrote:
> On 12/2/22 18:31, Vladislav Odintsov wrote:
>> Hi,
>>
>> we’ve met with an issue, where it was possible to create multiple similar
>> routes within LR (same ip_prefix, nexthop, and route table).
>>
>> Initially the problem stared after OVN upgrade. We use python ovsdbapp 
>> library,
>> and we found a problem in python-ovs, which is described here
>> https://mail.openvswitch.org/pipermail/ovs-dev/2022-November/399722.html by 
>> my
>> colleague Anton.  @Terry Wilson, please take a look on this.
>>
>> The problem itself touches OVN and OVS.  Sorry for the long read, but it 
>> seems
>> that there are a couple of bugs in different places, part of which this RFC
>> used to cover.
>>
>> How the issue was initially reproduced:
>>
>> 1. assume we have (at least) 2-Availability Zone OVN deployment
>>(utilising ovn-ic infrastructure).
>> 2. create transit switch in IC NB
>> 3. create LR in each AZ, connect them to transit switch
>> 4. create one logical switch with a VIF port attached to local OVS &
>>connect this logical switch to LR (e.g. 192.168.0.1/24)
>> 5. install in one AZ in LR 2 static routes with a create command (invoke
>>next command twice):
>>
>>ovn-nbctl --id=@id create logical-router-static-route 
>> ip_prefix=1.2.3.4/32 nexthop=192.168.0.10 -- logical_router add lr1 
>> static_routes @id
>>
>> From this time there is a couple of strange behaviour/bugs appear:
>>
>> 1. [possible problem] There is a duplicated route in the NB within a
>>single LR.  lflow is computed to have ECMP group with two similar
>>routes:
>>
>>table=11(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && 
>> ip4.dst == 1.2.3.4/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 
>> 1; reg8[16..31] = select(1, 2);
>>table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 1 
>> && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; 
>> eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
>>table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 2 
>> && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; 
>> eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
>>
>>Maybe, it’s better to have some kind of handling such routes?
>>ovsdb index or some logic in ovn-northd?
>>
>> 2. [bug] There is a duplicated route advertisement in
>>OVN_IC_Southbound:Route table.  IMO, this should be fixed by adding a
>>new index to this table for availability_zone, transit_switch,
>>ip_prefix, nexthop and route_table; adding a logic to check if the
>>route was already advertised (covered in Patch #7).
>>
>> 3. [bug] There is a constant same route learning.  Each ovn-ic iteration
>>on the opposite availability zone adds one new same route.  It creates
>>thousands of same routes each second. This bug is covered by Patch #7.
>>
>> 4. [possible problem] After multiple routes are learned to NB on the
>>opposite availability zone, ovn-northd generates ecmp lflows.  Same as
>>in #1: one in lr_in_ip_routing with select()
>>and thousands of same records in lr_in_ip_routing_ecmp.  OVN allows
>>installing UINT_MAX routes within ECMP group.
>>
>> 5. [OVS bug?] I'd like someone from OVS team to see on this.
>>ovn-controller installed long-long openflow group rule
>>(group #3):
>>
>># ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
>>797824
>>
>>When I try to dump groups with ovs-ofctl dump-groups br-int, I get
>>next error in console:
>>
>># ovs-ofctl dump-groups br-int
>>ovs-ofctl: OpenFlow packet receive failed (End of file)
>>
>>In ovs-vswitchd I see next error in logs and after this line ovs is
>>restarted:
>>
>>2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion 
>> start_ofs <= UINT16_MAX failed in ofpmp_postappend()
> 
> This looks like an OVS bug to me.  Ilya, what do you think the best way
> to fix this is?

This might be considered as a bug in OVS.  In any case, OVS
should not crash, but print an error and continue.

I'm not sure what is the best way to fix that, need to look
closer at the code.

However...

>> 7. From this problem with groups-dump I have some questions:
>>1. Is there a limit for a buckets count in group? Or a limit for the
>>   group string length?
>>2. If yes, should OVN limit on its side the count of buckets in a
>>   group? (Patches #4 && #6).

Reading the OpenFlow 1.5 spec, there is a limit on the number
of buckets, but it is derived from the maximum bucket id, which
is close to a 32bit unsigned value.  So, there is no meaningful
limit until you reach 32bit limit, which is unlikely.

But, there are other indirect limits:

1. For the group modification message, the bucket should fit
   into a single OFPT_GROUP_MOD message (struct ofp_group_mod).
   Meaning that each bucket (including actions) cannot take
   more than 

Re: [ovs-dev] [PATCH v3] dpdk: Update to use v22.11.

2022-12-05 Thread Stokes, Ian
> Hi Ian,
> 
> On Wed, Nov 30, 2022 at 4:32 PM Ian Stokes  wrote:
> >
> > This commit add support to for DPDK v22.11, it includes the following
> > changes.
> >
> > 1. ci: Reduce DPDK compilation time.
> > 2. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.
> >
> >http://patchwork.ozlabs.org/project/openvswitch/list/?series=316528
> >
> > 3. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.
> >
> >http://patchwork.ozlabs.org/project/openvswitch/list/?series=311332
> >
> > 4. netdev-dpdk: Report device bus specific information.
> > 5. netdev-dpdk: Drop reference to Rx header split.
> >
> >http://patchwork.ozlabs.org/project/openvswitch/list/?series=321808
> >
> > In addition documentation was also updated in this commit for use with
> > DPDK v22.11.
> >
> > For credit all authors of the original commits to 'dpdk-latest' with the
> > above changes have been added as co-authors for this commit
> >
> > Signed-off-by: David Marchand 
> > Co-authored-by: David Marchand 
> > Signed-off-by: Sunil Pai G 
> > Co-authored-by: Sunil Pai G 
> > Signed-off-by: Ian Stokes 
> >
> > ---
> > v2 -> v3
> > * Remove RFC status.
> > * Update debian control to use 22.11.
> >
> > v1 -> v2
> > * Updated to use DPDK 22.11 rc4.
> >
> > * Please Note: Although DPDK documentation has been updated in this patch
> > the resource has not been updated on the DPDK site as of yet, this will
> > be expected as part of DPDK 22.11 final release.
> >
> > * The GitHub actions 'linux deb shared dpdk' is expected to fail with this
> > patch as DPDK 22.11 is not part of the package structure yet.
> 
> 
> > ---
> >  .ci/linux-build.sh   |  7 ++-
> >  Documentation/faq/releases.rst   |  2 +-
> >  Documentation/intro/install/dpdk.rst | 16 +++---
> >  Documentation/topics/dpdk/phy.rst|  8 +--
> 
> We are missing some updates in the documentation:
> 
> Documentation/topics/dpdk/vdev.rst:__
> https://doc.dpdk.org/guides-21.11/nics/overview.html
> Documentation/topics/dpdk/vhost-user.rst: 21.11/prog_guide/vhost_lib.html>`__
> Documentation/topics/testing.rst:.. _Configure hugepages:
> https://doc.dpdk.org/guides-21.11/linux_gsg/sys_reqs.html
> Documentation/topics/userspace-tso.rst:__
> https://doc.dpdk.org/guides-21.11/nics/overview.html
> 
> The rest lgtm.

Thanks for the catch David, I've updated these in the latest revision.

Thanks
Ian

> 
> 
> --
> David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 6/7] northd: limit ECMP group by 1024 members

2022-12-05 Thread Vladislav Odintsov
Saying more, the OVS register, which is used to store the bucket ID is a 16-bit 
reg8[16..31], which was introduced by Han.
@Han, can you clarify if it was planned to have 2^16 ECMP paths within one 
group (route)?
Or maybe this is not needed to have such a bit IDs space and we can give more 
space for the routes instead of paths?

Regards,
Vladislav Odintsov

> On 5 Dec 2022, at 21:14, Vladislav Odintsov  wrote:
> 
> It’s a good idea.
> But one thing is that this is not the only one place where the buckets are 
> created.
> Also they’re created in LBs. Should we just put some common function, which 
> returns the current configured (or default MAX) and use it in every place?
> 
> Regards,
> Vladislav Odintsov
> 
>> On 5 Dec 2022, at 19:37, Dumitru Ceara > > wrote:
>> 
>> On 12/2/22 18:31, Vladislav Odintsov wrote:
>>> This patch is intended to show that currently it's possible to build
>>> ECMP group of 65k buckets.
>>> 
>>> Signed-off-by: Vladislav Odintsov >> >
>>> ---
>>> northd/northd.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index e1f3bace8..f8f7977ae 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -9271,7 +9271,7 @@ static void
>>> ecmp_groups_add_route(struct ecmp_groups_node *group,
>>>   const struct parsed_route *route)
>>> {
>>> -if (group->route_count == UINT16_MAX) {
>>> +if (group->route_count == 1024) {
>>> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>>> VLOG_WARN_RL(, "too many routes in a single ecmp group.");
>>> return;
>> 
>> Should we make the limit configurable?  What if the CMS wants to install
>> a route with more than 1K paths?  Not sure if that's realistic but I
>> would avoid the hardcoded 1K.
>> 
>> Thanks,
>> Dumitru
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 6/7] northd: limit ECMP group by 1024 members

2022-12-05 Thread Vladislav Odintsov
It’s a good idea.
But one thing is that this is not the only one place where the buckets are 
created.
Also they’re created in LBs. Should we just put some common function, which 
returns the current configured (or default MAX) and use it in every place?

Regards,
Vladislav Odintsov

> On 5 Dec 2022, at 19:37, Dumitru Ceara  wrote:
> 
> On 12/2/22 18:31, Vladislav Odintsov wrote:
>> This patch is intended to show that currently it's possible to build
>> ECMP group of 65k buckets.
>> 
>> Signed-off-by: Vladislav Odintsov 
>> ---
>> northd/northd.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/northd/northd.c b/northd/northd.c
>> index e1f3bace8..f8f7977ae 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -9271,7 +9271,7 @@ static void
>> ecmp_groups_add_route(struct ecmp_groups_node *group,
>>   const struct parsed_route *route)
>> {
>> -if (group->route_count == UINT16_MAX) {
>> +if (group->route_count == 1024) {
>> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>> VLOG_WARN_RL(, "too many routes in a single ecmp group.");
>> return;
> 
> Should we make the limit configurable?  What if the CMS wants to install
> a route with more than 1K paths?  Not sure if that's realistic but I
> would avoid the hardcoded 1K.
> 
> Thanks,
> Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 5/7] ic: minor code improvements

2022-12-05 Thread Vladislav Odintsov


Regards,
Vladislav Odintsov

> On 5 Dec 2022, at 19:37, Dumitru Ceara  wrote:
> 
> On 12/2/22 18:31, Vladislav Odintsov wrote:
>> 1. Remove excess nbrec_logical_router variable.
>> 2. Remove excess call to add_static_to_routes_ad().
>> 3. Remove double route_table check in ic_route_fin().
> 
> Nit: s/route_table/nexthop/
> 
>> 4. Move variable declarations out of loop.
>> 
>> Signed-off-by: Vladislav Odintsov 
>> ---
>> ic/ovn-ic.c | 31 ++-
>> 1 file changed, 10 insertions(+), 21 deletions(-)
>> 
>> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
>> index 3e02b4c98..59468545d 100644
>> --- a/ic/ovn-ic.c
>> +++ b/ic/ovn-ic.c
>> @@ -895,8 +895,7 @@ ic_route_find(struct hmap *routes, const struct in6_addr 
>> *prefix,
>> r->plen == plen &&
>> ipv6_addr_equals(>nexthop, nexthop) &&
>> !strcmp(r->origin, origin) &&
>> -!strcmp(r->route_table ? r->route_table : "", route_table) &&
>> -ipv6_addr_equals(>nexthop, nexthop)) {
>> +!strcmp(r->route_table ? r->route_table : "", route_table)) {
>> return r;
>> }
>> }
>> @@ -1109,17 +1108,8 @@ add_static_to_routes_ad(
>> struct hmap *routes_ad,
>> const struct nbrec_logical_router_static_route *nb_route,
>> const struct lport_addresses *nexthop_addresses,
>> -const struct smap *nb_options, const char *route_table)
>> +const struct smap *nb_options)
>> {
>> -if (strcmp(route_table, nb_route->route_table)) {
>> -if (VLOG_IS_DBG_ENABLED()) {
>> -VLOG_DBG("Skip advertising route %s -> %s as its route table %s 
>> !="
>> - " %s of TS port", nb_route->ip_prefix, 
>> nb_route->nexthop,
>> - nb_route->route_table, route_table);
>> -}
>> -return;
>> -}
>> -
>> struct in6_addr prefix, nexthop;
>> unsigned int plen;
>> if (!parse_route(nb_route->ip_prefix, nb_route->nexthop,
>> @@ -1541,13 +1531,13 @@ build_ts_routes_to_adv(struct ic_context *ctx,
>> {
>> const struct nbrec_logical_router *lr = ic_lr->lr;
>> 
>> +const struct nbrec_logical_router_static_route *nb_route;
> 
> I'm not sure I agree with this one.  Why not keep it inside the for loop
> below.  We don't use 'nb_route' afterwards AFAICT.

Agree. I’ll revert this change in v2. Also I’ll revert uuid variable movement.

> 
>> +struct uuid id;
>> +
>> /* Check static routes of the LR */
>> for (int i = 0; i < lr->n_static_routes; i++) {
>> -const struct nbrec_logical_router_static_route *nb_route
>> -= lr->static_routes[i];
>> -struct uuid isb_uuid;
>> -if (smap_get_uuid(_route->external_ids, "ic-learned-route",
>> -  _uuid)) {
>> +nb_route = lr->static_routes[i];
>> +if (smap_get_uuid(_route->external_ids, "ic-learned-route", 
>> )) {
>> /* It is a learned route */
>> if (!add_to_routes_learned(_lr->routes_learned, nb_route)) {
>> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 
>> 1);
>> @@ -1557,10 +1547,10 @@ build_ts_routes_to_adv(struct ic_context *ctx,
>> nbrec_logical_router_update_static_routes_delvalue(lr,
>> nb_route);
>> }
>> -} else {
>> +} else if (!strcmp(ts_route_table, nb_route->route_table)) {
>> /* It may be a route to be advertised */
>> add_static_to_routes_ad(routes_ad, nb_route, ts_port_addrs,
>> -_global->options, ts_route_table);
>> +_global->options);
>> }
>> }
>> 
>> @@ -1593,7 +1583,6 @@ advertise_lr_routes(struct ic_context *ctx,
>> const struct icsbrec_port_binding *isb_pb;
>> const char *lrp_name, *ts_name, *route_table;
>> struct lport_addresses ts_port_addrs;
>> -const struct nbrec_logical_router *lr = ic_lr->lr;
>> const struct icnbrec_transit_switch *key;
>> 
>> struct hmap routes_ad = HMAP_INITIALIZER(_ad);
>> @@ -1611,7 +1600,7 @@ advertise_lr_routes(struct ic_context *ctx,
>> VLOG_INFO_RL(, "Route sync ignores port %s on ts %s for 
>> router"
>>  " %s because the addresses are invalid.",
>>  isb_pb->logical_port, isb_pb->transit_switch,
>> - lr->name);
>> + ic_lr->lr->name);
>> continue;
>> }
>> lrp_name = get_lrp_name_by_ts_port_name(ctx, isb_pb->logical_port);
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 4/7] actions: limit possible OF group bucket count

2022-12-05 Thread Vladislav Odintsov
Do you mean that northd should have some kind of common code, which is used to 
generate buckets?
Initially I wanted to prevent the possibility even to send such a big OF group 
to OVS. For all the cases:
- a new possible functionality which doesn’t limit buckets count;
- some kind of northd bug.

This place looks to me like a "last resort" for OF group generation. All 
buckets are parsed/converted to OF syntax here.
Please correct me if I’m wrong.

Regards,
Vladislav Odintsov

> On 5 Dec 2022, at 19:37, Dumitru Ceara  wrote:
> 
> On 12/2/22 18:31, Vladislav Odintsov wrote:
>> It is possible to send OpenFlow group_mod message to OVS to create a
>> group with any number of buckets:
>> 
>> ovs-ofctl dump-groups br-int
>> NXST_GROUP_DESC reply (xid=0x2):
>> group_id=4,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,20),...bucket=bucket_id:1,...
>> 
>> This patch introduces a limit of buckets that may be requested to 1024.
>> In case the limit is reached, ovn-controller will write WARN log about
>> this fact.
> 
> Isn't it simpler to just limit the number of buckets in northd instead?
> What is the downside of doing that instead?
> 
> Thanks,
> Dumitru
> 
>> 
>> Signed-off-by: Vladislav Odintsov 
>> ---
>> lib/actions.c | 40 ++--
>> 1 file changed, 38 insertions(+), 2 deletions(-)
>> 
>> diff --git a/lib/actions.c b/lib/actions.c
>> index adbb42db4..4322556bf 100644
>> --- a/lib/actions.c
>> +++ b/lib/actions.c
>> @@ -44,6 +44,9 @@
>> #include "controller/lflow.h"
>> 
>> VLOG_DEFINE_THIS_MODULE(actions);
>> +
>> +#define MAX_BUCKETS_PER_GROUP 1024
>> +
>> 
>> /* Prototypes for functions to be defined by each action. */
>> #define OVNACT(ENUM, STRUCT)\
>> @@ -1371,7 +1374,18 @@ encode_ct_lb(const struct ovnact_ct_lb *cl,
>> BUILD_ASSERT(MFF_LOG_CT_ZONE < MFF_REG0 + FLOW_N_REGS);
>> BUILD_ASSERT(MFF_LOG_DNAT_ZONE >= MFF_REG0);
>> BUILD_ASSERT(MFF_LOG_DNAT_ZONE < MFF_REG0 + FLOW_N_REGS);
>> -for (size_t bucket_id = 0; bucket_id < cl->n_dsts; bucket_id++) {
>> +
>> +int n_buckets;
>> +bool group_overflow = false;
>> +if (cl->n_dsts > MAX_BUCKETS_PER_GROUP) {
>> +n_buckets = MAX_BUCKETS_PER_GROUP;
>> +group_overflow = true;
>> +}
>> +else {
>> +n_buckets = cl->n_dsts;
>> +}
>> +
>> +for (size_t bucket_id = 0; bucket_id < n_buckets; bucket_id++) {
>> const struct ovnact_ct_lb_dst *dst = >dsts[bucket_id];
>> char ip_addr[INET6_ADDRSTRLEN];
>> if (dst->family == AF_INET) {
>> @@ -1405,6 +1419,12 @@ encode_ct_lb(const struct ovnact_ct_lb *cl,
>> /* Create an action to set the group. */
>> og = ofpact_put_GROUP(ofpacts);
>> og->group_id = table_id;
>> +
>> +if (group_overflow) {
>> +VLOG_WARN("OF group id '%d' is desired to have more than "
>> +  "MAX_BUCKETS_PER_GROUP (%d) buckets. Limited to maximum.",
>> +  table_id, MAX_BUCKETS_PER_GROUP);
>> +}
>> }
>> 
>> static void
>> @@ -1542,7 +1562,17 @@ encode_SELECT(const struct ovnact_select *select,
>> 
>> struct mf_subfield sf = expr_resolve_field(>res_field);
>> 
>> -for (size_t bucket_id = 0; bucket_id < select->n_dsts; bucket_id++) {
>> +int n_buckets;
>> +bool group_overflow = false;
>> +if (select->n_dsts > MAX_BUCKETS_PER_GROUP) {
>> +n_buckets = MAX_BUCKETS_PER_GROUP;
>> +group_overflow = true;
>> +}
>> +else {
>> +n_buckets = select->n_dsts;
>> +}
>> +
>> +for (size_t bucket_id = 0; bucket_id < n_buckets; bucket_id++) {
>> const struct ovnact_select_dst *dst = >dsts[bucket_id];
>> ds_put_format(, ",bucket=bucket_id=%"PRIuSIZE",weight:%"PRIu16
>>   ",actions=", bucket_id, dst->weight);
>> @@ -1561,6 +1591,12 @@ encode_SELECT(const struct ovnact_select *select,
>> /* Create an action to set the group. */
>> og = ofpact_put_GROUP(ofpacts);
>> og->group_id = table_id;
>> +
>> +if (group_overflow) {
>> +VLOG_WARN("OF group id '%d' is desired to have more than "
>> +  "MAX_BUCKETS_PER_GROUP (%d) buckets. Limited to maximum.",
>> +  table_id, MAX_BUCKETS_PER_GROUP);
>> +}
>> }
>> 
>> static void
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 1/7] ic: move routes_ad hmap insert to separate function

2022-12-05 Thread Vladislav Odintsov
Hi,

Okay, I’ll split these patches in two series and squash patch #1 with patch #7.

Regards,
Vladislav Odintsov

> On 5 Dec 2022, at 20:00, Numan Siddique  wrote:
> 
> On Mon, Dec 5, 2022 at 11:37 AM Dumitru Ceara  > wrote:
>> 
>> On 12/2/22 18:31, Vladislav Odintsov wrote:
>>> This change will be useful in next commit.
>>> 
>>> Signed-off-by: Vladislav Odintsov >> >
>>> ---
>> 
>> Hi Vladislav,
>> 
>> This looks OK to me but I think I'd squash it in the patch that actually
>> uses the new way of calling ic_route_find().
> 
> +1 for this.
> 
> I'd also suggest splitting this series into 2.
> 
> Patch 1, 2, 3, 5 and 7 into 1 series since these patches are fixing ic
> related issues.
> These can be backported easily to older branches.
> 
> Patch 4 and 6 can be a separate patch series independent of these.   I
> think these 2 patches
> need to be carefully reviewed.
> 
> @Dumitru Ceara  Do you have any objections ?
> 
> Thanks for identifying these issues and fixing them.
> 
> Thanks
> Numan
> 
>> 
>> Thanks,
>> Dumitru
>> 
>>> ic/ovn-ic.c | 45 +++--
>>> 1 file changed, 27 insertions(+), 18 deletions(-)
>>> 
>>> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
>>> index e5c193d9d..50ff65a26 100644
>>> --- a/ic/ovn-ic.c
>>> +++ b/ic/ovn-ic.c
>>> @@ -881,10 +881,12 @@ ic_route_hash(const struct in6_addr *prefix, unsigned 
>>> int plen,
>>> static struct ic_route_info *
>>> ic_route_find(struct hmap *routes, const struct in6_addr *prefix,
>>>   unsigned int plen, const struct in6_addr *nexthop,
>>> -  const char *origin, char *route_table)
>>> +  const char *origin, const char *route_table, uint32_t hash)
>>> {
>>> struct ic_route_info *r;
>>> -uint32_t hash = ic_route_hash(prefix, plen, nexthop, origin, 
>>> route_table);
>>> +if (!hash) {
>>> +hash = ic_route_hash(prefix, plen, nexthop, origin, route_table);
>>> +}
>>> HMAP_FOR_EACH_WITH_HASH (r, node, hash, routes) {
>>> if (ipv6_addr_equals(>prefix, prefix) &&
>>> r->plen == plen &&
>>> @@ -942,8 +944,8 @@ add_to_routes_learned(struct hmap *routes_learned,
>>> }
>>> const char *origin = smap_get_def(_route->options, "origin", "");
>>> if (ic_route_find(routes_learned, , plen, , origin,
>>> -  nb_route->route_table)) {
>>> -/* Route is already added to learned in previous iteration. */
>>> +  nb_route->route_table, 0)) {
>>> +/* Route was added to learned on previous iteration. */
>>> return true;
>>> }
>>> 
>>> @@ -1090,10 +1092,21 @@ route_need_advertise(const char *policy,
>>> }
>>> 
>>> static void
>>> -add_to_routes_ad(struct hmap *routes_ad,
>>> - const struct nbrec_logical_router_static_route *nb_route,
>>> - const struct lport_addresses *nexthop_addresses,
>>> - const struct smap *nb_options, const char *route_table)
>>> +add_to_routes_ad(struct hmap *routes_ad, struct ic_route_info *ic_route)
>>> +{
>>> +uint hash = ic_route_hash(_route->prefix, ic_route->plen,
>>> +  _route->nexthop, ic_route->origin,
>>> +  ic_route->route_table ? ic_route->route_table
>>> +: "");
>>> +hmap_insert(routes_ad, _route->node, hash);
>>> +}
>>> +
>>> +static void
>>> +add_static_to_routes_ad(
>>> +struct hmap *routes_ad,
>>> +const struct nbrec_logical_router_static_route *nb_route,
>>> +const struct lport_addresses *nexthop_addresses,
>>> +const struct smap *nb_options, const char *route_table)
>>> {
>>> if (strcmp(route_table, nb_route->route_table)) {
>>> if (VLOG_IS_DBG_ENABLED()) {
>>> @@ -1149,9 +1162,7 @@ add_to_routes_ad(struct hmap *routes_ad,
>>> ic_route->nb_route = nb_route;
>>> ic_route->origin = ROUTE_ORIGIN_STATIC;
>>> ic_route->route_table = nb_route->route_table;
>>> -hmap_insert(routes_ad, _route->node,
>>> -ic_route_hash(, plen, , ROUTE_ORIGIN_STATIC,
>>> -  nb_route->route_table));
>>> +add_to_routes_ad(routes_ad, ic_route);
>>> }
>>> 
>>> static void
>>> @@ -1204,9 +1215,7 @@ add_network_to_routes_ad(struct hmap *routes_ad, 
>>> const char *network,
>>> 
>>> /* directly-connected routes go to  route table */
>>> ic_route->route_table = NULL;
>>> -hmap_insert(routes_ad, _route->node,
>>> -ic_route_hash(, plen, ,
>>> -  ROUTE_ORIGIN_CONNECTED, ""));
>>> +add_to_routes_ad(routes_ad, ic_route);
>>> }
>>> 
>>> static bool
>>> @@ -1366,7 +1375,7 @@ sync_learned_routes(struct ic_context *ctx,
>>> struct ic_route_info *route_learned
>>> = ic_route_find(_lr->routes_learned, , plen,
>>> , isb_route->origin,
>>> -

Re: [ovs-dev] [PATCH ovn 2/7] ic: remove orphan ovn interconnection routes

2022-12-05 Thread Vladislav Odintsov
Hi Dumitru,

please, see answer inline.

Regards,
Vladislav Odintsov

> On 5 Dec 2022, at 19:37, Dumitru Ceara  wrote:
> 
> On 12/2/22 18:31, Vladislav Odintsov wrote:
>> Before this patch if one deletes transit switch through which there were
>> routes in ICSB:Route table, such routes were left forever in the DB.
>> 
>> Now we validate that each ICSB:Route has an appropriate transit switch.
>> 
>> Signed-off-by: Vladislav Odintsov 
>> ---
>> ic/ovn-ic.c | 40 +++
>> tests/ovn-ic.at | 73 +
>> 2 files changed, 113 insertions(+)
>> 
>> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
>> index 50ff65a26..b3790e965 100644
>> --- a/ic/ovn-ic.c
>> +++ b/ic/ovn-ic.c
>> @@ -71,6 +71,7 @@ struct ic_context {
>> struct ovsdb_idl_index *icsbrec_port_binding_by_az;
>> struct ovsdb_idl_index *icsbrec_port_binding_by_ts;
>> struct ovsdb_idl_index *icsbrec_port_binding_by_ts_az;
>> +struct ovsdb_idl_index *icsbrec_route_by_az;
>> struct ovsdb_idl_index *icsbrec_route_by_ts;
>> struct ovsdb_idl_index *icsbrec_route_by_ts_az;
>> };
>> @@ -1621,6 +1622,38 @@ advertise_lr_routes(struct ic_context *ctx,
>> hmap_destroy(_ad);
>> }
>> 
>> +static void
>> +delete_orphan_ic_routes(struct ic_context *ctx,
>> + const struct icsbrec_availability_zone *az)
>> +{
>> +const struct icsbrec_route *isb_route, *isb_route_key =
>> +icsbrec_route_index_init_row(ctx->icsbrec_route_by_az);
>> +icsbrec_route_index_set_availability_zone(isb_route_key, az);
>> +
>> +const struct icnbrec_transit_switch *t_sw, *t_sw_key;
>> +
>> +ICSBREC_ROUTE_FOR_EACH_EQUAL (isb_route, isb_route_key,
>> +  ctx->icsbrec_route_by_az)
>> +{
>> +t_sw_key = icnbrec_transit_switch_index_init_row(
>> +ctx->icnbrec_transit_switch_by_name);
>> +icnbrec_transit_switch_index_set_name(t_sw_key,
>> +isb_route->transit_switch);
>> +t_sw = icnbrec_transit_switch_index_find(
>> +ctx->icnbrec_transit_switch_by_name, t_sw_key);
>> +icnbrec_transit_switch_index_destroy_row(t_sw_key);
>> +
>> +if (!t_sw) {
>> +VLOG_WARN("Deleting orphan ICDB:Route: %s->%s (%s, rtb:%s, "
>> +  "transit switch: %s)", isb_route->ip_prefix,
>> +  isb_route->nexthop, isb_route->origin,
>> +  isb_route->route_table, isb_route->transit_switch);
> 
> This seems like something that can happen under normal operation (e.g.,
> a zone going away).  I don't think we should WARN.  Maybe VLOG_INFO_RL
> is more appropriate?  What do you think?

No, the zone going away is not covered here. There is a for..each loop here, 
which iterates over routes from local az (index is used).
I don’t think that Availability Zone is a common scenario, but if user 
de-register Availability Zone, its port bindings should be removed as well as 
routes, gateways and encaps.

Regarding loglevel I’m agree with you — it’s a good notice, thanks. I’ll 
address this in v2.

> 
> Thanks,
> Dumitru
> 
>> +icsbrec_route_delete(isb_route);
>> +}
>> +}
>> +icsbrec_route_index_destroy_row(isb_route_key);
>> +}
>> +
>> static void
>> route_run(struct ic_context *ctx,
>>   const struct icsbrec_availability_zone *az)
>> @@ -1629,6 +1662,8 @@ route_run(struct ic_context *ctx,
>> return;
>> }
>> 
>> +delete_orphan_ic_routes(ctx, az);
>> +
>> struct hmap ic_lrs = HMAP_INITIALIZER(_lrs);
>> const struct icsbrec_port_binding *isb_pb;
>> const struct icsbrec_port_binding *isb_pb_key =
>> @@ -1917,6 +1952,10 @@ main(int argc, char *argv[])
>>   _port_binding_col_transit_switch,
>>   
>> _port_binding_col_availability_zone);
>> 
>> +struct ovsdb_idl_index *icsbrec_route_by_az
>> += ovsdb_idl_index_create1(ovnisb_idl_loop.idl,
>> +  _route_col_availability_zone);
>> +
>> struct ovsdb_idl_index *icsbrec_route_by_ts
>> = ovsdb_idl_index_create1(ovnisb_idl_loop.idl,
>>   _route_col_transit_switch);
>> @@ -1971,6 +2010,7 @@ main(int argc, char *argv[])
>> .icsbrec_port_binding_by_az = icsbrec_port_binding_by_az,
>> .icsbrec_port_binding_by_ts = icsbrec_port_binding_by_ts,
>> .icsbrec_port_binding_by_ts_az = 
>> icsbrec_port_binding_by_ts_az,
>> +.icsbrec_route_by_az = icsbrec_route_by_az,
>> .icsbrec_route_by_ts = icsbrec_route_by_ts,
>> .icsbrec_route_by_ts_az = icsbrec_route_by_ts_az,
>> };
>> diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
>> index 0bdfc55e6..e234b7fb9 100644
>> --- a/tests/ovn-ic.at
>> +++ b/tests/ovn-ic.at
>> @@ -121,6 +121,79 @@ OVN_CLEANUP_IC
>> AT_CLEANUP
>> ])
>> 
>> +OVN_FOR_EACH_NORTHD([
>> 

Re: [ovs-dev] [PATCH ovn 1/7] ic: move routes_ad hmap insert to separate function

2022-12-05 Thread Numan Siddique
On Mon, Dec 5, 2022 at 11:37 AM Dumitru Ceara  wrote:
>
> On 12/2/22 18:31, Vladislav Odintsov wrote:
> > This change will be useful in next commit.
> >
> > Signed-off-by: Vladislav Odintsov 
> > ---
>
> Hi Vladislav,
>
> This looks OK to me but I think I'd squash it in the patch that actually
> uses the new way of calling ic_route_find().

+1 for this.

I'd also suggest splitting this series into 2.

Patch 1, 2, 3, 5 and 7 into 1 series since these patches are fixing ic
related issues.
These can be backported easily to older branches.

Patch 4 and 6 can be a separate patch series independent of these.   I
think these 2 patches
need to be carefully reviewed.

@Dumitru Ceara  Do you have any objections ?

Thanks for identifying these issues and fixing them.

Thanks
Numan

>
> Thanks,
> Dumitru
>
> >  ic/ovn-ic.c | 45 +++--
> >  1 file changed, 27 insertions(+), 18 deletions(-)
> >
> > diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> > index e5c193d9d..50ff65a26 100644
> > --- a/ic/ovn-ic.c
> > +++ b/ic/ovn-ic.c
> > @@ -881,10 +881,12 @@ ic_route_hash(const struct in6_addr *prefix, unsigned 
> > int plen,
> >  static struct ic_route_info *
> >  ic_route_find(struct hmap *routes, const struct in6_addr *prefix,
> >unsigned int plen, const struct in6_addr *nexthop,
> > -  const char *origin, char *route_table)
> > +  const char *origin, const char *route_table, uint32_t hash)
> >  {
> >  struct ic_route_info *r;
> > -uint32_t hash = ic_route_hash(prefix, plen, nexthop, origin, 
> > route_table);
> > +if (!hash) {
> > +hash = ic_route_hash(prefix, plen, nexthop, origin, route_table);
> > +}
> >  HMAP_FOR_EACH_WITH_HASH (r, node, hash, routes) {
> >  if (ipv6_addr_equals(>prefix, prefix) &&
> >  r->plen == plen &&
> > @@ -942,8 +944,8 @@ add_to_routes_learned(struct hmap *routes_learned,
> >  }
> >  const char *origin = smap_get_def(_route->options, "origin", "");
> >  if (ic_route_find(routes_learned, , plen, , origin,
> > -  nb_route->route_table)) {
> > -/* Route is already added to learned in previous iteration. */
> > +  nb_route->route_table, 0)) {
> > +/* Route was added to learned on previous iteration. */
> >  return true;
> >  }
> >
> > @@ -1090,10 +1092,21 @@ route_need_advertise(const char *policy,
> >  }
> >
> >  static void
> > -add_to_routes_ad(struct hmap *routes_ad,
> > - const struct nbrec_logical_router_static_route *nb_route,
> > - const struct lport_addresses *nexthop_addresses,
> > - const struct smap *nb_options, const char *route_table)
> > +add_to_routes_ad(struct hmap *routes_ad, struct ic_route_info *ic_route)
> > +{
> > +uint hash = ic_route_hash(_route->prefix, ic_route->plen,
> > +  _route->nexthop, ic_route->origin,
> > +  ic_route->route_table ? ic_route->route_table
> > +: "");
> > +hmap_insert(routes_ad, _route->node, hash);
> > +}
> > +
> > +static void
> > +add_static_to_routes_ad(
> > +struct hmap *routes_ad,
> > +const struct nbrec_logical_router_static_route *nb_route,
> > +const struct lport_addresses *nexthop_addresses,
> > +const struct smap *nb_options, const char *route_table)
> >  {
> >  if (strcmp(route_table, nb_route->route_table)) {
> >  if (VLOG_IS_DBG_ENABLED()) {
> > @@ -1149,9 +1162,7 @@ add_to_routes_ad(struct hmap *routes_ad,
> >  ic_route->nb_route = nb_route;
> >  ic_route->origin = ROUTE_ORIGIN_STATIC;
> >  ic_route->route_table = nb_route->route_table;
> > -hmap_insert(routes_ad, _route->node,
> > -ic_route_hash(, plen, , ROUTE_ORIGIN_STATIC,
> > -  nb_route->route_table));
> > +add_to_routes_ad(routes_ad, ic_route);
> >  }
> >
> >  static void
> > @@ -1204,9 +1215,7 @@ add_network_to_routes_ad(struct hmap *routes_ad, 
> > const char *network,
> >
> >  /* directly-connected routes go to  route table */
> >  ic_route->route_table = NULL;
> > -hmap_insert(routes_ad, _route->node,
> > -ic_route_hash(, plen, ,
> > -  ROUTE_ORIGIN_CONNECTED, ""));
> > +add_to_routes_ad(routes_ad, ic_route);
> >  }
> >
> >  static bool
> > @@ -1366,7 +1375,7 @@ sync_learned_routes(struct ic_context *ctx,
> >  struct ic_route_info *route_learned
> >  = ic_route_find(_lr->routes_learned, , plen,
> >  , isb_route->origin,
> > -isb_route->route_table);
> > +isb_route->route_table, 0);
> >  if (route_learned) {
> >  /* Sync external-ids */
> >  struct uuid ext_id;
> > @@ -1465,7 +1474,7 @@ 

Re: [ovs-dev] [PATCH ovn] northd: Include VIP port in LB affinity learn flow matches.

2022-12-05 Thread Dumitru Ceara
On 12/5/22 16:22, Lorenzo Bianconi wrote:
>> On 12/5/22 16:16, Lorenzo Bianconi wrote:
 For the case when multiple LBs (same VIP but different port) share the
 same subset of backends we need to differentiate between them by also
 matching on the L4 port.  Without that affinity configuration from one
 load balancer might be incorrectly applied to another.

 Adapt the unit and system tests to cover this scenario too.

 Fixes: d3926b433e44 ("northd: rely on new actions for lb affinity")
 Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2150533
 Reported-by: Surya Seetharaman 
 Signed-off-by: Dumitru Ceara 
>>>
>>> Hi Dumitru,
>>>
>>
>> Hi Lorenzo,
>>
>>> thx for fixing this issue, just a small nit inline.
>>>
>>> Acked-by: Lorenzo Bianconi 
>>>
>>
>> Thanks for your review!
>>
 ---
  northd/northd.c | 48 +++---
  tests/ovn-northd.at |  8 +++
  tests/system-ovn.at | 57 -
  3 files changed, 95 insertions(+), 18 deletions(-)

 diff --git a/northd/northd.c b/northd/northd.c
 index 74facce7ac..27047ff74b 100644
 --- a/northd/northd.c
 +++ b/northd/northd.c
 @@ -6984,13 +6984,15 @@ build_lb_rules_pre_stateful(struct hmap *lflows, 
 struct ovn_northd_lb *lb,
   *   table=lr_in_lb_aff_learn, priority=100
   *  match=(REGBIT_KNOWN_LB_SESSION == 0
   * && ct.new && ip4
 - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B1 && tcp.dst == 
 BP1)
 + * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
 + * && ip4.dst == B1 && tcp.dst == BP1)
   *  action=(commit_lb_aff(vip = "V:VP", backend = "B1:BP1",
   *proto = tcp, timeout = T));
   *   table=lr_in_lb_aff_learn, priority=100
   *  match=(REGBIT_KNOWN_LB_SESSION == 0
   * && ct.new && ip4
 - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B2 && tcp.dst == 
 BP2)
 + * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
 + * && ip4.dst == B2 && tcp.dst == BP2)
   *  action=(commit_lb_aff(vip = "V:VP", backend = "B2:BP2",
   *proto = tcp, timeout = T));
   *
 @@ -7032,6 +7034,7 @@ build_lb_affinity_lr_flows(struct hmap *lflows, 
 struct ovn_northd_lb *lb,
  const char *ip_match = ipv6 ? "ip6" : "ip4";
  
  const char *reg_vip = ipv6 ? REG_NEXT_HOP_IPV6 : REG_NEXT_HOP_IPV4;
 +const char *reg_port = REG_ORIG_TP_DPORT_ROUTER;
>>>
>>> do we need reg_port? I guess we can just use REG_ORIG_TP_DPORT_ROUTER 
>>> directly.
>>>
>>
>> We can use it directly but I wanted to match the rest of the flow's
>> style.  Would it seem better if I renamed it to 'reg_vport'?
>> Alternatively, if you prefer, I can easily inline it.
> 
> I would say to use REG_ORIG_TP_DPORT_ROUTER directly, in the other cases we
> have ternary operator, but I do not have a strong opinion on it, up to you.
> 

OK, I made the change you suggested and then pushed the patch to the
main branch and to branch-22.12.

Thanks for the reviews Lorenzo and Ales!

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 0/7] OVN IC bugfixes & proposals/questions

2022-12-05 Thread Dumitru Ceara
On 12/2/22 18:31, Vladislav Odintsov wrote:
> Hi,
> 
> we’ve met with an issue, where it was possible to create multiple similar
> routes within LR (same ip_prefix, nexthop, and route table).
> 
> Initially the problem stared after OVN upgrade. We use python ovsdbapp 
> library,
> and we found a problem in python-ovs, which is described here
> https://mail.openvswitch.org/pipermail/ovs-dev/2022-November/399722.html by my
> colleague Anton.  @Terry Wilson, please take a look on this.
> 
> The problem itself touches OVN and OVS.  Sorry for the long read, but it seems
> that there are a couple of bugs in different places, part of which this RFC
> used to cover.
> 
> How the issue was initially reproduced:
> 
> 1. assume we have (at least) 2-Availability Zone OVN deployment
>(utilising ovn-ic infrastructure).
> 2. create transit switch in IC NB
> 3. create LR in each AZ, connect them to transit switch
> 4. create one logical switch with a VIF port attached to local OVS &
>connect this logical switch to LR (e.g. 192.168.0.1/24)
> 5. install in one AZ in LR 2 static routes with a create command (invoke
>next command twice):
> 
>ovn-nbctl --id=@id create logical-router-static-route ip_prefix=1.2.3.4/32 
> nexthop=192.168.0.10 -- logical_router add lr1 static_routes @id
> 
> From this time there is a couple of strange behaviour/bugs appear:
> 
> 1. [possible problem] There is a duplicated route in the NB within a
>single LR.  lflow is computed to have ECMP group with two similar
>routes:
> 
>table=11(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst 
> == 1.2.3.4/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1; 
> reg8[16..31] = select(1, 2);
>table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 1 
> && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; 
> eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
>table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 2 
> && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; 
> eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
> 
>Maybe, it’s better to have some kind of handling such routes?
>ovsdb index or some logic in ovn-northd?
> 
> 2. [bug] There is a duplicated route advertisement in
>OVN_IC_Southbound:Route table.  IMO, this should be fixed by adding a
>new index to this table for availability_zone, transit_switch,
>ip_prefix, nexthop and route_table; adding a logic to check if the
>route was already advertised (covered in Patch #7).
> 
> 3. [bug] There is a constant same route learning.  Each ovn-ic iteration
>on the opposite availability zone adds one new same route.  It creates
>thousands of same routes each second. This bug is covered by Patch #7.
> 
> 4. [possible problem] After multiple routes are learned to NB on the
>opposite availability zone, ovn-northd generates ecmp lflows.  Same as
>in #1: one in lr_in_ip_routing with select()
>and thousands of same records in lr_in_ip_routing_ecmp.  OVN allows
>installing UINT_MAX routes within ECMP group.
> 
> 5. [OVS bug?] I'd like someone from OVS team to see on this.
>ovn-controller installed long-long openflow group rule
>(group #3):
> 
># ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
>797824
> 
>When I try to dump groups with ovs-ofctl dump-groups br-int, I get
>next error in console:
> 
># ovs-ofctl dump-groups br-int
>ovs-ofctl: OpenFlow packet receive failed (End of file)
> 
>In ovs-vswitchd I see next error in logs and after this line ovs is
>restarted:
> 
>2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion 
> start_ofs <= UINT16_MAX failed in ofpmp_postappend()

This looks like an OVS bug to me.  Ilya, what do you think the best way
to fix this is?

> 
>If I issue command again, sometimes it prints same error, but
>sometimes this one (I had on the dev machine another OVN LB, so there
>are excess groups):
> 
># ovs-ofctl dump-groups br-int
>NXST_GROUP_DESC reply (xid=0x2): flags=[more]
>
> group_id=3,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
>
> group_id=1,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
>2022-11-17T17:53:41Z|1|ofp_group|WARN|OpenFlow message bucket length 
> 56 exceeds remaining buckets data size 40
>NXST_GROUP_DESC reply (xid=0x2): ***decode error: OFPGMFC_BAD_BUCKET***
>  01 11 a9 58 00 00 00 02-ff ff 00 00 00 00 23 20 |...X..# 
> |
>0010  00 00 00 08 00 00 00 00-a9 40 01 00 00 00 00 02 
> |.@..|
>0020  a9 08 00 00 00 00 00 00-00 38 00 28 00 00 00 00 
> |.8.(|

Re: [ovs-dev] [PATCH ovn 6/7] northd: limit ECMP group by 1024 members

2022-12-05 Thread Dumitru Ceara
On 12/2/22 18:31, Vladislav Odintsov wrote:
> This patch is intended to show that currently it's possible to build
> ECMP group of 65k buckets.
> 
> Signed-off-by: Vladislav Odintsov 
> ---
>  northd/northd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/northd/northd.c b/northd/northd.c
> index e1f3bace8..f8f7977ae 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -9271,7 +9271,7 @@ static void
>  ecmp_groups_add_route(struct ecmp_groups_node *group,
>const struct parsed_route *route)
>  {
> -if (group->route_count == UINT16_MAX) {
> +if (group->route_count == 1024) {
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>  VLOG_WARN_RL(, "too many routes in a single ecmp group.");
>  return;

Should we make the limit configurable?  What if the CMS wants to install
a route with more than 1K paths?  Not sure if that's realistic but I
would avoid the hardcoded 1K.

Thanks,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 5/7] ic: minor code improvements

2022-12-05 Thread Dumitru Ceara
On 12/2/22 18:31, Vladislav Odintsov wrote:
> 1. Remove excess nbrec_logical_router variable.
> 2. Remove excess call to add_static_to_routes_ad().
> 3. Remove double route_table check in ic_route_fin().

Nit: s/route_table/nexthop/

> 4. Move variable declarations out of loop.
> 
> Signed-off-by: Vladislav Odintsov 
> ---
>  ic/ovn-ic.c | 31 ++-
>  1 file changed, 10 insertions(+), 21 deletions(-)
> 
> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> index 3e02b4c98..59468545d 100644
> --- a/ic/ovn-ic.c
> +++ b/ic/ovn-ic.c
> @@ -895,8 +895,7 @@ ic_route_find(struct hmap *routes, const struct in6_addr 
> *prefix,
>  r->plen == plen &&
>  ipv6_addr_equals(>nexthop, nexthop) &&
>  !strcmp(r->origin, origin) &&
> -!strcmp(r->route_table ? r->route_table : "", route_table) &&
> -ipv6_addr_equals(>nexthop, nexthop)) {
> +!strcmp(r->route_table ? r->route_table : "", route_table)) {
>  return r;
>  }
>  }
> @@ -1109,17 +1108,8 @@ add_static_to_routes_ad(
>  struct hmap *routes_ad,
>  const struct nbrec_logical_router_static_route *nb_route,
>  const struct lport_addresses *nexthop_addresses,
> -const struct smap *nb_options, const char *route_table)
> +const struct smap *nb_options)
>  {
> -if (strcmp(route_table, nb_route->route_table)) {
> -if (VLOG_IS_DBG_ENABLED()) {
> -VLOG_DBG("Skip advertising route %s -> %s as its route table %s 
> !="
> - " %s of TS port", nb_route->ip_prefix, 
> nb_route->nexthop,
> - nb_route->route_table, route_table);
> -}
> -return;
> -}
> -
>  struct in6_addr prefix, nexthop;
>  unsigned int plen;
>  if (!parse_route(nb_route->ip_prefix, nb_route->nexthop,
> @@ -1541,13 +1531,13 @@ build_ts_routes_to_adv(struct ic_context *ctx,
>  {
>  const struct nbrec_logical_router *lr = ic_lr->lr;
>  
> +const struct nbrec_logical_router_static_route *nb_route;

I'm not sure I agree with this one.  Why not keep it inside the for loop
below.  We don't use 'nb_route' afterwards AFAICT.

> +struct uuid id;
> +
>  /* Check static routes of the LR */
>  for (int i = 0; i < lr->n_static_routes; i++) {
> -const struct nbrec_logical_router_static_route *nb_route
> -= lr->static_routes[i];
> -struct uuid isb_uuid;
> -if (smap_get_uuid(_route->external_ids, "ic-learned-route",
> -  _uuid)) {
> +nb_route = lr->static_routes[i];
> +if (smap_get_uuid(_route->external_ids, "ic-learned-route", )) 
> {
>  /* It is a learned route */
>  if (!add_to_routes_learned(_lr->routes_learned, nb_route)) {
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 
> 1);
> @@ -1557,10 +1547,10 @@ build_ts_routes_to_adv(struct ic_context *ctx,
>  nbrec_logical_router_update_static_routes_delvalue(lr,
>  nb_route);
>  }
> -} else {
> +} else if (!strcmp(ts_route_table, nb_route->route_table)) {
>  /* It may be a route to be advertised */
>  add_static_to_routes_ad(routes_ad, nb_route, ts_port_addrs,
> -_global->options, ts_route_table);
> +_global->options);
>  }
>  }
>  
> @@ -1593,7 +1583,6 @@ advertise_lr_routes(struct ic_context *ctx,
>  const struct icsbrec_port_binding *isb_pb;
>  const char *lrp_name, *ts_name, *route_table;
>  struct lport_addresses ts_port_addrs;
> -const struct nbrec_logical_router *lr = ic_lr->lr;
>  const struct icnbrec_transit_switch *key;
>  
>  struct hmap routes_ad = HMAP_INITIALIZER(_ad);
> @@ -1611,7 +1600,7 @@ advertise_lr_routes(struct ic_context *ctx,
>  VLOG_INFO_RL(, "Route sync ignores port %s on ts %s for 
> router"
>   " %s because the addresses are invalid.",
>   isb_pb->logical_port, isb_pb->transit_switch,
> - lr->name);
> + ic_lr->lr->name);
>  continue;
>  }
>  lrp_name = get_lrp_name_by_ts_port_name(ctx, isb_pb->logical_port);

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 4/7] actions: limit possible OF group bucket count

2022-12-05 Thread Dumitru Ceara
On 12/2/22 18:31, Vladislav Odintsov wrote:
> It is possible to send OpenFlow group_mod message to OVS to create a
> group with any number of buckets:
> 
> ovs-ofctl dump-groups br-int
> NXST_GROUP_DESC reply (xid=0x2):
>  
> group_id=4,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,20),...bucket=bucket_id:1,...
> 
> This patch introduces a limit of buckets that may be requested to 1024.
> In case the limit is reached, ovn-controller will write WARN log about
> this fact.

Isn't it simpler to just limit the number of buckets in northd instead?
What is the downside of doing that instead?

Thanks,
Dumitru

> 
> Signed-off-by: Vladislav Odintsov 
> ---
>  lib/actions.c | 40 ++--
>  1 file changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/actions.c b/lib/actions.c
> index adbb42db4..4322556bf 100644
> --- a/lib/actions.c
> +++ b/lib/actions.c
> @@ -44,6 +44,9 @@
>  #include "controller/lflow.h"
>  
>  VLOG_DEFINE_THIS_MODULE(actions);
> +
> +#define MAX_BUCKETS_PER_GROUP 1024
> +
>  
>  /* Prototypes for functions to be defined by each action. */
>  #define OVNACT(ENUM, STRUCT)\
> @@ -1371,7 +1374,18 @@ encode_ct_lb(const struct ovnact_ct_lb *cl,
>  BUILD_ASSERT(MFF_LOG_CT_ZONE < MFF_REG0 + FLOW_N_REGS);
>  BUILD_ASSERT(MFF_LOG_DNAT_ZONE >= MFF_REG0);
>  BUILD_ASSERT(MFF_LOG_DNAT_ZONE < MFF_REG0 + FLOW_N_REGS);
> -for (size_t bucket_id = 0; bucket_id < cl->n_dsts; bucket_id++) {
> +
> +int n_buckets;
> +bool group_overflow = false;
> +if (cl->n_dsts > MAX_BUCKETS_PER_GROUP) {
> +n_buckets = MAX_BUCKETS_PER_GROUP;
> +group_overflow = true;
> +}
> +else {
> +n_buckets = cl->n_dsts;
> +}
> +
> +for (size_t bucket_id = 0; bucket_id < n_buckets; bucket_id++) {
>  const struct ovnact_ct_lb_dst *dst = >dsts[bucket_id];
>  char ip_addr[INET6_ADDRSTRLEN];
>  if (dst->family == AF_INET) {
> @@ -1405,6 +1419,12 @@ encode_ct_lb(const struct ovnact_ct_lb *cl,
>  /* Create an action to set the group. */
>  og = ofpact_put_GROUP(ofpacts);
>  og->group_id = table_id;
> +
> +if (group_overflow) {
> +VLOG_WARN("OF group id '%d' is desired to have more than "
> +  "MAX_BUCKETS_PER_GROUP (%d) buckets. Limited to maximum.",
> +  table_id, MAX_BUCKETS_PER_GROUP);
> +}
>  }
>  
>  static void
> @@ -1542,7 +1562,17 @@ encode_SELECT(const struct ovnact_select *select,
>  
>  struct mf_subfield sf = expr_resolve_field(>res_field);
>  
> -for (size_t bucket_id = 0; bucket_id < select->n_dsts; bucket_id++) {
> +int n_buckets;
> +bool group_overflow = false;
> +if (select->n_dsts > MAX_BUCKETS_PER_GROUP) {
> +n_buckets = MAX_BUCKETS_PER_GROUP;
> +group_overflow = true;
> +}
> +else {
> +n_buckets = select->n_dsts;
> +}
> +
> +for (size_t bucket_id = 0; bucket_id < n_buckets; bucket_id++) {
>  const struct ovnact_select_dst *dst = >dsts[bucket_id];
>  ds_put_format(, ",bucket=bucket_id=%"PRIuSIZE",weight:%"PRIu16
>",actions=", bucket_id, dst->weight);
> @@ -1561,6 +1591,12 @@ encode_SELECT(const struct ovnact_select *select,
>  /* Create an action to set the group. */
>  og = ofpact_put_GROUP(ofpacts);
>  og->group_id = table_id;
> +
> +if (group_overflow) {
> +VLOG_WARN("OF group id '%d' is desired to have more than "
> +  "MAX_BUCKETS_PER_GROUP (%d) buckets. Limited to maximum.",
> +  table_id, MAX_BUCKETS_PER_GROUP);
> +}
>  }
>  
>  static void

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 3/7] ic: lookup southbound port_binding only if needed

2022-12-05 Thread Dumitru Ceara
On 12/2/22 18:31, Vladislav Odintsov wrote:
> Signed-off-by: Vladislav Odintsov 
> ---

Acked-by: Dumitru Ceara 

Thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 1/7] ic: move routes_ad hmap insert to separate function

2022-12-05 Thread Dumitru Ceara
On 12/2/22 18:31, Vladislav Odintsov wrote:
> This change will be useful in next commit.
> 
> Signed-off-by: Vladislav Odintsov 
> ---

Hi Vladislav,

This looks OK to me but I think I'd squash it in the patch that actually
uses the new way of calling ic_route_find().

Thanks,
Dumitru

>  ic/ovn-ic.c | 45 +++--
>  1 file changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> index e5c193d9d..50ff65a26 100644
> --- a/ic/ovn-ic.c
> +++ b/ic/ovn-ic.c
> @@ -881,10 +881,12 @@ ic_route_hash(const struct in6_addr *prefix, unsigned 
> int plen,
>  static struct ic_route_info *
>  ic_route_find(struct hmap *routes, const struct in6_addr *prefix,
>unsigned int plen, const struct in6_addr *nexthop,
> -  const char *origin, char *route_table)
> +  const char *origin, const char *route_table, uint32_t hash)
>  {
>  struct ic_route_info *r;
> -uint32_t hash = ic_route_hash(prefix, plen, nexthop, origin, 
> route_table);
> +if (!hash) {
> +hash = ic_route_hash(prefix, plen, nexthop, origin, route_table);
> +}
>  HMAP_FOR_EACH_WITH_HASH (r, node, hash, routes) {
>  if (ipv6_addr_equals(>prefix, prefix) &&
>  r->plen == plen &&
> @@ -942,8 +944,8 @@ add_to_routes_learned(struct hmap *routes_learned,
>  }
>  const char *origin = smap_get_def(_route->options, "origin", "");
>  if (ic_route_find(routes_learned, , plen, , origin,
> -  nb_route->route_table)) {
> -/* Route is already added to learned in previous iteration. */
> +  nb_route->route_table, 0)) {
> +/* Route was added to learned on previous iteration. */
>  return true;
>  }
>  
> @@ -1090,10 +1092,21 @@ route_need_advertise(const char *policy,
>  }
>  
>  static void
> -add_to_routes_ad(struct hmap *routes_ad,
> - const struct nbrec_logical_router_static_route *nb_route,
> - const struct lport_addresses *nexthop_addresses,
> - const struct smap *nb_options, const char *route_table)
> +add_to_routes_ad(struct hmap *routes_ad, struct ic_route_info *ic_route)
> +{
> +uint hash = ic_route_hash(_route->prefix, ic_route->plen,
> +  _route->nexthop, ic_route->origin,
> +  ic_route->route_table ? ic_route->route_table
> +: "");
> +hmap_insert(routes_ad, _route->node, hash);
> +}
> +
> +static void
> +add_static_to_routes_ad(
> +struct hmap *routes_ad,
> +const struct nbrec_logical_router_static_route *nb_route,
> +const struct lport_addresses *nexthop_addresses,
> +const struct smap *nb_options, const char *route_table)
>  {
>  if (strcmp(route_table, nb_route->route_table)) {
>  if (VLOG_IS_DBG_ENABLED()) {
> @@ -1149,9 +1162,7 @@ add_to_routes_ad(struct hmap *routes_ad,
>  ic_route->nb_route = nb_route;
>  ic_route->origin = ROUTE_ORIGIN_STATIC;
>  ic_route->route_table = nb_route->route_table;
> -hmap_insert(routes_ad, _route->node,
> -ic_route_hash(, plen, , ROUTE_ORIGIN_STATIC,
> -  nb_route->route_table));
> +add_to_routes_ad(routes_ad, ic_route);
>  }
>  
>  static void
> @@ -1204,9 +1215,7 @@ add_network_to_routes_ad(struct hmap *routes_ad, const 
> char *network,
>  
>  /* directly-connected routes go to  route table */
>  ic_route->route_table = NULL;
> -hmap_insert(routes_ad, _route->node,
> -ic_route_hash(, plen, ,
> -  ROUTE_ORIGIN_CONNECTED, ""));
> +add_to_routes_ad(routes_ad, ic_route);
>  }
>  
>  static bool
> @@ -1366,7 +1375,7 @@ sync_learned_routes(struct ic_context *ctx,
>  struct ic_route_info *route_learned
>  = ic_route_find(_lr->routes_learned, , plen,
>  , isb_route->origin,
> -isb_route->route_table);
> +isb_route->route_table, 0);
>  if (route_learned) {
>  /* Sync external-ids */
>  struct uuid ext_id;
> @@ -1465,7 +1474,7 @@ advertise_routes(struct ic_context *ctx,
>  }
>  struct ic_route_info *route_adv =
>  ic_route_find(routes_ad, , plen, ,
> -  isb_route->origin, isb_route->route_table);
> +  isb_route->origin, isb_route->route_table, 0);
>  if (!route_adv) {
>  /* Delete the extra route from IC-SB. */
>  VLOG_DBG("Delete route %s -> %s from IC-SB, which is not found"
> @@ -1547,8 +1556,8 @@ build_ts_routes_to_adv(struct ic_context *ctx,
>  }
>  } else {
>  /* It may be a route to be advertised */
> -add_to_routes_ad(routes_ad, nb_route, ts_port_addrs,
> - 

Re: [ovs-dev] [PATCH ovn 2/7] ic: remove orphan ovn interconnection routes

2022-12-05 Thread Dumitru Ceara
On 12/2/22 18:31, Vladislav Odintsov wrote:
> Before this patch if one deletes transit switch through which there were
> routes in ICSB:Route table, such routes were left forever in the DB.
> 
> Now we validate that each ICSB:Route has an appropriate transit switch.
> 
> Signed-off-by: Vladislav Odintsov 
> ---
>  ic/ovn-ic.c | 40 +++
>  tests/ovn-ic.at | 73 +
>  2 files changed, 113 insertions(+)
> 
> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> index 50ff65a26..b3790e965 100644
> --- a/ic/ovn-ic.c
> +++ b/ic/ovn-ic.c
> @@ -71,6 +71,7 @@ struct ic_context {
>  struct ovsdb_idl_index *icsbrec_port_binding_by_az;
>  struct ovsdb_idl_index *icsbrec_port_binding_by_ts;
>  struct ovsdb_idl_index *icsbrec_port_binding_by_ts_az;
> +struct ovsdb_idl_index *icsbrec_route_by_az;
>  struct ovsdb_idl_index *icsbrec_route_by_ts;
>  struct ovsdb_idl_index *icsbrec_route_by_ts_az;
>  };
> @@ -1621,6 +1622,38 @@ advertise_lr_routes(struct ic_context *ctx,
>  hmap_destroy(_ad);
>  }
>  
> +static void
> +delete_orphan_ic_routes(struct ic_context *ctx,
> + const struct icsbrec_availability_zone *az)
> +{
> +const struct icsbrec_route *isb_route, *isb_route_key =
> +icsbrec_route_index_init_row(ctx->icsbrec_route_by_az);
> +icsbrec_route_index_set_availability_zone(isb_route_key, az);
> +
> +const struct icnbrec_transit_switch *t_sw, *t_sw_key;
> +
> +ICSBREC_ROUTE_FOR_EACH_EQUAL (isb_route, isb_route_key,
> +  ctx->icsbrec_route_by_az)
> +{
> +t_sw_key = icnbrec_transit_switch_index_init_row(
> +ctx->icnbrec_transit_switch_by_name);
> +icnbrec_transit_switch_index_set_name(t_sw_key,
> +isb_route->transit_switch);
> +t_sw = icnbrec_transit_switch_index_find(
> +ctx->icnbrec_transit_switch_by_name, t_sw_key);
> +icnbrec_transit_switch_index_destroy_row(t_sw_key);
> +
> +if (!t_sw) {
> +VLOG_WARN("Deleting orphan ICDB:Route: %s->%s (%s, rtb:%s, "
> +  "transit switch: %s)", isb_route->ip_prefix,
> +  isb_route->nexthop, isb_route->origin,
> +  isb_route->route_table, isb_route->transit_switch);

This seems like something that can happen under normal operation (e.g.,
a zone going away).  I don't think we should WARN.  Maybe VLOG_INFO_RL
is more appropriate?  What do you think?

Thanks,
Dumitru

> +icsbrec_route_delete(isb_route);
> +}
> +}
> +icsbrec_route_index_destroy_row(isb_route_key);
> +}
> +
>  static void
>  route_run(struct ic_context *ctx,
>const struct icsbrec_availability_zone *az)
> @@ -1629,6 +1662,8 @@ route_run(struct ic_context *ctx,
>  return;
>  }
>  
> +delete_orphan_ic_routes(ctx, az);
> +
>  struct hmap ic_lrs = HMAP_INITIALIZER(_lrs);
>  const struct icsbrec_port_binding *isb_pb;
>  const struct icsbrec_port_binding *isb_pb_key =
> @@ -1917,6 +1952,10 @@ main(int argc, char *argv[])
>_port_binding_col_transit_switch,
>
> _port_binding_col_availability_zone);
>  
> +struct ovsdb_idl_index *icsbrec_route_by_az
> += ovsdb_idl_index_create1(ovnisb_idl_loop.idl,
> +  _route_col_availability_zone);
> +
>  struct ovsdb_idl_index *icsbrec_route_by_ts
>  = ovsdb_idl_index_create1(ovnisb_idl_loop.idl,
>_route_col_transit_switch);
> @@ -1971,6 +2010,7 @@ main(int argc, char *argv[])
>  .icsbrec_port_binding_by_az = icsbrec_port_binding_by_az,
>  .icsbrec_port_binding_by_ts = icsbrec_port_binding_by_ts,
>  .icsbrec_port_binding_by_ts_az = 
> icsbrec_port_binding_by_ts_az,
> +.icsbrec_route_by_az = icsbrec_route_by_az,
>  .icsbrec_route_by_ts = icsbrec_route_by_ts,
>  .icsbrec_route_by_ts_az = icsbrec_route_by_ts_az,
>  };
> diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
> index 0bdfc55e6..e234b7fb9 100644
> --- a/tests/ovn-ic.at
> +++ b/tests/ovn-ic.at
> @@ -121,6 +121,79 @@ OVN_CLEANUP_IC
>  AT_CLEANUP
>  ])
>  
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([ovn-ic -- route deletion upon TS deletion])
> +
> +ovn_init_ic_db
> +net_add n1
> +
> +# 1 GW per AZ
> +for i in 1 2; do
> +az=az$i
> +ovn_start $az
> +sim_add gw-$az
> +as gw-$az
> +check ovs-vsctl add-br br-phys
> +ovn_az_attach $az n1 br-phys 192.168.1.$i
> +check ovs-vsctl set open . external-ids:ovn-is-interconn=true
> +check ovn-nbctl set nb-global . \
> +options:ic-route-adv=true \
> +options:ic-route-adv-default=true \
> +options:ic-route-learn=true \
> +options:ic-route-learn-default=true
> +done
> +
> +create_ic_infra() {
> +az_id=$1

Re: [ovs-dev] [PATCH ovn] controller: Restore MAC and vlan for DVR scenario

2022-12-05 Thread Dumitru Ceara
On 9/30/22 12:43, Dumitru Ceara wrote:
> On 9/20/22 22:18, Mark Michelson wrote:
>> Thanks Ales,
>>
>> Acked-by: Mark Michelson 
>>
> 
> I applied this to the main branch and backported it to all stable
> branches down to branch-22.03.
> 

Hi all,

There was an internal request from within Red Hat (from Jakub in CC) to
backport this fix to branch-21.12 too.  If nobody is it I can take care
of porting the patch to 21.12.

I'll wait a day or two to give people time to reply.

Thanks,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] mlx5 rte_eth_dev_info.reta_size value

2022-12-05 Thread Robin Jarry
Hi Ori,

While working on a patch for OvS[1], I have tried to reconfigure the
redirection table using the code examples that are layout around in
testpmd and other places.

[1]: 
http://patchwork.ozlabs.org/project/openvswitch/patch/20221021145308.141933-1-rja...@redhat.com/

Here is a stripped down version of the code I use:

 int update_reta(int port_id, int num_rxq)
 {
   struct rte_eth_rss_reta_entry64 *conf;
   struct rte_eth_dev_info info;
   size_t conf_size;
   int err;

   rte_eth_dev_info_get(port_id, );
   conf_size = (info.reta_size / RTE_ETH_RETA_GROUP_SIZE) * sizeof(*conf);
   conf = malloc(conf_size);
   memset(conf, 0, conf_size);

   for (uint16_t i = 0; i < info.reta_size; i++) {
 uint16_t idx = i / RTE_ETH_RETA_GROUP_SIZE;
 uint16_t shift = i % RTE_ETH_RETA_GROUP_SIZE;
 reta_conf[idx].mask |= 1ULL << shift;
 reta_conf[idx].reta[shift] = i % num_rxq;
   }
   err = rte_eth_dev_rss_reta_update(port_id, conf, info.reta_size);
   free(conf);

   return err;
 }

This works well for i40e and ice drivers but I get very confusing
reta_size values with mlx5.

mlx5_ethdev.c

333├>info->reta_size = priv->reta_idx_n ?
334│ priv->reta_idx_n : config->ind_table_max_size;

(gdb) p priv->reta_idx_n
$5 = 2
(gdb) p config->ind_table_max_size
$6 = 512

Obviously, info.reta_size / RTE_ETH_RETA_GROUP_SIZE = 1 / 512 = 0

From what I had understood info.reta_size should be a multiple of
RTE_ETH_RETA_GROUP_SIZE. This is what I can observe with i40e and ice at
least. Is it possible that the mlx5 driver has an issue there?

I found this commit[2] from 2015 that may have introduced an issue but
I am surprised that no one has ever encountered that before me. The
suspicious code bit is:

+/* If the requested number of RX queues is not a power of two, use the
+ * maximum indirection table size for better balancing.
+ * The result is always rounded to the next power of two. */
+reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
+priv->ind_table_max_size :
+rxqs_n));

When rxqs_n == 2, reta_idx_n is initialized to 2 as well.

[2]: https://git.dpdk.org/dpdk/commit/?id=634efbc2c8c05

If you can provide any help, that would be much appreciated.

Thanks!

-- 
Robin Jarry
Principal Software Engineer
Red Hat

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: Include VIP port in LB affinity learn flow matches.

2022-12-05 Thread Lorenzo Bianconi
> On 12/5/22 16:16, Lorenzo Bianconi wrote:
> >> For the case when multiple LBs (same VIP but different port) share the
> >> same subset of backends we need to differentiate between them by also
> >> matching on the L4 port.  Without that affinity configuration from one
> >> load balancer might be incorrectly applied to another.
> >>
> >> Adapt the unit and system tests to cover this scenario too.
> >>
> >> Fixes: d3926b433e44 ("northd: rely on new actions for lb affinity")
> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2150533
> >> Reported-by: Surya Seetharaman 
> >> Signed-off-by: Dumitru Ceara 
> > 
> > Hi Dumitru,
> > 
> 
> Hi Lorenzo,
> 
> > thx for fixing this issue, just a small nit inline.
> > 
> > Acked-by: Lorenzo Bianconi 
> > 
> 
> Thanks for your review!
> 
> >> ---
> >>  northd/northd.c | 48 +++---
> >>  tests/ovn-northd.at |  8 +++
> >>  tests/system-ovn.at | 57 -
> >>  3 files changed, 95 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/northd/northd.c b/northd/northd.c
> >> index 74facce7ac..27047ff74b 100644
> >> --- a/northd/northd.c
> >> +++ b/northd/northd.c
> >> @@ -6984,13 +6984,15 @@ build_lb_rules_pre_stateful(struct hmap *lflows, 
> >> struct ovn_northd_lb *lb,
> >>   *   table=lr_in_lb_aff_learn, priority=100
> >>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
> >>   * && ct.new && ip4
> >> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B1 && tcp.dst == 
> >> BP1)
> >> + * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
> >> + * && ip4.dst == B1 && tcp.dst == BP1)
> >>   *  action=(commit_lb_aff(vip = "V:VP", backend = "B1:BP1",
> >>   *proto = tcp, timeout = T));
> >>   *   table=lr_in_lb_aff_learn, priority=100
> >>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
> >>   * && ct.new && ip4
> >> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B2 && tcp.dst == 
> >> BP2)
> >> + * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
> >> + * && ip4.dst == B2 && tcp.dst == BP2)
> >>   *  action=(commit_lb_aff(vip = "V:VP", backend = "B2:BP2",
> >>   *proto = tcp, timeout = T));
> >>   *
> >> @@ -7032,6 +7034,7 @@ build_lb_affinity_lr_flows(struct hmap *lflows, 
> >> struct ovn_northd_lb *lb,
> >>  const char *ip_match = ipv6 ? "ip6" : "ip4";
> >>  
> >>  const char *reg_vip = ipv6 ? REG_NEXT_HOP_IPV6 : REG_NEXT_HOP_IPV4;
> >> +const char *reg_port = REG_ORIG_TP_DPORT_ROUTER;
> > 
> > do we need reg_port? I guess we can just use REG_ORIG_TP_DPORT_ROUTER 
> > directly.
> > 
> 
> We can use it directly but I wanted to match the rest of the flow's
> style.  Would it seem better if I renamed it to 'reg_vport'?
> Alternatively, if you prefer, I can easily inline it.

I would say to use REG_ORIG_TP_DPORT_ROUTER directly, in the other cases we
have ternary operator, but I do not have a strong opinion on it, up to you.

Regards,
Lorenzo

> 
> What do you think?
> 
> Thanks,
> Dumitru
> 
> > Regards,
> > Lorenzo
> > 
> >>  const char *reg_backend =
> >>  ipv6 ? REG_LB_L3_AFF_BACKEND_IP6 : REG_LB_AFF_BACKEND_IP4;
> >>  
> >> @@ -7040,7 +7043,7 @@ build_lb_affinity_lr_flows(struct hmap *lflows, 
> >> struct ovn_northd_lb *lb,
> >>  ds_put_cstr(_action_learn, "commit_lb_aff(vip = \"");
> >>  
> >>  if (lb_vip->vip_port) {
> >> -ds_put_format(_action_learn, ipv6 ? "[%s]:%d" : "%s:%d",
> >> +ds_put_format(_action_learn, ipv6 ? "[%s]:%"PRIu16 : 
> >> "%s:%"PRIu16,
> >>lb_vip->vip_str, lb_vip->vip_port);
> >>  } else {
> >>  ds_put_cstr(_action_learn, lb_vip->vip_str);
> >> @@ -7053,9 +7056,17 @@ build_lb_affinity_lr_flows(struct hmap *lflows, 
> >> struct ovn_northd_lb *lb,
> >>  ds_put_cstr(_action_learn, "\", backend = \"");
> >>  
> >>  /* Prepare common part of affinity learn match. */
> >> -ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
> >> -  "ct.new && %s && %s == %s && %s.dst == ", ip_match,
> >> -  reg_vip, lb_vip->vip_str, ip_match);
> >> +if (lb_vip->vip_port) {
> >> +ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
> >> +  "ct.new && %s && %s == %s && "
> >> +  "%s == %"PRIu16" && %s.dst == ", ip_match,
> >> +  reg_vip, lb_vip->vip_str,
> >> +  reg_port, lb_vip->vip_port, ip_match);
> >> +} else {
> >> +ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
> >> +  "ct.new && %s && %s == %s && %s.dst == ", ip_match,
> >> +  reg_vip, lb_vip->vip_str, ip_match);
> >> +}
> >>  
> >>  /* Prepare common part of affinity match. */
> >>  ds_put_format(_match, REGBIT_KNOWN_LB_SESSION" == 1 && "
> >> @@ -7172,13 

Re: [ovs-dev] [PATCH ovn] northd: Include VIP port in LB affinity learn flow matches.

2022-12-05 Thread Ales Musil
Looks good to me, thanks.

Acked-by: Ales Musil 

On Mon, Dec 5, 2022 at 2:04 PM Dumitru Ceara  wrote:

> For the case when multiple LBs (same VIP but different port) share the
> same subset of backends we need to differentiate between them by also
> matching on the L4 port.  Without that affinity configuration from one
> load balancer might be incorrectly applied to another.
>
> Adapt the unit and system tests to cover this scenario too.
>
> Fixes: d3926b433e44 ("northd: rely on new actions for lb affinity")
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2150533
> Reported-by: Surya Seetharaman 
> Signed-off-by: Dumitru Ceara 
> ---
>  northd/northd.c | 48 +++---
>  tests/ovn-northd.at |  8 +++
>  tests/system-ovn.at | 57 -
>  3 files changed, 95 insertions(+), 18 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 74facce7ac..27047ff74b 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -6984,13 +6984,15 @@ build_lb_rules_pre_stateful(struct hmap *lflows,
> struct ovn_northd_lb *lb,
>   *   table=lr_in_lb_aff_learn, priority=100
>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
>   * && ct.new && ip4
> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B1 && tcp.dst ==
> BP1)
> + * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
> + * && ip4.dst == B1 && tcp.dst == BP1)
>   *  action=(commit_lb_aff(vip = "V:VP", backend = "B1:BP1",
>   *proto = tcp, timeout = T));
>   *   table=lr_in_lb_aff_learn, priority=100
>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
>   * && ct.new && ip4
> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B2 && tcp.dst ==
> BP2)
> + * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
> + * && ip4.dst == B2 && tcp.dst == BP2)
>   *  action=(commit_lb_aff(vip = "V:VP", backend = "B2:BP2",
>   *proto = tcp, timeout = T));
>   *
> @@ -7032,6 +7034,7 @@ build_lb_affinity_lr_flows(struct hmap *lflows,
> struct ovn_northd_lb *lb,
>  const char *ip_match = ipv6 ? "ip6" : "ip4";
>
>  const char *reg_vip = ipv6 ? REG_NEXT_HOP_IPV6 : REG_NEXT_HOP_IPV4;
> +const char *reg_port = REG_ORIG_TP_DPORT_ROUTER;
>  const char *reg_backend =
>  ipv6 ? REG_LB_L3_AFF_BACKEND_IP6 : REG_LB_AFF_BACKEND_IP4;
>
> @@ -7040,7 +7043,7 @@ build_lb_affinity_lr_flows(struct hmap *lflows,
> struct ovn_northd_lb *lb,
>  ds_put_cstr(_action_learn, "commit_lb_aff(vip = \"");
>
>  if (lb_vip->vip_port) {
> -ds_put_format(_action_learn, ipv6 ? "[%s]:%d" : "%s:%d",
> +ds_put_format(_action_learn, ipv6 ? "[%s]:%"PRIu16 :
> "%s:%"PRIu16,
>lb_vip->vip_str, lb_vip->vip_port);
>  } else {
>  ds_put_cstr(_action_learn, lb_vip->vip_str);
> @@ -7053,9 +7056,17 @@ build_lb_affinity_lr_flows(struct hmap *lflows,
> struct ovn_northd_lb *lb,
>  ds_put_cstr(_action_learn, "\", backend = \"");
>
>  /* Prepare common part of affinity learn match. */
> -ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
> -  "ct.new && %s && %s == %s && %s.dst == ", ip_match,
> -  reg_vip, lb_vip->vip_str, ip_match);
> +if (lb_vip->vip_port) {
> +ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
> +  "ct.new && %s && %s == %s && "
> +  "%s == %"PRIu16" && %s.dst == ", ip_match,
> +  reg_vip, lb_vip->vip_str,
> +  reg_port, lb_vip->vip_port, ip_match);
> +} else {
> +ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
> +  "ct.new && %s && %s == %s && %s.dst == ", ip_match,
> +  reg_vip, lb_vip->vip_str, ip_match);
> +}
>
>  /* Prepare common part of affinity match. */
>  ds_put_format(_match, REGBIT_KNOWN_LB_SESSION" == 1 && "
> @@ -7172,13 +7183,15 @@ build_lb_affinity_lr_flows(struct hmap *lflows,
> struct ovn_northd_lb *lb,
>   *   table=ls_in_lb_aff_learn, priority=100
>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
>   * && ct.new && ip4
> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B1 && tcp.dst ==
> BP1)
> + * && REG_ORIG_DIP_IPV4 == V && REG_ORIG_TP_DPORT == VP
> + * && ip4.dst == B1 && tcp.dst == BP1)
>   *  action=(commit_lb_aff(vip = "V:VP", backend = "B1:BP1",
>   *proto = tcp, timeout = T));
>   *   table=ls_in_lb_aff_learn, priority=100
>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
>   * && ct.new && ip4
> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B2 && tcp.dst ==
> BP2)
> + * && REG_ORIG_DIP_IPV4 == V && REG_ORIG_TP_DPORT == VP
> + * && ip4.dst == B2 && tcp.dst == BP2)
>   *  action=(commit_lb_aff(vip = 

Re: [ovs-dev] [PATCH ovn] northd: Include VIP port in LB affinity learn flow matches.

2022-12-05 Thread Dumitru Ceara
On 12/5/22 16:16, Lorenzo Bianconi wrote:
>> For the case when multiple LBs (same VIP but different port) share the
>> same subset of backends we need to differentiate between them by also
>> matching on the L4 port.  Without that affinity configuration from one
>> load balancer might be incorrectly applied to another.
>>
>> Adapt the unit and system tests to cover this scenario too.
>>
>> Fixes: d3926b433e44 ("northd: rely on new actions for lb affinity")
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2150533
>> Reported-by: Surya Seetharaman 
>> Signed-off-by: Dumitru Ceara 
> 
> Hi Dumitru,
> 

Hi Lorenzo,

> thx for fixing this issue, just a small nit inline.
> 
> Acked-by: Lorenzo Bianconi 
> 

Thanks for your review!

>> ---
>>  northd/northd.c | 48 +++---
>>  tests/ovn-northd.at |  8 +++
>>  tests/system-ovn.at | 57 -
>>  3 files changed, 95 insertions(+), 18 deletions(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 74facce7ac..27047ff74b 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -6984,13 +6984,15 @@ build_lb_rules_pre_stateful(struct hmap *lflows, 
>> struct ovn_northd_lb *lb,
>>   *   table=lr_in_lb_aff_learn, priority=100
>>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
>>   * && ct.new && ip4
>> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B1 && tcp.dst == BP1)
>> + * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
>> + * && ip4.dst == B1 && tcp.dst == BP1)
>>   *  action=(commit_lb_aff(vip = "V:VP", backend = "B1:BP1",
>>   *proto = tcp, timeout = T));
>>   *   table=lr_in_lb_aff_learn, priority=100
>>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
>>   * && ct.new && ip4
>> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B2 && tcp.dst == BP2)
>> + * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
>> + * && ip4.dst == B2 && tcp.dst == BP2)
>>   *  action=(commit_lb_aff(vip = "V:VP", backend = "B2:BP2",
>>   *proto = tcp, timeout = T));
>>   *
>> @@ -7032,6 +7034,7 @@ build_lb_affinity_lr_flows(struct hmap *lflows, struct 
>> ovn_northd_lb *lb,
>>  const char *ip_match = ipv6 ? "ip6" : "ip4";
>>  
>>  const char *reg_vip = ipv6 ? REG_NEXT_HOP_IPV6 : REG_NEXT_HOP_IPV4;
>> +const char *reg_port = REG_ORIG_TP_DPORT_ROUTER;
> 
> do we need reg_port? I guess we can just use REG_ORIG_TP_DPORT_ROUTER 
> directly.
> 

We can use it directly but I wanted to match the rest of the flow's
style.  Would it seem better if I renamed it to 'reg_vport'?
Alternatively, if you prefer, I can easily inline it.

What do you think?

Thanks,
Dumitru

> Regards,
> Lorenzo
> 
>>  const char *reg_backend =
>>  ipv6 ? REG_LB_L3_AFF_BACKEND_IP6 : REG_LB_AFF_BACKEND_IP4;
>>  
>> @@ -7040,7 +7043,7 @@ build_lb_affinity_lr_flows(struct hmap *lflows, struct 
>> ovn_northd_lb *lb,
>>  ds_put_cstr(_action_learn, "commit_lb_aff(vip = \"");
>>  
>>  if (lb_vip->vip_port) {
>> -ds_put_format(_action_learn, ipv6 ? "[%s]:%d" : "%s:%d",
>> +ds_put_format(_action_learn, ipv6 ? "[%s]:%"PRIu16 : 
>> "%s:%"PRIu16,
>>lb_vip->vip_str, lb_vip->vip_port);
>>  } else {
>>  ds_put_cstr(_action_learn, lb_vip->vip_str);
>> @@ -7053,9 +7056,17 @@ build_lb_affinity_lr_flows(struct hmap *lflows, 
>> struct ovn_northd_lb *lb,
>>  ds_put_cstr(_action_learn, "\", backend = \"");
>>  
>>  /* Prepare common part of affinity learn match. */
>> -ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
>> -  "ct.new && %s && %s == %s && %s.dst == ", ip_match,
>> -  reg_vip, lb_vip->vip_str, ip_match);
>> +if (lb_vip->vip_port) {
>> +ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
>> +  "ct.new && %s && %s == %s && "
>> +  "%s == %"PRIu16" && %s.dst == ", ip_match,
>> +  reg_vip, lb_vip->vip_str,
>> +  reg_port, lb_vip->vip_port, ip_match);
>> +} else {
>> +ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
>> +  "ct.new && %s && %s == %s && %s.dst == ", ip_match,
>> +  reg_vip, lb_vip->vip_str, ip_match);
>> +}
>>  
>>  /* Prepare common part of affinity match. */
>>  ds_put_format(_match, REGBIT_KNOWN_LB_SESSION" == 1 && "
>> @@ -7172,13 +7183,15 @@ build_lb_affinity_lr_flows(struct hmap *lflows, 
>> struct ovn_northd_lb *lb,
>>   *   table=ls_in_lb_aff_learn, priority=100
>>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
>>   * && ct.new && ip4
>> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B1 && tcp.dst == BP1)
>> + * && REG_ORIG_DIP_IPV4 == V && REG_ORIG_TP_DPORT == VP
>> + * && ip4.dst == B1 

Re: [ovs-dev] [PATCH ovn] northd: Include VIP port in LB affinity learn flow matches.

2022-12-05 Thread Lorenzo Bianconi
> For the case when multiple LBs (same VIP but different port) share the
> same subset of backends we need to differentiate between them by also
> matching on the L4 port.  Without that affinity configuration from one
> load balancer might be incorrectly applied to another.
> 
> Adapt the unit and system tests to cover this scenario too.
> 
> Fixes: d3926b433e44 ("northd: rely on new actions for lb affinity")
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2150533
> Reported-by: Surya Seetharaman 
> Signed-off-by: Dumitru Ceara 

Hi Dumitru,

thx for fixing this issue, just a small nit inline.

Acked-by: Lorenzo Bianconi 

> ---
>  northd/northd.c | 48 +++---
>  tests/ovn-northd.at |  8 +++
>  tests/system-ovn.at | 57 -
>  3 files changed, 95 insertions(+), 18 deletions(-)
> 
> diff --git a/northd/northd.c b/northd/northd.c
> index 74facce7ac..27047ff74b 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -6984,13 +6984,15 @@ build_lb_rules_pre_stateful(struct hmap *lflows, 
> struct ovn_northd_lb *lb,
>   *   table=lr_in_lb_aff_learn, priority=100
>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
>   * && ct.new && ip4
> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B1 && tcp.dst == BP1)
> + * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
> + * && ip4.dst == B1 && tcp.dst == BP1)
>   *  action=(commit_lb_aff(vip = "V:VP", backend = "B1:BP1",
>   *proto = tcp, timeout = T));
>   *   table=lr_in_lb_aff_learn, priority=100
>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
>   * && ct.new && ip4
> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B2 && tcp.dst == BP2)
> + * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
> + * && ip4.dst == B2 && tcp.dst == BP2)
>   *  action=(commit_lb_aff(vip = "V:VP", backend = "B2:BP2",
>   *proto = tcp, timeout = T));
>   *
> @@ -7032,6 +7034,7 @@ build_lb_affinity_lr_flows(struct hmap *lflows, struct 
> ovn_northd_lb *lb,
>  const char *ip_match = ipv6 ? "ip6" : "ip4";
>  
>  const char *reg_vip = ipv6 ? REG_NEXT_HOP_IPV6 : REG_NEXT_HOP_IPV4;
> +const char *reg_port = REG_ORIG_TP_DPORT_ROUTER;

do we need reg_port? I guess we can just use REG_ORIG_TP_DPORT_ROUTER directly.

Regards,
Lorenzo

>  const char *reg_backend =
>  ipv6 ? REG_LB_L3_AFF_BACKEND_IP6 : REG_LB_AFF_BACKEND_IP4;
>  
> @@ -7040,7 +7043,7 @@ build_lb_affinity_lr_flows(struct hmap *lflows, struct 
> ovn_northd_lb *lb,
>  ds_put_cstr(_action_learn, "commit_lb_aff(vip = \"");
>  
>  if (lb_vip->vip_port) {
> -ds_put_format(_action_learn, ipv6 ? "[%s]:%d" : "%s:%d",
> +ds_put_format(_action_learn, ipv6 ? "[%s]:%"PRIu16 : 
> "%s:%"PRIu16,
>lb_vip->vip_str, lb_vip->vip_port);
>  } else {
>  ds_put_cstr(_action_learn, lb_vip->vip_str);
> @@ -7053,9 +7056,17 @@ build_lb_affinity_lr_flows(struct hmap *lflows, struct 
> ovn_northd_lb *lb,
>  ds_put_cstr(_action_learn, "\", backend = \"");
>  
>  /* Prepare common part of affinity learn match. */
> -ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
> -  "ct.new && %s && %s == %s && %s.dst == ", ip_match,
> -  reg_vip, lb_vip->vip_str, ip_match);
> +if (lb_vip->vip_port) {
> +ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
> +  "ct.new && %s && %s == %s && "
> +  "%s == %"PRIu16" && %s.dst == ", ip_match,
> +  reg_vip, lb_vip->vip_str,
> +  reg_port, lb_vip->vip_port, ip_match);
> +} else {
> +ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
> +  "ct.new && %s && %s == %s && %s.dst == ", ip_match,
> +  reg_vip, lb_vip->vip_str, ip_match);
> +}
>  
>  /* Prepare common part of affinity match. */
>  ds_put_format(_match, REGBIT_KNOWN_LB_SESSION" == 1 && "
> @@ -7172,13 +7183,15 @@ build_lb_affinity_lr_flows(struct hmap *lflows, 
> struct ovn_northd_lb *lb,
>   *   table=ls_in_lb_aff_learn, priority=100
>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
>   * && ct.new && ip4
> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B1 && tcp.dst == BP1)
> + * && REG_ORIG_DIP_IPV4 == V && REG_ORIG_TP_DPORT == VP
> + * && ip4.dst == B1 && tcp.dst == BP1)
>   *  action=(commit_lb_aff(vip = "V:VP", backend = "B1:BP1",
>   *proto = tcp, timeout = T));
>   *   table=ls_in_lb_aff_learn, priority=100
>   *  match=(REGBIT_KNOWN_LB_SESSION == 0
>   * && ct.new && ip4
> - * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B2 && tcp.dst == BP2)
> + * && REG_ORIG_DIP_IPV4 == V && REG_ORIG_TP_DPORT 

Re: [ovs-dev] [PATCH v2] rhel: move conf.db to /var/lib/openvswitch, using symlinks

2022-12-05 Thread Ilya Maximets
On 12/4/22 09:23, Roi Dayan wrote:
> 
> 
> On 30/11/2022 17:55, Ilya Maximets wrote:
>> On 11/14/22 20:41, Timothy Redaelli wrote:
>>> conf.db is by default at /etc/openvswitch, but it should be at
>>> /var/lib/openvswitch like on Debian or like ovnnb_db.db and ovnsb_db.db.
>>>
>>> If conf.db already exists in /etc/openvswitch then it's moved to
>>> /var/lib/openvswitch.
>>> Symlinks are created for conf.db and .conf.db.~lock~ into /etc/openvswitch
>>> for backward compatibility.
>>>
>>> Reported-at: 
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.redhat.com%2F1830857data=05%7C01%7Croid%40nvidia.com%7Cd69116141ff645fc2c7308dad2eb4612%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638054205222362304%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=%2BIcIVZBKrfhIpq%2B6r6I3QvjdZ9KvjLsrRSlvi9kFHzc%3Dreserved=0
>>> Reported-by: Yedidyah Bar David 
>>> Signed-off-by: Timothy Redaelli 
>>> ---
>>> v1 -> v2:
>>> - Use hugetlbfs group instead of openvswitch when the package is built
>>>   with dpdk (as reported by Flavio)
>>> ---
>>>  rhel/openvswitch-fedora.spec.in | 27 +++
>>>  1 file changed, 23 insertions(+), 4 deletions(-)
>>
>> If that works for Fedora, then LGTM.  Applied.
>>
>> Thanks!
>> Best regards, Ilya Maximets.
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-devdata=05%7C01%7Croid%40nvidia.com%7Cd69116141ff645fc2c7308dad2eb4612%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C638054205222362304%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=fZZh4iYeUu%2BL2%2F%2FWTIgPNzpvfhpe%2F9MANkVPLmv57aY%3Dreserved=0
> 
> 
> hi,
> 
> This commit expose some kind of issue and cause openvswitch not
> to start on clean systems.
> 
> If old conf.db file didn't exists it creates an empty conf.db with
> the touch command.
> Empty conf.db cause ovsdb-server not to start.
> 
> #  /usr/share/openvswitch/scripts/ovs-ctl start
> ovsdb-tool: ovsdb error: /etc/openvswitch/conf.db: cannot identify file type
> Starting ovsdb-server ovsdb-server: ovsdb error: /etc/openvswitch/conf.db: 
> cannot identify file type
>[FAILED]
> 
> If I remove the conf.db file (can leave the symbolic link in /etc)
> then ovs starts fine.
> # rm /var/lib/openvswitch/conf.db
> #  /usr/share/openvswitch/scripts/ovs-ctl start
> /etc/openvswitch/conf.db does not exist ... (warning).
> Creating empty database /etc/openvswitch/conf.db   [  OK  ]
> Starting ovsdb-server  [  OK  ]
> system ID not configured, please use --system-id ... failed!
> Configuring Open vSwitch system IDs[  OK  ]
> Starting ovs-vswitchd  [  OK  ]
> Enabling remote OVSDB managers [  OK  ]
> 
> 
> I'm not sure where it's better to fix this. either the spec here
> not to create an empty file or in ovsdb/log.c to an accept empty conf.db,
> or maybe even upgrade_db() in ovs-lib bash file to call create_db
> even if conf.db exists but it's empty.

Thanks, Roi, for the report!
I think, fixing the spec should be the right approach here.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] dpdk: Update to use v22.11.

2022-12-05 Thread David Marchand
Hi Ian,

On Wed, Nov 30, 2022 at 4:32 PM Ian Stokes  wrote:
>
> This commit add support to for DPDK v22.11, it includes the following
> changes.
>
> 1. ci: Reduce DPDK compilation time.
> 2. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.
>
>http://patchwork.ozlabs.org/project/openvswitch/list/?series=316528
>
> 3. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.
>
>http://patchwork.ozlabs.org/project/openvswitch/list/?series=311332
>
> 4. netdev-dpdk: Report device bus specific information.
> 5. netdev-dpdk: Drop reference to Rx header split.
>
>http://patchwork.ozlabs.org/project/openvswitch/list/?series=321808
>
> In addition documentation was also updated in this commit for use with
> DPDK v22.11.
>
> For credit all authors of the original commits to 'dpdk-latest' with the
> above changes have been added as co-authors for this commit
>
> Signed-off-by: David Marchand 
> Co-authored-by: David Marchand 
> Signed-off-by: Sunil Pai G 
> Co-authored-by: Sunil Pai G 
> Signed-off-by: Ian Stokes 
>
> ---
> v2 -> v3
> * Remove RFC status.
> * Update debian control to use 22.11.
>
> v1 -> v2
> * Updated to use DPDK 22.11 rc4.
>
> * Please Note: Although DPDK documentation has been updated in this patch
> the resource has not been updated on the DPDK site as of yet, this will
> be expected as part of DPDK 22.11 final release.
>
> * The GitHub actions 'linux deb shared dpdk' is expected to fail with this
> patch as DPDK 22.11 is not part of the package structure yet.


> ---
>  .ci/linux-build.sh   |  7 ++-
>  Documentation/faq/releases.rst   |  2 +-
>  Documentation/intro/install/dpdk.rst | 16 +++---
>  Documentation/topics/dpdk/phy.rst|  8 +--

We are missing some updates in the documentation:

Documentation/topics/dpdk/vdev.rst:__
https://doc.dpdk.org/guides-21.11/nics/overview.html
Documentation/topics/dpdk/vhost-user.rst:`__
Documentation/topics/testing.rst:.. _Configure hugepages:
https://doc.dpdk.org/guides-21.11/linux_gsg/sys_reqs.html
Documentation/topics/userspace-tso.rst:__
https://doc.dpdk.org/guides-21.11/nics/overview.html

The rest lgtm.


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 2/2] tests: add unit tests to rculist

2022-12-05 Thread Mike Pattrick
On Mon, Dec 5, 2022 at 3:41 AM Adrian Moreno  wrote:
>
> Low test coverage on this area caused some errors to remain unnoticed.
> Add basic functional test of rculist.
>
> Signed-off-by: Adrian Moreno 

Looks good! Triggers a warning if the preceding patch hasn't been
applied as intended.

Acked-by: Mike Pattrick 

> ---
>  tests/automake.mk|   1 +
>  tests/library.at |   5 ++
>  tests/test-rculist.c | 205 +++
>  3 files changed, 211 insertions(+)
>  create mode 100644 tests/test-rculist.c
>
> diff --git a/tests/automake.mk b/tests/automake.mk
> index d509cf935..88f97b8b7 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -474,6 +474,7 @@ tests_ovstest_SOURCES = \
> tests/test-packets.c \
> tests/test-random.c \
> tests/test-rcu.c \
> +   tests/test-rculist.c \
> tests/test-reconnect.c \
> tests/test-rstp.c \
> tests/test-sflow.c \
> diff --git a/tests/library.at b/tests/library.at
> index bafb28277..164ae789d 100644
> --- a/tests/library.at
> +++ b/tests/library.at
> @@ -27,6 +27,11 @@ AT_CHECK([ovstest test-hindex], [0], [.
>  ])
>  AT_CLEANUP
>
> +AT_SETUP([test rcu linked lists])
> +AT_CHECK([ovstest test-rculist], [0], [.
> +])
> +AT_CLEANUP
> +
>  AT_SETUP([cuckoo hash])
>  AT_KEYWORDS([cmap])
>  AT_CHECK([ovstest test-cmap check 1], [0], [...
> diff --git a/tests/test-rculist.c b/tests/test-rculist.c
> new file mode 100644
> index 0..49fe434ff
> --- /dev/null
> +++ b/tests/test-rculist.c
> @@ -0,0 +1,205 @@
> +#include 
> +#undef NDEBUG
> +#include 
> +
> +#include "ovstest.h"
> +#include "rculist.h"
> +#include "openvswitch/list.h"
> +#include "ovs-thread.h"
> +#include "random.h"
> +#include "util.h"
> +
> +enum { MAX_ELEMS = 10, MAX_CHECKS = 200 };
> +
> +/* Sample list element. */
> +struct element {
> +int value;
> +struct rculist node;
> +};
> +
> +/* Continuously check the integrity of the list until it's empty. */
> +static void *
> +checker_main(void *aux)
> +{
> +struct element *elem;
> +struct rculist *list = (struct rculist *) aux;
> +bool checked = false;
> +
> +for (int i = 0; i < MAX_CHECKS; i++) {
> +int value = -1;
> +RCULIST_FOR_EACH (elem, node, list) {
> +ovs_assert(value <= elem->value);
> +ovs_assert(elem->value < MAX_ELEMS);
> +value = elem->value;
> +if (!checked) {
> +checked = true;
> +}
> +usleep(10);
> +}
> +
> +ovsrcu_quiesce();
> +
> +if (checked && rculist_is_empty(list)) {
> +break;
> +}
> +}
> +return NULL;
> +}
> +
> +/* Run test while a thread checks the integrity of the list.
> + * Tests must end up emptying the list. */
> +static void
> +run_test_while_checking(void (*function)(struct rculist *list))
> +{
> +struct rculist list;
> +pthread_t checker;
> +
> +rculist_init();
> +
> +checker = ovs_thread_create("checker", checker_main, );
> +function();
> +
> +ovs_assert(rculist_is_empty());
> +ovsrcu_quiesce();
> +xpthread_join(checker, NULL);
> +printf(".");
> +}
> +
> +static void
> +test_rculist_insert_delete__(struct rculist *list, bool long_version)
> +{
> +struct element *elem;
> +int value;
> +
> +for (int i = 1; i < MAX_ELEMS; i++) {
> +elem = xmalloc(sizeof *elem);
> +elem->value = i;
> +rculist_insert(list, >node);
> +/* Leave some time for checkers to iterate through. */
> +usleep(random_range(1000));
> +}
> +
> +ovsrcu_quiesce();
> +
> +value = MAX_ELEMS;
> +RCULIST_FOR_EACH_REVERSE_PROTECTED (elem, node, list) {
> +ovs_assert (elem->value <= value);
> +value = elem->value;
> +}
> +
> +if (long_version) {
> +struct element *next;
> +RCULIST_FOR_EACH_SAFE_PROTECTED (elem, next, node, list) {
> +rculist_remove(>node);
> +ovsrcu_postpone(free, elem);
> +/* Leave some time for checkers to iterate through. */
> +usleep(random_range(1000));
> +}
> +} else {
> +RCULIST_FOR_EACH_SAFE_PROTECTED (elem, node, list) {
> +rculist_remove(>node);
> +ovsrcu_postpone(free, elem);
> +/* Leave some time for checkers to iterate through. */
> +usleep(random_range(1000));
> +}
> +}
> +}
> +
> +static void
> +test_rculist_insert_delete(struct rculist *list) {
> +test_rculist_insert_delete__(list, false);
> +}
> +
> +static void
> +test_rculist_insert_delete_long(struct rculist *list) {
> +test_rculist_insert_delete__(list, true);
> +}
> +
> +static void
> +test_rculist_push_front_pop_back(struct rculist *list)
> +{
> +struct element *elem;
> +
> +for (int i = MAX_ELEMS - 1; i > 0; i--) {
> +elem = xmalloc(sizeof *elem);
> +elem->value = i;
> +

Re: [ovs-dev] [PATCH v1 1/2] rculist: use rculist_back_protected to access prev

2022-12-05 Thread Mike Pattrick
On Mon, Dec 5, 2022 at 3:41 AM Adrian Moreno  wrote:
>
> The .prev member of a rculist should not be used directly by users
> because it's not rcu-safe. A convenient fake mutex (rculist_fake_mutex)
> helps ensuring that in conjunction with clang's thread safety
> extensions.
>
> Only writers with exclusive access to the rculist should access .prev
> via some of the provided *_protected() accessors.
>
> Use rculist_back_protected() in REVERSE_PROTECTED iterators to avoid
> clang's compilation warning.
>
> Signed-off-by: Adrian Moreno 

This fixes the clang warning.

Acked-by: Mike Pattrick 

> ---
>  lib/rculist.h | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/lib/rculist.h b/lib/rculist.h
> index 9bb8cbf3e..6df963eb2 100644
> --- a/lib/rculist.h
> +++ b/lib/rculist.h
> @@ -378,12 +378,14 @@ rculist_is_singleton_protected(const struct rculist 
> *list)
>   UPDATE_MULTIVAR(ITER, rculist_next(ITER_VAR(ITER
>
>  #define RCULIST_FOR_EACH_REVERSE_PROTECTED(ITER, MEMBER, RCULIST)
>  \
> -for (INIT_MULTIVAR(ITER, MEMBER, (RCULIST)->prev, struct rculist);   
>  \
> +for (INIT_MULTIVAR(ITER, MEMBER, rculist_back_protected(RCULIST),
>  \
> +   struct rculist);  
>  \
>   CONDITION_MULTIVAR(ITER, MEMBER, ITER_VAR(ITER) != (RCULIST));  
>  \
> - UPDATE_MULTIVAR(ITER, ITER_VAR(ITER)->prev))
> + UPDATE_MULTIVAR(ITER, rculist_back_protected(ITER_VAR(ITER
>
>  #define RCULIST_FOR_EACH_REVERSE_PROTECTED_CONTINUE(ITER, MEMBER, RCULIST)   
>  \
> -for (INIT_MULTIVAR(ITER, MEMBER, (ITER)->MEMBER.prev, struct rculist);   
>  \
> +for (INIT_MULTIVAR(ITER, MEMBER, rculist_back_protected(ITER->MEMBER),   
>  \
> +   struct rculist);  
>  \
>   CONDITION_MULTIVAR(ITER, MEMBER, ITER_VAR(ITER) != (RCULIST));  
>  \
>   UPDATE_MULTIVAR(ITER, ITER_VAR(ITER)->prev))
>
> --
> 2.38.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] northd: Include VIP port in LB affinity learn flow matches.

2022-12-05 Thread Dumitru Ceara
For the case when multiple LBs (same VIP but different port) share the
same subset of backends we need to differentiate between them by also
matching on the L4 port.  Without that affinity configuration from one
load balancer might be incorrectly applied to another.

Adapt the unit and system tests to cover this scenario too.

Fixes: d3926b433e44 ("northd: rely on new actions for lb affinity")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2150533
Reported-by: Surya Seetharaman 
Signed-off-by: Dumitru Ceara 
---
 northd/northd.c | 48 +++---
 tests/ovn-northd.at |  8 +++
 tests/system-ovn.at | 57 -
 3 files changed, 95 insertions(+), 18 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 74facce7ac..27047ff74b 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -6984,13 +6984,15 @@ build_lb_rules_pre_stateful(struct hmap *lflows, struct 
ovn_northd_lb *lb,
  *   table=lr_in_lb_aff_learn, priority=100
  *  match=(REGBIT_KNOWN_LB_SESSION == 0
  * && ct.new && ip4
- * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B1 && tcp.dst == BP1)
+ * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
+ * && ip4.dst == B1 && tcp.dst == BP1)
  *  action=(commit_lb_aff(vip = "V:VP", backend = "B1:BP1",
  *proto = tcp, timeout = T));
  *   table=lr_in_lb_aff_learn, priority=100
  *  match=(REGBIT_KNOWN_LB_SESSION == 0
  * && ct.new && ip4
- * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B2 && tcp.dst == BP2)
+ * && REG_NEXT_HOP_IPV4 == V && REG_ORIG_TP_DPORT_ROUTER = VP
+ * && ip4.dst == B2 && tcp.dst == BP2)
  *  action=(commit_lb_aff(vip = "V:VP", backend = "B2:BP2",
  *proto = tcp, timeout = T));
  *
@@ -7032,6 +7034,7 @@ build_lb_affinity_lr_flows(struct hmap *lflows, struct 
ovn_northd_lb *lb,
 const char *ip_match = ipv6 ? "ip6" : "ip4";
 
 const char *reg_vip = ipv6 ? REG_NEXT_HOP_IPV6 : REG_NEXT_HOP_IPV4;
+const char *reg_port = REG_ORIG_TP_DPORT_ROUTER;
 const char *reg_backend =
 ipv6 ? REG_LB_L3_AFF_BACKEND_IP6 : REG_LB_AFF_BACKEND_IP4;
 
@@ -7040,7 +7043,7 @@ build_lb_affinity_lr_flows(struct hmap *lflows, struct 
ovn_northd_lb *lb,
 ds_put_cstr(_action_learn, "commit_lb_aff(vip = \"");
 
 if (lb_vip->vip_port) {
-ds_put_format(_action_learn, ipv6 ? "[%s]:%d" : "%s:%d",
+ds_put_format(_action_learn, ipv6 ? "[%s]:%"PRIu16 : "%s:%"PRIu16,
   lb_vip->vip_str, lb_vip->vip_port);
 } else {
 ds_put_cstr(_action_learn, lb_vip->vip_str);
@@ -7053,9 +7056,17 @@ build_lb_affinity_lr_flows(struct hmap *lflows, struct 
ovn_northd_lb *lb,
 ds_put_cstr(_action_learn, "\", backend = \"");
 
 /* Prepare common part of affinity learn match. */
-ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
-  "ct.new && %s && %s == %s && %s.dst == ", ip_match,
-  reg_vip, lb_vip->vip_str, ip_match);
+if (lb_vip->vip_port) {
+ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
+  "ct.new && %s && %s == %s && "
+  "%s == %"PRIu16" && %s.dst == ", ip_match,
+  reg_vip, lb_vip->vip_str,
+  reg_port, lb_vip->vip_port, ip_match);
+} else {
+ds_put_format(_match_learn, REGBIT_KNOWN_LB_SESSION" == 0 && "
+  "ct.new && %s && %s == %s && %s.dst == ", ip_match,
+  reg_vip, lb_vip->vip_str, ip_match);
+}
 
 /* Prepare common part of affinity match. */
 ds_put_format(_match, REGBIT_KNOWN_LB_SESSION" == 1 && "
@@ -7172,13 +7183,15 @@ build_lb_affinity_lr_flows(struct hmap *lflows, struct 
ovn_northd_lb *lb,
  *   table=ls_in_lb_aff_learn, priority=100
  *  match=(REGBIT_KNOWN_LB_SESSION == 0
  * && ct.new && ip4
- * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B1 && tcp.dst == BP1)
+ * && REG_ORIG_DIP_IPV4 == V && REG_ORIG_TP_DPORT == VP
+ * && ip4.dst == B1 && tcp.dst == BP1)
  *  action=(commit_lb_aff(vip = "V:VP", backend = "B1:BP1",
  *proto = tcp, timeout = T));
  *   table=ls_in_lb_aff_learn, priority=100
  *  match=(REGBIT_KNOWN_LB_SESSION == 0
  * && ct.new && ip4
- * && REG_ORIG_DIP_IPV4 == V && ip4.dst == B2 && tcp.dst == BP2)
+ * && REG_ORIG_DIP_IPV4 == V && REG_ORIG_TP_DPORT == VP
+ * && ip4.dst == B2 && tcp.dst == BP2)
  *  action=(commit_lb_aff(vip = "V:VP", backend = "B2:BP2",
  *proto = tcp, timeout = T));
  *
@@ -7236,6 +7249,7 @@ build_lb_affinity_ls_flows(struct hmap *lflows, struct 
ovn_northd_lb *lb,
 const char *ip_match = ipv6 ? "ip6" : "ip4";
 
 const char *reg_vip = ipv6 ? REG_ORIG_DIP_IPV6 : 

Re: [ovs-dev] [PATCH v1] ovs-ctl: Allow inclusion of hugepages in coredumps

2022-12-05 Thread Ilya Maximets
On 12/5/22 13:46, David Marchand wrote:
> On Fri, Dec 2, 2022 at 7:00 PM Mike Pattrick  wrote:
>>> Did you consider madvise()?
>>>
>>> MADV_DONTDUMP (since Linux 3.4)
>>>Exclude from a core dump those pages in the range
>>> specified by addr and length.  This is useful in applications that
>>> have large areas of memory that are known not to be useful in a core
>>> dump.  The effect of  MADV_DONT‐
>>>DUMP takes precedence over the bit mask that is set via
>>> the /proc/[pid]/coredump_filter file (see core(5)).
>>>
>>> MADV_DODUMP (since Linux 3.4)
>>>Undo the effect of an earlier MADV_DONTDUMP.
>>
>>> I guess, while DONTDUMP calls are mainly harmless, the explicit DODUMP
>>> will override whatever user had in their global configuration.  Meaning
>>> every DPDK application with vhost ports will start dumping some of the
>>> guest pages with no actual ability to turn that off.
>>
>> I initially thought it would work that way, but the DODUMP flag just
>> disables the DONTDUMP flag.
>>
>> https://github.com/torvalds/linux/blob/master/mm/madvise.c#L1055
>> https://github.com/torvalds/linux/blob/master/fs/coredump.c#L1033
>>
> 
> Glad to read that the manual tells the same story than the kernel code :-).

Manuals, pfff.  I seem to automatically skip them even if quoted directly
in the thread. :D

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] ovs-ctl: Allow inclusion of hugepages in coredumps

2022-12-05 Thread David Marchand
On Fri, Dec 2, 2022 at 9:14 PM Mike Pattrick  wrote:
>
> On Fri, Dec 2, 2022 at 1:40 PM Ilya Maximets  wrote:
> >
> > On 12/2/22 18:59, Mike Pattrick wrote:
> > > On Fri, Dec 2, 2022 at 11:59 AM Ilya Maximets  wrote:
> > >>
> > >> On 12/2/22 11:36, Maxime Coquelin wrote:
> > >>>
> > >>>
> > >>> On 12/2/22 11:09, David Marchand wrote:
> >  On Wed, Nov 30, 2022 at 9:30 PM Ilya Maximets  
> >  wrote:
> > > Shouldn't this be 0x7f instead?
> > > 0x3f doesn't enable bit #6, which is responsible for dumping
> > > shared huge pages.  Or am I missing something?
> > 
> >  That's a good point, the hugepage may or may not be private. I'll 
> >  send
> >  in a new one.
> > >>>
> > >>> OK.  One thing to think about though is that we'll grab
> > >>> VM memory, I guess, in case we have vhost-user ports.
> > >>> So, the core dump size can become insanely huge.
> > >>>
> > >>> The downside of not having them is inability to inspect
> > >>> virtqueues and stuff in the dump.
> > >>
> > >> Did you consider madvise()?
> > >>
> > >> MADV_DONTDUMP (since Linux 3.4)
> > >>Exclude from a core dump those pages in the range
> > >> specified by addr and length.  This is useful in applications that
> > >> have large areas of memory that are known not to be useful in a core
> > >> dump.  The effect of  MADV_DONT‐
> > >>DUMP takes precedence over the bit mask that is set 
> > >> via
> > >> the /proc/[pid]/coredump_filter file (see core(5)).
> > >>
> > >> MADV_DODUMP (since Linux 3.4)
> > >>Undo the effect of an earlier MADV_DONTDUMP.
> > >
> > > I don't think OVS actually knows location of particular VM memory
> > > pages that we do not need.  And dumping virtqueues and stuff is,
> > > probably, the point of this patch (?).
> > >
> > > vhost-user library might have a better idea on which particular parts
> > > of the memory guest may use for virtqueues and buffers, but I'm not
> > > 100% sure.
> > 
> >  Yes, distinguishing hugepages of interest is a problem.
> > 
> >  Since v20.05, DPDK mem allocator takes care of excluding (unused)
> >  hugepages from dump.
> >  So with this OVS patch, if we catch private and shared hugepages,
> >  "interesting" DPDK hugepages will get dumped, which is useful for
> >  debugging post mortem.
> > 
> >  Adding Maxime, who will have a better idea of what is possible for the
> >  guest mapping part.
> > 
> > 
> > >>>
> > >>> I wonder if we could do a MADV_DONTDUMP on all the guest memory at mmap
> > >>> time, then there are two cases:
> > >>>   a. vIOMMU = OFF. In this case we could do MADV_DODUMP on virtqueues
> > >>> memory. Doing so, we would have the rings memory, but not their buffers
> > >>> (except if they are located on same hugepages).
> > >>>   b. vIOMMU = ON. In this case we could do MADV_DODUMP on IOTLB_UPDATE
> > >>> new entries and MADV_DONTDUMP on invalidated entries. Doing so we will
> > >>> get both vrings and their buffers the backend is allowed to access.
> > >>
> > >> I guess, while DONTDUMP calls are mainly harmless, the explicit DODUMP
> > >> will override whatever user had in their global configuration.  Meaning
> > >> every DPDK application with vhost ports will start dumping some of the
> > >> guest pages with no actual ability to turn that off.
> > >
> > > I initially thought it would work that way, but the DODUMP flag just
> > > disables the DONTDUMP flag.
> > >
> > > https://github.com/torvalds/linux/blob/master/mm/madvise.c#L1055
> > > https://github.com/torvalds/linux/blob/master/fs/coredump.c#L1033
> >
> > Hmm, interesting.  Makes sense.
> >
> > Thanks for the pointers!
> >
> > So, it should still be 7f regardless in the coredump filter for OVS, right?
> > Do you plan to update the current patch or do you think we should omit
> > shared pages until support for MADV_DO/DONTDUMP is added to vhost library?
> >
> > Note that this will likely not be available in 22.11 as it's not a bug fix.
> > So, 23.11 at the earliest.
> >
> > Basically 2 options:
> >
> > 1. 0x3f and not having shared pages.  Flip to 0x7f with DPDK 23.11 next 
> > year.
> >Pros: Smaller files
> >Cons: Missing some of the virtqueue memory until [potentially] DPDK 
> > 23.11.

Mm, if someone still has some --socket-mem config, then I guess shared
hugepages will be in use in DPDK.


> >
> > 2. 0x7f today.
> >Pros: All the memory is available.
> >Cons: [Significantly] larger files until [potentially] DPDK 23.11.
> >
> > What do you think?  David, Maxime?
>
> I'd prefer 7f today. It's disabled by default, has zero impact on end
> users, makes setting up debugging environments more convenient, and on
> distributions with systemd the larger coredumps are managed somewhat
> automatically. The news item already 

Re: [ovs-dev] [PATCH v1] ovs-ctl: Allow inclusion of hugepages in coredumps

2022-12-05 Thread Ilya Maximets
On 12/2/22 21:14, Mike Pattrick wrote:
> On Fri, Dec 2, 2022 at 1:40 PM Ilya Maximets  wrote:
>>
>> On 12/2/22 18:59, Mike Pattrick wrote:
>>> On Fri, Dec 2, 2022 at 11:59 AM Ilya Maximets  wrote:

 On 12/2/22 11:36, Maxime Coquelin wrote:
>
>
> On 12/2/22 11:09, David Marchand wrote:
>> On Wed, Nov 30, 2022 at 9:30 PM Ilya Maximets  wrote:
>>> Shouldn't this be 0x7f instead?
>>> 0x3f doesn't enable bit #6, which is responsible for dumping
>>> shared huge pages.  Or am I missing something?
>>
>> That's a good point, the hugepage may or may not be private. I'll 
>> send
>> in a new one.
>
> OK.  One thing to think about though is that we'll grab
> VM memory, I guess, in case we have vhost-user ports.
> So, the core dump size can become insanely huge.
>
> The downside of not having them is inability to inspect
> virtqueues and stuff in the dump.

 Did you consider madvise()?

 MADV_DONTDUMP (since Linux 3.4)
Exclude from a core dump those pages in the range
 specified by addr and length.  This is useful in applications that
 have large areas of memory that are known not to be useful in a core
 dump.  The effect of  MADV_DONT‐
DUMP takes precedence over the bit mask that is set via
 the /proc/[pid]/coredump_filter file (see core(5)).

 MADV_DODUMP (since Linux 3.4)
Undo the effect of an earlier MADV_DONTDUMP.
>>>
>>> I don't think OVS actually knows location of particular VM memory
>>> pages that we do not need.  And dumping virtqueues and stuff is,
>>> probably, the point of this patch (?).
>>>
>>> vhost-user library might have a better idea on which particular parts
>>> of the memory guest may use for virtqueues and buffers, but I'm not
>>> 100% sure.
>>
>> Yes, distinguishing hugepages of interest is a problem.
>>
>> Since v20.05, DPDK mem allocator takes care of excluding (unused)
>> hugepages from dump.
>> So with this OVS patch, if we catch private and shared hugepages,
>> "interesting" DPDK hugepages will get dumped, which is useful for
>> debugging post mortem.
>>
>> Adding Maxime, who will have a better idea of what is possible for the
>> guest mapping part.
>>
>>
>
> I wonder if we could do a MADV_DONTDUMP on all the guest memory at mmap
> time, then there are two cases:
>   a. vIOMMU = OFF. In this case we could do MADV_DODUMP on virtqueues
> memory. Doing so, we would have the rings memory, but not their buffers
> (except if they are located on same hugepages).
>   b. vIOMMU = ON. In this case we could do MADV_DODUMP on IOTLB_UPDATE
> new entries and MADV_DONTDUMP on invalidated entries. Doing so we will
> get both vrings and their buffers the backend is allowed to access.

 I guess, while DONTDUMP calls are mainly harmless, the explicit DODUMP
 will override whatever user had in their global configuration.  Meaning
 every DPDK application with vhost ports will start dumping some of the
 guest pages with no actual ability to turn that off.
>>>
>>> I initially thought it would work that way, but the DODUMP flag just
>>> disables the DONTDUMP flag.
>>>
>>> https://github.com/torvalds/linux/blob/master/mm/madvise.c#L1055
>>> https://github.com/torvalds/linux/blob/master/fs/coredump.c#L1033
>>
>> Hmm, interesting.  Makes sense.
>>
>> Thanks for the pointers!
>>
>> So, it should still be 7f regardless in the coredump filter for OVS, right?
>> Do you plan to update the current patch or do you think we should omit
>> shared pages until support for MADV_DO/DONTDUMP is added to vhost library?
>>
>> Note that this will likely not be available in 22.11 as it's not a bug fix.
>> So, 23.11 at the earliest.
>>
>> Basically 2 options:
>>
>> 1. 0x3f and not having shared pages.  Flip to 0x7f with DPDK 23.11 next year.
>>Pros: Smaller files
>>Cons: Missing some of the virtqueue memory until [potentially] DPDK 23.11.
>>
>> 2. 0x7f today.
>>Pros: All the memory is available.
>>Cons: [Significantly] larger files until [potentially] DPDK 23.11.
>>
>> What do you think?  David, Maxime?
> 
> I'd prefer 7f today. It's disabled by default, has zero impact on end
> users, makes setting up debugging environments more convenient, and on
> distributions with systemd the larger coredumps are managed somewhat
> automatically. The news item already warns about large coredumps.
> 
> WDYT?

Sounds good to me.

> 
> -M
> 
>>>
>>> Cheers,
>>> M
>>>

 Can the behavior be configurable?

>
> I can prepare a PoC quickly if someone is willing to experiment.
>
> Regards,
> Maxime
>
>

>>>
>>
> 

___

Re: [ovs-dev] [PATCH v1] ovs-ctl: Allow inclusion of hugepages in coredumps

2022-12-05 Thread David Marchand
On Fri, Dec 2, 2022 at 7:00 PM Mike Pattrick  wrote:
> >  Did you consider madvise()?
> > 
> >  MADV_DONTDUMP (since Linux 3.4)
> > Exclude from a core dump those pages in the range
> >  specified by addr and length.  This is useful in applications that
> >  have large areas of memory that are known not to be useful in a core
> >  dump.  The effect of  MADV_DONT‐
> > DUMP takes precedence over the bit mask that is set via
> >  the /proc/[pid]/coredump_filter file (see core(5)).
> > 
> >  MADV_DODUMP (since Linux 3.4)
> > Undo the effect of an earlier MADV_DONTDUMP.
> > >>>
> > I guess, while DONTDUMP calls are mainly harmless, the explicit DODUMP
> > will override whatever user had in their global configuration.  Meaning
> > every DPDK application with vhost ports will start dumping some of the
> > guest pages with no actual ability to turn that off.
>
> I initially thought it would work that way, but the DODUMP flag just
> disables the DONTDUMP flag.
>
> https://github.com/torvalds/linux/blob/master/mm/madvise.c#L1055
> https://github.com/torvalds/linux/blob/master/fs/coredump.c#L1033
>

Glad to read that the manual tells the same story than the kernel code :-).


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v6] odp-execute: Add ISA implementation of set_masked IPv6 action

2022-12-05 Thread Ilya Maximets
On 12/5/22 12:53, Finn, Emma wrote:
> 
> 
>> -Original Message-
>> From: Ilya Maximets 
>> Sent: Friday 2 December 2022 14:22
>> To: Finn, Emma ; d...@openvswitch.org
>> Cc: i.maxim...@ovn.org; Van Haaren, Harry ;
>> echau...@redhat.com; Stokes, Ian 
>> Subject: Re: [v6] odp-execute: Add ISA implementation of set_masked IPv6
>> action
>>
>> On 11/30/22 16:57, Emma Finn wrote:
>>> This commit adds support for the AVX512 implementation of the
>>> ipv6_set_addrs action as well as an AVX512 implementation of updating
>>> the L4 checksums.
>>>
>>> Signed-off-by: Emma Finn 
>>
>> Hi.  Thanks for the updated version!
>> Could you also provide some performance numbers in the commit message?
>> Performance related patches should typically have some.
>>
> Yes, I will add some relative performance numbers when I send out the next 
> version.

Thanks!

> 
>> Some comments inline.  There is also a bug in ipv4 implementation.
>>
>>>
>>> ---
>>> v6:
>>>  - Added check for ipv6 extension headers.
>>> v5:
>>>   - Fixed load for ip6 src and dst mask for checksum check.
>>> v4:
>>>   - Reworked and moved check for checksum outside loop.
>>>   - Code cleanup based on review from Eelco.
>>> v3:
>>>   - Added a runtime check for AVX512 vbmi.
>>> v2:
>>>   - Added check for availbility of s6_addr32 field of struct in6_addr.
>>>   - Fixed network headers for freebsd builds.
>>> ---
>>> ---
> 
> 
>>> +static inline uint16_t ALWAYS_INLINE
>>> +__attribute__((__target__("avx512vbmi")))
>>> +avx512_ipv6_addr_csum_delta(__m512i old_header, __m512i
>> new_header) {
>>> +uint16_t old_delta = avx512_ipv6_sum_header(old_header);
>>> +uint16_t new_delta = avx512_ipv6_sum_header(new_header);
>>> +uint32_t csum_delta = ((uint16_t)~old_delta) + new_delta;
>>
>> Is the cast necessary here?  The 'old_delta' is uint16_t.  The bit inversion
>> should not change the type, right?
>>
> Yes cast is necessary here. 
> Bit inversion doesn't change type but the addition with result being saved
> to a 32-bit does. Without cast, delta is incorrect

Hmm, OK.  Please, add a space between the cast and the inversion then,
as Eelco suggested in his diff for v4.

> 
>>> +
>>> +return  ~csum_finish(csum_delta);
>>
>> One too many spaces after 'return'.
>>
>>> +}
>>> +
>>> +/* This function performs the same operation on each packet in the
>>> +batch as
>>> + * the scalar odp_set_ipv6() function. */ static void
>>> +__attribute__((__target__("avx512vbmi")))
>>> +action_avx512_set_ipv6(struct dp_packet_batch *batch, const struct
>>> +nlattr *a) {
>>> +const struct ovs_key_ipv6 *key, *mask;
>>> +struct dp_packet *packet;
>>> +
>>> +a = nl_attr_get(a);
>>> +key = nl_attr_get(a);
>>> +mask = odp_get_key_mask(a, struct ovs_key_ipv6);
>>> +
>>> +/* Read the content of the key and mask in the respective registers. We
>>> + * only load the size of the actual structure, which is only 40 bytes. 
>>> */
>>> +__m512i v_key = _mm512_maskz_loadu_epi64(0x1F, (void *) key);
>>> +__m512i v_mask = _mm512_maskz_loadu_epi64(0x1F, (void *) mask);
>>> +
>>> +/* This shuffle mask v_shuffle, is to shuffle key and mask to match the
>>> + * ip6_hdr structure layout. */
>>> +static const uint8_t ip_shuffle_mask[64] = {
>>> +0x20, 0x21, 0x22, 0x23, 0xFF, 0xFF, 0x24, 0x26,
>>> +0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
>>> +0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
>>> +0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
>>> +0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
>>> +0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0XFF, 0xFF, 0xFF,
>>> +0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
>>> +0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0XFF, 0xFF
>>
>> These are overindented.  Should be moved 4 spaces to the left.
>>
>>> +};
>>> +
>>> +__m512i v_shuffle = _mm512_loadu_si512((void *) ip_shuffle_mask);
>>> +
>>> +/* This shuffle is required for key and mask to match the layout of the
>>> + * ip6_hdr struct. */
>>> +__m512i v_key_shuf = _mm512_permutexvar_epi8(v_shuffle, v_key);
>>> +__m512i v_mask_shuf = _mm512_permutexvar_epi8(v_shuffle,
>> v_mask);
>>> +
>>> +/* Set the v_zero register to all zero's. */
>>> +const __m128i v_zeros = _mm_setzero_si128();
>>> +
>>> +/* Set the v_all_ones register to all one's. */
>>> +const __m128i v_all_ones = _mm_cmpeq_epi16(v_zeros, v_zeros);
>>> +
>>> +/* Load ip6 src and dst masks respectively into 128-bit wide 
>>> registers. */
>>> +__m128i v_src = _mm_loadu_si128((void *) >ipv6_src);
>>> +__m128i v_dst = _mm_loadu_si128((void *) >ipv6_dst);
>>> +
>>> +/* Perform a bitwise OR between src and dst registers. */
>>> +__m128i v_or = _mm_or_si128(v_src, v_dst);
>>> +
>>> +/* Will return true if any bit has been set in v_or, else it will 
>>> return
>>> + * false. */
>>> +bool do_checksum = !_mm_test_all_zeros(v_or, v_all_ones);
>>> +

Re: [ovs-dev] [v6] odp-execute: Add ISA implementation of set_masked IPv6 action

2022-12-05 Thread Finn, Emma



> -Original Message-
> From: Ilya Maximets 
> Sent: Friday 2 December 2022 14:22
> To: Finn, Emma ; d...@openvswitch.org
> Cc: i.maxim...@ovn.org; Van Haaren, Harry ;
> echau...@redhat.com; Stokes, Ian 
> Subject: Re: [v6] odp-execute: Add ISA implementation of set_masked IPv6
> action
> 
> On 11/30/22 16:57, Emma Finn wrote:
> > This commit adds support for the AVX512 implementation of the
> > ipv6_set_addrs action as well as an AVX512 implementation of updating
> > the L4 checksums.
> >
> > Signed-off-by: Emma Finn 
> 
> Hi.  Thanks for the updated version!
> Could you also provide some performance numbers in the commit message?
> Performance related patches should typically have some.
> 
Yes, I will add some relative performance numbers when I send out the next 
version.

> Some comments inline.  There is also a bug in ipv4 implementation.
> 
> >
> > ---
> > v6:
> >  - Added check for ipv6 extension headers.
> > v5:
> >   - Fixed load for ip6 src and dst mask for checksum check.
> > v4:
> >   - Reworked and moved check for checksum outside loop.
> >   - Code cleanup based on review from Eelco.
> > v3:
> >   - Added a runtime check for AVX512 vbmi.
> > v2:
> >   - Added check for availbility of s6_addr32 field of struct in6_addr.
> >   - Fixed network headers for freebsd builds.
> > ---
> > ---


> > +static inline uint16_t ALWAYS_INLINE
> > +__attribute__((__target__("avx512vbmi")))
> > +avx512_ipv6_addr_csum_delta(__m512i old_header, __m512i
> new_header) {
> > +uint16_t old_delta = avx512_ipv6_sum_header(old_header);
> > +uint16_t new_delta = avx512_ipv6_sum_header(new_header);
> > +uint32_t csum_delta = ((uint16_t)~old_delta) + new_delta;
> 
> Is the cast necessary here?  The 'old_delta' is uint16_t.  The bit inversion
> should not change the type, right?
> 
Yes cast is necessary here. 
Bit inversion doesn't change type but the addition with result being saved to a 
32-bit does. Without cast, delta is incorrect

> > +
> > +return  ~csum_finish(csum_delta);
> 
> One too many spaces after 'return'.
> 
> > +}
> > +
> > +/* This function performs the same operation on each packet in the
> > +batch as
> > + * the scalar odp_set_ipv6() function. */ static void
> > +__attribute__((__target__("avx512vbmi")))
> > +action_avx512_set_ipv6(struct dp_packet_batch *batch, const struct
> > +nlattr *a) {
> > +const struct ovs_key_ipv6 *key, *mask;
> > +struct dp_packet *packet;
> > +
> > +a = nl_attr_get(a);
> > +key = nl_attr_get(a);
> > +mask = odp_get_key_mask(a, struct ovs_key_ipv6);
> > +
> > +/* Read the content of the key and mask in the respective registers. We
> > + * only load the size of the actual structure, which is only 40 bytes. 
> > */
> > +__m512i v_key = _mm512_maskz_loadu_epi64(0x1F, (void *) key);
> > +__m512i v_mask = _mm512_maskz_loadu_epi64(0x1F, (void *) mask);
> > +
> > +/* This shuffle mask v_shuffle, is to shuffle key and mask to match the
> > + * ip6_hdr structure layout. */
> > +static const uint8_t ip_shuffle_mask[64] = {
> > +0x20, 0x21, 0x22, 0x23, 0xFF, 0xFF, 0x24, 0x26,
> > +0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> > +0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
> > +0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
> > +0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
> > +0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0XFF, 0xFF, 0xFF,
> > +0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
> > +0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0XFF, 0xFF
> 
> These are overindented.  Should be moved 4 spaces to the left.
> 
> > +};
> > +
> > +__m512i v_shuffle = _mm512_loadu_si512((void *) ip_shuffle_mask);
> > +
> > +/* This shuffle is required for key and mask to match the layout of the
> > + * ip6_hdr struct. */
> > +__m512i v_key_shuf = _mm512_permutexvar_epi8(v_shuffle, v_key);
> > +__m512i v_mask_shuf = _mm512_permutexvar_epi8(v_shuffle,
> v_mask);
> > +
> > +/* Set the v_zero register to all zero's. */
> > +const __m128i v_zeros = _mm_setzero_si128();
> > +
> > +/* Set the v_all_ones register to all one's. */
> > +const __m128i v_all_ones = _mm_cmpeq_epi16(v_zeros, v_zeros);
> > +
> > +/* Load ip6 src and dst masks respectively into 128-bit wide 
> > registers. */
> > +__m128i v_src = _mm_loadu_si128((void *) >ipv6_src);
> > +__m128i v_dst = _mm_loadu_si128((void *) >ipv6_dst);
> > +
> > +/* Perform a bitwise OR between src and dst registers. */
> > +__m128i v_or = _mm_or_si128(v_src, v_dst);
> > +
> > +/* Will return true if any bit has been set in v_or, else it will 
> > return
> > + * false. */
> > +bool do_checksum = !_mm_test_all_zeros(v_or, v_all_ones);
> > +
> > +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> > +struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet);
> > +
> > +/* Load the 40 bytes of the IPv6 header. */
> > +

[ovs-dev] [PATCH v1 2/2] tests: add unit tests to rculist

2022-12-05 Thread Adrian Moreno
Low test coverage on this area caused some errors to remain unnoticed.
Add basic functional test of rculist.

Signed-off-by: Adrian Moreno 
---
 tests/automake.mk|   1 +
 tests/library.at |   5 ++
 tests/test-rculist.c | 205 +++
 3 files changed, 211 insertions(+)
 create mode 100644 tests/test-rculist.c

diff --git a/tests/automake.mk b/tests/automake.mk
index d509cf935..88f97b8b7 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -474,6 +474,7 @@ tests_ovstest_SOURCES = \
tests/test-packets.c \
tests/test-random.c \
tests/test-rcu.c \
+   tests/test-rculist.c \
tests/test-reconnect.c \
tests/test-rstp.c \
tests/test-sflow.c \
diff --git a/tests/library.at b/tests/library.at
index bafb28277..164ae789d 100644
--- a/tests/library.at
+++ b/tests/library.at
@@ -27,6 +27,11 @@ AT_CHECK([ovstest test-hindex], [0], [.
 ])
 AT_CLEANUP
 
+AT_SETUP([test rcu linked lists])
+AT_CHECK([ovstest test-rculist], [0], [.
+])
+AT_CLEANUP
+
 AT_SETUP([cuckoo hash])
 AT_KEYWORDS([cmap])
 AT_CHECK([ovstest test-cmap check 1], [0], [...
diff --git a/tests/test-rculist.c b/tests/test-rculist.c
new file mode 100644
index 0..49fe434ff
--- /dev/null
+++ b/tests/test-rculist.c
@@ -0,0 +1,205 @@
+#include 
+#undef NDEBUG
+#include 
+
+#include "ovstest.h"
+#include "rculist.h"
+#include "openvswitch/list.h"
+#include "ovs-thread.h"
+#include "random.h"
+#include "util.h"
+
+enum { MAX_ELEMS = 10, MAX_CHECKS = 200 };
+
+/* Sample list element. */
+struct element {
+int value;
+struct rculist node;
+};
+
+/* Continuously check the integrity of the list until it's empty. */
+static void *
+checker_main(void *aux)
+{
+struct element *elem;
+struct rculist *list = (struct rculist *) aux;
+bool checked = false;
+
+for (int i = 0; i < MAX_CHECKS; i++) {
+int value = -1;
+RCULIST_FOR_EACH (elem, node, list) {
+ovs_assert(value <= elem->value);
+ovs_assert(elem->value < MAX_ELEMS);
+value = elem->value;
+if (!checked) {
+checked = true;
+}
+usleep(10);
+}
+
+ovsrcu_quiesce();
+
+if (checked && rculist_is_empty(list)) {
+break;
+}
+}
+return NULL;
+}
+
+/* Run test while a thread checks the integrity of the list.
+ * Tests must end up emptying the list. */
+static void
+run_test_while_checking(void (*function)(struct rculist *list))
+{
+struct rculist list;
+pthread_t checker;
+
+rculist_init();
+
+checker = ovs_thread_create("checker", checker_main, );
+function();
+
+ovs_assert(rculist_is_empty());
+ovsrcu_quiesce();
+xpthread_join(checker, NULL);
+printf(".");
+}
+
+static void
+test_rculist_insert_delete__(struct rculist *list, bool long_version)
+{
+struct element *elem;
+int value;
+
+for (int i = 1; i < MAX_ELEMS; i++) {
+elem = xmalloc(sizeof *elem);
+elem->value = i;
+rculist_insert(list, >node);
+/* Leave some time for checkers to iterate through. */
+usleep(random_range(1000));
+}
+
+ovsrcu_quiesce();
+
+value = MAX_ELEMS;
+RCULIST_FOR_EACH_REVERSE_PROTECTED (elem, node, list) {
+ovs_assert (elem->value <= value);
+value = elem->value;
+}
+
+if (long_version) {
+struct element *next;
+RCULIST_FOR_EACH_SAFE_PROTECTED (elem, next, node, list) {
+rculist_remove(>node);
+ovsrcu_postpone(free, elem);
+/* Leave some time for checkers to iterate through. */
+usleep(random_range(1000));
+}
+} else {
+RCULIST_FOR_EACH_SAFE_PROTECTED (elem, node, list) {
+rculist_remove(>node);
+ovsrcu_postpone(free, elem);
+/* Leave some time for checkers to iterate through. */
+usleep(random_range(1000));
+}
+}
+}
+
+static void
+test_rculist_insert_delete(struct rculist *list) {
+test_rculist_insert_delete__(list, false);
+}
+
+static void
+test_rculist_insert_delete_long(struct rculist *list) {
+test_rculist_insert_delete__(list, true);
+}
+
+static void
+test_rculist_push_front_pop_back(struct rculist *list)
+{
+struct element *elem;
+
+for (int i = MAX_ELEMS - 1; i > 0; i--) {
+elem = xmalloc(sizeof *elem);
+elem->value = i;
+rculist_push_front(list, >node);
+/* Leave some time for checkers to iterate through. */
+usleep(random_range(1000));
+}
+
+ovsrcu_quiesce();
+
+while (!rculist_is_empty(list)) {
+elem = CONTAINER_OF(rculist_pop_back(list), struct element, node);
+ovsrcu_postpone(free, elem);
+/* Leave some time for checkers to iterate through. */
+usleep(random_range(1000));
+}
+}
+
+static void
+test_rculist_push_back_pop_front(struct rculist *list)
+{
+

[ovs-dev] [PATCH v1 1/2] rculist: use rculist_back_protected to access prev

2022-12-05 Thread Adrian Moreno
The .prev member of a rculist should not be used directly by users
because it's not rcu-safe. A convenient fake mutex (rculist_fake_mutex)
helps ensuring that in conjunction with clang's thread safety
extensions.

Only writers with exclusive access to the rculist should access .prev
via some of the provided *_protected() accessors.

Use rculist_back_protected() in REVERSE_PROTECTED iterators to avoid
clang's compilation warning.

Signed-off-by: Adrian Moreno 
---
 lib/rculist.h | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/rculist.h b/lib/rculist.h
index 9bb8cbf3e..6df963eb2 100644
--- a/lib/rculist.h
+++ b/lib/rculist.h
@@ -378,12 +378,14 @@ rculist_is_singleton_protected(const struct rculist *list)
  UPDATE_MULTIVAR(ITER, rculist_next(ITER_VAR(ITER
 
 #define RCULIST_FOR_EACH_REVERSE_PROTECTED(ITER, MEMBER, RCULIST) \
-for (INIT_MULTIVAR(ITER, MEMBER, (RCULIST)->prev, struct rculist);\
+for (INIT_MULTIVAR(ITER, MEMBER, rculist_back_protected(RCULIST), \
+   struct rculist);   \
  CONDITION_MULTIVAR(ITER, MEMBER, ITER_VAR(ITER) != (RCULIST));   \
- UPDATE_MULTIVAR(ITER, ITER_VAR(ITER)->prev))
+ UPDATE_MULTIVAR(ITER, rculist_back_protected(ITER_VAR(ITER
 
 #define RCULIST_FOR_EACH_REVERSE_PROTECTED_CONTINUE(ITER, MEMBER, RCULIST)\
-for (INIT_MULTIVAR(ITER, MEMBER, (ITER)->MEMBER.prev, struct rculist);\
+for (INIT_MULTIVAR(ITER, MEMBER, rculist_back_protected(ITER->MEMBER),\
+   struct rculist);   \
  CONDITION_MULTIVAR(ITER, MEMBER, ITER_VAR(ITER) != (RCULIST));   \
  UPDATE_MULTIVAR(ITER, ITER_VAR(ITER)->prev))
 
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev