from:"Aaron Conole"

Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-25 Thread Aaron Conole

Jakub Kicinski  writes:

> On Thu, 25 Apr 2024 19:57:19 +0100 Simon Horman wrote:
>> openvswitch.sh does not appear to have any dependencies on Open vSwitch
>> user-space. My understanding is that, rather, it makes use of
>> tools/testing/selftests/net/openvswitch/ovs-dpctl.py to talk to the Kernel
>> using Netlink (which is also what Open vSwitch user-space does).
>> 
>> My brief testing indicates that for this the only dependencies
>> when running on Amazon Linux 2 are python3 and pyroute2.
>> 
>> I think that it should be possible to port pmtu.sh to use ovs-dpctl.py.
>> This would require some enhancements to ovs-dpctl.py to handle adding the
>> tunnel vports (interfaces).

I have some work from some time back:

https://github.com/apconole/linux-next-work/commit/61d2b9fc68a4e3fc950a24b06232c2fbcbfa0372

https://github.com/apconole/linux-next-work/commit/7262107de7170d44b6dbf6c5ea6f7e6c0bb71d36

https://github.com/apconole/linux-next-work/commit/af5338c9044660fa0eaaa967602aec706bbaeb5b

I was using it to try and build up a test to check recursions in the
datapath, but didn't get a chance to finish.

BUT most of the stuff is there, just needs to be cleaned up.  It would
at least cover the classic case, but the LWT support needs to be added.

>> As an aside, to run the Open vSwitch tests in pmtu.sh the openvswitch
>> kernel module is needed. So I think it would make sense to add
>> CONFIG_OPENVSWITCH to tools/testing/selftests/net/config.
>> 
>> That would mean that tools/testing/selftests/net/config also has all
>> the requirements to run openvswitch.sh. If so, we probably wouldn't need to
>> add tools/testing/selftests/net/openvswitch/config or otherwise do anything
>> special to configure the kernel for openvswitch.sh.
>
> That sounds great, for simplicity we could move the ovs files down 
> to the .../net/ directory. It'd be cool to not have to do that, and
> let us separate tests more clearly into directories. But right now
> every directory has its own runner so there's a high price to pay
> for a primarily aesthetic gain :(

Either one would work for me.  I will repost the RFC addressing some of
the comments.  It should be runnable in a bare VM that has the required
python packages that Simon notes (pyroute2 and python3).

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-25 Thread Aaron Conole

Simon Horman  writes:

> On Wed, Apr 24, 2024 at 02:14:09PM -0400, Aaron Conole wrote:
>> Simon Horman  writes:
>> 
>> > Hi Aaron, Jakub, all,
>> >
>> > I have recently been exercising the Open vSwitch kernel selftests,
>> > using vng, something like this:
>> >
>> >TESTDIR="tools/testing/selftests/net/openvswitch"
>> >
>> > vng -v --run . --user root --cpus 2 \
>> > --overlay-rwdir "$PWD" -- \
>> > "modprobe openvswitch && \
>> > echo \"timeout=90\" >> \"${TESTDIR}/settings\" && \
>> >  make -C \"$TESTDIR\" run_tests"
>> >
>> > And I have some observations that I'd like to ask about.
>> >
>> > 1. Building the kernel using the following command does not
>> >build the openvswitch kernel module.
>> >
>> >vng -v --build \
>> >--config tools/testing/selftests/net/config
>> >
>> >All that seems to be missing is CONFIG_OPENVSWITCH=m
>> >and I am wondering what the best way of resolving this is.
>> >
>> >Perhaps I am doing something wrong.
>> >Or perhaps tools/testing/selftests/net/openvswitch/config
>> >should be created? If so, should it include (most of?) what is in
>> >tools/testing/selftests/net/config, or just CONFIG_OPENVSWITCH=m?
>> 
>> I have a series that I need to get back to fixing:
>> 
>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411917.html
>> 
>> which does include the config listed, and some of the fixes for things
>> you've noted.
>> 
>> I think it makes sense to get back to it.
>
> Thanks Aaron,
>
> I was hoping you might say something like that.
>
> WRT to the config itself, as Benjamin mentioned elsewhere in this thread [1]
> there is a question about how this should be handled consistently for
> all selftests.
>
> [1] https://lore.kernel.org/netdev/ZilIgbIvB04iUal2@f4/

Yeah, I think it makes sense.  There are probably some other bashisms
beyond the substitution noted.  I'll add it to the RFC and rework.

>> 
>> > 2. As per my example above, it seems that a modprobe openvswitch is
>> >required (if openvswitch is a module).
>> >
>> >Again, perhaps I am doing something wrong. But if not, should this be
>> >incorporated into tools/testing/selftests/net/openvswitch/openvswitch.sh
>> >or otherwise automated?
>> >
>> > 3. I have observed that the last test fails (yesterday, but not today!),
>> >because the namespace it tries to create already exists. I believe this
>> >is because it is pending deletion.
>> >
>> >My work-around is as follows:
>> >
>> >  ovs_add_netns_and_veths () {
>> >info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
>> > +  for i in $(seq 10); do
>> > +  ovs_sbx "$1" test -e "/var/run/netns/$3" || break
>> > +  info "Namespace $3 still exists (attempt $i)"
>> > +  ovs_sbx "$1" ip netns del "$3"
>> > +  sleep "$i"
>> > +  done
>> >ovs_sbx "$1" ip netns add "$3" || return 1
>> >on_exit "ovs_sbx $1 ip netns del $3"
>> >ovs_sbx "$1" ip link add "$4" type veth peer name "$5" || return 1
>> >
>> >N.B.: the "netns del" part is probably not needed,
>> >but I'm not able to exercise it effectively right now.
>> >
>> >I am wondering if a loop like this is appropriate to add, perhaps also
>> >to namespace deletion. Or if it would be appropriate to port
>> >openvswitch.sh to use ./tools/testing/selftests/net/lib.sh, which I
>> >believe handles this.
>> >
>> > 4. I am observing timeouts whith the default value of 45s.
>> >Bumping this to 90s seems to help.
>> >Are there any objections to a patch to bump the timeout?
>
> Regarding points 3 and 4.
>
> I did a bit more testing after I sent my email yesterday.
> I have two test machines. It turns out, to my surprise, that one is
> much slower than the other when running openvswitch.sh with vng.
>
> I am unsure why, but that isn't really on topic. The point
> is that I'm currently only seeing problems 3 and 4 manifest
> on the slow machine.
>
> I think problem 3 is probably worth solving.
> But the timeout question is more subjective.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-25 Thread Aaron Conole

Jakub Kicinski  writes:

> On Wed, 24 Apr 2024 17:44:05 +0100 Simon Horman wrote:
>> I have recently been exercising the Open vSwitch kernel selftests,
>> using vng,
>
> Speaking of ovs tests, we currently don't run them in CI (and suffer
> related skips in pmtu.sh) because Amazon Linux doesn't have ovs
> packaged and building it looks pretty hard.
>
> Is there an easy way to build just the CLI tooling or get a pre-built
> package somewhere?

As Simon notes - we would need some support in the ovs-dpctl.py to set
up the tunnel interfaces, and probably need to set them up for lwt and
classic tunnels.

I have a test branch where I have support for the former, and I can
clean it up and submit it.

> Or perhaps you'd be willing to run the OvS tests and we can move 
> the part of pmtu.sh into OvS test dir?

I guess either would be fine, as long as they can get run.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-24 Thread Aaron Conole

Benjamin Poirier  writes:

> On 2024-04-24 18:37 +0100, Simon Horman wrote:
>> On Wed, Apr 24, 2024 at 05:44:05PM +0100, Simon Horman wrote:
>> > Hi Aaron, Jakub, all,
>> > 
>> > I have recently been exercising the Open vSwitch kernel selftests,
>> > using vng, something like this:
>> > 
>> >TESTDIR="tools/testing/selftests/net/openvswitch"
>> > 
>> > vng -v --run . --user root --cpus 2 \
>> > --overlay-rwdir "$PWD" -- \
>> > "modprobe openvswitch && \
>> > echo \"timeout=90\" >> \"${TESTDIR}/settings\" && \
>> >  make -C \"$TESTDIR\" run_tests"
>> > 
>> > And I have some observations that I'd like to ask about.
>> > 
>> > 1. Building the kernel using the following command does not
>> >build the openvswitch kernel module.
>> > 
>> >vng -v --build \
>> >--config tools/testing/selftests/net/config
>> > 
>> >All that seems to be missing is CONFIG_OPENVSWITCH=m
>> >and I am wondering what the best way of resolving this is.
>> > 
>> >Perhaps I am doing something wrong.
>> >Or perhaps tools/testing/selftests/net/openvswitch/config
>> >should be created? If so, should it include (most of?) what is in
>> >tools/testing/selftests/net/config, or just CONFIG_OPENVSWITCH=m?
>
> I noticed something similar when testing Jiri's virtio_net selftests
> patchset [1].
>
> drivers/net/virtio_net/config includes virtio options but the
> test also needs at least CONFIG_NET_VRF=y which is part of net/config.
>
> Whatever the answer to your question, all config files should be
> coherent on this matter.
>
> [1] https://lore.kernel.org/netdev/20240424104049.3935572-1-j...@resnulli.us/
>
> [...]
>> 
>>   5. openvswitch.sh starts with "#!/bin/sh".
>>  But substitutions such as "${ns:0:1}0"  fail if /bin/sh is dash.
>>  Perhaps we should change openvswitch.sh to use bash?
>
> I think so. A similar change was done in
> c2518da8e6b0 selftests: bonding: Change script interpreter (v6.8-rc1)

+1 - I'm okay with it.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-24 Thread Aaron Conole

Simon Horman  writes:

> Hi Aaron, Jakub, all,
>
> I have recently been exercising the Open vSwitch kernel selftests,
> using vng, something like this:
>
>   TESTDIR="tools/testing/selftests/net/openvswitch"
>
> vng -v --run . --user root --cpus 2 \
> --overlay-rwdir "$PWD" -- \
> "modprobe openvswitch && \
>echo \"timeout=90\" >> \"${TESTDIR}/settings\" && \
>  make -C \"$TESTDIR\" run_tests"
>
> And I have some observations that I'd like to ask about.
>
> 1. Building the kernel using the following command does not
>build the openvswitch kernel module.
>
>   vng -v --build \
>   --config tools/testing/selftests/net/config
>
>All that seems to be missing is CONFIG_OPENVSWITCH=m
>and I am wondering what the best way of resolving this is.
>
>Perhaps I am doing something wrong.
>Or perhaps tools/testing/selftests/net/openvswitch/config
>should be created? If so, should it include (most of?) what is in
>tools/testing/selftests/net/config, or just CONFIG_OPENVSWITCH=m?

I have a series that I need to get back to fixing:

https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411917.html

which does include the config listed, and some of the fixes for things
you've noted.

I think it makes sense to get back to it.

> 2. As per my example above, it seems that a modprobe openvswitch is
>required (if openvswitch is a module).
>
>Again, perhaps I am doing something wrong. But if not, should this be
>incorporated into tools/testing/selftests/net/openvswitch/openvswitch.sh
>or otherwise automated?
>
> 3. I have observed that the last test fails (yesterday, but not today!),
>because the namespace it tries to create already exists. I believe this
>is because it is pending deletion.
>
>My work-around is as follows:
>
>  ovs_add_netns_and_veths () {
>   info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
> + for i in $(seq 10); do
> + ovs_sbx "$1" test -e "/var/run/netns/$3" || break
> + info "Namespace $3 still exists (attempt $i)"
> + ovs_sbx "$1" ip netns del "$3"
> + sleep "$i"
> + done
>   ovs_sbx "$1" ip netns add "$3" || return 1
>   on_exit "ovs_sbx $1 ip netns del $3"
>   ovs_sbx "$1" ip link add "$4" type veth peer name "$5" || return 1
>
>N.B.: the "netns del" part is probably not needed,
>but I'm not able to exercise it effectively right now.
>
>I am wondering if a loop like this is appropriate to add, perhaps also
>to namespace deletion. Or if it would be appropriate to port
>openvswitch.sh to use ./tools/testing/selftests/net/lib.sh, which I
>believe handles this.
>
> 4. I am observing timeouts whith the default value of 45s.
>Bumping this to 90s seems to help.
>Are there any objections to a patch to bump the timeout?

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next] net: openvswitch: Release reference to netdev

2024-04-24 Thread Aaron Conole

Jun Gu  writes:

> dev_get_by_name will provide a reference on the netdev. So ensure that
> the reference of netdev is released after completed.
>
> Fixes: 2540088b836f ("net: openvswitch: Check vport netdev name")
> Signed-off-by: Jun Gu 
> ---

Thanks!

Reviewed-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] net: openvswitch: Fix Use-After-Free in ovs_ct_exit

2024-04-24 Thread Aaron Conole

Hyunwoo Kim  writes:

> Since kfree_rcu, which is called in the hlist_for_each_entry_rcu traversal
> of ovs_ct_limit_exit, is not part of the RCU read critical section, it
> is possible that the RCU grace period will pass during the traversal and
> the key will be free.
>
> To prevent this, it should be changed to hlist_for_each_entry_safe.
>
> Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
> Signed-off-by: Hyunwoo Kim 
> ---

Reviewed-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] rhel/systemd: Set ovsdb-server timeout to 5 minutes

2024-04-23 Thread Aaron Conole

Chris Riches  writes:

> On 15/04/2024 14:39, Jon Kohler wrote:
>>> On Apr 11, 2024, at 9:43 AM, Chris Riches  wrote:
>>>
>>> On 11/04/2024 14:24, Ilya Maximets wrote:
 On 4/11/24 10:59, Chris Riches wrote:
>  From what we know so far, the DB was full of stale connection-tracking
> information such as the following:
>
> [...]
>
> Once the host was recovered by putting in the timeout increase,
> ovsdb-server successfully started and GCed the database down from 2.4
> *GB* to 29 *KB*. Had this happened before the host restart, we would
> have never seen this problem. But since it seems possible to end up
> booting with such a large DB, we figured a timeout increase was a
> sensible measure to take.
 Uff.  Sounds like ovn-controller went off the rails.

 Normally, ovsdb-server compacts the database once in 10-20 minutes,
 if the database doubles the size since the previous check.  If all
 the transactions are that small, it would mean ovn-controller made
 about 10K transactions per second in the 10-20 minutes before the
 restart.  That's huge.

 I wonder if this can be addressed with a better compaction strategy.
 Something like forcing compaction if "the database is more than 10 MB
 and increased 10x" regardless of the time.
>>> I'm not sure exactly what the test was doing when this was
>>> observed, so I don't know whether that transaction volume is within
>>> the realm of possibility or if we're looking at a failure to
>>> perform compaction on time. It would be nice to have an enhanced
>>> safety-net for DB size, as we were only a few hundred MB away from
>>> hitting filesystem space issues as well.
>>>
 Normally, ovsdb-server compacts the database once in 10-20
 minutes, if the database doubles the size since the previous
 check.
>>> I presume you mean if it doubled in size since the previous
>>> *compaction*? If we only compact when it doubles since the last
>>> *check*, then it would be easy for it to slightly-less-than-double
>>> every 10-20 minutes and never trigger the compaction while still
>>> growing exponentially.
>>>
>>> I'm happy to discuss compaction approaches (though my expertise is
>>> very much in host service management and not OVS itself), but do
>>> you think there's merit in having this extended timeout as a
>>> backstop too?
>> FWIW, I think we should do both extending the time out and tuning up the
>> compaction, as having a situation where a service can get in an endless
>> loop if for whatever reason it takes too long is problematic. Addressing
>> the root cause (compaction, too many calls, some other bug(s) etc) is
>> good, but extending the timeout seems like an easy backstop.
>
> I agree with Jon's assessment - regardless of any action taken on
> compaction or preventing growth in the first place, we should consider
> the proposed timeout increase as a backstop against getting stuck in
> an infinite loop.
>
> Ilya (or another maintainer) - can I get an opinion on this?

From my side, it looks fine.  I don't think we ever saw the DB taking
this long on startup, so never considered that it could (maybe it is the
case that compaction also occurs on graceful exits?  I don't know ovsdb
that well).

At least from my side, it seems fine.

> Thanks,
> Chris

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next] selftests: openvswitch: Fix escape chars in regexp.

2024-04-17 Thread Aaron Conole

Adrian Moreno  writes:

> Character sequences starting with `\` are interpreted by python as
> escaped Unicode characters. However, they have other meaning in
> regular expressions (e.g: "\d").
>
> It seems Python >= 3.12 starts emitting a SyntaxWarning when these
> escaped sequences are not recognized as valid Unicode characters.
>
> An example of these warnings:
>
> tools/testing/selftests/net/openvswitch/ovs-dpctl.py:505:
> SyntaxWarning: invalid escape sequence '\d'
>
> Fix all the warnings by flagging literals as raw strings.
>
> Signed-off-by: Adrian Moreno 
> ---

Thanks, Adrian.

Reviewed-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovs-tcpdump: Support vlan option.

2024-04-09 Thread Aaron Conole

Daniel Ding  writes:

>  2024年3月30日 上午2:43，Aaron Conole  写道：
>
>  Daniel Ding  writes:
>
>  When I try filter geneve protocol with a vlan, the warning message
>  occurs that tell me the kernel cann't support this combination.
>
>  ^ can't
>
>  $ ovs-tcpdump -i eth2 -nne vlan 10 and geneve
>  Warning: Kernel filter failed: Invalid argument
>
>  So I fix it by the following:
>  1. the mirror-to interface was added with a vlan tag, which let
>  datapath to pop its tag.
>  2. the traffic will be mirrored with mirror's select_vlan, and that
>  don't care about will not be received on the mirror-to interface.
>
>  Maybe rephrase:
>
>  "The fix is to make a convenience argument that sets the mirror port's
>  select_vlan column, and also marks the port as native tagged with VLAN."
>
> Thanks you, Aaron. 
> I have already updated this description in the following patch.
>
>  Signed-off-by: Daniel Ding 
>  ---

Please post as a v2 patch in a separate thread.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] Rename primary development branch as main.

2024-04-04 Thread Aaron Conole

Simon Horman  writes:

> Recently OVS adopted a policy of using the inclusive naming word list v1
> [1, 2].
>
> In keeping with this policy rename the primary development branch from
> 'master' to 'main'. This patch does not actually make that change,
> but rather updates references to the branch in the source tree.
> It is intended to be applied at (approximately) the same time that the
> change is made.
>
> OVS is currently hosted on GitHub. We can expect the following behaviour
> after the rename:
>
> 1. GitHub pull requests against are renamed branch are automatically
>re-homed on new branch
> 2. GitHub Issues do not seem to be affected - at least the test issue I
>created had no association with a branch
> 3. URLs accessed via the GitHub web UI are automatically renamed
>(so long as a new branch called master is not created).
> 4. Using the git cli command, fetch will fetch the new branch (main),
>and fetch -p will remove (prune) the old branch (master)
>
> [1] df5e5cf4318a ("Documentation: Add section on inclusive language.")
> [2] https://inclusivenaming.org/word-lists/
>
> Signed-off-by: Simon Horman 
> ---
> Changes in v2:
> - Keep two blank lines between versions.
> - Drop bogus update to OpenSSL hashes URL in appveyor.yml.
> - Drop other appveyor.yml changes, they are now present upstream.
>   + appveyor: Prepare for rename of primary development branch.
> https://github.com/openvswitch/ovs/commit/95ff912edef8
> - Add note about updates to git configuration.
> ---
> Notes:
>
> * Now is the time to raise any concerns regarding this patch.
>   It is planned to implement this change next week.

Awesome!

> * If you have an automation that fetches the master branch then
>   the suggested action is:
>   1. Before the branch rename occurs: update the automation to pull main an
>  fall back to pulling master if that fails
>   2. After the rename occurs: Update the automation to only fetch main
>
> * After the change it may be necessary to update your local
>   git configuration for checked out branches.
>
>   For example:
>   # Fetch origin: new remote main branch; remote master branch is deleted
>   git fetch -tp origin
>   # Rename local branch
>   git branch -m master main
>   # Update local main branch to use remote main branch as it's upstream
>   git branch --set-upstream-to=origin/main main

Thanks for listing these steps.  Hopefully it will make it simpler for
others.

> ---
>  .../internals/committer-responsibilities.rst   | 12 +++---
>  .../internals/contributing/backporting-patches.rst | 12 +++---
>  Documentation/internals/release-process.rst| 50 
> +++---
>  Documentation/intro/install/dpdk.rst   |  2 +-
>  Documentation/intro/install/fedora.rst |  2 +-
>  Documentation/intro/install/general.rst|  2 +-
>  Documentation/intro/install/rhel.rst   |  2 +-
>  Documentation/topics/language-bindings.rst |  2 +-
>  Documentation/tutorials/faucet.rst |  6 +--
>  Documentation/tutorials/ovs-conntrack.rst  |  2 +-
>  NEWS   |  3 ++
>  README.rst |  2 +-
>  12 files changed, 50 insertions(+), 47 deletions(-)

Acked-by: Aaron Conole 

> diff --git a/Documentation/internals/committer-responsibilities.rst 
> b/Documentation/internals/committer-responsibilities.rst
> index c35fd708913b..eed2e017678a 100644
> --- a/Documentation/internals/committer-responsibilities.rst
> +++ b/Documentation/internals/committer-responsibilities.rst
> @@ -73,14 +73,14 @@ If it is someone else's change, then you can ask the 
> original submitter to
>  address it. Regardless, you need to ensure that the problem is fixed in a
>  timely way. The definition of "timely" depends on the severity of the 
> problem.
>  
> -If a bug is present on master and other branches, fix it on master first, 
> then
> +If a bug is present on main and other branches, fix it on main first, then
>  backport the fix to other branches. Straightforward backports do not require
> -additional review (beyond that for the fix on master).
> +additional review (beyond that for the fix on main).
>  
> -Feature development should be done only on master. Occasionally it makes 
> sense
> +Feature development should be done only on main. Occasionally it makes sense
>  to add a feature to the most recent release branch, before the first actual
>  release of that branch. These should be handled in the same way as bug fixes,
> -that is, first implemented on master and then backported.
> +that is, first implemented on main and then backported.
>  
>  Ke

Re: [ovs-dev] [PATCH v7 2/2] netlink-conntrack: Optimize flushing ct zone.

2024-04-04 Thread Aaron Conole

Felix Huettner via dev  writes:

> Previously the kernel did not provide a netlink interface to flush/list
> only conntrack entries matching a specific zone. With [1] and [2] it is now
> possible to flush and list conntrack entries filtered by zone. Older
> kernels not yet supporting this feature will ignore the filter.
> For the list request that means just returning all entries (which we can
> then filter in userspace as before).
> For the flush request that means deleting all conntrack entries.
>
> The implementation is now identical to the windows one, so we combine
> them.
>
> These significantly improves the performance of flushing conntrack zones
> when the conntrack table is large. Since flushing a conntrack zone is
> normally triggered via an openflow command it blocks the main ovs thread
> and thereby also blocks new flows from being applied. Using this new
> feature we can reduce the flushing time for zones by around 93%.
>
> In combination with OVN the creation of a Logical_Router (which causes
> the flushing of a ct zone) could block other operations, e.g. the
> failover of Logical_Routers (as they cause new flows to be created).
> This is visible from a user perspective as a ovn-controller that is idle
> (as it waits for vswitchd) and vswitchd reporting:
> "blocked 1000 ms waiting for main to quiesce" (potentially with ever
> increasing times).
>
> The following performance tests where run in a qemu vm with 500.000
> conntrack entries distributed evenly over 500 ct zones using `ovstest
> test-netlink-conntrack flush zone=`.
>
>   |  flush zone with 1000 entries  |   flush zone with no entry |
>   +-+--+-+--|
>   |   with the patch| without  |   with the patch| without  |
>   +--+--+--+--+--+--|
>   | v6.8-rc4 |  v6.7.1  | v6.8-rc4 | v6.8-rc4 |  v6.7.1  | v6.8-rc4 |
> +-+--+--+--+--+--+--|
> | Min |  0.260   |  3.946   |  3.497   |  0.228   |  3.462   |  3.212   |
> | Median  |  0.319   |  4.237   |  4.349   |  0.298   |  4.460   |  4.010   |
> | 90%ile  |  0.335   |  4.367   |  4.522   |  0.325   |  4.662   |  4.572   |
> | 99%ile  |  0.348   |  4.495   |  4.773   |  0.340   |  4.931   |  6.003   |
> | Max |  0.362   |  4.543   |  5.054   |  0.348   |  5.390   |  6.396   |
> | Mean|  0.320   |  4.236   |  4.331   |  0.296   |  4.430   |  4.071   |
> | Total   |  80.02   |  1058|  1082|  73.93   |  1107|  1017|
>
> [1]: 
> https://github.com/torvalds/linux/commit/eff3c558bb7e61c41b53e4c8130e514a5a4df9ba
> [2]: 
> https://github.com/torvalds/linux/commit/fa173a1b4e3fd1ab5451cbc57de6fc624c824b0a
>
> Acked-by: Mike Pattrick 
> Co-Authored-By: Luca Czesla 
> Signed-off-by: Luca Czesla 
> Co-Authored-By: Max Lamprecht 
> Signed-off-by: Max Lamprecht 
> Signed-off-by: Felix Huettner 
> ---

Acked-by: Aaron Conole 

Thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v7 1/2] util: Support checking for kernel versions.

2024-04-03 Thread Aaron Conole

Felix Huettner via dev  writes:

> Extract checking for a given kernel version to a separate function.
> It will be used also in the next patch.
>
> Acked-by: Mike Pattrick 
> Acked-by: Eelco Chaudron 
> Signed-off-by: Felix Huettner 
> ---

Acked-by: Aaron Conole 

> v6->v7: none
> v5->v6:
> - fix ovs_kernel_is_version_or_newer returning false if major and minor
>   are equal (thanks Mike)
> v4->v5:
> - fix wrong ifdef that broke on macos
> - fix ovs_kernel_is_version_or_newer working in reverse than desired
> - ovs_kernel_is_version_or_newer now always returns false if uname
>   errors (Thanks Eelco)
> v4:
> - extract function to check kernel version
>
>  lib/netdev-linux.c | 15 +++
>  lib/util.c | 27 +++
>  lib/util.h |  4 
>  3 files changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 1e904d8e6..25349c605 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -40,7 +40,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -6428,18 +6427,10 @@ getqdisc_is_safe(void)
>  static bool safe = false;
>  
>  if (ovsthread_once_start()) {
> -struct utsname utsname;
> -int major, minor;
> -
> -if (uname() == -1) {
> -VLOG_WARN("uname failed (%s)", ovs_strerror(errno));
> -} else if (!ovs_scan(utsname.release, "%d.%d", , )) {
> -VLOG_WARN("uname reported bad OS release (%s)", utsname.release);
> -} else if (major < 2 || (major == 2 && minor < 35)) {
> -VLOG_INFO("disabling unsafe RTM_GETQDISC in Linux kernel %s",
> -  utsname.release);
> -} else {
> +if (ovs_kernel_is_version_or_newer(2, 35)) {
>  safe = true;
> +} else {
> +VLOG_INFO("disabling unsafe RTM_GETQDISC in Linux kernel");
>  }
>  ovsthread_once_done();
>  }
> diff --git a/lib/util.c b/lib/util.c
> index 3fb3a4b40..5c31d983a 100644
> --- a/lib/util.c
> +++ b/lib/util.c
> @@ -27,6 +27,7 @@
>  #include 
>  #ifdef __linux__
>  #include 
> +#include 
>  #endif
>  #include 
>  #include 
> @@ -2500,3 +2501,29 @@ OVS_CONSTRUCTOR(winsock_start) {
> }
>  }
>  #endif
> +
> +#ifdef __linux__
> +bool
> +ovs_kernel_is_version_or_newer(int target_major, int target_minor)
> +{
> +static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> +static int current_major, current_minor = -1;
> +
> +if (ovsthread_once_start()) {
> +struct utsname utsname;
> +
> +if (uname() == -1) {
> +VLOG_WARN("uname failed (%s)", ovs_strerror(errno));
> +} else if (!ovs_scan(utsname.release, "%d.%d",
> +_major, _minor)) {
> +VLOG_WARN("uname reported bad OS release (%s)", utsname.release);
> +}
> +ovsthread_once_done();
> +}
> +if (current_major == -1 || current_minor == -1) {
> +return false;
> +}
> +return current_major > target_major || (
> +current_major == target_major && current_minor >= target_minor);
> +}
> +#endif
> diff --git a/lib/util.h b/lib/util.h
> index f2d45bcac..55718fd87 100644
> --- a/lib/util.h
> +++ b/lib/util.h
> @@ -611,4 +611,8 @@ int ftruncate(int fd, off_t length);
>  }
>  #endif
>  
> +#ifdef __linux__
> +bool ovs_kernel_is_version_or_newer(int target_major, int target_minor);
> +#endif
> +
>  #endif /* util.h */
>
> base-commit: 33f45ded67a2d524ccf54cf4bb79a38d8140f14b

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v1 1/1] ofproto: Fix NULL deref reported by Coverity.

2024-04-02 Thread Aaron Conole

mit...@outlook.com writes:

> From: miter 
>
> Ofproto_class_find__() may return NULL, and dereference it to cause
> segfault.
>
> Tested-by: Zhang YuHuang 
> Signed-off-by: Lin Huang 
> ---

I guess that type_run and type_wait aren't flagged this way because the
only users walk the ofproto types list explicitly (so we don't get into
this situation)?

>  ofproto/ofproto.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
> index 122a06f30..21c6a1d82 100644
> --- a/ofproto/ofproto.c
> +++ b/ofproto/ofproto.c
> @@ -800,7 +800,7 @@ ofproto_type_set_config(const char *datapath_type, const 
> struct smap *cfg)
>  datapath_type = ofproto_normalize_type(datapath_type);
>  class = ofproto_class_find__(datapath_type);
>  
> -if (class->type_set_config) {
> +if (class && class->type_set_config) {

This change looks good to me.  Have you a related issue or is it just
preventative?

>  class->type_set_config(datapath_type, cfg);
>  }
>  }

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovs-tcpdump: Support vlan option.

2024-03-29 Thread Aaron Conole

Daniel Ding  writes:

> When I try filter geneve protocol with a vlan, the warning message
> occurs that tell me the kernel cann't support this combination.
 ^ can't
>
> $ ovs-tcpdump -i eth2 -nne vlan 10 and geneve
> Warning: Kernel filter failed: Invalid argument
>
> So I fix it by the following:
> 1. the mirror-to interface was added with a vlan tag, which let
> datapath to pop its tag.
> 2. the traffic will be mirrored with mirror's select_vlan, and that
> don't care about will not be received on the mirror-to interface.

Maybe rephrase:

"The fix is to make a convenience argument that sets the mirror port's
select_vlan column, and also marks the port as native tagged with VLAN."

> Signed-off-by: Daniel Ding 
> ---
>  utilities/ovs-tcpdump.in | 37 +
>  1 file changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/utilities/ovs-tcpdump.in b/utilities/ovs-tcpdump.in
> index eada803bb..b2b69d3c4 100755
> --- a/utilities/ovs-tcpdump.in
> +++ b/utilities/ovs-tcpdump.in
> @@ -142,6 +142,8 @@ The following options are available:
> --mirror-toThe name for the mirror port to use (optional)
>Default 'miINTERFACE'
> --span If specified, mirror all ports (optional)
> +   --vlan If specified, mirror a vlan traffic and pop
   s/ a//

> +  its tag (optional)
>  """ % {'prog': sys.argv[0]})
>  sys.exit(0)
>  
> @@ -319,7 +321,7 @@ class OVSDB(object):
>   (mirror_name, txn.get_error()))
>  self._txn = None
>  
> -def make_port(self, port_name, bridge_name):
> +def make_port(self, port_name, bridge_name, vlan=None):
>  iface_row = self.make_interface(port_name, False)
>  txn = self._txn
>  
> @@ -330,6 +332,12 @@ class OVSDB(object):
>  port = txn.insert(self.get_table('Port'))
>  port.name = port_name
>  
> +if vlan is not None:
> +port.verify('tag')
> +tag = getattr(port, 'tag', [])
> +tag.append(vlan)
> +port.tag = tag
> +

Can a port have multiple tags set for vlan?  I was under the impression
that only one native tag was allowed per-port, but maybe I'm wrong.

>  br.verify('ports')
>  ports = getattr(br, 'ports', [])
>  ports.append(port)
> @@ -354,7 +362,7 @@ class OVSDB(object):
>  return result
>  
>  def bridge_mirror(self, intf_name, mirror_intf_name, br_name,
> -  mirror_select_all=False):
> +  mirror_select_all=False, mirrored_vlan=None):
>  
>  txn = self._start_txn()
>  mirror = txn.insert(self.get_table('Mirror'))
> @@ -374,6 +382,12 @@ class OVSDB(object):
>  src_port.append(mirrored_port)
>  mirror.select_src_port = src_port
>  
> +if mirrored_vlan:
> +mirror.verify('select_vlan')
> +select_vlan = getattr(mirror, 'select_vlan', [])
> +select_vlan.append(mirrored_vlan)
> +mirror.select_vlan = select_vlan
> +
>  output_port = self._find_row_by_name('Port', mirror_intf_name)
>  
>  mirror.verify('output_port')
> @@ -440,6 +454,7 @@ def main():
>  db_sock = 'unix:%s' % os.path.join(rundir, "db.sock")
>  interface = None
>  tcpdargs = []
> +vlan = None
>  
>  skip_next = False
>  mirror_interface = None
> @@ -474,12 +489,25 @@ def main():
>  elif cur in ['--span']:
>  mirror_select_all = True
>  continue
> +elif cur in ['--vlan']:
> +vlan = nxt
> +skip_next = True
> +continue
>  tcpdargs.append(cur)
>  
>  if interface is None:
>  print("Error: must at least specify an interface with '-i' option")
>  sys.exit(1)
>  
> +if vlan:
> +try:
> +vlan = int(vlan)
> +if vlan < 0 or vlan > 4095:
> +raise ValueError("out of range")
> +except ValueError:
> +print("Error: vlan muse be within <0-4095>")
  ^must

> +sys.exit(1)
> +
>  if not py_which(dump_cmd):
>  print("Error: unable to execute '%s' (check PATH)" % dump_cmd)
>  sys.exit(1)
> @@ -523,10 +551,11 @@ def main():
>  teardown(db_sock, interface, mirror_interface, tap_created)
>  
>  try:
> -ovsdb.make_port(mirror_interface, ovsdb.port_bridge(interface))
> +ovsdb.make_port(mirror_interface,
> +ovsdb.port_bridge(interface), vlan)
>  ovsdb.bridge_mirror(interface, mirror_interface,
>  ovsdb.port_bridge(interface),
> -mirror_select_all)
> +mirror_select_all, vlan)
>  except OVSDBException as oe:
>

Re: [ovs-dev] [PATCH] conntrack: Do clean instead of forece expire.

2024-03-27 Thread Aaron Conole

Cheng Li  writes:

> Force expire a connection and then create new connection of the same
> tuple(cmap hash). This makes ct->conns cmap operation expensive(
> within ct->ct_lock).
>
> This patch cover the scenario by doing the clean immediately instead
> of setting expire. Also this patch fix ct_clean debug log.
>
> Signed-off-by: Cheng Li 
> ---

I'm having trouble with this commit message.  You don't include any
details about any performance characteristics that you measured
anywhere, while (rightly) pointing out that this has a larger
performance impact.  Please include these details (how you measured the
impact, what changed, etc).

>  lib/conntrack.c | 25 -
>  1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 8a7056bac..6e137daa6 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -461,12 +461,6 @@ conn_clean(struct conntrack *ct, struct conn *conn)
>  atomic_count_dec(>n_conn);
>  }
> 
> -static void
> -conn_force_expire(struct conn *conn)
> -{
> -atomic_store_relaxed(>expiration, 0);
> -}
> -
>  /* Destroys the connection tracker 'ct' and frees all the allocated memory.
>   * The caller of this function must already have shut down packet input
>   * and PMD threads (which would have been quiesced).  */
> @@ -1030,7 +1024,10 @@ conn_update_state(struct conntrack *ct, struct 
> dp_packet *pkt,
>  case CT_UPDATE_NEW:
>  if (conn_lookup(ct, >key_node[CT_DIR_FWD].key,
>  now, NULL, NULL)) {
> -conn_force_expire(conn);
> +/* Instead of setting conn->expiration and let ct_clean 
> thread
> + * clean the conn, doing the clean now to avoid duplicate 
> hash
> + * in ct->conns for performance reason. */
> +conn_clean(ct, conn);

I guess the idea is that the clean can happen from the unlocked context,
but that creates a lock in the datapath (which is why the expire list
update is here).  The conn_key_lookup() has to iterate over the list
more, but then it doesn't need to take an expensive lock.

In this case, we will have to acquire the big CT lock, and if there are
two lookups happening simultaneously we will pend processing one packet
for the other.  Ideally, the adaptive nature of the pthread will prevent
us from having to call into kernel, but that isn't a guarantee - just a
best effort.

Have you done any kind of measurement on this?  I'm quite worried about
the worst-case that this introduces into the packet processing path.
The existing design avoids this worst-case lock performance precisely
because it marks the connection as needing expiration so that future
lookups will skip.

>  }
>  create_new_conn = true;
>  break;
> @@ -1242,7 +1239,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
>  if (OVS_UNLIKELY(force && ctx->reply && conn)) {
>  if (conn_lookup(ct, >key_node[CT_DIR_FWD].key,
>  now, NULL, NULL)) {
> -conn_force_expire(conn);
> +conn_clean(ct, conn);
>  }
>  conn = NULL;
>  }
> @@ -1429,7 +1426,8 @@ conntrack_get_sweep_interval(struct conntrack *ct)
>  }
> 
>  static size_t
> -ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
> +ct_sweep(struct conntrack *ct, struct rculist *list, long long now,
> + size_t *cleaned_count)
>  OVS_NO_THREAD_SAFETY_ANALYSIS
>  {
>  struct conn *conn;
> @@ -1438,6 +1436,7 @@ ct_sweep(struct conntrack *ct, struct rculist *list, 
> long long now)
>  RCULIST_FOR_EACH (conn, node, list) {
>  if (conn_expired(conn, now)) {
>  conn_clean(ct, conn);
> +(*cleaned_count) ++;

A fix somewhat like this may be needed anyway.  The log seems to have become
inaccurate with:

3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with 
rculists.")

Please submit it as a separate patch so we can backport.

>  }
> 
>  count++;
> @@ -1454,7 +1453,7 @@ conntrack_clean(struct conntrack *ct, long long now)
>  {
>  long long next_wakeup = now + conntrack_get_sweep_interval(ct);
>  unsigned int n_conn_limit, i;
> -size_t clean_end, count = 0;
> +size_t clean_end, count = 0, cleaned_count = 0;
> 
>  atomic_read_relaxed(>n_conn_limit, _conn_limit);
>  clean_end = n_conn_limit / 64;
> @@ -1465,13 +1464,13 @@ conntrack_clean(struct conntrack *ct, long long now)
>  break;
>  }
> 
> -count += ct_sweep(ct, >exp_lists[i], now);
> +count += ct_sweep(ct, >exp_lists[i], now, _count);
>  }
> 
>  ct->next_sweep = (i < N_EXP_LISTS) ? i : 0;
>  
> -VLOG_DBG("conntrack cleanup %"PRIuSIZE" entries in %lld msec", count,
> - time_msec() - now);
> +VLOG_DBG("conntrack cleanup %"PRIuSIZE"/%"PRIuSIZE" entries in %lld 
> msec",
> + cleaned_count, count, time_msec()

Re: [ovs-dev] [PATCH net] openvswitch: Set the skbuff pkt_type for proper pmtud support.

2024-03-27 Thread Aaron Conole

Eelco Chaudron  writes:

> On 25 Mar 2024, at 13:37, Ilya Maximets wrote:
>
>> On 3/25/24 13:22, Aaron Conole wrote:
>>> Eelco Chaudron  writes:
>>>
>>>> On 22 Mar 2024, at 20:06, Aaron Conole wrote:
>>>>
>>>>> Open vSwitch is originally intended to switch at layer 2, only dealing 
>>>>> with
>>>>> Ethernet frames.  With the introduction of l3 tunnels support, it crossed
>>>>> into the realm of needing to care a bit about some routing details when
>>>>> making forwarding decisions.  If an oversized packet would need to be
>>>>> fragmented during this forwarding decision, there is a chance for pmtu
>>>>> to get involved and generate a routing exception.  This is gated by the
>>>>> skbuff->pkt_type field.
>>>>>
>>>>> When a flow is already loaded into the openvswitch module this field is
>>>>> set up and transitioned properly as a packet moves from one port to
>>>>> another.  In the case that a packet execute is invoked after a flow is
>>>>> newly installed this field is not properly initialized.  This causes the
>>>>> pmtud mechanism to omit sending the required exception messages across
>>>>> the tunnel boundary and a second attempt needs to be made to make sure
>>>>> that the routing exception is properly setup.  To fix this, we set the
>>>>> outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get
>>>>> to the openvswitch module via a port device or packet command.
>>>>
>>>> Is this not a problem when the packet comes from the bridge port in the 
>>>> kernel?
>>>
>>> It very well may be an issue there as well, but the recommendation is to
>>> operate with the bridge port down as far as I know, so I don't know if
>>> this issue has been observed happening from the bridge port.
>>
>> FWIW, bridge ports are typically used as an entry point for tunneled
>> traffic so it can egress from a physical port attached to OVS.  It means
>> they are pretty much always UP in most common setups like OpenStack or
>> ovn-kubernetes and handle a decent amount of traffic.  They are also used
>> to direct some other types of traffic to the host kernel.
>
> +1 here, I’m talking about the same port. I think we only advise
> having this down for userspace bridges, but not in the case the bridge
> is the tunnel endpoint.

Okay, I'll confirm about up/down, but it seems like it shouldn't matter
and we should be setting the outgoing type.

>> Unless I misunderstood which ports we're talking about here.
>>
>> Best regards, Ilya Maximets.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] ofproto-dpif-xlate: Fix continuations with associated metering.

2024-03-27 Thread Aaron Conole

Aaron Conole  writes:

> Ilya Maximets  writes:
>
>> On 3/22/24 14:40, Aaron Conole wrote:
>>> Open vSwitch supports the ability to invoke a controller action by way
>>> of a sample action with a specified meter.  In the normal case, this
>>> sample action is transparently generated during xlate processing.  However,
>>> when executing via a continuation, the logic to generate the sample
>>> action when finishing the context freeze was missing.  The result is that
>>> the behavior when action is 'controller(pause,meter_id=1)' does not match
>>> the behavior when action is 'controller(meter_id=1)'.
>>> 
>>> OVN and other controller solutions may rely on this metering to protect
>>> the control path, so it is critical to preserve metering, whether we are
>>> doing a plain old send to controller, or a continuation.
>>> 
>>> Fixes: 77ab5fd2a95b ("Implement serializing the state of packet
>>> traversal in "continuations".")
>>> Reported-at: https://issues.redhat.com/browse/FDP-455
>>> Tested-by: Alex Musil 
>>> Signed-off-by: Aaron Conole 
>>> ---
>>> v1 -> v2:
>>>   Clean up unrelated whitespace change
>>>   Fix style issues around test comments
>>>   Fix a line length issue truncating action in a comment
>>>   Change from `` to $() in the test
>>
>> Nit: May also change `` to $() in on_exit hook.
>
> I'll fix on apply.
>
>> But anyway:
>>
>> Acked-by: Ilya Maximets 
>
> Thanks for the review!

I fixed the nit and applied to all branches back to 2.17

Thanks all!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net] openvswitch: Set the skbuff pkt_type for proper pmtud support.

2024-03-25 Thread Aaron Conole

Eelco Chaudron  writes:

> On 22 Mar 2024, at 20:06, Aaron Conole wrote:
>
>> Open vSwitch is originally intended to switch at layer 2, only dealing with
>> Ethernet frames.  With the introduction of l3 tunnels support, it crossed
>> into the realm of needing to care a bit about some routing details when
>> making forwarding decisions.  If an oversized packet would need to be
>> fragmented during this forwarding decision, there is a chance for pmtu
>> to get involved and generate a routing exception.  This is gated by the
>> skbuff->pkt_type field.
>>
>> When a flow is already loaded into the openvswitch module this field is
>> set up and transitioned properly as a packet moves from one port to
>> another.  In the case that a packet execute is invoked after a flow is
>> newly installed this field is not properly initialized.  This causes the
>> pmtud mechanism to omit sending the required exception messages across
>> the tunnel boundary and a second attempt needs to be made to make sure
>> that the routing exception is properly setup.  To fix this, we set the
>> outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get
>> to the openvswitch module via a port device or packet command.
>
> Is this not a problem when the packet comes from the bridge port in the 
> kernel?

It very well may be an issue there as well, but the recommendation is to
operate with the bridge port down as far as I know, so I don't know if
this issue has been observed happening from the bridge port.

Since I will spin a v2 with a comment, do you want me to mention
something about the bridge port?

>> This issue is periodically encountered in complex setups, such as large
>> openshift deployments, where multiple sets of tunnel traversal occurs.
>> A way to recreate this is with the ovn-heater project that can setup
>> a networking environment which mimics such large deployments.  In that
>> environment, without this patch, we can see:
>>
>>   ./ovn_cluster.sh start
>>   podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200
>>   podman exec ovn-chassis-1 ip netns exec sw01p1  ip r flush cache
>>   podman exec ovn-chassis-1 ip netns exec sw01p1 ping 21.0.0.3 -M do -s 1300 
>> -c2
>>   PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
>>   From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142)
>>
>>   --- 21.0.0.3 ping statistics ---
>>   2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 1017ms
>>
>> Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is not
>> sent into the server.
>>
>> With this patch, setting the pkt_type, we see the following:
>>
>>   podman exec ovn-chassis-1 ip netns exec sw01p1 ping 21.0.0.3 -M do -s 1300 
>> -c2
>>   PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
>>   From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222)
>>   ping: local error: message too long, mtu=1222
>>
>>   --- 21.0.0.3 ping statistics ---
>>   2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1061ms
>>
>> In this case, the first ping request receives the FRAG_NEEDED message and
>> a local routing exception is created.
>>
>> Reported-at: https://issues.redhat.com/browse/FDP-164
>> Fixes: 58264848a5a7 ("openvswitch: Add vxlan tunneling support.")
>> Signed-off-by: Aaron Conole 
>> ---
>> NOTE: An alternate approach would be to add a netlink attribute to preserve
>>   pkt_type across the kernel->user boundary, but that does require some
>>   userspace cooperation.
>
> I prefer the method in this patch, as it requires no userspace change,
> i.e. it will work even with older versions of OVS without the need for
> backports.

Yes - that was my thinking as well.

>>  net/openvswitch/actions.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 6fcd7e2ca81fe..952c6292100d0 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -936,6 +936,8 @@ static void do_output(struct datapath *dp, struct 
>> sk_buff *skb, int out_port,
>>  pskb_trim(skb, ovs_mac_header_len(key));
>>  }
>>
>> +skb->pkt_type = PACKET_OUTGOING;
>> +
>
> Maybe add a comment based on the large explanation above?

Okay - I can add one.

>>  if (likely(!mru ||
>> (skb->len <= mru + vport->dev->hard_header_len))) {
>>  ovs_vport_send(vport, skb, ovs_key_mac_proto(key));
>> -- 
>> 2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] ofproto-dpif-xlate: Fix continuations with associated metering.

2024-03-25 Thread Aaron Conole

Ilya Maximets  writes:

> On 3/22/24 14:40, Aaron Conole wrote:
>> Open vSwitch supports the ability to invoke a controller action by way
>> of a sample action with a specified meter.  In the normal case, this
>> sample action is transparently generated during xlate processing.  However,
>> when executing via a continuation, the logic to generate the sample
>> action when finishing the context freeze was missing.  The result is that
>> the behavior when action is 'controller(pause,meter_id=1)' does not match
>> the behavior when action is 'controller(meter_id=1)'.
>> 
>> OVN and other controller solutions may rely on this metering to protect
>> the control path, so it is critical to preserve metering, whether we are
>> doing a plain old send to controller, or a continuation.
>> 
>> Fixes: 77ab5fd2a95b ("Implement serializing the state of packet
>> traversal in "continuations".")
>> Reported-at: https://issues.redhat.com/browse/FDP-455
>> Tested-by: Alex Musil 
>> Signed-off-by: Aaron Conole 
>> ---
>> v1 -> v2:
>>   Clean up unrelated whitespace change
>>   Fix style issues around test comments
>>   Fix a line length issue truncating action in a comment
>>   Change from `` to $() in the test
>
> Nit: May also change `` to $() in on_exit hook.

I'll fix on apply.

> But anyway:
>
> Acked-by: Ilya Maximets 

Thanks for the review!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net] openvswitch: Set the skbuff pkt_type for proper pmtud support.

2024-03-22 Thread Aaron Conole

Open vSwitch is originally intended to switch at layer 2, only dealing with
Ethernet frames.  With the introduction of l3 tunnels support, it crossed
into the realm of needing to care a bit about some routing details when
making forwarding decisions.  If an oversized packet would need to be
fragmented during this forwarding decision, there is a chance for pmtu
to get involved and generate a routing exception.  This is gated by the
skbuff->pkt_type field.

When a flow is already loaded into the openvswitch module this field is
set up and transitioned properly as a packet moves from one port to
another.  In the case that a packet execute is invoked after a flow is
newly installed this field is not properly initialized.  This causes the
pmtud mechanism to omit sending the required exception messages across
the tunnel boundary and a second attempt needs to be made to make sure
that the routing exception is properly setup.  To fix this, we set the
outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get
to the openvswitch module via a port device or packet command.

This issue is periodically encountered in complex setups, such as large
openshift deployments, where multiple sets of tunnel traversal occurs.
A way to recreate this is with the ovn-heater project that can setup
a networking environment which mimics such large deployments.  In that
environment, without this patch, we can see:

  ./ovn_cluster.sh start
  podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200
  podman exec ovn-chassis-1 ip netns exec sw01p1  ip r flush cache
  podman exec ovn-chassis-1 ip netns exec sw01p1 ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142)

  --- 21.0.0.3 ping statistics ---
  2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 1017ms

Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is not
sent into the server.

With this patch, setting the pkt_type, we see the following:

  podman exec ovn-chassis-1 ip netns exec sw01p1 ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222)
  ping: local error: message too long, mtu=1222

  --- 21.0.0.3 ping statistics ---
  2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1061ms

In this case, the first ping request receives the FRAG_NEEDED message and
a local routing exception is created.

Reported-at: https://issues.redhat.com/browse/FDP-164
Fixes: 58264848a5a7 ("openvswitch: Add vxlan tunneling support.")
Signed-off-by: Aaron Conole 
---
NOTE: An alternate approach would be to add a netlink attribute to preserve
  pkt_type across the kernel->user boundary, but that does require some
  userspace cooperation.

 net/openvswitch/actions.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 6fcd7e2ca81fe..952c6292100d0 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -936,6 +936,8 @@ static void do_output(struct datapath *dp, struct sk_buff 
*skb, int out_port,
pskb_trim(skb, ovs_mac_header_len(key));
}
 
+   skb->pkt_type = PACKET_OUTGOING;
+
if (likely(!mru ||
   (skb->len <= mru + vport->dev->hard_header_len))) {
ovs_vport_send(vport, skb, ovs_key_mac_proto(key));
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v2] ofproto-dpif-xlate: Fix continuations with associated metering.

2024-03-22 Thread Aaron Conole

Open vSwitch supports the ability to invoke a controller action by way
of a sample action with a specified meter.  In the normal case, this
sample action is transparently generated during xlate processing.  However,
when executing via a continuation, the logic to generate the sample
action when finishing the context freeze was missing.  The result is that
the behavior when action is 'controller(pause,meter_id=1)' does not match
the behavior when action is 'controller(meter_id=1)'.

OVN and other controller solutions may rely on this metering to protect
the control path, so it is critical to preserve metering, whether we are
doing a plain old send to controller, or a continuation.

Fixes: 77ab5fd2a95b ("Implement serializing the state of packet traversal in 
"continuations".")
Reported-at: https://issues.redhat.com/browse/FDP-455
Tested-by: Alex Musil 
Signed-off-by: Aaron Conole 
---
v1 -> v2:
  Clean up unrelated whitespace change
  Fix style issues around test comments
  Fix a line length issue truncating action in a comment
  Change from `` to $() in the test

 ofproto/ofproto-dpif-xlate.c | 66 +++-
 tests/ofproto-dpif.at| 51 
 2 files changed, 85 insertions(+), 32 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 89f183182e..7c49508950 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -5080,10 +5080,37 @@ put_controller_user_action(struct xlate_ctx *ctx,
bool dont_send, bool continuation,
uint32_t recirc_id, int len,
enum ofp_packet_in_reason reason,
+   uint32_t provider_meter_id,
uint16_t controller_id)
 {
 struct user_action_cookie cookie;
 
+/* If the controller action didn't request a meter (indicated by a
+ * 'meter_id' argument other than NX_CTLR_NO_METER), see if one was
+ * configured through the "controller" virtual meter.
+ *
+ * Internally, ovs-vswitchd uses UINT32_MAX to indicate no meter is
+ * configured. */
+uint32_t meter_id;
+if (provider_meter_id == UINT32_MAX) {
+meter_id = ctx->xbridge->ofproto->up.controller_meter_id;
+} else {
+meter_id = provider_meter_id;
+}
+
+size_t offset;
+size_t ac_offset;
+if (meter_id != UINT32_MAX) {
+/* If controller meter is configured, generate
+ * clone(meter,userspace) action. */
+offset = nl_msg_start_nested(ctx->odp_actions, OVS_ACTION_ATTR_SAMPLE);
+nl_msg_put_u32(ctx->odp_actions, OVS_SAMPLE_ATTR_PROBABILITY,
+   UINT32_MAX);
+ac_offset = nl_msg_start_nested(ctx->odp_actions,
+OVS_SAMPLE_ATTR_ACTIONS);
+nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_METER, meter_id);
+}
+
 memset(, 0, sizeof cookie);
 cookie.type = USER_ACTION_COOKIE_CONTROLLER;
 cookie.ofp_in_port = OFPP_NONE,
@@ -5101,6 +5128,11 @@ put_controller_user_action(struct xlate_ctx *ctx,
 uint32_t pid = dpif_port_get_pid(ctx->xbridge->dpif, odp_port);
 odp_put_userspace_action(pid, , sizeof cookie, ODPP_NONE,
  false, ctx->odp_actions, NULL);
+
+if (meter_id != UINT32_MAX) {
+nl_msg_end_nested(ctx->odp_actions, ac_offset);
+nl_msg_end_nested(ctx->odp_actions, offset);
+}
 }
 
 static void
@@ -5145,32 +5177,6 @@ xlate_controller_action(struct xlate_ctx *ctx, int len,
 }
 recirc_refs_add(>xout->recircs, recirc_id);
 
-/* If the controller action didn't request a meter (indicated by a
- * 'meter_id' argument other than NX_CTLR_NO_METER), see if one was
- * configured through the "controller" virtual meter.
- *
- * Internally, ovs-vswitchd uses UINT32_MAX to indicate no meter is
- * configured. */
-uint32_t meter_id;
-if (provider_meter_id == UINT32_MAX) {
-meter_id = ctx->xbridge->ofproto->up.controller_meter_id;
-} else {
-meter_id = provider_meter_id;
-}
-
-size_t offset;
-size_t ac_offset;
-if (meter_id != UINT32_MAX) {
-/* If controller meter is configured, generate clone(meter, userspace)
- * action. */
-offset = nl_msg_start_nested(ctx->odp_actions, OVS_ACTION_ATTR_SAMPLE);
-nl_msg_put_u32(ctx->odp_actions, OVS_SAMPLE_ATTR_PROBABILITY,
-   UINT32_MAX);
-ac_offset = nl_msg_start_nested(ctx->odp_actions,
-OVS_SAMPLE_ATTR_ACTIONS);
-nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_METER, meter_id);
-}
-
 /* Generate the datapath flows even if we don't send the packet-in
  * so that debugging more closely represents normal state. */
 bool dont_send = false

Re: [ovs-dev] [PATCH v2] route-table: Avoid routes from non-standard routing tables.

2024-03-21 Thread Aaron Conole

Ilya Maximets  writes:

> Currently, ovs-vswitchd is subscribed to all the routing changes in the
> kernel.  On each change, it marks the internal routing table cache as
> invalid, then resets it and dumps all the routes from the kernel from
> scratch.  The reason for that is kernel routing updates not being
> reliable in a sense that it's hard to tell which route is getting
> removed or modified.  Userspace application has to track the order in
> which route entries are dumped from the kernel.  Updates can get lost
> or even duplicated and the kernel doesn't provide a good mechanism to
> distinguish one route from another.  To my knowledge, dumping all the
> routes from a kernel after each change is the only way to keep the
> cache consistent.  Some more info can be found in the following never
> addressed issues:
>   https://bugzilla.redhat.com/1337860
>   https://bugzilla.redhat.com/1337855
>
> It seems to be believed that NetworkManager "mostly" does incremental
> updates right.  But it is still not completely correct, will re-dump
> the whole table in certain cases, and it takes a huge amount of very
> complicated code to do the accounting and route comparisons.
>
> Going back to ovs-vswitchd, it currently dumps routes from all the
> routing tables.  If it will get conflicting routes from multiple
> tables, the cache will not be useful.  The routing cache in userspace
> is primarily used for checking the egress port for tunneled traffic
> and this way also detecting link state changes for a tunnel port.
> For userspace datapath it is used for actual routing of the packet
> after sending to a native tunnel.
> With kernel datapath we don't really have a mechanism to know which
> routing table will actually be used by the kernel after encapsulation,
> so our lookups on a cache may be incorrect because of this as well.
>
> So, unless all the relevant routes are in the standard tables, the
> lookup in userspace route cache is unreliable.
>
> Luckily, most setups are not using any complicated routing in
> non-standard tables that OVS has to be aware of.
>
> It is possible, but unlikely, that standard routing tables are
> completely empty while some other custom table is not, and all the OVS
> tunnel traffic is directed to that table.  That would be the only
> scenario where dumping non-standard tables would make sense.  But it
> seems like this kind of setup will likely need a way to tell OVS from
> which table the routes should be taken, or we'll need to dump routing
> rules and keep a separate cache for each table, so we can first match
> on rules and then lookup correct routes in a specific table.  I'm not
> sure if trying to implement all that is justified.
>
> For now, stop considering routes from non-standard tables to avoid
> mixing different tables together and also wasting CPU resources.
>
> This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is
> running on a same host and in a same network namespace with OVS using
> its own custom routing table.
>
> Unfortunately, there seems to be no way to tell the kernel to send
> updates only for particular tables.  So, we'll still receive and parse
> all of them.  But they will not result in a full cache invalidation in
> most cases.
>
> Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps.
> So, we can make use of it and dump only standard tables when we get a
> relevant route update.  NETLINK_GET_STRICT_CHK has to be enabled on
> the socket for filtering to work.  There is no reason to not enable it
> by default, if supported.  It is not used outside of NETLINK_ROUTE.
>
> Fixes: f0e167f0dbad ("route-table: Handle route updates more robustly.")
> Fixes: ea83a2fcd0d3 ("lib: Show tunnel egress interface in ovsdb")
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/185
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
> Signed-off-by: Ilya Maximets 
> ---

Thanks!

Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] route-table: Avoid routes from non-standard routing tables.

2024-03-20 Thread Aaron Conole

Ilya Maximets  writes:

> On 3/19/24 20:56, Aaron Conole wrote:
>> Ilya Maximets  writes:
>> 
>>> Currently, ovs-vswitchd is subscribed to all the routing changes in the
>>> kernel.  On each change, it marks the internal routing table cache as
>>> invalid, then resets it and dumps all the routes from the kernel from
>>> scratch.  The reason for that is kernel routing updates not being
>>> reliable in a sense that it's hard to tell which route is getting
>>> removed or modified.  Userspace application has to track the order in
>>> which route entries are dumped from the kernel.  Updates can get lost
>>> or even duplicated and the kernel doesn't provide a good mechanism to
>>> distinguish one route from another.  To my knowledge, dumping all the
>>> routes from a kernel after each change is the only way to keep the
>>> cache consistent.  Some more info can be found in the following never
>>> addressed issues:
>>>   https://bugzilla.redhat.com/1337860
>>>   https://bugzilla.redhat.com/1337855
>>>
>>> It seems to be believed that NetworkManager "mostly" does incremental
>>> updates right.  But it is still not completely correct, will re-dump
>>> the whole table in certain cases, and it takes a huge amount of very
>>> complicated code to do the accounting and route comparisons.
>>>
>>> Going back to ovs-vswitchd, it currently dumps routes from all the
>>> routing tables.  If it will get conflicting routes from multiple
>>> tables, the cache will not be useful.  The routing cache in userspace
>>> is primarily used for checking the egress port for tunneled traffic
>>> and this way also detecting link state changes for a tunnel port.
>>> For userspace datapath it is used for actual routing of the packet
>>> after sending to a native tunnel.
>>> With kernel datapath we don't really have a mechanism to know which
>>> routing table will actually be used by the kernel after encapsulation,
>>> so our lookups on a cache may be incorrect because of this as well.
>>>
>>> So, unless all the relevant routes are in the standard tables, the
>>> lookup in userspace route cache is unreliable.
>>>
>>> Luckily, most setups are not using any complicated routing in
>>> non-standard tables that OVS has to be aware of.
>>>
>>> It is possible, but unlikely, that standard routing tables are
>>> completely empty while some other custom table is not, and all the OVS
>>> tunnel traffic is directed to that table.  That would be the only
>>> scenario where dumping non-standard tables would make sense.  But it
>>> seems like this kind of setup will likely need a way to tell OVS from
>>> which table the routes should be taken, or we'll need to dump routing
>>> rules and keep a separate cache for each table, so we can first match
>>> on rules and then lookup correct routes in a specific table.  I'm not
>>> sure if trying to implement all that is justified.
>>>
>>> For now, stop considering routes from non-standard tables to avoid
>>> mixing different tables together and also wasting CPU resources.
>>>
>>> This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is
>>> running on a same host and in a same network namespace with OVS using
>>> its own custom routing table.
>>>
>>> Unfortunately, there seems to be no way to tell the kernel to send
>>> updates only for particular tables.  So, we'll still receive and parse
>>> all of them.  But they will not result in a full cache invalidation in
>>> most cases.
>>>
>>> Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps.
>>> So, we can make use of it and dump only standard tables when we get a
>>> relevant route update.  NETLINK_GET_STRICT_CHK has to be enabled on
>>> the socket for filtering to work.  There is no reason to not enable it
>>> by default, if supported.  It is not used outside of NETLINK_ROUTE.
>>>
>>> Fixes: f0e167f0dbad ("route-table: Handle route updates more robustly.")
>>> Fixes: ea83a2fcd0d3 ("lib: Show tunnel egress interface in ovsdb")
>>> Reported-at: https://github.com/openvswitch/ovs-issues/issues/185
>>> Reported-at: 
>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
>>> Signed-off-by: Ilya Maximets 
>>> ---
>>>  lib/netlink-protocol.h | 10 ++
>>>  lib/netlink-socket.c   |  8 +
>>>  lib/route-table

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix continuations with associated metering.

2024-03-19 Thread Aaron Conole

Ilya Maximets  writes:

> On 3/7/24 18:25, Aaron Conole wrote:
>> Open vSwitch supports the ability to invoke a controller action by way
>> of a sample action with a specified meter.  In the normal case, this
>> sample action is transparently generated during xlate processing.  However,
>> when executing via a continuation, the logic to generate the sample
>> action when finishing the context freeze was missing.  The result is that
>> the behavior when action is 'controller(pause,meter_id=1)' does not match
>> the behavior when action is 'controller(meter_id=1)'.
>> 
>> OVN and other controller solutions may rely on this metering to protect
>> the control path, so it is critical to preserve metering, whether we are
>> doing a plain old send to controller, or a continuation.
>> 
>> Fixes: 77ab5fd2a95b ("Implement serializing the state of packet traversal in 
>> "continuations".")
>> Reported-at: https://issues.redhat.com/browse/FDP-455
>> Tested-by: Alex Musil 
>> Signed-off-by: Aaron Conole 
>> ---
>>  ofproto/ofproto-dpif-xlate.c | 66 +++-
>>  tests/ofproto-dpif.at| 50 +++
>>  2 files changed, 84 insertions(+), 32 deletions(-)
>
> Thanks, Aaron!  The change seems corrct to me.
> See a couple of comments inline.
>
> Best regrads, Ilya Maximets.
>
>> 
>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>> index 89f183182e..4f1e57e6d1 100644
>> --- a/ofproto/ofproto-dpif-xlate.c
>> +++ b/ofproto/ofproto-dpif-xlate.c
>> @@ -5080,10 +5080,37 @@ put_controller_user_action(struct xlate_ctx *ctx,
>> bool dont_send, bool continuation,
>> uint32_t recirc_id, int len,
>> enum ofp_packet_in_reason reason,
>> +   uint32_t provider_meter_id,
>> uint16_t controller_id)
>>  {
>>  struct user_action_cookie cookie;
>>  
>> +/* If the controller action didn't request a meter (indicated by a
>> + * 'meter_id' argument other than NX_CTLR_NO_METER), see if one was
>> + * configured through the "controller" virtual meter.
>> + *
>> + * Internally, ovs-vswitchd uses UINT32_MAX to indicate no meter is
>> + * configured. */
>> +uint32_t meter_id;
>> +if (provider_meter_id == UINT32_MAX) {
>> +meter_id = ctx->xbridge->ofproto->up.controller_meter_id;
>> +} else {
>> +meter_id = provider_meter_id;
>> +}
>> +
>> +size_t offset;
>> +size_t ac_offset;
>> +if (meter_id != UINT32_MAX) {
>> +/* If controller meter is configured, generate clone(meter,
>> + * userspace) action. */
>
> Might be better to keep the comment the way it were originally.
> I find it harder to read the action split in two lines.

Okay - I will put it back.  I think I split it because of the line
lengths, but maybe it would be better to put clone(meter,userspace) on
the second line.

>> +offset = nl_msg_start_nested(ctx->odp_actions, 
>> OVS_ACTION_ATTR_SAMPLE);
>> +nl_msg_put_u32(ctx->odp_actions, OVS_SAMPLE_ATTR_PROBABILITY,
>> +   UINT32_MAX);
>> +ac_offset = nl_msg_start_nested(ctx->odp_actions,
>> +OVS_SAMPLE_ATTR_ACTIONS);
>> +nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_METER, meter_id);
>> +}
>> +
>>  memset(, 0, sizeof cookie);
>>  cookie.type = USER_ACTION_COOKIE_CONTROLLER;
>>  cookie.ofp_in_port = OFPP_NONE,
>> @@ -5101,6 +5128,11 @@ put_controller_user_action(struct xlate_ctx *ctx,
>>  uint32_t pid = dpif_port_get_pid(ctx->xbridge->dpif, odp_port);
>>  odp_put_userspace_action(pid, , sizeof cookie, ODPP_NONE,
>>   false, ctx->odp_actions, NULL);
>> +
>> +if (meter_id != UINT32_MAX) {
>> +nl_msg_end_nested(ctx->odp_actions, ac_offset);
>> +nl_msg_end_nested(ctx->odp_actions, offset);
>> +}
>>  }
>>  
>>  static void
>> @@ -5145,32 +5177,6 @@ xlate_controller_action(struct xlate_ctx *ctx, int 
>> len,
>>  }
>>  recirc_refs_add(>xout->recircs, recirc_id);
>>  
>> -/* If the controller action didn't request a meter (indicated by a
>> - * 'meter_id' argument other than NX_CTLR_NO_METER), see if one was
>> - * configured through the &quo

Re: [ovs-dev] [PATCH] route-table: Avoid routes from non-standard routing tables.

2024-03-19 Thread Aaron Conole

Ilya Maximets  writes:

> Currently, ovs-vswitchd is subscribed to all the routing changes in the
> kernel.  On each change, it marks the internal routing table cache as
> invalid, then resets it and dumps all the routes from the kernel from
> scratch.  The reason for that is kernel routing updates not being
> reliable in a sense that it's hard to tell which route is getting
> removed or modified.  Userspace application has to track the order in
> which route entries are dumped from the kernel.  Updates can get lost
> or even duplicated and the kernel doesn't provide a good mechanism to
> distinguish one route from another.  To my knowledge, dumping all the
> routes from a kernel after each change is the only way to keep the
> cache consistent.  Some more info can be found in the following never
> addressed issues:
>   https://bugzilla.redhat.com/1337860
>   https://bugzilla.redhat.com/1337855
>
> It seems to be believed that NetworkManager "mostly" does incremental
> updates right.  But it is still not completely correct, will re-dump
> the whole table in certain cases, and it takes a huge amount of very
> complicated code to do the accounting and route comparisons.
>
> Going back to ovs-vswitchd, it currently dumps routes from all the
> routing tables.  If it will get conflicting routes from multiple
> tables, the cache will not be useful.  The routing cache in userspace
> is primarily used for checking the egress port for tunneled traffic
> and this way also detecting link state changes for a tunnel port.
> For userspace datapath it is used for actual routing of the packet
> after sending to a native tunnel.
> With kernel datapath we don't really have a mechanism to know which
> routing table will actually be used by the kernel after encapsulation,
> so our lookups on a cache may be incorrect because of this as well.
>
> So, unless all the relevant routes are in the standard tables, the
> lookup in userspace route cache is unreliable.
>
> Luckily, most setups are not using any complicated routing in
> non-standard tables that OVS has to be aware of.
>
> It is possible, but unlikely, that standard routing tables are
> completely empty while some other custom table is not, and all the OVS
> tunnel traffic is directed to that table.  That would be the only
> scenario where dumping non-standard tables would make sense.  But it
> seems like this kind of setup will likely need a way to tell OVS from
> which table the routes should be taken, or we'll need to dump routing
> rules and keep a separate cache for each table, so we can first match
> on rules and then lookup correct routes in a specific table.  I'm not
> sure if trying to implement all that is justified.
>
> For now, stop considering routes from non-standard tables to avoid
> mixing different tables together and also wasting CPU resources.
>
> This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is
> running on a same host and in a same network namespace with OVS using
> its own custom routing table.
>
> Unfortunately, there seems to be no way to tell the kernel to send
> updates only for particular tables.  So, we'll still receive and parse
> all of them.  But they will not result in a full cache invalidation in
> most cases.
>
> Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps.
> So, we can make use of it and dump only standard tables when we get a
> relevant route update.  NETLINK_GET_STRICT_CHK has to be enabled on
> the socket for filtering to work.  There is no reason to not enable it
> by default, if supported.  It is not used outside of NETLINK_ROUTE.
>
> Fixes: f0e167f0dbad ("route-table: Handle route updates more robustly.")
> Fixes: ea83a2fcd0d3 ("lib: Show tunnel egress interface in ovsdb")
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/185
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
> Signed-off-by: Ilya Maximets 
> ---
>  lib/netlink-protocol.h | 10 ++
>  lib/netlink-socket.c   |  8 +
>  lib/route-table.c  | 80 +-
>  tests/system-route.at  | 64 +
>  4 files changed, 146 insertions(+), 16 deletions(-)
>
> diff --git a/lib/netlink-protocol.h b/lib/netlink-protocol.h
> index 6eaa7035a..e4bb28ac9 100644
> --- a/lib/netlink-protocol.h
> +++ b/lib/netlink-protocol.h
> @@ -155,6 +155,11 @@ enum {
>  #define NLA_TYPE_MASK   ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
>  #endif
>  
> +/* Introduced in v4.4. */
> +#ifndef NLM_F_DUMP_FILTERED
> +#define NLM_F_DUMP_FILTERED 0x20
> +#endif
> +
>  /* These were introduced all together in 2.6.14.  (We want our programs to
>   * support the newer kernel features even if compiled with older headers.) */
>  #ifndef NETLINK_ADD_MEMBERSHIP
> @@ -168,6 +173,11 @@ enum {
>  #define NETLINK_LISTEN_ALL_NSID 8
>  #endif
>  
> +/* Strict checking of netlink arguments introduced in Linux kernel v4.20. */
> +#ifndef

Re: [ovs-dev] [PATCH] github: Reduce ASLR entropy to be compatible with asan in llvm 14.

2024-03-12 Thread Aaron Conole

Ilya Maximets  writes:

> Starting with image version 20240310.1.0, GitHub runners are using
> 32-bit entropy for ASLR:
>
>   $ sudo sysctl -a | grep vm.mmap.rnd
>   vm.mmap_rnd_bits = 32
>   vm.mmap_rnd_compat_bits = 16
>
> This breaks all the asan-enabled builds, because older asan gets
> confused by memory mappings and crashes with segmentation fault.
>
> The issue is fixed in newer releases of llvm:
>   
> https://github.com/llvm/llvm-project/commit/fb77ca05ffb4f8e666878f2f6718a9fb4d686839
>   https://reviews.llvm.org/D148280
>
> But these are not available in Ubuntu 22.04 image.
>
> This should be fixed by GitHub, but until new images are available
> reducing ASLR entropy manually to 28 bits to make builds work.
>
> Reported-at: https://github.com/actions/runner-images/issues/9491
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Aaron Conole 

We'll probably need something similar in other projects, too... What a
mess.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] ofproto-dpif-xlate: Fix continuations with associated metering.

2024-03-07 Thread Aaron Conole

Open vSwitch supports the ability to invoke a controller action by way
of a sample action with a specified meter.  In the normal case, this
sample action is transparently generated during xlate processing.  However,
when executing via a continuation, the logic to generate the sample
action when finishing the context freeze was missing.  The result is that
the behavior when action is 'controller(pause,meter_id=1)' does not match
the behavior when action is 'controller(meter_id=1)'.

OVN and other controller solutions may rely on this metering to protect
the control path, so it is critical to preserve metering, whether we are
doing a plain old send to controller, or a continuation.

Fixes: 77ab5fd2a95b ("Implement serializing the state of packet traversal in 
"continuations".")
Reported-at: https://issues.redhat.com/browse/FDP-455
Tested-by: Alex Musil 
Signed-off-by: Aaron Conole 
---
 ofproto/ofproto-dpif-xlate.c | 66 +++-
 tests/ofproto-dpif.at| 50 +++
 2 files changed, 84 insertions(+), 32 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 89f183182e..4f1e57e6d1 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -5080,10 +5080,37 @@ put_controller_user_action(struct xlate_ctx *ctx,
bool dont_send, bool continuation,
uint32_t recirc_id, int len,
enum ofp_packet_in_reason reason,
+   uint32_t provider_meter_id,
uint16_t controller_id)
 {
 struct user_action_cookie cookie;
 
+/* If the controller action didn't request a meter (indicated by a
+ * 'meter_id' argument other than NX_CTLR_NO_METER), see if one was
+ * configured through the "controller" virtual meter.
+ *
+ * Internally, ovs-vswitchd uses UINT32_MAX to indicate no meter is
+ * configured. */
+uint32_t meter_id;
+if (provider_meter_id == UINT32_MAX) {
+meter_id = ctx->xbridge->ofproto->up.controller_meter_id;
+} else {
+meter_id = provider_meter_id;
+}
+
+size_t offset;
+size_t ac_offset;
+if (meter_id != UINT32_MAX) {
+/* If controller meter is configured, generate clone(meter,
+ * userspace) action. */
+offset = nl_msg_start_nested(ctx->odp_actions, OVS_ACTION_ATTR_SAMPLE);
+nl_msg_put_u32(ctx->odp_actions, OVS_SAMPLE_ATTR_PROBABILITY,
+   UINT32_MAX);
+ac_offset = nl_msg_start_nested(ctx->odp_actions,
+OVS_SAMPLE_ATTR_ACTIONS);
+nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_METER, meter_id);
+}
+
 memset(, 0, sizeof cookie);
 cookie.type = USER_ACTION_COOKIE_CONTROLLER;
 cookie.ofp_in_port = OFPP_NONE,
@@ -5101,6 +5128,11 @@ put_controller_user_action(struct xlate_ctx *ctx,
 uint32_t pid = dpif_port_get_pid(ctx->xbridge->dpif, odp_port);
 odp_put_userspace_action(pid, , sizeof cookie, ODPP_NONE,
  false, ctx->odp_actions, NULL);
+
+if (meter_id != UINT32_MAX) {
+nl_msg_end_nested(ctx->odp_actions, ac_offset);
+nl_msg_end_nested(ctx->odp_actions, offset);
+}
 }
 
 static void
@@ -5145,32 +5177,6 @@ xlate_controller_action(struct xlate_ctx *ctx, int len,
 }
 recirc_refs_add(>xout->recircs, recirc_id);
 
-/* If the controller action didn't request a meter (indicated by a
- * 'meter_id' argument other than NX_CTLR_NO_METER), see if one was
- * configured through the "controller" virtual meter.
- *
- * Internally, ovs-vswitchd uses UINT32_MAX to indicate no meter is
- * configured. */
-uint32_t meter_id;
-if (provider_meter_id == UINT32_MAX) {
-meter_id = ctx->xbridge->ofproto->up.controller_meter_id;
-} else {
-meter_id = provider_meter_id;
-}
-
-size_t offset;
-size_t ac_offset;
-if (meter_id != UINT32_MAX) {
-/* If controller meter is configured, generate clone(meter, userspace)
- * action. */
-offset = nl_msg_start_nested(ctx->odp_actions, OVS_ACTION_ATTR_SAMPLE);
-nl_msg_put_u32(ctx->odp_actions, OVS_SAMPLE_ATTR_PROBABILITY,
-   UINT32_MAX);
-ac_offset = nl_msg_start_nested(ctx->odp_actions,
-OVS_SAMPLE_ATTR_ACTIONS);
-nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_METER, meter_id);
-}
-
 /* Generate the datapath flows even if we don't send the packet-in
  * so that debugging more closely represents normal state. */
 bool dont_send = false;
@@ -5178,12 +5184,7 @@ xlate_controller_action(struct xlate_ctx *ctx, int len,
 dont_send = true;
 }
 put_controller_user_action(ctx, dont_send, false, re

Re: [ovs-dev] [PATCH v3] conntrack: Remove nat_conn introducing key directionality.

2024-03-06 Thread Aaron Conole

Simon Horman  writes:

> + Xavier
>
> On Thu, Aug 31, 2023 at 02:52:59PM -0400, Aaron Conole wrote:
>> Ilya Maximets  writes:
>> 
>> > On 8/31/23 09:15, Frode Nordahl wrote:
>> >> On Wed, Aug 30, 2023 at 9:30 PM Paolo Valerio  wrote:
>> >>>
>> >>> From: hepeng 
>> >>>
>> >>> The patch avoids the extra allocation for nat_conn.
>> >>> Currently, when doing NAT, the userspace conntrack will use an extra
>> >>> conn for the two directions in a flow. However, each conn has actually
>> >>> the two keys for both orig and rev directions. This patch introduces a
>> >>> key_node[CT_DIRS] member as per Aaron's suggestion in the conn which
>> >>> consists of a key, direction, and a cmap_node for hash lookup so
>> >>> addressing the feedback received by the original patch [0].
>> >>>
>> >>> [0]
>> >>> https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0...@bytedance.com/
>> >>>
>> >>> Signed-off-by: Peng He 
>> >>> Co-authored-by: Paolo Valerio 
>> >>> Signed-off-by: Paolo Valerio 
>> >> 
>> >> Thanks alot for working on this, should we perhaps reference the
>> >> original bug report, i.e:
>> >> Reported-by: Michael Plato 
>> >> Reported-at: 
>> >> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-September/052065.html
>> >
>> > Can be added while applying, I think.  It also may be worth adding
>> > a sentence about fixing the assertion to the commit message.
>> 
>> Done.
>> 
>> >> 
>> >> We have a reproducer for the issue and we no longer see it occurring
>> >> with this patch.
>> >> Tested-by: Frode Nordahl 
>> >
>> > Thanks!
>> 
>> Thanks everyone!  I've applied and backported down to branch-3.0, and
>> will work on the backport to branch-2.17.
>
> Hi Aaron,
>
> while working on [1] I notice that this patch did not seem to be
> backported to branch-3.2. I will plan on doing so as part of
> my backports of [1].
>
> [1] [ovs-dev,v3] conntrack: Fix flush not flushing all elements.
> 
> https://patchwork.ozlabs.org/project/openvswitch/patch/20240304152159.1710977-1-xsimo...@redhat.com/

Strange - I would have thought I had applied it.  Glad to see this get
resolved.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v10 2/2] rhel: Enable USDT scripts by default in Fedora builds.

2024-03-05 Thread Aaron Conole

All supported versions of Fedora do package libbpf, so it
makes sense to enable USDT support.

Acked-by: Simon Horman 
Acked-by: Eelco Chaudron 
Signed-off-by: Aaron Conole 
---
 rhel/openvswitch-fedora.spec.in | 8 
 1 file changed, 8 insertions(+)

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 5d24ebcda8..94b6d7431c 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -28,6 +28,8 @@
 %bcond_with dpdk
 # To disable AF_XDP support, specify '--without afxdp' when building
 %bcond_without afxdp
+# To control the USDT support
+%bcond_without usdt
 
 # If there is a need to automatically enable the package after installation,
 # specify the "--with autoenable"
@@ -77,6 +79,9 @@ Provides: %{name}-dpdk = %{version}-%{release}
 %if %{with afxdp}
 BuildRequires: libxdp-devel libbpf-devel numactl-devel
 %endif
+%if %{with usdt}
+BuildRequires: libbpf-devel systemtap-sdt-devel
+%endif
 BuildRequires: unbound unbound-devel
 
 Requires: openssl hostname iproute module-init-tools unbound
@@ -173,6 +178,9 @@ This package provides IPsec tunneling support for OVS 
tunnels.
 --enable-afxdp \
 %else
 --disable-afxdp \
+%endif
+%if %{with usdt}
+--enable-usdt-probes \
 %endif
 --enable-ssl \
 --disable-static \
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v10 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-03-05 Thread Aaron Conole

From: Kevin Sprague 

During normal operations, it is useful to understand when a particular flow
gets removed from the system. This can be useful when debugging performance
issues tied to ofproto flow changes, trying to determine deployed traffic
patterns, or while debugging dynamic systems where ports come and go.

Prior to this change, there was a lack of visibility around flow expiration.
The existing debugging infrastructure could tell us when a flow was added to
the datapath, but not when it was removed or why.

This change introduces a USDT probe at the point where the revalidator
determines that the flow should be removed.  Additionally, we track the
reason for the flow eviction and provide that information as well.  With
this change, we can track the complete flow lifecycle for the netlink
datapath by hooking the upcall tracepoint in kernel, the flow put USDT, and
the revalidator USDT, letting us watch as flows are added and removed from
the kernel datapath.

This change only enables this information via USDT probe, so it won't be
possible to access this information any other way (see:
Documentation/topics/usdt-probes.rst).

Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
which serves as a demonstration of how the new USDT probe might be used
going forward.

Acked-by: Han Zhou 
Signed-off-by: Kevin Sprague 
Co-authored-by: Aaron Conole 
Signed-off-by: Aaron Conole 
---
v8 -> v9: Reorganized the flow delete reasons enum
  Updated flow_reval_monitor to use pahole to extract fields
  Added the purge reason with a proper USDT point
  Updated documentation
  Dropped all the outstanding ACKs

v9 -> v10: Reorder the usdt arguments
   Rewrite delete comment
   Added python docstrings to functions
   Use ternary assignment
   Clean up example output comment
   Capitolize 'F' in Flow
   Snip out the comments including help text
   Shrink try / except blocks for the USDT probes
   refactor the print()/sys.exit() into a single call that uses
   sys.stderr
   Re-run black & flake8


 Documentation/topics/usdt-probes.rst |  43 +
 ofproto/ofproto-dpif-upcall.c|  44 +-
 utilities/automake.mk|   3 +
 utilities/usdt-scripts/flow_reval_monitor.py | 978 +++
 4 files changed, 1062 insertions(+), 6 deletions(-)
 create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py

diff --git a/Documentation/topics/usdt-probes.rst 
b/Documentation/topics/usdt-probes.rst
index e527f43bab..b9a6c54b29 100644
--- a/Documentation/topics/usdt-probes.rst
+++ b/Documentation/topics/usdt-probes.rst
@@ -214,8 +214,10 @@ Available probes in ``ovs_vswitchd``:
 - dpif_recv:recv_upcall
 - main:poll_block
 - main:run_start
+- revalidate:flow_result
 - revalidate_ukey\_\_:entry
 - revalidate_ukey\_\_:exit
+- revalidator_sweep\_\_:flow_result
 - udpif_revalidator:start_dump
 - udpif_revalidator:sweep_done
 
@@ -443,6 +445,47 @@ sweep phase was completed.
 - ``utilities/usdt-scripts/reval_monitor.py``
 
 
+probe revalidate:flow_result
+
+
+**Description**:
+This probe is triggered when the revalidator has executed on a particular
+flow key to make a determination whether to evict a flow, and the cause
+for eviction.  The revalidator runs periodically, and this probe will only
+be triggered when a flow is flagged for revalidation.
+
+**Arguments**:
+
+- *arg0*: ``(struct udpif *) udpif``
+- *arg1*: ``(struct udpif_key *) ukey``
+- *arg2*: ``(enum reval_result) result``
+- *arg3*: ``(enum flow_del_reason) del_reason``
+
+**Script references**:
+
+- ``utilities/usdt-scripts/flow_reval_monitor.py``
+
+
+probe revalidator_sweep\_\_:flow_result
+~~~
+
+**Description**:
+This probe is placed in the path of the revalidator sweep, and is executed
+under the condition that a flow entry is in an unexpected state, or the
+flows were asked to be purged due to a user action.
+
+**Arguments**:
+
+- *arg0*: ``(struct udpif *) udpif``
+- *arg1*: ``(struct udpif_key *) ukey``
+- *arg2*: ``(enum reval_result) result``
+- *arg3*: ``(enum flow_del_reason) del_reason``
+
+**Script references**:
+
+- ``utilities/usdt-scripts/flow_reval_monitor.py``
+
+
 Adding your own probes
 --
 
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 9a5c5c29ce..d881956366 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -269,6 +269,20 @@ enum ukey_state {
 };
 #define N_UKEY_STATES (UKEY_DELETED + 1)
 
+enum flow_del_reason {
+FDR_NONE = 0,   /* No deletion reason for the flow. */
+FDR_AVOID_CACHING,  /* Flow deleted to avoid caching. */
+FDR_BAD_ODP_FIT,/* The flow had a bad ODP flow fit. */
+FDR_FLOW_IDLE,  /* The flow went unused and was deleted. */
+FDR_FLOW_LIMIT, /*

[ovs-dev] [PATCH v10 0/2] debugging: Add a revalidator probe, and monitor script

2024-03-05 Thread Aaron Conole

Resurrecting a feature from 2022, introduce a probe that indicates
why a particular flow may be selected for eviction during revalidation
and includes the flow information.

The second patch tells fedora builds to include the USDT probe support
on Fedora systems.

Aaron Conole (1):
  rhel: Enable USDT scripts by default in Fedora builds.

Kevin Sprague (1):
  revalidator: Add a USDT probe during flow deletion with purge reason.

 Documentation/topics/usdt-probes.rst |  43 +
 ofproto/ofproto-dpif-upcall.c|  44 +-
 rhel/openvswitch-fedora.spec.in  |   8 +
 utilities/automake.mk|   3 +
 utilities/usdt-scripts/flow_reval_monitor.py | 978 +++
 5 files changed, 1070 insertions(+), 6 deletions(-)
 create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py

-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v9 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-03-04 Thread Aaron Conole

Eelco Chaudron  writes:

> On 20 Feb 2024, at 22:47, Aaron Conole wrote:
>
>> From: Kevin Sprague 
>>
>> During normal operations, it is useful to understand when a particular flow
>> gets removed from the system. This can be useful when debugging performance
>> issues tied to ofproto flow changes, trying to determine deployed traffic
>> patterns, or while debugging dynamic systems where ports come and go.
>>
>> Prior to this change, there was a lack of visibility around flow expiration.
>> The existing debugging infrastructure could tell us when a flow was added to
>> the datapath, but not when it was removed or why.
>>
>> This change introduces a USDT probe at the point where the revalidator
>> determines that the flow should be removed.  Additionally, we track the
>> reason for the flow eviction and provide that information as well.  With
>> this change, we can track the complete flow lifecycle for the netlink
>> datapath by hooking the upcall tracepoint in kernel, the flow put USDT, and
>> the revalidator USDT, letting us watch as flows are added and removed from
>> the kernel datapath.
>>
>> This change only enables this information via USDT probe, so it won't be
>> possible to access this information any other way (see:
>> Documentation/topics/usdt-probes.rst).
>>
>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
>> which serves as a demonstration of how the new USDT probe might be used
>> going forward.
>>
>> Acked-by: Han Zhou 
>> Signed-off-by: Kevin Sprague 
>> Co-authored-by: Aaron Conole 
>> Signed-off-by: Aaron Conole 
>
> Thanks for doing the v9, some small comments remain below.
>
> Cheers,
>
> Eelco
>
>> ---
>> v8 -> v9: Reorganized the flow delete reasons enum
>>   Updated flow_reval_monitor to use pahole to extract fields
>>   Added the purge reason with a proper USDT point
>>   Updated documentation
>>   Dropped all the outstanding ACKs
>>
>>  Documentation/topics/usdt-probes.rst |  43 +
>>  ofproto/ofproto-dpif-upcall.c|  48 +-
>>  utilities/automake.mk|   3 +
>>  utilities/usdt-scripts/flow_reval_monitor.py | 997 +++
>>  4 files changed, 1085 insertions(+), 6 deletions(-)
>>  create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>
>> diff --git a/Documentation/topics/usdt-probes.rst 
>> b/Documentation/topics/usdt-probes.rst
>> index e527f43bab..015614a6b8 100644
>> --- a/Documentation/topics/usdt-probes.rst
>> +++ b/Documentation/topics/usdt-probes.rst
>> @@ -214,8 +214,10 @@ Available probes in ``ovs_vswitchd``:
>>  - dpif_recv:recv_upcall
>>  - main:poll_block
>>  - main:run_start
>> +- revalidate:flow_result
>>  - revalidate_ukey\_\_:entry
>>  - revalidate_ukey\_\_:exit
>> +- revalidator_sweep\_\_:flow_result
>>  - udpif_revalidator:start_dump
>>  - udpif_revalidator:sweep_done
>>
>> @@ -443,6 +445,47 @@ sweep phase was completed.
>>  - ``utilities/usdt-scripts/reval_monitor.py``
>>
>>
>> +probe revalidate:flow_result
>> +
>> +
>> +**Description**:
>> +This probe is triggered when the revalidator has executed on a particular
>> +flow key to make a determination whether to evict a flow, and the cause
>> +for eviction.  The revalidator runs periodically, and this probe will only
>> +be triggered when a flow is flagged for revalidation.
>> +
>> +**Arguments**:
>> +
>> +- *arg0*: ``(enum reval_result) result``
>> +- *arg1*: ``(enum flow_del_reason) reason``
>
> nit: variable name changed, so should be del_reason.

Good catch, I'll update.

>> +- *arg2*: ``(struct udpif *) udpif``
>> +- *arg3*: ``(struct udpif_key *) ukey``
>> +
>
> I think you missed my previous comment on re-ordering the arguments to
> be more inline with existing probes, i.e.:
>
> +OVS_USDT_PROBE(revalidator_sweep__, flow_result, udpif, ukey,
> +   result, del_reason);

Guess so.  I'll fix it.

>> +**Script references**:
>> +
>> +- ``utilities/usdt-scripts/flow_reval_monitor.py``
>> +
>> +
>> +probe revalidator_sweep\_\_:flow_result
>> +~~~
>> +
>> +**Description**:
>> +This probe is placed in the path of the revalidator sweep, and is executed
>> +under the condition that a flow entry is in an unexpected state, or the
>> +flows were asked to be purged due to a user action.

Re: [ovs-dev] [RFC 3/7] selftests: openvswitch: use non-graceful kills when needed

2024-02-21 Thread Aaron Conole

Adrian Moreno  writes:

> On 2/16/24 16:28, Aaron Conole wrote:
>> Normally a spawned process under OVS is given a SIGTERM when the test
>> ends as part of cleanup.  However, in case the process is still lingering
>> for some reason, we also send a SIGKILL to force it down faster.
>> Signed-off-by: Aaron Conole 
>> ---
>>   tools/testing/selftests/net/openvswitch/openvswitch.sh | 6 --
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>> diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> b/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> index a5dbde482ba4..678a72ad47c1 100755
>> --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> @@ -91,7 +91,8 @@ ovs_add_if () {
>>  python3 $ovs_base/ovs-dpctl.py add-if \
>>  -u "$2" "$3" >$ovs_dir/$3.out 2>$ovs_dir/$3.err &
>>  pid=$!
>> -on_exit "ovs_sbx $1 kill -TERM $pid 2>/dev/null"
>> +on_exit "ovs_sbx $1 kill --timeout 1000 TERM \
>> +--timeout 1000 KILL $pid 
>> 2>/dev/null"
>>  fi
>>   }
>>   
>
> AFAIK, this will immediately send TERM, then wait 1s, send TERM again,
> wait 1s then send KILL. Is that what you intended? To avoid the double
> TERM you could:

Okay, I'll adjust it.

> --
> Adrián Moreno
>
>> @@ -108,7 +109,8 @@ ovs_netns_spawn_daemon() {
>>  info "spawning cmd: $*"
>>  ip netns exec $netns $*  >> $ovs_dir/stdout  2>> $ovs_dir/stderr &
>>  pid=$!
>> -ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null"
>> +ovs_sbx "$sbx" on_exit "kill --timeout 1000 TERM \
>> +--timeout 1000 KILL $pid 2>/dev/null"
>>   }
>> ovs_add_netns_and_veths () {

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-02-21 Thread Aaron Conole

Adrian Moreno  writes:

> On 2/20/24 19:06, Aaron Conole wrote:
>> Eelco Chaudron  writes:
>> 
>>> On 19 Feb 2024, at 19:57, Aaron Conole wrote:
>>>
>>>> Eelco Chaudron  writes:
>>>>
>>>>> On 12 Feb 2024, at 15:15, Aaron Conole wrote:
>>>>>
>>>>>> Aaron Conole  writes:
>>>>>>
>>>>>>> Eelco Chaudron  writes:
>>>>>>>
>>>>>>>> On 2 Feb 2024, at 11:31, Adrian Moreno wrote:
>>>>>>>>
>>>>>>>>> On 2/1/24 10:02, Eelco Chaudron wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>>>>>>>>>>
>>>>>>>>>>> Eelco Chaudron  writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> From: Kevin Sprague 
>>>>>>>>>>>>>
>>>>>>>>>>>>> During normal operations, it is useful to understand when a 
>>>>>>>>>>>>> particular flow
>>>>>>>>>>>>> gets removed from the system. This can be useful when debugging 
>>>>>>>>>>>>> performance
>>>>>>>>>>>>> issues tied to ofproto flow changes, trying to determine deployed 
>>>>>>>>>>>>> traffic
>>>>>>>>>>>>> patterns, or while debugging dynamic systems where ports come and 
>>>>>>>>>>>>> go.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Prior to this change, there was a lack of visibility around flow 
>>>>>>>>>>>>> expiration.
>>>>>>>>>>>>> The existing debugging infrastructure could tell us when a flow 
>>>>>>>>>>>>> was added to
>>>>>>>>>>>>> the datapath, but not when it was removed or why.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This change introduces a USDT probe at the point where the 
>>>>>>>>>>>>> revalidator
>>>>>>>>>>>>> determines that the flow should be removed.  Additionally, we 
>>>>>>>>>>>>> track the
>>>>>>>>>>>>> reason for the flow eviction and provide that information as 
>>>>>>>>>>>>> well.  With
>>>>>>>>>>>>> this change, we can track the complete flow lifecycle for the 
>>>>>>>>>>>>> netlink
>>>>>>>>>>>>> datapath by hooking the upcall tracepoint in kernel, the flow put 
>>>>>>>>>>>>> USDT, and
>>>>>>>>>>>>> the revaldiator USDT, letting us watch as flows are added and 
>>>>>>>>>>>>> removed from
>>>>>>>>>>>>> the kernel datapath.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This change only enables this information via USDT probe, so it 
>>>>>>>>>>>>> won't be
>>>>>>>>>>>>> possible to access this information any other way (see:
>>>>>>>>>>>>> Documentation/topics/usdt-probes.rst).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also included is a script 
>>>>>>>>>>>>> (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>>>>>>>>>> which serves as a demonstration of how the new USDT probe might 
>>>>>>>>>>>>> be used
>>>>>>>>>>>>> going forward.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Kevin Sprague 
>>>>>>>>>>>>> Co-authored-by: Aaron Conole 
>>>>>>>>>>>>> Signed-off-by: Aaron Conole 
>>>>>>>>>>>>
&g

Re: [ovs-dev] [PATCH v1] ovs-tcpdump: Cleanup mirror failed with twice fatal signals

2024-02-21 Thread Aaron Conole

Daniel Ding  writes:

> After running ovs-tcpdump and inputs multiple CTRL+C, the program will
> raise the following exception.
>
> Error in atexit._run_exitfuncs:
> Traceback (most recent call last):
>   File "/usr/bin/ovs-tcpdump", line 421, in cleanup_mirror
> ovsdb = OVSDB(db_sock)
>   File "/usr/bin/ovs-tcpdump", line 168, in __init__
> OVSDB.wait_for_db_change(self._idl_conn)  # Initial Sync with DB
>   File "/usr/bin/ovs-tcpdump", line 155, in wait_for_db_change
> while idl.change_seqno == seq and not idl.run():
>
> Signed-off-by: Daniel Ding 
> ---

LGTM for the linux side - maybe Alin might check the windows side.

When you post v2 you can keep my

Reviewed-by: Aaron Conole 

>  utilities/ovs-tcpdump.in | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/utilities/ovs-tcpdump.in b/utilities/ovs-tcpdump.in
> index 4cbd9a5d3..eec2262d6 100755
> --- a/utilities/ovs-tcpdump.in
> +++ b/utilities/ovs-tcpdump.in
> @@ -17,6 +17,7 @@
>  import os
>  import pwd
>  from random import randint
> +import signal
>  import subprocess
>  import sys
>  import time
> @@ -417,8 +418,22 @@ def py_which(executable):
> for path in os.environ["PATH"].split(os.pathsep))
>  
>  
> +def ignore_fatal_signals():
> +if sys.platform == "win32":
> +signals = [signal.SIGTERM, signal.SIGINT]
> +else:
> +signals = [signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
> +   signal.SIGALRM]
> +
> +for signr in signals:
> +signal.signal(signr, signal.SIG_IGN)
> +
> +

Just a nit, but it might be clearer to put the ignore_fatal_signals as a
function scoped in teardown below.  That way it is a bit clearer that
this will only be called to turn off these in the double failure case.

>  def teardown(db_sock, interface, mirror_interface, tap_created):
>  def cleanup_mirror():
> +# Ignore fatal signals, avoid it to break OVSDB operations.
> +# So that cleanup mirror and ports be finished.
> +ignore_fatal_signals()
>  try:
>  ovsdb = OVSDB(db_sock)
>  ovsdb.destroy_mirror(interface, ovsdb.port_bridge(interface))

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] userspace: Allow UDP zero checksum with IPv6 tunnels.

2024-02-21 Thread Aaron Conole

Mike Pattrick  writes:

> On Tue, Feb 20, 2024 at 8:56 PM Mike Pattrick  wrote:
>>
>> This patch adopts the proposed RFC 6935 by allowing null UDP checksums
>> even if the tunnel protocol is IPv6. This is already supported by Linux
>> through the udp6zerocsumtx tunnel option. It is disabled by default and
>> IPv6 tunnels are flagged as requiring a checksum, but this patch enables
>> the user to set csum=false on IPv6 tunnels.
>>
>> Signed-off-by: Mike Pattrick 
>
> One of the github CI runners failed this in test "bfd - bfd decay". I
> believe this is a false negative.

In the future, you can send a recheck request (I included one with my
previous comment) to re-run and clear these "flaky" tests.

> -M
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] userspace: Allow UDP zero checksum with IPv6 tunnels.

2024-02-21 Thread Aaron Conole

Mike Pattrick  writes:

> This patch adopts the proposed RFC 6935 by allowing null UDP checksums
> even if the tunnel protocol is IPv6. This is already supported by Linux
> through the udp6zerocsumtx tunnel option. It is disabled by default and
> IPv6 tunnels are flagged as requiring a checksum, but this patch enables
> the user to set csum=false on IPv6 tunnels.
>
> Signed-off-by: Mike Pattrick 
> ---

Recheck-request: github-robot

>  lib/netdev-native-tnl.c |  2 +-
>  lib/netdev-vport.c  | 13 +++--
>  lib/netdev.h|  9 -
>  ofproto/tunnel.c| 11 +--
>  tests/tunnel.at |  6 +++---
>  vswitchd/vswitch.xml|  8 +---
>  6 files changed, 37 insertions(+), 12 deletions(-)
>
> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> index dee9ab344..e8258bc4e 100644
> --- a/lib/netdev-native-tnl.c
> +++ b/lib/netdev-native-tnl.c
> @@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config 
> *tnl_cfg,
>  udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
>  udp->udp_dst = tnl_cfg->dst_port;
>  
> -if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
> +if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {

Previously, this encap csum flag would be ignored for IPv6 tunnels, but
now it will be honored for them.  It should be noted in the NEWS file,
since it is user impacting.

>  /* Write a value in now to mark that we should compute the checksum
>   * later. 0x is handy because it is transparent to the
>   * calculation. */
> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> index 60caa02fb..f9a778988 100644
> --- a/lib/netdev-vport.c
> +++ b/lib/netdev-vport.c
> @@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
> *args, char **errp)
>  tnl_cfg.dst_port = htons(atoi(node->value));
>  } else if (!strcmp(node->key, "csum") && has_csum) {
>  if (!strcmp(node->value, "true")) {
> -tnl_cfg.csum = true;
> +tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
> +} else if (!strcmp(node->value, "false")) {
> +tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
>  }
>  } else if (!strcmp(node->key, "seq") && has_seq) {
>  if (!strcmp(node->value, "true")) {
> @@ -850,6 +852,11 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
> *args, char **errp)
>  }
>  }
>  
> +/* The default csum state for GRE is special. */
> +if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT && strstr(type, "gre")) {
> +tnl_cfg.csum = NETDEV_TNL_CSUM_DEFAULT_GRE;
> +}
> +
>  enum tunnel_layers layers = tunnel_supported_layers(type, _cfg);
>  const char *full_type = (strcmp(type, "vxlan") ? type
>   : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
> @@ -1026,8 +1033,10 @@ get_tunnel_config(const struct netdev *dev, struct 
> smap *args)
>  }
>  }
>  
> -if (tnl_cfg->csum) {
> +if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
>  smap_add(args, "csum", "true");
> +} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
> +smap_add(args, "csum", "false");
>  }
>  
>  if (tnl_cfg->set_seq) {
> diff --git a/lib/netdev.h b/lib/netdev.h
> index 67a8486bd..a79531e6d 100644
> --- a/lib/netdev.h
> +++ b/lib/netdev.h
> @@ -111,6 +111,13 @@ enum netdev_srv6_flowlabel {
>  SRV6_FLOWLABEL_COMPUTE,
>  };
>  
> +enum netdev_tnl_csum {
> +NETDEV_TNL_CSUM_DEFAULT,
> +NETDEV_TNL_CSUM_ENABLED,
> +NETDEV_TNL_CSUM_DISABLED,
> +NETDEV_TNL_CSUM_DEFAULT_GRE,
> +};
> +
>  /* Configuration specific to tunnels. */
>  struct netdev_tunnel_config {
>  ovs_be64 in_key;
> @@ -139,7 +146,7 @@ struct netdev_tunnel_config {
>  uint8_t tos;
>  bool tos_inherit;
>  
> -bool csum;
> +enum netdev_tnl_csum csum;
>  bool dont_fragment;
>  enum netdev_pt_mode pt_mode;
>  
> diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c
> index 80ddee78a..6f462874e 100644
> --- a/ofproto/tunnel.c
> +++ b/ofproto/tunnel.c
> @@ -465,9 +465,14 @@ tnl_port_send(const struct ofport_dpif *ofport, struct 
> flow *flow,
>  
>  flow->tunnel.flags &= ~(FLOW_TNL_F_MASK & ~FLOW_TNL_PUB_F_MASK);
>  flow->tunnel.flags |= (cfg->dont_fragment ? FLOW_TNL_F_DONT_FRAGMENT : 0)
> -| (cfg->csum ? FLOW_TNL_F_CSUM : 0)
>  | (cfg->out_key_present ? FLOW_TNL_F_KEY : 0);
>  
> +if (cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
> +flow->tunnel.flags |= FLOW_TNL_F_CSUM;
> +} else if (cfg->csum == NETDEV_TNL_CSUM_DEFAULT && !flow->tunnel.ip_dst) 
> {
> +flow->tunnel.flags |= FLOW_TNL_F_CSUM;
> +}
> +
>  if (cfg->set_egress_pkt_mark) {
>  flow->pkt_mark = cfg->egress_pkt_mark;
>  wc->masks.pkt_mark = UINT32_MAX;
> @@ -706,8 +711,10 @@ tnl_port_format(const struct tnl_port *tnl_port, struct 
> ds *ds)
>  ds_put_cstr(ds,

[ovs-dev] [PATCH v9 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-02-20 Thread Aaron Conole

From: Kevin Sprague 

During normal operations, it is useful to understand when a particular flow
gets removed from the system. This can be useful when debugging performance
issues tied to ofproto flow changes, trying to determine deployed traffic
patterns, or while debugging dynamic systems where ports come and go.

Prior to this change, there was a lack of visibility around flow expiration.
The existing debugging infrastructure could tell us when a flow was added to
the datapath, but not when it was removed or why.

This change introduces a USDT probe at the point where the revalidator
determines that the flow should be removed.  Additionally, we track the
reason for the flow eviction and provide that information as well.  With
this change, we can track the complete flow lifecycle for the netlink
datapath by hooking the upcall tracepoint in kernel, the flow put USDT, and
the revalidator USDT, letting us watch as flows are added and removed from
the kernel datapath.

This change only enables this information via USDT probe, so it won't be
possible to access this information any other way (see:
Documentation/topics/usdt-probes.rst).

Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
which serves as a demonstration of how the new USDT probe might be used
going forward.

Acked-by: Han Zhou 
Signed-off-by: Kevin Sprague 
Co-authored-by: Aaron Conole 
Signed-off-by: Aaron Conole 
---
v8 -> v9: Reorganized the flow delete reasons enum
  Updated flow_reval_monitor to use pahole to extract fields
  Added the purge reason with a proper USDT point
  Updated documentation
  Dropped all the outstanding ACKs

 Documentation/topics/usdt-probes.rst |  43 +
 ofproto/ofproto-dpif-upcall.c|  48 +-
 utilities/automake.mk|   3 +
 utilities/usdt-scripts/flow_reval_monitor.py | 997 +++
 4 files changed, 1085 insertions(+), 6 deletions(-)
 create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py

diff --git a/Documentation/topics/usdt-probes.rst 
b/Documentation/topics/usdt-probes.rst
index e527f43bab..015614a6b8 100644
--- a/Documentation/topics/usdt-probes.rst
+++ b/Documentation/topics/usdt-probes.rst
@@ -214,8 +214,10 @@ Available probes in ``ovs_vswitchd``:
 - dpif_recv:recv_upcall
 - main:poll_block
 - main:run_start
+- revalidate:flow_result
 - revalidate_ukey\_\_:entry
 - revalidate_ukey\_\_:exit
+- revalidator_sweep\_\_:flow_result
 - udpif_revalidator:start_dump
 - udpif_revalidator:sweep_done
 
@@ -443,6 +445,47 @@ sweep phase was completed.
 - ``utilities/usdt-scripts/reval_monitor.py``
 
 
+probe revalidate:flow_result
+
+
+**Description**:
+This probe is triggered when the revalidator has executed on a particular
+flow key to make a determination whether to evict a flow, and the cause
+for eviction.  The revalidator runs periodically, and this probe will only
+be triggered when a flow is flagged for revalidation.
+
+**Arguments**:
+
+- *arg0*: ``(enum reval_result) result``
+- *arg1*: ``(enum flow_del_reason) reason``
+- *arg2*: ``(struct udpif *) udpif``
+- *arg3*: ``(struct udpif_key *) ukey``
+
+**Script references**:
+
+- ``utilities/usdt-scripts/flow_reval_monitor.py``
+
+
+probe revalidator_sweep\_\_:flow_result
+~~~
+
+**Description**:
+This probe is placed in the path of the revalidator sweep, and is executed
+under the condition that a flow entry is in an unexpected state, or the
+flows were asked to be purged due to a user action.
+
+**Arguments**:
+
+- *arg0*: ``(enum reval_result) result``
+- *arg1*: ``(enum flow_del_reason) reason``
+- *arg2*: ``(struct udpif *) udpif``
+- *arg3*: ``(struct udpif_key *) ukey``
+
+**Script references**:
+
+- ``utilities/usdt-scripts/flow_reval_monitor.py``
+
+
 Adding your own probes
 --
 
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index b5cbeed878..fbc7858690 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -269,6 +269,20 @@ enum ukey_state {
 };
 #define N_UKEY_STATES (UKEY_DELETED + 1)
 
+enum flow_del_reason {
+FDR_NONE = 0,   /* No deletion reason for the flow. */
+FDR_AVOID_CACHING,  /* Flow deleted to avoid caching. */
+FDR_BAD_ODP_FIT,/* The flow had a bad ODP flow fit. */
+FDR_FLOW_IDLE,  /* The flow went unused and was deleted. */
+FDR_FLOW_LIMIT, /* All flows being killed. */
+FDR_FLOW_WILDCARDED,/* The flow needed a narrower wildcard mask. */
+FDR_NO_OFPROTO, /* The flow didn't have an associated ofproto. */
+FDR_PURGE,  /* User action caused flows to be killed. */
+FDR_TOO_EXPENSIVE,  /* The flow was too expensive to revalidate. */
+FDR_UPDATE_FAIL,/* Flow state transition was unexpected. */
+FDR_XLATION_ERROR,  /* There was an error translating the f

[ovs-dev] [PATCH v9 2/2] rhel: Enable USDT scripts by default in Fedora builds.

2024-02-20 Thread Aaron Conole

All supported versions of Fedora do package libbpf, so it
makes sense to enable USDT support.

Acked-by: Simon Horman 
Signed-off-by: Aaron Conole 
---
 rhel/openvswitch-fedora.spec.in | 8 
 1 file changed, 8 insertions(+)

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 5d24ebcda8..94b6d7431c 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -28,6 +28,8 @@
 %bcond_with dpdk
 # To disable AF_XDP support, specify '--without afxdp' when building
 %bcond_without afxdp
+# To control the USDT support
+%bcond_without usdt
 
 # If there is a need to automatically enable the package after installation,
 # specify the "--with autoenable"
@@ -77,6 +79,9 @@ Provides: %{name}-dpdk = %{version}-%{release}
 %if %{with afxdp}
 BuildRequires: libxdp-devel libbpf-devel numactl-devel
 %endif
+%if %{with usdt}
+BuildRequires: libbpf-devel systemtap-sdt-devel
+%endif
 BuildRequires: unbound unbound-devel
 
 Requires: openssl hostname iproute module-init-tools unbound
@@ -173,6 +178,9 @@ This package provides IPsec tunneling support for OVS 
tunnels.
 --enable-afxdp \
 %else
 --disable-afxdp \
+%endif
+%if %{with usdt}
+--enable-usdt-probes \
 %endif
 --enable-ssl \
 --disable-static \
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v9 0/2] debugging: Add a revalidator probe, and monitor script

2024-02-20 Thread Aaron Conole

Resurrecting a feature from 2022, introduce a probe that indicates
why a particular flow may be selected for eviction during revalidation
and includes the flow information.

The second patch tells fedora builds to include the USDT probe support
on Fedora systems.

Aaron Conole (1):
  rhel: Enable USDT scripts by default in Fedora builds.

Kevin Sprague (1):
  revalidator: Add a USDT probe during flow deletion with purge reason.

 Documentation/topics/usdt-probes.rst |  43 +
 ofproto/ofproto-dpif-upcall.c|  48 +-
 rhel/openvswitch-fedora.spec.in  |   8 +
 utilities/automake.mk|   3 +
 utilities/usdt-scripts/flow_reval_monitor.py | 997 +++
 5 files changed, 1093 insertions(+), 6 deletions(-)
 create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py

-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-02-20 Thread Aaron Conole

Eelco Chaudron  writes:

> On 19 Feb 2024, at 19:57, Aaron Conole wrote:
>
>> Eelco Chaudron  writes:
>>
>>> On 12 Feb 2024, at 15:15, Aaron Conole wrote:
>>>
>>>> Aaron Conole  writes:
>>>>
>>>>> Eelco Chaudron  writes:
>>>>>
>>>>>> On 2 Feb 2024, at 11:31, Adrian Moreno wrote:
>>>>>>
>>>>>>> On 2/1/24 10:02, Eelco Chaudron wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>>>>>>>>
>>>>>>>>> Eelco Chaudron  writes:
>>>>>>>>>
>>>>>>>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>>>>>>>
>>>>>>>>>>> From: Kevin Sprague 
>>>>>>>>>>>
>>>>>>>>>>> During normal operations, it is useful to understand when a 
>>>>>>>>>>> particular flow
>>>>>>>>>>> gets removed from the system. This can be useful when debugging 
>>>>>>>>>>> performance
>>>>>>>>>>> issues tied to ofproto flow changes, trying to determine deployed 
>>>>>>>>>>> traffic
>>>>>>>>>>> patterns, or while debugging dynamic systems where ports come and 
>>>>>>>>>>> go.
>>>>>>>>>>>
>>>>>>>>>>> Prior to this change, there was a lack of visibility around flow 
>>>>>>>>>>> expiration.
>>>>>>>>>>> The existing debugging infrastructure could tell us when a flow was 
>>>>>>>>>>> added to
>>>>>>>>>>> the datapath, but not when it was removed or why.
>>>>>>>>>>>
>>>>>>>>>>> This change introduces a USDT probe at the point where the 
>>>>>>>>>>> revalidator
>>>>>>>>>>> determines that the flow should be removed.  Additionally, we track 
>>>>>>>>>>> the
>>>>>>>>>>> reason for the flow eviction and provide that information as well.  
>>>>>>>>>>> With
>>>>>>>>>>> this change, we can track the complete flow lifecycle for the 
>>>>>>>>>>> netlink
>>>>>>>>>>> datapath by hooking the upcall tracepoint in kernel, the flow put 
>>>>>>>>>>> USDT, and
>>>>>>>>>>> the revaldiator USDT, letting us watch as flows are added and 
>>>>>>>>>>> removed from
>>>>>>>>>>> the kernel datapath.
>>>>>>>>>>>
>>>>>>>>>>> This change only enables this information via USDT probe, so it 
>>>>>>>>>>> won't be
>>>>>>>>>>> possible to access this information any other way (see:
>>>>>>>>>>> Documentation/topics/usdt-probes.rst).
>>>>>>>>>>>
>>>>>>>>>>> Also included is a script 
>>>>>>>>>>> (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>>>>>>>> which serves as a demonstration of how the new USDT probe might be 
>>>>>>>>>>> used
>>>>>>>>>>> going forward.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Kevin Sprague 
>>>>>>>>>>> Co-authored-by: Aaron Conole 
>>>>>>>>>>> Signed-off-by: Aaron Conole 
>>>>>>>>>>
>>>>>>>>>> Thanks for following this up Aaron! See comments on this patch
>>>>>> below. I have no additional comments on patch 2.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Eelco
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>>   Documentation/topics/usdt-probes.rst |   1 +
>>>>>>>>>>>

Re: [ovs-dev] [RFC 0/7] selftests: openvswitch: cleanups for running as selftests

2024-02-20 Thread Aaron Conole

Jakub Kicinski  writes:

> On Fri, 16 Feb 2024 10:28:39 -0500 Aaron Conole wrote:
>> The series is a host of cleanups to the openvswitch selftest suite
>> which should be ready to run under the netdev selftest runners using
>> vng.  For now, the testing has been done with RW directories, but
>> additional testing will be done to try and keep it all as RO to be
>> more friendly.
>
> Would it be an option to make the output go into a dir in /tmp/ 
> instead of in place, in the tree?
>
>   mktemp -p /tmp/ -d ovs-test.X

That's probably the best approach.  I'll switch to it.

> or such?

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-02-19 Thread Aaron Conole

Eelco Chaudron  writes:

> On 12 Feb 2024, at 15:15, Aaron Conole wrote:
>
>> Aaron Conole  writes:
>>
>>> Eelco Chaudron  writes:
>>>
>>>> On 2 Feb 2024, at 11:31, Adrian Moreno wrote:
>>>>
>>>>> On 2/1/24 10:02, Eelco Chaudron wrote:
>>>>>>
>>>>>>
>>>>>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>>>>>>
>>>>>>> Eelco Chaudron  writes:
>>>>>>>
>>>>>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>>>>>
>>>>>>>>> From: Kevin Sprague 
>>>>>>>>>
>>>>>>>>> During normal operations, it is useful to understand when a 
>>>>>>>>> particular flow
>>>>>>>>> gets removed from the system. This can be useful when debugging 
>>>>>>>>> performance
>>>>>>>>> issues tied to ofproto flow changes, trying to determine deployed 
>>>>>>>>> traffic
>>>>>>>>> patterns, or while debugging dynamic systems where ports come and go.
>>>>>>>>>
>>>>>>>>> Prior to this change, there was a lack of visibility around flow 
>>>>>>>>> expiration.
>>>>>>>>> The existing debugging infrastructure could tell us when a flow was 
>>>>>>>>> added to
>>>>>>>>> the datapath, but not when it was removed or why.
>>>>>>>>>
>>>>>>>>> This change introduces a USDT probe at the point where the revalidator
>>>>>>>>> determines that the flow should be removed.  Additionally, we track 
>>>>>>>>> the
>>>>>>>>> reason for the flow eviction and provide that information as well.  
>>>>>>>>> With
>>>>>>>>> this change, we can track the complete flow lifecycle for the netlink
>>>>>>>>> datapath by hooking the upcall tracepoint in kernel, the flow put 
>>>>>>>>> USDT, and
>>>>>>>>> the revaldiator USDT, letting us watch as flows are added and removed 
>>>>>>>>> from
>>>>>>>>> the kernel datapath.
>>>>>>>>>
>>>>>>>>> This change only enables this information via USDT probe, so it won't 
>>>>>>>>> be
>>>>>>>>> possible to access this information any other way (see:
>>>>>>>>> Documentation/topics/usdt-probes.rst).
>>>>>>>>>
>>>>>>>>> Also included is a script 
>>>>>>>>> (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>>>>>> which serves as a demonstration of how the new USDT probe might be 
>>>>>>>>> used
>>>>>>>>> going forward.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Kevin Sprague 
>>>>>>>>> Co-authored-by: Aaron Conole 
>>>>>>>>> Signed-off-by: Aaron Conole 
>>>>>>>>
>>>>>>>> Thanks for following this up Aaron! See comments on this patch
>>>> below. I have no additional comments on patch 2.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Eelco
>>>>>>>>
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>>   Documentation/topics/usdt-probes.rst |   1 +
>>>>>>>>>   ofproto/ofproto-dpif-upcall.c|  42 +-
>>>>>>>>>   utilities/automake.mk|   3 +
>>>>>>>>>   utilities/usdt-scripts/flow_reval_monitor.py | 653 
>>>>>>>>> +++
>>>>>>>>>   4 files changed, 693 insertions(+), 6 deletions(-)
>>>>>>>>>   create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>>>>>>>>
>>>>>>>>> diff --git a/Documentation/topics/usdt-probes.rst 
>>>>>>>>> b/Documentation/topics/usdt-probes.rst
>>>>>>>>> index e527f43bab..a8da9bb1f7 100644
>>>>>>>>

Re: [ovs-dev] [PATCH v3 2/2] conntrack: Handle persistent selection for IP addresses.

2024-02-19 Thread Aaron Conole

Paolo Valerio  writes:

> The patch, when 'persistent' flag is specified, makes the IP selection
> in a range persistent across reboots.
>
> Signed-off-by: Paolo Valerio 
> Acked-by: Simon Horman 
> ---

Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 1/2] conntrack: Handle random selection for port ranges.

2024-02-19 Thread Aaron Conole

Paolo Valerio  writes:

> The userspace conntrack only supported hash for port selection.
> With the patch, both userspace and kernel datapath support the random
> flag.
>
> The default behavior remains the same, that is, if no flags are
> specified, hash is selected.
>
> Signed-off-by: Paolo Valerio 
> Acked-by: Simon Horman 
> ---

Acked-by: Aaron Conole 

>  Documentation/ref/ovs-actions.7.rst |  3 +--
>  NEWS|  3 +++
>  lib/conntrack.c | 15 ---
>  lib/conntrack.h |  5 +
>  lib/dpif-netdev.c   |  4 +++-
>  5 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/ref/ovs-actions.7.rst 
> b/Documentation/ref/ovs-actions.7.rst
> index 36adcc5db..80acd9070 100644
> --- a/Documentation/ref/ovs-actions.7.rst
> +++ b/Documentation/ref/ovs-actions.7.rst
> @@ -1551,8 +1551,7 @@ following arguments:
>  should be selected. When a port range is specified, fallback to
>  ephemeral ports does not happen, else, it will.  The port number
>  selection can be informed by the optional ``random`` and ``hash`` 
> flags
> -described below.  The userspace datapath only supports the ``hash``
> -behavior.
> +described below.
>  
>  The optional *flags* are:
>  
> diff --git a/NEWS b/NEWS
> index a6617546c..93046b963 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,5 +1,8 @@
>  Post-v3.3.0
>  
> +   - Userspace datapath:
> + * Conntrack now supports 'random' flag for selecting ports in a range
> +   while natting.
>  
>  
>  v3.3.0 - xx xxx 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 013709bd6..e09ecdf33 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -,7 +,7 @@ nat_range_hash(const struct conn_key *key, uint32_t 
> basis,
>  /* Ports are stored in host byte order for convenience. */
>  static void
>  set_sport_range(const struct nat_action_info_t *ni, const struct conn_key *k,
> -uint32_t hash, uint16_t *curr, uint16_t *min,
> +uint32_t off, uint16_t *curr, uint16_t *min,
>  uint16_t *max)
>  {
>  if (((ni->nat_action & NAT_ACTION_SNAT_ALL) == NAT_ACTION_SRC) ||
> @@ -2241,19 +2241,19 @@ set_sport_range(const struct nat_action_info_t *ni, 
> const struct conn_key *k,
>  } else {
>  *min = ni->min_port;
>  *max = ni->max_port;
> -*curr = *min + (hash % ((*max - *min) + 1));
> +*curr =  *min + (off % ((*max - *min) + 1));
>  }
>  }
>  
>  static void
>  set_dport_range(const struct nat_action_info_t *ni, const struct conn_key *k,
> -uint32_t hash, uint16_t *curr, uint16_t *min,
> +uint32_t off, uint16_t *curr, uint16_t *min,
>  uint16_t *max)
>  {
>  if (ni->nat_action & NAT_ACTION_DST_PORT) {
>  *min = ni->min_port;
>  *max = ni->max_port;
> -*curr = *min + (hash % ((*max - *min) + 1));
> +*curr = *min + (off % ((*max - *min) + 1));
>  } else {
>  *curr = ntohs(k->dst.port);
>  *min = *max = *curr;
> @@ -2388,18 +2388,19 @@ nat_get_unique_tuple(struct conntrack *ct, struct 
> conn *conn,
>   fwd_key->nw_proto == IPPROTO_SCTP;
>  uint16_t min_dport, max_dport, curr_dport;
>  uint16_t min_sport, max_sport, curr_sport;
> -uint32_t hash;
> +uint32_t hash, port_off;
>  
>  hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
> +port_off = nat_info->nat_flags & NAT_RANGE_RANDOM ? random_uint32() : 
> hash;
>  min_addr = nat_info->min_addr;
>  max_addr = nat_info->max_addr;
>  
>  find_addr(fwd_key, _addr, _addr, , hash,
>(fwd_key->dl_type == htons(ETH_TYPE_IP)), nat_info);
>  
> -set_sport_range(nat_info, fwd_key, hash, _sport,
> +set_sport_range(nat_info, fwd_key, port_off, _sport,
>  _sport, _sport);
> -set_dport_range(nat_info, fwd_key, hash, _dport,
> +set_dport_range(nat_info, fwd_key, port_off, _dport,
>  _dport, _dport);
>  
>  if (pat_proto) {
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index 0a888be45..9b0c6aa88 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -77,12 +77,17 @@ enum nat_action_e {
>  NAT_ACTION_DST_PORT = 1 << 3,
>  };
>  
> +enum nat_flags_e {
> +NAT_RANGE_RANDOM = 1 << 0,
> +};
> +
>  struct nat_action_info_t {
>  union ct_addr min_addr;
>  union ct_addr max_addr;
>

Re: [ovs-dev] [PATCH v3] netlink-conntrack: Optimize flushing ct zone.

2024-02-19 Thread Aaron Conole

Felix Huettner  writes:

>> > diff --git a/lib/netlink-conntrack.c b/lib/netlink-conntrack.c
>> > index 492bfcffb..1b050894d 100644
>> > --- a/lib/netlink-conntrack.c
>> > +++ b/lib/netlink-conntrack.c
>> > @@ -25,6 +25,7 @@
>> >  #include 
>> >  #include 
>> >  #include 
>> > +#include 
>> >  
>> >  #include "byte-order.h"
>> >  #include "compiler.h"
>> > @@ -141,6 +142,9 @@ nl_ct_dump_start(struct nl_ct_dump_state **statep, 
>> > const uint16_t *zone,
>> >  
>> >  nl_msg_put_nfgenmsg(>buf, 0, AF_UNSPEC, NFNL_SUBSYS_CTNETLINK,
>> >  IPCTNL_MSG_CT_GET, NLM_F_REQUEST);
>> > +if (zone) {
>> > +nl_msg_put_be16(>buf, CTA_ZONE, htons(*zone));
>> > +}
>> >  nl_dump_start(>dump, NETLINK_NETFILTER, >buf);
>> >  ofpbuf_clear(>buf);
>> >  
>> > @@ -283,23 +287,72 @@ nl_ct_flush_zone(uint16_t flush_zone)
>> >  return err;
>> >  }
>> >  #else
>> > +
>> > +static bool
>> > +netlink_flush_supports_zone(void)
>> > +{
>> > +static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>> > +static bool supported = false;
>> > +
>> > +if (ovsthread_once_start()) {
>> > +struct utsname utsname;
>> > +int major, minor;
>> > +
>> > +if (uname() == -1) {
>> > +VLOG_WARN("uname failed (%s)", ovs_strerror(errno));
>> > +} else if (!ovs_scan(utsname.release, "%d.%d", , )) {
>> > +VLOG_WARN("uname reported bad OS release (%s)", 
>> > utsname.release);
>> 
>> Do these WARN calls need to be severe like this?  There isn't much for
>> the user to do about this, and it won't impact the functioning of the
>> system in such a major way.  Maybe they can be INFO instead?  Otherwise,
>> we may need to change most of the selftests to ignore the "uname failed"
>> warnings.
>> 
>> More of a nit, because it shouldn't generally fail on any systems.
>
> That was also my thought. I actually copied most of the logic from
> `getqdisc_is_safe` in `netdev-linux.c`.
>
> I am completely fine with changing it to something else if that makes
> everyones lives easier, since hitting it is so unrealistic.

I think it makes sense to just have a single function that the tc and
conntrack layers can call for this functionality, while we can just make
the logs at INFO level - then no issues with strange test environments.
And it doesn't prevent anything from running, or result in 'degraded'
performance - just prevents an optimization.

WDYT?

> Let me know what you would prefer.
>
> Thanks
> Felix
>
>> 
>> > +} else if (major < 6 || (major == 6 && minor < 8)) {
>> > +VLOG_INFO("disabling conntrack flush by zone in Linux kernel 
>> > %s",
>> > +  utsname.release);
>> > +} else {
>> > +supported = true;
>> > +}
>> > +ovsthread_once_done();
>> > +}
>> > +return supported;
>> > +}
>> > +
>> >  int
>> >  nl_ct_flush_zone(uint16_t flush_zone)
>> >  {
>> > -/* Apparently, there's no netlink interface to flush a specific zone.
>> > +/* In older kernels, there was no netlink interface to flush a 
>> > specific
>> > + * conntrack zone.
>> >   * This code dumps every connection, checks the zone and eventually
>> >   * delete the entry.
>> > + * In newer kernels there is the option to specifiy a zone for 
>> > filtering
>> > + * during dumps. Older kernels ignore this option. We set it here in 
>> > the
>> > + * hope we only get relevant entries back, but fall back to filtering 
>> > here
>> > + * to keep compatibility.
>> >   *
>> > - * This is race-prone, but it is better than using shell scripts. */
>> > + * This is race-prone, but it is better than using shell scripts.
>> > + *
>> > + * Additionally newer kernels also support flushing a zone without 
>> > listing
>> > + * it first. */
>> >  
>> >  struct nl_dump dump;
>> >  struct ofpbuf buf, reply, delete;
>> > +int err;
>> > +
>> > +if (netlink_flush_supports_zone()) {
>> > +ofpbuf_init(, NL_DUMP_BUFSIZE);
>> > +
>> > +nl_msg_put_nfgenmsg(, 0, AF_UNSPEC, NFNL_SUBSYS_CTNETLINK,
>> > +IPCTNL_MSG_CT_DELETE, NLM_F_REQUEST);
>> > +nl_msg_put_be16(, CTA_ZONE, htons(flush_zone));
>> > +
>> > +err = nl_transact(NETLINK_NETFILTER, , NULL);
>> > +ofpbuf_uninit();
>> > +
>> > +return err;
>> > +}
>> >  
>> >  ofpbuf_init(, NL_DUMP_BUFSIZE);
>> >  ofpbuf_init(, NL_DUMP_BUFSIZE);
>> >  
>> >  nl_msg_put_nfgenmsg(, 0, AF_UNSPEC, NFNL_SUBSYS_CTNETLINK,
>> >  IPCTNL_MSG_CT_GET, NLM_F_REQUEST);
>> > +nl_msg_put_be16(, CTA_ZONE, htons(flush_zone));
>> >  nl_dump_start(, NETLINK_NETFILTER, );
>> >  ofpbuf_clear();
>> >  

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC 4/7] selftests: openvswitch: delete previously allocated netns

2024-02-16 Thread Aaron Conole

Paolo Abeni  writes:

> On Fri, 2024-02-16 at 10:28 -0500, Aaron Conole wrote:
>> Many openvswitch test cases reused netns and interface names.  This works
>> fine as long as the test case cleans up gracefully.  However, if there is
>> some kind of ungraceful termination (such as an external signal) the netns
>> or interfaces can be left lingering.  
>
> It looks the openvswitch.sh test script is already trying quite hard to
> delete the allocated resources on ungraceful termination via "trap...".
>
> That is usually enough for other self-tests, could you please detail
> when it fails here?

I thought it should work - but at least what I observed is that when the
vng spawned VM was running the tests, it would TERM portions of the
subshell, but not the running openvswitch.sh script.  That left these
namespaces and interfaces lingering.

>> This happens when the selftest
>> timeout gets exceeded, while running under very slow debugging conditions.
>
> 'timeout' should send SIG_TERM, and the script already handle that
> gracefully?

At least, I didn't observe that to be the case when it got terminated.
I'll remove the timeout setting and try to reproduce it.

>> The solution here is to cleanup the netns on executing the next test.
>
> I suggest avoiding this, it could end up killing innocent alias netns.
>
> You could consider using the 'setup_ns' helper from the
> tools/testing/selftests/net/lib.sh library to always generate unique
> netns names.

Okay - I will look into that.

>> Signed-off-by: Aaron Conole 
>> ---
>>  tools/testing/selftests/net/openvswitch/openvswitch.sh | 4 
>>  1 file changed, 4 insertions(+)
>> 
>> diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
>> b/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> index 678a72ad47c1..8dc315585710 100755
>> --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> @@ -115,6 +115,10 @@ ovs_netns_spawn_daemon() {
>>  
>>  ovs_add_netns_and_veths () {
>>  info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
>> +ntns_e=`ip netns list | grep $3`
>> +[ "$ntns_e" != "" ] && ip netns del "$3"
>> +if4_e=`ip link show $4 2>/dev/null`
>
> Minor unrelated note: $() is preferable to `` for sub-shells, as it's
> more friendly to nesting, string expansing, quotes, etc.

Okay - I'll prefer it in future.  I didn't know how much I should be
worrying about non-POSIX shells (I seem to remember that `` is accepted
in more shells).

> Cheers,
>
> Paolo

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC 7/7] selftests: openvswitch: add config and timeout settings

2024-02-16 Thread Aaron Conole

In order for the ovs selftests to run, we need to introduce a sample
configuration.  This will get run under the kselftest runner for net, but
given that environment can be limited process wise, the test can take
longer than on bare metal or high availability VMs.  So, also introduce
a settings file which includes a timeout value that should be sufficent
for such environments.

Signed-off-by: Aaron Conole 
---
 .../testing/selftests/net/openvswitch/config  | 50 +++
 .../selftests/net/openvswitch/settings|  1 +
 2 files changed, 51 insertions(+)
 create mode 100644 tools/testing/selftests/net/openvswitch/config
 create mode 100644 tools/testing/selftests/net/openvswitch/settings

diff --git a/tools/testing/selftests/net/openvswitch/config 
b/tools/testing/selftests/net/openvswitch/config
new file mode 100644
index ..24cff330f3c0
--- /dev/null
+++ b/tools/testing/selftests/net/openvswitch/config
@@ -0,0 +1,50 @@
+CONFIG_OPENVSWITCH=m
+CONFIG_OPENVSWITCH_GRE=m
+CONFIG_OPENVSWITCH_VXLAN=m
+CONFIG_OPENVSWITCH_GENEVE=m
+CONFIG_NF_CONNTRACK_OVS=y
+CONFIG_NF_NAT_OVS=y
+CONFIG_NF_NAT=m
+CONFIG_NF_CONNTRACK=m
+CONFIG_GENEVE=y
+CONFIG_VXLAN=y
+CONFIG_IPV6=y
+CONFIG_TUN=y
+CONFIG_TAP=y
+CONFIG_USER_NS=y
+CONFIG_NET_NS=y
+CONFIG_SYSFS=y
+CONFIG_PROC_SYSCTL=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_VETH=y
+CONFIG_DUMMY=y
+CONFIG_NETFILTER=y
+CONFIG_NETFILTER_ADVANCED=y
+CONFIG_IP6_NF_IPTABLES=m
+CONFIG_IP_NF_IPTABLES=m
+CONFIG_IP6_NF_NAT=m
+CONFIG_IP_NF_NAT=m
+CONFIG_IPV6_GRE=m
+CONFIG_TRACEPOINTS=y
+CONFIG_NET_DROP_MONITOR=m
+CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_IPV6=y
+CONFIG_NF_TABLES_IPV4=y
+CONFIG_KALLSYMS=y
+CONFIG_BRIDGE=m
+CONFIG_VLAN_8021Q=m
+CONFIG_BRIDGE_VLAN_FILTERING=y
+CONFIG_NET_L3_MASTER_DEV=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_NET_VRF=m
+CONFIG_BPF_SYSCALL=y
+CONFIG_CGROUP_BPF=y
+CONFIG_NET_ACT_CT=m
+CONFIG_NET_ACT_MIRRED=m
+CONFIG_NET_ACT_MPLS=m
+CONFIG_NET_ACT_VLAN=m
+CONFIG_NET_CLS_FLOWER=m
+CONFIG_NET_CLS_MATCHALL=m
+CONFIG_NET_SCH_INGRESS=m
+CONFIG_NET_ACT_GACT=m
+CONFIG_NAMESPACES=y
diff --git a/tools/testing/selftests/net/openvswitch/settings 
b/tools/testing/selftests/net/openvswitch/settings
new file mode 100644
index ..694d70710ff0
--- /dev/null
+++ b/tools/testing/selftests/net/openvswitch/settings
@@ -0,0 +1 @@
+timeout=300
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC 4/7] selftests: openvswitch: delete previously allocated netns

2024-02-16 Thread Aaron Conole

Many openvswitch test cases reused netns and interface names.  This works
fine as long as the test case cleans up gracefully.  However, if there is
some kind of ungraceful termination (such as an external signal) the netns
or interfaces can be left lingering.  This happens when the selftest
timeout gets exceeded, while running under very slow debugging conditions.

The solution here is to cleanup the netns on executing the next test.

Signed-off-by: Aaron Conole 
---
 tools/testing/selftests/net/openvswitch/openvswitch.sh | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 678a72ad47c1..8dc315585710 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -115,6 +115,10 @@ ovs_netns_spawn_daemon() {
 
 ovs_add_netns_and_veths () {
info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
+   ntns_e=`ip netns list | grep $3`
+   [ "$ntns_e" != "" ] && ip netns del "$3"
+   if4_e=`ip link show $4 2>/dev/null`
+   [ "$if4_e" != "" ] && ip link del "$4"
ovs_sbx "$1" ip netns add "$3" || return 1
on_exit "ovs_sbx $1 ip netns del $3"
ovs_sbx "$1" ip link add "$4" type veth peer name "$5" || return 1
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC 6/7] selftests: openvswitch: insert module when running the tests

2024-02-16 Thread Aaron Conole

The openvswitch tests do not attempt to insert the openvswitch module
automatically.  Now the test will auto load the module and try to
unload it at the end.  The test harness includes the option to not
load the module, which is helpful when developing changes and loading
the module from a different location.

Signed-off-by: Aaron Conole 
---
 .../testing/selftests/net/openvswitch/openvswitch.sh  | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index a2c106104fb8..e7c9b4fc5591 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -672,12 +672,14 @@ run_test() {
 exitcode=0
 desc=0
 all_skipped=true
+LOAD_MOD=yes
 
 while getopts :pvt o
 do
case $o in
p) PAUSE_ON_FAIL=yes;;
v) VERBOSE=1;;
+   n) LOAD_MOD=no;;
t) if which tcpdump > /dev/null 2>&1; then
TRACING=1
   else
@@ -697,6 +699,10 @@ for arg do
command -v > /dev/null "test_${arg}" || { echo "=== Test ${arg} not 
found"; usage; }
 done
 
+if [ "$LOAD_MOD" == "yes" ]; then
+   modprobe openvswitch
+fi
+
 name=""
 desc=""
 for t in ${tests}; do
@@ -716,4 +722,9 @@ for t in ${tests}; do
desc=""
 done
 
+
+if [ "$LOAD_MOD" == "yes" ]; then
+   modprobe -r openvswitch
+fi
+
 exit ${exitcode}
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC 3/7] selftests: openvswitch: use non-graceful kills when needed

2024-02-16 Thread Aaron Conole

Normally a spawned process under OVS is given a SIGTERM when the test
ends as part of cleanup.  However, in case the process is still lingering
for some reason, we also send a SIGKILL to force it down faster.

Signed-off-by: Aaron Conole 
---
 tools/testing/selftests/net/openvswitch/openvswitch.sh | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index a5dbde482ba4..678a72ad47c1 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -91,7 +91,8 @@ ovs_add_if () {
python3 $ovs_base/ovs-dpctl.py add-if \
-u "$2" "$3" >$ovs_dir/$3.out 2>$ovs_dir/$3.err &
pid=$!
-   on_exit "ovs_sbx $1 kill -TERM $pid 2>/dev/null"
+   on_exit "ovs_sbx $1 kill --timeout 1000 TERM \
+--timeout 1000 KILL $pid 2>/dev/null"
fi
 }
 
@@ -108,7 +109,8 @@ ovs_netns_spawn_daemon() {
info "spawning cmd: $*"
ip netns exec $netns $*  >> $ovs_dir/stdout  2>> $ovs_dir/stderr &
pid=$!
-   ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null"
+   ovs_sbx "$sbx" on_exit "kill --timeout 1000 TERM \
+--timeout 1000 KILL $pid 2>/dev/null"
 }
 
 ovs_add_netns_and_veths () {
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC 1/7] selftests: openvswitch: add test case error directories to clean list

2024-02-16 Thread Aaron Conole

Normally, the openvswitch selftests don't keep error files around, but
if debugging, there is an option to keep these files.  The 'clean'
target should be informed that they exist to ensure they are deleted
properly.

Signed-off-by: Aaron Conole 
---
 tools/testing/selftests/net/openvswitch/Makefile | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/openvswitch/Makefile 
b/tools/testing/selftests/net/openvswitch/Makefile
index 2f1508abc826..e6c4a89cc14a 100644
--- a/tools/testing/selftests/net/openvswitch/Makefile
+++ b/tools/testing/selftests/net/openvswitch/Makefile
@@ -8,6 +8,16 @@ TEST_PROGS := openvswitch.sh
 
 TEST_FILES := ovs-dpctl.py
 
-EXTRA_CLEAN := test_netlink_checks
+OVS_TESTS := \
+   test_arp_ping \
+   test_ct_connect_v4 \
+   test_connect_v4 \
+   test_nat_connect_v4 \
+   test_nat_related_v4 \
+   test_netlink_checks \
+   test_upcall_interfaces \
+   test_drop_reason
+
+EXTRA_CLEAN := $(OVS_TESTS)
 
 include ../../lib.mk
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC 5/7] selftests: openvswitch: make arping test a bit 'slower'

2024-02-16 Thread Aaron Conole

The arping test transmits a single packet and immediately tries to pull
the log for upcall details.  This works fine in practice on most systems
but can fail under a slower VM where it can take a while for the log
data to be written.  By adding addtional transmits we give the system
time to write, and also increase the opportunity to not miss processing
the upcall queue.

Signed-off-by: Aaron Conole 
---
 tools/testing/selftests/net/openvswitch/openvswitch.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 8dc315585710..a2c106104fb8 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -598,7 +598,7 @@ test_upcall_interfaces() {
 
sleep 1
info "sending arping"
-   ip netns exec upc arping -I l0 172.31.110.20 -c 1 \
+   ip netns exec upc arping -I l0 172.31.110.20 -c 3 \
>$ovs_dir/arping.stdout 2>$ovs_dir/arping.stderr
 
grep -E "MISS upcall\[0/yes\]: 
.*arp\(sip=172.31.110.1,tip=172.31.110.20,op=1,sha=" $ovs_dir/left0.out 
>/dev/null 2>&1 || return 1
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC 2/7] selftests: openvswitch: be more verbose with selftest debugging

2024-02-16 Thread Aaron Conole

The openvswitch selftest is difficult to debug for anyone that isn't
directly familiar with the openvswitch module and the specifics of the
test cases.  Many times when something fails, the debug log will be
sparsely populated and it takes some time to understand where a failure
occured.

Increase the amount of details logged to the debug log by trapping all
'info' logs, and all 'ovs_sbx' commands.

Signed-off-by: Aaron Conole 
---
 tools/testing/selftests/net/openvswitch/openvswitch.sh | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 87b80bee6df4..a5dbde482ba4 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -23,7 +23,9 @@ tests="
drop_reason drop: test drop reasons are 
emitted"
 
 info() {
-[ $VERBOSE = 0 ] || echo $*
+   [ "${ovs_dir}" != "" ] &&
+   echo "`date +"[%m-%d %H:%M:%S]"` $*" >> ${ovs_dir}/debug.log
+   [ $VERBOSE = 0 ] || echo $*
 }
 
 ovs_base=`pwd`
@@ -65,7 +67,8 @@ ovs_setenv() {
 
 ovs_sbx() {
if test "X$2" != X; then
-   (ovs_setenv $1; shift; "$@" >> ${ovs_dir}/debug.log)
+   (ovs_setenv $1; shift;
+info "run cmd: $@"; "$@" >> ${ovs_dir}/debug.log)
else
ovs_setenv $1
fi
@@ -139,9 +142,10 @@ ovs_add_flow () {
info "Adding flow to DP: sbx:$1 br:$2 flow:$3 act:$4"
ovs_sbx "$1" python3 $ovs_base/ovs-dpctl.py add-flow "$2" "$3" "$4"
if [ $? -ne 0 ]; then
-   echo "Flow [ $3 : $4 ] failed" >> ${ovs_dir}/debug.log
+   info "Flow [ $3 : $4 ] failed"
return 1
fi
+   info "Flow added."
return 0
 }
 
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC 0/7] selftests: openvswitch: cleanups for running as selftests

2024-02-16 Thread Aaron Conole

The series is a host of cleanups to the openvswitch selftest suite
which should be ready to run under the netdev selftest runners using
vng.  For now, the testing has been done with RW directories, but
additional testing will be done to try and keep it all as RO to be
more friendly.

There is one more test case I plan which will print the debug log
details when a test case fails so that a developer can get a clear
picture why the test case failed.  That will be done for the proper
submission as another patch in this series.

Additionally, the timeout setting was just an arbitrary number that
I picked, but needs more testing to tune it properly (since 5
minutes may be a bit too long).

Tested on fedora 38 using virtme-ng with the following commandline:

../virtme-ng/vng -v --run . --user root --cpus 4 \
--rwdir=/home/aconole/git/linux/tools/testing/selftests/net/openvswitch/ \
-- \
make -C tools/testing/selftests/net/openvswitch \
 TARGETS=openvswitch TEST_PROGS=openvswitch.sh run_tests

Aaron Conole (7):
  selftests: openvswitch: add test case error directories to clean list
  selftests: openvswitch: be more verbose with selftest debugging
  selftests: openvswitch: use non-graceful kills when needed
  selftests: openvswitch: delete previously allocated netns
  selftests: openvswitch: make arping test a bit 'slower'
  selftests: openvswitch: insert module when running the tests
  selftests: openvswitch: add config and timeout settings

 .../selftests/net/openvswitch/Makefile| 12 -
 .../testing/selftests/net/openvswitch/config  | 50 +++
 .../selftests/net/openvswitch/openvswitch.sh  | 33 +---
 .../selftests/net/openvswitch/settings|  1 +
 4 files changed, 89 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/net/openvswitch/config
 create mode 100644 tools/testing/selftests/net/openvswitch/settings

-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3] netlink-conntrack: Optimize flushing ct zone.

2024-02-15 Thread Aaron Conole

Felix Huettner via dev  writes:

> Previously the kernel did not provide a netlink interface to flush/list
> only conntrack entries matching a specific zone. With [1] and [2] it is now
> possible to flush and list conntrack entries filtered by zone. Older
> kernels not yet supporting this feature will ignore the filter.
> For the list request that means just returning all entries (which we can
> then filter in userspace as before).
> For the flush request that means deleting all conntrack entries.
>
> These significantly improves the performance of flushing conntrack zones
> when the conntrack table is large. Since flushing a conntrack zone is
> normally triggered via an openflow command it blocks the main ovs thread
> and thereby also blocks new flows from being applied. Using this new
> feature we can reduce the flushing time for zones by around 93%.
>
> In combination with OVN the creation of a Logical_Router (which causes
> the flushing of a ct zone) could block other operations, e.g. the
> failover of Logical_Routers (as they cause new flows to be created).
> This is visible from a user perspective as a ovn-controller that is idle
> (as it waits for vswitchd) and vswitchd reporting:
> "blocked 1000 ms waiting for main to quiesce" (potentially with ever
> increasing times).
>
> The following performance tests where run in a qemu vm with 500.000
> conntrack entries distributed evenly over 500 ct zones using `ovstest
> test-netlink-conntrack flush zone=`.
>
> With this patch and kernel v6.8-rc4:
>
> -
>Min (s) Median (s)  90%ile (s) 
>  99%ile (s)  Max (s) Mean (s)Total (s)   Count
> -
> flush zone with 1000 entries   0.260   0.319   0.335  
>  0.348   0.362   0.320   80.02   250
> flush zone with no entry   0.228   0.298   0.325  
>  0.340   0.348   0.296   73.93   250
> -
>
> With this patch and kernel v6.7.1:
>
> -
>Min (s) Median (s)  90%ile (s) 
>  99%ile (s)  Max (s) Mean (s)Total (s)   Count
> -
> flush zone with 1000 entries   3.946   4.237   4.367  
>  4.495   4.543   4.236   1058.992250
> flush zone with no entry   3.462   4.460   4.662  
>  4.931   5.390   4.430   1107.479250
> -
>
> Without this patch and kernel v6.8-rc4:
>
> -
>Min (s) Median (s)  90%ile (s) 
>  99%ile (s)  Max (s) Mean (s)Total (s)   Count
> -
> flush zone with 1000 entries   3.497   4.349   4.522  
>  4.773   5.054   4.331   1082.802250
> flush zone with no entry   3.212   4.010   4.572  
>  6.003   6.396   4.071   1017.838250
> -
>
> [1]: 
> https://github.com/torvalds/linux/commit/eff3c558bb7e61c41b53e4c8130e514a5a4df9ba
> [2]: 
> https://github.com/torvalds/linux/commit/fa173a1b4e3fd1ab5451cbc57de6fc624c824b0a
>
> Co-Authored-By: Luca Czesla 
> Signed-off-by: Luca Czesla 
> Co-Authored-By: Max Lamprecht 
> Signed-off-by: Max Lamprecht 
> Signed-off-by: Felix Huettner 
> ---

Thanks so much - I haven't gotten a chance to test on a new kernel, but
it's on my agenda.

> v2->v3:
> - update description to include upstream fix (Thanks to Ilya for finding
>   that issue)
> v1->v2:
> - fixed wrong signed-off-by
>
>  lib/netlink-conntrack.c | 57

Re: [ovs-dev] [PATCH v2 2/2] conntrack: Handle persistent selection for IP addresses.

2024-02-15 Thread Aaron Conole

Paolo Valerio  writes:

> The patch, when 'persistent' flag is specified, makes the IP selection
> in a range persistent across reboots.
>
> Signed-off-by: Paolo Valerio 
> ---
>  NEWS  |  3 ++-
>  lib/conntrack.c   | 27 +--
>  lib/conntrack.h   |  1 +
>  lib/dpif-netdev.c |  2 ++
>  4 files changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 93046b963..0c86bba81 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -2,7 +2,8 @@ Post-v3.3.0
>  
> - Userspace datapath:
>   * Conntrack now supports 'random' flag for selecting ports in a range
> -   while natting.
> +   while natting and 'persistent' flag for selection of the IP address
> +   from a range.
>  
>  
>  v3.3.0 - xx xxx 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index e09ecdf33..7868a67f7 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2202,17 +2202,21 @@ nat_range_hash(const struct conn_key *key, uint32_t 
> basis,
>  {
>  uint32_t hash = basis;
>  
> +if (!basis) {
> +hash = ct_addr_hash_add(hash, >src.addr);
> +} else {
> +hash = ct_endpoint_hash_add(hash, >src);
> +hash = ct_endpoint_hash_add(hash, >dst);
> +}
> +
>  hash = ct_addr_hash_add(hash, _info->min_addr);
>  hash = ct_addr_hash_add(hash, _info->max_addr);
>  hash = hash_add(hash,
>  ((uint32_t) nat_info->max_port << 16)
>  | nat_info->min_port);
> -hash = ct_endpoint_hash_add(hash, >src);
> -hash = ct_endpoint_hash_add(hash, >dst);
>  hash = hash_add(hash, (OVS_FORCE uint32_t) key->dl_type);
>  hash = hash_add(hash, key->nw_proto);
>  hash = hash_add(hash, key->zone);
> -
>  /* The purpose of the second parameter is to distinguish hashes of data 
> of
>   * different length; our data always has the same length so there is no
>   * value in counting. */
> @@ -2386,12 +2390,23 @@ nat_get_unique_tuple(struct conntrack *ct, struct 
> conn *conn,
>  bool pat_proto = fwd_key->nw_proto == IPPROTO_TCP ||
>   fwd_key->nw_proto == IPPROTO_UDP ||
>   fwd_key->nw_proto == IPPROTO_SCTP;
> +uint32_t hash, port_off, basis = ct->hash_basis;
>  uint16_t min_dport, max_dport, curr_dport;
>  uint16_t min_sport, max_sport, curr_sport;
> -uint32_t hash, port_off;
>  
> -hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
> -port_off = nat_info->nat_flags & NAT_RANGE_RANDOM ? random_uint32() : 
> hash;
> +if (nat_info->nat_flags & NAT_PERSISTENT) {
> +basis = 0;
> +}
> +
> +hash = nat_range_hash(fwd_key, basis, nat_info);
> +
> +if (nat_info->nat_flags & NAT_RANGE_RANDOM) {
> +port_off = random_uint32();
> +} else {
> +port_off =
> +basis ? hash : nat_range_hash(fwd_key, ct->hash_basis, nat_info);
> +}
> +

I'm probably misreading this, but the multiple 'if's is confusing me.

Maybe it would be better to write something like:

bool persist = !!(nat_info->nat_flags & NAT_PERSISTENT)

if (nat_info->nat_flags & NAT_RANGE_RANDOM) {
   port_off = random_uint32();
} else {
   port_off = !persist ?
   nat_range_hash(fwd_key, ct->hash_basis, nat_info) :
   nat_range_hash(fwd_key, 0, nat_info);
}

Especially because in the above code, ct->hash_basis == '0' would look
the same as persistent (which isn't obvious from the code).

At least, the multiple nested ifs make it difficult to follow what is
going on with basis call and nat_range_hash

Just read Simon's comments, and it's similar kind of comment I guess
:)

>  min_addr = nat_info->min_addr;
>  max_addr = nat_info->max_addr;
>  
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index 9b0c6aa88..ee7da099e 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -79,6 +79,7 @@ enum nat_action_e {
>  
>  enum nat_flags_e {
>  NAT_RANGE_RANDOM = 1 << 0,
> +NAT_PERSISTENT = 1 << 1,
>  };
>  
>  struct nat_action_info_t {
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index c3334c667..fbf7ccabd 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -9413,6 +9413,8 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
> *packets_,
>  nat_action_info.nat_flags |= NAT_RANGE_RANDOM;
>  break;
>  case OVS_NAT_ATTR_PERSISTENT:
> +nat_action_info.nat_flags |= NAT_PERSISTENT;
> +break;
>  case OVS_NAT_ATTR_PROTO_HASH:
>  break;
>  case OVS_NAT_ATTR_UNSPEC:

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] conntrack: Fix flush not flushing all elements.

2024-02-15 Thread Aaron Conole

Xavier Simonart  writes:

> When a ct element was cleaned, the cmap could be shrinked, potentially
> causing some elements to be skipped in the flush iteration.
>
> Signed-off-by: Xavier Simonart 
> ---

This only applies to netdev datapaths, so might be good to mention it in
the commit message.  The netlink datapath shouldn't have this kind of
issue.

Also I think it needs:

Fixes: 967bb5c5cd90 ("conntrack: Add rcu support.")

>  lib/conntrack.c | 14 -
>  lib/conntrack.h |  1 +
>  tests/system-traffic.at | 45 +
>  3 files changed, 50 insertions(+), 10 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 013709bd6..6d02eaba8 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2637,25 +2637,19 @@ conntrack_dump_start(struct conntrack *ct, struct 
> conntrack_dump *dump,
>  
>  dump->ct = ct;
>  *ptot_bkts = 1; /* Need to clean up the callers. */
> +dump->cursor = cmap_cursor_start(>conns);
>  return 0;
>  }
>  
>  int
>  conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry)
>  {
> -struct conntrack *ct = dump->ct;
>  long long now = time_msec();
>  
> -for (;;) {
> -struct cmap_node *cm_node = cmap_next_position(>conns,
> -   >cm_pos);
> -if (!cm_node) {
> -break;
> -}
> -struct conn_key_node *keyn;
> -struct conn *conn;
> +struct conn_key_node *keyn;
> +struct conn *conn;
>  
> -INIT_CONTAINER(keyn, cm_node, cm_node);
> +CMAP_CURSOR_FOR_EACH_CONTINUE (keyn, cm_node, >cursor) {
>  if (keyn->dir != CT_DIR_FWD) {
>  continue;
>  }
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index 0a888be45..2eb0870c2 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -103,6 +103,7 @@ struct conntrack_dump {
>  union {
>  struct cmap_position cm_pos;
>  struct hmap_position hmap_pos;
> +struct cmap_cursor cursor;
>  };
>  bool filter_zone;
>  uint16_t zone;
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index e68fe7e18..99979844e 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -8329,6 +8329,51 @@ AT_CHECK([ovs-pcap client.pcap | grep 
> 20102000], [0], [dnl
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([conntrack - Flush many conntrack entries by port])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()

I was thinking at first that we should SKIP on netlink datapath, but
the more I think about it, the more I think it might be a useful test
there.  Thanks!

> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +AT_DATA([flows.txt], [dnl
> +priority=100,in_port=1,udp,action=ct(zone=1,commit),2
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +# 20 packets from port 1 and 1 packet from port 2.
> +for i in $(seq 1 20); do
> +d=$(printf "%02x" $i)
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 
> packet=5054000a505400090800451c0011a4cd0a0101010a010102000100${d}0008
>  actions=resubmit(,0)"])
> +done
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 
> packet=5054000a505400090800451c0011a4cd0a0101010a010102000200010008
>  actions=resubmit(,0)"])
> +
> +: > conntrack
> +
> +for i in $(seq 1 20); do
> +d=$(printf "%d" $i)
> +echo 
> "udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=${d}),reply=(src=10.1.1.2,dst=10.1.1.1,sport=${d},dport=1),zone=1"
>  >> conntrack
> +done
> +echo 
> "udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=2,dport=1),reply=(src=10.1.1.2,dst=10.1.1.1,sport=1,dport=2),zone=1"
>  >> conntrack
> +
> +sort conntrack > expout
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep -F "src=10.1.1.1," | sort 
> ], [0], [expout])
> +
> +# Check that flushing conntrack by port 1 flush all ct for port 1 but keeps 
> ct for port 2.
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_src=1'])
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep -F "src=10.1.1.1," | sort 
> ], [0], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=2,dport=1),reply=(src=10.1.1.2,dst=10.1.1.1,sport=1,dport=2),zone=1
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP(["dnl
> +/could not create datapath/d
> +/(Cannot allocate memory) on packet/d"])
> +AT_CLEANUP
> +
>  AT_BANNER([IGMP])
>  
>  AT_SETUP([IGMP - flood under normal action])

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-02-12 Thread Aaron Conole

Aaron Conole  writes:

> Eelco Chaudron  writes:
>
>> On 2 Feb 2024, at 11:31, Adrian Moreno wrote:
>>
>>> On 2/1/24 10:02, Eelco Chaudron wrote:
>>>>
>>>>
>>>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>>>>
>>>>> Eelco Chaudron  writes:
>>>>>
>>>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>>>
>>>>>>> From: Kevin Sprague 
>>>>>>>
>>>>>>> During normal operations, it is useful to understand when a particular 
>>>>>>> flow
>>>>>>> gets removed from the system. This can be useful when debugging 
>>>>>>> performance
>>>>>>> issues tied to ofproto flow changes, trying to determine deployed 
>>>>>>> traffic
>>>>>>> patterns, or while debugging dynamic systems where ports come and go.
>>>>>>>
>>>>>>> Prior to this change, there was a lack of visibility around flow 
>>>>>>> expiration.
>>>>>>> The existing debugging infrastructure could tell us when a flow was 
>>>>>>> added to
>>>>>>> the datapath, but not when it was removed or why.
>>>>>>>
>>>>>>> This change introduces a USDT probe at the point where the revalidator
>>>>>>> determines that the flow should be removed.  Additionally, we track the
>>>>>>> reason for the flow eviction and provide that information as well.  With
>>>>>>> this change, we can track the complete flow lifecycle for the netlink
>>>>>>> datapath by hooking the upcall tracepoint in kernel, the flow put USDT, 
>>>>>>> and
>>>>>>> the revaldiator USDT, letting us watch as flows are added and removed 
>>>>>>> from
>>>>>>> the kernel datapath.
>>>>>>>
>>>>>>> This change only enables this information via USDT probe, so it won't be
>>>>>>> possible to access this information any other way (see:
>>>>>>> Documentation/topics/usdt-probes.rst).
>>>>>>>
>>>>>>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>>>> which serves as a demonstration of how the new USDT probe might be used
>>>>>>> going forward.
>>>>>>>
>>>>>>> Signed-off-by: Kevin Sprague 
>>>>>>> Co-authored-by: Aaron Conole 
>>>>>>> Signed-off-by: Aaron Conole 
>>>>>>
>>>>>> Thanks for following this up Aaron! See comments on this patch
>> below. I have no additional comments on patch 2.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Eelco
>>>>>>
>>>>>>
>>>>>>> ---
>>>>>>>   Documentation/topics/usdt-probes.rst |   1 +
>>>>>>>   ofproto/ofproto-dpif-upcall.c|  42 +-
>>>>>>>   utilities/automake.mk|   3 +
>>>>>>>   utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
>>>>>>>   4 files changed, 693 insertions(+), 6 deletions(-)
>>>>>>>   create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>>>>>>
>>>>>>> diff --git a/Documentation/topics/usdt-probes.rst 
>>>>>>> b/Documentation/topics/usdt-probes.rst
>>>>>>> index e527f43bab..a8da9bb1f7 100644
>>>>>>> --- a/Documentation/topics/usdt-probes.rst
>>>>>>> +++ b/Documentation/topics/usdt-probes.rst
>>>>>>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>>>>>>   - dpif_recv:recv_upcall
>>>>>>>   - main:poll_block
>>>>>>>   - main:run_start
>>>>>>> +- revalidate:flow_result
>>>>>>>   - revalidate_ukey\_\_:entry
>>>>>>>   - revalidate_ukey\_\_:exit
>>>>>>>   - udpif_revalidator:start_dump
>>>>>>
>>>>>> You are missing the specific flow_result result section. This is from 
>>>>>> the previous patch:
>>>>>
>>>>> D'oh!  Thanks for catching it.  I'll re-add it.
>>>>>
>>>>>> @@

Re: [ovs-dev] [PATCH branch-3.1 0/2] Release patches for v3.1.4.

2024-02-08 Thread Aaron Conole

Ilya Maximets  writes:

> Bug Fix + Security release.
>
> Ilya Maximets (2):
>   Set release date for 3.1.4.
>   Prepare for 3.1.5.
>
>  NEWS | 8 +++-
>  configure.ac | 2 +-
>  debian/changelog | 8 +++-
>  3 files changed, 15 insertions(+), 3 deletions(-)

Series:

Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH branch-3.2 0/2] Release patches for v3.2.2.

2024-02-08 Thread Aaron Conole

Ilya Maximets  writes:

> Bug Fix + Security release.
>
> Ilya Maximets (2):
>   Set release date for 3.2.2.
>   Prepare for 3.2.3.
>
>  NEWS | 8 +++-
>  configure.ac | 2 +-
>  debian/changelog | 8 +++-
>  3 files changed, 15 insertions(+), 3 deletions(-)

Series:

Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH branch-2.17 0/2] Release patches for v2.17.9.

2024-02-08 Thread Aaron Conole

Ilya Maximets  writes:

> Bug Fix + Security release.
>
> Ilya Maximets (2):
>   Set release date for 2.17.9.
>   Prepare for 2.17.10.
>
>  NEWS | 8 +++-
>  configure.ac | 2 +-
>  debian/changelog | 8 +++-
>  3 files changed, 15 insertions(+), 3 deletions(-)

Series:

Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH branch-3.0 0/2] Release patches for v3.0.6.

2024-02-08 Thread Aaron Conole

Ilya Maximets  writes:

> Bug Fix + Security release.
>
> Ilya Maximets (2):
>   Set release date for 3.0.6.
>   Prepare for 3.0.7.
>
>  NEWS | 8 +++-
>  configure.ac | 2 +-
>  debian/changelog | 8 +++-
>  3 files changed, 15 insertions(+), 3 deletions(-)

Series:

Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net v2 0/2] net: openvswitch: limit the recursions from action sets

2024-02-07 Thread Aaron Conole

Open vSwitch module accepts actions as a list from the netlink socket
and then creates a copy which it uses in the action set processing.
During processing of the action list on a packet, the module keeps a
count of the execution depth and exits processing if the action depth
goes too high.

However, during netlink processing the recursion depth isn't checked
anywhere, and the copy trusts that kernel has large enough stack to
accommodate it.  The OVS sample action was the original action which
could perform this kinds of recursion, and it originally checked that
it didn't exceed the sample depth limit.  However, when sample became
optimized to provide the clone() semantics, the recursion limit was
dropped.

This series adds a depth limit during the __ovs_nla_copy_actions() call
that will ensure we don't exceed the max that the OVS userspace could
generate for a clone().

Additionally, this series provides a selftest in 2/2 that can be used to
determine if the OVS module is allowing unbounded access.  It can be
safely omitted where the ovs selftest framework isn't available.

Aaron Conole (2):
  net: openvswitch: limit the number of recursions from action sets
  selftests: openvswitch: Add validation for the recursion test

 net/openvswitch/flow_netlink.c| 49 -
 .../selftests/net/openvswitch/openvswitch.sh  | 13 
 .../selftests/net/openvswitch/ovs-dpctl.py| 71 +++
 3 files changed, 102 insertions(+), 31 deletions(-)

-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net v2 2/2] selftests: openvswitch: Add validation for the recursion test

2024-02-07 Thread Aaron Conole

Add a test case into the netlink checks that will show the number of
nested action recursions won't exceed 16.  Going to 17 on a small
clone call isn't enough to exhaust the stack on (most) systems, so
it should be safe to run even on systems that don't have the fix
applied.

Signed-off-by: Aaron Conole 
---
v1->v2: Used a slightly more portable io redirection semantic for the
clone() flow test case.

 .../selftests/net/openvswitch/openvswitch.sh  | 13 
 .../selftests/net/openvswitch/ovs-dpctl.py| 71 +++
 2 files changed, 69 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index f8499d4c87f3..36e40256ab92 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -502,7 +502,20 @@ test_netlink_checks () {
wc -l) == 2 ] || \
  return 1
 
+   info "Checking clone depth"
ERR_MSG="Flow actions may not be safe on all matching packets"
+   PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
+   ovs_add_flow "test_netlink_checks" nv0 \
+   'in_port(1),eth(),eth_type(0x800),ipv4()' \
+   
'clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(drop)'
 \
+   >/dev/null 2>&1 && return 1
+   POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
+
+   if [ "$PRE_TEST" == "$POST_TEST" ]; then
+   info "failed - clone depth too large"
+   return 1
+   fi
+
PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
ovs_add_flow "test_netlink_checks" nv0 \
'in_port(1),eth(),eth_type(0x0806),arp()' 'drop(0),2' \
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index b97e621face9..5e0e539a323d 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -299,7 +299,7 @@ class ovsactions(nla):
 ("OVS_ACTION_ATTR_PUSH_NSH", "none"),
 ("OVS_ACTION_ATTR_POP_NSH", "flag"),
 ("OVS_ACTION_ATTR_METER", "none"),
-("OVS_ACTION_ATTR_CLONE", "none"),
+("OVS_ACTION_ATTR_CLONE", "recursive"),
 ("OVS_ACTION_ATTR_CHECK_PKT_LEN", "none"),
 ("OVS_ACTION_ATTR_ADD_MPLS", "none"),
 ("OVS_ACTION_ATTR_DEC_TTL", "none"),
@@ -465,29 +465,42 @@ class ovsactions(nla):
 print_str += "pop_mpls"
 else:
 datum = self.get_attr(field[0])
-print_str += datum.dpstr(more)
+if field[0] == "OVS_ACTION_ATTR_CLONE":
+print_str += "clone("
+print_str += datum.dpstr(more)
+print_str += ")"
+else:
+print_str += datum.dpstr(more)
 
 return print_str
 
 def parse(self, actstr):
+totallen = len(actstr)
 while len(actstr) != 0:
 parsed = False
+parencount = 0
 if actstr.startswith("drop"):
 # If no reason is provided, the implicit drop is used (i.e no
 # action). If some reason is given, an explicit action is used.
-actstr, reason = parse_extract_field(
-actstr,
-"drop(",
-"([0-9]+)",
-lambda x: int(x, 0),
-False,
-None,
-)
+reason = None
+if actstr.startswith("drop("):
+parencount += 1
+
+actstr, reason = parse_extract_field(
+actstr,
+"drop(",
+"([0-9]+)",
+lambda x: int(x, 0),
+False,
+None,
+)
+
 if reason is not None:
 self["attrs"].append(["OVS_ACTION_ATTR_DROP", reason])
 parsed = True
 else:
-return
+actstr = actstr[len("drop"): ]
+return (totallen - len(actstr))
 
 elif parse_starts_block(actstr, "^(\d+)", False, True):
 actstr, output = parse_extract_field(
@@ -504,6 +517,7 @@ class ovsactions(nla):
 False,
 0,

[ovs-dev] [PATCH net v2 1/2] net: openvswitch: limit the number of recursions from action sets

2024-02-07 Thread Aaron Conole

The ovs module allows for some actions to recursively contain an action
list for complex scenarios, such as sampling, checking lengths, etc.
When these actions are copied into the internal flow table, they are
evaluated to validate that such actions make sense, and these calls
happen recursively.

The ovs-vswitchd userspace won't emit more than 16 recursion levels
deep.  However, the module has no such limit and will happily accept
limits larger than 16 levels nested.  Prevent this by tracking the
number of recursions happening and manually limiting it to 16 levels
nested.

The initial implementation of the sample action would track this depth
and prevent more than 3 levels of recursion, but this was removed to
support the clone use case, rather than limited at the current userspace
limit.

Fixes: 798c166173ff ("openvswitch: Optimize sample action for the clone use 
cases")
Signed-off-by: Aaron Conole 
---
v1->v2: Switch to tracking the stack depth by using a depth argument rather than
a per-cpu counter.

 net/openvswitch/flow_netlink.c | 49 +++---
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 88965e2068ac..ebc5728aab4e 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -48,6 +48,7 @@ struct ovs_len_tbl {
 
 #define OVS_ATTR_NESTED -1
 #define OVS_ATTR_VARIABLE -2
+#define OVS_COPY_ACTIONS_MAX_DEPTH 16
 
 static bool actions_may_change_flow(const struct nlattr *actions)
 {
@@ -2545,13 +2546,15 @@ static int __ovs_nla_copy_actions(struct net *net, 
const struct nlattr *attr,
  const struct sw_flow_key *key,
  struct sw_flow_actions **sfa,
  __be16 eth_type, __be16 vlan_tci,
- u32 mpls_label_count, bool log);
+ u32 mpls_label_count, bool log,
+ u32 depth);
 
 static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
const struct sw_flow_key *key,
struct sw_flow_actions **sfa,
__be16 eth_type, __be16 vlan_tci,
-   u32 mpls_label_count, bool log, bool last)
+   u32 mpls_label_count, bool log, bool last,
+   u32 depth)
 {
const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
const struct nlattr *probability, *actions;
@@ -2602,7 +2605,8 @@ static int validate_and_copy_sample(struct net *net, 
const struct nlattr *attr,
return err;
 
err = __ovs_nla_copy_actions(net, actions, key, sfa,
-eth_type, vlan_tci, mpls_label_count, log);
+eth_type, vlan_tci, mpls_label_count, log,
+depth + 1);
 
if (err)
return err;
@@ -2617,7 +2621,8 @@ static int validate_and_copy_dec_ttl(struct net *net,
 const struct sw_flow_key *key,
 struct sw_flow_actions **sfa,
 __be16 eth_type, __be16 vlan_tci,
-u32 mpls_label_count, bool log)
+u32 mpls_label_count, bool log,
+u32 depth)
 {
const struct nlattr *attrs[OVS_DEC_TTL_ATTR_MAX + 1];
int start, action_start, err, rem;
@@ -2660,7 +2665,8 @@ static int validate_and_copy_dec_ttl(struct net *net,
return action_start;
 
err = __ovs_nla_copy_actions(net, actions, key, sfa, eth_type,
-vlan_tci, mpls_label_count, log);
+vlan_tci, mpls_label_count, log,
+depth + 1);
if (err)
return err;
 
@@ -2674,7 +2680,8 @@ static int validate_and_copy_clone(struct net *net,
   const struct sw_flow_key *key,
   struct sw_flow_actions **sfa,
   __be16 eth_type, __be16 vlan_tci,
-  u32 mpls_label_count, bool log, bool last)
+  u32 mpls_label_count, bool log, bool last,
+  u32 depth)
 {
int start, err;
u32 exec;
@@ -2694,7 +2701,8 @@ static int validate_and_copy_clone(struct net *net,
return err;
 
err = __ovs_nla_copy_actions(net, attr, key, sfa,
-eth_type, vlan_tci, mpls_label_count, log);
+eth_type, vlan_tci, mpls_label_count, log,
+depth + 1);

Re: [ovs-dev] [PATCH net 1/2] net: openvswitch: limit the number of recursions from action sets

2024-02-06 Thread Aaron Conole

Eric Dumazet  writes:

> On Tue, Feb 6, 2024 at 3:55 PM Aaron Conole  wrote:
>>
>>
>> Oops - I didn't consider it.
>>
>> Given that, maybe the best approach would not to rely on per-cpu
>> counter. I'll respin in the next series with a depth counter that I pass
>> to the function instead and compare that.  I guess that should address
>> migration and eliminate the need for per-cpu counter.
>>
>> Does it make sense?
>
> Sure, a depth parameter would work much better ;)

Okay - I'll give time for others to comment and resubmit in ~24 hours
unless there's a reason to submit sooner.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net 1/2] net: openvswitch: limit the number of recursions from action sets

2024-02-06 Thread Aaron Conole

Eric Dumazet  writes:

> On Tue, Feb 6, 2024 at 2:11 PM Aaron Conole  wrote:
>>
>> The ovs module allows for some actions to recursively contain an action
>> list for complex scenarios, such as sampling, checking lengths, etc.
>> When these actions are copied into the internal flow table, they are
>> evaluated to validate that such actions make sense, and these calls
>> happen recursively.
>>
>> The ovs-vswitchd userspace won't emit more than 16 recursion levels
>> deep.  However, the module has no such limit and will happily accept
>> limits larger than 16 levels nested.  Prevent this by tracking the
>> number of recursions happening and manually limiting it to 16 levels
>> nested.
>>
>> The initial implementation of the sample action would track this depth
>> and prevent more than 3 levels of recursion, but this was removed to
>> support the clone use case, rather than limited at the current userspace
>> limit.
>>
>> Fixes: 798c166173ff ("openvswitch: Optimize sample action for the clone use 
>> cases")
>> Signed-off-by: Aaron Conole 
>> ---
>>  net/openvswitch/flow_netlink.c | 33 -
>>  1 file changed, 28 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
>> index 88965e2068ac..ba5cfa67a720 100644
>> --- a/net/openvswitch/flow_netlink.c
>> +++ b/net/openvswitch/flow_netlink.c
>> @@ -48,6 +48,9 @@ struct ovs_len_tbl {
>>
>>  #define OVS_ATTR_NESTED -1
>>  #define OVS_ATTR_VARIABLE -2
>> +#define OVS_COPY_ACTIONS_MAX_DEPTH 16
>> +
>> +static DEFINE_PER_CPU(int, copy_actions_depth);
>>
>>  static bool actions_may_change_flow(const struct nlattr *actions)
>>  {
>> @@ -3148,11 +3151,11 @@ static int copy_action(const struct nlattr *from,
>> return 0;
>>  }
>>
>> -static int __ovs_nla_copy_actions(struct net *net, const struct nlattr 
>> *attr,
>> - const struct sw_flow_key *key,
>> - struct sw_flow_actions **sfa,
>> - __be16 eth_type, __be16 vlan_tci,
>> - u32 mpls_label_count, bool log)
>> +static int ___ovs_nla_copy_actions(struct net *net, const struct nlattr 
>> *attr,
>> +  const struct sw_flow_key *key,
>> +  struct sw_flow_actions **sfa,
>> +  __be16 eth_type, __be16 vlan_tci,
>> +  u32 mpls_label_count, bool log)
>>  {
>> u8 mac_proto = ovs_key_mac_proto(key);
>> const struct nlattr *a;
>> @@ -3478,6 +3481,26 @@ static int __ovs_nla_copy_actions(struct net *net, 
>> const struct nlattr *attr,
>> return 0;
>>  }
>>
>> +static int __ovs_nla_copy_actions(struct net *net, const struct nlattr 
>> *attr,
>> + const struct sw_flow_key *key,
>> + struct sw_flow_actions **sfa,
>> + __be16 eth_type, __be16 vlan_tci,
>> + u32 mpls_label_count, bool log)
>> +{
>> +   int level = this_cpu_read(copy_actions_depth);
>> +   int ret;
>> +
>> +   if (level > OVS_COPY_ACTIONS_MAX_DEPTH)
>> +   return -EOVERFLOW;
>> +
>
> This code seems to run in process context.
>
> Using per cpu limit would not work, unless you disabled migration ?

Oops - I didn't consider it.

Given that, maybe the best approach would not to rely on per-cpu
counter. I'll respin in the next series with a depth counter that I pass
to the function instead and compare that.  I guess that should address
migration and eliminate the need for per-cpu counter.

Does it make sense?

>> +   __this_cpu_inc(copy_actions_depth);
>> +   ret = ___ovs_nla_copy_actions(net, attr, key, sfa, eth_type,
>> + vlan_tci, mpls_label_count, log);
>> +   __this_cpu_dec(copy_actions_depth);
>> +
>> +   return ret;
>> +}
>> +
>>  /* 'key' must be the masked key. */
>>  int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
>>  const struct sw_flow_key *key,
>> --
>> 2.41.0
>>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net 2/2] selftests: openvswitch: Add validation for the recursion test

2024-02-06 Thread Aaron Conole

Add a test case into the netlink checks that will show the number of
nested action recursions won't exceed 16.  Going to 17 on a small
clone call isn't enough to exhaust the stack on (most) systems, so
it should be safe to run even on systems that don't have the fix
applied.

Signed-off-by: Aaron Conole 
---
NOTE: This patch may be safely omitted for trees that don't
  include the selftests

 .../selftests/net/openvswitch/openvswitch.sh  | 13 
 .../selftests/net/openvswitch/ovs-dpctl.py| 71 +++
 2 files changed, 69 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index f8499d4c87f3..30cb9d3b035d 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -502,7 +502,20 @@ test_netlink_checks () {
wc -l) == 2 ] || \
  return 1
 
+   info "Checking clone depth"
ERR_MSG="Flow actions may not be safe on all matching packets"
+   PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
+   ovs_add_flow "test_netlink_checks" nv0 \
+   'in_port(1),eth(),eth_type(0x800),ipv4()' \
+   
'clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(drop)'
 \
+   &> /dev/null && return 1
+   POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
+
+   if [ "$PRE_TEST" == "$POST_TEST" ]; then
+   info "failed - clone depth too large"
+   return 1
+   fi
+
PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
ovs_add_flow "test_netlink_checks" nv0 \
'in_port(1),eth(),eth_type(0x0806),arp()' 'drop(0),2' \
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index b97e621face9..5e0e539a323d 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -299,7 +299,7 @@ class ovsactions(nla):
 ("OVS_ACTION_ATTR_PUSH_NSH", "none"),
 ("OVS_ACTION_ATTR_POP_NSH", "flag"),
 ("OVS_ACTION_ATTR_METER", "none"),
-("OVS_ACTION_ATTR_CLONE", "none"),
+("OVS_ACTION_ATTR_CLONE", "recursive"),
 ("OVS_ACTION_ATTR_CHECK_PKT_LEN", "none"),
 ("OVS_ACTION_ATTR_ADD_MPLS", "none"),
 ("OVS_ACTION_ATTR_DEC_TTL", "none"),
@@ -465,29 +465,42 @@ class ovsactions(nla):
 print_str += "pop_mpls"
 else:
 datum = self.get_attr(field[0])
-print_str += datum.dpstr(more)
+if field[0] == "OVS_ACTION_ATTR_CLONE":
+print_str += "clone("
+print_str += datum.dpstr(more)
+print_str += ")"
+else:
+print_str += datum.dpstr(more)
 
 return print_str
 
 def parse(self, actstr):
+totallen = len(actstr)
 while len(actstr) != 0:
 parsed = False
+parencount = 0
 if actstr.startswith("drop"):
 # If no reason is provided, the implicit drop is used (i.e no
 # action). If some reason is given, an explicit action is used.
-actstr, reason = parse_extract_field(
-actstr,
-"drop(",
-"([0-9]+)",
-lambda x: int(x, 0),
-False,
-None,
-)
+reason = None
+if actstr.startswith("drop("):
+parencount += 1
+
+actstr, reason = parse_extract_field(
+actstr,
+"drop(",
+"([0-9]+)",
+lambda x: int(x, 0),
+False,
+None,
+)
+
 if reason is not None:
 self["attrs"].append(["OVS_ACTION_ATTR_DROP", reason])
 parsed = True
 else:
-return
+actstr = actstr[len("drop"): ]
+return (totallen - len(actstr))
 
 elif parse_starts_block(actstr, "^(\d+)", False, True):
 actstr, output = parse_extract_field(
@@ -504,6 +517,7 @@ class ovsactions(nla):
 False,
 0,
 )
+

[ovs-dev] [PATCH net 1/2] net: openvswitch: limit the number of recursions from action sets

2024-02-06 Thread Aaron Conole

The ovs module allows for some actions to recursively contain an action
list for complex scenarios, such as sampling, checking lengths, etc.
When these actions are copied into the internal flow table, they are
evaluated to validate that such actions make sense, and these calls
happen recursively.

The ovs-vswitchd userspace won't emit more than 16 recursion levels
deep.  However, the module has no such limit and will happily accept
limits larger than 16 levels nested.  Prevent this by tracking the
number of recursions happening and manually limiting it to 16 levels
nested.

The initial implementation of the sample action would track this depth
and prevent more than 3 levels of recursion, but this was removed to
support the clone use case, rather than limited at the current userspace
limit.

Fixes: 798c166173ff ("openvswitch: Optimize sample action for the clone use 
cases")
Signed-off-by: Aaron Conole 
---
 net/openvswitch/flow_netlink.c | 33 -
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 88965e2068ac..ba5cfa67a720 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -48,6 +48,9 @@ struct ovs_len_tbl {
 
 #define OVS_ATTR_NESTED -1
 #define OVS_ATTR_VARIABLE -2
+#define OVS_COPY_ACTIONS_MAX_DEPTH 16
+
+static DEFINE_PER_CPU(int, copy_actions_depth);
 
 static bool actions_may_change_flow(const struct nlattr *actions)
 {
@@ -3148,11 +3151,11 @@ static int copy_action(const struct nlattr *from,
return 0;
 }
 
-static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
- const struct sw_flow_key *key,
- struct sw_flow_actions **sfa,
- __be16 eth_type, __be16 vlan_tci,
- u32 mpls_label_count, bool log)
+static int ___ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
+  const struct sw_flow_key *key,
+  struct sw_flow_actions **sfa,
+  __be16 eth_type, __be16 vlan_tci,
+  u32 mpls_label_count, bool log)
 {
u8 mac_proto = ovs_key_mac_proto(key);
const struct nlattr *a;
@@ -3478,6 +3481,26 @@ static int __ovs_nla_copy_actions(struct net *net, const 
struct nlattr *attr,
return 0;
 }
 
+static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
+ const struct sw_flow_key *key,
+ struct sw_flow_actions **sfa,
+ __be16 eth_type, __be16 vlan_tci,
+ u32 mpls_label_count, bool log)
+{
+   int level = this_cpu_read(copy_actions_depth);
+   int ret;
+
+   if (level > OVS_COPY_ACTIONS_MAX_DEPTH)
+   return -EOVERFLOW;
+
+   __this_cpu_inc(copy_actions_depth);
+   ret = ___ovs_nla_copy_actions(net, attr, key, sfa, eth_type,
+ vlan_tci, mpls_label_count, log);
+   __this_cpu_dec(copy_actions_depth);
+
+   return ret;
+}
+
 /* 'key' must be the masked key. */
 int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 const struct sw_flow_key *key,
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net 0/2] net: openvswitch: limit the recursions from action sets

2024-02-06 Thread Aaron Conole

Open vSwitch module accepts actions as a list from the netlink socket
and then creates a copy which it uses in the action set processing.
During processing of the action list on a packet, the module keeps a
count of the execution depth and exits processing if the action depth
goes too high.

However, during netlink processing the recursion depth isn't checked
anywhere, and the copy trusts that kernel has large enough stack to
accommodate it.  The OVS sample action was the original action which
could perform this kinds of recursion, and it originally checked that
it didn't exceed the sample depth limit.  However, when sample became
optimized to provide the clone() semantics, the recursion limit was
dropped.

This series adds a depth limit during the __ovs_nla_copy_actions() call
that will ensure we don't exceed the max that the OVS userspace could
generate for a clone().

Additionally, this series provides a selftest in 2/2 that can be used to
determine if the OVS module is allowing unbounded access.  It can be
safely omitted where the ovs selftest framework isn't available.

Aaron Conole (2):
  net: openvswitch: limit the number of recursions from action sets
  selftests: openvswitch: Add validation for the recursion test

 net/openvswitch/flow_netlink.c| 33 +++--
 .../selftests/net/openvswitch/openvswitch.sh  | 13 
 .../selftests/net/openvswitch/ovs-dpctl.py| 71 +++
 3 files changed, 97 insertions(+), 20 deletions(-)

-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-02-02 Thread Aaron Conole

Eelco Chaudron  writes:

> On 2 Feb 2024, at 11:31, Adrian Moreno wrote:
>
>> On 2/1/24 10:02, Eelco Chaudron wrote:
>>>
>>>
>>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>>>
>>>> Eelco Chaudron  writes:
>>>>
>>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>>
>>>>>> From: Kevin Sprague 
>>>>>>
>>>>>> During normal operations, it is useful to understand when a particular 
>>>>>> flow
>>>>>> gets removed from the system. This can be useful when debugging 
>>>>>> performance
>>>>>> issues tied to ofproto flow changes, trying to determine deployed traffic
>>>>>> patterns, or while debugging dynamic systems where ports come and go.
>>>>>>
>>>>>> Prior to this change, there was a lack of visibility around flow 
>>>>>> expiration.
>>>>>> The existing debugging infrastructure could tell us when a flow was 
>>>>>> added to
>>>>>> the datapath, but not when it was removed or why.
>>>>>>
>>>>>> This change introduces a USDT probe at the point where the revalidator
>>>>>> determines that the flow should be removed.  Additionally, we track the
>>>>>> reason for the flow eviction and provide that information as well.  With
>>>>>> this change, we can track the complete flow lifecycle for the netlink
>>>>>> datapath by hooking the upcall tracepoint in kernel, the flow put USDT, 
>>>>>> and
>>>>>> the revaldiator USDT, letting us watch as flows are added and removed 
>>>>>> from
>>>>>> the kernel datapath.
>>>>>>
>>>>>> This change only enables this information via USDT probe, so it won't be
>>>>>> possible to access this information any other way (see:
>>>>>> Documentation/topics/usdt-probes.rst).
>>>>>>
>>>>>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>>> which serves as a demonstration of how the new USDT probe might be used
>>>>>> going forward.
>>>>>>
>>>>>> Signed-off-by: Kevin Sprague 
>>>>>> Co-authored-by: Aaron Conole 
>>>>>> Signed-off-by: Aaron Conole 
>>>>>
>>>>> Thanks for following this up Aaron! See comments on this patch
> below. I have no additional comments on patch 2.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Eelco
>>>>>
>>>>>
>>>>>> ---
>>>>>>   Documentation/topics/usdt-probes.rst |   1 +
>>>>>>   ofproto/ofproto-dpif-upcall.c|  42 +-
>>>>>>   utilities/automake.mk|   3 +
>>>>>>   utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
>>>>>>   4 files changed, 693 insertions(+), 6 deletions(-)
>>>>>>   create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>>>>>
>>>>>> diff --git a/Documentation/topics/usdt-probes.rst 
>>>>>> b/Documentation/topics/usdt-probes.rst
>>>>>> index e527f43bab..a8da9bb1f7 100644
>>>>>> --- a/Documentation/topics/usdt-probes.rst
>>>>>> +++ b/Documentation/topics/usdt-probes.rst
>>>>>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>>>>>   - dpif_recv:recv_upcall
>>>>>>   - main:poll_block
>>>>>>   - main:run_start
>>>>>> +- revalidate:flow_result
>>>>>>   - revalidate_ukey\_\_:entry
>>>>>>   - revalidate_ukey\_\_:exit
>>>>>>   - udpif_revalidator:start_dump
>>>>>
>>>>> You are missing the specific flow_result result section. This is from the 
>>>>> previous patch:
>>>>
>>>> D'oh!  Thanks for catching it.  I'll re-add it.
>>>>
>>>>> @@ -358,6 +360,27 @@  See also the ``main:run_start`` probe above.
>>>>>   - ``utilities/usdt-scripts/bridge_loop.bt``
>>>>>
>>>>>
>>>>> +probe revalidate:flow_result
>>>>> +~
>>>>> +
>>>>> +**Description**:
>>>>> +This probe is triggered when the revalidat

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-02-02 Thread Aaron Conole

Adrian Moreno  writes:

> On 2/1/24 10:02, Eelco Chaudron wrote:
>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>> 
>>> Eelco Chaudron  writes:
>>>
>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>
>>>>> From: Kevin Sprague 
>>>>>
>>>>> During normal operations, it is useful to understand when a particular 
>>>>> flow
>>>>> gets removed from the system. This can be useful when debugging 
>>>>> performance
>>>>> issues tied to ofproto flow changes, trying to determine deployed traffic
>>>>> patterns, or while debugging dynamic systems where ports come and go.
>>>>>
>>>>> Prior to this change, there was a lack of visibility around flow 
>>>>> expiration.
>>>>> The existing debugging infrastructure could tell us when a flow was added 
>>>>> to
>>>>> the datapath, but not when it was removed or why.
>>>>>
>>>>> This change introduces a USDT probe at the point where the revalidator
>>>>> determines that the flow should be removed.  Additionally, we track the
>>>>> reason for the flow eviction and provide that information as well.  With
>>>>> this change, we can track the complete flow lifecycle for the netlink
>>>>> datapath by hooking the upcall tracepoint in kernel, the flow put USDT, 
>>>>> and
>>>>> the revaldiator USDT, letting us watch as flows are added and removed from
>>>>> the kernel datapath.
>>>>>
>>>>> This change only enables this information via USDT probe, so it won't be
>>>>> possible to access this information any other way (see:
>>>>> Documentation/topics/usdt-probes.rst).
>>>>>
>>>>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>> which serves as a demonstration of how the new USDT probe might be used
>>>>> going forward.
>>>>>
>>>>> Signed-off-by: Kevin Sprague 
>>>>> Co-authored-by: Aaron Conole 
>>>>> Signed-off-by: Aaron Conole 
>>>>
>>>> Thanks for following this up Aaron! See comments on this patch
> below. I have no additional comments on patch 2.
>>>>
>>>> Cheers,
>>>>
>>>> Eelco
>>>>
>>>>
>>>>> ---
>>>>>   Documentation/topics/usdt-probes.rst |   1 +
>>>>>   ofproto/ofproto-dpif-upcall.c|  42 +-
>>>>>   utilities/automake.mk|   3 +
>>>>>   utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
>>>>>   4 files changed, 693 insertions(+), 6 deletions(-)
>>>>>   create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>>>>
>>>>> diff --git a/Documentation/topics/usdt-probes.rst 
>>>>> b/Documentation/topics/usdt-probes.rst
>>>>> index e527f43bab..a8da9bb1f7 100644
>>>>> --- a/Documentation/topics/usdt-probes.rst
>>>>> +++ b/Documentation/topics/usdt-probes.rst
>>>>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>>>>   - dpif_recv:recv_upcall
>>>>>   - main:poll_block
>>>>>   - main:run_start
>>>>> +- revalidate:flow_result
>>>>>   - revalidate_ukey\_\_:entry
>>>>>   - revalidate_ukey\_\_:exit
>>>>>   - udpif_revalidator:start_dump
>>>>
>>>> You are missing the specific flow_result result section. This is from the 
>>>> previous patch:
>>>
>>> D'oh!  Thanks for catching it.  I'll re-add it.
>>>
>>>> @@ -358,6 +360,27 @@  See also the ``main:run_start`` probe above.
>>>>   - ``utilities/usdt-scripts/bridge_loop.bt``
>>>>
>>>>
>>>> +probe revalidate:flow_result
>>>> +~
>>>> +
>>>> +**Description**:
>>>> +This probe is triggered when the revalidator decides whether or not to
>>>> +revalidate a flow. ``reason`` is an enum that denotes that either the flow
>>>> +is being kept, or the reason why the flow is being deleted. The
>>>> +``flow_reval_monitor.py`` script uses this probe to notify users when 
>>>> flows
>>>> +matching user-provided criteria are deleted.
>>>> +
>>>> +**Arguments**:
>>>> +
>

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-02-02 Thread Aaron Conole

Eelco Chaudron  writes:

> On 1 Feb 2024, at 18:28, Aaron Conole wrote:
>
>> Eelco Chaudron  writes:
>>
>>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>>>
>>>> Eelco Chaudron  writes:
>>>>
>>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>>
>>>>>> From: Kevin Sprague 
>>>>>>
>>>>>> During normal operations, it is useful to understand when a particular 
>>>>>> flow
>>>>>> gets removed from the system. This can be useful when debugging 
>>>>>> performance
>>>>>> issues tied to ofproto flow changes, trying to determine deployed traffic
>>>>>> patterns, or while debugging dynamic systems where ports come and go.
>>>>>>
>>>>>> Prior to this change, there was a lack of visibility around flow 
>>>>>> expiration.
>>>>>> The existing debugging infrastructure could tell us when a flow was 
>>>>>> added to
>>>>>> the datapath, but not when it was removed or why.
>>>>>>
>>>>>> This change introduces a USDT probe at the point where the revalidator
>>>>>> determines that the flow should be removed.  Additionally, we track the
>>>>>> reason for the flow eviction and provide that information as well.  With
>>>>>> this change, we can track the complete flow lifecycle for the netlink
>>>>>> datapath by hooking the upcall tracepoint in kernel, the flow put USDT, 
>>>>>> and
>>>>>> the revaldiator USDT, letting us watch as flows are added and removed 
>>>>>> from
>>>>>> the kernel datapath.
>>>>>>
>>>>>> This change only enables this information via USDT probe, so it won't be
>>>>>> possible to access this information any other way (see:
>>>>>> Documentation/topics/usdt-probes.rst).
>>>>>>
>>>>>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>>> which serves as a demonstration of how the new USDT probe might be used
>>>>>> going forward.
>>>>>>
>>>>>> Signed-off-by: Kevin Sprague 
>>>>>> Co-authored-by: Aaron Conole 
>>>>>> Signed-off-by: Aaron Conole 
>>>>>
>>>>> Thanks for following this up Aaron! See comments on this patch below. I 
>>>>> have no additional comments on patch 2.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Eelco
>>>>>
>>>>>
>>>>>> ---
>>>>>>  Documentation/topics/usdt-probes.rst |   1 +
>>>>>>  ofproto/ofproto-dpif-upcall.c|  42 +-
>>>>>>  utilities/automake.mk|   3 +
>>>>>>  utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
>>>>>>  4 files changed, 693 insertions(+), 6 deletions(-)
>>>>>>  create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>>>>>
>>>>>> diff --git a/Documentation/topics/usdt-probes.rst 
>>>>>> b/Documentation/topics/usdt-probes.rst
>>>>>> index e527f43bab..a8da9bb1f7 100644
>>>>>> --- a/Documentation/topics/usdt-probes.rst
>>>>>> +++ b/Documentation/topics/usdt-probes.rst
>>>>>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>>>>>  - dpif_recv:recv_upcall
>>>>>>  - main:poll_block
>>>>>>  - main:run_start
>>>>>> +- revalidate:flow_result
>>>>>>  - revalidate_ukey\_\_:entry
>>>>>>  - revalidate_ukey\_\_:exit
>>>>>>  - udpif_revalidator:start_dump
>>>>>
>>>>> You are missing the specific flow_result result section. This is from the 
>>>>> previous patch:
>>>>
>>>> D'oh!  Thanks for catching it.  I'll re-add it.
>>>>
>>>>> @@ -358,6 +360,27 @@  See also the ``main:run_start`` probe above.
>>>>>  - ``utilities/usdt-scripts/bridge_loop.bt``
>>>>>
>>>>>
>>>>> +probe revalidate:flow_result
>>>>> +~
>>>>> +
>>>>> +**Description**:
>>>>> +This probe is triggered when the revalidator decides whether or not to
>

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-02-01 Thread Aaron Conole

Eelco Chaudron  writes:

> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>
>> Eelco Chaudron  writes:
>>
>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>
>>>> From: Kevin Sprague 
>>>>
>>>> During normal operations, it is useful to understand when a particular flow
>>>> gets removed from the system. This can be useful when debugging performance
>>>> issues tied to ofproto flow changes, trying to determine deployed traffic
>>>> patterns, or while debugging dynamic systems where ports come and go.
>>>>
>>>> Prior to this change, there was a lack of visibility around flow 
>>>> expiration.
>>>> The existing debugging infrastructure could tell us when a flow was added 
>>>> to
>>>> the datapath, but not when it was removed or why.
>>>>
>>>> This change introduces a USDT probe at the point where the revalidator
>>>> determines that the flow should be removed.  Additionally, we track the
>>>> reason for the flow eviction and provide that information as well.  With
>>>> this change, we can track the complete flow lifecycle for the netlink
>>>> datapath by hooking the upcall tracepoint in kernel, the flow put USDT, and
>>>> the revaldiator USDT, letting us watch as flows are added and removed from
>>>> the kernel datapath.
>>>>
>>>> This change only enables this information via USDT probe, so it won't be
>>>> possible to access this information any other way (see:
>>>> Documentation/topics/usdt-probes.rst).
>>>>
>>>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
>>>> which serves as a demonstration of how the new USDT probe might be used
>>>> going forward.
>>>>
>>>> Signed-off-by: Kevin Sprague 
>>>> Co-authored-by: Aaron Conole 
>>>> Signed-off-by: Aaron Conole 
>>>
>>> Thanks for following this up Aaron! See comments on this patch below. I 
>>> have no additional comments on patch 2.
>>>
>>> Cheers,
>>>
>>> Eelco
>>>
>>>
>>>> ---
>>>>  Documentation/topics/usdt-probes.rst |   1 +
>>>>  ofproto/ofproto-dpif-upcall.c|  42 +-
>>>>  utilities/automake.mk|   3 +
>>>>  utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
>>>>  4 files changed, 693 insertions(+), 6 deletions(-)
>>>>  create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>>>
>>>> diff --git a/Documentation/topics/usdt-probes.rst 
>>>> b/Documentation/topics/usdt-probes.rst
>>>> index e527f43bab..a8da9bb1f7 100644
>>>> --- a/Documentation/topics/usdt-probes.rst
>>>> +++ b/Documentation/topics/usdt-probes.rst
>>>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>>>  - dpif_recv:recv_upcall
>>>>  - main:poll_block
>>>>  - main:run_start
>>>> +- revalidate:flow_result
>>>>  - revalidate_ukey\_\_:entry
>>>>  - revalidate_ukey\_\_:exit
>>>>  - udpif_revalidator:start_dump
>>>
>>> You are missing the specific flow_result result section. This is from the 
>>> previous patch:
>>
>> D'oh!  Thanks for catching it.  I'll re-add it.
>>
>>> @@ -358,6 +360,27 @@  See also the ``main:run_start`` probe above.
>>>  - ``utilities/usdt-scripts/bridge_loop.bt``
>>>
>>>
>>> +probe revalidate:flow_result
>>> +~
>>> +
>>> +**Description**:
>>> +This probe is triggered when the revalidator decides whether or not to
>>> +revalidate a flow. ``reason`` is an enum that denotes that either the flow
>>> +is being kept, or the reason why the flow is being deleted. The
>>> +``flow_reval_monitor.py`` script uses this probe to notify users when flows
>>> +matching user-provided criteria are deleted.
>>> +
>>> +**Arguments**:
>>> +
>>> +- *arg0*: ``(enum flow_del_reason) reason``
>>> +- *arg1*: ``(struct udpif *) udpif``
>>> +- *arg2*: ``(struct udpif_key *) ukey``
>>> +
>>> +**Script references**:
>>> +
>>> +- ``utilities/usdt-scripts/flow_reval_monitor.py``
>>> +
>>> +
>>>  Adding your own probes
>>>  --
>>>
>>>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpi

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-01-31 Thread Aaron Conole

Eelco Chaudron  writes:

> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>
>> From: Kevin Sprague 
>>
>> During normal operations, it is useful to understand when a particular flow
>> gets removed from the system. This can be useful when debugging performance
>> issues tied to ofproto flow changes, trying to determine deployed traffic
>> patterns, or while debugging dynamic systems where ports come and go.
>>
>> Prior to this change, there was a lack of visibility around flow expiration.
>> The existing debugging infrastructure could tell us when a flow was added to
>> the datapath, but not when it was removed or why.
>>
>> This change introduces a USDT probe at the point where the revalidator
>> determines that the flow should be removed.  Additionally, we track the
>> reason for the flow eviction and provide that information as well.  With
>> this change, we can track the complete flow lifecycle for the netlink
>> datapath by hooking the upcall tracepoint in kernel, the flow put USDT, and
>> the revaldiator USDT, letting us watch as flows are added and removed from
>> the kernel datapath.
>>
>> This change only enables this information via USDT probe, so it won't be
>> possible to access this information any other way (see:
>> Documentation/topics/usdt-probes.rst).
>>
>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
>> which serves as a demonstration of how the new USDT probe might be used
>> going forward.
>>
>> Signed-off-by: Kevin Sprague 
>> Co-authored-by: Aaron Conole 
>> Signed-off-by: Aaron Conole 
>
> Thanks for following this up Aaron! See comments on this patch below. I have 
> no additional comments on patch 2.
>
> Cheers,
>
> Eelco
>
>
>> ---
>>  Documentation/topics/usdt-probes.rst |   1 +
>>  ofproto/ofproto-dpif-upcall.c|  42 +-
>>  utilities/automake.mk|   3 +
>>  utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
>>  4 files changed, 693 insertions(+), 6 deletions(-)
>>  create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>
>> diff --git a/Documentation/topics/usdt-probes.rst 
>> b/Documentation/topics/usdt-probes.rst
>> index e527f43bab..a8da9bb1f7 100644
>> --- a/Documentation/topics/usdt-probes.rst
>> +++ b/Documentation/topics/usdt-probes.rst
>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>  - dpif_recv:recv_upcall
>>  - main:poll_block
>>  - main:run_start
>> +- revalidate:flow_result
>>  - revalidate_ukey\_\_:entry
>>  - revalidate_ukey\_\_:exit
>>  - udpif_revalidator:start_dump
>
> You are missing the specific flow_result result section. This is from the 
> previous patch:

D'oh!  Thanks for catching it.  I'll re-add it.

> @@ -358,6 +360,27 @@  See also the ``main:run_start`` probe above.
>  - ``utilities/usdt-scripts/bridge_loop.bt``
>
>
> +probe revalidate:flow_result
> +~
> +
> +**Description**:
> +This probe is triggered when the revalidator decides whether or not to
> +revalidate a flow. ``reason`` is an enum that denotes that either the flow
> +is being kept, or the reason why the flow is being deleted. The
> +``flow_reval_monitor.py`` script uses this probe to notify users when flows
> +matching user-provided criteria are deleted.
> +
> +**Arguments**:
> +
> +- *arg0*: ``(enum flow_del_reason) reason``
> +- *arg1*: ``(struct udpif *) udpif``
> +- *arg2*: ``(struct udpif_key *) ukey``
> +
> +**Script references**:
> +
> +- ``utilities/usdt-scripts/flow_reval_monitor.py``
> +
> +
>  Adding your own probes
>  --
>
>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>> index b5cbeed878..97d75833f7 100644
>> --- a/ofproto/ofproto-dpif-upcall.c
>> +++ b/ofproto/ofproto-dpif-upcall.c
>> @@ -269,6 +269,18 @@ enum ukey_state {
>>  };
>>  #define N_UKEY_STATES (UKEY_DELETED + 1)
>>
>> +enum flow_del_reason {
>> +FDR_REVALIDATE = 0, /* The flow was revalidated. */
>
> It was called FDR_FLOW_LIVE before, which might make more sense. As the flow 
> is just NOT deleted. It might or might not have been revalidated. Thoughts?

I think it had to have been revalidated if we emit the reason, because
we only emit the reason code after revalidation.  IE: there are many
places where we skip revalidation but the flow stays live - and we don't
emit reasons in those cases.

So at least for this patch, it MUST have been revalidated.  But maybe in
the future, we would want to catch cases where the fl

Re: [ovs-dev] [PATCH net-next] selftests: openvswitch: Test ICMP related matches work with SNAT

2024-01-31 Thread Aaron Conole

Brad Cowie  writes:

> Add a test case for regression in openvswitch nat that was fixed by
> commit e6345d2824a3 ("netfilter: nf_nat: fix action not being set for
> all ct states").
>
> Link: https://lore.kernel.org/netdev/20231221224311.130319-1-b...@faucet.nz/
> Link: https://mail.openvswitch.org/pipermail/ovs-dev/2024-January/410476.html
> Suggested-by: Aaron Conole 
> Signed-off-by: Brad Cowie 
> ---

I tested this on a patched kernel and as well as an unpatched kernel and
got the following:

6.5.5-200: TEST: ip4-nat-related: ICMP related matches work with SNAT  
[FAIL]
6.7.0: TEST: ip4-nat-related: ICMP related matches work with SNAT  
[ OK ]

Thanks for adding the test case!

Tested-by: Aaron Conole 
Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v8 2/2] rhel: Enable USDT scripts by default in Fedora builds.

2024-01-30 Thread Aaron Conole

Ilya Maximets  writes:

> On 1/25/24 21:55, Aaron Conole wrote:
>> All supported versions of Fedora do package libbpf, so it
>> makes sense to enable USDT support.
>> 
>> Signed-off-by: Aaron Conole 
>> ---
>> v8: Include the correct devel package as a dependency
>> 
>>  rhel/openvswitch-fedora.spec.in | 8 
>>  1 file changed, 8 insertions(+)
>> 
>> diff --git a/rhel/openvswitch-fedora.spec.in 
>> b/rhel/openvswitch-fedora.spec.in
>> index 5d24ebcda8..b97f509cae 100644
>> --- a/rhel/openvswitch-fedora.spec.in
>> +++ b/rhel/openvswitch-fedora.spec.in
>> @@ -28,6 +28,8 @@
>>  %bcond_with dpdk
>>  # To disable AF_XDP support, specify '--without afxdp' when building
>>  %bcond_without afxdp
>> +# To control the USDT support
>> +%bcond_without usdt
>>  
>>  # If there is a need to automatically enable the package after installation,
>>  # specify the "--with autoenable"
>> @@ -77,6 +79,9 @@ Provides: %{name}-dpdk = %{version}-%{release}
>>  %if %{with afxdp}
>>  BuildRequires: libxdp-devel libbpf-devel numactl-devel
>>  %endif
>> +%if %{with afxdp}
>
> Did you mean usdt ?

Ugh... yes.  I'll post the update.  Thanks for catching it.

>> +BuildRequires: libbpf-devel systemtap-sdt-devel
>> +%endif
>>  BuildRequires: unbound unbound-devel
>>  
>>  Requires: openssl hostname iproute module-init-tools unbound
>> @@ -173,6 +178,9 @@ This package provides IPsec tunneling support for OVS 
>> tunnels.
>>  --enable-afxdp \
>>  %else
>>  --disable-afxdp \
>> +%endif
>> +%if %{with usdt}
>> +--enable-usdt-probes \
>>  %endif
>>  --enable-ssl \
>>  --disable-static \

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v7 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-01-26 Thread Aaron Conole

Simon Horman  writes:

> On Thu, Jan 25, 2024 at 12:05:29PM -0500, Aaron Conole wrote:
>> From: Kevin Sprague 
>> 
>> During normal operations, it is useful to understand when a particular flow
>> gets removed from the system. This can be useful when debugging performance
>> issues tied to ofproto flow changes, trying to determine deployed traffic
>> patterns, or while debugging dynamic systems where ports come and go.
>> 
>> Prior to this change, there was a lack of visibility around flow expiration.
>> The existing debugging infrastructure could tell us when a flow was added to
>> the datapath, but not when it was removed or why.
>> 
>> This change introduces a USDT probe at the point where the revalidator
>> determines that the flow should be removed.  Additionally, we track the
>> reason for the flow eviction and provide that information as well.  With
>> this change, we can track the complete flow lifecycle for the netlink
>> datapath by hooking the upcall tracepoint in kernel, the flow put USDT, and
>> the revaldiator USDT, letting us watch as flows are added and removed from
>> the kernel datapath.
>> 
>> This change only enables this information via USDT probe, so it won't be
>> possible to access this information any other way (see:
>> Documentation/topics/usdt-probes.rst).
>> 
>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
>> which serves as a demonstration of how the new USDT probe might be used
>> going forward.
>> 
>> Signed-off-by: Kevin Sprague 
>> Co-authored-by: Aaron Conole 
>> Signed-off-by: Aaron Conole 
>
> Hi Aaron,
>
> has any consideration been given to adding tests for this feature?

I don't think so.  Actually, we don't have any tests for any of the USDT
probes / scripts.  At the very least, we could probably update some of
the existing tests to take advantage of these tracepoints to verify the
information being captured.  That's a much bigger task, though.

Maybe Adrian has opinions on it as well, since Retis may use this
information to gather insights.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v7 2/2] rhel: Enable USDT scripts by default in Fedora builds

2024-01-26 Thread Aaron Conole

Simon Horman  writes:

> On Thu, Jan 25, 2024 at 12:05:30PM -0500, Aaron Conole wrote:
>> All supported versions of Fedora do package libbpf, so it
>> makes sense to enable USDT support.
>> 
>> Signed-off-by: Aaron Conole 
>> ---
>>  rhel/openvswitch-fedora.spec.in | 5 +
>>  1 file changed, 5 insertions(+)
>> 
>> diff --git a/rhel/openvswitch-fedora.spec.in 
>> b/rhel/openvswitch-fedora.spec.in
>> index 5d24ebcda8..3d9afeac59 100644
>> --- a/rhel/openvswitch-fedora.spec.in
>> +++ b/rhel/openvswitch-fedora.spec.in
>> @@ -28,6 +28,8 @@
>>  %bcond_with dpdk
>>  # To disable AF_XDP support, specify '--without afxdp' when building
>>  %bcond_without afxdp
>> +# To control the USDT support
>> +%bcond_without usdt
>>  
>>  # If there is a need to automatically enable the package after installation,
>>  # specify the "--with autoenable"
>> @@ -173,6 +175,9 @@ This package provides IPsec tunneling support for OVS 
>> tunnels.
>>  --enable-afxdp \
>>  %else
>>  --disable-afxdp \
>> +%endif
>> +%if %{with usdt}
>> +--enable-usdt-probes \
>>  %endif
>>  --enable-ssl \
>>  --disable-static \
>
> Hi Aaron,
>
> Our mutual friend ovsrobot didn't seem to like this one :(
>
>  configure: error: unable to find sys/sdt.h needed for USDT support
> error: Bad exit status from /var/tmp/rpm-tmp.pNLYFp (%build)
> Macro expanded in comment on line 23: %define kernel 
> 2.6.40.4-5.fc15.x86_64
>
> Bad exit status from /var/tmp/rpm-tmp.pNLYFp (%build)
>
> Link: https://github.com/ovsrobot/ovs/actions/runs/7658135907/job/20870414014

Thanks - fixed in v8

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-01-26 Thread Aaron Conole

Han Zhou  writes:

> On Thu, Jan 25, 2024 at 12:55 PM Aaron Conole  wrote:
>>
>> From: Kevin Sprague 
>>
>> During normal operations, it is useful to understand when a particular flow
>> gets removed from the system. This can be useful when debugging performance
>> issues tied to ofproto flow changes, trying to determine deployed traffic
>> patterns, or while debugging dynamic systems where ports come and go.
>>
>> Prior to this change, there was a lack of visibility around flow expiration.
>> The existing debugging infrastructure could tell us when a flow was added to
>> the datapath, but not when it was removed or why.
>>
>> This change introduces a USDT probe at the point where the revalidator
>> determines that the flow should be removed.  Additionally, we track the
>> reason for the flow eviction and provide that information as well.  With
>> this change, we can track the complete flow lifecycle for the netlink
>> datapath by hooking the upcall tracepoint in kernel, the flow put USDT, and
>> the revaldiator USDT, letting us watch as flows are added and removed from
>> the kernel datapath.
>>
>> This change only enables this information via USDT probe, so it won't be
>> possible to access this information any other way (see:
>> Documentation/topics/usdt-probes.rst).
>>
>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
>> which serves as a demonstration of how the new USDT probe might be used
>> going forward.
>>
>> Signed-off-by: Kevin Sprague 
>> Co-authored-by: Aaron Conole 
>> Signed-off-by: Aaron Conole 
>
> Thanks Aaron for taking care of this patch. I saw you resolved most of my 
> comments for the v6 of the original patch:
> https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401220.html
>
> Butit seems my last comment was missed:
> ===
>
> I do notice a counter in my patch doesn't have a
> counterpart in this patch. In revalidator_sweep__(), I have:
> if (purge) {
> result = UKEY_DELETE;
> +COVERAGE_INC(upcall_flow_del_purge);
>
> Would it be good to add one (e.g. FDR_PURGE) here, too?
>
> ===
> Could you check if this can be added?
> If this is merged I can rebase my patch on top of this.

Sorry I didn't reply to this.

I'm not sure it makes sense to add the probe for purge, specifically as
the purge is only done in two cases:

1. The threads are being stopped (which should never occur after
   initialization unless the vswitchd is being stopped / killed)

2. An admin runs a command to purge the revalidators (which isn't a
   recommended procedure as it can cause lots of really weird side
   effects and we only use it as a debug tool).

Did I understand the case enough?  I didn't reread the patch you're
proposing, so I might be misunderstanding something.

> Thanks,
> Han
>
>>
>> ---
>>  Documentation/topics/usdt-probes.rst |   1 +
>>  ofproto/ofproto-dpif-upcall.c|  42 +-
>>  utilities/automake.mk|   3 +
>>  utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
>>  4 files changed, 693 insertions(+), 6 deletions(-)
>>  create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>
>> diff --git a/Documentation/topics/usdt-probes.rst 
>> b/Documentation/topics/usdt-probes.rst
>> index e527f43bab..a8da9bb1f7 100644
>> --- a/Documentation/topics/usdt-probes.rst
>> +++ b/Documentation/topics/usdt-probes.rst
>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>  - dpif_recv:recv_upcall
>>  - main:poll_block
>>  - main:run_start
>> +- revalidate:flow_result
>>  - revalidate_ukey\_\_:entry
>>  - revalidate_ukey\_\_:exit
>>  - udpif_revalidator:start_dump
>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>> index b5cbeed878..97d75833f7 100644
>> --- a/ofproto/ofproto-dpif-upcall.c
>> +++ b/ofproto/ofproto-dpif-upcall.c
>> @@ -269,6 +269,18 @@ enum ukey_state {
>>  };
>>  #define N_UKEY_STATES (UKEY_DELETED + 1)
>>
>> +enum flow_del_reason {
>> +FDR_REVALIDATE = 0, /* The flow was revalidated. */
>> +FDR_FLOW_IDLE,  /* The flow went unused and was deleted. */
>> +FDR_TOO_EXPENSIVE,  /* The flow was too expensive to revalidate. */
>> +FDR_FLOW_WILDCARDED,/* The flow needed a narrower wildcard mask. */
>> +FDR_BAD_ODP_FIT,/* The flow had a bad ODP flow fit. */
>> +FDR_NO_OFPROTO, /* The flow didn't have an associated ofproto. 
>> */
>> +FDR_XLATION_ERROR,  /

[ovs-dev] [PATCH v8 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-01-25 Thread Aaron Conole

From: Kevin Sprague 

During normal operations, it is useful to understand when a particular flow
gets removed from the system. This can be useful when debugging performance
issues tied to ofproto flow changes, trying to determine deployed traffic
patterns, or while debugging dynamic systems where ports come and go.

Prior to this change, there was a lack of visibility around flow expiration.
The existing debugging infrastructure could tell us when a flow was added to
the datapath, but not when it was removed or why.

This change introduces a USDT probe at the point where the revalidator
determines that the flow should be removed.  Additionally, we track the
reason for the flow eviction and provide that information as well.  With
this change, we can track the complete flow lifecycle for the netlink
datapath by hooking the upcall tracepoint in kernel, the flow put USDT, and
the revaldiator USDT, letting us watch as flows are added and removed from
the kernel datapath.

This change only enables this information via USDT probe, so it won't be
possible to access this information any other way (see:
Documentation/topics/usdt-probes.rst).

Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
which serves as a demonstration of how the new USDT probe might be used
going forward.

Signed-off-by: Kevin Sprague 
Co-authored-by: Aaron Conole 
Signed-off-by: Aaron Conole 
---
 Documentation/topics/usdt-probes.rst |   1 +
 ofproto/ofproto-dpif-upcall.c|  42 +-
 utilities/automake.mk|   3 +
 utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
 4 files changed, 693 insertions(+), 6 deletions(-)
 create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py

diff --git a/Documentation/topics/usdt-probes.rst 
b/Documentation/topics/usdt-probes.rst
index e527f43bab..a8da9bb1f7 100644
--- a/Documentation/topics/usdt-probes.rst
+++ b/Documentation/topics/usdt-probes.rst
@@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
 - dpif_recv:recv_upcall
 - main:poll_block
 - main:run_start
+- revalidate:flow_result
 - revalidate_ukey\_\_:entry
 - revalidate_ukey\_\_:exit
 - udpif_revalidator:start_dump
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index b5cbeed878..97d75833f7 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -269,6 +269,18 @@ enum ukey_state {
 };
 #define N_UKEY_STATES (UKEY_DELETED + 1)
 
+enum flow_del_reason {
+FDR_REVALIDATE = 0, /* The flow was revalidated. */
+FDR_FLOW_IDLE,  /* The flow went unused and was deleted. */
+FDR_TOO_EXPENSIVE,  /* The flow was too expensive to revalidate. */
+FDR_FLOW_WILDCARDED,/* The flow needed a narrower wildcard mask. */
+FDR_BAD_ODP_FIT,/* The flow had a bad ODP flow fit. */
+FDR_NO_OFPROTO, /* The flow didn't have an associated ofproto. */
+FDR_XLATION_ERROR,  /* There was an error translating the flow. */
+FDR_AVOID_CACHING,  /* Flow deleted to avoid caching. */
+FDR_FLOW_LIMIT, /* All flows being killed. */
+};
+
 /* 'udpif_key's are responsible for tracking the little bit of state udpif
  * needs to do flow expiration which can't be pulled directly from the
  * datapath.  They may be created by any handler or revalidator thread at any
@@ -2272,7 +2284,8 @@ populate_xcache(struct udpif *udpif, struct udpif_key 
*ukey,
 static enum reval_result
 revalidate_ukey__(struct udpif *udpif, const struct udpif_key *ukey,
   uint16_t tcp_flags, struct ofpbuf *odp_actions,
-  struct recirc_refs *recircs, struct xlate_cache *xcache)
+  struct recirc_refs *recircs, struct xlate_cache *xcache,
+  enum flow_del_reason *reason)
 {
 struct xlate_out *xoutp;
 struct netflow *netflow;
@@ -2293,11 +2306,13 @@ revalidate_ukey__(struct udpif *udpif, const struct 
udpif_key *ukey,
 netflow = NULL;
 
 if (xlate_ukey(udpif, ukey, tcp_flags, )) {
+*reason = FDR_XLATION_ERROR;
 goto exit;
 }
 xoutp = 
 
 if (xoutp->avoid_caching) {
+*reason = FDR_AVOID_CACHING;
 goto exit;
 }
 
@@ -2311,6 +2326,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
udpif_key *ukey,
 ofpbuf_clear(odp_actions);
 
 if (!ofproto) {
+*reason = FDR_NO_OFPROTO;
 goto exit;
 }
 
@@ -2322,6 +2338,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
udpif_key *ukey,
 if (odp_flow_key_to_mask(ukey->mask, ukey->mask_len, _mask, ,
  NULL)
 == ODP_FIT_ERROR) {
+*reason = FDR_BAD_ODP_FIT;
 goto exit;
 }
 
@@ -2331,6 +2348,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
udpif_key *ukey,
  * down.  Note that we do not know if the datapath has ignored any of the
  * wildcarded bits, so we may be overly conserva

[ovs-dev] [PATCH v8 0/2] debugging: Add a revalidator probe, and monitor script

2024-01-25 Thread Aaron Conole

Resurrecting a feature from 2022, introduce a probe that indicates
why a particular flow may be selected for eviction during revalidation
and includes the flow information.

The second patch tells fedora builds to include the USDT probe support
on Fedora systems.

Aaron Conole (1):
  rhel: Enable USDT scripts by default in Fedora builds.

Kevin Sprague (1):
  revalidator: Add a USDT probe during flow deletion with purge reason.

 Documentation/topics/usdt-probes.rst |   1 +
 ofproto/ofproto-dpif-upcall.c|  42 +-
 rhel/openvswitch-fedora.spec.in  |   8 +
 utilities/automake.mk|   3 +
 utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
 5 files changed, 701 insertions(+), 6 deletions(-)
 create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py

-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v8 2/2] rhel: Enable USDT scripts by default in Fedora builds.

2024-01-25 Thread Aaron Conole

All supported versions of Fedora do package libbpf, so it
makes sense to enable USDT support.

Signed-off-by: Aaron Conole 
---
v8: Include the correct devel package as a dependency

 rhel/openvswitch-fedora.spec.in | 8 
 1 file changed, 8 insertions(+)

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 5d24ebcda8..b97f509cae 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -28,6 +28,8 @@
 %bcond_with dpdk
 # To disable AF_XDP support, specify '--without afxdp' when building
 %bcond_without afxdp
+# To control the USDT support
+%bcond_without usdt
 
 # If there is a need to automatically enable the package after installation,
 # specify the "--with autoenable"
@@ -77,6 +79,9 @@ Provides: %{name}-dpdk = %{version}-%{release}
 %if %{with afxdp}
 BuildRequires: libxdp-devel libbpf-devel numactl-devel
 %endif
+%if %{with afxdp}
+BuildRequires: libbpf-devel systemtap-sdt-devel
+%endif
 BuildRequires: unbound unbound-devel
 
 Requires: openssl hostname iproute module-init-tools unbound
@@ -173,6 +178,9 @@ This package provides IPsec tunneling support for OVS 
tunnels.
 --enable-afxdp \
 %else
 --disable-afxdp \
+%endif
+%if %{with usdt}
+--enable-usdt-probes \
 %endif
 --enable-ssl \
 --disable-static \
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v7 2/2] rhel: Enable USDT scripts by default in Fedora builds

2024-01-25 Thread Aaron Conole

All supported versions of Fedora do package libbpf, so it
makes sense to enable USDT support.

Signed-off-by: Aaron Conole 
---
 rhel/openvswitch-fedora.spec.in | 5 +
 1 file changed, 5 insertions(+)

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 5d24ebcda8..3d9afeac59 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -28,6 +28,8 @@
 %bcond_with dpdk
 # To disable AF_XDP support, specify '--without afxdp' when building
 %bcond_without afxdp
+# To control the USDT support
+%bcond_without usdt
 
 # If there is a need to automatically enable the package after installation,
 # specify the "--with autoenable"
@@ -173,6 +175,9 @@ This package provides IPsec tunneling support for OVS 
tunnels.
 --enable-afxdp \
 %else
 --disable-afxdp \
+%endif
+%if %{with usdt}
+--enable-usdt-probes \
 %endif
 --enable-ssl \
 --disable-static \
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v7 0/2] debugging: Add a revalidator probe, and monitor script

2024-01-25 Thread Aaron Conole

Resurrecting a feature from 2022, introduce a probe that indicates
why a particular flow may be selected for eviction during revalidation
and includes the flow information.

The second patch tells fedora builds to include the USDT probe support
on Fedora systems.

Aaron Conole (1):
  rhel: Enable USDT scripts by default in Fedora builds

Kevin Sprague (1):
  revalidator: Add a USDT probe during flow deletion with purge reason.

 Documentation/topics/usdt-probes.rst |   1 +
 ofproto/ofproto-dpif-upcall.c|  42 +-
 rhel/openvswitch-fedora.spec.in  |   5 +
 utilities/automake.mk|   3 +
 utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
 5 files changed, 698 insertions(+), 6 deletions(-)
 create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py

-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v7 1/2] revalidator: Add a USDT probe during flow deletion with purge reason.

2024-01-25 Thread Aaron Conole

From: Kevin Sprague 

During normal operations, it is useful to understand when a particular flow
gets removed from the system. This can be useful when debugging performance
issues tied to ofproto flow changes, trying to determine deployed traffic
patterns, or while debugging dynamic systems where ports come and go.

Prior to this change, there was a lack of visibility around flow expiration.
The existing debugging infrastructure could tell us when a flow was added to
the datapath, but not when it was removed or why.

This change introduces a USDT probe at the point where the revalidator
determines that the flow should be removed.  Additionally, we track the
reason for the flow eviction and provide that information as well.  With
this change, we can track the complete flow lifecycle for the netlink
datapath by hooking the upcall tracepoint in kernel, the flow put USDT, and
the revaldiator USDT, letting us watch as flows are added and removed from
the kernel datapath.

This change only enables this information via USDT probe, so it won't be
possible to access this information any other way (see:
Documentation/topics/usdt-probes.rst).

Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
which serves as a demonstration of how the new USDT probe might be used
going forward.

Signed-off-by: Kevin Sprague 
Co-authored-by: Aaron Conole 
Signed-off-by: Aaron Conole 
---
 Documentation/topics/usdt-probes.rst |   1 +
 ofproto/ofproto-dpif-upcall.c|  42 +-
 utilities/automake.mk|   3 +
 utilities/usdt-scripts/flow_reval_monitor.py | 653 +++
 4 files changed, 693 insertions(+), 6 deletions(-)
 create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py

diff --git a/Documentation/topics/usdt-probes.rst 
b/Documentation/topics/usdt-probes.rst
index e527f43bab..a8da9bb1f7 100644
--- a/Documentation/topics/usdt-probes.rst
+++ b/Documentation/topics/usdt-probes.rst
@@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
 - dpif_recv:recv_upcall
 - main:poll_block
 - main:run_start
+- revalidate:flow_result
 - revalidate_ukey\_\_:entry
 - revalidate_ukey\_\_:exit
 - udpif_revalidator:start_dump
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index b5cbeed878..97d75833f7 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -269,6 +269,18 @@ enum ukey_state {
 };
 #define N_UKEY_STATES (UKEY_DELETED + 1)
 
+enum flow_del_reason {
+FDR_REVALIDATE = 0, /* The flow was revalidated. */
+FDR_FLOW_IDLE,  /* The flow went unused and was deleted. */
+FDR_TOO_EXPENSIVE,  /* The flow was too expensive to revalidate. */
+FDR_FLOW_WILDCARDED,/* The flow needed a narrower wildcard mask. */
+FDR_BAD_ODP_FIT,/* The flow had a bad ODP flow fit. */
+FDR_NO_OFPROTO, /* The flow didn't have an associated ofproto. */
+FDR_XLATION_ERROR,  /* There was an error translating the flow. */
+FDR_AVOID_CACHING,  /* Flow deleted to avoid caching. */
+FDR_FLOW_LIMIT, /* All flows being killed. */
+};
+
 /* 'udpif_key's are responsible for tracking the little bit of state udpif
  * needs to do flow expiration which can't be pulled directly from the
  * datapath.  They may be created by any handler or revalidator thread at any
@@ -2272,7 +2284,8 @@ populate_xcache(struct udpif *udpif, struct udpif_key 
*ukey,
 static enum reval_result
 revalidate_ukey__(struct udpif *udpif, const struct udpif_key *ukey,
   uint16_t tcp_flags, struct ofpbuf *odp_actions,
-  struct recirc_refs *recircs, struct xlate_cache *xcache)
+  struct recirc_refs *recircs, struct xlate_cache *xcache,
+  enum flow_del_reason *reason)
 {
 struct xlate_out *xoutp;
 struct netflow *netflow;
@@ -2293,11 +2306,13 @@ revalidate_ukey__(struct udpif *udpif, const struct 
udpif_key *ukey,
 netflow = NULL;
 
 if (xlate_ukey(udpif, ukey, tcp_flags, )) {
+*reason = FDR_XLATION_ERROR;
 goto exit;
 }
 xoutp = 
 
 if (xoutp->avoid_caching) {
+*reason = FDR_AVOID_CACHING;
 goto exit;
 }
 
@@ -2311,6 +2326,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
udpif_key *ukey,
 ofpbuf_clear(odp_actions);
 
 if (!ofproto) {
+*reason = FDR_NO_OFPROTO;
 goto exit;
 }
 
@@ -2322,6 +2338,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
udpif_key *ukey,
 if (odp_flow_key_to_mask(ukey->mask, ukey->mask_len, _mask, ,
  NULL)
 == ODP_FIT_ERROR) {
+*reason = FDR_BAD_ODP_FIT;
 goto exit;
 }
 
@@ -2331,6 +2348,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
udpif_key *ukey,
  * down.  Note that we do not know if the datapath has ignored any of the
  * wildcarded bits, so we may be overly conserva

Re: [ovs-dev] [PATCH] dp-packet: Reset offload flags when clearing a packet.

2024-01-24 Thread Aaron Conole

Dumitru Ceara  writes:

> On 1/23/24 00:11, Mike Pattrick wrote:
>> The OVN test suite identified a bug in dp_packet_ol_send_prepare() where
>> a BFD packet flagged as double encapsulated would trigger a seg fault.
>> The problem surfaced because bfd_put_packet was reusing a packet
>> allocated on the stack that wasn't having its flags reset between calls.
>> 
>
> Thanks for tracking this one down, Mike!
>
>> This change will reset OL flags in data_clear(), which should fix this
>> type of packet reuse issue in general as long as data_clear() is called
>> in between uses. This change also includes a tangentially related check
>> in dp_packet_inner_l4_size(), where the correct offset was not being
>> checked.
>
> Up to maintainers but should this tangential fix be a separate patch?

I think it makes sense.  These are two different issues, so probably
should go as two separate patches.

>> 
>> Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
>> Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
>> Reported-by: Dumitru Ceara 
>> Reported-at: https://issues.redhat.com/browse/FDP-300
>> Signed-off-by: Mike Pattrick 
>> ---
>
> I'm no expert in this code but the change itself looks correct to me and
> OVN tests pass with this applied:
>
> Reviewed-by: Dumitru Ceara 
>
>>  lib/dp-packet.h | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>> index 939bec5c8..f328a6637 100644
>> --- a/lib/dp-packet.h
>> +++ b/lib/dp-packet.h
>> @@ -207,6 +207,7 @@ void *dp_packet_resize_l2(struct dp_packet *, int 
>> increment);
>>  void *dp_packet_resize_l2_5(struct dp_packet *, int increment);
>>  static inline void *dp_packet_eth(const struct dp_packet *);
>>  static inline void dp_packet_reset_offsets(struct dp_packet *);
>> +static inline void dp_packet_reset_offload(struct dp_packet *);
>>  static inline uint16_t dp_packet_l2_pad_size(const struct dp_packet *);
>>  static inline void dp_packet_set_l2_pad_size(struct dp_packet *, uint16_t);
>>  static inline void *dp_packet_l2_5(const struct dp_packet *);
>> @@ -380,6 +381,7 @@ dp_packet_clear(struct dp_packet *b)
>>  {
>>  dp_packet_set_data(b, dp_packet_base(b));
>>  dp_packet_set_size(b, 0);
>> +dp_packet_reset_offload(b);
>>  }
>>  
>>  /* Removes 'size' bytes from the head end of 'b', which must contain at 
>> least
>> @@ -537,7 +539,7 @@ dp_packet_inner_l4(const struct dp_packet *b)
>>  static inline size_t
>>  dp_packet_inner_l4_size(const struct dp_packet *b)
>>  {
>> -return OVS_LIKELY(b->l4_ofs != UINT16_MAX)
>> +return OVS_LIKELY(b->inner_l4_ofs != UINT16_MAX)
>> ? (const char *) dp_packet_tail(b)
>> - (const char *) dp_packet_inner_l4(b)
>> - dp_packet_l2_pad_size(b)
>
> Regards,
> Dumitru
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH ovn] checkpatch.py: Port checkpatch related changes from the OVS repo.

2024-01-19 Thread Aaron Conole

Dumitru Ceara  writes:

> On 1/17/24 21:40, Mark Michelson wrote:
>> On 1/16/24 11:23, Numan Siddique wrote:
>>> On Tue, Jan 16, 2024 at 8:09 AM Dumitru Ceara  wrote:

 This picks up the following OVS changes:
    00d3d4a7d375 ("checkpatch: Avoid catastrophic backtracking.")
    d25c6bd8df37 ("checkpatch: Reorganize flagged words using a list.")
    9a50170a805a ("checkpatch: Add suggestions to the spell checker.")
    799f697e51ec ("checkpatch: Print subject field if misspelled or
 missing.")
    1b8fa4a66aa4 ("checkpatch: Add checks for the subject line.")
    bf843fd439b2 ("checkpatch: Don't spell check Fixes tag.")
    74bfe3701407 ("checkpatch: Add argument to skip committer signoff
 check.")
    21c61243fb75 ("checkpatch: Fix personal word list storage.")
    915b97971d58 ("checkpatch.py: Load codespell dictionary.")

 Signed-off-by: Dumitru Ceara 
>>>
>>> Thanks for the patch.
>>>
>>> Acked-by: Numan Siddique 
>>>
>>> Numan
>> 
>> Thank you Dumitru and Numan.
>> 
>> I merged this to main and all OVN branches back to 22.12.
>> 
>
> Thanks, Numan and Mark!
>
>> I'm curious about something. Since we have the OVS submodule, and there
>> are no OVN-specific additions to checkpatch (that I'm aware of anyway),
>> would it make sense to remove the checkpatch from OVN and just rely on
>> the OVS submodule's checkpatch? This way, whenever we do a submodule
>> bump, we'd automatically pull in the checkpatch modifications.
>> 
>> What do you think?
>> 
>
> +1 I think we do have one OVN-specific change to checkpatch but I'm sure
> we can figure out how to have it ported to the OVS one if needed:
>
> https://github.com/ovn-org/ovn/commit/04a5527c6a8657b682450f71f7ee681512557b36
>
> On second thought maybe we don't even need this one ported.  I think OVS
> has its own mitigation in place.

Yep - exactly.

> Another thing that took me by surprise is that 0-day bot always uses the
> OVS checkpatch script (that's why it didn't choke on Numan's I-P patch
> like OVN checkpatch did for me locally) - cc-ing Aaron.

It does?  I thought it should be using the checkpatch that is included in
the OVN repo.  That sounds like a bug.  But if you decide to switch to
using the OVS one then maybe it becomes a feature. :)

> Regards,
> Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 1/3] ofproto-dpif-upcall: Change flow dump duration message to WARN level.

2024-01-16 Thread Aaron Conole

Eelco Chaudron  writes:

> On 16 Jan 2024, at 0:24, Ilya Maximets wrote:
>
>> On 1/10/24 12:25, Eelco Chaudron wrote:
>>> Currently the 'Spent an unreasonably long Xms dumping flows' message
>>> is set to the INFO level. However, based on this, we are also
>>> drastically limiting the number of flows in the datapath, and this
>>> would warrant a WARNING level.
>>>
>>> Acked-by: Simon Horman 
>>> Signed-off-by: Eelco Chaudron 
>>> ---
>>>  ofproto/ofproto-dpif-upcall.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>>> index cc10f57b5..cd71e3ee3 100644
>>> --- a/ofproto/ofproto-dpif-upcall.c
>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>> @@ -1049,7 +1049,7 @@ udpif_revalidator(void *arg)
>>>  atomic_store_relaxed(>flow_limit, flow_limit);
>>>
>>>  if (duration > 2000) {
>>> -VLOG_INFO("Spent an unreasonably long %lldms dumping 
>>> flows",
>>> +VLOG_WARN("Spent an unreasonably long %lldms dumping 
>>> flows",
>>>duration);
>>>  }
>>>
>>
>> Hmm.  Interestingly, this is causing random test failures
>> when a large time warp is happening during revalidation.
>>
>> For example I saw this test failing in CI:
>>   drop-stats.at:132: testing drop-stats - stack too deep
>>
>> Because of:
>>
>> 2024-01-15T23:11:09.413Z|1|ofproto_dpif_upcall(revalidator6)|WARN|Spent
>> an unreasonably long 5000ms dumping flows
>>
>> We probably need to adjust the tests to avoid large time
>> warping where possible or exclude the warning from checking.
>
> ACK will try to look at this today to minimise the impact.

Thanks - I never saw it on my system (but maybe that was the failure
that required the retest on this series for the robot).

> //Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 3/3] timeval: Add coverage counter for long poll interval events.

2024-01-15 Thread Aaron Conole

Eelco Chaudron  writes:

> Martin Kennelly observes that even though this data is available to
> humans via the journal/log files, these aren't exactly easy for a
> developer to make any kind of behavioral inferences.  This kind of
> log and counter would be useful when checking on system health to
> let us know that an Open vSwitch component is noticing some kind of
> system level hiccup.
>
> Add a new coverage counter to track information on these events, and
> let a developer or system engineer know how long these events have
> occurred with some historical context.
>
> Reported-at: 
> https://lists.linuxfoundation.org/pipermail/ovs-discuss/2023-June/052523.html
> Suggested-by: Martin Kennelly 
> Co-Authored-By: Aaron Conole 
> Signed-off-by: Aaron Conole 
> Signed-off-by: Eelco Chaudron 
> Acked-by: Simon Horman 
> ---

Thanks, applied.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 2/3] ofproto-dpif-upcall: Add flow_limit coverage counters.

2024-01-15 Thread Aaron Conole

Eelco Chaudron  writes:

> Add new coverage counters that might help debugging flow_limit
> related issues.
>
> Signed-off-by: Eelco Chaudron 
> ---

Thanks, applied.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 1/3] ofproto-dpif-upcall: Change flow dump duration message to WARN level.

2024-01-15 Thread Aaron Conole

Eelco Chaudron  writes:

> Currently the 'Spent an unreasonably long Xms dumping flows' message
> is set to the INFO level. However, based on this, we are also
> drastically limiting the number of flows in the datapath, and this
> would warrant a WARNING level.
>
> Acked-by: Simon Horman 
> Signed-off-by: Eelco Chaudron 
> ---

Thanks, applied.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v5 3/3] system-traffic.at: Test conntrack + FTP server running on a non-standard port.

2024-01-10 Thread Aaron Conole

Viacheslav Galaktionov via dev  writes:

> All existing test iterations assume that the FTP server is running on a
> standard port, which may not always be the case. These tests helped find
> problems in conntrack alg processing with non-standard ports.
>
> Perform the necessary adjustments to ensure the test suite can start the
> L7 server on a user-provided port.
>
> Signed-off-by: Viacheslav Galaktionov 
> ---

Thanks, applied.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v5 2/3] conntrack: Use helpers from committed connections.

2024-01-10 Thread Aaron Conole

Viacheslav Galaktionov via dev  writes:

> When a packet hits a flow rule without an explicitly specified helper,
> OvS has to rely on automatic application layer gateway detection to
> find related connections. This works as long as services are running on
> their standard ports, e.g. when FTP servers use TCP port 21.
>
> However, sometimes it's necessary to run services on non-standard ports.
> In that case, there is no way for OvS to guess which protocol is used
> within a given flow. Of course, this means that no related connections
> can be recognized.
>
> When a connection is committed with a particular helper, it's reasonable
> to assume this helper will be used in subsequent CT actions, as long as
> they don't override it. Achieve this behaviour by using the committed
> connection's helper when a flow rule does not specify one.
>
> Signed-off-by: Viacheslav Galaktionov 
> Acked-by: Ivan Malov 
> ---

Thanks, applied.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v5 1/3] lib/conntrack: Only use given packet in protocol detection.

2024-01-10 Thread Aaron Conole

Viacheslav Galaktionov via dev  writes:

> The current protocol detection logic relies on two pieces of metadata
> passed as arguments: tp_src and tp_dst, which represent the L4 source
> and destination port numbers from the flow that triggered the current
> flow rule first, and was responsible for creating the current DP flow.
>
> Since multiple network flows of many different kinds, potentially using
> different protocols on all layers, can be processed by one flow rule,
> using the metadata of some unrelated flow might lead to unexpected
> results. For example, ICMP type and code can be interpreted as TCP
> source and destination ports. This can confuse the code responsible for
> the helper selection, leading to errors in traffic handling and
> incorrect detection of related flows.
>
> One of the easiest ways to fix this problem is to simply remove the
> tp_src and tp_dst parameters from the picture. The current code base has
> no good use for them.
>
> The helper selection logic was based on these values and therefore needs
> to be changed. Ensure that the helper specified in a flow rule is used,
> given it is compatible with the L4 protocol of the packet. When a flow
> rule does not specify a helper, one can still be picked using the given
> packet's metadata like TCP/UDP ports.
>
> Signed-off-by: Viacheslav Galaktionov 
> ---

Thanks, applied.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] system-tests: Test openflow matching for ct related packets with SNAT.

2024-01-10 Thread Aaron Conole

Brad Cowie  writes:

> Linux kernel commit ebddb1404900 ("net: move the nat function to
> nf_nat_ovs for ovs and tc") introduced a regression into the kernel
> datapath which prevented the openvswitch match key from being updated
> when nat was undone for packets in the related conntrack state. This
> issue caused these packets (usually ICMP/ICMPv6 error packets) to
> match the wrong openflow rule.
>
> This issue was fixed in linux kernel commit e6345d2824a3 ("netfilter:
> nf_nat: fix action not being set for all ct states").
>
> This test will fail for linux kernel versions v6.2 to v6.6, so test is
> skipped for versions lower than v6.7.
>
> Link: https://lore.kernel.org/netdev/20231221224311.130319-1-b...@faucet.nz/
> Suggested-by: Aaron Conole 
> Signed-off-by: Brad Cowie 
> ---

Thanks, applied.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [CONF] OVS+OVN '23 Fall Conference, Video links on main site

2024-01-09 Thread Aaron Conole

Greetings,

We've updated the links on the OVS+OVN Conference page to the YouTube
playlist that contains the OVS+OVN '23 Fall Conference.  These
recordings include the Q sessions that were conducted after each
presentation is made for most of the sessions (some talks had conflicts
with speakers).

As always, you can find the link to the most recent conference at
http://ovscon.site

Hope you enjoy!

Thanks,
OVS+OVN Conference Team

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 2/2] checkpatch.py: Load codespell dictionary.

2024-01-09 Thread Aaron Conole

Roi Dayan  writes:

> On 18/12/2023 9:45, Roi Dayan wrote:
>> 
>> 
>> On 14/12/2023 15:25, Aaron Conole wrote:
>>> Eelco Chaudron  writes:
>>>
>>>> On 15 Nov 2023, at 3:33, Aaron Conole wrote:
>>>>
>>>>> Eelco Chaudron  writes:
>>>>>
>>>>>> On 14 Nov 2023, at 8:49, Roi Dayan wrote:
>>>>>>
>>>>>>> On 13/11/2023 19:08, Aaron Conole wrote:
>>>>>>>> Roi Dayan  writes:
>>>>>>>>
>>>>>>>>> codespell dictionary contains a list of widely used words
>>>>>>>>> which enchant alone could fail on. for an example:
>>>>>>>>> refcount, pthread, enqueuing, etc.
>>>>>>>>> Load that dictionary, if exists, into enchant spell checker.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Roi Dayan 
>>>>>>>>> ---
>>>>>>>>
>>>>>>>> Thanks for working on this.  Just some nits below and then I think it
>>>>>>>> would be ready for merge.
>>>>>>>>
>>>>>>>>>  utilities/checkpatch.py | 19 +++
>>>>>>>>>  1 file changed, 19 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>>>>>>>> index 2dd02ee6420c..2669eca11108 100755
>>>>>>>>> --- a/utilities/checkpatch.py
>>>>>>>>> +++ b/utilities/checkpatch.py
>>>>>>>>> @@ -39,6 +39,16 @@ spell_check_dict = None
>>>>>>>>>  def open_spell_check_dict():
>>>>>>>>>  import enchant
>>>>>>>>>
>>>>>>>>> +try:
>>>>>>>>> +import codespell_lib
>>>>>>>>> +codespell_dir = os.path.dirname(codespell_lib.__file__)
>>>>>>>>> +codespell_file = os.path.join(codespell_dir, 'data', 
>>>>>>>>> 'dictionary.txt')
>>>>>>>>> +if not os.path.exists(codespell_file):
>>>>>>>>> +codespell_file = ''
>>>>>>>>> +except:
>>>>>>>>> +codespell_file = ''
>>>>>>>>> +
>>>>>>>>> +
>>>>>>>>
>>>>>>>> Looks like there's a flake8 flag here for whitespace.
>>>>>>>>
>>>>>>>> Waiting for Eelco / others to chime in before I take it.
>>>>>>>
>>>>>>> right. I have 2 space lines by mistake. need to remove one space line.
>>>>>>>
>>>>>>>>
>>>>>>>>>  try:
>>>>>>>>>  extra_keywords = ['ovs', 'vswitch', 'vswitchd', 
>>>>>>>>> 'ovs-vswitchd',
>>>>>>>>>'netdev', 'selinux', 'ovs-ctl', 'dpctl', 
>>>>>>>>> 'ofctl',
>>>>>>>>> @@ -91,7 +101,16 @@ def open_spell_check_dict():
>>>>>>>>>'syscall', 'lacp', 'ipf', 'skb', 
>>>>>>>>> 'valgrind']
>>>>>>>>>
>>>>>>>>>  global spell_check_dict
>>>>>>>>> +
>>>>>>>>>  spell_check_dict = enchant.Dict("en_US")
>>>>>>>>> +
>>>>>>>>> +if codespell_file:
>>>>>>>>> +with open(codespell_file) as f:
>>>>>>>>> +for line in f.readlines():
>>>>>>>>> +words = line.strip().split('>')[1].strip(', 
>>>>>>>>> ').split(',')
>>>>>>>>> +for word in words:
>>>>>>>>> +spell_check_dict.add_to_session(word)
>>>>>>>>> +
>>>>>>>>
>>>>>>>> I think the split(',') should also be: split(', ').  I noticed some of
>>>>>>>> the words added have a speace (such as):
>>>>>>>>  ' use'
>>>>>>>
>>>>>>> you are right. with the tempfile I did word.strip() which is not here.
>>>>>>> I missed the space on those words when checking.
>>>>>>
>>>>>> I guess keeping the original single ’,’ split, and adding word.strip() 
>>>>>> would be better, i.e.:
>>>>>>
>>>>>> +for word in words:
>>>>>> +spell_check_dict.add_to_session(word.strip())
>>>>>>
>>>>>>
>>>>>> With the above changes assuming Aaron would apply the patch.
>>>>>>
>>>>>> Acked-by: Eelco Chaudron 
>>>>>
>>>>> Sounds good to me.  I will apply the correct version.
>>>>
>>>> Hi Aaron, just a reminder as this might have slipped your mind ;)
>>>
>>> Sorry - I assumed that there would be a new version posted.  I can apply
>>> with your change and the nits above corrected.  Is that okay Roi?
>>>
>> 
>> hi, yes of course. thanks.
>> 
>
> Hi Aaron, Are you taking this?
> thanks

Yes - applied.  Thanks all.

>
>>>>>>
>>>>>>>>
>>>>>>>>>  for kw in extra_keywords:
>>>>>>>>>  spell_check_dict.add_to_session(kw)
>>>>>>>>
>>>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1989 matches

Mail list logo