On 10/27/25 19:40, Ilya Maximets wrote:
On 10/24/25 4:28 PM, Frode Nordahl wrote:
The commit in the fixes tag extended struct dpif_execute with an
upcall_pid member.

This member was left uninitialized in dpif_execute_helper_cb(),
leading to Open vSwitch providing the kernel with a PID for which
it does not have a listener.

This condition is caught by one of the existing system-traffic
tests which is extended to explicitly check for presence of lost
upcalls in the datapath.

Fixes: 0d9dc8e9ca4a ("dpif-netlink: Provide original upcall pid in 'execute' 
commands.")
Reported-at: https://launchpad.net/bugs/2127061
Signed-off-by: Frode Nordahl <[email protected]>
---
  lib/dpif.c              | 1 +
  tests/system-traffic.at | 2 ++
  2 files changed, 3 insertions(+)

Good catch, Frode!  Thanks for the patch.

Thank you for taking the time to review, Ilya, much appreciated!


Wanted to note that retis was instrumental in tracking down the
root cause if this issue, very useful tool!

I also wanted to note that I spent considerable amonut of time
trying to figure out a low touch way of creating a specifc test
for this, but did not come up with any good alternatives.

I appear to somehow have missed that an existing system test
catches the condition, so I went ahead and expanded on that
in the end.

Hmm.  I can't reproduce the problem with this particular test.
There should be no upcalls after the packet is re-injected via
the 'execute' request.  And also the pcap file check should be
failing if the packet was dropped.  Could you explain a bit more
what you see in your test run?

I identified this particular test as interesting by running the entire testsuite with retis and looking for evidence of a -111 return for upcalls in the kernel code.

Rechecking this locally I see a difference in behavior depending on which compiler is used to build Open vSwitch, specifically I see the condition when compiling with gcc 14.2.0.

Recompiling with gcc 15.2.0 on otherwise identical system makes the test pass. I can only guess there is a difference in behavior for how the uninitialized value is treated.

In any case, we cannot rely on compiler behavior for our test case, so I guess we need to find some other way of testing this. Any suggestions welcome :-)


diff --git a/lib/dpif.c b/lib/dpif.c
index 070fc0131..a062a4179 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -1240,6 +1240,7 @@ dpif_execute_helper_cb(void *aux_, struct dp_packet_batch 
*packets_,
          execute.probe = false;
          execute.mtu = 0;
          execute.hash = 0;
+        execute.upcall_pid = 0;

Thinking a bit more about the problem, I think, we should be
getting the original upcall pid for this request.  We have it
in the 'execute' structure before we're trying to execute with
help.  We can store the value in the aux data and set it here.
The same as we do for the 'flow'.  This will ensure that we're
keeping the upcall pid regardless if the execution is going
direct to kernel or some actions are executed in the userspace
first.

Ack, I'll address this in a v2.

--
Frode Nordahl

          aux->error = dpif_execute(aux->dpif, &execute);
          log_execute_message(aux->dpif, &this_module, &execute,
                              true, aux->error);
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 0c41c39ff..50a45383f 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -3061,6 +3061,8 @@ NS_CHECK_EXEC([at_ns0], [$PYTHON3 $srcdir/sendpkt.py p0 \
      "$(ovs-ofctl compose-packet --bare 'ICMP_PKT')"],
    [0], [ignore])

+AT_CHECK([ovs-appctl dpctl/show | grep -q lookups.*lost:0])
+
  dnl Check the expected decapsulated on the egress interface.
  OVS_WAIT_UNTIL([ovs-pcap p1.pcap | grep -q \
      "^$(ovs-ofctl compose-packet --bare 'ICMP_PKT')\$"])

Best regards, Ilya Maximets.



_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to