On 10/27/23 23:16, Stéphane Graber via dev wrote:
> Hello,
> 
> I'm currently working on re-enabling our daily OVN tests in Incus (the
> LXD fork).
> 
> Unfortunately I'm not having much luck getting our testsuite to go all
> the way through as it's triggering kernel panics.
> 
> Here is the stack trace I'm getting:
> ```

<snip>

> ```
> 
> That kernel build is effectively a clean 6.5.9 kernel.
> The action immediately preceding the kernel panic is the instance
> being forcefully stopped, making the last command to run prior to
> panic be `ip link del vethXYZ`.

Hi, Stéphane.  Thanks for the report!
This is interesting.  It looks like for some reason the revalidator
is generating a datapath flow with 60+ nested actions, which is
unusual and should not really happen in a normal setup.

> 
> I can reproduce this panic very consistently, though can't easily
> isolate the particular configuration needed in order for this to
> trigger.
> For example, once the machine is rebooted after the panic, I can
> start/stop those instances at will, without any kernel panic.

It would be really helpful if you could somehow intercept the netlink
message the revalidator is sending before the kernel dies.  I
understand, though, that it might be challenging.  Debug logs in
dpif_operate() are printed after the operation, so we can't actually
use them unless you modify the sources and move log_flow_put_message()
before the dpif->dpif_class->operate() call.
One thing that does happen before the operation is executed is a USDT
probe.  If you have USDT probes enabled during the build, you should
be able to capture the dpif_netlink_operate__:op_flow_put request
arguments this way before the request goes to the kernel.  Some info
about USDT probes:
  https://docs.openvswitch.org/en/latest/topics/usdt-probes/

On the other hand, the kernel should likely have a nesting limit to
avoid crashing on user requests. :)  We have a MAX_ODP_NESTED limit
for user-provided datapath flows, which is equal to 32.  So, it might
be a sane value to use for the kernel action parsing as well.
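
To illustrate the idea, a depth-limited walk over nested actions could
look roughly like the sketch below.  This is a simplified model, not
the kernel's netlink attribute parser: struct action, build_chain(),
actions_depth_ok() and MAX_NESTED_DEPTH are hypothetical names made up
for this example, and only the value 32 is taken from MAX_ODP_NESTED.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit; the value mirrors MAX_ODP_NESTED (32). */
#define MAX_NESTED_DEPTH 32

/* Simplified stand-in for a datapath action (assumption, not the
 * real netlink layout): an action may carry a nested action list. */
struct action {
    const struct action *nested;  /* First nested action, or NULL. */
    const struct action *next;    /* Next action in the same list. */
};

/* Returns false as soon as a list is found deeper than
 * MAX_NESTED_DEPTH levels below the top-level list (depth 0). */
static bool
actions_depth_ok(const struct action *list, int depth)
{
    if (depth > MAX_NESTED_DEPTH) {
        return false;
    }
    for (const struct action *a = list; a; a = a->next) {
        if (a->nested && !actions_depth_ok(a->nested, depth + 1)) {
            return false;
        }
    }
    return true;
}

/* Test helper: links buf[0..n-1] so that each action nests the next. */
static void
build_chain(struct action *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        buf[i].nested = (i + 1 < n) ? &buf[i + 1] : NULL;
        buf[i].next = NULL;
    }
}
```

With a check like this, the recursion can never go more than
MAX_NESTED_DEPTH + 1 frames deep, so a chain of 60+ nested actions
would be rejected up front instead of being walked.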

But we should still figure out why OVS generates such a flow in the
first place, as it doesn't sound right.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
