On 5/12/26 2:11 PM, Eelco Chaudron wrote:
> 
> 
> On 11 May 2026, at 16:35, Tim Rozet wrote:
> 
>> Hey Eelco,
>> The issue is I still have to refer to a show output to get the flow
>> information to figure out what the error was. I would prefer to have
>> that information in the log itself, so that I can root cause without
>> having to try to reproduce and catch the error later in a live system.
>> The bug where we hit this before was one Ilya helped me RCA as
>> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/.
>> We are hitting other instances of this error with the previous bug fixed,
>> and we have not caught the flow yet. Having this output in the log would
>> be a big help.
> 
> I understand that it would be more convenient to get all the output. I'm
> worried that we might introduce additional error messages that we did not
> see in the past, potentially worrying users for something that is not
> actually an issue, for example EEXIST.
> 
> We could use log_flow_put_message() to replace the TC-specific message in
> order to avoid the above. We might need to pass a name rather than dpif to
> the function, where in the TC case the name could be the netdev name.

The EEXIST should not be a problem as it is explicitly handled inside the
log_flow_put_message() to be at debug level.  So, there should not be a
difference between when this message and the one inside the TC-specific
code is shown.  So, there should be no errors logged here unless we already
have an error message from the tc-offload level.

> I have copied in Ilya for his thoughts, as I believe he has had concerns
> around error messages/codes in general, if I remember correctly.

I think, my initial concern was about negative error code handling, i.e.
that we could've had some strange negative codes reported from the offload
implementation.  But on a closer look it seems like the only negative case
is the -1 set right inside the dpif_offload_operate(), so it should be fine.

The error handling in this code is a bit awkward though even without this
patch.

For the code, I'd suggest defining a new variable inside the case or at
the top of the loop, do the error code calculation inside the case and use
the result in the function call instead of writing a big ternary as an
argument.

Best regards, Ilya Maximets.

> 
> Cheers,
> 
> Eelco
> 
>> Thanks,
>> -Tim
>> On 17 Apr 2026, at 18:28, Tim Rozet via dev wrote:
>>
>>> When there are offload errors we see error messages that only include
>>> the netdev such as:
>>>
>>> 2026-03-19T19:42:43.103Z|03122|dpif_netlink(handler367)|ERR|failed to 
>>> offload flow: Invalid argument: ovn-d2586f-0
>>>
>>> This error message lacks the flow information in order to debug why the
>>> flow failed to be offloaded. This then requires a user to go reproduce
>>> the problem and turn on debug logging to try to correlate the error to a
>>> flow.
>>>
>>> Pass the provider error to log_flow_put_message() for flow put failures,
>>> while keeping ENOSPC on the existing debug-only path. This makes the
>>> offload layer emit the failed flow match/actions for cases like EINVAL,
>>> so root-causing TC offload failures does not require enabling debug
>>> logging.
>>
>> Hi Tim,
>>
>> Sorry for the late response, got caught up with patches from others.
>>
>> Looking at the change, I am not sure gating on specific error codes is
>> the right approach here. Would it be an option instead to include the
>> ufid in the existing log message? The one you quoted above and the
>> delete one a few lines lower?
>>
>> With the ufid present, the failing flow can be looked up in the flow
>> dump output without needing to change the log level or include the
>> full match and actions inline.
>>
>> What do you think?
>>
>> //Eelco
>>
>> [...]
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to