On 5/26/23 21:19, Eelco Chaudron wrote:
>
>
> Send from my phone
>
>> Op 26 mei 2023 om 20:52 heeft Ilya Maximets <[email protected]> het
>> volgende geschreven:
>>
>> On 5/26/23 20:43, Ilya Maximets wrote:
>>> On 5/23/23 12:39, Frode Nordahl wrote:
>>>> The tc module combines the use of the `tc_transact` helper
>>>> function for communication with the in-kernel tc infrastructure
>>>> with assertions on the reply data by `ofpbuf_at_assert` on the
>>>> received data prior to further processing.
>>>>
>>>> `tc_transact` in turn calls `nl_transact`, which via
>>>> `nl_transact_multiple__` ultimately calls and handles return
>>>> value from `recvmsg`. On error a check for EAGAIN is performed
>>>> and a consequence of this condition is effectively to provide a
>>>> non-error (0) result and an empty reply buffer.
>>>
>>> Hi, Frode, others.
>>>
>>> I took a closer look at the patch and the code in the netlink-socket.
>>> IIUC, the EAGAIN here is not a result of operation that we're requesting,
>>> it's just a EAGAIN on a non-blocking (MSG_DONTWAIT) netlink socket while
>>> trying to read. The reply to that transaction will arrive eventually and
>>> we will interpret it later as a reply to a different netlink transaction.
>>>
>>> The issue appears to be introduced in the following commit:
>>> 407556ac6c90 ("netlink-socket: Avoid forcing a reply for final message in
>>> a transaction.")
>>>
>>> And it was a performance optimization introduced as part of the set:
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2012-April/260122.html
>>>
>>> The problem is that we can't tell apart socket EAGAIN if there is nothing
>>> to wait (requests didn't need a reply) and EAGAIN if kernel just didn't
>>> manage to reply yet.
>>
>> It's still strance though that reply is delayed. Typically netlink
>> replies are formed right in the request handler. Did you manage to
>> figure out what is causing this in tc specifically? It wasn't an issue
>> for an OVS and other common families for many years.
>>
>
> Replying from my phone so I did not look at any code. However when I looked
> at the code during the review I noticed it will ignore earlier (none
> processed messages) replies based in the receive loop on the transaction ID.
> So I do not think it should be an issue of messages getting out of sync.
That makes sense, OK.
>
> I assumed the delay happens when we request something to a hardware offload
> drive which is replying async due to it being busy.
I'm not sure it is actually possible for netlink replies to be delivered
asynchronously. Looking at the kernel code, by the time the original
request is processed, the reply should already be sent.
A simpler explanation, would be that it just doesn't reply for some reason
to a request that needs a reply. e.g. some incorrect error handling in the
kernel that leaves request unanswered.
One potential issue I see in the tc_new_tfilter() implementation is that
if tfilter_notify() will fail for any reason, the function will still
return 0, because tc_new_tfilter() doesn't check the result. And no reply
will be generated, because the return value is zero (success) and the
rtnetlink_send() was never called due to an error in the tfilter_notify().
I'm not sure if that's what happening here, but it is one option how we
end up with a successful netlink transaction with NLM_F_ECHO set, but no
reply on the socket.
A potential fix would be (not tested):
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2621550bfddc..9c334dbf9c5f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2311,7 +2311,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct
nlmsghdr *n,
err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
flags, extack);
if (err == 0) {
- tfilter_notify(net, skb, n, tp, block, q, parent, fh,
+ err = tfilter_notify(net, skb, n, tp, block, q, parent, fh,
RTM_NEWTFILTER, false, rtnl_held, extack);
tfilter_put(tp, fh);
/* q pointer is NULL for shared blocks */
---
Solution on the OVS side might be to fail transactions that requested a
reply, but didn't get one. Async replies sound like something that should
not be allowed and likely is not allowed.
Best regards, Ilya Maximets.
>
> Maybe Frode can get some perf traces to confirm this.
>
> //Eelco
>>>
>>> The real solution would be to revert commit 407556ac6c90 or be a bit
>>> smarter and wait for reply only on requests that specify a reply buffer.
>>>
>>> Or am I missing something?
>>>
>>> Best regards, Ilya Maximets.
>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev