On 5/26/23 22:07, Ilya Maximets wrote: > CC: Marcelo. > > On 5/26/23 22:02, Ilya Maximets wrote: >> On 5/26/23 21:19, Eelco Chaudron wrote: >>> >>> >>> Send from my phone >>> >>>> Op 26 mei 2023 om 20:52 heeft Ilya Maximets <[email protected]> het >>>> volgende geschreven: >>>> >>>> On 5/26/23 20:43, Ilya Maximets wrote: >>>>> On 5/23/23 12:39, Frode Nordahl wrote: >>>>>> The tc module combines the use of the `tc_transact` helper >>>>>> function for communication with the in-kernel tc infrastructure >>>>>> with assertions on the reply data by `ofpbuf_at_assert` on the >>>>>> received data prior to further processing. >>>>>> >>>>>> `tc_transact` in turn calls `nl_transact`, which via >>>>>> `nl_transact_multiple__` ultimately calls and handles return >>>>>> value from `recvmsg`. On error a check for EAGAIN is performed >>>>>> and a consequence of this condition is effectively to provide a >>>>>> non-error (0) result and an empty reply buffer. >>>>> >>>>> Hi, Frode, others. >>>>> >>>>> I took a closer look at the patch and the code in the netlink-socket. >>>>> IIUC, the EAGAIN here is not a result of operation that we're requesting, >>>>> it's just a EAGAIN on a non-blocking (MSG_DONTWAIT) netlink socket while >>>>> trying to read. The reply to that transaction will arrive eventually and >>>>> we will interpret it later as a reply to a different netlink transaction. >>>>> >>>>> The issue appears to be introduced in the following commit: >>>>> 407556ac6c90 ("netlink-socket: Avoid forcing a reply for final message >>>>> in a transaction.") >>>>> >>>>> And it was a performance optimization introduced as part of the set: >>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2012-April/260122.html >>>>> >>>>> The problem is that we can't tell apart socket EAGAIN if there is nothing >>>>> to wait (requests didn't need a reply) and EAGAIN if kernel just didn't >>>>> manage to reply yet. >>>> >>>> It's still strance though that reply is delayed. Typically netlink >>>> replies are formed right in the request handler. Did you manage to >>>> figure out what is causing this in tc specifically? It wasn't an issue >>>> for an OVS and other common families for many years. >>>> >>> >>> Replying from my phone so I did not look at any code. However when I looked >>> at the code during the review I noticed it will ignore earlier (none >>> processed messages) replies based in the receive loop on the transaction >>> ID. So I do not think it should be an issue of messages getting out of sync. >> >> That makes sense, OK. >> >>> >>> I assumed the delay happens when we request something to a hardware offload >>> drive which is replying async due to it being busy. >> >> I'm not sure it is actually possible for netlink replies to be delivered >> asynchronously. Looking at the kernel code, by the time the original >> request is processed, the reply should already be sent. >> >> A simpler explanation, would be that it just doesn't reply for some reason >> to a request that needs a reply. e.g. some incorrect error handling in the >> kernel that leaves request unanswered. >> >> One potential issue I see in the tc_new_tfilter() implementation is that >> if tfilter_notify() will fail for any reason, the function will still >> return 0, because tc_new_tfilter() doesn't check the result. And no reply >> will be generated, because the return value is zero (success) and the >> rtnetlink_send() was never called due to an error in the tfilter_notify(). >> >> I'm not sure if that's what happening here, but it is one option how we >> end up with a successful netlink transaction with NLM_F_ECHO set, but no >> reply on the socket. >> >> A potential fix would be (not tested): >> >> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c >> index 2621550bfddc..9c334dbf9c5f 100644 >> --- a/net/sched/cls_api.c >> +++ b/net/sched/cls_api.c >> @@ -2311,7 +2311,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct >> nlmsghdr *n, >> err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh, >> flags, extack); >> if (err == 0) { >> - tfilter_notify(net, skb, n, tp, block, q, parent, fh, >> + err = tfilter_notify(net, skb, n, tp, block, q, parent, fh, >> RTM_NEWTFILTER, false, rtnl_held, extack); >> tfilter_put(tp, fh); >> /* q pointer is NULL for shared blocks */ >> ---
^^ This is definitely not a correct solution though, because we already changed the filter, but we just failed to dump it back. Some other solution is needed. Most likely, it needs to be re-tried until it succeeds. Or the change() will need to be reverted and a failure reported to the userspace. >> >> Solution on the OVS side might be to fail transactions that requested a >> reply, but didn't get one. Async replies sound like something that should >> not be allowed and likely is not allowed. >> >> Best regards, Ilya Maximets. >> >>> >>> Maybe Frode can get some perf traces to confirm this. >>> >>> //Eelco >>>>> >>>>> The real solution would be to revert commit 407556ac6c90 or be a bit >>>>> smarter and wait for reply only on requests that specify a reply buffer. >>>>> >>>>> Or am I missing something? >>>>> >>>>> Best regards, Ilya Maximets. >>>> >> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
