CC: Marcelo. On 5/26/23 22:02, Ilya Maximets wrote: > On 5/26/23 21:19, Eelco Chaudron wrote: >> >> >> Send from my phone >> >>> Op 26 mei 2023 om 20:52 heeft Ilya Maximets <[email protected]> het >>> volgende geschreven: >>> >>> On 5/26/23 20:43, Ilya Maximets wrote: >>>> On 5/23/23 12:39, Frode Nordahl wrote: >>>>> The tc module combines the use of the `tc_transact` helper >>>>> function for communication with the in-kernel tc infrastructure >>>>> with assertions on the reply data by `ofpbuf_at_assert` on the >>>>> received data prior to further processing. >>>>> >>>>> `tc_transact` in turn calls `nl_transact`, which via >>>>> `nl_transact_multiple__` ultimately calls and handles return >>>>> value from `recvmsg`. On error a check for EAGAIN is performed >>>>> and a consequence of this condition is effectively to provide a >>>>> non-error (0) result and an empty reply buffer. >>>> >>>> Hi, Frode, others. >>>> >>>> I took a closer look at the patch and the code in the netlink-socket. >>>> IIUC, the EAGAIN here is not a result of operation that we're requesting, >>>> it's just a EAGAIN on a non-blocking (MSG_DONTWAIT) netlink socket while >>>> trying to read. The reply to that transaction will arrive eventually and >>>> we will interpret it later as a reply to a different netlink transaction. >>>> >>>> The issue appears to be introduced in the following commit: >>>> 407556ac6c90 ("netlink-socket: Avoid forcing a reply for final message in >>>> a transaction.") >>>> >>>> And it was a performance optimization introduced as part of the set: >>>> https://mail.openvswitch.org/pipermail/ovs-dev/2012-April/260122.html >>>> >>>> The problem is that we can't tell apart socket EAGAIN if there is nothing >>>> to wait (requests didn't need a reply) and EAGAIN if kernel just didn't >>>> manage to reply yet. >>> >>> It's still strance though that reply is delayed. Typically netlink >>> replies are formed right in the request handler. Did you manage to >>> figure out what is causing this in tc specifically? It wasn't an issue >>> for an OVS and other common families for many years. >>> >> >> Replying from my phone so I did not look at any code. However when I looked >> at the code during the review I noticed it will ignore earlier (none >> processed messages) replies based in the receive loop on the transaction ID. >> So I do not think it should be an issue of messages getting out of sync. > > That makes sense, OK. > >> >> I assumed the delay happens when we request something to a hardware offload >> drive which is replying async due to it being busy. > > I'm not sure it is actually possible for netlink replies to be delivered > asynchronously. Looking at the kernel code, by the time the original > request is processed, the reply should already be sent. > > A simpler explanation, would be that it just doesn't reply for some reason > to a request that needs a reply. e.g. some incorrect error handling in the > kernel that leaves request unanswered. > > One potential issue I see in the tc_new_tfilter() implementation is that > if tfilter_notify() will fail for any reason, the function will still > return 0, because tc_new_tfilter() doesn't check the result. And no reply > will be generated, because the return value is zero (success) and the > rtnetlink_send() was never called due to an error in the tfilter_notify(). > > I'm not sure if that's what happening here, but it is one option how we > end up with a successful netlink transaction with NLM_F_ECHO set, but no > reply on the socket. > > A potential fix would be (not tested): > > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c > index 2621550bfddc..9c334dbf9c5f 100644 > --- a/net/sched/cls_api.c > +++ b/net/sched/cls_api.c > @@ -2311,7 +2311,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct > nlmsghdr *n, > err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh, > flags, extack); > if (err == 0) { > - tfilter_notify(net, skb, n, tp, block, q, parent, fh, > + err = tfilter_notify(net, skb, n, tp, block, q, parent, fh, > RTM_NEWTFILTER, false, rtnl_held, extack); > tfilter_put(tp, fh); > /* q pointer is NULL for shared blocks */ > --- > > Solution on the OVS side might be to fail transactions that requested a > reply, but didn't get one. Async replies sound like something that should > not be allowed and likely is not allowed. > > Best regards, Ilya Maximets. > >> >> Maybe Frode can get some perf traces to confirm this. >> >> //Eelco >>>> >>>> The real solution would be to revert commit 407556ac6c90 or be a bit >>>> smarter and wait for reply only on requests that specify a reply buffer. >>>> >>>> Or am I missing something? >>>> >>>> Best regards, Ilya Maximets. >>> >
_______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
