CC: Marcelo.

On 5/26/23 22:02, Ilya Maximets wrote:
> On 5/26/23 21:19, Eelco Chaudron wrote:
>>
>>
>> Send from my phone
>>
>>> Op 26 mei 2023 om 20:52 heeft Ilya Maximets <[email protected]> het 
>>> volgende geschreven:
>>>
>>> On 5/26/23 20:43, Ilya Maximets wrote:
>>>> On 5/23/23 12:39, Frode Nordahl wrote:
>>>>> The tc module combines the use of the `tc_transact` helper
>>>>> function for communication with the in-kernel tc infrastructure
>>>>> with assertions on the reply data by `ofpbuf_at_assert` on the
>>>>> received data prior to further processing.
>>>>>
>>>>> `tc_transact` in turn calls `nl_transact`, which via
>>>>> `nl_transact_multiple__` ultimately calls and handles return
>>>>> value from `recvmsg`.  On error a check for EAGAIN is performed
>>>>> and a consequence of this condition is effectively to provide a
>>>>> non-error (0) result and an empty reply buffer.
>>>>
>>>> Hi, Frode, others.
>>>>
>>>> I took a closer look at the patch and the code in the netlink-socket.
>>>> IIUC, the EAGAIN here is not a result of operation that we're requesting,
>>>> it's just a EAGAIN on a non-blocking (MSG_DONTWAIT) netlink socket while
>>>> trying to read.  The reply to that transaction will arrive eventually and
>>>> we will interpret it later as a reply to a different netlink transaction.
>>>>
>>>> The issue appears to be introduced in the following commit:
>>>>  407556ac6c90 ("netlink-socket: Avoid forcing a reply for final message in 
>>>> a transaction.")
>>>>
>>>> And it was a performance optimization introduced as part of the set:
>>>>  https://mail.openvswitch.org/pipermail/ovs-dev/2012-April/260122.html
>>>>
>>>> The problem is that we can't tell apart socket EAGAIN if there is nothing
>>>> to wait (requests didn't need a reply) and EAGAIN if kernel just didn't
>>>> manage to reply yet.
>>>
>>> It's still strance though that reply is delayed.  Typically netlink
>>> replies are formed right in the request handler.  Did you manage to
>>> figure out what is causing this in tc specifically?  It wasn't an issue
>>> for an OVS and other common families for many years.
>>>
>>
>> Replying from my phone so I did not look at any code. However when I looked 
>> at the code during the review I noticed it will ignore earlier (none 
>> processed messages) replies based in the receive loop on the transaction ID. 
>> So I do not think it should be an issue of messages getting out of sync.
> 
> That makes sense, OK.
> 
>>
>> I assumed the delay happens when we request something to a hardware offload 
>> drive which is replying async due to it being busy.
> 
> I'm not sure it is actually possible for netlink replies to be delivered
> asynchronously.  Looking at the kernel code, by the time the original
> request is processed, the reply should already be sent.
> 
> A simpler explanation, would be that it just doesn't reply for some reason
> to a request that needs a reply. e.g. some incorrect error handling in the
> kernel that leaves request unanswered.
> 
> One potential issue I see in the tc_new_tfilter() implementation is that
> if tfilter_notify() will fail for any reason, the function will still
> return 0, because tc_new_tfilter() doesn't check the result.  And no reply
> will be generated, because the return value is zero (success) and the
> rtnetlink_send() was never called due to an error in the tfilter_notify().
> 
> I'm not sure if that's what happening here, but it is one option how we
> end up with a successful netlink transaction with NLM_F_ECHO set, but no
> reply on the socket.
> 
> A potential fix would be (not tested):
> 
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 2621550bfddc..9c334dbf9c5f 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -2311,7 +2311,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
> nlmsghdr *n,
>       err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
>                             flags, extack);
>       if (err == 0) {
> -             tfilter_notify(net, skb, n, tp, block, q, parent, fh,
> +             err = tfilter_notify(net, skb, n, tp, block, q, parent, fh,
>                              RTM_NEWTFILTER, false, rtnl_held, extack);
>               tfilter_put(tp, fh);
>               /* q pointer is NULL for shared blocks */
> ---
> 
> Solution on the OVS side might be to fail transactions that requested a
> reply, but didn't get one.  Async replies sound like something that should
> not be allowed and likely is not allowed.
> 
> Best regards, Ilya Maximets.
> 
>>
>> Maybe Frode can get some perf traces to confirm this. 
>>
>> //Eelco
>>>>
>>>> The real solution would be to revert commit 407556ac6c90 or be a bit
>>>> smarter and wait for reply only on requests that specify a reply buffer.
>>>>
>>>> Or am I missing something?
>>>>
>>>> Best regards, Ilya Maximets.
>>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to