Re: [ovs-dev] [PATCH v4] tc: fix crash on EAGAIN return from recvmsg on netlink socket.

Ilya Maximets Fri, 26 May 2023 13:31:25 -0700

On 5/26/23 22:07, Ilya Maximets wrote:
> CC: Marcelo.
> 
> On 5/26/23 22:02, Ilya Maximets wrote:
>> On 5/26/23 21:19, Eelco Chaudron wrote:
>>>
>>>
>>> Send from my phone
>>>
>>>> Op 26 mei 2023 om 20:52 heeft Ilya Maximets <[email protected]> het 
>>>> volgende geschreven:
>>>>
>>>> On 5/26/23 20:43, Ilya Maximets wrote:
>>>>> On 5/23/23 12:39, Frode Nordahl wrote:
>>>>>> The tc module combines the use of the `tc_transact` helper
>>>>>> function for communication with the in-kernel tc infrastructure
>>>>>> with assertions on the reply data by `ofpbuf_at_assert` on the
>>>>>> received data prior to further processing.
>>>>>>
>>>>>> `tc_transact` in turn calls `nl_transact`, which via
>>>>>> `nl_transact_multiple__` ultimately calls and handles return
>>>>>> value from `recvmsg`.  On error a check for EAGAIN is performed
>>>>>> and a consequence of this condition is effectively to provide a
>>>>>> non-error (0) result and an empty reply buffer.
>>>>>
>>>>> Hi, Frode, others.
>>>>>
>>>>> I took a closer look at the patch and the code in the netlink-socket.
>>>>> IIUC, the EAGAIN here is not a result of operation that we're requesting,
>>>>> it's just a EAGAIN on a non-blocking (MSG_DONTWAIT) netlink socket while
>>>>> trying to read.  The reply to that transaction will arrive eventually and
>>>>> we will interpret it later as a reply to a different netlink transaction.
>>>>>
>>>>> The issue appears to be introduced in the following commit:
>>>>>  407556ac6c90 ("netlink-socket: Avoid forcing a reply for final message 
>>>>> in a transaction.")
>>>>>
>>>>> And it was a performance optimization introduced as part of the set:
>>>>>  https://mail.openvswitch.org/pipermail/ovs-dev/2012-April/260122.html
>>>>>
>>>>> The problem is that we can't tell apart socket EAGAIN if there is nothing
>>>>> to wait (requests didn't need a reply) and EAGAIN if kernel just didn't
>>>>> manage to reply yet.
>>>>
>>>> It's still strance though that reply is delayed.  Typically netlink
>>>> replies are formed right in the request handler.  Did you manage to
>>>> figure out what is causing this in tc specifically?  It wasn't an issue
>>>> for an OVS and other common families for many years.
>>>>
>>>
>>> Replying from my phone so I did not look at any code. However when I looked 
>>> at the code during the review I noticed it will ignore earlier (none 
>>> processed messages) replies based in the receive loop on the transaction 
>>> ID. So I do not think it should be an issue of messages getting out of sync.
>>
>> That makes sense, OK.
>>
>>>
>>> I assumed the delay happens when we request something to a hardware offload 
>>> drive which is replying async due to it being busy.
>>
>> I'm not sure it is actually possible for netlink replies to be delivered
>> asynchronously.  Looking at the kernel code, by the time the original
>> request is processed, the reply should already be sent.
>>
>> A simpler explanation, would be that it just doesn't reply for some reason
>> to a request that needs a reply. e.g. some incorrect error handling in the
>> kernel that leaves request unanswered.
>>
>> One potential issue I see in the tc_new_tfilter() implementation is that
>> if tfilter_notify() will fail for any reason, the function will still
>> return 0, because tc_new_tfilter() doesn't check the result.  And no reply
>> will be generated, because the return value is zero (success) and the
>> rtnetlink_send() was never called due to an error in the tfilter_notify().
>>
>> I'm not sure if that's what happening here, but it is one option how we
>> end up with a successful netlink transaction with NLM_F_ECHO set, but no
>> reply on the socket.
>>
>> A potential fix would be (not tested):
>>
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index 2621550bfddc..9c334dbf9c5f 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -2311,7 +2311,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
>> nlmsghdr *n,
>>      err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
>>                            flags, extack);
>>      if (err == 0) {
>> -            tfilter_notify(net, skb, n, tp, block, q, parent, fh,
>> +            err = tfilter_notify(net, skb, n, tp, block, q, parent, fh,
>>                             RTM_NEWTFILTER, false, rtnl_held, extack);
>>              tfilter_put(tp, fh);
>>              /* q pointer is NULL for shared blocks */
>> ---


^^ This is definitely not a correct solution though, because we already
changed the filter, but we just failed to dump it back.  Some other
solution is needed.  Most likely, it needs to be re-tried until it succeeds.
Or the change() will need to be reverted and a failure reported to the
userspace.

>>
>> Solution on the OVS side might be to fail transactions that requested a
>> reply, but didn't get one.  Async replies sound like something that should
>> not be allowed and likely is not allowed.
>>
>> Best regards, Ilya Maximets.
>>
>>>
>>> Maybe Frode can get some perf traces to confirm this. 
>>>
>>> //Eelco
>>>>>
>>>>> The real solution would be to revert commit 407556ac6c90 or be a bit
>>>>> smarter and wait for reply only on requests that specify a reply buffer.
>>>>>
>>>>> Or am I missing something?
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>
>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4] tc: fix crash on EAGAIN return from recvmsg on netlink socket.

Reply via email to