Hi,

Understood, and yes: connect will fail synchronously if the port is not available, so you should be able to retry it later.
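For example, something along these lines (a rough, untested sketch; the wrapper name is made up and the error formatting via format_session_error is just illustrative):

    /* Uses vnet/session/application_interface.h. A connect that fails
     * synchronously can simply be retried later, e.g., from a timer in
     * the app's ctrl process. */
    static int
    app_try_connect (vnet_connect_args_t *a)
    {
      int rv = vnet_connect (a);
      if (rv)
        /* 5-tuple/port still in use; back off and retry later */
        clib_warning ("connect failed: %U", format_session_error, rv);
      return rv;
    }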
Regards,
Florin

> On Mar 20, 2023, at 1:58 AM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
> 
> Hi,
> 
> It seems the issue occurs when disconnect is called, because our network can't guarantee that a tcp connection won't be reset even after the 3-way handshake completes (firewall issue :( ).
> 
> When we detect an app-layer timeout, we first disconnect (we record the session handle, and this session might be a half-open session). Does the vnet session layer guarantee that if we reconnect from the main thread while the half-open session has not yet been released (due to the asynchronous logic), the reconnect fails? If so, we can retry the connect later.
> 
> I prefer not to register the half-open callback because I think it makes the app complicated from a TCP programming perspective.
> 
> As for your patch, I think it should work: I can't delete the half-open session immediately because a worker is configured, so the half-open is only removed from the bihash when the SYN retransmit times out. I have merged the patch and will provide feedback later.
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Mon, Mar 20, 2023 at 13:09:
>> Hi,
>> 
>> Inline.
>> 
>>> On Mar 19, 2023, at 6:47 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> It can be aborted in either the established or the half-open state, because the timeout is done in our app layer.
>> 
>> [fc] Okay! Is the issue present irrespective of the state of the session, or does it happen only after a disconnect in half-open state? More below.
>> 
>>> Regarding your questions,
>>> 
>>> - Yes, we added a builtin app that relies on the C APIs, mainly vnet_connect/vnet_disconnect, to connect and disconnect sessions.
>> 
>> [fc] Understood
>> 
>>> - We call these APIs in a vpp ctrl process, which should be running on the main thread; we never do session setup/teardown on a worker thread. (The environment where this issue was found is configured with 1 main + 1 worker.)
>> 
>> [fc] With vpp latest it's possible to connect from the first worker. It's an optimization meant to avoid 1) taking the worker barrier on SYNs and 2) entering poll mode on main (less cpu consumed).
>> 
>>> - We started developing the app on 22.06 and I keep merging upstream changes into our tree by cherry-picking. The reason for the line mismatch is that I added some comments to the session layer code; it should be equivalent to the master branch now.
>> 
>> [fc] Ack
>> 
>>> When reading the code I understand that the half-open is mainly cleaned up from the bihash in session_stream_connect_notify. However, in syn-sent state the session might be closed by my app due to a session setup timeout (on the scale of seconds); in that case the session is marked half_open_done and the half-open session is freed shortly afterwards on the ctrl thread (the first worker?).
>> 
>> [fc] Actually, this might be the issue. We did start to provide a half-open session handle to apps which, if closed, does clean up the session, but apparently it is missing the cleanup of the session lookup table. Could you try this patch [1]? It might need additional work.
>> 
>> Having said that, forcing a close/cleanup will not free the port synchronously. So, if you're using fixed ports, you'll have to wait for the half-open cleanup notification.
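>> For example (a rough, untested sketch; the callback field name is taken from vpp latest, so check your tree):
>> 
>>     /* Be notified once the half-open is actually freed, so a fixed
>>      * source port can be safely reused. */
>>     static void
>>     app_half_open_cleanup (session_t *s)
>>     {
>>       /* 5-tuple is free again; safe to schedule a reconnect */
>>     }
>> 
>>     static session_cb_vft_t app_cb_vft = {
>>       .half_open_cleanup_callback = app_half_open_cleanup,
>>       /* ... the rest of the app's callbacks ... */
>>     };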
>>> 
>>> Should I also register the half-open callback, or is there some other reason that leads to this failure?
>> 
>> [fc] Yes, see above.
>> 
>> Regards,
>> Florin
>> 
>> [1] https://gerrit.fd.io/r/c/vpp/+/38526
>> 
>>> Florin Coras <fcoras.li...@gmail.com> wrote on Mon, Mar 20, 2023 at 06:22:
>>>> Hi,
>>>> 
>>>> When you abort the connection, is it fully established or half-open? Half-opens are cleaned up by the owner thread after a timeout, but the 5-tuple should be assigned to the fully established session by that point. tcp_half_open_connection_cleanup does not clean up the bihash; session_stream_connect_notify does, once tcp connect returns either success or failure.
>>>> 
>>>> So a few questions:
>>>> - is it accurate to assume you have a builtin vpp app and rely only on C apis to interact with the host stack?
>>>> - on what thread (main or first worker) do you call vnet_connect?
>>>> - what api do you use to close the session?
>>>> - what version of vpp is this, because the lines don't match vpp latest?
>>>> 
>>>> Regards,
>>>> Florin
>>>> 
>>>> > On Mar 19, 2023, at 2:08 AM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>>> > 
>>>> > Hi list,
>>>> > 
>>>> > Recently our application constantly triggers the abort below, which interrupts our connectivity for a while:
>>>> > 
>>>> > Mar 19 16:11:26 ubuntu vnet[2565933]: received signal SIGABRT, PC 0x7fefd3b2000b
>>>> > Mar 19 16:11:26 ubuntu vnet[2565933]: /home/fortitude/glx/vpp/src/vnet/tcp/tcp_input.c:3004 (tcp46_input_inline) assertion `tcp_lookup_is_valid (tc0, b[0], tcp_buffer_hdr (b[0]))' fails
>>>> > 
>>>> > Our scenario is quite simple: we make 4 parallel tcp connections (using 4 fixed source ports) to a remote vpp stack (fixed ip and port) and do some keepalive in our application layer. Since we only use the vpp tcp stack to keep the middleboxes happy with the connection, we don't actually use tcp for data transport.
>>>> > 
>>>> > However, since the network conditions are complex, we constantly need to abort the connection and reconnect.
>>>> > 
>>>> > I keep merging upstream session and tcp fixes, but the issue is still not fixed. What I have found so far is that in some cases tcp_half_open_connection_cleanup may not delete the half-open session from the lookup table (bihash), and the session index then gets reallocated by another connection.
>>>> > 
>>>> > Hope the list can provide some hints about how to overcome this issue, thanks a lot.
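>>>> > 
>>>> > For reference, the connect logic is roughly as follows (a simplified sketch; the fixed source port is set via sep_ext.peer, and app_index, remote_ip, remote_port and src_port come from our config):
>>>> > 
>>>> >     vnet_connect_args_t _a, *a = &_a;
>>>> >     clib_memset (a, 0, sizeof (*a));
>>>> >     a->app_index = app_index;  /* our attached builtin app */
>>>> >     a->sep_ext.is_ip4 = 1;
>>>> >     a->sep_ext.transport_proto = TRANSPORT_PROTO_TCP;
>>>> >     a->sep_ext.ip = remote_ip; /* fixed remote ip (ip46_address_t) */
>>>> >     a->sep_ext.port = clib_host_to_net_u16 (remote_port);
>>>> >     /* fixed local source port, one of the 4 we use */
>>>> >     a->sep_ext.peer.port = clib_host_to_net_u16 (src_port);
>>>> >     if (vnet_connect (a))
>>>> >       clib_warning ("connect failed, will retry");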