Re: [vpp-dev] Sigabrt in tcp46_input_inline for tcp_lookup_is_valid

2023-03-23 Thread Florin Coras
Hi Zhang, 

Thanks for confirming! Give me a few more days to check if there are any other 
improvements to be made in that area. 

Regards,
Florin 

> On Mar 23, 2023, at 12:00 AM, Zhang Dongya  wrote:
> 
> Hi,
> 
> The new patch works as expected; no assert-triggered abort anymore.
> 
> Really appreciate your help and thanks a lot.
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Wed, Mar 22, 2023 at 11:54:
>> Hi Zhang, 
>> 
>> Awesome! Thanks!
>> 
>> Regards,
>> Florin
>> 
>>> On Mar 21, 2023, at 7:41 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>> 
>>> Hi Florin,
>>> 
>>> Thanks a lot; the previous patch, with reset disabled, has been running for 
>>> 1 day without issue.
>>> 
>>> I will enable reset along with your new patch and will provide feedback later.
>>> 
>>> Florin Coras <fcoras.li...@gmail.com> wrote on Wed, Mar 22, 2023 at 02:12:
>>>> Hi, 
>>>> 
>>>> Okay, resetting of half-opens is definitely not supported. I updated the 
>>>> patch to just clean them up on forced reset, without sending a reset, so 
>>>> that session lookup table cleanup still happens. 
>>>> 
>>>> Regards,
>>>> Florin
>>>> 
>>>>> On Mar 20, 2023, at 9:13 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> After reviewing my code, I found that I had added a flag to the 
>>>>> vnet_disconnect API which calls session_reset instead of session_close; 
>>>>> the reason I do this is to make intermediate firewalls flush their state 
>>>>> and reconstruct it if I later reconnect.
>>>>> 
>>>>> It seems the session_reset logic for half-open sessions is also missing 
>>>>> the removal of the session from the lookup hash, which may cause the issue too.
>>>>> 
>>>>> I changed my code and will test it along with your patch; I will provide 
>>>>> feedback later.
>>>>> 
>>>>> I also noticed the bihash issue discussed on the list recently; I will 
>>>>> merge that later.
>>>>> 
>>>>> Florin Coras <fcoras.li...@gmail.com> wrote on Tue, Mar 21, 2023 at 11:56:
>>>>>> Hi, 
>>>>>> 
>>>>>> That last thing is pretty interesting. It’s either the issue fixed by 
>>>>>> this patch [1] or sessions are somehow cleaned up multiple times. If 
>>>>>> it’s the latter, I’d really like to understand how that happens. 
>>>>>> 
>>>>>> Regards,
>>>>>> Florin
>>>>>> 
>>>>>> [1] https://gerrit.fd.io/r/c/vpp/+/38507 
>>>>>> 
>>>>>>> On Mar 20, 2023, at 6:52 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> After merging this patch and updating the test environment, the issue 
>>>>>>> still persists.
>>>>>>> 
>>>>>>> Let me clarify my client app config:
>>>>>>> 1. register a reset callback, which calls vnet_disconnect and also 
>>>>>>> triggers a reconnect by sending an event to the ctrl process.
>>>>>>> 2. register a connected callback, which handles connect errors by 
>>>>>>> triggering a reconnect; on success, it records the session handle and 
>>>>>>> extracts the tcp sequence for our app's use.
>>>>>>> 3. register a disconnect callback, which basically does the same as the 
>>>>>>> reset callback.
>>>>>>> 4. register a cleanup callback and an accept callback, which basically 
>>>>>>> keep the session layer happy without any actually relevant work to do.
>>>>>>> 
>>>>>>> There is a ctrl process on the master thread, which handles reconnects 
>>>>>>> periodically or when triggered by an event.
>>>>>>> 
>>>>>>> BTW, I also frequently see the warning 'session %u hash delete rv -3' 
>>>>>>> from session_delete in my environment; hope this helps the investigation.
>>>>>>> 
>>>>>>> Florin Coras <fcoras.li...@gmail.com> wrote on Mon, Mar 20, 2023 at 23:29:

Re: [vpp-dev] Sigabrt in tcp46_input_inline for tcp_lookup_is_valid

2023-03-21 Thread Florin Coras
Hi Zhang, 

Awesome! Thanks!

Regards,
Florin

> On Mar 21, 2023, at 7:41 PM, Zhang Dongya  wrote:
> 
> Hi Florin,
> 
> Thanks a lot; the previous patch, with reset disabled, has been running for 1 
> day without issue.
> 
> I will enable reset along with your new patch and will provide feedback later.
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Wed, Mar 22, 2023 at 02:12:
>> Hi, 
>> 
>> Okay, resetting of half-opens is definitely not supported. I updated the patch 
>> to just clean them up on forced reset, without sending a reset, so that 
>> session lookup table cleanup still happens. 
>> 
>> Regards,
>> Florin
>> 
>>> On Mar 20, 2023, at 9:13 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> After reviewing my code, I found that I had added a flag to the vnet_disconnect 
>>> API which calls session_reset instead of session_close; the reason I do this 
>>> is to make intermediate firewalls flush their state and reconstruct it if I 
>>> later reconnect.
>>> 
>>> It seems the session_reset logic for half-open sessions is also missing the 
>>> removal of the session from the lookup hash, which may cause the issue too.
>>> 
>>> I changed my code and will test it along with your patch; I will provide 
>>> feedback later.
>>> 
>>> I also noticed the bihash issue discussed on the list recently; I will 
>>> merge that later.
>>> 
>>> Florin Coras <fcoras.li...@gmail.com> wrote on Tue, Mar 21, 2023 at 11:56:
>>>> Hi, 
>>>> 
>>>> That last thing is pretty interesting. It’s either the issue fixed by this 
>>>> patch [1] or sessions are somehow cleaned up multiple times. If it’s the 
>>>> latter, I’d really like to understand how that happens. 
>>>> 
>>>> Regards,
>>>> Florin
>>>> 
>>>> [1] https://gerrit.fd.io/r/c/vpp/+/38507 
>>>> 
>>>>> On Mar 20, 2023, at 6:52 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> After merging this patch and updating the test environment, the issue still 
>>>>> persists.
>>>>> 
>>>>> Let me clarify my client app config:
>>>>> 1. register a reset callback, which calls vnet_disconnect and also 
>>>>> triggers a reconnect by sending an event to the ctrl process.
>>>>> 2. register a connected callback, which handles connect errors by 
>>>>> triggering a reconnect; on success, it records the session handle and 
>>>>> extracts the tcp sequence for our app's use.
>>>>> 3. register a disconnect callback, which basically does the same as the 
>>>>> reset callback.
>>>>> 4. register a cleanup callback and an accept callback, which basically 
>>>>> keep the session layer happy without any actually relevant work to do.
>>>>> 
>>>>> There is a ctrl process on the master thread, which handles reconnects 
>>>>> periodically or when triggered by an event.
>>>>> 
>>>>> BTW, I also frequently see the warning 'session %u hash delete rv -3' from 
>>>>> session_delete in my environment; hope this helps the investigation.
>>>>> 
>>>>> Florin Coras <fcoras.li...@gmail.com> wrote on Mon, Mar 20, 2023 at 23:29:
>>>>>> Hi, 
>>>>>> 
>>>>>> Understood and yes, connect will synchronously fail if port is not 
>>>>>> available, so you should be able to retry it later. 
>>>>>> 
>>>>>> Regards, 
>>>>>> Florin
>>>>>> 
>>>>>>> On Mar 20, 2023, at 1:58 AM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> It seems the issue occurs when disconnects are called, because our 
>>>>>>> network can't guarantee a tcp connection won't be reset even when the 
>>>>>>> 3-way handshake is completed (firewall issue :( ).
>>>>>>> 
>>>>>>> When we detect the app-layer timeout, we will first disconnect (because 
>>>>>>> we record the session handle, this session might be a half-open 
>>>>>>> session); does the vnet session layer guarantee that if we reconn

Re: [vpp-dev] #vnet A bug which may cause assertion error in vnet/session

2023-03-21 Thread Florin Coras
Hi, 

The problem seems to be that you’re using a vmxnet3 interface, so I suspect 
this might be a vm configuration issue. Your current config should work but 
could end up being inefficient. 

With respect to your problem, I just built redis and ran redis-server and cli 
over LDP. Everything seems to be working fine so I’m assuming you’re doing some 
stress tests of redis? Could you provide more info about your client? 

Regards,
Florin

> On Mar 21, 2023, at 6:18 AM, Chen Weihao  wrote:
> 
> Thank you for your reply.
> I tried to change num-tx-queues from 2 to 5, but got a SIGSEGV; the 
> backtrace is:
> #0  0x7fffb453ff89 in rte_write32_relaxed (addr=0x80007ef0, value=0)
> at ../src-dpdk/lib/eal/include/generic/rte_io.h:310
> #1  rte_write32 (addr=0x80007ef0, value=0)
> at ../src-dpdk/lib/eal/include/generic/rte_io.h:373
> #2  vmxnet3_enable_intr (hw=0xac03b0600, intr_idx=4294967262)
> at ../src-dpdk/drivers/net/vmxnet3/vmxnet3_ethdev.c:210
> #3  0x7fffb4544d35 in vmxnet3_dev_rx_queue_intr_enable (
> dev=0x7fffb5186980 , queue_id=0)
> at ../src-dpdk/drivers/net/vmxnet3/vmxnet3_ethdev.c:1815
> #4  0x7fffaff4bbf2 in rte_eth_dev_rx_intr_enable (port_id=0, queue_id=0)
> at ../src-dpdk/lib/ethdev/rte_ethdev.c:4740
> #5  0x7fffb49f4564 in dpdk_setup_interrupts (xd=0x7fffbdbb2940)
> at /home/chenweihao/vpp_dev/src/plugins/dpdk/device/common.c:336
> #6  0x7fffb49f4430 in dpdk_device_start (xd=0x7fffbdbb2940)
> at /home/chenweihao/vpp_dev/src/plugins/dpdk/device/common.c:411
> #7  0x7fffb49ff713 in dpdk_interface_admin_up_down (
> vnm=0x77e2b828 , hw_if_index=1, flags=1)
> at /home/chenweihao/vpp_dev/src/plugins/dpdk/device/device.c:476
> #8  0x770d60e8 in vnet_sw_interface_set_flags_helper (
> vnm=0x77e2b828 , sw_if_index=1, 
> flags=VNET_SW_INTERFACE_FLAG_ADMIN_UP, helper_flags=0)
> at /home/chenweihao/vpp_dev/src/vnet/interface.c:470
> #9  0x770d645a in vnet_sw_interface_set_flags (
> vnm=0x77e2b828 , sw_if_index=1, 
> flags=VNET_SW_INTERFACE_FLAG_ADMIN_UP)
> at /home/chenweihao/vpp_dev/src/vnet/interface.c:524
> #10 0x7710515f in set_state (vm=0x7fffb6a00740, input=0x7fffa9f84bb8, 
> cmd=0x7fffb7180850)
> at /home/chenweihao/vpp_dev/src/vnet/interface_cli.c:946
> #11 0x77e72257 in vlib_cli_dispatch_sub_commands (vm=0x7fffb6a00740, 
> cm=0x77f6a770 , input=0x7fffa9f84bb8, 
> parent_command_index=20) at /home/chenweihao/vpp_dev/src/vlib/cli.c:650
> #12 0x77e71fea in vlib_cli_dispatch_sub_commands (vm=0x7fffb6a00740, 
> cm=0x77f6a770 , input=0x7fffa9f84bb8, 
> parent_command_index=7) at /home/chenweihao/vpp_dev/src/vlib/cli.c:607
> #13 0x77e71fea in vlib_cli_dispatch_sub_commands (vm=0x7fffb6a00740, 
> cm=0x77f6a770 , input=0x7fffa9f84bb8, 
> parent_command_index=0) at /home/chenweihao/vpp_dev/src/vlib/cli.c:607
> #14 0x77e7122a in vlib_cli_input (vm=0x7fffb6a00740, 
> input=0x7fffa9f84bb8, function=0x0, function_arg=0)
> at /home/chenweihao/vpp_dev/src/vlib/cli.c:753
> #15 0x77ef7e23 in unix_cli_exec (vm=0x7fffb6a00740, 
> input=0x7fffa9f84f30, cmd=0x7fffb71815b8)
> at /home/chenweihao/vpp_dev/src/vlib/unix/cli.c:3431
> #16 0x77e72257 in vlib_cli_dispatch_sub_commands (vm=0x7fffb6a00740, 
> cm=0x77f6a770 , input=0x7fffa9f84f30, 
> parent_command_index=0) at /home/chenweihao/vpp_dev/src/vlib/cli.c:650
> #17 0x77e7122a in vlib_cli_input (vm=0x7fffb6a00740, 
> input=0x7fffa9f84f30, function=0x0, function_arg=0)
> at /home/chenweihao/vpp_dev/src/vlib/cli.c:753
> #18 0x77efdfc5 in startup_config_process (vm=0x7fffb6a00740, 
> rt=0x7fffb9194080, f=0x0)
> at /home/chenweihao/vpp_dev/src/vlib/unix/main.c:291
> #19 0x77ea2c5d in vlib_process_bootstrap (_a=140736084405176)
> at /home/chenweihao/vpp_dev/src/vlib/main.c:1221
> #20 0x76f1ffd8 in clib_calljmp ()
> at /home/chenweihao/vpp_dev/src/vppinfra/longjmp.S:123
> #21 0x7fffac516bb0 in ?? ()
> #22 0x77ea26f9 in vlib_process_startup (vm=0x8, 
> p=0x77ea53bb , f=0x7fffac516cc0)
> at /home/chenweihao/vpp_dev/src/vlib/main.c:1246
> #23 0x76f7aa1c in vec_mem_size (v=0x7fffb6a00740)
> at /home/chenweihao/vpp_dev/src/vppinfra/vec.c:15
> #24 0x0581655dfd1c in ?? ()
> #25 0x00330004 in ?? ()
> #26 0x0030 in ?? ()
> #27 0x7fffbdbc7240 in ?? ()
> #28 0x7fffbdbc7240 in ?? ()
> #29 0x7fffb80e5498 in ?? ()
> #30 0x0001 in ?? ()
> #31 0x in ?? ()
> 
> I tried to change num-rx-queues and num-tx-queues to 4, and then the SIGSEGV 
> did not happen.
> I applied the patch https://gerrit.fd.io/r/c/vpp/+/38529 , but the problem 
> with redis 6.0 still seemed to exist; the stack backtrace is the same as 
> 

Re: [vpp-dev] Sigabrt in tcp46_input_inline for tcp_lookup_is_valid

2023-03-21 Thread Florin Coras
Hi, 

Okay, resetting of half-opens is definitely not supported. I updated the patch to 
just clean them up on forced reset, without sending a reset, so that 
session lookup table cleanup still happens. 
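
A rough sketch of that cleanup (helper names are assumptions, not the actual
patch):

    #include <vnet/session/session.h>

    /* Forced reset of a half-open: skip the wire RST (the peer never
     * completed the handshake) but still scrub the 5-tuple from the
     * session lookup table so a later connect can reuse it. */
    static void
    half_open_forced_cleanup (transport_connection_t *tc)
    {
      session_lookup_del_half_open (tc);    /* assumed lookup-table helper */
      session_half_open_delete_notify (tc); /* frees the half-open session */
    }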

Regards,
Florin

> On Mar 20, 2023, at 9:13 PM, Zhang Dongya  wrote:
> 
> Hi,
> 
> After reviewing my code, I found that I had added a flag to the vnet_disconnect 
> API which calls session_reset instead of session_close; the reason I do this 
> is to make intermediate firewalls flush their state and reconstruct it if 
> I later reconnect.
> 
> It seems the session_reset logic for half-open sessions is also missing the 
> removal of the session from the lookup hash, which may cause the issue too.
> 
> I changed my code and will test it along with your patch; I will provide 
> feedback later.
> 
> I also noticed the bihash issue discussed on the list recently; I will merge 
> that later.
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Tue, Mar 21, 2023 at 11:56:
>> Hi, 
>> 
>> That last thing is pretty interesting. It’s either the issue fixed by this 
>> patch [1] or sessions are somehow cleaned up multiple times. If it’s the 
>> latter, I’d really like to understand how that happens. 
>> 
>> Regards,
>> Florin
>> 
>> [1] https://gerrit.fd.io/r/c/vpp/+/38507 
>> 
>>> On Mar 20, 2023, at 6:52 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> After merging this patch and updating the test environment, the issue still 
>>> persists.
>>> 
>>> Let me clarify my client app config:
>>> 1. register a reset callback, which calls vnet_disconnect and also triggers 
>>> a reconnect by sending an event to the ctrl process.
>>> 2. register a connected callback, which handles connect errors by triggering 
>>> a reconnect; on success, it records the session handle and extracts the tcp 
>>> sequence for our app's use.
>>> 3. register a disconnect callback, which basically does the same as the 
>>> reset callback.
>>> 4. register a cleanup callback and an accept callback, which basically keep 
>>> the session layer happy without any actually relevant work to do.
>>> 
>>> There is a ctrl process on the master thread, which handles reconnects 
>>> periodically or when triggered by an event.
>>> 
>>> BTW, I also frequently see the warning 'session %u hash delete rv -3' from 
>>> session_delete in my environment; hope this helps the investigation.
>>> 
>>> Florin Coras <fcoras.li...@gmail.com> wrote on Mon, Mar 20, 2023 at 23:29:
>>>> Hi, 
>>>> 
>>>> Understood and yes, connect will synchronously fail if port is not 
>>>> available, so you should be able to retry it later. 
>>>> 
>>>> Regards, 
>>>> Florin
>>>> 
>>>>> On Mar 20, 2023, at 1:58 AM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> It seems the issue occurs when disconnects are called, because our 
>>>>> network can't guarantee a tcp connection won't be reset even when the 
>>>>> 3-way handshake is completed (firewall issue :( ).
>>>>> 
>>>>> When we detect the app-layer timeout, we will first disconnect (because we 
>>>>> record the session handle, this session might be a half-open session). 
>>>>> Does the vnet session layer guarantee that if we reconnect from the master 
>>>>> thread while the half-open session has still not been released (due to 
>>>>> asynchronous logic), the reconnect fails? If so, we can retry the connect later.
>>>>> 
>>>>> I prefer not to register a half-open callback because I think it makes the 
>>>>> app complicated from a TCP programming perspective.
>>>>> 
>>>>> As for your patch, I think it should work: I can't delete the half-open 
>>>>> session immediately because a worker is configured, so the half-open will 
>>>>> be removed from the bihash when the SYN retransmit times out. I have merged 
>>>>> the patch and will provide feedback later.
>>>>> 
>>>>> Florin Coras <fcoras.li...@gmail.com> wrote on Mon, Mar 20, 2023 at 13:09:
>>>>>> Hi, 
>>>>>> 
>>>>>> Inline.
>>>>>> 
>>>>>>> On Mar 19, 2023, at 6:47 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>>>>&

Re: [vpp-dev] Sigabrt in tcp46_input_inline for tcp_lookup_is_valid

2023-03-20 Thread Florin Coras
Hi, 

That last thing is pretty interesting. It’s either the issue fixed by this 
patch [1] or sessions are somehow cleaned up multiple times. If it’s the 
latter, I’d really like to understand how that happens. 

Regards,
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/38507 

> On Mar 20, 2023, at 6:52 PM, Zhang Dongya  wrote:
> 
> Hi,
> 
> After merging this patch and updating the test environment, the issue still 
> persists.
> 
> Let me clarify my client app config (a registration sketch follows the list):
> 1. register a reset callback, which calls vnet_disconnect and also triggers a 
> reconnect by sending an event to the ctrl process.
> 2. register a connected callback, which handles connect errors by triggering a 
> reconnect; on success, it records the session handle and extracts the tcp 
> sequence for our app's use.
> 3. register a disconnect callback, which basically does the same as the reset 
> callback.
> 4. register a cleanup callback and an accept callback, which basically keep 
> the session layer happy without any actually relevant work to do.
> 
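A sketch of the callback registration described in the list above, using the
builtin-app session callback vft; the my_* bodies are hypothetical, while the
vft field names come from vnet/session:

    #include <vnet/session/application_interface.h>

    static int my_accept_callback (session_t *s);
    static void my_disconnect_callback (session_t *s);
    static void my_reset_callback (session_t *s);
    static int my_connected_callback (u32 app_wrk_index, u32 opaque,
                                      session_t *s, session_error_t code);
    static void my_cleanup_callback (session_t *s, session_cleanup_ntf_t ntf);

    static session_cb_vft_t my_app_cb_vft = {
      .session_reset_callback = my_reset_callback,            /* item 1 */
      .session_connected_callback = my_connected_callback,    /* item 2 */
      .session_disconnect_callback = my_disconnect_callback,  /* item 3 */
      .session_cleanup_callback = my_cleanup_callback,        /* item 4 */
      .session_accept_callback = my_accept_callback,          /* item 4 */
    };
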
> There is a ctrl process on the master thread, which handles reconnects 
> periodically or when triggered by an event (see the process sketch below).
> 
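A sketch of such a ctrl process as a vlib process node; the reconnect helper
and event code are hypothetical, the vlib process APIs are standard:

    #include <vlib/vlib.h>

    #define MY_EVT_RECONNECT 1

    static void my_app_try_reconnect (void); /* hypothetical: calls vnet_connect */

    static uword
    ctrl_process_fn (vlib_main_t *vm, vlib_node_runtime_t *rt, vlib_frame_t *f)
    {
      uword *event_data = 0, event_type;

      while (1)
        {
          /* wake up periodically, or when a reset/disconnect callback posts
           * an event via vlib_process_signal_event () */
          vlib_process_wait_for_event_or_clock (vm, 3.0 /* seconds */);
          event_type = vlib_process_get_events (vm, &event_data);

          if (event_type == MY_EVT_RECONNECT || event_type == ~0 /* timeout */)
            my_app_try_reconnect ();

          vec_reset_length (event_data);
        }
      return 0;
    }

    VLIB_REGISTER_NODE (ctrl_process_node) = {
      .function = ctrl_process_fn,
      .type = VLIB_NODE_TYPE_PROCESS,
      .name = "my-app-ctrl-process",
    };
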
> BTW, I also frequently see the warning 'session %u hash delete rv -3' from 
> session_delete in my environment; hope this helps the investigation.
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Mon, Mar 20, 2023 at 23:29:
>> Hi, 
>> 
>> Understood and yes, connect will synchronously fail if port is not 
>> available, so you should be able to retry it later. 
>> 
>> Regards, 
>> Florin
>> 
>>> On Mar 20, 2023, at 1:58 AM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> It seems the issue occurs when disconnects are called, because our network 
>>> can't guarantee a tcp connection won't be reset even when the 3-way 
>>> handshake is completed (firewall issue :( ).
>>> 
>>> When we detect the app-layer timeout, we will first disconnect (because we 
>>> record the session handle, this session might be a half-open session). Does 
>>> the vnet session layer guarantee that if we reconnect from the master thread 
>>> while the half-open session has still not been released (due to asynchronous 
>>> logic), the reconnect fails? If so, we can retry the connect later.
>>> 
>>> I prefer not to register a half-open callback because I think it makes the 
>>> app complicated from a TCP programming perspective.
>>> 
>>> As for your patch, I think it should work: I can't delete the half-open 
>>> session immediately because a worker is configured, so the half-open will 
>>> be removed from the bihash when the SYN retransmit times out. I have merged 
>>> the patch and will provide feedback later.
>>> 
>>> Florin Coras <fcoras.li...@gmail.com> wrote on Mon, Mar 20, 2023 at 13:09:
>>>> Hi, 
>>>> 
>>>> Inline.
>>>> 
>>>>> On Mar 19, 2023, at 6:47 PM, Zhang Dongya >>>> <mailto:fortitude.zh...@gmail.com>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> It can be aborted in either established or half-open state, because we 
>>>>> enforce timeouts in our app layer. 
>>>> 
>>>> [fc] Okay! Is the issue present irrespective of the state of the session 
>>>> or does it happen only after a disconnect in half-open state? More below. 
>>>> 
>>>>> 
>>>>> Regarding your question,
>>>>> 
>>>>> - Yes, we added a builtin app that relies on C APIs, mainly 
>>>>> vnet_connect/vnet_disconnect, to connect or disconnect sessions.
>>>> 
>>>> [fc] Understood
>>>> 
>>>>> - We call these APIs in a vpp ctrl process which should be running on the 
>>>>> master thread; we never do session setup/teardown on a worker thread. (The 
>>>>> environment where this issue was found is configured with a 1 master + 1 
>>>>> worker setup.)
>>>> 
>>>> [fc] With vpp latest it's possible to connect from the first worker. It's an 
>>>> optimization meant to avoid 1) worker barrier on syns and 2) entering poll 
>>>> mode on main (consumes less cpu)
>>>> 
>>>>> - We started developing the app on 22.06 and I keep merging upstream 
>>>>> changes up to latest vpp by cherry-picking. The reason for the line 
>>>>> mismatch is that I added some comments to the session layer code; it 
>>>>> should now be equal to the master branch.
>>>> 
>>>> [fc] Ack
&

Re: [vpp-dev] #vnet A bug which may cause assertion error in vnet/session

2023-03-20 Thread Florin Coras
Hi, 

First of all, could you try this [1] with latest vpp? It’s really interesting 
that iperf does not exhibit this issue. 

Regarding your config, some observations:
- I see you have configured 4 workers. I would then recommend using 4 rx-queues 
and 5 tx-queues (main can also send packets), as opposed to 2. 
- tcp defaults to cubic, so that config can be omitted.
- evt_qs_memfd_seg is now deprecated, so it can be omitted as well.
- any particular reason for "set interface rx-mode eth1 polling"? dpdk 
interfaces are in polling mode by default.
- you're using the binary api socket ("api-socket-name /run/vpp/api.sock"). That 
works, but going forward we'll slowly deprecate that api, so I'd recommend 
using the app socket api. See for instance [2] for the changes needed to the 
session stanza and vcl. 

Regards,
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/38529
[2] https://wiki.fd.io/view/VPP/HostStack/LDP/iperf


> On Mar 20, 2023, at 5:50 AM, Chen Weihao  wrote:
> 
> Thanks for your reply.
> I gave a more detailed backtrace and config in 
> https://lists.fd.io/g/vpp-dev/message/22731. 
> My installation method is to clone vpp from GitHub and make build on Ubuntu 
> 22.04 (kernel version 5.19), and I use make run for testing and make debug 
> for debugging. Yes, I tried to make the server and client attach to the same 
> vpp instance. I tried the latest version of vpp from GitHub yesterday; the 
> problem still exists.
> I am looking forward to your reply.
> 
> 
> 
> 





Re: [vpp-dev] Sigabrt in tcp46_input_inline for tcp_lookup_is_valid

2023-03-20 Thread Florin Coras
Hi, 

Understood and yes, connect will synchronously fail if port is not available, 
so you should be able to retry it later. 

Regards, 
Florin

> On Mar 20, 2023, at 1:58 AM, Zhang Dongya  wrote:
> 
> Hi,
> 
> It seems the issue occurs when disconnects are called, because our 
> network can't guarantee a tcp connection won't be reset even when the 3-way 
> handshake is completed (firewall issue :( ).
> 
> When we detect the app-layer timeout, we will first disconnect (because we 
> record the session handle, this session might be a half-open session). Does 
> the vnet session layer guarantee that if we reconnect from the master thread 
> while the half-open session has still not been released (due to asynchronous 
> logic), the reconnect fails? If so, we can retry the connect later.
> 
> I prefer not to register a half-open callback because I think it makes the app 
> complicated from a TCP programming perspective.
> 
> As for your patch, I think it should work: I can't delete the half-open 
> session immediately because a worker is configured, so the half-open will be 
> removed from the bihash when the SYN retransmit times out. I have merged the 
> patch and will provide feedback later.
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Mon, Mar 20, 2023 at 13:09:
>> Hi, 
>> 
>> Inline.
>> 
>>> On Mar 19, 2023, at 6:47 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> It can be aborted in either established or half-open state, because we 
>>> enforce timeouts in our app layer. 
>> 
>> [fc] Okay! Is the issue present irrespective of the state of the session or 
>> does it happen only after a disconnect in half-open state? More below. 
>> 
>>> 
>>> Regarding your question,
>>> 
>>> - Yes, we added a builtin app that relies on C APIs, mainly 
>>> vnet_connect/vnet_disconnect, to connect or disconnect sessions.
>> 
>> [fc] Understood
>> 
>>> - We call these APIs in a vpp ctrl process which should be running on the 
>>> master thread; we never do session setup/teardown on a worker thread. (The 
>>> environment where this issue was found is configured with a 1 master + 1 
>>> worker setup.)
>> 
>> [fc] With vpp latest it's possible to connect from the first worker. It's an 
>> optimization meant to avoid 1) worker barrier on syns and 2) entering poll 
>> mode on main (consumes less cpu)
>> 
>>> - We started developing the app on 22.06 and I keep merging upstream changes 
>>> up to latest vpp by cherry-picking. The reason for the line mismatch is that 
>>> I added some comments to the session layer code; it should now be equal to 
>>> the master branch.
>> 
>> [fc] Ack
>> 
>>> 
>>> When reading the code I understand that we mainly want to clean up half-opens 
>>> from the bihash in session_stream_connect_notify. However, in syn-sent state, 
>>> if I choose to close the session, it might be closed by my app due to a 
>>> session setup timeout (on a scale of seconds); in that case, the session will 
>>> be marked as half_open_done and the half-open session will be freed shortly 
>>> afterwards on the ctrl thread (the 1st worker?).
>> 
>> [fc] Actually, this might be the issue. We did start to provide a half-open 
>> session handle to apps which, if closed, does clean up the session, but 
>> apparently it is missing the cleanup of the session lookup table. Could you 
>> try this patch [1]? It might need additional work.
>> 
>> Having said that, forcing a close/cleanup will not free the port 
>> synchronously. So, if you’re using fixed ports, you’ll have to wait for the 
>> half-open cleanup notification.
>> 
>>> 
>>> Should I also register a half-open callback, or is there some other reason 
>>> that leads to this failure?
>>> 
>> 
>> [fc] Yes, see above.
>> 
>> Regards, 
>> Florin
>> 
>> [1] https://gerrit.fd.io/r/c/vpp/+/38526
>> 
>>> 
>>> Florin Coras <fcoras.li...@gmail.com> wrote on Mon, Mar 20, 2023 at 06:22:
>>>> Hi, 
>>>> 
>>>> When you abort the connection, is it fully established or half-open? 
>>>> Half-opens are cleaned up by the owner thread after a timeout, but the 
>>>> 5-tuple should be assigned to the fully established session by that point. 
>>>> tcp_half_open_connection_cleanup does not clean up the bihash; instead, 
>>>> session_stream_connect_notify does, once tcp connect returns either success 
>>>> or failure. 
>>>

Re: [vpp-dev] Sigabrt in tcp46_input_inline for tcp_lookup_is_valid

2023-03-19 Thread Florin Coras
Hi, 

Inline.

> On Mar 19, 2023, at 6:47 PM, Zhang Dongya  wrote:
> 
> Hi,
> 
> It can be aborted in either established or half-open state, because we enforce 
> timeouts in our app layer. 

[fc] Okay! Is the issue present irrespective of the state of the session or 
does it happen only after a disconnect in half-open state? More below. 

> 
> Regarding your question,
> 
> - Yes, we added a builtin app that relies on C APIs, mainly 
> vnet_connect/vnet_disconnect, to connect or disconnect sessions.

[fc] Understood

> - We call these APIs in a vpp ctrl process which should be running on the 
> master thread; we never do session setup/teardown on a worker thread. (The 
> environment where this issue was found is configured with a 1 master + 1 worker 
> setup.)

[fc] With vpp latest it's possible to connect from the first worker. It's an 
optimization meant to avoid 1) worker barrier on syns and 2) entering poll mode 
on main (consumes less cpu)

> - We started developing the app on 22.06 and I keep merging upstream changes 
> up to latest vpp by cherry-picking. The reason for the line mismatch is that 
> I added some comments to the session layer code; it should now be equal to the 
> master branch.

[fc] Ack

> 
> When reading the code I understand that we mainly want to clean up half-opens 
> from the bihash in session_stream_connect_notify. However, in syn-sent state, 
> if I choose to close the session, it might be closed by my app due to a 
> session setup timeout (on a scale of seconds); in that case, the session will 
> be marked as half_open_done and the half-open session will be freed shortly 
> afterwards on the ctrl thread (the 1st worker?).

[fc] Actually, this might be the issue. We did start to provide a half-open 
session handle to apps which, if closed, does clean up the session, but apparently 
it is missing the cleanup of the session lookup table. Could you try this patch 
[1]? It might need additional work.

Having said that, forcing a close/cleanup will not free the port synchronously. 
So, if you’re using fixed ports, you’ll have to wait for the half-open cleanup 
notification.
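
A sketch of waiting for that notification via the cleanup callback; the
reconnect trigger is hypothetical, while the cleanup enum comes from
vnet/session:

    static void
    my_cleanup_callback (session_t *s, session_cleanup_ntf_t ntf)
    {
      /* SESSION_CLEANUP_TRANSPORT fires first; the fixed local port only
       * becomes reusable once the session itself is cleaned up */
      if (ntf == SESSION_CLEANUP_SESSION)
        my_app_signal_reconnect (); /* hypothetical: event to ctrl process */
    }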

> 
> Should I also register a half-open callback, or is there some other reason 
> that leads to this failure?
> 

[fc] Yes, see above.

Regards, 
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/38526

> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Mon, Mar 20, 2023 at 06:22:
>> Hi, 
>> 
>> When you abort the connection, is it fully established or half-open? 
>> Half-opens are cleaned up by the owner thread after a timeout, but the 
>> 5-tuple should be assigned to the fully established session by that point. 
>> tcp_half_open_connection_cleanup does not clean up the bihash; instead, 
>> session_stream_connect_notify does, once tcp connect returns either success 
>> or failure. 
>> 
>> So a few questions:
>> - is it accurate to assume you have a builtin vpp app and rely only on C 
>> apis to interact with host stack?
>> - on what thread (main or first worker) do you call vnet_connect?
>> - what api do you use to close the session? 
>> - what version of vpp is this because lines don’t match vpp latest?
>> 
>> Regards,
>> Florin
>> 
>> > On Mar 19, 2023, at 2:08 AM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>> > 
>> > Hi list,
>> > 
>> > Recently, our application constantly triggered the following abort issue, 
>> > which interrupts our connectivity for a while:
>> > 
>> > Mar 19 16:11:26 ubuntu vnet[2565933]: received signal SIGABRT, PC 
>> > 0x7fefd3b2000b
>> > Mar 19 16:11:26 ubuntu vnet[2565933]: 
>> > /home/fortitude/glx/vpp/src/vnet/tcp/tcp_input.c:3004 (tcp46_input_inline) 
>> > assertion `tcp_lookup_is_valid (tc0, b[0], tcp_buffer_hdr (b[0]))' fails
>> > 
>> > Our scenario is quite simple: we make 4 parallel tcp connections (using 
>> > 4 fixed source ports) to a remote vpp stack (fixed ip and port) and do 
>> > some keepalive in our application layer. Since we only use the vpp tcp 
>> > stack to keep the middle boxes happy with the connection, we do not 
>> > actually use the tcp stack's data transport.
>> > 
>> > However, since network conditions are complex, we frequently need to abort 
>> > the connection and reconnect.
>> > 
>> > I keep merging upstream session and tcp fixes, but the issue is still not 
>> > fixed. What I found is that in some cases 
>> > tcp_half_open_connection_cleanup may not delete the half-open session 
>> > from the lookup table (bihash) while the session index gets reallocated 
>> > to another connection.
>> > 
>> > I hope the list can provide some hints on how to overcome this issue; 
>> > thanks a lot.
>> > 
>> > 
>> > 
>> 
>> 
>> 
>> 
> 
> 





Re: [vpp-dev] #vnet A bug which may cause assertion error in vnet/session

2023-03-19 Thread Florin Coras
I just tried iperf3 in cut-through mode, i.e., server and client attached to 
the same vpp instance running 4 workers, with 128 connections, and this seems 
to be working fine.

Could you try that out and see if it’s also working for you? It might be that 
this is something specific to how Redis uses sockets, so to reproduce we’ll 
need to replicate your testbed.

Regards,
Florin

> On Mar 19, 2023, at 2:58 PM, Florin Coras via lists.fd.io 
>  wrote:
> 
> Hi, 
> 
> That may very well be a problem introduced by the move of connects to first 
> worker. Unfortunately, we don't have tests for all of those corner cases 
> yet.
> 
> However, to replicate this issue, could you provide a bit more details about 
> your setup and the exact backtrace? It looks like you're leveraging cut-through 
> sessions, so the server and client are attached to the same vpp instance? 
> Also, could you try vpp latest to check if the issue still persists? 
> 
> Regards,
> Florin
> 
>> On Mar 19, 2023, at 1:53 AM, chenwei...@outlook.com wrote:
>> 
>> Hi vpp-team,
>> I'm new to VPP and I'm trying to run Redis 6.0.18 in VCL with LD_PRELOAD 
>> using VPP 22.10 and VPP 23.02. I found that an assert fails frequently in VPP 
>> 23.02; after checking, I found that the assert fails in the session_get 
>> function in vnet/session/session.h. The cause was an invalid session_id with 
>> a value of -1 (or ~0).
>> This function is called by the session_half_open_migrate_notify function in 
>> vnet/session/session.c, which is called by ct_accept_one in 
>> vnet/session/application_local.c. Function ct_accept_one is called because 
>> of an accept RPC request, issued from the ct_connect function in 
>> vnet/session/application_local.c and handled by the 
>> session_event_dispatch_ctrl function. Function ct_connect allocates and 
>> initializes a half-open transport object; however, its c_s_index value is -1 
>> (or ~0), i.e., no session is allocated. Allocating a session is implemented 
>> by calling session_alloc_for_half_open in session_open_vc, the caller of 
>> ct_connect (located in vnet/session/session.c). Therefore, I think the 
>> assertion failure is a case where the ct_accept_one function accesses a 
>> half-open tc without a session having been allocated.
>> I found that this problem does not exist on VPP 22.10. I checked the 
>> patches between 22.10 and 23.02 and found "session: move connects to first 
>> worker (https://gerrit.fd.io/r/c/vpp/+/35713)", which might be related to 
>> this issue, but I can't give a definite statement and I don't know how to 
>> fix it. I would be very grateful if you could address this issue.
>> Thanks,
>> 
>> 
>> 
> 
> 
> 
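
A guard matching the analysis quoted above would look roughly like this (an
assumption for illustration, not the actual fix):

    /* Skip the migrate notification while the half-open ct connection has
     * no session attached (c_s_index == ~0, as described in the report). */
    if (hc->s_index == (u32) ~0)
      return;
    session_half_open_migrate_notify (hc);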





Re: [vpp-dev] Sigabrt in tcp46_input_inline for tcp_lookup_is_valid

2023-03-19 Thread Florin Coras
Hi, 

When you abort the connection, is it fully established or half-open? Half-opens 
are cleaned up by the owner thread after a timeout, but the 5-tuple should be 
assigned to the fully established session by that point. 
tcp_half_open_connection_cleanup does not clean up the bihash; instead, 
session_stream_connect_notify does, once tcp connect returns either success or 
failure. 
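
To illustrate the failure mode, a self-contained sketch of a stale bihash
entry (assumes a vppinfra build environment; the keys are stand-ins for the
packed 5-tuple):

    #include <vppinfra/mem.h>
    #include <vppinfra/bihash_16_8.h>
    #include <vppinfra/bihash_template.c>

    int
    main (void)
    {
      clib_bihash_16_8_t lookup_table;
      clib_bihash_kv_16_8_t kv, result;

      clib_mem_init (0, 64ULL << 20);
      clib_bihash_init_16_8 (&lookup_table, "session lookup", 2048, 32 << 20);

      /* half-open connect: add 5-tuple -> session index 7 */
      kv.key[0] = 0x0a00000100000002; /* stand-in for ips/ports/proto */
      kv.key[1] = 0x0006000014510bb8;
      kv.value = 7;
      clib_bihash_add_del_16_8 (&lookup_table, &kv, 1 /* is_add */);

      /* ... suppose the half-open is freed without deleting the entry and
       * index 7 is later reallocated to a different connection ... */

      /* the old 5-tuple still resolves to 7: exactly the mismatch that
       * tcp_lookup_is_valid () asserts on */
      if (clib_bihash_search_16_8 (&lookup_table, &kv, &result) == 0)
        ; /* stale hit: result.value is still 7, now a reused index */

      return 0;
    }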

So a few questions:
- is it accurate to assume you have a builtin vpp app and rely only on C apis 
to interact with host stack?
- on what thread (main or first worker) do you call vnet_connect?
- what api do you use to close the session? 
- what version of vpp is this because lines don’t match vpp latest?

Regards,
Florin

> On Mar 19, 2023, at 2:08 AM, Zhang Dongya  wrote:
> 
> Hi list,
> 
> Recently, our application constantly triggered the following abort issue, 
> which interrupts our connectivity for a while:
> 
> Mar 19 16:11:26 ubuntu vnet[2565933]: received signal SIGABRT, PC 
> 0x7fefd3b2000b
> Mar 19 16:11:26 ubuntu vnet[2565933]: 
> /home/fortitude/glx/vpp/src/vnet/tcp/tcp_input.c:3004 (tcp46_input_inline) 
> assertion `tcp_lookup_is_valid (tc0, b[0], tcp_buffer_hdr (b[0]))' fails
> 
> Our scenario is quite simple: we make 4 parallel tcp connections (using 4 
> fixed source ports) to a remote vpp stack (fixed ip and port) and do some 
> keepalive in our application layer. Since we only use the vpp tcp stack to 
> keep the middle boxes happy with the connection, we do not actually use the 
> tcp stack's data transport.
> 
> However, since network conditions are complex, we frequently need to abort 
> the connection and reconnect.
> 
> I keep merging upstream session and tcp fixes, but the issue is still not 
> fixed. What I found is that in some cases 
> tcp_half_open_connection_cleanup may not delete the half-open session from 
> the lookup table (bihash) while the session index gets reallocated to another 
> connection.
> 
> I hope the list can provide some hints on how to overcome this issue; thanks 
> a lot.
> 
> 
> 





Re: [vpp-dev] #vnet A bug which may cause assertion error in vnet/session

2023-03-19 Thread Florin Coras
Hi, 

That may very well be a problem introduced by the move of connects to first 
worker. Unfortunately, we don't have tests for all of those corner cases yet.

However, to replicate this issue, could you provide a bit more details about 
your setup and the exact backtrace? It looks like you're leveraging cut-through 
sessions, so the server and client are attached to the same vpp instance? Also, 
could you try vpp latest to check if the issue still persists? 

Regards,
Florin

> On Mar 19, 2023, at 1:53 AM, chenwei...@outlook.com wrote:
> 
> Hi vpp-team,
> I'm new to VPP and I'm trying to run Redis 6.0.18 in VCL with LD_PRELOAD 
> using VPP 22.10 and VPP 23.02. I found that an assert fails frequently in VPP 
> 23.02; after checking, I found that the assert fails in the session_get 
> function in vnet/session/session.h. The cause was an invalid session_id with 
> a value of -1 (or ~0).
> This function is called by the session_half_open_migrate_notify function in 
> vnet/session/session.c, which is called by ct_accept_one in 
> vnet/session/application_local.c. Function ct_accept_one is called because of 
> an accept RPC request, issued from the ct_connect function in 
> vnet/session/application_local.c and handled by the 
> session_event_dispatch_ctrl function. Function ct_connect allocates and 
> initializes a half-open transport object; however, its c_s_index value is -1 
> (or ~0), i.e., no session is allocated. Allocating a session is implemented 
> by calling session_alloc_for_half_open in session_open_vc, the caller of 
> ct_connect (located in vnet/session/session.c). Therefore, I think the 
> assertion failure is a case where the ct_accept_one function accesses a 
> half-open tc without a session having been allocated.
> I found that this problem does not exist on VPP 22.10. I checked the 
> patches between 22.10 and 23.02 and found "session: move connects to first 
> worker (https://gerrit.fd.io/r/c/vpp/+/35713)", which might be related to 
> this issue, but I can't give a definite statement and I don't know how to 
> fix it. I would be very grateful if you could address this issue.
> Thanks,
> 
> 
> 





Re: [vpp-dev] can't establish tcp connection with new introduced transport_endpoint_freelist

2023-03-16 Thread Florin Coras
Great! Thanks for confirming!

Regards,
Florin

> On Mar 16, 2023, at 8:29 PM, Zhang Dongya  wrote:
> 
> Yes, this is exactly what I want to do; this patch works as expected, thanks 
> a lot.
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Wed, Mar 15, 2023 at 01:22:
>> Hi, 
>> 
>> Are you looking for behavior similar to the one used when random local ports 
>> are allocated, where, if the port is in use, we check whether the 5-tuple is 
>> available? 
>> 
>> I don't think we explicitly supported this before, but here's a patch [1]. 
>> 
>> Regards,
>> Florin
>> 
>> [1] https://gerrit.fd.io/r/c/vpp/+/38486
>> 
>> 
>>> On Mar 14, 2023, at 12:56 AM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>> 
>>> Just used this patch, and the connection can be reconnected after being closed.
>>> 
>>> However, I found another possible bug when using the same local ip + local 
>>> port for different target servers, because transport_endpoint_mark_used 
>>> returns an error if it finds the local ip + port already created.
>>> 
>>> I think it should increase the refcnt instead if it finds the 6-tuple is unique.
>>> 
>>>> static int
>>>> transport_endpoint_mark_used (u8 proto, ip46_address_t *ip, u16 port)
>>>> {
>>>>   transport_main_t *tm = &transport_main;
>>>>   local_endpoint_t *lep;
>>>>   u32 tei;
>>>> 
>>>>   ASSERT (vlib_get_thread_index () <= transport_cl_thread ());
>>>>   // BUG??? maybe should allow reuse ??? 
>>>>   tei =
>>>>     transport_endpoint_lookup (&tm->local_endpoints_table, proto, ip,
>>>>                                port);
>>>>   if (tei != ENDPOINT_INVALID_INDEX)
>>>>     return SESSION_E_PORTINUSE;
>>>> 
>>>>   /* Pool reallocs with worker barrier */
>>>>   lep = transport_endpoint_alloc ();
>>>>   clib_memcpy_fast (&lep->ep.ip, ip, sizeof (*ip));
>>>>   lep->ep.port = port;
>>>>   lep->proto = proto;
>>>>   lep->refcnt = 1;
>>>> 
>>>>   transport_endpoint_table_add (&tm->local_endpoints_table, proto,
>>>>                                 &lep->ep, lep - tm->local_endpoints);
>>>> 
>>>>   return 0;
>>>> }
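
A sketch of that refcnt alternative (the _v2 name is hypothetical; this is a
suggestion, not merged code):

    static int
    transport_endpoint_mark_used_v2 (u8 proto, ip46_address_t *ip, u16 port)
    {
      transport_main_t *tm = &transport_main;
      local_endpoint_t *lep;
      u32 tei;

      tei = transport_endpoint_lookup (&tm->local_endpoints_table, proto,
                                       ip, port);
      if (tei != ENDPOINT_INVALID_INDEX)
        {
          /* share the endpoint instead of failing; the caller is assumed
           * to have vetted that the full 6-tuple is unique */
          lep = pool_elt_at_index (tm->local_endpoints, tei);
          lep->refcnt += 1;
          return 0;
        }

      /* unchanged tail: allocate and index a fresh endpoint */
      lep = transport_endpoint_alloc ();
      clib_memcpy_fast (&lep->ep.ip, ip, sizeof (*ip));
      lep->ep.port = port;
      lep->proto = proto;
      lep->refcnt = 1;

      transport_endpoint_table_add (&tm->local_endpoints_table, proto,
                                    &lep->ep, lep - tm->local_endpoints);
      return 0;
    }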
>>> 
>>> Florin Coras <fcoras.li...@gmail.com> wrote on Tue, Mar 14, 2023 at 11:38:
>>>> Hi, 
>>>> 
>>>> Could you try this out [1]? I’ve hit this issue myself today but with udp 
>>>> sessions. Unfortunately, as you’ve correctly pointed out, we were forcing 
>>>> a cleanup only on the non-fixed local port branch. 
>>>> 
>>>> Regards, 
>>>> Florin
>>>> 
>>>> [1] https://gerrit.fd.io/r/c/vpp/+/38473
>>>> 
>>>>> On Mar 13, 2023, at 7:35 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>>>> 
>>>>> Hi list,
>>>>> 
>>>>> We have updated our code base with the upstream session changes and found 
>>>>> a possible bug which causes tcp connections to no longer be established.
>>>>> 
>>>>> Our scenario is that we connect to a remote tcp server with a specified 
>>>>> local port and local ip; however, new vpp code has introduced a 
>>>>> lcl_endpts_freelist which is only flushed when the number of pending local 
>>>>> endpoints exceeds the limit (32) or when transport_alloc_local_port is 
>>>>> called.
>>>>> 
>>>>> However, since we specify the local port and local ip and the total 
>>>>> session count is limited (< 32), transport_cleanup_freelist will never be 
>>>>> called, so the local endpoint used by the previous session will not be 
>>>>> released after that session is aborted.
>>>>> 
>>>>> I think we should also try to free the list in this case, as I did in the 
>>>>> following code:
>>>>> 
>>>>>> int
>>>>>> transport_alloc_local_endpoint (u8 proto, transport_endpoint_cfg_t * rmt_cfg,
>>>>>>                                 ip46_address_t * lcl_addr, u16 * lcl_port)
>>>>>> {
>>>>>>   // ZDY:
>>>>>>   transport_main_t *tm = &transport_main;
>>>>>>   transport_endpoint_t *rmt = (transport_endpoint_t *) rmt_cfg;
>>>>>>   session_error_t error;

Re: [vpp-dev] can't establish tcp connection with new introduced transport_endpoint_freelist

2023-03-14 Thread Florin Coras
Hi, 

Are you looking for behavior similar to the one used when random local ports are 
allocated, where, if the port is in use, we check whether the 5-tuple is 
available? 

I don't think we explicitly supported this before, but here's a patch [1]. 

Regards,
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/38486


> On Mar 14, 2023, at 12:56 AM, Zhang Dongya  wrote:
> 
> Just used this patch, and the connection can be reconnected after being closed.
> 
> However, I found another possible bug when using the same local ip + local 
> port for different target servers, because transport_endpoint_mark_used 
> returns an error if it finds the local ip + port already created.
> 
> I think it should increase the refcnt instead if it finds the 6-tuple is unique.
> 
>> static int
>> transport_endpoint_mark_used (u8 proto, ip46_address_t *ip, u16 port)
>> {
>>   transport_main_t *tm = &transport_main;
>>   local_endpoint_t *lep;
>>   u32 tei;
>> 
>>   ASSERT (vlib_get_thread_index () <= transport_cl_thread ());
>>   // BUG??? maybe should allow reuse ??? 
>>   tei =
>>     transport_endpoint_lookup (&tm->local_endpoints_table, proto, ip, port);
>>   if (tei != ENDPOINT_INVALID_INDEX)
>>     return SESSION_E_PORTINUSE;
>> 
>>   /* Pool reallocs with worker barrier */
>>   lep = transport_endpoint_alloc ();
>>   clib_memcpy_fast (&lep->ep.ip, ip, sizeof (*ip));
>>   lep->ep.port = port;
>>   lep->proto = proto;
>>   lep->refcnt = 1;
>> 
>>   transport_endpoint_table_add (&tm->local_endpoints_table, proto, &lep->ep,
>>                                 lep - tm->local_endpoints);
>> 
>>   return 0;
>> }
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Tue, Mar 14, 2023 at 11:38:
>> Hi, 
>> 
>> Could you try this out [1]? I’ve hit this issue myself today but with udp 
>> sessions. Unfortunately, as you’ve correctly pointed out, we were forcing a 
>> cleanup only on the non-fixed local port branch. 
>> 
>> Regards, 
>> Florin
>> 
>> [1] https://gerrit.fd.io/r/c/vpp/+/38473
>> 
>>> On Mar 13, 2023, at 7:35 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>> 
>>> Hi list,
>>> 
>>> We have updated our code base with the upstream session changes and found a 
>>> possible bug which causes tcp connections to no longer be established.
>>> 
>>> Our scenario is that we connect to a remote tcp server with a specified 
>>> local port and local ip; however, new vpp code has introduced a 
>>> lcl_endpts_freelist which is only flushed when the number of pending local 
>>> endpoints exceeds the limit (32) or when transport_alloc_local_port is 
>>> called.
>>> 
>>> However, since we specify the local port and local ip and the total session 
>>> count is limited (< 32), transport_cleanup_freelist will never be called, so 
>>> the local endpoint used by the previous session will not be released after 
>>> that session is aborted.
>>> 
>>> I think we should also try to free the list in this case, as I did in the 
>>> following code:
>>> 
>>>> int
>>>> transport_alloc_local_endpoint (u8 proto, transport_endpoint_cfg_t * rmt_cfg,
>>>>                                 ip46_address_t * lcl_addr, u16 * lcl_port)
>>>> {
>>>>   // ZDY:
>>>>   transport_main_t *tm = &transport_main;
>>>>   transport_endpoint_t *rmt = (transport_endpoint_t *) rmt_cfg;
>>>>   session_error_t error;
>>>>   int port;
>>>> 
>>>>   /*
>>>>    * Find the local address
>>>>    */
>>>>   if (ip_is_zero (&rmt_cfg->peer.ip, rmt_cfg->peer.is_ip4))
>>>>     {
>>>>       error = transport_find_local_ip_for_remote (&rmt_cfg->peer.sw_if_index,
>>>>                                                   rmt, lcl_addr);
>>>>       if (error)
>>>>         return error;
>>>>     }
>>>>   else
>>>>     {
>>>>       /* Assume session layer vetted this address */
>>>>       clib_memcpy_fast (lcl_addr, &rmt_cfg->peer.ip,
>>>>                         sizeof (rmt_cfg->peer.ip));
>>>>     }
>>>> 
>>>>   /*
>>>>    * Allocate source port
>>>>    */
>>>>   if (rmt_cfg->peer.port == 0)
>>>>     {
>>>>       port = transport_alloc_local_port (proto, lcl_addr, rmt_cfg);
>>>>       if (port < 1)
>>>>         return SESSION_E_NOPORT;
>>>>       *lcl_port = port;
>>>>     }
>>>>   else
>>>>     {
>>>>       port = clib_net_to_host_u16 (rmt_cfg->peer.port);
>>>>       *lcl_port = port;
>>>> 
>>>>       // ZDY: need to add this cleanup because in the specified src port
>>>>       // case we will not run transport_alloc_local_port, so the
>>>>       // freelist would only be freed when the list is full (> 32).
>>>>       /* Cleanup freelist if need be */
>>>>       if (vec_len (tm->lcl_endpts_freelist))
>>>>         transport_cleanup_freelist ();
>>>> 
>>>>       return transport_endpoint_mark_used (proto, lcl_addr, port);
>>>>     }
>>>> 
>>>>   return 0;
>>>> }
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 





Re: [vpp-dev] can't establish tcp connection with new introduced transport_endpoint_freelist

2023-03-13 Thread Florin Coras
Hi, 

Could you try this out [1]? I’ve hit this issue myself today but with udp 
sessions. Unfortunately, as you’ve correctly pointed out, we were forcing a 
cleanup only on the non-fixed local port branch. 

Regards, 
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/38473

> On Mar 13, 2023, at 7:35 PM, Zhang Dongya  wrote:
> 
> Hi list,
> 
> We have updated our code base with the upstream session changes and found a 
> possible bug which causes tcp connections to no longer be established.
> 
> Our scenario is that we connect to a remote tcp server with a specified 
> local port and local ip; however, new vpp code has introduced a 
> lcl_endpts_freelist which is only flushed when the number of pending local 
> endpoints exceeds the limit (32) or when transport_alloc_local_port is called.
> 
> However, since we specify the local port and local ip and the total session 
> count is limited (< 32), transport_cleanup_freelist will never be called, so 
> the local endpoint used by the previous session will not be released after 
> that session is aborted.
> 
> I think we should also try to free the list in this case, as I did in the 
> following code:
> 
>> int
>> transport_alloc_local_endpoint (u8 proto, transport_endpoint_cfg_t * rmt_cfg,
>>                                 ip46_address_t * lcl_addr, u16 * lcl_port)
>> {
>>   // ZDY:
>>   transport_main_t *tm = &transport_main;
>>   transport_endpoint_t *rmt = (transport_endpoint_t *) rmt_cfg;
>>   session_error_t error;
>>   int port;
>> 
>>   /*
>>    * Find the local address
>>    */
>>   if (ip_is_zero (&rmt_cfg->peer.ip, rmt_cfg->peer.is_ip4))
>>     {
>>       error = transport_find_local_ip_for_remote (&rmt_cfg->peer.sw_if_index,
>>                                                   rmt, lcl_addr);
>>       if (error)
>>         return error;
>>     }
>>   else
>>     {
>>       /* Assume session layer vetted this address */
>>       clib_memcpy_fast (lcl_addr, &rmt_cfg->peer.ip,
>>                         sizeof (rmt_cfg->peer.ip));
>>     }
>> 
>>   /*
>>    * Allocate source port
>>    */
>>   if (rmt_cfg->peer.port == 0)
>>     {
>>       port = transport_alloc_local_port (proto, lcl_addr, rmt_cfg);
>>       if (port < 1)
>>         return SESSION_E_NOPORT;
>>       *lcl_port = port;
>>     }
>>   else
>>     {
>>       port = clib_net_to_host_u16 (rmt_cfg->peer.port);
>>       *lcl_port = port;
>> 
>>       // ZDY: need to add this cleanup because in the specified src port
>>       // case we will not run transport_alloc_local_port, so the
>>       // freelist would only be freed when the list is full (> 32).
>>       /* Cleanup freelist if need be */
>>       if (vec_len (tm->lcl_endpts_freelist))
>>         transport_cleanup_freelist ();
>> 
>>       return transport_endpoint_mark_used (proto, lcl_addr, port);
>>     }
>> 
>>   return 0;
>> }
> 
> 
> 
> 





Re: COMMERCIAL BULK: Re: [E] COMMERCIAL BULK: [vpp-dev] TLS app stuck after burst of traffic #vpp-hoststack

2023-03-07 Thread Florin Coras
Hi Kevin, 

Understood. If you manage to confirm things are stable on latest vpp, maybe try 
to backport patches that changed tls tx functions. Worst case, you’ll have to 
backport some session layer patches as well. 

Regards,
Florin

> On Mar 7, 2023, at 5:24 AM, Kevin Yan  wrote:
> 
> Hi Florin,
> For some reasons we need to stay on vpp 20.09 for a while; that's why I hope 
> I can fix the issue on this version. Actually, I made some code changes in the 
> TLS layer to let the session node force-reschedule the TLS session when the tx 
> svm fifo is full or at least not empty. After the changes, the tx svm fifo can 
> recover after stopping the traffic, but I think this is not the correct way to 
> solve the issue.
>  
> Anyway, let me see if I can test the same case with the latest vpp code.
>  
> BRs,
> Kevin
>  
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Florin Coras
> Sent: Tuesday, March 7, 2023 2:49 PM
> To: vpp-dev <vpp-dev@lists.fd.io>
> Cc: Olivia Dunham <theoliviadun...@gmail.com>
> Subject: COMMERCIAL BULK: Re: [E] COMMERCIAL BULK: [vpp-dev] TLS app stuck 
> after burst of traffic #vpp-hoststack
>  
> Hi Kevin,  
>  
> That’s a really old version of vpp. TLS has seen several improvements since 
> then in areas including scheduling after incomplete writes. If you get a 
> chance to test vpp latest or a more recent release, do let us know if the 
> issue still persists. 
>  
> Regards,
> Florin
> 
> 
> On Mar 6, 2023, at 5:42 PM, Kevin Yan via lists.fd.io 
> <kevin.yan=mavenir@lists.fd.io> wrote:
>  
> Hi @Olivia Dunham <theoliviadun...@gmail.com>,
> Recently I hit the exact same issue: the TLS TX svm fifo gets full after a 
> burst of traffic and never resumes, while the TCP TX svm fifo is empty at that 
> time. I'm using VPP 20.09, and I believe there is some issue in the TLS layer. 
> Did you fix the issue later? If yes, can you share the solution?
>  
> BRs,
> Kevin
>  
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Olivia Dunham
> Sent: Tuesday, September 14, 2021 8:58 PM
> To: vpp-dev@lists.fd.io
> Subject: [E] COMMERCIAL BULK: [vpp-dev] TLS app stuck after burst of traffic 
> #vpp-hoststack
>  
> During a sudden burst of traffic, the TCP fifo gets full. When this happens, 
> the openssl TLS app de-schedules the transport. But once the TCP data is sent 
> out, TLS does not resume. VPP ends up in a state where the TCP fifo is empty 
> but the TLS fifo is full, and no more Tx happens on the TLS fifo.
> 
> VPP version: 21.01
> 
> We came across this commit - session tls: deq notifications for custom tx 
> <https://github.com/FDio/vpp/commit/1e6a0f64653c8142fa7032aba127ab4894bafc3c>
> Not sure what issue is fixed by this commit, but it doesn't seem to fix the 
> above-mentioned issue.
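
For reference, a sketch of the mechanism that commit title refers to ("deq
notifications for custom tx"); the surrounding logic is an assumption, while
the svm fifo calls are the public API:

    #include <svm/svm_fifo.h>

    static int
    tls_custom_tx_sketch (svm_fifo_t *tcp_tx_fifo, u8 *data, u32 len)
    {
      int wrote = svm_fifo_enqueue (tcp_tx_fifo, len, data);

      if (wrote < (int) len)
        /* tcp fifo (nearly) full: ask to be notified when tcp dequeues so
         * the tls session is rescheduled instead of staying stuck */
        svm_fifo_add_want_deq_ntf (tcp_tx_fifo, SVM_FIFO_WANT_DEQ_NOTIF);

      return wrote;
    }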

Re: [E] COMMERCIAL BULK: [vpp-dev] TLS app stuck after burst of traffic #vpp-hoststack

2023-03-06 Thread Florin Coras
Hi Kevin, 

That’s a really old version of vpp. TLS has seen several improvements since 
then in areas including scheduling after incomplete writes. If you get a chance 
to test vpp latest or a more recent release, do let us know if the issue still 
persists. 

Regards,
Florin

> On Mar 6, 2023, at 5:42 PM, Kevin Yan via lists.fd.io 
>  wrote:
> 
> Hi @Olivia Dunham,
> Recently I met the exact same issue: the TLS TX svm fifo gets full after a 
> burst of traffic and never resumes, while the TCP TX svm fifo is empty at 
> that time. I'm using VPP 20.09, and I believe there is some issue in the TLS 
> layer, so did you fix the issue later? If yes, can you share the solution.
>  
> BRs,
> Kevin
>  
> From: vpp-dev@lists.fd.io On Behalf Of Olivia Dunham
> Sent: Tuesday, September 14, 2021 8:58 PM
> To: vpp-dev@lists.fd.io 
> Subject: [E] COMMERCIAL BULK: [vpp-dev] TLS app stuck after burst of traffic 
> #vpp-hoststack
>  
> During a sudden burst of traffic, the TCP fifo gets full. When this happens, 
> the openssl TLS app de-schedules the transport. But once the TCP data is sent 
> out, TLS does not resume. VPP ends up in a state where the TCP fifo is empty, 
> but the TLS fifo is full and no more Tx happens on the TLS fifo.
> 
> VPP version: 21.01
> 
> We came across this commit - session tls: deq notifications for custom tx 
> <https://github.com/FDio/vpp/commit/1e6a0f64653c8142fa7032aba127ab4894bafc3c>
> Not sure what issue is fixed by this commit, but it doesn't seem to fix 
> the above-mentioned issue.
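
A quick way to observe the stuck state described above is the session CLI,
which reports per-connection rx/tx fifo occupancy in the Rx-f and Tx-f
columns; a wedged TLS session shows a full Tx-f while the underlying TCP
connection's Tx-f stays empty:

vpp# show session verbose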





Re: [vpp-dev] vpp+nginx #vpp-hoststack

2023-03-06 Thread Florin Coras



[attachment: nginx.conf (binary data)]




Re: [vpp-dev] Issue related to VCL Session Migration

2023-02-07 Thread Florin Coras
Hi Vivek, Aslam, 

That’s an interesting use case. We typically recommend using VCL natively and 
only if that’s not possible use LDP, which implies VLS locking. We haven’t had 
many VLS native integration efforts.

Coming back to your problem, is there any particular reason why you’re not 
registering all your app’s pthreads as vcl workers and having them listen on 
the same ip:port? Then vpp would just distribute incoming connections to all 
workers, which can handle tls establishment and io in parallel.
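
A rough sketch of that per-pthread pattern, assuming the vcl api from
<vcl/vppcom.h>; the endpoint handling and TLS choice are illustrative and
error handling is omitted:

#include <vcl/vppcom.h>

/* vppcom_app_create() is assumed to have run once in main() */
static void *
worker_fn (void *arg)
{
  vppcom_endpt_t *listen_ep = arg;	/* shared ip:port */
  int lsh, csh;

  vppcom_worker_register ();		/* one vcl worker per pthread */

  lsh = vppcom_session_create (VPPCOM_PROTO_TLS, 0 /* blocking */);
  vppcom_session_bind (lsh, listen_ep);
  vppcom_session_listen (lsh, 10 /* backlog */);

  /* vpp distributes incoming connections across the registered workers */
  while ((csh = vppcom_session_accept (lsh, 0, 0)) >= 0)
    {
      /* tls establishment and io for csh happen on this worker */
      vppcom_session_close (csh);
    }
  return 0;
}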

If that’s not possible, you’re doing the right thing by using VLS. I’m assuming 
you’re creating those epoll sessions within the pthreads/workers? If that’s not 
the case, and you have sessions added to those epfds, you should get a "can't 
migrate nonempty epoll session” error. 

If on the other hand you accept sessions on only one worker, and then hand them 
off to other workers, you’ll need to change ownership/share those sessions in 
vpp as well, i.e., modify vls_clone_and_share_rpc_handler and have it call 
vls_init_share_session (which it currently does not when mt workers is on). 
This is the least efficient option as it involves a lot of work per session.  

Regards,
Florin

> On Feb 7, 2023, at 12:54 PM, Vivek Gupta  wrote:
> 
> Hi,
>  
> We have SSL based application using VCL. 
>  
> - Currently, we have one thread which does the epoll, TLS Session 
> establishment, Read/Write for multiple tunnels.
>  
> - Now we want to split the functionality in different threads, such that TLS 
> Session establishment happens in a separate thread,
>   read/write happens in another thread and Epoll is happening in separate 
> thread.
>   
> - To do this, we want to migrate the VCL sessions from one VCL worker to 
> another VCL worker using the VLS wrapper.
>  
> - We notice that post the migration read/write is working fine from the 
> migrated thread, but EPOLL indications are not coming to either the
>   old thread or the new thread.
>  
> Since we are using VLS, we have set multi-thread-workers option to TRUE.
>  
> If we use the single VCL worker based VLS option, epoll is working fine for 
> us. But it requires a lot of locks, hence we are trying to avoid that option.
>  
> Please let us know if epoll is supported when migrating VCL sessions with the 
> multi-thread-workers option set to true. Also, any pointers on specific 
> changes to be done for that would help a lot.
>  
>  
> Regards,
> Vivek





Re: [vpp-dev] Delete node process

2023-02-06 Thread Florin Coras
Hi, 

You can disable a node using vlib_node_set_state. There’s no api to unregister 
a node. 
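
For reference, a minimal sketch, assuming my_node_index holds the node's
index:

/* disable the node instead of deleting it */
vlib_node_set_state (vm, my_node_index, VLIB_NODE_STATE_DISABLED);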

Regards,
Florin

> On Feb 6, 2023, at 12:00 PM, amine belroul  wrote:
> 
> Hello, 
> How can I delete a process node from the vpp runtime?
> For now I can only make it done, but not deleted.
> 
> 
> Thank you. 
> 
> 
> 





Re: [vpp-dev] vpp+nginx #vpp-hoststack

2023-01-30 Thread Florin Coras
Hi, 

Could you provide a simplified description of your topology and a bare bones 
nginx config? We could try to repro this in the hs-test infra we’ve been 
recently developing. See here [1]. 

Could you also try out this patch [2] I’ve been toying with recently, to see 
if it improves anything?

Regards,
Florin

[1] https://git.fd.io/vpp/tree/extras/hs-test
[2] https://gerrit.fd.io/r/c/vpp/+/38080

> On Jan 30, 2023, at 5:55 PM, first_se...@163.com wrote:
> 
> I use the wrk tool like this: wrk -c 20 -t 10 -d 40 http://172.30.4.23:80 
> 
> 





Re: [vpp-dev] vpp+nginx #vpp-hoststack

2023-01-30 Thread Florin Coras
Hi, 

I’m guessing you’re running out of ports on connections from nginx/vpp to the 
actual server, since you’re using fixed ips and a fixed destination port? Check 
how many sessions you have opened with “show session”.

Out of curiosity, what are you using mirroring for? Testing? 

Regards,
Florin

> On Jan 30, 2023, at 1:05 AM, first_se...@163.com wrote:
> 
> When the reverse proxy does not configure mirroring, this issue does not 
> exist. But when the nginx reverse proxy uses the mirror module, it occurs, 
> with a configuration like this:
> 
> 
> 
> 





Re: [vpp-dev] VPP crashes because of API segment exhaustion

2023-01-24 Thread Florin Coras
Hi Alexander, 

Quick reply. 

Nice bug report! Agreed that it looks like vl_api_clnt_process sleeps, probably 
because it hit a queue size of 0, but memclnt_queue_callback or the timeout, 
albeit 20s is a lot, should wake it up. 

So, given that QUEUE_SIGNAL_EVENT is set, the only thing that comes to mind is 
that maybe somehow vlib_process_signal_event context gets corrupted. Could you 
run a debug image and see if anything asserts? Is vlib_process_signal_event 
called by chance from a worker?

Regards,
Florin

> On Jan 24, 2023, at 7:59 AM, Alexander Chernavin via lists.fd.io 
>  wrote:
> 
> Hello all,
> 
> We are experiencing VPP crashes that occur a few days after the startup 
> because of API segment exhaustion. Increasing API segment size to 256MB 
> didn't stop the crashes from occurring.
> 
> Can you please take a look at the description below and tell us if you have 
> seen similar issues or have any ideas what the cause may be?
> 
> Given:
> VPP 22.10
> 2 worker threads
> API segment size is 256MB
> ~893k IPv4 routes and ~160k IPv6 routes added
> 
> Backtrace:
>> [..]
>> #32660 0x55b02f606896 in os_panic () at 
>> /home/jenkins/tnsr-pkgs/work/vpp/src/vpp/vnet/main.c:414
>> #32661 0x7fce3c0ec740 in clib_mem_heap_alloc_inline (heap=0x0, 
>> size=, align=8, 
>> os_out_of_memory_on_failure=1) at 
>> /home/jenkins/tnsr-pkgs/work/vpp/src/vppinfra/mem_dlmalloc.c:613
>> #32662 clib_mem_alloc (size=)
>> at /home/jenkins/tnsr-pkgs/work/vpp/src/vppinfra/mem_dlmalloc.c:628
>> #32663 0x7fce3dc4ee6f in vl_msg_api_alloc_internal (vlib_rp=0x130026000, 
>> nbytes=69, pool=0, 
>> may_return_null=0) at 
>> /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memory_shared.c:179
>> #32664 0x7fce3dc592cd in vl_api_rpc_call_main_thread_inline (force_rpc=0 
>> '\000', 
>> fp=, data=, data_length=)
>> at /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memclnt_api.c:617
>> #32665 vl_api_rpc_call_main_thread (fp=0x7fce3c74de70 , 
>> data=0x7fcc372bdc00 "& \001$ ", data_length=28)
>> at /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memclnt_api.c:641
>> #32666 0x7fce3cc7fe2d in icmp6_neighbor_solicitation_or_advertisement 
>> (vm=0x7fccc0864000, 
>> frame=0x7fcccd7d2d40, is_solicitation=1, node=)
>> at /home/jenkins/tnsr-pkgs/work/vpp/src/vnet/ip6-nd/ip6_nd.c:163
>> #32667 icmp6_neighbor_solicitation (vm=0x7fccc0864000, node=0x7fccc09e3380, 
>> frame=0x7fcccd7d2d40)
>> at /home/jenkins/tnsr-pkgs/work/vpp/src/vnet/ip6-nd/ip6_nd.c:322
>> #32668 0x7fce3c1a2fe0 in dispatch_node (vm=0x7fccc0864000, 
>> node=0x7fce3dc74836, 
>> type=VLIB_NODE_TYPE_INTERNAL, dispatch_state=VLIB_NODE_STATE_POLLING, 
>> frame=0x7fcccd7d2d40, 
>> last_time_stamp=4014159654296481) at 
>> /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:961
>> #32669 dispatch_pending_node (vm=0x7fccc0864000, pending_frame_index=7, 
>> last_time_stamp=4014159654296481) at 
>> /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1120
>> #32670 vlib_main_or_worker_loop (vm=0x7fccc0864000, is_main=0)
>> at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1589
>> #32671 vlib_worker_loop (vm=vm@entry=0x7fccc0864000)
>> at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1723
>> #32672 0x7fce3c1f581a in vlib_worker_thread_fn (arg=0x7fccbdb11b40)
>> at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/threads.c:1579
>> #32673 0x7fce3c1f02c1 in vlib_worker_thread_bootstrap_fn 
>> (arg=0x7fccbdb11b40)
>> at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/threads.c:418
>> #32674 0x7fce3be3db43 in start_thread (arg=) at 
>> ./nptl/pthread_create.c:442
>> #32675 0x7fce3becfa00 in clone3 () at 
>> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> 
> According to the backtrace, an IPv6 neighbor is being learned. Since the 
> packet was received on a worker thread, the neighbor information is being 
> passed to the main thread by making an RPC call (that works via the API). For 
> this, an API message for the RPC call is being allocated from the API segment 
> (as a client). But the allocation is failing because of no available memory.
> 
> If we inspect the API rings after crashing, it can be seen that they are all 
> filled with VL_API_RPC_CALL messages. Also, it can be seen that there are a 
> lot of pending RPC requests (vm->pending_rpc_requests has ~3.3M items). Thus, 
> API segment exhaustion occurs because of a huge number of pending RPC 
> messages.
> 
> RPC messages are processed in a process node called api-rx-from-ring 
> (function is called vl_api_clnt_process). And process nodes are handled in 
> the main thread only.
> 
> The first hypothesis is that the main loop of the main thread pauses for such 
> a long time that a huge number of pending RPC messages are accumulated by the 
> worker threads (that keep running). But this doesn't seem to be confirmed 
> when inspecting vm->loop_interval_start of all threads after crashing. 
> vm->loop_interval_start of the worker threads would have been greater 

Re: [vpp-dev] VPP 22.10 : VCL not accepting UDP connections

2023-01-23 Thread Florin Coras
Hi Chinmaya, 

Given that you’re getting packets in the listener’s rx fifo, I suspect the 
request to make it a connected listener didn’t work. We’ve had a number of 
changes in the vcl/session layer, so it’s hard to say what exactly might be 
affecting your app. 

Just did an iperf udp test [1] on master and everything seems to be fine. 
Maybe try it with your current vpp version to make sure that everything is 
okay, or try running the udp iperf make test [2].

Regards,
Florin

[1] https://wiki.fd.io/view/VPP/HostStack/LDP/iperf#UDP_testing
[2] https://git.fd.io/vpp/tree/test/asf/test_vcl.py#n978

> On Jan 23, 2023, at 10:04 AM, Chinmaya Aggarwal  
> wrote:
> 
> Hi,
> 
> We are using connected socket (setting VPPCOM_ATTR_SET_CONNECTED) but still 
> facing this issue. Has something changed between VPP v21.06 and the new 
> release for the connected udp socket?
> 
> Thanks and Regards,
> Chinmaya Agarwal. 
> 
> 





Re: [vpp-dev] VPP 22.10 : VCL not accepting UDP connections

2023-01-20 Thread Florin Coras
Hi Chinmaya, 

Given that data is written to the listener’s fifo, I’ll guess vpp thinks it’s 
using non-connected udp sessions. Since you expect accepts to be coming, you’re 
probably missing a vppcom_session_attr VPPCOM_ATTR_SET_CONNECTED call on the 
listener. See for instance here [1]. It could also be that the vcl lib your app 
is linked against is out of sync with vpp. 
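
For reference, a minimal sketch of that attribute call on the listener before
listening; lsh is the listener's session handle and error handling is omitted:

/* ask vpp to accept per-peer udp sessions instead of queueing all
 * datagrams into the listener's rx fifo */
vppcom_session_attr (lsh, VPPCOM_ATTR_SET_CONNECTED, 0, 0);
vppcom_session_listen (lsh, 10 /* backlog */);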

Let me know if that solves the issue.

Regards,
Florin


[1] https://git.fd.io/vpp/tree/src/plugins/hs_apps/vcl/vcl_test_protos.c#n154

> On Jan 19, 2023, at 1:13 PM, Chinmaya Aggarwal  
> wrote:
> 
> Hi,
>  
> We re-compiled VPP 22.10 by cherry picking the below commits:-
>  
> udp: fix tx handling of non-connected sessions : 
> 15952b261f92959ca14cf6679efc318c12e90de6
> udp: support for disabling tx csum : f8ee39ff715ec713045af69da465ba4da8248212
> udp: explicit udp output node This allows for custom next node selection on 
> output. : 8c1be054b90f113aef3ae27b52d7389271ce91c3
>  
> But we are still facing the same issue that VCL is not able to accept UDP 
> connections and we are seeing rx full in "show session verbose".
>  
> Is there anything else that we might be missing out on or can try?
> 
> Thanks and Regards,
> Chinmaya Agarwal.
> 
> 
> 
> 
> 





Re: [vpp-dev] VPP 22.10 : VCL not accepting UDP connections

2023-01-18 Thread Florin Coras
Hi Chinmaya, 

Are you by chance using 23.02rc0, as opposed to 22.10, in combination with 
non-connected udp listeners? If yes, could you try this fix [1] or vpp latest 
to check if the issue still persists? 

Regards,
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/37842

> On Jan 18, 2023, at 12:59 PM, Chinmaya Aggarwal  
> wrote:
> 
> Hi,
>  
> We are testing VCL in VPP 22.10 and facing the issue that VCL is not able to 
> accept UDP connections and we are seeing rx full in "show session verbose" 
> command:--
>  
> vpp# show session verbose
> Connection                                           State   Rx-f     Tx-f
> [0:0][U] 2001:5b0::501:b883:31f:29e:9881:9915->:::0  LISTEN  3994945  0
> [0:1][U] 2001:5b0::501:b883:31f:19e:9881:9915->:::0  LISTEN  0        0
> Thread 0: active sessions 2
> Thread 1: no sessions
> Thread 2: no sessions
> Thread 3: no sessions
> Thread 4: no sessions
> vpp#
>  
>  
> Below is the relevant configuration in startup.conf:
> session {
> enable
> evt_qs_memfd_seg
> use-app-socket-api
> segment-baseva 0x20
> }
> #socksvr { socket-name /run/vpp/vcl.sock}
> socksvr { default }
>  
>  
> vcl.conf
> vcl {
>   rx-fifo-size 400
>   tx-fifo-size 400
>   app-scope-local
>   app-scope-global
>   app-socket-api  /var/run/vpp/app_ns_sockets/default
> }
>  
> The same configuration is working fine in VPP v21.06. Has anything changed in 
> 22.10 or are we missing something here?
> 
> Thanks and Regards,
> Chinmaya Agarwal.
> 
> 
> 





[vpp-dev] PSA: host stack active opens from first worker

2022-12-01 Thread Florin Coras
Hi folks, 

It’s been many months and patches at this point but once [1] is merged, session 
layer will accept connects from both main, with worker barrier, and first 
worker. Preference is now for the latter, especially if CPS performance is 
critical. 

There should be no need to change existing apps. In particular, vcl 
applications will transparently leverage this improvement while builtin 
applications should still work even if they rely on main thread for the 
connects. 

The reason why this is a PSA is that, in spite of all testing, there’s still a 
chance some corner cases are not supported, some transports might be buggy now, 
or just plain old bugs might’ve slipped in. Should you hit any issues, or have 
any additional comments, feel free to reach out via this thread or directly.

Benefits of this improvement are:
- no more main thread polling under heavy connect load. Under certain 
circumstances, main can still be used to perform the connects but this should 
be the exception not the norm. 
- higher CPS performance.

And to be precise, regarding the CPS performance, on my skylake testbed using 
40Gbps nics:
o) prior to [1], though note that several other changes that might’ve affected 
performance had already gone in: 
- 1 worker vpp, pre-warmup: 80k, post-warmup: 105k
- 4 worker vpp, pre-warmup: 135k, post-warmup: 230k 
o) after change:
- 1 worker vpp, pre-warmup: 110k, post-warmup: 165k 
- 4 worker vpp, pre-warmup: 150k, post-warmup: 360k

You can try to reproduce these results using this [2].

Regards, 
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/35713/
[2] https://wiki.fd.io/view/VPP/HostStack/EchoClientServer#TCP_CPS_measurement





Re: [vpp-dev] Understanding the use of FD.io VPP as a network stack

2022-11-16 Thread Florin Coras
Hi Federico, 

Apologies, I missed your first email. 

More Inline.

Regards,
Florin


> On Nov 16, 2022, at 7:53 AM, Federico Strati via lists.fd.io 
>  wrote:
> 
> Hello Ben, All
> 
> first of all, thanks for the prompt reply.
> 
> Now let me clarify the questions, which confused you because of my 
> ignorance
> 
> 1. How to use the VPP host stack (TCP):
> 
> I refined my understanding having seen the presentation of Florin 
> (https://www.youtube.com/watch?v=3Pp7ytZeaLk):
> Up to now the revised diagram would be:
> 
> Process A (TCP client or server) <--> VCL (VPP Comms Library) or VCL+LDP or 
> ??? <--> VPP (Session -> TCP -> Eth) <--> Memif or ??? custom plug-in <--> 
> Process B (our radio stack including driver)
> 
> i.e. we would like to use VPP network stack (also termed host stack) as an 
> user-space TCP stack over our radio stack.

So let’s first clarify what library to use. VCL is a library that apps link 
against and can interact with the host stack in vpp (session layer and 
transports) in a POSIX-like fashion. LDP is an LD_PRELOAD shim that intercepts 
socket related syscalls and redirects them into VCL. 
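
To make the distinction concrete, here is a minimal sketch of a native VCL
client; the address and port are illustrative and error handling is omitted:

#include <stdint.h>
#include <arpa/inet.h>
#include <vcl/vppcom.h>

int
main (void)
{
  vppcom_endpt_t server = { 0 };
  struct in_addr ip4;
  int sh;

  vppcom_app_create ("vcl-demo-client");	/* attach to vpp */

  inet_pton (AF_INET, "7.0.0.2", &ip4);		/* illustrative address */
  server.is_ip4 = 1;
  server.ip = (uint8_t *) &ip4;
  server.port = htons (12342);

  sh = vppcom_session_create (VPPCOM_PROTO_TCP, 0 /* blocking */);
  vppcom_session_connect (sh, &server);
  vppcom_session_write (sh, (void *) "hello", 5);
  vppcom_session_close (sh);
  vppcom_app_destroy ();
  return 0;
}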

Now, regarding your diagram above. I’m assuming you're trying to generate TCP 
packets and feed them into a separate process. So yes, memif or tap should be 
pretty good options to get packets into your radio stack. You could also build 
some shared memory mechanisms whereby you could pass/expose vpp buffers to 
process B. 

Another option, which I assume is not trivial, would be to move your radio 
stack into a vpp node. TCP can be forced to feed packets to custom next nodes. 

> 
> Florin was saying that there is an alternative for VCL: we don't have legacy 
> BSD socket apps, hence we are free to use the most advanced interface.

I’m guessing you’re thinking about using raw session layer apis. I’d first 
start with VCL to keep things simple. 

> 
> Possibly we would like to be zero-copy insofar as possible.

There is no zero-copy api between vcl and session layer in vpp currently. 

> 
> The "North" of the TCP stack is the (client/server) apps, the "South" of the 
> stack are IP or Eth frames.
> 
> Ideally we would like to know what are the best options to interface with VPP 
> north-bound and south-bound.
> 
> We don't exit into a NIC card, that would be:
> 
> Process A --> VCL --> VPP --> DPDK plug-in (or ??? AF_Packet / AF_XDP) --> NIC
> 
> Hence what are the best possible solutions?

See if above works for you. 

> 
> 2. VPP multi-instances.
> 
> I'm not asking for multi-threading (which I already use successfully), but 
> for running multiple VPP processes in parallel, of course paying attention to 
> core pinning.
> 
> My question was, what are the options to change in startup.conf ?

Yes, you can use multiple startup.conf files; just point vpp to them with -c. 
Note that you can’t run multiple vpp instances with the dpdk plugin loaded. 

> 
> 3. My tests and the RSS mechanism.
> 
> My set-up is the following: two identical machines, X12's (two xeon, 38+38 
> cores), each one equipped with one Mellanox 100Gbps NIC card (one connectx-4 
> one connectx-5)
> 
> Using iperf3 with LDP+VCL to interface with VPP, hence the flow is:
> 
> Iperf3 client <--> VCL+LDP -> VPP -> DPDK plug-in -> Mlx NIC <-link-> Mlx NIC 
> -> DPDK plug-in -> VPP -> VCL+LDP <--> Iperf3 server
> Machine A <---> Machine B
> 
> Distribution Ubuntu 20.04 LTS, kernel low latency customised, isolated all 
> cores except two.
> 
> VPP version 21.10 recompiled natively on the machines.
> 
> I'm using DPDK not the RDMA driver.
> 
> What I'm observing is strange variations in throughput for the following 
> scenario:
> 
> Iperf3 single tcp stream on one isolated core, VPP 8 cores pinned to 8 NIC 
> queues
> 
> sometimes it is 15Gbps, sometimes it is 36Gbps ("show hardware" says 3 queues 
> are used)

Some comments here:
- TCP flows are pinned to cores. So only one core will ever be active in your 
test above. 
- Are iperf, vpp’s cores and the nic on the same numa? To see the numa for the 
nic, use “show hardware” in vpp. 
- If you plan to use multiple iperf streams, make sure you have as many rx 
queues as vpp workers and that the number of tx queues is rx queues + 1 (see 
the startup.conf sketch below). 
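
A startup.conf sketch of that queue layout for an 8-worker setup; the PCI
address is a placeholder:

dpdk {
  dev 0000:3b:00.0 {
    num-rx-queues 8
    num-tx-queues 9
  }
}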

> 
> Hence I was a bit dazzled about RSS. I'm not expecting such large variations 
> from run to run.
> 
> I'm not a VPP expert, so if you have suggestions on what to look for, they 
> are welcome :-)
> 
> Thank you in advance for your patience and for your time
> 
> Kind regards
> 
> Federico
> 
> 
> 
> 



Re: [vpp-dev] DNS Resolution over VCL

2022-11-03 Thread Florin Coras
Hi Anthony, 

Great! Will add the gethostbyname issue to my never shrinking todo list but, 
should you look into it, let me know if you manage to figure out how ldp 
interacts with gethostbyname.

Regards, 
Florin

> On Nov 3, 2022, at 4:54 AM, Anthony Fee  wrote:
> 
> Hi Florin,
> 
> Thanks for the feedback. I built a very basic application to trigger a TCP 
> request, with gethostbyname() to resolve the hostname, using the VCL library 
> directly. This worked as expected. Just to verify, I also ran this with 
> LD_PRELOAD and saw that the gethostbyname() function stopped working. So it 
> does look like LD_PRELOAD is having an impact.
> 
> Anyway, the good news is that VCL is very easy to work with. 
> 
> Thanks again for your input.
> Anthony 
> 
> 





Re: [vpp-dev] weird crash when allocate new ho_session_alloc in debug image

2022-11-03 Thread Florin Coras
Thanks, merged!

Regards,
Florin


> On Nov 3, 2022, at 12:33 AM, Zhang Dongya wrote:
> 
> Hi,
> 
> I have made a patch and submit it to gerrit for review.
> 
> https://gerrit.fd.io/r/c/vpp/+/37567
> 
> I have run against vpp unit test for session feature and no regression found 
> yet.
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Tue, Nov 1, 2022 at 23:42:
> Hi, 
> 
> Will you be pushing the fix or should I do it? 
> 
> Regards, 
> Florin
> 
>> On Oct 25, 2022, at 9:26 AM, Florin Coras via lists.fd.io wrote:
>> 
>> Hi, 
>> 
>> Apologies, I missed your original point and only thought about the large 
>> bitmap we create at startup. So yes, go for s->session_index + 1.
>> 
>> Regards,
>> Florin
>> 
>>> On Oct 24, 2022, at 9:11 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Can you elaborate a bit on that? If the session index is 64 and we do not 
>>> increase by 1, it will only make a one-word (64-bit) vec for the bitmap, 
>>> which may not hold the session index.
>>> 
>>> Florin Coras <fcoras.li...@gmail.com> wrote on Tue, Oct 25, 2022 at 01:14:
>>> Hi, 
>>> 
>>> Could you replace s->session_index by s->session_index ? : 1 in the patch? 
>>> 
>>> Regards, 
>>> Florin
>>> 
>>>> On Oct 24, 2022, at 12:23 AM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>>> 
>>>> Hi list,
>>>> 
>>>> Recently I have been testing my TCP application in a plugin; what I did is 
>>>> initiate a TCP client in my plugin. However, when I build the debug image 
>>>> and test, the vpp will crash and complain about being out of memory.
>>>> 
>>>> After doing some research, it seems the following code may cause the crash:
>>>> 
>>>> always_inline session_t *
>>>> ho_session_alloc (void)
>>>> {
>>>>   session_t *s;
>>>>   ASSERT (vlib_get_thread_index () == 0);
>>>>   s = session_alloc (0);
>>>>   s->session_state = SESSION_STATE_CONNECTING;
>>>>   s->flags |= SESSION_F_HALF_OPEN;
>>>>   /* Not ideal. Half-opens are only allocated from main with worker barrier
>>>>* but can be cleaned up, i.e., session_half_open_free, from main without
>>>>* a barrier. In debug images, the free_bitmap can grow while workers 
>>>> peek
>>>>* the sessions pool, e.g., session_half_open_migrate_notify, and as a
>>>>* result crash while validating the session. To avoid this, grow the 
>>>> bitmap
>>>>* now. */
>>>>   if (CLIB_DEBUG)
>>>> {
>>>>   session_t *sp = session_main.wrk[0].sessions;
>>>>   clib_bitmap_validate (pool_header (sp)->free_bitmap, 
>>>> s->session_index);
>>>> }
>>>>   return s;
>>>> }
>>>> 
>>>> since the clib_bitmap_validate is defined as:
>>>> 
>>>> /* Make sure that a bitmap is at least n_bits in size */
>>>> #define clib_bitmap_validate(v,n_bits) \
>>>>   clib_bitmap_vec_validate ((v), ((n_bits) - 1) / BITS (uword))
>>>> 
>>>> 
>>>> The first half-open session has a session_index of zero, so 0-1 will 
>>>> overflow, which causes it to try to allocate (UINT64_MAX-1)/64 words, 
>>>> which makes the vppinfra abort.
>>>> 
>>>> I think we should modify the code above with s->session_index + 1, if 
>>>> that's correct, I will submit a patch later.
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
> 





Re: [vpp-dev] DNS Resolution over VCL

2022-10-31 Thread Florin Coras
Hi Anthony, 

Assuming the host os has network connectivity beyond vpp for dns resolution, 
this is surprising. Would be good to understand if anything actually makes its 
way into ldp during a gethostbyname() call. 

Native integration with vcl, as opposed to ldp, should solve the problem but 
that obviously means more work and it might not be possible in some cases. If 
your use case relies on more of vpp’s features, beyond host stack, tap/memif 
interfaces are also a good option. 

Regards, 
Florin

> On Oct 31, 2022, at 6:24 AM, Anthony Fee  wrote:
> 
> Hi Florin,
> 
> Thank you for the reply, much appreciated. I assume that gethostbyname() uses 
> something under the hood that LDP does intercept, otherwise it would continue 
> to use the Linux implementation to resolve hostnames. From what I can see, 
> right now LDP renders gethostbyname() unusable in the application. Are you 
> aware of this behaviour? If so, is there any workaround when using VCL or is 
> it better to just use another mechanism to interface with VPP?
> 
> I don't have the scope to look at this now, but I will likely need this 
> functionality in the future so would be interested in implementing it when 
> the time comes. I know that you are mainly focused on server side at the 
> moment so this probably isn't much of an issue. I'm getting asked more on the 
> client side these days so it is of interest to me.
> 
> Thanks again,
> Anthony 
> 
> 





Re: [vpp-dev] DNS Resolution over VCL

2022-10-28 Thread Florin Coras
Hi Anthony, 

LDP doesn’t currently intercept gethostbyname as integration with vpp's 
internal dns resolver is not yet done for vcl. Should you or anybody else be 
interested in implementing that, I’d be happy to offer support. 

Regards,
Florin

> On Oct 28, 2022, at 2:29 AM, Anthony Fee  wrote:
> 
> Hi all,
> 
> I still haven't had any luck getting this working. Any ideas? 
> 
> Thanks in advance!
> Anthony 
> 
> 





Re: [vpp-dev] weird crash when allocate new ho_session_alloc in debug image

2022-10-25 Thread Florin Coras
Hi, 

Apologies, I missed your original point and only thought about the large bitmap 
we create at startup. So yes, go for s->session_index + 1.
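
A standalone demo of the underflow being fixed, mirroring the macro's
((n_bits) - 1) / BITS (uword) arithmetic in plain C:

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint64_t session_index = 0;	/* first half-open session */

  /* clib_bitmap_validate expands to ((n_bits) - 1) / 64 vector words;
   * with n_bits == 0 the subtraction wraps around to a huge value */
  printf ("words without fix: %llu\n",
	  (unsigned long long) ((session_index - 1) / 64));
  /* passing session_index + 1 keeps the math in range */
  printf ("words with +1 fix: %llu\n",
	  (unsigned long long) (((session_index + 1) - 1) / 64));
  return 0;
}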

Regards,
Florin

> On Oct 24, 2022, at 9:11 PM, Zhang Dongya  wrote:
> 
> Hi,
> 
> Can you elaborate a bit on that? If the session index is 64 and we do not 
> increase by 1, it will only make a one-word (64-bit) vec for the bitmap, 
> which may not hold the session index.
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Tue, Oct 25, 2022 at 01:14:
> Hi, 
> 
> Could you replace s->session_index by s->session_index ? : 1 in the patch? 
> 
> Regards, 
> Florin
> 
>> On Oct 24, 2022, at 12:23 AM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>> 
>> Hi list,
>> 
>> Recently I have been testing my TCP application in a plugin; what I did is 
>> initiate a TCP client in my plugin. However, when I build the debug image 
>> and test, the vpp will crash and complain about being out of memory.
>> 
>> After doing some research, it seems the following code may cause the crash:
>> 
>> always_inline session_t *
>> ho_session_alloc (void)
>> {
>>   session_t *s;
>>   ASSERT (vlib_get_thread_index () == 0);
>>   s = session_alloc (0);
>>   s->session_state = SESSION_STATE_CONNECTING;
>>   s->flags |= SESSION_F_HALF_OPEN;
>>   /* Not ideal. Half-opens are only allocated from main with worker barrier
>>* but can be cleaned up, i.e., session_half_open_free, from main without
>>* a barrier. In debug images, the free_bitmap can grow while workers peek
>>* the sessions pool, e.g., session_half_open_migrate_notify, and as a
>>* result crash while validating the session. To avoid this, grow the 
>> bitmap
>>* now. */
>>   if (CLIB_DEBUG)
>> {
>>   session_t *sp = session_main.wrk[0].sessions;
>>   clib_bitmap_validate (pool_header (sp)->free_bitmap, s->session_index);
>> }
>>   return s;
>> }
>> 
>> since the clib_bitmap_validate is defined as:
>> 
>> /* Make sure that a bitmap is at least n_bits in size */
>> #define clib_bitmap_validate(v,n_bits) \
>>   clib_bitmap_vec_validate ((v), ((n_bits) - 1) / BITS (uword))
>> 
>> 
>> The first half-open session has a session_index of zero, so 0-1 will 
>> overflow, which causes it to try to allocate (UINT64_MAX-1)/64 words, which 
>> makes the vppinfra abort.
>> 
>> I think we should modify the code above with s->session_index + 1, if that's 
>> correct, I will submit a patch later.
>> 
>> 
>> 
> 
> 
> 
> 
> 
> 





Re: [vpp-dev] weird crash when allocate new ho_session_alloc in debug image

2022-10-24 Thread Florin Coras
Hi, 

Could you replace s->session_index by s->session_index ? : 1 in the patch? 

Regards, 
Florin

> On Oct 24, 2022, at 12:23 AM, Zhang Dongya  wrote:
> 
> Hi list,
> 
> Recently I have been testing my TCP application in a plugin; what I did is 
> initiate a TCP client in my plugin. However, when I build the debug image and 
> test, the vpp will crash and complain about being out of memory.
> will crash and complaint about out of memory.
> 
> After doing some research, it seems the following code may cause the crash:
> 
> always_inline session_t *
> ho_session_alloc (void)
> {
>   session_t *s;
>   ASSERT (vlib_get_thread_index () == 0);
>   s = session_alloc (0);
>   s->session_state = SESSION_STATE_CONNECTING;
>   s->flags |= SESSION_F_HALF_OPEN;
>   /* Not ideal. Half-opens are only allocated from main with worker barrier
>* but can be cleaned up, i.e., session_half_open_free, from main without
>* a barrier. In debug images, the free_bitmap can grow while workers peek
>* the sessions pool, e.g., session_half_open_migrate_notify, and as a
>* result crash while validating the session. To avoid this, grow the bitmap
>* now. */
>   if (CLIB_DEBUG)
> {
>   session_t *sp = session_main.wrk[0].sessions;
>   clib_bitmap_validate (pool_header (sp)->free_bitmap, s->session_index);
> }
>   return s;
> }
> 
> since the clib_bitmap_validate is defined as:
> 
> /* Make sure that a bitmap is at least n_bits in size */
> #define clib_bitmap_validate(v,n_bits) \
>   clib_bitmap_vec_validate ((v), ((n_bits) - 1) / BITS (uword))
> 
> 
> The first half-open session has a session_index of zero, so 0-1 will 
> overflow, which causes it to try to allocate (UINT64_MAX-1)/64 words, which 
> makes the vppinfra abort.
> 
> I think we should modify the code above with s->session_index + 1, if that's 
> correct, I will submit a patch later.
> 
> 
> 





Re: [vpp-dev] vpp crash when close a host-stack tcp session in syn-sent state.

2022-10-13 Thread Florin Coras
Hi, 

[cc Vanessa]

Could you please open a ticket here [1]? Hopefully this can be solved. 

Regards,
Florin

[1] https://jira.linuxfoundation.org/plugins/servlet/desk/portal/2/create/37

> On Oct 12, 2022, at 10:42 PM, Zhang Dongya  wrote:
> 
> Yes, I can log in to link [1] and can see my account has been registered in 
> LF for 5 years. However, when I log in to the gerrit web ui, it still reports 
> a Forbidden error. My account username is ZhangDongya.
> 
> Ok, I will give the git command line a try.
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Thu, Oct 13, 2022 at 10:12:
> An LF account should suffice. Could you confirm your lf credentials work here 
> [1]?
> 
> And, in case you haven’t seen this already, here are the steps to get you 
> started on pushing the patch, once the above is solved [2]. 
> 
> Regards,
> Florin
> 
> [1] https://identity.linuxfoundation.org/
> [2] https://wiki.fd.io/view/VPP/Pulling,_Building,_Running,_Hacking_and_Pushing_VPP_Code#Pulling_code_via_ssh
> 
> 
>> On Oct 12, 2022, at 6:21 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>> 
>> Thanks a lot, I just added a check for tx_fifo there locally and it seems 
>> to work.
>> 
>> BTW,
>> 
>> I'd like to help and submit a patch. However, for some reason, when I try to 
>> log in to gerrit using my linux foundation id, it always reports a Forbidden 
>> error. Do you know where I can get help to solve this? Or does gerrit need 
>> some approval to get involved?
>> 
>> It's ok if you want to get it fixed asap. 
>> 
>> Florin Coras <fcoras.li...@gmail.com> wrote on Wed, Oct 12, 2022 at 23:44:
>> Hi, 
>> 
>> It looks like a bug. We should make sure the fifo exists, which is typically 
>> the case unless the transport is stuck in half-open. Note that tcp does time 
>> out and clean up those stuck half-open sessions, but we should allow the app 
>> to clean up as well. 
>> 
>> Let me know if you plan to push a patch or I should do it. 
>> 
>> Regards,
>> Florin
>> 
>>> On Oct 12, 2022, at 12:44 AM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I am now trying to use the vpp host-stack to negotiate a valid TCP session. 
>>> However, I found a crash if I call vnet_disconnect_session when the TCP 
>>> session is stuck in syn-sent state (this may be caused by my having shut 
>>> down the remote side).
>>> 
>>> Vpp will crash in the following code, which calls svm_fifo_clear_deq_ntf 
>>> while the tx_fifo is not initialized; this is because the tx_fifo will only 
>>> be allocated in app_worker_init_connected.
>>> 
>>> Is this a bug, or am I doing something wrong in my use of the host-stack?
>>> 
>>> 
>>> 
>>> void
>>> session_close (session_t * s)
>>> {
>>>   if (!s)
>>> return;
>>> 
>>>   if (s->session_state >= SESSION_STATE_CLOSING)
>>> {
>>>   /* Session will only be removed once both app and transport
>>>* acknowledge the close */
>>>   if (s->session_state == SESSION_STATE_TRANSPORT_CLOSED
>>> || s->session_state == SESSION_STATE_TRANSPORT_DELETED)
>>>   session_program_transport_ctrl_evt (s, SESSION_CTRL_EVT_CLOSE);
>>>   return;
>>> }
>>> 
>>>   /* App closed so stop propagating dequeue notifications */
>>>   svm_fifo_clear_deq_ntf (s->tx_fifo);
>>>   s->session_state = SESSION_STATE_CLOSING;
>>>   session_program_transport_ctrl_evt (s, SESSION_CTRL_EVT_CLOSE);
>>> }
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 
> 





Re: [vpp-dev] vpp crash when close a host-stack tcp session in syn-sent state.

2022-10-12 Thread Florin Coras
An LF account should suffice. Could you confirm your lf credentials work here 
[1]?

And, in case you haven’t seen this already, here are the steps to get you 
started on pushing the patch, once the above is solved [2]. 

Regards,
Florin

[1] https://identity.linuxfoundation.org/
[2] https://wiki.fd.io/view/VPP/Pulling,_Building,_Running,_Hacking_and_Pushing_VPP_Code#Pulling_code_via_ssh


> On Oct 12, 2022, at 6:21 PM, Zhang Dongya  wrote:
> 
> Thanks a lot, I just added a check for tx_fifo there locally and it seems to 
> work.
> 
> BTW,
> 
> I'd like to help and submit a patch. However, for some reason, when I try to 
> log in to gerrit using my linux foundation id, it always reports a Forbidden 
> error. Do you know where I can get help to solve this? Or does gerrit need 
> some approval to get involved?
> 
> It's ok if you want to get it fixed asap. 
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Wed, Oct 12, 2022 at 23:44:
> Hi, 
> 
> It looks like a bug. We should make sure the fifo exists, which is typically 
> the case unless the transport is stuck in half-open. Note that tcp does time 
> out and clean up those stuck half-open sessions, but we should allow the app 
> to clean up as well. 
> 
> Let me know if you plan to push a patch or I should do it. 
> 
> Regards,
> Florin
> 
>> On Oct 12, 2022, at 12:44 AM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I am now trying to use the vpp host-stack to negotiate a valid TCP session. 
>> However, I found a crash if I call vnet_disconnect_session when the TCP 
>> session is stuck in syn-sent state (this may be caused by my having shut 
>> down the remote side).
>> 
>> Vpp will crash in the following code, which calls svm_fifo_clear_deq_ntf 
>> while the tx_fifo is not initialized; this is because the tx_fifo will only 
>> be allocated in app_worker_init_connected.
>> 
>> Is this a bug, or am I doing something wrong in my use of the host-stack?
>> 
>> 
>> 
>> void
>> session_close (session_t * s)
>> {
>>   if (!s)
>> return;
>> 
>>   if (s->session_state >= SESSION_STATE_CLOSING)
>> {
>>   /* Session will only be removed once both app and transport
>>* acknowledge the close */
>>   if (s->session_state == SESSION_STATE_TRANSPORT_CLOSED
>> || s->session_state == SESSION_STATE_TRANSPORT_DELETED)
>>   session_program_transport_ctrl_evt (s, SESSION_CTRL_EVT_CLOSE);
>>   return;
>> }
>> 
>>   /* App closed so stop propagating dequeue notifications */
>>   svm_fifo_clear_deq_ntf (s->tx_fifo);
>>   s->session_state = SESSION_STATE_CLOSING;
>>   session_program_transport_ctrl_evt (s, SESSION_CTRL_EVT_CLOSE);
>> }
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 
> 





Re: [vpp-dev] vpp crash when close a host-stack tcp session in syn-sent state.

2022-10-12 Thread Florin Coras
Hi, 

It looks like a bug. We should make sure the fifo exists, which is typically 
the case unless the transport is stuck in half-open. Note that tcp does time 
out and clean up those stuck half-open sessions, but we should allow the app 
to clean up as well. 
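
A minimal sketch of the guard in session_close, along the lines of the check
Zhang added locally:

  /* App closed so stop propagating dequeue notifications. Half-open
   * sessions may not have a tx fifo allocated yet, so check first. */
  if (s->tx_fifo)
    svm_fifo_clear_deq_ntf (s->tx_fifo);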

Let me know if you plan to push a patch or I should do it. 

Regards,
Florin

> On Oct 12, 2022, at 12:44 AM, Zhang Dongya  wrote:
> 
> Hi,
> 
> I am now trying to use the vpp host-stack to negotiate a valid TCP session. 
> However, I found a crash if I call vnet_disconnect_session when the TCP 
> session is stuck in syn-sent state (this may be caused by my having shut down 
> the remote side).
> 
> Vpp will crash in the following code, which calls svm_fifo_clear_deq_ntf 
> while the tx_fifo is not initialized; this is because the tx_fifo will only 
> be allocated in app_worker_init_connected.
> 
> Is this a bug, or am I doing something wrong in my use of the host-stack?
> 
> 
> 
> void
> session_close (session_t * s)
> {
>   if (!s)
> return;
> 
>   if (s->session_state >= SESSION_STATE_CLOSING)
> {
>   /* Session will only be removed once both app and transport
>* acknowledge the close */
>   if (s->session_state == SESSION_STATE_TRANSPORT_CLOSED
> || s->session_state == SESSION_STATE_TRANSPORT_DELETED)
>   session_program_transport_ctrl_evt (s, SESSION_CTRL_EVT_CLOSE);
>   return;
> }
> 
>   /* App closed so stop propagating dequeue notifications */
>   svm_fifo_clear_deq_ntf (s->tx_fifo);
>   s->session_state = SESSION_STATE_CLOSING;
>   session_program_transport_ctrl_evt (s, SESSION_CTRL_EVT_CLOSE);
> }
> 
> 
> 
> 
> 





Re: [vpp-dev] UDP multithreading error

2022-09-26 Thread Florin Coras
Hi, 

That’s an inefficiency in ldp when vcl does not use eventfds for notifications. 
Could you change your vcl.conf and add use-mq-eventfd? 
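
For reference, a vcl.conf sketch with that option added; the fifo sizes and
socket path are illustrative:

vcl {
  rx-fifo-size 4000000
  tx-fifo-size 4000000
  use-mq-eventfd
  app-socket-api /var/run/vpp/app_ns_sockets/default
}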

Regards,
Florin

> On Sep 26, 2022, at 5:35 AM, nengjie wang  wrote:
> 
> Thank you very much for your reply.
> 
> However, after the socket is set to non-blocking mode, the CPU utilization 
> rate is always 100%. The resource consumption is very high when 
> multithreading. I can only reduce the CPU utilization rate by using methods 
> such as select.
> 
> Why does this fail in blocking mode? Do you have any ideas?
> 
> Florin Coras <fcoras.li...@gmail.com> wrote on Thu, Sep 8, 2022 at 02:06:
> Hi, 
> 
> Just tested on master and this seems to work fine after a few app fixes. 
> Having said that, I would recommend you build vcl native apps and register 
> multiple workers for better performance.
> 
> Comments:
> - make sure sockets are not blocking, otherwise there’s a good chance that 
> only one of the threads will output something
> - only print a message if data was received
> 
> See diff here [1].  
> 
> Example output, with 2 clients (c1 and c2) and updated ips. 
> 
> [snip]
> vppcom_session_create:1410: vcl<233836:0>: created session 1
> ldp<233836>: fd 33: calling vls_bind: vlsh 1, addr 0x7ffd7d7cf750, len 16
> vppcom_session_bind:1599: vcl<233836:0>: session 1 handle 1: binding to local 
> IPv4 address 7.0.0.2 port 12342, proto UDP
> vppcom_session_listen:1628: vcl<233836:0>: session 1: sending vpp listen 
> request...
> vcl_session_bound_handler:569: vcl<233836:0>: session 1 [0x0]: listen 
> succeeded!
> ldp<233836>: fd 33 vlsh 1, cmd 4
> 
> [recv from 7.0.0.1:60414]c1 s1
> 
> [recv from 7.0.0.1:56760]c2 s1
> 
> [recv from 7.0.0.1:56760]c2 s2
> 
> [recv from 7.0.0.1:60414]c1 s2
> 
> 
> Regards,
> Florin
> 
> [1] https://pastebin.com/4LCukbEC
> 
> > On Sep 7, 2022, at 12:05 AM, nengjie wang <moho.man...@gmail.com> wrote:
> > 
> > The attachment is my server program.
> > 
> > My VPP version is 22.06. When I start this server via LDP and send a request 
> > to this server, the server immediately hits a segmentation fault. Two threads 
> > share an event queue, resulting in an event queue crash. Can you tell me if 
> > LDP does not support this use?
> > 
> > In addition, I noticed that there is a configuration item called ‘multi 
> > thread workers’ in the VCL configuration file. When this configuration is 
> > turned on, the program cannot receive the RPC response. Is it suitable for me 
> > to use this configuration item in this case?
> > 
> > 
> > 
> 
> 
> 
> 
> 
> 





Re: [vpp-dev] request-response between vlib processes

2022-09-13 Thread Florin Coras
Hi Vratko,

> On Sep 13, 2022, at 5:03 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES 
> at Cisco) via lists.fd.io  wrote:
> 
> In general, most of “communication” between VPP components
> is done by directly calling C functions,
> so it makes sense avf_flag_change is being called within vl_api_clnt_process 
> process.
> It is avf_process_request (called directly by avf_flag_change)
> that decides to hand-off the request to avf_process process for async 
> handling,
> so it should make sure to resume the API process correctly upon the response.
> 
> > just to set a mac address? 
>  
> In my particular test the async operation switches promiscuous mode on an 
> interface,
> but I guess it does not really matter what a particular operation does.
> What matters is there is a synchronous API call (l2_patch_add_del in my test)
> which only indirectly causes an asynchronous operation (as the interface uses 
> AVF driver).
>  

Didn’t have an issue with how the api ends up calling avf_process_request. I 
was just wondering why we ended up needing such a complicated procedure to 
apply what looked like simple updates.

> > Do we really need to block the binary api 
>  
> The l2_patch_add_del does block.
> Especially in the “del” case, the subsequent API calls
> need to know whether the interface is gone yet or not.

I’m pretty sure we could mark things as down and program an async cleanup from 
within the avf layer. That is, if async is necessary, for deletes we should be 
able to provide a return code as soon as we find that the device/state exists 
and program the removal.

But for adds, it would be good if we could avoid suspending the current process 
in avf because it can’t know all the ways in which the calling process could be 
signaled. 

>  
> > pass opaques in requests
>  
> As usual, there are several ways to make it work,
> we just need to pick one (and put an example usage into the docs).

And I believe that’s what we’re discussing here :-) 

Florin

>  
> Vratko.
>  
> From: vpp-dev@lists.fd.io On Behalf Of Florin Coras
> Sent: Monday, 2022-September-12 23:11
> To: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] request-response between vlib processes
>  
> Hi Vratko, 
>  
> Do we really need to block the binary api waiting for a reply from another 
> vpp process just to set a mac address? 
>  
> If setting up the mac (or similar) cannot be done synchronously, probably api 
> handlers should hand over all those requests to another vpp process, 
> vl_api_async_req_process, that takes care of async execution and generation 
> of api replies. You could also pass opaques in requests and maybe expect 
> backends, like avf_process, to bounce that opaques back for demuxing. 
>  
> Regards,
> Florin
> 
> 
> On Sep 12, 2022, at 4:49 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES 
> at Cisco) via lists.fd.io wrote:
>  
> [resending to the correct vpp-dev e-mail address]
>  
> Short version:
> Vratko would appreciate something like 
> vlib_current_process_wait_for_one_time_event_or_clock.
>  
> Medium version:
> One instance of request-response interaction between vlib processes had a bug 
> [11].
> Vratko contributed a fix [9] for the immediate issue,
> but the proper fix was left hinted in TODOs (and discussed in the long 
> version).
>  
> Long version:
>  
> Vlib supports processes and signals, see corresponding sections in the docs 
> [7].
> Using the actor model vocabulary, a (vlib) process is an actor,
> and (vlib) signaling a (vlib) event means sending a message between actors.
> There is no vlib name for actor behavior [10].
>  
> The typical use of event signaling in VPP is “fire and forget”,
> meaning a “request” without any need to respond.
> That means a typical process has just one behavior;
> the side effects of a process are given by event type (and data),
> not directly by the sequence of previous events received.
>  
> But there is an exception (and in future there may be more).
> The process avf_process, when handling AVF_PROCESS_EVENT_REQ
> and detecting that was signaled by some other process,
> it signals back a “response” event.
> The main reason is that some operations may take unreasonably long time,
> and we prefer VPP to crash there (instead of getting stuck)
> so we can see the backtrace.
>  
> A typical process that signaled AVF_PROCESS_EVENT_REQ is vl_api_clnt_process,
> whose loop usually handles SOCKET_READ_EVENT events.
> I mean, this socket API handling process has no idea about AVF plugin 
> 

Re: [vpp-dev] request-response between vlib processes

2022-09-12 Thread Florin Coras
Hi Vratko, 

Do we really need to block the binary api waiting for a reply from another vpp 
process just to set a mac address? 

If setting up the mac (or similar) cannot be done synchronously, probably api 
handlers should hand over all those requests to another vpp process, 
vl_api_async_req_process, that takes care of async execution and generation of 
api replies. You could also pass opaques in requests and maybe expect backends, 
like avf_process, to bounce those opaques back for demuxing. 

Regards,
Florin

> On Sep 12, 2022, at 4:49 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES 
> at Cisco) via lists.fd.io  wrote:
> 
> [resending to the correct vpp-dev e-mail address]
>  
> Short version:
> Vratko would appreciate something like 
> vlib_current_process_wait_for_one_time_event_or_clock.
>  
> Medium version:
> One instance of request-response interaction between vlib processes had a bug 
> [11].
> Vratko contributed a fix [9] for the immediate issue,
> but the proper fix was left hinted in TODOs (and discussed in the long 
> version).
>  
> Long version:
>  
> Vlib supports processes and signals, see corresponding sections in the docs 
> [7].
> Using the actor model vocabulary, a (vlib) process is an actor,
> and (vlib) signaling a (vlib) event means sending a message between actors.
> There is no vlib name for actor behavior [10].
>  
> The typical use of event signaling in VPP is “fire and forget”,
> meaning a “request” without any need to respond.
> That means a typical process has just one behavior;
> the side effects of a process are given by event type (and data),
> not directly by the sequence of previous events received.
>  
> But there is an exception (and in future there may be more).
> The process avf_process, when handling AVF_PROCESS_EVENT_REQ
> and detecting that it was signaled by some other process,
> signals back a “response” event.
> The main reason is that some operations may take an unreasonably long time,
> and we prefer VPP to crash there (instead of getting stuck)
> so we can see the backtrace.
>  
> A typical process that signals AVF_PROCESS_EVENT_REQ is vl_api_clnt_process,
> whose loop usually handles SOCKET_READ_EVENT events.
> I mean, this socket API handling process has no idea about AVF plugin 
> specific needs,
> but it can call avf_process_request function which (upon detecting it is not 
> called
> from avf_process process) does the signaling and waiting.
>  
> But this means vl_api_clnt_process is the first process (that I know of) with 
> two behaviors.
> The first one focuses on handling new API messages,
> the second one focuses on handling the AVF response (especially the lack 
> thereof in time).
> As clib_panic is called when the response does not arrive,
> (and I hope there are never two requests at the same time)
> the first behavior never encounters the AVF response.
> But the second behavior can encounter SOCKET_READ_EVENT.
> The VPP-2033 [11] bug is what happens in that case.
>  
> A minor issue is that the “response” event is defined just by
> event type being zero, which would not work in (hypothetical future) scenarios
> when a single process waits for two different responses.
>  
> Reading through node_funcs.h I found 
> vlib_current_process_wait_for_one_time_event [12],
> which looks suited for waiting for “single response” events,
> but it lacks the time awareness of vlib_process_wait_for_event_or_clock.
> If we had something like vlib_current_process_wait_for_one_time_event_or_clock
> (and its example usage in the docs), handling the response would become 
> easier.
>  
> Vratko.
>  
> [7] 
> https://github.com/FDio/vpp/blob/9ad39c026c8a3c945a7003c4aa4f5cb1d4c80160/docs/developer/corearchitecture/vlib.rst
>  
> 
> [9] https://gerrit.fd.io/r/c/vpp/+/37083 
> 
> [10] https://en.wikipedia.org/wiki/Actor_model#Behaviors 
> 
> [11] https://jira.fd.io/browse/VPP-2033 
> [12] 
> https://github.com/FDio/vpp/blob/16052480c377127f9cb7facbab53f46e595b27cf/src/vlib/node_funcs.h#L1186
>  
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21872): https://lists.fd.io/g/vpp-dev/message/21872
Mute This Topic: https://lists.fd.io/mt/93630182/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] UDP multithreading error

2022-09-07 Thread Florin Coras
Hi, 

Just tested on master and this seems to work fine after a few app fixes. Having 
said that, I would recommend you build vcl native apps and register multiple 
workers for better performance.

Comments:
- make sure sockets are not blocking, otherwise there’s a good chance that only 
one of the threads will output something
- only print a message if data was received

See diff here [1].  
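
The gist of those two fixes, as a sketch (plain sockets code, not the actual
diff in [1]):

/* Sketch of the two comments above: non-blocking socket, print only when
 * data actually arrived. A real app would epoll/poll instead of spinning. */
#include <fcntl.h>
#include <stdio.h>
#include <netinet/in.h>
#include <sys/socket.h>

static void
rx_loop (int fd)
{
  char buf[1500];
  struct sockaddr_in peer;
  socklen_t len;

  /* non-blocking, so one thread cannot sit in recvfrom forever */
  fcntl (fd, F_SETFL, fcntl (fd, F_GETFL, 0) | O_NONBLOCK);

  while (1)
    {
      len = sizeof (peer);
      ssize_t n = recvfrom (fd, buf, sizeof (buf), 0,
                            (struct sockaddr *) &peer, &len);
      if (n <= 0)
        continue; /* EAGAIN: nothing received, do not print */
      printf ("[recv %zd bytes]\n", n);
    }
}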

Example output, with 2 clients (c1 and c2) and updated ips. 

[snip]
vppcom_session_create:1410: vcl<233836:0>: created session 1
ldp<233836>: fd 33: calling vls_bind: vlsh 1, addr 0x7ffd7d7cf750, len 16
vppcom_session_bind:1599: vcl<233836:0>: session 1 handle 1: binding to local 
IPv4 address 7.0.0.2 port 12342, proto UDP
vppcom_session_listen:1628: vcl<233836:0>: session 1: sending vpp listen 
request...
vcl_session_bound_handler:569: vcl<233836:0>: session 1 [0x0]: listen succeeded!
ldp<233836>: fd 33 vlsh 1, cmd 4

[recv from 7.0.0.1:60414]c1 s1

[recv from 7.0.0.1:56760]c2 s1

[recv from 7.0.0.1:56760]c2 s2

[recv from 7.0.0.1:60414]c1 s2


Regards,
Florin

[1] https://pastebin.com/4LCukbEC

> On Sep 7, 2022, at 12:05 AM, nengjie wang  wrote:
> 
> The attachment is my server program.
> 
> My VPP version is 22.06. When I start this server through LDP and send a request 
> to it, the server segfaults immediately. Two threads 
> share an event queue, resulting in an event queue crash. Can you tell me if 
> LDP does not support this usage?
> 
> In addition, I noticed that there is a configuration item called 
> ‘multi-thread-workers’ in the VCL configuration file. When this configuration is turned on, 
> the program cannot receive the RPC response. Is it suitable for me to use this 
> configuration item in this case?
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21859): https://lists.fd.io/g/vpp-dev/message/21859
Mute This Topic: https://lists.fd.io/mt/93520105/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] VPP LDP ERROR

2022-08-30 Thread Florin Coras
Hi wanghe, 

Hard to say exactly what’s happening there but, at a high level, vls tries to 
clone a vcl session from one worker to another and the rpc does not return 
after 3s. 

Are you running this at scale or does this happen with few sessions?

Also, make sure that a session does not constantly bounce between app threads. 
If that’s the case, most probably multi-thread-workers is not the right 
approach as it was mainly meant for scenarios where a session is allocated on 
one thread and then it’s used on another. 

Regards,
Florin

> On Aug 30, 2022, at 2:22 AM, NUAA无痕  wrote:
> 
> Hi, vpp experts
> my vpp version is 22.06
> 
> vcl.conf
> {
> ...
> multi-thread-workers
> }
> 
> I'm using a multi-threaded program with the LDP hoststack; vcl.conf configures 
> 'multi-thread-workers'.
> My program listens on many ports, but now the program errors at runtime.
> 
> error message:
> vls_mt_session_migrate:1065 failed to wait rpc response
> 
> can you give your suggestion?
> 
> best regards
> wanghe
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21838): https://lists.fd.io/g/vpp-dev/message/21838
Mute This Topic: https://lists.fd.io/mt/93345198/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] mysql proxy by using vpp host stack #vpp-hoststack

2022-08-22 Thread Florin Coras
Hi, 

Probably libevent statically links libc. If possible, try to recompile libevent 
and make it dynamic. 

Regards,
Florin

> On Aug 20, 2022, at 8:06 PM, weizhen9...@163.com wrote:
> 
> Hi,
> Now I implement a mysql proxy using libevent.so, and I want to implement the 
> mysql proxy on the vpp hoststack. But I find that the dynamic library libevent.so 
> does not get intercepted by libvcl_ldpreload. Is this plan feasible?
> Thanks.
>  
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21809): https://lists.fd.io/g/vpp-dev/message/21809
Mute This Topic: https://lists.fd.io/mt/93155793/21656
Mute #vpp-hoststack:https://lists.fd.io/g/vpp-dev/mutehashtag/vpp-hoststack
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] TCP Stack Performance Benchmark

2022-08-22 Thread Florin Coras
Hi Anthony, 

Sounds great! Let me know what you find. 

As for wrk, maybe give it a try. Typically, we’ve focused ldp testing on server 
side, e.g., nginx/envoy, since that was what folks were interested in. We 
should probably take a closer look at client side as well. 

Regards,
Florin

> On Aug 19, 2022, at 3:51 AM, Anthony Fee  wrote:
> 
> Hi Florin,
> 
> Thank you for the reply.
> 
> I have seen with Netperf that actually there is TCP traffic going back and 
> forth, a connection is established and then the client side crashes. I'll try 
> to dig deeper to see what might be causing that. All I see on the server side 
> is a received invalid connection message.
> 
> I don't need to have both server and client using VPP, but I am focused on 
> the client performance. It's a good suggestion, I may try netperf server side 
> outside of VPP to see what that produces (and the other way around). I assume 
> wrk doesn't work with VCL?
> 
> I'm getting more involved in VPP these days and so I may look to extend 
> existing apps, depending on if it's required. Either way, I'll report back 
> any findings here in case it helps someone else.
> 
> Thanks again for the response.
> 
> Anthony 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21808): https://lists.fd.io/g/vpp-dev/message/21808
Mute This Topic: https://lists.fd.io/mt/93077102/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] TCP Stack Performance Benchmark

2022-08-18 Thread Florin Coras
Hi Anthony, 

I for one have not tried netperf, so not sure why it’s not working. Assuming it 
is directly using epoll/select, i.e., no libevent/ev, it should work unless it 
uses a weird threading model. So, it might be worth investigating why it’s 
crashing. 

Do you need both server and client to use vpp? If not, maybe try nginx + vpp 
and run wrk/ab or any other tool on the client side. 

Another option would be to extend our home grown vcl test apps to measure 
latency. See the vcl client and server apps here [1]. 
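
A minimal shape for such a latency extension, as a generic sketch (plain
sockets so it also runs under LDP; this is not the hs_apps code from [1]):

/* Time one request/response round trip over an already-connected socket. */
#include <time.h>
#include <sys/socket.h>

static double
rtt_usec (int fd)
{
  struct timespec t0, t1;
  char b = 'x';

  clock_gettime (CLOCK_MONOTONIC, &t0);
  send (fd, &b, 1, 0);           /* 1-byte ping */
  recv (fd, &b, 1, MSG_WAITALL); /* wait for the 1-byte echo */
  clock_gettime (CLOCK_MONOTONIC, &t1);

  return (t1.tv_sec - t0.tv_sec) * 1e6 +
         (t1.tv_nsec - t0.tv_nsec) / 1e3;
}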

Regards,
Florin

[1] https://git.fd.io/vpp/tree/src/plugins/hs_apps/vcl

> On Aug 17, 2022, at 2:26 AM, Anthony Fee  wrote:
> 
> Hi all,
>  
> I am currently trying to compare TCP performance between the host TCP vs VPP 
> TCP stack. Mainly I am focused on latency but am struggling to find tools to 
> make the comparison. I have tried looking at iPerf as this has been used with 
> VPP through VCL but this only provides UDP latency numbers.
>  
> Has anyone attempted to use Netperf using VCL? I have tried but it seg faults 
> but maybe there are steps I can take to enable this?
>  
> I have also looked at T-Rex although I haven’t found a way to generate the 
> test scenarios that I require. Stateful mode does provide tools for TCP 
> testing and benchmarking but T-Rex emulates the TCP stack in this scenario.
>  
> Any feedback here would be appreciated.
>  
> Thanks,
> Anthony
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21804): https://lists.fd.io/g/vpp-dev/message/21804
Mute This Topic: https://lists.fd.io/mt/93077102/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] TCP msg queue full, connections reset issue

2022-07-27 Thread Florin Coras
Hi Vijay, 

That looks like an accept that either 1) can’t be propagated over the shared 
memory message queue to the app because the mq is congested, or 2) is rejected 
by a builtin app. 

Regards, 
Florin 

> On Jul 26, 2022, at 7:13 PM, Vijay Kumar  wrote:
> 
> Hi experts,
> 
> We are seeing the below counters being pegged. The scenario is the UEs are 
> trying to establish TCP with VPP.
> 
> It would be highly appreciated if anyone could tell us why we see the msg 
> queue full counter shown below?
> 
> 
> 1    tcp4-rcv-process    Events not sent for lack of msg queue space
> 2    tcp4-output         Packets sent
> 1    tcp4-output         Resets sent
> 1    tcp4-output         Invalid connection
>   
> 
> 
> 
>  
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21732): https://lists.fd.io/g/vpp-dev/message/21732
Mute This Topic: https://lists.fd.io/mt/92642099/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Which installation is the proper way?

2022-07-13 Thread Florin Coras
Hi Krisztián, 

The first option installs vpp’s debian packages; it will not install vpp in 
your home folder. Once you’re done installing the debs, check if vpp is running 
with something like ps, e.g., ps -ef | grep vpp. Make sure to install 
vpp-plugin-core, to get the plugins.  

Regarding the second option, we no longer support ubuntu 18.04. Could you try 
with 20.04? 

Regards,
Florin

> On Jul 13, 2022, at 5:26 AM, Krisztián Varga  wrote:
> 
> Hello everyone,
> 
> I found two different ways to install VPP on my computer, but I'm not sure 
> if these ways do the same thing. Here is the documentation for the first one: 
> https://s3-docs.fd.io/vpp/22.06/gettingstarted/installing/ubuntu.html
> With this, I don't have a VPP folder in my home directory, and the plugins 
> do not seem to be working this way. 
> The other way I found was this: 
> https://wiki.fd.io/view/VPP/Pulling,_Building,_Running,_Hacking_and_Pushing_VPP_Code
> 
> I tried it on Ubuntu 18.04, but it failed at install-dep and install 
> bootstrap. I need to use the DPDK plugin for my job; which installation would 
> be better?
> Also, if I put a PCI address in the dpdk{ } part of the config file, do I 
> also need to enable the dpdk plugin? 
> 
> Thank you for the answers. 
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21650): https://lists.fd.io/g/vpp-dev/message/21650
Mute This Topic: https://lists.fd.io/mt/92356047/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] api msg deadlock

2022-06-03 Thread Florin Coras
Hi Wanghe, 

The only api bindings supported today are c, python and golang. Maybe somebody 
familiar with the jvpp code can help you out, but otherwise I’d recommend 
switching if possible. 

Regards,
Florin

> On Jun 3, 2022, at 7:55 AM, NUAA无痕  wrote:
> 
> Hi, florin
> 
> About this question, I compared the C++ code with the jvpp code and found that 
> jvpp may have a bug; even updating vpp would not resolve it.
> 
> The jvpp code follows vpp version 1901, which has a jvpp example at
> vpp-1901/extras/japi/java/jvpp-core/io/fd/vpp/jvpp/core/examples/CreateSubInterfaceExample.java,
> and our code follows that example.
> 
> Now the vpp version is 2101.
> When the java code connects to vpp and then calls the "close" function, it 
> reports "peer unresponsive, give up".
> This error comes from the vl_client_disconnect function in src/vlibmemory/memory_client.c.
> 
> The error happens because svm_queue_sub always returns -2 until timeout.
> 
> This is the code; the reason is that vl_input_queue->cursize == 0 and 
> vl_input_queue->head == vl_input_queue->tail:
> 
> int
> vl_client_disconnect (void)
> {
>   vl_api_memclnt_delete_reply_t *rp;
>   svm_queue_t *vl_input_queue;
>   api_main_t *am = vlibapi_get_main ();
>   time_t begin;
> 
>   vl_input_queue = am->vl_input_queue;
>   vl_client_send_disconnect (0 /* wait for reply */ );
> 
>   /*
>* Have to be careful here, in case the client is disconnecting
>* because e.g. the vlib process died, or is unresponsive.
>*/
>   begin = time (0);
>   while (1)
> {
>   time_t now;
> 
>   now = time (0);
> 
>   if (now >= (begin + 2))
> {
>  clib_warning ("peer unresponsive, give up");
>  am->my_client_index = ~0;
>  am->my_registration = 0;
>  am->shmem_hdr = 0;
>  return -1;
> }
> 
> /* this error because vl_input_queue->cursize == 0  */
>   if (svm_queue_sub (vl_input_queue, (u8 *) & rp, SVM_Q_NOWAIT, 0) < 0)
> continue;
> 
>   VL_MSG_API_UNPOISON (rp);
> 
>   /* drain the queue */
>   if (ntohs (rp->_vl_msg_id) != VL_API_MEMCLNT_DELETE_REPLY)
> {
>  clib_warning ("queue drain: %d", ntohs (rp->_vl_msg_id));
>  vl_msg_api_handler ((void *) rp);
>  continue;
> }
>   vl_msg_api_handler ((void *) rp);
>   break;
> }
> 
>   vl_api_name_and_crc_free ();
>   return 0;
> }
> 
> When I use C++ with the vpp binary api, vl_input_queue->cursize == 1 and 
> vl_input_queue->head != vl_input_queue->tail,
> 
> so the C++ use of the binary api behaves correctly with respect to the 
> svm_queue_* family of functions.
> 
> Although JVpp is no longer supported, this is important for me!
> 
> Can you provide a patch for jvpp? Thanks
> 
> Best regards
> 
> Wanghe
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21501): https://lists.fd.io/g/vpp-dev/message/21501
Mute This Topic: https://lists.fd.io/mt/91372330/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] api msg deadlock

2022-05-28 Thread Florin Coras
Hi wanghe, 

Neither vpp 21.01 nor jvpp are supported, so your only options are to try back 
porting fixes from newer versions, if any exist, or to debug the problem. 

As previously mentioned, the deadlock seems to be in a reply message, so the 
issue is probably in the java/c++ implementation of the binary api client or 
the way the api is used by the client. Either the api client is not dequeueing 
messages, e.g., maybe it’s stuck waiting on vpp, or, if it did dequeue, it did 
not broadcast on the condvar or the broadcast was missed by vpp. 

Try to check what your api client is doing. That might shed some light on the 
issue.

Hope this helps. 

Regards,
Florin

> On May 27, 2022, at 7:57 PM, NUAA无痕  wrote:
> 
> Hi, Florin,
> 
> I would appreciate it if you could help resolve this.
> 
> My project uses a java web application to control vpp; many binary apis are 
> used, such as interfaces, ip, our custom apis, etc.
> 
> Half a year ago, I used a C++ restful framework (google's pistache) with the 
> vpp binary api, and there was also a deadlock problem. Back then I didn't 
> know to send mail for help (I didn't know I had to subscribe before mailing 
> vpp-dev, haha).
> 
> I thought maybe I was not proficient enough with C++ multithreading and that 
> caused the deadlock, but at the time I found that even with a single thread 
> the api communication would deadlock.
> 
> So I think heavy use of the api can cause vpp to deadlock.
> 
> To resolve this problem, we decided to change to jvpp, hoping jvpp would 
> resolve the deadlock (but the problem still exists).
> 
> I found in the git log that a newer vpp version resolves a deadlock in svm (I 
> don't know if it can resolve this one), but updating vpp will take us at 
> least one month (maybe too long).
> 
> So, Florin, can you analyze this problem? Thank you very much!
> 
> Best regards
> wanghe
> 
> 
> 
> 
> Florin Coras (fcoras.li...@gmail.com) wrote on Saturday, May 28, 2022, at 01:02:
> Hi wanghe, 
> 
> Unfortunately, jvpp is no longer supported so probably there’s no recent fix 
> for the issue you’re hitting. By the looks of it, an api msg handler is 
> trying to enqueue something (probably a reply towards the client) and ends up 
> stuck because the svm queue is full and a condvar broadcast never comes. 
> 
> If you really need to fix this, I’d check jvpp code to see if condvar 
> broadcasts on dequeue are done properly. 
> 
> Regards,
> Florin
> 
> > On May 27, 2022, at 12:53 AM, NUAA无痕 wrote:
> > 
> > Hi, vpp experts
> > 
> > I'm using vpp version 2101.
> > My project uses jvpp to communicate with vpp via the binary api, but after 
> > a long run (about 14h) it deadlocks. This is the info:
> > 
> > 0x7f2687783a35 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> > /usr/lib64/libpthread.so.0
> > (gdb) bt
> > #0  0x7f2687783a35 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> > /usr/lib64/libpthread.so.0
> > #1  0x7f2688f198e6 in svm_queue_add () from 
> > /usr/local/zfp/lib/libsvm.so.21.01.1
> > #2  0x55cdda6856f3 in ?? ()
> > #3  0x7f268904c909 in vl_msg_api_handler_with_vm_node () from 
> > /usr/local/zfp/lib/libvlibmemory.so.21.01.1
> > #4  0x7f2689033521 in vl_mem_api_handle_msg_main () from 
> > /usr/local/zfp/lib/libvlibmemory.so.21.01.1
> > #5  0x7f2689043fce in ?? () from 
> > /usr/local/zfp/lib/libvlibmemory.so.21.01.1
> > #6  0x7f2688fba5a7 in ?? () from /usr/local/zfp/lib/libvlib.so.21.01.1
> > #7  0x7f2688ef8de0 in clib_calljmp () from 
> > /usr/local/zfp/lib/libvppinfra.so.21.01.1
> > #8  0x7f263d673dd0 in ?? ()
> > #9  0x7f2688fbdf67 in ?? () from /usr/local/zfp/lib/libvlib.so.21.01.1
> > #10 0x in ?? ()
> > 
> > Because I use a release build, some info is not shown. I found that the 
> > new vpp version has changed a lot in svm.
> > 
> > For various reasons I need some time to update vpp, and I must resolve 
> > this problem now.
> > 
> > So, experts, can you give a patch for this bug for vpp version 2101?
> > 
> > Best regards
> > wanghe
> > 
> > 
> > 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21480): https://lists.fd.io/g/vpp-dev/message/21480
Mute This Topic: https://lists.fd.io/mt/91372330/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] api msg deadlock

2022-05-27 Thread Florin Coras
Hi wanghe, 

Unfortunately, jvpp is no longer supported so probably there’s no recent fix 
for the issue you’re hitting. By the looks of it, an api msg handler is trying 
to enqueue something (probably a reply towards the client) and ends up stuck 
because the svm queue is full and a condvar broadcast never comes. 

If you really need to fix this, I’d check jvpp code to see if condvar 
broadcasts on dequeue are done properly. 
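
The consumer-side pattern to look for is roughly this (generic pthread sketch
with illustrative field names, not the actual svm_queue_t layout or the jvpp
code):

#include <pthread.h>

typedef struct
{
  pthread_mutex_t mutex;
  pthread_cond_t condvar;
  int cursize, maxsize, head;
  /* ... element storage ... */
} queue_t;

static void
queue_sub (queue_t *q)
{
  pthread_mutex_lock (&q->mutex);
  while (q->cursize == 0)
    pthread_cond_wait (&q->condvar, &q->mutex); /* empty: wait for producer */
  /* ... copy the element at q->head out, advance q->head ... */
  if (q->cursize == q->maxsize)
    pthread_cond_broadcast (&q->condvar); /* was full: wake blocked producers */
  q->cursize--;
  pthread_mutex_unlock (&q->mutex);
}

If the client dequeues without that broadcast, a producer (here, vpp's api
handler) can block in svm_queue_add forever, which matches the backtrace
below.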

Regards,
Florin

> On May 27, 2022, at 12:53 AM, NUAA无痕  wrote:
> 
> Hi, vpp experts
> 
> I'm using vpp version 2101.
> My project uses jvpp to communicate with vpp via the binary api, but after a 
> long run (about 14h) it deadlocks. This is the info:
> 
> 0x7f2687783a35 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /usr/lib64/libpthread.so.0
> (gdb) bt
> #0  0x7f2687783a35 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /usr/lib64/libpthread.so.0
> #1  0x7f2688f198e6 in svm_queue_add () from 
> /usr/local/zfp/lib/libsvm.so.21.01.1
> #2  0x55cdda6856f3 in ?? ()
> #3  0x7f268904c909 in vl_msg_api_handler_with_vm_node () from 
> /usr/local/zfp/lib/libvlibmemory.so.21.01.1
> #4  0x7f2689033521 in vl_mem_api_handle_msg_main () from 
> /usr/local/zfp/lib/libvlibmemory.so.21.01.1
> #5  0x7f2689043fce in ?? () from 
> /usr/local/zfp/lib/libvlibmemory.so.21.01.1
> #6  0x7f2688fba5a7 in ?? () from /usr/local/zfp/lib/libvlib.so.21.01.1
> #7  0x7f2688ef8de0 in clib_calljmp () from 
> /usr/local/zfp/lib/libvppinfra.so.21.01.1
> #8  0x7f263d673dd0 in ?? ()
> #9  0x7f2688fbdf67 in ?? () from /usr/local/zfp/lib/libvlib.so.21.01.1
> #10 0x in ?? ()
> 
> Because I use a release build, some info is not shown. I found that the new 
> vpp version has changed a lot in svm.
> 
> For various reasons I need some time to update vpp, and I must resolve this 
> problem now.
> 
> So, experts, can you give a patch for this bug for vpp version 2101?
> 
> Best regards
> wanghe
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21478): https://lists.fd.io/g/vpp-dev/message/21478
Mute This Topic: https://lists.fd.io/mt/91372330/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] hoststack-udp problems

2022-05-25 Thread Florin Coras
Hi, 

Inline.

> On May 25, 2022, at 2:15 AM, NUAA无痕  wrote:
> 
> Hi, Florin Coras,
> I may not have described it clearly.
> 
> I'm using vpp version 2101.

Could you also try with latest vpp? We’re about to release 22.06

> 
> 1. sendto function problem
> I use LDP for a C socket program with udp,
> and I find that using only the sendto function fails. This is the client code:
> 
> #include <arpa/inet.h>
> #include <sys/socket.h>
> #include <unistd.h>
> 
> int main ()
> {
> int sockfd = socket (AF_INET, SOCK_DGRAM, IPPROTO_UDP);
> struct sockaddr_in server = {0};
> server.sin_addr.s_addr = inet_addr ("192.168.1.1");
> server.sin_port = htons (12345); /* placeholder; port lost in archiving */
> server.sin_family = AF_INET;
> 
> char msg[] = "hello";
> /* this part is what differs */
> sendto (sockfd, msg, sizeof (msg), 0, (struct sockaddr *) &server, 
> sizeof (server));
> 
> close (sockfd);
> return 0;
> }
> 
> With this code, vpp reports a "remote ip cannot connect" error.
> 
> But if I use the connect function, like this:
> 
> #include <arpa/inet.h>
> #include <sys/socket.h>
> #include <unistd.h>
> 
> int main ()
> {
> int sockfd = socket (AF_INET, SOCK_DGRAM, IPPROTO_UDP);
> struct sockaddr_in server = {0};
> server.sin_addr.s_addr = inet_addr ("192.168.1.1");
> server.sin_port = htons (12345); /* placeholder; port lost in archiving */
> server.sin_family = AF_INET;
> 
> char msg[] = "hello";
> /* this part is what differs */
> connect (sockfd, (struct sockaddr *) &server, sizeof (server));
> sendto (sockfd, msg, sizeof (msg), 0, NULL, 0);
> 
> close (sockfd);
> return 0;
> }
> 
> LDP works fine.

Yup, that’s what I imagined you were doing. I’d expect that to work fine with 
more recent versions of vpp, but if it doesn’t do let me know. 

> 
> 2. I find that vpp's udp proto under LDP also uses an mss; the code sets 
> mss = 1500 - 20 - 8.
> I think a udp packet bigger than the MTU should cause ip fragmentation rather 
> than use an mss like tcp does
> (this is a good design).
> 
> But now I need to send a 4K udp packet using ip fragmentation, not mss, 
> because if udp is chopped into mss-sized pieces I also need to reassemble 
> them (I need the data); with ip fragmentation and reassembly, I can get the 
> data easily.

UDP only sets DF in the IP header to 0 (in newer vpp versions), it does not 
implement fragmentation. You can try to increase udp's mtu to 4k and it will 
just forward the large datagram to ip. IP forwarding logic should compare the 
datagram’s size with the interface’s mtu’s and, if larger, should fragment. 
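
That is, something along these lines in startup.conf (illustrative; assumes
the interface mtu is at least as large):

udp { mtu 4096 }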

Given that udp delivery is unreliable and fragmentation/reassembly is 
expensive, I typically recommend against this.

> 
> 3. By "it cannot stop" I mean: the client sends "hello" once; with the 
> command "show int", the packet count increases by 1 and stops.
> But if I set the udp mtu in startup.conf bigger than 1500 and send "hello", 
> it cannot stop: the count keeps increasing and the server also keeps 
> receiving.
> If the mtu is less than 1500, it is ok.
> By the way, the mtu of the nic that vpp manages is 9000; startup.conf udp { mtu 9000 
> } causes this problem.

That might be a bug. Again, could you try a more recent vpp version and do let 
me know if the issue is still present. 

Regards, 
Florin

> 
> best  
> 
> Florin Coras (fcoras.li...@gmail.com) wrote on Sunday, May 22, 2022, at 06:57:
> Hi, 
> 
> 
> 
> > On May 20, 2022, at 2:31 AM, NUAA无痕 wrote:
> > 
> > Hi, vpp experts,
> > I'm now using the vpp hoststack for udp and have met some problems.
> > 
> > 1. The udp socket must use the connect function; using sendto causes an 
> > "ip address not connected" error.
> 
> What version of vpp are you using? Although we prefer connected udp for 
> performance reasons, sendto should work. If socket was not connected/bound, 
> vcl should connect it. What’s the exact error you’re getting and what are you 
> trying to do? 
> 
> > 
> > 2. If I use a udp socket to send a packet bigger than 1500, udp will split 
> > it into many packets. Is there some method to keep it from splitting?
> 
> What exactly are you trying to achieve? Session layer chops datagrams into 
> mss sized packets. If you’re trying to send large datagrams, up to nic mtu 
> size, then as you did lower, increase udp mtu. 
> 
> > 
> > 3. With startup.conf set to udp { mtu 9000 }, sending one packet through 
> > the hoststack sends packets endlessly and cannot stop; the mtu must be 
> > less than 1500.
> 
> Not sure I understand what you mean by “it cannot stop”? If by chance you’re 
> trying to force ip fragmentation, that’s not supported with udp sessions. 
> 
> Regards,
> Florin
> 
> > 
> > Can you give some suggestions? Thank you.
> > 
> > 
> > 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21461): https://lists.fd.io/g/vpp-dev/message/21461
Mute This Topic: https://lists.fd.io/mt/91227351/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] hoststack-java netty segmentfault

2022-05-25 Thread Florin Coras
Hi, 

Interesting, thanks for the info. But the thing I was worried about is java’s 
threading model and interaction with the network. VLS, the shim between vcl and 
ldp, tries to guess how the app uses the network and enforces locking to make 
sure only one thread interacts with a vcl worker's resources. If vls does not 
correctly guess the threading model, it can lead to issues. 

Regarding the crash lower, it seems nio tries to EPOLL_CTL_MOD a session and 
that session has no fifo. That should not crash (see [1]). Are you using an 
older version of vpp?

Regards,
Florin

[1] https://git.fd.io/vpp/tree/src/vcl/vppcom.c#n2883

> On May 25, 2022, at 1:43 AM, NUAA无痕  wrote:
> 
> 
> 
> -- Forwarded message -
> From: NUAA无痕 (nuaawan...@gmail.com)
> Date: Wednesday, May 25, 2022, 16:37
> Subject: Re: [vpp-dev] hoststack-java netty segmentfault
> To: Florin Coras (fcoras.li...@gmail.com)
> 
> 
> Hi, Florin,
> Yes, I'm using LDP + java. I have tested a plain java socket and it works fine!
> I think the jvm also uses libc.so, so I guess java sockets translate to C 
> sockets, and they do.
> 
> Now I'm using LDP with netty, but it segfaults.
> The vpp version is 2101.
> 
> This is the info:
> 
> Thread 17 "nioEventLoopGro" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f977571d0 (LWP 7042)]
> 0x007fb7d998a8 in svm_fifo_del_want_deq_ntf (f=0x0, ntf_type=2 '\002') at 
> /home/wanghe/ZFP-2101/src/svm/svm_fifo.h:770
> 770   f->want_deq_ntf &= ~ntf_type;
> (gdb) bt
> #0  0x007fb7d998a8 in svm_fifo_del_want_deq_ntf (f=0x0, ntf_type=2 
> '\002') at /home/wanghe/ZFP-2101/src/svm/svm_fifo.h:770
> #1  0x007fb7da6ba0 in vppcom_epoll_ctl (vep_handle=1, op=3, 
> session_handle=10, event=0x7f97755c68) at 
> /home/wanghe/ZFP-2101/src/vcl/vppcom.c:2740
> #2  0x007fb7dc02f0 in vls_epoll_ctl (ep_vlsh=0, op=3, vlsh=9, 
> event=0x7f97755c68) at /home/wanghe/ZFP-2101/src/vcl/vcl_locked.c:1293
> #3  0x007fb7fbc390 in epoll_ctl (epfd=128, op=3, fd=137, 
> event=0x7f97755c68) at /home/wanghe/ZFP-2101/src/vcl/ldp.c:2294
> #4  0x007fa8135b1c in Java_sun_nio_ch_EPollArrayWrapper_epollCtl () from 
> /usr/lib/jvm/java-8-openjdk-arm64/jre/lib/aarch64/libnio.so
> #5  0x007f9c08f49c in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> 
> This may be a bug; if vpp could support java, it would be very good.
> I need to improve java web performance.
> 
> Best wishes,
> 
> Florin Coras (fcoras.li...@gmail.com) wrote on Sunday, May 22, 2022, at 06:43:
> Hi, 
> 
> Are you trying to use LDP + java? I suspect that has never been tested and 
> I’d be surprised if it worked. 
> 
> Regards, 
> Florin
> 
> > On May 20, 2022, at 2:18 AM, NUAA无痕 wrote:
> > 
> > Hi, vpp experts,
> > I'm using the vpp hoststack for java netty,
> > but it segfaults; the reason is that epoll uses an svm_fifo_t that is null.
> > Can you give some suggestions? Thank you.
> > 
> > 
> > 
> > 
> 
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21460): https://lists.fd.io/g/vpp-dev/message/21460
Mute This Topic: https://lists.fd.io/mt/91227238/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] hoststack-udp problems

2022-05-21 Thread Florin Coras
Hi, 



> On May 20, 2022, at 2:31 AM, NUAA无痕  wrote:
> 
> Hi, vpp experts,
> I'm now using the vpp hoststack for udp and have met some problems.
> 
> 1. The udp socket must use the connect function; using sendto causes an "ip 
> address not connected" error.

What version of vpp are you using? Although we prefer connected udp for 
performance reasons, sendto should work. If socket was not connected/bound, vcl 
should connect it. What’s the exact error you’re getting and what are you 
trying to do? 

> 
> 2. If I use a udp socket to send a packet bigger than 1500, udp will split 
> it into many packets. Is there some method to keep it from splitting?

What exactly are you trying to achieve? Session layer chops datagrams into mss 
sized packets. If you’re trying to send large datagrams, up to nic mtu size, 
then as you did lower, increase udp mtu. 

> 
> 3. With startup.conf set to udp { mtu 9000 }, sending one packet through the 
> hoststack sends packets endlessly and cannot stop; the mtu must be less than 1500.

Not sure I understand what you mean by “it cannot stop”? If by chance you’re 
trying to force ip fragmentation, that’s not supported with udp sessions. 

Regards,
Florin

> 
> Can you give some suggestions? Thank you.
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21428): https://lists.fd.io/g/vpp-dev/message/21428
Mute This Topic: https://lists.fd.io/mt/91227351/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] hoststack-java netty segmentfault

2022-05-21 Thread Florin Coras
Hi, 

Are you trying to use LDP + java? I suspect that has never been tested and I’d 
be surprised if it worked. 

Regards, 
Florin

> On May 20, 2022, at 2:18 AM, NUAA无痕  wrote:
> 
> Hi, vpp experts,
> I'm using the vpp hoststack for java netty,
> but it segfaults; the reason is that epoll uses an svm_fifo_t that is null.
> Can you give some suggestions? Thank you.
> 
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21427): https://lists.fd.io/g/vpp-dev/message/21427
Mute This Topic: https://lists.fd.io/mt/91227238/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] test performance of nginx using vpp host stack#vpp-hoststack

2022-05-05 Thread Florin Coras
Hi Yacan, 

Currently rpcs from first worker to main are done through session layer and are 
processed by main in batches. Session queue node runs on main in interrupt mode 
so first worker will set an interrupt when the list of pending connects goes 
non-empty and main will switch to polling in the rpc handler if it notices it 
can’t handle pending connects in one dispatch. So, the first connect might be 
affected by main sleeping in epoll_pwait but subsequent connects should not, 
assuming we get a constant stream of connects.

To test that, weizhen9612 try adding to session stanza in startup.conf: session 
{ poll-main }. That should avoid main sleeping in epoll_pwait.  
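
That is, in startup.conf:

session {
  poll-main
}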

Eventually, we’ll get to a point where first worker will execute the connects. 
Part of the changes needed are in, i.e., session pools are now realloced with 
barrier, but more improvements are needed.   

Regards,
Florin

> On May 5, 2022, at 5:01 AM, liuyacan  wrote:
> 
> Hi Florin, weizhen9612:
> 
>  I'm not sure whether the rpc for connects will be executed immediately by 
> the main thread in the current implementation, or whether it will wait for the 
> epoll_pwait in linux_epoll_input_inline() to time out.
> 
> Regards,
> yacan
> On 5/5/2022 16:19,  wrote: 
> Hi,
>Now I configure main-core to 2 and corelist-workers to 0, and find that 
> the performance has improved significantly.
> 
> When I execute the following command, I find that vpp has the main thread only.
> #show threads
> 
> 
> What does this situation show?
> Thanks. 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21375): https://lists.fd.io/g/vpp-dev/message/21375
Mute This Topic: https://lists.fd.io/mt/90793836/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] test performance of nginx using vpp host stack#vpp-hoststack

2022-05-04 Thread Florin Coras
Next step then. What’s segment-size and add-segment-size in vcl.conf? Could you 
set them to something large like 40? Also event-queue-size 100, 
just to make sure mq and fifo segments are not a limiting factor. In vpp under 
session stanza, set event-queue-length 20. 
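
Spelled out, the knobs live in two places (the sizes below are illustrative
placeholders, not prescriptions):

# vcl.conf
vcl {
  segment-size 4000000000
  add-segment-size 4000000000
  event-queue-size 1000000
}

# startup.conf
session { event-queue-length 200000 }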

Try also to run the test twice to make sure the issue is not pool warmup. 
Finally, if perf doesn’t improve, before test do "clear error" and after test 
"show error” and let’s see if there’s something there. 

Regards,
Florin 

> On May 4, 2022, at 6:25 PM, weizhen9...@163.com wrote:
> 
> Hi,
> I tested the performance of the upstream server.
> 
> Just as you see, the performance of the upstream is much higher than the vpp 
> proxy's. In addition, I don't find any drops.
> Thanks.
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21360): https://lists.fd.io/g/vpp-dev/message/21360
Mute This Topic: https://lists.fd.io/mt/90793836/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] test performance of nginx using vpp host stack#vpp-hoststack

2022-05-04 Thread Florin Coras
As mentioned previously, is the upstream server handling the load? Do you see 
drops between vpp and the upstream server? 

Regards,
Florin

> On May 4, 2022, at 9:10 AM, weizhen9...@163.com wrote:
> 
> Hi,
> According to your suggestion, I configured the src-address.
> 
> 
> But the performance is lower than before.
> 
> 
> 
> Thanks.
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21357): https://lists.fd.io/g/vpp-dev/message/21357
Mute This Topic: https://lists.fd.io/mt/90793836/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] test performance of nginx using vpp host stack#vpp-hoststack

2022-05-04 Thread Florin Coras
What’s the result prior to multiple addresses? Also, can you give it the whole 
/24? No need to configure the ips, just tcp src-address 
192.168.6.6-192.168.6.250

Forgot to ask before but is the server that’s being proxied for handling the 
load? It will also need to accept a lot of connections. 

Regards,
Florin

> On May 4, 2022, at 8:35 AM, weizhen9...@163.com wrote:
> 
> According to your suggestion, I tested with multiple source ips. But the 
> performance is still low.
> 
> The ip is as follows.
> 
> vpp#tcp src-address 192.168.6.6-192.168.6.9
> Thanks.
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21355): https://lists.fd.io/g/vpp-dev/message/21355
Mute This Topic: https://lists.fd.io/mt/90793836/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] test performance of nginx using vpp host stack#vpp-hoststack

2022-05-04 Thread Florin Coras
Hi, 

That shouldn’t be the issue. Half-opens are on main because connection 
establishment needs locks before it sends out a syn packet. Handshakes are not 
completed on main but on workers. VPP with one worker + main should be able to 
handle 100k+ CPS with warmed up pools. 

Long term we’ll switch from main to first worker for syns but again, that’s not 
the thing that limits performance in your case. Instead, it’s probably the 
number of ports. You should be able to confirm that by testing with multiple 
source ips. 

Regards,
Florin

> On May 4, 2022, at 8:00 AM, weizhen9...@163.com wrote:
> 
> Hi,
>    Is this the reason for the low performance? In general, the main thread 
> handles management functions (debug CLI, API, stats collection) and one or 
> more worker threads handle packet processing from input to output of the 
> packet. Why does the main core handle the session? Does this condition 
> influence the performance? If yes, what should I do?
> Thanks. 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21353): https://lists.fd.io/g/vpp-dev/message/21353
Mute This Topic: https://lists.fd.io/mt/90793836/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] test performance of nginx using vpp host stack#vpp-hoststack

2022-05-04 Thread Florin Coras
Hi, 

Those are half-open connects. So yes, they’re expected if nginx opens a new 
connection for each request.

Regards,
Florin

> On May 4, 2022, at 6:48 AM, weizhen9...@163.com wrote:
> 
> Hi,
> When I use wrk to test the performance of the nginx proxy using the vpp host 
> stack, I execute the command "show session" via vppctl. The result is as 
> follows.
> 
> The main core has most of the sessions. Is this normal? If not, what should I do?
> Thanks.
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21351): https://lists.fd.io/g/vpp-dev/message/21351
Mute This Topic: https://lists.fd.io/mt/90793836/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] test performance of nginx using vpp host stack#vpp-hoststack

2022-05-03 Thread Florin Coras
Hi, 

Unfortunately, I have not, partly because I didn’t expect too much out of the 
test due to the issues you’re hitting. What’s the difference between linux and 
vpp with and without tcp_max_tw_buckets? 

Regards,
Florin

> On May 3, 2022, at 3:28 AM, weizhen9...@163.com wrote:
> 
> Hi,
> 
> I wanted to ask if you have tested the performance of the nginx proxy using the 
> vpp host stack with short connections, i.e., after vpp sends a GET request to the 
> upstream server, vpp closes the connection. If yes, please tell me the result. 
> Thank you for your suggestion about adding multiple source IPs. But we 
> want to make the performance of the vpp protocol stack higher than that of 
> the kernel under the same conditions.
> 
> Thanks.
> 
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21348): https://lists.fd.io/g/vpp-dev/message/21348
Mute This Topic: https://lists.fd.io/mt/90793836/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] #epoll problems with epoll

2022-05-03 Thread Florin Coras
It seems vpp is sending and receiving fins and resets. So if the remote end did 
not send fins, probably the resets are the source of the epoll HUPs. If you 
want to debug why those resets were sent, you might have to capture a pcap 
trace or try to capture them while they are sent. “show tcp stats” might also 
provide some clues, if the resets are triggered by timers. 

Regards,
Florin 

> On May 2, 2022, at 11:58 PM, 25956760...@gmail.com wrote:
> 
> Hi, Florin,
> I have tried your suggestion. But the problem still exists. Here is a picture 
> of show errors. Thanks again for your help.
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21340): https://lists.fd.io/g/vpp-dev/message/21340
Mute This Topic: https://lists.fd.io/mt/90853811/21656
Mute #epoll:https://lists.fd.io/g/vpp-dev/mutehashtag/epoll
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] test performance of nginx using vpp host stack#vpp-hoststack

2022-05-02 Thread Florin Coras
Hi, 

That indeed looks like an issue due to vpp not being able to recycle 
connections fast enough. There are only 64k connections available between vpp 
and the upstream server, so recycling them as fast as possible, i.e., with 0 
timeout as the kernel does after tcp_max_tw_buckets threshold is hit, might 
make it look like performance is moderately good assuming there are less than 
64k active connections (not closing). 

However, as explained in the previous emails, that might lead to connection 
errors (see my previous links). You could try to emulate that with vpp, by just 
setting timewait-time to 0 but the same disclaimer regarding connection errors 
holds. The only other option is to ensure vpp can allocate more connections to 
the upstream server, i.e., either more source IPs or more destination/server 
IPs.
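
Back-of-envelope, for a single source/destination IP pair (the timewait
values here are just examples):

max steady-state CPS ~ usable ports / timewait seconds
  ~64000 / 5 s  ~ 12800 CPS
  ~64000 / 1 s  = 64000 CPS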

Regards,
Florin 

> On May 2, 2022, at 8:33 AM, weizhen9...@163.com wrote:
> 
> Hi,
> A short link means that after the client sends a GET request, the client 
> sends a tcp FIN packet. In contrast, a long link means that after the client 
> sends a GET request, the client sends the next http GET request over the 
> same link and doesn't need to send a syn packet.
> We found that when vpp and the upstream servers used short links, the 
> performance is lower than the nginx proxy using the kernel host stack. The 
> picture shows the performance of the nginx proxy using the vpp host stack.
> 
> Actually, the performance of the nginx proxy using the vpp host stack is 
> higher than the nginx proxy using the kernel host stack. I don't understand why.
> Thanks.
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21322): https://lists.fd.io/g/vpp-dev/message/21322
Mute This Topic: https://lists.fd.io/mt/90793836/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] test performance of nginx using vpp host stack#vpp-hoststack

2022-05-02 Thread Florin Coras
Hi, 

As per this [1], after tcp_max_tw_buckets threshold is hit timewait time is 0 
and this [2] explains what will go wrong. Assuming you’re hitting the 
threshold, 1s timewait-time in vpp will probably not be enough to match 
performance. 

Not sure what you mean by “short link”. If you can’t use multiple source IPs or 
destination IPs in the active opens between vpp and the upstream servers, 
there’s not much that could be done beyond what’s mentioned above as vpp can’t 
allocate connections. If your nginx and server it’s proxying for are colocated, 
and the server can use vcl, you could maybe try to use cut-through sessions as 
those do not consume ports in tcp. 

Regards, 
Florin

[1] https://sysctl-explorer.net/net/ipv4/tcp_max_tw_buckets/ 

[2] 
https://stackoverflow.com/questions/45979123/what-is-the-side-effect-of-setting-tcp-max-tw-buckets-to-a-very-small-value
 


> On May 1, 2022, at 2:09 AM, weizhen9...@163.com wrote:
> 
> Hi,
> I set timewait-time equal to 1s in tcp's configuration. But 
> the performance of the nginx proxy using the vpp host stack is still lower 
> than the nginx proxy using the kernel host stack. 
> Now I want to know what I can do to improve the performance. And does 
> the nginx proxy using the vpp host stack support short links?
> In addition, just as you said above, do I need to set time-wait to 0? 
> And I don't set tcp src-address. I hope the performance of the nginx proxy 
> using the vpp host stack can be higher than that of the nginx proxy using 
> the kernel host stack on the same hardware.
> Thanks. 
> 
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21317): https://lists.fd.io/g/vpp-dev/message/21317
Mute This Topic: https://lists.fd.io/mt/90793836/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] test performance of nginx using vpp host stack#vpp-hoststack

2022-04-30 Thread Florin Coras
Hi, 

Understood. See the comments in my previous reply regarding timewait-time 
(tcp_max_tw_buckets practically sets time-wait to 0 once the threshold is 
passed) and tcp src-address. 

Regards, 
Florin

> On Apr 30, 2022, at 10:08 AM, weizhen9...@163.com wrote:
> 
> Hi,
> I test the nginx proxy using RPS, and the nginx proxy only talks to one IP.
> When I test the performance of the nginx proxy using the vpp host stack with 
> nginx configured for short connections between the nginx reverse proxy 
> and the upstream server, the result shows that the performance of the 
> nginx proxy using the vpp host stack is lower than the nginx proxy using the 
> kernel host stack. In the kernel host stack, I configure tcp_max_tw_buckets.
> But with long connections between the nginx reverse proxy and the 
> upstream server, the performance of the nginx proxy using the vpp host stack 
> is higher than the nginx proxy using the kernel host stack.
> So what should I do to improve the performance of the nginx proxy using the 
> vpp host stack when there are short connections between the nginx reverse 
> proxy and the upstream server?
> Thanks. 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21313): https://lists.fd.io/g/vpp-dev/message/21313
Mute This Topic: https://lists.fd.io/mt/90793836/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] test performance of nginx using vpp host stack#vpp-hoststack

2022-04-30 Thread Florin Coras
Hi, 

What is performance in this case, CPS? If yes, does nginx proxy only towards 
one IP, hence the need for tcp_max_tw_buckets? 

You have the option to reduce time wait time in tcp by setting timewait-time in 
tcp’s startup.conf stanza. I would not recommend reducing it too much as it can 
lead to corruption of data streams whenever connections cannot be gracefully 
closed because of lost packets. 

If you have more IPs vpp could use on the interface vpp uses towards your 
server, I’d recommend providing them to tcp via: tcp src-address <first-ip> - <last-ip>
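
For example, with the range used earlier in this thread (illustrative;
timewait-time is in seconds):

# startup.conf
tcp { timewait-time 1 }

# at runtime, give tcp a pool of source addresses
vpp# tcp src-address 192.168.6.6-192.168.6.250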

Regards,
Florin
 

> On Apr 30, 2022, at 4:17 AM, weizhen9...@163.com wrote:
> 
> Hi,
> Now I use nginx, which uses the vpp host stack, as a proxy to test the 
> performance. But I find the performance of nginx using the vpp host stack is 
> lower than nginx using the kernel host stack. The reason is that I configure 
> tcp_max_tw_buckets in the kernel host stack. So does the vpp stack support a 
> tcp_max_tw_buckets setting? If not, can I modify the vpp host stack?
> Thanks. 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21311): https://lists.fd.io/g/vpp-dev/message/21311
Mute This Topic: https://lists.fd.io/mt/90793836/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] #vpp-hoststack how to change tcp congestion algorithm

2022-04-26 Thread Florin Coras
Hi, 

I wouldn’t recommend it, but if you must change it, in vpp’s startup.conf add a 
tcp stanza with cc-algo set to newreno: tcp { cc-algo newreno }

Regards, 
Florin

> On Apr 26, 2022, at 10:30 PM, 25956760...@gmail.com wrote:
> 
> Hi All,
> I have read the src code of TCP.  I found that if I want to switch tcp 
> congestion algorithm between newreno and cubic, the only way is to modify the 
> src code. May I ask if my understanding is correct? Thank you very much. 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21289): https://lists.fd.io/g/vpp-dev/message/21289
Mute This Topic: https://lists.fd.io/mt/90725025/21656
Mute #vpp-hoststack:https://lists.fd.io/g/vpp-dev/mutehashtag/vpp-hoststack
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] #socket-api get stuck in socket function

2022-04-26 Thread Florin Coras
Hi, 

I’m guessing you’re asking how to run LDP from gdb. For that either create a 
gdb script file and: gdb -x cmd.gdb --args ./epollandsocket, or run the 
commands after starting gdb. 

The minimal set of commands to start: 
set exec-wrapper env 
'LD_PRELOAD=/vpp/build-root/build-vpp-native/vpp/lib/libvcl_ldpreload.so'
set environment VCL_CONFIG=/vcl.conf
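
As a script file, that would be (paths as above; adjust to your tree):

# cmd.gdb
set exec-wrapper env 'LD_PRELOAD=/vpp/build-root/build-vpp-native/vpp/lib/libvcl_ldpreload.so'
set environment VCL_CONFIG=/vcl.conf
run

invoked as: gdb -x cmd.gdb --args ./epollandsocket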

Hope this helps!

Regards,
Florin

> On Apr 26, 2022, at 10:50 AM, 25956760...@gmail.com wrote:
> 
> Thanks, I'm using LDP. When I use gdb to debug, it produces the output 
> below.
> I guess LDP preloads over some libraries that gdb needs. When I switch to a 
> normal env, gdb runs correctly. How can I solve this? Thanks. The correct 
> output is also below.
> 
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21286): https://lists.fd.io/g/vpp-dev/message/21286
Mute This Topic: https://lists.fd.io/mt/90680910/21656
Mute #socket-api:https://lists.fd.io/g/vpp-dev/mutehashtag/socket-api
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] #socket-api get stuck in socket function

2022-04-25 Thread Florin Coras
Hi, 

Are you trying to run the test app through LDP? If yes, sharing one epoll fd 
between two threads might be the source of the issues, although hard to say 
what could go wrong. 

With respect to your app, could you run it from gdb and check where it’s stuck? 

Regards,
Florin

> On Apr 25, 2022, at 2:16 AM, 25956760...@gmail.com wrote:
> 
> Hi All ,
> 
> I'm trying to use a non-blocking socket with epoll. 
> I wonder why I sometimes get stuck in the socket function when I use VCL.
> Code as following:
> printf ("before socket\n");
> if ((c_fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
> perror("create socket fail.\n");
> exit(0);
> }
> printf ("after socket\n");
> The only output is "before socket"; the socket function doesn't even return.
> Thank you very much
> 
> complete code is in attachments
>  
> 
> 
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21284): https://lists.fd.io/g/vpp-dev/message/21284
Mute This Topic: https://lists.fd.io/mt/90680910/21656
Mute #socket-api:https://lists.fd.io/g/vpp-dev/mutehashtag/socket-api
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [E] [vpp-dev] VPP deadlock issue

2022-04-07 Thread Florin Coras
Hi Kevin, 

Not sure but that does sound like a bug if it’s allowed. The mutex should be 
dropped before suspending. 

Regards,
Florin

> On Apr 7, 2022, at 7:07 PM, Kevin Yan  wrote:
> 
> Hi Florin,
>   Thanks for the quick reply. I think when this issue happened, the 
> main thread was holding the binary api’s queue mutex, and then it was 
> scheduled to execute another process node, and in that process node it 
> called barrier sync. Is this a possible scenario?
>  
> BRs,
> Kevin
>  
> From: Florin Coras (fcoras.li...@gmail.com) 
> Sent: Friday, April 8, 2022 10:00 AM
> To: Kevin Yan (kevin@mavenir.com)
> Cc: vpp-dev@lists.fd.io; Alan Wang (alan.w...@mavenir.com)
> Subject: [E] Re: [vpp-dev] VPP deadlock issue
>  
> Hi Kevin,  
>  
> That’s a pretty old VPP release so you should maybe try to update. 
>  
> Regarding the deadlock, what is main actually doing? If it didn’t lock the 
> binary api's queue mutex before the barrier sync, it shouldn’t deadlock. 
>  
> Regards,
> Florin
> 
> 
> On Apr 7, 2022, at 6:39 PM, Kevin Yan via lists.fd.io wrote:
>  
> Hi,
>   Recently I got a VPP crash issue, one worker thread is doing 
>   Recently I hit a VPP crash issue: one worker thread is taking a 
> mutex lock and waiting to get the mutex; the complete call stack is 
> arp_learn -> vnet_arp_set_ip4_over_ethernet -> vl_api_rpc_call_main_thread -> 
> vl_msg_api_alloc_as_if_client -> vl_msg_api_alloc_internal -> 
> pthread_mutex_lock (>mutex). At the same time, the main thread is 
> calling vlib_worker_thread_barrier_sync to try to stop all worker threads; 
> this leads to deadlock and hence the VPP crash.
>   Did anyone meet a similar issue, and how can this race condition 
> be solved? I am using vpp 19.01 and tried to search the commits related to 
> this issue in later releases but had no luck, so I am not sure if this issue 
> got fixed in a later release.
>   Appreciate if anyone can help.
>  
> BRs,
> Kevin





Re: [vpp-dev] VPP deadlock issue

2022-04-07 Thread Florin Coras
Hi Kevin, 

That’s a pretty old VPP release so you should maybe try to update. 

Regarding the deadlock, what is main actually doing? If it didn’t lock the 
binary api's queue mutex before the barrier sync, it shouldn’t deadlock. 

Regards,
Florin

> On Apr 7, 2022, at 6:39 PM, Kevin Yan via lists.fd.io 
>  wrote:
> 
> Hi,
>   Recently I hit a VPP crash issue: one worker thread is waiting on 
> a mutex, with the complete call stack being 
> arp_learn -> vnet_arp_set_ip4_over_ethernet -> vl_api_rpc_call_main_thread -> 
> vl_msg_api_alloc_as_if_client -> vl_msg_api_alloc_internal -> 
> pthread_mutex_lock (&q->mutex), while at the same time the main thread is 
> calling vlib_worker_thread_barrier_sync to try to lock all worker threads; 
> this leads to a deadlock and hence the VPP crash.
>  
>   Did anyone meet a similar issue, and how can this race condition 
> be solved? I am using vpp 19.01 and tried to search the commits related to 
> this issue in later releases but had no luck, so I am not sure if this issue 
> got fixed in a later release.
>   Appreciate if anyone can help.
>  
> BRs,
> Kevin
> 
> 





Re: [vpp-dev] LDP in comparison with OpenOnload

2022-04-06 Thread Florin Coras
Hi Kunal, 

Similar goal but it’s only been tested against a limited number of 
applications. 

Also, as per my previous reply, ldp accepts a mix of linux and vcl fd/sessions 
and to that end it sets aside a number of fds for linux. Consequently, vcl fds 
will start from 1 << LDP_ENV_SID_BIT and that might be a problem for 
applications that assume their fds start at 0 and end at a low value. That’s 
typically a problem with those that use select as opposed to epoll. 
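To make that concrete, here is a minimal C sketch of the fd split; the names 
and the default bit value are assumptions based on the description above, not 
VPP's actual code:

    #include <sys/select.h>

    /* With LDP, fds below 1 << sid_bit pass through to Linux; anything at or
     * above that is treated as a VCL session. */
    enum { SID_BIT = 5 };                  /* default, i.e., fds 0..31 for Linux */
    enum { VCL_FD_BASE = 1 << SID_BIT };   /* first VCL fd */

    static inline int
    fd_is_vcl_session (int fd)
    {
      return fd >= VCL_FD_BASE;
    }

    /* select only handles fds below FD_SETSIZE (typically 1024), which is why
     * apps that FD_SET large fd values break while epoll-based ones do not. */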

Regards,
Florin

> On Apr 6, 2022, at 1:20 PM, Kunal Parikh  wrote:
> 
> Hi,
> 
> I want to gauge if the plan for LDP is to be similar to OpenOnload 
> (https://github.com/Xilinx-CNS/onload)
> 
> We use OpenOnload with SolarFlare cards with great success.
> 
> It doesn't require us to change our code while getting the benefits of kernel 
> bypass (and hardware acceleration from SolarFlare cards).
> 
> Thanks,
> 
> Kunal 
> 
> 





Re: [vpp-dev] VCL & netperf #hoststack

2022-04-06 Thread Florin Coras
I’ve never tried netperf so unfortunately I don’t even know if it works. From 
the server logs, it looks like it hit some sort of error on accept. 

Note that we set aside the first 1 << LDP_ENV_SID_BIT (env variable) fds for 
linux. By default that value is 5, which is the minimum we accept, i.e., 32 
fds. That could be a problem for netperf, given that in the logs it's saying 
"setting 32 in fdset". 
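If that limit is what netperf is hitting, raising the reserved range is a quick 
experiment. A sketch, assuming the environment variable behind LDP_ENV_SID_BIT 
is named LDP_SID_BIT, and with $LDP_PATH/$VCL_CFG standing in for the ldp.so 
and vcl.conf paths as elsewhere in this archive:

    sudo sh -c "LDP_SID_BIT=7 LD_PRELOAD=$LDP_PATH VCL_CONFIG=$VCL_CFG netserver -D"

With a bit value of 7, fds 0..127 stay with Linux and VCL fds start at 128.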

Another option to check latency would be to use wrk/ab or similar tools with a 
web server that’s known to work with ldp, like nginx. 

Regards,
Florin

> On Apr 6, 2022, at 1:17 PM, Kunal Parikh  wrote:
> 
> I am using LD_PRELOAD
> 
> Is there a particular example of netperf flags you can recommend for 
> measuring per packet latency? 
> 
> 





Re: [vpp-dev] VCL & netperf #hoststack

2022-04-06 Thread Florin Coras
Hi Kunal, 

How are you attaching netperf to VCL/VPP? Unless you modify it to use VCL, your 
only option is to try LD_PRELOAD (see the iperf example here [1]). 

Note however that most probably LDP does not support all socket options netperf 
might want. 

Regards,
Florin

[1] https://wiki.fd.io/view/VPP/HostStack/LDP/iperf

> On Apr 6, 2022, at 12:50 PM, Kunal Parikh  wrote:
> 
> Hi Folks,
> 
> I want visualize the latency profile of VCL HostStack.
> 
> I am using netperf and am receiving this error on the server:
> 
> Issue receiving request on control connection. Errno 19 (No such device)
> 
> Detailed logs attached. 
> 
> 





Re: [vpp-dev] Feedback on a tool: vppcfg

2022-04-02 Thread Florin Coras
Hi Pim, 

Definitely cool! Haven’t had a chance to go through all of it but the fact that 
some binary api calls crash vpp is something we should fix. 

It feels like vppcfg could also be used for extensive vpp api/cli/cfg testing. 

My quick 0.02$

Regards,
Florin

> On Apr 2, 2022, at 8:17 AM, Pim van Pelt  wrote:
> 
> Hoi colleagues,
> 
> I know there exist several smaller and larger scale VPP configuration 
> harnesses out there, some more complex and feature complete than others. I 
> wanted to share my work on an approach based on a YAML configuration with 
> strict syntax and semantic validation, and a path planner that brings the 
> dataplane from any configuration state safely to any other configuration 
> state, as defined by these YAML files.
> 
> A bit of a storyline on the validator: 
> https://ipng.ch/s/articles/2022/03/27/vppcfg-1.html 
> 
> A bit of background on the DAG path planner: 
> https://ipng.ch/s/articles/2022/04/02/vppcfg-2.html 
> 
> Code with tests on https://github.com/pimvanpelt/vppcfg 
> 
> 
> The config and planner supports interfaces, bondethernets, vxlan tunnels, 
> l2xc, bridgedomains and, quelle surprise, linux-cp configurations of all 
> sorts. If anybody feels like giving it a spin, I'd certainly appreciate 
> feedback and if you can manage to create two configuration states that the 
> planner cannot reconcile, I'd love to hear about those too.
> 
> For now, the path planner works by reading the API configuration state 
> exactly once (at startup), and then it figures out the CLI calls to print 
> without needing to consult VPP again. This is super useful as it’s a 
> non-intrusive way to inspect the changes before applying them, and it’s a 
> property I’d like to carry forward. However, I don’t necessarily think that 
> emitting the CLI statements is the best user experience, it’s more for the 
> purposes of analysis that they can be useful. What I really want to do is 
> emit API calls after the plan is created and reviewed/approved, directly 
> reprogramming the VPP dataplane. However, the VPP API set needed to do this 
> is not 100% baked yet. For example, I observed crashes when tinkering with 
> BVIs and Loopbacks (see my thread from last week, thanks for the response 
> Neale), and fixed a few obvious errors in the Linux CP API (gerrit) but there 
> are still a few more issues to work through before I can set the next step 
> with vppcfg.
> 
> If this tool proves to be useful to others, I'm happy to upstream it to 
> extras/ somewhere.
> 
> -- 
> Pim van Pelt <p...@ipng.nl> 
> PBVP1-RIPE - http://www.ipng.nl/ 
> 
> 





Re: Private: Re: [vpp-dev] increase nginx performace using vpp host stack#nginx #vpp-hoststack

2022-03-31 Thread Florin Coras
As mentioned previously, that is not supported for connects currently.  

Regards,
Florin

> On Mar 31, 2022, at 9:44 PM, weizhen9...@163.com wrote:
> 
> Hi,
> I'll describe our scenario in detail. We use nginx with the vpp host stack as 
> a proxy, and we have added some features to nginx. For example, nginx closes 
> upstream tcp links actively, and this causes a lot of TIME_WAIT states on the 
> nginx proxy when we test the performance of nginx using the vpp host stack. 
> So we configure kernel parameters to multiplex tcp ports and recycle ports 
> quickly. But I can't find vpp host stack parameters similar to those kernel 
> parameters.
> How can we configure vpp parameters to reuse tcp ports and recycle ports 
> quickly?
> Thanks.  
> 
> 





Re: Private: Re: [vpp-dev] increase nginx performace using vpp host stack#nginx #vpp-hoststack

2022-03-31 Thread Florin Coras
Hi,

Given that 20k connections are being actively opened (those on the main thread) 
and 40k are established (those on the workers), it looks like tcp runs out of 
ports for connects. If possible, either increase the number of destination IPs 
for nginx or try "tcp src-address ip1-ip2" and pass in a range of consecutive 
ips that vpp can use as source ips.
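As a concrete sketch of that cli (the addresses are hypothetical), each 
additional source ip roughly adds another full source port range for connects:

    vpp# tcp src-address 192.168.10.2-192.168.10.17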

The two workers seem to be pretty lightly loaded since loops/s is over 2M, 
i.e., they're mostly spinning empty, without work. 

Regards,
Florin

> On Mar 31, 2022, at 8:21 PM, weizhen9...@163.com wrote:
> 
> Hi,
> 
> From this test results, can you see some errors? Why is the performance of 
> nginx using vpp low?
> 
> Thanks. 
> 
> 





Re: Private: Re: [vpp-dev] increase nginx performace using vpp host stack#nginx #vpp-hoststack

2022-03-31 Thread Florin Coras
TCP accepts connections in time-wait (see here [1]) but won't reuse the ports 
of connections in time-wait for connects. If you expect lots of active opens 
from nginx to only one destination ip and you have multiple source ips, you 
could try the "tcp src-address" cli. 

Regards,
Florin

[1] https://git.fd.io/vpp/tree/src/vnet/tcp/tcp_input.c#n2667

> On Mar 31, 2022, at 7:00 PM, weizhen9...@163.com wrote:
> 
> Hi, 
> 
> When we test the performance of nginx using vpp host stack, we execute the 
> following command.
> #show run 
> 
> 
> 
> 
> 
> #show session
> 
> 
> 
> Besides, we added some new features in nginx, and these features cause a lot 
> of time_wait states. In order to reuse connections in TIME_WAIT we set 
> tcp_tw_reuse and tcp_tw_recycle. Besides, to make the port range unrestricted, 
> we modify other parameters.
> So how can we modify the equivalent vpp host stack parameters?
> Thanks.
> 
> 
> 
> 





Re: Private: Re: [vpp-dev] increase nginx performace using vpp host stack#nginx #vpp-hoststack

2022-03-31 Thread Florin Coras
I didn’t mean you should switch to envoy, just that throughput is pretty low 
probably because of some configuration. What that configuration is is not 
obvious unfortunately. 

Regarding the kernel parameters, we have time wait reuse enabled (equivalent to 
tcp_tw_reuse) but that should not matter unless nginx establishes a new 
connection for each request. Does it, as part of the features you’ve added?

Could you do: 
- clear run; show run - in vpp when under load
- show session - to see the number of sessions in vpp 

Also, could you reduce the number of workers to maybe 1-2 in nginx and 1 in 
vpp, to get some base performance readings?

Regards,
Florin 

[1] https://wiki.fd.io/images/0/08/Using_vpp_as_envoys_network_stack.pdf 


> On Mar 31, 2022, at 8:56 AM, weizhen9...@163.com wrote:
> 
> Hi,
> 
> I'm doing proxying with nginx, and we have developed some new functions in 
> nginx. The performance of nginx on the kernel host stack is higher than that 
> of nginx using the vpp host stack. When testing nginx on the kernel host 
> stack, we modified kernel parameters. When using the vpp host stack, how can 
> we modify the equivalent parameters?
> 
> Besides, we don't use envoy. We developed some functions in nginx.
>> 
>> Hi, 
>> 
>> Spoke too soon. Noticed you’re doing proxying with nginx. 
>> 
>> What does clear run; show run report in vpp when under load?
>> 
>> Side note, with envoy I’m seeing much better numbers. See for instance slide 
>> 12 here [1]. So I suspect this is a configuration issue but can’t tell 
>> upfront what it is.
>> 
>> Regards,
>> Florin




Re: [vpp-dev] increase nginx performace using vpp host stack#nginx #vpp-hoststack

2022-03-31 Thread Florin Coras
Hi, 

Could you provide a bit more info about the numbers you are seeing and the type 
of test you are doing? Also some details about your configs and vpp version? 

As for tcp_tw_recycle, I believe that was removed in Linux 4.12 because it was 
causing issues for nat-ed connections. Did you mean tcp_tw_reuse? If yes, are 
you testing CPS with only one remote client IP? 

Regards, 
Florin

> On Mar 31, 2022, at 6:35 AM, weizhen9...@163.com wrote:
> 
> Now, we have developed some functions in nginx and tested nginx performance 
> on the kernel host stack and the vpp host stack. We found that the 
> performance of nginx on vpp is lower than that of nginx on the kernel.
> So I want to ask how to debug and increase the performance of nginx on vpp. 
> Besides, we modify some parameter in kernel. For example, in file 
> /etc/sysctl.conf, we add the following parameters:
> net.ipv4.tcp_max_tw_buckets = 3000
> net.ipv4.tcp_tw_reuse = 1
> net.ipv4.tcp_tw_recycle = 1
> In file /etc/security/limits.conf, we add the following parameters:
> * soft nofile 102400
> * hard nofile 102400
> So how can we add those parameters in vpp host stack? And how can we increase 
> the performance of nginx in vpp?
> Thanks.
> 
> 
> 





Re: [vpp-dev] VPP Iperf3 test

2022-03-30 Thread Florin Coras
Hi Kunal, 

Yes, it might be worth looking into ip and tcp csum offloads. 

But given that mtu ~ 9kB, maybe look into forcing tcp to build jumbo frames, 
i.e., tcp { mtu 9000 } in startup.conf. It'll be needed on both ends, and I'm 
assuming here that the network between your two vpp instances supports 9k mtu. 
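In startup.conf terms, that would be something like the following on both vpp 
instances (assuming the path between them really carries 9k frames):

    tcp {
      mtu 9000
    }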

Regards,
Florin 

> On Mar 30, 2022, at 12:46 PM, Kunal Parikh  wrote:
> 
> Hi Florin
> 
> Following is the output from
> vppctl show hardware-interfaces
>  
>   NameIdx   Link  Hardware
> local0 0down  local0
>   Link speed: unknown
>   local
> vpp0   1 up   vpp0
>   Link speed: unknown
>   RX Queues:
> queue thread mode
> 0 vpp_wk_0 (1)   polling
>   Ethernet address 0e:ca:6b:19:5b:95
>   AWS ENA VF
> carrier up full duplex max-frame-size 9026
> flags: admin-up maybe-multiseg rx-ip4-cksum
> Devargs:
> rx: queues 1 (max 8), desc 256 (min 128 max 2048 align 1)
> tx: queues 2 (max 8), desc 256 (min 128 max 1024 align 1)
> pci: device 1d0f:ec20 subsystem : address :00:06.00 numa 0
> max rx packet len: 9234
> promiscuous: unicast off all-multicast off
> vlan offload: strip off filter off qinq off
> rx offload avail:  ipv4-cksum udp-cksum tcp-cksum scatter
> rx offload active: ipv4-cksum scatter
> tx offload avail:  ipv4-cksum udp-cksum tcp-cksum multi-segs
> tx offload active: multi-segs
> rss avail: ipv4-tcp ipv4-udp ipv6-tcp ipv6-udp
> rss active:none
> tx burst function: (not available)
> rx burst function: (not available)
> 
> Should my goal be to move items in the "avail" list to the "active" list?
>  
> 
> 
> 





Re: [vpp-dev] VPP Iperf3 test

2022-03-29 Thread Florin Coras
Hi Kunal,

No problem. Actually, another thing to consider might be mtu. If the interfaces 
are configured with mtu > 1.5kB and the network accepts jumbo frames, maybe try 
tcp { mtu <value> } in startup.conf. 

Regards, 
Florin 

> On Mar 29, 2022, at 12:24 PM, Kunal Parikh  wrote:
> 
> Many thanks for looking into this Florin.
> 
> I'll investigate DPDK PMD tests to see if checksum offloading can be enabled 
> outside of VPP. 
> 
> 





Re: [vpp-dev] VPP Iperf3 test

2022-03-29 Thread Florin Coras
Yup, similar symptoms. 

So beyond trying to figure out why checksum offloading is not working and 
trying to combine that with gso, i.e., tcp { tso } in startup.conf, not sure 
what else could be done. 

If you decide to try debugging checksum offloading, try adding 
enable-tcp-udp-checksum to dpdk stanza, as opposed to no-tx-checksum-offload. 
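Put together, the suggestion above would look roughly like this in startup.conf 
(a sketch, not a verified config):

    dpdk {
      enable-tcp-udp-checksum
    }
    tcp {
      tso
    }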

Regards, 
Florin

> On Mar 29, 2022, at 11:36 AM, Kunal Parikh  wrote:
> 
> Diagnostics produced using -b 5g
> taskset --cpu-list 10-15 iperf3 -4 -c 10.21.120.133 -b 5g -t 30
> 
> 
> 





Re: [vpp-dev] VPP Iperf3 test

2022-03-29 Thread Florin Coras
Actually this time the client worker loops/s has dropped to 7k. So that worker 
seems to be struggling, probably because of the interface tx cost. 

Not sure how that could be solved as it looks like an ena + dpdk tx issue. Out 
of curiosity, if you try to limit the iperf client bw by doing something like 
"-b 5g", does it change anything? 

Regards,
Florin

> On Mar 29, 2022, at 10:21 AM, Kunal Parikh  wrote:
> 
> Attaching diagnostics. 
> 
> 





Re: [vpp-dev] VPP Iperf3 test

2022-03-29 Thread Florin Coras
Hi Kunal, 

I remember Shankar needed tcp { no-csum-offload } in startup.conf but I see you 
disabled tx-checksum-offload for dpdk. So could you try disabling it from tcp?

The fact that csum offloading is not working is probably going to somewhat 
affect throughput but I wouldn’t expect it to be that much. 

Regards,
Florin

> On Mar 29, 2022, at 5:51 AM, Kunal Parikh  wrote:
> 
> Thanks Florin.
> 
> I've attached output from the console of the iperf3 server and client.
> 
> I don't know what I should be looking for.
> 
> Can you please provide some pointers?
> 
> Many thanks,
> 
> Kunal 
> 
> 





Re: [vpp-dev] VPP Iperf3 test

2022-03-29 Thread Florin Coras
Hi Kunal, 

First of all, that's a lot of workers. For this test, could you just reduce the 
number to 1? All of them, including worker 0, are spinning empty on both server 
and client, i.e., loops/s > 1M. 

Beyond that, the only thing I’m noticing is that the client is very bursty, 
i.e., sends up to 42 packets / dispatch but the receiver only gets 4. There are 
no drops so it looks like the network is struggling to buffer and deliver the 
packets instead of dropping, which might actually help in this case. 

How many rx/tx descriptors have you configured for your nics? You can check 
with "show hardware". Could you make sure they're not more than 256 in 
startup.conf?
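As a sketch, using the ENA device address from the lspci output earlier in this 
thread (assumed to be 0000:00:06.0):

    dpdk {
      dev 0000:00:06.0 {
        num-rx-desc 256
        num-tx-desc 256
      }
    }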

Regards, 
Florin

> On Mar 29, 2022, at 5:51 AM, Kunal Parikh  wrote:
> 
> Thanks Florin.
> 
> I've attached output from the console of the iperf3 server and client.
> 
> I don't know what I should be looking for.
> 
> Can you please provide some pointers?
> 
> Many thanks,
> 
> Kunal 
> 
> 





Re: [vpp-dev] VPP Iperf3 test

2022-03-28 Thread Florin Coras
Hi Kunal, 

Unfortunately, the screenshots are unreadable for me. 

But if the throughput did not improve, maybe try:

clear run
show run

And check loop/s and vector/dispatch. And a 

show session verbose 2

And let’s see what the connection reports in terms of errors, cwnd and so on. 

Regards, 
Florin

> On Mar 28, 2022, at 1:35 PM, Kunal Parikh  wrote:
> 
> Also, I do believe that write combining is enabled based on:
> 
> 
> $ lspci -v -s 00:06.0
> 00:06.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
>  Physical Slot: 6
>  Flags: bus master, fast devsel, latency 0
>  Memory at febf8000 (32-bit, non-prefetchable) [size=16K]
>  Memory at fe90 (32-bit, prefetchable) [size=1M]
>  Memory at febe (32-bit, non-prefetchable) [size=64K]
>  Capabilities: [70] Express Endpoint, MSI 00
>  Capabilities: [b0] MSI-X: Enable+ Count=9 Masked-
>  Kernel driver in use: vfio-pci
>  Kernel modules: ena
>  
> root@ip-10-21-120-175:~# cat /sys/kernel/debug/x86/pat_memtype_list | grep fe90
> PAT: [mem 0xfe80-0xfe90] write-combining
> PAT: [mem 0xfe90-0xfea0] uncached-minus
> PAT: [mem 0xfe90-0xfea0] uncached-minus
> 
> 
> 
> 
> 





Re: [vpp-dev] #vpp-hoststack

2022-03-24 Thread Florin Coras
Hi, 

It does not. For such scenarios it’s probably better to use something like 
memif. 

Regards, 
Florin

> On Mar 24, 2022, at 2:42 AM, 25956760...@gmail.com wrote:
> 
> Does anybody know if VCL supports raw sockets? Thanks. 
> 
> 





Re: [vpp-dev] VPP Iperf3 test

2022-03-23 Thread Florin Coras
Hi Shankar, 

That’s a pretty old release. Could you try something newer, like 22.02? 

Nonetheless, you’ll probably need to try some of those optimizations.

Regards, 
Florin

> On Mar 23, 2022, at 11:47 AM, Shankar Raju  wrote:
> 
> Hi Florin,
> I'm using VPP Version: 20.09-release. These were the results I got with the 
> default config. Let me try some of those optimizations and see if that works. 
> Thanks 
> 
> WITH VPP :
> Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 
> seconds, 10 second test, tos 0
> [ ID] Interval   Transfer Bitrate Retr  Cwnd
> [ 33]   0.00-1.00   sec   415 MBytes  3.48 Gbits/sec0   0.00 Bytes
> [ 33]   1.00-2.00   sec   425 MBytes  3.56 Gbits/sec0   0.00 Bytes
> [ 33]   2.00-3.00   sec   426 MBytes  3.57 Gbits/sec0   0.00 Bytes
> [ 33]   3.00-4.00   sec   423 MBytes  3.54 Gbits/sec0   0.00 Bytes
> [ 33]   4.00-5.00   sec   417 MBytes  3.50 Gbits/sec0   0.00 Bytes
> [ 33]   5.00-6.00   sec   417 MBytes  3.50 Gbits/sec0   0.00 Bytes
> [ 33]   6.00-7.00   sec   418 MBytes  3.51 Gbits/sec0   0.00 Bytes
> [ 33]   7.00-8.00   sec   422 MBytes  3.54 Gbits/sec0   0.00 Bytes
> [ 33]   8.00-9.00   sec   418 MBytes  3.50 Gbits/sec0   0.00 Bytes
> [ 33]   9.00-10.00  sec   422 MBytes  3.54 Gbits/sec0   0.00 Bytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> Test Complete. Summary Results:
> [ ID] Interval   Transfer Bitrate Retr
> [ 33]   0.00-10.00  sec  4.10 GBytes  3.53 Gbits/sec0 sender
> [ 33]   0.00-10.00  sec  4.10 GBytes  3.52 Gbits/sec  receiver
> 
> WITHOUT VPP:
> 
> Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 
> seconds, 10 second test, tos 0
> [ ID] Interval   Transfer Bitrate Retr  Cwnd
> [  5]   0.00-1.00   sec  1.11 GBytes  9.56 Gbits/sec0   1.95 MBytes
> [  5]   1.00-2.00   sec  1.11 GBytes  9.53 Gbits/sec0   1.95 MBytes
> [  5]   2.00-3.00   sec  1.11 GBytes  9.53 Gbits/sec0   2.16 MBytes
> [  5]   3.00-4.00   sec  1.11 GBytes  9.53 Gbits/sec0   2.27 MBytes
> [  5]   4.00-5.00   sec  1.11 GBytes  9.53 Gbits/sec0   2.27 MBytes
> [  5]   5.00-6.00   sec  1.11 GBytes  9.53 Gbits/sec0   2.27 MBytes
> [  5]   6.00-7.00   sec  1.11 GBytes  9.53 Gbits/sec0   2.27 MBytes
> [  5]   7.00-8.00   sec  1.11 GBytes  9.53 Gbits/sec0   2.50 MBytes
> [  5]   8.00-9.00   sec  1.11 GBytes  9.53 Gbits/sec0   2.50 MBytes
> [  5]   9.00-10.00  sec  1.11 GBytes  9.53 Gbits/sec0   2.50 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> Test Complete. Summary Results:
> [ ID] Interval   Transfer Bitrate Retr
> [  5]   0.00-10.00  sec  11.1 GBytes  9.53 Gbits/sec0 sender
> [  5]   0.00-10.00  sec  11.1 GBytes  9.53 Gbits/sec  receiver
> 
> 
> 





Re: [vpp-dev] VPP Iperf3 test

2022-03-23 Thread Florin Coras
Hi Shankar, 

What is the result and what is the difference? Also, I might’ve missed it but 
what was the vpp version in these tests? 

Regarding optimizations:
- show hardware: will tell you the numa for your nic (if you have multiple 
numas) and the rx/tx descriptor ring sizes. Typically for tcp it’s preferable 
to use a smaller number of descriptors, say 256. These are configurable under 
dpdk stanza per nic
- show run: check loops/s and vector rates per nodes. If loops/s < 10k or 
vector rate for any node > 100, results should be further inspected. 
- show error: if tcp reports lots of out of order enqueues or lots of dupacks, 
that’s a sign that something is dropping. Might be the nic if you get interface 
tx errors. 
- cpu pinning: make sure your vpp worker runs on the same numa as your nic (if 
need be). Pin iperf to same numa as vpp's worker but make sure they don’t use 
the same cpu.
- fifos: you're now using 800kB fifos. Maybe raise those to 4MB (see the 
vcl.conf sketch after this list), albeit for low latency that shouldn't matter 
much. 
- ena specific: check if write combining is enabled 
https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk#622-vfio-pci 
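For the fifos item, a minimal vcl.conf sketch with 4MB fifos (the other vcl 
settings stay as in the original config):

    vcl {
      rx-fifo-size 4000000
      tx-fifo-size 4000000
    }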

Regards,
Florin

> On Mar 23, 2022, at 9:47 AM, Shankar Raju  wrote:
> 
> Hi Florin,
> 
> Disabling checksums worked. Now iperf is able to send and receive traffic. 
> But the transfer rate and bitrate seems to be smaller when using VPP. Could 
> you please let me know the right tuning params for getting better performance 
> with VPP ?
> 
> Thanks  
> 
> 





Re: [vpp-dev] VPP Iperf3 test

2022-03-23 Thread Florin Coras
Hi Shankar, 

In startup.conf under tcp stanza, add no-csum-offload. 
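That is, in startup.conf:

    tcp {
      no-csum-offload
    }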

Regards, 
Florin

> On Mar 23, 2022, at 6:59 AM, Shankar Raju  wrote:
> 
> Hi Florin,
> 
> I'm running this experiment on AWS and its using ENA NICs. I ran vppctl show 
> error command and I did see errors because of bad checksums. Is there a way 
> to turn off tx and rx checksuming through vpp just like we do with ethtool ?
> 
> SERVER SIDE:
> vppctl show errors
>    Count    Node                Reason                         Severity
>        1    dpdk-input          no error                       error
>        4    arp-reply           ARP replies sent               error
>       17    session-queue       Packets transmitted            info
>        2    tcp4-listen         SYNs received                  info
>        4    tcp4-rcv-process    Pure ACKs received             info
>        2    tcp4-rcv-process    FINs received                  info
>        6    tcp4-established    Packets pushed into rx fifo    info
>        6    tcp4-established    Pure ACKs received             info
>       19    tcp4-output         Packets sent                   info
>        1    ip4-glean           ARP requests sent              error
>        9    ip4-local           bad tcp checksum               error
> 
> 
> 





Re: [vpp-dev] VPP Iperf3 test

2022-03-22 Thread Florin Coras
Hi Shankar, 

What vpp version is this? For optimizations, could you take a look at a recent 
version of [1]?

Having said that, let's try debugging this in small steps. First, I'd recommend 
not exporting LD_PRELOAD and instead doing something like:

sudo sh -c "LD_PRELOAD=$LDP_PATH VCL_CONFIG=$VCL_CFG iperf3 -4 -s"
sudo sh -c "LD_PRELOAD=$LDP_PATH VCL_CONFIG=$VCL_CFG iperf3 -c 10.21.120.181"

I’m assuming here that you’re running 2 vpps on two different hosts. Also, no 
need for -b on client side, tcp tests scale up by default. 

With respect to your startup.conf, there's no need to configure the default 
buffer size, and buffers can be kept at 16384. With recent vpp versions, 
evt_qs_memfd_seg is no longer needed under the session stanza. 

If results don't change, run "show session verbose 2" on both client and server 
side vpp. 

Regards,
Florin

[1] https://wiki.fd.io/view/VPP/HostStack/LDP/iperf

> On Mar 22, 2022, at 1:39 PM, Shankar Raju  wrote:
> 
> Hi Guys,
>  
> I’m trying to test VPP configuration with iperf3 and I running into the 
> following issue. 
> · Iperf3 client is able to make a connection with the server, but the 
> client receives data for the first 0-1 second and then it does not receive 
> any traffic from the server. The bitrate is zero after that.
>  
> Here’s my startup.conf and vcl.conf
>  
> STARTUP.CONF
>  
> unix {
>   nodaemon
>   log /var/log/vpp/vpp.log
>   full-coredump
>   cli-listen /run/vpp/cli.sock
>   gid vpp
>  
>   ## run vpp in the interactive mode
>   # interactive
>  
>   ## do not use colors in terminal output
>   # nocolor
>  
>   ## do not display banner
>   # nobanner
> }
>  
>  
> api-segment {
>   gid vpp
> }
>  
> socksvr {
>   default
> }
>  
> dpdk {
>   uio-driver vfio-pci
>   dev :00:06.0 {
> name eth1
>   }
> }
>  
> buffers {
>   buffers-per-numa 65536
>   default data-size 2048
> }
>  
> cpu {
>   main-core 0
>   workers 1
> }
>  
> session {
>   evt_qs_memfd_seg
> }
>  
> VCL.CONF
> vcl {
>   rx-fifo-size 80
>   tx-fifo-size 80
>   app-scope-local
>   app-scope-global
>   api-socket-name /var/run/vpp/api.sock
> }
>  
> Steps I followed:
>  
> · Set IP address through vppctl
> Iperf3 server command:
> · export LD_PRELOAD=$LDP_PATH && export VCL_CONFIG=$VCL_CFG
> · iperf3 -4 -s -A 1 -V
>  
> iperf3 client command:
> · export LD_PRELOAD=$LDP_PATH && export VCL_CONFIG=$VCL_CFG
> · iperf3 -c 10.21.120.181 -u -A 1 -V -b 0
>  
> Client side output:
>  
> iperf3 -c 10.21.120.181 -A 1 -V
> iperf 3.7
> Linux ip-10-21-120-48 5.13.0-1019-aws #21~20.04.1-Ubuntu SMP Wed Mar 16 
> 11:54:08 UTC 2022 x86_64
> Control connection MSS 1448
> Time: Tue, 22 Mar 2022 19:38:53 GMT
> Connecting to host 10.21.120.181, port 5201
>   Cookie: 3crb2g5fkqv6atlhbe7z6kr4r5kcgjj2tfeb
>   TCP MSS: 1448 (default)
> [ 33] local 10.21.120.128 port 35336 connected to 10.21.120.181 port 5201
> Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 
> seconds, 10 second test, tos 0
> [ ID] Interval   Transfer Bitrate Retr  Cwnd
> [ 33]   0.00-1.00   sec   256 KBytes  2.10 Mbits/sec0   0.00 Bytes
> [ 33]   1.00-2.01   sec  0.00 Bytes  0.00 bits/sec0   0.00 Bytes
> [ 33]   2.01-3.00   sec  0.00 Bytes  0.00 bits/sec0   0.00 Bytes
> [ 33]   3.00-4.01   sec  0.00 Bytes  0.00 bits/sec0   0.00 Bytes
> [ 33]   4.01-5.00   sec  0.00 Bytes  0.00 bits/sec0   0.00 Bytes
>  
> Server side output:
>  
> iperf3 -4 -s -A 1 -V
> iperf 3.7
> Linux ip-10-21-120-83 5.13.0-1019-aws #21~20.04.1-Ubuntu SMP Wed Mar 16 
> 11:54:08 UTC 2022 x86_64
> ---
> Server listening on 5201
> ---
> Time: Tue, 22 Mar 2022 19:38:53 GMT
> Accepted connection from 10.21.120.128, port 2061
>   Cookie: 3crb2g5fkqv6atlhbe7z6kr4r5kcgjj2tfeb
>   TCP MSS: 0 (default)
> [ 34] local 10.21.120.181 port 5201 connected to 10.21.120.128 port 35336
> Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 
> seconds, 10 second test, tos 0
> [ ID] Interval   Transfer Bitrate
> [ 34]   0.00-1.00   sec  0.00 Bytes  0.00 bits/sec
> [ 34]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec
> [ 34]   2.00-3.01   sec  0.00 Bytes  0.00 bits/sec
> [ 34]   3.01-4.01   sec  0.00 Bytes  0.00 bits/sec
>  
> Please let me know if I’m configuring something wrong.
>  
> Thanks
>  
> Shankar Raju
>  
>  
> 
> 





Re: [vpp-dev] #vpp-hoststack

2022-03-21 Thread Florin Coras
Only cubic and newreno. 
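For reference, the algorithm can be picked in the tcp stanza of startup.conf; 
if memory serves, cubic is the default in recent releases:

    tcp {
      cc-algo newreno
    }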

Regards, 
Florin

> On Mar 21, 2022, at 9:15 PM, 25956760...@gmail.com wrote:
> 
> Thank you for your answer. So is only Cubic supported now?
> 
> 
> 





Re: [vpp-dev] #vpp-hoststack

2022-03-21 Thread Florin Coras
Hi, 

TCP BBR is on our todo list but it’s not currently supported.

Regards,
Florin

> On Mar 21, 2022, at 10:40 AM, 25956760...@gmail.com wrote:
> 
> Does anyone know if the vpp hoststack supports the tcp BBR congestion 
> algorithm? I need to use it, thanks   
>  
> 
> 
> 





Re: [vpp-dev] VPP 21.06 - seeing TCP packet drops

2022-03-20 Thread Florin Coras
Hi Vijay, 

Glad it helped!

The patch does not cleanly cherry-pick through gerrit and 21.06 is pretty old 
at this point, i.e., I'm not sure if we're planning any maintenance release for 
it. Unfortunately that means your only options are either to switch to 22.02, 
where I already cherry-picked the patch, or to maintain the fix in a private 
branch.

Regards,
Florin

> On Mar 20, 2022, at 4:27 AM, Vijay Kumar  wrote:
> 
> Hi Florin,
> 
> Thanks for the wonderful suggestion.
> 
> Upon adding the below option, the tcp listen works fine. The syn-ack is sent 
> from vpp to the peer and the TCP handshake completes successfully.
> 
> a->options[APP_OPTIONS_ADD_SEGMENT_SIZE] = segment_size;
> 
> As you mentioned, earlier there was only one listener, but now we have added 
> two more ALG applications that are also listening on TCP. Hence this 
> option becomes mandatory for me.
> 
> Florin, can you please commit the code changes that you shared in this patch? 
> The 21.06 code was buggy in that it was not reporting any error in the 
> tcp46_listen_inline() node.
> 
> https://gerrit.fd.io/r/c/vpp/+/35654
> 
> 
> Thanks.
> 
> 
> On Thu, Mar 17, 2022 at 12:07 AM Florin Coras <fcoras.li...@gmail.com> wrote:
> Hi Vijay, 
> 
> Yes, APP_OPTIONS_ADD_SEGMENT_SIZE will be needed if any listeners are used. 
> It was there before but was not needed if only one listener was configured.
> 
> Regards, 
> Florin
> 
>> On Mar 16, 2022, at 11:15 AM, Vijay Kumar <vjkumar2...@gmail.com> wrote:
>> 
>> Hi Florin,
>> 
>> My application code has not changed b/w 20.05 and 21.06. The below is the 
>> code snippet of my application that binds the TCP  listen IP/Port
>> The option that you mentioned "APP_OPTIONS_ADD_SEGMENT_SIZE" is not set in 
>> my application code but "APP_OPTIONS_SEGMENT_SIZE" is set.
>> 
>> My application code pasted below worked fine in VPP 20.05 but not in 21.06
>> Is the missing option "APP_OPTIONS_ADD_SEGMENT_SIZE" important to be set in 
>> 21.06 VPP?
>> 
>> 
>> static int nas_server_attach (u32 *nasAppIndex)
>> {
>>   an_ppe_nas_main_t *pm = &an_ppe_nas_main;
>>   u64 options[APP_OPTIONS_N_OPTIONS];
>>   vnet_app_attach_args_t _a, *a = &_a;
>>   u32 segment_size = 512 << 20;
>> 
>>   clib_memset (a, 0, sizeof (*a));
>>   clib_memset (options, 0, sizeof (options));
>> 
>>   if (pm->private_segment_size)
>>     segment_size = pm->private_segment_size;
>>   a->name = format (0, "nas-tcp-server");
>>   a->api_client_index = APP_INVALID_INDEX;
>>   a->session_cb_vft = &nas_server_session_cb_vft;
>>   a->options = options;
>>   a->options[APP_OPTIONS_SEGMENT_SIZE] = segment_size;
>>   a->options[APP_OPTIONS_RX_FIFO_SIZE] = pm->fifo_size;
>>   a->options[APP_OPTIONS_TX_FIFO_SIZE] = pm->fifo_size;
>>   a->options[APP_OPTIONS_PRIVATE_SEGMENT_COUNT] = pm->private_segment_count;
>>   a->options[APP_OPTIONS_PREALLOC_FIFO_PAIRS] =
>>     pm->prealloc_fifos ? pm->prealloc_fifos : 0;
>> 
>>   a->options[APP_OPTIONS_FLAGS] = APP_OPTIONS_FLAGS_IS_BUILTIN;
>> 
>>   if (vnet_application_attach (a))
>>     {
>>       NAS_DBG ("Failed to attach");
>>       return -1;
>>     }
>>   *nasAppIndex = a->app_index;
>>   pm->app_index = a->app_index;
>> 
>>   vec_free (a->name);
>>   return 0;
>> }
>> 
>> 
>> 
>> Regards,
>> Vijay
>> 
>> On Wed, Mar 16, 2022 at 10:19 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>> Hi Vijay, 
>> 
>> That’s a sign that either fifo allocations failed (not enough memory in the 
>> fifo segment) or that the app refused the session (app_worker_accept_notify 
>> returns non zero). 
>> 
>> Here’s a guess, your app does not set APP_OPTIONS_ADD_SEGMENT_SIZE in the 
>> attach options passed to vnet_application_attach. Some vpp versions ago we 
>> switched to using the first fifo segment as connects segment and all 
>> listeners allocate their first segments based on size provided with this 
>> option. If not provided, listeners fail to allocate segments. 
>> 
>> Regards,
>> Florin
>> 
>>> On Mar 16, 2022, at 3:49 AM, Vijay Kumar <vjkumar2...@gmail.com> wrote:
>>> 
>>> Hi Florin,
>>> 
>>> The patch helped me to find the exact point of failure in 
>>> tcp46_listen_inline() graph node.

Re: [vpp-dev] vlib_plugin_registration size mismatch

2022-03-16 Thread Florin Coras
No problems, glad it’s solved :-)

Cheers,
Florin

> On Mar 16, 2022, at 10:36 AM, Matthew Smith  wrote:
> 
> Hi Florin!
> 
> Apparently all those plugins are in vpp-plugin-devtools and I had an old 
> build of that package installed. Oops!
> 
> Thanks,
> -Matt
> 
> 
> On Wed, Mar 16, 2022 at 12:00 PM Florin Coras <fcoras.li...@gmail.com> wrote:
> Hi Matt, 
> 
> Just tried running after a make build and no such message in show log for me. 
> Did you try a make wipe? 
> 
> Regards,
> Florin
> 
> > On Mar 16, 2022, at 9:50 AM, Matthew Smith via lists.fd.io wrote:
> > 
> > Hi,
> > 
> > I have been testing against a build from yesterday's master branch (commit 
> > id b0f0f8c8dd9d694bfc13652f89b8b577e9c1c708) and have been seeing these 
> > messages when VPP starts up:
> > 
> > vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> > bufmon_plugin.so
> > vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> > dispatch_trace_plugin.so
> > vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> > oddbuf_plugin.so
> > vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> > perfmon_plugin.so
> > vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> > tracedump_plugin.so
> > vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> > unittest_plugin.so
> > 
> > I noticed the problem because I was trying to do some testing on the 
> > tracedump plugin and I could not get the tests to work. But many other 
> > plugins (vrrp, nat, acl) work fine and do not have similar messages logged.
> > 
> > Does anyone have any ideas on what may cause this? Is anyone else seeing 
> > this issue on a build from the current master branch?
> > 
> > Thanks,
> > -Matt
> > 
> > 
> > 
> > 
> 





Re: [vpp-dev] VPP 21.06 - seeing TCP packet drops

2022-03-16 Thread Florin Coras
Hi Vijay, 

Yes, APP_OPTIONS_ADD_SEGMENT_SIZE will be needed if any listeners are used. It 
was there before but was not needed if only one listener was configured.

Regards, 
Florin

> On Mar 16, 2022, at 11:15 AM, Vijay Kumar  wrote:
> 
> Hi Florin,
> 
> My application code has not changed b/w 20.05 and 21.06. The below is the 
> code snippet of my application that binds the TCP  listen IP/Port
> The option that you mentioned "APP_OPTIONS_ADD_SEGMENT_SIZE" is not set in my 
> application code but "APP_OPTIONS_SEGMENT_SIZE" is set.
> 
> My application code pasted below worked fine in VPP 20.05 but not in 21.06
> Is the missing option "APP_OPTIONS_ADD_SEGMENT_SIZE" important to be set in 
> 21.06 VPP?
> 
> 
> static int nas_server_attach (u32 *nasAppIndex)
> {
>   an_ppe_nas_main_t *pm = &an_ppe_nas_main;
>   u64 options[APP_OPTIONS_N_OPTIONS];
>   vnet_app_attach_args_t _a, *a = &_a;
>   u32 segment_size = 512 << 20;
> 
>   clib_memset (a, 0, sizeof (*a));
>   clib_memset (options, 0, sizeof (options));
> 
>   if (pm->private_segment_size)
>     segment_size = pm->private_segment_size;
>   a->name = format (0, "nas-tcp-server");
>   a->api_client_index = APP_INVALID_INDEX;
>   a->session_cb_vft = &nas_server_session_cb_vft;
>   a->options = options;
>   a->options[APP_OPTIONS_SEGMENT_SIZE] = segment_size;
>   a->options[APP_OPTIONS_RX_FIFO_SIZE] = pm->fifo_size;
>   a->options[APP_OPTIONS_TX_FIFO_SIZE] = pm->fifo_size;
>   a->options[APP_OPTIONS_PRIVATE_SEGMENT_COUNT] = pm->private_segment_count;
>   a->options[APP_OPTIONS_PREALLOC_FIFO_PAIRS] =
>     pm->prealloc_fifos ? pm->prealloc_fifos : 0;
> 
>   a->options[APP_OPTIONS_FLAGS] = APP_OPTIONS_FLAGS_IS_BUILTIN;
> 
>   if (vnet_application_attach (a))
>     {
>       NAS_DBG ("Failed to attach");
>       return -1;
>     }
>   *nasAppIndex = a->app_index;
>   pm->app_index = a->app_index;
> 
>   vec_free (a->name);
>   return 0;
> }
> 
> 
> 
> Regards,
> Vijay
> 
> On Wed, Mar 16, 2022 at 10:19 PM Florin Coras <fcoras.li...@gmail.com> wrote:
> Hi Vijay, 
> 
> That’s a sign that either fifo allocations failed (not enough memory in the 
> fifo segment) or that the app refused the session (app_worker_accept_notify 
> returns non zero). 
> 
> Here’s a guess, your app does not set APP_OPTIONS_ADD_SEGMENT_SIZE in the 
> attach options passed to vnet_application_attach. Some vpp versions ago we 
> switched to using the first fifo segment as connects segment and all 
> listeners allocate their first segments based on size provided with this 
> option. If not provided, listeners fail to allocate segments. 
> 
> Regards,
> Florin
> 
>> On Mar 16, 2022, at 3:49 AM, Vijay Kumar <vjkumar2...@gmail.com> wrote:
>> 
>> Hi Florin,
>> 
>> The patch helped me to find the exact point of failure in 
>> tcp46_listen_inline() graph node.
>> When I tested again, I get the below error and it maps to this error code 
>> "TCP_ERROR_CREATE_SESSION_FAIL"
>> This counter is incremented when the call to function 
>> session_stream_accept() returns failure.
>> 
>> Is there any potential reason why allocation fails at this place?
>> 
>> 
>> show errors output
>> ==================
>>    Count    Node                Reason                              Severity
>>        4    arp-reply           ARP replies sent                    error
>>       14    ip4-udp-lookup      No error                            error
>>        5    tcp4-listen         Sessions couldn't be allocated      error
>>        5    esp4-decrypt-tun    ESP pkts received                   error
>>        5    ipsec4-tun-input    good packets received               error
>>        1    ip4-input           ip4 ttl <= 1                        error
>>        1    ip4-icmp-error      hop limit exceeded response sent    error
>>     3346    ethernet-input      unknown vlan                        error
>> 
>> 
>> On Wed, Mar 16, 2022 at 1:31 PM Vijay Kumar <vjkumar2...@gmail.com> wrote:
>> Hi Florin,
>> 
>> Thanks for the clarification about the TCP changes b/w the 2 releases
>> 
>> I will use your patch, hopefully I will catch the issue about where the 
>> drops are.

Re: [vpp-dev] vlib_plugin_registration size mismatch

2022-03-16 Thread Florin Coras
Hi Matt, 

Just tried running after a make build and no such message in show log for me. 
Did you try a make wipe? 

Regards,
Florin

> On Mar 16, 2022, at 9:50 AM, Matthew Smith via lists.fd.io 
>  wrote:
> 
> Hi,
> 
> I have been testing against a build from yesterday's master branch (commit id 
> b0f0f8c8dd9d694bfc13652f89b8b577e9c1c708) and have been seeing these messages 
> when VPP starts up:
> 
> vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> bufmon_plugin.so
> vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> dispatch_trace_plugin.so
> vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> oddbuf_plugin.so
> vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> perfmon_plugin.so
> vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> tracedump_plugin.so
> vpp: plugin/load: vlib_plugin_registration size mismatch in plugin 
> unittest_plugin.so
> 
> I noticed the problem because I was trying to do some testing on the 
> tracedump plugin and I could not get the tests to work. But many other 
> plugins (vrrp, nat, acl) work fine and do not have similar messages logged.
> 
> Does anyone have any ideas on what may cause this? Is anyone else seeing this 
> issue on a build from the current master branch?
> 
> Thanks,
> -Matt
> 
> 
> 
> 





Re: [vpp-dev] VPP 21.06 - seeing TCP packet drops

2022-03-16 Thread Florin Coras
Hi Vijay, 

That’s a sign that either fifo allocations failed (not enough memory in the 
fifo segment) or that the app refused the session (app_worker_accept_notify 
returns non zero). 

Here’s a guess, your app does not set APP_OPTIONS_ADD_SEGMENT_SIZE in the 
attach options passed to vnet_application_attach. Some vpp versions ago we 
switched to using the first fifo segment as connects segment and all listeners 
allocate their first segments based on size provided with this option. If not 
provided, listeners fail to allocate segments. 
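A minimal sketch of the corresponding fix at attach time, mirroring the attach 
code quoted elsewhere in this thread (the size to use is the app's choice):

    a->options[APP_OPTIONS_SEGMENT_SIZE] = segment_size;
    /* also provide an add-segment size so listeners can allocate their
     * first fifo segment */
    a->options[APP_OPTIONS_ADD_SEGMENT_SIZE] = segment_size;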

Regards,
Florin

> On Mar 16, 2022, at 3:49 AM, Vijay Kumar  wrote:
> 
> Hi Florin,
> 
> The patch helped me to find the exact point of failure in 
> tcp46_listen_inline() graph node.
> When I tested again, I get the below error and it maps to this error code 
> "TCP_ERROR_CREATE_SESSION_FAIL"
> This counter is incremented when the call to function session_stream_accept() 
> returns failure.
> 
> Is there any potential reason why allocation fails at this place?
> 
> 
> show errors output
> ==================
>    Count    Node                Reason                              Severity
>        4    arp-reply           ARP replies sent                    error
>       14    ip4-udp-lookup      No error                            error
>        5    tcp4-listen         Sessions couldn't be allocated      error
>        5    esp4-decrypt-tun    ESP pkts received                   error
>        5    ipsec4-tun-input    good packets received               error
>        1    ip4-input           ip4 ttl <= 1                        error
>        1    ip4-icmp-error      hop limit exceeded response sent    error
>     3346    ethernet-input      unknown vlan                        error
> 
> 
> On Wed, Mar 16, 2022 at 1:31 PM Vijay Kumar <vjkumar2...@gmail.com> wrote:
> Hi Florin,
> 
> Thanks for the clarification about the TCP changes b/w the 2 releases
> 
> I will use your patch, hopefully I will catch the issue about where the drops 
> are. I will try to debug further.
> I will revert back if required.
> 
> 
> Regards.
> 
> On Wed, Mar 16, 2022 at 11:12 AM Florin Coras <fcoras.li...@gmail.com> wrote:
> Hi Vijay, 
> 
>> On Mar 15, 2022, at 9:58 PM, Vijay Kumar <vjkumar2...@gmail.com> wrote:
>> 
>> Hi florin,
>> 
>> Thanks a lot for helping me out. I will try your patch and update you with 
>> the result.
> 
> Thanks! 
> 
>> 
>> 
>> A general observation
>> ==
>> In my setup, I think the SYN pkt is dropped much before the SYNS_RCVD 
>> counter is incremented.
> 
> That’s what I’d expect as well. 
> 
>> 
>> I have seen a lot of changes b/w 20.05 and 21.06 code, like in the graph 
>> node tcp46_listen_inline() of VPP 20.05 the SYN  pkt trace function is 
>> called towards the end i.e. after calling tcp_send_synack() but in 21.06 
>> code, the tcp46_listen_trace_frame() is called at the very beginning of 
>> tcp46_listen_inline() graph node.
>> 
>> I think moving the pkt trace macro to the end of the function is good 
>> (placing it close to the line that calls tcp_send_synack), otherwise it can 
>> mislead us and will not help debugging.
>> 
> 
> Most of the changes were code refactoring/cleanup and buffer handling 
> optimizations, not protocol related. TCP tracing, at least as it is now, 
> doesn’t provide info about the errors hit, instead it reports the connections 
> hit and the reporting of errors is done through node and session counters 
> (see show session verbose 2). Obviously that didn’t work properly for the 
> listen node. 
> 
> Regards,
> Florin
> 
>> 
>> 
>> On Wed, Mar 16, 2022 at 10:22 AM Florin Coras <fcoras.li...@gmail.com> wrote:
>> Hi Vijay, 
>> 
>> That’s probably because packets are hitting an actual error and it seems the 
>> listen node is not reporting anything but syns received. Here’s a patch that 
>> might help [1]. It might not cherry-pick cleanly on 21.06. 
>> 
>> Regards,
>> Florin
>> 
>> [1] https://gerrit.fd.io/r/c/vpp/+/35654 
>> <https://gerrit.fd.io/r/c/vpp/+/35654>
>> 
>>> On Mar 15, 2022, at 7:54 PM, Vijay Kumar >> <mailto:vjkumar2...@gmail.com>> wrote:
>>> 
>>> Hi Florin,
>>> 
>>> I have the output 

Re: [vpp-dev] VPP 21.06 - seeing TCP packet drops

2022-03-15 Thread Florin Coras
Hi Vijay, 

> On Mar 15, 2022, at 9:58 PM, Vijay Kumar  wrote:
> 
> Hi florin,
> 
> Thanks a lot for helping me out. I will try your patch and update you with 
> the result.

Thanks! 

> 
> 
> A general observation
> ==
> In my setup, I think the SYN pkt is dropped much before the SYNS_RCVD counter 
> is incremented.

That’s what I’d expect as well. 

> 
> I have seen a lot of changes b/w the 20.05 and 21.06 code. For example, in 
> the graph node tcp46_listen_inline() of VPP 20.05 the SYN pkt trace function 
> is called towards the end, i.e. after calling tcp_send_synack(), but in the 
> 21.06 code tcp46_listen_trace_frame() is called at the very beginning of 
> the tcp46_listen_inline() graph node.
> 
> I think moving the pkt trace macro to the end of the function is good 
> (placing it close to the line that calls tcp_send_synack), otherwise it can 
> mislead us and will not help debugging.
> 

Most of the changes were code refactoring/cleanup and buffer handling 
optimizations, not protocol related. TCP tracing, at least as it is now, 
doesn’t provide info about the errors hit; instead it reports the connections 
hit, and errors are reported through node and session counters (see 
show session verbose 2). Obviously that didn’t work properly for the listen 
node. 
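
For example, from the debug CLI:

vpp# show errors
vpp# show session verbose 2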

Regards,
Florin

> 
> 
> On Wed, Mar 16, 2022 at 10:22 AM Florin Coras  <mailto:fcoras.li...@gmail.com>> wrote:
> Hi Vijay, 
> 
> That’s probably because packets are hitting an actual error and it seems the 
> listen node is not reporting anything but syns received. Here’s a patch that 
> might help [1]. It might not cherry-pick cleanly on 21.06. 
> 
> Regards,
> Florin
> 
> [1] https://gerrit.fd.io/r/c/vpp/+/35654 
> <https://gerrit.fd.io/r/c/vpp/+/35654>
> 
>> On Mar 15, 2022, at 7:54 PM, Vijay Kumar > <mailto:vjkumar2...@gmail.com>> wrote:
>> 
>> Hi Florin,
>> 
>> I have the output of show node counters (show errors) taken on both 20.05 
>> and 21.06 vpp. In 21.06 I don't see any counters in tcp4-listen or 
>> tcp4-output etc. 
>> 
>> Please let me know why the SYN rcvd counter itself is not incremented; in 
>> the earlier reply I already pasted the show trace output where we saw the 
>> SYN pkt land on the tcp4-listen node.
>> 
>> VPP 20.05
>> ===
>> vpp# show node counters 
>>   Count   Node             Reason
>>       3   memif-input      not ip packet
>>   10259   an_ppe_wfectrl   wfectrl packets received
>>   10259   an_ppe_wfectrl   wfectrl replies sent
>>       1   an_ppe_wfectrl   wfectrl packet processing failed
>>    8123   an_ppe_wfectrl   session stat request received
>>       1   an_ppe_wfectrl   service construct config request received
>>       1   an_ppe_wfectrl   service construct config request success
>>       1   an_ppe_wfectrl   service config request received
>>       1   an_ppe_wfectrl   service config request success
>>    1686   an_ppe_wfectrl   dpi stats request received
>>    1686   an_ppe_wfectrl   dpi stats request success
>>      70   an_ppe_wfectrl   nat stats request received
>>      70   an_ppe_wfectrl   nat stats request success
>>      70   an_ppe_wfectrl   vpp stats request received
>>      70   an_ppe_wfectrl   vpp stats request success
>>       1   an_ppe_wfectrl   udp tunnel resource block add request received
>>       1   an_ppe_wfectrl   udp tunnel resource block add request success
>>       1   an_ppe_wfectrl   l3rm resource block update request received
>>       1   an_ppe_wfectrl   l3rm resource block update request success
>>       1   an_ppe_wfectrl   ue registration request received
>>      17   an_ppe_wfectrl   ike msg received from ikemgr
>>      17   an_ppe_wfectrl   ike msg send to network success
>>       3   an_ppe_wfectrl   ipsec sa install msg received from ikemgr
>>       3   an_ppe_wfectrl   ipsec sa install msg processed successfully
>>       1   an_ppe_vppc

Re: [vpp-dev] VPP 21.06 - seeing TCP packet drops

2022-03-15 Thread Florin Coras
d 
> in rtr   error  
>      5   an_ppe_router_input      packets dropped to host. no tacptcp session found   error
>     14   an-ppe-isakmp4-output    Total IKEV4 packets dispatched to network            error
>     14   an-ppe-isakmpmgr-input   packets processed by isakmpmgr input plugin          error
>     14   an-ppe-isakmpmgr-input   Received IKE packets                                 error
>     14   an-ppe-isakmpmgr-input   Successfully sent ike message to Session Manager     error
>      5   an-ppe-isakmpmgr-input   Received ike exchange AUTH packet                    error
>      4   an-ppe-isakmpmgr-input   Received ike exchange CREATE CHILD SA packet         error
>      1   an-ppe-isakmpmgr-input   Received ike exchange SA INIT packet                 error
>      4   an-ppe-isakmpmgr-input   Received ike exchange INFORMATIONAL packet           error
>      3   arp-reply                ARP replies sent                                     error
>     14   ip4-udp-lookup           No error                                             error
>      5   esp4-decrypt-tun         ESP pkts received                                    error
>      5   ipsec4-tun-input         good packets received                                error
>      1   ip4-input                ip4 ttl <= 1                                         error
>      1   ip4-icmp-error           hop limit exceeded response sent                     error
>  27628   ethernet-input           unknown vlan                                         error
> vpp# 
> vpp# 
> 
>  
> 
> 
> On Wed, Mar 16, 2022 at 6:52 AM Florin Coras  <mailto:fcoras.li...@gmail.com>> wrote:
> Hi Vijay, 
> 
> You won’t see the syns/syn-acks in traces because of the way they are 
> generated. Nonetheless, you can verify that the syns were received with “show 
> error” and check that the syn-acks were actually generated with a pcap trace 
> of rx/tx. 
> 
> Previously tracing worked because buffers were re-used and incoming packets 
> were directly sent to output. More recently we’ve started dispatching 
> syn-acks through the session layer in order to minimize the size of tx 
> bursts per dispatch. 
> 
> Regards,
> Florin
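
The pcap capture mentioned above can be taken along these lines (illustrative 
only; the exact pcap trace syntax varies by release):

vpp# pcap trace rx tx max 1000 intfc any file synack.pcap
... reproduce the handshake ...
vpp# pcap trace off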
> 
>> On Mar 15, 2022, at 6:08 PM, Vijay Kumar > <mailto:vjkumar2...@gmail.com>> wrote:
>> 
>> Hi Florin,
>> 
>> Thanks for the quick response.
>> 
>> So it is expected to see the SYN pkts counted as drops after they are 
>> terminated, but I was not able to see any SYN-ACK going out of VPP. 
>> 
>> In both the "show trace" cmd output and the pcap taken at dispatch graph 
>> node level, I was not able to see the SYN-ACK going out. The route to the 
>> SYN-ACK destination (which is the original source that sent the SYN) is 
>> also present in the ip fib output. The configuration is the same one that 
>> was working fine for me in VPP 20.05. 
>> 
>> Is there anything that I can look at for the SYN-ACK not sending issue?
>> 
>> 
>> 
>> On Wed, 16 Mar 2022, 00:40 Florin Coras, > <mailto:fcoras.li...@gmail.com>> wrote:
>> Hi Vijay, 
>> 
>> I see an_ppe_router_input forwards syns to tcp-input and those packets are 
>> delivered to tcp-listen, i.e., you’ve created a listener that’s matched by 
>> the incoming traffic. 
>> 
>> The thing to keep in mind is that tcp terminates incoming flows and it does 
>> not reuse buffers. That is, the syn hits the listen node and a syn-ack is 
>> programmed for this new connection that still needs to complete the 
>> handshake. The original syn packet is discarded and therefore you see it as 
>> a drop. 
>> 
>> Regards, 
>> Florin 
>> 
>>> On Mar 15, 2022, at 3:50 AM, Vijay Kumar >> <mailto:vjkumar2...@gmail.com>> wrote:
>>> 
>>> The is the output of show trace and show interface
>>> 
>>> 
>>> Packet 36
>>> 
>>> 00:03:26:875694: dpdk-input
>>>   VirtualFuncEthernet0/7/0 rx queue 0
>>>   buffer 0x4c1b21: current data 0, length 138, buffer-pool 0, ref-count 1, 
>>> totlen-nifb 0, trace handle 0x123
>>>ext-hdr-valid 
>>>l4-cksum-computed l4-cksum-correct 
>>>  
