[vpp-dev] #vpp #vnet os_panic for failed barrier timeout

2021-06-23 Thread Bly, Mike via lists.fd.io
We are looking for advice on whether anyone is already looking at this
os_panic() for a barrier timeout. We see many instances of this type of
main-thread back-trace in the forum. For this incident: referencing the
sw_interface_dump API, we created a lighter oper-get call to simply fetch link
state rather than all of the extensive information the dump command fetches for
each interface. At the time we added our new oper-get function, we overlooked
the "is_mp_safe" enablement that sw_interface_dump has, and as such did NOT set
it for our new oper-get. The end result is a fairly light API that nevertheless
requires barrier support. When this issue occurred, the configuration was using
a single separate worker thread, so the API waits for a barrier count of 1.
Interestingly, the BT analysis shows the count value was met, which implies some
deeper issue. Why did a single worker, with a workload of at most tens of
packets per second at the time, fail to stall at the barrier within the allotted
one-second timeout? And, even more fun to answer: why did we even reach the
os_panic call at all, when the BT shows the worker was stalled at the barrier?
Please refer to the GDB analysis at the bottom of this email.
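
For context, frame #8 of the Thread 1 back-trace below is the deadline check
inside vlib_worker_thread_barrier_sync_int(). Paraphrased from memory rather
than copied from the 19.08 tree, the relevant logic is roughly:

  /* Paraphrased sketch of vlib_worker_thread_barrier_sync_int(),
   * src/vlib/threads.c (19.08-era logic, from memory; see the exact tag for
   * the real code). count is the number of worker threads, 1 in our case. */
  deadline = vlib_time_now (vm) + BARRIER_SYNC_TIMEOUT;   /* 1.0 second */

  *vlib_worker_threads->wait_at_barrier = 1;
  while (*vlib_worker_threads->workers_at_barrier != count)
    {
      if (vlib_time_now (vm) > deadline)
        {
          fformat (stderr, "%s: worker thread deadlock\n", __FUNCTION__);
          os_panic ();                 /* frame #7 in the BT below */
        }
    }

So the panic means the main thread spun for a full second without ever seeing
workers_at_barrier reach 1, even though the core shows the worker parked in the
barrier spin loop.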

This code is based on 19.08. We are in the process of upgrading to 21.01, but
in reviewing the forum posts, this type of BT is seen across many versions.
This is an extremely rare event. We had one occurrence in September of last
year that we could not reproduce, and then just had a second occurrence this
week. As such, we are not able to reproduce this on demand, let alone in stock
VPP code, given this is a new API.

While we could simply enable is_mp_safe, as is done for sw_interface_dump, to
avoid the issue, we are troubled by not being able to explain why the os_panic
occurred in the first place. As such, we are hoping someone might be able to
provide guidance on next steps. What additional details from the core file
can we provide?
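
(For reference, the change we are describing is a one-line registration. A
minimal sketch, assuming a hypothetical message id VL_API_OPER_GET_LINK_STATE
for our private API and an illustrative hookup function name, patterned after
the existing sw_interface_dump registration in src/vnet/interface_api.c in
19.08:)

  /* Sketch only: VL_API_OPER_GET_LINK_STATE and oper_get_api_hookup are
   * hypothetical names for our private API. In 19.08, handlers flagged in
   * am->is_mp_safe[] are dispatched by vl_msg_api_handler_with_vm_node()
   * without taking the worker-thread barrier. */
  static clib_error_t *
  oper_get_api_hookup (vlib_main_t * vm)
  {
    api_main_t *am = &api_main;

    am->is_mp_safe[VL_API_SW_INTERFACE_DUMP] = 1;     /* upstream already does this */
    am->is_mp_safe[VL_API_OPER_GET_LINK_STATE] = 1;   /* what we forgot to do */

    return 0;
  }

  VLIB_API_INIT_FUNCTION (oper_get_api_hookup);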


Thread 1 backtrace

#0 __GI_raise (sig=sig@entry=6) at 
/usr/src/debug/glibc/2.30-r0/git/sysdeps/unix/sysv/linux/raise.c:50
#1 0x003cb8425548 in __GI_abort () at 
/usr/src/debug/glibc/2.30-r0/git/stdlib/abort.c:79
#2 0x004075da in os_exit () at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vpp/vnet/main.c:379
#3 0x7ff1f5740794 in unix_signal_handler (signum=<optimized out>, 
si=<optimized out>, uc=<optimized out>)
at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlib/unix/main.c:183
#4 <signal handler called>
#5 __GI_raise (sig=sig@entry=6) at 
/usr/src/debug/glibc/2.30-r0/git/sysdeps/unix/sysv/linux/raise.c:50
#6 0x003cb8425548 in __GI_abort () at 
/usr/src/debug/glibc/2.30-r0/git/stdlib/abort.c:79
#7 0x00407583 in os_panic () at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vpp/vnet/main.c:355
#8 0x7ff1f5728643 in vlib_worker_thread_barrier_sync_int (vm=0x7ff1f575ba40 
, func_name=<optimized out>)
at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlib/threads.c:1476
#9 0x7ff1f62c6d56 in vl_msg_api_handler_with_vm_node 
(am=am@entry=0x7ff1f62d8d40 , the_msg=0x1300ba738,
vm=vm@entry=0x7ff1f575ba40 , node=node@entry=0x7ff1b588c000)
at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlibapi/api_shared.c:583
#10 0x7ff1f62b1237 in void_mem_api_handle_msg_i (am=<optimized out>, 
q=<optimized out>, node=0x7ff1b588c000,
vm=0x7ff1f575ba40 ) at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlibmemory/memory_api.c:712
#11 vl_mem_api_handle_msg_main (vm=vm@entry=0x7ff1f575ba40 , 
node=node@entry=0x7ff1b588c000)
at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlibmemory/memory_api.c:722
#12 0x7ff1f62be713 in vl_api_clnt_process (f=<optimized out>, 
node=<optimized out>, vm=<optimized out>)
at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlibmemory/vlib_api.c:326
#13 vl_api_clnt_process (vm=<optimized out>, node=<optimized out>, f=<optimized out>)
at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlibmemory/vlib_api.c:252
#14 0x7ff1f56f90b7 in vlib_process_bootstrap (_a=<optimized out>)
at /usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlib/main.c:1468
#15 0x7ff1f561f220 in clib_calljmp () at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vppinfra/longjmp.S:123
#16 0x7ff1b5e39db0 in ?? ()
#17 0x7ff1f56fc669 in vlib_process_startup (f=0x0, p=0x7ff1b588c000, 
vm=0x7ff1f575ba40 )
at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vppinfra/types.h:133

Thread 3 backtrace

(gdb) thr 3
[Switching to thread 3 (LWP 440)]
#0 vlib_worker_thread_barrier_check () at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlib/threads.h:426
426 ;
(gdb) bt
#0 vlib_worker_thread_barrier_check () at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlib/threads.h:426
#1 vlib_main_or_worker_loop (is_main=0, vm=0x7ff1b6a5e0c0) at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlib/main.c:1744
#2 vlib_worker_loop (vm=0x7ff1b6a5e0c0) at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vlib/main.c:1934
#3 0x7ff1f561f220 in clib_calljmp () at 
/usr/src/debug/vpp/19.08+gitAUTOINC+6641eb3e8f-r0/git/src/vppinfra/longjmp.S:123
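
Frame #0 of the worker is the spin-wait in vlib_worker_thread_barrier_check();
the bare ';' GDB prints for threads.h:426 is the body of that spin loop.
Paraphrased from memory, the inline looks roughly like:

  /* Paraphrased sketch of vlib_worker_thread_barrier_check(),
   * src/vlib/threads.h (19.08-era, from memory). */
  static inline void
  vlib_worker_thread_barrier_check (void)
  {
    if (PREDICT_FALSE (*vlib_worker_threads->wait_at_barrier))
      {
        /* announce arrival to the main thread ... */
        clib_atomic_fetch_add (vlib_worker_threads->workers_at_barrier, 1);

        /* ... then spin until the main thread drops the barrier
         * (this empty statement is threads.h:426 in the BT above) */
        while (*vlib_worker_threads->wait_at_barrier)
          ;

        clib_atomic_fetch_add (vlib_worker_threads->workers_at_barrier, -1);
      }
  }

In other words, the worker increments workers_at_barrier before parking in the
spin loop, which matches our observation above that the count appears to have
been met by the time the core was taken.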

Re: [vpp-dev] MPLS DROP DPO

2021-06-23 Thread Neale Ranns

Hi Mohsen,

You programmed the non-EOS entry, but the packet was EOS. MPLS lookup is really 
a 21-bit lookup: the label plus the EOS bit.

/neale
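
(A conceptual sketch only, not VPP's actual data structures, just to
illustrate the point: the lookup key is the label plus the EOS bit, so the EOS
and non-EOS entries for the same label are two distinct routes and both must
be programmed if both kinds of packets can arrive.)

  /* Conceptual sketch -- not VPP source. An MPLS lookup key is effectively
   * 21 bits wide: the 20-bit label plus the end-of-stack (EOS) bit. */
  #include <stdio.h>

  static unsigned int
  mpls_lookup_key (unsigned int label, int eos)
  {
    return ((label & 0xfffff) << 1) | (eos ? 1u : 0u);
  }

  int
  main (void)
  {
    /* A route programmed for label 100 non-EOS does not match an EOS packet
     * carrying label 100; the EOS entry is a distinct key, and a missing
     * entry resolves to the drop DPO. */
    printf ("non-eos key: 0x%x\n", mpls_lookup_key (100, 0));
    printf ("eos key:     0x%x\n", mpls_lookup_key (100, 1));
    return 0;
  }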


From: vpp-dev@lists.fd.io on behalf of Mohsen Meamarian via lists.fd.io
Date: Wednesday, 23 June 2021 at 09:09
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] MPLS DROP DPO
Subject: [vpp-dev] MPLS DROP DPO

Hi friends,
I set up an MPLS configuration between 3 hosts, but the middle host's VPP 
dropped the MPLS packet. I used trace to see the drop error and saw MPLS DROP 
DPO. What could be the reason?
I attached two photos of "sh trace" and "sh mpls fib" output.
Thanks.







#notuseful [Re: [vpp-dev] #vppcom, #vnet #vpp]

2021-06-23 Thread Christian Hopps


People who have to sort through lots of email (like myself) do not open every 
email before marking it read or deleting it. Emails with subject lines that give 
no clue to the content are instant deletes for me (and for many others, I'm 
sure) -- except in this case I decided to respond, because for some bizarre 
reason I see a lot of these '#hashtag' subject lines on mail to the vpp-dev 
list.

Chris.




[vpp-dev] MPLS DROP DPO

2021-06-23 Thread Mohsen Meamarian
Hi friends,
I set up an MPLS configuration between 3 hosts, but the middle host's VPP 
dropped the MPLS packet. I used trace to see the drop error and saw MPLS DROP 
DPO. What could be the reason?
I attached two photos of "sh trace" and "sh mpls fib" output.
Thanks.



[Attachments: Screenshot (1060).png, Screenshot (1059).png]
