Re: [vpp-dev] vpp 20.01 not properly handle ipv6 router advertisement and neighbor advertisement #vnet #vpp

2020-04-04 Thread Dave Barach via lists.fd.io
Thanks for the report. This seems like a bug, most likely pretty simple to fix. 
As you pointed out, static RAs should include constant lifetime values.

It will take a bit of time to fix it and release the fix as part of a 20.01 
maintenance release.

Dave

From: vpp-dev@lists.fd.io  On Behalf Of guojx1...@gmail.com
Sent: Saturday, April 4, 2020 11:13 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] vpp 20.01 not properly handle ipv6 router advertisement 
and neighbor advertisement #vnet #vpp

20.01, downloaded from the official repository for CentOS 7.
Thanks for your help!
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#15988): https://lists.fd.io/g/vpp-dev/message/15988
Mute This Topic: https://lists.fd.io/mt/72765661/21656
Mute #vpp: https://lists.fd.io/mk?hashtag=vpp=1480452
Mute #vnet: https://lists.fd.io/mk?hashtag=vnet=1480452
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP nat ipfix logging problem, need to use thread-specific vlib_main_t?

2020-04-05 Thread Dave Barach via lists.fd.io
If you have the thread index handy, that's OK. Otherwise, use vlib_get_main() 
which grabs the thread index from thread local storage. 
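
A minimal sketch of the two access patterns mentioned above, using toy stand-ins rather than VPP's real headers. The names toy_main_t, toy_mains, toy_thread_index and toy_get_main() are hypothetical; they only mimic the shape of vlib_mains[thread_index] and vlib_get_main(), and the actual VPP definitions may differ. For brevity the sketch does not start real threads; in a real program each worker would call toy_thread_init() once at startup.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_N_THREADS 4

/* Toy stand-in for a per-thread main structure (hypothetical, not
   the real vlib_main_t). */
typedef struct
{
  uint32_t thread_index;
  uint64_t packets_logged;
} toy_main_t;

/* One instance per thread, indexed by thread index. */
toy_main_t toy_mains[TOY_N_THREADS];

/* Each thread keeps its own index in thread-local storage. */
_Thread_local uint32_t toy_thread_index;

/* Called once by each thread when it starts. */
void
toy_thread_init (uint32_t index)
{
  assert (index < TOY_N_THREADS);
  toy_thread_index = index;
  toy_mains[index].thread_index = index;
}

/* Pattern 1: the caller already has the thread index handy. */
toy_main_t *
toy_main_by_index (uint32_t index)
{
  assert (index < TOY_N_THREADS);
  return &toy_mains[index];
}

/* Pattern 2: no index handy, so read it from thread-local storage. */
toy_main_t *
toy_get_main (void)
{
  return &toy_mains[toy_thread_index];
}

/* Work done on behalf of the calling thread touches only its own state. */
void
toy_log_packet (void)
{
  toy_get_main ()->packets_logged++;
}
```

Either pattern yields the same per-thread structure; what matters is that no thread reaches for a single global instance shared by all workers.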

-----Original Message-----
From: vpp-dev@lists.fd.io  On Behalf Of Elias Rudberg
Sent: Sunday, April 5, 2020 4:58 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP nat ipfix logging problem, need to use thread-specific 
vlib_main_t?

Hello VPP experts,

We have been using VPP for NAT44 for a while and it has been working fine, but 
a few days ago when we tried turning on nat ipfix logging, vpp crashed. It 
turned out that the problem went away if we used only a single thread, so it 
seemed related to how threading was handled in the ipfix logging code. The 
crash happened in different ways on different runs but often seemed related to 
the snat_ipfix_send() function in plugins/nat/nat_ipfix_logging.c.

Having looked at the code in nat_ipfix_logging.c I have the following theory 
about what goes wrong (I might have misunderstood something, if so please 
correct me):

In the snat_ipfix_send() function, a vlib_main_t data structure is used; a 
pointer to it is fetched in the following way:

   vlib_main_t *vm = frm->vlib_main;

So the frm->vlib_main pointer comes from "frm", which has been set to 
flow_report_main, a global data structure from 
vnet/ipfix-export/flow_report.c that, as far as I can tell, exists only once in 
memory (not once per thread). This means that different threads calling the 
snat_ipfix_send() function are using the same vlib_main_t data structure. That 
is not how it should be, I think, instead each thread should be using its own 
thread-specific vlib_main_t data structure.

A suggestion for how to fix this is to replace the line

   vlib_main_t *vm = frm->vlib_main;

with the following line

   vlib_main_t *vm = vlib_mains[thread_index];

in all places where worker threads are using such a vlib_main_t pointer. Using 
vlib_mains[thread_index] means that we are picking the thread-specific 
vlib_main_t data structure for the current thread, instead of all threads using 
the same vlib_main_t. I pushed such a change to gerrit, here: 
https://gerrit.fd.io/r/c/vpp/+/26359

That fix seems to solve the issue in my tests, vpp does not crash anymore after 
the change. Please have a look at it and let me know if this seems reasonable 
or if I have misunderstood something.

Best regards,
Elias

View/Reply Online (#15991): https://lists.fd.io/g/vpp-dev/message/15991


Re: [vpp-dev] VPP nat ipfix logging problem, need to use thread-specific vlib_main_t?

2020-04-05 Thread Dave Barach via lists.fd.io
The packet generator supports per-stream worker placement:

packet-generator new {
name worker1
worker 1
limit 0
rate 1.2e7
size 128-128
tx-interface FortyGigabitEthernet1/0/1
node FortyGigabitEthernet1/0/1-output
data { IP4: 1.2.4 -> 3cfd.fed0.b6c9
   UDP: 192.168.41.10 -> 192.168.51.10
   UDP: 1234 -> 2345
   incrementing 114
}
}



From: vpp-dev@lists.fd.io  On Behalf Of Paul Vinciguerra
Sent: Sunday, April 5, 2020 11:24 AM
To: Dave Barach (dbarach) 
Cc: Elias Rudberg ; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP nat ipfix logging problem, need to use 
thread-specific vlib_main_t?

How can we test scenarios like this?
'set interface rx-placement' doesn't support pg interfaces.
DBGvpp# set interface rx-placement TenGigabitEthernet5/0/0 worker 2
DBGvpp# set interface rx-placement pg0 worker 2
set interface rx-placement: not found
DBGvpp#
Is there another command to bind a pg interface to a worker thread?

On Sun, Apr 5, 2020 at 8:08 AM Dave Barach via lists.fd.io wrote:
If you have the thread index handy, that's OK. Otherwise, use vlib_get_main() 
which grabs the thread index from thread local storage.

-----Original Message-----
From: vpp-dev@lists.fd.io On Behalf Of Elias Rudberg
Sent: Sunday, April 5, 2020 4:58 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP nat ipfix logging problem, need to use thread-specific 
vlib_main_t?

Hello VPP experts,

We have been using VPP for NAT44 for a while and it has been working fine, but 
a few days ago when we tried turning on nat ipfix logging, vpp crashed. It 
turned out that the problem went away if we used only a single thread, so it 
seemed related to how threading was handled in the ipfix logging code. The 
crash happened in different ways on different runs but often seemed related to 
the snat_ipfix_send() function in plugins/nat/nat_ipfix_logging.c.

Having looked at the code in nat_ipfix_logging.c I have the following theory 
about what goes wrong (I might have misunderstood something, if so please 
correct me):

In the snat_ipfix_send() function, a vlib_main_t data structure is used; a 
pointer to it is fetched in the following way:

   vlib_main_t *vm = frm->vlib_main;

So the frm->vlib_main pointer comes from "frm", which has been set to 
flow_report_main, a global data structure from 
vnet/ipfix-export/flow_report.c that, as far as I can tell, exists only once in 
memory (not once per thread). This means that different threads calling the 
snat_ipfix_send() function are using the same vlib_main_t data structure. That 
is not how it should be, I think, instead each thread should be using its own 
thread-specific vlib_main_t data structure.

A suggestion for how to fix this is to replace the line

   vlib_main_t *vm = frm->vlib_main;

with the following line

   vlib_main_t *vm = vlib_mains[thread_index];

in all places where worker threads are using such a vlib_main_t pointer. Using 
vlib_mains[thread_index] means that we are picking the thread-specific 
vlib_main_t data structure for the current thread, instead of all threads using 
the same vlib_main_t. I pushed such a change to gerrit, here: 
https://gerrit.fd.io/r/c/vpp/+/26359

That fix seems to solve the issue in my tests, vpp does not crash anymore after 
the change. Please have a look at it and let me know if this seems reasonable 
or if I have misunderstood something.

Best regards,
Elias

View/Reply Online (#15993): https://lists.fd.io/g/vpp-dev/message/15993


Re: [vpp-dev] worker barrier state

2020-03-25 Thread Dave Barach via Lists.Fd.Io
vlib_main_t *vm->main_loop_count.

One trip around the main loop accounts for all per-worker local graph edges / 
acyclic graph behaviors. 

As to the magic number E (not to be confused with e): repeatedly handing off 
packets from thread to thread seems like a bad implementation strategy. The 
packet tracer will tell you how many handoffs are involved in a certain path, 
as will a bit of code inspection.

Neale has some experience with this scenario, maybe he can share some 
thoughts...

HTH... Dave 

-----Original Message-----
From: Christian Hopps  
Sent: Wednesday, March 25, 2020 1:14 PM
To: Dave Barach (dbarach) 
Cc: Christian Hopps ; dmar...@me.com; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] worker barrier state

I'm not clear on what you mean by table add/del, but I can give you the 
scenario I'm concerned with.

I have a packet P input and it has some state S associated with it.

The API wants to delete state S. When is it safe?

Say P's arc from input to output contains E edges. Each node on the arc could 
conceivably handoff packet P to another worker for processing. So if I read 
things correctly I need to wait at least E laps until I know for sure that P is 
out of the system, and S is safe to delete.

Q: How do I know what value E is?

I am not in control of all nodes along P's arc and how they might handoff 
packets, and the graph is not acyclic so I couldn't even use a max value like 
the total number of nodes in the graph for E as the packet may loop back.

Q: Which lap counter am I looking at?

As you point out, each vlib_main_t has its own counter (main_loop_count?), so I 
think I have to record every worker's main_loop_count in the state S and wait 
for every counter to be +E before deleting S.

Thanks for the help!

Chris.

> On Mar 25, 2020, at 12:15 PM, Dave Barach (dbarach)  wrote:
> 
> +1. 
>  
> View any metadata subject to table add/del accidents with suspicion. There is 
> a safe delete paradigm: each vlib_main_t has a “lap counter”.  When deleting 
> table entries: atomically update table entries. Record the lap counter and 
> wait until all worker threads have completed a lap. Then, delete (or 
> pool_put) the underlying data structure.
>  
> Dave
>  
>  
> From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion 
> via Lists.Fd.Io
> Sent: Wednesday, March 25, 2020 12:10 PM
> To: Christian Hopps 
> Cc: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] worker barrier state
>  
> 
> 
> > On 25 Mar 2020, at 16:01, Christian Hopps  wrote:
> > 
> > Is it supposed to be the case that no packets are inflight (*) in the graph 
> > when the worker barrier is held?
> > 
> > I think perhaps MP unsafe API code is assuming this.
> > 
> > I also think that the frame queues used by handoff code violate this 
> > assumption.
> > 
> > Can someone with deep VPP knowledge clarify this for me? :)
> 
> 
> correct, there is small chance that frame is enqueued right before worker 
> hits barrier…
> 
> — 
> Damjan
> 
> 

View/Reply Online (#15874): https://lists.fd.io/g/vpp-dev/message/15874


Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

2020-03-27 Thread Dave Barach via Lists.Fd.Io
We scrape the 32-bit node-runtime counters back into the 64 bit node counters 
when they overflow.

It should take a good long time for a 32-bit counter to overflow – O(1 seconds’ 
worth of node runtime) – we do this to reduce the cache footprint of the 
node-runtime.
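
A small sketch of the "scrape on overflow" idea described above. This is a toy model, not VPP's actual code: toy_node_runtime_t, toy_node_t and the function names are made up for illustration, and only a single clocks counter is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hot, cache-resident per-node state: a small 32-bit counter
   (hypothetical stand-in, not the real vlib_node_runtime_t). */
typedef struct
{
  uint32_t clocks_since_last_overflow;
} toy_node_runtime_t;

/* Cold per-node state: the full-width 64-bit total
   (hypothetical stand-in). */
typedef struct
{
  uint64_t clocks_total;
} toy_node_t;

/* Add n_clocks to the 32-bit counter. If the sum no longer fits in
   32 bits, fold everything into the 64-bit total and reset. */
void
toy_update_stats (toy_node_runtime_t *r, toy_node_t *n, uint64_t n_clocks)
{
  uint64_t sum;

  assert (r && n);
  sum = (uint64_t) r->clocks_since_last_overflow + n_clocks;

  if (sum > UINT32_MAX)
    {
      n->clocks_total += sum;
      r->clocks_since_last_overflow = 0;
    }
  else
    r->clocks_since_last_overflow = (uint32_t) sum;
}

/* Readers combine both halves to get the true running total. */
uint64_t
toy_total_clocks (const toy_node_runtime_t *r, const toy_node_t *n)
{
  return n->clocks_total + r->clocks_since_last_overflow;
}
```

The design trade-off is the one named in the reply: the frequently touched structure stays small, and the slower 64-bit bookkeeping runs only on the rare overflow path.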

HTH... Dave

From: vpp-dev@lists.fd.io  On Behalf Of 
nagarajuiit...@gmail.com
Sent: Friday, March 27, 2020 1:59 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

Hi,

I see that clib_cpu_time_now() returns u64 type data.
And we are updating max clocks spent by a node in vlib_node_runtime_t using 
these timestamps.

Why are we using u32 types in vlib_node_runtime_t?
Because of this, it can only hold data within 2 seconds and anything above 32 
bit range overflows in our system.

Really appreciate your help.

Thanks,
Nagaraju
View/Reply Online (#15896): https://lists.fd.io/g/vpp-dev/message/15896


Re: [vpp-dev] naive questions on VPP memory usage ( does it ever come down )

2020-04-01 Thread Dave Barach via Lists.Fd.Io
Could be. In real life, objects allocated from pools have widely variable 
lifetimes. It would be virtually impossible to unmap less than an entire pool.

If your application would tolerate something akin to garbage collection, I suppose 
you could create a fresh pool with a fresh set of indices, and free the 
original.

Personally, I would never go in this direction.
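
To make the trade-off concrete, here is a toy index-based pool, written as a sketch under stated assumptions. It is not vppinfra's pool implementation; toy_pool_t and all toy_pool_* functions are hypothetical. It shows that a put only marks a slot reusable, that only freeing the whole pool returns memory, and that the "fresh pool" approach hands every surviving element a new index.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy index-based pool (hypothetical; not vppinfra's pool.h). */
typedef struct
{
  int *elts;			/* element storage */
  unsigned char *in_use;	/* 1 if the slot holds a live element */
  size_t capacity;		/* slots allocated; only ever grows */
} toy_pool_t;

/* Return the index of a free slot, growing the storage if needed. */
size_t
toy_pool_get (toy_pool_t *p)
{
  size_t i, new_cap;

  for (i = 0; i < p->capacity; i++)
    if (!p->in_use[i])
      {
	p->in_use[i] = 1;
	return i;
      }

  new_cap = p->capacity ? 2 * p->capacity : 4;
  p->elts = realloc (p->elts, new_cap * sizeof (p->elts[0]));
  p->in_use = realloc (p->in_use, new_cap);
  assert (p->elts && p->in_use);
  memset (p->in_use + p->capacity, 0, new_cap - p->capacity);
  i = p->capacity;
  p->capacity = new_cap;
  p->in_use[i] = 1;
  return i;
}

/* Mark a slot reusable. No memory is returned to the system. */
void
toy_pool_put (toy_pool_t *p, size_t index)
{
  assert (index < p->capacity && p->in_use[index]);
  p->in_use[index] = 0;
}

/* Release the whole pool; the only operation that gives memory back. */
void
toy_pool_free (toy_pool_t *p)
{
  free (p->elts);
  free (p->in_use);
  *p = (toy_pool_t) { 0 };
}

/* "Garbage collect": copy live elements into a fresh pool, then free
   the original. Every survivor gets a NEW index, so every holder of
   an old index would have to be updated as well. */
toy_pool_t
toy_pool_compact (toy_pool_t *old)
{
  toy_pool_t fresh = { 0 };
  size_t i, j;

  for (i = 0; i < old->capacity; i++)
    if (old->in_use[i])
      {
	j = toy_pool_get (&fresh);
	fresh.elts[j] = old->elts[i];
      }

  toy_pool_free (old);
  return fresh;
}
```

The index remapping in toy_pool_compact() is the costly part in practice: anything that stored an old index becomes stale, which is one plausible reason to avoid this direction.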

HTH... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Satya Murthy
Sent: Wednesday, April 1, 2020 8:59 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] naive questions on VPP memory usage ( does it ever come down 
)

A few questions on VPP memory usage.

1. Using pmap -p  I am collecting the total memory usage of the vpp 
process at the beginning of my test. (It is X KB.)
2. I ran the test for a few hours, which will obviously involve a lot of pool_get/pool_put.
3. Collected the same pmap output, and the memory usage had grown to X+Y.
4. Cleared all my sessions, which ideally should have cleared all my allocations.
5. I still see the memory usage stuck at X+Y.

I see that only pool_free unmaps the memory, whereas pool_put does not.
Could this be the reason why I am not seeing the memory come down?

Thanks & Regards,
Satish


--
Thanks & Regards,
Murthy
View/Reply Online (#15968): https://lists.fd.io/g/vpp-dev/message/15968


Re: [vpp-dev] Naive question about mspace_is_heap_object

2020-04-03 Thread Dave Barach via lists.fd.io
Data segment, .bss segment, stack segment addresses to name a few. Although it 
doesn’t happen every day, it’s almost too easy to free an object which wasn’t 
allocated.
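
A sketch of why such a range check is useful, assuming a simplified allocator state. toy_mspace_t and the toy_* functions are hypothetical and far simpler than the real dlmalloc structures; the comparison mirrors the shape of the check quoted in the question below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for the allocator state (hypothetical; not dlmalloc's
   real mstate). */
typedef struct
{
  char *least_addr;		/* lowest address managed by this heap */
  size_t footprint;		/* bytes currently obtained for this heap */
} toy_mspace_t;

/* Range test: does p lie inside the heap? Comparisons are done on
   integers because relational comparison of pointers into different
   objects is undefined in ISO C. */
int
toy_is_heap_object (const toy_mspace_t *ms, const void *p)
{
  uintptr_t pp = (uintptr_t) p;
  uintptr_t lo = (uintptr_t) ms->least_addr;

  assert (ms->least_addr != NULL);
  return pp > lo && pp <= lo + ms->footprint;
}

/* Free only what the heap actually owns; refuse everything else. */
int
toy_checked_free (const toy_mspace_t *ms, void *p)
{
  if (!toy_is_heap_object (ms, p))
    return 0;			/* data, .bss or stack address: not ours */
  /* A real allocator would release the chunk here. */
  return 1;
}
```

Passing the address of a static or stack variable to toy_checked_free() is rejected instead of corrupting the allocator, which is the kind of accident the reply refers to.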

D

From: vpp-dev@lists.fd.io  On Behalf Of 
xiapengli...@gmail.com
Sent: Friday, April 3, 2020 2:57 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Naive question about mspace_is_heap_object

Hi experts,
Curious about what the following lines did in mspace_is_heap_object:

   if (pp > ms->least_addr && pp <= ms->least_addr + ms->footprint)
     return 1;
What kind of memory allocations would fall outside the existing segments of 
the mspace, given that we have disabled mmap_alloc? And what exactly does
ms->least_addr + ms->footprint mean here?
Btw, I just saw that least_addr is updated during allocation; would 
release_unused_segments make least_addr out of date and dangerous?

View/Reply Online (#15981): https://lists.fd.io/g/vpp-dev/message/15981


Re: [vpp-dev] VPP Crashes When Executing Large Script Using 'exec'

2020-03-25 Thread Dave Barach via Lists.Fd.Io
OK, no need to see the script... Classifier table out of memory... If you’re 
using the “classify table” debug CLI to set up the tables, change (or add) 
“memory-size xxxM” or “memory-size xxxG” to give the classifier enough memory. 
Depending on how many concurrent entries you expect, set the number of buckets 
somewhere between Nconcurrent/2 and Nconcurrent.
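
As a worked example of the sizing rule, the helper below picks a bucket count inside the suggested range. It assumes that a power-of-two count is desirable, which is my assumption and is not stated in this thread; toy_pick_n_buckets is a hypothetical name. The largest power of two not exceeding N always lies between N/2 and N, so it satisfies the rule.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that is <= n_concurrent (n_concurrent >= 1).
   The result r always satisfies n_concurrent / 2 < r <= n_concurrent.
   Hypothetical helper; the power-of-two choice is an assumption. */
uint32_t
toy_pick_n_buckets (uint32_t n_concurrent)
{
  uint32_t r = 1;

  assert (n_concurrent >= 1);
  while (r <= n_concurrent / 2)
    r *= 2;
  return r;
}
```

For instance, with 100000 expected concurrent entries the helper lands on 65536 buckets, which sits between 50000 and 100000.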

HTH... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Luc Pelletier
Sent: Wednesday, March 25, 2020 9:21 AM
To: Dave Barach (dbarach) 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP Crashes When Executing Large Script Using 'exec'

2nd attempt - replying all. Dave - Apologies for the duplicate response.

Thanks for your response. You're right -- I should have provided more details. 
My script is trying to set up a large number of IPs to block using 
classifiers. I now have a backtrace as well which indicates that it seems to 
run out of memory when creating classifier sessions. Maybe I'm not using 
classifiers correctly, it's been difficult to find documentation on how to use 
that feature. I'd be grateful for any tips or help you can provide. Thanks in 
advance.

Here's the backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7fd22f118801 in __GI_abort () at abort.c:79
#2  0x55be59a05ca3 in os_panic () at /usr/src/vpp/src/vpp/vnet/main.c:355
#3  0x7fd230484cf5 in clib_mem_alloc_aligned_at_offset 
(os_out_of_memory_on_failure=1, align_offset=0, align=64, size=) 
at /usr/src/vpp/src/vppinfra/mem.h:143
#4  clib_mem_alloc_aligned (align=64, size=) at 
/usr/src/vpp/src/vppinfra/mem.h:163
#5  vnet_classify_entry_alloc (t=t@entry=0x7fd1ef4ddd80, 
log2_pages=log2_pages@entry=13) at 
/usr/src/vpp/src/vnet/classify/vnet_classify.c:210
#6  0x7fd23048a224 in split_and_rehash (t=t@entry=0x7fd1ef4ddd80, 
old_values=old_values@entry=0x7fd1c1a99600, 
old_log2_pages=old_log2_pages@entry=11, new_log2_pages=new_log2_pages@entry=13)
at /usr/src/vpp/src/vnet/classify/vnet_classify.c:299
#7  0x7fd23048ae78 in vnet_classify_add_del (t=t@entry=0x7fd1ef4ddd80, 
add_v=add_v@entry=0x7fd1ef793a50, is_add=is_add@entry=1) at 
/usr/src/vpp/src/vnet/classify/vnet_classify.c:576
#8  0x7fd23048b73b in vnet_classify_add_del_session 
(cm=cm@entry=0x7fd230bda1a0 , table_index=, 
match=0x7fd1ef7b5140 "", hit_next_index=,
opaque_index=, advance=, action=, metadata=, is_add=) at 
/usr/src/vpp/src/vnet/classify/vnet_classify.c:2706
#9  0x7fd23048dfad in classify_session_command_fn (vm=, 
input=0x7fd1ef793d10, cmd=) at 
/usr/src/vpp/src/vnet/classify/vnet_classify.c:2790
#10 0x7fd22f9d9a3e in vlib_cli_dispatch_sub_commands 
(vm=vm@entry=0x7fd22fc58380 , cm=cm@entry=0x7fd22fc585b0 
, input=input@entry=0x7fd1ef793d10,
parent_command_index=) at /usr/src/vpp/src/vlib/cli.c:568
#11 0x7fd22f9da1f3 in vlib_cli_dispatch_sub_commands 
(vm=vm@entry=0x7fd22fc58380 , cm=cm@entry=0x7fd22fc585b0 
, input=input@entry=0x7fd1ef793d10,
parent_command_index=parent_command_index@entry=0) at 
/usr/src/vpp/src/vlib/cli.c:528
#12 0x7fd22f9da475 in vlib_cli_input (vm=vm@entry=0x7fd22fc58380 
, input=input@entry=0x7fd1ef793d10, 
function=function@entry=0x0, function_arg=function_arg@entry=0)
at /usr/src/vpp/src/vlib/cli.c:667
#13 0x7fd22fa31999 in unix_cli_exec (vm=0x7fd22fc58380 , 
input=, cmd=) at 
/usr/src/vpp/src/vlib/unix/cli.c:3327
#14 0x7fd22f9d9a3e in vlib_cli_dispatch_sub_commands 
(vm=vm@entry=0x7fd22fc58380 , cm=cm@entry=0x7fd22fc585b0 
, input=input@entry=0x7fd1ef793f60,
parent_command_index=parent_command_index@entry=0) at 
/usr/src/vpp/src/vlib/cli.c:568
#15 0x7fd22f9da475 in vlib_cli_input (vm=0x7fd22fc58380 , 
input=input@entry=0x7fd1ef793f60, function=function@entry=0x7fd22fa34bf0 
,
function_arg=function_arg@entry=0) at /usr/src/vpp/src/vlib/cli.c:667
#16 0x7fd22fa37cf6 in unix_cli_process_input (cm=0x7fd22fc58de0 
, cli_file_index=0) at /usr/src/vpp/src/vlib/unix/cli.c:2572
#17 unix_cli_process (vm=0x7fd22fc58380 , rt=0x7fd1ef753000, 
f=) at /usr/src/vpp/src/vlib/unix/cli.c:2688
#18 0x7fd22f9f2c36 in vlib_process_bootstrap (_a=) at 
/usr/src/vpp/src/vlib/main.c:1475
#19 0x7fd22f4f3bb4 in clib_calljmp () from 
/usr/lib/x86_64-linux-gnu/libvppinfra.so.20.01
#20 0x7fd1eea76b30 in ?? ()
#21 0x7fd22f9f8041 in vlib_process_startup (f=0x0, p=0x7fd1ef753000, 
vm=0x7fd22fc58380 ) at /usr/src/vpp/src/vlib/main.c:1497
#22 dispatch_process (vm=0x7fd22fc58380 , p=0x7fd1ef753000, 
last_time_stamp=0, f=0x0) at /usr/src/vpp/src/vlib/main.c:1542

And here's part of the script (I've eliminated a lot of the lines that are 
duplicated) -- please note IPs below are completely random as I'm only at the 
stage where I'm trying things out:

classify table mask l3 ip4 src
classify table mask l3 ip4 dst
classify session hit-next 0 table-index 0 match l3 ip4 src 174.121.118.15
classify session hit-next 0 table-index 1 match l3 ip4 dst 174.121.118.15
classify session hit-next 0 

Re: [vpp-dev] worker barrier state

2020-03-25 Thread Dave Barach via Lists.Fd.Io
+1.

View any metadata subject to table add/del accidents with suspicion. There is a 
safe delete paradigm: each vlib_main_t has a “lap counter”.  When deleting 
table entries: atomically update table entries. Record the lap counter and wait 
until all worker threads have completed a lap. Then, delete (or pool_put) the 
underlying data structure.
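
The safe-delete paradigm above can be sketched as follows. This is a single-threaded toy model under stated assumptions: toy_worker_t, toy_pending_delete_t and the toy_* functions are hypothetical, and a real multi-threaded implementation would need atomic (or at least volatile) reads of the per-worker counters.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_N_WORKERS 3

/* Toy stand-in for per-thread state holding a lap counter
   (hypothetical; modeled on the main_loop_count idea). */
typedef struct
{
  uint64_t main_loop_count;
} toy_worker_t;

/* Snapshot of every worker's lap counter, taken right after the
   table entry has been atomically updated/unlinked. */
typedef struct
{
  uint64_t laps_at_unlink[TOY_N_WORKERS];
} toy_pending_delete_t;

/* Step 1 (after the atomic table update): record all lap counters. */
void
toy_record_laps (toy_pending_delete_t *pd, const toy_worker_t *workers)
{
  int i;

  for (i = 0; i < TOY_N_WORKERS; i++)
    pd->laps_at_unlink[i] = workers[i].main_loop_count;
}

/* Step 2: safe only once EVERY worker has completed at least
   min_laps laps since the snapshot. */
int
toy_safe_to_free (const toy_pending_delete_t *pd,
		  const toy_worker_t *workers, uint64_t min_laps)
{
  int i;

  assert (min_laps >= 1);
  for (i = 0; i < TOY_N_WORKERS; i++)
    if (workers[i].main_loop_count - pd->laps_at_unlink[i] < min_laps)
      return 0;
  return 1;
}
```

Only after toy_safe_to_free() reports success would the caller delete (or pool_put) the underlying data structure. The min_laps parameter is there for callers who want extra margin, for example when packets can be handed off between workers.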

Dave


From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
Lists.Fd.Io
Sent: Wednesday, March 25, 2020 12:10 PM
To: Christian Hopps 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] worker barrier state



> On 25 Mar 2020, at 16:01, Christian Hopps <cho...@chopps.org> wrote:
>
> Is it supposed to be the case that no packets are inflight (*) in the graph 
> when the worker barrier is held?
>
> I think perhaps MP unsafe API code is assuming this.
>
> I also think that the frame queues used by handoff code violate this 
> assumption.
>
> Can someone with deep VPP knowledge clarify this for me? :)


correct, there is small chance that frame is enqueued right before worker hits 
barrier…

—
Damjan

View/Reply Online (#15870): https://lists.fd.io/g/vpp-dev/message/15870


Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

2020-03-27 Thread Dave Barach via Lists.Fd.Io
Running for 2.5 seconds in a single graph node is certainly a problem.

See if you can use the vppinfra “file” infrastructure for non-blocking file 
I/O. Search for clib_file_add(...). You can use a VLIB_NODE_TYPE_PROCESS node, 
suspend the process node after a realistic amount of runtime [measured in 
microseconds!], and cause the process to resume by sending it an event from a 
clib file “read_ready” callback.
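
The underlying idea, doing a bounded slice of work per invocation and then yielding, can be sketched without any VPP headers. The code below does not use clib_file_add(...) or the process-node API; toy_job_t and toy_job_step() are hypothetical names, and the "budget" is counted in items rather than microseconds to keep the sketch deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job state that survives across invocations. */
typedef struct
{
  size_t next_item;		/* first item not yet processed */
  size_t n_items;		/* total amount of work */
  unsigned long checksum;	/* stand-in for the real result */
} toy_job_t;

typedef enum
{
  TOY_JOB_SUSPENDED,		/* budget used up; resume later */
  TOY_JOB_DONE
} toy_job_status_t;

/* Process at most 'budget' items, then hand control back to the
   caller instead of running the whole job in one go. */
toy_job_status_t
toy_job_step (toy_job_t *job, size_t budget)
{
  size_t end;

  assert (budget > 0);
  end = job->next_item + budget;
  if (end > job->n_items)
    end = job->n_items;

  for (; job->next_item < end; job->next_item++)
    job->checksum += job->next_item;

  return job->next_item == job->n_items ? TOY_JOB_DONE : TOY_JOB_SUSPENDED;
}
```

In a process node, the point where this sketch returns TOY_JOB_SUSPENDED is where the process would suspend and later be resumed by an event.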

To see how long a given graph node invocation runs, you can use the vpp event 
logger. It will deal with nodes which run for a year, but only if you’re 
willing to wait that long. ...

“elog trace dispatch”
“event-logger save ”

Refer to this page to see how to view the resulting event log file: 
https://fd.io/docs/vpp/master/gettingstarted/developers/eventviewer.html

HTH... Dave

From: Nagaraju Vemuri 
Sent: Friday, March 27, 2020 4:41 PM
To: Dave Barach (dbarach) 
Subject: Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

We are registering a node where we do some fairly heavy file operations, and it is 
taking 2.5sec or so.
We are going to fix it.

This infra is supposed to catch any such wrongly coded node.

If max_clock logic is wrong, we will never catch (incorrectly coded) nodes that 
take more than 2 seconds.
As per current code, we are always storing max_clock from 32 bit into 64 bit, 
so effectively it is always within 32 bit value.

My suggestion would solve the infra issue and catch such culprits?

Thanks,
Nagaraju

On Fri, Mar 27, 2020 at 1:32 PM Dave Barach (dbarach) <dbar...@cisco.com> wrote:
The assumption in step 1 won’t hold true in real life. I’m not claiming that 
it’s “impossible,” just that it will not happen with anything like 
properly-coded node dispatch functions.

A full dispatch circuit across multiple graph nodes - with full frames - 
doesn’t exceed 1ms or so. None of the individual nodes run for anywhere near 
that long.

HTH... Dave

From: Nagaraju Vemuri <nagarajuiit...@gmail.com>
Sent: Friday, March 27, 2020 3:09 PM
To: Dave Barach (dbarach) <dbar...@cisco.com>
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

Hi Dave,

I saw that code. Please excuse me if I'm wrong with below issue:
Step #1: vlib_node_runtime_update_stats() receives n_clocks (uword), assume 
this n_clocks has value bigger than 32 bit can hold.
Step #2: We are setting node->max_clock (note that here max_clock already 
overflowed as n_clocks is big value)
Step #3: Now we call vlib_node_runtime_sync_stats() and pass node as argument 
along with n_clocks.
Step #4: In vlib_node_runtime_sync_stats(), we set n->stats_total.max_clock = 
r->max_clock; from node(32 bit field) which already overflowed.

I think we should have the logic like below inside 
vlib_node_runtime_sync_stats().
   n->stats_total.max_clock = n_clocks > n->stats_total.max_clock ?
       n_clocks : n->stats_total.max_clock;   /* all variables are 64 bit here */
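
The concern in steps 1 to 4 can be illustrated with plain integer arithmetic. This sketch only demonstrates what happens when a 64-bit clock count is stored in a 32-bit field, plus the 64-bit comparison suggested above; the function names are hypothetical, and it makes no claim about what the VPP code actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Storing a 64-bit clock delta in a 32-bit field keeps only the low
   32 bits (hypothetical helper, for illustration only). */
uint32_t
toy_store_max_clock_u32 (uint64_t n_clocks)
{
  return (uint32_t) n_clocks;
}

/* The suggested alternative: compare and keep the maximum entirely
   in 64 bits (hypothetical helper). */
uint64_t
toy_update_max_clock_u64 (uint64_t current_max, uint64_t n_clocks)
{
  return n_clocks > current_max ? n_clocks : current_max;
}
```

With n_clocks = 7490000000 (about 7.49e9), the 32-bit store yields 7490000000 - 4294967296 = 3195032704, a value below 4.3e9, while the 64-bit comparison preserves the full value.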

Thanks,
Nagaraju

On Fri, Mar 27, 2020 at 11:11 AM Dave Barach (dbarach) <dbar...@cisco.com> wrote:
vlib_node_runtime_update_stats(...)

From: Nagaraju Vemuri <nagarajuiit...@gmail.com>
Sent: Friday, March 27, 2020 12:54 PM
To: Dave Barach (dbarach) <dbar...@cisco.com>
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

Thanks for your reply Dave.

Can you please point me to the code where we copy into 64 bit counters when 32 
bit overflows?

When we issue the command "show runtime max", do we print the data from 64 bit 
counters?

I made max_clock variable 64 bit type locally to experiment.
And found different output from "show runtime max".
Max clock was 7.49e9 with 64 bit field, whereas it was within 4e9 (because of 
32 bit limitation) with 32 bit field.

Regards,
Nagaraju

On Fri, Mar 27, 2020 at 4:52 AM Dave Barach (dbarach) <dbar...@cisco.com> wrote:
We scrape the 32-bit node-runtime counters back into the 64 bit node counters 
when they overflow.

It should take a good long time for a 32-bit counter to overflow – O(1 seconds’ 
worth of node runtime) – we do this to reduce the cache footprint of the 
node-runtime.

HTH... Dave

From: vpp-dev@lists.fd.io On Behalf Of 
nagarajuiit...@gmail.com
Sent: Friday, March 27, 2020 1:59 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

Hi,

I see that clib_cpu_time_now() returns u64 type data.
And we are updating max clocks spent by a node in vlib_node_runtime_t using 
these timestamps.

Why are we using u32 types in vlib_node_runtime_t?
Because of this, it can only hold data within 2 seconds and anything above 32 
bit range overflows in our system.

Really appreciate your help.

Thanks,
Nagaraju


--
Thanks,
Nagaraju Vemuri


--
Thanks,
Nagaraju Vemuri


--

Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

2020-03-27 Thread Dave Barach via Lists.Fd.Io
vlib_node_runtime_update_stats(...)

From: Nagaraju Vemuri 
Sent: Friday, March 27, 2020 12:54 PM
To: Dave Barach (dbarach) 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

Thanks for your reply Dave.

Can you please point me to the code where we copy into 64 bit counters when 32 
bit overflows?

When we issue the command "show runtime max", do we print the data from 64 bit 
counters?

I made max_clock variable 64 bit type locally to experiment.
And found different output from "show runtime max".
Max clock was 7.49e9 with 64 bit field, whereas it was within 4e9 (because of 
32 bit limitation) with 32 bit field.

Regards,
Nagaraju

On Fri, Mar 27, 2020 at 4:52 AM Dave Barach (dbarach) <dbar...@cisco.com> wrote:
We scrape the 32-bit node-runtime counters back into the 64 bit node counters 
when they overflow.

It should take a good long time for a 32-bit counter to overflow – O(1 seconds’ 
worth of node runtime) – we do this to reduce the cache footprint of the 
node-runtime.

HTH... Dave

From: vpp-dev@lists.fd.io On Behalf Of 
nagarajuiit...@gmail.com
Sent: Friday, March 27, 2020 1:59 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] clocks stored in vlib_node_runtime_t #vpp #counters

Hi,

I see that clib_cpu_time_now() returns u64 type data.
And we are updating max clocks spent by a node in vlib_node_runtime_t using 
these timestamps.

Why are we using u32 types in vlib_node_runtime_t?
Because of this, it can only hold data within 2 seconds and anything above 32 
bit range overflows in our system.

Really appreciate your help.

Thanks,
Nagaraju


--
Thanks,
Nagaraju Vemuri
View/Reply Online (#15906): https://lists.fd.io/g/vpp-dev/message/15906


Re: [vpp-dev] n_vectors...

2020-03-30 Thread Dave Barach via Lists.Fd.Io
Hmmm, yeah. Been at this for years, I can’t really remember when we settled on 
e.g. n_vectors vs. n_vector_elts or some such.

In new code, it’s perfectly fair to use whatever names seem fit for purpose.

Vlib would be happy doing image processing, or any other kind of vector 
processing. There’s no law which says that frames need to have 32-bit elements. 
Each node decides.
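
To illustrate the point that the count refers to elements of whatever type a node chooses, here is a toy frame with two nodes that interpret its payload differently. toy_frame_t and both node functions are hypothetical and much simpler than the real vlib frame layout.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FRAME_BYTES 1024

/* Toy frame: a count plus untyped storage (hypothetical; not the
   real vlib_frame_t). n_vectors counts ELEMENTS in this one frame,
   whatever their type. */
typedef struct
{
  uint16_t n_vectors;
  union
  {
    uint32_t as_u32[TOY_FRAME_BYTES / sizeof (uint32_t)];
    uint64_t as_u64[TOY_FRAME_BYTES / sizeof (uint64_t)];
  } data;
} toy_frame_t;

/* A node that treats the elements as 32-bit values, e.g. buffer
   indices. */
uint64_t
toy_node_sum_u32 (const toy_frame_t *f)
{
  uint64_t sum = 0;
  uint16_t i;

  assert (f->n_vectors <= TOY_FRAME_BYTES / sizeof (uint32_t));
  for (i = 0; i < f->n_vectors; i++)
    sum += f->data.as_u32[i];
  return sum;
}

/* A node that chose 64-bit elements instead. */
uint64_t
toy_node_sum_u64 (const toy_frame_t *f)
{
  uint64_t sum = 0;
  uint16_t i;

  assert (f->n_vectors <= TOY_FRAME_BYTES / sizeof (uint64_t));
  for (i = 0; i < f->n_vectors; i++)
    sum += f->data.as_u64[i];
  return sum;
}
```

In both cases one call handles one frame, and n_vectors says how many elements that frame carries.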

FWIW... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Christian Hopps
Sent: Monday, March 30, 2020 8:07 PM
To: vpp-dev 
Cc: Christian Hopps 
Subject: [vpp-dev] n_vectors...

Something has always bothered me about my understanding of VPPs use of the term 
"vector" and "vectors". When I think of Vector Packet Processing I think of 
processing a vector (array) of packets in a single call to a node. The code, 
though, then seems to refer to the individual packets as "vectors" when it uses 
field names like "n_vectors" to refer to the number of buffers in a frame, or 
when "show runtime" talks about "vectors per call", when I think it's really 
talking about "packets/buffers per call" (and my mind wants to think that it's 
always *1* vector/frame of packets per call by design).

I find this confusing, and so I thought I'd ask if there was some meaning here 
I'm missing?

Thanks,
Chris.

View/Reply Online (#15951): https://lists.fd.io/g/vpp-dev/message/15951


Re: [vpp-dev] n_vectors...

2020-03-31 Thread Dave Barach via Lists.Fd.Io
We should not rename variables in existing code unless we're rewriting from 
scratch. It's already hard enough to cherry-pick bugfixes from master/latest to 
stable/2001 and stable/1908.  

Making wholesale variable name changes turns every subsequent cherry-pick into 
an adventure. In the interests of not breaking release-train software, we 
really can't go there.

I wish that the available tooling would allow us the freedom you seek, but it 
does not.

Dave

P.S. mapping "n_vectors" to whatever it means to you seems like a pretty 
minimal entry barrier. It's not like the code is inconsistent.

-----Original Message-----
From: Elias Rudberg  
Sent: Tuesday, March 31, 2020 9:21 AM
To: Dave Barach (dbarach) ; cho...@chopps.org; 
vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] n_vectors...

Hi Chris and Dave,

Thanks for bringing this up, and thanks for explaining!

I agree with Chris that this is confusing; it makes it much more difficult to 
understand the code.

Perhaps this is the kind of thing that doesn't matter much to those who are 
already familiar with the code, while at the same time it matters a lot for 
newcomers. If you want to lower the threshold for new people to be able to come 
in and understand the code and possibly contribute, then I think it would be a 
good idea to fix this even if it means changing many lines of code. It could be 
argued that the fact that "n_vectors" exists in so many places makes it even 
more important to have a reasonable name for it. One way could be to start with 
renaming things in some of the main data structures like those in vlib/node.h 
and vlib/threads.h and such places, and then make the changes the compiler will 
force as a result of that.

Best regards,
Elias


On Tue, 2020-03-31 at 00:45 +0000, Dave Barach via Lists.Fd.Io wrote:
> Hmmm, yeah. Been at this for years, I can’t really remember when we 
> settled on e.g. n_vectors vs. n_vector_elts or some such.
>  
> In new code, it’s perfectly fair to use whatever names seem fit for 
> purpose.
>  
> Vlib would be happy doing image processing, or any other kind of 
> vector processing. There’s no law which says that frames need to have 
> 32-bit elements. Each node decides.
>  
> FWIW... Dave
>  
> From: vpp-dev@lists.fd.io  On Behalf Of Christian 
> Hopps
> Sent: Monday, March 30, 2020 8:07 PM
> To: vpp-dev 
> Cc: Christian Hopps 
> Subject: [vpp-dev] n_vectors...
>  
> Something has always bothered me about my understanding of VPP's use of 
> the term "vector" and "vectors". When I think of Vector Packet 
> Processing I think of processing a vector (array) of packets in a 
> single call to a node. The code, though, then seems to refer to the 
> individual packets as "vectors" when it uses field names like 
> "n_vectors" to refer to the number of buffers in a frame, or when 
> "show runtime" talks about "vectors per call", when I think it's 
> really talking about "packets/buffers per call" (and my mind wants to 
> think that it's always *1* vector/frame of packets per call by 
> design).
> 
> I find this confusing, and so I thought I'd ask if there was some 
> meaning here I'm missing?
> 
> Thanks,
> Chris.

View/Reply Online (#15957): https://lists.fd.io/g/vpp-dev/message/15957


Re: [vpp-dev] VPP Performance data

2020-03-30 Thread Dave Barach via Lists.Fd.Io
If you care about master/latest, also check out 
https://docs.fd.io/csit/master/trending/

From: vpp-dev@lists.fd.io  On Behalf Of Jerome Tollet via 
Lists.Fd.Io
Sent: Monday, March 30, 2020 2:40 PM
To: Majumdar, Kausik ; vpp-dev@lists.fd.io
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP Performance data

Hello,
You should have a look at https://docs.fd.io/csit/rls2001/report/
Jerome

From: <vpp-dev@lists.fd.io> on behalf of "Majumdar, Kausik" 
<kausik.majum...@commscope.com>
Date: Monday, March 30, 2020 at 19:47
To: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>
Subject: [vpp-dev] VPP Performance data


Hi,

Can someone please share VPP performance measurement data for different 
packet sizes for plain vanilla IPv4, IPv6, and tunnel encap/decap for the IPSec 
and VxLAN cases? Also, do we have VPP packet forwarding data with at least one 
Service VNF in the same host or in a remote host, where traffic is service 
chained to run the services?

I am considering the below topologies to get some performance data -


1.   Tgen --> IPv4/IPv6 --> Host1 (VPP1)  
Host2 (VPP2) --> IPv4/IPv6 --> Tgen



2.   Tgen --> IPv4/IPv6 --> Host1 (VPP1)  
Host2 (VPP2) --> Host2 VNF (Service VM) --> Host2 (VPP2) --> IPv4/IPv6 --> Tgen



3.   Tgen --> IPv4/IPv6 --> Host1 (VPP1)  
Host2 (VPP2) --> Host3 VNF (Service VM) --> Host2 (VPP2) --> IPv4/IPv6 --> Tgen


I am assuming some analysis has already been performed for the above scenarios 
with respect to the number of CPUs, cores, SR-IOV for VNF forwarding, or OVS in 
the kernel for bridging to the VNF. Any pointers to this data for the VPP 20.01 
release would be great.

Thanks,
Kausik

View/Reply Online (#15943): https://lists.fd.io/g/vpp-dev/message/15943


Re: [vpp-dev] perfmon plugin #vpp

2020-03-30 Thread Dave Barach via Lists.Fd.Io
It’s an on-demand performance monitor controlled by a set of debug CLI 
commands. It doesn’t do anything unless enabled, and it certainly won’t hurt 
anything to disable it.

Disable the plugin in /etc/vpp/startup.conf if you like.

Dave

From: vpp-dev@lists.fd.io  On Behalf Of Nagaraju Vemuri
Sent: Monday, March 30, 2020 3:40 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] perfmon plugin #vpp

Hi,

I want to check if perfmon is needed in vpp.
Who uses it? or How to use perfmon plugin exactly?

Is it safe to remove perfmon plugin from vpp, if we decide to not use it?

Thanks,
Nagaraju

View/Reply Online (#15945): https://lists.fd.io/g/vpp-dev/message/15945


Re: [vpp-dev] perfmon plugin #vpp

2020-03-30 Thread Dave Barach via Lists.Fd.Io
Cheat sheet:

“show pmc events”

Start traffic

“set pmc events ”
[example: “set pmc events instructions-per-clock”]



Show results:
“show pmc [verbose]”

Clear counters:
“clear pmc”

HTH... Dave

From: Nagaraju Vemuri 
Sent: Monday, March 30, 2020 4:12 PM
To: Dave Barach (dbarach) 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] perfmon plugin #vpp

Thanks Dave.

Can you please point me to any document describing the CLI commands associated 
with perfmon?


On Mon, Mar 30, 2020 at 12:51 PM Dave Barach (dbarach) <dbar...@cisco.com> 
wrote:
It’s an on-demand performance monitor controlled by a set of debug CLI 
commands. It doesn’t do anything unless enabled, and it certainly won’t hurt 
anything to disable it.

Disable the plugin in /etc/vpp/startup.conf if you like.

Dave

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Nagaraju Vemuri
Sent: Monday, March 30, 2020 3:40 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] perfmon plugin #vpp

Hi,

I want to check if perfmon is needed in vpp.
Who uses it? or How to use perfmon plugin exactly?

Is it safe to remove perfmon plugin from vpp, if we decide to not use it?

Thanks,
Nagaraju


--
Thanks,
Nagaraju Vemuri

View/Reply Online (#15948): https://lists.fd.io/g/vpp-dev/message/15948


Re: [vpp-dev] clib_waring is not printing

2020-03-26 Thread Dave Barach via Lists.Fd.Io
“show log” – plugin loader spew redirected to the log.

From: vpp-dev@lists.fd.io  On Behalf Of 
mythosmonkeyk...@163.com
Sent: Thursday, March 26, 2020 6:53 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] clib_waring is not printing



Why is there no log output when the plugin is loaded?

View/Reply Online (#15880): https://lists.fd.io/g/vpp-dev/message/15880


Re: [vpp-dev] VPP Crashes When Executing Large Script Using 'exec'

2020-03-25 Thread Dave Barach via Lists.Fd.Io
How about: send a backtrace (preferably from a debug image), and put the script 
somewhere so that we can work the problem?

From: vpp-dev@lists.fd.io  On Behalf Of Luc Pelletier
Sent: Wednesday, March 25, 2020 7:53 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP Crashes When Executing Large Script Using 'exec'

Hi all,

I have a large script (2000 lines, 146,459 bytes) that I'm trying to execute 
using 'exec' in vppctl. When I copy+paste commands from the script, it works 
fine. However, if I try to execute the script with 'exec 
/path/to/myscript.txt', VPP crashes.

I'm running VPP v20.01 on Ubuntu 18.04 on Azure.

Any suggestions?

Thanks

View/Reply Online (#15860): https://lists.fd.io/g/vpp-dev/message/15860


Re: [vpp-dev] wireshark log capture with "any " option #vpp

2020-04-28 Thread Dave Barach via lists.fd.io
+1, with two additional notes. 

18.01 is over two years old. It's not supported anymore. If you absolutely must 
go there, "Use the Force and Read the Source..."

Since vpp 18.01 was released, I've rewritten the pcap trace cli. Ben is 
probably right, but that's about all I can say at this point.

FWIW... Dave 

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Benoit Ganne 
(bganne) via lists.fd.io
Sent: Tuesday, April 28, 2020 12:09 PM
To: Deepak NC ; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] wireshark log capture with "any " option #vpp

Did you try to "pcap rx off"? I think the capture is dumped only when turned 
off.

ben

> -Original Message-
> From: vpp-dev@lists.fd.io  On Behalf Of Deepak NC
> Sent: Tuesday, April 28, 2020 17:47
> To: vpp-dev@lists.fd.io
> Subject: [vpp-dev] wireshark log capture with "any " option #vpp
> 
> Hi,
> 
> 
> 
> For packet capture, the “pcap rx trace on max  any” option is not working.
> 
> https://docs.fd.io/vpp/17.04/clicmd_src_vnet_devices_dpdk.html
> 
> 
> pcap rx trace [on|off] [max ] [intfc |any] [file ]
> 
> 
> 
> VPP version used is: vVPP_18.01
> 
> Interface:
> 
> Name                              Idx  State MTU (L3/IP4/IP6/MPLS)  Counter            Count
> GigabitEthernet0/4/0              1    up    9000/0/0/0             rx packets        541797
>                                                                     rx bytes        87675422
>                                                                     tx packets       7610998
>                                                                     tx bytes     10239458226
>                                                                     drops              14828
>                                                                     ip4               524859
>                                                                     tx-error            4088
> GigabitEthernet0/5/0              2    up    9000/0/0/0             rx packets      29725455
>                                                                     rx bytes     24793775133
>                                                                     tx packets      15574677
>                                                                     tx bytes      4802873550
>                                                                     drops              38505
>                                                                     ip4             29711897
>                                                                     tx-error           37324
> local0                            0    down  0/0/0/0                drops                152
> 
> 
> 
> Please let me know why I am seeing this issue?
> 
> 
> 
> Thanks and regards
> 
> Deepak
> 
> 
> 
> 
>

View/Reply Online (#16179): https://lists.fd.io/g/vpp-dev/message/16179


Re: [vpp-dev] Regarding lookup in vppinfra/hash.c

2020-04-23 Thread Dave Barach via lists.fd.io
Ack. If all else fails, suggest a quick peek at the assembly code.

From: vpp-dev@lists.fd.io  On Behalf Of Peng Xia
Sent: Thursday, April 23, 2020 12:15 AM
To: Peng Xia 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Regarding lookup in vppinfra/hash.c

Sorry, I was confused with the later lines. Seems they have the same effect.

On Tue, Apr 21, 2020 at 9:50 PM, Peng Xia via lists.fd.io wrote:
Hi experts, in vppinfra/hash.c, function lookup(),
line number 568:
clib_memcpy_fast (old_value, p->direct.value,
should it be clib_memcpy_fast (old_value, &p->direct.value,
?
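
A small sketch of why the two spellings can have the same effect, assuming the question compares `p->direct.value` with `&p->direct.value` and that `value` is an array member. The struct below is a hypothetical stand-in, not the vppinfra definition. For an array member, `p->value` decays to a pointer to its first element while `&p->value` is a pointer to the whole array; the types differ, but the address is the same, so a memcpy from either copies the same bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key/value pair whose value is an array member. */
typedef struct {
    uintptr_t key;
    uintptr_t value[2];
} sketch_pair_t;

/* Returns 1 when both spellings name the same address. */
static inline int
sketch_same_address(const sketch_pair_t *p)
{
    const void *decayed = p->value;    /* pointer to first element */
    const void *whole = &p->value;     /* pointer to the whole array */
    return decayed == whole;
}

/* Copies the value using either spelling, selected by use_address_of. */
static inline void
sketch_copy_value(uintptr_t out[2], const sketch_pair_t *p, int use_address_of)
{
    if (use_address_of)
        memcpy(out, &p->value, sizeof p->value);
    else
        memcpy(out, p->value, sizeof p->value);
}
```

If `value` were a plain pointer member instead of an array, the two spellings would not be interchangeable, which is why checking the declaration (or the generated assembly) is worthwhile.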

View/Reply Online (#16148): https://lists.fd.io/g/vpp-dev/message/16148


Re: [vpp-dev] VPP + iOAM + DPDK + eBPF and packet samples #vpp

2020-04-22 Thread Dave Barach via lists.fd.io
Adding a feature arc node e.g. to the device input feature arc would allow you 
to classify and (optionally) generate ipfix records [or whatever].

Depending on the size of the records involved, the ratio of classifier hits to 
classifier misses, the required PPS to avoid dropping traffic, number of 
threads, etc. you might be able to upload telemetry for nearly every (matching) 
packet.

Because of the number and complexity of the internal APIs involved, it will 
take time to fabricate and test all of the moving parts.

FWIW... Dave

P.S. Integration with eBPF defeats the purpose of user-mode networking, and 
would likely reduce system throughput to the point where you would wish you 
hadn’t thought of it. Go ahead if you like but you’re on your own.

From: vpp-dev@lists.fd.io  On Behalf Of mauricio.solisjr 
via lists.fd.io
Sent: Wednesday, April 22, 2020 7:47 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP + iOAM + DPDK + eBPF and packet samples #vpp

Hi,
I'm wondering if FDio has the following capabilities, I've also added my 
current understanding of each:

I'm working on obtaining telemetry information for a certain type of traffic, 
but due to computation power, we'd like to add telemetry to every 1000th 
packet.  I understand that we can add telemetry for traffic based on traffic 
class, flow label, source/destination addr, or protocol.  I believe you can 
also delete classifications at runtime, so I think one way this can be done is 
if we add a classifying table + session for a given traffic flow and remove it 
every so often.

Another idea: is it possible to move classification to an eBPF program before 
packets get to VPP? I don't think this is possible since vpp bypasses the 
kernel, but maybe we can add our own hooks to vpp?

Does anyone have an example/tutorial for iOAM with IPv4 packets? I know IPv4 
needs to be encapsulated, but I have not found any examples for iOAM + IPv4 + 
GRE

Thanks,

View/Reply Online (#16143): https://lists.fd.io/g/vpp-dev/message/16143


Re: [vpp-dev] accessing pool entries in gdb

2020-05-04 Thread Dave Barach via lists.fd.io
In [live, not core-file] gdb, try this:

(gdb) p pifi(pool, )

Which will tell you if  is free (invalid) or not (valid).

Also:

(gdb) p pool_elts(pool)

To see how many elements are in the pool.

Finally:

(gdb) p vl(pool)

To see what vec_len(pool) is.

HTH... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Satya Murthy
Sent: Monday, May 4, 2020 3:13 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] accessing pool entries in gdb

Having some issue while accessing entries of a pool in GDB.

I have a pool of some structures.

custom_struct *pool;
This custom_struct has alignment with 64 byte cache_line.

now, I have added 3 entries in this pool. The code seems to be working fine in 
adding/deleting/traversing this pool using pool_elt_at_index.

However, if I use gdb to view the elements using pool[0], pool[1] and pool[2], 
they show some invalid entries.
The first entry, pool[0], seems to be fine, but the entries from the next one 
onwards look invalid in GDB.

Is there anything I am doing wrong in gdb while traversing the pool?

--
Thanks & Regards,
Murthy

View/Reply Online (#16233): https://lists.fd.io/g/vpp-dev/message/16233


Re: [vpp-dev] wireshark log capture with "any " option #vpp

2020-04-29 Thread Dave Barach via lists.fd.io
For now, we support 19.08 (LTS), and 20.01.

20.05 will be released next month.

We do not support 19.01.

Thanks... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Deepak NC
Sent: Wednesday, April 29, 2020 12:44 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] wireshark log capture with "any " option #vpp

Hi Dave,

We will upgrade to vpp version 19.01 and try the same commands, 
as we see this is a stable version used by many.

Thanks and Regards
Deepak

View/Reply Online (#16202): https://lists.fd.io/g/vpp-dev/message/16202


Re: [vpp-dev] min_log2 abuse

2020-05-12 Thread Dave Barach via lists.fd.io
Thanks for the patch. Anyway your code reads much more easily than the 
original. [Jenkins is screwed up at the moment, Dave Wallace is working on it.]

We could do something like this to track down min_log2(0) calls:

#if defined (count_leading_zeros)
always_inline uword
min_log2 (uword x)
{
  uword n;

#if CLIB_DEBUG > 0
  if (x == 0) abort();
#endif
  n = count_leading_zeros (x);
  return BITS (uword) - n - 1;
}
#else
... obvious variation ...
#undef _
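
A standalone variant of the same idea can be tried outside the VPP tree. This is only a sketch: `sketch_min_log2` and `sketch_mask_to_len` are hypothetical names, `SKETCH_DEBUG` stands in for `CLIB_DEBUG`, and it assumes a GCC/Clang-style `__builtin_clzl` and `__builtin_ctz`. The netmask helper shows one way a caller might handle the all-zero mask explicitly instead of relying on a log2 of 0; it is not the code from the patch under discussion.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#ifndef SKETCH_DEBUG
#define SKETCH_DEBUG 1
#endif

/* floor(log2(x)) for x > 0. There is no meaningful result for x == 0,
 * so debug builds stop right at the offending call. */
static inline unsigned long
sketch_min_log2(unsigned long x)
{
#if SKETCH_DEBUG > 0
    if (x == 0)
        abort();
#endif
    return sizeof(unsigned long) * CHAR_BIT - 1
           - (unsigned long) __builtin_clzl(x);
}

/* Prefix length of a contiguous IPv4 netmask in host byte order.
 * The zero mask is handled before any bit-scan builtin is called. */
static inline unsigned
sketch_mask_to_len(uint32_t mask)
{
    if (mask == 0)
        return 0;
    return 32u - (unsigned) __builtin_ctz(mask);
}
```

With the explicit zero check in the caller, the log2-style helper never sees 0 and the debug guard only fires for genuinely unexpected inputs.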

From: vpp-dev@lists.fd.io  On Behalf Of Andreas Schultz
Sent: Tuesday, May 12, 2020 8:51 AM
To: Dave Barach (dbarach) 
Cc: vpp-dev 
Subject: Re: [vpp-dev] min_log2 abuse



On Tue, May 12, 2020 at 13:17, Dave Barach (dbarach) <dbar...@cisco.com> 
wrote:
Dear Andreas,

Do you have a handy list of places which convert netmasks to lengths? 
Regardless of what one might do with min_log2, we ought to clean up those 
places in time for the 20.05 release (if possible).

Unfortunately not.

This is the code that caused problems for us: 
https://gerrit.fd.io/r/c/vpp/+/27016

Grepping through the code shows additional places that appear to have the 
potential to pass a 0 to one of the log2 functions. But we haven't investigated 
any of them closer.

I was trying to add an ASSERT (x != 0) to the log2 functions. But those 
functions are defined in clib.h, and that header is required by 
error_bootstrap.h that defines ASSERTs. The resulting include dependency loop 
is only solvable by moving the log2 functions out of clib.h

Andreas

Dave

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Andreas Schultz
Sent: Tuesday, May 12, 2020 5:42 AM
To: vpp-dev <vpp-dev@lists.fd.io>
Subject: [vpp-dev] min_log2 abuse

Hi,

There are a few places in VPP (most notably when trying to convert a netmask 
into a length) that can pass 0 (zero) into min_log2 and expect to get a 
meaningful result.

Obviously log2(0) is undefined. It turns out that this also applies to the 
return of min_log2. Under the hood the function uses __builtin_clzl, and the 
return of that function is also undefined for input 0.

__builtin_clzl could be replaced with __builtin_ia32_lzcnt_u64 on supported 
CPUs to avoid the undefined behaviour. That would still not fix the problem 
that passing 0 into a log function is broken by design.

Any comments?
Andreas

--

Andreas Schultz


--

Andreas Schultz

--

Principal Engineer

t: +49 391 819099-224

--- enabling your networks 
-

Travelping GmbH
Roentgenstraße 13
39108 Magdeburg
Germany

t: +49 391 819099-0
f: +49 391 819099-299

e: i...@travelping.com
w: https://www.travelping.com/
Company registration: Amtsgericht Stendal
Geschaeftsfuehrer: Holger Winkelmann
Reg. No.: HRB 10578
VAT ID: DE236673780

View/Reply Online (#16341): https://lists.fd.io/g/vpp-dev/message/16341


Re: [vpp-dev] min_log2 abuse

2020-05-12 Thread Dave Barach via lists.fd.io
Dear Andreas,

Do you have a handy list of places which convert netmasks to lengths? 
Regardless of what one might do with min_log2, we ought to clean up those 
places in time for the 20.05 release (if possible).

Dave

From: vpp-dev@lists.fd.io  On Behalf Of Andreas Schultz
Sent: Tuesday, May 12, 2020 5:42 AM
To: vpp-dev 
Subject: [vpp-dev] min_log2 abuse

Hi,

There are a few places in VPP (most notably when trying to convert a netmask 
into a length) that can pass 0 (zero) into min_log2 and expect to get a 
meaningful result.

Obviously log2(0) is undefined. It turns out that this also applies to the 
return of min_log2. Under the hood the function uses __builtin_clzl, and the 
return of that function is also undefined for input 0.

__builtin_clzl could be replaced with __builtin_ia32_lzcnt_u64 on supported 
CPUs to avoid the undefined behaviour. That would still not fix the problem 
that passing 0 into a log function is broken by design.

Any comments?
Andreas

--

Andreas Schultz

View/Reply Online (#16326): https://lists.fd.io/g/vpp-dev/message/16326


Re: [vpp-dev] Handshake procedure

2020-05-05 Thread Dave Barach via lists.fd.io
As an aside, please be very careful to respect the source code licenses 
involved.

Fd.io vpp code carries an Apache-2 license. GPL-licensed code must not be used.

FWIW... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
lists.fd.io
Sent: Tuesday, May 5, 2020 8:57 AM
To: Artem Glazychev 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Handshake procedure

>
> On 5 May 2020, at 13:45, Artem Glazychev <artem.glazyc...@xored.com> wrote:
>
> Hello.
>
> I want to make a plugin for a WireGuard tunnel.
> WireGuard has a handshake procedure. It occurs every few minutes.
> I have a question: is it possible to implement this procedure in a vpp plugin? 
> Can I see similar examples/ideas somewhere?

Yes, look at ikev2 plugin.


View/Reply Online (#16241): https://lists.fd.io/g/vpp-dev/message/16241


Re: [vpp-dev] per node error codes limit

2020-05-11 Thread Dave Barach via lists.fd.io
You’re running old software. In future please include the vpp version so we 
know what we’re dealing with.

I fixed this issue last July, it does not exist in either 19.08 (LTS) or 20.01.

IIRC you may be able to cherrypick / backport the fix to an earlier release.

HTH... Dave

Author: Dave Barach   2019-07-23 10:22:31
Committer: Florin Coras   2019-07-23 13:02:04
Parent: 3b7261978ee4ffdc1e92336e708ae05e2be25f71 (udp: fix connection flags)
Child:  60183db3a8b25714539882cca05ba3b9e9e54489 (session: reorganize dispatch 
logic)
Branches: master, remotes/origin/hgw, remotes/origin/master, 
remotes/origin/pump, remotes/origin/stable/1908, remotes/origin/stable/2001
Follows: v19.08-rc0
Precedes: v19.08-rc1, v20.01-rc0

vlib: address vlib_error_t scaling issue

Encoding the vpp node index into the vlib_error_t as a 10-bit quantity
limits us to 1K graph nodes. Unfortunately, a few nodes need 6 bit
per-node error codes. Only a very few nodes have so many counters.

It turns out that there are about 2K total error counters in the system,
which is (approximately) the maximum error heap index.

The current (index,code) encoding limits the number of interfaces to
around 250, since each interface has two associated graph nodes and we
have about 500 "normal, interior" graph node

This patch adds an error-index to node-index map, so we can store
error heap indices directly in the vlib_buffer_t.

Type: refactor

Change-Id: I28101cad3d8750819e27b8785fc0cf71ff54f79a
Signed-off-by: Dave Barach 
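
To make the arithmetic in the commit message concrete, here is a minimal sketch of a 16-bit (index, code) packing with a 10-bit node index and a 6-bit per-node code. The names are hypothetical and the layout is assumed from the description above rather than copied from the vlib sources. It only illustrates why such an encoding caps out at 1024 nodes and 64 codes per node.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CODE_BITS 6u
#define SKETCH_NODE_BITS 10u
#define SKETCH_MAX_CODES (1u << SKETCH_CODE_BITS)   /* 64 codes per node */
#define SKETCH_MAX_NODES (1u << SKETCH_NODE_BITS)   /* 1024 graph nodes */

/* Pack (node index, per-node code) into 16 bits.
 * Returns 0 on success, -1 if either value does not fit. */
static inline int
sketch_error_pack(uint32_t node_index, uint32_t code, uint16_t *out)
{
    if (node_index >= SKETCH_MAX_NODES || code >= SKETCH_MAX_CODES)
        return -1;
    *out = (uint16_t) ((node_index << SKETCH_CODE_BITS) | code);
    return 0;
}

/* Recover the node index from a packed value. */
static inline uint32_t
sketch_error_node(uint16_t e)
{
    return (uint32_t) e >> SKETCH_CODE_BITS;
}

/* Recover the per-node code from a packed value. */
static inline uint32_t
sketch_error_code(uint16_t e)
{
    return (uint32_t) e & (SKETCH_MAX_CODES - 1u);
}
```

Under this assumed layout, a 65th per-node counter simply has no representation, which matches the 64-counter limit asked about; storing an error heap index directly, as the patch describes, removes the per-node split.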

From: vpp-dev@lists.fd.io  On Behalf Of hari_akkin via 
lists.fd.io
Sent: Monday, May 11, 2020 7:22 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] per node error codes limit

Hi,
Is there any way to accommodate more than 64 error counters per node? 
Currently it is limited to 64, and I have a case where the number of counters 
is more than 64.

thanks

View/Reply Online (#16305): https://lists.fd.io/g/vpp-dev/message/16305


Re: [vpp-dev] Assertion failure in nat_get_vlib_main() in snat_init()

2020-05-08 Thread Dave Barach via lists.fd.io
I merged Ole's patch a minute ago. Again, thanks for the report...

Dave

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Elias Rudberg
Sent: Friday, May 8, 2020 5:30 AM
To: otr...@employees.org
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Assertion failure in nat_get_vlib_main() in snat_init()

Hi Ole,

Yes, that fixes it!
With that patch my NAT test works, no more assertion failures.

/ Elias


On Fri, 2020-05-08 at 10:06 +0200, Ole Troan wrote:
> Hi Elias,
> 
> Thanks for finding that one.
> Can you verify that this patch fixes it:
> https://gerrit.fd.io/r/c/vpp/+/26951 nat: fix per thread data
> vlib_main_t usage take 2 [NEW] 
> 
> Best regards,
> Ole
> 
> > On 7 May 2020, at 22:57, Elias Rudberg 
> > wrote:
> > 
> > Hello,
> > 
> > With the current master branch (def78344) we now get an assertion 
> > failure on startup, here:
> > 
> > (gdb) bt
> > #0  __GI_raise (sig=sig@entry=6) at
> > ../sysdeps/unix/sysv/linux/raise.c:51
> > #1  0x7462e801 in __GI_abort () at abort.c:79
> > #2  0x004071f3 in os_panic ()
> >at vpp/src/vpp/vnet/main.c:366
> > #3  0x7550d7d9 in debugger ()
> >at vpp/src/vppinfra/error.c:84
> > #4  0x7550d557 in _clib_error (how_to_die=2, 
> > function_name=0x0, line_number=0,
> >fmt=0x7fffacbc0310 "%s:%d (%s) assertion `%s' fails")
> >at vpp/src/vppinfra/error.c:143
> > #5  0x7fffacac659e in nat_get_vlib_main (thread_index=4)
> >at vpp/src/plugins/nat/nat.c:2557
> > #6  0x7fffacabd7a5 in snat_init (vm=0x7639b980
> > )
> >at vpp/src/plugins/nat/nat.c:2685
> > #7  0x760b9f66 in call_init_exit_functions_internal
> > (vm=0x7639b980 , 
> >headp=0x7639bfa8 , call_once=1,
> > do_sort=1)
> >at vpp/src/vlib/init.c:350
> > #8  0x760b9e88 in vlib_call_init_exit_functions
> > (vm=0x7639b980 , 
> >headp=0x7639bfa8 , call_once=1)
> >at vpp/src/vlib/init.c:364
> > #9  0x760ba011 in vlib_call_all_init_functions
> > (vm=0x7639b980 )
> >at vpp/src/vlib/init.c:386
> > #10 0x760df1f8 in vlib_main (vm=0x7639b980 
> > , input=0x7fffb4b2afa8)
> >at vpp/src/vlib/main.c:2171
> > #11 0x76166405 in thread0 (arg=140737324366208)
> >at vpp/src/vlib/unix/main.c:658
> > #12 0x75531954 in clib_calljmp ()
> >at vpp/src/vppinfra/longjmp.S:123
> > #13 0x7fffcf30 in ?? ()
> > #14 0x76165f97 in vlib_unix_main (argc=57, argv=0x71d520)
> >at vpp/src/vlib/unix/main.c:730
> > #15 0x004068d8 in main (argc=57, argv=0x71d520)
> >at vpp/src/vpp/vnet/main.c:291
> > 
> > The code looks like this (this part was added in a recent commit it
> > seems):
> > 
> > always_inline vlib_main_t *
> > nat_get_vlib_main (u32 thread_index)
> > {
> >   vlib_main_t *vm;
> >   vm = vlib_mains[thread_index];
> >   ASSERT (vm);
> >   return vm;
> > }
> > 
> > So it is looking at vlib_mains[thread_index] but that is NULL, 
> > apparently.
> > 
> > Since this happens at startup, could it be that vlib_mains has not 
> > been initialized yet, it is too early to try to access it?
> > 
> > Is vlib_mains[thread_index] supposed to be initialized by the time 
> > when
> > vlib_call_all_init_functions() runs?
> > 
> > Best regards,
> > Elias
> > 

View/Reply Online (#16284): https://lists.fd.io/g/vpp-dev/message/16284


Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Dave Barach via lists.fd.io
API messages in network byte order made sense 10 years ago when I worked with a 
mixed x86_64 / ppc32 system. As Damjan points out, API interoperability between 
big-endian and little-endian systems is a boutique use-case these days.

Timing is key. We won’t be able to cherry-pick API message handler fixes across 
an endian-order flag-day. If we decide to switch to native byte order, we’d 
better switch right before we pull our next LTS release.

FWIW... Dave
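
To illustrate the kind of handler bug this thread is about: a minimal sketch, assuming a hypothetical message field that carries a 32-bit enum in network byte order. None of these names come from the VPP API. The point is only that the sender must apply `htonl()` and the receiver `ntohl()`; leaving one of them out goes unnoticed on a big-endian host but scrambles the value on a little-endian one.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical enum carried as a 32-bit field on the wire. */
typedef enum {
    SKETCH_MODE_UNKNOWN = 0,
    SKETCH_MODE_POLLING = 1,
    SKETCH_MODE_INTERRUPT = 2
} sketch_rx_mode_t;

/* Sender side: convert to network byte order before writing. */
static inline void
sketch_encode_mode(sketch_rx_mode_t mode, uint8_t wire[4])
{
    uint32_t be = htonl((uint32_t) mode);
    memcpy(wire, &be, sizeof be);
}

/* Receiver side: read the field, then convert back to host order. */
static inline sketch_rx_mode_t
sketch_decode_mode(const uint8_t wire[4])
{
    uint32_t be;
    memcpy(&be, wire, sizeof be);
    return (sketch_rx_mode_t) ntohl(be);
}
```

Switching to native byte order, as proposed in the quoted message, would remove both conversions, which is also why fixes written before and after such a flag day would not cherry-pick cleanly.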

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
lists.fd.io
Sent: Saturday, May 16, 2020 7:23 AM
To: Andrew Yourtchenko 
Cc: Christian Hopps ; Ole Troan ; 
vpp-dev ; Jerome Tollet (jtollet) 
Subject: Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] 
Proposal for VPP binary API stability


Knowing that even hard-core big-endian systems like PowerPC decided to switch 
to little endian, and that the likelihood of the binary API being exchanged 
between BE and LE systems is extremely small, maybe we should consider removing 
endian conversion from the APIs and simply represent data in the native form. 
Looks like this is fertile ground for new bugs…

Thoughts?

—
Damjan


> On 15 May 2020, at 16:20, Andrew Yourtchenko <ayour...@gmail.com> wrote:
>
> There's a very interesting couple of gerrit changes that just came in
> today that is worth discussing,
> and they are a perfect case study to further clarify the process - so
> I tweaked the subject accordingly..
> The API message itself is relatively minor, but regardless of what is
> agreed that makes a good case study.
>
> Backstory:
>
> Once upon a time on Aug 20 2019, commit 053204ab changed
> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
> conversion
> function didn't make it there (enums are u32 by default as far as I can see).
>
> This was after the 19.08 branch pull, and it wasn't ever tackled, so
> this (buggy) behavior ended up being in 20.01, 20.01.1, and in the
> current 20.05-rc1.
>
> Fast forward a bit, today I was pinged about the two changes - one for
> master, one for stable/2001:
>
> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a u8
>
> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
> htonl() call and changes the existing ("buggy")
> behavior in the 20.01.2 - thus would silently break any API consumers
> that coded against the previous "buggy" behavior.
>
> And then we have a question open about stable/2005, which "by the
> letter" can potentially accept only the second approach, since it is
> past the API freeze.
>
> Additional bit: this API is not touched in make test, so this bug had
> slipped through.
>
> So there are the following options:
>
> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
> making a "silent change" in both, but making the master look closer to
> a 19.08 format.
>
> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
> merge the master patch that shrinks the enum down to 1 byte
>
> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
> endianness of the u32 enum everywhere, but making an effective "silent
> change" in 20.01&20.05
>
> 4)  merge the patch in master that shrinks the enum down to one byte,
> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
> "no api changes" but at least this gets to be explicitly visible early
> on.
>
> 5) under the current proposal, if the API message is defined as
> "production" then there would need to be a new "in-progress" message
> in master with either of the two fixes, the buggy message would need
> to be marked as "deprecated".  And under the current proposal by the
> 20.09 the "in-progress" message would become "production", the current
> message would be shown as "deprecated",  to be deleted in 21.01.
>
> So, the feedback that I would appreciate on the above:
>
> 1) the least worst course of action "right now", for this particular
> issue. I discussed this with Ole and we have different opinions, so I
> would like the actual API users to chime in. Please tell me which is
> the least worst option from your point of view :-)
>
> 2) What is the best course of action in the future? Note, this is also
> the simpler case in that there is a way to trigger a CRC-affecting
> change by forcing the enum to be a u8. What would have been the best
> course of action if it was simply a missing ntohl() for a field that
> no one complained about for 1.5 releases? Can we assume that no one
> used that API and "fix the representation"?
>
> 3) would there be an interest in having a sort of registry of "who
> wants to know about things related to each API" - so that if a bug
> like this arises that requires a behavior change to fix, I could poll
> the affected folks and maybe be able to get away with (1) or (3) from
> above ?
>
> And a tweak to the process (and potentially tooling) with regards to
> going to "production API status":
>
> An API that is not touched during a "make test" can not 

Re: [vpp-dev] Reminder: VPP 20.05 RC1 is *tomorrow* 13th May 18:00 UTC

2020-05-13 Thread Dave Barach via lists.fd.io
That patch looks OK, but it never – not even once – passed validation. I 
rebased it without difficulty, but unless it passes validation it’s not going 
to be merged.

It would have been a Good Thing to have pursued the matter at the time, rather 
than a long time after the fact.

D.

From: vpp-dev@lists.fd.io  On Behalf Of Mrityunjay Kumar
Sent: Wednesday, May 13, 2020 11:11 AM
To: Andrew Yourtchenko 
Cc: vpp-dev 
Subject: Re: [vpp-dev] Reminder: VPP 20.05 RC1 is *tomorrow* 13th May 18:00 UTC

Hi Andrew
I submitted a patch a long time ago. If possible, could you please try to include it 
in VPP 20.05?
https://gerrit.fd.io/r/c/vpp/+/21793

//MJ

Regards,
Mrityunjay Kumar.
Mobile: +91 - 9731528504


On Tue, May 12, 2020 at 4:53 PM Andrew Yourtchenko <ayour...@gmail.com> wrote:
Hi all,

Just a friendly reminder that tomorrow at 18:00 UTC we have the VPP
20.05 RC1 milestone [0], during which a stable/2005 branch will be
created.

Until after I send an email announcing the end of that process, master
branch is still CLOSED for API changes and risky commits.

You may have noticed that verify jobs are still being a bit
temperamental. Some of the changes appear to have made the
situation a bit better, but Dave and I are still working on it to
properly diagnose the root cause of this intermittent issue. If you
have any persistent infrastructure-related difficulties with the
commits that you would like to see merged before the new stable branch
is cut, please let me know ASAP.

thanks!

--a
/* your friendly 20.05 release manager */

[0]: https://wiki.fd.io/view/Projects/vpp/Release_Plans/Release_Plan_20.05
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#16361): https://lists.fd.io/g/vpp-dev/message/16361
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] regarding disable_trace in vppinfra/dlmalloc.c

2020-03-21 Thread Dave Barach via Lists.Fd.Io
Nice catch. TBH it won’t make much difference, but the code is clearly wrong. 
I’ll push a patch.

Dave
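
For reference, a minimal sketch of the bit manipulation in question, assuming a stand-in state struct and an assumed value for USE_TRACE_BIT (neither is copied from vppinfra/dlmalloc.c, and this is not the patch itself). Setting a flag uses OR with the mask; clearing it needs AND with the inverted mask.

```c
#include <stdint.h>

/* Assumed flag value and stand-in state type, for illustration only. */
#define USE_TRACE_BIT (16U)

typedef struct
{
  uint32_t mflags;
} demo_mstate_t;

/* Set the trace flag, leaving all other flags untouched. */
#define enable_trace(M)  ((M)->mflags |= USE_TRACE_BIT)

/* Clear the trace flag: AND with the inverted mask, not OR. */
#define disable_trace(M) ((M)->mflags &= ~USE_TRACE_BIT)

/* Nonzero when tracing is enabled. */
#define use_trace(M)     ((M)->mflags & USE_TRACE_BIT)
```

With the macros as quoted in the question below, disable_trace() would leave the bit set, so tracing could never be turned off once enabled.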

From: vpp-dev@lists.fd.io  On Behalf Of 
xiapengli...@gmail.com
Sent: Friday, March 20, 2020 11:32 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] regarding disable_trace in vppinfra/dlmalloc.c

Sorry to bother you experts :( but while reading dlmalloc I was curious why 
disable_trace sets USE_TRACE_BIT instead of clearing it:

#define enable_trace(M) ((M)->mflags |= USE_TRACE_BIT)

#define disable_trace(M) ((M)->mflags |= USE_TRACE_BIT)

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#15832): https://lists.fd.io/g/vpp-dev/message/15832
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] Build broken: revert of "srv6-mobile: revert GTP4/6.DT and User Plane message mapping" pending

2020-03-22 Thread Dave Barach via Lists.Fd.Io
See https://gerrit.fd.io/r/c/vpp/+/26061.

I can see why folks thought that the original patch was OK: Jenkins / Gerrit 
hid the original test failure.

test_srv6_mobile.TestSRv6EndMGTP6D.test_srv6_mobile failed during validation, 
and has been failing 100% of the time since the patch was merged.

Debug CLI error: "sr localsid: Error: SRv6 LocalSID address is mandatory."

I tried fixing the test code, "s/prefix/address/" in several places such as:

self.vapi.cli(
    "sr localsid prefix {}/64 behavior end.m.gtp6.e"
    .format(pkts[0]['IPv6'].dst))

That led the test to fail for other reasons. Hence the revert.

FWIW... Dave




-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#15836): https://lists.fd.io/g/vpp-dev/message/15836
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Build broken: revert of "srv6-mobile: revert GTP4/6.DT and User Plane message mapping" pending

2020-03-22 Thread Dave Barach via Lists.Fd.Io
Merged https://gerrit.fd.io/r/c/vpp/+/26059 instead...

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Dave Barach via 
Lists.Fd.Io
Sent: Sunday, March 22, 2020 9:40 AM
To: vpp-dev@lists.fd.io
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] Build broken: revert of "srv6-mobile: revert GTP4/6.DT and 
User Plane message mapping" pending

See https://gerrit.fd.io/r/c/vpp/+/26061.

I can see why folks thought that the original patch was OK: Jenkins / Gerrit 
hid the original test failure.

test_srv6_mobile.TestSRv6EndMGTP6D.test_srv6_mobile failed during validation, 
and has been failing 100% of the time since the patch was merged.

Debug CLI error: "sr localsid: Error: SRv6 LocalSID address is mandatory."

I tried fixing the test code, "s/prefix/address/" in several places such as:

self.vapi.cli(
    "sr localsid prefix {}/64 behavior end.m.gtp6.e"
    .format(pkts[0]['IPv6'].dst))

That led the test to fail for other reasons. Hence the revert.

FWIW... Dave




-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#15838): https://lists.fd.io/g/vpp-dev/message/15838
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] How to get source node of a buffer

2020-03-24 Thread Dave Barach via Lists.Fd.Io
See URL below.

The vpp packet dispatch tracer will show you everything you would want to know 
about the nodes visited by individual packets:

https://fd.io/docs/vpp/master/gettingstarted/developers/vnet.html#graph-dispatcher-pcap-tracing

This is a developer tool, not a tool for post-mortem analysis.

HTH... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Neale Ranns via 
Lists.Fd.Io
Sent: Tuesday, March 24, 2020 4:06 AM
To: Satya Murthy ; vpp-dev@lists.fd.io
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] How to get source node of a buffer

Hi Murthy,

There is no way to get the source node.
However, if you are debugging and you want to see the full history of the graph 
through which a packet has passed, you can turn on trajectory tracing.
#define VLIB_BUFFER_TRACE_TRAJECTORY 1
In vlib/buffer.h

/neale

From: <vpp-dev@lists.fd.io> on behalf of Satya Murthy <satyamurthy1...@gmail.com>
Date: Tuesday 24 March 2020 at 06:44
To: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>
Subject: [vpp-dev] How to get source node of a buffer

Hi ,

Is there any way to find the source node of a buffer? Basically, I want to know 
which node this buffer came from.
I understand that each graph node should have a design that is independent of 
the source node.
However, the source node information may be useful when debugging 
crashes that occur while processing a buffer.

Any input on this would be appreciated.

--
Thanks & Regards,
Murthy
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#15852): https://lists.fd.io/g/vpp-dev/message/15852
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] [VCL] Memory access error for different size of mutex with different glibc versions in VPP and VCL app

2020-03-24 Thread Dave Barach via Lists.Fd.Io
This limitation should come as no surprise, and it’s hardly a “big” limitation.

Options include building container images which match the host distro, or using 
a vpp snap image on the host which corresponds to the container images.

Given that there are two ways to deal with the situation, pick your favorite 
and move on.

From: vpp-dev@lists.fd.io  On Behalf Of wanghanlin
Sent: Monday, March 23, 2020 10:16 PM
To: fcoras.li...@gmail.com
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] [VCL] Memory access error for different size of mutex 
with different glibc versions in VPP and VCL app

Hi Florin,
It is not only a matter of compiling with the same glibc version. VCL and VPP 
must also run with the same glibc version, because libpthread is dynamically 
linked into both.
This is really a big limitation.

Regards,
Hanlin

wanghanlin
wanghan...@corp.netease.com
On 3/23/2020 23:31, Florin Coras wrote:
Hi Hanlin,

Unfortunately, you’ll have to make sure all code has been compiled with the 
same glibc version. I’ve heard that glibc changed in ubuntu 20.04 but I haven’t 
done any testing with it yet.

Note that the binary api also makes use of svm_queue_t.

Regards,
Florin


On Mar 22, 2020, at 10:49 PM, wanghanlin <wanghan...@corp.netease.com> wrote:

Hi All,
Currently, the VCL app and VPP share some data structures, such as svm_queue_t. 
svm_queue_t contains mutex and condvar variables whose size depends on the 
glibc version.
When VPP runs on the host and the VCL app runs in a docker container, the glibc 
versions may differ between VPP and the VCL app, which results in memory access 
errors because the mutex and condvar sizes differ.
Has anyone noticed this?

Regards,
Hanlin
wanghanlin
wanghan...@corp.netease.com

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#15851): https://lists.fd.io/g/vpp-dev/message/15851
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Why VPP performance down very much when I use print() function.

2020-05-06 Thread Dave Barach via lists.fd.io
If you want to count things in data plane nodes, use a per-node counter and the 
“show error” debug CLI to inspect it.

To count every packet fed to the node dispatch function, you can bump a node 
counter once per frame:

  vlib_node_increment_counter (vm, myplugin_node.index,
                               MYPLUGIN_ERROR_WHATEVER, frame->n_vectors);

A single printf call costs roughly the same number of clock cycles as 
processing O(10) packets from start to finish. It’s really expensive.

Dave
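
To make the pattern concrete, here is a self-contained sketch of counting once per frame rather than printing from the data path. All names (demo_node_t, demo_node_fn, DEMO_ERROR_PROCESSED) are hypothetical stand-ins and not VPP APIs; in a real node the increment would be the vlib_node_increment_counter call shown above.

```c
#include <stdint.h>

/* Hypothetical per-node counter indices. */
enum
{
  DEMO_ERROR_PROCESSED,
  DEMO_N_ERROR,
};

/* Hypothetical node with one counter slot per index. */
typedef struct
{
  uint64_t counters[DEMO_N_ERROR];
} demo_node_t;

/* Add 'increment' to one counter: a single memory update, no I/O. */
static inline void
demo_node_increment_counter (demo_node_t * node, unsigned index,
			     uint64_t increment)
{
  node->counters[index] += increment;
}

/* Process one frame of 'n_vectors' packets. */
static void
demo_node_fn (demo_node_t * node, uint32_t n_vectors)
{
  uint32_t i;

  for (i = 0; i < n_vectors; i++)
    {
      /* per-packet work goes here; no printf in this loop */
    }

  /* One counter update per frame, regardless of frame size. */
  demo_node_increment_counter (node, DEMO_ERROR_PROCESSED, n_vectors);
}
```

The counter can then be read out of band, which keeps formatting and console I/O off the packet path.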

From: vpp-dev@lists.fd.io  On Behalf Of Nguy?n Th? Hi?u
Sent: Wednesday, May 6, 2020 5:26 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Why VPP performance down very much when I use print() 
function.

Hi VPP team.
I created a simple VPP node named "swap_mac". The "swap_mac" node just swaps the 
source and destination MAC addresses and sends the packet back.
I then use the Pktgen tool to send packets to VPP. In VPP, each packet goes 
through swap_mac->interface-output and is finally sent back to Pktgen.

With this test setup, VPP throughput reaches 7Gbps in my lab. But 
throughput drops to 300Mbps when I add a counter variable to count the number of 
received packets and a printf() to print the counter value in the "swap_mac" node 
function.
My code:

counter++;
if ((counter % 600000000) == 0)
{
   printf("Receive packets: %ld", counter);
}
So why does VPP throughput drop from 7Gbps to 300Mbps when I only call printf() 
once every 600,000,000 packets?
(When I comment out the printf(), VPP throughput goes back up to 7Gbps.)

Please help me understand this. I'm sorry for my bad English.
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#16246): https://lists.fd.io/g/vpp-dev/message/16246
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Fix in LACP code to avoid assertion failure in vlib_time_now()

2020-05-07 Thread Dave Barach via lists.fd.io
Thanks for the patch, merged... 

The cpu tick counters are different on each thread, so calling vlib_time_now 
(wrong_vlib_main_t *) wrecks the victim thread's timebase. Knock-on effects 
include all manner of obscure / hard-to-reproduce failures.

Dave  
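
A rough model of the idea, for illustration: each thread owns one main structure with its own timebase, a thread-local index identifies the caller, and a debug check rejects a structure that belongs to another thread. Every name here (demo_main_t, demo_get_main, demo_main_is_mine) is hypothetical; this is not the vlib implementation.

```c
#include <stddef.h>

#define DEMO_MAX_THREADS 4

/* Hypothetical per-thread state: one structure, and one timebase,
   per thread. */
typedef struct
{
  unsigned thread_index;
  double time_offset;
} demo_main_t;

static demo_main_t demo_mains[DEMO_MAX_THREADS];

/* Each thread stores its own index here at startup (0 = main thread). */
static _Thread_local unsigned demo_thread_index;

/* Give every slot its index; call once before any thread uses it. */
static void
demo_init (void)
{
  unsigned i;

  for (i = 0; i < DEMO_MAX_THREADS; i++)
    demo_mains[i].thread_index = i;
}

/* Return the calling thread's own structure, looked up through
   thread-local storage. */
static inline demo_main_t *
demo_get_main (void)
{
  return &demo_mains[demo_thread_index];
}

/* Nonzero when 'm' belongs to the calling thread.  A debug build could
   assert on this before touching per-thread time state. */
static inline int
demo_main_is_mine (const demo_main_t * m)
{
  return m->thread_index == demo_thread_index;
}
```

Code running on a worker would call demo_get_main() instead of passing around a pointer to slot 0, which is the analogue of the fix discussed in this thread.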

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Elias Rudberg
Sent: Thursday, May 7, 2020 10:17 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Fix in LACP code to avoid assertion failure in 
vlib_time_now()

Hello VPP experts,

When trying the current VPP master branch using a debug build we encountered an 
assertion failure in vlib_time_now() here:

always_inline f64
vlib_time_now (vlib_main_t * vm)
{
#if CLIB_DEBUG > 0
  extern __thread uword __os_thread_index;
#endif
  /*
   * Make sure folks don't pass &vlib_global_main from a worker thread.
   */
  ASSERT (vm->thread_index == __os_thread_index);
  return clib_time_now (&vm->clib_time) + vm->time_offset;
}

The ASSERT there is triggered because the LACP code passes &vlib_global_main 
when it should pass a thread-specific vlib_main_t. So this looks like precisely 
the kind of issue that the assertion was made to catch.

To reproduce the problem, I think it should be enough to use LACP in a 
multi-threaded scenario with a debug build; the assertion failure then 
happens directly at startup, every time.

I pushed a fix, here: https://gerrit.fd.io/r/c/vpp/+/26943

After that fix it seems to work, LACP then works without assertion failure. 
Please have a look and merge if it seems okay.

Best regards,
Elias

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#16271): https://lists.fd.io/g/vpp-dev/message/16271
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Segmentation fault in rdma_device_input_refill when using clang compiler

2020-05-07 Thread Dave Barach via lists.fd.io
Ack. vmovdqa64 is an aligned move [google it...] so it's no surprise whatsoever 
that it blew up. If you check the new/improved assembly code, you'll probably 
see that the compiler generated a 'u' flavor [unaligned] vector move.

Thanks... Dave 
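
As a small illustration of the alignment question, the sketch below checks whether an address meets a power-of-two alignment and shows a memcpy-based load, which compilers typically lower to an unaligned move. The helper names are hypothetical and this is not the rdma plugin code.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero when 'p' is aligned to 'align' bytes ('align' must be a
   power of two). */
static inline int
demo_is_aligned (const void *p, uintptr_t align)
{
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Copy four u64 values from a possibly unaligned address.  memcpy makes
   no alignment assumption about 'src'. */
static inline void
demo_load_u64x4 (uint64_t dst[4], const void *src)
{
  memcpy (dst, src, 4 * sizeof (uint64_t));
}
```

For a u64 array of eight elements, element 4 sits 32 bytes past the start, so `va + 4` is 32-byte aligned only if `va` itself is. That matches the fix of forcing 32-byte alignment on the array.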

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Elias Rudberg
Sent: Wednesday, May 6, 2020 7:56 PM
To: dmar...@me.com
Cc: Dave Barach (dbarach) ; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Segmentation fault in rdma_device_input_refill when 
using clang compiler

Hi Dave and Damjan,

Here is instruction and register info:

(gdb) x/i $pc
=> 0x7fffabbbdd67 :   vmovdqa64
-0x30a0(%rbp),%ymm0
(gdb) info registers rbp ymm0
rbp0x7417daf0   0x7417daf0
ymm0   {v8_float = {0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0,
0xfffd}, v4_double = {0x0, 0x37, 0x0, 0xff85}, v32_int8 = {0x0, 
0x0, 0x0, 0x10, 
0x3f, 0xf6, 0x41, 0x80, 0x0, 0x0, 0x0, 0x10, 0x3f, 0xf6, 0x4b, 0x40, 0x0, 
0x0, 0x0, 0x10, 0x3f, 0xf6, 0x55, 0x0, 0x0, 0x0, 0x0, 0x10, 0x3f, 0xf6, 0x5e, 
0xc0}, v16_int16 = {0x0, 0x1000, 0xf63f, 0x8041, 0x0, 0x1000, 0xf63f, 
0x404b, 0x0, 0x1000, 0xf63f, 0x55, 0x0, 0x1000, 0xf63f, 0xc05e}, v8_int32 = {
0x1000, 0x8041f63f, 0x1000, 0x404bf63f, 0x1000, 0x55f63f, 
0x1000, 0xc05ef63f}, v4_int64 = {0x8041f63f1000, 0x404bf63f1000, 
0x55f63f1000, 0xc05ef63f1000}, v2_int128 = 
{0x404bf63f10008041f63f1000,
0xc05ef63f1055f63f1000}}

Not sure if I understand all this but perhaps it means that the value in %rbp 
is used as a memory address, but that address 0x7417daf0 is not 32-byte 
aligned as it needs to be.

Adding __attribute__((aligned(32))) as Damjan suggests indeed seems to help. 
After that there was again a segfault in another place in the same file, where 
the same trick of adding __attribute__((aligned(32))) again helped.

So it seems the problem can be fixed by adding that alignment attribute in two 
places, like this:

diff --git a/src/plugins/rdma/input.c b/src/plugins/rdma/input.c
index cf0b6bffe..324436f01 100644
--- a/src/plugins/rdma/input.c
+++ b/src/plugins/rdma/input.c
@@ -103,7 +103,7 @@ rdma_device_input_refill (vlib_main_t * vm, rdma_device_t * rd,
 
   if (is_mlx5dv)
 {
-  u64 va[8];
+  u64 va[8] __attribute__((aligned(32)));
   mlx5dv_rwq_t *wqe = rxq->wqes + slot;
 
   while (n >= 1)
@@ -488,7 +488,7 @@ rdma_device_input_inline (vlib_main_t * vm, vlib_node_runtime_t * node,
   rdma_rxq_t *rxq = vec_elt_at_index (rd->rxqs, qid);
   vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b = bufs;
   struct ibv_wc wc[VLIB_FRAME_SIZE];
-  u32 byte_cnts[VLIB_FRAME_SIZE];
+  u32 byte_cnts[VLIB_FRAME_SIZE] __attribute__((aligned(32)));
   vlib_buffer_t bt;
   u32 next_index, *to_next, n_left_to_next, n_rx_bytes = 0;
   int n_rx_packets, skip_ip4_cksum = 0;

Many thanks for your help!

Should I push the above as a patch to gerrit?

/ Elias



On Wed, 2020-05-06 at 20:38 +0200, Damjan Marion wrote:
> Can you try this:
> 
> diff --git a/src/plugins/rdma/input.c b/src/plugins/rdma/input.c
> index cf0b6bffe..b461ee27b 100644
> --- a/src/plugins/rdma/input.c
> +++ b/src/plugins/rdma/input.c
> @@ -103,7 +103,7 @@ rdma_device_input_refill (vlib_main_t * vm, rdma_device_t * rd,
> 
>if (is_mlx5dv)
>  {
> -  u64 va[8];
> +  u64 va[8] __attribute__((aligned(32)));
>mlx5dv_rwq_t *wqe = rxq->wqes + slot;
> 
>while (n >= 1)
> 
> 
> Thanks!
> 
> > On 6 May 2020, at 19:45, Elias Rudberg wrote:
> > 
> > Hello VPP experts,
> > 
> > When trying to use the current master branch, we get a segmentation 
> > fault error. Here is what it looks like in gdb:
> > 
> > Thread 3 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7fedf91fe700 (LWP 21309)] 
> > rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0, 
> > rxq=0x77edea80, is_mlx5dv=1)
> >at vpp/src/plugins/rdma/input.c:115
> > 115   *(u64x4 *) (va + 4) = u64x4_byte_swap (*(u64x4 *) (va
> > + 4));

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#16269): https://lists.fd.io/g/vpp-dev/message/16269
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Assertion failure in nat_get_vlib_main() in snat_init()

2020-05-07 Thread Dave Barach via lists.fd.io
Right, sorry, I've pinged the Responsible Parties offline to fix the problem. 

vec_len(vlib_mains) will be 1 in an init routine. Start_threads() which builds 
the final vlib_mains vector doesn't occur until just prior to main dispatch 
loop entry.

vec_elt_at_index (...) is meant to catch this sort of problem...

Dave
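
To illustrate why a length-checked accessor catches this class of problem, here is a minimal sketch with a hypothetical vector type. It is not the vppinfra vec implementation, and unlike a debug assertion it returns NULL instead of aborting.

```c
#include <stddef.h>

/* Hypothetical vector of pointers with an explicit length. */
typedef struct
{
  void **elts;
  size_t len;
} demo_vec_t;

/* Return element 'index', or NULL when the index is out of range.
   During init only slot 0 would exist, so a lookup for a worker
   thread index fails here instead of reading past the end. */
static inline void *
demo_vec_elt_checked (const demo_vec_t * v, size_t index)
{
  if (v == NULL || index >= v->len)
    return NULL;
  return v->elts[index];
}
```

A plain `vec[index]` read, by contrast, silently returns whatever happens to be in memory past the end of the vector.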

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Elias Rudberg
Sent: Thursday, May 7, 2020 4:58 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Assertion failure in nat_get_vlib_main() in snat_init()

Hello,

With the current master branch (def78344) we now get an assertion failure on 
startup, here:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7462e801 in __GI_abort () at abort.c:79
#2  0x004071f3 in os_panic ()
at vpp/src/vpp/vnet/main.c:366
#3  0x7550d7d9 in debugger ()
at vpp/src/vppinfra/error.c:84
#4  0x7550d557 in _clib_error (how_to_die=2, function_name=0x0, 
line_number=0, 
fmt=0x7fffacbc0310 "%s:%d (%s) assertion `%s' fails")
at vpp/src/vppinfra/error.c:143
#5  0x7fffacac659e in nat_get_vlib_main (thread_index=4)
at vpp/src/plugins/nat/nat.c:2557
#6  0x7fffacabd7a5 in snat_init (vm=0x7639b980
)
at vpp/src/plugins/nat/nat.c:2685
#7  0x760b9f66 in call_init_exit_functions_internal
(vm=0x7639b980 , 
headp=0x7639bfa8 , call_once=1,
do_sort=1)
at vpp/src/vlib/init.c:350
#8  0x760b9e88 in vlib_call_init_exit_functions
(vm=0x7639b980 , 
headp=0x7639bfa8 , call_once=1)
at vpp/src/vlib/init.c:364
#9  0x760ba011 in vlib_call_all_init_functions
(vm=0x7639b980 )
at vpp/src/vlib/init.c:386
#10 0x760df1f8 in vlib_main (vm=0x7639b980 , 
input=0x7fffb4b2afa8)
at vpp/src/vlib/main.c:2171
#11 0x76166405 in thread0 (arg=140737324366208)
at vpp/src/vlib/unix/main.c:658
#12 0x75531954 in clib_calljmp ()
at vpp/src/vppinfra/longjmp.S:123
#13 0x7fffcf30 in ?? ()
#14 0x76165f97 in vlib_unix_main (argc=57, argv=0x71d520)
at vpp/src/vlib/unix/main.c:730
#15 0x004068d8 in main (argc=57, argv=0x71d520)
at vpp/src/vpp/vnet/main.c:291

The code looks like this (this part was added in a recent commit it
seems):

always_inline vlib_main_t *
nat_get_vlib_main (u32 thread_index)
{
  vlib_main_t *vm;
  vm = vlib_mains[thread_index];
  ASSERT (vm);
  return vm;
}

So it is looking at vlib_mains[thread_index] but that is NULL, apparently.

Since this happens at startup, could it be that vlib_mains has not been 
initialized yet, and that it is too early to try to access it?

Is vlib_mains[thread_index] supposed to be initialized by the time when
vlib_call_all_init_functions() runs?

Best regards,
Elias
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#16278): https://lists.fd.io/g/vpp-dev/message/16278
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Segmentation fault in rdma_device_input_refill when using clang compiler

2020-05-06 Thread Dave Barach via lists.fd.io
Could we please see the faulting instruction, as well as the vector register 
contents involved?

As in "x/i $pc", and the ymmX registers involved?

If the vector instruction requires alignment, "movaps" or similar, it wouldn't 
be a shock to discover an unaligned address. We've already found and fixed a 
few of those since switching to clang, and I have to say that "va + 4" raises 
all sorts of aligned vector instruction red flags...


FWIW... Dave

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Elias Rudberg
Sent: Wednesday, May 6, 2020 1:46 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Segmentation fault in rdma_device_input_refill when using 
clang compiler

Hello VPP experts,

When trying to use the current master branch, we get a segmentation fault 
error. Here is what it looks like in gdb:

Thread 3 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fedf91fe700 (LWP 21309)]
rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0, rxq=0x77edea80, is_mlx5dv=1)
at vpp/src/plugins/rdma/input.c:115
115   *(u64x4 *) (va + 4) = u64x4_byte_swap (*(u64x4 *) (va + 4));
(gdb) bt
#0  rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0, 
rxq=0x77edea80, is_mlx5dv=1)
at vpp/src/plugins/rdma/input.c:115
#1  0x7fffa84d in rdma_device_input_inline (vm=0x7ff8a5d2f4c0, 
node=0x7ff5ccdfee00, frame=0x0, rd=0x7fedd35ed5c0, qid=0, use_mlx5dv=1)
at vpp/src/plugins/rdma/input.c:622
#2  0x7fffabbbae44 in rdma_input_node_fn_skx (vm=0x7ff8a5d2f4c0, 
node=0x7ff5ccdfee00, frame=0x0)
at vpp/src/plugins/rdma/input.c:647
#3  0x760e3155 in dispatch_node (vm=0x7ff8a5d2f4c0, 
node=0x7ff5ccdfee00, type=VLIB_NODE_TYPE_INPUT, 
dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, 
last_time_stamp=66486783453597600) at vpp/src/vlib/main.c:1235
#4  0x760ddbf5 in vlib_main_or_worker_loop (vm=0x7ff8a5d2f4c0,
is_main=0) at vpp/src/vlib/main.c:1815
#5  0x760dd227 in vlib_worker_loop (vm=0x7ff8a5d2f4c0) at
vpp/src/vlib/main.c:1996
#6  0x761345a1 in vlib_worker_thread_fn (arg=0x7fffb74ea980) at
vpp/src/vlib/threads.c:1795
#7  0x75531954 in clib_calljmp () at
vpp/src/vppinfra/longjmp.S:123
#8  0x7fedf91fdce0 in ?? ()
#9  0x7612cd53 in vlib_worker_thread_bootstrap_fn
(arg=0x7fffb74ea980) at vpp/src/vlib/threads.c:584
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

This segmentation fault happens the same way every time I try to start VPP.

This is in Ubuntu 18.04.4 using the rdma plugin with Mellanox mlx5 NICs and a 
Intel Xeon Gold 6126 CPU.

I have looked back at recent changes and found that this problem started with 
the commit 4ba16a44 "misc: switch to clang-9" dated April 28. Before that we 
could use the master branch without this problem.

Changing back to gcc by removing clang in src/CMakeLists.txt makes the error go 
away. However, there is then instead a problem with a "symbol lookup error" for 
crypto_native_plugin.so: undefined symbol:
crypto_native_aes_cbc_init_avx512 (that problem disappears if disabling the 
crypto_native plugin)

So, two problems:

(1) The segmentation fault itself, perhaps indicating a bug somewhere but seems 
to appear only with clang and not with gcc

(2) The "undefined symbol: crypto_native_aes_cbc_init_avx512" problem when 
trying to use gcc instead of clang

What do you think about these?

As a short-term fix, is removing clang in src/CMakeLists.txt reasonable or is 
there a better/easier workaround?

Does anyone else use the rdma plugin when compiling using clang -- perhaps that 
combination triggers this problem?

Best regards,
Elias
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#16253): https://lists.fd.io/g/vpp-dev/message/16253
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Buffer Occupancy Calculation #vnet #vpp

2020-09-03 Thread Dave Barach via lists.fd.io
+1. 

I suspect that the most useful occupancy measure may be the number of buffers 
in use / total number of buffers. 

FWIW... Dave 

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Benoit Ganne 
(bganne) via lists.fd.io
Sent: Thursday, September 3, 2020 9:27 AM
To: mauricio.soli...@tno.nl; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Buffer Occupancy Calculation #vnet #vpp

> I'm confused about what buffers-per-numa and buffer data-size are 
> referring to?  If I'm trying to determine overall packet buffer 
> occupancy levels, should I use buffers-per-numa or data-size?

A buffer in VPP is composed of a 128-byte header, a 128-byte scratchpad (to 
allow e.g. tunneling without moving data) and the data buffer.
'data-size' specifies the size of the data buffer, i.e. the total size of the 
buffer will be 256 bytes + data-size.
'buffers-per-numa' is the number of buffers to allocate per NUMA node. On a 
standard 2-socket server with 2 NUMA nodes, you'll allocate a total of 2 * 
buffers-per-numa buffers.
So the memory consumption for buffers should be something like 2 * 
buffers-per-numa * (256 + data-size).
Now, this is not exactly true, because the buffers are aligned on cachelines, 
are not split across pages, and the total will be rounded up to the page size, so 
there will be some additional overhead, but it should give you the right order 
of magnitude.
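
The estimate above, together with the in-use / total occupancy ratio suggested earlier in this thread, can be sketched as follows. The function names are hypothetical, the figures are illustrative values rather than a statement about defaults, and the alignment and page-rounding overhead is deliberately ignored.

```c
#include <stdint.h>

/* 128-byte header + 128-byte scratchpad, per the description above. */
#define DEMO_BUFFER_OVERHEAD_BYTES 256u

/* Lower-bound estimate of buffer memory:
   numa_nodes * buffers_per_numa * (256 + data_size). */
static inline uint64_t
demo_buffer_pool_bytes (uint32_t numa_nodes, uint32_t buffers_per_numa,
			uint32_t data_size)
{
  return (uint64_t) numa_nodes * buffers_per_numa *
    (DEMO_BUFFER_OVERHEAD_BYTES + (uint64_t) data_size);
}

/* Occupancy as a fraction: buffers in use / total buffers. */
static inline double
demo_buffer_occupancy (uint64_t in_use, uint64_t total)
{
  return total ? (double) in_use / (double) total : 0.0;
}
```

For example, with 2 NUMA nodes, 16384 buffers per node and a data-size of 2048, the estimate is 2 * 16384 * 2304 = 75,497,472 bytes, i.e. 72 MiB before overhead.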

> Would this change if I use DPDK? Meaning, is the buffer allocated by 
> DPDK queue memory or VPP?

No, even with DPDK the buffer pool is managed by VPP. There is some overhead in 
that case because each buffer must accommodate an additional dpdk header.

Best
ben
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17326): https://lists.fd.io/g/vpp-dev/message/17326
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] How to see coverity logs

2020-09-03 Thread Dave Barach via lists.fd.io
Non-zero clib_error_t *'s should be freed. Choices include: 
clib_error_report(error), or clib_error_free(error)...
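
The ownership rule can be sketched with a stand-in error type: a function returns NULL on success or a heap-allocated error on failure, and the caller is responsible for releasing it. All names below are hypothetical; in VPP the corresponding calls are the clib_error_report / clib_error_free mentioned above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-allocated error object. */
typedef struct
{
  char what[64];
} demo_error_t;

/* Number of errors allocated but not yet freed (for leak checking). */
static int demo_live_errors;

static demo_error_t *
demo_error_new (const char *what)
{
  demo_error_t *e = calloc (1, sizeof (*e));

  if (e != NULL)
    {
      strncpy (e->what, what, sizeof (e->what) - 1);
      demo_live_errors++;
    }
  return e;
}

static void
demo_error_free (demo_error_t * e)
{
  if (e != NULL)
    {
      free (e);
      demo_live_errors--;
    }
}

/* Stand-in for an API that returns NULL on success, an error otherwise. */
static demo_error_t *
demo_set_flags (int fail)
{
  return fail ? demo_error_new ("set flags failed") : NULL;
}

/* Caller pattern: a non-NULL error is always released before returning. */
static int
demo_bring_up (int fail)
{
  demo_error_t *err = demo_set_flags (fail);

  if (err != NULL)
    {
      demo_error_free (err);	/* report it first if desired */
      return -1;
    }
  return 0;
}
```

Ignoring the return value of demo_set_flags() on the failure path would leak one object per call, which is the kind of defect a static analyzer tends to flag.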

From: vpp-dev@lists.fd.io  On Behalf Of Nitin Saxena
Sent: Wednesday, September 2, 2020 12:13 PM
To: Dave Barach (dbarach) ; vpp-dev 
Cc: Suheil Chandran 
Subject: Re: [vpp-dev] How to see coverity logs

Hi Dave/Chris,

Thanks for the pointers. I am able to see Coverity defects now. My reason for 
looking at the Coverity logs is the following:

It recently came to my notice that vnet APIs and their callers do not 
free clib_error_t * properly. The vnet_[sw|hw]_interface_set_flags() APIs 
are used widely across VPP, but the clib_error_t *error returned by these APIs 
is often not freed. Is my understanding correct that a non-zero error returned by 
these APIs must be freed by the caller? I tried fixing some of the main APIs in 
vnet (https://gerrit.fd.io/r/c/vpp/+/28643)

Thanks,
Nitin

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Dave Barach via lists.fd.io
Sent: Wednesday, September 2, 2020 6:44 PM
To: Nitin Saxena <nsax...@marvell.com>; vpp-dev <vpp-dev@lists.fd.io>
Subject: [EXT] Re: [vpp-dev] How to see coverity logs

Known Coverity UI bug. Happens all the time for me.

Back up one page, and click on "show me the bugs" again. It should work.

D.

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Nitin Saxena
Sent: Wednesday, September 2, 2020 3:31 AM
To: vpp-dev <vpp-dev@lists.fd.io>
Subject: [vpp-dev] How to see coverity logs

Hi Maintainers,

I would like to see the Coverity logs for the latest VPP build. I tried adding 
https://scan.coverity.com/projects/fd-io-vp
to my GitHub account, but when I click on "View Defects" it says "Permission denied".
Is it possible to view the Coverity logs, or am I missing something?

Thanks,
Nitin


View/Reply Online (#17324): https://lists.fd.io/g/vpp-dev/message/17324


Re: [vpp-dev]: Crash in Timer wheel infra

2020-09-02 Thread Dave Barach via lists.fd.io
Given the amount of soak-time / perf/scale / stress testing which the tw_timer 
code has experienced, it’s reasonably likely that your application is 
responsible.

Caution is required when dealing with timers other than the timer which has 
expired.

If you have > 1 timer per object and you manipulate timer B when timer A 
expires, there’s no guarantee that timer B isn’t already on the expired timer 
list. That’s almost always good for trouble.
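
A minimal sketch of one way an application could guard against that hazard. This is a self-contained model with hypothetical names (object_t, object_stop_timer, process_expired), not the tw_timer API: the application tracks, per timer, whether the wheel still owns it, and a handler that wants to cancel a sibling timer only touches the wheel when the sibling has not already been handed back in the same expired batch.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TIMER_A = 0, TIMER_B = 1, N_TIMERS = 2 };

/* Application-side view of each timer (hypothetical bookkeeping). */
typedef enum { T_IDLE, T_RUNNING, T_EXPIRED_PENDING } timer_state_t;

typedef struct
{
  timer_state_t state[N_TIMERS];
  int handled[N_TIMERS]; /* expirations that were actually acted upon */
  int wheel_stops;       /* times the wheel would have been asked to stop */
} object_t;

/* Stop a timer safely: only touch the wheel if the timer is still on it. */
static void object_stop_timer (object_t *o, int id)
{
  if (o->state[id] == T_RUNNING)
    o->wheel_stops++; /* a real "stop timer" call would go here */
  /* If it is T_EXPIRED_PENDING the wheel no longer owns it; marking it
     idle is enough to make the queued expiration a no-op. */
  o->state[id] = T_IDLE;
}

/* The wheel hands back a batch of expired timer ids in one call. */
static void process_expired (object_t *o, const int *ids, size_t n)
{
  /* Pass 1: note which timers have already left the wheel. */
  for (size_t i = 0; i < n; i++)
    if (o->state[ids[i]] == T_RUNNING)
      o->state[ids[i]] = T_EXPIRED_PENDING;

  /* Pass 2: act on each expiration that is still wanted. */
  for (size_t i = 0; i < n; i++)
    {
      int id = ids[i];
      if (o->state[id] != T_EXPIRED_PENDING)
        continue; /* cancelled by an earlier handler in this batch */
      o->state[id] = T_IDLE;
      o->handled[id]++;
      if (id == TIMER_A)
        object_stop_timer (o, TIMER_B); /* A's handler cancels B */
    }
}
```

When A and B expire in the same batch, B is neither stopped on the wheel nor handled; when only A expires, B is stopped on the wheel exactly once.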

HTH... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Rajith PR via 
lists.fd.io
Sent: Wednesday, September 2, 2020 12:39 AM
To: vpp-dev 
Subject: [vpp-dev]: Crash in Timer wheel infra

Hi All,

We are facing a crash in VPP's Timer wheel INFRA. Please find the details below.

Version : 19.08
Configuration: 2 workers and the main thread.
Backtraces: thread apply all bt


Thread 1 (Thread 0x7ff41d586d00 (LWP 253)):
#0  0x7ff41c696722 in __GI___waitpid (pid=707,
    stat_loc=stat_loc@entry=0x7ff39f18ca18, options=options@entry=0)
    at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x7ff41c601107 in do_system (line=)
    at ../sysdeps/posix/system.c:149
#2  0x7ff41d11a76b in bd_signal_handler_cb (signo=6)
    at /development/librtbrickinfra/bd/src/bd.c:770
#3  0x7ff410ce907b in rtb_bd_signal_handler (signo=6)
    at /development/libvpp/src/vlib/unix/main.c:80
#4  0x7ff410ce9416 in unix_signal_handler (signum=6, si=0x7ff39f18d1f0,
    uc=0x7ff39f18d0c0) at /development/libvpp/src/vlib/unix/main.c:180
#5  
#6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#7  0x7ff41c5f28b1 in __GI_abort () at abort.c:79
#8  0x7ff41248ee66 in os_panic ()
    at /development/libvpp/src/vpp/vnet/main.c:559
#9  0x7ff410922825 in debugger ()
    at /development/libvpp/src/vppinfra/error.c:84
#10 0x7ff410922bf4 in _clib_error (how_to_die=2, function_name=0x0,
    line_number=0, fmt=0x7ff4109e8a78 "%s:%d (%s) assertion `%s' fails")
    at /development/libvpp/src/vppinfra/error.c:143
#11 0x7ff4109a64dd in tw_timer_expire_timers_internal_1t_3w_1024sl_ov (
    tw=0x7ff39fdf7a40, now=327.5993926951,
    callback_vector_arg=0x7ff39fdfab00)
    at /development/libvpp/src/vppinfra/tw_timer_template.c:753
#12 0x7ff4109a6b36 in tw_timer_expire_timers_vec_1t_3w_1024sl_ov (
    tw=0x7ff39fdf7a40, now=327.5993926951, vec=0x7ff39fdfab00)
    at /development/libvpp/src/vppinfra/tw_timer_template.c:814
#13 0x7ff410c8321a in vlib_main_or_worker_loop (
    vm=0x7ff410f22e40 , is_main=1)
    at /development/libvpp/src/vlib/main.c:1859
#14 0x7ff410c83965 in vlib_main_loop (vm=0x7ff410f22e40 )
    at /development/libvpp/src/vlib/main.c:1930
#15 0x7ff410c8462c in vlib_main (vm=0x7ff410f22e40 ,
    input=0x7ff39f18dfb0) at /development/libvpp/src/vlib/main.c:2147
#16 0x7ff410ceabc9 in thread0 (arg=140686233054784)
    at /development/libvpp/src/vlib/unix/main.c:666
#17 0x7ff410943600 in clib_calljmp ()
   from /usr/local/lib/libvppinfra.so.1.0.1
#18 0x7ffe4d981390 in ?? ()
#19 0x7ff410ceb13f in vlib_unix_main (argc=55, argv=0x556c398eb100)
    at /development/libvpp/src/vlib/unix/main.c:736
#20 0x7ff41248e7cb in rtb_vpp_core_init (argc=55, argv=0x556c398eb100)
    at /development/libvpp/src/vpp/vnet/main.c:483
#21 0x7ff41256189a in rtb_vpp_main ()
    at /development/libvpp/src/vpp/rtbrick/rtb_vpp_main.c:113
#22 0x7ff41d11a15a in bd_load_daemon_lib (
    dmn_lib_cfg=0x7ff41d337860 )
    at /development/librtbrickinfra/bd/src/bd.c:627
#23 0x7ff41d11a205 in bd_load_all_daemon_libs ()
    at /development/librtbrickinfra/bd/src/bd.c:646
#24 0x7ff41d11b676 in bd_start_process ()
    at /development/librtbrickinfra/bd/src/bd.c:1128
#25 0x7ff419e92200 in bds_bd_init ()
    at /development/librtbrickinfra/libbds/code/bds/src/bds.c:651
#26 0x7ff419f1aa5d in pubsub_bd_init_expiry (data=0x0)
    at /development/librtbrickinfra/libbds/code/pubsub/src/pubsub_helper.c:1412
#27 0x7ff41cc23070 in timer_dispatch (item=0x556c39997cf0, p=QB_LOOP_HIGH)
    at /development/librtbrickinfra/libqb/lib/loop_timerlist.c:56
#28 0x7ff41cc1f006 in qb_loop_run_level (level=0x556c366fb3e0)
    at /development/librtbrickinfra/libqb/lib/loop.c:43
#29 0x7ff41cc1f77b in qb_loop_run (lp=0x556c366fb370)
    at /development/librtbrickinfra/libqb/lib/loop.c:210
#30 0x7ff41cc30b3f in lib_qb_service_start_event_loop ()
    at /development/librtbrickinfra/libqb/lib/wrapper/lib_qb_service.c:257
#31 0x556c358c7153 in main ()

Thread 11 (Thread 0x7ff35b622700 (LWP 413)):
#0  rtb_vpp_shm_rx_burst (port_id=3, queue_id=0, burst_size=64 '@')
    at /development/libvpp/src/vpp/rtbrick/rtb_vpp_shm_node.c:317
#1  0x7ff4125ee043 in rtb_vpp_shm_device_input (vm=0x7ff39f89ac80,
    shmm=0x7ff41285e180 , shmif=0x7ff39f8ad940,


Re: [vpp-dev]: Crash in Timer wheel infra

2020-09-02 Thread Dave Barach via lists.fd.io
It looks like vpp is crashing while expiring timers from the main thread 
process timer wheel. That’s not been reported before.

You might want to dust off .../extras/deprecated/vlib/unix/cj.[ch], and make a 
circular log of timer pool_put operations to work out what’s happening.
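
A minimal sketch of the kind of circular log meant here, as a self-contained model. It is not the cj.[ch] API; oplog_t, oplog_record and the caller_tag field are hypothetical. Each timer pool_put would append one fixed-size record, the oldest record is overwritten once the ring is full, and the ring can be inspected from a core file after the crash.

```c
#include <stdint.h>

#define OPLOG_SIZE 8 /* must be a power of two; a real log would be far larger */

typedef struct
{
  uint32_t seq;        /* global sequence number of the operation */
  uint32_t pool_index; /* which timer element was returned to the pool */
  uint32_t caller_tag; /* hypothetical tag identifying the call site */
} oplog_entry_t;

typedef struct
{
  oplog_entry_t ring[OPLOG_SIZE];
  uint32_t next_seq; /* total number of records ever written */
} oplog_t;

/* Record one operation, overwriting the oldest entry when the ring is full. */
static void oplog_record (oplog_t *l, uint32_t pool_index, uint32_t caller_tag)
{
  oplog_entry_t *e = &l->ring[l->next_seq & (OPLOG_SIZE - 1)];
  e->seq = l->next_seq++;
  e->pool_index = pool_index;
  e->caller_tag = caller_tag;
}

/* Count retained records that refer to a given pool index. Two puts of
   the same index close together are worth a closer look. */
static int oplog_count_index (const oplog_t *l, uint32_t pool_index)
{
  uint32_t n = l->next_seq < OPLOG_SIZE ? l->next_seq : OPLOG_SIZE;
  int hits = 0;
  for (uint32_t i = 0; i < n; i++)
    if (l->ring[i].pool_index == pool_index)
      hits++;
  return hits;
}
```

Keeping the records fixed-size and free of pointers means the ring stays readable in gdb even when the surrounding heap is corrupted.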

D.

From: vpp-dev@lists.fd.io  On Behalf Of Rajith PR via 
lists.fd.io
Sent: Wednesday, September 2, 2020 9:42 AM
To: Dave Barach (dbarach) 
Cc: vpp-dev 
Subject: Re: [vpp-dev]: Crash in Timer wheel infra

Thanks Dave for the quick analysis. Are there any debug CLIs that I can run to
analyse this?
We are not using the VPP timers, as we have our own timer library. In VPP, we
have added a couple of VPP nodes (process, internal and input). Could these be
causing the problem?

Thanks,
Rajith


Re: [vpp-dev] How to see coverity logs

2020-09-02 Thread Dave Barach via lists.fd.io
Known Coverity UI bug. Happens all the time for me.

Back up one page, and click on "show me the bugs" again. It should work.

D.

From: vpp-dev@lists.fd.io  On Behalf Of Nitin Saxena
Sent: Wednesday, September 2, 2020 3:31 AM
To: vpp-dev 
Subject: [vpp-dev] How to see coverity logs

Hi Maintainers,

I would like to see Coverity logs for the latest VPP build. I tried adding
https://scan.coverity.com/projects/fd-io-vp to my github but when I click on
"View Defects" it says "Permission denied".
Is it possible to view Coverity logs, or am I missing something?

Thanks,
Nitin


View/Reply Online (#17318): https://lists.fd.io/g/vpp-dev/message/17318


Re: [vpp-dev]: Crash in Timer wheel infra

2020-09-09 Thread Dave Barach via lists.fd.io
s/10us/100us/ seems to work as expected. Diffs under separate cover. Let me 
know what happens.

HTH... D


Re: [vpp-dev]: Crash in Timer wheel infra

2020-09-09 Thread Dave Barach via lists.fd.io
“wheel slips” – aka calling tw_timer_expire_timer[_vec] later than expected 
every so often – is not a catastrophic problem; so long as the delay isn’t 
ridiculous. You’d need to compute mean delay and delay variance to know whether 
the slips that you’re seeing are trivial, or non-trivial. Consider adding elog 
instrumentation which will give you a nice picture of what’s going on.
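
A minimal sketch of how the mean delay and delay variance could be accumulated without storing every sample, using Welford's running-statistics method. The names delay_stats_t and note_wheel_service are hypothetical, and where the call would sit in the main loop is an assumption.

```c
#include <stdint.h>

/* Running statistics for observed delays, in seconds (Welford's method). */
typedef struct
{
  uint64_t n;  /* number of samples */
  double mean; /* running mean */
  double m2;   /* sum of squared deviations from the running mean */
} delay_stats_t;

static void delay_stats_add (delay_stats_t *s, double delay)
{
  double d = delay - s->mean;
  s->n++;
  s->mean += d / (double) s->n;
  s->m2 += d * (delay - s->mean);
}

/* Sample variance; 0 until at least two samples have been seen. */
static double delay_stats_variance (const delay_stats_t *s)
{
  return s->n > 1 ? s->m2 / (double) (s->n - 1) : 0.0;
}

/* Called once per wheel service: record how late this service was
   relative to the expected interval, ignoring on-time services. */
static void note_wheel_service (delay_stats_t *s, double now, double last,
                                double expected_interval)
{
  double late = (now - last) - expected_interval;
  if (late > 0.0)
    delay_stats_add (s, late);
}
```

A mean lateness of a few microseconds with small variance would suggest the slips are trivial; a long tail would show up as a large variance relative to the mean.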

I’m going to take a wild guess that you might be using a rather old version of 
vpp, which may not have a set of important fixes in 
clib_time_verify_frequency(). Those changes deal with deus ex machina (NTP or 
sysadmin) changes to the Linux timebase.

We had an internal customer complain about the timer wheel code “doing awful 
things” when the system timebase jumped forward or backward by an hour. Don’t 
get me started. The code now manages to deal with that situation. At the 
timer-wheel level: if a “wheel slip” due to NTP or sysadmin silliness causes 
Avogadro’s number of timers to expire in a single call, what you’re calling 
“havoc” will surely occur.

Changing the timer granularity from 10us to 100us should just work. I’ll try it 
and make sure that it does, in fact, work. Note that just changing the constant 
presented to the timer wheel init code is NOT sufficient.
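
One plausible reading of that last remark (an assumption here, not something spelled out in the thread) is that other code converts seconds to wheel ticks using the old 10 us figure, so changing only the init constant makes every delay ten times longer than requested. A minimal self-contained sketch of the idea, with hypothetical names throughout:

```c
#include <stdint.h>

/* Hypothetical single definition of the wheel granularity, in seconds. */
#define WHEEL_TICK_SECONDS 100e-6 /* previously 10e-6 */

/* The value handed to the timer wheel init code. */
static double wheel_timer_interval (void) { return WHEEL_TICK_SECONDS; }

/* Convert a delay in seconds to wheel ticks using the shared constant. */
static uint64_t seconds_to_ticks (double seconds)
{
  if (seconds <= 0.0)
    return 0;
  uint64_t ticks = (uint64_t) (seconds / WHEEL_TICK_SECONDS + 0.5);
  return ticks ? ticks : 1; /* never round a positive delay down to zero */
}

/* A stale conversion that still assumes 10 us ticks (1e5 ticks/second). */
static uint64_t stale_seconds_to_ticks (double seconds)
{
  return (uint64_t) (seconds * 1e5 + 0.5);
}

/* How long a tick count really lasts on the reconfigured wheel. */
static double ticks_to_seconds (uint64_t ticks)
{
  return (double) ticks * WHEEL_TICK_SECONDS;
}
```

With a 100 us wheel, a 1 ms request becomes 10 ticks through seconds_to_ticks(), but 100 ticks (about 10 ms of real time) through the stale conversion, which would match process nodes being scheduled much more slowly than expected.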

D.

From: Rajith PR 
Sent: Wednesday, September 9, 2020 2:06 AM
To: Dave Barach (dbarach) 
Cc: vpp-dev 
Subject: Re: [vpp-dev]: Crash in Timer wheel infra

Hi Andreas/Dave,

I did some experiments to debug the crash.

Firstly, I added some profiling code in vlib/main.c. The code is there to
count the timer wheel slips that can cause such havoc (as mentioned by
Andreas). There are slippages, as you can see from the data collected from the
core file.

Total slips = 21489 out of total 98472987 runs.

10 usec for the process timer wheel is something we may not be able to achieve,
as we have a process node in which our solution runs. We would like to increase
10 usec to 100 usec and observe the behaviour. I tried increasing the interval
from 10 usec to 100 usec, but then the process nodes were scheduled very slowly.
What is the correct way to increase the interval?


Profiling code added:

   tw_start_time = vlib_time_now (vm);
   if (tw_start_time > tw_last_start_time) {
       interval = tw_start_time - tw_last_start_time;
       if (interval > PROCESS_TW_TIMER_INTERVAL) {
           tw_slips++;
       }
       tw_total_run++;
   }
   tw_last_start_time = tw_start_time;

   nm->data_from_advancing_timing_wheel =
     TW (tw_timer_expire_timers_vec)
     ((TWT (tw_timer_wheel) *) nm->timing_wheel, vlib_time_now (vm),
      nm->data_from_advancing_timing_wheel);

Secondly, during the debugging we got another crash (line 1904 of vlib/main.c) 
below.
From gdb we found that vec_len of nm->data_from_advancing_timing_wheel is 1. 
But nm->data_from_advancing_timing_wheel[0] = ~0.


1896        if (PREDICT_FALSE
1897            (_vec_len (nm->data_from_advancing_timing_wheel) > 0))
1898          {
1899            uword i;
1900
1901            for (i = 0; i < _vec_len (nm->data_from_advancing_timing_wheel);
1902                 i++)
1903              {
1904                u32 d = nm->data_from_advancing_timing_wheel[i];
1905                u32 di =

Re: [vpp-dev] Issue with adding new new node in between ip4-input n ip4-lookup

2020-09-15 Thread Dave Barach via lists.fd.io
Sounds like you may not have enabled the “test-node” feature on the rx 
sw_if_index. “show interface  features”... Note that if the packet comes 
from a bridge group, I suspect that you’ll need to enable the feature on the 
bvi vs the rx interface.

This is the kind of problem which “pcap dispatch trace ...” tends to help chase 
down.

D.

From: vpp-dev@lists.fd.io  On Behalf Of Caffeine Coder via 
lists.fd.io
Sent: Tuesday, September 15, 2020 2:56 AM
To: Vpp-dev 
Subject: [vpp-dev] Issue with adding new new node in between ip4-input n 
ip4-lookup

Hi
I am trying to add a new node to parse packet data after all vxlan decoding
and ip4-input, and before the IP lookup. This code flow is working fine for
packets coming from vxlan-gpe/IPSec tunnels but not for vxlan.
Traffic is working fine, except that it does not go through this new
"test-node".

The issue I am seeing with vxlan packets is that the packet goes directly to
ip4-lookup, i.e. ip4-vxlan-bypass->vxlan4-input->l2
(input,learn,fwd)->ip4-input->ip4_lookup. I can't hook my node in between
ip4-input and ip4_lookup.

If I use vxlan-gpe or IPSec, I can hook the test node properly, i.e.
ip4-vxlan-gpe-bypass->vxlan4-gpe-input->ip4-input->test-node->ip4_lookup.

I tried adding and changing ".runs_before" and ".runs_after" in my test-node.
I am still not able to stitch in my test-node when traffic comes from vxlan
tunnels.
"show vlib graph" is a very useful command for debugging, but I am not able to
find what else might be missing for vxlan. I do see the vxlan packet going
through a lot of l2 nodes. Does that need any special config to make non-VPP-
native l3 nodes work?

Any pointers will be helpful.

Thanks
Sam.
View/Reply Online (#17399): https://lists.fd.io/g/vpp-dev/message/17399


Re: [vpp-dev] sorted list functions?

2020-09-14 Thread Dave Barach via lists.fd.io
You're welcome to dust off, test, and use the skiplist code if you like. 

AFAIK, it has never been used, battle-tested, or hardened. That's why it ended 
up in the "deprecated" directory...

Dave

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of G. Paul Ziemba
Sent: Sunday, September 13, 2020 3:52 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] sorted list functions?

I'm looking for some kind of sorted list capability in VPP (my key is 24 bits, 
value is 64 bits). I found the skiplist code
(vppinfra/slist.[ch]) which would be perfect (modulo value size) but that's 
going away apparently.

Is there some other sorted list capability that is already part of VPP?  
Requirements are the usual add/delete/search as well as fast discovery of 
adjacent nodes (previous and next) after a search (skiplist path afforded that).

thanks!
--
G. Paul Ziemba
FreeBSD unix:
12:51PM  up 105 days, 20:32, 43 users, load averages: 0.32, 0.26, 0.41
View/Reply Online (#17390): https://lists.fd.io/g/vpp-dev/message/17390


Re: [vpp-dev] VPP 2005 is crashing on stopping the VCL applications #vpp-hoststack

2020-08-17 Thread Dave Barach via lists.fd.io
You can press the “cherrypick” button as easily as Florin... Hint...

From: vpp-dev@lists.fd.io  On Behalf Of Raj Kumar
Sent: Monday, August 17, 2020 5:09 PM
To: Ayush Gautam 
Cc: Florin Coras ; vpp-dev 
Subject: Re: [vpp-dev] VPP 2005 is crashing on stopping the VCL applications 
#vpp-hoststack

Hi Florin,
Can we please have the fix[1] on "stable/2005" branch.

[1] https://gerrit.fd.io/r/c/vpp/+/28173

 Thanks,
-Raj

On Wed, Aug 5, 2020 at 7:30 PM Raj Kumar via lists.fd.io wrote:
Hi Florin,
Yes , this[1] fixed the issue.

thanks,
-Raj

On Wed, Aug 5, 2020 at 1:57 AM Florin Coras <fcoras.li...@gmail.com> wrote:
Hi Raj,

Does this [1] fix the issue?

Regards,
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/28173


On Aug 4, 2020, at 8:24 AM, Raj Kumar <raj.gauta...@gmail.com> wrote:

Hi Florin,
I tried vppcom_epoll_wait() on 2 different servers using a simple
application (with only 1 worker thread), but vppcom_epoll_wait() still does
not time out if I do not use use-mq-eventfd.
Here are the server details -
server 1 - Red Hat 7.5, vpp release - 20.01
server 2 - CentOS 8.1, vpp release - 20.05

thanks,
-Raj


On Tue, Aug 4, 2020 at 10:24 AM Florin Coras <fcoras.li...@gmail.com> wrote:
Hi Raj,

Interesting. For some reason, the message queue’s underlying 
pthread_cond_timedwait does not work in your setup. Not sure what to make of 
that, unless maybe you’re trying to epoll wait from multiple threads that share 
the same message queue. That is not supported since each thread must have its 
own message queue, i.e., all threads that call epoll should be registered as 
workers. Alternatively, some form of locking or vls, instead of vcl, should be 
used.

The downside to switching the message queue to eventfd notifications, instead 
of mutex/condvar, is that waits are inefficient, i.e., they act pretty much 
like spinlocks. Do keep that in mind.

Regards,
Florin


On Aug 4, 2020, at 6:37 AM, Raj Kumar <raj.gauta...@gmail.com> wrote:

Hi Florin,
After adding use-mq-eventfd in VCL configuration, it is working as expected.
Thanks for your help!

vcl {
  rx-fifo-size 400
  tx-fifo-size 400
  app-scope-local
  app-scope-global
  use-mq-eventfd
  api-socket-name /tmp/vpp-api.sock
}

thanks,
-Raj

On Tue, Aug 4, 2020 at 12:08 AM Florin Coras <fcoras.li...@gmail.com> wrote:
Hi Raj,

Glad to hear that issue is solved. What vcl config are you running? Did you
configure use-mq-eventfd?

Regards,
Florin


On Aug 3, 2020, at 8:33 PM, Raj Kumar <raj.gauta...@gmail.com> wrote:

Hi Florin,
This issue is resolved now. In my application, on receiving the kill signal the
main thread was sending pthread_cancel() to the child thread, and because of
that the child thread was not exiting gracefully.
I have one question: it seems that vppcom_epoll_wait(epfd, rcvEvents,
MAX_RETURN_EVENTS, 6.0) does not return after the timeout if the timeout
value is non-zero. It times out only if the timeout value is 0.
The issue that I am facing is that if there is no traffic at all ( receiver is 
just listening on the connections ) then the worker thread is not exiting as it 
is blocked by vppcom_epoll_wait().
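
A minimal sketch of the shutdown pattern this paragraph is reaching for: the worker waits with a finite timeout and re-checks a stop flag on every wake-up, so it can leave on its own instead of being cancelled. The wait primitive is abstracted behind a hypothetical wait_fn_t; in a VCL application that role would be played by vppcom_epoll_wait() with a finite timeout, and the pattern only helps when that wait actually honours its timeout (which, later in this thread, required use-mq-eventfd). fake_wait is a stand-in used purely for demonstration.

```c
#include <signal.h>

/* Set from the signal handler; sig_atomic_t is safe to write there. */
static volatile sig_atomic_t stop_requested;

static void on_terminate (int signo)
{
  (void) signo;
  stop_requested = 1;
}

/* Hypothetical wait primitive: returns the number of ready events,
   or 0 when the timeout elapses with nothing to do. */
typedef int (*wait_fn_t) (void *ctx, double timeout_seconds);

/* Worker loop: wakes at least once per timeout to re-check the flag. */
static int worker_loop (wait_fn_t wait, void *ctx, double timeout_seconds)
{
  int handled = 0;
  while (!stop_requested)
    {
      int n = wait (ctx, timeout_seconds);
      if (n > 0)
        handled += n; /* the ready events would be processed here */
    }
  /* per-thread cleanup would go here, before returning */
  return handled;
}

/* Demonstration stand-in: reports one event per call, and on the third
   call simulates an idle timeout during which the signal arrives. */
static int fake_wait (void *ctx, double timeout_seconds)
{
  int *calls = ctx;
  (void) timeout_seconds;
  if (++*calls == 3)
    {
      on_terminate (SIGTERM);
      return 0;
    }
  return 1;
}
```

The main thread would then install on_terminate with sigaction() and join the worker, rather than calling pthread_cancel() on it.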

Thanks,
-Raj



On Wed, Jul 29, 2020 at 11:23 PM Florin Coras <fcoras.li...@gmail.com> wrote:
Hi Raj,

In that case it should work. Just from the trace below it’s hard to figure out 
what exactly happened. Also, keep in mind that vcl is not thread safe, so make 
sure you’re not trying to share sessions or allow two workers to interact with 
the message queue(s) at the same time.

Regards,
Florin


On Jul 29, 2020, at 8:17 PM, Raj Kumar <raj.gauta...@gmail.com> wrote:

Hi Florin,
I am using kill to stop the application. However, the application has a kill
signal handler, and after receiving the signal it exits gracefully.
About vppcom_app_exit: I think this function is registered with atexit() inside
vppcom_app_create(), so it should be called when the application exits.
I also tried calling vppcom_app_exit() explicitly before exiting the
application, but I still see the same issue.

My application is multithreaded. Can you please suggest some cleanup functions
(vppcom functions) that I should call before exiting a thread and the main
application for a proper cleanup?
I also tried vppcom_app_destroy() before exiting the main application, but I
still see the same issue.

thanks,
-Raj

On Wed, Jul 29, 2020 at 5:34 PM Florin Coras <fcoras.li...@gmail.com> wrote:
Hi Raj,

Does stopping include a call to vppcom_app_exit or killing the applications? If 
the latter, the apps might be killed with some mutexes/spinlocks held. For now, 
we only support the former.

Regards,
Florin

> On Jul 29, 2020, at 1:49 PM, Raj Kumar <raj.gauta...@gmail.com> wrote:
>
> Hi,
> In my UDP application , I am using VPP host stack to receive packets and 
> memIf to transmit packets. 

Re: [SUSPECTED SPAM] [vpp-dev] Link third party shared libraries to vpp (plugins)

2020-08-18 Thread Dave Barach via lists.fd.io
Sure. Link your plugin against the third-party library and it will work 
instantly... See src/plugins/rdma/CMakeLists.txt for an example...

Dave

From: vpp-dev@lists.fd.io  On Behalf Of 
sachinpp...@gmail.com
Sent: Tuesday, August 18, 2020 1:25 PM
To: vpp-dev@lists.fd.io
Subject: [SUSPECTED SPAM] [vpp-dev] Link third party shared libraries to vpp 
(plugins)

Hello team,

We would like to perform some operations on buffers received in a newly created
plugin. Is it possible to attach a third-party shared library to VPP so that
its functions can be called from the plugin?

Regards,
SP
View/Reply Online (#17256): https://lists.fd.io/g/vpp-dev/message/17256


Re: [vpp-dev] communicate with External app

2020-08-28 Thread Dave Barach via lists.fd.io
Memif / libmemif.

From: vpp-dev@lists.fd.io  On Behalf Of 
sachinpp...@gmail.com
Sent: Friday, August 28, 2020 6:20 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] communicate with External app

Can someone please answer this query?

Regards,
Sachin P.
View/Reply Online (#17299): https://lists.fd.io/g/vpp-dev/message/17299


Re: [vpp-dev] clib_mem_size return wrong mem size

2020-08-28 Thread Dave Barach via lists.fd.io
clib_mem_size() returns the object’s capacity, which will be >= number of bytes 
requested...
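
A minimal sketch of why a "size" query can legitimately return more than was asked for. This is a self-contained model with hypothetical names (model_alloc, model_size, a 16-byte granule), not the vppinfra allocator: the allocator rounds each request up to its granule and remembers only the resulting capacity, so the original request size is simply not recorded anywhere.

```c
#include <stddef.h>
#include <stdlib.h>

#define GRANULE 16u /* hypothetical allocation granule */

/* Header stored just before the user pointer; it records capacity only. */
typedef union
{
  size_t capacity;
  max_align_t align_; /* keeps the user data suitably aligned */
} hdr_t;

/* Allocate at least `requested` bytes, rounded up to the granule. */
static void *model_alloc (size_t requested)
{
  size_t capacity = (requested + GRANULE - 1) / GRANULE * GRANULE;
  if (capacity == 0)
    capacity = GRANULE;
  hdr_t *h = malloc (sizeof (hdr_t) + capacity);
  if (!h)
    return NULL;
  h->capacity = capacity;
  return h + 1;
}

/* Report the object's capacity, which is >= the original request. */
static size_t model_size (const void *p)
{
  return ((const hdr_t *) p - 1)->capacity;
}

static void model_free (void *p)
{
  if (p)
    free ((hdr_t *) p - 1);
}
```

In this model a 20-byte request reports a size of 32, so an assertion of the form size == requested fails while size >= requested always holds.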


From: vpp-dev@lists.fd.io  On Behalf Of 
jiangxiaom...@outlook.com
Sent: Thursday, August 27, 2020 11:26 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] clib_mem_size return wrong mem size

Hi, experts:
 I find that clib_mem_size(...) does not return the size allocated by
clib_mem_alloc(...). I allocate 32 bytes of memory, but clib_mem_size returns
44, as in the following snippet:
u32 len, real_len;
void* p;

len = 32;
p = clib_mem_alloc(len);
real_len = clib_mem_size(p);

ASSERT(real_len == len);
Is this a bug, or is my usage of clib_mem_size wrong?
View/Reply Online (#17297): https://lists.fd.io/g/vpp-dev/message/17297


Re: [vpp-dev] clib_mem_size return wrong mem size

2020-08-29 Thread Dave Barach via lists.fd.io
The memory allocator does not track how many bytes you requested. 
clib_mem_alloc(...) returns no less than the requested number of bytes. That’s 
the API, contract, or promise depending on how you want to put it.

clib_mem_realloc(...) always allocates a fresh chunk of memory and copies the 
original data into the fresh chunk. It could use a clib_mem_size() check to 
attempt to skip the allocate-and-copy operation. If the original object is 
large enough to accommodate the realloc request, there’s no reason to do 
anything.

However: clib_mem_realloc(...) is called in exactly one place in the code base, 
and it’s known by construction that the case in question requires a new object.

Idiomatic vpp coding uses vectors instead of direct calls to the memory 
allocator. Vectors know how many elements they contain. A u8 * vector would 
likely amount to a drop-in replacement for whatever you’re doing.
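
A minimal sketch of the idea behind a length-aware byte vector, as a self-contained model. The names bvec_add, bvec_len and bvec_free are hypothetical; in vppinfra the corresponding facilities are the vec_* macros such as vec_len() and vec_add1(). The length lives in a small header in front of the data, so the caller never has to ask the allocator how big the object is.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header stored immediately before the vector data. */
typedef struct
{
  uint32_t len; /* bytes in use */
  uint32_t cap; /* bytes allocated for data */
} bvec_hdr_t;

static bvec_hdr_t *bvec_hdr (uint8_t *v) { return (bvec_hdr_t *) v - 1; }

/* Number of bytes stored; a NULL vector is an empty vector. */
static uint32_t bvec_len (uint8_t *v) { return v ? bvec_hdr (v)->len : 0; }

/* Append n bytes, growing as needed; returns the possibly moved vector. */
static uint8_t *bvec_add (uint8_t *v, const void *src, uint32_t n)
{
  uint32_t len = bvec_len (v);
  uint32_t cap = v ? bvec_hdr (v)->cap : 0;
  if (len + n > cap)
    {
      uint32_t ncap = cap ? cap : 16;
      while (ncap < len + n)
        ncap *= 2;
      bvec_hdr_t *h = realloc (v ? bvec_hdr (v) : NULL, sizeof (*h) + ncap);
      if (!h)
        abort (); /* keep the sketch simple: no recovery path */
      h->cap = ncap;
      h->len = len;
      v = (uint8_t *) (h + 1);
    }
  memcpy (v + len, src, n);
  bvec_hdr (v)->len = len + n;
  return v;
}

static void bvec_free (uint8_t *v)
{
  if (v)
    free (bvec_hdr (v));
}
```

Because growth is handled inside bvec_add, a wrapper like my_realloc() that needs the old size becomes unnecessary in this style.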

We won’t be changing the memory allocator API / contract / promise to track the 
original request size.

Dave

From: vpp-dev@lists.fd.io  On Behalf Of 
jiangxiaom...@outlook.com
Sent: Friday, August 28, 2020 11:11 PM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] clib_mem_size return wrong mem size


Hi, Dave Barach:
I want a realloc function like the one below. The function arguments don't include 
the original memory size, and clib_mem_size doesn't seem to return the right memory 
size. Is there any way to get the memory size?

void *my_realloc(void *p, size_t s)
{
  return p ? clib_mem_realloc(p, s, clib_mem_size(p)) : clib_mem_alloc(s);
}

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17301): https://lists.fd.io/g/vpp-dev/message/17301
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] ARP resolution from non-connected IP

2020-08-19 Thread Dave Barach via lists.fd.io
Configure proxy-arp.

From: vpp-dev@lists.fd.io  On Behalf Of Satya Murthy
Sent: Wednesday, August 19, 2020 9:04 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] ARP resolution from non-connected IP


Hi,

I have a query on ARP resolution.

We have a router and a VPP box connected.

We are trying to do some peering from the router’s loopback IP. This loopback IP is 
not in the connected subnet range.

Due to this, the router is initiating an ARP request with a non-connected IP as the 
source in the payload.

VPP is dropping this ARP request, saying “IP4 source address not local to 
subnet”.

On Linux and on Cisco routers, there are options to allow ARP requests from 
non-connected subnets.

Is there any workaround in VPP to allow this ARP request?

Please let us know.
--
Thanks & Regards,
Murthy
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17260): https://lists.fd.io/g/vpp-dev/message/17260
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] pkt validation

2020-09-25 Thread Dave Barach via lists.fd.io
Here's a strategy: Use "show run" to identify graph nodes which are active when 
passing traffic. Nodes of interest will have names like "ethernet-input," 
"ip4-input", etc.

Obviously, the string "ethernet-input" comes from "somewhere". In 
particular:

/* *INDENT-OFF* */
VLIB_REGISTER_NODE (ethernet_input_node) = {
  .name = "ethernet-input",
  /* Takes a vector of packets. */

};

You'll find a companion function declaration:

VLIB_NODE_FN (ethernet_input_node) (vlib_main_t * vm,
                                    vlib_node_runtime_t * node,
                                    vlib_frame_t * frame)
{
  vnet_main_t *vnm = vnet_get_main ();
  u32 *from = vlib_frame_vector_args (frame);
  u32 n_packets = frame->n_vectors;

  ethernet_input_trace (vm, node, frame);

  if (frame->flags & ETH_INPUT_FRAME_F_SINGLE_SW_IF_IDX)
    {
      ethernet_input_frame_t *ef = vlib_frame_scalar_args (frame);
      int ip4_cksum_ok = (frame->flags & ETH_INPUT_FRAME_F_IP4_CKSUM_OK) != 0;
      vnet_hw_interface_t *hi = vnet_get_hw_interface (vnm, ef->hw_if_index);
      eth_input_single_int (vm, node, hi, from, n_packets, ip4_cksum_ok);
    }
  else
    ethernet_input_inline (vm, node, from, n_packets,
                           ETHERNET_INPUT_VARIANT_ETHERNET);
  return n_packets;
}

That's the actual graph node dispatch function. At that point, it's a matter of 
"using the Force and reading the Source..."
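
As a rough picture of what "checking for malformed packets" tends to mean at the IPv4 layer, here is a minimal sketch in plain C. It is not the vpp ip4-input code. The names ip4_check_t, ip4_header_sum and ip4_validate are hypothetical, and the particular set of checks (version, header length, total length, header checksum) is an assumption about what to look for when reading the node source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
  IP4_OK = 0,
  IP4_ERR_TOO_SHORT,		/* fewer than 20 bytes available */
  IP4_ERR_VERSION,		/* version nibble is not 4 */
  IP4_ERR_HDR_LEN,		/* IHL below 5 words or beyond the buffer */
  IP4_ERR_TOTAL_LEN,		/* total length inconsistent with buffer */
  IP4_ERR_CHECKSUM		/* header checksum does not verify */
} ip4_check_t;

/* One's-complement sum over the header, folded to 16 bits.
   A header with a correct checksum field sums to 0xffff. */
uint16_t
ip4_header_sum (const uint8_t * h, size_t hdr_len)
{
  uint32_t sum = 0;
  for (size_t i = 0; i + 1 < hdr_len; i += 2)
    sum += ((uint32_t) h[i] << 8) | h[i + 1];
  while (sum >> 16)
    sum = (sum & 0xffff) + (sum >> 16);
  return (uint16_t) sum;
}

/* Validate the IPv4 header at the start of pkt, n_bytes of which are valid. */
ip4_check_t
ip4_validate (const uint8_t * pkt, size_t n_bytes)
{
  assert (pkt != NULL);
  if (n_bytes < 20)
    return IP4_ERR_TOO_SHORT;
  if ((pkt[0] >> 4) != 4)
    return IP4_ERR_VERSION;

  size_t hdr_len = (size_t) (pkt[0] & 0x0f) * 4;
  if (hdr_len < 20 || hdr_len > n_bytes)
    return IP4_ERR_HDR_LEN;

  size_t total_len = ((size_t) pkt[2] << 8) | pkt[3];
  if (total_len < hdr_len || total_len > n_bytes)
    return IP4_ERR_TOTAL_LEN;

  if (ip4_header_sum (pkt, hdr_len) != 0xffff)
    return IP4_ERR_CHECKSUM;
  return IP4_OK;
}
```

When reading the real nodes, checks of this kind usually show up as error counters, so the node's error strings are a quick index into the validation code.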

HTH... Dave


From: vpp-dev@lists.fd.io  On Behalf Of hemant via 
lists.fd.io
Sent: Thursday, September 24, 2020 8:06 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] pkt validation

Does VPP, in its ethernet, IP, and IPv6 input, include stock code to check for 
malformed packet? Any pointer to such code would help so that I can look at 
other code as well.

Thanks,

Hemant

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17507): https://lists.fd.io/g/vpp-dev/message/17507
-=-=-=-=-=-=-=-=-=-=-=-



[vpp-dev] VPP committers: VPP PTL vote

2020-09-25 Thread Dave Barach via lists.fd.io
Folks,

The self-nomination period closed yesterday. We had one self-nomination, from 
Damjan Marion. At this point, we can proceed with a vote.

I'm sure that Damjan will do a great job, so let me start:

Damjan Marion as VPP PTL: +1

Please vote +1, 0, -1. For once, the "reply-all" button is everyone's friend.

Thanks... Dave



-Original Message-
From: d...@barachs.net 
Sent: Thursday, September 17, 2020 10:32 AM
To: 'vpp-dev@lists.fd.io' ; 't...@lists.fd.io' 

Subject: Happy Trails to Me...

Folks,

I'm departing the employment rolls towards the end of next month. Although I 
intend to remain active in the fd.io vpp community as a coder, committer, and 
resident greybeard, it's time for the community to pick a new PTL.

According to the project governance document, 
https://fd.io/docs/tsc/FD.IO-Technical-Community-Document-12-12-2017.pdf:

3.2.3.1 Project Technical Leader Candidates
Candidates for the project's PTL will be derived from the Committers of the 
Project. Candidates must self-nominate.

I'd like to invite any interested vpp project committer to self-nominate for 
the PTL role. Please email vpp-dev@lists.fd.io.

Let's close the self-nomination period in one week: more specifically, by 5pm 
EDT on Thursday, September 24, 2020; committer vote to follow thereafter.

I'll be glad to answer unicast questions about the PTL role from eligible 
committers.

Thanks... Dave

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17521): https://lists.fd.io/g/vpp-dev/message/17521
-=-=-=-=-=-=-=-=-=-=-=-



[vpp-dev] vpp committers: project PTL self-nominations close Thurs 9/22/2020 at 2100 UTC

2020-09-22 Thread Dave Barach via lists.fd.io
Thanks... Dave

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17487): https://lists.fd.io/g/vpp-dev/message/17487
-=-=-=-=-=-=-=-=-=-=-=-



[vpp-dev] Silly programmer trick #47...

2020-09-25 Thread Dave Barach via lists.fd.io
Please don't be this person:

static clib_error_t *
my_cli_command_fn (vlib_main_t * vm,
                   unformat_input_t * input,
                   vlib_cli_command_t * cmd)
{
  while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
    {
      if (unformat (input, "mystuff"))
        ;
      else
        return (clib_error_return (0, "unknown input '%U'",
                                   format_unformat_error, input));
    }
  return (NULL);
}

/* *INDENT-OFF* */
VLIB_CLI_COMMAND (my_command, static) =
{
  .path = "my command",
  .function = my_cli_command_fn,
};
/* *INDENT-ON* */

Commands coded like this work fine when typed one at a time, but they blow 
chunks when scripted...

Script:
my command mystuff
comment { ouch my_cli_command_fn ate the word comment and threw up! }

Instead, wrap the while(...) loop with the unformat_line_input guitar lick:

  unformat_input_t _line_input, *line_input = &_line_input;
  clib_error_t *e = 0;

  /* Get a line of input. */
  if (!unformat_user (input, unformat_line_input, line_input))
    return 0;

  while (unformat_check_input (line_input) != UNFORMAT_END_OF_INPUT)
    {
      if (unformat (line_input, "mystuff"))
        ;
      else
        {
          e = clib_error_return (0, "unknown input '%U'",
                                 format_unformat_error, line_input);
          goto done;
        }
    }

done:
  unformat_free (line_input);
  return e;
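
The failure mode is easy to reproduce outside vpp. The sketch below is plain C and does not use the unformat API. The names first_line_len and count_unknown_tokens are hypothetical, and the only keyword the toy parser accepts is "mystuff". Handing the parser the whole remaining script makes it trip over the next command; handing it only the first line does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int
is_blank (char c)
{
  return c == ' ' || c == '\t' || c == '\n';
}

/* Length of the first line of s, excluding the newline. */
size_t
first_line_len (const char *s)
{
  assert (s != NULL);
  return strcspn (s, "\n");
}

/* Toy command parser: scans the first len bytes of s and counts the
   whitespace-separated tokens that are not the keyword "mystuff". */
size_t
count_unknown_tokens (const char *s, size_t len)
{
  size_t unknown = 0, i = 0;
  assert (s != NULL);
  while (i < len)
    {
      while (i < len && is_blank (s[i]))
	i++;
      size_t start = i;
      while (i < len && !is_blank (s[i]))
	i++;
      if (i > start
	  && !(i - start == 7 && memcmp (s + start, "mystuff", 7) == 0))
	unknown++;
    }
  return unknown;
}
```

Called with the full script length the parser sees "comment", "{" and so on as unknown input; called with first_line_len() it sees only its own arguments, which is what the unformat_line_input wrapper arranges in the real CLI.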

Thanks... Dave

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17532): https://lists.fd.io/g/vpp-dev/message/17532
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

2020-06-02 Thread Dave Barach via lists.fd.io
The code manages to access a poisoned adjacency – 0x131313 fill pattern – 
copying Neale for an opinion.

D.

From: vpp-dev@lists.fd.io  On Behalf Of Rajith PR via 
lists.fd.io
Sent: Tuesday, June 2, 2020 10:00 AM
To: vpp-dev 
Subject: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

Hello All,

In VPP 19.08 we are seeing a crash while accessing the load_balance_pool in 
the load_balance_get() function. This happens after enabling worker threads.
As such the FIB programming is happening in the main thread and in one of the 
worker threads we see this crash.
Also, this is seen when we scale to 300K+ ipv4 routes.

Here is the complete stack,

Thread 10 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fbe4aa8e700 (LWP 333)]
0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313, i=61) at 
/home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
201  return i0 < vec_len (ai) && 0 != ((ai[i0] >> i1) & 1);

Thread 10 (Thread 0x7fbe4aa8e700 (LWP 333)):
#0  0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313, i=61) at 
/home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
#1  0x7fbef10676a8 in load_balance_get (lbi=61) at 
/home/ubuntu/Scale/libvpp/src/vnet/dpo/load_balance.h:222
#2  0x7fbef106890c in ip4_lookup_inline (vm=0x7fbe8a5aa080, 
node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40) at 
/home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.h:369
#3  0x7fbef1068ead in ip4_lookup_node_fn_avx2 (vm=0x7fbe8a5aa080, 
node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40)
at /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.c:95
#4  0x7fbef0c6afec in dispatch_node (vm=0x7fbe8a5aa080, 
node=0x7fbe8b3fd380, type=VLIB_NODE_TYPE_INTERNAL, 
dispatch_state=VLIB_NODE_STATE_POLLING,
frame=0x7fbe8a5edb40, last_time_stamp=381215594286358) at 
/home/ubuntu/Scale/libvpp/src/vlib/main.c:1207
#5  0x7fbef0c6b7ad in dispatch_pending_node (vm=0x7fbe8a5aa080, 
pending_frame_index=2, last_time_stamp=381215594286358)
at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1375
#6  0x7fbef0c6d3f0 in vlib_main_or_worker_loop (vm=0x7fbe8a5aa080, 
is_main=0) at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1826
#7  0x7fbef0c6dc73 in vlib_worker_loop (vm=0x7fbe8a5aa080) at 
/home/ubuntu/Scale/libvpp/src/vlib/main.c:1934
#8  0x7fbef0cac791 in vlib_worker_thread_fn (arg=0x7fbe8de2a340) at 
/home/ubuntu/Scale/libvpp/src/vlib/threads.c:1754
#9  0x7fbef092da48 in clib_calljmp () from 
/home/ubuntu/Scale/libvpp/build-root/install-vpp_debug-native/vpp/lib/libvppinfra.so.1.0.1
#10 0x7fbe4aa8dec0 in ?? ()
#11 0x7fbef0ca700c in vlib_worker_thread_bootstrap_fn (arg=0x7fbe8de2a340) 
at /home/ubuntu/Scale/libvpp/src/vlib/threads.c:573
Thanks in Advance,
Rajith
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#16619): https://lists.fd.io/g/vpp-dev/message/16619
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Crash in vlib_add_trace with multi worker mode

2020-06-02 Thread Dave Barach via lists.fd.io
Unless you fully communicate your configuration, you’ll have to debug the issue 
yourself. Are you using the standard handoff mechanism, or a mechanism of your 
own design?

The handoff demo plugin seems to work fine... See 
../src/examples/handoffdemo/{README.md, node.c} etc.

DBGvpp# sh trace

--- Start of thread 0 vpp_main ---
No packets in trace buffer
--- Start of thread 1 vpp_wk_0 ---
Packet 1

00:00:19:259770: pg-input
  stream x, 128 bytes, sw_if_index 0
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x100
  : 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d
  0020: 
  0040: 
  0060: 
00:00:19:259851: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 2

00:00:19:259770: pg-input
  stream x, 128 bytes, sw_if_index 0
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x101
  : 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d
  0020: 
  0040: 
  0060: 
00:00:19:259851: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 3

00:00:19:259770: pg-input
  stream x, 128 bytes, sw_if_index 0
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x102
  : 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d
  0020: 
  0040: 
  0060: 
00:00:19:259851: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 4

00:00:19:259770: pg-input
  stream x, 128 bytes, sw_if_index 0
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x103
  : 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d
  0020: 
  0040: 
  0060: 
00:00:19:259851: handoffdemo-1
  HANDOFFDEMO: current thread 1

Packet 5

00:00:19:259770: pg-input
  stream x, 128 bytes, sw_if_index 0
  current data 0, length 128, buffer-pool 0, ref-count 1, trace handle 0x104
  : 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d
  0020: 
  0040: 
  0060: 
00:00:19:259851: handoffdemo-1
  HANDOFFDEMO: current thread 1

--- Start of thread 2 vpp_wk_1 ---
Packet 1

00:00:19:259879: handoff_trace
  HANDED-OFF: from thread 1 trace index 0
00:00:19:259879: handoffdemo-2
  HANDOFFDEMO: current thread 2
00:00:19:259930: error-drop
  rx:local0
00:00:19:259967: drop
  handoffdemo-2: completed packets

Packet 2

00:00:19:259879: handoff_trace
  HANDED-OFF: from thread 1 trace index 1
00:00:19:259879: handoffdemo-2
  HANDOFFDEMO: current thread 2
00:00:19:259930: error-drop
  rx:local0
00:00:19:259967: drop
  handoffdemo-2: completed packets

Packet 3

00:00:19:259879: handoff_trace
  HANDED-OFF: from thread 1 trace index 2
00:00:19:259879: handoffdemo-2
  HANDOFFDEMO: current thread 2
00:00:19:259930: error-drop
  rx:local0
00:00:19:259967: drop
  handoffdemo-2: completed packets

Packet 4

00:00:19:259879: handoff_trace
  HANDED-OFF: from thread 1 trace index 3
00:00:19:259879: handoffdemo-2
  HANDOFFDEMO: current thread 2
00:00:19:259930: error-drop
  rx:local0
00:00:19:259967: drop
  handoffdemo-2: completed packets

Packet 5

00:00:19:259879: handoff_trace
  HANDED-OFF: from thread 1 trace index 4
00:00:19:259879: handoffdemo-2
  HANDOFFDEMO: current thread 2
00:00:19:259930: error-drop
  rx:local0
00:00:19:259967: drop
  handoffdemo-2: completed packets

DBGvpp#

From: vpp-dev@lists.fd.io  On Behalf Of Satya Murthy
Sent: Tuesday, June 2, 2020 7:11 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Crash in vlib_add_trace with multi worker mode

Hi ,

We are seeing a crash while doing add_trace for a vlib_buffer in our graph node.

#0 0x74ee0feb in raise () from /lib64/libc.so.6
#1 0x74ecb5c1 in abort () from /lib64/libc.so.6
#2 0x0040831c in os_panic () at 
/fdio/src/fdio.1810/src/vpp/vnet/main.c:368
#3 0x75f28f2f in debugger () at 
/fdio/src/fdio.1810/src/vppinfra/error.c:84
#4 0x75f2936a in _clib_error (how_to_die=2, 

Re: [vpp-dev] AMD Epyc and vpp.

2020-07-22 Thread Dave Barach via lists.fd.io
+1 ddio makes a first-order perf difference...

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
lists.fd.io
Sent: Wednesday, July 22, 2020 4:04 AM
To: Christian Hopps 
Cc: vpp-dev 
Subject: Re: [vpp-dev] AMD Epyc and vpp.



> On 22 Jul 2020, at 02:33, Christian Hopps <cho...@chopps.org> wrote:
>
> Hi vpp-dev,
>
> Has anyone done performance analysis with the new AMD epyc processors and VPP?
>
> Just naively running my normal build shows a 3GHz Epyc machine 
> under-performing a 2.1GHz intel xeon.


AFAIK AMD doesn’t have DDIO so I am not surprised.

—
Damjan

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17033): https://lists.fd.io/g/vpp-dev/message/17033
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP getting hanged after consecutive VAPI requests

2020-08-07 Thread Dave Barach via lists.fd.io
Looks like you forgot to write up the issue, to name one problem.

Please refer to 
https://fd.io/docs/vpp/master/troubleshooting/reportingissues/reportingissues.html.

From: vpp-dev@lists.fd.io  On Behalf Of Chinmaya Aggarwal
Sent: Friday, August 7, 2020 7:54 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP getting hanged after consecutive VAPI requests

 Can anyone please suggest what is going wrong here?
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17161): https://lists.fd.io/g/vpp-dev/message/17161
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Unable to create fib table and add interfaces in vpp 20.09

2020-08-12 Thread Dave Barach via lists.fd.io
“... create fib table and add interface to it.”

You need to actually create the table:

vpp# ip table 1
vpp# set int  ip table 1

From: vpp-dev@lists.fd.io  On Behalf Of techi...@gmail.com
Sent: Wednesday, August 12, 2020 8:47 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Unable to create fib table and add interfaces in vpp 20.09

Hello Team,

I have two physical NIC attached to vpp-dpdk. vpp throws following error when I 
try to create fib table and add interface to it.

set interface ip table GigabitEthernet4/0/0 1 ---> set interface ip table: 
no such table 1

"set interface ip table" command should create table for given table id 
according to documentation.
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17199): https://lists.fd.io/g/vpp-dev/message/17199
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Unable to create fib table and add interfaces in vpp 20.09

2020-08-12 Thread Dave Barach via lists.fd.io
Minus the typo this time:

vpp# ip table 1
vpp# set int ip table  1

From: vpp-dev@lists.fd.io  On Behalf Of Dave Barach via 
lists.fd.io
Sent: Wednesday, August 12, 2020 9:08 AM
To: techi...@gmail.com; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Unable to create fib table and add interfaces in vpp 
20.09

“... create fib table and add interface to it.”

You need to actually create the table:

vpp# ip table 1
vpp# set int  ip table 1

From: vpp-dev@lists.fd.io On Behalf Of techi...@gmail.com
Sent: Wednesday, August 12, 2020 8:47 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Unable to create fib table and add interfaces in vpp 20.09

Hello Team,

I have two physical NIC attached to vpp-dpdk. vpp throws following error when I 
try to create fib table and add interface to it.

set interface ip table GigabitEthernet4/0/0 1 ---> set interface ip table: 
no such table 1

"set interface ip table" command should create table for given table id 
according to documentation.
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17200): https://lists.fd.io/g/vpp-dev/message/17200
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] queue drain warning on calling vapi_disconnect()

2020-07-31 Thread Dave Barach via lists.fd.io
For whatever reason, your application didn’t process 3x memclnt_keepalive 
messages. Vpp sends one of these messages every 10 seconds.

vpp# sh api message-table
ID   Name

21   memclnt_keepalive


The code in question runs after the client sends a MEMCLNT_DELETE message. 
Purpose: make sure that we don’t leak message buffers...

  /* drain the queue */
  if (ntohs (rp->_vl_msg_id) != VL_API_MEMCLNT_DELETE_REPLY)
    {
      clib_warning ("queue drain: %d", ntohs (rp->_vl_msg_id));
      vl_msg_api_handler ((void *) rp);
      continue;
    }
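
To make the control flow concrete, here is a minimal sketch of the same drain-until-reply idea in plain C, operating on an array of message IDs instead of a shared-memory queue. The name drain_until_reply is hypothetical, and nothing here is taken from the vpp client library beyond the idea shown in the fragment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk the pending message IDs in arrival order until reply_id shows up.
   Every message seen before it counts as "drained"; the real code logs a
   warning and hands such a message to its normal handler.
   Returns 1 if the reply was found, 0 if the queue ran out first. */
int
drain_until_reply (const uint16_t * ids, size_t n, uint16_t reply_id,
		   size_t * n_drained)
{
  assert (ids != NULL && n_drained != NULL);
  *n_drained = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (ids[i] == reply_id)
	return 1;
      (*n_drained)++;
    }
  return 0;
}
```

Three queued keepalives (ID 21 in the table above) followed by the delete reply give a drained count of three, which matches the three warnings in the report.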

HTH... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Chinmaya Aggarwal
Sent: Friday, July 31, 2020 12:42 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] queue drain warning on calling vapi_disconnect()

Hi,
We are integrating VAPI in our application. We created connection to vapi using 
vapi_connect() and everything works fine for us. But during termination, when 
we call vapi_disconnect() we get the following warning in our logs multiple 
times :-
vl_client_disconnect:323: queue drain: 21
vl_client_disconnect:323: queue drain: 21
vl_client_disconnect:323: queue drain: 21

Can anyone suggest why this is happening?

Thanks and Regards,
Chinmaya Agarwal.
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#17120): https://lists.fd.io/g/vpp-dev/message/17120
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] classify table, mask flow-label, version, and traffic-class NOT working #classify #vnet

2020-06-30 Thread Dave Barach via lists.fd.io
Patch on the way, thanks for the report.

From: vpp-dev@lists.fd.io  On Behalf Of mauricio.solisjr 
via lists.fd.io
Sent: Tuesday, June 30, 2020 7:53 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] classify table, mask flow-label, version, and traffic-class 
NOT working #classify #vnet

Hi
I've been trying to add a classify table using the following CLI:
classify table miss-next ip6-node ip6-lookup mask l3 ip6 flow-label

I noticed that in src/vnet/classify/vnet_classify.c the following lines cause 
the function "uword unformat_ip6_mask(...)" to return earlier than expected and 
not take "traffic-class", "flow-label", and "version" as mask l3 ip6 inputs:

#define _(a) found_something += a;
  foreach_ip6_proto_field;
#undef _

  if (found_something == 0)
    return 0;

Even though "flow-label" is "something", we still return 0.

Is this expected behavior? I recompiled and added the following lines above the 
mentioned code in order to avoid the return 0:

  found_something += version;
  found_something += traffic_class;
  found_something += flow_label;
Regards,
Mauricio
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#16845): https://lists.fd.io/g/vpp-dev/message/16845
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] name filter fix in 1908

2020-06-30 Thread Dave Barach via lists.fd.io
Dear Chris,

In looking at the patch, I have a question: the API version number changed 
despite the fact that the API itself was unchanged.

Should we revert the API version number bump and then cherry-pick to 19.08? 

Looping in Ole for an opinion...

Thanks... Dave 

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Christian Hopps
Sent: Tuesday, June 30, 2020 3:10 AM
To: vpp-dev 
Cc: Christian Hopps 
Subject: [vpp-dev] name filter fix in 1908

Could this fix: https://gerrit.fd.io/r/c/vpp/+/23140

be pulled into stable/1908?

It applies clean after adapting the version number change to be compatible with 
1908 branch.

Thanks,
Chris.
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#16844): https://lists.fd.io/g/vpp-dev/message/16844
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Vectors/node and packet size

2020-07-01 Thread Dave Barach via lists.fd.io
In order for the statistics to be accurate, please be sure to do the following:

Start traffic... “clear run”... wait a while to accumulate data... “show run”

Otherwise, the statistics will probably include a huge amount of dead airtime, 
data from previous runs, etc.

HTH... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Jeremy Brown via 
lists.fd.io
Sent: Monday, June 29, 2020 12:22 PM
To: vpp-dev@lists.fd.io; Dany Gregoire 
Subject: [vpp-dev] Vectors/node and packet size

Greetings,

This is my first post to the forum, so if this is not the right place for this 
post please let me know.

I have a question on VPP performance. We are running two test cases, limited 
to a single thread on one core in order to reduce as many variables 
as we can. Between the two test cases, the only thing that changes is the size 
of the packets arriving at VPP.

Using a 64 byte packet, we see a vectors/node of ~80. Simply changing that 
packet size to 1400 we see the same vectors/node fall down to ~2.

This is regardless of pps… there seems to be a non-linear decrease in 
vectors/node with increasing packet size. I was wondering if anyone had noticed 
some similar behavior.


64-byte packets

Thread 1 vpp_wk_0 (lcore 2)
Time 98.9, average vectors/node 80.35, last 128 main loops 0.00 per node 0.00
  vector rates in 1.2643e5, out 1.2643e5, drop 0.e0, punt 2.0228e-2
Name                             State          Calls    Vectors  Suspends   Clocks  Vectors/Call
VirtualFuncEthernet88/10/4-out   active         90915    6249981         0   1.06e1         68.75
VirtualFuncEthernet88/10/4-tx    active         90915    6249981         0   4.06e1         68.75
VirtualFuncEthernet88/11/5-out   active         73270    6249981         0   9.27e0         85.30
VirtualFuncEthernet88/11/5-tx    active         73270    6249981         0   4.05e1         85.30
arp-input                        active             2          2         0   3.51e4          1.00
dpdk-input                       polling   1166129337   12499964         0   1.38e4           .01
error-punt                       active             2          2         0   5.56e3          1.00
ethernet-input                   active             2          2         0   1.47e4          1.00
gtpu4-encap                      active         90914    6249980         0   1.01e2         68.75
gtpu4-input                      active         73270    6249981         0   7.29e1         85.30
interface-output                 active             2          2         0   2.20e3          1.00
ip4-input-no-checksum            active        145570   12499962         0   2.22e1         85.87
ip4-load-balance                 active         90914    6249980         0   1.77e1         68.75
ip4-local                        active         73272    6249983         0   2.45e1         85.29
ip4-lookup                       active        218840   18749943         0   3.79e1         85.68
ip4-punt                         active             2          2         0   1.27e3          1.00
ip4-rewrite                      active        236482   18749940         0   2.75e1         79.29
ip4-udp-lookup                   active         73270    6249981         0   2.44e1         85.30

1400-byte packets

Thread 1 vpp_wk_0 (lcore 2)
Time 102.1, average vectors/node 2.37, last 128 main loops 0.00 per node 0.00
  vector rates in 1.1841e5, out 1.1438e5, drop 4.0334e3, punt 1.9588e-2
Name                             State          Calls    Vectors  Suspends   Clocks  Vectors/Call
VirtualFuncEthernet88/10/4-out   active       2815250    5838981         0   8.18e1          2.07
VirtualFuncEthernet88/10/4-tx    active       2815250    5838981         0   1.25e2          2.07
VirtualFuncEthernet88/11/5-out   active       2765634    5839804         0   8.42e1          2.11
VirtualFuncEthernet88/11/5-tx    active       2765634    5839804         0   2.32e2          2.11
arp-input                        active             9        825         0   2.25e3         91.67
dpdk-input                       polling   1136982388   12089787         0   1.44e4           .01
error-drop                       active        397116     411823         0   1.37e2          1.04
error-punt                       active             2          2

Re: [vpp-dev] p2p interfaces and clib_bihash_init (and oom)

2020-07-02 Thread Dave Barach via lists.fd.io
Which vpp version are you using?

The code looks substantially different in master/latest. In particular, you 
must not have this patch...:

Author: Neale Ranns   2020-05-25 05:09:36
Committer: Ole Trøan   2020-05-26 10:54:23
Parent: 080aa503b23a90ed43d7c0b2bc68e2726190a990 (vcl: do not propagate epoll 
events if session closed)
Child:  1bf6df4ff9c83bac1fc329a4b5c4d7061f13720a (fib: Fix interpose source 
reactivate)
Branches: master, remotes/origin/master
Follows: v20.09-rc0
Precedes: WORKS_05_27_2020

fib: Use basic hash for adjacency neighbour table

Type: improvement

a bihash per-interface used too much memory.

Change-Id: I447bb66c0907e1632fa5d886a3600e518663c39e
Signed-off-by: Neale Ranns 


From: vpp-dev@lists.fd.io  On Behalf Of Stanislav Zaikin
Sent: Thursday, July 2, 2020 12:22 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] p2p interfaces and clib_bihash_init (and oom)

Hello folks,

I've tried to set up vpp to handle many pppoe connections. But I faced OOM 
issue:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x767b3899 in __GI_abort () at abort.c:79
#2  0xcef2 in os_panic () at 
/home/zstas/vpp_gerrit/src/vpp/vnet/main.c:366
#3  0x76a51183 in os_out_of_memory () at 
/home/zstas/vpp_gerrit/src/vppinfra/unix-misc.c:222
#4  0x77994f1e in alloc_aligned_24_8 (h=0x7fffb85880c0, nbytes=64) at 
/home/zstas/vpp_gerrit/src/vppinfra/bihash_template.c:60
#5  0x77994fe0 in clib_bihash_instantiate_24_8 (h=0x7fffb85880c0) at 
/home/zstas/vpp_gerrit/src/vppinfra/bihash_template.c:86
...
#9  0x77a02fa5 in adj_nbr_insert (nh_proto=FIB_PROTOCOL_IP4, 
link_type=VNET_LINK_IP4, nh_addr=0x77c3da60 , sw_if_index=1365, 
adj_index=1351) at /home/zstas/vpp_gerrit/src/vnet/adj/adj_nbr.c:83
#10 0x77a0345b in adj_nbr_alloc (nh_proto=FIB_PROTOCOL_IP4, 
link_type=VNET_LINK_IP4, nh_addr=0x77c3da60 , sw_if_index=1365) 
at /home/zstas/vpp_gerrit/src/vnet/adj/adj_nbr.c:200
...
#22 0x7fffb018033d in vnet_pppoe_add_del_session (a=0x7fffb7420d10, 
sw_if_indexp=0x7fffb7420cdc) at 
/home/zstas/vpp_gerrit/src/plugins/pppoe/pppoe.c:418

Even though a PPPoE interface is a P2P interface, there is logic in vpp that 
creates a pretty big adj table:

BV(clib_bihash_init) (adj_nbr_tables[nh_proto][sw_if_index],
                      "Adjacency Neighbour table",
                      ADJ_NBR_DEFAULT_HASH_NUM_BUCKETS,
                      ADJ_NBR_DEFAULT_HASH_MEMORY_SIZE);

At first, I tried to fix it by allocating a smaller table:

int numbuckets = ADJ_NBR_DEFAULT_HASH_NUM_BUCKETS;
int memsize = ADJ_NBR_DEFAULT_HASH_MEMORY_SIZE;
if( vnet_sw_interface_is_p2p( vnet_get_main(), sw_if_index ) == 1 ) {
    numbuckets = 4;
    memsize = 32 << 8;
}
BV(clib_bihash_init) (adj_nbr_tables[nh_proto][sw_if_index],
                      "Adjacency Neighbour table",
                      numbuckets,
                      memsize);

But I saw that huge pages are still consumed every time a PPPoE connection 
comes up. I looked into the code of the alloc_aligned function, and it seems to 
me that a new memory page is allocated in any case when a new hash table is 
initialized. How can we deal with situations where we have hundreds of 
thousands of interfaces? Is there a way to prevent this behavior? 
Can we allocate an adj table in some kind of memory pool, or keep one adj table 
for all p2p interfaces?

--
Best regards
Stanislav Zaikin
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#16873): https://lists.fd.io/g/vpp-dev/message/16873
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Replacing master/slave nomenclature

2020-07-09 Thread Dave Barach via lists.fd.io
Looping in the technical steering committee...

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Stephen Hemminger
Sent: Thursday, July 2, 2020 7:02 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Replacing master/slave nomenclature

Is the VPP project addressing the use of master/slave nomenclature in the code 
base, documentation and CLI?  We are doing this for DPDK and it would be good 
if the replacement wording used in DPDK matched the wording used in FD.io 
projects.

Particularly problematic is the use of master/slave in bonding.
This seems to be a leftover from Linux, since none of the commercial products 
use that terminology and it is not present in 802.1AX standard.

The IEEE and IETF are doing an across the board look at these terms in 
standards.
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16925): https://lists.fd.io/g/vpp-dev/message/16925
Mute This Topic: https://lists.fd.io/mt/75399929/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] vpp-20.01/src/vlib/threads.c:1408 (vlib_worker_thread_barrier_sync_int) assertion `vlib_get_thread_index () == 0'

2020-07-09 Thread Dave Barach via lists.fd.io
Do not call vlib_worker_thread_barrier_sync() on a worker thread.
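
To illustrate the rule above, here is a minimal sketch (not VPP code) of the 
usual pattern: a worker hands barrier-protected work over to the main thread 
instead of taking the barrier itself. Everything in it (current_thread_index, 
connect_from_any_thread, main_thread_poll, the one-slot queue) is a 
hypothetical stand-in so that the example compiles on its own. In VPP the 
thread check would be vlib_get_thread_index (), and the hand-off would go 
through whatever main-thread RPC mechanism your version provides.

```c
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles without VPP headers. */
typedef void (*main_thread_fn_t) (void *arg);

static int current_thread_index;	/* 0 = main thread, >0 = worker */
static int barrier_held;		/* models the worker barrier */
static main_thread_fn_t pending_fn;	/* one-slot "RPC queue" to main */
static void *pending_arg;

/* Models the work that needs the barrier, e.g. a connect call. */
static void
connect_under_barrier (void *arg)
{
  int *n_connected = arg;

  barrier_held = 1;	/* vlib_worker_thread_barrier_sync (vm) goes here */
  (*n_connected)++;
  barrier_held = 0;	/* vlib_worker_thread_barrier_release (vm) */
}

/* Returns 1 if the work ran immediately, 0 if it was deferred to main. */
static int
connect_from_any_thread (int *n_connected)
{
  if (current_thread_index != 0)
    {
      /* On a worker: never touch the barrier, just queue the request. */
      pending_fn = connect_under_barrier;
      pending_arg = n_connected;
      return 0;
    }
  connect_under_barrier (n_connected);
  return 1;
}

/* Called from the main thread's loop: run what a worker handed over. */
static void
main_thread_poll (void)
{
  if (pending_fn)
    {
      pending_fn (pending_arg);
      pending_fn = NULL;
    }
}
```

When called with current_thread_index != 0, the request is only queued; the 
barrier is taken later, when main_thread_poll () runs on thread 0.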

From: vpp-dev@lists.fd.io  On Behalf Of ais...@gmail.com
Sent: Thursday, July 9, 2020 1:44 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] vpp-20.01/src/vlib/threads.c:1408 
(vlib_worker_thread_barrier_sync_int) assertion `vlib_get_thread_index () == 0'

Hi,
I am using vpp20.01 on multicore env running on a VM and getting a crash:
(vlib_worker_thread_barrier_sync_int) assertion `vlib_get_thread_index () == 0' 
fails

Startup.conf snippet:
cpu {
  main-core 0
  corelist-workers 1
}

In code I am trying something like this:
clib_error_t *
clients_connect (vlib_main_t * vm, u32 n_clients, char *uri)
{
  vnet_connect_args_t _a, *a = &_a;
  int i, rv;

  clib_memset (a, 0, sizeof (*a));

  for (i = 0; i < n_clients; i++)
    {
      a->uri = (char *) uri;
      a->api_context = i;
      a->app_index = my_app_index;

      vlib_worker_thread_barrier_sync (vm);
      if ((rv = vnet_connect_uri (a)))
        {
          vlib_worker_thread_barrier_release (vm);
          return clib_error_return (0, "connect returned: %d", rv);
        }
      vlib_worker_thread_barrier_release (vm);
    }
  return 0;
}

Can someone let me know why the assert is failing?

Regards,
Aishwarya
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16922): https://lists.fd.io/g/vpp-dev/message/16922
Mute This Topic: https://lists.fd.io/mt/75392381/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Observing a crash in vpp-20.05

2020-07-06 Thread Dave Barach via lists.fd.io
The backtrace appears to indicate that ipv6 link-level traffic is involved. 
It’s likely that the interface corresponding to sw_if_index 2 isn’t 
ipv6-enabled.

Begs the question why the code wipes out. Copying Neale, but he is on leave at 
the moment. Please enable ipv6 on the interface.

It would be useful to run a debug image, since it would almost certainly ASSERT 
here rather than dereferencing a NULL pointer which causes signal 11 (SIGSEGV):

u32
ip6_ll_fib_get (u32 sw_if_index)
{
  ASSERT (vec_len (ip6_ll_table.ilt_fibs) > sw_if_index);

  return (ip6_ll_table.ilt_fibs[sw_if_index]);
}
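
For illustration, a minimal sketch of why the two image types fail 
differently. In a release image the ASSERT compiles away, so an out-of-range 
sw_if_index simply indexes a vector that is too short (or NULL). The 
bounds-checked lookup below performs the same test explicitly. The names 
ll_table_t, ll_fib_get_checked and LL_FIB_INDEX_INVALID are hypothetical and 
not part of VPP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no link-local FIB for this interface". */
#define LL_FIB_INDEX_INVALID ((uint32_t) ~0)

/* Hypothetical stand-in for a per-interface vector of FIB indices. */
typedef struct
{
  uint32_t *fibs;	/* NULL until ipv6 is enabled on some interface */
  size_t n_fibs;	/* number of valid entries in fibs */
} ll_table_t;

/* Same lookup as above, but the range check survives in release builds. */
static uint32_t
ll_fib_get_checked (const ll_table_t * t, uint32_t sw_if_index)
{
  if (t->fibs == NULL || sw_if_index >= t->n_fibs)
    return LL_FIB_INDEX_INVALID;

  return t->fibs[sw_if_index];
}
```

A caller would then drop the packet (or count an error) when the sentinel 
comes back, rather than dereferencing the result.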

In future, please try to format backtraces so we can read them without undue 
head-scratching.

HTH... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Amit Mehra
Sent: Monday, July 6, 2020 5:58 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Observing a crash in vpp-20.05

Hi,

I am running some light ipv4 traffic around 5K pps and observing a core with 
the following bt
Program terminated with signal 6, Aborted.
#0 0x2b838f53f387 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install OPWVs11-8.1-el7.x86_64
(gdb) bt
#0 0x2b838f53f387 in raise () from /lib64/libc.so.6
#1 0x2b838f540a78 in abort () from /lib64/libc.so.6
#2 0x55deea85617e in os_exit (code=code@entry=1)
    at /bfs-build/build-area.42/builds/LinuxNBngp_8.X_RH7/2020-07-02-1702/third-party/vpp_2005/vpp_2005/src/vpp/vnet/main.c:390
#3 0x2b838de26716 in unix_signal_handler (signum=11, si=, uc=)
    at /bfs-build/build-area.42/builds/LinuxNBngp_8.X_RH7/2020-07-02-1702/third-party/vpp_2005/vpp_2005/src/vlib/unix/main.c:187
#4
#5 0x2b838d434479 in ip6_ll_fib_get (sw_if_index=2)
    at /bfs-build/build-area.42/builds/LinuxNBngp_8.X_RH7/2020-07-02-1702/third-party/vpp_2005/vpp_2005/src/vnet/ip/ip6_ll_table.c:32
#6 0x2b838d7c4904 in ip6_ll_dpo_inline (frame=0x2b8397edf880, 
    node=0x2b83988669c0, vm=0x2b8397ca9040)
    at /bfs-build/build-area.42/builds/LinuxNBngp_8.X_RH7/2020-07-02-1702/third-party/vpp_2005/vpp_2005/src/vnet/dpo/ip6_ll_dpo.c:132
#7 ip6_ll_dpo_switch (vm=0x2b8397ca9040, node=0x2b83988669c0, 
    frame=0x2b8397edf880)
    at /bfs-build/build-area.42/builds/LinuxNBngp_8.X_RH7/2020-07-02-1702/third-party/vpp_2005/vpp_2005/src/vnet/dpo/ip6_ll_dpo.c:170
#8 0x2b838dddeec7 in dispatch_node (last_time_stamp=, 
    frame=0x2b8397edf880, dispatch_state=VLIB_NODE_STATE_POLLING, 
    type=VLIB_NODE_TYPE_INTERNAL, node=0x2b83988669c0, vm=0x2b8397ca9040)
    at /bfs-build/build-area.42/builds/LinuxNBngp_8.X_RH7/2020-07-02-1702/third-party/vpp_2005/vpp_2005/src/vlib/main.c:1235
#9 dispatch_pending_node (vm=vm@entry=0x2b8397ca9040, 
    pending_frame_index=pending_frame_index@entry=4, last_time_stamp=)
    at /bfs-build/build-area.42/builds/LinuxNBngp_8.X_RH7/2020-07-02-1702/third-party/vpp_2005/vpp_2005/src/vlib/main.c:1403
#10 0x2b838dde00bf in vlib_main_or_worker_loop (is_main=0, 
    vm=0x2b8397ca9040)
    at /bfs-build/build-area.42/builds/LinuxNBngp_8.X_RH7/2020-07-02-1702/third-party/vpp_2005/vpp_2005/src/vlib/main.c:1862
#11 vlib_worker_loop (vm=0x2b8397ca9040)
    at /bfs-build/build-area.42/builds/LinuxNBngp_8.X_RH7/2020-07-02-1702/third-party/vpp_2005/vpp_2005/src/vlib/main.c:1996
#12 0x2b838e8c5cac in clib_calljmp ()
    from /opt/opwv/S11/8.1/tools/vpp/lib/libvppinfra.so.20.05
#13 0x2b85ca3b3c40 in ?? ()
#14 0x2b8411e3107a in eal_thread_loop (arg=)
    at /bfs-build/build-area.42/builds/LinuxNBngp_8.X_RH7/2020-07-02-1702/third-party/vpp_2005/vpp_2005/build-root/build-vpp-native/external/dpdk-20.02/lib/librte_eal/linux/eal/eal_thread.c:153
#15 0x00010d0c in ?? ()

Is this a known issue in vpp-20.05?

Regards
Amit
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16887): https://lists.fd.io/g/vpp-dev/message/16887
Mute This Topic: https://lists.fd.io/mt/75329694/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Query on Performance with pools

2020-07-07 Thread Dave Barach via lists.fd.io
“perf top -p `pidof vpp`” and go figure out what’s happening. Drill down on 
your node dispatch function, and look for stalls.

What you’re describing sounds like a non-functional prefetch strategy or other 
coding error.

Since we can’t look at the code, that’s about all we have to suggest...

Dave

From: vpp-dev@lists.fd.io  On Behalf Of Satya Murthy
Sent: Tuesday, July 7, 2020 10:37 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Query on Performance with pools

Hi ,

We are seeing following observations when we do performance tests with our 
plugin/graphnodes.

1. When we have 1 subscriber session, our custom-graph-node takes 1.35e1 cycles 
 ( 60 vec/call )
2. When we have 200 subscriber sessions, the same custom-graph-node takes 1.2e2 
cycles ( same 60 vec/call ).

The difference between 1.35e1 and 1.2e2 seems pretty high, and, with more 
subscribers in picture, this will degrade performance a lot.
Also, this clearly shows that our graph node implementation is not doing enough 
prefetching.

Our pseudo-code is something like this:

Quad loop for buffers:
  1. prefetch buffer headers / buffer data
  2. get session-top object from pool
  3. get session-leaf-1 from session-top object
  4. get session-leaf-2 from session-leaf-1 object

We are only doing prefetching in step 1.
How can we do prefetching for steps 2, 3 and 4?

When we have objects in a pool which are scattered across memory, how can we 
take advantage of data prefetching?
Any pointers / hints on how to handle this, please.
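
One possible way to cover steps 2 to 4 above is a staggered (software 
pipelined) prefetch: each dependent object is prefetched one iteration before 
the pointer stored inside it is needed. The sketch below is only an 
illustration under stated assumptions. The types session_top_t, leaf1_t and 
leaf2_t and the function process_burst are hypothetical, and 
__builtin_prefetch is the GCC/Clang builtin rather than a VPP macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-level session layout, as in the pseudo-code above. */
typedef struct
{
  uint64_t pkts;
} leaf2_t;

typedef struct
{
  leaf2_t *leaf2;
} leaf1_t;

typedef struct
{
  leaf1_t *leaf1;
} session_top_t;

/* Count one packet per entry of session_index[] in its leaf-2 object. */
static void
process_burst (session_top_t * pool, const uint32_t * session_index,
	       size_t n_pkts)
{
  for (size_t i = 0; i < n_pkts; i++)
    {
      /* 3 packets ahead: only the pool index is known, fetch session-top. */
      if (i + 3 < n_pkts)
	__builtin_prefetch (&pool[session_index[i + 3]]);

      /* 2 ahead: session-top was requested last iteration, so reading its
         leaf1 pointer should be cheap (except during warm-up). */
      if (i + 2 < n_pkts)
	__builtin_prefetch (pool[session_index[i + 2]].leaf1);

      /* 1 ahead: leaf-1 was requested last iteration, fetch leaf-2. */
      if (i + 1 < n_pkts)
	__builtin_prefetch (pool[session_index[i + 1]].leaf1->leaf2);

      /* Current packet: all three objects were requested earlier. */
      pool[session_index[i]].leaf1->leaf2->pkts++;
    }
}
```

The prefetch distances (3, 2, 1) are guesses and would need tuning with perf; 
in a quad loop the same idea applies per group of four buffers.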

--
Thanks & Regards,
Murthy
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16906): https://lists.fd.io/g/vpp-dev/message/16906
Mute This Topic: https://lists.fd.io/mt/75356233/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Replacing master/slave nomenclature

2020-07-13 Thread Dave Barach via lists.fd.io
+1, especially since our next release will be supported for a year, and API 
name changes are involved... 

-Original Message-
From: Kinsella, Ray  
Sent: Monday, July 13, 2020 6:01 AM
To: Dave Barach (dbarach) ; Stephen Hemminger 
; vpp-dev@lists.fd.io; t...@lists.fd.io; Ed 
Warnicke (eaw) 
Subject: Re: [vpp-dev] Replacing master/slave nomenclature

Hi Stephen,

I agree, I don't think we should ignore this.
Ed - I suggest we table a discussion at the next FD.io TSC?

Ray K

On 09/07/2020 17:05, Dave Barach via lists.fd.io wrote:
> Looping in the technical steering committee...
> 
> -Original Message-
> From: vpp-dev@lists.fd.io  On Behalf Of Stephen Hemminger
> Sent: Thursday, July 2, 2020 7:02 PM
> To: vpp-dev@lists.fd.io
> Subject: [vpp-dev] Replacing master/slave nomenclature
> 
> Is the VPP project addressing the use of master/slave nomenclature in the 
> code base, documentation and CLI?  We are doing this for DPDK and it would be 
> good if the replacement wording used in DPDK matched the wording used in 
> FD.io projects.
> 
> Particularly problematic is the use of master/slave in bonding.
> This seems to be a leftover from Linux, since none of the commercial products 
> use that terminology and it is not present in 802.1AX standard.
> 
> The IEEE and IETF are doing an across the board look at these terms in 
> standards.
> 
> 
> 
> 
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16943): https://lists.fd.io/g/vpp-dev/message/16943
Mute This Topic: https://lists.fd.io/mt/75399929/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP 20.05.1 tomorrow 15th July 2020

2020-07-15 Thread Dave Barach via lists.fd.io
This one is definitely safe, and worth including: 
https://gerrit.fd.io/r/c/vpp/+/27281

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Andrew Yourtchenko
Sent: Wednesday, July 15, 2020 6:27 AM
To: Elias Rudberg 
Cc: vpp-dev@lists.fd.io; dwallac...@gmail.com
Subject: Re: [vpp-dev] VPP 20.05.1 tomorrow 15th July 2020

Hi Elias, sure, feel free to cherry-pick to stable/2005 branch and add me as a 
reviewer, then I can merge when JJB gives thumbs up.

--a

> On 15 Jul 2020, at 07:25, Elias Rudberg  wrote:
> 
> Hello Andrew,
> 
> The following two fixes have been merged to the master branch, it 
> would be good to have them in stable/2005 also:
> 
> https://gerrit.fd.io/r/c/vpp/+/27280 (misc: ipfix-export unformat u16 
> collector_port fix)
> 
> https://gerrit.fd.io/r/c/vpp/+/27281 (nat: fix regarding vm arg for 
> vlib_time_now call)
> 
> Best regards,
> Elias
> 
> 
>> On Tue, 2020-07-14 at 19:04 +0200, Andrew Yourtchenko wrote:
>> Hi all,
>> 
>> As agreed on the VPP community call today, we will declare the 
>> current stable/2005 branch as v20.05.1 tomorrow (15th July)
>> 
>> If you have any fixes that are already in master but not yet in 
>> stable/2005, that you want to get in there - please let  me know 
>> before noon UTC.
>> 
>> --a
>> Your friendly release manager
>> 
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16968): https://lists.fd.io/g/vpp-dev/message/16968
Mute This Topic: https://lists.fd.io/mt/75503386/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] why define an unused var

2020-07-11 Thread Dave Barach via lists.fd.io
+1 to what Chris wrote. For your own sake, please refer to master/latest.

Vpp 16.09 is obsolete, unsupported, and sufficiently different from the current 
codebase to lead you down all sorts of blind alleys.

HTH... Dave

From: vpp-dev@lists.fd.io  On Behalf Of "??
Sent: Saturday, July 11, 2020 4:48 AM
To: vpp-dev 
Subject: [vpp-dev] why define an unused var

hi there,

I'm reading the vpp code, and I'm confused by this line:

int rv __attribute__ ((unused)) = write (2, "Main heap allocation failure!\r\n", 31);

Why does it define a variable named rv but never use it? What is this variable 
meant for?

git:stable/1609
vpp/vnet/main.c:264
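
For readers with the same question, a minimal sketch of the idiom, assuming a 
GCC or Clang toolchain with glibc. As far as I know, glibc can declare write () 
with the warn_unused_result attribute, so ignoring its return value produces a 
warning. Assigning the result to a variable silences that warning, and 
__attribute__ ((unused)) in turn silences the warning about the variable never 
being read. The function name report_heap_failure is hypothetical.

```c
#include <unistd.h>

static const char heap_failure_msg[] = "Main heap allocation failure!\r\n";

/* Last-ditch error report: there is nothing useful to do if write () fails,
   so the result is stored only to keep the compiler quiet. */
static void
report_heap_failure (void)
{
  int rv __attribute__ ((unused)) =
    write (2, heap_failure_msg, sizeof (heap_failure_msg) - 1);
}
```

The variable carries no meaning of its own; it exists only so that the build 
stays free of warnings.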
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16938): https://lists.fd.io/g/vpp-dev/message/16938
Mute This Topic: https://lists.fd.io/mt/75435245/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP_Main Thread Gets Stuck

2020-06-19 Thread Dave Barach via lists.fd.io
Vpp version? Configuration? Backtraces from other threads? The timer wheel code 
is not likely to be directly responsible.

Earlier this year, we addressed a number of issues in vppinfra/time.[ch] having 
to do with NTP and/or manual time changes which could lead to symptoms like 
this.

If you don’t have those patches, it would be best to acquire them at your 
earliest convenience. T=131 seconds is within the plausible range for an NTP 
timebase earthquake.

HTH... Dave

Please refer to 
https://fd.io/docs/vpp/master/troubleshooting/reportingissues/reportingissues.html#

From: vpp-dev@lists.fd.io  On Behalf Of Rajith PR via 
lists.fd.io
Sent: Friday, June 19, 2020 12:30 AM
To: vpp-dev 
Subject: [vpp-dev] VPP_Main Thread Gets Stuck

Hi All,

During scale tests with large numbers of routes, we occasionally hit a strange 
issue in our container. The vpp process became unresponsive; after attaching 
gdb to the process we could see that the vpp_main thread is stuck in a 
specific function. Any pointers on how to debug such issues would be of great 
help.

Back Trace:

#0 0x7f6895f1bc56 in clib_bitmap_get (ai=0x7f683ad339c0, i=826)
    at /development/libvpp/src/vppinfra/bitmap.h:201
#1 0x7f6895f20357 in tw_timer_expire_timers_internal_1t_3w_1024sl_ov 
    (tw=0x7f683ad3, now=131.6111045732342, callback_vector_arg=0x7f683ad330c0)
    at /development/libvpp/src/vppinfra/tw_timer_template.c:744
#2 0x7f6895f20b36 in tw_timer_expire_timers_vec_1t_3w_1024sl_ov 
    (tw=0x7f683ad3, now=131.6111045732342, vec=0x7f683ad330c0)
    at /development/libvpp/src/vppinfra/tw_timer_template.c:814
#3 0x7f68961fd166 in vlib_main_or_worker_loop (vm=0x7f689649ce00 , is_main=1)
    at /development/libvpp/src/vlib/main.c:1857
#4 0x7f68961fd8b1 in vlib_main_loop (vm=0x7f689649ce00 )
    at /development/libvpp/src/vlib/main.c:1928
#5 0x7f68961fe578 in vlib_main (vm=0x7f689649ce00 , input=0x7f683a60ffb0)
    at /development/libvpp/src/vlib/main.c:2145
#6 0x7f6896264865 in thread0 (arg=140087174745600)
    at /development/libvpp/src/vlib/unix/main.c:666
#7 0x7f6895ebd600 in clib_calljmp () from /usr/local/lib/libvppinfra.so.1.0.1
#8 0x7fff47e2f760 in ?? ()
#9 0x7f6896264ddb in vlib_unix_main (argc=21, argv=0x563cecf5f900)
    at /development/libvpp/src/vlib/unix/main.c:736

Thanks,
Rajith
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16763): https://lists.fd.io/g/vpp-dev/message/16763
Mute This Topic: https://lists.fd.io/mt/74973962/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Sub Interface/VLAN receiving packets directly from the Interface and multi thread approach

2020-06-23 Thread Dave Barach via lists.fd.io
Configure vlan subif(s), and configure L2 mode.

Attach your feature code to the L2 input feature arc(s), and enable on the 
indicated sw_if_index(es):

  .arc_name = "l2-input-ip4",
  .arc_name = "l2-input-ip6",
  .arc_name = "l2-input-nonip",

Before you mess around spinning up threads, see if the standard (aka zero-work) 
approach provides sufficient performance. Hardware RSS hashing across a set of 
standard worker threads tends to scale linearly with the number of worker 
threads, up to some ridiculous PPS rates.

Note that any specific NIC type may not handle small packets at line rate due 
to PCI-bus limitations. You can’t win at that game.

FWIW... Dave

From: vpp-dev@lists.fd.io  On Behalf Of RaviKiran Veldanda
Sent: Tuesday, June 23, 2020 8:39 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Sub Interface/VLAN receiving packets directly from the 
Interface and multi thread approach

Hi Team,
Any advice or pointer helps us to progress on this issue.
//Ravi
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16792): https://lists.fd.io/g/vpp-dev/message/16792
Mute This Topic: https://lists.fd.io/mt/74987897/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] ASSERT in arp_mk_reply

2020-06-29 Thread Dave Barach via lists.fd.io
So tell us: what is the interface in question? How was it created?

Was vec_len(hi->hw_address) 6 when the interface was created? At the time of 
the crash, was the hardware address hw->hw_address intact but with an incorrect 
length, or was it trash?

Haven’t seen any previous report of such misbehavior, which makes me wonder if 
the issue is peculiar to your code, use-case, interface type / handling, etc.

FWIW... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Rajith PR via 
lists.fd.io
Sent: Sunday, June 28, 2020 12:27 PM
To: vpp-dev 
Subject: [vpp-dev] ASSERT in arp_mk_reply

Hi All,

We are seeing ASSERT (vec_len (hw_if0->hw_address) == 6); being hit in 
arp_mk_reply(). This is happening on 19.08.
We have worker threads and a main thread.

The hw_if0 itself appears to be valid (both the pointer and its content), 
but the length of the vector is 15.
I have attached some information from the core file.

Thread 11 (Thread 0x7f085f7fe700 (LWP 261)):
#0  vlib_get_trace_count (vm=0x7f08bc310480, rt=0x7f08b9a3d0c0) at 
/development/libvpp/src/vlib/trace_funcs.h:177
#1  0x7f092b98082c in rtb_vpp_shm_device_input (vm=0x7f08bc310480, 
shmm=0x7f092bbe9980 , shmif=0x7f08bbdf93c0, 
node=0x7f08b9a3d0c0,
frame=0x0, thread_index=2, queue_id=0) at 
/development/libvpp/src/vpp/rtbrick/rtb_vpp_shm_node.c:341
#2  0x7f092b98102e in rtb_vpp_shm_input_node_fn (vm=0x7f08bc310480, 
node=0x7f08b9a3d0c0, f=0x0) at 
/development/libvpp/src/vpp/rtbrick/rtb_vpp_shm_node.c:434
#3  0x7f092a020c4f in dispatch_node (vm=0x7f08bc310480, 
node=0x7f08b9a3d0c0, type=VLIB_NODE_TYPE_INPUT, 
dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0,
last_time_stamp=3193893401554218) at 
/development/libvpp/src/vlib/main.c:1207
#4  0x7f092a022d9c in vlib_main_or_worker_loop (vm=0x7f08bc310480, 
is_main=0) at /development/libvpp/src/vlib/main.c:1779
#5  0x7f092a0238d1 in vlib_worker_loop (vm=0x7f08bc310480) at 
/development/libvpp/src/vlib/main.c:1934
#6  0x7f092a062306 in vlib_worker_thread_fn (arg=0x7f08b9749140) at 
/development/libvpp/src/vlib/threads.c:1754
#7  0x7f0929ce3600 in clib_calljmp () from 
/usr/local/lib/libvppinfra.so.1.0.1
#8  0x7f085f7fdec0 in ?? ()
#9  0x7f092a05cb92 in vlib_worker_thread_bootstrap_fn (arg=0x7f08b9749140) 
at /development/libvpp/src/vlib/threads.c:573
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 10 (Thread 0x7f085700 (LWP 260)):
#0  0x7f0935cb86c2 in __GI___waitpid (pid=7605, 
stat_loc=stat_loc@entry=0x7f08bc73db18, options=options@entry=0) at 
../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x7f0935c23067 in do_system (line=) at 
../sysdeps/posix/system.c:149
#2  0x7f093673c21a in bd_signal_handler_cb (signo=6) at 
/development/librtbrickinfra/bd/src/bd.c:770
#3  0x7f092a088d17 in rtb_bd_signal_handler (signo=6) at 
/development/libvpp/src/vlib/unix/main.c:80
#4  0x7f092a0890b2 in unix_signal_handler (signum=6, si=0x7f08bc73e2f0, 
uc=0x7f08bc73e1c0) at /development/libvpp/src/vlib/unix/main.c:180
#5  
#6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#7  0x7f0935c14801 in __GI_abort () at abort.c:79
#8  0x7f092b827476 in os_panic () at 
/development/libvpp/src/vpp/vnet/main.c:559
#9  0x7f0929cc2825 in debugger () at 
/development/libvpp/src/vppinfra/error.c:84
#10 0x7f0929cc2bf4 in _clib_error (how_to_die=2, function_name=0x0, 
line_number=0, fmt=0x7f092b1156b8 "%s:%d (%s) assertion `%s' fails")
at /development/libvpp/src/vppinfra/error.c:143
#11 0x7f092aa096bf in arp_mk_reply (vnm=0x7f092b5a05a0 , 
p0=0x1001e349c0, sw_if_index0=8, if_addr0=0x7f08bbe399f4, arp0=0x1001e34b0e,
eth_rx=0x1001e34b00) at /development/libvpp/src/vnet/ethernet/arp.c:1206
#12 0x7f092aa09e76 in arp_reply (vm=0x7f08bc30fd40, node=0x7f08bcc4dc80, 
frame=0x7f08be1835c0) at /development/libvpp/src/vnet/ethernet/arp.c:1514
#13 0x7f092a020c4f in dispatch_node (vm=0x7f08bc30fd40, 
node=0x7f08bcc4dc80, type=VLIB_NODE_TYPE_INTERNAL, 
dispatch_state=VLIB_NODE_STATE_POLLING,
frame=0x7f08be1835c0, last_time_stamp=3193892880085078) at 
/development/libvpp/src/vlib/main.c:1207
#14 0x7f092a02140a in dispatch_pending_node (vm=0x7f08bc30fd40, 
pending_frame_index=2, last_time_stamp=3193892880085078)
at /development/libvpp/src/vlib/main.c:1375
#15 0x7f092a02304e in vlib_main_or_worker_loop (vm=0x7f08bc30fd40, 
is_main=0) at /development/libvpp/src/vlib/main.c:1826
#16 0x7f092a0238d1 in vlib_worker_loop (vm=0x7f08bc30fd40) at 
/development/libvpp/src/vlib/main.c:1934
#17 0x7f092a062306 in vlib_worker_thread_fn (arg=0x7f08b9749040) at 
/development/libvpp/src/vlib/threads.c:1754
#18 0x7f0929ce3600 in clib_calljmp () from 
/usr/local/lib/libvppinfra.so.1.0.1
#19 0x7f085fffeec0 in ?? ()
#20 0x7f092a05cb92 in vlib_worker_thread_bootstrap_fn (arg=0x7f08b9749040) 
at /development/libvpp/src/vlib/threads.c:573
---Type  to continue, or q  to quit---q
Quit
(gdb) thread 10

Re: [vpp-dev] Regarding vlib_time_now

2020-06-14 Thread Dave Barach via lists.fd.io
What is the magnitude of the delta that you observe? What does "show clock 
verbose" say about the state of clock-rate convergence? Is a deus ex machina 
(e.g. NTP) involved?



-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Prashant Upadhyaya
Sent: Sunday, June 14, 2020 10:32 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Regarding vlib_time_now



Hi,

I am using VPP 19.08.

In my worker threads, I observe that when I make successive calls to 
vlib_time_now in a polling node, the value of the time sometimes decreases.

Is this expected to happen? (Presumably because of the implementation, which 
tries to align the times in workers?) I have an implementation which is 
extremely sensitive to time at the microsecond level and depends on 
vlib_time_now only increasing monotonically (or remaining the same, but never 
decreasing) across calls on a per-worker basis, even if the times across 
workers are not synchronized.

Regards

-Prashant
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16722): https://lists.fd.io/g/vpp-dev/message/16722
Mute This Topic: https://lists.fd.io/mt/74875583/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Regarding vlib_time_now

2020-06-15 Thread Dave Barach via lists.fd.io
That's within reason given that thread time offsets are not recalculated 
immediately, and that (for stability reasons) the clock-rate update algorithm 
uses exponential smoothing.

Aside from accounting for the issue in your code, there probably isn't much to 
be done about it...
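
One way to account for the issue in application code, shown as a minimal 
sketch: keep a per-worker high-water mark and never report a time below it. 
The names mono_clock_t and mono_clock_update are hypothetical; raw_now stands 
for the value returned by vlib_time_now on that worker.

```c
/* Hypothetical per-worker state: the largest time handed out so far. */
typedef struct
{
  double last;
} mono_clock_t;

/* Clamp a possibly backward-stepping clock reading so that successive
   results on the same worker never decrease. */
static double
mono_clock_update (mono_clock_t * c, double raw_now)
{
  if (raw_now > c->last)
    c->last = raw_now;

  return c->last;
}
```

While the underlying clock is stepping backwards the wrapper returns the same 
value repeatedly, so interval arithmetic sees zero elapsed time rather than a 
negative delta.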

D

-Original Message-
From: Prashant Upadhyaya  
Sent: Monday, June 15, 2020 8:58 AM
To: Dave Barach (dbarach) 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Regarding vlib_time_now

Hi Dave,

Thanks. On a VM I am observing the reduction, from a couple of microseconds up 
to 50 microseconds at times. NTP was turned on. After turning it off, I don't 
see the time reduction.
The output of the command is below.

vppctl show clock verbose

Time now 16712.719968, reftime 16712.719967, error .01, clocks/sec
2596982853.770165

Time last barrier release 16709.938950671

1: Time now 16710.417730, reftime 16710.417730, error 0.00, clocks/sec 
2596982875.038256

Thread 1 offset 2.302279669 error -.00032

[root@bfs-dl360g9-16-vm4 iptabl]#
/opt/opwv/integra/SystemActivePath/tools/vpp/bin/vppctl show clock verbose

Time now 16715.621101, reftime 16715.621101, error 0.00, clocks/sec 
2596982853.770165

Time last barrier release 16712.721636492

1: Time now 16713.318854, reftime 16713.318854, error 0.00, clocks/sec 
2596982875.038256

Thread 1 offset 2.302279482 error -.8

[root@bfs-dl360g9-16-vm4 iptabl]#
/opt/opwv/integra/SystemActivePath/tools/vpp/bin/vppctl show clock verbose

Time now 16718.249427, reftime 16718.249427, error 0.00, clocks/sec 
2596982853.770165

Time last barrier release 16715.621212275

1: Time now 16715.947179, reftime 16715.947179, error 0.00, clocks/sec 
2596982875.038256

Thread 1 offset 2.302279562 error -.8

[root@bfs-dl360g9-16-vm4 iptabl]#
/opt/opwv/integra/SystemActivePath/tools/vpp/bin/vppctl show clock verbose

Time now 16719.646461, reftime 16719.646461, error 0.00, clocks/sec 
2596982853.770165

Time last barrier release 16718.249525477

1: Time now 16717.344206, reftime 16717.344206, error 0.00, clocks/sec 
2596982875.038256

Thread 1 offset 2.302279598 error -.9

[root@bfs-dl360g9-16-vm4 iptabl]#
/opt/opwv/integra/SystemActivePath/tools/vpp/bin/vppctl show clock verbose

Time now 16721.162232, reftime 16721.162232, error 0.00, clocks/sec 
2596982853.770165

Time last barrier release 16720.702629716

1: Time now 16718.859979, reftime 16718.859979, error 0.00, clocks/sec 
2596982875.038256

Thread 1 offset 2.302279598 error -.8

[root@bfs-dl360g9-16-vm4 iptabl]#
/opt/opwv/integra/SystemActivePath/tools/vpp/bin/vppctl show clock verbose

Time now 16722.313997, reftime 16722.313997, error 0.00, clocks/sec 
2596982853.770165

Time last barrier release 16721.162470894

1: Time now 16720.011753, reftime 16720.011753, error 0.00, clocks/sec 
2596982875.038256

Thread 1 offset 2.302279597 error -.9

Regards
-Prashant

On Sun, Jun 14, 2020 at 8:12 PM Dave Barach (dbarach)  wrote:
>
> What is the magnitude of the delta that you observe? What does "show clock 
> verbose" say about the state of clock-rate convergence? Is a deus ex machina 
> (e.g. NTP) involved?
>
>
>
> -Original Message-
> From: vpp-dev@lists.fd.io  On Behalf Of Prashant 
> Upadhyaya
> Sent: Sunday, June 14, 2020 10:32 AM
> To: vpp-dev@lists.fd.io
> Subject: [vpp-dev] Regarding vlib_time_now
>
>
>
> Hi,
>
>
>
> I am using VPP 19.08
>
> In my worker threads, I am observing that when I am making successive calls 
> to vlib_time_now in a polling node, sometimes the value of the time reduces.
>
> Is this expected to happen ? (presumably because of the implementation which 
> tries to align the times in workers ?) I have an implementation which is 
> extremely sensitive to time at microsecond level and depends on the the 
> vlib_time_now only increasing monotonically across calls individually in the 
> workers (or remain the same but never decrease) on a per worker basis even if 
> the times within the workers are not synchronized.
>
>
>
> Regards
>
> -Prashant
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16728): https://lists.fd.io/g/vpp-dev/message/16728
Mute This Topic: https://lists.fd.io/mt/74875583/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] [discuss] sw_if_index in vent_buffer giving wrong IP address with ip_interface_address_get_address

2020-06-19 Thread Dave Barach via lists.fd.io
HundredGigabitEthernet12/0/0 (sw_if_index=1) has the ip address 192.168.198.2. 
The calculation shown in your original email is producing 192.168.198.2, which 
seems right to me.

You’ve looked up the ip address of the rx interface, which may not be what you 
had in mind.

The packet got nowhere near ip4-lookup -> ip4-rewrite, which would set 
vlib_buffer(b)->sw_if_index[VLIB_TX] to the tx interface sw_if_index.

HTH... Dave

From: RaviKiran Veldanda 
Sent: Friday, June 19, 2020 1:47 PM
To: Dave Barach (dbarach) 
Subject: Re: [discuss] sw_if_index in vent_buffer giving wrong IP address with 
ip_interface_address_get_address

Yes Dave,
I did all the things you suggested and the packet is coming on  
HundredGigabitEthernet12/0/0 and I am just getting the packets for that 
interface only.
Please find details below:
Packet 1

00:02:29:382622: dpdk-input
  HundredGigabitEthernet12/0/0 rx queue 0
  buffer 0x92556: current data 0, length 60, buffer-pool 0, ref-count 1, 
totlen-nifb 0, trace handle 0x0
  ext-hdr-valid
  l4-cksum-computed l4-cksum-correct
  PKT MBUF: port 0, nb_segs 1, pkt_len 60
buf_len 2176, data_len 60, ol_flags 0x0, data_off 128, phys_addr 0x88895600
packet_type 0x1 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
rss 0x0 fdir.hi 0x0 fdir.lo 0x0
Packet Types
  RTE_PTYPE_L2_ETHER (0x0001) Ethernet packet
  0x0027: 3c:2c:30:65:54:31 -> 01:80:c2:00:00:00
00:02:29:437741: ipgw_ent
  IPGW_ENT: sw_if_index 1, next index 0
  new src 3c:2c:30:65:54:31 -> new dst 01:80:c2:00:00:00
00:02:29:437762: ethernet-input
  0x0027: 3c:2c:30:65:54:31 -> 01:80:c2:00:00:00
00:02:29:437764: llc-input
  LLC bpdu -> bpdu
00:02:29:437770: error-drop
  rx:HundredGigabitEthernet12/0/0
00:02:29:437771: drop
  llc-input: unknown llc ssap/dsap

The Commands:

 set interface ip add HundredGigabitEthernet12/0/0 192.168.198.2/24
 set interface state HundredGigabitEthernet12/0/0 up
 set interface ip addr HundredGigabitEthernet12/0/0 2001:5b0::7cf0::98fc/64
 create interface memif id 0 socket-id 0 master
 set interface state memif0/0 up
 set interface ip add memif0/0 192.168.1.3/24
 set interface ip addr memif0/0 2001:5b0::7cf1::98fc/64

I believe there is some problem with this index in the code. Please let us know 
your views.

//Ravi

On Fri, Jun 19, 2020 at 10:38 AM Dave Barach (dbarach) <dbar...@cisco.com> 
wrote:
If the packet was received on HundredGigabitEthernet12/0/0, you should get 
192.168.198.2. If it was received on memif0/0 you should get 192.168.1.3. 
"trace add dpdk-input" [or send pkts, then "show trace". If that produces 
nothing, s/dpdk-input/memif-input/ or whatever the memif input node is called.

Use "show int addr" / "show int" to determine the sw_if_index to name mapping, 
and to display the interface ip addresses.


From: disc...@lists.fd.io on behalf of ravi.jup...@gmail.com
Sent: Thursday, June 18, 2020 3:54 PM
To: disc...@lists.fd.io
Subject: [discuss] sw_if_index in vent_buffer giving wrong IP address with 
ip_interface_address_get_address


[Edited Message Follows]
Hi Team,
I am writing a plugin that is attached to device_input so that I can receive 
all the traffic.
In my plugin, I check the headers and decide whether to forward each packet to 
my application or to native VPP. This is working fine. However, for ICMP I 
have a requirement to check whether the packet is destined to the interface my 
plugin is attached to or to some other IP address. To check that, I receive 
the packet and use the following APIs to get the IP address:
 sw_if_index0 = vnet_buffer(b0)->sw_if_index[VLIB_RX];
 ip4_main_t *im = &ip4_main;
 ip_lookup_main_t *lm = &im->lookup_main;
 ip_interface_address_t *if_add = pool_elt_at_index (lm->if_address_pool, 
if_index);
 ip4_address_t *if_ip = ip_interface_address_get_address (lm, if_add);
I am getting an IP address, but it is not the one I expect.
For example:
If I create the interfaces in VPPCTL in the following order:
 set interface ip add HundredGigabitEthernet12/0/0 192.168.198.2/24
 set interface state HundredGigabitEthernet12/0/0 up
 set interface ip addr HundredGigabitEthernet12/0/0 2001:5b0::7cf0::98fc/64
 create interface memif id 0 socket-id 0 master
 set interface state memif0/0 up
 set interface ip add memif0/0 192.168.1.3/24
 set interface ip addr memif0/0 2001:5b0::7cf1::98fc/64
I am getting the IP address 192.168.1.3.
If I create the interfaces in the following order:
 create interface memif id 0 socket-id 0 master
 set interface state memif0/0 up
 set interface ip add memif0/0 192.168.1.3/24
 set interface ip addr memif0/0 2001:5b0::7cf1::98fc/64
  set 

Re: [vpp-dev] VPP API CRC compatibility check process in checkstyle merged and active

2020-06-18 Thread Dave Barach via lists.fd.io
Coverage runs with gcc can run in parallel. With clang, not so much... CC=gcc 
is your friend...

D.

From: vpp-dev@lists.fd.io  On Behalf Of Andrew Yourtchenko
Sent: Thursday, June 18, 2020 4:25 PM
To: Balaji Venkatraman (balajiv) 
Cc: Neale Ranns (nranns) ; vpp-dev 
Subject: Re: [vpp-dev] VPP API CRC compatibility check process in checkstyle 
merged and active

Hi Balaji,

Yeah that was what I was thinking, though weekly ain’t good enough - one would 
have to run coverage report before and after and ensure it doesn’t drop.

But it’s only one point and it’s also not a given that a code with the api 
change/addition contains all the code for that new api version - almost the 
opposite, in my experience..

The best case would be of course to ensure that *every* commit has a 
non-decreasing code coverage value, and trigger some kind of alert if it 
drops. That will fulfil the requirements from the api standpoint 
automatically, and also automatically nudge the improvements in the code 
coverage overall...

I vaguely remember hearing that code coverage can’t run the test cases in 
parallel, is that right ?

—a



On 18 Jun 2020, at 19:04, Balaji Venkatraman (balajiv) 
<bala...@cisco.com> wrote:

Hi Andrew,

Just a few comments regarding coverage.

We could use the coverage (we currently run on a weekly basis) as baseline and 
monitor for incremental increases when a versioning change occurs. If there was 
a way to check the UT for the _v2 covers the ‘new/modified’ code and if 
possible add the coverage data as part of the commit criteria, that would be 
ideal. Until then, we could manually check if the coverage shows code for _v2 
being touched by the new test added for it before it is approved.

Just a suggestion!

--
Regards,
Balaji.


From: <vpp-dev@lists.fd.io> on behalf of Andrew 
Yourtchenko <ayour...@gmail.com>
Date: Thursday, June 18, 2020 at 8:58 AM
To: "Neale Ranns (nranns)" <nra...@cisco.com>
Cc: vpp-dev <vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] VPP API CRC compatibility check process in checkstyle 
merged and active

Hi Neale,



On 18 Jun 2020, at 17:11, Neale Ranns (nranns) 
<nra...@cisco.com> wrote:
Hi Andrew,
A couple of questions?

Absolutely! That’s how we improve it! Thanks a lot for the questions ! Replies 
inline:




Firstly, about unit testing aka make test. This is the salient passage in your 
guide:
  "foo_message_v2 is tested in "make test" to the same extent as the 
foo_message"
IMHO "to the same extent" implies everywhere v1 is used v2 should now be used 
in its place. One would hope that in most cases a simple find and replace 
through all test cases would do the job. However, once one has created such a 
fork and verified (presumably through some objective measure like lcov) that it 
is the same extent of coverage, what becomes of it? V1 and V2 APIs must 
co-exist for some time, so how do we continue to run the v1 original tests and 
the v2 fork?

For most practical use cases the _v2 will be a trivial change compared to _v1 
(e.g. a field change), and it would be implemented by the v1 handler calling 
the v2 handler. One can start by adding tests for v2 that touch just the 
new/changed functionality; in that case the tests calling v1 will “count” 
towards the v2 coverage without test duplication.


https://gerrit.fd.io/r/c/vpp/+/27586 is a fresh example of just this approach.
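
The "v1 handler calling v2 handler" arrangement described above can be sketched 
as follows. This is only an illustration: the struct layouts, the is_enable 
field, the call counter and the handler names are hypothetical, chosen to 
mirror the foo_message / foo_message_v2 naming from the guide. Real VPP 
handlers are built around the generated API definitions and look different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical v1 message: only carries an interface index. */
typedef struct {
  uint32_t sw_if_index;
} foo_message_t;

/* Hypothetical v2 message: same as v1 plus one new field. */
typedef struct {
  uint32_t sw_if_index;
  uint8_t is_enable;
} foo_message_v2_t;

/* Counts how often the v2 code path ran, to make the sharing visible. */
static unsigned foo_v2_calls;

/* All real work lives in the v2 handler. */
static int foo_message_v2_handler (const foo_message_v2_t *mp)
{
  foo_v2_calls++;
  if (mp->sw_if_index == UINT32_MAX)
    return -1;                  /* invalid interface */
  return mp->is_enable ? 1 : 0;
}

/* The v1 handler fills in the old implicit default and delegates to v2. */
static int foo_message_handler (const foo_message_t *mp)
{
  foo_message_v2_t v2 = { .sw_if_index = mp->sw_if_index, .is_enable = 1 };
  return foo_message_v2_handler (&v2);
}
```

In this arrangement an existing test that sends the v1 message also exercises 
the v2 handler, so new tests only need to cover what v2 adds (here, 
is_enable = 0).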

I discussed with Ole and I tried to make a stricter and more concise 
description here for an API change:

https://wiki.fd.io/view/VPP/ApiChangeProcess#Tooling

So I would say we can explicitly say “the tests need to be converted to use the 
new API” either at the moment of “productizing” the new API or deletion of the 
old API. And yeah the idea is that we could eventually do automatic code 
coverage tests specifically for those points to ensure it doesn’t drop (or that 
it monotonically increases :)

I am not sure there is a good way to test the “code coverage for an API” per 
se, since none of the tests have only one API - the before/after overall 
comparison should be good enough ?

Given that between any two releases multiple APIs may go through a version 
upgrade, there will be many such forks to manage.

I think it should be just one per message at most? (If one uses the 
“in-progress” transition phase for new messages. In fact we are pondering that 
it might be a good idea to also enforce that via the tool, so that would add an 
explicit “yes this is ready” phase, and avoid “accidental production status”.)



Additionally, are we also going to test all combinations of messages and their 
versions, e.g. foo_v2 with bar_v2.

I think the best judgement still applies. If you have foo_v1 and bar_v1 which 
are related and are replaced by foo_v2 and bar_v2, their deprecations would 
probably be synced, and the same would apply to their use by consumers. So 
either “v1 and v1” or “v2 and v2”.

Again - the logic behind all 

Re: [vpp-dev] Userspace CNI - VPP Memif Forwarding Issue.

2020-06-28 Thread Dave Barach via lists.fd.io
"trace add [dpdk-input|memif-input] " ... run failing case ... "show 
trace". Should explain what's happening.

Dave

From: vpp-dev@lists.fd.io  On Behalf Of Sridhar K. N. Rao
Sent: Saturday, June 27, 2020 9:16 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Userspace CNI - VPP Memif Forwarding Issue.

Hello,

We (OPNFV-VSPERF) are working on container networking performance testing. As 
part of this we are testing Userspace CNI with VPP. The pod runs a DPDK l2fwd 
application. We are stuck with a forwarding issue, and any suggestion would be 
extremely helpful. I checked with the CNI team, and they see no issues.

I have provided the summary here. All the details (vpp configuration, version, 
l2fwd run in pod, patch-info, interface info, statistics, pod-interface-stats, 
failed-scenarios, etc) can be found here - 
https://github.com/opensource-tnbt/cnb/blob/master/vpp-forwarding-issue.md

We have a very simple 2-node setup. The 2 nodes are connected directly by two 
10-gig interfaces for data traffic.
On one node we have T-Rex running, whereas on the other node we have VPP and 
the DPDK pod running.

So, the traffic will flow like this:
Direction1:
Trex0-phy ---> TenGigabitEthernet6/0/0 ---> memif1/0 ---> net_memif1
DPDK l2fwd app in pod: net_memif1 ---> net_memif2
Trex1-phy <--- TenGigabitEthernet6/0/1 <--- memif2/0 <--- net_memif2
Direction2:
Trex0-phy ---> TenGigabitEthernet6/0/0 ---> memif2/0 ---> net_memif2
DPDK l2fwd app in pod: net_memif2 ---> net_memif1
Trex1-phy <--- TenGigabitEthernet6/0/1 <--- memif1/0 <--- net_memif1

L2 patches are configured:
Direction:1
sudo vppctl test l2patch rx TenGigabitEthernet6/0/0 tx memif1/0
sudo vppctl test l2patch rx memif2/0 tx TenGigabitEthernet6/0/1
Direction:2
sudo vppctl test l2patch rx TenGigabitEthernet6/0/1 tx memif2/0
sudo vppctl test l2patch rx memif1/0 tx TenGigabitEthernet6/0/0

When the traffic runs, all the traffic somehow ends up only on memif1/0, like 
this:

vpp# show interface
Name                      Idx  State  MTU (L3/IP4/IP6/MPLS)  Counter             Count
TenGigabitEthernet6/0/0   1    up     9000/0/0/0             rx packets       70598102
                                                             rx bytes      79993565352
                                                             tx packets      616631302
                                                             tx bytes     696194649320
                                                             tx-error        458997285
TenGigabitEthernet6/0/1   2    up     9000/0/0/0             rx packets       70587607
                                                             rx bytes      79981831588
local0                    0    down   0/0/0/0
memif1/0                  3    up     9000/0/0/0             rx packets      616631302
                                                             rx bytes     696194649320
                                                             tx packets       70598102
                                                             tx bytes      79993565352
memif2/0                  4    up     9000/0/0/0             tx packets       70587607
                                                             tx bytes      79981831588

We have tried the following (The outputs of all these are in the link above).

  1.  Testing 1-Direction at a time.
  2.  Swapping the Phy -> memif mapping in the patch
  3.  Using Xconnect instead of l2patch.
  4.  Changing the mode from Interface to bridge in CNI.

None of these helped to debug the issue.

Looking forward to any advice.

Cheers,
Sridhar


View/Reply Online (#16837): https://lists.fd.io/g/vpp-dev/message/16837


Re: Private: Re: [vpp-dev] VPP C API application compilation issue on using S and W functions

2020-06-26 Thread Dave Barach via lists.fd.io
If you want to pinch code from vpp_api_test, please read and understand it:

/* W: wait for results, with timeout */
#define W(ret)  \
do {\
f64 timeout = vat_time_now (vam) + 1.0; \
socket_client_main_t *scm = vam->socket_client_main;  \
ret = -99;  \
\
if (scm && scm->socket_enable)  \
  vl_socket_client_read (5);  \
while (vat_time_now (vam) < timeout) {  \
if (vam->result_ready == 1) {   \
ret = vam->retval;  \
break;  \
}   \
vat_suspend (vam->vlib_main, 1e-5); \
}   \
} while(0);

So what does the “W” macro do? It causes the vpp_api_test main thread to block 
until vpp answers and the vpp_api_test rx pthread processes the reply, or a 
timeout occurs.

You’ll need to declare and initialize a clib_time_t, a vat_main_t, then supply 
the missing routines:

f64
vat_time_now (vat_main_t * vam)
{
  return clib_time_now (&vam->clib_time);
}


void
vat_suspend (vlib_main_t * vm, f64 interval)
{
  /* do nothing to busy-wait, or call usleep, or call CLIB_PAUSE */
}

Given that you’re not using vpp_api_test, you might decide to use 
vat_helper_macros.h as a guide and fabricate something which seems more at home 
in your application.
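
As one way to fabricate something along those lines, here is a minimal sketch 
of a wait-with-timeout loop in plain C, modelled on the W macro quoted above. 
It assumes POSIX clock_gettime/nanosleep and C11 atomics. app_reply_ctx_t and 
all app_* functions are hypothetical names and are not part of any VPP 
library; the thread that processes replies is expected to call 
app_post_reply().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Shared between the requesting thread and the reply-handling thread. */
typedef struct {
  atomic_int result_ready;      /* set to 1 once a reply has arrived */
  int retval;                   /* return value carried by the reply */
} app_reply_ctx_t;

static void app_reply_ctx_init (app_reply_ctx_t *ctx)
{
  atomic_init (&ctx->result_ready, 0);
  ctx->retval = 0;
}

/* Monotonic time in seconds, playing the role of vat_time_now(). */
static double app_time_now (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (double) ts.tv_sec + 1e-9 * (double) ts.tv_nsec;
}

/* Sleep briefly instead of busy-waiting, playing the role of vat_suspend(). */
static void app_suspend (double interval)
{
  struct timespec ts;
  ts.tv_sec = (time_t) interval;
  ts.tv_nsec = (long) ((interval - (double) ts.tv_sec) * 1e9);
  nanosleep (&ts, NULL);
}

/* Called by the reply-handling thread when the answer arrives. */
static void app_post_reply (app_reply_ctx_t *ctx, int retval)
{
  ctx->retval = retval;
  atomic_store (&ctx->result_ready, 1);
}

/* Block until a reply is posted or timeout_sec elapses; -99 means timeout. */
static int app_wait_for_reply (app_reply_ctx_t *ctx, double timeout_sec)
{
  double deadline = app_time_now () + timeout_sec;
  int ret = -99;

  while (app_time_now () < deadline) {
    if (atomic_load (&ctx->result_ready) == 1) {
      ret = ctx->retval;
      break;
    }
    app_suspend (1e-5);
  }
  return ret;
}
```

The -99 timeout value and the 1e-5 second suspend interval are taken from the 
macro; the 1.0 second timeout it uses would be passed in as timeout_sec.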

HTH... Dave

From: Chinmaya Aggarwal 
Sent: Friday, June 26, 2020 7:59 AM
To: Dave Barach (dbarach) 
Subject: Private: Re: [vpp-dev] VPP C API application compilation issue on 
using S and W functions

Hi Dave,
We have our application (using VPP C API) running outside of VPP.  We linked 
vpp libs in our application (vatplugin, vppinfra, vlibmemoryclient, svm, 
vppapiclient). Attaching file vpp_connect_test.c that creates connection with 
VPP and calls VPP C API to modify a sr policy. We have used function S to send 
the message and W to wait for its reply. On compiling this C file we are 
getting the following error : -
root@ggnlabvm-hnsnfvsdn03:~/vpp_api_test# make
gcc  -Wall -g-c -o vpp_connect_test.o vpp_connect_test.c
gcc  -Wall -g  -I/usr/include/vpp_plugins -I/usr/include/ -o api_exec 
vpp_connect_test.o -Wl,--start-group  -lvatplugin -lvppinfra -lvlibmemoryclient 
-lsvm -lvppapiclient  -Wl,--end-group  -lpthread -lm -lrt -ldl
vpp_connect_test.o: In function `del_sl_pol2_index6':
/root/vpp_api_test/vpp_connect_test.c:97: undefined reference to `vat_time_now'
/root/vpp_api_test/vpp_connect_test.c:97: undefined reference to `vat_suspend'
/root/vpp_api_test/vpp_connect_test.c:97: undefined reference to `vat_time_now'
collect2: error: ld returned 1 exit status
Makefile:35: recipe for target 'api_exec' failed
make: *** [api_exec] Error 1

Is there any library that needs to be linked and we have missed it? If not, can 
we use S and W function in an external C application, in a way similar to how 
VPP uses it?

Thanks and Regards,
Chinmaya Agarwal.
View/Reply Online (#16834): https://lists.fd.io/g/vpp-dev/message/16834


Re: [vpp-dev] VPP C API application compilation issue on using S and W functions

2020-06-25 Thread Dave Barach via lists.fd.io
Why are you linking against vlib?

From: vpp-dev@lists.fd.io  On Behalf Of Chinmaya Aggarwal
Sent: Thursday, June 25, 2020 6:48 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP C API application compilation issue on using S and W 
functions

Hi,
We are testing VPP C API for our use case. We have a scenario where we want to 
wait for the reply of our API request. For this, we are trying to use S and W 
functions defined in src/vlibapi/vat_helper_macros.h. But we get the following 
error on compiling our source code : -

root@ggnlabvm-hnsnfvsdn03:~/vpp_api_test# make
gcc  -Wall -g  -I/usr/include/vpp_plugins -I/usr/include/ -o vpp_api_test 
vpp_connect.o -Wl,--start-group  -lvatplugin -lvppinfra -lvlibmemoryclient 
-lsvm -lvlib -lvppapiclient  -Wl,--end-group  -lpthread -lm -lrt -ldl
vpp_connect.o: In function `del_sl_pol2_index6':
/root/vpp_api_test/vpp_connect.c:780: undefined reference to `vat_time_now'
/root/vpp_api_test/vpp_connect.c:780: undefined reference to `vat_suspend'
/root/vpp_api_test/vpp_connect.c:780: undefined reference to `vat_time_now'
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/libvlib.so: undefined 
reference to `stat_segment_register_gauge'
collect2: error: ld returned 1 exit status
Makefile:37: recipe for target 'vpp_api_test' failed
make: *** [vpp_api_test] Error 1

Can anyone please suggest what is wrong here?


Thanks and Regards,
Chinmaya Agarwal.
View/Reply Online (#16815): https://lists.fd.io/g/vpp-dev/message/16815


Re: [vpp-dev] VPP C API application compilation issue on using S and W functions

2020-06-25 Thread Dave Barach via lists.fd.io
Unless you’re doing vector processing, vlib is not useful.

Here is the CMakeLists.txt entry for vpp_api_test. It’s not likely that you’ll 
need libvatplugin.so, but you get the idea...


##
# vpp_api_test
##
add_vpp_executable(vpp_api_test ENABLE_EXPORTS
  SOURCES
  api_format.c
  main.c
  plugin.c
  json_format.c
  types.c
  ip_types_api.c
  ip_types.c
  protocols.def

  DEPENDS api_headers

  LINK_LIBRARIES
  vlibmemoryclient
  svm
  vatplugin
  vppinfra
  Threads::Threads
  rt m dl crypto
)

From: vpp-dev@lists.fd.io  On Behalf Of Chinmaya Aggarwal
Sent: Thursday, June 25, 2020 7:02 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP C API application compilation issue on using S and W 
functions

I am following the fd.io wiki page 
https://wiki.fd.io/view/VPP/How_To_Use_The_C_API and it says so.
Are we not supposed to link vlib?
View/Reply Online (#16817): https://lists.fd.io/g/vpp-dev/message/16817


Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

2020-06-10 Thread Dave Barach via lists.fd.io
Thanks, glad to hear it... D.

From: Rajith PR 
Sent: Wednesday, June 10, 2020 4:04 AM
To: Dave Barach (dbarach) 
Cc: Benoit Ganne (bganne) ; vpp-dev ; 
Neale Ranns (nranns) 
Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

Hi Dave,

We ran a good number of scale tests with the fix. We didn't hit this crash.

Thanks a lot for the fix.

Regards,
Rajith



On Wed, Jun 3, 2020 at 5:40 PM Dave Barach (dbarach) 
<dbar...@cisco.com> wrote:
Please test https://gerrit.fd.io/r/c/vpp/+/27407 and report results.

-Original Message-
From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Dave Barach via 
lists.fd.io
Sent: Wednesday, June 3, 2020 7:08 AM
To: Benoit Ganne (bganne) <bga...@cisco.com>; raj...@rtbrick.com
Cc: vpp-dev <vpp-dev@lists.fd.io>; Neale Ranns (nranns) <nra...@cisco.com>
Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

+1, can't tell which poison pattern is involved without a scorecard.

load_balance_alloc_i (...) is clearly not thread-safe due to calls to 
pool_get_aligned (...) and vlib_validate_combined_counter(...).

Judicious use of pool_get_aligned_will_expand(...), 
_vec_resize_will_expand(...) and a manual barrier sync will fix this problem 
without resorting to draconian measures.

It'd sure be nice to hear from Neale before we code something like that.

D.

-Original Message-
From: Benoit Ganne (bganne) <bga...@cisco.com>
Sent: Wednesday, June 3, 2020 3:17 AM
To: raj...@rtbrick.com; Dave Barach (dbarach) <dbar...@cisco.com>
Cc: vpp-dev <vpp-dev@lists.fd.io>; Neale Ranns (nranns) <nra...@cisco.com>
Subject: RE: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

Neale is away and might be slow to react.
I suspect the issue arises when creating a new load balance entry through 
load_balance_create(), which gets a new element from the load balance pool. 
This in turn will update the pool free bitmap, which can grow. As it is backed 
by a vector, it can be reallocated somewhere else to fit the new size.
If it is done concurrently with dataplane processing, bad things happen. The 
pattern 0x131313 is filled by dlmalloc free() and will happen in that case. I 
think the same could happen to the pool itself, not only the bitmap.
If I am correct, I am not sure how we should fix that: fib update API is marked 
as mp_safe, so we could create a fixed-size load balance pool to prevent 
runtime reallocation, but it would waste memory and impose a maximum size.

ben

> -Original Message-
> From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
> mailto:vpp-dev@lists.fd.io>> On Behalf Of Rajith PR
> via lists.fd.io<http://lists.fd.io>
> Sent: mercredi 3 juin 2020 05:46
> To: Dave Barach (dbarach) mailto:dbar...@cisco.com>>
> Cc: vpp-dev mailto:vpp-dev@lists.fd.io>>; Neale Ranns 
> (nranns)
> mailto:nra...@cisco.com>>
> Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
>
> Hi Dave/Neal,
>
> The adj_poison seems to be a filling pattern - - 0xfefe. Am I looking
> into the right code or I have interpreted it incorrectly?
>
> Thanks,
> Rajith
>
> On Tue, Jun 2, 2020 at 7:44 PM Dave Barach (dbarach)
> mailto:dbar...@cisco.com> 
> <mailto:dbar...@cisco.com<mailto:dbar...@cisco.com>> > wrote:
>
>
>   The code manages to access a poisoned adjacency – 0x131313 fill
> pattern – copying Neale for an opinion.
>
>
>
>   D.
>
>
>
>   From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
> <mailto:vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>>   d...@lists.fd.io<mailto:d...@lists.fd.io> 
> <mailto:vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> > On Behalf Of 
> Rajith PR
> via lists.fd.io<http://lists.fd.io> <http://lists.fd.io>
>   Sent: Tuesday, June 2, 2020 10:00 AM
>   To: vpp-dev mailto:vpp-dev@lists.fd.io> 
> <mailto:vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> >
>   Subject: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
>
>
>
>   Hello All,
>
>
>
>   In 19.08 VPP version we are seeing a crash while accessing the
> load_balance_pool  in load_balanc_get() function. This is happening
> after enabling worker threads.
>
>   As such the FIB programming is happening in the main thread and in
> one of the worker threads we see this crash.
>
>   Also, this is seen when we scale to 300K+ ipv4 routes.
>
>
>
>   Here is the complete stack,
>
>
>
>   Thread 10 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
>
>

Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

2020-06-03 Thread Dave Barach via lists.fd.io
Please test https://gerrit.fd.io/r/c/vpp/+/27407 and report results. 

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Dave Barach via 
lists.fd.io
Sent: Wednesday, June 3, 2020 7:08 AM
To: Benoit Ganne (bganne) ; raj...@rtbrick.com
Cc: vpp-dev ; Neale Ranns (nranns) 
Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

+1, can't tell which poison pattern is involved without a scorecard.

load_balance_alloc_i (...) is clearly not thread-safe due to calls to 
pool_get_aligned (...) and vlib_validate_combined_counter(...). 

Judicious use of pool_get_aligned_will_expand(...), 
_vec_resize_will_expand(...) and a manual barrier sync will fix this problem 
without resorting to draconian measures.

It'd sure be nice to hear from Neale before we code something like that. 

D. 

-Original Message-
From: Benoit Ganne (bganne) 
Sent: Wednesday, June 3, 2020 3:17 AM
To: raj...@rtbrick.com; Dave Barach (dbarach) 
Cc: vpp-dev ; Neale Ranns (nranns) 
Subject: RE: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

Neale is away and might be slow to react.
I suspect the issue arises when creating a new load balance entry through 
load_balance_create(), which gets a new element from the load balance pool. 
This in turn will update the pool free bitmap, which can grow. As it is backed 
by a vector, it can be reallocated somewhere else to fit the new size.
If it is done concurrently with dataplane processing, bad things happen. The 
pattern 0x131313 is filled by dlmalloc free() and will happen in that case. I 
think the same could happen to the pool itself, not only the bitmap.
If I am correct, I am not sure how we should fix that: fib update API is marked 
as mp_safe, so we could create a fixed-size load balance pool to prevent 
runtime reallocation, but it would waste memory and impose a maximum size.

ben

> -Original Message-
> From: vpp-dev@lists.fd.io  On Behalf Of Rajith PR 
> via lists.fd.io
> Sent: mercredi 3 juin 2020 05:46
> To: Dave Barach (dbarach) 
> Cc: vpp-dev ; Neale Ranns (nranns) 
> 
> Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
> 
> Hi Dave/Neal,
> 
> The adj_poison seems to be a filling pattern - - 0xfefe. Am I looking 
> into the right code or I have interpreted it incorrectly?
> 
> Thanks,
> Rajith
> 
> On Tue, Jun 2, 2020 at 7:44 PM Dave Barach (dbarach) 
> mailto:dbar...@cisco.com> > wrote:
> 
> 
>   The code manages to access a poisoned adjacency – 0x131313 fill 
> pattern – copying Neale for an opinion.
> 
> 
> 
>   D.
> 
> 
> 
>   From: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io>   d...@lists.fd.io <mailto:vpp-dev@lists.fd.io> > On Behalf Of Rajith PR 
> via lists.fd.io <http://lists.fd.io>
>   Sent: Tuesday, June 2, 2020 10:00 AM
>   To: vpp-dev mailto:vpp-dev@lists.fd.io> >
>   Subject: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
> 
> 
> 
>   Hello All,
> 
> 
> 
>   In 19.08 VPP version we are seeing a crash while accessing the 
> load_balance_pool  in load_balanc_get() function. This is happening 
> after enabling worker threads.
> 
>   As such the FIB programming is happening in the main thread and in 
> one of the worker threads we see this crash.
> 
>   Also, this is seen when we scale to 300K+ ipv4 routes.
> 
> 
> 
>   Here is the complete stack,
> 
> 
> 
>   Thread 10 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
> 
>   [Switching to Thread 0x7fbe4aa8e700 (LWP 333)]
>   0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313, i=61) 
> at /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
>   201  return i0 < vec_len (ai) && 0 != ((ai[i0] >> i1) & 1);
> 
> 
> 
>   Thread 10 (Thread 0x7fbe4aa8e700 (LWP 333)):
>   #0  0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313,
> i=61) at /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
>   #1  0x7fbef10676a8 in load_balance_get (lbi=61) at
> /home/ubuntu/Scale/libvpp/src/vnet/dpo/load_balance.h:222
>   #2  0x7fbef106890c in ip4_lookup_inline (vm=0x7fbe8a5aa080, 
> node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40) at
> /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.h:369
>   #3  0x7fbef1068ead in ip4_lookup_node_fn_avx2 (vm=0x7fbe8a5aa080, 
> node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40)
>   at /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.c:95
>   #4  0x7fbef0c6afec in dispatch_node (vm=0x7fbe8a5aa080, 
> node=0x7fbe8b3fd380, type=VLIB_NODE_TYPE_INTERNAL, 
> dispatch_state=VLIB_NODE_STATE_POLLING,
>   frame=0x7fbe8a5edb40, last_time_stamp=381215594286358) at
> /home/ubuntu/Scale/libvpp/src/vlib/main.c:1207
>   #5  0x00

Re: [vpp-dev] VPP forwarding packets not destined to it #vpp

2020-06-03 Thread Dave Barach via lists.fd.io
Use the force and read the source:

/*?
* Layer 2 flooding can be enabled and disabled on each
* interface and on each bridge-domain. Use this command to
* manage bridge-domains. It is enabled by default.
*
* @cliexpar
* Example of how to enable flooding (where 200 is the bridge-domain-id):
* @cliexcmd{set bridge-domain flood 200}
* Example of how to disable flooding (where 200 is the bridge-domain-id):
* @cliexcmd{set bridge-domain flood 200 disable}
?*/
/* *INDENT-OFF* */
VLIB_CLI_COMMAND (bd_flood_cli, static) = {
  .path = "set bridge-domain flood",
  .short_help = "set bridge-domain flood <bridge-domain-id> [disable]",
  .function = bd_flood,
};
/* *INDENT-ON* */

From: vpp-dev@lists.fd.io  On Behalf Of Nagaraju Vemuri
Sent: Tuesday, June 2, 2020 8:13 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP forwarding packets not destined to it #vpp


Hi,

We are using linux bridge to connect different interfaces owned by different 
VPP instances.
When the bridge has no binding info about MAC-to-port, bridge is flooding 
packets to all interfaces.
Hence VPP receives some packets whose MAC address is owned by some other VPP 
instance.
We want to drop such packets. By default VPP is forwarding these packets.

We tried using "set interface l2 forward  disable", but this did not 
help.

Please suggest what we can do.

Thanks,
Nagaraju
View/Reply Online (#16631): https://lists.fd.io/g/vpp-dev/message/16631


Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

2020-06-03 Thread Dave Barach via lists.fd.io
+1, can't tell which poison pattern is involved without a scorecard.

load_balance_alloc_i (...) is clearly not thread-safe due to calls to 
pool_get_aligned (...) and vlib_validate_combined_counter(...). 

Judicious use of pool_get_aligned_will_expand(...), 
_vec_resize_will_expand(...) and a manual barrier sync will fix this problem 
without resorting to draconian measures.
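
To illustrate the idea in the paragraph above: check whether an allocation 
will grow the backing store, and take the worker barrier only in that case. 
This is a toy model under stated assumptions and not the actual fix. 
toy_pool_t and the toy_* functions are hypothetical, and the barrier here is 
just a counter standing in for vlib_worker_thread_barrier_sync() and 
vlib_worker_thread_barrier_release().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Growable array standing in for a pool whose storage may be reallocated. */
typedef struct {
  int *elts;
  size_t len;
  size_t cap;
  unsigned barrier_syncs;       /* how many times workers were stopped */
} toy_pool_t;

/* True when the next allocation would have to reallocate the storage. */
static int toy_pool_will_expand (const toy_pool_t *p)
{
  return p->len == p->cap;
}

/* Stand-ins for stopping and resuming the worker threads. */
static void toy_barrier_sync (toy_pool_t *p) { p->barrier_syncs++; }
static void toy_barrier_release (toy_pool_t *p) { (void) p; }

/* Allocate one element; returns its index, or -1 if memory ran out. */
static long toy_pool_get (toy_pool_t *p)
{
  int need_barrier = toy_pool_will_expand (p);
  long index = -1;

  /* Readers may hold pointers into elts, so stop them before it can move. */
  if (need_barrier)
    toy_barrier_sync (p);

  if (p->len == p->cap) {
    size_t new_cap = p->cap ? 2 * p->cap : 4;
    int *grown = realloc (p->elts, new_cap * sizeof (*grown));
    if (grown) {
      p->elts = grown;
      p->cap = new_cap;
    }
  }

  if (p->len < p->cap) {
    index = (long) p->len;
    p->elts[p->len++] = 0;
  }

  if (need_barrier)
    toy_barrier_release (p);
  return index;
}
```

The common case, where spare capacity exists, never touches the barrier; only 
the occasional growing allocation pays for stopping the workers.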

It'd sure be nice to hear from Neale before we code something like that. 

D. 

-Original Message-
From: Benoit Ganne (bganne)  
Sent: Wednesday, June 3, 2020 3:17 AM
To: raj...@rtbrick.com; Dave Barach (dbarach) 
Cc: vpp-dev ; Neale Ranns (nranns) 
Subject: RE: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

Neale is away and might be slow to react.
I suspect the issue arises when creating a new load balance entry through 
load_balance_create(), which gets a new element from the load balance pool. 
This in turn will update the pool free bitmap, which can grow. As it is backed 
by a vector, it can be reallocated somewhere else to fit the new size.
If it is done concurrently with dataplane processing, bad things happen. The 
pattern 0x131313 is filled by dlmalloc free() and will happen in that case. I 
think the same could happen to the pool itself, not only the bitmap.
If I am correct, I am not sure how we should fix that: fib update API is marked 
as mp_safe, so we could create a fixed-size load balance pool to prevent 
runtime reallocation, but it would waste memory and impose a maximum size.

ben

> -Original Message-
> From: vpp-dev@lists.fd.io  On Behalf Of Rajith PR 
> via lists.fd.io
> Sent: mercredi 3 juin 2020 05:46
> To: Dave Barach (dbarach) 
> Cc: vpp-dev ; Neale Ranns (nranns) 
> 
> Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
> 
> Hi Dave/Neal,
> 
> The adj_poison seems to be a filling pattern - - 0xfefe. Am I looking 
> into the right code or I have interpreted it incorrectly?
> 
> Thanks,
> Rajith
> 
> On Tue, Jun 2, 2020 at 7:44 PM Dave Barach (dbarach) 
> mailto:dbar...@cisco.com> > wrote:
> 
> 
>   The code manages to access a poisoned adjacency – 0x131313 fill 
> pattern – copying Neale for an opinion.
> 
> 
> 
>   D.
> 
> 
> 
>   From: vpp-dev@lists.fd.io    d...@lists.fd.io  > On Behalf Of Rajith PR 
> via lists.fd.io 
>   Sent: Tuesday, June 2, 2020 10:00 AM
>   To: vpp-dev mailto:vpp-dev@lists.fd.io> >
>   Subject: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
> 
> 
> 
>   Hello All,
> 
> 
> 
>   In 19.08 VPP version we are seeing a crash while accessing the 
> load_balance_pool  in load_balanc_get() function. This is happening 
> after enabling worker threads.
> 
>   As such the FIB programming is happening in the main thread and in 
> one of the worker threads we see this crash.
> 
>   Also, this is seen when we scale to 300K+ ipv4 routes.
> 
> 
> 
>   Here is the complete stack,
> 
> 
> 
>   Thread 10 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
> 
>   [Switching to Thread 0x7fbe4aa8e700 (LWP 333)]
>   0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313, i=61) 
> at /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
>   201  return i0 < vec_len (ai) && 0 != ((ai[i0] >> i1) & 1);
> 
> 
> 
>   Thread 10 (Thread 0x7fbe4aa8e700 (LWP 333)):
>   #0  0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313,
> i=61) at /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
>   #1  0x7fbef10676a8 in load_balance_get (lbi=61) at
> /home/ubuntu/Scale/libvpp/src/vnet/dpo/load_balance.h:222
>   #2  0x7fbef106890c in ip4_lookup_inline (vm=0x7fbe8a5aa080, 
> node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40) at
> /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.h:369
>   #3  0x7fbef1068ead in ip4_lookup_node_fn_avx2 (vm=0x7fbe8a5aa080, 
> node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40)
>   at /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.c:95
>   #4  0x7fbef0c6afec in dispatch_node (vm=0x7fbe8a5aa080, 
> node=0x7fbe8b3fd380, type=VLIB_NODE_TYPE_INTERNAL, 
> dispatch_state=VLIB_NODE_STATE_POLLING,
>   frame=0x7fbe8a5edb40, last_time_stamp=381215594286358) at
> /home/ubuntu/Scale/libvpp/src/vlib/main.c:1207
>   #5  0x7fbef0c6b7ad in dispatch_pending_node (vm=0x7fbe8a5aa080, 
> pending_frame_index=2, last_time_stamp=381215594286358)
>   at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1375
>   #6  0x7fbef0c6d3f0 in vlib_main_or_worker_loop 
> (vm=0x7fbe8a5aa080, is_main=0) at
> /home/ubuntu/Scale/libvpp/src/vlib/main.c:1826
>   #7  0x7fbef0c6dc73 in vlib_worker_loop (vm=0x7fbe8a5aa080) at
> /home/ubuntu/Scale/libvpp/src/vlib/main.c:1934
>   #8  0x7fbef0cac791 in vlib_worker_thread_fn (arg=0x7fbe8de2a340) 
> at /home/ubuntu/Scale/libvpp/src/vlib/threads.c:1754
>   #9  0x7fbef092da48 in clib_calljmp () from
> 

Re: [vpp-dev] Interesting backtrace in 1908

2020-06-05 Thread Dave Barach via lists.fd.io
Dear Chris,

Does this happen w/ master/latest? Can you share the startup config so I can 
try to repro the problem?

Thanks... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Christian Hopps
Sent: Friday, June 5, 2020 1:29 PM
To: vpp-dev 
Cc: Christian Hopps 
Subject: [vpp-dev] Interesting backtrace in 1908

I'm wondering if maybe this SIGSEGV/backtrace might be related to the other 
recently reported problem with the FIB and barrier code? The workers are at the 
barrier when the SIGSEGV happens, but maybe they aren't when they need to be 
earlier on?

In this case I've compiled w/o CLIB_DEBUG set, but with compiler flags set to 
-O0 instead of -O2 (trying to debug another problem that occurs much later).

This is being hit (apparently) when my startup config is adding a static arp 
entry (included below the backtrace)

I've sync'd code to the 2 recent commits past 19.08.02 as well as cherry 
picking the fix from Dave for the counter resize issue in the FIB.

I can try and put together a more in-depth bug report (or try to RCA it 
myself), but I'm wondering if something might be easily identified from this 
backtrace w/o doing a bunch more work.

Thanks,
Chris.

(gdb) info thre
  Id   Target Id Frame
* 1    Thread 83.83 "vpp_main" 0x75ccbb11 in clib_memcpy_fast 
(dst=0x1400, src=0x4500e239, n=936751150609465344) at 
/var/build/vpp/src/vppinfra/memcpy_sse3.h:187
  2    Thread 83.86 "eal-intr-thread" 0x759a3bb7 in epoll_wait 
(epfd=epfd@entry=15, events=events@entry=0x7fff9e8dbe10, 
maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at 
../sysdeps/unix/sysv/linux/epoll_wait.c:30
  3    Thread 83.87 "vpp_wk_0" 0x76290c90 in 
vlib_worker_thread_barrier_check () at /var/build/vpp/src/vlib/threads.h:430
  4    Thread 83.88 "vpp_wk_1" 0x76290c9a in 
vlib_worker_thread_barrier_check () at /var/build/vpp/src/vlib/threads.h:430
  5    Thread 83.89 "vpp_wk_2" 0x76290c9f in 
vlib_worker_thread_barrier_check () at /var/build/vpp/src/vlib/threads.h:430

(gdb) bt
#0  0x75ccbb11 in clib_memcpy_fast (dst=0x1400, 
src=0x4500e239, n=936751150609465344) at 
/var/build/vpp/src/vppinfra/memcpy_sse3.h:187
#1  0x75cd49a8 in lookup (v=0x7fffb7c793c8, key=615, op=SET, 
new_value=0x7fffb7c04880, old_value=0x0) at 
/var/build/vpp/src/vppinfra/hash.c:611
#2  0x75cd6217 in _hash_set3 (v=0x7fffb7c793c8, key=615, 
value=0x7fffb7c04880, old_value=0x0) at /var/build/vpp/src/vppinfra/hash.c:840
#3  0x762b0b28 in vlib_node_add_next_with_slot (vm=0x7656c400 
, node_index=522, next_node_index=615, slot=3) at 
/var/build/vpp/src/vlib/node.c:241
#4  0x762b102b in vlib_node_add_next_with_slot (vm=0x7656c400 
, node_index=521, next_node_index=615, slot=3) at 
/var/build/vpp/src/vlib/node.c:253
#5  0x762b102b in vlib_node_add_next_with_slot (vm=0x7656c400 
, node_index=520, next_node_index=615, slot=3) at 
/var/build/vpp/src/vlib/node.c:253
#6  0x762b102b in vlib_node_add_next_with_slot (vm=0x7656c400 
, node_index=519, next_node_index=615, slot=3) at 
/var/build/vpp/src/vlib/node.c:253
#7  0x762b102b in vlib_node_add_next_with_slot (vm=0x7656c400 
, node_index=274, next_node_index=615, slot=3) at 
/var/build/vpp/src/vlib/node.c:253
#8  0x762b102b in vlib_node_add_next_with_slot (vm=0x7656c400 
, node_index=523, next_node_index=615, slot=3) at 
/var/build/vpp/src/vlib/node.c:253
#9  0x776e120a in vlib_node_add_next (next_node=615, node=523, 
vm=0x7656c400 ) at 
/var/build/vpp/src/vlib/node_funcs.h:1109
#10 adj_nbr_update_rewrite_internal (adj=0x7fffb821e5c0, 
adj_next_index=IP_LOOKUP_NEXT_REWRITE, this_node=523, next_node=615, 
rewrite=0x0) at /var/build/vpp/src/vnet/adj/adj_nbr.c:488
#11 0x776e0c34 in adj_nbr_update_rewrite (adj_index=2, 
flags=ADJ_NBR_REWRITE_FLAG_COMPLETE, rewrite=0x7fffb89f4d60 "\002") at 
/var/build/vpp/src/vnet/adj/adj_nbr.c:314
#12 0x76f695b3 in arp_mk_complete (ai=2, e=0x7fffb7f116a8) at 
/var/build/vpp/src/vnet/ethernet/arp.c:385
#13 0x76f696be in arp_mk_complete_walk (ai=2, ctx=0x7fffb7f116a8) at 
/var/build/vpp/src/vnet/ethernet/arp.c:430
#14 0x776e15c0 in adj_nbr_walk_nh4 (sw_if_index=1, addr=0x7fffb7f116ac, 
cb=0x76f69696 , ctx=0x7fffb7f116a8) at 
/var/build/vpp/src/vnet/adj/adj_nbr.c:624
#15 0x76f6a436 in arp_update_adjacency (vnm=0x77b47e80 , 
sw_if_index=1, ai=2) at /var/build/vpp/src/vnet/ethernet/arp.c:540
#16 0x76b30f27 in ethernet_update_adjacency (vnm=0x77b47e80 
, sw_if_index=1, ai=2) at 
/var/build/vpp/src/vnet/ethernet/interface.c:210
#17 0x77706ceb in vnet_update_adjacency_for_sw_interface 
(vnm=0x77b47e80 , sw_if_index=1, ai=2) at 
/var/build/vpp/src/vnet/adj/rewrite.c:187
#18 0x776e0b18 in adj_nbr_add_or_lock (nh_proto=FIB_PROTOCOL_IP4, 
link_type=VNET_LINK_IP4, nh_addr=0x7fffb821f200, sw_if_index=1) at 

Re: [vpp-dev] Interesting backtrace in 1908

2020-06-05 Thread Dave Barach via lists.fd.io
Dear Chris,

Whew, that just made my weekend a lot happier. 

I'll look into why the relevant patch didn't make it back into 19.08 - it will 
now! - unfortunately "stuff happens..."

Thanks for confirming... Dave

-Original Message-
From: Christian Hopps  
Sent: Friday, June 5, 2020 5:09 PM
To: Dave Barach (dbarach) 
Cc: Christian Hopps ; vpp-dev 
Subject: Re: [vpp-dev] Interesting backtrace in 1908

Bingo.

In fact, in 19.08 the value is left as 0, which defaults to 15. I stepped it 
down from 20 to 15; startup succeeded at every value until I reached 15, which 
then hit the problem (both with the arp path and the other).

Thanks for the help finding this!

Chris.

> On Jun 5, 2020, at 4:52 PM, Dave Barach via lists.fd.io 
>  wrote:
> 
> Hmmm. That begins to smell like an undetected stack overflow. To test that 
> theory: s/18/20/ below: 
> 
> /* *INDENT-OFF* */
> VLIB_REGISTER_NODE (startup_config_node,static) = {
>.function = startup_config_process,
>.type = VLIB_NODE_TYPE_PROCESS,
>.name = "startup-config-process",
>.process_log2_n_stack_bytes = 18,
> };
> /* *INDENT-ON* */
> 
> It's entirely possible that compiling -O0 blows the stack, especially if you 
> end up 75 miles deep in fib code.
> 
> Dave
> 
> -Original Message-
> From: Christian Hopps 
> Sent: Friday, June 5, 2020 4:28 PM
> To: Dave Barach (dbarach) 
> Cc: Christian Hopps ; vpp-dev 
> Subject: Re: [vpp-dev] Interesting backtrace in 1908
> 
> 
> 
>> On Jun 5, 2020, at 2:10 PM, Dave Barach via lists.fd.io 
>>  wrote:
>> 
>> Step 1 is to make the silly-looking sibling recursion in 
>> vlib_node_add_next_with_slot(...) disappear. I’m on it...
>> 
>> Just to ask, can you repro w/ master/latest?
> 
> I will try and do this.
> 
> In the meantime I moved the arp configs to later in my startup config (this 
> is actually built by a test script) and immediately hit another sigsegv in 
> startup. This one is in infra but is going through my code initialization, 
> but also rooted in startup config processing... It's also in memcpy code, 
> which is making me suspicious now.
> 
> Again, I've changed "-O2" to "-O0" in the cmake vpp.mk package; when I change 
> it back to -O2 I do not hit either bug. So I'm now wondering if there is 
> something wrong with doing this, like do I need to do something else as well?
> 
> What I'm going for is not to have CLIB_DEBUG defined, but still have useful 
> levels of debuggability to do RCA on a problem that shows up much later 
> (millions of packets in).
> 
> modified   build-data/platforms/vpp.mk
> @@ -35,13 +35,21 @@ vpp_debug_TAG_CFLAGS = -O0 -DCLIB_DEBUG $(vpp_common_cflags)
>  vpp_debug_TAG_CXXFLAGS = -O0 -DCLIB_DEBUG $(vpp_common_cflags)
>  vpp_debug_TAG_LDFLAGS = -O0 -DCLIB_DEBUG $(vpp_common_cflags)
> 
> -vpp_TAG_CFLAGS = -O2 $(vpp_common_cflags)
> -vpp_TAG_CXXFLAGS = -O2 $(vpp_common_cflags)
> -vpp_TAG_LDFLAGS = -O2 $(vpp_common_cflags) -pie
> +vpp_TAG_CFLAGS = -O0 $(vpp_common_cflags)
> +vpp_TAG_CXXFLAGS = -O0 $(vpp_common_cflags)
> +vpp_TAG_LDFLAGS = -O0 $(vpp_common_cflags) -pie
> 
> The new backtrace I'm seeing is:
> 
> Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
> 0x75cbe8b1 in clib_mov16 (dst=0x6d986a57 <error: Cannot access memory at 
> address 0x6d986a57>, src=0x2d8f076c2d08 <error: Cannot access memory at 
> address 0x2d8f076c2d08>) at /var/build/vpp/src/vppinfra/memcpy_sse3.h:56
> 
> (gdb) bt
> #0  0x75cbe8b1 in clib_mov16 (dst=0x6d986a57 <error: Cannot access memory at 
> address 0x6d986a57>, src=0x2d8f076c2d08 <error: Cannot access memory at 
> address 0x2d8f076c2d08>) at /var/build/vpp/src/vppinfra/memcpy_sse3.h:56
> #1  0x75cbe910 in clib_mov32 (dst=0x7fffb90bea90 "", 
> src=0x7fffafa01fd0 "iptfs-refill-zpool sa_index %d before %d requested 
> %d head %d tail %d") at /var/build/vpp/src/vppinfra/memcpy_sse3.h:66
> #2  0x75cbe951 in clib_mov64 (dst=0x7fffb90bea90 "", 
> src=0x7fffafa01fd0 "iptfs-refill-zpool sa_index %d before %d requested 
> %d head %d tail %d") at /var/build/vpp/src/vppinfra/memcpy_sse3.h:73
> #3  0x75cbed5a in clib_memcpy_fast (dst=0x7fffb90bea90, 
> src=0x7fffafa01fd0, n=5) at 
> /var/build/vpp/src/vppinfra/memcpy_sse3.h:273
> #4  0x75cc5e72 in do_percent (_s=0x7fffb8141a58, 
> fmt=0x75dc7734 "%s%c", va=0x7fffb8141bc8) at 
> /var/build/vpp/src/vppinfra/format.c:341
> #5  0x75cc6564 in va_format (s=0x0, fmt=0x75dc7734 "%s%c", 
> va=0x7fffb8141bc8) at /var/build/vpp/src/vppinfra/format.c:404
> #6  0x75cc6810 in format (s=0x0, fmt=0x75dc7734 "%s%c") at 
> /var/build/vpp/src/vppinfra/format.c:428
> #7  0x75cace7d in elog_event_type_register

Re: [vpp-dev] Interesting backtrace in 1908

2020-06-05 Thread Dave Barach via lists.fd.io
Step 1 is to make the silly-looking sibling recursion in 
vlib_node_add_next_with_slot(...) disappear. I’m on it...
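
For readers who have not looked at that function, here is a toy model of why sibling recursion costs stack depth. It is only a sketch, not the VPP source and not the actual patch: every `toy_` name, the 8-node table, and the bitmap layout are assumptions made up for illustration. The recursive form re-enters itself once per sibling, much like frames #3 through #8 of the backtrace; the flat loop wires the same slots in a single frame.

```c
#include <stdint.h>

enum { TOY_N_NODES = 8, TOY_NO_SLOT = -1 };

typedef struct
{
  uint32_t sibling_bitmap;        /* bit i set: node i is in this node's sibling group */
  int slot_for_next[TOY_N_NODES]; /* slot leading to each next node, or TOY_NO_SLOT */
} toy_node_t;

static toy_node_t toy_nodes[TOY_N_NODES];

/* Put every node named in sibling_bitmap into one group; clear all slots. */
static void toy_reset (uint32_t sibling_bitmap)
{
  for (int n = 0; n < TOY_N_NODES; n++)
    {
      toy_nodes[n].sibling_bitmap =
        ((sibling_bitmap >> n) & 1) ? sibling_bitmap : 0;
      for (int k = 0; k < TOY_N_NODES; k++)
        toy_nodes[n].slot_for_next[k] = TOY_NO_SLOT;
    }
}

/* Recursive shape: each sibling re-enters the function.
   Returns the deepest call depth reached. */
static int toy_add_next_recursive (int node, int next, int slot, int depth)
{
  int deepest = depth;
  if (toy_nodes[node].slot_for_next[next] != TOY_NO_SLOT)
    return deepest; /* already wired, stop here */
  toy_nodes[node].slot_for_next[next] = slot;
  for (int s = 0; s < TOY_N_NODES; s++)
    if (((toy_nodes[node].sibling_bitmap >> s) & 1) && s != node)
      {
        int d = toy_add_next_recursive (s, next, slot, depth + 1);
        if (d > deepest)
          deepest = d;
      }
  return deepest;
}

/* Iterative shape: wire the node and all of its siblings in one flat loop. */
static void toy_add_next_iterative (int node, int next, int slot)
{
  uint32_t group = toy_nodes[node].sibling_bitmap | (1u << node);
  for (int s = 0; s < TOY_N_NODES; s++)
    if ((group >> s) & 1)
      toy_nodes[s].slot_for_next[next] = slot;
}
```

In this toy the recursion depth grows with the size of the sibling group, while the iterative version always uses one frame. Whether the real fix takes this shape is not something the thread states.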

Just to ask, can you repro w/ master/latest?

Thanks... Dave


Re: [vpp-dev] Interesting backtrace in 1908

2020-06-05 Thread Dave Barach via lists.fd.io
Hmmm. That begins to smell like an undetected stack overflow. To test that 
theory: s/18/20/ below: 

/* *INDENT-OFF* */
VLIB_REGISTER_NODE (startup_config_node,static) = {
.function = startup_config_process,
.type = VLIB_NODE_TYPE_PROCESS,
.name = "startup-config-process",
.process_log2_n_stack_bytes = 18,
};
/* *INDENT-ON* */

It's entirely possible that compiling -O0 blows the stack, especially if you 
end up 75 miles deep in fib code.
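
As a rough illustration of the sizes involved, the sketch below turns a `process_log2_n_stack_bytes` setting into a byte count. It is plain C rather than VPP code: the `toy_` names are made up, and the default of 15 for a value of 0 is taken from what was reported for 19.08 elsewhere in this thread, not from the source tree.

```c
#include <stdint.h>

/* Assumption from the thread: in 19.08 the field is left as 0,
   which defaults to 15. */
enum { TOY_DEFAULT_LOG2_STACK_BYTES = 15 };

/* Stack size in bytes implied by a log2 setting; 0 means "use the default". */
static uint64_t toy_process_stack_bytes (unsigned log2_n_stack_bytes)
{
  if (log2_n_stack_bytes == 0)
    log2_n_stack_bytes = TOY_DEFAULT_LOG2_STACK_BYTES;
  return (uint64_t) 1 << log2_n_stack_bytes;
}
```

Under those assumptions, a setting of 0 gives 32 KiB, 18 gives 256 KiB, and 20 gives 1 MiB, so each step of the knob doubles the room available to the startup-config process.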

Dave

-Original Message-
From: Christian Hopps  
Sent: Friday, June 5, 2020 4:28 PM
To: Dave Barach (dbarach) 
Cc: Christian Hopps ; vpp-dev 
Subject: Re: [vpp-dev] Interesting backtrace in 1908



> On Jun 5, 2020, at 2:10 PM, Dave Barach via lists.fd.io 
>  wrote:
> 
> Step 1 is to make the silly-looking sibling recursion in 
> vlib_node_add_next_with_slot(...) disappear. I’m on it...
>  
> Just to ask, can you repro w/ master/latest?

I will try and do this.

In the meantime I moved the arp configs to later in my startup config (this is 
actually built by a test script) and immediately hit another sigsegv in 
startup. This one is in infra but is going through my code initialization, but 
also rooted in startup config processing... It's also in memcpy code, which is 
making me suspicious now.

Again, I've changed "-O2" to "-O0" in the cmake vpp.mk package; when I change 
it back to -O2 I do not hit either bug. So I'm now wondering if there is 
something wrong with doing this, like do I need to do something else as well?

What I'm going for is not to have CLIB_DEBUG defined, but still have useful 
levels of debuggability to do RCA on a problem that shows up much later 
(millions of packets in).

modified   build-data/platforms/vpp.mk
@@ -35,13 +35,21 @@ vpp_debug_TAG_CFLAGS = -O0 -DCLIB_DEBUG $(vpp_common_cflags)
 vpp_debug_TAG_CXXFLAGS = -O0 -DCLIB_DEBUG $(vpp_common_cflags)
 vpp_debug_TAG_LDFLAGS = -O0 -DCLIB_DEBUG $(vpp_common_cflags)

-vpp_TAG_CFLAGS = -O2 $(vpp_common_cflags)
-vpp_TAG_CXXFLAGS = -O2 $(vpp_common_cflags)
-vpp_TAG_LDFLAGS = -O2 $(vpp_common_cflags) -pie
+vpp_TAG_CFLAGS = -O0 $(vpp_common_cflags)
+vpp_TAG_CXXFLAGS = -O0 $(vpp_common_cflags)
+vpp_TAG_LDFLAGS = -O0 $(vpp_common_cflags) -pie

The new backtrace I'm seeing is:

Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
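0x75cbe8b1 in clib_mov16 (dst=0x6d986a57 <error: Cannot access memory at 
address 0x6d986a57>, src=0x2d8f076c2d08 <error: Cannot access memory at 
address 0x2d8f076c2d08>) at /var/build/vpp/src/vppinfra/memcpy_sse3.h:56

(gdb) bt
#0  0x75cbe8b1 in clib_mov16 (dst=0x6d986a57 <error: Cannot access memory at 
address 0x6d986a57>, src=0x2d8f076c2d08 <error: Cannot access memory at 
address 0x2d8f076c2d08>) at /var/build/vpp/src/vppinfra/memcpy_sse3.h:56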
#1  0x75cbe910 in clib_mov32 (dst=0x7fffb90bea90 "", src=0x7fffafa01fd0 
"iptfs-refill-zpool sa_index %d before %d requested %d head %d tail %d") at 
/var/build/vpp/src/vppinfra/memcpy_sse3.h:66
#2  0x75cbe951 in clib_mov64 (dst=0x7fffb90bea90 "", src=0x7fffafa01fd0 
"iptfs-refill-zpool sa_index %d before %d requested %d head %d tail %d") at 
/var/build/vpp/src/vppinfra/memcpy_sse3.h:73
#3  0x75cbed5a in clib_memcpy_fast (dst=0x7fffb90bea90, 
src=0x7fffafa01fd0, n=5) at /var/build/vpp/src/vppinfra/memcpy_sse3.h:273
#4  0x75cc5e72 in do_percent (_s=0x7fffb8141a58, fmt=0x75dc7734 
"%s%c", va=0x7fffb8141bc8) at /var/build/vpp/src/vppinfra/format.c:341
#5  0x75cc6564 in va_format (s=0x0, fmt=0x75dc7734 "%s%c", 
va=0x7fffb8141bc8) at /var/build/vpp/src/vppinfra/format.c:404
#6  0x75cc6810 in format (s=0x0, fmt=0x75dc7734 "%s%c") at 
/var/build/vpp/src/vppinfra/format.c:428
#7  0x75cace7d in elog_event_type_register (em=0x7656c7a8 
, t=0x7fffb9058300) at 
/var/build/vpp/src/vppinfra/elog.c:173
#8  0x7fffaf9b8ba4 in elog_event_data_inline (cpu_time=3193258542505306, 
track=0x7fffb9093f98, type=0x7fffafc08880 , em=0x7656c7a8 
) at /var/build/vpp/src/vppinfra/elog.h:310
#9  elog_data_inline (track=0x7fffb9093f98, type=0x7fffafc08880 , 
em=0x7656c7a8 ) at 
/var/build/vpp/src/vppinfra/elog.h:435
#10 iptfs_refill_zpool (vm=0x7656c400 , 
zpool=0x7fffb915a8c0, sa_index=1, payload_size=1470, put=false, 
track=0x7fffb9093f98) at /var/build/vpp/src/plugins/iptfs/iptfs_zpool.c:134
#11 0x7fffaf9b9d3f in iptfs_zpool_alloc (vm=0x7656c400 
, queue_size=768, sa_index=1, payload_size=1470, put=false, 
track=0x7fffb9093f98) at /var/build/vpp/src/plugins/iptfs/iptfs_zpool.c:235
#12 0x7fffaf99d73c in iptfs_tfs_data_init (sa_index=1, conf=0x7fffb91cd7c0) 
at /var/build/vpp/src/plugins/iptfs/ipsec_iptfs.c:347
#13 0x7fffaf9a0a09 in iptfs_add_del_sa (sa_index=1, 
tfs_config=0x7fffb91cd7c0, is_add=1 '\001') at 
/var/build/vpp/src/plugins/iptfs/ipsec_iptfs.c:822
#14 0x76fff6a8 in ipsec_sa_add_and_lock (id=3221225472, spi=1112, 
proto=IPSEC_PROTOCOL_ESP, crypto_alg=IPSEC_CRYPTO_ALG_NONE, ck=0x7fffb8144cf0, 
integ_alg=IPSEC_INTEG_ALG_NONE, ik=0x7fffb8144d80, flags

Re: [vpp-dev] received signal SIGSEGV (vppinfra/time_range.h) no such file or directory

2020-06-07 Thread Dave Barach via lists.fd.io
https://gerrit.fd.io/r/c/vpp/+/27458. Next time you find something like this, 
feel free to fix it and push a patch.

From: vpp-dev@lists.fd.io  On Behalf Of Pac Ette
Sent: Saturday, June 6, 2020 4:15 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] received signal SIGSEGV (vppinfra/time_range.h) no such file 
or directory

Hi all,

I encountered this error today while using:
vppctl sh mactime

The problem occurs in vpp v20.05 compiled from the stable/2005 branch.
It works fine in vpp v20.01.

gdb:
Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
0x7fffad33f563 in clib_timebase_now (tb=)
at /root/vpp/src/vppinfra/time_range.h:92
92 /root/vpp/src/vppinfra/time_range.h: No such file or directory.

syslog:
2020-06-05T23:19:08.276917+00:00 333c55ab vnet[16756]: received signal SIGSEGV, 
PC 0x7f21e5c44563, faulting address 0x28
2020-06-05T23:19:08.278486+00:00 333c55ab vnet[16756]: #0  0x7f222e6f96e5 
0x7f222e6f96e5
2020-06-05T23:19:08.279024+00:00 333c55ab vnet[16756]: #1  0x7f222df4f890 
0x7f222df4f890
2020-06-05T23:19:08.279146+00:00 333c55ab vnet[16756]: #2  0x7f21e5c44563 
0x7f21e5c44563
2020-06-05T23:19:08.279636+00:00 333c55ab vnet[16756]: #3  0x7f222e670762 
0x7f222e670762
2020-06-05T23:19:08.279757+00:00 333c55ab vnet[16756]: #4  0x7f222e670679 
0x7f222e670679
2020-06-05T23:19:08.279861+00:00 333c55ab vnet[16756]: #5  0x7f222e66fde0 
vlib_cli_input + 0x80
2020-06-05T23:19:08.280377+00:00 333c55ab vnet[16756]: #6  0x7f222e6e9a4b 
0x7f222e6e9a4b
2020-06-05T23:19:08.280499+00:00 333c55ab vnet[16756]: #7  0x7f222e699cc7 
0x7f222e699cc7
2020-06-05T23:19:08.280968+00:00 333c55ab vnet[16756]: #8  0x7f222dadf3f4 
0x7f222dadf3f4

Thanks!


[vpp-dev] vlib_add_trace(...) - stop inlining?

2020-06-08 Thread Dave Barach via lists.fd.io
Folks,

It looks to me like inlining vlib_add_trace(...) is probably a mistake in terms 
of code bloat. Does anyone hate the idea of changing it to a standard function?

Thanks... Dave


Re: [vpp-dev] vlib_add_trace(...) - stop inlining?

2020-06-08 Thread Dave Barach via lists.fd.io
See https://gerrit.fd.io/r/c/vpp/+/27467.

Here's the commit message:

vlib: stop inlining vlib_add_trace(...)

Packet tracing performance doesn't justify inlining vlib_add_trace(...) over 
500 times.

It makes a 15% text-segment size difference in a representative use-case:

Inline:
$ size .../vnet_skx.dir/ipsec/ipsec_input.c.o
   text    data     bss     dec     hex filename
   6831      80       0    6911    1aff .../vnet_skx.dir/ipsec/ipsec_input.c.o

Not inline:
$ size .../vnet_skx.dir/ipsec/ipsec_input.c.o
   text    data     bss     dec     hex filename
   5776      80       0    5856    16e0 .../vnet_skx.dir/ipsec/ipsec_input.c.o

Retain the original code as vlib_add_trace_inline, instantiate once as 
vlib_add_trace.

Type: refactor
Signed-off-by: Dave Barach
Change-Id: Iaf431dbf00c4aad03663d86f9dd1322e84d03962
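
The "keep the inline body, instantiate it once" pattern described in the commit message can be sketched roughly as follows. This is a generic C illustration, not the patch itself: the `toy_` names, the fixed 256-byte buffer, and the bounds check are all assumptions invented for the example.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
  unsigned char data[256];
  size_t used;
} toy_trace_buf_t;

/* Inline version: would live in a header, for the rare caller that really
   wants the body expanded at the call site. */
static inline void *toy_add_trace_inline (toy_trace_buf_t *tb, size_t n_bytes)
{
  if (tb->used + n_bytes > sizeof (tb->data))
    return NULL; /* no room left in the toy buffer */
  void *p = tb->data + tb->used;
  memset (p, 0, n_bytes);
  tb->used += n_bytes;
  return p;
}

/* Single out-of-line instantiation: would live in exactly one .c file.
   Ordinary callers pay for a function call instead of a copy of the body. */
void *toy_add_trace (toy_trace_buf_t *tb, size_t n_bytes)
{
  return toy_add_trace_inline (tb, n_bytes);
}
```

A caller would zero a `toy_trace_buf_t` and call `toy_add_trace (&tb, 16)`. The trade is one extra call per traced packet against one copy of the body per call site, which is the size difference the `size` output above measures.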

From: vpp-dev@lists.fd.io  On Behalf Of Dave Barach via 
lists.fd.io
Sent: Monday, June 8, 2020 10:13 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] vlib_add_trace(...) - stop inlining?

Folks,

It looks to me like inlining vlib_add_trace(...) is probably a mistake in terms 
of code bloat. Does anyone hate the idea of changing it to a standard function?

Thanks... Dave


Re: [vpp-dev] Stop data processing in the node until event

2020-06-09 Thread Dave Barach via lists.fd.io
Seems like a design which will cause no end of trouble. Coded this way, key 
swaps will put serious pressure on the buffer allocator. What if the server 
never replies?

Accept either key for a short period of time. As soon as the new key is in hand 
– and one packet decrypts with it – flush the old key.
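
A minimal sketch of that overlap scheme, under loud assumptions: the `toy_` names are hypothetical, and `toy_decrypts` is a stand-in for a real authenticated-decryption check, not a cipher. The "short period of time" expiry is left out; a real implementation would presumably also drop the old key on a timer.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
  uint32_t cur_key;
  bool have_cur;
  uint32_t old_key; /* previous key, still accepted during the overlap */
  bool have_old;
} toy_keys_t;

/* Stand-in for "this packet authenticates and decrypts under this key". */
static bool toy_decrypts (uint32_t key, uint32_t tag)
{
  return (key ^ 0xA5A5A5A5u) == tag;
}

/* New key arrives: keep the previous one around instead of stalling traffic. */
static void toy_install_new_key (toy_keys_t *ks, uint32_t key)
{
  if (ks->have_cur)
    {
      ks->old_key = ks->cur_key;
      ks->have_old = true;
    }
  ks->cur_key = key;
  ks->have_cur = true;
}

/* Receive path: try the current key first, then the old one. */
static bool toy_rx (toy_keys_t *ks, uint32_t tag)
{
  if (ks->have_cur && toy_decrypts (ks->cur_key, tag))
    {
      ks->have_old = false; /* first success with the new key: flush the old */
      return true;
    }
  if (ks->have_old && toy_decrypts (ks->old_key, tag))
    return true;
  return false;
}
```

The point of the design is that no packets are buffered while the key exchange is in flight, so a slow or silent server cannot put pressure on the buffer allocator.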

FWIW... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Artem Glazychev
Sent: Tuesday, June 9, 2020 7:25 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Stop data processing in the node until event


Good morning.

I'm writing a plugin with tunnel encryption.
I have a question. Suppose that, in the middle of encrypting a data flow, we 
decide that the client-server keys need to be updated. How can I pause the data 
flow before the i-th packet, request and receive new keys from the server, and 
then continue by encrypting the i-th packet with the new keys? Is this 
possible? Can you point me in the right direction?

Thanks.

