[vpp-dev] Facing issue Ipv6 routing (Vpp 1908)

2020-11-16 Thread chetan bhasin
Hello Everyone,

We are facing an issue with the IPv6 FIB lookup (VPP 19.08).
Sometimes a packet goes out of the wrong interface, which looks like the
result of a wrong FIB lookup.

I have found one change that is not part of our code:
https://gerrit.fd.io/r/c/vpp/+/27255

Can anybody please suggest whether the absence of that change could cause such a problem?

Thanks,
Chetan Bhasin
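
A minimal way to cross-check the FIB lookup result against what the packet
actually does (standard VPP CLI; the prefix below is a placeholder, and the
rx node may differ depending on the driver):

vpp# show ip6 fib 2001:db8::1/128     # dump the matching FIB entry and the resolved path/interface
vpp# clear trace
vpp# trace add dpdk-input 10
[send the affected traffic]
vpp# show trace                       # the ip6-lookup/ip6-rewrite records show the chosen adjacency and tx interface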




Re: [vpp-dev] svm ASAN check error

2020-11-16 Thread jiangxiaoming
Hi Ben,

You can try the following steps:
* make VPP_EXTRA_CMAKE_ARGS=-DVPP_ENABLE_SANITIZE_ADDR=ON build
* make run
* session enable

> 
> DBGvpp# session enable
> ==91579==AddressSanitizer CHECK failed:
> ../../../../libsanitizer/asan/asan_mapping.h:377 "((AddrIsInMem(p))) !=
> (0)" (0x0, 0x0)
> #0 0x7fc550ef058a in AsanCheckFailed
> ../../../../libsanitizer/asan/asan_rtl.cc:72
> #1 0x7fc550f0e49a  (/lib64/libasan.so.5+0x13449a)
> #2 0x7fc550eebedc in
> __asan::ShadowSegmentEndpoint::ShadowSegmentEndpoint(unsigned long)
> ../../../../libsanitizer/asan/asan_mapping.h:377
> #3 0x7fc550eebedc in __asan_unpoison_memory_region
> ../../../../libsanitizer/asan/asan_poisoning.cc:152
> #4 0x7fc54d3db01b in clib_mem_vm_map_internal
> /home/dev/code/vpp/src/vppinfra/linux/mem.c:491
> #5 0x7fc54d2a0124 in clib_mem_vm_map_shared
> /home/dev/code/vpp/src/vppinfra/mem.c:68
> #6 0x7fc551905d8b in ssvm_server_init_memfd
> /home/dev/code/vpp/src/svm/ssvm.c:253
> #7 0x7fc551906db8 in ssvm_server_init
> /home/dev/code/vpp/src/svm/ssvm.c:435
> #8 0x7fc54fe12a1b in session_vpp_event_queues_allocate
> /home/dev/code/vpp/src/vnet/session/session.c:1497
> #9 0x7fc54fe1615b in session_manager_main_enable
> /home/dev/code/vpp/src/vnet/session/session.c:1678
> #10 0x7fc54fe16b35 in vnet_session_enable_disable
> /home/dev/code/vpp/src/vnet/session/session.c:1772
> #11 0x7fc54fee967f in session_enable_disable_fn
> /home/dev/code/vpp/src/vnet/session/session_cli.c:873
> #12 0x7fc54db15611 in vlib_cli_dispatch_sub_commands
> /home/dev/code/vpp/src/vlib/cli.c:572
> #13 0x7fc54db16166 in vlib_cli_input /home/dev/code/vpp/src/vlib/cli.c:674
> 
> #14 0x7fc54dcacd31 in unix_cli_process_input
> /home/dev/code/vpp/src/vlib/unix/cli.c:2620
> #15 0x7fc54dcaeb19 in unix_cli_process
> /home/dev/code/vpp/src/vlib/unix/cli.c:2738
> #16 0x7fc54dbc583f in vlib_process_bootstrap
> /home/dev/code/vpp/src/vlib/main.c:1477
> #17 0x7fc54d288f0f 
> (/home/dev/code/vpp/build-root/install-vpp_debug-native/vpp/lib/libvppinfra.so.21.01+0xcff0f)
> 
> 
> /bin/sh: line 1: 91578 Segmentation fault      sudo -E
> /home/dev/code/vpp/build-root/install-vpp_debug-native/vpp/bin/vpp " unix
> { interactive cli-listen /run/vpp/cli.sock gid 0  }  "dpdk { no-pci }"  "
> make: *** [run] Error 139
>




[vpp-dev] why tunnel interfaces do not support device-input feature?

2020-11-16 Thread 叶东岗

Hi all:

Why do tunnel interfaces not support the device-input feature arc?

Why do ESP packets not go through the IPsec interface's "interface-output"
node?


I think it would not be a bad idea to keep feature behavior consistent across
all interface types, in spite of a small performance degradation.



Best regards,
Ye Donggang
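
For anyone digging into this, one way to see which feature arcs a given
interface actually participates in, and which nodes ESP packets really
traverse, is sketched below (assuming the "show interface features" CLI
present in recent VPP releases; the interface name is a placeholder):

vpp# show interface features ipip0      # per-arc list of features enabled on this interface
vpp# clear trace
vpp# trace add dpdk-input 10
[send ESP traffic]
vpp# show trace                         # shows the exact node path, including whether interface-output is visited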





[vpp-dev] vppctl fails from within the application (system cmd returns 256)

2020-11-16 Thread Aniket Pugaonkar
Hi,

I am having an issue with VPP 20.05.1 running on RHEL 8.2.

There are 4 commands that I am trying to run from my application (using
vppctl); they are fairly straightforward. The Trace.WriteLine function is just
a utility to print out the trace.

What I see is that the return value is 256, indicating the system command did
not get executed properly. However, the same command works perfectly fine when
I type it manually. I am not sure what is wrong. I am logged in as root and
running the process manually as root using "./a.out".

Could anyone suggest anything else to check, apart from switching to the VPP
API to configure interfaces, add routes, etc.? We had a few issues with the VPP
API/VCL libraries and have decided not to pursue the VPP API for now, so vppctl
is the only option. An alternative is to write an ipaddr.txt file and reference
it from startup.conf, which would do the trick; however, I want to do it from
the application. Is there a special flag that can be enabled to get more
information in vpp.log? I could not find anything useful in vpp.log.

string s1("vppctl create sub-interfaces HundredGigabitEthernet12/0/1 501");
string s2("vppctl set interface state HundredGigabitEthernet12/0/1 up");
string s3("vppctl set interface state HundredGigabitEthernet12/0/1.501 up");
string s4("vppctl set interface ip address HundredGigabitEthernet12/0/1.501 2001:5b0::501:b883:31f:19e:8879/64");

int ret;
ret = system(s1.c_str());
Trace.WriteLine(TraceConstants::Configuration, "Cmd: ", s1, ", ret = ", ret);
ret = system(s2.c_str());
Trace.WriteLine(TraceConstants::Configuration, "Cmd: ", s2, ", ret = ", ret);
ret = system(s3.c_str());
Trace.WriteLine(TraceConstants::Configuration, "Cmd: ", s3, ", ret = ", ret);
ret = system(s4.c_str());
Trace.WriteLine(TraceConstants::Configuration, "Cmd: ", s4, ", ret = ", ret);
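
A few quick shell-level checks that may help narrow down the failure (a sketch,
assuming vppctl itself fails inside the application's environment; a system()
return value of 256 simply means the child exited with status 1, i.e.
WEXITSTATUS(256) == 1):

sh -c "vppctl show version"; echo "exit status: $?"   # same shell path system() uses
ls -l /run/vpp/cli.sock                               # is the CLI socket there and accessible?
vppctl -s /run/vpp/cli.sock show version              # point vppctl at the socket explicitly
vppctl create sub-interfaces HundredGigabitEthernet12/0/1 501 2>&1 | tee /tmp/vppctl.err   # capture vppctl's own error text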


my startup.conf file:

unix {
  nodaemon
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  gid vpp
}


[image: image.png]




-- 

Thanks and regards,
Aniket Pugaonkar




[vpp-dev] #vpp #vppcom #vpp-dev #vnet

2020-11-16 Thread nikhil subhedar
Hi All,
I have a use case in which a TCP connection will be terminated on VPP. For that
purpose I want to configure a TCP port bound to an interface.
I want to verify the session-manager vnet node as well as the VPP graph nodes.
For this exercise I am using hping3.

Can anyone shed some light on this? Also, if anyone has a trace of such a
setup, it would be helpful.

Regards,
Nikhil
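
A minimal sketch of one way to terminate TCP inside VPP with the built-in
session layer and then watch the relevant graph nodes (this assumes the echo
server test application from the hs_apps plugin is available in the build; the
interface, address and port are placeholders):

vpp# set interface state GigabitEthernet0/8/0 up
vpp# set interface ip address GigabitEthernet0/8/0 192.0.2.1/24
vpp# session enable
vpp# test echo server uri tcp://192.0.2.1/5555   # TCP listener terminated by VPP's session layer
vpp# show session verbose                        # listener plus any established sessions
vpp# trace add dpdk-input 10                     # then send traffic (e.g. with hping3) and run 'show trace'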




Re: [vpp-dev] Python API modules

2020-11-16 Thread Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via lists.fd.io
> something has gone wrong here.

Yeah, sorry for the confusion.

I consult the maintainers file when double-checking who to add
as reviewers. But when I think the maintainers file is outdated,
I do not hesitate to add [0] or remove [1] anybody
based on my (limited) perception.

Usually I am too focused on fixing VPP
to properly think about the process of editing the maintainers file.
Do we have a document with guidelines related to that?

Vratko.

[0] https://gerrit.fd.io/r/c/vpp/+/20366
[1] https://gerrit.fd.io/r/c/vpp/+/22672

From: vpp-dev@lists.fd.io  On Behalf Of Ole Troan
Sent: Monday, 2020-November-16 15:16
To: Paul Vinciguerra 
Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) ; 
Marcos - Mgiga ; vpp-dev 
Subject: Re: [vpp-dev] Python API modules

Hi Paul,

> He/Vratko removed me as one of the maintainers of papi.  
> https://gerrit.fd.io/r/c/vpp/+/22672.  I'm cool with not being a maintainer, 
> kinda funny that it was stuffed into another changeset.  I have been called 
> out repeatedly for submitting unrelated changes ;)

My apologies, something has gone wrong here. I wasn't even aware that you got
removed. I have reviewed the history and spoken to Vratko to understand what
happened. Vratko was editing the MAINTAINERS file and made changes in good
faith, reflecting what he believed was the current state and expecting
reviewers to comment. I missed that in code review, and the patch went in. I do
not believe that's a correct way of doing things, and since this was done in
error, I'd like to add you back as maintainer for PAPI; that is, revert to the
state prior to that patch.

Apologies again, and let's fix that error.

Best regards,
Ole






Re: [vpp-dev] Python API modules

2020-11-16 Thread Ole Troan
Hi Paul,

> He/Vratko removed me as one of the maintainers of papi.  
> https://gerrit.fd.io/r/c/vpp/+/22672.  I'm cool with not being a maintainer, 
> kinda funny that it was stuffed into another changeset.  I have been called 
> out repeatedly for submitting unrelated changes ;)

My apologies, something has gone wrong here. I wasn't even aware that you got
removed. I have reviewed the history and spoken to Vratko to understand what
happened. Vratko was editing the MAINTAINERS file and made changes in good
faith, reflecting what he believed was the current state and expecting
reviewers to comment. I missed that in code review, and the patch went in. I do
not believe that's a correct way of doing things, and since this was done in
error, I'd like to add you back as maintainer for PAPI; that is, revert to the
state prior to that patch.

Apologies again, and let's fix that error.

Best regards,
Ole



Re: Handoff design issues [Re: RES: RES: [vpp-dev] Increasing NAT worker handoff frame queue size NAT_FQ_NELTS to avoid congestion drops?]

2020-11-16 Thread Christian Hopps
Just checked out the patch; you are compressing the frames on the receiving
thread side. I hadn't realized (i.e., hadn't looked) that the code was copying
the buffer indexes into a new frame anyway. Given that, I think this is a good fix!

Thanks,
Chris.

> On Nov 16, 2020, at 4:20 AM, Klement Sekera -X (ksekera - PANTHEON TECH SRO 
> at Cisco)  wrote:
> 
> That’s exactly what my patch improves. Coalescing small groups of packets 
> waiting in the handoff queue into a full(er) frame allows the downstream node 
> to do more “V” and achieve better performance. And that’s also what I’ve seen 
> when testing the patch.
> 
> Thanks,
> Klement
> 
> ps. in case you missed the link: https://gerrit.fd.io/r/c/vpp/+/28980
> 
>> On 13 Nov 2020, at 22:47, Christian Hopps  wrote:
>> 
>> FWIW, I too have hit this issue. Basically VPP is designed to process a 
>> packet from rx to tx in the same thread. When downstream nodes run slower, 
>> the upstream rx node doesn't run, so the vector size in each frame naturally 
>> increases, and then the downstream nodes can benefit from "V" (i.e., 
>> processing multiple packets in one go).
>> 
>> This back-pressure from downstream does not occur when you hand-off from a 
>> fast thread to a slower thread, so you end up with many single packet frames 
>> and fill your hand-off queue.
>> 
>> The quick fix one tries then is to increase the queue size; however, this is 
>> not a great solution b/c you are still not taking advantage of the "V" in 
>> VPP. To really fit this back into the original design one needs to somehow 
>> still be creating larger vectors in the hand-off frames.
>> 
>> TBH I think the right solution here is to not hand-off frames, and instead 
>> switch to packet queues and then on the handed-off side the frames would get 
>> constructed from packet queues (basically creating another polling input 
>> node but on the new thread).
>> 
>> Thanks,
>> Chris.
>> 
>>> On Nov 13, 2020, at 12:21 PM, Marcos - Mgiga  wrote:
>>> 
>>> Understood. And what path did you take in order to analyse and monitor 
>>> vector rates ? Is there some specific command or log ?
>>> 
>>> Thanks
>>> 
>>> Marcos
>>> 
>>> -Original Message-
>>> From: vpp-dev@lists.fd.io  On behalf of ksekera via []
>>> Sent: Friday, 13 November 2020 14:02
>>> To: Marcos - Mgiga 
>>> Cc: Elias Rudberg ; vpp-dev@lists.fd.io
>>> Subject: Re: RES: [vpp-dev] Increasing NAT worker handoff frame queue size 
>>> NAT_FQ_NELTS to avoid congestion drops?
>>> 
>>> Not completely idle, more like medium load. Vector rates at which I saw 
>>> congestion drops were roughly 40 for thread doing no work (just handoffs - 
>>> I hardcoded it this way for test purpose), and roughly 100 for thread 
>>> picking the packets doing NAT.
>>> 
>>> What got me into infra investigation was the fact that once I was hitting 
>>> vector rates around 255, I did see packet drops, but no congestion drops.
>>> 
>>> HTH,
>>> Klement
>>> 
 On 13 Nov 2020, at 17:51, Marcos - Mgiga  wrote:
 
 So you mean that this situation (congestion drops) is more likely to occur
 when the system is generally idle than when it is processing a large amount
 of traffic?
 
 Best Regards
 
 Marcos
 
 -Original Message-
 From: vpp-dev@lists.fd.io  On behalf of Klement
 Sekera via lists.fd.io Sent: Friday, 13 November 2020
 12:15
 To: Elias Rudberg 
 Cc: vpp-dev@lists.fd.io
 Subject: Re: [vpp-dev] Increasing NAT worker handoff frame queue size 
 NAT_FQ_NELTS to avoid congestion drops?
 
 Hi Elias,
 
 I’ve already debugged this and came to the conclusion that it’s the infra 
 which is the weak link. I was seeing congestion drops at mild load, but 
 not at full load. Issue is that with handoff, there is uneven workload. 
 For simplicity’s sake, just consider thread 1 handing off all the traffic 
 to thread 2. What happens is that for thread 1, the job is much easier, it 
 just does some ip4 parsing and then hands packet to thread 2, which 
 actually does the heavy lifting of hash inserts/lookups/translation etc. 
 64 element queue can hold 64 frames, one extreme is 64 1-packet frames, 
 totalling 64 packets, other extreme is 64 255-packet frames, totalling 
 ~16k packets. What happens is this: thread 1 is mostly idle and just 
 picking a few packets from NIC and every one of these small frames creates 
 an entry in the handoff queue. Now thread 2 picks one element from the 
 handoff queue and deals with it before picking another one. If the queue 
 has only 3-packet or 10-packet elements, then thread 2 can never really 
 get into what VPP excels in - bulk processing.
 
 Q: Why doesn’t it pick as many packets as possible from the handoff queue?
 A: It’s not implemented.
 
 I already wrote a patch for it, which made all congestion drops which I 
 saw (in above 

Re: [vpp-dev] Increasing NAT worker handoff frame queue size NAT_FQ_NELTS to avoid congestion drops?

2020-11-16 Thread Klement Sekera via lists.fd.io
Hi Elias,

thanks for getting back with some real numbers. I only tested with two workers
and a very simple case, and in my case increasing the queue size didn’t help one
bit. But again, in my case there was a 100% handoff rate (every single packet was
going through handoff), which is most probably the reason why one solution
seemed like the holy grail and the other useless.

To answer your question regarding why the queue length is 64 - I guess nobody
knows, as the author of that code has been gone for a while. I see no reason why
this shouldn’t be configurable. When I tried just increasing the value, I quickly
ran into an out-of-buffers situation with the default configs.

Would you like to submit a patch?

Thanks,
Klement
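
Regarding the out-of-buffers situation mentioned above: a larger handoff queue
keeps more packets in flight, so the buffer pool may need to grow with it. A
hedged startup.conf sketch (the number is purely illustrative and
workload-dependent; check "show buffers" to see whether the pool is running dry):

buffers {
  # default is 16384 buffers per NUMA node
  buffers-per-numa 131072
}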

> On 16 Nov 2020, at 11:33, Elias Rudberg  wrote:
> 
> Hi Klement,
> 
> Thanks! I have now tested your patch (28980), it seems to work and it
> does give some improvement. However, according to my tests, increasing
> NAT_FQ_NELTS seems to have a bigger effect, it improves performance a
> lot. When using the original NAT_FQ_NELTS value of 64, your patch
> gives some improvement but I still get the best performance when
> increasing NAT_FQ_NELTS.
> 
> For example, one of the tests behaves like this:
> 
> Without patch, NAT_FQ_NELTS=64  --> 129 Gbit/s and ~600k cong. drops
> With patch, NAT_FQ_NELTS=64  --> 136 Gbit/s and ~400k cong. drops
> Without patch, NAT_FQ_NELTS=1024  --> 151 Gbit/s and 0 cong. drops
> With patch, NAT_FQ_NELTS=1024  --> 151 Gbit/s and 0 cong. drops
> 
> So it still looks like increasing NAT_FQ_NELTS would be good, which
> brings me back to the same questions as before:
> 
> Were there specific reasons for setting NAT_FQ_NELTS to 64?
> 
> Are there some potential drawbacks or dangers of changing it to a
> larger value?
> 
> I suppose everyone will agree that when there is a queue with a
> maximum length, the choice of that maximum length can be important. Is
> there some particular reason to believe that 64 would be enough? In
> our case we are using 8 NAT threads. Suppose thread 8 is held up
> briefly due to something taking a little longer than usual, meanwhile
> threads 1-7 each hand off 10 frames to thread 8, that situation would
> require a queue size of at least 70, unless I misunderstood how the
> handoff mechanism works. To me, allowing a longer queue seems like a
> good thing because it allows us to handle also more difficult cases
> when threads are not always equally fast, there can be spikes in
> traffic that affect some threads more than others, things like
> that. But maybe there are strong reasons for keeping the queue short,
> reasons I don't know about, that's why I'm asking.
> 
> Best regards,
> Elias
> 
> 
> On Fri, 2020-11-13 at 15:14 +, Klement Sekera -X (ksekera -
> PANTHEON TECH SRO at Cisco) wrote:
>> Hi Elias,
>> 
>> I’ve already debugged this and came to the conclusion that it’s the
>> infra which is the weak link. I was seeing congestion drops at mild
>> load, but not at full load. Issue is that with handoff, there is
>> uneven workload. For simplicity’s sake, just consider thread 1
>> handing off all the traffic to thread 2. What happens is that for
>> thread 1, the job is much easier, it just does some ip4 parsing and
>> then hands packet to thread 2, which actually does the heavy lifting
>> of hash inserts/lookups/translation etc. 64 element queue can hold 64
>> frames, one extreme is 64 1-packet frames, totalling 64 packets,
>> other extreme is 64 255-packet frames, totalling ~16k packets. What
>> happens is this: thread 1 is mostly idle and just picking a few
>> packets from NIC and every one of these small frames creates an entry
>> in the handoff queue. Now thread 2 picks one element from the handoff
>> queue and deals with it before picking another one. If the queue has
>> only 3-packet or 10-packet elements, then thread 2 can never really
>> get into what VPP excels in - bulk processing.
>> 
>> Q: Why doesn’t it pick as many packets as possible from the handoff
>> queue? 
>> A: It’s not implemented.
>> 
>> I already wrote a patch for it, which made all congestion drops which
>> I saw (in above synthetic test case) disappear. Mentioned patch 
>> https://gerrit.fd.io/r/c/vpp/+/28980 is sitting in gerrit.
>> 
>> Would you like to give it a try and see if it helps your issue? We
>> shouldn’t need big queues under mild loads anyway …
>> 
>> Regards,
>> Klement
>> 





Re: [vpp-dev] Increasing NAT worker handoff frame queue size NAT_FQ_NELTS to avoid congestion drops?

2020-11-16 Thread Elias Rudberg
Hi Klement,

Thanks! I have now tested your patch (28980), it seems to work and it
does give some improvement. However, according to my tests, increasing
NAT_FQ_NELTS seems to have a bigger effect, it improves performance a
lot. When using the original NAT_FQ_NELTS value of 64, your patch
gives some improvement but I still get the best performance when
increasing NAT_FQ_NELTS.

For example, one of the tests behaves like this:

Without patch, NAT_FQ_NELTS=64  --> 129 Gbit/s and ~600k cong. drops
With patch, NAT_FQ_NELTS=64  --> 136 Gbit/s and ~400k cong. drops
Without patch, NAT_FQ_NELTS=1024  --> 151 Gbit/s and 0 cong. drops
With patch, NAT_FQ_NELTS=1024  --> 151 Gbit/s and 0 cong. drops

So it still looks like increasing NAT_FQ_NELTS would be good, which
brings me back to the same questions as before:

Were there specific reasons for setting NAT_FQ_NELTS to 64?

Are there some potential drawbacks or dangers of changing it to a
larger value?

I suppose everyone will agree that when there is a queue with a
maximum length, the choice of that maximum length can be important. Is
there some particular reason to believe that 64 would be enough? In
our case we are using 8 NAT threads. Suppose thread 8 is held up
briefly due to something taking a little longer than usual, meanwhile
threads 1-7 each hand off 10 frames to thread 8, that situation would
require a queue size of at least 70, unless I misunderstood how the
handoff mechanism works. To me, allowing a longer queue seems like a
good thing because it allows us to handle also more difficult cases
when threads are not always equally fast, there can be spikes in
traffic that affect some threads more than others, things like
that. But maybe there are strong reasons for keeping the queue short,
reasons I don't know about, that's why I'm asking.

Best regards,
Elias
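
For reference, a minimal illustration of the kind of change tested above (not a
reviewed patch; the macro name and file come from this thread, and 1024 matches
the value used in the tests):

# illustration only: bump the hard-coded NAT handoff queue size for a test build
sed -i 's/#define NAT_FQ_NELTS.*/#define NAT_FQ_NELTS 1024/' src/plugins/nat/nat.h
make build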


On Fri, 2020-11-13 at 15:14 +, Klement Sekera -X (ksekera -
PANTHEON TECH SRO at Cisco) wrote:
> Hi Elias,
> 
> I’ve already debugged this and came to the conclusion that it’s the
> infra which is the weak link. I was seeing congestion drops at mild
> load, but not at full load. Issue is that with handoff, there is
> uneven workload. For simplicity’s sake, just consider thread 1
> handing off all the traffic to thread 2. What happens is that for
> thread 1, the job is much easier, it just does some ip4 parsing and
> then hands packet to thread 2, which actually does the heavy lifting
> of hash inserts/lookups/translation etc. 64 element queue can hold 64
> frames, one extreme is 64 1-packet frames, totalling 64 packets,
> other extreme is 64 255-packet frames, totalling ~16k packets. What
> happens is this: thread 1 is mostly idle and just picking a few
> packets from NIC and every one of these small frames creates an entry
> in the handoff queue. Now thread 2 picks one element from the handoff
> queue and deals with it before picking another one. If the queue has
> only 3-packet or 10-packet elements, then thread 2 can never really
> get into what VPP excels in - bulk processing.
> 
> Q: Why doesn’t it pick as many packets as possible from the handoff
> queue? 
> A: It’s not implemented.
> 
> I already wrote a patch for it, which made all congestion drops which
> I saw (in above synthetic test case) disappear. Mentioned patch 
> https://gerrit.fd.io/r/c/vpp/+/28980 is sitting in gerrit.
> 
> Would you like to give it a try and see if it helps your issue? We
> shouldn’t need big queues under mild loads anyway …
> 
> Regards,
> Klement
> 




Re: [vpp-dev] IPSEC traffic fails when ESN is enabled

2020-11-16 Thread Benoit Ganne (bganne) via lists.fd.io
Hi Vijay,

It is not a known issue AFAIK. Can you share more details?

vpp# show ikev2 sa details
vpp# show ipsec all

Also, could you share a packet trace?
vpp# clear trace
vpp# trace add dpdk-input 10
[send traffic and see it being dropped]
vpp# show trace

Thanks
ben

> -Original Message-
> From: vpp-dev@lists.fd.io  On Behalf Of Vijay Kumar
> Sent: dimanche 15 novembre 2020 07:56
> To: vpp-dev@lists.fd.io
> Subject: [vpp-dev] IPSEC traffic fails when ESN is enabled
> 
> Hi,
> 
> I have set up an IPSEC SA between Strongswan (initiator) and VPP (responder).
> Traffic flows fine, but when I explicitly enable ESN on Strongswan, the
> IPSEC SA is still established fine yet traffic fails. I mean the ESP packets
> are going out from Strongswan to VPP, but the traffic is dropped at VPP.
> 
> I sent 10 packets from Strongswan to VPP. All 10 were dropped. The show
> interface (ipip0), show node counters and show errors outputs all point to
> one counter ("unknown ip protocol") that matches the 10 dropped packets.
> 
> 
> Is this a known issue, and is a fix available?
> 
> 
> I have captured the version details and interface and error counters
> below: -
> =
> vpp# show version
> vpp v21.01-rc0~324-g62877029a built by root on ubuntu-10-37-3-75 at 2020-
> 10-30T11:10:45
> vpp#
> 
> 
> vpp# show ikev2 sa
> iip 10.75.1.20 ispi 29734be0bcf0ad74 rip 10.75.1.99 rspi e75e645e3741e754
> vpp#
> vpp#
> vpp# show ipsec sa
> [0] sa 2147483648 (0x8000) spi 3241827758 (0xc13a5dae) protocol:esp
> flags:[esn anti-replay ]
> [1] sa 3221227520 (0xc800) spi 3662743779 (0xda5108e3) protocol:esp
> flags:[esn anti-replay inbound ]
> vpp#
> vpp#
> vpp#
> vpp# show interface
>               Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter        Count
> GigabitEthernetb/0/0              1      up          9000/0/0/0     rx packets               895
>                                                                     rx bytes               89264
>                                                                     tx packets               399
>                                                                     tx bytes               49762
>                                                                     drops                    632
>                                                                     punt                       1
>                                                                     ip4                      768
>                                                                     ip6                        3
> ipip0                             2      up          9000/0/0/0     rx packets                10
>                                                                     rx bytes                1320
>                                                                     drops                     10
>                                                                     ip4                       10
> local0                            0     down          0/0/0/0
> vpp#
> vpp#
> vpp#
> vpp# show errors
>    Count                  Node                        Reason                   Severity
>      256               ikev2-ip4              IKEv2 packets processed           error
>       12              dpdk-input              no error                          error
>      115               arp-reply              ARP replies sent                  error
>      147          ip4-udp-lookup              No error                          error
>       41        esp4-decrypt-tun              ESP pkts received                 error
>       31        esp4-encrypt-tun              ESP pkts received                 error
>       41        ipsec4-tun-input              good packets received             error
>      469               ip4-input              Multicast RPF check failed        error
>        2               ip4-local              ip4 source lookup miss            error
>       10               ip4-local              unknown ip protocol               error
>        1          ip4-icmp-input              unknown type                      error
>       31          ip4-icmp-input              echo replies sent                 error
> vpp#
> vpp#
> vpp# show node counters
>    Count                  Node                        Reason                   Severity
>      256               ikev2-ip4              IKEv2 packets processed           error
>       12              dpdk-input              no error                          error
>      115               arp-reply              ARP replies sent                  error
>      147          ip4-udp-lookup              No error                          error
>       41        esp4-decrypt-tun              ESP pkts received                 error
>       31        esp4-encrypt-tun              ESP pkts received                 error
>       41        ipsec4-tun-input              good packets received             error
>      469               ip4-input              Multicast RPF check failed        error
>        2               ip4-local              ip4 source lookup miss            error
>       10 

Re: Handoff design issues [Re: RES: RES: [vpp-dev] Increasing NAT worker handoff frame queue size NAT_FQ_NELTS to avoid congestion drops?]

2020-11-16 Thread Klement Sekera via lists.fd.io
That’s exactly what my patch improves. Coalescing small groups of packets 
waiting in the handoff queue into a full(er) frame allows the downstream node 
to do more “V” and achieve better performance. And that’s also what I’ve seen 
when testing the patch.

Thanks,
Klement

ps. in case you missed the link: https://gerrit.fd.io/r/c/vpp/+/28980

> On 13 Nov 2020, at 22:47, Christian Hopps  wrote:
> 
> FWIW, I too have hit this issue. Basically VPP is designed to process a 
> packet from rx to tx in the same thread. When downstream nodes run slower, 
> the upstream rx node doesn't run, so the vector size in each frame naturally 
> increases, and then the downstream nodes can benefit from "V" (i.e., 
> processing multiple packets in one go).
> 
> This back-pressure from downstream does not occur when you hand-off from a 
> fast thread to a slower thread, so you end up with many single packet frames 
> and fill your hand-off queue.
> 
> The quick fix one tries then is to increase the queue size; however, this is 
> not a great solution b/c you are still not taking advantage of the "V" in 
> VPP. To really fit this back into the original design one needs to somehow 
> still be creating larger vectors in the hand-off frames.
> 
> TBH I think the right solution here is to not hand-off frames, and instead 
> switch to packet queues and then on the handed-off side the frames would get 
> constructed from packet queues (basically creating another polling input node 
> but on the new thread).
> 
> Thanks,
> Chris.
> 
>> On Nov 13, 2020, at 12:21 PM, Marcos - Mgiga  wrote:
>> 
>> Understood. And what path did you take in order to analyse and monitor 
>> vector rates ? Is there some specific command or log ?
>> 
>> Thanks
>> 
>> Marcos
>> 
>> -Original Message-
>> From: vpp-dev@lists.fd.io  On behalf of ksekera via []
>> Sent: Friday, 13 November 2020 14:02
>> To: Marcos - Mgiga 
>> Cc: Elias Rudberg ; vpp-dev@lists.fd.io
>> Subject: Re: RES: [vpp-dev] Increasing NAT worker handoff frame queue size 
>> NAT_FQ_NELTS to avoid congestion drops?
>> 
>> Not completely idle, more like medium load. Vector rates at which I saw 
>> congestion drops were roughly 40 for thread doing no work (just handoffs - I 
>> hardcoded it this way for test purpose), and roughly 100 for thread picking 
>> the packets doing NAT.
>> 
>> What got me into infra investigation was the fact that once I was hitting 
>> vector rates around 255, I did see packet drops, but no congestion drops.
>> 
>> HTH,
>> Klement
>> 
>>> On 13 Nov 2020, at 17:51, Marcos - Mgiga  wrote:
>>> 
>>> So you mean that this situation (congestion drops) is more likely to occur 
>>> when the system is generally idle than when it is processing a large 
>>> amount of traffic?
>>> 
>>> Best Regards
>>> 
>>> Marcos
>>> 
>>> -Original Message-
>>> From: vpp-dev@lists.fd.io  On behalf of Klement
>>> Sekera via lists.fd.io Sent: Friday, 13 November 2020
>>> 12:15
>>> To: Elias Rudberg 
>>> Cc: vpp-dev@lists.fd.io
>>> Subject: Re: [vpp-dev] Increasing NAT worker handoff frame queue size 
>>> NAT_FQ_NELTS to avoid congestion drops?
>>> 
>>> Hi Elias,
>>> 
>>> I’ve already debugged this and came to the conclusion that it’s the infra 
>>> which is the weak link. I was seeing congestion drops at mild load, but not 
>>> at full load. Issue is that with handoff, there is uneven workload. For 
>>> simplicity’s sake, just consider thread 1 handing off all the traffic to 
>>> thread 2. What happens is that for thread 1, the job is much easier, it 
>>> just does some ip4 parsing and then hands packet to thread 2, which 
>>> actually does the heavy lifting of hash inserts/lookups/translation etc. 64 
>>> element queue can hold 64 frames, one extreme is 64 1-packet frames, 
>>> totalling 64 packets, other extreme is 64 255-packet frames, totalling ~16k 
>>> packets. What happens is this: thread 1 is mostly idle and just picking a 
>>> few packets from NIC and every one of these small frames creates an entry 
>>> in the handoff queue. Now thread 2 picks one element from the handoff queue 
>>> and deals with it before picking another one. If the queue has only 
>>> 3-packet or 10-packet elements, then thread 2 can never really get into 
>>> what VPP excels in - bulk processing.
>>> 
>>> Q: Why doesn’t it pick as many packets as possible from the handoff queue?
>>> A: It’s not implemented.
>>> 
>>> I already wrote a patch for it, which made all congestion drops which I saw 
>>> (in above synthetic test case) disappear. Mentioned patch 
>>> https://gerrit.fd.io/r/c/vpp/+/28980 is sitting in gerrit.
>>> 
>>> Would you like to give it a try and see if it helps your issue? We
>>> shouldn’t need big queues under mild loads anyway …
>>> 
>>> Regards,
>>> Klement
>>> 
 On 13 Nov 2020, at 16:03, Elias Rudberg  wrote:
 
 Hello VPP experts,
 
 We are using VPP for NAT44 and we get some "congestion drops", in a
 situation where we think 

Re: RES: RES: RES: [vpp-dev] Increasing NAT worker handoff frame queue size NAT_FQ_NELTS to avoid congestion drops?

2020-11-16 Thread Klement Sekera via lists.fd.io
If you can handle the traffic with a single thread, then all multi-worker issues
go away. But congestion drops are easily seen with as few as two workers due to
infra limitations.

Regards,
Klement

> On 13 Nov 2020, at 18:41, Marcos - Mgiga  wrote:
> 
> Thanks. Do you see reducing the number of VPP threads as an option to work
> around this issue, since that would probably increase the vector rate per thread?
> 
> Best Regards
> 
> -Original Message-
> From: vpp-dev@lists.fd.io  On behalf of Klement Sekera via 
> lists.fd.io
> Sent: Friday, 13 November 2020 14:26
> To: Marcos - Mgiga 
> Cc: Elias Rudberg ; vpp-dev 
> Subject: Re: RES: RES: [vpp-dev] Increasing NAT worker handoff frame queue 
> size NAT_FQ_NELTS to avoid congestion drops?
> 
> I used the usual
> 
> 1. start traffic
> 2. clear run
> 3. wait n seconds (e.g. n == 10)
> 4. show run
> 
> Klement
> 
>> On 13 Nov 2020, at 18:21, Marcos - Mgiga  wrote:
>> 
>> Understood. And what path did you take in order to analyse and monitor 
>> vector rates ? Is there some specific command or log ?
>> 
>> Thanks
>> 
>> Marcos
>> 
>> -Original Message-
>> From: vpp-dev@lists.fd.io  On behalf of ksekera via 
>> [] Sent: Friday, 13 November 2020 14:02
>> To: Marcos - Mgiga 
>> Cc: Elias Rudberg ; vpp-dev@lists.fd.io
>> Subject: Re: RES: [vpp-dev] Increasing NAT worker handoff frame queue size 
>> NAT_FQ_NELTS to avoid congestion drops?
>> 
>> Not completely idle, more like medium load. Vector rates at which I saw 
>> congestion drops were roughly 40 for thread doing no work (just handoffs - I 
>> hardcoded it this way for test purpose), and roughly 100 for thread picking 
>> the packets doing NAT.
>> 
>> What got me into infra investigation was the fact that once I was hitting 
>> vector rates around 255, I did see packet drops, but no congestion drops.
>> 
>> HTH,
>> Klement
>> 
>>> On 13 Nov 2020, at 17:51, Marcos - Mgiga  wrote:
>>> 
>>> So you mean that this situation (congestion drops) is more likely to occur 
>>> when the system is generally idle than when it is processing a large 
>>> amount of traffic?
>>> 
>>> Best Regards
>>> 
>>> Marcos
>>> 
>>> -Original Message-
>>> From: vpp-dev@lists.fd.io  On behalf of Klement 
>>> Sekera via lists.fd.io Sent: Friday, 13 November 
>>> 2020
>>> 12:15
>>> To: Elias Rudberg 
>>> Cc: vpp-dev@lists.fd.io
>>> Subject: Re: [vpp-dev] Increasing NAT worker handoff frame queue size 
>>> NAT_FQ_NELTS to avoid congestion drops?
>>> 
>>> Hi Elias,
>>> 
>>> I’ve already debugged this and came to the conclusion that it’s the infra 
>>> which is the weak link. I was seeing congestion drops at mild load, but not 
>>> at full load. Issue is that with handoff, there is uneven workload. For 
>>> simplicity’s sake, just consider thread 1 handing off all the traffic to 
>>> thread 2. What happens is that for thread 1, the job is much easier, it 
>>> just does some ip4 parsing and then hands packet to thread 2, which 
>>> actually does the heavy lifting of hash inserts/lookups/translation etc. 64 
>>> element queue can hold 64 frames, one extreme is 64 1-packet frames, 
>>> totalling 64 packets, other extreme is 64 255-packet frames, totalling ~16k 
>>> packets. What happens is this: thread 1 is mostly idle and just picking a 
>>> few packets from NIC and every one of these small frames creates an entry 
>>> in the handoff queue. Now thread 2 picks one element from the handoff queue 
>>> and deals with it before picking another one. If the queue has only 
>>> 3-packet or 10-packet elements, then thread 2 can never really get into 
>>> what VPP excels in - bulk processing.
>>> 
>>> Q: Why doesn’t it pick as many packets as possible from the handoff queue? 
>>> A: It’s not implemented.
>>> 
>>> I already wrote a patch for it, which made all congestion drops which I saw 
>>> (in above synthetic test case) disappear. Mentioned patch 
>>> https://gerrit.fd.io/r/c/vpp/+/28980 is sitting in gerrit.
>>> 
>>> Would you like to give it a try and see if it helps your issue? We 
>>> shouldn’t need big queues under mild loads anyway …
>>> 
>>> Regards,
>>> Klement
>>> 
 On 13 Nov 2020, at 16:03, Elias Rudberg  wrote:
 
 Hello VPP experts,
 
 We are using VPP for NAT44 and we get some "congestion drops", in a 
 situation where we think VPP is far from overloaded in general. Then 
 we started to investigate if it would help to use a larger handoff 
 frame queue size. In theory at least, allowing a longer queue could 
 help avoiding drops in case of short spikes of traffic, or if it 
 happens that some worker thread is temporarily busy for whatever 
 reason.
 
 The NAT worker handoff frame queue size is hard-coded in the 
 NAT_FQ_NELTS macro in src/plugins/nat/nat.h where the current value 
 is 64. The idea is that putting a larger value there could help.
 
 We have run some tests where we changed