Dear Khers,

Okay, cool! When you test the debug image, please save the full dump and the .debs for all the artefacts, so that if needed I can grab the entire set of info and look at it in my environment.

In the meantime, I had an idea for another potential failure mode: a session gets checked while a session is being freed, and the free can result in a reallocation of the free bitmap in the pool. So before reproducing on the debug build, please give this one-line change a shot in the release build and see whether you can still reproduce the crash with it:

--- a/src/plugins/acl/fa_node.c
+++ b/src/plugins/acl/fa_node.c
@@ -609,6 +609,8 @@ acl_fa_verify_init_sessions (acl_main_t * am)
     for (wk = 0; wk < vec_len (am->per_worker_data); wk++) {
       acl_fa_per_worker_data_t *pw = &am->per_worker_data[wk];
       pool_alloc_aligned(pw->fa_sessions_pool, am->fa_conn_table_max_entries, CLIB_CACHE_LINE_BYTES);
+      /* preallocate the free bitmap */
+      clib_bitmap_validate(pool_header(pw->fa_sessions_pool)->free_bitmap, am->fa_conn_table_max_entries);
     }

--a

On 10/24/17, khers <s3m2e1.6s...@gmail.com> wrote:
> Dear Andrew
>
> I used latest version of master branch, I will replay the test with debug
> build to make more debug info ASAP.
> Vpp is running on Xeon E5-2600 series.
> I did other tests with two rx-queue and two worker, also with 4
> rx-queue and 4 worker, I got segmentation fault on the same function.
>
> I will send more info in few days.
>
> Regards,
> Khers
>
> On Oct 24, 2017 6:43 PM, "Andrew 👽 Yourtchenko" <ayour...@gmail.com> wrote:
>
>> Dear Khers,
>>
>> Thanks for the info!
>>
>> I tried with these configs in my local setup (I tried even to increase
>> the multi-cpu contention by specifying 4 rx-queues instead of 2), but
>> it works ok for me on the master. What is the version you are testing
>> with ? I presume it is also the master, but just wanted to verify.
>>
>> To try to get more info about this happening: could you give a shot at
>> reproducing this on the debug build ?
>> There are a few asserts that
>> would be handy to verify that they do hold true during your tests -
>> the location of the crash points to either the pool header being
>> corrupted by something (the asserts should catch that) or the pool
>> itself reallocated and memory used by something else (which should not
>> happen because the memory is preallocated during the initialisation
>> time - unless you change the max number of sessions after
>> initialisation).
>>
>> Also, could you tell a bit more about the hardware you are testing
>> with ? (cat /proc/cpuinfo)
>>
>> --a
>>
>> On 10/24/17, khers <s3m2e1.6s...@gmail.com> wrote:
>> > Dear Andrew
>> >
>> > Thanks for your attention.
>> > Trex config file <https://paste.ubuntu.com/25807801/>
>> > Trex scenario is default sfr.yaml.
>> > vpp: startup.conf <https://paste.ubuntu.com/25807840/>
>> > I changed size of acl_mheap to '(uword)2<<32' in acl.c
>> > vpp config:
>> > vppctl set interface l2 bridge TenGigabitEthernet86/0/0 1
>> > vppctl set interface l2 bridge TenGigabitEthernet86/0/1 1
>> >
>> > vppctl set int state TenGigabitEthernet86/0/0 up
>> > vppctl set int state TenGigabitEthernet86/0/1 up
>> >
>> > vppctl set acl-plugin session table hash-table-buckets 1000000
>> > vppctl set acl-plugin session table hash-table-memory 2147483648
>> >
>> > vppctl set acl-plugin session timeout udp idle 5
>> > vppctl set acl-plugin session timeout tcp idle 10
>> > vppctl set acl-plugin session timeout tcp transient 5
>> >
>> > Regards,
>> > Khers
>> >
>> >
>> > On Mon, Oct 23, 2017 at 7:52 PM, Andrew 👽 Yourtchenko <ayour...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> could you share the exact TRex and VPP config files, so I could
>> >> recreate it locally to investigate further ?
>> >>
>> >> Thanks a lot!
>> >>
>> >> --a
>> >>
>> >> On 10/23/17, khers <s3m2e1.6s...@gmail.com> wrote:
>> >> > Dear folks
>> >> >
>> >> > I have bridged two interfaces and set permit+reflect acl on the input of
>> >> > interface one and deny rule on output of same interface as follow:
>> >> >
>> >> > acl_add_replace permit+reflect
>> >> > acl_add_replace deny
>> >> >
>> >> > acl_interface_add_del sw_if_index 1 add input acl 0
>> >> > acl_interface_add_del sw_if_index 1 add output acl 1
>> >> >
>> >> >
>> >> > after about 100 seconds of running Trex with sfr scenario I got
>> >> > sigsegv.
>> >> > this is gdb's backtrace <https://pastebin.com/VvZ9Z3Nf>.
>> >> >
>> >> > Trex :
>> >> > ./t-rex-64 -f cap2/sfr.yaml -m 5 -c 4
>> >> >
>> >> >
>> >> > Regards,
>> >> > Khers
>> >> >

_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
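
The failure mode suggested at the top of this thread (freeing a session grows the pool's free bitmap, which may move it in memory while another worker is reading it) can be illustrated with a small standalone model. This is only a sketch and not VPP code: `toy_pool_t`, `toy_bitmap_validate`, `toy_pool_free` and `toy_count_growths` are hypothetical names, and plain `realloc` stands in for whatever the real vector code does. It counts how many times the bitmap buffer has to grow while slots are freed, with and without preallocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a pool header: only the free bitmap is modelled.
   All names here are hypothetical, chosen for illustration. */
typedef struct
{
  uint64_t *free_bitmap;	/* one bit per pool slot, 1 = free */
  size_t n_words;		/* 64-bit words currently allocated */
} toy_pool_t;

/* Make bit `index` addressable. Growing uses realloc, so the buffer
   may move; a concurrent reader holding the old pointer would then
   be reading freed memory. */
static void
toy_bitmap_validate (toy_pool_t * p, size_t index)
{
  size_t need = index / 64 + 1;
  if (need <= p->n_words)
    return;			/* already large enough, nothing moves */
  uint64_t *nb = realloc (p->free_bitmap, need * sizeof (uint64_t));
  if (nb == NULL)
    abort ();
  /* zero only the newly added words */
  memset (nb + p->n_words, 0, (need - p->n_words) * sizeof (uint64_t));
  p->free_bitmap = nb;
  p->n_words = need;
}

/* Mark a slot as free, growing the bitmap on demand. */
static void
toy_pool_free (toy_pool_t * p, size_t index)
{
  toy_bitmap_validate (p, index);
  assert (index / 64 < p->n_words);
  p->free_bitmap[index / 64] |= (uint64_t) 1 << (index % 64);
}

/* Free slots 0..n_slots-1 in order and count how many of those frees
   had to grow (and so possibly relocate) the bitmap. */
static size_t
toy_count_growths (toy_pool_t * p, size_t n_slots)
{
  size_t growths = 0;
  for (size_t i = 0; i < n_slots; i++)
    {
      size_t before = p->n_words;
      toy_pool_free (p, i);
      if (p->n_words != before)
	growths++;
    }
  return growths;
}
```

Starting from an empty `toy_pool_t`, freeing 1024 slots grows the bitmap 16 times (once per 64-bit word). If `toy_bitmap_validate (&pool, 1023)` is called once up front, the same 1024 frees never grow it, so the buffer stays at a fixed address. That is the effect the `clib_bitmap_validate` call in the patch is meant to have on the sessions pool, under the assumption that the crash is caused by such a reallocation.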