Dear Khers,

Okay, cool! When testing the debug image, please save the full dump
and the .debs for all the artefacts, so that if needed I can grab the
entire set of info and look at it in my environment.

In the meantime, I had an idea for another potential failure mode: a
session gets checked while another session is being freed, and the
free potentially triggers a reallocation of the free bitmap in the
pool.
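
To illustrate the suspected mechanism, here is a toy model in plain C.
It is only a sketch: toy_bitmap_t and the toy_* functions are
hypothetical names made up for this example and are not the vppinfra
pool or clib_bitmap code. It assumes a free bitmap that grows lazily
when an element is marked free, and counts how often the backing
storage is reallocated while sessions are freed, with and without
presizing the bitmap first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a pool's free bitmap; NOT the VPP clib_bitmap API. */
typedef struct {
  uint64_t *words;     /* backing storage, may move when regrown */
  size_t n_words;      /* current capacity in 64-bit words */
  unsigned n_reallocs; /* how many times the storage was regrown */
} toy_bitmap_t;

/* Make sure bit `index` is addressable, growing the storage if needed. */
static void toy_bitmap_validate(toy_bitmap_t *b, size_t index) {
  size_t need = index / 64 + 1;
  if (need <= b->n_words)
    return;
  uint64_t *w = realloc(b->words, need * sizeof(uint64_t));
  assert(w != NULL);
  /* Zero only the newly added words. */
  memset(w + b->n_words, 0, (need - b->n_words) * sizeof(uint64_t));
  b->words = w; /* a reader holding the old pointer is now stale */
  b->n_words = need;
  b->n_reallocs++;
}

/* Mark element `index` as free, roughly what returning it to a pool does. */
static void toy_mark_free(toy_bitmap_t *b, size_t index) {
  toy_bitmap_validate(b, index); /* grows lazily unless presized */
  b->words[index / 64] |= (uint64_t)1 << (index % 64);
}

/* Free `n` elements; return how many reallocations happened during the
   frees themselves (any presizing is done before counting starts). */
static unsigned toy_reallocs_during_frees(size_t n, int presize) {
  toy_bitmap_t b = {0};
  if (presize && n > 0)
    toy_bitmap_validate(&b, n - 1);
  unsigned before = b.n_reallocs;
  for (size_t i = 0; i < n; i++)
    toy_mark_free(&b, i);
  unsigned during = b.n_reallocs - before;
  free(b.words);
  return during;
}
```

In this model, toy_reallocs_during_frees (1000, 0) reports
reallocations happening in the middle of the frees, while
toy_reallocs_during_frees (1000, 1) reports none. That is the window
another thread reading the bitmap could fall into, and the reason for
trying a preallocation.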

So before the reproduction in the debug build, give this one-line
change a shot in the release build and see if you can still reproduce
the crash with it:

--- a/src/plugins/acl/fa_node.c
+++ b/src/plugins/acl/fa_node.c
@@ -609,6 +609,8 @@ acl_fa_verify_init_sessions (acl_main_t * am)
     for (wk = 0; wk < vec_len (am->per_worker_data); wk++) {
       acl_fa_per_worker_data_t *pw = &am->per_worker_data[wk];
       pool_alloc_aligned(pw->fa_sessions_pool, am->fa_conn_table_max_entries, CLIB_CACHE_LINE_BYTES);
+      /* preallocate the free bitmap */
+      clib_bitmap_validate(pool_header(pw->fa_sessions_pool)->free_bitmap, am->fa_conn_table_max_entries);
     }

--a

On 10/24/17, khers <s3m2e1.6s...@gmail.com> wrote:
> Dear Andrew
>
> I used the latest version of the master branch. I will repeat the test
> with a debug build to collect more debug info ASAP.
> VPP is running on a Xeon E5-2600 series CPU.
> I also ran the tests with two rx-queues and two workers, and with 4
> rx-queues and 4 workers; I got a segmentation fault in the same function.
>
> I will send more info in a few days.
>
> Regards,
> Khers
>
> On Oct 24, 2017 6:43 PM, "Andrew 👽 Yourtchenko" <ayour...@gmail.com>
> wrote:
>
>> Dear Khers,
>>
>> Thanks for the info!
>>
>> I tried with these configs in my local setup (I tried even to increase
>> the multi-cpu contention by specifying 4 rx-queues instead of 2), but
>> it works ok for me on the master. What is the version you are testing
>> with ? I presume it is also the master, but just wanted to verify.
>>
>> To try to get more info about this happening: could you give a shot at
>> reproducing this on the debug build ? There are a few asserts that
>> would be handy to verify that they do hold true during your tests -
>> the location of the crash points to either the pool header being
>> corrupted by something (the asserts should catch that) or the pool
>> itself reallocated and memory used by something else (which should not
>> happen because the memory is preallocated during the initialisation
>> time - unless you change the max number of sessions after
>> initialisation).
>>
>> Also, could you tell a bit more about the hardware you are testing
>> with ? (cat /proc/cpuinfo)
>>
>> --a
>>
>> On 10/24/17, khers <s3m2e1.6s...@gmail.com> wrote:
>> > Dear Andrew
>> >
>> > Thanks for your attention.
>> > Trex config file <https://paste.ubuntu.com/25807801/>
>> > Trex scenario is default sfr.yaml.
>> > vpp: startup.conf <https://paste.ubuntu.com/25807840/>
>> > I changed the size of acl_mheap to '(uword)2<<32' in acl.c
>> > vpp config:
>> > vppctl set interface l2 bridge TenGigabitEthernet86/0/0 1
>> > vppctl set interface l2 bridge TenGigabitEthernet86/0/1 1
>> >
>> > vppctl set int state TenGigabitEthernet86/0/0 up
>> > vppctl set int state TenGigabitEthernet86/0/1 up
>> >
>> > vppctl set acl-plugin session table hash-table-buckets 1000000
>> > vppctl set acl-plugin session table hash-table-memory 2147483648
>> >
>> > vppctl set acl-plugin session timeout udp idle 5
>> > vppctl set acl-plugin session timeout tcp idle 10
>> > vppctl set acl-plugin session timeout tcp transient 5
>> >
>> > Regards,
>> > Khers
>> >
>> >
>> > On Mon, Oct 23, 2017 at 7:52 PM, Andrew 👽 Yourtchenko <ayour...@gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> could you share the exact TRex and VPP config files, so I could
>> >> recreate it locally to investigate further ?
>> >>
>> >> Thanks a lot!
>> >>
>> >> --a
>> >>
>> >> On 10/23/17, khers <s3m2e1.6s...@gmail.com> wrote:
>> >> > Dear folks
>> >> >
>> >> > I have bridged two interfaces and set a permit+reflect ACL on the
>> >> > input of interface one and a deny rule on the output of the same
>> >> > interface, as follows:
>> >> >
>> >> > acl_add_replace permit+reflect
>> >> > acl_add_replace deny
>> >> >
>> >> > acl_interface_add_del sw_if_index 1 add input acl 0
>> >> > acl_interface_add_del sw_if_index 1 add output acl 1
>> >> >
>> >> >
>> >> > After about 100 seconds of running TRex with the sfr scenario I got
>> >> > a SIGSEGV. This is gdb's backtrace <https://pastebin.com/VvZ9Z3Nf>.
>> >> >
>> >> > Trex :
>> >> > ./t-rex-64 -f cap2/sfr.yaml -m 5 -c 4
>> >> >
>> >> >
>> >> > Regards,
>> >> > Khers
>> >> >
>> >>
>> >
>>
>
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
