Dear Khers,

That is without applying the one-line change that I proposed, right?

I would suggest retrying the reproduction on the same commit where you were 
previously able to reproduce it and, if it is reliably reproducible there, 
applying that change to see if it addresses the issue. Then we can track 
whether the latest commit fixed the problem or merely masked it.

--a

> On 8 Nov 2017, at 08:40, khers <s3m2e1.6s...@gmail.com> wrote:
> 
> Dear Andrew
> 
> Sorry for the delay. I pulled the latest revision of master (commit 
> e695cb4dbdb6f9424ac5a567799e67f791fad328), and the 
> segfault did not occur with the same environment and test scenario. I will 
> try to reproduce the potential bug 
> by running the test for a longer duration and with a more aggressive scenario. 
> 
> Regards,
> Khers
> 
>> On Wed, Oct 25, 2017 at 1:45 PM, Andrew 👽 Yourtchenko <ayour...@gmail.com> 
>> wrote:
>> Dear Khers,
>> 
>> Okay, cool! When testing the debug image, could you save the full dump
>> and the .debs for all the artefacts, so that, just in case, I can grab the
>> entire set of info and look at it in my environment?
>> 
>> In the meantime, I had an idea for another potential failure mode, whereby
>> a session would get checked while another session is being freed,
>> potentially resulting in a reallocation of the free bitmap in the
>> pool.
>> 
>> So before the reproduction in the debug build, give this one-line change
>> a shot in the release build and see if you can still reproduce the crash
>> with it:
>> 
>> --- a/src/plugins/acl/fa_node.c
>> +++ b/src/plugins/acl/fa_node.c
>> @@ -609,6 +609,8 @@ acl_fa_verify_init_sessions (acl_main_t * am)
>>      for (wk = 0; wk < vec_len (am->per_worker_data); wk++) {
>>        acl_fa_per_worker_data_t *pw = &am->per_worker_data[wk];
>>        pool_alloc_aligned(pw->fa_sessions_pool, am->fa_conn_table_max_entries, CLIB_CACHE_LINE_BYTES);
>> +      /* preallocate the free bitmap */
>> +      clib_bitmap_validate(pool_header(pw->fa_sessions_pool)->free_bitmap, am->fa_conn_table_max_entries);
>>      }
>> 
>> --a
>> 
>> On 10/24/17, khers <s3m2e1.6s...@gmail.com> wrote:
>> > Dear Andrew
>> >
>> > I used the latest version of the master branch. I will rerun the test with
>> > a debug build to collect more debug info ASAP.
>> > VPP is running on a Xeon E5-2600 series CPU.
>> > I ran further tests with two rx-queues and two workers, and also with 4
>> > rx-queues and 4 workers; I got a segmentation fault in the same function.
>> >
>> > I will send more info in a few days.
>> >
>> > Regards,
>> > Khers
>> >
>> > On Oct 24, 2017 6:43 PM, "Andrew 👽 Yourtchenko" <ayour...@gmail.com>
>> > wrote:
>> >
>> >> Dear Khers,
>> >>
>> >> Thanks for the info!
>> >>
>> >> I tried with these configs in my local setup (I tried even to increase
>> >> the multi-cpu contention by specifying 4 rx-queues instead of 2), but
>> >> it works ok for me on the master. What is the version you are testing
>> >> with ? I presume it is also the master, but just wanted to verify.
>> >>
>> >> To try to get more info about this happening: could you give a shot at
>> >> reproducing this on the debug build ? There are a few asserts that
>> >> would be handy to verify that they do hold true during your tests -
>> >> the location of the crash points to either the pool header being
>> >> corrupted by something (the asserts should catch that) or the pool
>> >> itself reallocated and memory used by something else (which should not
>> >> happen because the memory is preallocated during the initialisation
>> >> time - unless you change the max number of sessions after
>> >> initialisation).
>> >>
>> >> Also, could you tell a bit more about the hardware you are testing
>> >> with ? (cat /proc/cpuinfo)
>> >>
>> >> --a
>> >>
>> >> On 10/24/17, khers <s3m2e1.6s...@gmail.com> wrote:
>> >> > Dear Andrew
>> >> >
>> >> > Thanks for your attention.
>> >> > Trex config file <https://paste.ubuntu.com/25807801/>
>> >> > Trex scenario is default sfr.yaml.
>> >> > vpp: startup.conf <https://paste.ubuntu.com/25807840/>
>> >> > I changed the size of acl_mheap to '(uword)2<<32' in acl.c
>> >> > vpp config:
>> >> > vppctl set interface l2 bridge TenGigabitEthernet86/0/0 1
>> >> > vppctl set interface l2 bridge TenGigabitEthernet86/0/1 1
>> >> >
>> >> > vppctl set int state TenGigabitEthernet86/0/0 up
>> >> > vppctl set int state TenGigabitEthernet86/0/1 up
>> >> >
>> >> > vppctl set acl-plugin session table hash-table-buckets 1000000
>> >> > vppctl set acl-plugin session table hash-table-memory 2147483648
>> >> >
>> >> > vppctl set acl-plugin session timeout udp idle 5
>> >> > vppctl set acl-plugin session timeout tcp idle 10
>> >> > vppctl set acl-plugin session timeout tcp transient 5
>> >> >
>> >> > Regards,
>> >> > Khers
>> >> >
>> >> >
>> >> > On Mon, Oct 23, 2017 at 7:52 PM, Andrew 👽 Yourtchenko <
>> >> ayour...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> could you share the exact TRex and VPP config files, so I could
>> >> >> recreate it locally to investigate further ?
>> >> >>
>> >> >> Thanks a lot!
>> >> >>
>> >> >> --a
>> >> >>
>> >> >> On 10/23/17, khers <s3m2e1.6s...@gmail.com> wrote:
>> >> >> > Dear folks
>> >> >> >
>> >> >> > I have bridged two interfaces and set a permit+reflect ACL on the
>> >> >> > input of interface one and a deny rule on the output of the same
>> >> >> > interface, as follows:
>> >> >> >
>> >> >> > acl_add_replace permit+reflect
>> >> >> > acl_add_replace deny
>> >> >> >
>> >> >> > acl_interface_add_del sw_if_index 1 add input acl 0
>> >> >> > acl_interface_add_del sw_if_index 1 add output acl 1
>> >> >> >
>> >> >> >
>> >> >> > After about 100 seconds of running TRex with the sfr scenario, I got
>> >> >> > a SIGSEGV. This is gdb's backtrace <https://pastebin.com/VvZ9Z3Nf>.
>> >> >> >
>> >> >> > Trex :
>> >> >> > ./t-rex-64 -f cap2/sfr.yaml -m 5 -c 4
>> >> >> >
>> >> >> >
>> >> >> > Regards,
>> >> >> > Khers
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
> 
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
