A few weeks back, I became aware of the following issue with the LISP tests:

/vpp/build-root/install-vpp_debug-native/vpp/bin/vpp[63989]: vnet_lisp_add_del_locator_set:2140: Can't delete a locator that supports a mapping!
/vpp/build-root/install-vpp_debug-native/vpp/bin/vpp[63989]: received signal SIGSEGV, PC 0xa0a0a00, faulting address 0xa0a0a00
14:32:44,290 Child test runner process unresponsive and core-file exists in test temporary directory (last test running was `Test case for basic encapsulation' in `/tmp/vpp-unittest-TestLisp-wtuvdu4q')!

which seems to be triggered by the API trace commands run from tearDown in
framework.py. I see it while running the tests in a Docker container.



On Thu, May 28, 2020 at 4:51 AM Andrew Yourtchenko <ayour...@gmail.com>
wrote:

> Hi Elias,
>
> Yeah, it all does point to something like uninitialized data. Yesterday I
> ran the tests on two different machines for a while, apparently without
> any issues...
>
> The CI runtime environment is much more dynamic - it’s an ephemeral Docker
> container that is orchestrated by Nomad and is destroyed after the job
> has run.
>
> Could you push, as a separate change, the code that reliably gives you the
> error in the LISP unit test in the CI, and let me know the change number?
>
>
> I will then test some tooling enhancement ideas I have had for a while: to
> check within the job whether a core file exists and, if it does, to load it
> into gdb, do some scripted processing of it, and output the results
> (iterate over the call stack and issue commands like ‘info locals’, ‘info
> registers’, etc.).
>
> I did some experiments with that approach earlier and it seemed like a
> rather scalable technique for most of the issues, which should also save
> disk space and developer time ...
>
> --a
>
> > On 28 May 2020, at 10:33, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
> >
> > Hi Andrew,
> >
> > In my case it failed several times and appeared to be triggered by
> > seemingly harmless code changes, but the problem seemed to be
> > reproducible for a given version of the code. What seemed to matter was
> > changing things related to local variables inside the
> > set_ipfix_exporter_command_fn() function. The test logs said "Core-file
> > exists", which I suppose means that vpp crashed. The testing framework
> > repeats the test several times, saying "3 attempt(s) left", then "2
> > attempt(s) left" and so on. All those repeated attempts seemed to crash
> > in the same way.
> >
> > It could be something with uninitialized variables, e.g. something that
> > is assumed to be zero but is never explicitly initialized. It works
> > when it happens to be zero, but depending on platform and compiler
> > details there could be garbage there, causing a problem. Then
> > unrelated code changes, like adding variables somewhere so that things
> > end up at slightly different memory locations, could make the error come
> > and go. This is just guessing, of course.
> >
> > Is it possible to get login access to the machine where the
> > gerrit/jenkins tests are run, to debug it there where the issue can be
> > reproduced?
> >
> > / Elias
> >
> >
> >> On Wed, 2020-05-27 at 19:03 +0200, Andrew 👽 Yourtchenko wrote:
> >> Yep, so it looks like we have an issue...
> >>
> >> https://gerrit.fd.io/r/c/vpp/+/27305 has the same failures, I am
> >> rerunning it now to see how intermittent it is - as well as testing
> >> the latest master locally....
> >>
> >> --a
> >>
> >>> On 27 May 2020, at 18:56, Elias Rudberg <elias.rudb...@bahnhof.net>
> >>> wrote:
> >>>
> >>> Hi Andrew,
> >>>
> >>> Yes, it was the Basic LISP test. It looked like this in the
> >>> console.log.gz for vpp-verify-master-ubuntu1804:
> >>>
> >>> ==============================================================================
> >>> TEST RESULTS:
> >>>    Scheduled tests: 1177
> >>>     Executed tests: 1176
> >>>       Passed tests: 1039
> >>>      Skipped tests: 137
> >>> Not Executed tests: 1
> >>>             Errors: 1
> >>> FAILURES AND ERRORS IN TESTS:
> >>> Testcase name: Basic LISP test
> >>>     ERROR: Test case for basic encapsulation [test_lisp.TestLisp.test_lisp_basic_encap]
> >>> TESTCASES WHERE NO TESTS WERE SUCCESSFULLY EXECUTED:
> >>> Basic LISP test
> >>> ==============================================================================
> >>>
> >>> / Elias
> >>>
> >>>
> >>>
> >>> On Wed, 2020-05-27 at 18:42 +0200, Andrew 👽 Yourtchenko wrote:
> >>>> Basic LISP test - was it the one that was failing for you?
> >>>> That particular test intermittently failed a couple of times for me
> >>>> as well, on a doc-only change, so we have an unrelated issue.
> >>>> I am running it locally to see what is going on.
> >>>> --a
> 
>