A few weeks back, I became aware of the following issue with the LISP tests:
/vpp/build-root/install-vpp_debug-native/vpp/bin/vpp[63989]: vnet_lisp_add_del_locator_set:2140: Can't delete a locator that supports a mapping!
/vpp/build-root/install-vpp_debug-native/vpp/bin/vpp[63989]: received signal SIGSEGV, PC 0xa0a0a00, faulting address 0xa0a0a00
14:32:44,290 Child test runner process unresponsive and core-file exists in test temporary directory (last test running was `Test case for basic encapsulation' in `/tmp/vpp-unittest-TestLisp-wtuvdu4q')!

This seems to be triggered by the api trace commands in tearDown in framework.py. I see it while running tests in a docker container.

On Thu, May 28, 2020 at 4:51 AM Andrew Yourtchenko <ayour...@gmail.com> wrote:
> Hi Elias,
>
> Yeah it all does point to something like uninitialized data - I ran the tests yesterday on two different machines for a while, apparently without the issues...
>
> The CI runtime environment is much more dynamic - it's an ephemeral docker container that is orchestrated by Nomad and is destroyed after the job is run.
>
> Could you push as a separate change the code that reliably gives you the error in the LISP unit test in the CI, and let me know the change# ?
>
> I will then test some tooling enhancement ideas I have had for a while - to check within the job whether the core exists, and if it does, to load it into gdb, do some scripted processing of it and output the results... (Iterate over the call stack and issue stuff like 'info locals', 'info regs', etc.)
>
> I did some experiments with that approach earlier and it seemed like a rather scalable technique for most of the issues, which should also save disk space and developer time...
>
> --a
>
> On 28 May 2020, at 10:33, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
> >
> > Hi Andrew,
> >
> > In my case it failed several times and appeared to be triggered by seemingly harmless code changes, but it seemed like the problem was reproducible for a given version of the code.
> > What seemed to matter was when I changed things related to local variables inside the set_ipfix_exporter_command_fn() function. The test logs said "Core-file exists", which I suppose means that vpp crashed. The testing framework repeats the test several times, saying "3 attempt(s) left", then "2 attempt(s) left" and so on, and all those repeated attempts seemed to crash in the same way.
> >
> > It could be something with uninitialized variables, e.g. something that is assumed to be zero but is never explicitly initialized, so it works when it happens to be zero, but depending on platform and compiler details there could be some garbage there causing a problem. Then unrelated code changes, like adding variables somewhere so that things end up at slightly different memory locations, could make the error come and go. This is just guessing, of course.
> >
> > Is it possible to get login access to the machine where the gerrit/jenkins tests are run, to debug it there where the issue can be reproduced?
> >
> > / Elias
> >
> >> On Wed, 2020-05-27 at 19:03 +0200, Andrew 👽 Yourtchenko wrote:
> >> Yep, so it looks like we have an issue...
> >>
> >> https://gerrit.fd.io/r/c/vpp/+/27305 has the same failures. I am rerunning it now to see how intermittent it is, as well as testing the latest master locally...
> >>
> >> --a
> >>
> >>> On 27 May 2020, at 18:56, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
> >>>
> >>> Hi Andrew,
> >>>
> >>> Yes, it was Basic LISP test.
> >>> It looked like this in the console.log.gz for vpp-verify-master-ubuntu1804:
> >>>
> >>> ==============================================================================
> >>> TEST RESULTS:
> >>>      Scheduled tests: 1177
> >>>       Executed tests: 1176
> >>>         Passed tests: 1039
> >>>        Skipped tests: 137
> >>>   Not Executed tests: 1
> >>>               Errors: 1
> >>> FAILURES AND ERRORS IN TESTS:
> >>>   Testcase name: Basic LISP test
> >>>     ERROR: Test case for basic encapsulation [test_lisp.TestLisp.test_lisp_basic_encap]
> >>> TESTCASES WHERE NO TESTS WERE SUCCESSFULLY EXECUTED:
> >>>   Basic LISP test
> >>> ==============================================================================
> >>>
> >>> / Elias
> >>>
> >>> On Wed, 2020-05-27 at 18:42 +0200, Andrew 👽 Yourtchenko wrote:
> >>>> Basic LISP test - was it the one that was failing for you?
> >>>> That particular test intermittently failed a couple of times for me as well, on a doc-only change, so we have an unrelated issue.
> >>>> I am running it locally to see what is going on.
> >>>> --a
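Andrew's idea of checking for a core file inside the CI job and dumping it with scripted gdb commands could look roughly like the Python sketch below. It is a hypothetical helper, not the actual fd.io CI tooling. The /tmp/vpp-unittest-* pattern comes from the log above, but the assumption that a crash leaves a file named "core" in that directory, and the helper names find_core_files, gdb_triage_command and triage, are illustrative. It relies only on standard gdb batch options (-batch, -nx, -ex). "bt full" prints the locals of every frame, which covers the per-frame "info locals" step.

```python
import glob
import os
import subprocess

# Assumptions (hypothetical, not the fd.io CI): each test run leaves a temp
# dir matching this pattern, and a crash leaves a file named "core" in it.
TEMP_DIR_PATTERN = "/tmp/vpp-unittest-*"
CORE_NAME = "core"


def find_core_files(pattern=TEMP_DIR_PATTERN, core_name=CORE_NAME):
    """Return the sorted paths of core files found in the test temp dirs."""
    cores = []
    for tmpdir in glob.glob(pattern):
        path = os.path.join(tmpdir, core_name)
        if os.path.isfile(path):
            cores.append(path)
    return sorted(cores)


def gdb_triage_command(binary, core):
    """Build a non-interactive gdb command line for one core file.

    'bt full' prints every frame of the call stack together with its
    local variables (i.e. 'info locals' for each frame); 'info registers'
    then dumps the registers of the currently selected frame.
    """
    return [
        "gdb", "-batch", "-nx",
        "-ex", "set pagination off",
        "-ex", "thread apply all bt full",
        "-ex", "info registers",
        binary, core,
    ]


def triage(binary, pattern=TEMP_DIR_PATTERN, runner=subprocess.run):
    """Run gdb over every core file found; return how many were processed.

    `runner` defaults to subprocess.run and can be replaced for testing.
    """
    cores = find_core_files(pattern)
    for core in cores:
        print("=== %s ===" % core, flush=True)
        # check=False: keep going even if gdb fails on one of the cores.
        runner(gdb_triage_command(binary, core), check=False)
    return len(cores)
```

A post-test step could call triage("/vpp/build-root/install-vpp_debug-native/vpp/bin/vpp") and treat a non-zero return value as a crash, so that only the text dump, rather than the core file itself, needs to be archived.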
View/Reply Online (#16555): https://lists.fd.io/g/vpp-dev/message/16555