vpp-dev, Florin,

Below is an analysis of all of the failures that this patch encountered before finally passing. None of the failures were related in any way to the code changes in the patch.

In summary, there appear to be a number of different factors involved with these failures.

 * Two failures appear to be caused by the run-time environment.
 * An intermittent bug appears to exist in `L2BD Multi-instance test 5
   - delete 5 BDs'
 * The segfault shows lots of threads being run.  Are tests being
   executed in parallel?  If so, it would be interesting to serialize
   the tests to see if that fixes any of these issues.
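
For anyone who wants to check other console logs for the same timeout, here is a minimal Python sketch. The regular expression is modelled only on the timeout line quoted in the failure signatures in this mail, and the function name `last_running_tests` is hypothetical, not part of the 'make test' infrastructure.

```python
import re

# Assumption: the timeout message has the shape seen in the quoted logs:
#   Timeout while waiting for child test runner process
#   (last test running was `<test name>' in `<tmp dir>')!
TIMEOUT_RE = re.compile(
    r"Timeout while waiting for child test runner process\s*"
    r"\(last test running was\s+`(?P<test>[^']+)'"
)


def last_running_tests(console_text):
    """Return the test names mentioned in timeout messages, in log order."""
    # The quoted logs are hard-wrapped, so collapse all whitespace runs
    # into single spaces before matching.
    flat = " ".join(console_text.split())
    # Note: a test name containing an apostrophe would be cut short here.
    return [m.group("test") for m in TIMEOUT_RE.finditer(flat)]
```

Running this over the console text of several failed jobs would show whether the same test is always the last one running when the timeout fires.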

I'm also seeing variation in the order in which the 'make test' tests are run (or at least in the order of the status reports). My understanding of the 'make test' python infrastructure is insufficient to make an intelligent guess as to whether this has any bearing on any of these failures.

I get more predictable result output when running make test locally on my own server, but the order of test output is different than in the CI test runs. Locally, the order of tests appears to be the same between different runs of 'make test'. I have also not seen any of these errors on my server which is running Ubuntu 17.04, although I have not done an endurance test either.
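
One way to compare the order of status reports between two runs is sketched below in Python. It assumes each status line looks like the ones quoted in this mail (an HH:MM:SS timestamp, the test description, then a status word). `OK` and `SKIP` appear in the logs here; `FAIL` and `ERROR` are my assumption, and both function names are hypothetical.

```python
import re

# Assumption: a status line is "<HH:MM:SS> <test description> <STATUS>".
STATUS_RE = re.compile(
    r"^\s*\d{2}:\d{2}:\d{2}\s+(?P<name>.+?)\s+(?P<status>OK|SKIP|FAIL|ERROR)\s*$"
)


def result_order(console_text):
    """Return test descriptions in the order their status was reported."""
    names = []
    for line in console_text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            names.append(match.group("name"))
    return names


def first_divergence(order_a, order_b):
    """Return (index, name_a, name_b) at the first position where the two
    orders differ, or None if one order is a prefix of the other."""
    for index, (name_a, name_b) in enumerate(zip(order_a, order_b)):
        if name_a != name_b:
            return index, name_a, name_b
    return None
```

Feeding it the console text of a local run and of a CI run would show where the two orders first part ways, which might help narrow down whether test ordering matters.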

My recommendation based on this analysis is as follows:
1. The L2BD unit test issue should be investigated by the appropriate 'make test' experts.
2. The vpp-verify-master-centos7, vpp-verify-master-ubuntu1604, and vpp-test-debug-master-ubuntu1604 jobs should be run operationally in the Container PoC environment, with the rest of the jjb jobs run in the cloud infra.

Thanks,
-daw-


---- %< ----
[ From https://gerrit.fd.io/r/#/c/8133 ]

=> Container PoC Aug 24 8:36 PM  Patch Set 9:  Build Successful
http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1515/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1512/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1983/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1301/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2022/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1695/ : SUCCESS

=> fd.io JJB  Aug 24 9:19 PM  Patch Set 9:  Verified-1  Build Failed
https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6775/ : FAILURE
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6775

   Failure Signature:
      01:08:59  verify templates on IP6 datapath      Fatal Python error: Segmentation fault

   Comment:
      Python bug or resource starvation?  Lots of threads running...
      Possibly due to bad environment/sick minion.

https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3098/ : SUCCESS
https://jenkins.fd.io/job/vpp-verify-master-centos7/6770/ : SUCCESS
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6781/ : SUCCESS
https://jenkins.fd.io/job/vpp-docs-verify-master/5370/ : SUCCESS

=> Container PoC  Aug 24 10:54 PM  Patch Set 9:  Build Successful
http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1519/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1516/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1987/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1305/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2027/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1699/ : SUCCESS

=> fd.io JJB  Aug 24 11:13 PM  Patch Set 9:  Verified-1  Build Failed
https://jenkins.fd.io/job/vpp-verify-master-centos7/6774/ : FAILURE
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-centos7/6774

   Failure Signature:
      00:23:17.198 CCLD     vcl_test_client
      00:24:32.936 FATAL: command execution failed
      00:24:32.937 java.io.IOException

   Comment:
      Bad environment/sick minion?
      There's no reason for compilation to kill the build.

https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6779/ : FAILURE
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6779

   Failure Signature:
      03:02:47 ==============================================================================
      03:02:47  collect information on Ethernet, IP4 and IP6 datapath (no timers)
      03:02:47 ==============================================================================
      03:02:47  no timers, one CFLOW packet, 9 Flows inside                              OK
      03:02:47  no timers, two CFLOW packets (mtu=256), 3 Flows in each                  OK
      03:02:47  L2 data on IP4 datapath                                                  OK
      03:02:47  L2 data on IP6 datapath                                                  OK
      03:02:47  L2 data on L2 datapath                                                   OK
      03:02:48  L3 data on IP4 datapath                                                  OK
      03:02:48  L3 data on IP6 datapath                                                  OK
      03:02:48  L3 data on L2 datapath                                                   OK
      03:02:48  L4 data on IP4 datapath                                                  OK
      03:02:48  L4 data on IP6 datapath                                                  OK
      03:02:48  L4 data on L2 datapath                                                   OK
      03:02:48  verify templates on IP6 datapath
      03:02:47,401 Timeout while waiting for child test runner process (last test running was `L2BD Multi-instance test 5 - delete 5 BDs' in

   Comment:
      Unknown level of parallelism going on here -- L2BD test status
   has not been flushed to console.
      Order of test results is different in later test runs.

https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3102/ : SUCCESS
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6785/ : SUCCESS
https://jenkins.fd.io/job/vpp-docs-verify-master/5374/ : SUCCESS

=> Container PoC  3:11 AM  Patch Set 9:  Build Failed
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1307/ : FAILURE

   Failure Signature:
   06:51:59 ==============================================================================
   06:51:59 Bidirectional Forwarding Detection (BFD)
   06:51:59 ==============================================================================
   06:51:59 put session admin-up and admin-down                                      SKIP
   06:51:59 configuration change while peer in demand mode                           SKIP
   06:51:59 verify session goes down after inactivity                                SKIP
   06:51:59 echo function                                                            SKIP
   06:51:59 session goes down if echo function fails                                 SKIP
   06:51:59 echo packets looped back                                                 SKIP
   06:51:59 echo function stops if echo source is removed                            SKIP
   06:51:59 echo function stops if peer sets required min echo rx zero               SKIP
   06:51:59 hold BFD session up                                                      SKIP
   06:51:59 immediately honor remote required min rx reduction                       SKIP
   06:51:59 interface with bfd session deleted                                       SKIP
   06:51:59 echo packets with invalid checksum don't keep a session up               SKIP
   06:51:59 large remote required min rx interval                                    SKIP
   06:51:59 modify detect multiplier                                                 SKIP
   06:51:59 modify session - double required min rx                                  SKIP
   06:51:59 modify session - halve required min rx                                   SKIP
   06:51:59 no periodic frames outside poll sequence if remote demand set            SKIP
   06:51:59 test correct response to control frame with poll bit set                 SKIP
   06:51:59 test poll sequence queueing                                              SKIP
   06:51:59 bring BFD session down                                                   SKIP
   06:51:59 bring BFD session up                                                     SKIP
   06:51:59 bring BFD session up - first frame looked up by address pair             SKIP
   06:51:59 verify slow periodic control frames while session down                   SKIP
   06:51:59 stale echo packets don't keep a session up                               SKIP
   06:51:59 n07:03:51,792 Timeout while waiting for child test runner process (last test running was `L2BD Multi-instance test 5 - delete 5 BDs' in `/tmp/vpp-unittest-TestL2bdMultiInst-AG7L1W')!
   07:02:08 Killing possible remaining process IDs:  21754 21764 21766

   Comment:
      Unknown level of parallelism going on here -- L2BD test status
   has not been flushed to console.
      Order of test results is different in the test runs on cloud infra.
      !The failure signature is the same as above!

http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1521/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1518/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1989/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2030/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1702/ : SUCCESS

=> fd.io JJB  3:42 AM  Patch Set 9:  Verified-1  Build Failed
https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6781/ : FAILURE

   Failure Signature:
   07:29:09 ==============================================================================
   07:29:09  collect information on Ethernet, IP4 and IP6 datapath (no timers)
   07:29:09 ==============================================================================
   07:29:09  no timers, one CFLOW packet, 9 Flows inside                              OK
   07:29:09  no timers, two CFLOW packets (mtu=256), 3 Flows in each                  OK
   07:29:09  L2 data on IP4 datapath                                                  OK
   07:29:09  L2 data on IP6 datapath                                                  OK
   07:29:09  L2 data on L2 datapath                                                   OK
   07:29:09  L3 data on IP4 datapath                                                  OK
   07:29:09  L3 data on IP6 datapath                                                  OK
   07:29:09  L3 data on L2 datapath                                                   OK
   07:29:09  L4 data on IP4 datapath                                                  OK
   07:29:09  L4 data on IP6 datapath                                                  OK
   07:29:09  L4 data on L2 datapath                                                   OK
   07:29:09  verify templates on IP6 datapath      07:29:08,087 Timeout while waiting for child test runner process (last test running was `L2BD Multi-instance test 5 - delete 5 BDs' in `/tmp/vpp-unittest-TestL2bdMultiInst-gbzkP4')!
   07:29:09  Killing possible remaining process IDs:  1883 1897 1899

   Comment:
   Unknown level of parallelism going on here -- L2BD test status has
   not been flushed to console.
   Order of test results is the same as the previous cloud infra run,
   but different than the Container PoC.
   !The failure signature is the same as both of the previous Timeout
   Failures above!

Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6781
https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3104/ : SUCCESS
https://jenkins.fd.io/job/vpp-verify-master-centos7/6776/ : SUCCESS
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6787/ : SUCCESS
https://jenkins.fd.io/job/vpp-docs-verify-master/5376/ : SUCCESS

=> Container PoC  9:26 AM  Patch Set 9:  Build Failed
http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1527/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1524/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1997/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1313/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2039/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1715/ : NOT_BUILT

   Comment:
   Only seen on Container PoC.
   Erroneous Build Failure status.
   Subsequent Container PoC included only this job and was successful.

=> Container PoC 9:44 AM  Patch Set 9:  Build Successful
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1715/ : SUCCESS

=> fd.io JJB  10:02 AM  Patch Set 9:  Verified+1  Build Successful
https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3110/ : SUCCESS
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-make-test-docs-verify-master/3110
https://jenkins.fd.io/job/vpp-verify-master-centos7/6782/ : SUCCESS
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-centos7/6782
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6793/ : SUCCESS
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-csit-verify-virl-master/6793
https://jenkins.fd.io/job/vpp-docs-verify-master/5382/ : SUCCESS
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-docs-verify-master/5382
https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6787/ : SUCCESS
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6787

---- %< ----


On 08/24/2017 10:21 PM, Florin Coras wrote:
Hi,

Build 6775 failed with:

*01:07:20* verify templates on IP6 datapath      Fatal Python error: Segmentation fault
*01:08:59*
*01:08:59* Thread 0x00007fccdfabf700 <python> (most recent call first):
*01:08:59*    File "/usr/lib/python2.7/threading.py", line 340 in wait
*01:08:59*    File "/usr/lib/python2.7/Queue.py", line 168 in get
*01:08:59*    File "build/bdist.linux-x86_64/egg/vpp_papi.py", line 664 in thread_msg_handler
*01:08:59*    File "/usr/lib/python2.7/threading.py", line 754 in run
*01:08:59*    File "/usr/lib/python2.7/threading.py", line 801 in __bootstrap_inner
*01:08:59*    File "/usr/lib/python2.7/threading.py", line 774 in __bootstrap
*01:08:59*

More details here [1]. Just my luck?

Thanks,
Florin

[1] https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6775/console


_______________________________________________
csit-dev mailing list
csit-...@lists.fd.io
https://lists.fd.io/mailman/listinfo/csit-dev

_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev