Dave,

the tests are not run in parallel. For each test case *class*, a new vpp is
started and all the tests which are members of that class are run
against that vpp. Then the vpp is killed and the next class is picked. We
use a helper thread to "pump" stdout & stderr from vpp to the logger; the
motivation is to keep vpp's output time-synchronized as closely as possible
with the other logger messages. AFAIK there is nothing which blocks the
Python interpreter from reaping that thread.
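
To illustrate the "pump" idea, here is a minimal sketch. It is not the
framework's actual code: the names pump_output and run_and_pump are made up
for this example, and for simplicity it uses one helper thread per stream
rather than a single thread.

```python
import subprocess
import threading


def pump_output(stream, logger, prefix):
    """Forward each line from `stream` to `logger` as soon as it arrives."""
    for raw in iter(stream.readline, b""):
        logger.info("%s: %s", prefix, raw.decode(errors="replace").rstrip())
    stream.close()


def run_and_pump(argv, logger):
    """Start a child process (hypothetical stand-in for vpp) and pump its
    stdout/stderr into `logger` so the lines interleave with other log
    messages in arrival order."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    threads = [
        threading.Thread(target=pump_output,
                         args=(proc.stdout, logger, "stdout"), daemon=True),
        threading.Thread(target=pump_output,
                         args=(proc.stderr, logger, "stderr"), daemon=True),
    ]
    for t in threads:
        t.start()
    rc = proc.wait()
    # Each pump loop ends at EOF once the child has exited, so join() returns.
    for t in threads:
        t.join()
    return rc
```

Because each line is logged the moment it is read, its timestamp comes from
the same logger as the test messages, which is the point of the pump.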

Regarding timeouts: the first thing the framework does is set up a few
communication pipes and fork. The child sends periodic keepalives
by writing a tuple describing the test class being run, its temporary
directory etc. to the pipe. If sufficient time passes without
activity on the pipe, the child is considered stuck and is killed, and the
messages which you describe appear. This usually happens if vpp
dumps core mid-API call, in which case the wait on the shared memory
condition will never finish (as there is no vpp left to signal that condition).
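
The keepalive scheme above can be sketched roughly as follows. This is an
illustration under assumptions, not the framework's code: child_loop,
supervise, the message format and the timeout values are all hypothetical,
and it is POSIX-only because it uses os.fork.

```python
import os
import select
import signal
import time


def child_loop(write_fd, steps, stall_after):
    """Pretend to run test classes, writing a keepalive before each one."""
    for i in range(steps):
        if i == stall_after:
            time.sleep(3600)  # simulate a child stuck waiting on a dead vpp
        # Hypothetical keepalive: (test class name, temporary directory)
        msg = repr(("TestClass%d" % i, "/tmp/hypothetical-%d" % i)) + "\n"
        os.write(write_fd, msg.encode())
        time.sleep(0.05)


def supervise(steps, stall_after, timeout):
    """Fork a child and kill it if the pipe stays silent for `timeout` s."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(read_fd)
        try:
            child_loop(write_fd, steps, stall_after)
        finally:
            os._exit(0)
    os.close(write_fd)  # parent keeps only the read end
    last = None
    status = "finished"
    while True:
        ready, _, _ = select.select([read_fd], [], [], timeout)
        if not ready:  # no keepalive in time: consider the child stuck
            os.kill(pid, signal.SIGKILL)
            status = "timeout"
            break
        data = os.read(read_fd, 4096)
        if not data:  # EOF: the child exited and closed its end
            break
        last = data.decode().splitlines()[-1]
    os.close(read_fd)
    os.waitpid(pid, 0)
    return status, last
```

On a timeout, `last` holds the most recent keepalive, which is how a
supervisor can report the last test class that was running.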

Regards,
Klement

Quoting Dave Wallace (2017-08-25 19:56:13)
>    vpp-dev, Florin,
> 
>    Below is an analysis of all of the failures that this patch
>    encountered before finally passing. None of the failures were related in
>    any way to the code changes in the patch.
> 
>    In summary, there appear to be a number of different factors involved with
>    these failures.
> 
>      • Two failures appear to be caused by the run-time environment.
>      • An intermittent bug appears to exist in `L2BD Multi-instance test 5 -
>        delete 5 BDs'
>      • The segfault shows lots of threads being run.  Are tests being
>        executed in parallel?  If so, it would be interesting to serialize the
>        tests to see if that fixes any of these issues.
> 
>    I'm also seeing a variation in the order that the "make tests" are run (or
>    at least in the order of the status reports).  My understanding of the
>    'make test' python infrastructure is insufficient to make an intelligent
>    guess as to whether this has any bearing on any of these failures.
> 
>    I get more predictable result output when running make test locally on my
>    own server, but the order of test output is different than in the CI test
>    runs.  Locally, the order of tests appears to be the same between
>    different runs of 'make test'.  I have also not seen any of these errors
>    on my server which is running Ubuntu 17.04, although I have not done an
>    endurance test either.
> 
>    My recommendation based on this analysis is as follows:
>      1. The L2BD unit test issue should be investigated by the appropriate
>         'make test' experts.
>      2. The vpp-verify-master-centos7, vpp-verify-master-ubuntu1604, and
>         vpp-test-debug-master-ubuntu1604 jobs should be run operationally
>         in the Container PoC environment, with the rest of the jjb jobs run
>         in the cloud infra.
> 
>    Thanks,
>    -daw-
> 
>    ---- %< ----
>    [ From [1]https://gerrit.fd.io/r/#/c/8133 ]
> 
>    => Container PoC Aug 24 8:36 PM  Patch Set 9:  Build Successful
>    [2]http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1515/ : SUCCESS
>    [3]http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1512/ : SUCCESS
>    [4]http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1983/ : SUCCESS
>    [5]http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1301/ : SUCCESS
>    [6]http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2022/ : SUCCESS
>    [7]http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1695/ : SUCCESS
> 
>    => fd.io JJB  Aug 24 9:19 PM  Patch Set 9:  Verified-1  Build Failed
>    [8]https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6775/ : FAILURE
>    Logs: [9]https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6775
> 
>      Failure Signature:
>        01:08:59  verify templates on IP6 datapath      Fatal Python error: Segmentation fault
> 
>      Comment:
>        Python bug or resource starvation?  Lots of threads running...
>        Possibly due to bad environment/sick minion.
> 
>    [10]https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3098/ : SUCCESS
>    [11]https://jenkins.fd.io/job/vpp-verify-master-centos7/6770/ : SUCCESS
>    [12]https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6781/ : SUCCESS
>    [13]https://jenkins.fd.io/job/vpp-docs-verify-master/5370/ : SUCCESS
> 
>    => Container PoC  Aug 24 10:54 PM  Patch Set 9:  Build Successful
>    [14]http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1519/ : SUCCESS
>    [15]http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1516/ : SUCCESS
>    [16]http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1987/ : SUCCESS
>    [17]http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1305/ : SUCCESS
>    [18]http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2027/ : SUCCESS
>    [19]http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1699/ : SUCCESS
> 
>    => fd.io JJB  Aug 24 11:13 PM  Patch Set 9:  Verified-1  Build Failed
>    [20]https://jenkins.fd.io/job/vpp-verify-master-centos7/6774/ : FAILURE
>    Logs: [21]https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-centos7/6774
> 
>      Failure Signature:
>        00:23:17.198 CCLD     vcl_test_client
>        00:24:32.936 FATAL: command execution failed
>        00:24:32.937 java.io.IOException
> 
>      Comment: 
>        Bad environment/sick minion? 
>        There's no reason for compilation to kill the build.
> 
>    [22]https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6779/ : FAILURE
>    Logs: [23]https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6779
> 
>      Failure Signature:
>        03:02:47 ==============================================================================
>        03:02:47  collect information on Ethernet, IP4 and IP6 datapath (no timers)
>        03:02:47 ==============================================================================
>        03:02:47  no timers, one CFLOW packet, 9 Flows inside                              OK
>        03:02:47  no timers, two CFLOW packets (mtu=256), 3 Flows in each                  OK
>        03:02:47  L2 data on IP4 datapath                                                  OK
>        03:02:47  L2 data on IP6 datapath                                                  OK
>        03:02:47  L2 data on L2 datapath                                                   OK
>        03:02:48  L3 data on IP4 datapath                                                  OK
>        03:02:48  L3 data on IP6 datapath                                                  OK
>        03:02:48  L3 data on L2 datapath                                                   OK
>        03:02:48  L4 data on IP4 datapath                                                  OK
>        03:02:48  L4 data on IP6 datapath                                                  OK
>        03:02:48  L4 data on L2 datapath                                                   OK
>        03:02:48  verify templates on IP6 datapath
>        03:02:47,401 Timeout while waiting for child test runner process (last test running was `L2BD Multi-instance test 5 - delete 5 BDs' in
> 
>      Comment:
>        Unknown level of parallelism going on here -- L2BD test status has not
>      been flushed to console.
>        Order of test results is different in later test runs.
> 
>    [24]https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3102/ : SUCCESS
>    [25]https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6785/ : SUCCESS
>    [26]https://jenkins.fd.io/job/vpp-docs-verify-master/5374/ : SUCCESS
> 
>    => Container PoC  3:11 AM  Patch Set 9:  Build Failed
>    [27]http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1307/ : FAILURE
> 
>      Failure Signature:
>        06:51:59 ==============================================================================
>        06:51:59 Bidirectional Forwarding Detection (BFD)
>        06:51:59 ==============================================================================
>        06:51:59 put session admin-up and admin-down                                      SKIP
>        06:51:59 configuration change while peer in demand mode                           SKIP
>        06:51:59 verify session goes down after inactivity                                SKIP
>        06:51:59 echo function                                                            SKIP
>        06:51:59 session goes down if echo function fails                                 SKIP
>        06:51:59 echo packets looped back                                                 SKIP
>        06:51:59 echo function stops if echo source is removed                            SKIP
>        06:51:59 echo function stops if peer sets required min echo rx zero               SKIP
>        06:51:59 hold BFD session up                                                      SKIP
>        06:51:59 immediately honor remote required min rx reduction                       SKIP
>        06:51:59 interface with bfd session deleted                                       SKIP
>        06:51:59 echo packets with invalid checksum don't keep a session up               SKIP
>        06:51:59 large remote required min rx interval                                    SKIP
>        06:51:59 modify detect multiplier                                                 SKIP
>        06:51:59 modify session - double required min rx                                  SKIP
>        06:51:59 modify session - halve required min rx                                   SKIP
>        06:51:59 no periodic frames outside poll sequence if remote demand set            SKIP
>        06:51:59 test correct response to control frame with poll bit set                 SKIP
>        06:51:59 test poll sequence queueing                                              SKIP
>        06:51:59 bring BFD session down                                                   SKIP
>        06:51:59 bring BFD session up                                                     SKIP
>        06:51:59 bring BFD session up - first frame looked up by address pair             SKIP
>        06:51:59 verify slow periodic control frames while session down                   SKIP
>        06:51:59 stale echo packets don't keep a session up                               SKIP
>        06:51:59 n07:03:51,792 Timeout while waiting for child test runner process (last test running was `L2BD Multi-instance test 5 - delete 5 BDs' in `/tmp/vpp-unittest-TestL2bdMultiInst-AG7L1W')!
>        07:02:08 Killing possible remaining process IDs:  21754 21764 21766
> 
>      Comment:
>        Unknown level of parallelism going on here -- L2BD test status has not
>      been flushed to console.
>        Order of test results is different in the test runs on cloud infra.
>        !The failure signature is the same as above!
> 
>    [28]http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1521/ : SUCCESS
>    [29]http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1518/ : SUCCESS
>    [30]http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1989/ : SUCCESS
>    [31]http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2030/ : SUCCESS
>    [32]http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1702/ : SUCCESS
> 
>    => fd.io JJB  3:42 AM  Patch Set 9:  Verified-1  Build Failed
>    [33]https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6781/ : FAILURE
> 
>      Failure Signature:
>        07:29:09 ==============================================================================
>        07:29:09  collect information on Ethernet, IP4 and IP6 datapath (no timers)
>        07:29:09 ==============================================================================
>        07:29:09  no timers, one CFLOW packet, 9 Flows inside                              OK
>        07:29:09  no timers, two CFLOW packets (mtu=256), 3 Flows in each                  OK
>        07:29:09  L2 data on IP4 datapath                                                  OK
>        07:29:09  L2 data on IP6 datapath                                                  OK
>        07:29:09  L2 data on L2 datapath                                                   OK
>        07:29:09  L3 data on IP4 datapath                                                  OK
>        07:29:09  L3 data on IP6 datapath                                                  OK
>        07:29:09  L3 data on L2 datapath                                                   OK
>        07:29:09  L4 data on IP4 datapath                                                  OK
>        07:29:09  L4 data on IP6 datapath                                                  OK
>        07:29:09  L4 data on L2 datapath                                                   OK
>        07:29:09  verify templates on IP6 datapath      07:29:08,087 Timeout while waiting for child test runner process (last test running was `L2BD Multi-instance test 5 - delete 5 BDs' in `/tmp/vpp-unittest-TestL2bdMultiInst-gbzkP4')!
>        07:29:09  Killing possible remaining process IDs:  1883 1897 1899
> 
>      Comment:
>        Unknown level of parallelism going on here -- L2BD test status has not
>      been flushed to console.
>        Order of test results is the same as the previous cloud infra run, but
>      different than the Container PoC.
>        !The failure signature is the same as both of the previous Timeout
>      Failures above!
> 
>    Logs: [34]https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6781
>    [35]https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3104/ : SUCCESS
>    [36]https://jenkins.fd.io/job/vpp-verify-master-centos7/6776/ : SUCCESS
>    [37]https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6787/ : SUCCESS
>    [38]https://jenkins.fd.io/job/vpp-docs-verify-master/5376/ : SUCCESS
> 
>    => Container PoC  9:26 AM  Patch Set 9:  Build Failed
>    [39]http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1527/ : SUCCESS
>    [40]http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1524/ : SUCCESS
>    [41]http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1997/ : SUCCESS
>    [42]http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1313/ : SUCCESS
>    [43]http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2039/ : SUCCESS
>    [44]http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1715/ : NOT_BUILT
> 
>      Comment:
>        Only seen on Container PoC.
>        Erroneous Build Failure status.
>        Subsequent Container PoC included only this job and was successful. 
> 
>    => Container PoC  9:44 AM  Patch Set 9:  Build Successful
>    [45]http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1715/ : SUCCESS
> 
>    => fd.io JJB  10:02 AM  Patch Set 9:  Verified+1  Build Successful
>    [46]https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3110/ : SUCCESS
>    Logs: [47]https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-make-test-docs-verify-master/3110
>    [48]https://jenkins.fd.io/job/vpp-verify-master-centos7/6782/ : SUCCESS
>    Logs: [49]https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-centos7/6782
>    [50]https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6793/ : SUCCESS
>    Logs: [51]https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-csit-verify-virl-master/6793
>    [52]https://jenkins.fd.io/job/vpp-docs-verify-master/5382/ : SUCCESS
>    Logs: [53]https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-docs-verify-master/5382
>    [54]https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6787/ : SUCCESS
>    Logs: [55]https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6787
> 
>    ---- %< ----
> 
>    On 08/24/2017 10:21 PM, Florin Coras wrote:
> 
>      Hi, 
>      Build 6775 failed with: 
> 
>  01:07:20 verify templates on IP6 datapath      Fatal Python error: Segmentation fault
>  01:08:59
>  01:08:59 Thread 0x00007fccdfabf700 <python> (most recent call first):
>  01:08:59   File "/usr/lib/python2.7/threading.py", line 340 in wait
>  01:08:59   File "/usr/lib/python2.7/Queue.py", line 168 in get
>  01:08:59   File "build/bdist.linux-x86_64/egg/vpp_papi.py", line 664 in thread_msg_handler
>  01:08:59   File "/usr/lib/python2.7/threading.py", line 754 in run
>  01:08:59   File "/usr/lib/python2.7/threading.py", line 801 in __bootstrap_inner
>  01:08:59   File "/usr/lib/python2.7/threading.py", line 774 in __bootstrap
>  01:08:59
> 
>      More details here [1]. Just my luck?
>      Thanks, 
>      Florin
>      [1] [56]https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6775/console
> 
>  _______________________________________________
>  csit-dev mailing list
>  [57]csit-...@lists.fd.io
>  [58]https://lists.fd.io/mailman/listinfo/csit-dev
> 
> References
> 
>    Visible links
>    1. https://gerrit.fd.io/r/#/c/8133
>    2. http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1515/
>    3. http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1512/
>    4. http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1983/
>    5. http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1301/
>    6. http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2022/
>    7. http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1695/
>    8. https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6775/
>    9. https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6775
>   10. https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3098/
>   11. https://jenkins.fd.io/job/vpp-verify-master-centos7/6770/
>   12. https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6781/
>   13. https://jenkins.fd.io/job/vpp-docs-verify-master/5370/
>   14. http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1519/
>   15. http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1516/
>   16. http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1987/
>   17. http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1305/
>   18. http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2027/
>   19. http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1699/
>   20. https://jenkins.fd.io/job/vpp-verify-master-centos7/6774/
>   21. https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-centos7/6774
>   22. https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6779/
>   23. https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6779
>   24. https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3102/
>   25. https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6785/
>   26. https://jenkins.fd.io/job/vpp-docs-verify-master/5374/
>   27. http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1307/
>   28. http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1521/
>   29. http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1518/
>   30. http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1989/
>   31. http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2030/
>   32. http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1702/
>   33. https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6781/
>   34. https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6781
>   35. https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3104/
>   36. https://jenkins.fd.io/job/vpp-verify-master-centos7/6776/
>   37. https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6787/
>   38. https://jenkins.fd.io/job/vpp-docs-verify-master/5376/
>   39. http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1527/
>   40. http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1524/
>   41. http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1997/
>   42. http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1313/
>   43. http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2039/
>   44. http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1715/
>   45. http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1715/
>   46. https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3110/
>   47. https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-make-test-docs-verify-master/3110
>   48. https://jenkins.fd.io/job/vpp-verify-master-centos7/6782/
>   49. https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-centos7/6782
>   50. https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6793/
>   51. https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-csit-verify-virl-master/6793
>   52. https://jenkins.fd.io/job/vpp-docs-verify-master/5382/
>   53. https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-docs-verify-master/5382
>   54. https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6787/
>   55. https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6787
>   56. https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6775/console
>   57. mailto:csit-...@lists.fd.io
>   58. https://lists.fd.io/mailman/listinfo/csit-dev
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev