I have some patches up to collect more stats about mlocks here: https://review.openstack.org/#/q/topic:collect-mlock-stats-in-gate, but they need reviews.
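The gist of what they collect: periodically sum the VmLck field from /proc/<pid>/status across all processes. Roughly along these lines (a simplified sketch of the idea, not the code under review):

    import glob

    def total_mlocked_kb():
        # Sum the VmLck field ("VmLck:  1234 kB") across all processes.
        # Sketch only; the actual patches under review do more than this.
        total = 0
        for path in glob.glob('/proc/[0-9]*/status'):
            try:
                with open(path) as f:
                    for line in f:
                        if line.startswith('VmLck:'):
                            total += int(line.split()[1])
                            break
            except IOError:
                continue  # process exited between glob() and open()
        return total

    print('Total mlocked memory: %d kB' % total_mlocked_kb())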
Ihar

On Fri, Mar 17, 2017 at 5:28 AM Jordan Pittier <[email protected]> wrote:
> The patch that reduced the number of Tempest scenarios we run in every job
> and also reduced the test run concurrency [0] was merged 13 days ago. Since
> then, the situation (i.e. the high number of false-negative job results)
> has not improved significantly. We need to keep looking at this
> collectively.
>
> There seems to be an agreement that we are hitting some memory limit.
> Several of our most frequent failures are memory related [1]. So we should
> either reduce our memory usage or ask for bigger VMs, with more than 8GB
> of RAM.
>
> There have been several attempts to reduce our memory usage: reducing
> MySQL memory consumption ([2], but quickly reverted [3]), reducing the
> number of Apache workers ([4], [5]), and more apache2 tuning [6]. If you
> have any idea to help in this regard, however crazy, please share it. This
> is a high priority for the whole of OpenStack, because it's plaguing many
> projects.
>
> We have some tools to investigate memory consumption: regular "dstat"
> output [7], a home-made memory tracker [8], and stackviz [9].
>
> Best,
> Jordan
>
> [0]: https://review.openstack.org/#/c/439698/
> [1]: http://status.openstack.org/elastic-recheck/gate.html
> [2]: https://review.openstack.org/#/c/438668/
> [3]: https://review.openstack.org/#/c/446196/
> [4]: https://review.openstack.org/#/c/426264/
> [5]: https://review.openstack.org/#/c/445910/
> [6]: https://review.openstack.org/#/c/446741/
> [7]: http://logs.openstack.org/96/446196/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/b5c362f/logs/dstat-csv_log.txt.gz
> [8]: http://logs.openstack.org/96/446196/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/b5c362f/logs/screen-peakmem_tracker.txt.gz
> [9]: http://logs.openstack.org/41/446741/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/fa4d2e6/logs/stackviz/#/stdin/timeline
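For anyone digging through those artifacts: the dstat CSV [7] is easy to post-process. A quick sketch to pull out peak memory usage (the column layout depends on the flags the job passes to dstat, so the single "used" column and byte units here are assumptions, and the file name is just an example):

    import csv

    def peak_used_memory(path):
        # Find the header row that names the columns, then track the max
        # of the "used" column. Assumption: the first "used" column is the
        # memory-usage one and is reported in bytes.
        used_idx = None
        peak = 0.0
        with open(path) as f:
            for row in csv.reader(f):
                if used_idx is None:
                    if 'used' in row:
                        used_idx = row.index('used')
                    continue
                try:
                    peak = max(peak, float(row[used_idx]))
                except (ValueError, IndexError):
                    continue  # preamble or truncated row
        return peak

    print('Peak used memory: %.0f MiB'
          % (peak_used_memory('dstat-csv_log.txt') / (1024 * 1024)))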
> On Sat, Mar 4, 2017 at 4:19 PM, Andrea Frittoli <[email protected]> wrote:
> > Quick update on this: the change is now merged, so we now have a smaller
> > number of scenario tests, running serially after the API test run.
> >
> > We'll monitor gate stability for the next week or so and decide whether
> > further actions are required.
> >
> > Please keep categorizing failures via elastic recheck as usual.
> >
> > thank you
> >
> > andrea
> >
> > On Fri, 3 Mar 2017, 8:02 a.m. Ghanshyam Mann, <[email protected]> wrote:
> > > Thanks, +1. I added my list in the ethercalc.
> > >
> > > The left-out scenario tests can be run in the periodic and experimental
> > > jobs. IMO in both (periodic and experimental), so we can monitor their
> > > status periodically as well as on a particular patch if we need to.
> > >
> > > -gmann
> > >
> > > On Fri, Mar 3, 2017 at 4:28 PM, Andrea Frittoli <[email protected]> wrote:
> > > > Hello folks,
> > > >
> > > > we have discussed gate stability a lot since the PTG; we need a
> > > > stable and reliable gate to ensure smooth progress in Pike.
> > > >
> > > > One of the issues that stands out is that most of the time during
> > > > test runs our test VMs are under heavy load. This can be the common
> > > > cause behind several failures we've seen in the gate, so we agreed
> > > > during the QA meeting yesterday [0] that we're going to try reducing
> > > > the load and see whether that improves stability.
> > > >
> > > > Next steps are:
> > > > - select a subset of scenario tests to be executed in the gate, based
> > > >   on [1], and run them serially only; the patch for this is [2] and
> > > >   we will approve it by the end of the day
> > > > - monitor stability for a week; if needed we may reduce concurrency a
> > > >   bit on API tests as well, and identify "heavy" tests as candidates
> > > >   for removal / refactoring
> > > > - the QA team won't approve any new test (scenario, or heavy
> > > >   resource-consuming API) until gate stability is ensured
> > > >
> > > > Thanks for your patience and collaboration!
> > > >
> > > > Andrea
> > > >
> > > > ---
> > > > irc: andreaf
> > > >
> > > > [0] http://eavesdrop.openstack.org/meetings/qa/2017/qa.2017-03-02-17.00.txt
> > > > [1] https://ethercalc.openstack.org/nu56u2wrfb2b
> > > > [2] https://review.openstack.org/#/c/439698/
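On the serial scenario run: to approximate locally what [2] does in the gate, the split is roughly "API tests at normal concurrency, then the scenario subset with a single worker". Assuming the "tempest run" CLI is available, something like this, with placeholder regexes (the real test selection lives in the patch and in [1]):

    import subprocess

    # API tests first, at the default concurrency...
    subprocess.check_call(
        ['tempest', 'run', '--regex', r'^tempest\.api'])

    # ...then the (reduced) scenario subset with a single worker.
    subprocess.check_call(
        ['tempest', 'run', '--serial', '--regex', r'^tempest\.scenario'])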
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
