On Thu, Aug 10, 2017 at 12:04 PM, Paul Belanger <[email protected]> wrote:
> On Thu, Aug 10, 2017 at 07:22:42PM +0530, Rabi Mishra wrote:
>> On Thu, Aug 10, 2017 at 4:34 PM, Rabi Mishra <[email protected]> wrote:
>>
>> > On Thu, Aug 10, 2017 at 2:51 PM, Ian Wienand <[email protected]> wrote:
>> >
>> >> On 08/10/2017 06:18 PM, Rico Lin wrote:
>> >> > We're facing a high failure rate in Heat's gates [1]; four of our gate
>> >> > jobs have failure rates from 6% to nearly 20% over the last 14 days,
>> >> > which leaves most of our patches stuck in the gate.
>> >>
>> >> There has been a confluence of things causing problems recently.
>> >> The loss of OSIC has distributed more load over everything else, and
>> >> we have seen an increase in job timeouts and intermittent networking
>> >> issues (especially if you're downloading large things from remote
>> >> sites). There have also been some issues with the mirror in rax-ord [1].
>> >>
>> >> > gate-heat-dsvm-functional-convg-mysql-lbaasv2-ubuntu-xenial (19.67%)
>> >> > gate-heat-dsvm-functional-convg-mysql-lbaasv2-non-apache-ubuntu-xenial (9.09%)
>> >> > gate-heat-dsvm-functional-orig-mysql-lbaasv2-ubuntu-xenial (8.47%)
>> >> > gate-heat-dsvm-functional-convg-mysql-lbaasv2-py35-ubuntu-xenial (6.00%)
>> >>
>> >> > We're still trying to find the cause, but it seems something may be
>> >> > wrong with our infra. We need some help from the infra team: are there
>> >> > any clues about this failure rate?
>> >>
>> >> The reality is you're just going to have to triage this and be a *lot*
>> >> more specific with the issues.
>> >
>> > One of the issues we've seen recently is that many jobs are killed midway
>> > through the tests when the job times out (120 minutes). Jobs often seem
>> > to be scheduled on very slow nodes, where setting up devstack alone takes
>> > more than 80 minutes [1].
>> >
>> > [1] http://logs.openstack.org/49/492149/2/check/gate-heat-dsvm-functional-orig-mysql-lbaasv2-ubuntu-xenial/03b05dd/console.html#_2017-08-10_05_55_49_035693
>>
>> We download an image from a Fedora mirror and it seems to take more than
>> an hour:
>>
>> http://logs.openstack.org/41/484741/7/check/gate-heat-dsvm-functional-convg-mysql-lbaasv2-py35-ubuntu-xenial/a797010/logs/devstacklog.txt.gz#_2017-08-10_04_13_14_400
>>
>> It's probably an issue with that specific mirror or with infra network
>> bandwidth. I've submitted a patch to change the mirror to see if that
>> helps.
>
> Today we mirror both fedora-26 [1] and fedora-25 (to be removed shortly),
> so if you want to consider bumping the image you use for testing, you can
> fetch it from our AFS mirrors.
>
> You can source /etc/ci/mirror_info.sh to get information about the things
> we mirror.
>
> [1] http://mirror.regionone.infracloud-vanilla.openstack.org/fedora/releases/26/CloudImages/x86_64/images/
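Thanks, that would help a lot. Just to illustrate the URL shape, here's a
rough, untested Python sketch of fetching the image from the in-region
mirror instead of a remote Fedora mirror. Note that NODEPOOL_MIRROR_HOST
and the image file name below are assumptions on my part, not something
I've verified against mirror_info.sh:

# Rough sketch (untested): download the Fedora 26 cloud image from the
# in-region AFS mirror rather than a remote Fedora mirror.
# Assumes /etc/ci/mirror_info.sh has been sourced and exports
# NODEPOOL_MIRROR_HOST (an assumption); the image file name is only an
# example.
import os
import urllib.request

mirror = os.environ.get(
    "NODEPOOL_MIRROR_HOST",
    "mirror.regionone.infracloud-vanilla.openstack.org")
image = "Fedora-Cloud-Base-26-1.5.x86_64.qcow2"  # hypothetical file name
url = ("http://%s/fedora/releases/26/CloudImages/x86_64/images/%s"
       % (mirror, image))

print("Fetching %s" % url)
urllib.request.urlretrieve(url, image)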
In order to make the gate happy, I've taken the time to submit this patch;
I'd appreciate it being reviewed so we can reduce the churn on our
instances: https://review.openstack.org/#/c/492634/

(Following Ian's advice below about finding patterns, I've also put a small
triage sketch at the bottom of this mail.)

>> >> I find opening an etherpad and going through the failures one-by-one
>> >> helpful (e.g. I keep [2] for the centos jobs I'm interested in).
>> >>
>> >> Looking at the top of the console.html log you'll have the host and
>> >> provider/region stamped in there. If it's timeouts or network issues,
>> >> reporting the time, provider, and region of the failing jobs to infra
>> >> will help. Finding patterns is the first step to understanding what
>> >> needs fixing.
>> >>
>> >> If it's due to issues with remote transfers, we can look at either
>> >> adding specific things to the mirrors (containers, images, and packages
>> >> are all things we've added recently) or adding a caching reverse-proxy
>> >> for them ([3], [4] are some examples).
>> >>
>> >> Questions in #openstack-infra will usually get a helpful response too.
>> >>
>> >> Good luck :)
>> >>
>> >> -i
>> >>
>> >> [1] https://bugs.launchpad.net/openstack-gate/+bug/1708707/
>> >> [2] https://etherpad.openstack.org/p/centos7-dsvm-triage
>> >> [3] https://review.openstack.org/491800
>> >> [4] https://review.openstack.org/491466
>> >
>> > --
>> > Regards,
>> > Rabi Mishra
>>
>> --
>> Regards,
>> Rabi Mishra
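As mentioned above, this is the kind of thing I have in mind for tallying
failing runs by provider/region from their console logs. It's a rough,
untested sketch: the console.html URLs need to be filled in by hand, and
the node-name regex is only my guess at the format the log stamps, so
adjust as needed:

# Rough sketch (untested): given console.html URLs from failing runs, pull
# out the provider/region the node ran on and count how often each appears.
import collections
import re
import urllib.request

FAILING_RUNS = [
    # console.html URLs of failing runs go here, e.g. from gerrit comments
    "http://logs.openstack.org/49/492149/2/check/"
    "gate-heat-dsvm-functional-orig-mysql-lbaasv2-ubuntu-xenial/"
    "03b05dd/console.html",
]

# Hypothetical pattern for node names like ubuntu-xenial-rax-ord-1234567;
# adjust to whatever the console log actually stamps.
NODE_RE = re.compile(r"ubuntu-xenial-([a-z]+-[a-z0-9]+)-\d+")

counts = collections.Counter()
for url in FAILING_RUNS:
    text = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    match = NODE_RE.search(text)
    counts[match.group(1) if match else "unknown"] += 1

for provider, n in counts.most_common():
    print("%-30s %d" % (provider, n))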
