On Thu, Mar 30, 2017 at 9:39 AM, Emilien Macchi <[email protected]> wrote:
> On Mon, Mar 27, 2017 at 8:00 AM, Flavio Percoco <[email protected]> wrote:
> > On 23/03/17 16:24 +0100, Martin André wrote:
> >>
> >> On Wed, Mar 22, 2017 at 2:20 PM, Dan Prince <[email protected]> wrote:
> >>>
> >>> On Wed, 2017-03-22 at 13:35 +0100, Flavio Percoco wrote:
> >>>>
> >>>> On 22/03/17 13:32 +0100, Flavio Percoco wrote:
> >>>> > On 21/03/17 23:15 -0400, Emilien Macchi wrote:
> >>>> > > Hey,
> >>>> > >
> >>>> > > I've noticed that container jobs look pretty unstable lately; to me,
> >>>> > > it sounds like a timeout:
> >>>> > > http://logs.openstack.org/19/447319/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq-nv/bca496a/console.html#_2017-03-22_00_08_55_358973
> >>>> >
> >>>> > There are different hypotheses about what is going on here. Some
> >>>> > patches have landed to improve the write performance in containers by
> >>>> > using hostpath mounts, but we think the real slowness comes from the
> >>>> > image downloads.
> >>>> >
> >>>> > That said, this is still under investigation and the containers squad
> >>>> > will report back as soon as there are new findings.
> >>>>
> >>>> Also, to be more precise, Martin André is looking into this. He also
> >>>> fixed the gate in the last 2 weeks.
> >>>
> >>> I spoke w/ Martin on IRC. He seems to think this is the cause of some
> >>> of the failures:
> >>>
> >>> http://logs.openstack.org/32/446432/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq-nv/543bc80/logs/oooq/overcloud-controller-0/var/log/extra/docker/containers/heat_engine/log/heat/heat-engine.log.txt.gz#_2017-03-21_20_26_29_697
> >>>
> >>> It looks like Heat isn't able to create Nova instances in the overcloud
> >>> due to "Host 'overcloud-novacompute-0' is not mapped to any cell". This
> >>> means our cells initialization code for containers may not be quite
> >>> right... or there is a race somewhere.
> >>
> >> Here are some findings. I've looked at the time measures (in minutes)
> >> from CI for https://review.openstack.org/#/c/448533/ which provided the
> >> most recent results:
> >>
> >> * gate-tripleo-ci-centos-7-ovb-ha [1]
> >>   undercloud install: 23
> >>   overcloud deploy: 72
> >>   total time: 125
> >> * gate-tripleo-ci-centos-7-ovb-nonha [2]
> >>   undercloud install: 25
> >>   overcloud deploy: 48
> >>   total time: 122
> >> * gate-tripleo-ci-centos-7-ovb-updates [3]
> >>   undercloud install: 24
> >>   overcloud deploy: 57
> >>   total time: 152
> >> * gate-tripleo-ci-centos-7-ovb-containers-oooq-nv [4]
> >>   undercloud install: 28
> >>   overcloud deploy: 48
> >>   total time: 165 (timeout)
> >>
> >> Looking at the undercloud and overcloud install times, the most
> >> time-consuming tasks, the containers job isn't doing that badly compared
> >> to the other OVB jobs. But looking closer I could see that:
> >> - the containers job pulls docker images from dockerhub, a process that
> >> takes roughly 18 min.
> >
> > I think we can optimize this a bit by having the script that populates
> > the local registry in the overcloud job run its pulls in parallel. The
> > docker daemon can handle multiple concurrent pulls w/o problems.
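For what it's worth, here is a rough sketch of that idea (untested; the
image list and worker count are illustrative, not what the populate
script actually uses):

    # Untested sketch: pull the overcloud images concurrently instead
    # of one by one. The docker daemon copes fine with several
    # concurrent pulls, so a small thread pool driving the CLI is
    # enough.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    IMAGES = [
        # Illustrative names only, not the job's real image list.
        "tripleoupstream/centos-binary-heat-engine:latest",
        "tripleoupstream/centos-binary-nova-compute:latest",
    ]

    def pull(image):
        subprocess.check_call(["docker", "pull", image])
        return image

    with ThreadPoolExecutor(max_workers=4) as executor:
        for image in executor.map(pull, IMAGES):
            print("pulled %s" % image)

Even a handful of workers could take a good chunk out of the ~18 min
Martin measured, since the downloads are network-bound.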
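On the cell mapping error Dan pointed at above: if it really is a race
(the compute service registering after cell discovery already ran), the
usual shape of the workaround is to retry discovery until the host
shows up. This is a sketch of that shape only, not the actual TripleO
code - the host name, timeouts, and output check are all assumptions:

    # Sketch: re-run cell_v2 host discovery until the compute host is
    # mapped or a deadline passes. "nova-manage cell_v2 discover_hosts"
    # only maps computes that have already checked in, hence the loop.
    import subprocess
    import time

    def wait_for_host_mapping(host="overcloud-novacompute-0",
                              deadline=600, interval=30):
        end = time.time() + deadline
        while time.time() < end:
            out = subprocess.check_output(
                ["nova-manage", "cell_v2", "discover_hosts", "--verbose"],
                universal_newlines=True)
            if host in out:  # assumes --verbose lists mapped hosts
                return True
            time.sleep(interval)
        return False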
> >> - the overcloud validate task takes 10 min more than it should because
> >> of the bug Dan mentioned (a fix is in the queue at
> >> https://review.openstack.org/#/c/448575/)
> >
> > +A
> >
> >> - the postci takes a long time with quickstart, 13 min (4 min alone
> >> spent on docker log collection), whereas it takes only 3 min when using
> >> tripleo.sh
> >
> > mmh, does this have anything to do with ansible being in between? Or is
> > that time specifically for the part that gets the logs?
> >
> >> Adding up all these numbers, we're at about 40 min of additional time
> >> for the oooq containers job, which is enough to cross the CI job limit.
> >>
> >> There is certainly a lot of room for optimization here and there, and
> >> I'll explore how we can speed up the containers CI job over the next
> >> weeks.
> >
> > Thanks a lot for the update. The time breakdown is fantastic,
> > Flavio
>
> TBH the problem is far from solved:
>
> 1. Click on https://status-tripleoci.rhcloud.com/
> 2. Select gate-tripleo-ci-centos-7-ovb-containers-oooq-nv
>
> The container job has been failing more than 55% of the time.
>
> As a reference,
> gate-tripleo-ci-centos-7-ovb-nonha has a 90% success rate, and
> gate-tripleo-ci-centos-7-ovb-ha has a 64% success rate.
>
> This clearly means the ovb-containers job was not, and is not, ready to
> run in the check pipeline; it's not reliable enough.
>
> The current queue time in TripleO OVB is 11 hours. This is not
> acceptable for TripleO developers, and we need a short-term solution,
> which is disabling this job in the check pipeline:
> https://review.openstack.org/#/c/451546/

Yes, given resource constraints I don't see an alternative in the short term.

> In the long term, we need to:
>
> - Stabilize ovb-containers, which AFAIK is already WIP by Martin (kudos
> to him). My hope is that Martin gets enough help from the container
> squad to work on this topic.
> - Remove the ovb-nonha scenario from the check pipeline - and probably
> keep it periodic. Dan Prince started some work on it:
> https://review.openstack.org/#/c/449791/ and
> https://review.openstack.org/#/c/449785/ - but there hasn't been much
> progress on it in recent days.
> - Start work on getting multinode-scenario(001,002,003,004) jobs for
> containers, so we don't need as many OVB jobs (probably only one) for
> container scenarios.

Another work item in progress that should help with the stability of the
ovb containers job: Dan has set up a docker-distribution based registry
on a node in rhcloud. Once jobs are pulling images from it, there should
be fewer timeouts due to image pull speed (a rough sketch of the seeding
step is below).

> I know everyone is busy working on container support in composable
> services, but we may need to assign more resources to CI work here;
> otherwise I'm not sure how we're going to stabilize the CI.
>
> Any feedback is very welcome.
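To make the seeding step concrete, it can be as simple as a
pull/tag/push per image into the docker-distribution instance. This is a
hypothetical sketch - the registry address is a placeholder, and the
docker daemons involved would need the usual insecure-registry
configuration if the mirror isn't TLS-fronted:

    # Hypothetical sketch of seeding the mirror; not the actual CI code.
    import subprocess

    REGISTRY = "192.0.2.10:8787"  # placeholder host:port for the mirror

    def mirror(image):
        local = "%s/%s" % (REGISTRY, image)
        subprocess.check_call(["docker", "pull", image])  # from dockerhub
        subprocess.check_call(["docker", "tag", image, local])
        subprocess.check_call(["docker", "push", local])  # to the mirror

    # Illustrative image name only.
    for image in ["tripleoupstream/centos-binary-heat-engine:latest"]:
        mirror(image)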
> >>
> >> Martin
> >>
> >> [1] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/d2c1b16/
> >> [2] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/d6df760/
> >> [3] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-updates/3b1f795/
> >> [4] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq-nv/b816f20/
> >>
> >>> Dan
> >>>
> >>>> Flavio
> >
> > --
> > @flaper87
> > Flavio Percoco
>
> --
> Emilien Macchi
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
