On 18/12/14 08:48, Steve Kowalik wrote:
> Hai,
>
> I am finding myself at a loss at explaining how the CI clouds that run
> the tripleo jobs work from end-to-end. I am clear that we have a tripleo
> deployment running on those racks, with a seed, a HA undercloud and
> overcloud, but then I'm left with a number of questions, such as:

Yup, this is correct. From a CI point of view all that is relevant is the
overcloud and a set of baremetal test env hosts; the seed and undercloud
are there because we used tripleo to deploy the thing in the first place.
>
> How do we run the testenv images on the overcloud?

nodepool talks to our overcloud to create an instance where the jenkins
jobs run. This "jenkins node" is where we build the images; jenkins
doesn't manage and isn't aware of the testenv hosts.

The entry point for jenkins to run tripleo ci is toci_gate_test.sh; at
the end of this script you'll see a call to testenv-client[1].
testenv-client talks to gearman (an instance on our overcloud, a
different gearman instance to the one infra have), and gearman responds
with a json file representing one of the testenvs that have been
registered with it. testenv-client then runs the command
"./toci_devtest.sh" and passes in the json file (via $TE_DATAFILE). To
prevent two CI jobs from using the same testenv, the testenv is now
"locked" until toci_devtest exits (there is a rough sketch of this
hand-off at the end of this mail). The jenkins node now has all the
relevant IPs and MAC addresses to talk to the testenv.

>
> How do the testenv images interact with the nova-compute machines in
> the overcloud?

The images are built on instances in this cloud. The MAC address of eth1
on the seed VM for the testenv has been registered with neutron on the
overcloud, so its IP is known (it's in the json file we got in
$TE_DATAFILE). All traffic to the other instances in the CI testenv is
routed through the seed: its eth2 shares an OVS bridge with eth1 of the
other VMs in the same testenv.

>
> Are the machines running the testenv images meant to be long-running,
> or are they recycled after n number of runs?

They are long running and in theory shouldn't need to be recycled. In
practice they do get recycled sometimes, for one of two reasons:

1. The image needs to be updated (e.g. to increase the amount of RAM on
   the libvirt domains they host).
2. If one is experiencing a problem, I usually do a "nova rebuild" on
   it. This doesn't happen very frequently; we currently have 15 TE
   hosts on rh1, 7 of which have an uptime over 80 days, while the
   others are new HW that was added last week. Problems we have
   encountered in the past that caused a rebuild include a TE host
   losing its IP, or
   https://bugs.launchpad.net/tripleo/+bug/1335926
   https://bugs.launchpad.net/tripleo/+bug/1314709

>
> Cheers,

No problem. I tried to document this at one stage here[2], but feel free
to add more, point out where it's lacking, or ask questions here and
I'll attempt to answer.

thanks,
Derek.

[1] http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/toci_gate_test.sh?id=3d86dd4c885a68eabddb7f73a6dbe6f3e75fde64#n69
[2] http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/docs/TripleO-ci.rst
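
For anyone who wants the hand-off described above in shell terms, here
is a rough sketch. The testenv-client options and the json field names
below are illustrative guesses rather than the exact interface; [1] and
[2] have the real details.

    # On the jenkins node, at the end of toci_gate_test.sh: ask gearman
    # (via testenv-client) for a free testenv, keep it locked for the
    # duration of the command, and run toci_devtest.sh with the testenv
    # description exported as $TE_DATAFILE.
    # (broker address and options are placeholders)
    ./testenv-client -b gearman-broker:4730 -- ./toci_devtest.sh

    # Inside toci_devtest.sh, $TE_DATAFILE describes the locked testenv,
    # e.g. the seed VM's IP and the MAC addresses of the other VMs.
    # The field names here are assumptions for illustration only.
    SEED_IP=$(jq -r '.["seed-ip"]' "$TE_DATAFILE")
    NODE_MACS=$(jq -r '.["node-macs"]' "$TE_DATAFILE")

    # All traffic to the testenv VMs is routed via the seed.
    ssh root@"$SEED_IP" uptime

Recycling a TE host (reason 2 above) then comes down to a standard
"nova rebuild <server> <image>".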