I've been running overcloud CI tests on hp1 to establish whether it's
ready to go back to running real CI. I'd like to add it back in soon, but
first I have some numbers we should look at so we can make some decisions.

The hp1 cloud throws up more false negatives than rh1. Nearly all of
these are caused by problems in either the nova bm driver or the neutron
l3 agent. The pass rate improves from somewhere around 40% to about 85%
with the following 2 patches:
https://review.openstack.org/#/c/121492/ # ensure l3 agent doesn't fail
if neutron-server isn't ready
https://review.openstack.org/#/c/121155/ # Increase sleep times in
nova-bm driver

With these 2 patches I think the pass rate is acceptable, but there is a
difference in runtime: overcloud jobs on hp1 run in about 140 minutes
(rh1 is averaging about 95 minutes).

We are using VMs with 2G of memory. With 3G VMs the runtime goes down
to about 120 minutes. This is an option to save a little time, but we end
up losing 33% of our capacity (in simultaneous jobs).
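
To put rough numbers on that trade-off, here is a minimal sketch. It
assumes memory is the only thing limiting how many jobs run at once, and
the function names are made up for illustration; it is not taken from any
CI tooling.

```python
from fractions import Fraction


def relative_capacity(base_mem_gb, new_mem_gb):
    """Fraction of simultaneous jobs kept when per-VM memory changes.

    Assumption: the memory pool is fixed and is the only limit on how
    many jobs can run at the same time.
    """
    return Fraction(base_mem_gb, new_mem_gb)


def relative_throughput(base_mem_gb, new_mem_gb, base_minutes, new_minutes):
    """Jobs completed per unit of time, relative to the baseline setup."""
    capacity = relative_capacity(base_mem_gb, new_mem_gb)
    speedup = Fraction(base_minutes, new_minutes)
    return capacity * speedup


# Numbers from this mail: 2G VMs at ~140 minutes, 3G VMs at ~120 minutes.
capacity_3g = relative_capacity(2, 3)
throughput_3g = relative_throughput(2, 3, 140, 120)

print(f"3G capacity:   {float(capacity_3g):.0%} of the 2G setup")
print(f"3G throughput: {float(throughput_3g):.0%} of the 2G setup")
```

Under that assumption the shorter runtime only partly offsets the lost
slots, so the 3G setup would complete fewer jobs per hour overall.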

How would people feel about turning hp1 back on and increasing the
timeout to allow for the increased runtimes?

While making changes we should also consider switching back to x86_64
and bumping VMs to 4G. That would essentially halve the number of jobs
we can run simultaneously, but CI would then test what most deployments
would actually be using.

Also, it's worth noting that the test I have been using to compare jobs
is the F20 overcloud job. Something has happened recently that causes
this job to run slower than it used to (possibly up to 30 minutes
slower); I'll now try to get to the bottom of this. So the times may not
end up being as high as referenced above, but I'm assuming the relative
differences between the two clouds won't change.


OpenStack-dev mailing list
