On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
>> Just an observation from the last week or so...
>> The biggest problem nova faces at the moment isn't code review latency. Our
>> biggest problem is failing to fix our bugs so that the gate is reliable.
>> The number of rechecks we've done in the last week to try and land code is
>> truly startling.
> I consider both problems to be pretty much equally important. I don't
> think solving review latency or test reliability in isolation is enough to
> save Nova. We need to tackle both problems as a priority. I tried to avoid
> getting into my concerns about testing in my mail on review team bottlenecks
> since I think we should address the problems independently / in parallel.
>> I know that some people are directed by their employers to focus on feature work, but
>> those features aren't going to land in a world in which we have to hand
>> walk everything through the gate.
> Unfortunately the reliability of the gate systems has the highest negative
> impact on productivity right at the point in the dev cycle where we need
> it to have the least impact too.
> If we're going to continue to raise the bar in terms of testing coverage
> then we need to have a serious look at the overall approach we use for
> testing because what we do today isn't going to scale, even if it is
> 100% reliable. We can't keep adding new CI jobs for each new nova.conf
> setting that introduces a new code path, because each job has major
> implications for resource consumption (number of test nodes, log storage),
> not to mention reliability. I think we need to figure out a way to get
> more targeted testing of features, so we can keep the overall number
> of jobs lower and the tests shorter.
> Instead of having a single tempest run that exercises all the Nova
> functionality in one run, we need to figure out how to split it up
> into independent functional areas. For example, if we could isolate
> tests which are affected by the choice of cinder storage backend, then
> we could run that subset of tests multiple times, once for each
> supported cinder backend. Without this, the combinatorial explosion
> of test jobs is going to kill us.

One of the top issues killing Nova patches last week was a unit test
race (the wsgi worker one). There is no one to blame but Nova for that.
Jay was really the only team member digging into it.

I don't disagree on the disaggregation problem; however, as lots of Nova
devs are ignoring unit test failures at this point, no amount of
disaggregation is going to make anything better unless that changes.


Sean Dague

OpenStack-dev mailing list
