Zuul just caused my brain to overload ;) thx for the detailed explanation.

Sent from my really tiny device...

> On Oct 29, 2013, at 3:42 AM, "Sean Dague" <s...@dague.net> wrote:
> 
> Andrew Laski correctly called us out for not really providing enough 
> information in the python-novaclient revert yesterday - 
> https://review.openstack.org/#/c/54108/. Apologies there. At the time we 
> were dealing with a gate in which grenade was failing every change (for the 
> prior 6 hours), we were all on our first cup of coffee, and while we got to 
> resolution, we did so with an entirely unhelpful commit message to explain it.
> 
> Here's what happened. python-novaclient landed a change that changed the user 
> interface. This change meant that devstack exercises failed on validating the 
> details on getting aggregates.
> 
> However, upgrade testing is hard, and we had a loophole that led us to a 
> wedge in the gate.
> 
> For the grenade jobs we prep 2 versions of the OpenStack codebase, grizzly 
> and master (yes, still grizzly and master, we're working on that). The 
> grizzly tree is grizzly devstack, which means it's grizzly on all the core 
> servers, but master on all the clients. However, the grizzly tree doesn't get 
> "zuulified", which was the crux of the issue.
> 
> By zuulified I mean think about the zuul queue. How do we actually test a 
> change 15 deep in the gate? We aren't testing just that change, but all the 
> gerrit proposed changes above it. That means that zuul needs to go through 
> and update the relevant git trees not just to master, but to the proposed 
> change sets for all the changes in front of it. This is across projects, and 
> should be across branches.
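
To make "zuulified" concrete for anyone else whose brain overloaded: below is a minimal sketch of which proposed changes end up in each tree when a change deep in the queue is tested. This is only an illustration, not Zuul's actual code. The `Change` type, the `speculative_state` function, and the change numbers are all made up for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Change(NamedTuple):
    number: int   # hypothetical Gerrit change number
    project: str
    branch: str


def speculative_state(queue, position):
    """Map each (project, branch) to the proposed changes that must be
    applied on top of that branch tip when testing queue[position].

    Everything ahead of the change in the queue is included, plus the
    change itself, regardless of project or branch.
    """
    state = defaultdict(list)
    for change in queue[: position + 1]:
        state[(change.project, change.branch)].append(change.number)
    return dict(state)


# Toy gate queue, head of the queue first.
queue = [
    Change(101, "nova", "master"),
    Change(102, "python-novaclient", "master"),
    Change(103, "nova", "stable/grizzly"),
    Change(104, "nova", "master"),
]

# Trees needed to test the last change in the queue.
state_for_last = speculative_state(queue, 3)
```

The point of the sketch is that the change at the back needs trees for three different (project, branch) pairs, none of which is plain master.
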
> 
> But we'd not gotten the system to do this correctly on the "old" side yet. 
> Which means that python-novaclient landed a breaking change, but the "old" 
> side built a grizzly cloud with only master, not master + gerrit. It passed 
> the verification of the "old" cloud, then moved to the new cloud, then ran a 
> different set of tests to verify the new cloud, which passed.
> 
> However, by threading the needle in this way, it meant no one else could ever 
> pass grenade again. The quick fix was the python-novaclient revert. The real 
> fix is probably this - https://review.openstack.org/#/c/53940/ which we were 
> actually working on last week, to both update the set of trees we are using, 
> and update the zuul refs on the "old" side of the equation. Once that lands 
> I'll attempt to revert the revert, and ensure that it actually gets caught in 
> the system. Then we can work on updating tests so it can get through. But 
> right now it's a perfect test case to prove that we did this right, so 
> leaving it in the reverted state is critical.
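
A toy model of the wedge, as I understand it from the description above. It is a sketch under heavy assumptions: it only models the "old" side verification, a tree is just a list of labels, and every name in it (`BREAKING`, `old_side_passes`, `grenade_old_side`) is invented for illustration rather than taken from grenade or devstack.

```python
# Hypothetical label standing in for the python-novaclient UI change.
BREAKING = "novaclient-ui-change"


def old_side_passes(client_tree):
    """The old-side exercises fail once the client contains the UI change."""
    return BREAKING not in client_tree


def grenade_old_side(master_tip, proposed_ahead, zuulify_old_side):
    """Build the client tree for the "old" cloud and verify it.

    Without zuulification the old side sees only master; with it, the old
    side sees master plus the proposed changes in the queue.
    """
    if zuulify_old_side:
        client_tree = master_tip + proposed_ahead
    else:
        client_tree = master_tip
    return old_side_passes(client_tree)


master = ["base"]

# The breaking change is proposed, but the old side never sees it, so it passes.
passed_via_loophole = grenade_old_side(master, [BREAKING], zuulify_old_side=False)

# Once merged, master itself carries the change and every later job fails.
master_after_merge = master + [BREAKING]
later_change_passes = grenade_old_side(
    master_after_merge, ["unrelated-change"], zuulify_old_side=False
)

# With the old side zuulified, the change would have failed before merging.
caught_before_merge = not grenade_old_side(master, [BREAKING], zuulify_old_side=True)
```

In this model the breaking change passes its own run, then blocks everyone after it, which matches the "no one else could ever pass grenade again" description.
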
> 
> This also highlights one of the reasons I've been hard on folks recently 
> about some alternative upgrade or mixed version testing models, and doing it 
> outside of grenade. Everything is simple when you talk about a single change. 
> But when you are 15 or 20 deep in the zuul gate, and have to handle 3 proposed 
> stable nova changes, 5 proposed master nova changes, a keystone stable, a 
> keystone master, and a few cinder master changes in front of you just to build 
> the environments you need to test in the gate... this gets complicated fast. 
> Basically you aren't allowed to use git inside your upgrade tool for this 
> reason, because your tool has no idea what it's supposed to actually test, 
> only ZUUL knows. And, as you can see, we've yet to get this whole thing 
> mapped out the first time. :)
> 
>    -Sean
> 
> -- 
> Sean Dague
> http://dague.net
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
