On 03/25/2016 03:52 PM, Jeremy Stanley wrote:
On 2016-03-25 16:33:44 -0400 (-0400), Jay Pipes wrote:
[...]
What I'm proposing isn't using or needing a custom OpenStack
deployment. There's nothing non-standard at all about the PCI or
NFV stuff besides the hardware required to functionally test it.

What you _are_ talking about though is maintaining physical servers
in a data center running an OpenStack environment (and if you want
it participating in gating/preventing changes from merging you need
more than one environment so we don't completely shut down
development when one of them collapses). This much has been a
challenge for the TripleO team, such that the jobs running for them
are still not voting on their changes.

What we're talking about here is using the same upstream Infra
Puppet modules, installed on a long-running server in a lab that
can interface with upstream Gerrit, respond to new change events
in the Gerrit stream, and trigger devstack-gate[-like] builds on
some bare-metal gear.

It's possible I'm misunderstanding you... you're talking about
maintaining a deployment of OpenStack with specific hardware to be
able to run these jobs in, right? That's not as trivial an effort as
it sounds, and I'm skeptical "a couple of operators" is sufficient
to sustain such an endeavor.

Two things:

- There is no current concept of "a long-lived machine that we run devstack on from time to time" - everything in Infra is designed around using OpenStack APIs to get compute resources. So if we want to run jobs on hardware in this lab, as it stands right now, that hardware would need to be provided by Ironic+Nova.

Last time we did the math (and Jim can maybe correct my numbers), in order to keep up with demand similar to our VM environments, I believe such an env would need at least 83 Ironic nodes. And as Jeremy said, we'd need at least 2 envs for redundancy - so when looking at getting this funded, approximately 200 machines is likely about right.

- zuul v3 does introduce the concept of statically available resources that can be checked out of nodepool - specifically to address people wanting to use long-lived servers as test resources. The machine count is still likely to remain the same - but once we have zuul v3 out, it might reduce the need for the operators to operate 2 100-node Ironic-based OpenStack clouds. (This implies that help with zuul v3 might be seen as an accelerant for this project.)

Also keep in mind, if/when resources are sought out, that each additional underlying OS config adds another full set of resources. So if we got 2 sets of 100 nodes to start with, and started running NFV-configured devstack tests on them on Ubuntu Trusty, and then our friends at Red Hat requested that we test the same on a RH-based distro, the cost for that would be an additional 100 nodes in each DC.
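
To make the scaling explicit, here's the back-of-the-envelope math using the figures above (all of these are rough estimates from this thread, not commitments - the per-env count is the ~100 nodes implied by the ~200-machine estimate):

# Rough capacity math using the numbers discussed in this thread.
nodes_per_env = 100      # ~83 Ironic nodes minimum, rounded up for headroom
environments = 2         # redundancy across data centers
os_configs = 2           # e.g. Ubuntu Trusty plus a Red Hat-based distro

total_nodes = nodes_per_env * environments * os_configs
print(total_nodes)       # 400 machines once a second OS config is added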

Is that something that is totally out of the question for the
upstream Infra team to be a guide for?

We've stated in the past that we're willing to accept this level of
integration as long as our requirements for redundancy/uptime are
met. We mostly just don't want to see issues with the environment
block development for projects relying on it because it's the only
place those jobs can run, so multiple environments in different data
centers would be a necessity (right now our gating jobs are able to
run in any of 9 regions from 6 providers, which mitigates this
risk).
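
To make the workflow piece concrete: the loop Jay describes above - listen to the upstream Gerrit event stream and trigger devstack-gate[-like] builds on lab gear - is roughly the following. This is a purely illustrative sketch; the host, CI account, project filter and run_job() are placeholders, not the actual Infra Puppet-managed tooling.

#!/usr/bin/env python
# Illustrative third-party-CI-style listener (not the real Infra tooling):
# follow Gerrit's event stream over SSH and kick off a devstack-gate[-like]
# build for each new patchset on watched projects.
import json
import subprocess

# Placeholder CI account/host; a real deployment would use its own account.
STREAM_CMD = ["ssh", "-p", "29418", "nfv-ci@review.openstack.org",
              "gerrit", "stream-events"]
WATCHED_PROJECTS = {"openstack/nova"}  # illustrative project filter


def run_job(project, ref):
    """Placeholder: reserve a bare-metal lab node and run the build there."""
    print("would run NFV devstack-gate job for %s at %s" % (project, ref))


def main():
    proc = subprocess.Popen(STREAM_CMD, stdout=subprocess.PIPE)
    for line in iter(proc.stdout.readline, b""):
        event = json.loads(line.decode("utf-8"))
        if event.get("type") != "patchset-created":
            continue
        change = event["change"]
        if change["project"] in WATCHED_PROJECTS:
            run_job(change["project"], event["patchSet"]["ref"])


if __name__ == "__main__":
    main()

The hard part, as discussed above, isn't that loop - it's keeping the hardware behind run_job() redundant and healthy enough that the jobs can vote.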


