Re: [openstack-dev] [infra][tripleo] status of tripleo-test-cloud-rh1

James Slagle Thu, 22 Sep 2016 05:46:27 -0700

On Mon, Aug 8, 2016 at 1:47 PM, James Slagle <[email protected]> wrote:
> On Mon, Aug 8, 2016 at 1:06 PM, Jeremy Stanley <[email protected]> wrote:
>> On 2016-08-08 11:47:56 -0400 (-0400), James Slagle wrote:
>> [...]
>>> I suppose it's also possible that we might be pushing too strongly
>>> down the multinode path? Is the general consensus in infra that they'd
>>> like to help enable project teams to eventually add 3 and 4 (and maybe
>>> more) node multinode jobs?
>> [...]
>>
>> We've not outright rejected the idea, but do want to make sure that
>> there's been suitable due diligence done explaining how the things
>> you'll be able to test with >2 job nodes effectively can't be done
>> with <=2.
>
> Our current 2 node job uses the first node as the undercloud which
> deploys an AIO Overcloud on the 2nd node. TripleO traditionally has
> also been able to deploy standalone Compute, Cinder, Swift, and Ceph
> nodes. Additionally in this cycle, a lot of work has gone into making
> it fully customizable what services are deployed on which roles. You
> can deploy nodes that are just API services, or just a DB server, or
> rabbitmq, etc. In order to test the composability feature we need to
> deploy to more than one node.
>
> Also, we'd need at least 3 Overcloud nodes to test that we can
> successfully deploy a Pacemaker managed cluster.
>
>> Also we want to be sure that projects who are interested
>> in multi-node jobs start with just 2 job nodes and get some initial
>> tests performing well and returning stable results before trying to
>> push past 2.
>
> I think that the 2 node job that we've added has been stable. We've
> worked a few issues out that we've hit depending on which cloud
> provider we land on, but generally speaking it has been very stable.
>
> We make use of the ovs_vxlan_bridge function from devstack-gate to
> configure the private networking among the nodes. I think this was a
> good first step since that has been a proven way in the devstack
> multinode jobs. I'd like to move to using TripleO's os-net-config in
> the future though, since that is the tool used in TripleO. The end
> result of the network configuration would be the same (using ovs vxlan
> bridges), we'd just use a different tool to get there.
>
> --
> -- James Slagle
> --


Reviving this thread. I'd like to keep the discussion going in the
hope that we can set the stage to finalize a plan at the Summit[1]
for what we want to tackle in Ocata for tripleo-ci.

State of rh1 and rh2
====================
Both rh1 and rh2 are OVB (OpenStack Virtual Baremetal) enabled
clouds[2]. OVB allows us to treat OpenStack instances as baremetal
instances for traditional tripleo-ci testing (PXE booting, etc).
Currently only rh1 is enabled in nodepool. We could re-enable rh2 if
we wanted (the previous ntp issue is resolved now).

As Paul indicated, he's done significant work to bring these 2 clouds
into alignment with standard Infra tooling. If we wanted to move forward
with opening up these clouds to run other jobs besides tripleo-ci, we
could do that.

Multinode jobs
==============
We've continued to add additional CI jobs using the multinode support
in nodepool and tripleo-ci, running on all the enabled clouds (except
rh1) in nodepool. We are still only using 2 nodes. I'd like to add
additional jobs and increase this to 3 nodes initially (probably
deploying ceph on the additional node), and then 4 nodes for doing an
HA deployment.

Becoming 3rd party CI
=====================
tripleo-ci becoming 3rd party CI continues to come up in discussion. I
agree that the OVB based tripleo-ci jobs align better with the 3rd
party CI model since they do require a specially configured OpenStack
cloud. However, the previous points about opening up rh1/rh2 for
non-tripleo jobs and scaling out multinode jobs muddy the water for me a
bit when this topic comes up.

Given we'd like to scale out and add more multinode jobs, I'd like to
counter that by offering some capacity back to nodepool by opening up
the rh1/rh2 clouds to all job types.

However, if tripleo-ci becomes 3rd party CI, we need some
infrastructure to run that CI on and resources to set it up and
maintain the CI tooling. At that point, the TripleO team would be
trying to maintain a 3rd party CI system, and keep 2 public clouds
running for normal infra jobs. That may be possible to do, but it is
an additional commitment.

Just to be clear, I'm not trying to say that if tripleo-ci becomes 3rd
party CI, we will "just take our cloud and go home" :-). We want to be
better aligned and integrated with infra tooling and jobs. Maintaining
a 3rd party CI system and 2 public clouds integrated with Infra's CI
system is additional work though, and like a lot of project teams, we
have to prioritize and make trade-offs.

Further, even if tripleo-ci becomes 3rd party CI for OVB jobs, and
there are capacity concerns about us scaling out our multinode jobs
onto the other enabled clouds in nodepool, we may still prioritize the
work to maintain these 2 clouds for Infra's general use.

We want to work more closely with Infra overall. But if there is
little perceived benefit in that there are no capacity concerns and no
concerns about us going to 3+ node multinode jobs, then I think we'd
probably just disable our clouds in nodepool, make tripleo-ci OVB jobs
3rd party, and press on that way.


Better alignment with infra CI tools
====================================
When the topic of 3rd party CI comes up, it is often accompanied by
the point that tripleo-ci is not aligned with other infra tools
(devstack-gate, zuul-cloner, others?). We do plan to continue to
address these things and strive for better alignment. I'm not sure of
all the historical context around what the "original" plans were for
tripleo-ci, and it doesn't really matter all that much anyway.

If the repo needs some modernization to be in better alignment with
tooling or to take advantage of new features in nodepool/zuul (I know
Paul has some ideas around pipelining), then I think we will work on
that regardless.

Some of my goals for tripleo-ci are to continue to try and make it
easier to consume externally, and to use the TripleO production
tooling where possible.

However, we may not be able to align perfectly with how other jobs are
run given the nature of the project (we don't use source installs, pip
installs, devstack, etc). Using the same production tooling in
tripleo-ci that we expect TripleO users to also use when they deploy
is a goal of tripleo-ci.

As an example, we will likely not continue to use devstack-gate to
set up multinode networking, and instead use the tool TripleO uses for
production: os-net-config. Does that have any impact on the decision
of tripleo-ci becoming 3rd party CI or not? I'm honestly not sure what
the expectations are around things like that (e.g., must use something
from devstack-gate).

Personally, I would like to see more testing in the check and gate
queues with production deployment tools across the board (fuel, kolla,
tripleo, etc), because it makes all of OpenStack better when issues
are found earlier rather than later. I think the progress we've made
so far with the TripleO multinode jobs has proven that this is
possible.

[1] We have TripleO sessions proposed to talk about the state of CI:
https://etherpad.openstack.org/p/ocata-tripleo
[2] https://github.com/cybertron/openstack-virtual-baremetal

-- 
-- James Slagle
--

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
