On Wed, May 18, 2016 at 08:34:40AM -0400, Paul Belanger wrote:
> On Wed, May 18, 2016 at 12:22:55PM +0100, Derek Higgins wrote:
> > On 6 May 2016 at 14:18, Paul Belanger <[email protected]> wrote:
> > > On Tue, May 03, 2016 at 05:34:55PM +0100, Steven Hardy wrote:
> > >> Hi all,
> > >>
> > >> Some folks have requested a summary of our summit sessions, as has been provided for some other projects.
> > >>
> > >> I'll probably go into more detail on some of these topics either via subsequent more focused threads and/or some blog posts, but what follows is an overview of our summit sessions[1] with notable actions or decisions highlighted. I'm including some of my own thoughts and conclusions; folks are welcome/encouraged to follow up with their own clarifications or different perspectives :)
> > >>
> > >> TripleO had a total of 5 sessions in Austin; I'll cover them one-by-one:
> > >>
> > >> -------------------------------------
> > >> Upgrades - current status and roadmap
> > >> -------------------------------------
> > >>
> > >> In this session we discussed the current state of upgrades - initial support for full major version upgrades has been implemented, but the implementation is monolithic, highly coupled to pacemaker, and inflexible with regard to third-party extraconfig changes.
> > >>
> > >> The main outcomes were that we will add support for more granular definition of the upgrade lifecycle to the new composable services format, and that we will explore moving towards the proposed lightweight HA architecture to reduce the need for so much pacemaker-specific logic.
> > >>
> > >> We also agreed that investigating use of mistral to drive upgrade workflows was a good idea - currently we have a mixture of scripts combined with Heat to drive the upgrade process, and some refactoring into discrete mistral workflows may provide a more maintainable solution. Potential for using the existing SoftwareDeployment approach directly via mistral (outside of the heat templates) was also discussed as something to be further investigated and prototyped.
> > >>
> > >> We also touched on the CI implications of upgrades - we've got an upgrades job now, but we need to ensure coverage of full release-to-release upgrades (not just commit to commit).
> > >>
> > >> -------------------------------
> > >> Containerization status/roadmap
> > >> -------------------------------
> > >>
> > >> In this session we discussed the current status of containers in TripleO (which is to say, the container based compute node which deploys containers via Heat onto an Atomic host node that is also deployed via Heat), and what strategy is most appropriate to achieve a fully containerized TripleO deployment.
> > >>
> > >> Several folks from Kolla participated in the session, and there was significant focus on where work may happen such that further collaboration between communities is possible.
> > >> To some extent this discussion on where (as opposed to how) proved a distraction and prevented much discussion of a supportable architectural implementation for TripleO, thus what follows is mostly my perspective on the issues that exist:
> > >>
> > >> Significant uncertainty exists wrt integration between Kolla and TripleO - there's largely consensus that we want to consume the container images defined by the Kolla community, but much less agreement that we can feasibly switch to the ansible-orchestrated deployment/config flow supported by Kolla without breaking many of our primary operator interfaces in a fundamentally unacceptable way, for example:
> > >>
> > >> - The Mistral based API is being implemented on the expectation that the primary interface to TripleO deployments is a parameters schema exposed by a series of Heat templates - this is no longer true in a "split stack" model where we have to hand off to an alternate service orchestration tool.
> > >>
> > >> - The tripleo-ui (based on the Mistral based API) consumes heat parameter schema to build its UI, and Ansible doesn't support the necessary parameter schema definition (such as types and descriptions) to enable this pattern to be replicated. Ansible also doesn't provide an HTTP API, so we'd still have to maintain an API surface for the (non-python) UI to consume.
> > >>
> > >> We also discussed ideas around integration with kubernetes (a hot topic on the Kolla track this summit), but again this proved inconclusive beyond agreement that someone should try developing a PoC to stimulate further discussion. Again, significant challenges exist:
> > >>
> > >> - We still need to maintain the Heat parameter interfaces for the API/UI, and there is also a strong preference to maintain puppet as a tool for generating service configuration (so that existing operator integrations via puppet continue to function) - this is a barrier to consuming the kolla-kubernetes effort directly.
> > >>
> > >> - A COE layer like kubernetes is a poor fit for deployments where operators require strict control of service placement (e.g. exactly which nodes a service runs on, IP address assignments to specific nodes etc) - this is already a strong requirement for TripleO users and we need to figure out if/how it's possible to control container placement per node/namespace.
> > >>
> > >> - There are several uncertainties regarding the HA architecture, such as how we achieve fencing for nodes (which is currently provided via pacemaker); in particular the HA model for real production deployments via kubernetes for stateful services such as rabbit/galera is unclear.
> > >>
> > >> Overall a session with much discussion, but further prototyping and discussion is required before we can define a definitive implementation strategy (several folks are offering to be involved in this).
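For illustration only (this is not something that was agreed in the session, and the label, pod and image names below are invented): the kind of strict per-node placement described above can be approximated in kubernetes by labelling nodes and combining a nodeSelector with host networking, roughly along these lines:

    # Hypothetical sketch: pin a galera pod to one specific, pre-labelled node
    # and reuse the host's network namespace so it keeps that node's IP.
    apiVersion: v1
    kind: Pod
    metadata:
      name: galera-controller-0
    spec:
      nodeSelector:
        tripleo-node: controller-0   # label previously applied to exactly one node
      hostNetwork: true              # keep the host's own IP/ports
      containers:
        - name: galera
          image: kolla/centos-binary-mariadb   # assumed Kolla-built image name

Even with placement pinned like this, the fencing and stateful-service HA questions (rabbit/galera) raised above remain open.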
> > >>
> > >> ---------------------------------------------
> > >> Work session (Composable Services and beyond)
> > >> ---------------------------------------------
> > >>
> > >> In this session we discussed the status of the currently in-progress work to decompose our monolithic manifests into per-service profiles[3] in puppet-tripleo, then consume these profiles via per-service templates in tripleo-heat-templates[4][5], and potential further work to enable fully composable (including user defined) roles.
> > >>
> > >> Overall there was agreement that the composable services work and puppet refactoring are going well, but that we need to improve velocity and get more reviewers helping to land the changes. There was also agreement that a sub-team should form temporarily to drive the remaining work[6], that we should not land any new features in the "old" template architecture, and relatedly that tripleo cores should help rebase and convert currently under-review changes to the new format where needed to ease the transition.
> > >>
> > >> I described a possible approach to providing fully composable roles that uses some template pre-processing (via jinja2)[7]; a blueprint and initial implementation will be posted soon, but overall the response was positive, and it may provide a workable path to fully composable roles that won't break upgrades of existing deployments.
> > >>
> > >> ---------------------------------
> > >> Work session (API and TripleO UI)
> > >> ---------------------------------
> > >>
> > >> In this session we discussed the current status of the TripleO UI, and the Mistral based API implementation it depends on.
> > >>
> > >> Overall it's clear there is a lot of good progress in this area, but there are some key areas which require focus and additional work to enable a fully functional upstream TripleO UI:
> > >>
> > >> - The undercloud requires some configuration changes to give the UI the necessary access to the undercloud services.
> > >>
> > >> - The UI currently depends on the previous prototype API implementation, and must be converted to the new Mistral based API (in-progress).
> > >>
> > >> - We need to improve velocity of the Mistral based implementation (need more testing and reviewing), such that we can land it and folks can start integrating with it.
> > >>
> > >> - There was agreement that the previously proposed validation API can be implemented as another Mistral action, which will provide a way to run validations related to the undercloud configuration/state.
> > >>
> > >> - There are some features we could add to Heat which would make the implementation cleaner (description/metadata in environment files, enabling multiple parameter groups).
> > >>
> > >> The session concluded with some discussion around the requirements related to network configuration. Currently the templates offer considerable flexibility in this regard, and we need to decide how this is surfaced via the API such that it's easily consumable via TripleO UX interfaces.
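To make the "heat parameter schema" point above a little more concrete: the templates declare typed, described (and optionally grouped) parameters, and that metadata is what the UI and the Mistral based API introspect. A minimal, hypothetical fragment (the parameter names here are made up, not real tripleo-heat-templates parameters):

    heat_template_version: 2016-04-08

    parameter_groups:
      - label: Example service settings
        description: Parameters a UI could render together as one form section
        parameters:
          - ExampleServiceCount
          - ExampleServicePassword

    parameters:
      ExampleServiceCount:
        type: number
        default: 1
        description: How many nodes should run the example service
        constraints:
          - range: {min: 1, max: 10}
      ExampleServicePassword:
        type: string
        description: Password for the example service admin user
        hidden: true

Ansible variables carry no equivalent typed schema, which is the gap called out in the containerization session above.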
> > >>
> > >> -----------------------------------
> > >> Work session (Reducing the CI pain)
> > >> -----------------------------------
> > >>
> > >> This session covered a few topics, but mostly ended up focused on the debate with infra regarding moving to 3rd party CI. There are arguments on both sides here, and I'll perhaps let derekh or dprince reply with a more detailed discussion of them, but suffice to say there wasn't a clear conclusion, and discussion is ongoing.
> > >>
> > > It was mostly me pushing for tripleo to move to 3rd party CI. I still think it is the right place for tripleo, however after hearing dprince's concerns I think we have a compromise for the moment. I've gone ahead and done the work to upgrade the tripleo-ci jenkins slave from Fedora-22 to the centos-7 DIB[1] produced by openstack-infra. Please take a moment to review the patch, as it exposed 3 issues.
> > >
> > > 1) CentOS 7 does not support nbd out of the box, and we can't compile a new kernel ATM. So, I've worked around the problem by converting the qcow2 image to raw format, updating instack, and reconverting it back to qcow2. Ideally, if I can find where the instack.qcow2 image is built, we could also produce a raw format so we don't have to do this every gate job.
> >
> > The conversion should be ok for the moment to allow us to make progress; longer term we'll probably need to change the libvirt domain definitions on the testenvs in order to be able to just generate and use a raw format.
> >
> > > 2) Jenkins slave needs more HDD space. Using centos-7 we cache data to the slave now, mostly packages and git repos. As a result the HDD starts at 7.5GB, and because the current slaves use 20GB we quickly run out of space. Ideally we need 80GB[2] of space to be consistent with the other cloud providers we run jenkins slaves on.
> >
> > This is where we'll likely hit the biggest problems. In order to bump the disk space allocated to the jenkins slaves and to simultaneously take advantage of the SSDs, we're going to have to look into using the SSDs as a cache for the spinning disks. I haven't done this before but I hope we can look into it soon.
>
> Looks like we just ran out of space again on the centos-7 DIB. 7GB for /opt/git, 10+GB for devstack-gate, and the rest is converting the image from qcow2 to raw and back.
Is there a diagram of how the cloud is deployed and its resources? I'm having trouble trying to figure out the setup of everything.

> > > 3) No AFS mirror in tripleo-ci[3]. To take advantage of the new centos-7 dib, openstack-infra has an AFS mirroring infrastructure in place. As a result, we'll also need to launch one in tripleo-ci. For the moment, I've disabled the logic to configure the mirror. Mirrors include pypi, npm, wheel, ubuntu trusty, ubuntu precise, ceph. We are bringing RPM mirrors online shortly.
> >
> > I'm not sure we'll get as much of a benefit from this as the devstack based jobs do, as some of the mirrors you mention wouldn't be used at all while others we would only make very light use of. Is it possible to selectively add mirrors to the AFS mirror, or add additional things that tripleo would be interested in, e.g. an image cache?
> >
> I think you'll actually benefit from this, mostly because you no longer have to run your own mirror / squid servers in tripleo. The way AFS mirrors work is more like a cache.
>
> Currently our AFS volumes in rax-dfw hold over 1TB of data, but since our jobs only access a small fraction of the data, most mirror AFS servers are only using about 5GB of data locally.
>
> In the case of tripleo, it will be even less since you are not running the full suite of jobs in your cloud.
>
> Right now, nothing would need to change to selectively use mirrors, because AFS will only cache what is used. As for adding things specific to tripleo, it could be possible; it is also likely other jobs will need the same bits too.
>
> I strongly encourage us to set up an AFS mirror.
> Any feedback here? I'd like to finish off this work if possible this week, but we seem to be in a holding pattern on this.
> > >
> > > I'd really like to get some feedback on these 3 issues. I know they might not be solved today because of the hardware move. However, I think we are pretty close now to getting tripleo-ci more in line with some of the openstack-infra tooling.
> > >
> > > [1] https://review.openstack.org/#/c/312725/
> > > [2] https://review.openstack.org/#/c/312992/
> > > [3] https://review.openstack.org/#/c/312058/
> > >
> > >> The other output from this session was agreement that we'd move our jobs to a different cloud (managed by the RDO community) ahead of a planned relocation of our current hardware. This has advantages in terms of maintenance overhead, and if it all goes well we can contribute our hardware to this cloud long term vs maintaining our own infrastructure.
> > >>
> > >> Overall it was an excellent week, and I thank all the session participants for their input and discussion. Further notes can be found in the etherpads linked from [1], but feel free to reply if specific items require clarification (and/or I've missed anything!)
> > >>
> > >> Thanks,
> > >>
> > >> Steve
> > >>
> > >> [1] https://wiki.openstack.org/wiki/Design_Summit/Newton/Etherpads#TripleO
> > >> [2] https://review.openstack.org/#/c/299628/
> > >> [3] https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests
> > >> [4] https://blueprints.launchpad.net/tripleo/+spec/composable-services-within-roles
> > >> [5] https://etherpad.openstack.org/p/tripleo-composable-roles-work
> > >> [6] http://lists.openstack.org/pipermail/openstack-dev/2016-April/093533.html
> > >> [7] http://paste.fedoraproject.org/360836/87416814/

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
