On Thu, May 19, 2016 at 03:50:15PM +0100, Derek Higgins wrote:
> On 18 May 2016 at 13:34, Paul Belanger <pabelan...@redhat.com> wrote:
> > On Wed, May 18, 2016 at 12:22:55PM +0100, Derek Higgins wrote:
> >> On 6 May 2016 at 14:18, Paul Belanger <pabelan...@redhat.com> wrote:
> >> > On Tue, May 03, 2016 at 05:34:55PM +0100, Steven Hardy wrote:
> >> >> Hi all,
> >> >>
> >> >> Some folks have requested a summary of our summit sessions, as has been
> >> >> provided for some other projects.
> >> >>
> >> >> I'll probably go into more detail on some of these topics either via
> >> >> subsequent more focussed threads and/or some blog posts, but what follows is
> >> >> an overview of our summit sessions[1] with notable actions or decisions
> >> >> highlighted. I'm including some of my own thoughts and conclusions; folks
> >> >> are welcome/encouraged to follow up with their own clarifications or
> >> >> different perspectives :)
> >> >>
> >> >> TripleO had a total of 5 sessions in Austin; I'll cover them one-by-one:
> >> >>
> >> >> -------------------------------------
> >> >> Upgrades - current status and roadmap
> >> >> -------------------------------------
> >> >>
> >> >> In this session we discussed the current state of upgrades - initial
> >> >> support for full major version upgrades has been implemented, but the
> >> >> implementation is monolithic, highly coupled to pacemaker, and inflexible
> >> >> with regard to third-party extraconfig changes.
> >> >>
> >> >> The main outcomes were that we will add support for more granular
> >> >> definition of the upgrade lifecycle to the new composable services format,
> >> >> and that we will explore moving towards the proposed lightweight HA
> >> >> architecture to reduce the need for so much pacemaker specific logic.
> >> >>
> >> >> We also agreed that investigating use of mistral to drive upgrade workflows
> >> >> was a good idea - currently we have a mixture of scripts combined with Heat
> >> >> to drive the upgrade process, and some refactoring into discrete mistral
> >> >> workflows may provide a more maintainable solution. Potential for using
> >> >> the existing SoftwareDeployment approach directly via mistral (outside of
> >> >> the heat templates) was also discussed as something to be further
> >> >> investigated and prototyped.
> >> >>
> >> >> We also touched on the CI implications of upgrades - we've got an upgrades
> >> >> job now, but we need to ensure coverage of full release-to-release upgrades
> >> >> (not just commit to commit).
> >> >>
> >> >> -------------------------------
> >> >> Containerization status/roadmap
> >> >> -------------------------------
> >> >>
> >> >> In this session we discussed the current status of containers in TripleO
> >> >> (which is to say, the container based compute node which deploys containers
> >> >> via Heat onto an Atomic host node that is also deployed via Heat), and
> >> >> what strategy is most appropriate to achieve a fully containerized TripleO
> >> >> deployment.
> >> >>
> >> >> Several folks from Kolla participated in the session, and there was
> >> >> significant focus on where work may happen such that further collaboration
> >> >> between communities is possible.
> >> >> To some extent this discussion on where
> >> >> (as opposed to how) proved a distraction and prevented much discussion on
> >> >> supportable architectural implementation for TripleO, thus what follows is
> >> >> mostly my perspective on the issues that exist:
> >> >>
> >> >> Significant uncertainty exists wrt integration between Kolla and TripleO -
> >> >> there's largely consensus that we want to consume the container images
> >> >> defined by the Kolla community, but much less agreement that we can
> >> >> feasibly switch to the ansible-orchestrated deployment/config flow
> >> >> supported by Kolla without breaking many of our primary operator interfaces
> >> >> in a fundamentally unacceptable way, for example:
> >> >>
> >> >> - The Mistral based API is being implemented on the expectation that the
> >> >>   primary interface to TripleO deployments is a parameters schema exposed
> >> >>   by a series of Heat templates - this is no longer true in a "split stack"
> >> >>   model where we have to hand off to an alternate service orchestration
> >> >>   tool.
> >> >>
> >> >> - The tripleo-ui (based on the Mistral based API) consumes heat parameter
> >> >>   schema to build its UI, and Ansible doesn't support the necessary
> >> >>   parameter schema definition (such as types and descriptions) to enable
> >> >>   this pattern to be replicated. Ansible also doesn't provide an HTTP API,
> >> >>   so we'd still have to maintain an API surface for the (non python) UI to
> >> >>   consume.
> >> >>
> >> >> We also discussed ideas around integration with kubernetes (a hot topic on
> >> >> the Kolla track this summit), but again this proved inconclusive beyond
> >> >> that yes someone should try developing a PoC to stimulate further
> >> >> discussion.
> >> >> Again, significant challenges exist:
> >> >>
> >> >> - We still need to maintain the Heat parameter interfaces for the API/UI,
> >> >>   and there is also a strong preference to maintain puppet as a tool for
> >> >>   generating service configuration (so that existing operator integrations
> >> >>   via puppet continue to function) - this is a barrier to directly
> >> >>   consuming the kolla-kubernetes effort.
> >> >>
> >> >> - A COE layer like kubernetes is a poor fit for deployments where operators
> >> >>   require strict control of service placement (e.g. exactly which nodes
> >> >>   a service runs on, IP address assignments to specific nodes etc) - this is
> >> >>   already a strong requirement for TripleO users and we need to figure out
> >> >>   if/how it's possible to control container placement per node/namespace.
> >> >>
> >> >> - There are several uncertainties regarding the HA architecture, such as
> >> >>   how we achieve fencing for nodes (which is currently provided via
> >> >>   pacemaker); in particular, the HA model for real production deployments
> >> >>   via kubernetes for stateful services such as rabbit/galera is unclear.
> >> >>
> >> >> Overall a session with much discussion, but further prototyping and
> >> >> discussion is required before we can define a definitive implementation
> >> >> strategy (several folks are offering to be involved in this).
> >> >>
> >> >> ---------------------------------------------
> >> >> Work session (Composable Services and beyond)
> >> >> ---------------------------------------------
> >> >>
> >> >> In this session we discussed the status of the currently in-progress work
> >> >> to decompose our monolithic manifests into per-service profiles[3] in
> >> >> puppet-tripleo, then consume these profiles via per-service templates in
> >> >> tripleo-heat-templates[4][5], and potential further work to enable fully
> >> >> composable (including user defined) roles.
> >> >>
> >> >> Overall there was agreement that the composable services work and puppet
> >> >> refactoring are going well, but that we need to improve velocity and get
> >> >> more reviewers helping to land the changes. There was also agreement that
> >> >> a sub-team should form temporarily to drive the remaining work[6], that
> >> >> we should not land any new features in the "old" template architecture and
> >> >> relatedly that tripleo cores should help rebase and convert currently
> >> >> under-review changes to the new format where needed to ease the transition.
> >> >>
> >> >> I described a possible approach to providing fully composable roles that
> >> >> uses some template pre-processing (via jinja2)[7]; a blueprint and initial
> >> >> implementation will be posted soon, but overall the response was positive,
> >> >> and it may provide a workable path to fully composable roles that won't
> >> >> break upgrades of existing deployments.
> >> >>
> >> >> ---------------------------------
> >> >> Work session (API and TripleO UI)
> >> >> ---------------------------------
> >> >>
> >> >> In this session we discussed the current status of the TripleO UI, and the
> >> >> Mistral based API implementation it depends on.
> >> >>
> >> >> Overall it's clear there is a lot of good progress in this area, but there
> >> >> are some key areas which require focus and additional work to enable a
> >> >> fully functional upstream TripleO UI:
> >> >>
> >> >> - The undercloud requires some configuration changes to give the UI the
> >> >>   necessary access to the undercloud services
> >> >>
> >> >> - The UI currently depends on the previous prototype API implementation,
> >> >>   and must be converted to the new Mistral based API (in-progress)
> >> >>
> >> >> - We need to improve velocity of the Mistral based implementation (need
> >> >>   more testing and reviewing), such that we can land it and folks can start
> >> >>   integrating with it.
> >> >>
> >> >> - There was agreement that the previously proposed validation API can be
> >> >>   implemented as another Mistral action, which will provide a way to run
> >> >>   validation related to the undercloud configuration/state.
> >> >>
> >> >> - There are some features we could add to Heat which would make
> >> >>   implementation cleaner (description/metadata in environment files, enable
> >> >>   multiple parameter groups).
> >> >>
> >> >> The session concluded with some discussion around the requirements related
> >> >> to network configuration. Currently the templates offer considerable
> >> >> flexibility in this regard, and we need to decide how this is surfaced via
> >> >> the API such that it's easily consumable via TripleO Ux interfaces.
> >> >>
> >> >> -----------------------------------
> >> >> Work session (Reducing the CI pain)
> >> >> -----------------------------------
> >> >>
> >> >> This session covered a few topics, but mostly ended up focussed on the
> >> >> debate with infra regarding moving to 3rd party CI.
> >> >> There are arguments on
> >> >> both sides here, and I'll perhaps let derekh or dprince reply with a more
> >> >> detailed discussion of them, but suffice to say there wasn't a clear
> >> >> conclusion, and discussion is ongoing.
> >> >>
> >> > It was mostly me pushing for tripleo to move to 3rd party CI. I still think it
> >> > is the right place for tripleo; however, after hearing dprince's concerns I think
> >> > we have a compromise for the moment. I've gone ahead and done the work to
> >> > upgrade the tripleo-ci jenkins slave from Fedora-22 to the centos-7 DIB[1] produced by
> >> > openstack-infra. Please take a moment to review the patch as it exposed 3
> >> > issues.
> >> >
> >> > 1) CentOS 7 does not support nbd out of the box, and we can't compile a new
> >> > kernel ATM. So, I've worked around the problem by converting the qcow2 image to
> >> > raw format, updating instack and converting it back to qcow2. Ideally, if I can
> >> > find where the instack.qcow2 image is built, we also produce a raw format so we
> >> > don't have to do this every gate job.
> >>
> >> The conversion should be ok for the moment to allow us to make
> >> progress; longer term we'll probably need to change the libvirt domain
> >> definitions on the testenvs in order to be able to just generate and use
> >> a raw format.
> >>
> >> >
> >> > 2) Jenkins slave needs more HDD space. Using centos-7 we cache data to the slave
> >> > now, mostly packages and git repos. As a result the HDD starts at 7.5GB and
> >> > because the current slaves use 20GB we quickly run out of space. Ideally we
> >> > need 80GB[2] of space to be consistent with the other cloud providers we run
> >> > jenkins slaves on.
> >>
> >> This is where we'll likely hit the biggest problems. In order to bump
> >> the disk space allocated to the jenkins slaves and to simultaneously
> >> take advantage of the SSDs we're going to have to look into using the
> >> SSDs as a cache for the spinning disks. I haven't done this before but
> >> I hope we can look into it soon.
> >>
> >> >
> >> > 3) No AFS mirror in tripleo-ci[3]. To take advantage of the new centos-7 dib,
> >> > openstack-infra has an AFS mirroring infrastructure in place. As a result,
> >> > we'll also need to launch one in tripleo-ci. For the moment, I've disabled the
> >> > logic to configure the mirror. Mirrors include pypi, npm, wheel, ubuntu trusty,
> >> > ubuntu precise, ceph. We are bringing RPM mirrors online shortly.
> >>
> >> I'm not sure we'll get as much of a benefit from this as the devstack
> >> based jobs do, as some of the mirrors you mention wouldn't be used
> >> at all while others we would only make very light use of. Is it
> >> possible to selectively add mirrors to the AFS mirror, or add
> >> additional things that tripleo would be interested in? e.g. image
> >> cache
> >>
> > I think you'll actually benefit from this, mostly because you no longer have to
> > run your own mirror / squid servers in tripleo. The way AFS mirrors work is
> > more like a cache.
> >
> > Currently our AFS volumes in rax-dfw hold over 1TB of data, but since our
> > jobs only access a small fraction of the data, most mirror AFS servers are only
> > using about 5GB of data locally.
> >
> > In the case of tripleo, it will be even less since you are not running the full
> > suite of jobs in your cloud.
> >
> > Right now, nothing would need to change to selectively use mirrors, because
> > AFS will only cache what is used. As for adding things specific to tripleo, it
> > could be possible; it is also likely that other jobs will need the same bits
> > too.
> >
> > I strongly encourage us to set up an AFS mirror.
>
> Ok, I'm still a little skeptical because our biggest bandwidth hogs
> aren't mentioned in the list of things mirrored, but that's not a good
> reason to get in your way. If it proves to be a help then great, if
> not at least we tried, so what do you need from me to try it out? If I
> create a d1.medium trusty instance with a floating IP, will that work
> for you? This should allow you to test things for now; longer term
> we're going to have the same problems we do with larger jenkins
> instances, so until we solve this we won't be able to consider this a
> permanent part of the infrastructure.
>
I just need to know the flavor we are using. I'll be using our
openstack-infra/system-config launch-node script to provision the server, since
we need to loop it into our ansible / puppet wheel.
If you are okay with d1.medium for now, I can start it.
> >
> >> >
> >> > I'd really like to get some feedback on these 3 issues. I know they might not be
> >> > solved today because of the hardware move. However, I think we are pretty close
> >> > now to getting tripleo-ci more in line with some of the openstack-infra
> >> > tooling.
> >> >
> >> > [1] https://review.openstack.org/#/c/312725/
> >> > [2] https://review.openstack.org/#/c/312992/
> >> > [3] https://review.openstack.org/#/c/312058/
> >> >
> >> >> The other output from this session was agreement that we'd move our jobs to
> >> >> a different cloud (managed by the RDO community) ahead of a planned
> >> >> relocation of our current hardware. This has advantages in terms of
> >> >> maintenance overhead, and if it all goes well we can contribute our
> >> >> hardware to this cloud long term vs maintaining our own infrastructure.
> >> >>
> >> >>
> >> >> Overall it was an excellent week, and I thank all the session participants
> >> >> for their input and discussion. Further notes can be found in the
> >> >> etherpads linked from [1] but feel free to reply if specific items require
> >> >> clarification (and/or I've missed anything!)
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Steve
> >> >>
> >> >> [1] https://wiki.openstack.org/wiki/Design_Summit/Newton/Etherpads#TripleO
> >> >> [2] https://review.openstack.org/#/c/299628/
> >> >> [3] https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests
> >> >> [4] https://blueprints.launchpad.net/tripleo/+spec/composable-services-within-roles
> >> >> [5] https://etherpad.openstack.org/p/tripleo-composable-roles-work
> >> >> [6] http://lists.openstack.org/pipermail/openstack-dev/2016-April/093533.html
> >> >> [7] http://paste.fedoraproject.org/360836/87416814/
> >> >>
> >> >> __________________________________________________________________________
> >> >> OpenStack Development Mailing List (not for usage questions)
> >> >> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev