Re: [openstack-dev] [TripleO] Is Swift a good choice of database for the TripleO API?

Jiri Tomasek Tue, 05 Jan 2016 09:13:09 -0800

On 12/23/2015 07:40 PM, Steven Hardy wrote:

On Wed, Dec 23, 2015 at 11:05:05AM -0600, Ben Nemec wrote:

On 12/23/2015 10:26 AM, Steven Hardy wrote:

On Wed, Dec 23, 2015 at 09:28:59AM -0600, Ben Nemec wrote:

On 12/23/2015 03:19 AM, Dougal Matthews wrote:


On 22 December 2015 at 17:59, Ben Nemec wrote:

     Can we just do git like I've been suggesting all along? ;-)

     More serious discussion inline. :-)

     On 12/22/2015 09:36 AM, Dougal Matthews wrote:
     > Hi all,
     >
     > This topic came up in the 2015-12-15 meeting[1], and again briefly
     > today. After working with the code that came out of the deployment
     > library spec[2] I had some concerns with how we are storing the
     > templates.
     >
     > Simply put, when we are dealing with 100+ files from
     > tripleo-heat-templates, how can we ensure consistency in Swift without
     > any atomicity or transactions?
     > I think this is best explained with a couple of examples.
     >
     >  - When we create a new deployment plan (upload all the templates to
     >    Swift) how do we handle the case where there is an error? For
     >    example, if we are uploading 10 files - what do we do if the 5th
     >    one fails for some reason? There is a patch to do a manual
     >    rollback[3], but I have concerns about doing this in Python. If
     >    Swift is completely inaccessible for a short period the rollback
     >    won't work either.
     >
     >  - When deploying to Heat, we need to download all the YAML files from
     >    Swift. This can take a couple of seconds. What happens if somebody
     >    starts to upload a new version of the plan in the middle? We could
     >    end up trying to deploy half old and half new files. We wouldn't
     >    have a consistent view of the database.
     >
     > We had a few suggestions in the meeting:
     >
     >  - Add a locking mechanism. I would be concerned about deadlocks or
     >    having to lock for the full duration of a deploy.

     There should be no need to lock the plan for the entire deploy.  It's
     not like we're re-reading the templates at the end of the deploy today.
      It's a one-shot read and then the plan could be unlocked, at least as
     far as I know.
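
A minimal sketch of the one-shot read described above. The in-memory
plan_files dict stands in for the plan's files in Swift, and a process-local
threading.Lock stands in for whatever locking mechanism the API would really
need; all names here are hypothetical and not taken from the TripleO code.

```python
import threading

# Hypothetical stand-in for the plan's files in Swift: name -> contents.
plan_files = {"overcloud.yaml": "v1", "environments/net.yaml": "v1"}
# Assumption: a real API would need a lock shared between processes.
plan_lock = threading.Lock()


def snapshot_plan():
    """Hold the lock only while copying the files, not for the whole deploy."""
    with plan_lock:
        return dict(plan_files)


def upload_new_version(new_files):
    """Writers take the same lock, so a reader never sees a half-updated plan."""
    with plan_lock:
        plan_files.clear()
        plan_files.update(new_files)


snapshot = snapshot_plan()
# The lock is released here; a later upload no longer affects this deploy.
upload_new_version({"overcloud.yaml": "v2", "environments/net.yaml": "v2"})
```

The deploy would then work from snapshot only, so the lock is held for the
duration of the copy rather than the duration of the deploy.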


Good point. That would be holding the lock for longer than we need.

     The only option where we wouldn't need locking at all is the
     read-copy-update model Clint mentions, which might be a valid option as
     well.  Whatever we do, there are going to be concurrency issues though.
      For example, what happens if two users try to make updates to the plan
     at the same time?  If you don't either merge the changes or disallow one
     of them completely then one user's changes might be lost.

     TBH, this is further convincing me that we should just make this git
     backed and let git handle the merging and conflict resolution (never
     mind the fact that it gets us a well-understood version control system
     for "free").  For updates that don't conflict with other changes, git
     can merge them automatically, but for merge conflicts you just return a
     rebase error to the user and make them resolve it.  I have a feeling
     this is the behavior we'll converge on eventually anyway, and rather
     than reimplement git, let's just use the real thing.


I'd be curious to hear more how you would go about doing this with git. I've
never automated git to this level, so I am concerned about what issues we
might hit.

TBH I haven't thought it through to that extent yet.  I'm mostly
suggesting it because it seems like a fit for the template storage
requirements - we know we want version control, we want to be able to
merge changes from multiple sources, and we want some way to handle
merge conflicts.  Git does all of this already.

That said, I'm not sure about everything here.  For example, how would
you expose merge conflicts to the user?  I don't know that I would want
to force a user to learn git in order to use TripleO (although that
would be the devops-y thing to do), but maybe just passing them back the
files with the merge conflict markers and having them resolve those
locally and retry the update would work.  I'm not sure how that would
map to the current version of the API though.  Do we provide any way to
pass templates back to the user?  I feel like that was kind of a one-way
street.

What part of the deployment API workflow could result in merge conflicts?

My understanding was that it's something like:

1. Take copy of reference templates tree
2. Introspect templates, expose required parameters so user can be
prompted for them
3. Create environment file(s) derived from the user input
4. Validate the combination of (1) and (3)
5. Deploy the templates+environments

On update, (1) would be "overwrite existing version of templates"

This update policy means you may have just blown away someone else's
work, unless you rebase on the plan's templates immediately before
updating (and even then there's a race if two people submit updates at
the same time).

What has been proposed to date is somewhat more limited in scope than what
you're hinting at (which I think is more of a collaborate-on-templates
requirement?)

https://github.com/openstack/tripleo-specs/blob/master/specs/mitaka/tripleo-overcloud-deployment-library.rst

Here, you would expect any template collaboration to happen outside of the
scope of the actual deployment workflow, so e.g step 1 above consumes
either a packaged version of tripleo-heat-templates (which we don't expect
to be routinely modified), or another location on the local filesystem
(such as a repository managed by e.g git, outside of the deployment
workflow).

The "plan" then takes a copy of the golden tree, prompts for additional
inputs, validates and deploys it.

You are right though, if we allow concurrent update of the plan, it's
possible that environments added to two versions of the plan would have to
be merged, which could mean either conflicts or validation errors (if two
operators select mutually exclusive configurations for example).

Possible example: Two operators are working on enabling separate
features in their cloud, and need to make configuration changes to the
plan to do so.  Let's say one decides they need to enable the Storage
network, while the other decides to enable the Tenant network.  The
first operator makes their changes, sends the update and thinks their
work is done.  The second operator, working from the same base set of
templates as the first, makes their changes and sends the update.  Using
the "overwrite" method of conflict resolution the first operator's
changes have just been silently destroyed with no indication to either
user that anything bad happened.
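
The scenario above can be sketched in a few lines. The first half reproduces
the silent loss under the "overwrite" policy. The second half shows one
possible way to surface the collision, a version check on update; that check
is an assumption for illustration and is not something proposed in this
thread. The Plan class and environment names are hypothetical.

```python
class StaleUpdate(Exception):
    """Raised when an update was based on an outdated plan version."""


class Plan:
    def __init__(self, environments):
        self.version = 1
        self.environments = set(environments)

    def overwrite(self, environments):
        # "Overwrite" policy: last writer wins, earlier changes vanish silently.
        self.environments = set(environments)
        self.version += 1

    def update_if_current(self, base_version, environments):
        # Hypothetical alternative: reject updates made against an old version.
        if base_version != self.version:
            raise StaleUpdate(
                f"plan is at v{self.version}, update was based on v{base_version}"
            )
        self.overwrite(environments)


# Both operators start from the same base set of environments.
plan = Plan({"base"})
base_envs = set(plan.environments)

plan.overwrite(base_envs | {"storage-network"})  # operator 1
plan.overwrite(base_envs | {"tenant-network"})   # operator 2
lost = "storage-network" not in plan.environments  # operator 1's change is gone

# With the version check, operator 2 is told about the collision instead.
checked = Plan({"base"})
checked.update_if_current(1, {"base", "storage-network"})
try:
    checked.update_if_current(1, {"base", "tenant-network"})
    rejected = False
except StaleUpdate:
    rejected = True
```

The check does not merge anything; it only turns a silent loss into an
explicit error that the second operator has to resolve.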

Ok, so separating the two requirements alluded to here may help improve
clarity:

1. Multiple users collaborating on the t-h-t tree as a whole.

2. Enabling multiple features via updates and avoiding mid-air-collisions

I think (2) may be the simpler problem to consider, particularly if a lock
of some sort is considered acceptable, e.g. we explicitly do not allow multiple
operators actively modifying the cloud concurrently.

That would also be consistent with the current heat behavior, e.g even if
you did allow multiple operators to concurrently change a plan, they cannot
concurrently update the overcloud via heat anyway (this will change
eventually with convergence).

(1) is a much harder problem, and I can't help thinking it'd be better
solved with existing tools (e.g document how to use git, gerrit, jenkins &
CI test your own t-h-t tree, potentially allowing for semi-automated
promotion of things between environments, a staging workflow).

I guess you could tell users "don't do that", but unless you have
exactly one person making updates to the templates there's going to be
the possibility of conflicts, and in the Swift case all it takes is two
people editing the same file, even in completely different areas, for
someone's changes to be lost.

Ok, good point, I think I'd been assuming more of a serialized workflow as
a given, so it's definitely something to consider, thanks for clarifying.

Steve

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

To add some information here and maybe (hopefully) clear things up a bit:
the current workflow does not manipulate the content of the templates and
environments. We only set metadata on certain templates/environments and
create a single temporary environment file:

1. Upload files (using git, this means providing a git URL), identify the
capabilities-map file (capabilities_map.yaml) and set its 'type'
metadata to 'capabilities-map'.
2. Based on the capabilities-map information, identify 'root-template'
(overcloud.yaml), 'root-environment'
(overcloud-resource-registry-puppet.yaml) and 'environment'
(environments/*.yaml), and store this information in those files' 'type'
metadata.
3. Let the user select from the optional environments ('type' is
'environment') based on the constraints defined in capabilities-map. Store
the information about selected environments in 'enabled' meta.
4. Generate a list of parameters by sending templates, root-environment
and _enabled_ optional environments to heat-validate (nested). Let the user
set values for those parameters and store the parameter values in the newly
created temporary environment's parameter_defaults block. Upload this
template to Swift and set its 'type' meta to 'temp-environment'.
5. Deploy - take everything from Swift, process templates (to resolve
the urls in get_file etc.) and merge environments in order: root
environment < enabled optional environments < temporary environment. Then
send this to Heat API's Stack Create.
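
The merge order in step 5 can be sketched roughly as follows. This is a
simplified illustration, not the actual implementation: it merges each
top-level section key by key with later environments winning, and Heat's own
merging rules may differ in detail. The resource type, parameter names and
values are made up for the example.

```python
def merge_environments(*environments):
    """Merge environment dicts in order; later ones override earlier ones.

    Simplified: each top-level section (e.g. parameter_defaults) is merged
    key by key. The input dicts are not modified.
    """
    merged = {}
    for env in environments:
        for section, values in env.items():
            merged.setdefault(section, {}).update(values)
    return merged


# Hypothetical contents, for illustration only.
root_env = {
    "resource_registry": {"OS::TripleO::Example": "puppet/example.yaml"},
    "parameter_defaults": {"ControllerCount": 1},
}
enabled_envs = [{"parameter_defaults": {"ExampleFeatureEnabled": True}}]
temp_env = {"parameter_defaults": {"ControllerCount": 3}}

# root environment < enabled optional environments < temporary environment
result = merge_environments(root_env, *enabled_envs, temp_env)
```

Because the temporary environment comes last, the values the user entered
take precedence over the defaults from the root and optional environments.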

So you can see that we don't really manipulate the template files; we
just add metadata and create a single temporary environment that holds
the parameter values, although this is not really necessary and can be
replaced by storing the parameter values in a DB and then sending them as
the 'parameters' param to Heat. I think that storing files in Git is a good
idea, as it is what we already have (t-h-t), but we probably need to use a
DB to store the metadata because the metadata is plan-specific, whereas
the Git repository is not (or is it meant to be? That would mean
creating a separate git repo for every deployment attempt).

To make sure that the Plan is in sync with the Git repo (t-h-t), we can
tie the Plan not just to a specific repository, but also to a specific
tag or commit. This way, if the user updates the templates repository
with changes he wants to use, he needs to create a new Plan and start
over the deployment process.

Correct me if I am wrong, but I think this approach resolves the
problems with merge conflicts. The Files and the Plan (Deployment) are
separate things: Files are stored in Git, and the Plan is stored in a DB,
holds the files' metadata and is tied to a Git repo commit/tag.
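
As a rough sketch of what such a Plan record could look like (the class
names, field names, URL and commit value below are all placeholders for
illustration, not an existing schema):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileMeta:
    type: str              # e.g. 'root-template', 'environment'
    enabled: bool = False  # only meaningful for optional environments


@dataclass
class Plan:
    repo_url: str  # where the templates live
    commit: str    # tag or commit the plan is pinned to
    files: dict = field(default_factory=dict)       # path -> FileMeta
    parameters: dict = field(default_factory=dict)  # user-supplied values

    def enabled_environments(self):
        """Paths of the optional environments the user has switched on."""
        return sorted(
            path for path, meta in self.files.items()
            if meta.type == "environment" and meta.enabled
        )


plan = Plan(
    repo_url="https://example.invalid/tripleo-heat-templates.git",  # placeholder
    commit="<tag-or-commit>",                                       # placeholder
    files={
        "overcloud.yaml": FileMeta("root-template"),
        "environments/a.yaml": FileMeta("environment", enabled=True),
        "environments/b.yaml": FileMeta("environment"),
    },
)
```

Nothing in the record holds template content: the files stay in Git at the
pinned commit, and only the plan-specific metadata and parameter values live
in the DB.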

Any changes to the templates themselves should be done in the Git repo,
and I am not convinced that we want to introduce anything like that in the
GUI/CLI deployment workflow since, as was agreed before, Git is the best
tool for doing/tracking such changes.


Jirka


