Thanks Jim. This makes a lot of sense and will hopefully make things simpler and more robust.
Just a few questions:

1. It looks like Zuul can request a specific set of nodes for a job. Do
   you envision the typical ansible playbook installing additional things
   required for the job, or would Zuul always need to request a suitable
   node for the job?

2. Would there be a way to share environment variables across multiple
   shell tasks? For example, would it be possible to reference a variable
   defined in the job yaml file from inside a shell script?

-Khai

On Thu, Feb 26, 2015 at 8:59 AM, James E. Blair <[email protected]> wrote:
> Hi,
>
> I've been wanting to make some structural changes to Zuul to round it
> out into a coherent system. I don't want to change it too much, but I'd
> also like a clean break with some of the baggage we've been carrying
> around from earlier decisions, and I want it to be able to continue to
> scale up (the config in particular is getting hard to manage with >500
> projects).
>
> I've batted a few ideas around with Monty, and I've written up my
> thoughts below. This is mostly a narrative exploration of what I think
> it should look like. It is not exhaustive, but I think it explores
> most of the major ideas. The next step is to turn this into a spec and
> start iterating on it and getting more detailed.
>
> I'm posting this here first for discussion to see if there are any
> major conceptual things that we should address before we get into more
> detailed spec review. Please let me know what you think.
>
> -Jim
>
> =======
> Goals
> =======
>
> Make Zuul scale to thousands of projects.
> Make Zuul more multi-tenant friendly.
> Make it easier to express complex scenarios in layout.
> Make nodepool more useful for non-virtual nodes.
> Make nodepool more efficient for multi-node tests.
> Remove the need for long-running slaves.
> Make it easier to use Zuul for continuous deployment.
>
> To accomplish this, changes to Zuul's configuration syntax are
> proposed, making it simpler to manage large numbers of jobs and
> projects, along with a new method of describing and running jobs, and
> a new system for node distribution with Nodepool.
>
> =====================
> Changes To Nodepool
> =====================
>
> Nodepool should be made to support explicit node requests and
> releases. That is to say, it should act more like its name -- a node
> pool.
>
> Rather than having servers add themselves to the pool by waiting for
> them (or Jenkins on their behalf) to register with gearman, nodepool
> should instead define functions to supply nodes on demand. For
> example, it might define the gearman functions "get-nodes" and
> "put-nodes". Zuul might request a node for a job by submitting a
> "get-nodes" job with the node type (eg "precise") as an argument. It
> could request two nodes together (in the same AZ) by supplying more
> than one node type in the same call. When complete, it could call
> "put-nodes" with the node identifiers to instruct nodepool to return
> them (nodepool might then delete, rebuild, etc).
>
> This model is much more efficient for multi-node tests, where we will
> no longer need special multinode labels. Instead, the multinode
> configuration can be much more ad-hoc and vary per job.
>
> The testenv broker used by tripleo behaves somewhat in this manner
> (though it only supports static sets of resources). It also has logic
> to deal with the situation where Zuul might exit unexpectedly and not
> return nodes (though it should strive to do so). This feature in the
> broker should be added to nodepool. Additionally, nodepool should
> support fully static resources (they should become just another node
> type) so that it can handle the use case of the test broker.
>
> =================
> Changes To Zuul
> =================
>
> Zuul is currently fundamentally a single-tenant application.
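[Editor's note: the get-nodes/put-nodes lease semantics proposed above can be sketched with a toy in-memory model. This is purely illustrative, not Nodepool code; the `NodePool` class, method names, and error handling are all hypothetical. Real Nodepool would expose these as gearman functions backed by cloud providers, and would need the crash-recovery logic the mail mentions.]

```python
import itertools
from collections import Counter


class NodePool:
    """Toy model of the proposed get-nodes/put-nodes lease protocol.

    A static dict of node counts stands in for the real pool of
    cloud (or fully static) resources.
    """

    def __init__(self, inventory):
        # inventory: {"precise": 2, "trusty": 1} -> nodes per label
        self._ids = itertools.count(1)
        self.available = {label: [next(self._ids) for _ in range(n)]
                          for label, n in inventory.items()}
        self.leased = {}  # node id -> label

    def get_nodes(self, labels):
        """Lease one node per requested label, all-or-nothing.

        A multi-node job asks for several labels in one call so the
        nodes can be allocated together (e.g. in the same AZ)."""
        need = Counter(labels)
        if any(len(self.available.get(label, [])) < n
               for label, n in need.items()):
            raise RuntimeError("insufficient capacity for %s" % (labels,))
        nodes = []
        for label in labels:
            node_id = self.available[label].pop()
            self.leased[node_id] = label
            nodes.append(node_id)
        return nodes

    def put_nodes(self, node_ids):
        """Return leased nodes; real nodepool would delete/rebuild them."""
        for node_id in node_ids:
            label = self.leased.pop(node_id)
            self.available[label].append(node_id)
```

With this model a multi-node devstack job would simply call `get_nodes(["trusty", "trusty"])` and `put_nodes(...)` when finished, with no special multinode label needed.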
> Some folks want to use it in a multi-tenant environment. Even within
> OpenStack, we have use for multitenancy. OpenStack might be one
> tenant, and each stackforge project might be another. Even if the big
> tent discussion renders that thinking obsolete, we may still want the
> kind of separation multi-tenancy can provide. The proposed
> implementation is flexible enough to run Zuul completely single-tenant
> with shared everything, completely multi-tenant with shared nothing,
> and everything in between. Being able to adjust just how much is
> shared or required, and how much can be left to individual projects,
> will be very useful.
>
> To support this, the main configuration should define tenants, and
> tenants should specify config files to include. These include files
> should define pipelines, jobs, and projects, all of which are
> namespaced to the tenant (so different tenants may have different jobs
> with the same names)::
>
>   ### main.yaml
>   - tenant:
>       name: openstack
>       include:
>         - global_config.yaml
>         - openstack.yaml
>
> Files may be included by more than one tenant, so common items can be
> placed in a common file and referenced globally. This means that for,
> eg, OpenStack, we can define pipelines and our base job definitions
> (with logging info, etc) once, and include them in all of our
> tenants::
>
>   ### main.yaml (continued)
>   - tenant:
>       name: openstack-infra
>       include:
>         - global_config.yaml
>         - infra.yaml
>
> A tenant may optionally specify repos from which it may derive its
> configuration. In this manner, a repo may keep its Zuul configuration
> within its own repo. This would only happen if the main configuration
> file specified that it is permitted::
>
>   ### main.yaml (continued)
>   - tenant:
>       name: random-stackforge-project
>       include:
>         - global_config.yaml
>       repos:
>         - stackforge/random  # Specific project config is in-repo
>
> Jobs defined in-repo may not have access to the full feature set
> (including some authorization features).
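[Editor's note: the per-tenant namespacing described above can be sketched as a small resolution step over already-parsed config. This is an assumption about how include resolution might work, not Zuul code; `resolve_tenants` and its data shapes are hypothetical.]

```python
def resolve_tenants(main_config, files):
    """Build per-tenant job tables from a main.yaml-style structure.

    main_config: list of {"tenant": {"name": ..., "include": [...]}}
    files: {filename: [config items]}, each item a dict such as
           {"job": {"name": ...}} (pipelines/projects elided here).
    Jobs are keyed by short name *within* each tenant, so two tenants
    may define different jobs with the same name without clashing.
    """
    tenants = {}
    for entry in main_config:
        tenant = entry["tenant"]
        jobs = {}
        for fname in tenant.get("include", []):
            for item in files[fname]:
                if "job" in item:
                    jobs[item["job"]["name"]] = item["job"]
        tenants[tenant["name"]] = {"jobs": jobs}
    return tenants
```

Because `global_config.yaml` is listed in every tenant's includes, each tenant sees the shared base jobs, while tenant-specific files stay invisible to other tenants.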
> They also may not override existing jobs.
>
> Job definitions continue to have the features in the current Zuul
> layout, but they also take on some of the responsibilities currently
> handled by the Jenkins (or other worker) definition::
>
>   ### global_config.yaml
>   # Every tenant in the system has access to these jobs (because their
>   # tenant definition includes it).
>   - job:
>       name: base
>       timeout: 30m
>       node: precise  # Just a variable for later use
>       nodes:  # The operative list of nodes
>         - name: controller
>           image: {node}  # Substitute the variable
>       auth:  # Auth may only be defined in central config, not in-repo
>         swift:
>           - container: logs
>       pre-run:  # These specify what to run before and after the job
>         - zuul-cloner
>       post-run:
>         - archive-logs
>
> Jobs have inheritance, and the above definition provides a base level
> of functionality for all jobs. It sets a default timeout, requests a
> single node (of type precise), and requests swift credentials to
> upload logs. Further jobs may extend and override these parameters::
>
>   ### global_config.yaml (continued)
>   # The python 2.7 unit test job
>   - job:
>       name: python27
>       parent: base
>       node: trusty
>
> Our use of job names specific to projects is a holdover from when we
> wanted long-lived slaves on Jenkins to efficiently re-use workspaces.
> This hasn't been necessary for a while, though we have used it to
> our advantage when collecting stats and reports. However, job
> configuration can be simplified greatly if we simply have a job that
> runs the python 2.7 unit tests and can be used for any project. To
> the degree that we want to know how often this job failed on nova, we
> can add that information back in when reporting statistics.
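[Editor's note: the inheritance described above, where `python27` gets `base`'s timeout but overrides its node, can be sketched as a parent-chain merge. The merge semantics here are a guess; a real implementation might merge some fields (e.g. append to pre-run/post-run lists) rather than replace them outright.]

```python
def resolve_job(name, jobs):
    """Flatten a job's parent chain into one effective definition.

    jobs: {name: definition-dict}; a definition may name a "parent".
    Values set on a child override the same key on its ancestors.
    """
    chain = []
    while name is not None:
        job = jobs[name]
        chain.append(job)
        name = job.get("parent")

    resolved = {}
    for job in reversed(chain):  # apply the base first, the leaf last
        resolved.update(job)
    resolved.pop("parent", None)  # bookkeeping key, not a job attribute
    return resolved
```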
> Jobs may have multiple aspects to accommodate differences among
> branches, etc.::
>
>   ### global_config.yaml (continued)
>   # Version that is run for changes on stable/icehouse
>   - job:
>       name: python27
>       parent: base
>       branch: stable/icehouse
>       node: precise
>
>   # Version that is run for changes on stable/juno
>   - job:
>       name: python27
>       parent: base
>       branch: stable/juno  # Could be combined into previous with regex
>       node: precise        # if concept of "best match" is defined
>
> Jobs may specify that they require more than one node::
>
>   ### global_config.yaml (continued)
>   - job:
>       name: devstack-multinode
>       parent: base
>       node: trusty  # could do same branch mapping as above
>       nodes:
>         - name: controller
>           image: {node}
>         - name: compute
>           image: {node}
>
> Jobs defined centrally (i.e., not in-repo) may specify auth info::
>
>   ### global_config.yaml (continued)
>   - job:
>       name: pypi-upload
>       parent: base
>       auth:
>         password:
>           pypi-password: pypi-password
>           # This looks up 'pypi-password' from an encrypted yaml file
>           # and adds it into variables for the job
>
> Pipeline definitions are similar to the current syntax, except that
> the new syntax supports specifying additional information for jobs in
> the context of a given project and pipeline. For instance, rather than
> specifying that a job is globally non-voting, you may specify that it
> is non-voting for a given project in a given pipeline::
>
>   ### openstack.yaml
>   - project:
>       name: openstack/nova
>       gate:
>         queue: integrated  # Shared queues are manually built
>         jobs:
>           - python27  # Runs version of job appropriate to branch
>           - devstack
>           - devstack-deprecated-feature:
>               branch: stable/juno  # Only run on stable/juno changes
>               voting: false        # Non-voting
>       post:
>         jobs:
>           - tarball:
>               jobs:
>                 - pypi-upload
>
> Currently, unique job names are used to build shared change queues.
> Since job names will no longer be unique, shared queues must be
> manually constructed by assigning them a name.
> Projects with the same queue name for the same pipeline will have a
> shared queue.
>
> A subset of functionality is available to projects that are permitted
> to use in-repo configuration::
>
>   ### stackforge/random/.zuul.yaml
>   - job:
>       name: random-job
>       parent: base  # From global config; gets us logs
>       node: precise
>
>   - project:
>       name: stackforge/random
>       gate:
>         jobs:
>           - python27    # From global config
>           - random-job  # From local config
>
> The executable content of jobs should be defined as ansible playbooks.
> Playbooks can be fairly simple and might consist of little more than
> "run this shell script" for those who are not otherwise interested in
> ansible::
>
>   ### stackforge/random/playbooks/random-job.yaml
>   ---
>   hosts: controller
>   tasks:
>     - shell: run_some_tests.sh
>
> Global jobs may define ansible roles for common functions::
>
>   ### openstack-infra/zuul-playbooks/python27.yaml
>   ---
>   hosts: controller
>   roles:
>     - tox:
>         env: py27
>
> Because ansible has well-articulated multi-node orchestration
> features, this permits very expressive job definitions for multi-node
> tests. A playbook can specify different roles to apply to the
> different nodes that the job requested::
>
>   ### openstack-infra/zuul-playbooks/devstack-multinode.yaml
>   ---
>   hosts: controller
>   roles:
>     - devstack
>   ---
>   hosts: compute
>   roles:
>     - devstack-compute
>
> Additionally, if a project is already defining ansible roles for its
> deployment, then those roles may be easily applied in testing, making
> CI even closer to CD. Finally, to make Zuul more useful for CD, Zuul
> may be configured to run a job (i.e., an ansible role) on a specific
> node.
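[Editor's note: the manually named shared queues described earlier reduce to a simple grouping step at configuration-load time. This sketch is hypothetical; names and data shapes are illustrative, and it assumes a project without an explicit queue name gets a private queue of its own.]

```python
def build_shared_queues(projects, pipeline):
    """Group projects into change queues by declared queue name.

    projects: list of project dicts such as
      {"name": "openstack/nova", "gate": {"queue": "integrated"}}
    Projects naming the same queue in the same pipeline share one
    queue, replacing the old job-name-overlap heuristic.
    """
    queues = {}
    for project in projects:
        pipeline_cfg = project.get(pipeline, {})
        # No explicit queue name -> a private queue named after the project.
        qname = pipeline_cfg.get("queue", project["name"])
        queues.setdefault(qname, []).append(project["name"])
    return queues
```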
>
> The pre- and post-run entries in the job definition might also apply
> to ansible playbooks and can be used to simplify job setup and
> cleanup::
>
>   ### openstack-infra/zuul-playbooks/zuul-cloner.yaml
>   ---
>   hosts: all
>   roles:
>     - zuul-cloner: {{zuul}}
>
> Where the zuul variable is a dictionary containing all the information
> currently transmitted in the ZUUL_* environment variables. Similarly,
> the log archiving script can copy logs from the host to swift.
>
> A new Zuul component would be created to execute jobs. Rather than
> running a worker process on each node (which requires installing
> software on the test node, establishing and maintaining network
> connectivity back to Zuul, and coordinating actions across nodes for
> multi-node tests), this new component will accept jobs from Zuul and,
> for each one, write an ansible inventory file with the node and
> variable information, then execute the ansible playbook for that job.
> This means that the new Zuul component will maintain ssh connections
> to all hosts currently running a job. This could become a bottleneck,
> but ansible and ssh have been known to scale to a large number of
> simultaneous hosts, and this component may be scaled horizontally. It
> should be simple enough that it could even be automatically scaled if
> needed. In turn, however, this does make node configuration simpler
> (test nodes need only have an ssh public key installed) and makes
> tests behave more like deployment.
>
> _______________________________________________
> OpenStack-Infra mailing list
> [email protected]
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
