Jim, great stuff. A couple suggestions inline :)

On 02/26/2015 09:59 AM, James E. Blair wrote:
A tenant may optionally specify repos from which it may derive its
configuration.  In this manner, a repo may keep its Zuul configuration
within its own repo.  This would only happen if the main configuration
file specified that it is permitted::

   ### main.yaml (continued)
   - tenant:
       name: random-stackforge-project
       include:
        - global_config.yaml
       repos:
        - stackforge/random  # Specific project config is in-repo

Might I suggest that, instead of a repos: YAML block, the include: YAML block allow URIs? So, to support some random Zuul config in a stackforge repo, you could do:

include:
 - global_config.yaml
 - https://git.openstack.org/stackforge/random/tools/zuul.yml

That would make the configuration simpler, I think.

Jobs defined in-repo may not have access to the full feature set
(including some authorization features).  They also may not override
existing jobs.

Job definitions continue to have the features in the current Zuul
layout, but they also take on some of the responsibilities currently
handled by the Jenkins (or other worker) definition::

   ### global_config.yaml
   # Every tenant in the system has access to these jobs (because their
   # tenant definition includes it).
   - job:
       name: base
       timeout: 30m
       node: precise   # Just a variable for later use
       nodes:  # The operative list of nodes
        - name: controller
          image: {node}  # Substitute the variable
       auth:  # Auth may only be defined in central config, not in-repo
        swift:
          - container: logs
       pre-run:  # These specify what to run before and after the job
        - zuul-cloner
       post-run:
        - archive-logs

++

Jobs have inheritance, and the above definition provides a base level
of functionality for all jobs.  It sets a default timeout, requests a
single node (of type precise), and requests swift credentials to
upload logs.  Further jobs may extend and override these parameters::

   ### global_config.yaml (continued)
   # The python 2.7 unit test job
   - job:
       name: python27
       parent: base
       node: trusty

Yes, this is great :)

Our use of job names specific to projects is a holdover from when we
wanted long-lived slaves on jenkins to efficiently re-use workspaces.
This hasn't been necessary for a while, though we have used this to
our advantage when collecting stats and reports.  However, job
configuration can be simplified greatly if we simply have a job that
runs the python 2.7 unit tests which can be used for any project.  To
the degree that we want to know how often this job failed on nova, we
can add that information back in when reporting statistics.  Jobs may
have multiple aspects to accommodate differences among branches, etc.::

   ### global_config.yaml (continued)
   # Version that is run for changes on stable/icehouse
   - job:
       name: python27
       parent: base
       branch: stable/icehouse
       node: precise

   # Version that is run for changes on stable/juno
   - job:
       name: python27
       parent: base
       branch: stable/juno  # Could be combined into previous with regex
       node: precise        # if concept of "best match" is defined

Jobs may specify that they require more than one node::

   ### global_config.yaml (continued)
   - job:
       name: devstack-multinode
       parent: base
       node: trusty  # could do same branch mapping as above
       nodes:
        - name: controller
          image: {node}
        - name: compute
          image: {node}

Jobs defined centrally (i.e., not in-repo) may specify auth info::

   ### global_config.yaml (continued)
   - job:
       name: pypi-upload
       parent: base
       auth:
        password:
          pypi-password: pypi-password
          # This looks up 'pypi-password' from an encrypted yaml file
          # and adds it into variables for the job

Pipeline definitions are similar to the current syntax, except that they
support specifying additional information for jobs in the context of
a given project and pipeline.  For instance, rather than specifying
that a job is globally non-voting, you may specify that it is
non-voting for a given project in a given pipeline::

   ### openstack.yaml
   - project:
       name: openstack/nova
       gate:
        queue: integrated  # Shared queues are manually built
        jobs:
          - python27  # Runs version of job appropriate to branch
          - devstack
          - devstack-deprecated-feature:
              branch: stable/juno  # Only run on stable/juno changes
              voting: false  # Non-voting
       post:
        jobs:
          - tarball:
              jobs:
                - pypi-upload

Currently unique job names are used to build shared change queues.
Since job names will no longer be unique, shared queues must be
manually constructed by assigning them a name.  Projects with the same
queue name for the same pipeline will have a shared queue.
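
As an illustration, two projects that name the same queue in the same
pipeline would gate together (the second project name here is purely
hypothetical)::

   ### openstack.yaml (continued)
   - project:
       name: openstack/nova
       gate:
        queue: integrated

   - project:
       name: openstack/neutron  # hypothetical second project
       gate:
        queue: integrated  # same queue name -> shared gate queue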

A subset of functionality is available to projects that are permitted to
use in-repo configuration::

   ### stackforge/random/.zuul.yaml
   - job:
       name: random-job
       parent: base      # From global config; gets us logs
       node: precise

   - project:
       name: stackforge/random
       gate:
        jobs:
          - python27    # From global config
          - random-job  # From local config

Again, here I would support URI-based job config directives. Why? Well, let's say that a project has a separate repository that contains job and test configuration files. You'd be able to set a URI here and continue to keep your job and test configurations separate from the code base...
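
As a sketch of what that could look like (the repo and file names here
are hypothetical, as is the exact include syntax)::

   ### stackforge/random/.zuul.yaml
   - include:
      - https://git.openstack.org/stackforge/random-config/zuul/jobs.yaml

That way the code repo carries only a pointer, and the job and test
definitions live in the separate config repo.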

The executable content of jobs should be defined as ansible playbooks.
Playbooks can be fairly simple and might consist of little more than
"run this shell script" for those who are not otherwise interested in
ansible::

   ### stackforge/random/playbooks/random-job.yaml
   ---
   hosts: controller
   tasks:
     - shell: run_some_tests.sh

Global jobs may define ansible roles for common functions::

   ### openstack-infra/zuul-playbooks/python27.yaml
   ---
   hosts: controller
   roles:
     - tox:
        env: py27

Because ansible has well-articulated multi-node orchestration
features, this permits very expressive job definitions for multi-node
tests.  A playbook can specify different roles to apply to the
different nodes that the job requested::

   ### openstack-infra/zuul-playbooks/devstack-multinode.yaml
   ---
   hosts: controller
   roles:
     - devstack
   ---
   hosts: compute
   roles:
     - devstack-compute

Additionally, if a project is already defining ansible roles for its
deployment, then those roles may be easily applied in testing, making
CI even closer to CD.  Finally, to make Zuul more useful for CD, Zuul
may be configured to run a job (i.e., an ansible role) on a specific node.

The pre- and post-run entries in the job definition might also apply
to ansible playbooks and can be used to simplify job setup and
cleanup::

   ### openstack-infra/zuul-playbooks/zuul-cloner.yaml
   ---
   hosts: all
   roles:
     - zuul-cloner: {{zuul}}

Where the zuul variable is a dictionary containing all the information
currently transmitted in the ZUUL_* environment variables.  Similarly,
the log archiving script can copy logs from the host to swift.

A new Zuul component would be created to execute jobs.  Rather than
running a worker process on each node (which requires installing
software on the test node, establishing and maintaining network
connectivity back to Zuul, and coordinating actions across
nodes for multi-node tests), this new component will accept jobs from
Zuul, and for each one, write an ansible inventory file with the node
and variable information, and then execute the ansible playbook for that
job.  This means that the new Zuul component will maintain ssh
connections to all hosts currently running a job.  This could become a
bottleneck, but ansible and ssh have been known to scale to a large
number of simultaneous hosts, and this component may be scaled
horizontally.  It should be simple enough that it could even be
automatically scaled if needed.  In turn, however, this does make node
configuration simpler (test nodes need only have an ssh public key
installed) and makes tests behave more like deployment.
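
For a multi-node job, the generated inventory might be little more than
this (host addresses invented for illustration)::

   ### inventory written by the new Zuul component
   controller ansible_ssh_host=192.0.2.10
   compute    ansible_ssh_host=192.0.2.11

after which the component runs ansible-playbook against that inventory
with the job's playbook.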

+100 on the Ansible-related suggested changes. :)

Thanks!
-jay

_______________________________________________
OpenStack-Infra mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
