On 02/26/2015 05:41 PM, Zaro wrote:
> Thanks Jim. This makes a lot of sense and will hopefully make things
> simpler and more robust.
>
> Just a few questions:
I am not Jim - but I'm going to answer anyway ...

> 1. It looks like zuul can request a specific set of nodes for a job. Do
> you envision the typical ansible playbook installing additional things
> required for the jobs? Or would zuul always need to request a suitable
> node for the job?

I think we're leaning towards fewer types of images - so I'd expect
playbooks to specialize an image as step one. This is, of course, similar
to what many jobs do today already, with installation steps before
revoke-sudo is run.

> 2. Would there be a way to share environment variables across multiple
> shell tasks? For example, would it be possible to reference a variable
> defined in the job yaml file from inside of a shell script?

Yes - although it might not be specifically environment variables. Sharing
variables from one task to another, or taking the output variables from one
task and referencing them as part of a subsequent task, is well supported
(I can even show you examples of doing this in the recent launch_node work
if you wanna see what it looks like).
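To give a rough idea of what I mean - the variable names here are made up
for illustration, and the script is just the one from Jim's example below,
not anything from the actual launch_node playbooks - it might look
something like this::

    ---
    - hosts: controller
      vars:
        # A variable that would come from the job's yaml definition;
        # the name is invented for this sketch.
        tox_env: py27
      tasks:
        # A job-level variable can be exported into a shell task's
        # environment, so the script can read it as $TOX_ENV ...
        - name: Run the tests
          shell: run_some_tests.sh
          environment:
            TOX_ENV: "{{ tox_env }}"
          register: test_run

        # ... and the registered output of one task (rc, stdout, etc.)
        # can be referenced by any later task.
        - name: Record the result
          shell: echo "tests exited {{ test_run.rc }}" >> /tmp/summary.txt

None of the names above are settled anywhere; it's just to show the shape
of the thing.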
> -Khai
>
>
> On Thu, Feb 26, 2015 at 8:59 AM, James E. Blair <[email protected]> wrote:
>
>> Hi,
>>
>> I've been wanting to make some structural changes to Zuul to round it
>> out into a coherent system. I don't want to change it too much, but I'd
>> also like a clean break with some of the baggage we've been carrying
>> around from earlier decisions, and I want it to be able to continue to
>> scale up (the config in particular is getting hard to manage with >500
>> projects).
>>
>> I've batted a few ideas around with Monty, and I've written up my
>> thoughts below. This is mostly a narrative exploration of what I think
>> it should look like. This is not exhaustive, but I think it explores
>> most of the major ideas. The next step is to turn this into a spec and
>> start iterating on it and getting more detailed.
>>
>> I'm posting this here first for discussion to see if there are any
>> major conceptual things that we should address before we get into more
>> detailed spec review. Please let me know what you think.
>>
>> -Jim
>>
>> =======
>> Goals
>> =======
>>
>> Make Zuul scale to thousands of projects.
>> Make Zuul more multi-tenant friendly.
>> Make it easier to express complex scenarios in layout.
>> Make nodepool more useful for non-virtual nodes.
>> Make nodepool more efficient for multi-node tests.
>> Remove the need for long-running slaves.
>> Make it easier to use Zuul for continuous deployment.
>>
>> To accomplish this, changes to Zuul's configuration syntax are
>> proposed, making it simpler to manage large numbers of jobs and
>> projects, along with a new method of describing and running jobs, and
>> a new system for node distribution with Nodepool.
>>
>> =====================
>> Changes To Nodepool
>> =====================
>>
>> Nodepool should be made to support explicit node requests and
>> releases. That is to say, it should act more like its name -- a node
>> pool.
>>
>> Rather than having servers add themselves to the pool by waiting for
>> them (or Jenkins on their behalf) to register with gearman, nodepool
>> should instead define functions to supply nodes on demand. For
>> example, it might define the gearman functions "get-nodes" and
>> "put-nodes". Zuul might request a node for a job by submitting a
>> "get-nodes" job with the node type (e.g. "precise") as an argument. It
>> could request two nodes together (in the same AZ) by supplying more
>> than one node type in the same call. When complete, it could call
>> "put-nodes" with the node identifiers to instruct nodepool to return
>> them (nodepool might then delete, rebuild, etc.).
>>
>> This model is much more efficient for multi-node tests, where we will
>> no longer need to have special multinode labels. Instead the
>> multinode configuration can be much more ad-hoc and vary per job.
>>
>> The testenv broker used by tripleo behaves somewhat in this manner
>> (though it only supports static sets of resources). It also has logic
>> to deal with the situation where Zuul might exit unexpectedly and not
>> return nodes (though it should strive to do so). This feature in the
>> broker should be added to nodepool. Additionally, nodepool should
>> support fully static resources (they should become just another node
>> type) so that it can handle the use case of the test broker.
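(To make the get-nodes/put-nodes idea a bit more concrete - this is purely
my own sketch, the proposal above doesn't pin down the argument or response
format at all - a two-node request and its eventual release might carry
payloads roughly like:

    # Hypothetical argument to a "get-nodes" gearman job: two nodes of
    # the same image type, allocated together (same AZ).
    node-types:
      - trusty
      - trusty

    # Hypothetical response from nodepool: opaque identifiers plus the
    # addresses Zuul needs to build its ansible inventory. The same
    # identifiers would later be handed back via "put-nodes".
    nodes:
      - id: 1234
        address: 10.0.0.5
      - id: 1235
        address: 10.0.0.6

Nothing in there is settled; it's just the shape of the conversation
between Zuul and nodepool.)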
>> =================
>> Changes To Zuul
>> =================
>>
>> Zuul is currently fundamentally a single-tenant application. Some
>> folks want to use it in a multi-tenant environment. Even within
>> OpenStack, we have use for multitenancy. OpenStack might be one
>> tenant, and each stackforge project might be another. Even if the big
>> tent discussion renders that thinking obsolete, we may still want the
>> kind of separation multi-tenancy can provide. The proposed
>> implementation is flexible enough to run Zuul completely single tenant
>> with shared everything, completely multi-tenant with shared nothing, and
>> everything in-between. Being able to adjust just how much is shared or
>> required, and how much can be left to individual projects, will be very
>> useful.
>>
>> To support this, the main configuration should define tenants, and
>> tenants should specify config files to include. These include files
>> should define pipelines, jobs, and projects, all of which are
>> namespaced to the tenant (so different tenants may have different jobs
>> with the same names)::
>>
>>   ### main.yaml
>>   - tenant:
>>       name: openstack
>>       include:
>>         - global_config.yaml
>>         - openstack.yaml
>>
>> Files may be included by more than one tenant, so common items can be
>> placed in a common file and referenced globally. This means that for,
>> e.g., OpenStack, we can define pipelines and our base job definitions
>> (with logging info, etc.) once, and include them in all of our tenants::
>>
>>   ### main.yaml (continued)
>>   - tenant:
>>       name: openstack-infra
>>       include:
>>         - global_config.yaml
>>         - infra.yaml
>>
>> A tenant may optionally specify repos from which it may derive its
>> configuration. In this manner, a repo may keep its Zuul configuration
>> within its own repo. This would only happen if the main configuration
>> file specified that it is permitted::
>>
>>   ### main.yaml (continued)
>>   - tenant:
>>       name: random-stackforge-project
>>       include:
>>         - global_config.yaml
>>       repos:
>>         - stackforge/random  # Specific project config is in-repo
>>
>> Jobs defined in-repo may not have access to the full feature set
>> (including some authorization features). They also may not override
>> existing jobs.
>>
>> Job definitions continue to have the features in the current Zuul
>> layout, but they also take on some of the responsibilities currently
>> handled by the Jenkins (or other worker) definition::
>>
>>   ### global_config.yaml
>>   # Every tenant in the system has access to these jobs (because their
>>   # tenant definition includes it).
>>   - job:
>>       name: base
>>       timeout: 30m
>>       node: precise  # Just a variable for later use
>>       nodes:  # The operative list of nodes
>>         - name: controller
>>           image: {node}  # Substitute the variable
>>       auth:  # Auth may only be defined in central config, not in-repo
>>         swift:
>>           - container: logs
>>       pre-run:  # These specify what to run before and after the job
>>         - zuul-cloner
>>       post-run:
>>         - archive-logs
>>
>> Jobs have inheritance, and the above definition provides a base level
>> of functionality for all jobs. It sets a default timeout, requests a
>> single node (of type precise), and requests swift credentials to
>> upload logs. Further jobs may extend and override these parameters::
>>
>>   ### global_config.yaml (continued)
>>   # The python 2.7 unit test job
>>   - job:
>>       name: python27
>>       parent: base
>>       node: trusty
>>
>> Our use of job names specific to projects is a holdover from when we
>> wanted long-lived slaves on jenkins to efficiently re-use workspaces.
>> This hasn't been necessary for a while, though we have used this to
>> our advantage when collecting stats and reports. However, job
>> configuration can be simplified greatly if we simply have a job that
>> runs the python 2.7 unit tests which can be used for any project. To
>> the degree that we want to know how often this job failed on nova, we
>> can add that information back in when reporting statistics. Jobs may
>> have multiple aspects to accommodate differences among branches, etc.::
>>
>>   ### global_config.yaml (continued)
>>   # Version that is run for changes on stable/icehouse
>>   - job:
>>       name: python27
>>       parent: base
>>       branch: stable/icehouse
>>       node: precise
>>
>>   # Version that is run for changes on stable/juno
>>   - job:
>>       name: python27
>>       parent: base
>>       branch: stable/juno  # Could be combined into previous with regex
>>       node: precise        # if concept of "best match" is defined
>>
>> Jobs may specify that they require more than one node::
>>
>>   ### global_config.yaml (continued)
>>   - job:
>>       name: devstack-multinode
>>       parent: base
>>       node: trusty  # could do same branch mapping as above
>>       nodes:
>>         - name: controller
>>           image: {node}
>>         - name: compute
>>           image: {node}
>>
>> Jobs defined centrally (i.e., not in-repo) may specify auth info::
>>
>>   ### global_config.yaml (continued)
>>   - job:
>>       name: pypi-upload
>>       parent: base
>>       auth:
>>         password:
>>           pypi-password: pypi-password
>>           # This looks up 'pypi-password' from an encrypted yaml file
>>           # and adds it into variables for the job
>>
>> Pipeline definitions are similar to the current syntax, except that they
>> support specifying additional information for jobs in the context of
>> a given project and pipeline. For instance, rather than specifying
>> that a job is globally non-voting, you may specify that it is
>> non-voting for a given project in a given pipeline::
>>
>>   ### openstack.yaml
>>   - project:
>>       name: openstack/nova
>>       gate:
>>         queue: integrated  # Shared queues are manually built
>>         jobs:
>>           - python27  # Runs version of job appropriate to branch
>>           - devstack
>>           - devstack-deprecated-feature:
>>               branch: stable/juno  # Only run on stable/juno changes
>>               voting: false  # Non-voting
>>       post:
>>         jobs:
>>           - tarball:
>>               jobs:
>>                 - pypi-upload
>>
>> Currently unique job names are used to build shared change queues.
>> Since job names will no longer be unique, shared queues must be
>> manually constructed by assigning them a name. Projects with the same
>> queue name for the same pipeline will have a shared queue.
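(Concretely - and this is just me illustrating the shared-queue point, the
project below isn't from the proposal - a second project would land in the
same gate queue as nova simply by naming the same queue:

    ### openstack.yaml (continued)
    # Hypothetical second project; because it also says "queue: integrated"
    # in the gate pipeline, its changes share a queue with openstack/nova.
    - project:
        name: openstack/cinder
        gate:
          queue: integrated
          jobs:
            - python27

Using a different queue name, or none at all, would mean an independent
queue.)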
>> A subset of functionality is available to projects that are permitted to
>> use in-repo configuration::
>>
>>   ### stackforge/random/.zuul.yaml
>>   - job:
>>       name: random-job
>>       parent: base   # From global config; gets us logs
>>       node: precise
>>
>>   - project:
>>       name: stackforge/random
>>       gate:
>>         jobs:
>>           - python27    # From global config
>>           - random-job  # From local config
>>
>> The executable content of jobs should be defined as ansible playbooks.
>> Playbooks can be fairly simple and might consist of little more than
>> "run this shell script" for those who are not otherwise interested in
>> ansible::
>>
>>   ### stackforge/random/playbooks/random-job.yaml
>>   ---
>>   hosts: controller
>>   tasks:
>>     - shell: run_some_tests.sh
>>
>> Global jobs may define ansible roles for common functions::
>>
>>   ### openstack-infra/zuul-playbooks/python27.yaml
>>   ---
>>   hosts: controller
>>   roles:
>>     - tox:
>>         env: py27
>>
>> Because ansible has well-articulated multi-node orchestration
>> features, this permits very expressive job definitions for multi-node
>> tests. A playbook can specify different roles to apply to the
>> different nodes that the job requested::
>>
>>   ### openstack-infra/zuul-playbooks/devstack-multinode.yaml
>>   ---
>>   hosts: controller
>>   roles:
>>     - devstack
>>   ---
>>   hosts: compute
>>   roles:
>>     - devstack-compute
>>
>> Additionally, if a project is already defining ansible roles for its
>> deployment, then those roles may be easily applied in testing, making
>> CI even closer to CD. Finally, to make Zuul more useful for CD, Zuul
>> may be configured to run a job (i.e., an ansible role) on a specific
>> node.
>>
>> The pre- and post-run entries in the job definition might also apply
>> to ansible playbooks and can be used to simplify job setup and
>> cleanup::
>>
>>   ### openstack-infra/zuul-playbooks/zuul-cloner.yaml
>>   ---
>>   hosts: all
>>   roles:
>>     - zuul-cloner: {{zuul}}
>>
>> Where the zuul variable is a dictionary containing all the information
>> currently transmitted in the ZUUL_* environment variables. Similarly,
>> the log archiving script can copy logs from the host to swift.
>>
>> A new Zuul component would be created to execute jobs. Rather than
>> running a worker process on each node (which requires installing
>> software on the test node, and establishing and maintaining network
>> connectivity back to Zuul, and the ability to coordinate actions across
>> nodes for multi-node tests), this new component will accept jobs from
>> Zuul, and for each one, write an ansible inventory file with the node
>> and variable information, and then execute the ansible playbook for that
>> job. This means that the new Zuul component will maintain ssh
>> connections to all hosts currently running a job. This could become a
>> bottleneck, but ansible and ssh have been known to scale to a large
>> number of simultaneous hosts, and this component may be scaled
>> horizontally. It should be simple enough that it could even be
>> automatically scaled if needed. In turn, however, this does make node
>> configuration simpler (test nodes need only have an ssh public key
>> installed) and makes tests behave more like deployment.
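(For what it's worth, the post-run "archive-logs" entry from the base job
could be a small playbook in the same spirit. A rough sketch - the log
path, the swift CLI invocation, and the zuul.* variable names are my
guesses, not part of the proposal:

    ### openstack-infra/zuul-playbooks/archive-logs.yaml
    ---
    - hosts: all
      tasks:
        # Bundle whatever the job left behind in its log directory; the
        # location would be supplied as a job variable.
        - name: Collect job logs
          shell: tar czf /tmp/job-logs.tar.gz -C "{{ log_path }}" .

        # Push the bundle to the container named in the job's auth
        # section, using the swift credentials Zuul hands to the job.
        - name: Upload logs to swift
          shell: >
            swift upload logs /tmp/job-logs.tar.gz
            --object-name "{{ zuul.uuid }}/job-logs.tar.gz"

That would keep the cleanup logic versioned alongside the rest of the job
definitions rather than baked into long-lived slave images.)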
_______________________________________________
OpenStack-Infra mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
