Re: [OpenStack-Infra] Can we use short domain names for build log servers?

2020-06-02 Thread James E. Blair
Sorin Sbarnea  writes:

> I would like to re-raise an older question: what can we do to avoid
> using human-unfriendly URLs for our build logs?

When was this question previously asked?

> The current setup leads us to some URLs that seem more like a way to
> test client limitations.

What limitations?

> I know that we use object storage from various providers but that
> should not be an excuse for not having more human-friendly URLs, maybe
> even using our own domains.
> Using a CDN does not require ugly log URLs, that is for sure.
>
>
> One random example (not even the worst):
> https://27171abe9707251ada06-40d76dc3f646b86e4453b642950e6efd.ssl.cf2.rackcdn.com/729996/2/check/tox-py35/2c5d394/
>
> Only the domain is >70 chars long, so why not have something like
> logN.opendev.org instead?

Several of our storage providers have unique requirements, including one
which serves data from hundreds of unpredictable domains.

For a more complete understanding of the requirements leading to our
current system, you may want to read this message (and the message it
links to):

  http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-July/000501.html

> Current URLs are backend URLs, something that was never designed to be
> facing the consumer.

I agree.  The URLs that we report to the end user go to the Zuul
dashboard, and we encourage and expect users to browse logs via that
method.  It may then link to direct access to individual log files
(especially files it can't display), but that would only be after the
user has visited OpenDev Zuul's dashboard and will know they are at the
right place (to address your point below).

> How is someone supposed to guess that this URL is linked to openstack or
> opendev in any way? They would have to trust me that it does not
> include a magic blob that would hijack their browser. It is not
> uncommon for me to raise bugs with other open source projects that have
> never heard of Zuul. Maybe if we start serving our logs in a more
> friendly way, we can also market Zuul CI/CD better.
>
> Why it matters:
>
> - I often browse various log files; even on a hi-res desktop monitor I
> am unable to read the filename of the log because the window barely
> fits the domain name alone. Not even the changeset seems to fit in the
> visible part of the URL.
> - we need to share links to logs; long ones impose additional
> problems, including splitting on IRC.
> - smaller screens

I agree with all of those points, which is why we have focused on making
the log browser in the dashboard as pleasant and functional as possible.
If you're not being linked to it, or are not using it, I'd love to know
why.  Improving that is the best way to make Zuul more marketable as you
suggest (after all, if OpenDev sets up a complex system to mask the
domain names of log services, that's not necessarily something other
Zuul users can do).

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] OpenDev infra as an OSF pilot project

2020-03-03 Thread James E. Blair
"Clark Boylan"  writes:

> On Sun, Mar 1, 2020, at 11:20 AM, Jonathan Bryce wrote:
>> Hi everyone,
>> 
>> I saw some of the discussions on different channels last week about the 
>> ongoing move of the OpenDev infra services out of OpenStack project and 
>> TC governance. One of the questions that was raised was around setting 
>> it up as an OSF pilot project. I wanted to send an email to this list 
>> to see if that was something the team was interested in moving forward 
>> on.
>> 
>> As a pilot project, it would create some official standing for the new 
>> effort that would make it clear it’s something that is still supported 
>> by the OSF community and staff. It would also provide additional 
>> opportunities for education and exposure as part of the foundation’s 
>> overall activities. While the OpenDev infra services are somewhat 
>> different than the other projects we have piloted (e.g. Zuul), I think 
>> the process would still work and could be helpful to complete the 
>> transition to a more standalone community both from a governance and 
>> perception standpoint.
>> 
>> Pilot projects are initiated through action of the foundation staff and 
>> over time may be confirmed by the Board as a top-level project with 
>> long-term support. I personally would be supportive of taking the pilot 
>> step, and would like to hear thoughts from those of you who are 
>> directly engaged in it.
>
> I'm in favor of it. I think my biggest concern is that it could be
> awkward to sort through the confirmation process. Perhaps you could
> elaborate on how you think that might work given the current
> framework? Also, I'm not sure the entire infra team is familiar with
> this process so a bit more information on the process and what would
> be required of us would be useful. (I'd try but I'm sure to get it
> wrong).

I also think this would be a good outcome.  I'm in favor of the
additional formal ties this would provide.  I share Clark's questions
about the applicability of the current criteria; I'm sure we can work
through it if the will is there, but it sounds like some changes may be
required.

Considering that this effort has been ongoing for two years, with
generally positive feedback and support along the way, perhaps we could
avoid blocking on this as a requirement to begin the official split from
the openstack-infra project?  It would be beneficial to try to start
growing the OpenDev team as a distinct group.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] tarballs.openstack.org to AFS publishing gameplan

2020-01-29 Thread James E. Blair
Ian Wienand  writes:

> On Wed, Jan 29, 2020 at 05:21:49AM +, Jeremy Stanley wrote:
>> Of course I meant from /(.*) to tarballs.opendev.org/openstack/$1 so
>> that clients actually get directed to the correct files. ;)
>
> Ahh yes, sorry you mentioned that in IRC and I should have
> incorporated that.  I'm happy with that; we can also have that
> in-place and test it by overriding our hosts files before any
> cut-over.

The overall plan sounds good to me, as does the follow-up.  I'm
ambivalent about when we put the redirects in place (during or after the
host move).  Whichever is easiest (but my guess is that due to the
additional testing we would be able to do, *during* might be easiest).

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] proposal: custom favicon for review.o.o

2020-01-29 Thread James E. Blair
Ian Wienand  writes:

> On Wed, Jan 29, 2020 at 06:35:28AM +, Sorin Sbarnea wrote:
>> I guess that means that you are not against the idea.
>
> I know it's probably not what you want to hear, but as it seems
> favicons are becoming a component of branding like a logo I think
> you'd do well to run your proposed work by someone with the expertise
> to evaluate it with respect to whatever branding standards we have (I
> imagine someone on the TC would have such contacts from the Foundation
> or whoever does marketing).
>
> If you just make something up and send it, you're probably going to
> get review questions like "how can we know this meets the branding
> standards to be the logo on our most popular website" or "is this the
> right size, format etc. for browsers in 2020" which are things
> upstream marketing and web people could sign off on.  So, personally,
> I'd suggest a bit of pre-coordination there would mean any resulting
> technical changes would be very non-controversial.

That bridge has been crossed for opendev.org, which has a
well-thought-out favicon.  I think adding the same one to
review.opendev.org is a technical exercise at this point.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Checking release approvals automatically

2019-12-16 Thread James E. Blair
Thierry Carrez  writes:

> James E. Blair wrote:
>> [...]
>> But back on the first hand, I think that installing python packages in a
>> virtualenv is too heavyweight for a job to run on the executor.  The
>> candidates we usually look for are things that can run with what's
>> already installed.  Happily, yaml is already installed, because it's
>> kind of a big deal on the executor.  Unhappily, openstack-governance is
>> not merely a repo you need to have on-disk, but is actually a python
>> package you need installed (wow, when did that happen?).
>>
>> We were so close.  If you just needed to run a python script that
>> imported yaml and read a file out of governance, I'd say it would be a
>> great candidate for running on the executor.  But I think the
>> installation of openstack-governance (which has its own requirements
>> that are not installed on the executor) pushes this over the line, and
>> we should run it on a full node.
>
> Actually the script only uses openstack-governance to parse YAML files
> that are in the governance repository... So if YAML is available and
> the contents of the governance repo are accessible, that can easily
> work.
>
> The only drawback compared to using the governance lib is that it will
> not survive a change in the YAML format of governance files... but
> then it's not the only thing that would break if we did that.
>
> So it looks like a simple Python script that only imports yaml would
> work on the executor. The script uses requests as well, but I can make
> it use urllib instead (unless requests is pre-installed on the
> executor too ?)

Yes, requests is installed too.
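
For illustration, a rough sketch of the kind of script that would be
fine on the executor (the governance file path, the review URL, and the
actual approval check here are placeholders, not the real thing):

  import json
  import sys

  import requests
  import yaml

  # Hypothetical locations -- adjust to wherever the job checks things out.
  GOVERNANCE_YAML = "governance/reference/projects.yaml"
  REVIEW_API = "https://review.opendev.org/changes/%s/detail"

  def load_governance(path):
      # PyYAML is already on the executor, so plain safe_load is enough.
      with open(path) as f:
          return yaml.safe_load(f)

  def change_detail(change_id):
      resp = requests.get(REVIEW_API % change_id)
      resp.raise_for_status()
      # Gerrit prefixes its JSON responses with ")]}'" on the first line.
      return json.loads(resp.text.split("\n", 1)[1])

  def main():
      governance = load_governance(GOVERNANCE_YAML)
      detail = change_detail(sys.argv[1])
      # ... whatever release-approval logic is needed goes here ...
      print("governance entries:", len(governance))
      print("change subject:", detail.get("subject"))

  if __name__ == "__main__":
      main()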

> Thanks for the full analysis, I learned a couple of things :)

You're welcome!

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Checking release approvals automatically

2019-12-13 Thread James E. Blair
Thierry Carrez  writes:

> I moved to implementation on this, but I hit an issue with the
> original plan:
>
>> [...]
>> The job should be lightweight enough to run on the executor. With
>> all those safeguards in place, I do not expect it to trigger
>> significant additional load.
>
> My current implementation is a python script run from tox. But I can't
> use the standard tox jobs as they have "hosts:all" tasks all over them
> which are bypassed[1] if the job is run on the executor.
>
> [1] See
> https://zuul.opendev.org/t/openstack/build/4056ca3ee8b247ebbe1cbb1474191c16/console
>
> Ideally I would define my own narrow playbook to run the script,
> without inheriting from the standard tox job. However the current
> script requires some dependencies to be installed
> (openstack-governance, yaml...).
>
> Here are the options I see:
>
> 1- reimplementing most of the unittests/tox job logic in
> "hosts:localhost" playbook(s) -- would mean lots of copypaste, does
> not rhyme so well with "lightweight", and increases execution times
> significantly

On the contrary, this is quite simple.  The jobs in zuul-jobs are
designed to have very simple playbooks which are typically just lists of
roles.  Our goal was to make them as much like using JJB as possible.

These are the 2 playbooks involved:

  https://opendev.org/zuul/zuul-jobs/src/branch/master/playbooks/tox/pre.yaml
  https://opendev.org/zuul/zuul-jobs/src/branch/master/playbooks/tox/run.yaml

Ultimately the entire job is just a list of 4 roles:

  - ensure-tox
  - ensure-python
  - revoke-sudo
  - tox

Please feel free to make a job with whatever role combination you need.

If you plan to run on the executor, you do not need ensure-python; we
know it's there.  Nor do you need revoke-sudo; we know we don't have
sudo.

On the other hand, Zuul restricts what can be run on the executor by
playbooks in untrusted repos, and executing a subprocess like tox is one
of the restricted actions, so if you did make a "hosts: localhost"
playbook, it would not work if run out of the releases repo.  It would
need to be put in the project-config repo, and changes to the job would
not be self-testing.  But if you've got this mostly working already,
that's probably not a big deal.

But back on the first hand, I think that installing python packages in a
virtualenv is too heavyweight for a job to run on the executor.  The
candidates we usually look for are things that can run with what's
already installed.  Happily, yaml is already installed, because it's
kind of a big deal on the executor.  Unhappily, openstack-governance is
not merely a repo you need to have on-disk, but is actually a python
package you need installed (wow, when did that happen?).

We were so close.  If you just needed to run a python script that
imported yaml and read a file out of governance, I'd say it would be a
great candidate for running on the executor.  But I think the
installation of openstack-governance (which has its own requirements
that are not installed on the executor) pushes this over the line, and
we should run it on a full node.

> 2- rewrite the Python script so that it can run on stdlib -- not sure
> I want to write a YAML parser from scratch though

If you want to drop the dependency on openstack-governance and rewrite
the "business logic" parts of that without rewriting the yaml parsing
parts (since PyYAML is installed), then the above is a viable option.

> 3- Abandon the idea of running on the executor -- the trick is that
> for such a short job the overhead of requesting a test node is a bit
> heavy, and we want to run the job on every vote change on release
> requests

Maybe #2 is worth it then?

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Report from Gerrit User Summit

2019-09-04 Thread James E. Blair
"Clark Boylan"  writes:

> How does triggering work with the checks api? I seem to recall reading
> the original design spec for the feature and that CI systems would
> poll Gerrit for changes that apply to their checks, giving them a list
> of items to run? Then as a future improvement there was talk of having
> a callback system similar to Github's app system?

Yes, that's essentially correct.  That's actually why implementing
support for this now in Zuul helps us with the effort to run jobs
against upstream Gerrit, since our *only* option there is to poll, due
to the lack of stream-events support.

The polling operation is designed to be very efficient -- each time you
poll, you get back a list of changes which are configured to run the
checker but for which it hasn't reported a start yet.

A future enhancement is an event (which would show up in stream-events,
so perhaps useful in most installations, but still not upstream gerrit)
and also a webhook (which would work in upstream gerrit I think).  The
event would merely indicate that a poll should be performed.  That's
good enough, and would allow us to achieve the near-instantaneous
response we have now.

(Having said that, we may be able to have a fairly frequent poll
interval on upstream gerrit without problems.)

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Report from Gerrit User Summit

2019-09-04 Thread James E. Blair
Hi,

Monty and I attended the Gerrit User Summit and hackathon last week.  It
was very productive: we learned some good information about upgrading
Gerrit, received offers of help doing so if we need it, formed closer
ties with the Gerrit community, and fielded a lot of interest in Reno
and Zuul.  In general, people were happy that we attended as
representatives of the OpenDev/OpenStack/Zuul communities and (re-)
engaged with the Gerrit community.

Gerrit Upgrade
--------------

We learned some practical things about upgrading to 3.0:

* We can turn off rebuilding the secondary index ("reindexing") on
  startup to speed both our normal restarts as well as prevent unwanted
  reindexes during upgrades.  (Monty pushed a change for this.)

* We can upgrade from 2.13 -> 2.14 -> 2.15 -> 2.16 during a relatively
  quick downtime.  We could actually do some of that while up, but Monty
  and I advocate just taking a downtime to keep things simple.

* We should, under no circumstances, enable NoteDB before 2.16.  The
  migration implementation in 2.15 is flawed and will cause delays or
  errors in later upgrades.

* Once on 2.16, we should enable NoteDB and perform the migration.  This
  can happen online in the background.

* We should GC the repos before starting, to make reindexing faster.

* We should ensure that we have a sufficiently sized diff cache, so that
  Gerrit will be able to re-use previously computed patchset diffs when
  reindexing.  This can considerably speed an online reindex.

* We should probably run 2.16 in production for some time (1 month?) to
  allow users to acclimate to polygerrit, and deal with hideCI.

* Regarding hideCI -- will someone implement that for polygerrit?  will
  it be obviated by improvements in Zuul reporting (tagged or robot
  comments)?  even if we improve Zuul, will third-party CI's upgrade?
  do we just ignore it?

* The data in the AccountPatchReviewDb are not very important, and we
  don't need to be too concerned if we lose them during the upgrade.

* We need to pay attention to H2 tuning parameters, because many of the
  caches use H2.

* Luca has offered to provide any help if we need it.

I'm sure there's more, but that's a pretty good start.  Monty has
submitted several changes to our configuration of Gerrit with the topic
"gus2019" based on some of this info.

Gerrit Community
----------------

During the hackathon, Monty and I bootstrapped our workstations with a
full development environment for Gerrit.  We learned a bit about the new
build system (bazel) -- mostly that it's very complicated, changes
frequently from version to version, and many of the options are black
magic.  However, the bazel folks have been convinced that stability is
in the community's interest, and an initial stable version is
forthcoming.

The key practical things we learned are:

* Different versions of Gerrit may want different bazel versions
  (however, I was able to build the tips of all 3 supported branches
  with the latest bazel).

* There is a tool to manage bazel for you (bazelisk), and it will help
  get the right version of bazel for a given branch/project (it is
  highly recommended, but in theory (especially with the forthcoming
  stable release) should not be required (see last point)).

* The configuration options specified in the developer documentation are
  important and correct.  Monty fixed instability in our docker image
  builds by reverting to just those options.

* Eclipse (or intelliJ) are the IDEs of choice.  Note that the latest
  version of Eclipse (which may not be in distros) is required.  Of
  course, these aren't required, but it's Java, so they help a lot.
  There is a helper script to generate the Eclipse project file.

* Switching between branches requires a full rebuild with bazel, and a
  regeneration/re-import of the Eclipse project.  Given that, I suggest
  this pro-tip: maintain a git repo+Eclipse project for each Gerrit
  branch you work on.  Same for your test Gerrit instance (so you don't
  have to run "gerrit init" over and over).

* The Gerrit maintainers are most easily reachable on Slack.

* Monty and I have been given some additional permissions to edit bugs
  in the issue tracker.  They seem fairly willing to give out those
  permissions if others are interested.

* The issue tracker, like most, doesn't receive enough attention when it
  comes to dealing with old issues.  But for newer issues, it still seems
  of practical use.

* The project has formed a steering committee and adopted a
  design-driven contribution process[1] (not dissimilar to our own specs
  process).  More on this later.

Reno
----

The Gerrit maintainers like to make releases at the end of hackathons,
and so we all (most especially the maintainers) observed that the
current process around manually curating release notes was cumbersome
and error-prone.  Monty demonstrated Reno to an enthusiastic reception
and therefore, Monty will be working on integrating Reno into Gerrit's
release process.

[OpenStack-Infra] [all][infra] Zuul logs are in swift

2019-08-15 Thread James E. Blair
Hi,

We have made the switch to begin storing all of the build logs from Zuul
in Swift.

Each build's logs will be stored in one of 7 randomly chosen Swift
regions in Fort Nebula, OVH, Rackspace, and Vexxhost.  Thanks to those
providers!

You'll note that the links in Gerrit to the Zuul jobs now go to a page
on the Zuul web app.  A lot of the features previously available on the
log server are now available there, plus some new ones.

If you're looking for a link to a docs preview build, you'll find that
on the build page under the "Artifacts" section now.

If you're curious about where your logs ended up, you can see the Swift
hostname under the "logs_url" row in the summary table.

Please let us know if you have any questions or encounter any issues,
either here, or in #openstack-infra on IRC.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] [release][infra] Supporting rget in our release process

2019-07-29 Thread James E. Blair
Hi,

A colleague at Red Hat is working on an effort to record signatures of
release artifacts.  Essentially it's a way to help users verify release
artifacts (or determine if they have been changed) independent of PGP
signatures.  You can read about it here:
https://github.com/merklecounty/rget#rget

It sounds like an interesting and useful effort, and I think we can
support it at little cost.  If we wanted to do so, I think we would need
to do the following things:

1) Generate SHA256SUMS of our release artifacts.  These could even
include the GPG signature files.

2) Run "rget submit" on the resulting files after publication.

That's it.

Both of those would be changes to the release publication jobs, and
wouldn't require any other changes to our processes.
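
For step 1, a minimal sketch (the publication job could equally just run
the sha256sum utility; this only shows the expected output format):

  import hashlib
  import os
  import sys

  def write_sums(directory):
      # Writes the usual "<digest>  <filename>" lines, the same format
      # the sha256sum utility produces.
      with open(os.path.join(directory, "SHA256SUMS"), "w") as out:
          for name in sorted(os.listdir(directory)):
              path = os.path.join(directory, name)
              if name == "SHA256SUMS" or not os.path.isfile(path):
                  continue
              with open(path, "rb") as f:
                  digest = hashlib.sha256(f.read()).hexdigest()
              out.write("%s  %s\n" % (digest, name))

  if __name__ == "__main__":
      write_sums(sys.argv[1])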

As mentioned in the README, this is at a very early stage, and the
author, Brandon Philips, welcomes both further testing and feedback on
the process in general.

Thoughts?

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Zanata broken on Bionic

2019-04-26 Thread James E. Blair
Frank Kloeker  writes:

> Just as a follow-up: I've written down all required steps and ideas for
> a migration to Weblate on [1].

Thanks for that!  Will this be a topic at the PTG?

> There are some issues addressed but that's not unsolvable (i.e. inventing
> openstackid as an OpenID provider).

We may want to think about deploying this in OpenDev, so the openstackid
provider may not be as critical (likely one option among many).  Though
we still may want to wait until it's a choice before we deploy it.

> First big steps are almost done. Gerrit integration is working out of
> the box [2]. The workflow will be much easier in the future. Besides
> proposals every 24 hours, ad hoc proposals are also possible, so
> translations will get into repos faster.
> The other way around is also tested: a webhook with Github is working to
> push translations to Weblate. I saw Gitea has a similar feature - so
> that should also work, and faster than the current way.
> A rough installation procedure is on [3], including a semi-automation
> to set up projects.

We are no longer replicating all projects to Github, so I don't think we
want to build any tooling that depends on that.  We could do something
with Gitea; however, I'd prefer to continue treating it as a simple
read-only mirror at the moment.  So for getting data into Weblate, I
think we should look at using Zuul for that.  Post-merge jobs could push
changes to Weblate fairly easily.  It looks like that's one of the
options in the etherpad, with "wlc pull".

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Announcing Gertty 1.6.0

2019-04-23 Thread James E. Blair
Announcing Gertty 1.6.0
=======================

Gertty is a console-based interface to the Gerrit Code Review system.

Gertty is designed to support a workflow similar to reading network
news or mail.  It syncs information from Gerrit to local storage to
support disconnected operation and easy manipulation of local git
repos.  It is fast and efficient at dealing with large numbers of
changes and projects.

The full README may be found here:

  https://opendev.org/ttygroup/gertty/src/branch/master/README.rst

Changes since 1.5.0:


* The default config file location is now
  ~/.config/gertty/gertty.yaml (but Gertty will fall back on the
  previous default of ~/.gertty.yaml if it is not found)

* The source code is now hosted in OpenDev (https://opendev.org)

* Gertty supports (and should be run with) python3

* Added tab display support

* Reviewkeys are enabled in diffs

* Added a new option (off by default) to close the change after reviewing

* Added inline comments to the change overview screen

* The size column graph is more configurable

* Reviewkeys now support sending a message

* Added more example configurations

As well as several bug fixes and stability improvements.

Thanks to the following people whose changes are included in this
release:

Clint Byrum
Dominique Martinet
Doug Wiegley
Emilien Macchi
Ian Wienand
John L. Villalovos
Logan V
Major Hayden
Masayuki Igawa
Matthias Runge
Natal Ngétal
Nate Johnston
Nguyen Hung Phuong
Robert Collins
Tobias Henkel
Tristan Cacqueray

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] OpenDev git hosting migration and Gerrit downtime April 19, 2019

2019-04-17 Thread James E. Blair
Thierry Carrez  writes:

> Clark Boylan wrote:
>> Fungi has generated a master list of project renames for the
>> openstack namespaces: http://paste.openstack.org/show/749402/. If
>> you have a moment please quickly review these planned renames for
>> any obvious errors or issues.
>
> One thing that bothers me is the massive openstack-infra/ ->
> openstack/ rename, with things like:
>
> openstack-infra/gerrit -> openstack/gerrit
>
> Shouldn't that be directly moved to opendev/gerrit? Moving it to
> openstack/ sounds like a step backward.

My understanding is this is due to openstack-infra being TC-governed and
opendev not quite having gotten around to establishing an official
non-TC governance yet.  I think the intent is to eventually do that.  We
could probably anticipate that a bit if we would like and go ahead and
sort openstack-infra things into different buckets.  At the end of this,
I think we will all have more hats, with overlap between opendev and
openstack.  Some current infra activities and repos are
openstack-specific and should be re-homed into openstack; others serve
all projects and should be in opendev; yet more are just things that are
incidentally related to what we do and should be on their own.

I've produced a list based on my estimation of what things will look
like at the end of the process.  This is just a starting point if we
would like to explore this option.  We could refine the list and use it,
or we could choose to stick with the status quo temporarily and move the
infra repos out of openstack at a later time when things are more clear.

https://etherpad.openstack.org/p/6CmVhW40m0

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Gitea next steps

2019-02-04 Thread James E. Blair
Hi,

At the last infra team meeting, we talked about whether and how to
proceed with Gitea.  I'd like to summarize that quickly and make sure
we're all on board with it.

* We will continue to deploy our own Kubernetes using the
  k8s-for-openstack Ansible playbook that Monty found.  Since that's
  developed by a third-party, we will use it by checking out the
  upstream source from GitHub, but pinning to a known sha so that we
  don't encounter surprises.

* We discussed deploying with a new version of rook which does not
  require the flex driver, but it turns out I was a bit ahead of things
  -- that hasn't landed yet.  So we can probably keep our current
  deployment.

Ian raised two new issues:

1) We should verify that the system still functions if our single-master
Kubernetes loses its master.

Monty and I tried this -- it doesn't.  The main culprit here seems to be
DNS.  The single master is responsible for intra-(and extra!)-cluster
DNS.  This makes gitea unhappy for three reasons: a) if its SQL
connections have gone idle and terminated, it cannot re-establish them,
and b) it is unable to resolve remote hostnames for avatars, which can
greatly slow down page loads, and c) the replication receiver is not a
long running process, it's just run over SSH, so it can't connect to the
database either, and therefore replication fails.

The obvious solution, use a multi-master setup, apparently has issues if
k8s is deployed in a cloud with LoadBalancer objects (which we are
using).

Kubernetes does have support for scale-out DNS; it's not clear whether
that still has a SPOF, though.  Monty is experimenting with this.

If that doesn't improve things, we may still want to proceed since the
system should still mostly work for browsing and git clones if the
master fails, and full operation will resume when it comes online.

2) Rook is difficult to upgrade.

This appears to be the case.  When it does come time to upgrade rook, we
may want to simply build a new Kubernetes cluster for the system.
Presumably by that point, it won't require the flexvolume driver, which
will be a good reason to make a new cluster anyway, and perhaps further
upgrades after that won't be as complicated.

Once we conclude investigation into issue #1, I think these are the next
steps:

* Land the patches to manage the opendev k8s cluster with Ansible.

* Pin the k8s-on-openstack repo to the current sha.

* Add HTTPS termination to the cluster.

* Update opendev.org DNS to point to the cluster.

* Treat this as a soft-launch of the production service.  Do not
  publicise it or encourage people to switch to it yet, but continue to
  observe it as we complete the rest of the tasks in [1].

[1] 
http://specs.openstack.org/openstack-infra/infra-specs/specs/opendev-gerrit.html#work-items

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Kubernetes walkthrough (for OpenDev gitea)

2019-01-07 Thread James E. Blair
cor...@inaugust.com (James E. Blair) writes:

> Hi,
>
> As part of the OpenDev Gerrit Hosting spec [1], we're planning on
> running gitea as our primary git mirror.  Monty and I have been working
> on a system to run it in a fully HA manner using Kubernetes, cephfs, and
> percona.  The changes to implement this are in review[2].  But there's a
> lot of new technology, and it's been very educational to be able to
> build this system up from the ground.  We'd like to walk through the
> process interactively with other folks so we all benefit from this.
>
> We will schedule a time where we will broadcast a terminal session which
> anyone can watch (using telnet) at the same time we all join a voice
> conference on the PBX.  Monty and I will demonstrate the system and
> answer questions as we go.  We will record the session and make it
> available afterwards.
>
> The infra-root team, who may end up debugging problems with the system
> in the future, are the primary audience of this session, but anyone is
> welcome to join.
>
> If you're interested in attending, please let us know which of the two
> suggested times work for you by adding an entry to this ethercalc:
>
>   https://ethercalc.openstack.org/infra-k8-walkthrough
>
> Thanks,
>
> Jim
>
> [1] 
> http://specs.openstack.org/openstack-infra/infra-specs/specs/opendev-gerrit.html
> [2] https://review.openstack.org/#/q/topic:opendev-gerrit

We'll do this on 2019-01-08, 20:00 UTC.  That's right after the next
infra team meeting.

Join us in #opendev on IRC and room 6561 on the PBX:

https://wiki.openstack.org/wiki/Infrastructure/Conferencing

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Kubernetes walkthrough (for OpenDev gitea)

2019-01-02 Thread James E. Blair
Hi,

As part of the OpenDev Gerrit Hosting spec [1], we're planning on
running gitea as our primary git mirror.  Monty and I have been working
on a system to run it in a fully HA manner using Kubernetes, cephfs, and
percona.  The changes to implement this are in review[2].  But there's a
lot of new technology, and it's been very educational to be able to
build this system up from the ground.  We'd like to walk through the
process interactively with other folks so we all benefit from this.

We will schedule a time where we will broadcast a terminal session which
anyone can watch (using telnet) at the same time we all join a voice
conference on the PBX.  Monty and I will demonstrate the system and
answer questions as we go.  We will record the session and make it
available afterwards.

The infra-root team, who may end up debugging problems with the system
in the future, are the primary audience of this session, but anyone is
welcome to join.

If you're interested in attending, please let us know which of the two
suggested times work for you by adding an entry to this ethercalc:

  https://ethercalc.openstack.org/infra-k8-walkthrough

Thanks,

Jim

[1] 
http://specs.openstack.org/openstack-infra/infra-specs/specs/opendev-gerrit.html
[2] https://review.openstack.org/#/q/topic:opendev-gerrit

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Launch node and the new bridge server

2018-08-28 Thread James E. Blair
Ian Wienand  writes:

> On 08/28/2018 09:48 AM, Clark Boylan wrote:
>> On Mon, Aug 27, 2018, at 4:21 PM, Clark Boylan wrote:
>> One quick new observation. launch-node.py does not install puppet at
>> all so the subsequent ansible runs on the newly launched instances
>> will fail when attempting to stop the puppet service (and will
>> continue on to fail to run puppet as well I think).
>
> I think we should manage puppet on the hosts from Ansible; we did
> discuss that we could just manually run
> system-config:install_puppet.sh after launching the node; but while
> that script does contain some useful things for getting various puppet
> versions, it also carries a lot of extra cruft from years gone by.
>
> I've proposed the roles to install puppet in [1].  This runs the roles
> under Zuul for integration testing.

Unlike the afs/krb roles, I don't believe we have plans to run these
roles directly in Zuul jobs, so a better choice might be to exercise
them in the eventual per-hostgroup jobs that we write, which test
end-to-end deployment of each host.  That will be a more realistic
exercise of the roles.

How about we temporarily add these to the system-config-run-base job
until we write our first hostgroup job for a puppet host, then remove
it?

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Moving logs into swift (redux)

2018-07-17 Thread James E. Blair
Joshua Hesketh  writes:

> I know the CDN was complicated with the cloud provider we were using at the
> time. However, I'm unsure what the CDN options are these days. Will there
> be an API we can use to turn the CDN on per container and get the public
> URL for example?

A typical swift has the ability to allow for public access to swift
itself, so this shouldn't be an issue.  We should survey our available
swifts and make sure of this.  I'm not currently advocating that we use
non-standard swifts (i.e., ones which require non-standard API calls to
retrieve CDN urls, etc).

> If the above two items turn out sub-optimal, I'm personally not opposed to
> continuing to run our own middleware. We don't necessarily need that to be
> in os_loganalyze as the returned URL could be a new middleware. The
> middleware can then handle the ARA and possibly even work as our own CDN
> choosing the correct container as needed (if we can't get CDN details
> otherwise).

I'd love to get out of the middleware business entirely if we can.  It
causes large, disruptive outages when it breaks.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Moving logs into swift (redux)

2018-07-17 Thread James E. Blair
Doug Hellmann  writes:

> Excerpts from corvus's message of 2018-07-16 15:27:10 -0700:
>
>> The Zuul dashboard makes finding the location of logs for jobs
>> (especially post jobs) simpler.  So we no longer need logs.o.o to find
>> the storage location (files or swift) for post jobs -- a user can just
>> follow the link from the build history in the dashboard.
>
> Is that information available through an API? I could update git-os-job
> to use the API to get the URL (it knows how to construct the URL from
> the commit ID today).

Yes:

  
http://zuul.openstack.org/api/builds?project=openstack/python-monascaclient=f5b8831fbaf69d5c93776b166bd4915cf452ae27

But Zuul's API is in flux, undocumented, and comes with no stability
promises yet.  We want to change that, but we're still getting the
basics down.  I hesitate to suggest that folks write to the API too much
at this point.

Having said that, this is a pretty lightweight use, and I'm sure we'll
always have this functionality, even if we end up changing the details,
so I think we should do it.  If we have to change git-os-job again
before everything is final, I'm sure it won't be much trouble.
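
As a rough sketch of that lightweight use (note that the "newrev" query
parameter and the "log_url" field are my reading of the current,
unstable API, so treat them as assumptions):

  import requests

  ZUUL_API = "http://zuul.openstack.org/api/builds"

  def log_url_for_commit(project, commit):
      resp = requests.get(ZUUL_API, params={"project": project,
                                            "newrev": commit})
      resp.raise_for_status()
      # Return the first reported build that has logs.
      for build in resp.json():
          if build.get("log_url"):
              return build["log_url"]
      return None

  print(log_url_for_commit("openstack/python-monascaclient",
                           "f5b8831fbaf69d5c93776b166bd4915cf452ae27"))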

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Moving logs into swift (redux)

2018-07-16 Thread James E. Blair
Clark Boylan  writes:

> Couple of thoughts about this and Ara specifically. Ara static
> generation easily produces tens of thousands of files. Copying many
> small files to the log server with rsync was often quite slow (on the
> order of 10 minutes for some jobs (that is my fuzzy memory though)). I
> am concerned that HTTP to $swift service will have similar problems
> with many small files. This is something we should test.

Yes.  If we want to get out of the business of running a log proxy (I
*very* much do), static generation is the only currently supported
option with ara.  Despite the downsides, we were able to use ara in
static generation mode before and it worked.  I'm hopeful that by
uploading to swift in parallel, we can mitigate the upload cost.

> Also, while swift doesn't have inode problems the end user needs to
> worry about, it does apparently have limits on practical number of
> objects per container. One of the issues we had in the past,
> particularly with the swift we had access to, was that each container
> was not directly accessible by default and you had to configure CDN
> distribution of each container to be publicly visible. This made
> creating many containers to shard the objects more complicated than we
> had hoped. All this to say we may still have to solve the "inode"
> problem just within the context of swift containers, creating
> containers, making them visible.
>
> We should do our best to test both of these items and/or follow up
> with whichever cloud hosts the containers to make sure we aren't
> missing anything else (possible object creation rate limits for
> example).

Yes, the object limit concern is why I think our swift role should
create containers as necessary and shard storage.
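
To illustrate the idea, a sketch of what such a role's upload step might
do (create_container/create_object are the openstacksdk cloud-layer
helpers as I recall them; treat the exact calls and the sharding scheme
as assumptions rather than a tested role):

  import os

  import openstack

  def upload_build_logs(cloud_name, build_uuid, log_dir):
      conn = openstack.connect(cloud=cloud_name)
      # Shard on a prefix of the build UUID so no single container grows
      # without bound; e.g. build 2c5d394... lands in container "logs_2c".
      container = "logs_%s" % build_uuid[:2]
      conn.create_container(container, public=True)
      for root, _, files in os.walk(log_dir):
          for name in files:
              path = os.path.join(root, name)
              obj = "%s/%s" % (build_uuid, os.path.relpath(path, log_dir))
              conn.create_object(container, obj, filename=path)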

The CDN behavior you describe -- where public access to swift is not
possible except by CDN, and where the CDN uses unpredictable hostnames
per container which must be determined via a non-standard API call -- is
not swift's standard behavior; it is a cloud-specific variant of swift.

I believe we can write upload roles for the swift-variant you describe.
We can also write upload roles for standard swift.  I think it will be
difficult to try to use both at the same time, so if we're serious about
distributing logs to cloud-local swifts in the future, we may want to
start by focusing on standard swift (and accept that clouds that either
don't run swift, or don't run standard swift, will export their logs to
another cloud.  That's not so bad.  Most of our clouds do something
similar today).

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Moving logs into swift (redux)

2018-07-16 Thread James E. Blair
cor...@inaugust.com (James E. Blair) writes:

> To summarize: static generation combined with a new role to upload to
> swift using openstacksdk should allow us to migrate to swift fairly
> quickly.  Once there, we can work on a number of enhancements which I
> will describe in a followup post to zuul-discuss.

The followup message is here:

  http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-July/000501.html

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] How do I add a third party CI by ZuulV3?

2018-07-06 Thread James E. Blair
We could consider hosting a config-project with pipeline definitions for
third-party CI as an optional service folks could use.  It would not,
however, be able to support customized reporting messages or recheck
syntax.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] zuulv3 feedback for 3pci

2018-07-05 Thread James E. Blair
Paul Belanger  writes:

> Greetings,
>
> Over the last few weeks I've been helping the RDO project migrate away from
> zuulv2 (jenkins) to zuulv3. Today all jobs have been migrated with the help of
> the zuul-migrate script. We'll start deleting jenkins bits in the next few 
> days.
>
> I wanted to get down some things I've noticed in the process as feedback to
> thirdparty CI operators. Hopefully this will help others.

Thanks!

> Need for use-cached-repos
> -------------------------
>
> Today, use-cached-repos is only available to openstack-infra/project-config, 
> we
> should promote this into zuul-jobs to help reduce the amount of pressure on
> zuul-executors when jobs start. In the case of 3pci, prepare-workspace role
> isn't up to the task to sync everything at once.
>
> The feedback here is to somehow allow the base job to be smart enough to
> work
> if a project is found in /opt/git or not.  Today we have 2 different images in
> rdo, 1 has the cache of upstream git.o.o and other doesn't.

I agree.  I think we've talked about the possibility of merging the
use-cached-repos functionality into prepare-workspace, so that it works
in all cases.  I think it should be possible and would be a good
improvement.

> Namespace projects with fqdn
> ----------------------------
>
> This one is likely unique to rdoproject, but because we have 2 connections to
> different gerrit systems, review.rdoproject.org and git.openstack.org, we
> actually have duplicate project names. For example:
>
>   openstack/tripleo-common
>
> which means, for zuul we have to write projects as:
>
>   project:
> name: git.openstack.org/openstack/tripleo-common
>
>   project:
> name: review.rdoproject.org/openstack/tripleo-common
>
> There are legacy reasons for this, and we plan on cleaning review.r.o, however
> because of this duplication we cannot use upstream jobs right now. My initial
> thought would be to update jobs, in this case devstack to use the following 
> for
> required-projects:
>
>   required-projects:
> - git.openstack.org/openstack-dev/devstack
> - git.openstack.org/openstack/tripleo-common
>
> and propose the patch upstream.  Again, this is likely specific to rdoproject,
> but something right now that blocks them on loading jobs from zuul.o.o.

Oh, interesting.  I think we may have missed this subtlety when thinking
about this use case.  I agree that's the best solution for now.

> I do have some other suggestions, but they are more specific to zuul. I could
> post them here as a follow up or on zuul ML.
>
> I am happy I was able to help in the original migration of the openstack
> projects from jenkins to zuulv3, it did help a lot when I was debugging zuul
> failures. But over all rdo project didn't have any major issues with job 
> content.

Thanks for the current (and upcoming) feedback.  I think RDO is in a
particularly good place to exercise the upstream/downstream sharing of
job content; I'm looking forward to more!

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] What's the future for git-review?

2018-07-05 Thread James E. Blair
Jeremy already articulated my thoughts well; I don't have much to add.
But I think it's important to reiterate that I find it extremely
valuable that git-review perform its function ("push changes to Gerrit")
simply and reliably.

There are certainly projects we've created which are neglected due to
lack of time or interest.  I don't think git-review is one of them.  I
think with agreement on scope, you'll find that we are interested in
maintaining it.  Again, I agree with Jeremy's evaluations of Darragh's
proposals.

I also don't think there is (or should be) anything OpenStack specific
about it.  I see it as an essential component of any Gerrit system.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] [infra] Behavior change in Zuul post pipeline

2018-06-26 Thread James E. Blair
Hi,

We recently changed the behavior* of the post pipeline in Zuul to only
run jobs for the most recently merged changes on each project's
branches.  If you were relying on the old behavior where jobs ran on
every merged change, let us know, we can make a new pipeline for that.
But for the typical case, this should result in some improvements:

1) We waste fewer build resources building intermediate build artifacts
(e.g., documentation for a version which is already obsoleted by the
change which landed after it).

2) Races in artifact build jobs will no longer result in old versions of
documentation being published because they ran on a slightly faster node
than the newer version.

If you observe any unexpected behavior as the result of this change,
please let us know in #openstack-infra.

-Jim

* The thing which implements this behavior in Zuul is the
  "supercedent"** pipeline manager[1].  Zuul has had, since the initial
  commit six years ago, a pluggable system for controlling the behavior
  in its pipelines.  To date, we have only had two pipeline managers:
  "dependent" which controls the gate, and "independent" which controls
  everything else.

[1] 
https://zuul-ci.org/docs/zuul/user/config.html#value-pipeline.manager.supercedent

** It may or may not be named after anyone you know.

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] [infra][all] Upcoming Zuul behavior change for files and irrelevant-files

2018-06-07 Thread James E. Blair
Hi,

Earlier[1][2], we discussed proposals to make files and irrelevant-files
easier to use -- particularly ways to make them overridable.  We settled
on an approach, and it is now implemented.  We plan on upgrading
OpenStack's Zuul to the new behavior on Monday, June 11, 2018.

To summarize the change:

  Files and irrelevant-files are treated as overwriteable attributes and
  evaluated after branch-matching variants are combined.
  
  * Files and irrelevant-files are overwritten, so the last value
encountered when combining all the matching variants (looking only at
branches) wins.
  * It's possible to both reduce and expand the scope of jobs, but the
user may need to manually copy values from a parent or other variant
in order to do so.
  * It will no longer be possible to alter a job attribute by adding a
variant with only a files matcher -- in all cases files and
irrelevant-files are used solely to determine whether the job is run,
not to determine whether to apply a variant.

This is a behavior change to Zuul that is not possible[3] to support in
a backwards compatible way.  That means that on Monday, there may be
sudden alterations to the set of jobs which run on changes.  Considering
that many of us can barely predict what happens at all when multiple
irrelevant-files stanzas enter the picture, it's not possible[4] to say
in advance exactly what the changes will be.

Suffice it to say that, on Monday, if some jobs you were expecting to
run on a change don't, or some jobs you were not expecting to run do,
then you will need to alter the files or irrelevant-files matchers on
those jobs.  Hopefully the new approach is sufficiently intuitive that
corrective changes will be simple to make.  Jobs which have no more than
one files or irrelevant-files attribute involved in their construction
(likely the bulk of the jobs out there) are unlikely to need any
immediate changes.

Please let us know in #openstack-infra if you encounter any problems and
we'll be happy to help.  Hopefully after we cross this speedbump we'll
find the files and irrelevant-files matchers much more useful.

-Jim

[1] http://lists.openstack.org/pipermail/openstack-dev/2018-May/130074.html
[2] http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-May/000397.html
[3] At least, not possible with a reasonable amount of effort.
[4] Of course it's possible but only with an unhealthy amount of beer.

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Winterscale: a proposal regarding the project infrastructure

2018-05-30 Thread James E. Blair
Doug Hellmann  writes:

>> * Move many of the git repos currently under the OpenStack project
>>   infrastructure team's governance to this new team.
>
> I'm curious about the "many" in that sentence. Which do you anticipate
> not moving, and if this new team replaces the existing team then who
> would end up owning the ones that do not move?

There are a lot.  Generally speaking, I think most of the custom
software, deployment tooling, and configuration would move.

An example of something that probably shouldn't move is
"openstack-zuul-jobs".  We still need people that are concerned with how
OpenStack uses the winterscale service.  I'm not sure whether that
should be its own team or whether those functions should get folded
into other teams.

>> * Establish a "winterscale infrastructure council" (to be renamed) which
>>   will govern the services that the team provides by vote.  The council
>>   will consist of the PTL of the winterscale infrastructure team and one
>>   member from each official OpenStack Foundation project.  Currently, as
>>   I understand it, there's only one: OpenStack.  But we expect kata,
>>   zuul, and others to be declared official in the not too distant
>>   future.  The winterscale representative (the PTL) will have
>>   tiebreaking and veto power over council decisions.
>
> That structure seems sound, although it means the council is going
> to be rather small (at least in the near term).  What sorts of
> decisions do you anticipate needing to be addressed by this council?

Yes, very small.  Perhaps we need an interim structure until it gets
larger?  Or perhaps just discipline and agreement that the two people on
it will consult with the necessary constituencies and represent them
well?

I expect the council not to have to vote very often.  Perhaps only on
substantial changes to services (bringing a new offering online,
retiring a disused offering, establishing parameters of a service).  As
an example, the recent thread on "terms of service" would be a good
topic for the council to settle.

>>   (This is structured loosely based on the current Infrastructure
>>   Council used by the OpenStack Project Infrastructure Team.)
>> 
>> None of this is obviously final.  My goal here is to give this effort a
>> name and a starting point so that we can discuss it and make progress.
>> 
>> -Jim
>> 
>
> Thanks for starting this thread! I've replied to both mailing lists
> because I wasn't sure which was more appropriate. Please let me
> know if I should focus future replies on one list.

Indeed, perhaps we should steer this toward openstack-dev now.  I'll
drop openstack-infra from future replies.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Winterscale: a proposal regarding the project infrastructure

2018-05-30 Thread James E. Blair
Hi,

With recent changes implemented by the OpenStack Foundation to include
projects other than "OpenStack" under its umbrella, it has become clear
that the "Project Infrastructure Team" needs to change.

The infrastructure that is run for the OpenStack project is valued by
other OpenStack Foundation projects (and beyond).  Our community has not
only produced an amazing cloud infrastructure system, but it has also
pioneered new tools and techniques for software development and
collaboration.

For some time it's been apparent that we need to alter the way we run
services in order to accommodate other Foundation projects.  We've been
talking about this informally for at least the last several months.  One
of the biggest sticking points has been a name for the effort.  It seems
very likely that we will want a new top-level domain for hosting
multiple projects in a neutral environment (so that people don't have to
say "hosted on OpenStack's infrastructure").  But finding such a name is
difficult, and even before we do, we need to talk about it.

I propose we call the overall effort "winterscale".  In the best
tradition of code names, it means nothing; look for no hidden meaning
here.  We won't use it for any actual services we provide.  We'll use it
to refer to the overall effort of restructuring our team and
infrastructure to provide services to projects beyond OpenStack itself.
And we'll stop using it when the restructuring effort is concluded.

This is my first proposal: that we acknowledge this effort is underway
and name it as such.

My second proposal is an organizational structure for this effort.
First, some goals:

* The infrastructure should be collaboratively run as it is now, and
  the operational decisions should be made by the core reviewers as
  they are now.

* Issues of service definition (i.e., what services we offer and how
  they are used) should be made via a collaborative process including
  the infrastructure operators and the projects which use it.

To that end, I propose that we:

* Work with the Foundation to create a new effort independent of the
  OpenStack project with the goal of operating infrastructure for the
  wider OpenStack Foundation community.

* Work with the Foundation marketing team to help us with the branding
  and marketing of this effort.

* Establish a "winterscale infrastructure team" (to be renamed)
  consisting of the current infra-core team members to operate this
  effort.

* Move many of the git repos currently under the OpenStack project
  infrastructure team's governance to this new team.

* Establish a "winterscale infrastructure council" (to be renamed) which
  will govern the services that the team provides by vote.  The council
  will consist of the PTL of the winterscale infrastructure team and one
  member from each official OpenStack Foundation project.  Currently, as
  I understand it, there's only one: OpenStack.  But we expect kata,
  zuul, and others to be declared official in the not too distant
  future.  The winterscale representative (the PTL) will have
  tiebreaking and veto power over council decisions.

  (This is structured loosely based on the current Infrastructure
  Council used by the OpenStack Project Infrastructure Team.)

None of this is obviously final.  My goal here is to give this effort a
name and a starting point so that we can discuss it and make progress.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Options for logstash of ansible tasks

2018-03-27 Thread James E. Blair
Ian Wienand  writes:

> The closest other thing I could find was "aggregate" [1]; but this
> relies on having a unique task-id to group things together with.
> Ansible doesn't give us this in the logs and AFAIK doesn't have a
> concept of a uuid for tasks.

We control the log output format in Zuul (both job-output.txt and
job-output.json).  So we could include a unique ID for tasks if we
wished.  However, we should not put that on every line, so that still
would require some handling in the log processor.

As soon as I say that, it makes me think that the solution to this
really should be in the log processor.  Whether it's a grok filter, or
just us parsing the lines looking for task start/stop -- that's where we
can associate the extra data with every line from a task.  We can even
generate a uuid right there in the log processor.
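
As a rough sketch of that idea -- the task marker regex and the field
names below are illustrative, not the actual job-output.txt format:

    import re
    import uuid

    # Hypothetical marker; the real task header in job-output.txt may differ.
    TASK_START = re.compile(r'\| TASK \[(?P<name>.+)\]')

    def annotate(lines):
        """Yield each log line tagged with a generated per-task uuid."""
        task_id = None
        task_name = None
        for line in lines:
            m = TASK_START.search(line)
            if m:
                # A new task begins: mint an id to group its lines together.
                task_id = uuid.uuid4().hex
                task_name = m.group('name')
            yield {'message': line,
                   'task_id': task_id,
                   'task_name': task_name}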

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Public numbers about the scale of the infrastructure/CI ?

2018-03-26 Thread James E. Blair
David Moreau Simard <dmsim...@redhat.com> writes:

> On Mon, Mar 26, 2018 at 10:20 AM, James E. Blair <cor...@inaugust.com> wrote:
>>> - # of jobs and Ansible playbooks per month ran by Zuul
>>
>> I'm curious about this one -- how were you planning on defining these
>> values and obtaining them?
>>
>
> I've needed to pull statistics out of Zuul in the past for RDO (i.e,
> justifying budget for CI resources)
> and I use the sql reporter data to do it.
> It looks like this:
>
> $range = "'2018-02-01 00:00:00' AND '2018-02-28 23:59:59'"
> SELECT job_name,
>result,
>start_time,
>end_time,
>TIMEDIFF(end_time, start_time) as duration
> FROM zuul_build
> WHERE
> start_time BETWEEN $range
>
> This gets me the amount of monthly *jobs* and I can extrapolate (over
> N playbooks..)
> by estimating a number knowing that:
> - base and post playbooks are fairly consistently X playbooks
> - there is at least one "run" playbook
>
> So pretending that 1000 jobs ran, I can say something like:
> 1000 jobs and over [1000 * (X+1)] playbooks
>
> It's not a perfect number but we know we run more playbooks than that.
>
> What I have also been thinking about is, if I want to get a more
> accurate number, I could do a sum of all the executor playbook results
> (which are in graphite) but the history for those don't go too far
> back.
> Ex: stats.zuul.executor.ze*_openstack_org.phase.*.*

The SQL query gets the number of completed jobs which are *reported*.
It doesn't get you two other numbers, which are the jobs *launched*
(many of which may have been aborted before completion), or the jobs
*completed* (the results of many of which may have been discarded due to
changes in the environment).  In reality, the system is likely to be
significantly busier than the number of jobs reported will indicate.

Both of the other values can be obtained from graphite or by parsing
logs.  I think for this purpose, graphite might be sufficient.  (The
only time I'd recommend going to logs is when we need to find
project-specific resource usage information.)

stats_counts.zuul.executor.*.builds should be all jobs launched.
stats_counts.zuul.tenant.*.pipeline.*.all_jobs should be all jobs completed.
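
For example, a one-off script to total those counters for a month via
Graphite's render API might look like this (the graphite host and date
range are assumptions; adjust the target paths as needed):

    import requests

    GRAPHITE = 'http://graphite.openstack.org'  # assumed host

    def monthly_total(target, start='20180201', end='20180301'):
        """Sum a stats_counts metric over a date range."""
        resp = requests.get(GRAPHITE + '/render', params={
            'target': 'sumSeries(%s)' % target,
            'from': start,
            'until': end,
            'format': 'json'})
        resp.raise_for_status()
        series = resp.json()[0]['datapoints']
        return sum(value for value, timestamp in series if value)

    launched = monthly_total('stats_counts.zuul.executor.*.builds')
    completed = monthly_total('stats_counts.zuul.tenant.*.pipeline.*.all_jobs')
    print(launched, completed)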

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Public numbers about the scale of the infrastructure/CI ?

2018-03-26 Thread James E. Blair
David Moreau Simard  writes:

> Unless there's any objection, I'd have a slide with up to date numbers such 
> as:

I don't have any objection to making them public (I believe nearly all,
if not all, of these are public already).  But I would like them to be
as accurate as possible :).

> - # of projects hosted (as per git.openstack.org)
> - # of servers (in aggregate of all our regions)
> -- (Maybe some big highlights like the size of logstash, logs.o.o, Zuul)
> - Nodepool capacity (number of clouds, aggregate capacity)
> - # of jobs and Ansible playbooks per month ran by Zuul

I'm curious about this one -- how were you planning on defining these
values and obtaining them?

> - Approximate number of maintained and hosted services (irc,
> gerritbot, meetbot, gerrit, git, mailing lists, wiki, ask.openstack,
> storyboard, codesearch, etc.)
> - Probably some high level numbers from Stackalytics
> - Maybe something else I haven't thought about

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Zuul project evolution

2018-03-15 Thread James E. Blair
Hi,

To date, Zuul has (perhaps rightly) often been seen as an
OpenStack-specific tool.  That's only natural since we created it
explicitly to solve problems we were having in scaling the testing of
OpenStack.  Nevertheless, it is useful far beyond OpenStack, and even
before v3, it has found adopters elsewhere.  Though as we talk to more
people about adopting it, it is becoming clear that the less experience
they have with OpenStack, the more likely they are to perceive that Zuul
isn't made for them.

At the same time, the OpenStack Foundation has identified a number of
strategic focus areas related to open infrastructure in which to invest.
CI/CD is one of these.  The OpenStack project infrastructure team, the
Zuul team, and the Foundation staff recently discussed these issues and
we feel that establishing Zuul as its own top-level project with the
support of the Foundation would benefit everyone.

It's too early in the process for me to say what all the implications
are, but here are some things I feel confident about:

* The folks supporting the Zuul running for OpenStack will continue to
  do so.  We love OpenStack and it's just way too fun running the
  world's most amazing public CI system to do anything else.

* Zuul will be independently promoted as a CI/CD tool.  We are
  establishing our own website and mailing lists to facilitate
  interacting with folks who aren't otherwise interested in OpenStack.
  You can expect to hear more about this over the coming months.

* We will remain just as open as we have been -- the "four opens" are
  intrinsic to what we do.

As a first step in this process, I have proposed a change[1] to remove
Zuul from the list of official OpenStack projects.  If you have any
questions, please don't hesitate to discuss them here, or privately
contact me or the Foundation staff.

-Jim

[1] https://review.openstack.org/552637

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] [nodepool] Restricting images to specific nodepool builders

2018-02-19 Thread James E. Blair
Paul Belanger  writes:

> On Mon, Feb 19, 2018 at 08:28:27AM -0500, David Shrewsbury wrote:
>> Hi,
>> 
>> On Sun, Feb 18, 2018 at 10:25 PM, Ian Wienand  wrote:
>> 
>> > Hi,
>> >
>> > How should we go about restricting certain image builds to specific
>> > nodepool builder instances?  My immediate issue is with ARM64 image
>> > builds, which I only want to happen on a builder hosted in an ARM64
>> > cloud.
>> >
>> > Currently, the builders go through the image list and check "is the
>> > existing image missing or too old, if so, build" [1].  Additionally,
>> > all builders share a configuration file [2]; so builders don't know
>> > "who they are".
>> >
>> >
>> 
>> Why not just split the builder configuration file? I don't see a need to
>> add code
>> to do this.
>> 
> In our case (openstack-infra) this will require another change to
> puppet-nodepool to support this. Not that we cannot, but it will now mean 
> we'll
> have 7[1] different nodepool configuration files to now manage. 4 x
> nodepool-launchers, 3 x nodepool-builders, since we have 7 services running.

This seems like a pretty legitimate case to split the config.  Very
little of the config for the arm64 builder will be shared with any of
the other builders, so perhaps unlike the case where one simply wants
high-availability launchers, this seems like a very sensible use of a
separate config file.

At any rate, that's what we should do in openstack-infra to solve the
issue Ian asked about.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] [infra][all] New Zuul Depends-On syntax

2018-01-24 Thread James E. Blair
Hi,

We recently introduced a new URL-based syntax for Depends-On: footers
in commit messages:

  Depends-On: https://review.openstack.org/535851

The old syntax will continue to work for a while, but please begin using
the new syntax on new changes.

Why are we changing this?  Zuul has grown the ability to interact with
multiple backend systems (Gerrit, GitHub, and plain Git so far), and we
have extended the cross-repo-dependency feature to support multiple
systems.  But Gerrit is the only one that uses the change-id syntax.
URLs, on the other hand, are universal.

That means you can write, as in https://review.openstack.org/535541, a
commit message such as:

  Depends-On: https://github.com/ikalnytskyi/sphinxcontrib-openapi/pull/17

Or in a Github pull request like
https://github.com/ansible/ansible/pull/20974, you can write:

  Depends-On: https://review.openstack.org/536159

But we're getting a bit ahead of ourselves here -- we're just getting
started with Gerrit <-> GitHub dependencies and we haven't worked
everything out yet.  While you can Depends-On any GitHub URL, you can't
add any project to required-projects yet, and we need to establish a
process to actually report on GitHub projects.  But cool things are
coming.

We will continue to support the Gerrit-specific syntax for a while,
probably for several months at least, so you don't need to update the
commit messages of changes that have accumulated precious +2s.  But do
please start using the new syntax now, so that we can age the old syntax
out.

There are a few differences in using the new syntax:

* Rather than copying the change-id from a commit message, you'll need
  to get the URL from Gerrit.  That means the dependent change already
  needs to be uploaded.  In some complex situations, this may mean that
  you need to amend an existing commit message to add in the URL later.

  If you're uploading both changes, Gerrit will output the URL when you
  run git-review, and you can copy it from there.  If you are looking at
  an existing change in Gerrit, you can copy the URL from the permalink
  at the top left of the page.  Where it says "Change 535855 - Needs
  ..." the change number itself is the permalink of the change.

* The new syntax points to a specific change on a specific branch.  This
  means if you depend on a change to multiple branches, or changes to
  multiple projects, you need to list each URL.  The old syntax looks
  for all changes with that ID, and depends on all of them.  This may
  mean some changes need multiple Depends-On footers; however, it also
  means that we can express dependencies in a more fine-grained manner.

Please start using the new syntax, and let us know in #openstack-infra
if you have any problems.  As new features related to GitHub support
become available, we'll announce them here.

Thanks,

Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Merging feature/zuulv3 into master

2018-01-16 Thread James E. Blair
Hi,

On Thursday, January 18, 2018, we will merge the feature/zuulv3 branches
of both Zuul and Nodepool into master.

If you continuously deploy Zuul or Nodepool from master, you should make
sure you are prepared for this.

The current version of the single_node_ci pattern in puppet-openstackci
should, by default, install the latest released versions of Zuul and
Nodepool.  However, if you are running Zuul continuously deployed from a
version of puppet-openstackci which is not continuously deployed, or
using some other method, you may find that your system has automatically
been upgraded if you have not taken action before the branch is merged.

Regardless of how you deploy Zuul, if you find that your system has been
upgraded, simply re-install the most current releases of Zuul and
Nodepool, either from PyPI or from a git tag.  They are:

Nodepool: 0.5.0
Zuul: 2.6.0

Note that the final version of Zuul v3 has not been released yet.  We
hope to do so soon, but until we do, our recommendation is to continue
using the current releases.

Finally, if you find this message relevant, please subscribe to the new
zuul-annou...@lists.zuul-ci.org mailing list:

http://lists.zuul-ci.org/cgi-bin/mailman/listinfo/zuul-announce

Thanks,

Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] RFC: Zuul executor congestion control

2018-01-16 Thread James E. Blair
 writes:

> Hi zuulers,
>
> the zuul-executor resource governor topic seems to be a recurring one now
> and we might want to take the step and make it a bit smarter.

To be honest, it keeps coming up because we haven't gotten around to
finishing the work already in progress on this.  We're not done with the
current approach yet, so let's not declare it a failure until we've
tried it and learned what we can.

> I think the current approach of a set of on/off governors based on the
> current conditions may not be sufficient. I thought about that and
> like to have feedback about what you think about that.
>
> TLDR; I propose having a congestion control algorithm managing a
> congestion window utilizing a slow start with a generic sensor
> interface and weighted job costs.
>
> Algorithm
> --
>
> The algorithm I propose would manage a congestion window of an
> abstract metric measured in points. This is intended to leverage some
> (simple) weighting of jobs as multi node jobs e.g. probably take more
> resources than single node jobs.
>
> The algorithm consists of two threads. One managing the congestion
> window, one accepting the jobs.
>
> Congestion window management:
>
>   1.  Start with a window of size START_WINDOW_SIZE points
>   2.  Get current used percentage of window
>   3.  Ask sensors for green/red
>   4.  If all green AND window-usage > INCREASE_WINDOW_THRESHOLD
>  *   Increase window
>   5.  If one red
>  *   Decrease window below current usage
>   6.  Loop back to step 2
>
> Job accepting:
>
>   1.  Get current used window-percentage
>   2.  If window-usage < window-size
>  *   Register function if necessary
>  *   Accept job
>  *   summarize window-usage (the job will update asynchronously when 
> finished)
>   3.  Else
>  *   Deregister function if necessary
>   4.  Loop back to step 1
>
> The magic numbers used are subject for further discussion and
> algorithm tweaking.
>
>
> Weighting of jobs
> 
> Now different jobs take different amounts of resources so we would need some 
> simple estimation about that. This could be tuned in the future. For the 
> start I’d propose something simple like this:
>
> Cost_job = 5 + 5 * size of inventory
>
> In the future this could be improved to estimate the costs based on 
> historical data of the individual jobs.

This is the part I'm most concerned about.  The current approach
involves some magic numbers, but they are only there to help approximate
what an appropriate load (or soon memory) for a host might be, and would
be straightforward for an operator to tune if necessary.

Approximating what resources a job uses is a much more difficult matter.
We could have a job which uses 0 nodes and runs for 30 seconds, or a job
that uses 10 nodes and runs for 6 hours collecting a lot of output.  We
could be running both of those at the same time.  Tuning that magic
number would not be straightforward for anyone, and may be impossible.

Automatically collecting that data would improve the accuracy, but would
also be very difficult.  Collecting the cpu usage, memory consumption,
disk usage, etc., over time and using it to predict impact on the system
is a very sophisticated task, and I'm afraid we'll spend a lot of time
on it.

> Sensors
> --
>
> Further different ways of deployment will have different needs about
> the sensors. E.g. the load and ram sensors which utilize load1 and
> memfree won’t work in a kubernetes based deployments as they assume
> the executor is located exclusively on a VM. In order to mitigate I’d
> like to have some generic sensor interface where we also could put a
> cgroups sensor into which checks resource usage according to the
> cgroup limit (which is what we need for a kubernetes hosted zuul). We
> also could put a filesystem sensor in which monitors if there is
> enough local storage. For hooking this into the algorithm I think we
> could start with a single function
>
> def isStatusOk() -> bool

This is a good idea, and I see no reason why we shouldn't go ahead and
work toward an interface like this even with the current system.  That
will make it more flexible, and if we decide to implement a more
sophisticated system like the one described here in the future, it will
be easy to incorporate this.
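
To make that concrete, a minimal sketch of what such an interface could
look like (names are illustrative, not an agreed-upon API):

    import os

    class Sensor:
        """Each sensor reports whether it is currently safe to accept work."""
        def isStatusOk(self) -> bool:
            raise NotImplementedError

    class LoadSensor(Sensor):
        def __init__(self, max_load=8.0):
            self.max_load = max_load

        def isStatusOk(self) -> bool:
            # os.getloadavg() returns the 1, 5 and 15 minute load averages.
            return os.getloadavg()[0] < self.max_load

    def accepting_work(sensors) -> bool:
        # Only register the function and accept jobs when every sensor is green.
        return all(s.isStatusOk() for s in sensors)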

> Exposing the data
> -
>
> The window-usage and window-size values could also be exported to
> statsd. This could enable autoscaling of the number of executors in
> deployments supporting that.

I agree that whatever we do, we should expose the data to statsd.

> What are your thoughts about that?

Let me lay out my goals for this:

The governor doesn't need to be perfect -- it only needs to keep the
system from becoming so overloaded that jobs which are running don't
fail due to being out of resources, or cause the kernel to kill the
process.  Ideally an operator should be able to look at a graph and see
that a significant number 

[OpenStack-Infra] New Zuul mailing lists

2018-01-16 Thread James E. Blair
Hi,

We've created two new mailing lists for Zuul.  If you run an instance of
Zuul, or are a user of Zuul, you should subscribe.  The new lists are:

zuul-annou...@lists.zuul-ci.org
---

http://lists.zuul-ci.org/cgi-bin/mailman/listinfo/zuul-announce

This will be a low-traffic, announce-only list where we post important
information about Zuul releases, or changes to the zuul-jobs repository.
If you operate or package Zuul, you should definitely subscribe to this.
If you spend a lot of time working with jobs defined in the zuul-jobs
repo, or jobs which inherit from that, it would also be a good idea to
subscribe to this list.

zuul-disc...@lists.zuul-ci.org
--

http://lists.zuul-ci.org/cgi-bin/mailman/listinfo/zuul-discuss

This is where all discussion about Zuul will take place.  Operators,
users, and developers of Zuul should all participate in this list.  If
someone has a question about how to accomplish something with Zuul, we
will answer it here.  If we need to discuss making a change to the job
syntax, this is also the place.  We're all involved in using and
developing Zuul together.

Treat this as an "upstream" for Zuul, meaning that if a discussion topic
pertains entirely to OpenStack's use of Zuul -- for instance, whether we
should make a certain change to openstack-tox-py35, or if we should add
more executors -- please use the openstack-dev or openstack-infra lists
as appropriate.

Please go ahead and subscribe now, and we will start using the new lists
soon.

Thanks,

Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Merging feature/zuulv3 into master in Zuul and Nodepool repos

2018-01-15 Thread James E. Blair
Clark Boylan  writes:

> Hello,
>
> I think we are very close to being ready to merge the zuulv3 feature
> branch into master in both the Zuul and Nodepool repos. In particular
> we merged https://review.openstack.org/#/c/523951/ which should
> prevent breakages for anyone using that deployment method
> (single_node_ci) for an all in one CI suite.
>
> One thing I've noticed is that we don't have this same handling in the
> lower level individual service manifests. For us I don't think that is
> a major issue, we'll just pin our builders to the nodepool 0.5.0 tag,
> do the merge, then update our configs and switch back to master. But
> do we have any idea if it is common for third part CI's to bypass
> single_node_ci and construct their own like we do?
>
> As for the actual merging itself a quick test locally using `git merge
> -s recursive -X theirs feature/zuulv3` on the master branch of
> nodepool appears to work. I have to delete the files that the feature
> branch deleted by hand but otherwise the merge is automated. The
> resulting tree does also pass `tox -e pep8` and `tox -epy36` testing.
>
> We will probably want a soft freeze of both Zuul and Nodepool then do
> our best to get both merged together so that we don't have to remember
> which project has merged and which hasn't. Once that is done we will
> need to repropose any open changes on the feature branch to the master
> branch, abandon the changes on the feature branch then delete the
> feature branch. Might be a good idea to merge as many feature branch
> changes as possible before hand?
>
> Am I missing anything?
>
> Thank you,
> Clark

Thanks for looking into that!

I don't think we're in a great position to merge a lot of the
outstanding changes on the feature branch -- many of them are post 3.0
things and I don't want to distract us from stabilizing and finalizing
the release.  We may just want to plan on porting a bunch over.

Let's plan on performing the merge this Thursday, Jan 18th.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Hostnames

2018-01-08 Thread James E. Blair
Clark Boylan  writes:

> On Sun, Jan 7, 2018, at 2:30 PM, David Moreau Simard wrote:
>> When I compared ze10 with ze09 today, I noticed that ze09's "hostname"
>> command returned "ze09" while ze10 had "ze10.openstack.org".
>> 
>> However, both nodes had the full fqdn when doing "hostname -f".
>> 
>> I didn't dig deeper since we're the weekend and all that but there might be
>> a clue in my experience above.
>
> I think the reason for this is that ze09 was rebuilt so the launch
> node scripts modified it setting hostname to only ze09 and not
> ze09.openstack.org. ze10 on the other hand was simply rebooted so its
> old hostname, ze10.openstack.org, stuck.

Distilling this conversation and that in IRC today:

The current software should produce consistent results:
  hostname -> ze09
  hostname --fqdn -> ze09.openstack.org

This is what we want on all machines.
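
A quick way to audit a host for that (just a sketch with the standard
library, equivalent to comparing "hostname" and "hostname -f"):

    import socket

    short = socket.gethostname()
    fqdn = socket.getfqdn()

    # Expect e.g. short == 'ze09' and fqdn == 'ze09.openstack.org'.
    if '.' in short or not fqdn.startswith(short + '.'):
        print('hostname misconfigured: %s / %s' % (short, fqdn))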

Machines launched before October 2017 were subject to a race with
cloud-init which has since been corrected.  Those may have the FQDN for
the hostname.  That explains the discrepancy observed.

The next time we stop all of Zuul, should we rename all the hosts and
then update the grafana dashboards?

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Zuul mailing lists

2018-01-08 Thread James E. Blair
Hi,

We recently completed the work needed to host mailing lists for projects
at their own domains.  With our expanded focus in Zuul v3 on users
beyond those related to the OpenStack project, now seems like a good
time to create dedicated Zuul mailing lists.

I'd like to create the following two lists to start:

zuul-annou...@lists.zuul-ci.org  --  A list for release announcements
and to disseminate information about job definition changes (including
information about the shared zuul-jobs repos).

zuul-disc...@lists.zuul-ci.org  --  A list for general discussion about
using Zuul and development work.

Note in particular that, at the moment, I'm not proposing a dev/user
mailing list split.  Much of our dev work directly impacts users and
needs broad discussion.  Likewise, we're better developers when we dive
into real user problems.  So to the extent possible, I'd like to avoid
creating a split where one isn't necessary.

Of course, if that doesn't work out, or if circumstances change, we can
add new lists as necessary.  It seems like the most conservative
approach is to create only one discussion list and add more if needed.

It's also worth noting that some of us wear multiple hats related to
OpenStack and Zuul.  It will still be reasonable for folks to have
Zuul-related discussions on this list, or openstack-dev, when they
relate entirely to OpenStack's use of Zuul.  We might discuss adding new
executors on openstack-infra, and we might promulgate a new role for
working with devstack logs on openstack-dev.  Neither of those
discussions need to happen on the new lists.  However, like any project
used heavily by another, people involved in OpenStack with a significant
interest in Zuul should subscribe to the new lists, so they can interact
with the rest of the Zuul community.

How does that sound?

Thanks,

Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Hostnames

2018-01-06 Thread James E. Blair
Hi,

It seems that every time we boot a new server, it either randomly has a
hostname of foo, or foo.openstack.org.  And maybe that changes between
the first boot and second.

The result of this is that our services which require that they know
their hostname (which is a lot, especially the complicated ones) end up
randomly working or not.  We waste time repeating the same diagnosis and
manual fix each time.

What is the cause of this, and how do we fix this correctly?

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Test message

2017-12-22 Thread James E. Blair
Hi,

This is a test message following some maintenance to the mailing list
server.  There is no need to reply.

Thanks,

Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Xenial Upgrade Sprint Recap

2017-12-18 Thread James E. Blair
Ian Wienand  writes:

> There's a bunch of stuff that wouldn't show up until live, but we
> probably could have got a lot of prep work out of the way if the
> integration tests were doing something.  I didn't realise that although
> we run the tests, most of our modules don't actually have any tests
> run ... even something very simple like "apply without failures"

Don't the apply tests do that?

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Xenial Upgrade Sprint Recap

2017-12-15 Thread James E. Blair
Clark Boylan  writes:

> Hello everyone,
>
> Just wanted to quickly recap what we got done this week during our
> control plane upgrade to Xenial sprint.

Thanks!

Now that this is over, I wonder if we should start a 'marathon' (as
opposed to a sprint) to finish the rest of the servers?

Now that we've established the pattern, perhaps if we all agreed to do
one or two servers each per week, we'd knock the rest out in good time?

I wasn't able to spend as much time dedicated to this as I would
normally have liked, but did find that due to the latency, I could fit
a few minutes here and there into my schedule fairly easily.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Nodepool drivers

2017-12-08 Thread James E. Blair
Tristan Cacqueray  writes:

>>> I also proposed a 'plugin' interface so that driver are fully contained
>>> in their namespace, which seems like another legitimate addition to this
>>> feature:
>>>  https://review.openstack.org/524620
>>
>> I think that will be a nice thing to have, but I'd like to delay it (as
>> we have for Zuul) much further into the future.  I'd like to make a
>> couple of releases where we think the internal API is stable before we
>> consider making an external API.  In the mean time, I'd like to expand
>> the set of drivers we support in-tree.
>>
> Note that this help adding test and it makes rebase easier... How about
> we keep it for the internal API without supporting out of tree drivers?

If that's the case, I think it would be fine -- as long as it's not
exposed at all (we should remove the config file setting in favor of
direct imports, for example).

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] plan for Zuul and Nodepool to support Python3.x ?

2017-12-07 Thread James E. Blair
"Apua A.Aa"  writes:

> Hi,
>
> As title, is there a plan or road map for Zuul and Nodepool to
> support/migrate to Python3.x currently?

Yes, the next versions, Zuul and Nodepool v3.0, will support python 3
only.  Note that they will be very different from the current versions
and are not backwards compatible.

We are running pre-release versions of them in OpenStack now.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Nodepool drivers

2017-12-07 Thread James E. Blair
Tristan Cacqueray  writes:

> Hi,
>
> Top posting here to raise another complication.
> James mentioned an API problem regarding the NodeRequestHandler
> interface. Indeed the run_handler method should actually be part of the
> generic code so that the driver's handler only implements the 'launch' method.
>
> Unfortunately, this is another refactor where we need to move and
> abstract a good chunk of the openstack handler... I worked on a first
> implementation that adds new handler interfaces to address the openstack
> driver needs (such as setting az when a node is reused):
>  https://review.openstack.org/526325 
>
> Well I'm not sure what's the best repartition of roles between the
> handler, the node_launcher and the provider, so feedback would be
> appreciated.

I think we can probably perform the refactor after landing the static
driver with the current design (and we don't need to do this before the
v3.0 release).  It will mean that folks can't request a static node as
part of a nodeset with dynamic nodes, but being able to just request
static nodes alone is a useful improvement.  So if we document the
caveat and indicate that we're working to lift the restriction in the
future, that should be sufficient.

> I also proposed a 'plugin' interface so that driver are fully contained
> in their namespace, which seems like another legitimate addition to this
> feature:
>  https://review.openstack.org/524620

I think that will be a nice thing to have, but I'd like to delay it (as
we have for Zuul) much further into the future.  I'd like to make a
couple of releases where we think the internal API is stable before we
consider making an external API.  In the mean time, I'd like to expand
the set of drivers we support in-tree.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Zuul roadmap

2017-12-06 Thread James E. Blair
Clint Byrum  writes:

> I know a bunch of this stuff is janky as all get out, because as much of the
> jankiness is my own fault as anybody else's. But so much work has gone into
> zuulv3 beyond what OpenStack needs, I am still not convinced we need to wait
> for any of this. Maybe the zuul-web stuff, since changing URLs after a release
> is going to be a bear.
>
> I'm confident we'll get some of these done soon, and I may even get a chance 
> to
> contribute directly. But we all know that complexity creeps into engineering 
> in
> the most frustrating ways. I'd prefer that this list gets pared down, and that
> the release comes basically at or right before PTG, even if this list doesn't
> all happen.

I understand where you're coming from, but please also understand that
folks are already starting to show up in #zuul asking us the same
questions repeatedly because of how janky it is.  We're getting really
close to a point where we're spending too much time talking about how
things are going to get a lot easier for folks in just a few weeks
instead of actually doing those things.  So let me go through the list
and either expand on why I think a thing is important for the release,
or move it down in priority.

>> * granular quota support in nodepool (tobias)
>> * zuul-web dashboard (tristanC)
>> * update private key api for zuul-web (jeblair)

These things are basically done.  I agree they don't have to block the
release, but they are so likely to land very soon, we should just plan
for that.  If they don't, we won't wait for them.

>> * github event ingestion via zuul-web (jlk)

Zuul currently has two web servers, and telling people how to set both
of them up is complicated.  This is the sort of thing that will cause
people to think that either this software is not ready to use, or it's
too complicated.

>> * abstract flag (do not run this job) (jeblair)

(I have a WIP patch for this)

We can move this to v3.1.

>> * zuul_json fixes (dmsimard)

This is a known bug that causes Zuul to fail with certain perfectly
valid uses of Ansible.  It's easy for users to hit, but it should also
be easy to fix.

>> * refactor config loading (jeblair)

Originally this task was mostly about solving the forward inheritance
problem, which is done.  At this point, I consider the task to be more
akin to double checking that we aren't missing anything major from the
job language that we can't fix in the future.

>> * protected flag (inherit only within this project) (jeblair)

(Tobias has a patch for this)

We can move this to v3.1.

>> * refactor zuul_stream and add testing (mordred)

This is important because there are still a number of cases where errors
in Ansible are not reported in the streaming log.  We need to handle
those cases, but this code has evolved quite a bit from its original
implementation to the point where it is difficult to understand, and it
has very limited testing.  This module is nearly frozen until this
refactor happens.  This means that new users (the most likely to hit the
bugs currently masked by this) are going to have a frustrating time --
they'll have to go look at executor logs to identify job failures.

Having said that, if the release came down to this alone, we could
probably delay it.  I'd like to keep this on the list and prioritize
work on it so we can get it into v3.0, but I'm okay deferring it if it's
the last thing standing.

>> * getting-started documentation (leifmadsen)

This is also really important to have for folks -- when we release 3.0
and say "okay, we've spent 2 years telling you not to use it, go use it
now" we should have some instructions to help people do that.  It's a
complicated system, and I don't want folks bouncing off of it the first
time they try it.

However, I'll repeat the caveat from the last item here -- if it's the
last thing standing, we don't have to wait for it.

>> * demonstrate openstack-infra reporting on github
(pabelanger has since volunteered for this and begun work)

This item is about ensuring that the GitHub support works at scale.
We've had a number of folks using the GitHub driver, but as soon as we
started having the openstack-infra instance of Zuul watch some busy
GitHub repos, we started seeing errors in the log.  This is an important
new feature, and I want to make sure it's ready.  This item will either
be easy because we've already fixed the major issues, or we're going to
discover serious bugs that we would deal with soon after release anyway.

>> * cross-source dependencies

This is part of the work of adding a second source.  One of Zuul's major
features is cross-repo dependencies, and when we release GitHub support
for Zuul, I think it's important that we're able to tell a story about
how Zuul can integrate all of the repositories it works with.  It is so
much more compelling.  I understand that some folks don't need this, but
for a lot of the folks interested in Zuul and awaiting the 3.0 release,
it is important.

>> * add 

Re: [OpenStack-Infra] Zuul roadmap

2017-12-01 Thread James E. Blair
Fabien Boucher  writes:

>> * finish git driver
>>
>
> If ok for you, I want to propose myself to work on that git driver topic.
> I'll try to
> provide a first patch asap.

That's great, thanks!

There's some code there already, but it has no tests and hasn't been
used in a while -- I don't know if it still works.

Some brief thoughts about it:

* We should add some tests that exercise it
* It currently only supports file:// urls; it should also support
  https://
* It should periodically poll for new updates (with a configurable
  interval, maybe default to 2 hours?)
* When a branch is updated, it should perform a diffstat to determine if
  any zuul config files were updated, and if so, emit an event for the
  scheduler to reconfigure that tenant
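
A very rough sketch of the polling and diffstat part, using plain git
commands (the config file list and the on_change callback are
placeholders, not the actual driver interface):

    import subprocess
    import time

    CONFIG_FILES = ('zuul.yaml', '.zuul.yaml')
    CONFIG_DIRS = ('zuul.d/', '.zuul.d/')

    def poll(repo_dir, on_change, interval=2 * 60 * 60):
        """Poll origin; call on_change(branch) when zuul config changes."""
        heads = {}
        while True:
            subprocess.check_call(['git', 'fetch', 'origin'], cwd=repo_dir)
            out = subprocess.check_output(
                ['git', 'for-each-ref',
                 '--format=%(refname:short) %(objectname)',
                 'refs/remotes/origin'], cwd=repo_dir, text=True)
            for line in out.splitlines():
                branch, sha = line.split()
                old, heads[branch] = heads.get(branch), sha
                if old and old != sha:
                    changed = subprocess.check_output(
                        ['git', 'diff', '--name-only', old, sha],
                        cwd=repo_dir, text=True).splitlines()
                    if any(f in CONFIG_FILES or f.startswith(CONFIG_DIRS)
                           for f in changed):
                        on_change(branch)
            time.sleep(interval)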

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Nodepool drivers

2017-12-01 Thread James E. Blair
Tristan Cacqueray  writes:

> Hi,
>
> Now that the zuulv3 release is approaching, please find below a
> follow-up on this spec.
>
> The current code could use one more patch[0] to untangle the common
> config from the openstack provider specific bits. The patch often needs
> to be manualy rebased. Since it looks like a good addition to what
> has already been merged, I think we should consider it for the release.
>
> Then it seems like new drivers are listed as 'future work' on the
> zuul roadmap board, though they are still up for review[1].
> They are fairly self contained and they don't require further
> zuul or nodepool modification, thus they could be easily part of a
> future release indeed.
>
> However I think we should re-evaluate them for the release one more
> time since they enable using zuul without an OpenStack cloud.
> Anyway I remain available to do the legwork.
>
> Regards,
> -Tristan
>
> [0]: https://review.openstack.org/488384
> [1]: https://review.openstack.org/468624

I think getting the static driver in to the 3.0 release is reasonable --
most of the work is done, and I think it will make simple or test
deployments of Zuul much easier.  That can make for a better experience
for users trying out Zuul.

I'd support moving that to the 3.0 roadmap, but reserving further
drivers for later work.  Thanks!

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] [zuul] third-party CI for zuul-jobs

2017-11-28 Thread James E. Blair
Jens Harbott  writes:

> 2017-11-23 5:28 GMT+00:00 Tristan Cacqueray :
> ...
>> TL;DR; Is it alright if we re-enable this CI and report those tests on
>>   zuul-jobs patchsets?
>
> I like the general idea, but please wait for more feedback until doing so.

I am in favor of the idea in general, thanks!

> Also, IMHO it would be better if you could change the "recheck-sf"
> trigger to something that does not also rerun upstream checks. What
> seems to work well for other projects is "run ci-name", where ci-name
> is the name of the Gerrit account.

Actually, I'd prefer that we do the opposite.  I'd like the recheck
command for both to just be "recheck".  There's no harm in both systems
re-running tests for a change in this case, and it keeps things simpler
for developers.  The requirements in
https://docs.openstack.org/infra/system-config/third_party.html#requirements
state that all systems should honor "recheck".  I'd like to go beyond
that in zuul-jobs and say that third-party ci systems on that repo
should *only* honor "recheck".

In the meeting today we agreed that we should start by reporting without
voting, gain some confidence, then enable +1/-1 voting.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Merging Zuul v3 into master

2017-11-27 Thread James E. Blair
Hi,

A while back we created a feature branch in git for Zuul and Nodepool v3
(feature/zuulv3).  Now that OpenStack-Infra is running it, and all of
our development focus is on it, we are unlikely to merge any more
changes to Zuul v2 or issue another Zuul v2 release.  In fact, we are
approaching the point where we can make our initial Zuul and Nodepool
v3.0 releases.

We would like to go ahead and merge feature/zuulv3 into master.  That
will effectively mean the current contents of the feature/zuulv3 branch
will appear in master.

There is no automatic upgrade path, and backwards compatibility is not
supported, so folks running CD from master should take steps to ensure
their installations are not affected.

In particular, we need to sort out a migration strategy for
puppet-openstackci.  We probably need to add a switch to that module
that will install the latest release of Zuul v2 from a tag if Zuul v2 is
selected, and otherwise install Zuul v3 from master if not.

Here are actions you can take:

* If you continuously deploy Zuul from master via some method other than
  puppet-openstackci, please take action soon to mitigate this change,
  and install from the latest release until you are ready to upgrade to
  Zuul v3.

* If you are able to contribute to making the necessary changes to
  puppet-openstackci, please reply here to volunteer to do so.

* If you think of other impacts to this change that we should mitigate,
  please let us know.

* If you have outstanding changes proposed to Zuul or Nodepool master
  branches, please take a moment and determine whether they are still
  relevant in Zuul v3.

* The same applies to stories in Storyboard as well.

We will send another announcement once we have scheduled the actual
merge.

Thanks,

Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Zuul roadmap

2017-11-01 Thread James E. Blair
Hi,

At the PTG we brainstormed a road map for Zuul once we completed the
infra cutover.  I think we're in a position now that we can get back to
thinking about this, so I've (slightly) cleaned it up and organized it
here.

I've grouped into a number of sections.  First:

Very Near Term
--

These are things that we should be able to land within just a few weeks
at most, once we're back from the OpenStack summit and can pay more
attention to work other than the openstack-infra migration.  All of
these are already in progress (some are basically finished) and all have
a primary driver assigned:

* granular quota support in nodepool (tobias)
* zuul-web dashboard (tristanC)
* update private key api for zuul-web (jeblair)
* github event ingestion via zuul-web (jlk)
* abstract flag (do not run this job) (jeblair)
* zuul_json fixes (dmsimard)

Short Term
--

These are things we should be able to do within the weeks or months
following.  Some have had work start on them already and have a driver
assigned, others are still up for grabs.  These are things we really
ought to get done before the v3.0 release because either they involve
some of the defining features of v3, make it possible to actually deploy
and run v3, or may involve significant changes for which we don't want
to have to deal with backwards compatibility.

* refactor config loading (jeblair)
* protected flag (inherit only within this project) (jeblair)
* refactor zuul_stream and add testing (mordred)
* getting-started documentation (leifmadsen)
* demonstrate openstack-infra reporting on github
* cross-source dependencies
* add command socket to scheduler and merger for consistent start/stop
* finish git driver
* standardize javascript tooling

-- v3.0 release 

Yay!  After we release...

Medium Term
---

Once the initial v3 release is out the door, there are some things that
we have been planning on for a while and should work on to improve the
v3 story.  These should be straightforward to implement, but these don't
need to hold up the release and can easily fit into v3.1.

* add line comment support to reporters
* gerrit ci reporting (2.14)
* add cleanup jobs (jobs that always run even if parents fail)
* automatic job doc generation

Long Term / Design
--

Some of these are items that we should discuss a bit further before
implementing; most of them probably warrant a proposal in infra-specs
so we can flesh out the design before we start work.

* gerrit ingestion via separate process?
* per-job artifact location
* need way for admin to trigger a single job (not just a buildset)
* nodepool backends
* nodepool label access (tenant/project label restrictions?)
* nodepool tenant awareness?
* nodepool rest api alignment?
* selinux domains
* fedmesg driver (trigger/reporter)
* mqtt driver (trigger/reporter)
* nodepool status ui?

How does this look?

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Zuul v3 questions

2017-10-30 Thread James E. Blair
Rikimaru Honjo  writes:

> I confirmed the below PPA, but my version is the latest.
>
> https://launchpad.net/~ansible/+archive/ubuntu/bubblewrap
>
> Should I use ubuntu higher than 16.04...?

We are running on 16.04.  But it looks like this is the PPA we're using:

deb http://ppa.launchpad.net/openstack-ci-core/bubblewrap/ubuntu xenial main

Hopefully there will be a backport soon and we can stop using it.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Some branch issues with Zuul

2017-10-27 Thread James E. Blair
cor...@inaugust.com (James E. Blair) writes:

 ...
> The Changes
> ===
>
> I believe the following changes will address all five problems and
> achieve both design goals:
>
> a) Apply inheritance at the same time as variance
>
> Rather than applying inheritance at configuration time, apply it at the
> time the job is frozen before being run.  We can perform a depth-first
> traversal up the hierarchy of parents, applying all of the matching
> variants at each level as we return down.  With the following graph
> (where lines indicate parent relationships):
>
>               0 base
>              /      \
>    1 devstack        2 devstack        4 altbase
>              \      /                      |
>             3 tempest                  5 tempest
>                 /  \
>            6 foo    7 foo
>
> Would have the following jobs applied in order:
>
> 0 base, 1 devstack, 2 devstack, 3 tempest, 4 altbase,
> 5 tempest, 6 foo, 7 foo
>
> b) Add an implicit branch matcher to the master branch
>
> Generally this will add clarity to projects with multiple branches,
> however, if we always add an implicit branch matcher, then it makes
> it difficult to use repos like zuul-jobs to define jobs that run
> everywhere.  So do this only if the project has multiple branches.  If
> the project only has a single branch, omit the implicit branch matcher.
>
> c) Add a config option to disable the implicit branch matcher
>
> There are some times when an implicit branch matcher on master may be
> undesirable.  For example when tempest was becoming branchless, it had
> multiple branches, but we would have wanted the jobs defined on master
> to be applicable everywhere.  Or if, for some reason, we wanted to
> create a feature branch on zuul-jobs.  For these cases, it's necessary
> to have an option to disable the implicit branch matcher.  We can add a
> new kind of configuration object to zuul.yaml, for example:
>
>   - meta:
>   implicit-branch-matcher: False
>
> Which would be intended only to apply to the current branch.  Or we
> could add an option to the tenant config file, so the admin can indicate
> that a certain project should not have an implicit branch matcher
> applied to certain branches.
>
...
>
> (d) Remove the implicit run playbook
>
> This is not required to solve any of the problems stated, however, it
> does make the solution to problem (4) even more explicit.
>
> Moreover, despite being an initial proponent of the implicit playbook, I
> have found that in practice, we have so many jobs that do not have
> playbooks at all (i.e., we're making heavy use of inheritance and
> variance) that it is becoming difficult to determine where to look for a
> job's run playbook.  Declaring the run playbook explicitly will help
> with discoverability.

These changes have been implemented and merged into the feature/zuulv3
branch.

I also included a change to accept file extensions on playbook paths,
with the intention that we will deprecate the option to omit extensions.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Zuul v3 questions (was: Re: [openstack-dev] Update on Zuul v3 Migration - and what to do about issues)

2017-10-27 Thread James E. Blair
Rikimaru Honjo  writes:

> Hello,
>
> (Can I still use this thread?)

In the future, you may want to start a new thread on
openstack-infra@lists.openstack.org for general Zuul questions.

I've changed the CC list and subject of this message to redirect the
conversation there.

> Excuse me, I'm trying to run Zuul v3 in my environment, and I have three
> question about it.
> I'd appreciate it if anyone helps.
>
> My environment)
> I use feature/zuulv3 branch, and version is 2.5.3.dev1374.

We have not released Zuul v3 yet and we don't recommend that folks use
it yet unless they want to contribute to developing it.  Installation
and configuration is currently more difficult than we would like, and
the code base is still rapidly changing.  We will send out announcements
when it is ready for general use (including OpenStack third-party CI).
This may or may not apply to you, but I wanted to reiterate it for
anyone else reading.  Thanks for trying it out.  :)

> Q1)
> "Unknown option --die-with-parent" error was occurred when zuul ran job.
> Is there requirement of bubblewrap version?
>
> I used bubblewrap 0.1.7-1~16.04~ansible.
> If I removed "--die-with-parent" from zuul/driver/bubblewrap/__init__.py,
> above error wouldn't occurred.

You will need a newer version of bubblewrap.  Attempting to run with an
older one will cause Zuul not to behave as expected.  I believe
OpenStack-infra uses a PPA with a more recent version.

> Q2)
> When I specified "zuul_return" in playbook, the below error was occurred
> on remote host.
>
> KeyError: 'ZUUL_JOBDIR'
>
> Should I write a playbook to set a environment variable "ZUUL_JOBDIR"?

I believe that zuul_return is only expected to work on the executor, so
you may need to delegate this play to 'localhost' to ensure it does not
run on the remote node.

> Q3)
> Setup module of ansible took long time when zuul ran jobs.
> My job was succeeded if I extended timeout from 60 to 120 by modifying
> runAnsibleSetup() in zuul/executor/server.py.
>
> But, if I run same job directly(by own), it was finished soon.
> Do you have any knowledge about it?

I'm not sure about this.  It might be related to the persistent SSH
connections which are constructed by the setup task and then used by
later playbook invocations.  I'd start by correcting the bubblewrap
issue and see if this changes.

> P.S.
> Is there a constructed VM image or ansible for running zuul v3...?

Not yet, but we hope to have something like that before release.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Some branch issues with Zuul

2017-10-25 Thread James E. Blair
Doug Hellmann  writes:

> In the discussion yesterday, and in the emails today, you've implied
> that there is an ordering to job definitions beyond inheritance. Is that
> discovery order documented somewhere? If not, is it simple enough to
> describe in a few sentences here? Are repositories scanned in a
> particular order, for example? Or is it based on something else?

There's some discussion of it here:

https://docs.openstack.org/infra/zuul/feature/zuulv3/user/config.html#job

  When Zuul decides to run a job, it performs a process known as
  freezing the job. Because any number of job variants may be
  applicable, Zuul collects all of the matching variants and applies
  them in the order they appeared in the configuration. The resulting
  frozen job is built from attributes gathered from all of the matching
  variants. In this way, exactly what is run is dependent on the
  pipeline, project, branch, and content of the item.

Because top-level job variants may only be defined in the same project
(so that one project may not alter the jobs defined by another project),
the order that the repos are loaded doesn't matter for this; only the
order that branches are loaded within a repo.  That's not specified by
the documentation, though it is currently 'master' followed by others in
alphabetical order.

The proposed changes would reduce the importance of that, since master
will have an implied branch matcher, meaning that by default, jobs in
master won't have an effect on other branches.  I'd probably still leave
the order the same though, in case someone wanted to override that
behavior.

In practice, and especially with the proposed change to have an implied
branch matcher on master, the ordering aspect is most likely to be
visible when a user adds several variants of a job in the same file.
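
For instance, as a toy illustration of the freezing step (plain dicts
rather than Zuul's real data model):

    def freeze_job(variants, change):
        """Apply every matching variant, in configuration order, to build
        the frozen job; later matching variants override earlier ones."""
        frozen = {}
        for variant in variants:
            branches = variant.get('branches')
            if branches is None or change['branch'] in branches:
                frozen.update(variant.get('vars', {}))
        return frozen

    variants = [
        {'vars': {'nodeset': 'ubuntu-xenial'}},                     # master
        {'branches': ['stable/pike'],
         'vars': {'nodeset': 'ubuntu-trusty'}},                     # variant
    ]
    print(freeze_job(variants, {'branch': 'stable/pike'}))
    # {'nodeset': 'ubuntu-trusty'} -- both variants matched; the later wins.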

The order that the repos themselves are loaded is important, however, in
inheritance.  They are loaded in a defined order (the order they appear
in the main.yaml tenant configuration file), and currently, a job may
only inherit from another job which has already been defined.  So we
manually put more "general-use" projects (e.g., devstack, tempest)
earlier in the config.  I would characterize this as an oversight, and
was planning on fixing it soon regardless, however, the proposal to
perform late-binding inheritance will solve it as well (since the
inheritance path would be determined when the job is run, well after all
the configuration is loaded).

There's some more discussion of the repo loading order here:

  
https://docs.openstack.org/infra/zuul/feature/zuulv3/admin/tenants.html#attr-tenant

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Some branch issues with Zuul

2017-10-25 Thread James E. Blair
Andrea Frittoli  writes:

> We will need somewhere in the logs a description of the traversal that was
> done to build the final job. I believe that would help debugging issues that
> may arise from unexpected inheritance behaviour.
>
> Andrea Frittoli (andreaf)

Yes!  We have that to some degree today, and it's been very helpful
lately. :)

Example:

  
http://logs.openstack.org/33/509233/7/check/tox-linters/23b39f2/zuul-info/inventory.yaml

  _inheritance_path:
  - 'inherit from '
  - 'inherit from '
  - 'inherit from '
  - 'self '
  - 'apply variant '

The var name has an underscore because it's undocumented and (very!)
likely to change.  It's basically just a some debug lines rendered as a
list at the moment.  Once it settles down, I plan on turning it into a
proper data structure and documenting it.  Then it may be appropriate to
output it in the console log.

In the late-binding proposal, the distinction between inheritance and
variance is largely lost, and it essentially just turns into a list of
'apply variant' lines.  I think even that would still be enough;
however, I may be able to improve the output slightly and get a bit more
context back.  I'll look at that when I make the final version of the
patch.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Ignoring pep8 E741

2017-10-24 Thread James E. Blair
Hi,

I discussed this briefly with some folks in IRC and received support,
but I thought it wise to bring to the mailing list.

I think we should add E741 to the list of pep8 errors that we ignore as
a matter of course in infra projects.

This is a recently added check which forbids the use of variables
named "l", "I", or "O" (names that are easily confused with the digits
1 and 0, or with each other).

The same upgrade also brought E722, which rejects bare "except:"
clauses.  There is a good reason to do so -- the KeyboardInterrupt
exception does not inherit from the "Exception" class, and you almost
never want to catch it.  So all such instances should be replaced
with "except Exception:".  I think we should simply fix these errors.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Some guidance on job variants

2017-10-24 Thread James E. Blair
At the infra meeting today, we discussed how to handle job variants.  I
will try to summarize the discussion and extrapolate some things.

The Zuul v3 migration doc in the infra manual is very clear that some
project-templates should only be added in project-config, rather than
in-repo[1].

However, there are some edge cases worth considering:

First: projects may want to alter the behavior of those jobs (e.g., to
run unit tests on an older node type).  In these cases projects should
leave the template in project-config, and just alter the job by adding a
project-pipeline variant in-repo.
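
As a sketch of what that in-repo variant might look like (the job name
and nodeset here are only illustrative):

    - project:
        check:
          jobs:
            - openstack-tox-py27:
                nodeset: ubuntu-trusty

The project-template stays in project-config; the stanza above only
layers the older node type on top of the job it provides.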

Theoretically, projects could use this to make their py27 jobs
non-voting.  That would be a TC policy violation, but this system is not
designed to enforce that policy, only facilitate it.

Second: projects may want to cause those jobs not to run in some
circumstances.

Zuul's configuration for job variants is additive.  Any job variants
which match a change will be cumulatively applied to the final job
configuration before it runs.  However, once any job or variant for a
project-pipeline matches a change, that job will run.  There is not a
way to have one variant match a change, and then have a second also
match it and somehow cause the job not to run.  Variants which don't
match the change simply don't add their own attributes to the job.

This means that if a project wants to alter the files or
irrelevant-files list for a job covered by one of those templates, or
avoid running
a job on a particular branch, there is no way to do that if the template
is applied in project-config.

In these cases, I think the following policy should apply:

1) If you can use the project-template as-is, then it should be applied
to the project in project-config.

2) If you want to improve the files or irrelevant-files matchers on the
template in such a way that they can apply safely to everyone using the
template, please do so.

3) If your project needs a job variant that is incompatible with the
template, then remove it from project-config and add the individual jobs
to the project in-repo.  The project is still responsible for adhering
to the PTI.

As you can see, these aren't hard and fast rules; this is more of an
attempt to gain clarity and be able to make helpful and consistent
suggestions to folks about how to configure jobs in certain ways.

-Jim

[1] https://docs.openstack.org/infra/manual/zuulv3.html#what-not-to-convert

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Some branch issues with Zuul

2017-10-24 Thread James E. Blair
Hi,

A number of issues related to how jobs are defined and run on projects
with stable branches have come up recently.  I believe they are all
related, and they, as well as the solutions to them, must be considered
together.


The Problems
============

I've identified the following five problems:

1) Parents should have variants applied

Currently inheritance and variance are distinct.  Inheritance modifies a
job's configuration statically at the time the configuration is loaded.
Variants are applied dynamically right before the job runs.

That means that when a job starts, we pick a single job to start with,
then apply variants of that particular job.  But since the inheritance
has already been applied, any variants which a user may expect to apply
to a parent job will not be applied.  When the inheritance chain is
created at configuration time, only the "reference" definition of the
job is used -- that is, the first appearance of the job.

In other words, more than likely, most jobs defined in a stable branch
are going to inherit from a parent job defined on the master branch --
even if that parent job has a stable branch variant.

2) Variants may cause parents to be ignored

We currently ignore the parent attribute on a job variant.  If we did
not, then when the variant is applied, it would pull in all of the
attributes of its parent, which is likely to be on the master branch
(since its parent will, as in #1, be the reference definition).

This ignoring of the parent attribute happens at configuration time
(naturally, since that is when inheritance is applied).

This means that if the first job that matches a change is a variant
(i.e., the reference definition of a job has a non-matching branch
matcher), this top-level variant job will not actually have a parent.

3) Variants may add duplicate pre/post playbooks

Currently, the master branch does not have an implied branch matcher, so
jobs that exist on master and stable branches will generally be derived
from both.

If such a job adds a pre or post playbook, the job that is ultimately
created and run for a change on the stable branch may have those added
by both the variant defined on the master branch as well as that defined
on the stable branch (since pre and post playbooks are cumulative).

4) Variants on branches without explicit playbooks should use branch
   playbooks

Whenever a job includes a pre-run, run (including the implicit run), or
post-run playbook, Zuul remembers where and uses that branch of that
repo to run that playbook.  If a job were constructed from the master
branch, and then had a stable branch variant applied but did not repeat
the pre-run, run, or post-run attributes from the master, then Zuul
would end up attempting to run the playbook from the master branch
rather than the stable.

5) The master branch should have implied branch matchers

Currently jobs defined in an untrusted project on any branch other than
'master' have an implicit branch matcher applied to them.  This is what
allows the version of a job in a stable branch to only affect the stable
branch.  The fact that there is no implicit branch matcher applied to
the master branch is what allows jobs defined in zuul-jobs to run on
changes to any branch.

However, this also means that jobs on stable branches are frequently
built from variants on both the master and stable branch.  This may work
for a short while, but will fail as soon as someone wants to add
something to the master branch which should not exist on the stable
branch (e.g., enabling a new service by default).
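
A sketch of the issue (the job name and variable are illustrative):
suppose a project's master branch defines

    - job:
        name: example-integration
        vars:
          enable_new_service: true

and its stable/pike branch carries an older copy without that variable.
The stable/pike copy is implicitly treated as if it had "branches:
^stable/pike$", but the master copy has no implied matcher, so its
enable_new_service setting is also applied to stable/pike changes.
Giving master an implied matcher as well would keep the two branches
independent.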


The Design Considerations
=========================

In looking at these, I believe they have come about because of two
design goals which we did not appreciate were in moderate tension with
each other:

A) In-repo configuration for a project should apply to that branch.
Changing the behavior of a job on a stable branch should merely involve
changing the configuration or playbook in that stable branch.  When a
project branches master to create a new stable branch, both branches
should initially have the same content and behavior, but then evolve
independently.

B) We should be able to define jobs not only in a central repository
such as project-config, but a central *untrusted* repository such as
zuul-jobs or openstack-zuul-jobs.

I think these are valid and important to keep in mind as we consider
solutions.


The Changes
===========

I believe the following changes will address all five problems and
achieve both design goals:

a) Apply inheritance at the same time as variance

Rather than applying inheritance at configuration time, apply it at the
time the job is frozen before being run.  We can perform a depth-first
traversal up the hierarchy of parents, applying all of the matching
variants at each level as we return down.  With the following graph
(where lines indicate parent relationships):

0 base
 /\
1 devstack  2 devstack  4 altbase
 \/ |
   

[OpenStack-Infra] Zuul v3 tag checkouts

2017-10-24 Thread James E. Blair
Hi,

It turns out that there are some jobs in OpenStack's Zuul that were
relying on a behavior in devstack-gate and zuul-cloner that would check
out a tag when given an override branch.

Zuul v3 is a bit more literal -- the 'override-branch' option (for both
jobs and projects) will only checkout a branch, and if it does not
exist, it falls back in the usual manner.

To support this case we could:

1) Extend 'override-branch' to support tags as well as branches.

2) Add 'override-tag' to support tags.

3) Add 'override-ref' to support branches + tags.

We can of course do both 2 and 3 as well.

4) Add 'override-checkout' to support branches + tags.

If we do 3 (only), or 4, we might choose to drop 'override-branch'.

If we keep override-branch, we would need to establish whether more than
one of these options is permitted, and if so, what the order of
precedence is.  And if we declare them exclusive, what to do in the case
of inheritance.

I'm inclined to do the following:

Add 'override-checkout' to support branches + tags and drop (after a
deprecation period) 'override-branch'.  It's very clear what it does,
and it relates to git terminology.
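
A sketch of how that might look in a job definition (the syntax would be
settled in review; project and ref names here are illustrative):

    - job:
        name: example-release-validation
        override-checkout: stable/pike
        required-projects:
          - name: openstack/nova
            override-checkout: 16.0.0

Here the job as a whole would check out stable/pike where it exists,
while openstack/nova would be checked out at the 16.0.0 tag.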

Are there any other use cases (either in OpenStack or from other users)
that we should consider that might suggest we should choose another
solution?  And are there any other suggestions for how we might handle
this?

Thanks,

Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Nominating new project-config and zuul job cores

2017-10-17 Thread James E. Blair
Clark Boylan  writes:

> Hello everyone,
>
> I'd like to nominate a few people to be core on our job related config
> repos. Dmsimard, mnaser, and jlk have been doing some great reviews
> particularly around the Zuul v3 transition. In recognition of this work
> I propose that we give them even more responsibility and make them all
> cores on project-config, openstack-zuul-jobs, and zuul-jobs.

Yes, that sounds great!  They've been a big help and totally grok stuff.

Thanks to all!

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Upcoming backwards incompatible change on Zuul master

2017-09-27 Thread James E. Blair
cor...@inaugust.com (James E. Blair) writes:

> Hi,
>
> We need to merge a backwards incompatible change to Zuul master.
>
> The change is: https://review.openstack.org/482856 and it makes Gerrit
> label entries case sensitive.  Unfortunately, some combinations of
> Gerrit versions and underlying database configurations make this both
> necessary and difficult to handle seamlessly.
>
> This will affect both installations continuously delivered from git
> master, as well as those that are upgraded to the latest releases.
>
> The complexity of this situation leaves us few options other than to
> make this change and minimize the impact by isolating it and providing
> an upgrade plan.
>
> The upgrade plan is the same regardless of whether you run Zuul
> continuously deployed from master or releases.
>
> Upgrade Procedure
> -----------------
>
> The latest release of Zuul, as of this writing, is 2.5.2.  It treats all
> Gerrit labels as case insensitive, however, if a label is capitalized in
> Zuul's layout configuration with this version, typical gate pipelines
> may not function correctly.  Therefore:
>
> 1) Prepare, *but do not merge*, a patch to change the case of all Gerrit
> labels in layout.yaml.  Typically, this would mean changing instances of
> "verified:" to "Verified:" or "workflow:" to "Workflow:".
>
> 2) Next Tuesday, September 26, we will merge
> https://review.openstack.org/482856 and release version 2.6.0, which
> switches to the case-sensitive behavior (and contains no other
> substantive changes).  Once this change is merged to the master branch
> and the new version of Zuul is released, prepare to
> upgrade.
>
> 3) Merge the change prepared in step 1, then upgrade to 2.6.0 (or the
> master branch tip) immediately afterward and restart Zuul.

We have merged the change referenced above, and released Zuul 2.6.0.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Zuul feature/zuulv3 branch to be rewound

2017-09-21 Thread James E. Blair
Darragh Bailey  writes:

>> The abandon/restore steps are required by Gerrit in order to delete the
>> branch.  We could force-push the branch tip, but this is the procedure
>> we have asked and would ask any other project to use in a similar
>> situation, in order to reduce the risk of error.
>
> Can you elaborate more on how this reduces risk/errors? Curious in case we
> run into a similar scenario in the future.

Deleting/creating branches entails granting fewer permissions in Gerrit,
so there is less opportunity to accidentally commit an error such as
pushing to the wrong branch, or accidentally pushing local tags, etc.
Creating a branch through the Gerrit web UI is also a very deliberate
step, arguably easier to verify than a push.

It's not a lot of extra safety.  Just some.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Zuul feature/zuulv3 branch to be rewound

2017-09-21 Thread James E. Blair
Hi,

A change recently landed on the feature/zuulv3 branch of Zuul with a
Sem-Ver commit message footer.  This is used by PBR to alter the way it
constructs version numbers.  It's pretty nifty, but in my opinion, it
has a fundamental flaw: it can't be undone.

I think use of this by a project should be very carefully considered,
and that hasn't happened in the case of Zuul.  Meanwhile, I think that
during a development phase, in order to feel comfortable merging any
change, we need to know that we can revert it if we make a mistake.
That isn't possible with Sem-Ver footers -- they will always be parsed
by PBR once they exist in the commit history.

To correct this situation, the commit with the Sem-Ver footer needs to
be removed from the branch.  To accomplish this, I will do the following
within the next hour or so:

1) Abandon all open changes on feature/zuulv3.
2) Delete the feature/zuulv3 branch.
3) Re-create the feature/zuulv3 branch at the commit before the Sem-Ver
   change: 027ba992595d23e920a9cf84f67c87959a4b2a13.
4) Restore all the changes abandoned in step 1.

The abandon/restore steps are required by Gerrit in order to delete the
branch.  We could force-push the branch tip, but this is the procedure
we have asked and would ask any other project to use in a similar
situation, in order to reduce the risk of error.

After this is complete, if you have updated your copy of the
feature/zuulv3 branch within the past day, you will probably not be able
to fast-forward any more.  You will need to run "git reset --hard
origin/feature/zuulv3" on your local feature/zuulv3 branch to correct
the situation.  Anyone deploying continuously from the branch tip may
need to perform similar repairs.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Moving docs-draft to logs.o.o

2017-09-20 Thread James E. Blair
Hi,

We originally created the docs-draft site (and filesystem partition)
because doc builds were *big* and we had to expire them much more
quickly than build logs.  The expiration times were 21 days for
docs-draft and 6 months for build logs.

The tables have turned.

Docs-draft expiration is still 21 days, but build logs have gotten so
large we've reduced expiration there to 30 days.

Since they are more closely matched now, we can fold docs-draft back
into the logs volume and gain several benefits.  We're currently using
299G of space for docs-draft.  Accounting for the extra 9 days of
retention would bring us to about 427G of space.  Our current headroom
on logs is 3.0T, so we can handle that change immediately with no ill
effect.
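
(For reference, the 427G figure assumes usage scales linearly with
retention time: 299G x 30 days / 21 days ~= 427G.)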

Once the existing retention period has expired, we can reclaim the space
from the docs-draft volume and gain an additional 1.5T of capacity in
logs.  Once that is done, accounting for the increased use from
docs-draft jobs, we'll have an overall net gain of 1T of free space.

On top of all that, we will improve the UX of the docs-draft system by
making it easier to switch to the build logs on a successful build.
Currently we have to edit the hostname in the URL to do so; but in the
future, we can simply remove trailing path components.

Assuming there aren't any objections, we can effect this change during
the v3 transition, then perform the volume reclamation 3 weeks later.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] [zuul] v3 UI enquiry

2017-09-19 Thread James E. Blair
Darragh Bailey <daragh.bai...@gmail.com> writes:

> On 6 September 2017 at 01:26, James E. Blair <cor...@inaugust.com> wrote:
>
>> Darragh Bailey <daragh.bai...@gmail.com> writes:
>>
>> > Hi,
>> >
>> > Currently the main issue from end users, is the user experience around
>> the
>> > UI when looking at job results once the run is complete, and to a lesser
>> > extend looking at your jobs in the zuul status page when running.
>> >
>> > I'm wondering if there is a plan to have a full UI covering both
>> historical
>> > job runs for projects as well as live status reporting?
>>
>> Yes, Tristan has started work on that, and we plan to continue it after
>> we migrate openstack to Zuul v3.
>>
>
> Is there a link to where this is being tracked/discussed?

It's not a priority until after the OpenStack transition, so it's not
well tracked.  But there are some work in progress patches in the review
queue.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Upcoming backwards incompatible change on Zuul master

2017-09-18 Thread James E. Blair
Hi,

We need to merge a backwards incompatible change to Zuul master.

The change is: https://review.openstack.org/482856 and it makes Gerrit
label entries case sensitive.  Unfortunately, some combinations of
Gerrit versions and underlying database configurations make this both
necessary and difficult to handle seamlessly.

This will affect both installations continuously delivered from git
master, as well as those that are upgraded to the latest releases.

The complexity of this situation leaves us few options other than to
make this change and minimize the impact by isolating it and providing
an upgrade plan.

The upgrade plan is the same regardless of whether you run Zuul
continuously deployed from master or releases.

Upgrade Procedure
-----------------

The latest release of Zuul, as of this writing, is 2.5.2.  It treats all
Gerrit labels as case insensitive, however, if a label is capitalized in
Zuul's layout configuration with this version, typical gate pipelines
may not function correctly.  Therefore:

1) Prepare, *but do not merge*, a patch to change the case of all Gerrit
labels in layout.yaml.  Typically, this would mean changing instances of
"verified:" to "Verified:" or "workflow:" to "Workflow:".

2) Next Tuesday, September 26, we will merge
https://review.openstack.org/482856 and release version 2.6.0, which
switches to the case-sensitive behavior (and contains no other
substantive changes).  Once this change is merged to the master branch
and the new version of Zuul is released, prepare to
upgrade.

3) Merge the change prepared in step 1, then upgrade to 2.6.0 (or the
master branch tip) immediately afterward and restart Zuul.

Thanks,

Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] [zuul] v3 UI enquiry

2017-09-05 Thread James E. Blair
Darragh Bailey  writes:

> Hi,
>
> Currently the main issue from end users, is the user experience around the
> UI when looking at job results once the run is complete, and to a lesser
> extend looking at your jobs in the zuul status page when running.
>
> I'm wondering if there is a plan to have a full UI covering both historical
> job runs for projects as well as live status reporting?

Yes, Tristan has started work on that, and we plan to continue it after
we migrate openstack to Zuul v3.

> I've noticed there is an SQL reporter, should that be leveraged to provide
> access to the sort of information that needs to be tracked? Or should
> something else be used such as simple files to identify success/failure?

Yes, we plan on using it to supply the historical data.

> 1) Project status view:
>
> The main status page can be useful for those supporting zuul in an
> environment to have a quick overview of everything, however feedback we've
> received locally suggests most developers are frequently only interested in
> a single project and find this to be overwhelming This suggests there
> should be a per-project status view, showing only the pipelines and changes
> of interest for a single project, along with any CRD that are linked.
>
> Possibly under an endpoint of "//" as the default view showing any
> pipelines relevant to the project along with any changes going through them.

That sounds reasonable once we have the new web framework in place.

> 2) Project build history view:
>
> Returning links at the end of a run in a comment to the raw log files as
> the only build history is somewhat jarring for people used to the
> Travis/Drone/Circle CI services available, there's the expectation that it
> should be easily to see the previous runs, seeing whether they passed,
> which jobs within a set failed, and to display the detailed log when
> desired. Providing a reasonable UI is becoming a pre-requisite for
> adoption, no matter how good the functionality, it can be difficult to get
> acceptance if the "form" is deemed limiting (non all follow this, but it
> does seem that some of the most vocal objectors can fall into this area).

I'm a little confused by this one.  I agree they should be available,
but I think they already are.  Zuul currently provides (in a comment, as
you say) the overall pass/fail status of the build set, the pass/fail
status of individual jobs, that same information for previous runs on
the change, and links to the detailed logs.  In Github, the presentation
of that available to us is somewhat limited, but I think in Zuul v3
we're approaching a reasonable compromise.  Note this may be
significantly different than what you are running.

If you're suggesting that there should *also* be a way to find that
information from a main Zuul web page, yes, I think that would be
covered by the previous point.

I would like to point out though that Zuul's primary user interface
always has been, and will continue to be, the code review system.
That's on purpose.  The recognition that a developer should have all the
information about a change, including test results, at their fingertips
is one of the earliest innovations in Zuul; only now are facilities
appearing in Github *and* Gerrit to support this the way we've always
wanted.  The web interface will be a complement, not a replacement, for
this.  That doesn't mean the on-change reporting is frozen, we will
happily continue to improve the UX there (especially with the new tools
available to us).

> 3) Change view
>
> Users of tools such as GitHub are expecting a link to appear on the PR
> status checks that can then be followed to see the state of the job(s)
> currently executing. Currently this can be enabled through setting of a
> config option "status_url_with_change" in the zuul section. This can also
> be used with Gerrit making it easy to quickly see a single change.

That config option does not appear in my tree.  However, I believe the
Github support in Zuul v3 already supports what you describe.

> Are these of any interest?
>
> I'm no UI expert, so I'm just identifying some of the pieces that we hear
> feedback over and hoping that if I can help get the necessary information
> exposed and appearing at the right places someone else might know how to
> make it look nice..

Thanks for the use cases.  I think what you're looking for is not
radically different than the rest of the team.  It may be helpful for
you to start testing Zuul v3, as I think some of the areas you're
finding rough edges are different there.  Once you do so, I'm sure I and
others will be happy to help work out any remaining issues.

Expect effort on the historical web interface and rest API to begin
after OpenStack migrates to Zuul v3.  It'd be great if you can pitch in
on that with either reviews or code once we get going.  None of us
describe ourselves as javascript developers, so we aim to keep that area
pretty accessible.

-Jim


Re: [OpenStack-Infra] Plan for devstack Zuul v3 jobs

2017-08-25 Thread James E. Blair
David Moreau Simard  writes:

> I'm OK with this.
> Should we get to a point where the scripts can be used by both Zuul v2
> and Zuul v3 simultaneously so that there is no "migration" or "cutover" so
> to speak ?
>
> It might mean some amount of work but it shouldn't be too bad, I think ?
> Ansible (Zuul v3) will potentially end up running Ansible but we've seen
> worse and it'll be temporary.
>
> It would allow for a smooth transition because we would just essentially
> stop running the v2 jobs when we are ready.

Yes, I'm working on that in this series of changes to create the
devstack-legacy job:

  https://review.openstack.org/497699

However, there will still be work for the migration script to do.
Essentially, the JJB shell snippet will need to be transformed into a
Zuul job variable, the usage of the local_conf macro transformed into a
different job variable, and any projects which appear in "PROJECTS=..."
variable assignments in the job shell will need to be detected and added
to the required-projects job attribute.  I believe these are all
relatively straightforward transformations.
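
A rough sketch of the sort of output I'd expect from that
transformation (job, variable, and project names here are purely
illustrative, not the final migration format):

    - job:
        name: legacy-example-dsvm
        parent: devstack-legacy
        required-projects:
          # Detected from "PROJECTS=..." assignments in the JJB shell:
          - openstack/neutron
        vars:
          # The JJB shell snippet, carried over as a job variable:
          devstack_gate_script: |
            export DEVSTACK_GATE_NEUTRON=1
            ./safe-devstack-vm-gate-wrap.sh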

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Plan for devstack Zuul v3 jobs

2017-08-25 Thread James E. Blair
Hi,

As many of you may recall, our plan to migrate the devstack jobs to Zuul
v3 was to gradually convert pieces of the existing devstack-gate script
to Ansible roles which would be shared between the Ansible run inside
the devstack-gate script and the Ansible run by Zuul v3.  Eventually,
the idea was that the devstack-gate script would be functionally very
similar to Zuul v3, and replacement would be easy.

Unfortunately, not enough of that work is completed for us to be able to
effect the transition in that manner.  We could continue in that
direction, however we would certainly miss our target of a PTG cutover.
Instead, in an attempt to meet that deadline, I propose the following:

We create a devstack-legacy job in Zuul v3 which attempts to run
devstack-gate in the manner closest to that in which it runs today.
This means that it will use the Zuul-provided git repos rather than
performing its own git fetch operations, and supply config files and
environment variables which are compatible with the way Zuul v2 works.

Simultaneously, we also create a new devstack job which utilizes all the
new features of Zuul v3 and is structured in the way we envisioned
earlier.  We can start very simply here and avoid carrying all of the
design baggage from the earlier job.  This will be a job that projects
can build off of and migrate to over time, once we have completed the
migration.

During the migration itself, we automatically convert all of the
devstack jobs to use the devstack-legacy job framework.

Fortunately, we have completed (and are still continuing) some
significant work on the Ansiblification of the current devstack-gate job
(thanks!).  That has already allowed us to copy some of those roles into
the base job and made some features previously only available to
devstack available to all jobs.  We should continue the work in progress
there, but I don't think we should expect the two jobs to directly share
those roles.

To that end, we should keep the v2 devstack-gate roles in the
playbooks/roles/ directory, but as we create v3 versions of the roles,
place them in the top-level roles/ directory.  These roles will be
somewhat duplicative, but this will allow us to fully separate the two
sets of jobs which will allow us to work on the v3 versions of the roles
without having to worry about maintaining compatibility, and to finish
the legacy version of the job quickly.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Zuul v3: some layout checks disabled in project-config

2017-08-10 Thread James E. Blair
Hi,

With https://review.openstack.org/492697 we are moving gating of Zuul
itself and some related job repos from Zuul v2 to Zuul v3.  As part of
this, we need to disable some of the checks that we perform on the
layout file.  That change disables the following checks for the
openstack-infra/* repos only:

* usage of the merge-check template
* at least one check job
* at least one gate job
* every gerrit project appears in zuul

The first three should only be needed for a short time while we continue
to construct the post and release pipelines in Zuul v3.  After that is
complete, we should be able to reinstate those checks, but we will need
to keep the final check disabled (for openstack-infra repos at least)
until Zuul v2 is retired.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Blogging Question

2017-08-03 Thread James E. Blair
Jeremy Stanley  writes:

> On 2017-08-03 20:34:14 + (+), Morgenstern, Chad wrote:
>> I'm wondering, now that the ini is corrected (I hope), will our older
>> posts show up or will only the blogs we post after the change is
>> picked up show up on your site?
>
> The cache directory for it also contains no content for that site
> yet, so if it was actually fixed I think it might try to interleave
> your earlier posts but I don't actually know enough about its
> internals to be able to say that for certain.

Yes, the expected behavior is that with an empty cache, all the previous
posts will show up after it is fixed.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Announcing Gertty 1.5.0

2017-07-30 Thread James E. Blair
Announcing Gertty 1.5.0
=======================

Gertty is a console-based interface to the Gerrit Code Review system.

Gertty is designed to support a workflow similar to reading network
news or mail.  It syncs information from Gerrit to local storage to
support disconnected operation and easy manipulation of local git
repos.  It is fast and efficient at dealing with large numbers of
changes and projects.

The full README may be found here:

  https://git.openstack.org/cgit/openstack/gertty/tree/README.rst

Changes since 1.4.0:


* Added support for sorting dashboards and change lists by multiple
  columns

* Added a Unicode graphic indication of the size of changes in the
  change list

* Added the number of changes to the status bar in the change list

* Added a trailing whitespace indication (which can be customized or
  ignored in a custom palette)

* Several bug fixes related to:
  * Negative topic search results
  * Crashes on loading changes with long review messages
  * Avoiding spurious sync failures on conflict queries
  * Errors after encounting a deleted project
  * Better detection of some offline errors
  * Fetching missing refs
  * Gerrit projects created since Gertty started
  * Re-syncing individual changes after a sync failure

Thanks to the following people whose changes are included in this
release:

  Jim Rollenhagen
  Kevin Benton
  Masayuki Igawa
  Matthew Thode

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] On the subject of HTTP interfaces and Zuul

2017-06-28 Thread James E. Blair
Monty Taylor  writes:

> With that in mind, I believe the path should be for the new server
> being written for the console-log streaming to be called "zuul-web"
> and that it should serve the console-log websocket at /console-stream
> or /console-log or something.
>
> We can then use our Apache frontend to serve /status from the
> zuul-scheduler and /console-stream from zuul-web - and in the fullness
> of time we can potentially move the webhooks and the status page from
> the scheduler to zuul-web should we choose.
>
> Additionally, as we look at things like dashboards, they can go
> directly into zuul-web and be written with aiohttp. As zuul-web is
> stateless, it's a great candidate for a pure scaleout model.

I like this as a way forward.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] [infra] Status of Zuul v3

2017-06-14 Thread James E. Blair
Greetings!

This periodic update is primarily intended as a way to keep
contributors to the OpenStack community apprised of Zuul v3 project
status, including future changes and milestones on our way to use in
production. Additionally, the numerous existing and future users of
Zuul outside of the OpenStack community may find this update useful as
a way to track Zuul v3 development status.

If "changes are coming in the land of Zuul" is new news to you, please
read the section "About Zuul and Zuul v3" towards the end of this
email.

== Zuul v3 project status and updates ==

The biggest recent development is that basic support for GitHub has
merged!  Thanks to Jan, Tobias, Jonathan, Jamie, Jesse, and everyone
else that helped with that years-long effort!  We're still working to
achieve feature parity (notably, cross-repo dependency support hasn't
been implemented yet), but basic operations work and we have a good base
to start from.

We've also landed support for bubblewrap, so that untrusted job content
can run in a restricted environment.  This is a big improvement for
executor security.  Thanks to Clint and others who helped with this!

We merged support for live-streaming interleaved ansible logs and
console logs from all of the hosts in a job.  The streaming protocol is
compatible with finger, so you can easily request the log for a job by
running "finger UUID@executor".  That's handy for using unix tools to
deal with the output (think grep, sed, awk, etc).  To make this
accessible over the web, we are working on a websocket based console
streamer, which uses the finger-compatible endpoints on the backend.
When we're done, we'll have a nice web frontend for easily viewing
console logs linked to from the status page, and finger URLs for users
who want to view or process their logs from a unix shell.  Thanks to
David and Monty for work on this!

We've created some new repositories to hold Zuul jobs and the Ansible
roles that they use.  We're going to try something new here -- we want
to create a standard library of jobs that any Zuul installation (not
just those related to OpenStack) can use.  Flexibility and local
customization of jobs is very important in Zuul v3, but with job
inheritance and Ansible roles, we have two very useful methods of
composition that we can use to share job content so that not everyone
has to reinvent the wheel.  These are the repos we've created and how we
expect to use them:

  openstack-infra/zuul-jobs

This is what we're calling the "standard library".  We're going to
put any jobs which we think are not inherently OpenStack-specific.
For example, jobs to run python unit tests, java builds, go tests,
autoconf/makefile based projects, etc.

  openstack-infra/openstack-zuul-jobs

This is where we will put OpenStack-specific jobs (or
OpenStack-specific variants of standard library jobs).

In the near term, we're going to start populating these repos with what
we need for OpenStack's Zuul, and will probably move things around quite
a bit as we figure out where they should go.  We are also working on a
Sphinx extension (in the openstack-infra/zuul-sphinx repo) to
automatically document all of the jobs and roles in these repos.  We
should have self-documenting jobs with published documentation right
from the start.  Thanks to Paul for his help on this!

Also thanks to Paul for setting up OpenStack's production instance of
Zuul v3 on the zuulv3.openstack.org server and our first executor at
ze01.openstack.org.  That's running now, and we're currently working
through some things that we deferred from setting up our dev instance,
notably log publishing.

With the approval of the nodepool drivers spec:

  
http://specs.openstack.org/openstack-infra/infra-specs/specs/nodepool-drivers.html

Tristan has started work on an implementation supporting multiple
backend drivers for nodepool.  This will initially include a driver for
static nodes, and later we will use this to support multiple cloud
technologies:

  http://lists.openstack.org/pipermail/openstack-infra/2017-June/005387.html

Tristan has also proposed a proof-of-concept implementation of a
dashboard for Zuul, which has prompted a conversation about web
frameworks:

  http://lists.openstack.org/pipermail/openstack-infra/2017-June/005402.html

We're working to come to consensus on that so that we can ultimately
converge our webhooks, status page, websocket console streaming, and
dashboard onto one framework.

Upcoming tasks and focus:
* Re-enabling disabled tests: We're continuing to make our way through
the list of remaining tests that need enabling. See the list, which
includes an annotation as to complexity for each test, here:
https://etherpad.openstack.org/p/zuulv3skips
* Github parity
* Log streaming
* Standard jobs
* Set up production zuulv3.openstack.org server
* Full task list and plan is in the Zuul v3 storyboard:
https://storyboard.openstack.org/#!/board/41

Recent changes:
* Zuul v3:

Re: [OpenStack-Infra] Nodepool drivers

2017-06-14 Thread James E. Blair
Tristan Cacqueray  writes:

> Hi,
>
> With the nodepool-drivers[0] spec approved, I started to hack a quick
> implementation[1]. Well I am not very familiar with the nodepool/zookeeper
> architecture, thus this implementation may very well be missing important
> bits... The primary goal is to be able to run ZuulV3 with static nodes,
> comments and feedbacks are most welcome.

I've taken a general look and I think this is heading in the right
direction.  We should ask David Shrewsbury to look at it when he gets a
chance, and Tobias as well when he's back.  Thanks!

> Moreover, assuming this isn't too off-track, I'd like to propose an
> OpenContainer and a libvirt driver to diversify Test environment.

I think the most important thing is the static node driver -- that's
part of the original scope for Zuul v3, and we need it for functional
parity with v2.

An OpenContainer driver sounds fine to me, but I'm reluctant to add a
libvirt driver at the moment -- there is a lot of potential overlap with
OpenStack, as well as other potential drivers such as linch-pin.  Maybe
there are some compelling reasons to do so, but I'd rather defer that
for a while until we establish some guidelines around in-tree drivers.

Since it's a scope expansion, we should consider anything beyond the
static driver to be a lower priority while we work to get Zuul v3
finished.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] On the subject of HTTP interfaces and Zuul

2017-06-12 Thread James E. Blair
Clint Byrum  writes:

> Excerpts from corvus's message of 2017-06-09 13:11:00 -0700:
>> Clark Boylan  writes:
>> 
>> > I'm wary of this simply because it looks a lot like repeating
>> > OpenStack's (now failed) decision to stick web servers in a bunch of
>> > python processes then do cooperative multithreading with them along with
>> > all your application logic. It just gets complicated. I also think this
>> > underestimates the value of using tools people are familiar with (wsgi
>> > and flask) particularly if making it easy to jump in and building
>> > community is a goal.
>> 
>> I agree that mixing an asyncio based httpserver with application logic
>> using cooperative multithreading is not a good idea.  Happily that is
>> not the proposal.  The proposal is that the webserver be a separate
>> process from the rest of Zuul, it would be an independently scaleable
>> component, and *only* the webserver would use asyncio.
>> 
>
> I'm not totally convinced that having an HTTP service in the scheduler
> that gets proxied to when appropriate is the worst idea in the short term,
> since we already have one and it already works reasonably well with paste,
> we just want to get rid of paste faster than we can refactor it out by
> making a ZK backend.
>
> Even if we remove paste and create a web tier aiohttp thing, we end up
> writing most of what would be complex about doing it in-process in the
> scheduler. So, to tack gearman on top of that, versus just letting the
> reverse proxy do its job, seems like extra work.

What I'd like to get out of this conversation is a shared understanding
of what the web tier for Zuul should look like in the future, so that we
can know where we want to end up eventually, but *not* a set of
additional requirements for Zuul v3.0.  In other words, I think this is
a long-term, rather than short-term conversation.

The way I see it is that we're adding a bunch of new functionality to an
area of Zuul that we've traditionally kept very simple.  We're growing
from a simple JSON endpoint to support websockets, event injection via
hooks, and a full-blown API for historic data.

That last item in particular calls out for a real web framework.  Since
it is new work and has substantial interaction with the web framework,
it would be good to know what our end state is, so that folks working on
it can go ahead and head in that direction.

The other aspects, which are largely already implemented, can be ported
over in the fullness of time.

We do not need to change how we are doing webhooks or log streaming for
Zuul v3.0.

In fact, I imagine that at least initially, we would implement something
in openstack-infra like what you describe, Clint.  We will have an
Apache server which proxies status.json requests and webhooks to
zuul-scheduler, and proxies websocket requests to the streaming server.

As time permits, we can incorporate those into a comprehensive web
server with the framework we choose.

Does that sound like a good plan?

Does aiohttp alone fit the bill as Monty suggests, or do we need to
consider something else?

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] About aarch64 third party CI

2017-06-12 Thread James E. Blair
Ricardo Carrillo Cruz  writes:

> This is a nodepool.yaml that can help you get going:
>
> http://paste.openstack.org/show/612191/

Glad it worked!

You can drop 'zmq-publishers' from the config entirely.

If 'images-dir' and 'diskimages' are required, then I would consider
that a bug; we should have default values for those so you don't need to
provide them in this case.

That config snippet also illustrates something I didn't quite realize at
the time I reviewed https://review.openstack.org/472959.  I don't think
we should be using UUIDs as keys in nodepool because they are hard for
humans to distinguish from each other.  It could make for somewhat
error-prone configuration.

So instead of:

cloud-images:
  - name: 9e884aab-a46e-46de-b57c-a044da0f45cd
pools:
  - name: main
    labels:
      - name: xenial
        cloud-image: 9e884aab-a46e-46de-b57c-a044da0f45cd

If someone wants to specify an image by id, we should have:

cloud-images:
  - name: mycloudimagename
    id: 9e884aab-a46e-46de-b57c-a044da0f45cd
pools:
  - name: main
    labels:
      - name: xenial
        cloud-image: mycloudimagename

And then if you omit the 'id' field, we should just implicitly use
'name' as before.  This way it's easy to see which of several
cloud-images a label uses, and, when it's time to update the UUID for
that cloud image, that only needs to happen in one place.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] On the subject of HTTP interfaces and Zuul

2017-06-09 Thread James E. Blair
Clark Boylan  writes:

> I'm wary of this simply because it looks a lot like repeating
> OpenStack's (now failed) decision to stick web servers in a bunch of
> python processes then do cooperative multithreading with them along with
> all your application logic. It just gets complicated. I also think this
> underestimates the value of using tools people are familiar with (wsgi
> and flask) particularly if making it easy to jump in and building
> community is a goal.

I agree that mixing an asyncio based httpserver with application logic
using cooperative multithreading is not a good idea.  Happily that is
not the proposal.  The proposal is that the webserver be a separate
process from the rest of Zuul, it would be an independently scaleable
component, and *only* the webserver would use asyncio.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] On the subject of HTTP interfaces and Zuul

2017-06-09 Thread James E. Blair
Clint Byrum  writes:

> Your words are more succinct than what I wrote, which is nice. I think
> we agree on the general direction for the time being.
>
> However, I don't think ZK will be a good choice for async event
> handling. I'd sooner expect MQTT to replace gear for that. It's worth
> noting that MQTT's protocol shares a lot in common with gearman and was
> created to do similar things.

You make some good points here, and in the other message, which have
both immediate and longer term aspects.

Your concern about using ZK for distributed ingestion is worth
considering as part of that discussion.  We've shelved it for the moment
as potentially distracting for v3 work.  But I think we can take your
point as being both that we should consider the potential issues there
when we discuss it, but also, for the moment, let's not presume in this
design that we're going to shove events into ZK in the future.

The other immediate aspect of this is that, if we are going to use this
framework for Github web hooks, we do need *some* answer of how to get
that info to the scheduler.

I'd say that since this is intended to be part of Zuul v3, and that we
have not taken any steps to reduce reliance on gearman in v3, that we
should go ahead and say that when the webhooks move to this framework,
they should submit their events to the scheduler via gearman.  The
scheduler already accepts administrative events ("zuul enqueue") over
gearman; this is not a stretch.

I think that should clarify this aspect of this proposal for now, and
leaves us to consider the general question of distributed event
ingestion later.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] On the subject of HTTP interfaces and Zuul

2017-06-09 Thread James E. Blair
Monty Taylor  writes:

> We should use aiohttp with no extra REST framework.
>
> Meaning:
>
> - aiohttp serving REST and websocket streaming in a scale-out tier
> - talking RPC to the scheduler over gear or zk
> - possible in-process aiohttp endpoints for k8s style health endpoints

...

> Since we're starting fresh, I like the idea of a single API service
> that RPCs to zuul and nodepool, so I like the idea of using ZK for the
> RPC layer. BUT - using gear and adding just gear worker threads back
> to nodepol wouldn't be super-terrible maybe.

Thanks for the thoughtful analysis.  I think your argument is compelling
and I generally like the approach you suggest.

On the RPC front, how about we accept that, for the moment, the
webserver will need to consult ZK for collecting some information
(current nodepool label/image status), and use gear for other things
(querying zuul about build status)?

The rest of Zuul already uses both things, let's just have the webserver
do the same.  Eventually gear functions will be replaced with ZK.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Zuul V3: Behavior of change related requirements on push like events

2017-05-30 Thread James E. Blair
Jeremy Stanley  writes:

> On 2017-05-30 12:53:15 -0700 (-0700), Jesse Keating wrote:
> [...]
>> Github labels: This is like approvals/reviews.
> [...]
>
> Perhaps an interesting aside, Gerrit uses the same term (labels) for
> how we're doing approvals and review voting.

Yeah, or at least, related.  I think in Gerrit a "label" is a review
category (eg, "Verified", "Code Review") and an "approval" is a value
given by a user to a change in one of those categories (eg, "Verified:
+1", or "Code Review -2" would be an interestingly named "approval").

Of course, that's new[1]; they used to be called "categories" rather
than "labels".

>> Personally, my opinions are that to avoid confusion, change type
>> requirements should always fail on push type events. This means
>> open, current-patchset, approvals, reviews, labels, and maybe
>> status requirements would all fail to match a pipeline for a push
>> type event. It's the least ambiguous, and promotes the practice of
>> creating a separate pipeline for push like events from change like
>> events. I welcome other opinions!
>
> This seems like a reasonable conclusion to me.

Agreed -- we haven't run into this condition because our pipelines are
naturally segregated into change or ref related workflows.  I think
that's probably going to be the case for most folks, so codifying this
seems reasonable.  However, I could simply be failing to imagine a
pipeline that works with both.

-Jim

[1] As of six years ago.

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Canceling Zuul meeting 2017-05-29

2017-05-26 Thread James E. Blair
Hi,

Due to the US holiday on Monday, the Zuul meeting on May 29, 2017 is
canceled.

Thanks,

Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Zuul v3: proposed new Depends-On syntax

2017-05-26 Thread James E. Blair
Joshua Hesketh <joshua.hesk...@gmail.com> writes:

> On Fri, May 26, 2017 at 1:09 AM, James E. Blair <cor...@inaugust.com> wrote:
>> So I think that we should start out by simply silently ignoring any
>> patchset elements in the URL.  We could consider having Zuul leave a
>> note indicating that the patchset component is not necessary and is
>> being ignored.
>>
>
> Hmm, I'm not sure if that's the best way to handle it. If somebody clicks
> the link they'll be shown a particular patchset (whether they are aware of
> not) and it may cause confusion if something else is applied in testing.
> zuul leaving a message could help clarify this, but perhaps we should
> honour the patchset in the URL to allow for some very specific testing or
> re-runs. This also links into what I was saying (in my now forwarded
> message) about "tested-with" vs "must be merged first". We could test with
> a patchset but that is irrelevant once something has merged (unless we add
> complexity such as detecting if the provided patchset version has merged or
> if it was a different one and therefore the dependency isn't met and needs
> updating).
>
> Either way I like the idea of zuul (or something) leave a message to be
> explicit.

I think when someone uses Depends-On, they invariably mean "this change
depends on this other change" not "this other patchset".  Referring to a
previous patchset may have some utility; however, it would be
counter-intuitive and doesn't help developers in the way that Zuul is
designed to.

Fundamentally Zuul is about making sure that it's okay to land a change.
It creates a proposed future state of one or more repositories, and
verifies that future state is correct.  Depending on a patchset would
violate this in two ways.

First, if A Depends-On: B and B is updated, there is no feedback to the
developers whether the new revision of B is correct.  We use Depends-On
not only to ensure that change A has a way to pass tests, but also that
B is a correct change that enables the behavior desired in A.

In other words, we're not answering the question "Is it okay to merge B?"

Second, we would be creating a proposed future state that we know can
not exist.  If A Depends-On: an old patchset of B, and we run that test,
it would be pointless because we know that old patchset of B is not
going to merge.  So we're not answering the question "Will A be able to
merge?"

We merge every change with its target branch before testing in check
(rather than testing the change as it was written) for the same
reason -- we test what *will be*, not *what was*.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Zuul v3: proposed new Depends-On syntax

2017-05-25 Thread James E. Blair
Jeremy Stanley  writes:

> On 2017-05-25 08:50:14 -0500 (-0500), Kevin L. Mitchell wrote:
>> Can I suggest that, for OpenStack purposes, we also deploy some sort of
>> bot that comments on reviews using the old syntax, to at least alert
>> developers to the pending deprecation?  If it had the smarts to guess
>> URLs to place in the Depends-On footer, that'd be even better.
>
> That's pretty doable as a Gerrit hook or standalone event stream
> consuming daemon. Pretty low-hanging fruit if anyone wants to
> volunteer to code that up.
>
> Alternative/complimentary idea, Gerrit hooks can also be used to
> reject uploads, so when the time comes to stop supporting the old
> syntax we can also see about rejecting new patchsets which are using
> the then-unsupported format (as long as the error can be clearly
> passed through the likes of git-review so users aren't too
> confused).

Yes, though it's also possible we may want to have Zuul itself leave
such messages.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Zuul v3: proposed new Depends-On syntax

2017-05-25 Thread James E. Blair
Sean Dague <s...@dague.net> writes:

> On 05/24/2017 07:04 PM, James E. Blair wrote:
> 
>> The natural way to identify a GitHub pull request is with its URL.
>> 
>> This can be used to identify Gerrit changes as well, and will likely be
>> well supported by other systems.  Therefore, I propose we support URLs
>> as the content of the Depends-On footers for all systems.  E.g.:
>> 
>>   Depends-On: https://review.openstack.org/12345
>>   Depends-On: https://github.com/ansible/ansible/pull/12345
>> 
>> Similarly to the Gerrit change IDs, these identifiers are easily
>> navigable within Gerrit (and Gertty), so that reviewers can traverse the
>> dependency chain easily.
>
> Sounds sensible to me. The only thing I ask is that we get a good clock
> countdown on when it will be removed. Upgrade testing is one of the
> places where the multi branch magic was really useful, so it will take a
> little while to get good at it.

Yes!

> For gerrit reviews it should also accept -
> https://review.openstack.org/#/c/467243/ (as that's what is in people's
> browser url bar).

Yeah, I was thinking of copying Gertty's URL parsing here which deals
with all the variants.

This reminds me of something I forgot to mention: we should *not* depend
on specific patchsets even if the URL specifies it.  Sometimes you end
up with:

  https://review.openstack.org/#/c/467634/1

as the URL, with the patchset at the end.  I think that still confuses a
lot of people and they don't notice.  And generally, if someone is
specifying a dependency, they mean the change in general, and don't want
to have to go update the depending change's commit message if they fix a
typo.

So I think that we should start out by simply silently ignoring any
patchset elements in the URL.  We could consider having Zuul leave a
note indicating that the patchset component is not necessary and is
being ignored.

> And while this change is taking place, it would be nice if there was the
> ability to have words after the url. I've often wanted:
>
> Depends-On: https://review.openstack.org/12345 - nova
> Depends-On: https://review.openstack.org/12346 - python-neutronclient
>
> Just as a quick way to remember, without having to link follow, which of
> multiple depends on are which projects. I've resorted to putting them on
> top, but for short things would love to have them on the same line.

That seems reasonable.  In a reply to Jeremy and Tristan, I suggested we
may want to extend the Depends-On syntax in the future to consume some
more information after the URL, but I think it should be fine to allow
arbitrary text now and then reclaim keywords (like "applied to") later
if necessary.
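
To make that parsing behavior concrete, here is a minimal sketch (not
the actual Zuul code; the helper name is made up) that accepts the
common URL variants, silently drops any patchset component, and ignores
trailing text after the URL:

  import re
  from urllib.parse import urlparse

  def parse_depends_on(value):
      # Ignore any free-form text after the URL ("- nova", etc.).
      url = value.split()[0]
      parsed = urlparse(url)
      # Accept https://review.openstack.org/12345 as well as
      # https://review.openstack.org/#/c/467634/1 style URLs.
      path = parsed.fragment or parsed.path
      match = re.search(r'(\d+)(?:/(\d+))?/?$', path)
      if not match:
          return None
      # match.group(2) would be the patchset number; deliberately ignored.
      return parsed.hostname, match.group(1)

  print(parse_depends_on("https://review.openstack.org/#/c/467634/1"))
  # ('review.openstack.org', '467634') -- the patchset "1" is dropped
  print(parse_depends_on("https://review.openstack.org/12345 - nova"))
  # ('review.openstack.org', '12345')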

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Zuul v3: proposed new Depends-On syntax

2017-05-24 Thread James E. Blair
Hi,

As part of Zuul v3, we're adding support for GitHub (and later possibly
other systems).  We want these systems to have access to the full power
of cross-project-dependencies in the same way as Gerrit.  However, the
current syntax for the Depends-On footer is currently the
Gerrit-specific change-id.

We chose this in an attempt to be future-compatible with some proposed
changes to Gerrit itself to support cross-project dependencies.  Since
then, Gerrit has gone in a different direction on this subject, so I no
longer think we should weigh that very heavily.

While Gerrit change ids can be used to identify one or more changes
within a Gerrit installation, there is no comparable identifier on
GitHub, as pull request numbers are unique only within a project.

The natural way to identify a GitHub pull request is with its URL.

This can be used to identify Gerrit changes as well, and will likely be
well supported by other systems.  Therefore, I propose we support URLs
as the content of the Depends-On footers for all systems.  E.g.:

  Depends-On: https://review.openstack.org/12345
  Depends-On: https://github.com/ansible/ansible/pull/12345

Similarly to the Gerrit change IDs, these identifiers are easily
navigable within Gerrit (and Gertty), so that reviewers can traverse the
dependency chain easily.

One substantial aspect of this change is that it is more specific about
projects and branches.  A single Gerrit change ID can refer to more than
one branch, and even more than one project.  Zuul interprets this as
"this change depends on *all* of the changes that match".  Often times
that is convenient, but sometimes it is not.  Frequently users ask "how
can I make this depend only on a change to master, not the backport of
the change to stable?" and the answer is, "you can't".

URLs have the advantage of allowing users to be specific as to which
instances of a given change are actually required.  If, indeed, a change
depends on more than one, of course a user can still add multiple
Depends-On headers, one for each.

It is also easy for Zuul connections to determine whether a given URL is
referring to a change on that system without actually needing to query
it.  A Zuul connected to several code review systems can easily determine
which to ask for the change by examining the hostname.
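
As a trivial sketch of that idea (the connection map here is invented
purely for illustration), routing a Depends-On URL is just a hostname
comparison:

  from urllib.parse import urlparse

  # Hypothetical name -> hostname map; a real Zuul would consult its
  # configured connections instead.
  connections = {"gerrit": "review.openstack.org",
                 "github": "github.com"}

  def connection_for(url):
      host = urlparse(url).hostname
      for name, hostname in connections.items():
          if host == hostname:
              return name
      return None

  print(connection_for("https://github.com/ansible/ansible/pull/12345"))
  # github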

URLs do have two disadvantages compared to Gerrit change IDs: they can
not be generated ahead of time, and they are not as easily found in
offline git history.

With Gerrit change IDs, we can write several local changes, and before
pushing them to Gerrit, add Depends-On headers since the change id is
generated locally.  URLs are not known until the changes are pushed to
Gerrit (or GitHub pull requests opened).  So in some cases, editing of
an already existing commit message may be required.  However, the most
common case of a simple dependency chain can still be easily created by
pushing one change up at a time.

Change IDs, by virtue of being in the commit message of the dependent as
well as depending change, become part of the permanent history of the
project, no longer tied to the code review system, once they merge.
This is an important thing to consider for long-running projects.  URLs
are less suitable for this, since they acquire their context from
contemporaneous servers.  However, Gerrit does record the review URL in
git notes, so while it's not as convenient, with some additional tooling
it should be possible to follow dependency paths with only the git
history.

Of course, this is not a change we can make instantaneously -- the
change IDs have a lot of inertia and developer muscle memory.  And we
don't want changes that have been in progress for a while to suddenly be
broken with the switch to v3.  So we will need to support both syntaxes
for some time.

We could, indeed, support both syntaxes indefinitely, but I believe it
would be better to plan on deprecating the Gerrit change ID syntax with
an eye to eventually removing it.  I think that ultimately, the URL
syntax for Depends-On is more intuitive to a new user, especially one
that may end up being exposed to a Zuul which connects to multiple
systems.  Having a Gerrit change depend on a GitHub pull request (and
vice versa) will be one of the most powerful features of Zuul v3, and
the syntax for that should be approachable.

In short, I think the value of consistency across multiple backends and
ease of use for new users outweighs the small loss of functionality for
Gerrit power users in this case.

I propose we adopt support for URLs in all source drivers in v3, and
declare Gerrit change IDs deprecated.  We will continue to support both
for a generous deprecation period (at least 6 months after the initial
Zuul 3.0 release), and then remove support for them.

How does that sound?

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Canceling 2017-05-15 Zuul meeting

2017-05-11 Thread James E. Blair
Hi,

I think enough of us have post-summit plans/chores/burnout that it makes
sense to cancel the May 15, 2017 Zuul meeting.  See you on May 22.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Ask.o.o Email not getting through?

2017-05-01 Thread James E. Blair
Tom Fifield  writes:

> Hello infra,
>
> Exploratory question here, no idea what's actually going on.
>
> We have a prospective new user on Ask OpenStack, who despite trying
> multiple auth methods (Launchpad & Google) multiple times did not
> receive a confirmation email.
>
> I checked and there have been several new users created within the
> same day, so it does not seem like a general email problem.
>
>
> May I ask if there is anything revealing in the logs around
> m...@tiferrei.com ?

The MTA reports a DNS failure for that domain.  The problem appears to
be related to DNSSEC on that domain.  It looks like the domain
previously had invalid signing keys, but they have been removed.
However, DNS resolvers may still have those keys cached.  The caching
resolver that ask.o.o is using indicates it has a bit over 8 hours
remaining.  Google's public DNS servers indicate almost 24 hours
remaining in their cache.

Assuming the user's DNS configuration is and remains correct, the
problem should be corrected on ask.o.o within 9 hours, and they should
see other DNS-related failures abate as other resolvers' caches reach
the original expiration time of the erroneous DS records.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Skipping 2017-04-10 Zuul meeting

2017-04-07 Thread James E. Blair
Hi,

We're going to have several absences next Monday, the 10th, so let's
skip the Zuul meeting that day.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Test message

2017-03-31 Thread James E. Blair
Hi,

This is a test message to verify mailing list functionality after the
upgrade.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] [openstack-dev] [infra][security] Encryption in Zuul v3

2017-03-22 Thread James E. Blair
Darragh Bailey <daragh.bai...@gmail.com> writes:

> On 22 March 2017 at 15:02, James E. Blair <cor...@inaugust.com> wrote:
>
>> Ian Cordasco <sigmaviru...@gmail.com> writes:
>>
>> >
>> > I suppose Barbican doesn't meet those requirements either, then, yes?
>>
>> Right -- we don't want to require another service or tie Zuul to an
>> authn/authz system for a fundamental feature.  However, I do think we
>> can look at making integration with Barbican and similar systems an
>> option for folks who have such an installation and prefer to use it.
>>
>> -Jim
>>
>
> Sounds like you're going to make this pluggable; is that a hard requirement
> that will be added to the spec, or just a possibility?

More of a possibility at this point.  In general, I'd like to off-load
interaction with other systems to Ansible as much as possible, and then
add minimal backing support in Zuul itself if needed, so that the core
of Zuul doesn't become a choke point.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] [openstack-dev] [infra][security] Encryption in Zuul v3

2017-03-22 Thread James E. Blair
Ian Cordasco <sigmaviru...@gmail.com> writes:

> On Tue, Mar 21, 2017 at 6:10 PM, James E. Blair <cor...@inaugust.com> wrote:
>> We did talk about some other options, though unfortunately it doesn't
>> look like a lot of that made it into the spec reviews.  Among them, it's
>> probably worth noting that there's nothing preventing a Zuul deployment
>> from relying on some third-party secret system -- if you can use it with
>> Ansible, you should be able to use it with Zuul.  But we also want Zuul
>> to have these features out of the box, and, wearing our sysadmin hats,
>> we're really keen on having source control and code review for the
>> system secrets for the OpenStack project.
>>
>> Vault alone doesn't meet our requirements here because it relies on
>> symmetric encryption, which means we need users to share a key with
>> Zuul, implying an extra service with out-of-band authn/authz.  However,
>> we *could* use our PKCS#1 style system to share a vault key with Zuul.
>> I don't think that has come up as a suggestion yet, but seems like it
>> would work.
>
> I suppose Barbican doesn't meet those requirements either, then, yes?

Right -- we don't want to require another service or tie Zuul to an
authn/authz system for a fundamental feature.  However, I do think we
can look at making integration with Barbican and similar systems an
option for folks who have such an installation and prefer to use it.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] [openstack-dev] [infra][security] Encryption in Zuul v3

2017-03-21 Thread James E. Blair
David Moreau Simard  writes:

> I don't have a horse in this race or a strong opinion on the topic, in
> fact I'm admittedly not very knowledgeable when it comes to low-level
> encryption things.
>
> However, I did have a question, even if just to generate discussion.
> Did we ever consider simply leaving secrets out of Zuul and offloading
> that "burden" to something else ?
>
> For example, end-users could use something like git-crypt [1] to crypt
> files in their git repos and Zuul could have a mean to decrypt them at
> runtime.
> There is also ansible-vault [2] that could perhaps be leveraged.
>
> Just trying to make sure we're not re-inventing any wheels,
> implementing crypto is usually not straightforward.

We did talk about some other options, though unfortunately it doesn't
look like a lot of that made it into the spec reviews.  Among them, it's
probably worth noting that there's nothing preventing a Zuul deployment
from relying on some third-party secret system -- if you can use it with
Ansible, you should be able to use it with Zuul.  But we also want Zuul
to have these features out of the box, and, wearing our sysadmin hats,
we're really keen on having source control and code review for the
system secrets for the OpenStack project.

Vault alone doesn't meet our requirements here because it relies on
symmetric encryption, which means we need users to share a key with
Zuul, implying an extra service with out-of-band authn/authz.  However,
we *could* use our PKCS#1 style system to share a vault key with Zuul.
I don't think that has come up as a suggestion yet, but seems like it
would work.

Git-crypt in GPG mode, at first glance, looks like it could work fairly
well for this.  It encrypts entire files, so we would have to rework how
secrets are stored (we encrypt blobs within plaintext files) and add
another file to the list of zuul config files (e.g., .zuul.yaml.gpg).
But aside from that, I think it could work and may be worth further
exploration.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


[OpenStack-Infra] [infra][security] Encryption in Zuul v3

2017-03-21 Thread James E. Blair
Hi,

In working on the implementation of the encrypted secrets feature of
Zuul v3, I have found some things that warrant further discussion.  It's
important to be deliberate about this and I welcome any feedback.

For reference, here is the relevant portion of the Zuul v3 spec:

http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#secrets

And here is an implementation of that:

https://review.openstack.org/#/q/status:open+topic:secrets+project:openstack-infra/zuul

The short version is that we want to allow users to store private keys
in the public git repos which Zuul uses to run jobs.  To do this, we
propose to use asymmetric cryptography (RSA) to encrypt the data.  The
specification suggests implementing PKCS#1-OAEP, a standard for
implementing RSA encryption.

Note that RSA is not able to encrypt a message longer than the key, and
PKCS#1 includes some overhead which eats into that.  If we use 4096 bit
RSA keys in Zuul, we will be able to encrypt 3760 bits (or 470 bytes) of
information.

Further, note that this value only holds if we use SHA-1.  It has been
suggested that we may want to consider using SHA-256 with PKCS#1.  If we
do, we will be able to encrypt slightly less data.  However, I'm not
sure that the Python cryptography library allows this (yet?).  Also, see
this answer for why it may not be necessary to use SHA-256 (and also,
why we may want to anyway):

https://security.stackexchange.com/questions/112029/should-sha-1-be-used-with-rsa-oaep

One thing to note is that the OpenSSL CLI utility uses SHA-1.  Right
now, I have a utility script which uses that to encrypt secrets so that
it's easy for anyone to encrypt a secret without installing many
dependencies.  Switching to another hash function would probably mean we
wouldn't be able to use that anymore.  But that's also true for other
systems (see below).

In short, PKCS#1 pros: Simple, nicely packaged asymmetric encryption,
hides plaintext message length (up to its limit).  Cons: limited to 470
bytes (or less).
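
For illustration, here is a minimal sketch of that scheme using the
python-cryptography primitives (key management and encoding of the
result are simplified; this is not the Zuul implementation itself):

  from cryptography.hazmat.primitives import hashes
  from cryptography.hazmat.primitives.asymmetric import rsa, padding

  # Illustrative only: a 4096-bit key pair and RSA/OAEP with SHA-1, as
  # discussed above.  (Older python-cryptography releases also require
  # a backend argument to generate_private_key.)
  private_key = rsa.generate_private_key(public_exponent=65537,
                                         key_size=4096)
  public_key = private_key.public_key()

  oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA1()),
                      algorithm=hashes.SHA1(), label=None)

  secret = b"super-secret-api-key"   # must be <= 470 bytes with SHA-1
  ciphertext = public_key.encrypt(secret, oaep)
  assert private_key.decrypt(ciphertext, oaep) == secret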

Generally, when faced with the prospect of encrypting longer messages,
the advice is to adopt a hybrid encryption scheme (as opposed to, say,
chaining RSA messages together, or increasing the RSA key size) which
uses symmetric encryption with a single-use key for the message and
asymmetric encryption to hide the key.  If we want Zuul to support the
encryption of longer secrets, we may want to adopt the hybrid approach.
A frequent hybrid approach is to encrypt the message with AES, and then
encrypt the AES key with RSA.
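
A rough sketch of that hybrid shape, using the Fernet recipe from
python-cryptography for the symmetric part (the concatenation at the
end is not a worked-out transport format, just an illustration):

  from cryptography.fernet import Fernet
  from cryptography.hazmat.primitives import hashes
  from cryptography.hazmat.primitives.asymmetric import padding

  def hybrid_encrypt(public_key, secret):
      # Encrypt the (arbitrarily long) secret with a one-time Fernet
      # key, then wrap only that key with RSA/OAEP.  "public_key" is an
      # RSA public key object as in the previous sketch.
      fernet_key = Fernet.generate_key()
      token = Fernet(fernet_key).encrypt(secret)
      wrapped_key = public_key.encrypt(
          fernet_key,
          padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA1()),
                       algorithm=hashes.SHA1(), label=None))
      # With a 4096-bit key the wrapped key is a fixed 512 bytes, so the
      # two parts can be split unambiguously again on decryption.
      return wrapped_key + token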

The hiera-eyaml work which originally inspired some of this is based on
PKCS#7 with AES as the cipher -- ultimately a hybrid approach.  An
interesting aspect of that implementation is that the use of PKCS#7 as a
message passing format allows for multiple possible underlying ciphers
since the message is wrapped in ASN.1 and is self-descriptive.  We might
have simply chosen to go with that except that there don't seem to be
many good options for implementing this in Python, largely because of
the nightmare that is ASN.1 parsing.

The system we have devised for including encrypted content in our YAML
files involves a YAML tag which specifies the encryption scheme.  So we
can evolve our use to add or remove systems as needed in the future.

So to break this down into a series of actionable questions:

1) Do we want a system to support encrypting longer secrets?  Our PKCS#1
system supports up to 470 bytes.  That should be sufficient for most
passwords and API keys, but unlikely to be sufficient for some
certificate related systems, etc.

2) If so, what system should we use?

   2.1a) GPG?  This has hybrid encryption and transport combined.
   Implementation is likely to be a bit awkward, probably involving
   popen to external processes.

   2.1b) RSA+AES?  This recommendation from the pycryptodome
   documentation illustrates a typical hybrid approach:
   
https://pycryptodome.readthedocs.io/en/latest/src/examples.html#encrypt-data-with-rsa
   The transport protocol would likely just be the concatenation of
   the RSA and AES encrypted data, as it is in that example.  We can
   port that example to use the python-cryptography primitives, or we
   can switch to pycryptodome and use it exactly.

   2.1c) RSA+Fernet?  We can stay closer to the friendly recipes in
   python-cryptography.  While there is no complete hybrid recipe,
   there is a symmetric recipe for "Fernet" which is essentially a
   recipe for AES encryption and transport.  We could encode the
   Fernet key with RSA and concatenate the Fernet token.
   https://github.com/fernet/spec/blob/master/Spec.md

   2.1d) NaCL?  A "sealed box" in libsodium (which underlies PyNaCL)
   would do what we want with a completely different set of
   algorithms.
   https://github.com/pyca/pynacl/issues/189

3) Do we think it is important to hide the length of the secret?  AES
will expose the approximate length of the secret up to the block size
(16 bytes).  This 

Re: [OpenStack-Infra] [zuul] Feedback requested for tox job definition

2017-03-09 Thread James E. Blair
Clark Boylan  writes:

> Also reading these job defs and comparing against the zuulv3 spec it
> isn't clear to me what the expected behavior for inheriting pre and post
> playbooks is. Seems like maybe pre is a queue so parent pre roles run
> first and post is a stack so parent post roles run last? If that is
> already written down and I am just missing it sorry for the noise, if it
> isn't written down maybe we can do that?

Yeah, I think it was first written down in my pre-PTG summary message.
I intend to port over the relevant portions of that text to the Zuul
documentation soon, once we're finished making major changes to the
config format (I have one such change in progress).

The wording I used to explain it was "nesting" -- when you inherit from
a job, your job nests inside the pre and post playbooks of the job from
which you inherit.  This is because the most basic job should do the
earliest pre-tasks (e.g., set up git repos) which other jobs should be
able to rely on.  Likewise, inheriting jobs should be able to rely on
base jobs performing the most general post-tasks (e.g., copy all of the
logs).
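
A tiny sketch of the resulting ordering (an illustrative model, not
Zuul's actual data structures) may make the queue/stack behavior
clearer:

  # Pre-playbooks accumulate parent-first; post-playbooks child-first.
  base = {"pre": ["base-pre: set up git repos"], "run": None,
          "post": ["base-post: collect logs"]}
  tox = {"pre": ["tox-pre: install tox"], "run": "tox: run tests",
         "post": ["tox-post: collect .tox logs"]}

  def flatten(parent, child):
      # The child job nests inside the parent's pre and post playbooks.
      return (parent["pre"] + child["pre"]       # parent pre runs first
              + [child["run"]]
              + child["post"] + parent["post"])  # parent post runs last

  for step in flatten(base, tox):
      print(step)
  # base-pre: set up git repos
  # tox-pre: install tox
  # tox: run tests
  # tox-post: collect .tox logs
  # base-post: collect logs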

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Adding projects to zuulv3-dev.o.o

2017-03-02 Thread James E. Blair
Paul Belanger  writes:

> Greetings!
>
> I wanted to start a thread about what people thought about expanding our
> coverage of projects a little. I've been working on ansible roles for CI 
> things
> for a while, and figure it might be a good first step to have zuulv3 test a
> role[1] to install zuul with ansible.
>
> So far, it is up to date with feature/zuulv3 branch and support ubuntu-trusty,
> ubuntu-xenial, fedora-25 and centos-7.  I've been using it locally for testing
> environments for a while now and would love to start importing it into zuulv3.

Well, I don't want to expand Zuul v3 coverage for its own sake yet, for
all of the reasons mentioned in Monty and Robyn's emails (stability!
security!).  However, I do think we should start exercising some role
dependencies, and getting an ansible-based all-in-one deployment is a
near-term goal, so I think that would be a good restrained addition to
our coverage.

Having a job on the zuulv3 repo which uses that role to deploy an
all-in-one zuul would be a good way of advancing both of those goals.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


[OpenStack-Infra] Multi-connection support in Zuul

2017-03-02 Thread James E. Blair
Hi,

For a while, Zuul has had support for connections to multiple services
for use in triggering jobs, fetching changes, and reporting results.
This has been in use (the github patches work with github, we do support
reporting to gerrit as two different users, we send email over SMTP),
but we have not used the facility to its full potential.  That is
something we want to do in Zuul v3.  Specifically, in addition to all of
the above, we want to be able to have a change to Ansible depend on a
change to shade and be able to test them together.  Ansible is hosted in
github and shade is hosted in gerrit.  By the same token, a third-party
CI operator should be able to point their Zuul at upstream OpenStack and
their own internal Gerrit at the same time.

The pieces are all nearly there for that, but there are a few
assumptions lingering that we need to correct for v3.  They mostly
center around the fact that the "source" for a project (i.e., the
connection over which its source code should be fetched) is specified by
the pipeline.  This was added quite a while ago when we added our first
trigger which did not imply a source (the timer trigger).  By adding the
source to the pipeline, we told Zuul "when the timer goes off, enqueue a
project from gerrit in this pipeline".

That constrains us now, as it is confusing to consider two items from
different connections enqueued in the same pipeline.  There are other
issues too, such as specifying that a job needs an Ansible role from
another Zuul project.

The natural way to correct this seems to be to associate projects with
their connections directly.  Therefore, whenever Zuul encounters a
project (via a trigger, a dependency, or an internal reference) it will
know how to fetch it.

To fully implement this idea necessitates some changes.  With Monty's
help, I have sketched out a plan that should support all of our
use-cases and make this much simpler for developers and users:

1) Associate each source-capable driver with a canonical source-code
   location hostname.  

Our goal is to associate every project with a connection.  Our
connections already have names (like "gerrit") which make perfect sense
as names of pipeline triggers or reporters.  However, they make poor
identifiers for the logical canonical names of source code repos.  If we
were to describe the canonical location for Zuul's source code -- the
place we want users to clone it from -- it would be
git.openstack.org/openstack-infra/zuul.  Adding this attribute allows us
to disambiguate identically named repos from different connections.

Go-lang style project layout and source-code imports are fully-qualified
hostname/path/project identifiers (both on disk and in source code).
For example, "github.com/user/stringutil".  By adopting this facility,
we are better positioned to stage git repos in the Go-lang convention,
and we are using a sensible and common universal identifier for our
projects.

We should default this attribute to the hostname of the connection, but
allow it to be set to another value by the administrator.  So that in
our case, rather than referring to our project locations as
'review.openstack.org/...', we may use 'git.openstack.org'.

The pipeline and tenant configuration files will continue to reference
the connection name since "trigger: gerrit: ..." makes more sense than
"trigger: git.openstack.org: ...".

2) Use the tenant configuration to associate projects with connections.

The main goal of this exercise is to associate each project with a
connection.  The best place to do that is within the tenant
configuration file.  This is a location where the project-connection
mapping is unambiguous, and it happens early in the configuration
process so that the results are usable by the rest of the
configuration.  The current syntax is:

  - tenant:
  name: openstack
  source:
gerrit:
  config-repos:
- openstack-infra/project-config
  project-repos:
- openstack-infra/zuul

That source section may contain multiple sources (only gerrit in this
case, but could be extended with "github: ...").  Each of the projects
is therefore unambiguously associated with a connection.  Since each
connection has a canonical hostname (via section 1 above), we now also
know the fully qualified canonical location of each source code repo.

While we're on the subject, the terminology config-repos and
project-repos is a bit confusing, especially when we talk about our
config-repo named project-config and the project-repos which have their
own in-project-config.  We started using 'trusted' and 'untrusted'
internally in the launcher so we can keep track of when we're supposed
to deny certain actions.  We should go ahead and rename these
similarly.  So the new syntax would be:

  - tenant:
  name: openstack
  source:
gerrit:
  trusted-projects:
- openstack-infra/project-config
  untrusted-projects:
- 
