Re: [openstack-dev] [all] Zuul job backlog

2018-10-04 Thread Matthew Treinish


On October 5, 2018 12:11:51 AM EDT, Abhishek Kekane  wrote:
>Hi Clark,
>
>Thank you for the inputs. I have verified the logs and found that it is
>mostly the tests for the web-download image import method that are failing.
>In this test [1] we try to download a file from
>'https://www.openstack.org/assets/openstack-logo/2016R/OpenStack-Logo-Horizontal.eps.zip'
>into glance. We assume the image will be downloaded and active within
>20 seconds, and if not the test is marked as failed. This test never
>fails in a local environment, but there might be a problem connecting
>to the remote host while the test is executed in zuul jobs.
>
>Do you have any alternative idea for how we can test this scenario, as it
>is very hard to reproduce this in a local environment?
>

External networking will always be unreliable from the CI environment. Nothing
is 100% reliable, and given the sheer number of jobs we execute there will
be an appreciable number of failures just from that. That being said, this exact
problem you've described is one we fixed in devstack/tempest over 5 years ago:

https://bugs.launchpad.net/tempest/+bug/1190623

It'd be nice if we didn't keep repeating problems. The solution for that bug is
likely to be the same here: don't rely on pulling something from the
external network in the test. Just host something on the local
apache httpd of the test node and use that as the URL to import in the test.
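
As a rough illustration of the idea (not the actual devstack/tempest fix), here
is a minimal Python sketch that serves a file from a local HTTP server and
fetches it by URL. The file name, payload, and use of Python's built-in
http.server in place of apache httpd are all assumptions made for the example;
a real test would pass the local URL to the web-download import call.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request


def serve_directory(directory):
    """Serve `directory` over HTTP on an ephemeral loopback port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory))
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the image payload the test would import.
    payload = b"fake image data"
    (pathlib.Path(tmp) / "test-image.bin").write_bytes(payload)

    server = serve_directory(tmp)
    try:
        port = server.server_address[1]
        local_url = f"http://127.0.0.1:{port}/test-image.bin"
        # Ignore any proxy settings so the fetch stays on the loopback.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(local_url, timeout=5) as resp:
            fetched = resp.read()
    finally:
        server.shutdown()
        server.server_close()
```

The point is only that the URL under test resolves on the test node itself, so
the test no longer depends on external connectivity.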

-Matt Treinish

>
>
>On Thu, Oct 4, 2018 at 7:43 PM Clark Boylan 
>wrote:
>
>> On Thu, Oct 4, 2018, at 12:16 AM, Abhishek Kekane wrote:
>> > Hi,
>> > Could you please point out some of the glance functional tests which are
>> > failing and causing these resets?
>> > I would like to put some effort towards fixing those.
>>
>> http://status.openstack.org/elastic-recheck/data/integrated_gate.html is
>> a good place to start. That shows you a list of tests that failed in the
>> OpenStack Integrated gate that elastic-recheck could not identify the
>> failure for, including those for several functional jobs.
>>
>> If you'd like to start looking at identified bugs first then
>> http://status.openstack.org/elastic-recheck/gate.html shows identified
>> failures that happened in the gate.
>>
>> For glance functional jobs the first link points to:
>>
>> http://logs.openstack.org/99/595299/1/gate/openstack-tox-functional/fc13eca/
>>
>> http://logs.openstack.org/44/569644/3/gate/openstack-tox-functional/b7c487c/
>>
>> http://logs.openstack.org/99/595299/1/gate/openstack-tox-functional-py35/b166313/
>>
>> http://logs.openstack.org/44/569644/3/gate/openstack-tox-functional-py35/ce262ab/
>>
>> Clark
>>
>>


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [qa] [infra] [placement] tempest plugins virtualenv

2018-09-28 Thread Matthew Treinish
On Fri, Sep 28, 2018 at 03:31:10PM +0100, Chris Dent wrote:
> On Fri, 28 Sep 2018, Matthew Treinish wrote:
> 
> > > http://logs.openstack.org/14/601614/21/check/placement-tempest-gabbi/f44c185/job-output.txt.gz#_2018-09-28_11_13_25_798683
> > 
> > Right above this line it shows that the gabbi-tempest plugin is installed in
> > the venv:
> > 
> > http://logs.openstack.org/14/601614/21/check/placement-tempest-gabbi/f44c185/job-output.txt.gz#_2018-09-28_11_13_25_650661
> 
> Ah, so it is, thanks. My grepping and visual-grepping failed
> because of the weird linebreaks. Le sigh.
> 
> For curiosity: What's the processing that is making it be installed
> twice? I ask because I'm hoping to (eventually) trim this to as
> small and light as possible. And then even more eventually I hope to
> make it so that if a project chooses the right job and has a gabbits
> directory, they'll get run.

The plugin should only be installed once. From the logs, here is the only
place the plugin is being installed in the venv:

http://logs.openstack.org/14/601614/21/check/placement-tempest-gabbi/f44c185/job-output.txt.gz#_2018-09-28_11_13_01_027151

The rest of the references are just tox printing out the packages installed in
the venv before running a command.

> 
> The part that was confusing for me was that the virtual env that
> lib/tempest (from devstack) uses is not even mentioned in tempest's
> tox.ini, so is using its own directory as far as I could tell.

It should be. Devstack should be using the venv-tempest tox env to do venv
prep (like installing the plugins) and run commands (like running
tempest list-plugins for the log). This tox env is defined here:

https://github.com/openstack/tempest/blob/master/tox.ini#L157-L162

It's sort of a hack; devstack is just using tox as a venv manager for
setting up tempest. But then we use tox in the runner (what used to be
devstack-gate), so this made sense.
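
To make the "tox as a venv manager" pattern concrete, here is a small Python
sketch that only builds the command lines. The pip install form follows the
command quoted elsewhere in this thread; the list-plugins form and the paths
are assumptions for illustration, not copied from devstack.

```python
def venv_tempest_command(*args):
    """Build a tox invocation that runs `args` inside the venv-tempest env."""
    return ["tox", "-evenv-tempest", "--", *args]


def plugin_install_command(requirements_dir, tempest_plugins):
    """Build the constrained pip install for a space-separated plugin list."""
    constraints = f"{requirements_dir}/upper-constraints.txt"
    return venv_tempest_command(
        "pip", "install", "-c", constraints, *tempest_plugins.split())


# Hypothetical paths, for illustration only.
install_cmd = plugin_install_command(
    "/opt/stack/requirements", "/opt/stack/gabbi-tempest")
# Assumed invocation form for the list-plugins call mentioned above.
list_cmd = venv_tempest_command("tempest", "list-plugins")
```

Everything after the "--" is passed through by tox to the env's command, which
is why one tox env can be reused for both setup and ad hoc commands.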

-Matt Treinish

> 
> > My guess is that the plugin isn't returning any tests that match the regex.
> 
> I'm going to run it without a regex and see what it produces.
> 
> It might be that pre job I'm using to try to get the gabbits in the
> right place is not working as desired.
> 
> A few patchsets ago when I was using the oogly way of doing things
> it was all working.




Re: [openstack-dev] [qa] [infra] [placement] tempest plugins virtualenv

2018-09-28 Thread Matthew Treinish
On Fri, Sep 28, 2018 at 02:39:24PM +0100, Chris Dent wrote:
> 
> I'm still trying to figure out how to properly create a "modern" (as
> in zuul v3 oriented) integration test for placement using gabbi and
> tempest. That work is happening at https://review.openstack.org/#/c/601614/
> 
> There was lots of progress made after the last message on this
> topic 
> http://lists.openstack.org/pipermail/openstack-dev/2018-September/134837.html
> but I've reached another interesting impasse.
> 
> From devstack's standpoint, the way to say "I want to use a tempest
> plugin" is to set TEMPEST_PLUGINS to a list of where the plugins are.
> devstack:lib/tempest then does a:
> 
> tox -evenv-tempest -- pip install -c 
> $REQUIREMENTS_DIR/upper-constraints.txt $TEMPEST_PLUGINS
> 
> http://logs.openstack.org/14/601614/21/check/placement-tempest-gabbi/f44c185/job-output.txt.gz#_2018-09-28_11_12_58_138163
> 
> I have this part working as expected.
> 
> However,
> 
> The advice is then to create a new job that has a parent of
> devstack-tempest. That zuul job runs a variety of tox environments,
> depending on the setting of the `tox_envlist` var. If you wish to
> use a `tempest_test_regex` (I do) the preferred tox environment is
> 'all'.
> 
> That venv doesn't have the plugin installed, thus no gabbi tests are
> found:
> 
> http://logs.openstack.org/14/601614/21/check/placement-tempest-gabbi/f44c185/job-output.txt.gz#_2018-09-28_11_13_25_798683

Right above this line it shows that the gabbi-tempest plugin is installed in
the venv:

http://logs.openstack.org/14/601614/21/check/placement-tempest-gabbi/f44c185/job-output.txt.gz#_2018-09-28_11_13_25_650661

at version 0.1.1. It's a bit weird because it's line wrapped in my browser.
The devstack log also shows the plugin:

http://logs.openstack.org/14/601614/21/check/placement-tempest-gabbi/f44c185/controller/logs/devstacklog.txt.gz#_2018-09-28_11_13_13_076

All the tox envs that run tempest (and the venv-tempest env used by
devstack) run inside the same tox venv:

https://github.com/openstack/tempest/blob/master/tox.ini#L52

My guess is that the plugin isn't returning any tests that match the regex.

I'm also a bit alarmed that tempest run is returning 0 there when no tests are
being run. That's definitely a bug because things should fail with no tests
being successfully run.
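
For illustration, here is what "fail when nothing ran" could look like using
only Python's standard unittest module. This is a sketch of the expected
behavior, not tempest's or stestr's actual code; exit_code_for is a
hypothetical helper.

```python
import unittest


def exit_code_for(result):
    """Return a shell-style exit code, treating an empty run as a failure."""
    if result.testsRun == 0:
        # wasSuccessful() is True for an empty run, so check explicitly.
        return 1
    return 0 if result.wasSuccessful() else 1


class Sample(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)


# A run where the selection matched no tests at all.
empty_result = unittest.TestResult()
unittest.TestSuite().run(empty_result)

# A run with one passing test.
full_result = unittest.TestResult()
unittest.TestLoader().loadTestsFromTestCase(Sample).run(full_result)

empty_code = exit_code_for(empty_result)
full_code = exit_code_for(full_result)
```

Note that the empty result still reports wasSuccessful() as true, which is
exactly how a zero-test run can slip through as a pass.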

-Matt Treinish

> 
> How do I get my plugin installed into the right venv while still
> following the guidelines for good zuul behavior?
> 




Re: [openstack-dev] [placement] [infra] [qa] tuning some zuul jobs from "it works" to "proper"

2018-09-20 Thread Matthew Treinish
On Thu, Sep 20, 2018 at 10:55:31AM +0200, Luigi Toscano wrote:
> On Thursday, 20 September 2018 04:47:20 CEST Matthew Treinish wrote:
> > On Thu, Sep 20, 2018 at 11:11:12AM +0900, Ghanshyam Mann wrote:
> > >   On Wed, 19 Sep 2018 23:29:46 +0900 Monty Taylor
> > >   wrote >  
> > >  > On 09/19/2018 09:23 AM, Monty Taylor wrote:
> > >  > > On 09/19/2018 08:25 AM, Chris Dent wrote:
> > >  > >> I have a patch in progress to add some simple integration tests to
> > >  > >> 
> > >  > >> placement:
> > >  > >>  https://review.openstack.org/#/c/601614/
> > >  > >> 
> > >  > >> They use https://github.com/cdent/gabbi-tempest . The idea is that
> > >  > >> the method for adding more tests is to simply add more yaml in
> > >  > >> gate/gabbits, without needing to worry about adding to or think
> > >  > >> about tempest.
> > >  > >> 
> > >  > >> What I have at that patch works; there are two yaml files, one of
> > >  > >> which goes through the process of confirming the existence of a
> > >  > >> resource provider and inventory, booting a server, seeing a change
> > >  > >> in allocations, resizing the server, seeing a change in allocations.
> > >  > >> 
> > >  > >> But this is kludgy in a variety of ways and I'm hoping to get some
> > >  > >> help or pointers to the right way. I'm posting here instead of
> > >  > >> asking in IRC as I assume other people confront these same
> > >  > >> confusions. The issues:
> > >  > >> 
> > >  > >> * The associated playbooks are cargo-culted from stuff labelled
> > >  > >> 
> > >  > >>"legacy" that I was able to find in nova's jobs. I get the
> > >  > >>impression that these are more verbose and duplicative than they
> > >  > >>need to be and are not aligned with modern zuul v3 coolness.
> > >  > > 
> > >  > > Yes. Your life will be much better if you do not make more legacy
> > >  > > jobs.
> > >  > > They are brittle and hard to work with.
> > >  > > 
> > >  > > New jobs should either use the devstack base job, the
> > >  > > devstack-tempest
> > >  > > base job or the devstack-tox-functional base job - depending on what
> > >  > > things are intended.
> > > 
> > > +1. All the base jobs from Tempest and Devstack (except grenade, which is in
> > > progress) are available to use as a base for legacy jobs. Using
> > > devstack-tempest in your patch is the right thing. In addition, you need to
> > > mention the tox_envlist as all-plugins to make tempest_test_regex work. I
> > > commented on the review.
> > No, all-plugins is incorrect and should never be used. It's only there for
> > legacy support; it is deprecated, and I thought we pushed a patch
> > indicating that (but I can't find it).
> 
> This one?
> https://review.openstack.org/#/c/543974/
> 

Yep, that's the one I was thinking of.


Thanks,

Matt Treinish




Re: [openstack-dev] [placement] [infra] [qa] tuning some zuul jobs from "it works" to "proper"

2018-09-19 Thread Matthew Treinish
On Thu, Sep 20, 2018 at 11:11:12AM +0900, Ghanshyam Mann wrote:
>   On Wed, 19 Sep 2018 23:29:46 +0900 Monty Taylor  
> wrote  
>  > On 09/19/2018 09:23 AM, Monty Taylor wrote:
>  > > On 09/19/2018 08:25 AM, Chris Dent wrote:
>  > >>
>  > >> I have a patch in progress to add some simple integration tests to
>  > >> placement:
>  > >>
>  > >>  https://review.openstack.org/#/c/601614/
>  > >>
>  > >> They use https://github.com/cdent/gabbi-tempest . The idea is that
>  > >> the method for adding more tests is to simply add more yaml in
>  > >> gate/gabbits, without needing to worry about adding to or think
>  > >> about tempest.
>  > >>
>  > >> What I have at that patch works; there are two yaml files, one of
>  > >> which goes through the process of confirming the existence of a
>  > >> resource provider and inventory, booting a server, seeing a change
>  > >> in allocations, resizing the server, seeing a change in allocations.
>  > >>
>  > >> But this is kludgy in a variety of ways and I'm hoping to get some
>  > >> help or pointers to the right way. I'm posting here instead of
>  > >> asking in IRC as I assume other people confront these same
>  > >> confusions. The issues:
>  > >>
>  > >> * The associated playbooks are cargo-culted from stuff labelled
>  > >>"legacy" that I was able to find in nova's jobs. I get the
>  > >>impression that these are more verbose and duplicative than they
>  > >>need to be and are not aligned with modern zuul v3 coolness.
>  > > 
>  > > Yes. Your life will be much better if you do not make more legacy jobs. 
>  > > They are brittle and hard to work with.
>  > > 
>  > > New jobs should either use the devstack base job, the devstack-tempest 
>  > > base job or the devstack-tox-functional base job - depending on what 
>  > > things are intended.
> 
> +1. All the base jobs from Tempest and Devstack (except grenade, which is in
> progress) are available to use as a base for legacy jobs. Using
> devstack-tempest in your patch is the right thing. In addition, you need to
> mention the tox_envlist as all-plugins to make tempest_test_regex work. I
> commented on the review.

No, all-plugins is incorrect and should never be used. It's only there for
legacy support; it is deprecated, and I thought we pushed a patch indicating
that (but I can't find it). It tells tox to create a venv with system
site-packages enabled, and that almost always causes more problems than it
fixes. Specifying the plugin with TEMPEST_PLUGINS will make sure the plugin is
installed in tempest's venv, and if you need to run a tox job without a preset
selection regex (so you can specify your own) you should use the "all" job
(not all-plugins).
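
To show what "system site-packages enabled" means at the venv level, here is a
small sketch using Python's standard venv module. tox builds its envs with its
own tooling, so this is only an analogy for the setting, not what tox runs;
site_packages_setting is a hypothetical helper.

```python
import pathlib
import tempfile
import venv


def site_packages_setting(system_site_packages):
    """Create a throwaway venv and report its system site-packages flag."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = pathlib.Path(tmp) / "venv"
        # with_pip=False keeps creation fast; only the config file matters here.
        venv.EnvBuilder(
            system_site_packages=system_site_packages,
            with_pip=False).create(env_dir)
        for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
            key, _, value = line.partition("=")
            if key.strip() == "include-system-site-packages":
                return value.strip()
    return None


isolated = site_packages_setting(False)
shared = site_packages_setting(True)
```

With the flag on, whatever happens to be installed system-wide becomes
importable inside the venv, so the packages a test run sees depend on the
host rather than only on what was installed into the venv.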

-Matt Treinish

> 
>  > > 
>  > > You might want to check out:
>  > > 
>  > > https://docs.openstack.org/devstack/latest/zuul_ci_jobs_migration.html
>  > > 
>  > > also, cmurphy has been working on updating some of keystone's legacy 
>  > > jobs recently:
>  > > 
>  > > https://review.openstack.org/602452
>  > > 
>  > > which might also be a source for copying from.
>  > > 
>  > >> * It takes an age for the underlying devstack to build, I can
>  > >>presumably save some time by installing fewer services, and making
>  > >>it obvious how to add more when more are required. What's the
>  > >>canonical way to do this? Mess with {enable,disable}_service, cook
>  > >>the ENABLED_SERVICES var, do something with required_projects?
>  > > 
>  > > http://git.openstack.org/cgit/openstack/openstacksdk/tree/.zuul.yaml#n190
>  > > 
>  > > Has an example of disabling services, of adding a devstack plugin, and 
>  > > of adding some lines to localrc.
>  > > 
>  > > 
>  > > http://git.openstack.org/cgit/openstack/openstacksdk/tree/.zuul.yaml#n117
>  > > 
>  > > Has some more complex config bits in it.
>  > > 
>  > > In your case, I believe you want to have parent: devstack-tempest 
>  > > instead of parent: devstack-tox-functional
>  > > 
>  > > 
>  > >> * This patch, and the one that follows it [1] dynamically install
>  > >>stuff from pypi in the post test hooks, simply because that was
>  > >>the quick and dirty way to get those libs in the environment.
>  > >>What's the clean and proper way? gabbi-tempest itself needs to be
>  > >>in the tempest virtualenv.
>  > > 
>  > > This I don't have an answer for. I'm guessing this is something one 
>  > > could do with a tempest plugin?
>  > 
>  > K. This:
>  > 
>  > 
> http://git.openstack.org/cgit/openstack/neutron-tempest-plugin/tree/.zuul.yaml#n184
> 
> Yeah, You can install that via TEMPEST_PLUGINS var. All plugins specified in 
> TEMPEST_PLUGINS var, will be installed into the tempest venv[1]. You can 
> mention the gabbi-tempest same way. 
> 
> [1] 
> https://github.com/openstack-dev/devstack/blob/6f4b7fc99c4029d25a924bcad968089d89e9d296/lib/tempest#L663
> 
> -gmann
> 
>  > 
>  > Has an example of a job using a tempest plugin.
>  > 
>  > >> * The post.yaml playbook which gathers up 

Re: [openstack-dev] [python3] tempest and grenade conversion to python 3.6

2018-09-18 Thread Matthew Treinish
On Tue, Sep 18, 2018 at 09:52:50PM -0500, Matt Riedemann wrote:
> On 9/18/2018 12:28 PM, Doug Hellmann wrote:
> > What's probably missing is a version of the grenade job that allows us
> > to control that USE_PYTHON3 variable before and after the upgrade.
> > 
> > I see a few different grenade jobs (neutron-grenade,
> > neutron-grenade-multinode,
> > legacy-grenade-dsvm-neutron-multinode-live-migration, possibly others).
> > Which ones are "current" and would make a good candidate as a base for a
> > new job?
> 
> Grenade just runs devstack on the old side (e.g. stable/rocky) using the
> devstack stackrc file (which could have USE_PYTHON3 in it), runs tempest
> 'smoke' tests to create some resources, saves off some information about
> those resources in a "database" (just an ini file), then runs devstack on
> the new side (e.g. master) using the new side stackrc file and verifies
> those saved off resources made it through the upgrade. It's all bash so
> there isn't anything python-specific about grenade.

This isn't quite right. We run devstack on the old side, but on the new side
we don't actually run devstack. Grenade updates the code, runs DB migrations
(and any other mandatory upgrade steps), and then just relaunches the services.
That's kind of the point: to make sure new code works with old config.

The target (i.e. new side) stackrc and localrc/local.conf are there for the
common functions shared between devstack and grenade, which are used to do things
like pull the code and start services, to make sure they run against the proper
branches. There isn't any point in reimplementing the same exact thing.
But we don't do a full devstack run, which is why you only see stack.sh run
once in the logs on a grenade job.

> 
> I saw, but didn't comment on, the other thread about if it would be possible
> to create a grenade-2to3 job. I'd think that is pretty doable based on the
> USE_PYTHON3 variable. We'd just have that False on the old side, and True on
> the new side, and devstack will do it's thing. Right now the USE_PYTHON3
> variable is global in devstack-gate [1] (which is the thing that
> orchestrates the grenade run for the legacy jobs), but I'm sure we could
> hack that to be specific to the base (old) and target (new) release for the
> grenade run.

I don't think this will work because we won't be running any initial python 3
setup on the system. I think it will just update paths and try to use python3
pip and python3 paths for things, but it will be missing the things it needs
for those to work. It's probably worth a try either way (a quick experiment
to say definitively) but my gut is telling me that it's not going to be that
simple.


-Matt Treinish

> 
> [1] 
> https://github.com/openstack-infra/devstack-gate/blob/95fa4343104eafa655375cce3546d27139211d13/devstack-vm-gate-wrap.sh#L434
> 




Re: [openstack-dev] [doc][i18n][infra][tc] plan for PDF and translation builds for documentation

2018-09-13 Thread Matthew Treinish
On Fri, Sep 14, 2018 at 10:09:26AM +0900, Ian Y. Choi wrote:
> First of all, thanks a lot for nice summary - I would like to deeply
> read and put comments later.
> 
> And @mtreinish, please see my reply inline:
> 
> Matthew Treinish wrote on 9/14/2018 5:09 AM:
> > On Thu, Sep 13, 2018 at 07:23:53AM -0600, Doug Hellmann wrote:
> >> Excerpts from Michel Peterson's message of 2018-09-13 10:04:27 +0300:
> >>> On Thu, Sep 13, 2018 at 1:09 AM, Doug Hellmann 
> >>> wrote:
> >>>
> >>>> The longer version is that we want to continue to use the existing
> >>>> tox environment in each project as the basis for the job, since
> >>>> that allows teams to control the version of python used, the
> >>>> dependencies installed, and add custom steps to their build (such
> >>>> as for pre-processing the documentation). So, the new or updated
> >>>> job will start by running "tox -e docs" as it does today. Then it
> >>>> will run Sphinx again with the instructions to build PDF output,
> >>>> and copy the results into the directory that the publish job will
> >>>> use to sync to the web server. And then it will run the scripts to
> >>>> build translated versions of the documentation as HTML, and copy
> >>>> the results into place for publishing.
> >>>>
> >>> Just a question out of curiosity. You mention that we still want to use the
> >>> docs environment because it allows fine grained control over how the
> >>> documentation is created. However, as I understand, the PDF output will
> >>> happen in a more standardized way and outside of that fine grained control,
> >>> right? That couldn't lead to differences in both documentations? Do we have
> >>> to even worry about that?
> >> Good question.  The idea is to run "tox -e docs" to get the regular
> >> HTML, then something like
> >>
> >>    .tox/docs/bin/sphinx-build -b latex doc/build doc/build/latex
> >>    cd doc/build/latex
> >>    make
> >>    cp doc/build/latex/*.pdf doc/build/html
> > To be fair, I've looked at this several times in the past, and sphinx's latex
> > generation is good enough for the simple case, but on more complex documents
> > it doesn't really work too well. For example, on nova I added this a while ago:
> >
> > https://github.com/openstack/nova/blob/master/tools/build_latex_pdf.sh
> 
> After seeing what the script is doing, I wanna divide into several parts
> and would like to tell with some generic approach:
> 
> - svg -> png
>  : PDF builds ideally convert all svg files into PDF with no problems,
> but there are some realistic problems
>    such as problems on determining bounding sbox size on vector svg
> files, and big memory problems with lots of tags in svg files.
>  : Maybe it would be solved if we check all svg files with correct
> formatting,
>    or if all svg files are converted to png files with temporal changes
> on rst file (.svg -> .png), wouldn't it?

Yeah we will have to do either. In my experience just converting to png images
is normally easier.
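
A minimal sketch of the rst side of that (the temporary .svg -> .png change),
assuming images are referenced through plain image or figure directives.
Converting the image files themselves needs a separate tool and isn't shown;
svg_refs_to_png is a hypothetical helper, not part of any existing script.

```python
import re

# Match an rst image/figure directive whose target ends in .svg.
_SVG_DIRECTIVE = re.compile(
    r"^([ \t]*\.\.[ \t]+(?:image|figure)::[ \t]+\S+)\.svg[ \t]*$",
    re.MULTILINE)


def svg_refs_to_png(rst_text):
    """Point image and figure directives at .png files instead of .svg."""
    return _SVG_DIRECTIVE.sub(r"\1.png", rst_text)


sample = (
    ".. image:: _static/arch.svg\n"
    "\n"
    "See diagram.svg in text.\n"
    ".. figure:: flow.svg\n"
    "   :width: 80%\n"
)
converted = svg_refs_to_png(sample)
```

Only directive lines are rewritten; a mention of an .svg file in running prose
is left alone.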

> 
> - non-latin code problems:
>  : By default, Sphinx uses latex builder, which doesn't support
> non-latin codes and customized fonts [1].
>    Documentation team tried to make use of xelatex instead of latex in
> Sphinx configuration and now it is overridden
>    on openstackdocstheme >=1.20. So non-latin code would not generate
> problems if you use openstackdocstheme >=1.20.

Ok sure, using XeTeX will solve this problem. I typically still just use
pdflatex, so back when I pushed that script (which was over 3 years ago)
I was trying to fix it by converting the non-latin characters to their latex
symbol equivalents (which is a feature built into sphinx, but it misses
a lot of symbols).

> 
> - other things
>  : I could not capture the background on other changes such as
> additional packages.
>    If u provide more background on other things, I would like to
> investigate on how to approach by changing a rst file
>    to make compatible with pdf builds or how to support all pdf builds
> on many project repos as much as possible.

The extra packages were part of the attempt to fix the non-latin characters
using latex symbols. Those packages are just added there so you can call
\checkmark and \ding{54} instead of ✔ and ✖.
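
As a sketch of that substitution (the nova script itself is a shell script and
is not reproduced here), the mapping could look like this in Python.
LATEX_SYMBOLS and replace_symbols are hypothetical names, and only the two
symbols mentioned above are covered.

```python
# Map the two symbols mentioned above to LaTeX commands.
# \checkmark is provided by amssymb and \ding by pifont.
LATEX_SYMBOLS = {
    "\u2714": r"\checkmark",  # heavy check mark
    "\u2716": r"\ding{54}",   # heavy multiplication x
}


def replace_symbols(latex_source):
    """Swap non-latin symbols for LaTeX commands pdflatex can handle."""
    for char, command in LATEX_SYMBOLS.items():
        latex_source = latex_source.replace(char, command)
    return latex_source


converted = replace_symbols("supported: \u2714 / unsupported: \u2716")
```

Any symbol missing from the table passes through unchanged, which is the same
gap described above for sphinx's built-in handling.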

> 
> When I test PDF builds on current nova repo with master branch, it seems
> that the rst document is too big
>

Re: [openstack-dev] [doc][i18n][infra][tc] plan for PDF and translation builds for documentation

2018-09-13 Thread Matthew Treinish
On Thu, Sep 13, 2018 at 07:23:53AM -0600, Doug Hellmann wrote:
> Excerpts from Michel Peterson's message of 2018-09-13 10:04:27 +0300:
> > On Thu, Sep 13, 2018 at 1:09 AM, Doug Hellmann 
> > wrote:
> > 
> > > The longer version is that we want to continue to use the existing
> > > tox environment in each project as the basis for the job, since
> > > that allows teams to control the version of python used, the
> > > dependencies installed, and add custom steps to their build (such
> > > as for pre-processing the documentation). So, the new or updated
> > > job will start by running "tox -e docs" as it does today. Then it
> > > will run Sphinx again with the instructions to build PDF output,
> > > and copy the results into the directory that the publish job will
> > > use to sync to the web server. And then it will run the scripts to
> > > build translated versions of the documentation as HTML, and copy
> > > the results into place for publishing.
> > >
> > 
> > Just a question out of curiosity. You mention that we still want to use the
> > docs environment because it allows fine grained control over how the
> > documentation is created. However, as I understand, the PDF output will
> > happen in a more standardized way and outside of that fine grained control,
> > right? That couldn't lead to differences in both documentations? Do we have
> > to even worry about that?
> 
> Good question.  The idea is to run "tox -e docs" to get the regular
> HTML, then something like
> 
>     .tox/docs/bin/sphinx-build -b latex doc/build doc/build/latex
>     cd doc/build/latex
>     make
>     cp doc/build/latex/*.pdf doc/build/html

To be fair, I've looked at this several times in the past, and sphinx's latex
generation is good enough for the simple case, but on more complex documents
it doesn't really work too well. For example, on nova I added this a while ago:

https://github.com/openstack/nova/blob/master/tools/build_latex_pdf.sh

to work around some issues with this workflow. It was enough to get the
generated latex to actually compile back then. But that script has bitrotted
and needs to be updated, because the latex from sphinx for nova's docs no
longer compiles. (I also submitted a patch to sphinx in the meantime to
fix the check mark latex output.) I'm afraid that it'll be a constant game
of cat and mouse trying to get everything to build.

I think that we'll find that on most projects' documentation we will need
to massage the latex output from sphinx to build pdfs.

-Matt Treinish

> 
> We would run the HTML translation builds in a similar way by invoking
> sphinx-build from the virtualenv repeatedly with different locale
> settings based on what translations exist.
> 
> In my earlier comment, I was thinking of the case where a team runs
> a script to generate rst content files before invoking sphinx to
> build the HTML. That script would have been run before the PDF
> generation happens, so the content should be the same. That also
> applies for anyone using sphinx add-ons, which will be available
> to the latex builder because we'll be using the version of sphinx
> installed in the virtualenv managed by tox.
> 




Re: [openstack-dev] [stable][nova] Nominating melwitt for nova stable core

2018-08-28 Thread Matthew Treinish
On Tue, Aug 28, 2018 at 03:26:02PM -0500, Matt Riedemann wrote:
> I hereby nominate Melanie Witt for nova stable core. Mel has shown that she
> knows the stable branch policy and is also an active reviewer of nova stable
> changes.
> 
> +1/-1 comes from the stable-maint-core team [1] and then after a week with
> no negative votes I think it's a done deal. Of course +1/-1 from existing
> nova-stable-maint [2] is also good feedback.

+1 from me.

-Matt Treinish


> 
> [1] https://review.openstack.org/#/admin/groups/530,members
> [2] https://review.openstack.org/#/admin/groups/540,members
> 




Re: [openstack-dev] [stestr?][tox?][infra?] Unexpected success isn't a failure

2018-08-03 Thread Matthew Treinish
On Tue, Jul 10, 2018 at 03:16:14PM -0400, Matthew Treinish wrote:
> On Tue, Jul 10, 2018 at 10:16:37AM +0100, Chris Dent wrote:
> > On Mon, 9 Jul 2018, Matthew Treinish wrote:
> > 
> > > It's definitely a bug, and likely a bug in stestr (or one of the lower level
> > > packages like testtools or python-subunit), because that's what's generating
> > > the return code. Tox just looks at the return code from the commands to figure
> > > out if things were successful or not. I'm a bit surprised by this though I
> > > thought we covered the unxsuccess and xfail cases because I would have expected
> > > cdent to file a bug if it didn't. Looking at the stestr tests we don't have
> > > coverage for the unxsuccess case so I can see how this slipped through.
> > 
> > This was reported on testrepository some years ago and a bit of
> > analysis was done: https://bugs.launchpad.net/testrepository/+bug/1429196
> > 
> 
> This actually helps a lot, because I was seeing the same issue when I tried
> writing a quick patch to address this. When I manually poked the TestResult
> object it didn't have anything in the unxsuccess list. So instead of relying
> on that I wrote this patch:
> 
> https://github.com/mtreinish/stestr/pull/188
> 
> which uses the output filter's internal function for counting results to
> find unxsuccess tests. It's still not perfect though because if someone
> runs with the --no-subunit-trace flag it still doesn't work (because that
> call path never gets run) but it's at least a starting point. I've
> marked it as WIP for now, but I'm thinking we could merge it as is and
> leave the --no-subunit-trace and unxsuccess as a known issues for now,
> since xfail and unxsuccess are pretty uncommon in practice. (gabbi is the
> only thing I've seen really use it)
> 
> 
> 
> > So yeah, I did file a bug but it fell off the radar during those
> > dark times.
> > 
> 

Just following up here, after digging some more and getting a detailed
bug filed by electrofelix [1] I was able to throw together a different patch
that should solve this in a better way:

https://github.com/mtreinish/stestr/pull/190

Once that lands I can push a bugfix release to get it out there so people
can actually use the fix.

-Matt Treinish




Re: [openstack-dev] [stestr?][tox?][infra?] Unexpected success isn't a failure

2018-07-10 Thread Matthew Treinish
On Tue, Jul 10, 2018 at 10:16:37AM +0100, Chris Dent wrote:
> On Mon, 9 Jul 2018, Matthew Treinish wrote:
> 
> > It's definitely a bug, and likely a bug in stestr (or one of the lower level
> > packages like testtools or python-subunit), because that's what's generating
> > the return code. Tox just looks at the return code from the commands to figure
> > out if things were successful or not. I'm a bit surprised by this though I
> > thought we covered the unxsuccess and xfail cases because I would have expected
> > cdent to file a bug if it didn't. Looking at the stestr tests we don't have
> > coverage for the unxsuccess case so I can see how this slipped through.
> 
> This was reported on testrepository some years ago and a bit of
> analysis was done: https://bugs.launchpad.net/testrepository/+bug/1429196
> 

This actually helps a lot, because I was seeing the same issue when I tried
writing a quick patch to address this. When I manually poked the TestResult
object it didn't have anything in the unxsuccess list. So instead of relying
on that I wrote this patch:

https://github.com/mtreinish/stestr/pull/188

which uses the output filter's internal function for counting results to
find unxsuccess tests. It's still not perfect though because if someone
runs with the --no-subunit-trace flag it still doesn't work (because that
call path never gets run) but it's at least a starting point. I've
marked it as WIP for now, but I'm thinking we could merge it as is and
leave the --no-subunit-trace and unxsuccess combination as a known issue for now,
since xfail and unxsuccess are pretty uncommon in practice. (gabbi is the
only thing I've seen really use it)

-Matt Treinish


> So yeah, I did file a bug but it fell off the radar during those
> dark times.
> 





Re: [openstack-dev] [stestr?][tox?][infra?] Unexpected success isn't a failure

2018-07-09 Thread Matthew Treinish
On Mon, Jul 09, 2018 at 06:59:42PM -0500, Eric Fried wrote:
> In gabbi, there's a way [1] to mark a test as an expected failure, which
> makes it show up in your stestr run thusly:
> 
> {0}
> nova.tests.functional.api.openstack.placement.test_placement_api.allocations-1.28_put_that_allocation_to_new_consumer.test_request
> [0.710821s] ... ok
> 
> ==
> Totals
> ==
> Ran: 1 tests in 9. sec.
>  - Passed: 0
>  - Skipped: 0
>  - Expected Fail: 1
>  - Unexpected Success: 0
>  - Failed: 0
> 
> If I go fix the thing causing the heretofore-expected failure, but
> forget to take out the `xfail: True`, it does this:
> 
> {0}
> nova.tests.functional.api.openstack.placement.test_placement_api.allocations-1.28_put_that_allocation_to_new_consumer.test_request
> [0.710517s] ... FAILED
> {0}
> nova.tests.functional.api.openstack.placement.test_placement_api.allocations-1.28_put_that_allocation_to_new_consumer.test_request
> [0.00s] ... ok
> 
> ==
> Failed 1 tests - output below:
> ==
> 
> nova.tests.functional.api.openstack.placement.test_placement_api.allocations-1.28_put_that_allocation_to_new_consumer.test_request
> --
> 
> 
> ==
> Totals
> ==
> Ran: 2 tests in 9. sec.
>  - Passed: 1
>  - Skipped: 0
>  - Expected Fail: 0
>  - Unexpected Success: 1
>  - Failed: 0
> 
> BUT it does not cause the run to fail. For example, see the
> nova-tox-functional results for [2] (specifically PS4): the test appears
> twice in the middle of the run [3] and prints failure output [4] but the
> job passes [5].
> 
> So I'm writing this email because I have no idea if this is expected
> behavior or a bug (I'm hoping the latter, cause it's whack, yo); and if
> a bug, I have no idea whose bug it should be. Help?
It's definitely a bug, and likely a bug in stestr (or one of the lower level
packages like testtools or python-subunit), because that's what's generating
the return code. Tox just looks at the return code from the commands to figure
out if things were successful or not. I'm a bit surprised by this, though; I
thought we covered the unxsuccess and xfail cases, because I would have expected
cdent to file a bug if we didn't. Looking at the stestr tests, we don't have
coverage for the unxsuccess case, so I can see how this slipped through.

Looking at where the return code for the output from the run command is
generated (it's a bit weird because run calls the load command internally which
handles the output generation, result storage, and final return code):

https://github.com/mtreinish/stestr/blob/master/stestr/commands/load.py#L222-L225

I'm thinking it might be an issue in testtools or python-subunit; I don't remember
which generates the results object used there (if it is subunit it'll be a
subclass from testtools). But I'll have to trace through it to be sure. In the
meantime we can easily work around the issue in stestr itself by just manually
checking the result status instead of relying on the existing function from the
results class.
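
As a rough illustration of what "manually checking the result status" could
look like, here is a minimal sketch against the standard library's
unittest.TestResult. It is not the stestr code and not the eventual patch;
exit_code_for, run_demo and the demo test case are made-up names used only
for this example.

```python
import unittest


class _Demo(unittest.TestCase):
    @unittest.expectedFailure
    def test_marked_xfail_but_passes(self):
        # The underlying problem was fixed, but the xfail marker was left in,
        # so this passing test is recorded as an unexpected success.
        self.assertEqual(1 + 1, 2)


def exit_code_for(result):
    """Return a process exit code by inspecting each result bucket directly.

    Hypothetical helper: it does not call result.wasSuccessful(), it counts
    failures, errors and unexpected successes itself.
    """
    problems = (
        len(result.failures)
        + len(result.errors)
        + len(result.unexpectedSuccesses)
    )
    return 1 if problems else 0


def run_demo():
    """Run the demo test case and return the populated TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(_Demo)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    raise SystemExit(exit_code_for(run_demo()))
```

The point of the sketch is only that an unexpected success is treated like a
failure when computing the return code, which is what tox then sees.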

-Matt Treinish

> 
> [1] https://gabbi.readthedocs.io/en/latest/format.html?highlight=xfail
> [2] https://review.openstack.org/#/c/579921/4
> [3] http://logs.openstack.org/21/579921/4/check/nova-tox-functional/5fb6ee9/job-output.txt.gz#_2018-07-09_17_22_11_846366
> [4] http://logs.openstack.org/21/579921/4/check/nova-tox-functional/5fb6ee9/job-output.txt.gz#_2018-07-09_17_31_07_229271
> [5] http://logs.openstack.org/21/579921/4/check/nova-tox-functional/5fb6ee9/testr_results.html.gz
> 




Re: [openstack-dev] [qa][tempest-plugins][release][tc][ptl]: Coordinated Release Model proposal for Tempest & Tempest Plugins

2018-06-26 Thread Matthew Treinish
On Tue, Jun 26, 2018 at 10:12:30AM -0400, Doug Hellmann wrote:
> Excerpts from Matthew Treinish's message of 2018-06-26 09:52:09 -0400:
> > On Tue, Jun 26, 2018 at 08:53:21AM -0400, Doug Hellmann wrote:
> > > Excerpts from Andrea Frittoli's message of 2018-06-26 13:35:11 +0100:
> > > > On Tue, 26 Jun 2018, 1:08 pm Thierry Carrez wrote:
> > > > 
> > > > > Dmitry Tantsur wrote:
> > > > > > [...]
> > > > > > My suggestion: tempest has to be compatible with all supported releases
> > > > > > (of both services and plugins) OR be branched.
> > > > > > [...]
> > > > > I tend to agree with Dmitry... We have a model for things that need
> > > > > release alignment, and that's the cycle-bound series. The reason tempest
> > > > > is branchless was because there was no compatibility issue. If the split
> > > > > of tempest plugins introduces a potential incompatibility, then I would
> > > > > prefer aligning tempest to the existing model rather than introduce a
> > > > > parallel tempest-specific cycle just so that tempest can stay
> > > > > release-independent...
> > > > >
> > > > > I seem to remember there were drawbacks in branching tempest, though...
> > > > > Can someone with functioning memory brain cells summarize them again ?
> > > > >
> > > > 
> > > > 
> > > > Branchless Tempest enforces api stability across branches.
> > > 
> > > I'm sorry, but I'm having a hard time taking this statement seriously
> > > when the current source of tension is that the Tempest API itself
> > > is breaking for its plugins.
> > > 
> > > Maybe rather than talking about how to release compatible things
> > > together, we should go back and talk about why Tempest's API is changing
> > > in a way that can't be made backwards-compatible. Can you give some more
> > > detail about that?
> > > 
> > 
> > Well it's not, if it did that would violate all the stability guarantees
> > provided by Tempest's library and plugin interface. I've not ever heard of
> > these kind of backwards incompatibilities in those interfaces and we go to
> > all effort to make sure we don't break them. Where did the idea that
> > > backwards incompatible changes were being introduced come from?
> 
> In his original post, gmann said, "There might be some changes in
> Tempest which might not work with older version of Tempest Plugins."
> I was surprised to hear that, but I'm not sure how else to interpret
> that statement.

I have no idea what he means here either. If we went off and broke plugins using
a defined stable interface with changes on master we would be breaking all the
stability guarantees Tempest provides on those interfaces. That's not something
we do, and we have review processes and testing to prevent it. The only thing I can
think of is removal of an interface, but that is pretty rare and when we do it we
go through the standard deprecation procedure.

> 
> > As for this whole thread I don't understand any of the points being brought up
> > in the original post or any of the follow ons, things seem to have been confused
> > from the start. The ask from users at the summit was simple. When a new OpenStack
> > release is pushed we push a tempest release to mark that (the next one will be
> > 19.0.0 to mark Rocky). Users were complaining that many plugins don't have a
> > corresponding version to mark support for a new release. So when trying to run
> > against a rocky cloud you get tempest 19.0.0 and then a bunch of plugins for
> > various services at different sha1s which have to be manually looked up based
> > on dates. All users wanted at the summit was a tag for plugins like tempest
> > does with the first number in:
> > 
> > https://docs.openstack.org/tempest/latest/overview.html#release-versioning
> > 
> > which didn't seem like a bad idea to me. I'm not sure the best mechanism to
> > accomplish this, because I agree with much of what plugin maintainers were
> > saying on the thread about wanting to control their own releases. But the
> > desire to make sure users have a tag they can pull for the addition or
> > removal of a supported release makes sense as something a plugin should do.
> 
> We don't coordinate versions across projects anywhere else, for a
> bunch of reasons including the complexity of coordinating the details
> and the confusion it causes when the first version of something is
> 19.0.0. Instead, we list the compatible versions of everything
> together on a series-specific page on releases.o.o. That seems to
> be enough to help anyone wanting to know which versions of tools
> work together. The data is also available in YAML files, so it's easy
> enough to consume by automation.
> 
> Would that work for tempest and its plugins, too?

That is exactly what I had in mind. I wasn't advocating all plugins use the same
version number for releases, for the same reasons we don't do that for service
projects anymore. Just that there 

Re: [openstack-dev] [qa][tempest-plugins][release][tc][ptl]: Coordinated Release Model proposal for Tempest & Tempest Plugins

2018-06-26 Thread Matthew Treinish
On Tue, Jun 26, 2018 at 08:53:21AM -0400, Doug Hellmann wrote:
> Excerpts from Andrea Frittoli's message of 2018-06-26 13:35:11 +0100:
> > On Tue, 26 Jun 2018, 1:08 pm Thierry Carrez wrote:
> > 
> > > Dmitry Tantsur wrote:
> > > > [...]
> > > > My suggestion: tempest has to be compatible with all supported releases
> > > > (of both services and plugins) OR be branched.
> > > > [...]
> > > I tend to agree with Dmitry... We have a model for things that need
> > > release alignment, and that's the cycle-bound series. The reason tempest
> > > is branchless was because there was no compatibility issue. If the split
> > > of tempest plugins introduces a potential incompatibility, then I would
> > > prefer aligning tempest to the existing model rather than introduce a
> > > parallel tempest-specific cycle just so that tempest can stay
> > > release-independent...
> > >
> > > I seem to remember there were drawbacks in branching tempest, though...
> > > Can someone with functioning memory brain cells summarize them again ?
> > >
> > 
> > 
> > Branchless Tempest enforces api stability across branches.
> 
> I'm sorry, but I'm having a hard time taking this statement seriously
> when the current source of tension is that the Tempest API itself
> is breaking for its plugins.
> 
> Maybe rather than talking about how to release compatible things
> together, we should go back and talk about why Tempest's API is changing
> in a way that can't be made backwards-compatible. Can you give some more
> detail about that?
> 

Well, it's not; if it were, that would violate all the stability guarantees
provided by Tempest's library and plugin interface. I've never heard of
this kind of backwards incompatibility in those interfaces, and we make
every effort to make sure we don't break them. Where did the idea that
backwards incompatible changes were being introduced come from?

That being said, things are definitely getting confused here. All andreaf was
saying about the branchless nature is that it ensures we run the same tests against
service REST APIs, making sure we have API stability between releases in the
services when we say we do. (to answer ttx's question)

As for this whole thread I don't understand any of the points being brought up
in the original post or any of the follow ons, things seem to have been confused
from the start. The ask from users at the summit was simple. When a new OpenStack
release is pushed we push a tempest release to mark that (the next one will be
19.0.0 to mark Rocky). Users were complaining that many plugins don't have a
corresponding version to mark support for a new release. So when trying to run
against a rocky cloud you get tempest 19.0.0 and then a bunch of plugins for
various services at different sha1s which have to be manually looked up based
on dates. All users wanted at the summit was a tag for plugins like tempest
does with the first number in:

https://docs.openstack.org/tempest/latest/overview.html#release-versioning

which didn't seem like a bad idea to me. I'm not sure the best mechanism to
accomplish this, because I agree with much of what plugin maintainers were
saying on the thread about wanting to control their own releases. But the
desire to make sure users have a tag they can pull for the addition or
removal of a supported release makes sense as something a plugin should do.


-Matt Treinish




Re: [openstack-dev] [heat][ci][infra] telemetry test broken on oslo.messaging stable/queens

2018-06-05 Thread Matthew Treinish
On Tue, Jun 05, 2018 at 10:47:17AM -0400, Ken Giusti wrote:
> Hi,
> 
> The telemetry integration test for oslo.messaging has started failing
> on the stable/queens branch [0].
> 
> A quick review of the logs points to a change in heat-tempest-plugin
> that is incompatible with the version of gabbi from queens upper
> constraints (1.40.0) [1][2].
> 
> The job definition [3] includes required-projects that do not have
> stable/queens branches - including heat-tempest-plugin.
> 
> My question - how do I prevent this job from breaking when these
> unbranched projects introduce changes that are incompatible with
> upper-constraints for a particular branch?

Tempest and its plugins should be installed in a venv to isolate their
requirements from the rest of what devstack is installing during the
job. This should be happening by default; the only place they get installed
on system python, and where there is a potential conflict, is if INSTALL_TEMPEST
is set to True. See:

https://git.openstack.org/cgit/openstack-dev/devstack/tree/lib/tempest#n57

That flag only exists so we test tempest coinstallability in the gate, as well
as for local devstack users.

We don't install branchless projects on system python in stable jobs exactly
because there is a likely conflict between the stable branch's requirements
and master's (which is what branchless projects follow).
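
To make the isolation point concrete, here is a small sketch using Python's
standard venv module. It is not what devstack runs; the "tempest-venv"
directory name and both helper functions are assumptions made up for this
example. It creates a virtualenv that does not see system site-packages and
then reads back pyvenv.cfg to confirm that.

```python
import os
import tempfile
import venv


def make_isolated_env(parent_dir):
    """Create a virtualenv that does not see system site-packages."""
    env_dir = os.path.join(parent_dir, "tempest-venv")  # hypothetical name
    # with_pip=False keeps the sketch fast; a real environment would want pip.
    venv.create(env_dir, system_site_packages=False, with_pip=False)
    return env_dir


def read_venv_config(env_dir):
    """Parse the venv's pyvenv.cfg into a dict of key/value strings."""
    config = {}
    with open(os.path.join(env_dir, "pyvenv.cfg")) as handle:
        for line in handle:
            key, sep, value = line.partition("=")
            if sep:
                config[key.strip()] = value.strip()
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = read_venv_config(make_isolated_env(tmp))
        print(cfg["include-system-site-packages"])
```

Anything installed into such an environment follows its own requirements, so
master's requirements for a branchless project can't collide with the stable
branch's packages on system python.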

-Matt Treinish

> 
> I've tried to use override-checkout in the job definition, but that
> seems a bit hacky in this case since the tagged versions don't appear
> to work and I've resorted to a hardcoded ref [4].
> 
> Advice appreciated, thanks!
> 
> [0] https://review.openstack.org/#/c/567124/
> [1] http://logs.openstack.org/24/567124/1/check/oslo.messaging-telemetry-dsvm-integration-rabbit/e7fdc7d/logs/devstack-gate-post_test_hook.txt.gz#_2018-05-16_05_20_05_624
> [2] http://logs.openstack.org/24/567124/1/check/oslo.messaging-telemetry-dsvm-integration-rabbit/e7fdc7d/logs/devstacklog.txt.gz#_2018-05-16_05_19_06_332
> [3] https://git.openstack.org/cgit/openstack/oslo.messaging/tree/.zuul.yaml?h=stable/queens#n250
> [4] https://review.openstack.org/#/c/572193/2/.zuul.yaml




Re: [openstack-dev] [tc][all] A culture change (nitpicking)

2018-05-30 Thread Matthew Treinish
On Thu, May 31, 2018 at 12:21:35AM +, Fox, Kevin M wrote:
> To play devil's advocate, and as someone that has had to git bisect an ugly
> regression once, I still think it's important not to break trunk. It can be
> much harder to deal with difficult issues like that if trunk frequently breaks.
> 
> Thanks,
> Kevin
> 
> From: Sean McGinnis [sean.mcgin...@gmx.com]
> Sent: Wednesday, May 30, 2018 5:01 PM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [tc][all] A culture change (nitpicking)
> 
> > "master should be always deployable and fully backward compatible and
> > so we cant let anything in anytime that could possibly regress anyone"
> >
> > Should we change that attitude too? Anyone agree? disagree?
> >
> > Thanks,
> > Dims
> >
> I'll definitely jump at this one.
> 
> I've always thought (and shared on the ML several times now) that our implied
> but not explicit support for CD from any random commit was a bad thing.
> 
> While I think it's good to support the idea that master is always deployable, I
> do not think it is a good mindset to think that every commit is a "release" and
> therefore should be supported until the end of time. We have a coordinated
> release for a reason, and I think design decisions and fixes should be based on
> the assumption that a release is a release and the point at which we need to be
> cognizant and caring about keeping backward compatibility. Doing that for
> every single commit is not ideal for the overall health of the product, IMO.
> 

It's more than just a CD guarantee. While from a quick glance it would seem like
that's the only value, it goes much deeper than that. Ensuring that every commit
works, is deployable, and maintains backwards compatibility is what enables us
to have such a high quality end result at release time. Quite frankly, it's
looking at every commit as always being a working unit that enables us to manage
a project this size at this velocity. Even if people assume no one is actually
CDing the projects (which we shouldn't), it's a flawed assumption to think that
everyone is running strictly the same code as what's in the release tarballs. I
can't think of any production cloud out there that doesn't carry patches to fix
things encountered in the real world. Or look at stable maint: we regularly need
to backport fixes for bugs found after release. If we can't rely on these to
always work, this makes our life much more difficult, both as upstream
maintainers and as downstream consumers of OpenStack.

The other aspect to look at here is just the review mindset: ensuring that
every commit is usable puts reviewers in the mindset to consider things like
backwards compatibility and deployability when looking at proposed changes. If
we stop looking for these potential issues, it will also cause many more bugs
to be in our released code. To simply discount this as only a release concern
and punt this kind of scrutiny until it's time to release is going to
make release time much more stressful. Also, our testing is built to try and
ensure every commit works **before** we merge it. If we decided to take this
stance as a community then we should really just rip out all the testing,
because that's what it's there to verify and help us make sure we don't land a
change that doesn't work. If we don't actually care about making sure every
commit is deployable we are wasting quite a lot of resources on it.

-Matt Treinish




Re: [openstack-dev] [all][tc] final stages of python 3 transition

2018-05-29 Thread Matthew Treinish
On Tue, May 29, 2018 at 03:51:31PM -0400, Doug Hellmann wrote:
> Following up on this topic, at the Forum discussion last week (see
> https://etherpad.openstack.org/p/YVR-python-2-deprecation-timeline) the
> general plan outlined below was acceptable to most of the folks in the
> room with a few small changes (included below).
> 
> Excerpts from Doug Hellmann's message of 2018-04-25 16:54:46 -0400:
> > It's time to talk about the next steps in our migration from python
> > 2 to python 3.
> > 
> > Up to this point we have mostly focused on reaching a state where
> > we support both versions of the language. We are not quite there
> > with all projects, as you can see by reviewing the test coverage
> > status information at
> > https://wiki.openstack.org/wiki/Python3#Python_3_Status_of_OpenStack_projects
> > 
> > Still, we need to press on to the next phase of the migration, which
> > I have been calling "Python 3 first". This is where we use python
> > 3 as the default, for everything, and set up the exceptions we need
> > for anything that still requires python 2.
> > 
> > To reach that stage, we need to:
> > 
> > 1. Change the documentation and release notes jobs to use python 3.
> >(The Oslo team recently completed this, and found that we did
> >need to make a few small code changes to get them to work.)
> > 2. Change (or duplicate) all functional test jobs to run under
> >python 3.
> > 3. Change the packaging jobs to use python 3.
> > 4. Update devstack to use 3 by default and require setting a flag to
> >use 2. (This may trigger other job changes.)
> 
> Also:
> 
> - Ensure that devstack configures mod_wsgi (or whatever WSGI service) to
>   use Python 3 when deploying API components.

The python 3 dsvm jobs already do this for the most part. All API services that
support running as a wsgi application run under uwsgi with a single apache
redirecting traffic to those. This is the supported model for running wsgi
services on devstack. Currently keystone, glance, nova, placement, and cinder
run their API servers this way. Neutron doesn't run as a wsgi app (I
don't recall why this was never implemented for neutron) and swift doesn't run
in the py3 jobs at all. You can see an example of this here:

http://logs.openstack.org/08/550108/3/gate/tempest-full-py3/df744ef/controller/logs/

For other services it depends on how they implemented their devstack plugin. I
haven't done an inventory on how all the plugins are running things, so I don't
know what the status of each project is there.

> - Test "python version skew" within a service during a rolling upgrade
>   across multiple hosts.
> - Add an integration test job that does not include python2 on the host
>   at all.
> 
> That last item may block us from using other tools, such as ansible,
> that rely on python2. If the point of such a test is to ensure that
> we are properly installing (and running) our tools under python3,
> maybe *that's* what we want to check, instead of forbidding a python2
> package at all? Could we, for example, look at the set of packages
> installed under python2 and report errors if any OpenStack packages end
> up there?
> 
> > 
> > At that point, all of our deliverables will be produced using python
> > 3, and we can be relatively confident that if we no longer had
> > access to python 2 we could still continue operating. We could also
> > start updating deployment tools to use either python 3 or 2, so
> > that users could actually deploy using the python 3 versions of
> > services.
> > 
> > Somewhere in that time frame our third-party CI systems will need
> > to ensure they have python 3 support as well.
> > 
> > After the "Python 3 first" phase is completed we should release
> > one series using the packages built with python 3. Perhaps Stein?
> > Or is that too ambitious?
> > 
> > Next, we will be ready to address the prerequisites for "Python 3
> > only," which will allow us to drop Python 2 support.
> > 
> > We need to wait to drop python 2 support as a community, rather
> > than going one project at a time, to avoid doubling the work of
> > downstream consumers such as distros and independent deployers. We
> > don't want them to have to package all (or even a large number) of
> > the dependencies of OpenStack twice because they have to install
> > some services running under python 2 and others under 3. Ideally
> > they would be able to upgrade all of the services on a node together
> > as part of their transition to the new version, without ending up
> > with a python 2 version of a dependency along side a python 3 version
> > of the same package.
> > 
> > The remaining items could be fixed earlier, but this is the point
> > at which they would block us:
> > 
> > 1. Fix oslo.service functional tests -- the Oslo team needs help
> >maintaining this library. Alternatively, we could move all
> >services to use cotyledon (https://pypi.org/project/cotyledon/).
> > 
> > 2. Finish the unit test and 

Re: [openstack-dev] [all][tc][ptls] final stages of python 3 transition

2018-05-20 Thread Matthew Treinish
On Sun, May 20, 2018 at 03:05:34PM +0200, Thomas Goirand wrote:
> On 05/19/2018 07:54 PM, Matthew Treinish wrote:
> > On Sat, May 19, 2018 at 07:04:53PM +0200, Thomas Goirand wrote:
> >> using:
> >> - RBD backend
> >> - swift backend
> >> - swift+rgw
> > 
> > As for the backend store choice I don't have any personal experience using
> > any of these 3 as a backend store. That being said your choice of store
> > should be independent of getting glance-api deployed behind uwsgi
> > and a webserver.
> > 
> > Although, you might have trouble with swift on py3, because IIRC that still
> > isn't working. (unless something changed recently) But, the store config is
> > really independent from getting the api to receive and handle api requests
> > properly. 
> 
> Thanks for these details. What exactly is the trouble with the Swift
> backend? Do you know? Is anyone working on fixing it? At my company,
> we'd be happy to work on that (if of course, it's not too time demanding).
> 

Sorry I didn't mean the swift backend, but swift itself under python3:

https://wiki.openstack.org/wiki/Python3#OpenStack_applications_.28tc:approved-release.29

If you're trying to deploy everything under python3 I don't think you'll be
able to deploy swift. But if you already have a swift running then the glance
backend should work fine under python 3.

> >>> The issues glance has with running in a wsgi app are related to its
> >>> use of async tasks via taskflow. (which includes the tasks api and
> >>> image import stuff) This shouldn't be hard to fix, and I've had
> >>> patches up to address these for months:
> >>>
> >>> https://review.openstack.org/#/c/531498/
> >>> https://review.openstack.org/#/c/549743/
> >>
> >> Do I need to backport these patches to Queens to run Glance the way I
> >> described? Will it also fix running Glance with mod_wsgi?
> > 
> > These patches are independent of getting things working for you. They
> > are only required for 2 API features in glance to work. The tasks api and
> > the image import api (which was added in queens). You don't need either
> > to upload images by default, and the patches will only ever be necessary
> > if you have something using those APIs (which personally I've never
> > encountered in the wild). There is also no test coverage in tempest or
> > any external test suite using these apis that I'm aware of so your CI
> > likely won't even be blocked by this. (which is how this situation
> > arose in the first place)
> 
> Alright, so hopefully I'm very close to having Debian gate
> properly in puppet-openstack upstream. As far as I can tell, Glance
> and Cinder are the only pieces that are still failing with SSL (and
> everything works already without SSL), so I must be very close to a nice
> result (after a course of nearly 2 months already).
> 
> Thanks again for all the very valuable details that you provided. I have
> to admit that I was starting to lose faith in the project, because of
> all the frustration of not finding a working solution.
> 
> I'll let the list know when I have something that fully works and is
> gating with puppet-openstack, of course.
> 
> Cheers,
> 
> Thomas Goirand (zigo)




Re: [openstack-dev] [all][tc][ptls] final stages of python 3 transition

2018-05-19 Thread Matthew Treinish
On Sat, May 19, 2018 at 07:04:53PM +0200, Thomas Goirand wrote:
> On 05/08/2018 06:22 PM, Matthew Treinish wrote:
> >> Glance - Has issues with image upload + uwsgi + eventlet [1]
> > 
> > This actually is a bit misleading. Glance works fine with image upload and uwsgi.
> > That's the only configuration of glance in a wsgi app that works because
> > of chunked transfer encoding not being in the WSGI protocol. [2] uwsgi provides
> > an alternate interface to read chunked requests which enables this to work.
> > If you look at the bugs linked off that release note about image upload
> > you'll see they're all fixed.
> 
> Hi Matt,
> 
> I'm quite happy to read the above. Just to make sure...
> 
> Can you confirm that Glance + Python 3 + uwsgi with SSL will work using
> the below setup?

So glance with uwsgi, python3, and ssl works fine. (with the caveats I
mentioned below) We test that on every commit in the integrated gate
today in the tempest-full-py3 job. It's been that way for almost a year
at this point.

> 
> using:
> - RBD backend
> - swift backend
> - swift+rgw

As for the backend store choice I don't have any personal experience using
any of these 3 as a backend store. That being said your choice of store
should be independent of getting glance-api deployed behind uwsgi
and a webserver.

Although, you might have trouble with swift on py3, because IIRC that still
isn't working. (unless something changed recently) But, the store config is
really independent from getting the api to receive and handle api requests
properly. 

> 
> If so, then I'll probably end up pushing for such uwsgi setup.
> 
> If I understand you correctly, it won't work with Apache mod_wsgi,
> because of this chunked transfer encoding, which is what made it fail
> when I tried using the RBD backend. Right?

This is correct; you cannot use glance and mod_wsgi together because
mod_wsgi will not handle requests with chunked transfer encoding by default.
So it will fail on any image upload request made to glance that uses
chunked transfer encoding.
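
For anyone unfamiliar with the wire format, here is a minimal sketch of
HTTP/1.1 chunked framing. It only illustrates the encoding itself and is not
how glance, uwsgi or mod_wsgi implement it; encode_chunked and decode_chunked
are made-up helper names, and the decoder assumes no trailers. The thing to
notice is that the body carries its own length markers, so there is no
Content-Length header for the server or the app to rely on.

```python
import io


def encode_chunked(pieces):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for piece in pieces:
        if piece:  # a zero-length chunk would terminate the body early
            out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk marker, no trailers
    return bytes(out)


def decode_chunked(stream):
    """Read a chunked body from a binary stream until the last-chunk marker."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        # The size is hexadecimal; ignore any chunk extensions after ';'.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            stream.readline()  # blank line after the last chunk
            return bytes(body)
        body += stream.read(size)
        stream.read(2)  # CRLF that follows each chunk's data


if __name__ == "__main__":
    wire = encode_chunked([b"abc", b"defgh"])
    print(wire)
    print(decode_chunked(io.BytesIO(wire)))
```

An upload sent this way has to be read chunk by chunk until the zero-length
chunk arrives, which is the part the server in front of glance has to support.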

> 
> > The issues glance has with running in a wsgi app are related to its
> > use of async tasks via taskflow. (which includes the tasks api and
> > image import stuff) This shouldn't be hard to fix, and I've had
> > patches up to address these for months:
> >
> > https://review.openstack.org/#/c/531498/
> > https://review.openstack.org/#/c/549743/
> 
> Do I need to backport these patches to Queens to run Glance the way I
> described? Will it also fix running Glance with mod_wsgi?

These patches are independent of getting things working for you. They
are only required for 2 API features in glance to work: the tasks api and
the image import api (which was added in queens). You don't need either
to upload images by default, and the patches will only ever be necessary
if you have something using those APIs (which personally I've never
encountered in the wild). There is also no test coverage in tempest or
any external test suite using these apis that I'm aware of so your CI
likely won't even be blocked by this. (which is how this situation
arose in the first place)

-Matt Treinish


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][tc][ptls][glance] final stages of python 3 transition

2018-05-19 Thread Matthew Treinish
On Sat, May 19, 2018 at 07:21:22PM +0200, Thomas Goirand wrote:
> On 05/08/2018 07:55 PM, Matthew Treinish wrote:
> > I wrote up a doc about running glance under apache when I added the
> > uwsgi chunked transfer encoding support to glance here:
> > 
> > https://docs.openstack.org/glance/latest/admin/apache-httpd.html
> > 
> > Which includes how you have to configure things to get it working and a
> > section on why mod_wsgi doesn't work.
> 
> Thanks for that. Could you also push a uWSGI .ini configuration example
> file, as well as the mod_proxy example? There's so many options in uwsgi
> that I don't want to risk doing something wrong. I've pasted my config
> at the end of this message. Do you think it's also OK to use SSL
> directly with uwsgi, using the --https option? What about the 104 error
> that I've been experiencing? Is it because I'm not using mod_proxy?

There already are example configs in the glance repo. I pushed them up
when I added the documentation:

https://github.com/openstack/glance/tree/master/httpd

Those configs are more or less just a mirror of what I set up for the
gate (and my personal cloud):

http://logs.openstack.org/47/566747/1/gate/tempest-full-py3/c7f3b2e/controller/logs/etc/glance/glance-uwsgi.ini.gz
http://logs.openstack.org/47/566747/1/gate/tempest-full-py3/c7f3b2e/controller/logs/apache_config/glance-wsgi-api_conf.txt.gz

The way I normally configure things is to do the ssl termination with
apache and then just limit the uwsgi socket to localhost. I haven't
tried setting up the ssl in uwsgi directly, since the idea was to
share a single web server with different endpoints off of it for
each api service. 
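
As a rough illustration of the "socket limited to localhost" part, here is a
small Python sketch that checks whether a uwsgi ini binds its http-socket to a
loopback address. The helper name and the sample values are assumptions for
illustration only; this is not something from the glance repo, and it only
looks at the http-socket option.

```python
import configparser

# Hosts treated as "local only" in this sketch.
LOOPBACK_HOSTS = ("127.0.0.1", "localhost", "[::1]")


def http_socket_is_local(ini_text):
    """Return True if [uwsgi] http-socket is bound to a loopback host."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    bind = parser.get("uwsgi", "http-socket", fallback="")
    # "host:port" -> host; ":port" (all interfaces) -> empty host.
    host = bind.rpartition(":")[0]
    return host in LOOPBACK_HOSTS
```

A value like ":9292" listens on every interface, so the check returns False
for it, while "127.0.0.1:9292" passes and leaves the SSL termination to the
web server in front.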

As for the 104 error there are several probable causes; without seeing
the full configuration and looking at the traffic it's hard to say
where the connections are getting reset. I would try getting your
config to mirror what we know to be working and then go from there.

> 
> BTW, there's no need to manually do the symlink, you can use instead:
> a2ensite uwsgi-glance-api.conf

Feel free to push a patch to update the docs.

> 
> Cheers,
> 
> Thomas Goirand (zigo)
> 
> 
> [uwsgi]
> 
> ### Generic UWSGI config ###
> 
> 
> # Override the default size for headers from the 4k default.
> buffer-size = 65535
> 
> # This avoids error 104: "Connection reset by peer"
> rem-header = Content-Length
> 
> # This is running standalone
> master = true
> 
> # Threads and processes
> enable-threads = true
> 
> processes = 4
> 
> # uwsgi recommends this to prevent thundering herd on accept.
> thunder-lock = true
> 
> plugins = python3
> 
> # This ensures that file descriptors aren't shared between the WSGI
> # application processes.
> lazy-apps = true
> 
> # Log from the wsgi application: needs python3-pastescript as runtime
> # depends.
> paste-logger = true
> 
> # automatically kill workers if master dies
> no-orphans = true
> 
> # exit instead of brutal reload on SIGTERM
> die-on-term = true
> 
> ##
> ### OpenStack service specific ###
> ##
> 
> # This is the standard port for the WSGI application, listening on all
> # available IPs
> http-socket = :9292
> logto = /var/log/glance/glance-api.log
> name = glance-api
> uid = glance
> gid = glance
> chdir = /var/lib/glance
> wsgi-file = /usr/bin/glance-wsgi-api
> 




Re: [openstack-dev] Should we add a tempest-slow job?

2018-05-11 Thread Matthew Treinish
On Fri, May 11, 2018 at 08:45:39AM -0500, Matt Riedemann wrote:
> The tempest-full job used to run API and scenario tests concurrently, and if
> you go back far enough I think it also ran slow tests.

Well it's a bit more subtle than that. Skipping slow tests was added right
before we introduced parallel execution to tempest ~5 years ago:

https://github.com/openstack/tempest/commit/68a8060b24abd6b6bf99c4f9296bf418a8349a2d

Note those are in separate testr jobs which we migrated to the full job a bit
later in that cycle. The full job back then ran using nose and ran things
serially. But back then we didn't actually have any tests tagged as slow. It was
more of a future proofing thing because we were planning to add a bunch of
really slow heat tests we didn't want to run on every commit to each project.
The slow tags were first added for heat tests which came later in the havana
cycle.
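
For anyone unfamiliar with how the slow tag ends up excluding tests, here is a
minimal Python sketch of tag-based filtering on test IDs. It assumes the
attribute tags show up in square brackets at the end of the test ID; the IDs
and the helper name below are made up for illustration, and this is not the
exact regex any job uses.

```python
import re

# Matches a bracketed attribute list that contains the word "slow".
SLOW_TAG = re.compile(r"\[.*\bslow\b.*\]")


def split_by_slow(test_ids):
    """Split test IDs into (not_slow, slow) lists based on their tags."""
    not_slow, slow = [], []
    for test_id in test_ids:
        if SLOW_TAG.search(test_id):
            slow.append(test_id)
        else:
            not_slow.append(test_id)
    return not_slow, slow
```

A job that skips slow tests would run only the first list, and a job
dedicated to them would run only the second.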

> 
> Sometime in the last year or so, the full job was changed to run the
> scenario tests in serial and exclude the slow tests altogether. So the API
> tests run concurrently first, and then the scenario tests run in serial.
> During that change, some other tests were identified as 'slow' and marked as
> such, meaning they don't get run in the normal tempest-full job.

It was changed in:

https://github.com/openstack/tempest/commit/49505df20f3dc578506e479c2afa4a4f02e464bf

> 
> There are some valuable scenario tests marked as slow, however, like the
> only encrypted volume testing we have in tempest is marked slow so it
> doesn't get run on every change for at least nova.
> 
> There is only one job that can be run against nova changes which runs the
> slow tests but it's in the experimental queue so people forget to run it.
> 
> As a test, I've proposed a nova-slow job [1] which only runs the slow tests
> and only the compute API and scenario tests. Since there are currently no
> compute API tests marked as slow, it's really just running slow scenario
> tests. Results show it runs 37 tests in about 37 minutes [2]. The overall
> job runtime was 1 hour and 9 minutes, which is on average less than the
> tempest-full job. The nova-slow job is also running scenarios that nova
> patches don't actually care about, like the neutron IPv6 scenario tests.
> 
> My question is, should we make this a generic tempest-slow job which can be
> run either in the integrated-gate or at least in nova/neutron/cinder
> consistently (I'm not sure if there are slow tests for just keystone or
> glance)? I don't know if the other projects already have something like this
> that they gate on. If so, a nova-specific job for nova changes is fine for
> me.

So there used to be an experimental queue tempest-all job which ran everything
in tempest, including the slow tests. I can't find it in the .zuul.yaml in the
tempest repo, so my assumption is that got dropped during the v3 migration.

I'm fine with adding a general purpose job for just running the slow tests to
the integrated gate if we think there is enough value from that. It's mostly
just a question of weighing the potential value from the increased coverage vs
the increased resource consumption for adding yet another job to the integrated
gate. Personally, I'm fine with that tradeoff.

-Matt Treinish

> 
> [1] https://review.openstack.org/#/c/567697/
> [2] 
> http://logs.openstack.org/97/567697/1/check/nova-slow/bedfafb/job-output.txt.gz#_2018-05-10_23_46_47_588138
> 





Re: [openstack-dev] [all][tc][ptls][glance] final stages of python 3 transition

2018-05-08 Thread Matthew Treinish
On Tue, May 08, 2018 at 03:02:05PM -0400, Doug Hellmann wrote:
> Excerpts from Matthew Treinish's message of 2018-05-08 13:55:43 -0400:
> > On Tue, May 08, 2018 at 01:34:11PM -0400, Doug Hellmann wrote:
> > > 
> > > (added [glance] subject tag)
> > > 
> > > Excerpts from Matthew Treinish's message of 2018-05-08 12:22:56 -0400:
> > > > On Tue, May 08, 2018 at 05:01:36PM +0100, Graham Hayes wrote:
> > > > > On 08/05/18 16:53, Doug Hellmann wrote:
> > > > > > Excerpts from Graham Hayes's message of 2018-05-08 16:28:46 +0100:
> 
> [snip]
> 
> > > > > Glance - Has issues with image upload + uwsgi + eventlet [1]
> > > > 
> > > > This actually is a bit misleading. Glance works fine with image upload
> > > > and uwsgi. That's the only configuration of glance in a wsgi app that
> > > > works because of chunked transfer encoding not being in the WSGI
> > > > protocol. [2] uwsgi provides an alternate interface to read chunked
> > > > requests which enables this to work. If you look at the bugs linked off
> > > > that release note about image upload you'll see they're all fixed.
> > > 
> > > Is this documented somewhere?
> > 
> > The wsgi limitation or the glance usage? I wrote up a doc about running
> > glance under apache when I added the uwsgi chunked transfer encoding
> > support to glance here:
> > 
> > https://docs.openstack.org/glance/latest/admin/apache-httpd.html
> > 
> > Which includes how you have to configure things to get it working and a
> > section on why mod_wsgi doesn't work.
> 
> I meant the glance usage so it sounds like you've covered the docs
> for that. Thanks!
> 
> > > > The issues glance has with running in a wsgi app are related to its
> > > > use of async tasks via taskflow. (which includes the tasks api and
> > > > image import stuff)
> > > > This shouldn't be hard to fix, and I've had patches up to address these 
> > > > for
> > > > months:
> > > > 
> > > > https://review.openstack.org/#/c/531498/
> > > > https://review.openstack.org/#/c/549743/
> > > > 
> > > > Part of the issue is that there is no api driven testing for these 
> > > > async api
> > > > functions or any documented way to test them. Which is why I marked the 
> > > > 2nd
> > > > one WIP, since I have no method to test it and after asking several 
> > > > times
> > > > for a test case or some other method to validate these APIs without an 
> > > > answer.
> > > 
> > > It would be helpful if some of this detail made its way into the glance
> > > section of 
> > > https://wiki.openstack.org/wiki/Python3#Python_3_Status_of_OpenStack_projects
> > 
> > It really doesn't have anything to do with Python 3 though since the bug 
> > with
> > glance's taskflow usage is on both py2 and py3. In fact we're already 
> > running
> > glance under uwsgi in the gate with python 3 today for the dsvm py3 jobs. 
> > The
> > reason these bugs haven't come up there is because there is no test coverage
> > for any of these async APIs. But I can add it to the wiki later today.
> 
> Will it block us from moving glance to python 3 if we drop the WSGI
> code from oslo.service so that the only way to deploy is behind
> some other WSGI server?
> 

It shouldn't be a blocker; the wsgi entrypoint just uses paste to expose the
wsgi app directly:

https://github.com/openstack/glance/blob/master/glance/common/wsgi_app.py#L59-L67

oslo.service doesn't come into play in that code path, so it won't block
the deploy-behind-uwsgi model. The bugs addressed by the 2 patches I referenced
above will still be present though.
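
For reference, the general shape of a WSGI entrypoint is just a module-level
callable that the server imports. The Python sketch below is a generic
illustration, not glance's actual wsgi_app.py; init_app() is a hypothetical
stand-in for whatever loads the real app (paste in glance's case).

```python
def init_app():
    """Hypothetical factory standing in for the paste-based app loader."""

    def app(environ, start_response):
        # A trivial WSGI app: fixed plain-text response for any request.
        body = b"ok\n"
        headers = [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]

    return app


# A WSGI server such as uwsgi imports the module and calls this object.
application = init_app()
```

Nothing in that path needs a service launcher, which is why dropping the
standalone server code wouldn't get in the way of this deployment model.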

Although, I don't think glance uses oslo.service even in the case where it's
using the standalone eventlet server. It looks like it launches eventlet.wsgi
directly:

https://github.com/openstack/glance/blob/master/glance/common/wsgi.py

and I don't see oslo.service in the requirements file either:

https://github.com/openstack/glance/blob/master/requirements.txt

-Matt Treinish




Re: [openstack-dev] [all][tc][ptls][glance] final stages of python 3 transition

2018-05-08 Thread Matthew Treinish
On Tue, May 08, 2018 at 01:34:11PM -0400, Doug Hellmann wrote:
> 
> (added [glance] subject tag)
> 
> Excerpts from Matthew Treinish's message of 2018-05-08 12:22:56 -0400:
> > On Tue, May 08, 2018 at 05:01:36PM +0100, Graham Hayes wrote:
> > > On 08/05/18 16:53, Doug Hellmann wrote:
> > > > Excerpts from Graham Hayes's message of 2018-05-08 16:28:46 +0100:
> > > >> On 08/05/18 16:09, Zane Bitter wrote:
> > > >>> On 30/04/18 17:16, Ben Nemec wrote:
> > > > Excerpts from Doug Hellmann's message of 2018-04-25 16:54:46 -0400:
> > > >> 1. Fix oslo.service functional tests -- the Oslo team needs help
> > > >>     maintaining this library. Alternatively, we could move all
> > > >>     services to use cotyledon 
> > > >> (https://pypi.org/project/cotyledon/).
> > > >>>
> > > >>> I submitted a patch that fixes the py35 gate (which was broken due to
> > > >>> changes between CPython 3.4 and 3.5), so once that merges we can flip
> > > >>> the gate back to voting:
> > > >>>
> > > >>> https://review.openstack.org/566714
> > > >>>
> > >  For everyone's awareness, we discussed this in the Oslo meeting today
> > >  and our first step is to see how many, if any, services are actually
> > >  relying on the oslo.service functionality that doesn't work in Python
> > >  3 today.  From there we will come up with a plan for how to move 
> > >  forward.
> > > 
> > >  https://bugs.launchpad.net/manila/+bug/1482633 is the original bug.
> > > >>>
> > > >>> These tests are currently skipped in both oslo_service and nova.
> > > >>> (Equivalent tests were removed from Neutron and Manila on the 
> > > >>> principle
> > > >>> that they're now oslo_service's responsibility.)
> > > >>>
> > > >>> This appears to be a series of long-standing bugs in eventlet:
> > > >>>
> > > >>> Python 3.5 failure mode:
> > > >>> https://github.com/eventlet/eventlet/issues/308
> > > >>> https://github.com/eventlet/eventlet/issues/189
> > > >>>
> > > >>> Python 3.4 failure mode:
> > > >>> https://github.com/eventlet/eventlet/issues/476
> > > >>> https://github.com/eventlet/eventlet/issues/145
> > > >>>
> > > >>> There are also more problems coming down the pipeline in Python 3.6:
> > > >>>
> > > >>> https://github.com/eventlet/eventlet/issues/371
> > > >>>
> > > >>> That one is resolved in eventlet 0.21, but we have that blocked by
> > > >>> upper-constraints:
> > > >>> http://git.openstack.org/cgit/openstack/requirements/tree/upper-constraints.txt#n135
> > > >>>
> > > >>>
> > > >>> Given that the code in question relates solely to standalone WSGI
> > > >>> servers with SSL and everything should have already migrated to 
> > > >>> Apache,
> > > >>> and that the upstream is clearly overworked and unlikely to merge 
> > > >>> fixes
> > > >>> any time soon (plus we would have to deal with the fallout of moving 
> > > >>> the
> > > >>> upper constraint), I agree that it would be preferable if we could 
> > > >>> just
> > > >>> ditch this functionality.
> > > >>
> > > >> There are a few projects that have not migrated, and some that have
> > > >> issues running in non standalone WSGI mode (due, ironically, to
> > > >> eventlet)
> > > >>
> > > >> We should probably get people to run these projects behind a reverse
> > > >> proxy, and terminate SSL there, but right now we don't have that
> > > >> documented.
> > > > 
> > > > Do you know which projects?
> > > 
> > > I know of 2:
> > > 
> > > Designate - mainly due to the major lack of resources available during
> > > the uwsgi goal period, and the level of work needed to unravel our
> > > tooling to support it.
> > > 
> > > Glance - Has issues with image upload + uwsgi + eventlet [1]
> > 
> > This actually is a bit misleading. Glance works fine with image upload and
> > uwsgi. That's the only configuration of glance in a wsgi app that works
> > because of chunked transfer encoding not being in the WSGI protocol. [2]
> > uwsgi provides an alternate interface to read chunked requests which
> > enables this to work. If you look at the bugs linked off that release note
> > about image upload you'll see they're all fixed.
> 
> Is this documented somewhere?

The wsgi limitation or the glance usage? I wrote up a doc about running glance
under apache when I added the uwsgi chunked transfer encoding support to
glance here:

https://docs.openstack.org/glance/latest/admin/apache-httpd.html

Which includes how you have to configure things to get it working and a section
on why mod_wsgi doesn't work.

> 
> > 
> > The issues glance has with running in a wsgi app are related to its use of
> > async tasks via taskflow. (which includes the tasks api and image import 
> > stuff)
> > This shouldn't be hard to fix, and I've had patches up to address these for
> > months:
> > 
> > https://review.openstack.org/#/c/531498/
> > https://review.openstack.org/#/c/549743/
> > 
> > Part of the issue is that there is no api driven testing for these async api

Re: [openstack-dev] [all][tc][ptls] final stages of python 3 transition

2018-05-08 Thread Matthew Treinish
On Tue, May 08, 2018 at 05:01:36PM +0100, Graham Hayes wrote:
> On 08/05/18 16:53, Doug Hellmann wrote:
> > Excerpts from Graham Hayes's message of 2018-05-08 16:28:46 +0100:
> >> On 08/05/18 16:09, Zane Bitter wrote:
> >>> On 30/04/18 17:16, Ben Nemec wrote:
> > Excerpts from Doug Hellmann's message of 2018-04-25 16:54:46 -0400:
> >> 1. Fix oslo.service functional tests -- the Oslo team needs help
> >>     maintaining this library. Alternatively, we could move all
> >>     services to use cotyledon (https://pypi.org/project/cotyledon/).
> >>>
> >>> I submitted a patch that fixes the py35 gate (which was broken due to
> >>> changes between CPython 3.4 and 3.5), so once that merges we can flip
> >>> the gate back to voting:
> >>>
> >>> https://review.openstack.org/566714
> >>>
>  For everyone's awareness, we discussed this in the Oslo meeting today
>  and our first step is to see how many, if any, services are actually
>  relying on the oslo.service functionality that doesn't work in Python
>  3 today.  From there we will come up with a plan for how to move forward.
> 
>  https://bugs.launchpad.net/manila/+bug/1482633 is the original bug.
> >>>
> >>> These tests are currently skipped in both oslo_service and nova.
> >>> (Equivalent tests were removed from Neutron and Manila on the principle
> >>> that they're now oslo_service's responsibility.)
> >>>
> >>> This appears to be a series of long-standing bugs in eventlet:
> >>>
> >>> Python 3.5 failure mode:
> >>> https://github.com/eventlet/eventlet/issues/308
> >>> https://github.com/eventlet/eventlet/issues/189
> >>>
> >>> Python 3.4 failure mode:
> >>> https://github.com/eventlet/eventlet/issues/476
> >>> https://github.com/eventlet/eventlet/issues/145
> >>>
> >>> There are also more problems coming down the pipeline in Python 3.6:
> >>>
> >>> https://github.com/eventlet/eventlet/issues/371
> >>>
> >>> That one is resolved in eventlet 0.21, but we have that blocked by
> >>> upper-constraints:
> >>> http://git.openstack.org/cgit/openstack/requirements/tree/upper-constraints.txt#n135
> >>>
> >>>
> >>> Given that the code in question relates solely to standalone WSGI
> >>> servers with SSL and everything should have already migrated to Apache,
> >>> and that the upstream is clearly overworked and unlikely to merge fixes
> >>> any time soon (plus we would have to deal with the fallout of moving the
> >>> upper constraint), I agree that it would be preferable if we could just
> >>> ditch this functionality.
> >>
> >> There are a few projects that have not migrated, and some that have
> >> issues running in non standalone WSGI mode (due, ironically, to eventlet)
> >>
> >> We should probably get people to run these projects behind a reverse
> >> proxy, and terminate SSL there, but right now we don't have that
> >> documented.
> > 
> > Do you know which projects?
> 
> I know of 2:
> 
> Designate - mainly due to the major lack of resources available during
> the uwsgi goal period, and the level of work needed to unravel our
> tooling to support it.
> 
> Glance - Has issues with image upload + uwsgi + eventlet [1]

This actually is a bit misleading. Glance works fine with image upload and
uwsgi. That's the only configuration of glance as a wsgi app that works,
because chunked transfer encoding is not part of the WSGI protocol. [2] uwsgi
provides an alternate interface to read chunked requests which enables this to
work. If you look at the bugs linked off that release note about image upload
you'll see they're all fixed.

The issues glance has with running in a wsgi app are related to its use of
async tasks via taskflow. (which includes the tasks api and image import stuff)
This shouldn't be hard to fix, and I've had patches up to address these for
months:

https://review.openstack.org/#/c/531498/
https://review.openstack.org/#/c/549743/

Part of the issue is that there is no api driven testing for these async api
functions or any documented way to test them. That is why I marked the 2nd
one WIP: I have no method to test it, and I've asked several times for a test
case or some other method to validate these APIs without getting an answer.

In fact people are running glance under uwsgi in production already because it
makes a lot of things easier and the current issues don't affect most users.

-Matt Treinish


> 
> I am sure there are probably others, but I know of these 2.
> 
> [1] https://docs.openstack.org/releasenotes/glance/unreleased.html#b1
> 

[2] There are a few other ways, as some other wsgi servers have grafted on
support for chunked transfer encoding. But, most wsgi servers have not
implemented a method.



Re: [openstack-dev] [all][tc][ptls] final stages of python 3 transition

2018-04-30 Thread Matthew Treinish
On Mon, Apr 30, 2018 at 04:16:35PM -0500, Ben Nemec wrote:
> Resending from an address that is subscribed to the list.  Apologies to
> those of you who get this twice.
> 
> On 04/30/2018 10:06 AM, Doug Hellmann wrote:
> > It would be useful to have more input from PTLs on this issue, so I'm
> > CCing all of them to get their attention.
> > 
> > Excerpts from Doug Hellmann's message of 2018-04-25 16:54:46 -0400:
> > > It's time to talk about the next steps in our migration from python
> > > 2 to python 3.
> > > 
> > > Up to this point we have mostly focused on reaching a state where
> > > we support both versions of the language. We are not quite there
> > > with all projects, as you can see by reviewing the test coverage
> > > status information at
> > > https://wiki.openstack.org/wiki/Python3#Python_3_Status_of_OpenStack_projects
> > > 
> > > Still, we need to press on to the next phase of the migration, which
> > > I have been calling "Python 3 first". This is where we use python
> > > 3 as the default, for everything, and set up the exceptions we need
> > > for anything that still requires python 2.
> > > 
> > > To reach that stage, we need to:
> > > 
> > > 1. Change the documentation and release notes jobs to use python 3.
> > > (The Oslo team recently completed this, and found that we did
> > > need to make a few small code changes to get them to work.)
> > > 2. Change (or duplicate) all functional test jobs to run under
> > > python 3.
> > > 3. Change the packaging jobs to use python 3.
> > > 4. Update devstack to use 3 by default and require setting a flag to
> > > use 2. (This may trigger other job changes.)
> > > 
> > > At that point, all of our deliverables will be produced using python
> > > 3, and we can be relatively confident that if we no longer had
> > > access to python 2 we could still continue operating. We could also
> > > start updating deployment tools to use either python 3 or 2, so
> > > that users could actually deploy using the python 3 versions of
> > > services.
> > > 
> > > Somewhere in that time frame our third-party CI systems will need
> > > to ensure they have python 3 support as well.
> > > 
> > > After the "Python 3 first" phase is completed we should release
> > > one series using the packages built with python 3. Perhaps Stein?
> > > Or is that too ambitious?
> > > 
> > > Next, we will be ready to address the prerequisites for "Python 3
> > > only," which will allow us to drop Python 2 support.
> > > 
> > > We need to wait to drop python 2 support as a community, rather
> > > than going one project at a time, to avoid doubling the work of
> > > downstream consumers such as distros and independent deployers. We
> > > don't want them to have to package all (or even a large number) of
> > > the dependencies of OpenStack twice because they have to install
> > > some services running under python 2 and others under 3. Ideally
> > > they would be able to upgrade all of the services on a node together
> > > as part of their transition to the new version, without ending up
> > > with a python 2 version of a dependency alongside a python 3 version
> > > of the same package.
> > > 
> > > The remaining items could be fixed earlier, but this is the point
> > > at which they would block us:
> > > 
> > > 1. Fix oslo.service functional tests -- the Oslo team needs help
> > > maintaining this library. Alternatively, we could move all
> > > services to use cotyledon (https://pypi.org/project/cotyledon/).
> 
> For everyone's awareness, we discussed this in the Oslo meeting today and
> our first step is to see how many, if any, services are actually relying on
> the oslo.service functionality that doesn't work in Python 3 today.  From
> there we will come up with a plan for how to move forward.
> 
> https://bugs.launchpad.net/manila/+bug/1482633 is the original bug.
> 
> > > 
> > > 2. Finish the unit test and functional test ports so that all of
> > > our tests can run under python 3 (this implies that the services
> > > all run under python 3, so there is no more porting to do).
> 
> And integration tests?  I know for the initial python 3 goal we said just
> unit and functional, but it seems to me that we can't claim full python 3
> compatibility until we can run our tempest jobs against python 3-based
> OpenStack.

They already are running, and have been since the Atlanta PTG (which was the
same cycle as the goal):

https://review.openstack.org/#/c/436540/

You can see the gate jobs history here:

http://status.openstack.org/openstack-health/#/job/tempest-full-py3

-Matt Treinish

> 
> > > 
> > > Finally, after we have *all* tests running on python 3, we can
> > > safely drop python 2.
> > > 
> > > We have previously discussed the end of the T cycle as the point
> > > at which we would have all of those tests running, and if that holds
> > > true we could reasonably drop python 2 during the beginning of the
> > > U cycle, in late 2019 and before the 

Re: [openstack-dev] [Nova] z/VM introducing a new config driveformat

2018-04-30 Thread Matthew Treinish
On Mon, Apr 30, 2018 at 09:21:22AM -0700, melanie witt wrote:
> On Fri, 27 Apr 2018 17:40:20 +0800, Chen Ch Ji wrote:
> > According to requirements and comments, now we opened the CI runs with
> > run_validation = True
> > And according to [1] below, for example, [2] need the ssh validation
> > passed the test
> > 
> > And there are a couple of comments need some enhancement on the logs of
> > CI such as format and legacy incorrect links of logs etc
> > the newest logs sample can be found [3] (take n-cpu as example and those
> > logs are with _white.html)
> > 
> > Also, the blueprint [4] requested by previous discussion post here again
> > for reference
> 
> Thank you for alerting us about the completion of the work on the z/VM CI.
> The logs look much improved and ssh connectivity and metadata functionality
> via config drive is being verified by tempest.
> 
> The only strange thing I noticed is it appears tempest starts multiple times
> in the log [0]. Do you know what's going on there?

This is normal, it's an artifact of a few things. The first time config is
dumped to the logs is because of tempest verify-config being run as part of
devstack:

https://github.com/openstack-dev/devstack/blob/master/lib/tempest#L590

You also see the API requests this command is making being logged. Then
when the tempest tests are actually being run the config is dumped to the logs
once per test worker process. Basically every time we parse the config file at
debug log levels it gets printed to the log file.

FWIW, you can also see this in a gate run too:
http://logs.openstack.org/90/539590/10/gate/tempest-full/4b0a136/controller/logs/tempest_log.txt

-Matt Treinish


> 
> That said, since things are looking good with z/VM CI now, we've added the
> z/VM patch series back into a review runway today.
> 
> Cheers,
> -melanie
> 
> [0] 
> http://extbasicopstackcilog01.podc.sl.edst.ibm.com/test_logs/jenkins-check-nova-master-17444/logs/tempest.log
> from https://review.openstack.org/527658
> 
> 
> 
> 




Re: [openstack-dev] [keystone] [infra] Post PTG performance testing needs

2018-03-06 Thread Matthew Treinish
On Tue, Mar 06, 2018 at 03:28:57PM -0600, Lance Bragstad wrote:
> Hey all,
> 
> Last week during the PTG the keystone team sat down with a few of the
> infra folks to discuss performance testing. The major hurdle here has
> always been having dedicated hosts to use for performance testing,
> regardless of that being rally, tempest, or a home-grown script.
> Otherwise results vary wildly from run to run in the gate due to
> differences from providers or noisy neighbor problems.
> 
> Opening up the discussion here because it sounded like some providers
> (mnaser, mtreinish) had some thoughts on how we can reserve specific
> hardware for these cases.

While I like being called a provider, I'm not really one. I was more trying to
find a use case for my closet cloud [1], and was volunteering to open that up
to external/infra use to provide dedicated hardware for consistent performance
testing. That's still an option (I mean, the boxes are just sitting there not
doing anything), and I'd gladly work with infra and keystone to get that
working. But, if mnaser and vexxhost have an alternative route with their real
capacity and modern hardware, that's probably a better route to go.

-Matt Treinish

[1] https://blog.kortar.org/?p=380
> 




Re: [openstack-dev] [Zuul] requirements-check FAILURE

2018-01-17 Thread Matthew Treinish
On Wed, Jan 17, 2018 at 10:01:13PM +, Kwan, Louie wrote:
> Would like to add the following module to openstack.masakari project
> 
> https://github.com/pytransitions/transitions
> 
> Got the following error with zuul requirements-check
> 
> Requirement set([Requirement(package=u'transitions', location='', 
> specifiers='>=0.6.4', markers=u'', comment='', extras=frozenset([]))]) not in 
> openstack/requirements
> 
> http://logs.openstack.org/88/534888/3/check/requirements-check/edec7bf/ara/
> 
> Any tip or insight to fix it?

That error is caused by the dependency you're adding not being tracked in
global requirements. To add it to the masakari project you first have to 
add it to the openstack/requirements project.

The process for doing that is documented in:

https://docs.openstack.org/requirements/latest/

That link also explains the reasoning behind why we handle adding dependencies
centrally like this.

-Matt Treinish




Re: [openstack-dev] [requirements] adding uwsgi to global-requirements

2017-12-19 Thread Matthew Treinish
On Tue, Dec 19, 2017 at 07:50:59PM -0500, Sam Yaple wrote:
> >  Original Message 
> > Subject: Re: [openstack-dev] [requirements] adding uwsgi to 
> > global-requirements
> > Local Time: December 19, 2017 6:34 PM
> > UTC Time: December 19, 2017 11:34 PM
> > From: mtrein...@kortar.org
> > To: Sam Yaple , OpenStack Development Mailing List (not for 
> > usage questions) 
> >
> > On Tue, Dec 19, 2017 at 05:46:34PM -0500, Sam Yaple wrote:
> >
> >> Hello,
> >> I wanted to bring up the idea of getting uwsgi into the requirements repo. 
> >> I seem to recall this being discussed a couple of years back, but for the 
> >> life of me I cannot find the thread, so forgive me if I am digging up 
> >> ancient history.
> >> I would like to see uwsgi in global-requirements.txt and 
> >> upper-constraints.txt .
> >> Since the recent goal of running all api's behind WSGI has been mostly 
> >> accomplished, I have seen a migration toward wsgi based deploys. Some of 
> >> which use uwsgi+nginx/apache.
> >> Glance recommends uwsgi [0] as "the current recommended way to deploy" if 
> >> the docs are to be believed.
> >>
> >> Umm finish the sentence there, it says "with a real web server". The
> >> context there is use uwsgi if you want to run glance with Apache HTTPD,
> >> nginx, etc. Not a general recommendation to use uwsgi.
> 
> I did say uwsgi+nginx/apache on the line directly before that. You cannot run 
> wsgi+apache with glance at all (directly) due to the lack of chunked transfer 
> support making a wsgi deploy of glance *require* uwsgi. Though this goes to 
> your support further of not defining how people deploy.
> 
> >> In fact if you read more of the doc it outlines issues involved with
> >> using uwsgi and glance and lots of tradeoffs with doing that. The wording
> >> in the more recent doc might make the situation a bit clearer. [1] If you
> >> want all the API functionality to work in glance you should still be
> >> using the eventlet server; using WSGI means things like the tasks api
> >> don't work. (although TBH, I don't think most people care about that)
> >> The LOCI project has been including uwsgi (and recommending people use it) 
> >> since its inception.
> >> These facts, in my opinion, make a pretty strong case for uwsgi being an 
> >> indirect dependancy and worthy of inclusion and tracking in the 
> >> requirements repo.
> >> My question for the community, are there strong feelings against including 
> >> uwsgi? If so, why?
> >>
> >> For the majority of projects out there we test using the WSGI interface
> >> using uWSGI, but uWSGI isn't actually a requirement. The cross project
> >> goal[2] where we moved all the devstack jobs to use uWSGI was not about
> >> using uWSGI, but about using the standard interfaces for deploying web
> >> services under a web server. The goal is about exposing a WSGI interface,
> >> not about using uWSGI. The uWSGI part in the goal is an implementation
> >> detail for setting up the gate jobs.
> 
> Agreed. I should clarify, I am in no way trying to force anyone to use uwsgi. 
> Quite the opposite. I am talking specifically about those who _choose_ to use 
> uwsgi. Which, as you point out, the gate jobs already do as part of the 
> implementation.
> 
> >> We don't want to dictate how people are deploying the webapps, instead we
> >> say we use the normal interfaces for deploying python webapps. So if
> >> you're used to using mod_wsgi with apache, gunicorn + nginx, or uwsgi
> >> standalone, etc. you can do that. uwsgi in this context is the same as
> >> apache. It's not actually a requirement for any project, you can install
> >> and run everything without it, and in fact many people do.
> >>
> >> The other half of this is that just pip installing uwsgi is not always
> >> enough to actually leverage using it with a webserver. You also need the
> >> web server support for talking to uwsgi, if that's how you choose to
> >> deploy it, which is not always straightforward. For example, take a look
> >> at how it is installed in devstack to make uwsgi work properly with
> >> apache. [3] There are also other concerns using pip to install uwsgi.
> >> uWSGI is a C application and not actually a python project. It also
> >> supports running applications in several languages[4], not just python.
> >> The pypi published install is kind of a hack to download the tarball and
> >> compile the application with only the python bindings compiled. The
> >> setup.py literally calls out to gcc to build it, it's essentially a
> >> makefile written in python. [5][6]
> >>
> >> So what advantage do we get by adding it to global requirements when it's
> >> not actually a requirement for any project, nor is it even python code?
> 
> Not to discount the rest of your reply, but it does seem geared toward the 
> idea that this 

Re: [openstack-dev] [requirements] adding uwsgi to global-requirements

2017-12-19 Thread Matthew Treinish
On Tue, Dec 19, 2017 at 05:46:34PM -0500, Sam Yaple wrote:
> Hello,
> 
> I wanted to bring up the idea of getting uwsgi into the requirements repo. I 
> seem to recall this being discussed a couple of years back, but for the life 
> of me I cannot find the thread, so forgive me if I am digging up ancient 
> history.
> 
> I would like to see uwsgi in global-requirements.txt and 
> upper-constraints.txt .
> 
> Since the recent goal of running all api's behind WSGI has been mostly 
> accomplished, I have seen a migration toward wsgi based deploys. Some of 
> which use uwsgi+nginx/apache.
> 
> Glance recommends uwsgi [0] as "the current recommended way to deploy" if the 
> docs are to be believed.

Umm finish the sentence there, it says "with a real web server". The context
there is use uwsgi if you want to run glance with Apache HTTPD, nginx, etc. Not
a general recommendation to use uwsgi. In fact if you read more of the doc it
outlines issues involved with using uwsgi and glance and lots of tradeoffs with
doing that. The wording in the more recent doc might make the situation a bit
clearer. [1] If you want all the API functionality to work in glance you should
still be using the eventlet server; using WSGI means things like the tasks api
don't work. (although TBH, I don't think most people care about that)

> 
> The LOCI project has been including uwsgi (and recommending people use it) 
> since its inception.
> 
> These facts, in my opinion, make a pretty strong case for uwsgi being an 
> indirect dependancy and worthy of inclusion and tracking in the requirements 
> repo.
> 
> My question for the community, are there strong feelings against including 
> uwsgi? If so, why?

For the majority of projects out there we test using the WSGI interface using
uWSGI, but uWSGI isn't actually a requirement. The cross project goal[2] where
we moved all the devstack jobs to use uWSGI was not about using uWSGI, but
about using the standard interfaces for deploying web services under a web
server. The goal is about exposing a WSGI interface, not about using uWSGI.
The uWSGI part in the goal is an implementation detail for setting up the
gate jobs.

We don't want to dictate how people are deploying the webapps, instead we say
we use the normal interfaces for deploying python webapps. So if you're used to
using mod_wsgi with apache, gunicorn + nginx, or uwsgi standalone, etc. you can do
that. uwsgi in this context is the same as apache. It's not actually a
requirement for any project, you can install and run everything without it, and
in fact many people do.
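
To make the "standard interface" point a bit more concrete, here is a minimal
sketch of a WSGI application as defined by PEP 3333. The names `application`,
`call_app` and `serve_locally` are made up for illustration and are not taken
from any OpenStack project. The only thing a server needs is the callable, so
the same object could in principle be handed to uwsgi, mod_wsgi or gunicorn
unchanged.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI app: a server only needs this callable."""
    body = b"hello from a WSGI app\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Drive the app the way a WSGI server would, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def serve_locally(port=8000):
    """Serve with the stdlib reference server (local experiments only)."""
    with make_server("127.0.0.1", port, application) as httpd:
        httpd.serve_forever()
```

Nothing in `application` knows which server is in front of it, which is why
the choice of server stays a deployment decision rather than a project
requirement.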

The other half of this is that just pip installing uwsgi is not always enough
to actually leverage using it with a webserver. You also need the web server
support for talking to uwsgi, if that's how you choose to deploy it, which
is not always straightforward. For example, take a look at how it is installed
in devstack to make uwsgi work properly with apache. [3] There are also other
concerns using pip to install uwsgi. uWSGI is a C application and not actually
a python project. It also supports running applications in several languages[4],
not just python. The pypi published install is kind of a hack to download the
tarball and compile the application with only the python bindings compiled.
The setup.py literally calls out to gcc to build it, it's essentially a makefile
written in python. [5][6]

So what advantage do we get by adding it to global requirements when it's not
actually a requirement for any project, nor is it even python code?


-Matt Treinish

> 
> [0] https://docs.openstack.org/glance/pike/admin/apache-httpd.html#uwsgi
[1] https://docs.openstack.org/glance/latest/admin/apache-httpd.html
[2] https://governance.openstack.org/tc/goals/pike/deploy-api-in-wsgi.html
[3] 
https://github.com/openstack-dev/devstack/blob/57ddd7c1613208017728c50370d2e259c072d511/lib/apache#L76-L116
[4] http://uwsgi-docs.readthedocs.io/en/latest/LanguagesAndPlatforms.html
[5] https://github.com/unbit/uwsgi/blob/master/setup.py
[6] https://github.com/unbit/uwsgi/blob/master/uwsgiconfig.py#L254-L278




Re: [openstack-dev] [Ironic] Removal of tempest plugin code from openstack/ironic & openstack/ironic-inspector

2017-12-18 Thread Matthew Treinish
On Mon, Dec 18, 2017 at 01:37:13PM -0700, Julia Kreger wrote:
> > And actually I almost think the holiday time is the best time since the
> > fewest number of people are going to care. But maybe I'm wrong. I do wonder
> > if nobody is around to watch a 3rd Party CI for two weeks, how likely is it
> > to still be working when they get back?
> >
> > I'm not vehemently opposed to delaying, but somewhat opposed.
> >
> > Thoughts?
> 
> I agree and disagree of course. :)  Arkady raises a good point about
> availability of people, and the simple fact is they will be broken if
> nobody is around to fix them. That being said, the true measurement is
> going to be if third party CI shows the commits to remove the folders
> as passing. If they pass, ideally we should proceed with removing them
> sooner rather than later to avoid confusion. If they break after the
> removal of the folders but still ultimately due to the removal of the
> folders, we have found a bug that will need to be corrected, and we
> can always temporarily revert to restore the folders in the mean time
> until people return.
> 

Well it depends, there might not be a failure mode with removing the in-tree
plugins. It depends on the test selection the 3rd party ci's run. (or if they're
doing anything extra downstream which has a hard dependency on the in-tree
stuff, like importing from it directly) If they're running anything from tempest
itself it's unlikely they'd fail because of the plugin removal. The plugins are
loaded dynamically during test discovery, and if you remove a plugin then it
just doesn't get loaded by tempest anymore. So for the normal case this would
only cause a failure if the only tests being selected were in the plugin (and
then it fails because no tests were run).
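
As a rough sketch of what "loaded dynamically" means here: plugins advertise
themselves through a Python entry point group, and the loader just sees
whatever is installed at the time. The group name `tempest.test_plugins` is
the one tempest plugins register under; the helper names below are
hypothetical, and this uses the stdlib `importlib.metadata` rather than
whatever loader tempest itself uses, so treat it as an illustration only.

```python
from importlib import metadata

# Entry point group that tempest plugins register under.
PLUGIN_GROUP = "tempest.test_plugins"


def discover_plugins(group=PLUGIN_GROUP):
    """Return {name: entry_point} for whatever is installed right now."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Newer Pythons expose a selectable collection.
        found = eps.select(group=group)
    else:
        # Older Pythons return a plain dict keyed by group name.
        found = eps.get(group, [])
    return {ep.name: ep for ep in found}


def plugin_names(group=PLUGIN_GROUP):
    """Sorted plugin names; an uninstalled plugin simply never shows up."""
    return sorted(discover_plugins(group))
```

If a plugin is removed, its entry point disappears and `discover_plugins`
returns one fewer item; nothing raises. That matches the failure mode
described above, where a run only breaks if every selected test lived in the
plugin that went away.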

-Matt Treinish




Re: [openstack-dev] [ironic][tempest][qa] dropping the ironic-inspector-tempest-plugin repo

2017-10-31 Thread Matthew Treinish
On Tue, Oct 31, 2017 at 08:22:10PM +0200, Pavlo Shchelokovskyy wrote:
> Hi all,
> 
> there exists this repo
> http://git.openstack.org/cgit/openstack/ironic-inspector-tempest-plugin.
> It is basically empty and there's a single pending review adding initial
> cookiecutter-generated project stuff.
> I am not sure who/how asked for creation of this project (may be automated
> for the x-project goal?),
> but eventually Ironic community decided to keep tempest tests for both
> ironic and ironic-inspector in a single repo 'ironic-tempest-plugin' (work
> of moving ironic tests there is in progress, inspector will follow).
> 
> Thus a question to QA/tempest team - would that play nice with tempest and
> scripts/logic around running it on gates if two separate projects with
> different names would have a common tempest plugin project?
> If yes, then we should request to delete this
> 'ironic-inspector-tempest-plugin' project as it is and will be empty and
> useless, just confusing users.
> If not, ironic community probably might have to re-assess its decision...
> 

I don't see anything wrong with this. The x-project goal is mostly about
packaging for the plugins and ensuring we're actually doing branchless testing
for all projects. It was always up to the project teams with plugins to
maintain and organize the plugins however they saw fit. So if having 1 plugin
for ironic and ironic-inspector makes the most sense that's what we should do.
If a blank repo was created by someone for a separate ironic-inspector tempest
plugin I think deleting that repo is fine.

As for the mechanics of setting up the gate jobs, there aren't any complications
with doing this. You just make sure you install the combined plugin on the test
jobs for both projects and it should work fine. (this is actually part of the
reason for the x-project goal, because doing this kind of thing with bundled
in-tree plugins is much more difficult)

-Matt Treinish




Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-10-30 Thread Matthew Treinish
From a quick glance at the logs my guess is that the issue is related to this 
stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane right now 
(which is why this is a top post, sorry) so I can't really dig much more than 
that. I'll try to take a deeper look at things later when I'm on solid ground. 
(hopefully someone will beat me to it by then though) 

-Matt Treinish

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser  
wrote:
>Hi everyone,
>
>I'm looking for some help regarding an issue that we're having with
>the Puppet OpenStack modules, we've had very inconsistent failures in
>the Xenial with the following error:
>
>http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
>http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
>Details: {u'message': u'Unable to associate floating IP
>172.24.5.17 to fixed IP 10.100.0.8 for instance
>d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
>https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>timed out', u'code': 400}
>
>At this point, we're at a bit of a loss.  I've tried my best in order
>to find the root cause however we have not been able to do this.  It
>was persistent enough that we elected to go non-voting for our Xenial
>gates, however, with no fix ahead of us, I feel like this is a waste
>of resources and we need to either fix this or drop CI for Ubuntu.  We
>don't deploy on Ubuntu and most of the developers working on the
>project don't either at this point, so we need a bit of resources.
>
>If you're a user of Puppet on Xenial, we need your help!  Without any
>resources going to fix this, we'd unfortunately have to drop support
>for Ubuntu because of the lack of resources to maintain it (or
>assistance).  We (Puppet OpenStack team) would be more than happy to
>work together to fix this so pop-in at #puppet-openstack or reply to
>this email and let's get this issue fixed.
>
>Thanks,
>Mohammed
>


Re: [openstack-dev] [qa][infra][devstack] Changes to devstack-core for zuul v3 migration

2017-10-17 Thread Matthew Treinish
On Tue, Oct 10, 2017 at 04:34:29PM -0400, Sean Dague wrote:
> On 10/10/2017 04:22 PM, Dean Troyer wrote:
> > On Tue, Oct 10, 2017 at 1:15 PM, Andrea Frittoli
> >  wrote:
> >> - we will treat +1 from infra-core members on devstack changes in the
> >> ansible bits as +2
> >> - add clarkb to devstack-core, since he's quite aware of the devstack
> >> codebase
> > 
> > +2 on both of these proposals!
> 
> Agreed +2.
> 

+2 from me too.

-Matt Treinish




[openstack-dev] Recent Changes in os-testr and migrating to stestr

2017-09-13 Thread Matthew Treinish
Hi Everyone,

People might have noticed that os-testr 1.0.0 was recently released and made
some pretty big changes to the internals of the ostestr script. ostestr was
originally created to replace the local pretty_tox.sh scripts in all the
projects with a consistent interface. Because of that, the script originally
just used subprocess to run testr. However, in the 1.0.0 release this has
changed and it now uses stestr's python interface to run tests. [1]

stestr[2][3] is a fork of testr I started several months ago. The testrepository
project is pretty much completely inactive at this point and has several long
standing bugs. stestr was started to address these bugs, but also limits the
scope to just a parallel python test runner. (instead of a generic test runner
runner like testr is) The best example of the improvements stestr brings is the
fix for the dbm issue between python versions, so you don't have to delete the
times.dbm file anymore when switching between them. (but there are a lot of
other improvements so far)

So what does this mean for current ostestr users? In most cases not much. The
only external differences are that the repository now defaults to .stestr
instead of .testrepository and ostestr will emit a warning until a .stestr.conf
file is created. This probably means .stestr/ should be added to .gitignore
before too long, but it's not really a blocker. There is an issue with neutron
(and networking-*) functional tests because their post-test-hook runs chmod on
.testrepository unconditionally. This will need to be updated for things to
pass with the new ostestr version.

As for the warning about the .stestr.conf: ostestr parses the .testr.conf and
tries to guess the parameters it needs to run, but it's not a perfect process
and that's why adding a .stestr.conf is best. I've seen a couple of cases where
projects were setting custom env variables in the .testr.conf where this process
didn't work and creating a .stestr.conf and adding the env vars to tox.ini was
the only way to address this.
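
For anyone wondering what a .stestr.conf needs, a minimal one is just a
`[DEFAULT]` section with `test_path` (and optionally `top_dir`). The sketch
below renders one from Python; the helper name `render_stestr_conf` and the
`./myproject/tests/unit` path are placeholders I made up, so substitute your
project's real test directory.

```python
import configparser
import io


def render_stestr_conf(test_path, top_dir="./"):
    """Return the text of a minimal .stestr.conf for the given test path."""
    parser = configparser.ConfigParser()
    parser["DEFAULT"] = {"test_path": test_path, "top_dir": top_dir}
    buf = io.StringIO()
    # Write key=value without spaces around the delimiter.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# Hypothetical project layout; adjust the path for a real repo.
print(render_stestr_conf("./myproject/tests/unit"))
```

Any env variables that used to live in the .testr.conf command line are not
carried over by this; as noted above, those would move to tox.ini.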

Moving forward I'd like to get everything switched over to using stestr, either
directly or indirectly via ostestr. (although longer term I'd like to see the
ostestr script go away because it doesn't really do anything anymore) This way
we can get everything using an actively maintained test runner.

I'd also like to apologize for the timing of this transition. We originally
intended to just test the waters during the PTG while people were f2f and debug
issues, but we wanted to wait until after the PTG (for obvious reasons) to make
the big cut over. But things ballooned kinda quickly and we're now using the
new version of ostestr. I'll be around all week at the PTG or on irc to help
people that might be having issues related to the new os-testr release.

Thanks,

-Matt Treinish

[1] https://review.openstack.org/#/c/488441/
[2] https://github.com/mtreinish/stestr
[3] http://stestr.readthedocs.io/en/latest/




Re: [openstack-dev] [neutron] mod_wsgi support (pike bug?)

2017-09-03 Thread Matthew Treinish
On Sun, Sep 03, 2017 at 01:47:24PM -0400, Mohammed Naser wrote:
> Hi folks,
> 
> I've attempted to enable mod_wsgi support in our dev environment with
> Puppet however it results in a traceback.  I figured it was an
> environment thing so I looked into moving the Puppet CI to test using
> mod_wsgi and it resulted in the same error.
> 
> http://logs.openstack.org/82/500182/3/check/gate-puppet-openstack-integration-4-scenario004-tempest-ubuntu-xenial/791523c/logs/apache/neutron_wsgi_error.txt.gz
> 
> Would anyone from the Neutron team be able to give input on this?
> We'd love to add gating for Neutron deployed by mod_wsgi which can
> help find similar issues.
> 

Neutron never got their wsgi support working in Devstack either. The patch
adding that: https://review.openstack.org/#/c/439191/ never passed the gate and
seems to have lost the attention of the author. The wsgi support in neutron
probably doesn't work yet, and is definitely untested. IIRC, the issue they were
hitting was loading the config files. [1] I don't think I saw any progress on it
after that though.

The TC goal doc [2] probably should say something about it never landing and
missing pike.

-Matt Treinish


[1] http://lists.openstack.org/pipermail/openstack-dev/2017-June/117830.html
[2] 
https://governance.openstack.org/tc/goals/pike/deploy-api-in-wsgi.html#neutron




Re: [openstack-dev] [requirements][I18n][OpenStackClient][Quality Assurance][Security][Telemetry][ec2-api][heat][horizon][ironic][kuryr][magnum][manila][monasca][murano][neutron][octavia][senlin][solu

2017-08-10 Thread Matthew Treinish
On Thu, Aug 10, 2017 at 03:46:32PM +1000, Tony Breeds wrote:
> 
> Hi All,
> In an effort to qualify which projects are likely to be affected if/when
> we open the requirements repo I generated a list of all repos that:
> 
> 1. Subscribe to requirements management
> 2. Do not already have a stable/pike branch
> 3. Do not follow the cycle-with-milestones, cycle-trailing or
>    independent release models
> 4. Are not a 'branchless' project (tempest or tempest-plugin)
> 
> These repos I believe *should* have a stable/pike branch or will see
> problems when we open openstack/requirements.  Those issues were
> described in [1]
> 
> It turns out close to 1/3rd of projects that subscribe to requirements
> management are not ready for us to re-open for master.  So we need your
> help to get that number down to a much more acceptable number.
> 
> The good news is it's pretty easy to fix this with the cool tools and
> workflow in the releases repo[2].  I suspect that the 'service' repos will
> take care of themselves, and the horizon-plugins are waiting for horizon
> to cut RC1.
> 
> Repos with type: horizon-plugin
> ironic-ui                  ironic
> manila-ui                  manila
> monasca-ui                 monasca
> neutron-fwaas-dashboard    neutron
> solum-dashboard            solum
> tacker-horizon             tacker
> watcher-dashboard          watcher
> zun-ui                     zun
> 
> Repos with type: other
> python-openstackclient     OpenStackClient
> patrole                    Quality Assurance

Fwiw, patrole is also a tempest plugin. [1] So it should also fall into the
branchless category and you don't need to worry about it branching before
unfreezing requirements.

-Matt Treinish

[1] 
https://github.com/openstack/patrole/blob/master/README.rst#release-versioning

> heat-agents                heat
> ironic-inspector           ironic
> ironic-python-agent        ironic
> kuryr-kubernetes           kuryr
> monasca-common             monasca
> monasca-notification       monasca
> monasca-persister          monasca
> monasca-transform          monasca
> 
> Repos with type: service
> ironic                     ironic
> monasca-api                monasca
> monasca-log-api            monasca
> swift                      swift
> tricircle                  tricircle
> vitrage                    vitrage
> watcher                    watcher
> zun                        zun
> 
> Those are the easy items.
> 
> The following repos don't seem to use the openstack/releases repo so I
> have less information there.
> 
> i18n   I18n
> almanach   
> blazar 
> blazar-nova
> compute-hyperv 
> ekko   
> gce-api
> glare  
> ironic-staging-drivers 
> kosmos 
> masakari   
> masakari-monitors  
> mixmatch   
> mogan  
> nemesis
> networking-dpm 
> networking-fujitsu 
> networking-generic-switch  
> networking-l2gw
> networking-powervm 
> neutron-vpnaas   

Re: [openstack-dev] [keystone][api] Backwards incompatible changes based on config

2017-08-04 Thread Matthew Treinish
On Fri, Aug 04, 2017 at 03:35:38PM -0400, William M Edmonds wrote:
> 
> Lance Bragstad  wrote on 08/04/2017 02:37:40 PM:
> > Properly fixing this would result in a 403 -> 204 status code, which
> > requires an API version bump according to the interoperability
> > guidelines [5] (note that keystone has not implemented microversions at
> > this point). At the same time - not fixing the issues results in a 403
> > anytime a project is deleted while in this configuration.
> >
> 
> The guidelines you linked actually say that this is allowed without a
> version bump:
> 
> "There are two types of change which do not require a version change:... or
> responding with success (when the request was properly formed, but the
> server had broken handling)."

That's only for 500-599 response codes. The 'broken handling' there literally
means broken as in the server couldn't handle the request. That bullet point is
saying if you had a 500-599 response, fixing the code so it's either a 4XX or a
2XX does not need a version bump. This specific case needs a version boundary
because you're going from a 403 -> 204.
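
Put as a rough sketch, the rule as I'm reading it looks like the function
below. `needs_version_bump` is a hypothetical helper, and it only encodes the
one bullet point discussed here (fixing a 5XX into a 2XX or 4XX), not the
rest of the guidelines, so don't treat it as a complete check.

```python
def needs_version_bump(old_status, new_status):
    """Simplified reading of the status-code rule discussed above."""
    if old_status == new_status:
        return False
    # A 5XX means the server could not handle the request at all.
    was_server_error = 500 <= old_status <= 599
    # Turning that into a 2XX or 4XX counts as a plain bug fix.
    now_success_or_client_error = new_status // 100 in (2, 4)
    if was_server_error and now_success_or_client_error:
        return False
    # Any other change in status code is a visible behavior change.
    return True
```

With that reading, 500 -> 204 is a bug fix, while 403 -> 204 changes a
response clients may already depend on and so needs the version boundary.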

-Matt Treinish




Re: [openstack-dev] [barbican] Help for Barbican and UWSGI Community Goal

2017-06-23 Thread Matthew Treinish
On Fri, Jun 23, 2017 at 04:11:50PM +, Dave McCowan (dmccowan) wrote:
> The Barbican team is currently lacking a UWSGI expert.
> We need help identifying what work items we have to meet the UWSGI community 
> goal.[1]
> Could someone with expertise in this area review our code and docs [2] and 
> help me put together a to-do list?

So honestly barbican is probably already like 90% of the way there. It
was already running everything as a proper wsgi script under uwsgi. The only
thing missing was the apache config to use mod_proxy_uwsgi to have all the api
servers running on port 80.

It was also doing everything manually instead of relying on the common
functionality in PBR and devstack to handle creating wsgi entrypoints and
deploying wsgi apps.

I pushed up:

https://review.openstack.org/#/q/topic:deploy-in-wsgi

To take care of the gaps and make everything use the common mechanisms. It
probably will need a little bit of work before it's ready to go. (I didn't
bother testing anything before I pushed it)

-Matt Treinish


 
> [1] https://governance.openstack.org/tc/goals/pike/deploy-api-in-wsgi.html
> [2] https://git.openstack.org/cgit/openstack/barbican/tree/




Re: [openstack-dev] [neutron] tempest failures when deploying neutron-server in wsgi with apache

2017-06-19 Thread Matthew Treinish
On Mon, Jun 19, 2017 at 12:09:12AM -0700, Kevin Benton wrote:
> I've been working on Victor's patch a bit. One thing that isn't clear to me
> is how we can get the neutron.conf options loaded when using WSGI. How are
> other projects doing this?

Most projects are using a default location, for example: 

https://review.openstack.org/#/c/459450/11/glance/common/wsgi_app.py

IIRC, I just looked at how other projects' wsgi entrypoints were doing it
when I wrote that. The issue I think we'll hit with Neutron is that by default
we tell everyone to do that annoying multi-file config setup, which makes doing
a default like this difficult. Personally I think we need to change that,
because it's not needed and makes it generally confusing, but even if we did
it wouldn't solve the upgrade path from non-wsgi to wsgi.
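
For illustration, "using a default location" boils down to something like the
sketch below: since a wsgi entrypoint gets no command line, it looks for
`<project>.conf` in a fixed list of directories. The function name and the
directory list are assumptions of mine, loosely modelled on the usual
`~/.<project>`, `/etc/<project>`, `/etc` search order, and this is stdlib-only
rather than the actual config library call.

```python
import os


def find_default_config(project, search_dirs=None):
    """Return the first <project>.conf found in the search dirs, or None."""
    if search_dirs is None:
        # Assumed default search order; real deployments may differ.
        search_dirs = [
            os.path.join(os.path.expanduser("~"), "." + project),
            os.path.join("/etc", project),
            "/etc",
        ]
    for directory in search_dirs:
        candidate = os.path.join(directory, project + ".conf")
        if os.path.isfile(candidate):
            return candidate
    return None
```

This only ever finds one file per project name, which is exactly why the
multi-file setup is awkward: the extra plugin or agent config files have no
obvious default name for the entrypoint to look up.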

-Matt Treinish

> 
> On Fri, Jun 2, 2017 at 7:44 AM, Emilien Macchi  wrote:
> 
> > On Thu, Jun 1, 2017 at 10:28 PM, Morales, Victor
> >  wrote:
> > > Hi Emilien,
> > >
> > > I noticed that the configuration file was created using puppet.  I
> > submitted a patch[1] that was targeting to include the changes in Devstack.
> > My major concern is with the value of WSGIScriptAlias which should be
> > pointing to WSGI script.
> >
> > Thanks for looking, the script that is used is from
> > /usr/bin/neutron-api which is I think correct. If you look at logs,
> > you can see that API actually works but some tempest tests fail
> > though...
> >
> > > Regards/Saludos
> > > Victor Morales
> > >
> > > [1] https://review.openstack.org/#/c/439191
> > >
> > > On 5/31/17, 4:40 AM, "Emilien Macchi"  wrote:
> > >
> > > Hey folks,
> > >
> > > I've been playing with deploying Neutron in WSGI with Apache and
> > > Tempest tests fail on spawning Nova server when creating Neutron
> > > ports:
> > > http://logs.openstack.org/89/459489/4/check/gate-puppet-
> > openstack-integration-4-scenario001-tempest-centos-7/
> > f2ee8bf/console.html#_2017-05-30_13_09_22_715400
> > >
> > > I haven't found anything useful in neutron-server logs:
> > > http://logs.openstack.org/89/459489/4/check/gate-puppet-
> > openstack-integration-4-scenario001-tempest-centos-7/
> > f2ee8bf/logs/apache/neutron_wsgi_access_ssl.txt.gz
> > >
> > > Before I file a bug in neutron, can anyone look at the logs with me
> > > and see if I missed something in the config:
> > > http://logs.openstack.org/89/459489/4/check/gate-puppet-
> > openstack-integration-4-scenario001-tempest-centos-7/
> > f2ee8bf/logs/apache_config/10-neutron_wsgi.conf.txt.gz
> > >
> > > Thanks for the help,
> > > --
> > > Emilien Macchi
> > >
> > > 
> > __
> > > OpenStack Development Mailing List (not for usage questions)
> > > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> > unsubscribe
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > >
> > >
> > > 
> > __
> > > OpenStack Development Mailing List (not for usage questions)
> > > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> > unsubscribe
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> >
> > --
> > Emilien Macchi
> >
> >






Re: [openstack-dev] [qa] [tc] [all] more tempest plugins (was Re: [tc] [all] TC Report 22)

2017-06-01 Thread Matthew Treinish
On Thu, Jun 01, 2017 at 11:09:56AM +0100, Chris Dent wrote:
> A lot of this results, in part, from there being no single guiding
> pattern and principle for how (and where) the tests are to be
> managed. 

It sounds like you want to write a general testing guide for openstack.
Have you started this effort anywhere? I don't think anyone would be opposed
to starting a document for that, it seems like a reasonable thing to have.
But I think you'll find there is no one-size-fits-all solution,
because every project has its own requirements and needs for testing.

> When there's a choice between one, some and all, "some" is
> almost always the wrong way to manage something. "some" is how we do
> tempest (and fair few other OpenStack things).
> 
> If it is the case that we want some projects to not put their tests
> in the main tempest repo then the only conceivable pattern from a
> memorability, discoverability, and equality standpoint is actually
> for all the tests to be in plugins.
> 
> If that isn't possible (and it is clear there are many reasons why
> that may be the case) then we need to be extra sure that we explore
> and uncover the issues that the "some" approach presents and provide
> sufficient documentation, tooling, and guidance to help people get
> around them. And that we recognize and acknowledge the impact it has.

So have you read the documentation:

https://docs.openstack.org/developer/tempest/ (or any of the other relevant
documentation)

and filed bugs about where you think there are gaps? This is something that
really bugs me sometimes (yes, the pun is intended). Just like anything else,
this is all about iterative improvements. These broad trends are things tempest
(and hopefully every project) has been working on. But improvements don't
just magically occur overnight; it takes time to implement them.

Just compare the state of the documentation and tooling from 2 years ago (when
tempest started adding the plugin interface) to today. Things have steadily
improved over time and the situation now is much better. This will continue and
in the future things will get even better.

The thing is this is open source collaborative development and there is an
expectation that people who have issues with something in the project will
report them or contribute a fix and communicate with the maintainers. The users
of tempest's plugin interface tend to be other openstack projects (but not
exclusively) and if there is something that's not clear we need to work
together to fix it.

Based on this paragraph I feel like you think the decision to add a tempest
plugin interface and decrease its scope was taken lightly, without forethought
or careful consideration. But it's the exact opposite: there was extensive
debate and exploration of the problem space, and it took a long time to reach a
consensus.

> 
> If the answer to that is "who is going to do that?" or "who has the
> time?" then I ask you to ask yourself why we think the "non-core"
> projects have time to fiddle about with tempest plugins?

I think this is an unfair simplification: no one is required to write a tempest
plugin, it's a choice the projects made. While I won't say the interface
is perfect, things are always improving. If a project chooses to write
a plugin, the expectation is that we'll all work together to help fix
issues as they are encountered. No individual can do everything by themselves,
and it's a shared group effort. But even so, there is no shortage of work for
anyone; it's all about prioritization of effort.

> 
> And finally, I actually don't have too strong of a position in the
> case of tempest and tempest plugins. What I take issue with is the
> process whereby we discuss and decide these things and characterize
> the various projects
> 
> If I have any position on tempest at all it is that we should limit
> it to gross cloud validation and maybe interop testing, and projects
> should manage their own integration testing in tree using whatever
> tooling they feel is most appropriate. If that turns out to be
> tempest, cool.

I fail to see how this is any different than how things work today. No one is
required to use a tempest plugin and they can write tests however they want.
Tempest itself has a well defined scope (which does evolve over time like any
other project) and doesn't try to be all the testing everywhere. Almost every
other project has its own in-tree testing outside of tempest or tempest
plugins. Also, projects which have in-tree tempest tests also have tempest
plugins to expand on that set of functionality.

-Matt Treinish




Re: [openstack-dev] [qa][tc][all] Tempest to reject trademark tests

2017-06-01 Thread Matthew Treinish
On Thu, Jun 01, 2017 at 11:57:00AM -0400, Doug Hellmann wrote:
> Excerpts from Thierry Carrez's message of 2017-06-01 11:51:50 +0200:
> > Graham Hayes wrote:
> > > On 01/06/17 01:30, Matthew Treinish wrote:
> > >> TBH, it's a bit premature to have the discussion. These additional
> > >> programs do not exist yet, and there is a governance road block around
> > >> this. Right now the set of projects that can be used by
> > >> defcore/interopWG is limited to the set of projects in:
> > >>
> > >> https://governance.openstack.org/tc/reference/tags/tc_approved-release.html
> > > 
> > > Sure - but that is a solved problem, when the interop committee is
> > > ready to propose them, they can add projects into that tag. Or am I
> > > misunderstanding [1] (again)?
> > 
> > I think you understand it well. The Board/InteropWG should propose
> > additions/removals of this tag, which will then be approved by the TC:
> > 
> > https://governance.openstack.org/tc/reference/tags/tc_approved-release.html#tag-application-process
> > 
> > > [...]
> > >> We had a forum session on it (I can't find the etherpad for the session)
> > >> which was pretty speculative because it was about planning the new
> > >> programs. Part of that discussion was around the feasibility of using
> > >> tests in plugins and whether that would be desirable. Personally, I was
> > >> in favor of doing that for some of the proposed programs because of the
> > >> way they were organized it was a good fit. This is because the proposed
> > >> new programs were extra additions on top of the base existing interop
> > >> program. But it was hardly a definitive discussion.
> > > 
> > > Which will create 2 classes of testing for interop programs.
> > 
> > FWIW I would rather have a single way of doing "tests used in trademark
> > programs" without differentiating between old and new trademark programs.
> > 
> > I fear that we are discussing solutions before defining the problem. We
> > want:
> > 
> > 1- Decentralize test maintenance, through more tempest plugins, to
> > account for limited QA resources
> > 2- Additional codereview constraints and approval rules for tests that
> > happen to be used in trademark programs
> > 3- Discoverability/ease-of-install of the set of tests that happen to be
> > used in trademark programs
> > 4- A git repo layout that can be simply explained, for new teams to
> > understand
> > 
> > It feels like the current git repo layout (result of that 2016-05-04
> > resolution) optimizes for 2 and 3, which kind of works until you add
> > more trademark programs, at which point it breaks 1 and 4.
> > 
> > I feel like you could get 2 and 3 without necessarily using git repo
> > boundaries (using Gerrit approval rules and some tooling to install/run
> > subset of tests across multiple git repos), which would allow you to
> > optimize git repo layout to get 1 and 4...
> > 
> > Or am I missing something ?
> > 
> 
> Right. The point of having the trademark tests "in tempest" was not
> to have them "in the tempest repo", that was just an implementation
> detail of the policy of "put them in a repository managed by people
> who understand the expanded review rules".

There was more to it than this, a big part was duplication of effort as well.
Tempest itself is almost a perfect fit for the scope of the testing defcore is
doing. While tempest does additional testing that defcore doesn't use, a large
subset is exactly what they want.

> 
> There were a lot of unexpected issues when we started treating the
> test suite as a production tool for validating a cloud.  We have
> to be careful about how we change the behavior of tests, for example,
> even if the API responses are expected to be the same.  It's not
> fair to vendors or operators who get trademark approval with one
> release to have significant changes in behavior in the exact same
> tests for the next release.

I actually find this to be kinda misleading. Tempest has always had
running on any cloud as part of its mission. I think you're referring
to the monster defcore thread from last summer about proprietary nova extensions
adding on to API responses. This is honestly a completely separate problem
which is not something I want to dive into again, because that was a much more
nuanced problem that involved much m

Re: [openstack-dev] [qa][tc][all] Tempest to reject trademark tests

2017-06-01 Thread Matthew Treinish
On Thu, Jun 01, 2017 at 12:32:03PM +0900, Ghanshyam Mann wrote:
> On Thu, Jun 1, 2017 at 9:46 AM, Matthew Treinish <mtrein...@kortar.org> wrote:
> > On Wed, May 31, 2017 at 04:24:14PM +0000, Jeremy Stanley wrote:
> >> On 2017-05-31 17:18:54 +0100 (+0100), Graham Hayes wrote:
> >> [...]
> >> > Trademark programs are trademark programs - we should have a unified
> >> > process for all of them. Let's not make the same mistakes again by
> >> > creating classes of projects / programs. I do not want this to be
> >> > a distinction as we move forward.
> >>
> >> This I agree with. However I'll be surprised if a majority of the QA
> >> team disagree on this point (logistic concerns with how to curate
> >> this over time I can understand, but that just means they need to
> >> interest some people in working on a manageable solution).
> >
> > +1 I don't think anyone disagrees with this. There is a logistical concern
> > with the way the new proposed programs are going to be introduced. Quite
> > frankly it's too varied and broad and I don't think we'll have enough people
> > working on this space to help maintain it in the same manner.
> >
> > It's the same reason we worked on the plugin decomposition in the first
> > place. You can easily look at the numbers of tests to see this:
> >
> > https://raw.githubusercontent.com/mtreinish/qa-in-the-open/lca2017/tests_per_proj.png
> >
> > Which shows things before the plugin decomposition (and before the big
> > tent). Just because we said we'd support all the incubated and integrated
> > projects in tempest didn't mean people were contributing and/or the tests
> > were well maintained.
> >
> > But, as I said elsewhere in this thread this is a bit too early to have the
> > conversation because the new interop programs don't actually exist yet.
> 
> Yes, there is no question on goal to have a unified process for all.
> As Jeremy, Matthew mentioned, key things here is manageability issues.
> 
> We know contributors in QA are reducing cycle by cycle. I might be
> overthinking, but I thought about the QA team's situation when we have
> around 30-40 trademark projects and all tests in the Tempest
> repo. Personally I am ok to have tests in the Tempest repo or a dedicated
> interop plugin repo which can be controlled by QA at some level. But we
I actually don't think a dedicated interop plugin is a good idea. It doesn't
actually solve anything, because the tests are going to be the same and the
same people are going to be maintaining them. All you did was move it into a
different repo, which solves none of the problems. What I was referring to was
exploring a more distributed approach to handling the tests (like what we did
for plugin decomposition for higher level services). That is the only way I see
us addressing the work overload problem. But, as I said before this is still
too early to talk about because there aren't defined new programs yet, just
the idea for them and a rough plan. We're still talking very much in the
abstract about everything...

-Matt Treinish

> need dedicated participation from interop + project liaisons (I am not
> sure that worked well in the past, but with TC help it might work :)).
> 
> I can recall that the QA team has many patches on the plugin side to improve
> them or fix them, but many of them have no active reviews or much
> attention from the project teams. I am afraid of the same happening for
> trademark projects also.
> 
> Maybe a broad direction on the trademark program and its scope can help
> to imagine the quantity of programs and tests which QA teams need to
> maintain.





Re: [openstack-dev] [qa][tc][all] Tempest to reject trademark tests

2017-05-31 Thread Matthew Treinish
On Wed, May 31, 2017 at 04:24:14PM +0000, Jeremy Stanley wrote:
> On 2017-05-31 17:18:54 +0100 (+0100), Graham Hayes wrote:
> [...]
> > Trademark programs are trademark programs - we should have a unified
> > process for all of them. Let's not make the same mistakes again by
> > creating classes of projects / programs. I do not want this to be
> > a distinction as we move forward.
> 
> This I agree with. However I'll be surprised if a majority of the QA
> team disagree on this point (logistic concerns with how to curate
> this over time I can understand, but that just means they need to
> interest some people in working on a manageable solution).

+1 I don't think anyone disagrees with this. There is a logistical concern
with the way the new proposed programs are going to be introduced. Quite
frankly it's too varied and broad and I don't think we'll have enough people
working on this space to help maintain it in the same manner.

It's the same reason we worked on the plugin decomposition in the first place.
You can easily look at the numbers of tests to see this:

https://raw.githubusercontent.com/mtreinish/qa-in-the-open/lca2017/tests_per_proj.png

Which shows things before the plugin decomposition (and before the big tent).
Just because we said we'd support all the incubated and integrated projects in
tempest didn't mean people were contributing and/or the tests were well
maintained.

But, as I said elsewhere in this thread this is a bit too early to have the
conversation because the new interop programs don't actually exist yet.

-Matt Treinish




Re: [openstack-dev] [qa][tc][all] Tempest to reject trademark tests (was: more tempest plugins)

2017-05-31 Thread Matthew Treinish
On Wed, May 31, 2017 at 03:45:52PM +0000, Jeremy Stanley wrote:
> On 2017-05-31 15:22:59 +0000 (+0000), Jeremy Stanley wrote:
> > On 2017-05-31 09:43:11 -0400 (-0400), Doug Hellmann wrote:
> > [...]
> > > it's news to me that they're considering reversing course. If the
> > > QA team isn't going to continue, we'll need to figure out what
> > > that means and potentially find another group to do it.
> > 
> > I wasn't there for the discussion, but it sounds likely to be a
> > mischaracterization. I'm going to assume it's not true (or much more
> > nuanced) at least until someone responds on behalf of the QA team.
> > This particular subthread is only going to go further into the weeds
> > until it is grounded in some authoritative details.
> 
> Apologies for replying to myself, but per discussion[*] with Chris
> in #openstack-dev I'm adjusting the subject header to make it more
> clear which particular line of speculation I consider weedy.
> 
> Also in that brief discussion, Graham made it slightly clearer that
> he was talking about pushback on the tempest repo getting tests for
> new trademark programs beyond "OpenStack Powered Platform,"
> "OpenStack Powered Compute" and "OpenStack Powered Object Storage."

TBH, it's a bit premature to have the discussion. These additional programs do
not exist yet, and there is a governance road block around this. Right now the
set of projects that can be used by defcore/interopWG is limited to the set of
projects in:

https://governance.openstack.org/tc/reference/tags/tc_approved-release.html

We had a forum session on it (I can't find the etherpad for the session) which
was pretty speculative because it was about planning the new programs. Part of
that discussion was around the feasibility of using tests in plugins and whether
that would be desirable. Personally, I was in favor of doing that for some of
the proposed programs because of the way they were organized it was a good fit.
This is because the proposed new programs were extra additions on top of the
base existing interop program. But it was hardly a definitive discussion.

We will have to have discussions about how we're going to actually implement
the additional programs when we start to create them, but that's not happening
yet.

-Matt Treinish




Re: [openstack-dev] [qa] [tc] [all] more tempest plugins (was Re: [tc] [all] TC Report 22)

2017-05-31 Thread Matthew Treinish
On Wed, May 31, 2017 at 03:22:59PM +0000, Jeremy Stanley wrote:
> On 2017-05-31 09:43:11 -0400 (-0400), Doug Hellmann wrote:
> [...]
> > it's news to me that they're considering reversing course. If the
> > QA team isn't going to continue, we'll need to figure out what
> > that means and potentially find another group to do it.
> 
> I wasn't there for the discussion, but it sounds likely to be a
> mischaracterization. I'm going to assume it's not true (or much more
> nuanced) at least until someone responds on behalf of the QA team.
> This particular subthread is only going to go further into the weeds
> until it is grounded in some authoritative details.

+1

I'm very confused by this whole thread TBH. Was there a defcore test which was
blocked from tempest? Quite frankly the amount of contribution to tempest
specifically for defcore tests is very minimal. (at most 1 or 2 patches per
cycle) It seems like this whole concern is based on a misunderstanding somewhere
and just is going off in a weird direction.

-Matt Treinish





Re: [openstack-dev] [tempest] Proposing Fanglei Zhu for Tempest core

2017-05-16 Thread Matthew Treinish

On Tue, May 16, 2017 at 08:22:44AM +0000, Andrea Frittoli wrote:
> Hello team,
> 
> I'm very pleased to propose Fanglei Zhu (zhufl) for Tempest core.
> 
> Over the past two cycles Fanglei has been steadily contributing to Tempest
> and its community. She's done a great deal of work in making Tempest code
> cleaner, easier to read, maintain and debug, fixing bugs and removing cruft.
> Both her code as well as her reviews demonstrate a very good understanding
> of Tempest internals and of the project's future direction.
> I believe Fanglei will make an excellent addition to the team.
> 
> As per the usual, if the current Tempest core team members would please
> vote +1 or -1(veto) to the nomination when you get a chance. We'll keep the
> polls open for 5 days or until everyone has voted.

+1

-Matt Treinish

> 
> References:
> https://review.openstack.org/#/q/owner:zhu.fanglei%2540zte.com.cn
> https://review.openstack.org/#/q/reviewer:zhufl




Re: [openstack-dev] [qa][heat][murano][daisycloud] Removing Heat support from Tempest

2017-05-04 Thread Matthew Treinish
On Fri, May 05, 2017 at 09:29:40AM +1200, Steve Baker wrote:
> On Thu, May 4, 2017 at 3:56 PM, Matthew Treinish <mtrein...@kortar.org>
> wrote:
> 
> > On Wed, May 03, 2017 at 11:51:13AM +0000, Andrea Frittoli wrote:
> > > On Tue, May 2, 2017 at 5:33 PM Matthew Treinish <mtrein...@kortar.org>
> > > wrote:
> > >
> > > > On Tue, May 02, 2017 at 09:49:14AM +0530, Rabi Mishra wrote:
> > > > > On Fri, Apr 28, 2017 at 2:17 PM, Andrea Frittoli <
> > > > andrea.fritt...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Apr 28, 2017 at 10:29 AM Rabi Mishra <ramis...@redhat.com>
> > > > wrote:
> > > > > >
> > > > > >> On Thu, Apr 27, 2017 at 3:55 PM, Andrea Frittoli <
> > > > > >> andrea.fritt...@gmail.com> wrote:
> > > > > >>
> > > > > >>> Dear stackers,
> > > > > >>>
> > > > > > >>> starting in the Liberty cycle Tempest has defined a set of projects
> > > > > > >>> which are in scope for direct testing in Tempest [0]. The current
> > > > > > >>> list includes keystone, nova, glance, swift, cinder and neutron.
> > > > > > >>> All other projects can use the same Tempest testing infrastructure
> > > > > > >>> (or parts of it) by taking advantage of the Tempest plugin and
> > > > > > >>> stable interfaces.
> > > > > > >>>
> > > > > > >>> Tempest currently hosts a set of API tests as well as a service
> > > > > > >>> client for the Heat project.
> > > > > > >>> The Heat service client is used by the tests in Tempest, which run
> > > > > > >>> in the Heat gate as part of the grenade job, as well as in the
> > > > > > >>> Tempest gate (check pipeline) as part of the layer4 job.
> > > > > > >>> According to code search [3] the Heat service client is also used
> > > > > > >>> by Murano and Daisycore.
> > > > > >>>
> > > > > >>
> > > > > >> For the heat grenade job, I've proposed two patches.
> > > > > >>
> > > > > > >> 1. To run heat tree gabbi api tests as part of grenade 'post-upgrade'
> > > > > > >> phase
> > > > > >>
> > > > > >> https://review.openstack.org/#/c/460542/
> > > > > >>
> > > > > >> 2. To remove tempest tests from the grenade job
> > > > > >>
> > > > > >> https://review.openstack.org/#/c/460810/
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > > >>> I proposed a patch to Tempest to start the deprecation counter for
> > > > > > >>> Heat / orchestration related configuration items in Tempest [4],
> > > > > > >>> and I would like to make sure that all tests and the service client
> > > > > > >>> either find a new home outside of Tempest, or are removed, by the
> > > > > > >>> end of the Pike cycle at the latest.
> > > > > > >>>
> > > > > > >>> Heat has in-tree integration tests and Gabbi based API tests, but I
> > > > > > >>> don't know if those provide enough coverage to replace the tests on
> > > > > > >>> the Tempest side.
> > > > > >>>
> > > > > >>>
> > > > > > >> Yes, the heat gabbi api tests do not yet have the same coverage as
> > > > > > >> the tempest tree api tests (lacks tests using nova, neutron and
> > > > > > >> swift resources), but I think that should not stop us from *not*
> > > > > > >> running the tempest tests in the grenade job.
> > > > > > >>
> > > > > > >> I also don't know if the tempest tree heat tests are used by any
> > > > > > >> other upstream/downstream jobs. We could surely add more tests to
> > > > > > >> bridge
> &

Re: [openstack-dev] [qa][cinder][ceph] should Tempest tests the backend specific feature?

2017-05-03 Thread Matthew Treinish
On Wed, May 03, 2017 at 11:09:32PM -0400, Monty Taylor wrote:
> On 05/02/2017 11:49 AM, Sean McGinnis wrote:
> > On Tue, May 02, 2017 at 03:36:20PM +0200, Jordan Pittier wrote:
> > > On Tue, May 2, 2017 at 7:42 AM, Ghanshyam Mann 
> > > wrote:
> > > 
> > > > > In Cinder, there are many features/APIs which are backend specific and
> > > > > will return 405 or 501 if they are not implemented on a given backend [1].
> > > > > If such tests are implemented in Tempest, they will break any gate
> > > > > where that backend's job is voting, like the ceph job in the glance_store gate.
> > > > > 
> > > > > There have been many such cases recently where ceph jobs were broken due
> > > > > to such tests, most recently for the force-delete backup feature [2].
> > > > > The force-delete tests are being reverted in [3]. To resolve such cases to
> > > > > some extent, Jon is going to add a white/black list of tests which can run
> > > > > on the ceph job [4], depending on which features ceph implements. But this
> > > > > does not resolve it completely, for reasons like:
> > > > > 1. External use of Tempest becomes difficult, since the user needs to know
> > > > > which tests to skip for which backend
> > > > > 2. Tempest tests become too specific to a backend.
> > > > > 
> > > > > Now there are a few options to resolve this:
> > > > > 1. Tempest should not test APIs/features which are backend
> > > > > specific, as mentioned by the api-ref, like [1].
> > > > 
> > > > So basically, if one of the 50 Cinder drivers doesn't support a feature, we
> > > should never test that feature ? What about the 49 other drivers ? If a
> > > feature exists and can be tested in the Gate (with whatever default
> > > config/driver is shipped) then I think we should test it.
> > > 
> > 50? Over 100 as of Ocata.
> > 
> > Well, is tempest's purpose in life to provide complete gate test coverage,
> > or is tempest's purpose in life to give operators a tool to validate that
> > their deployment is working as expected?
> 
> I'd actually like to suggest that such a scenario actually points out a
> thing that is ultimately potential pain passed to the end user in the real
> world, so this question about what/how to test this in tempest is a good one
> to have.
> 
> If there is a feature which is only provisionally available depending on the
> backend driver such that it's hard to test in tempest without an out of band
> config - then it's a feature that a user will have no clue whether it works
> on a given cloud.
> 
> As we find these, I'd love it if we could expose discovery in the API for
> viability of the feature. Like:
> 
> GET /capabilities
> 
> {
>   "capabilities": {
>     "has_force_delete": true
>   }
> }
> 
> (I know we've talked about that concept generally, but this is a specific
> example)
> 
> If such a thing existed, then the user can know whether they can use a thing
> .. and so can tempest. A tempest test to validate force_delete working could
> check the capability reported by the API and validate that if the API says
> "true" the feature works as expected, and if it says "false" validate
> that attempting to call it returns a 405 (or whatever is appropriate).
> 
> Ultimately, every config we need to feed to tempest is potentially a place
> where an end user is unable to know whether or not to expect a call to work
> - and an opportunity for us to provide our API consumers with a richer
> experience.

Heh, well I've been saying all of these things for years. In fact I got so tired
of repeating it all the time I just put it in the tempest configuration guide:
(although I remember it being a lot snarkier)

https://docs.openstack.org/developer/tempest/configuration.html#service-feature-configuration

So, I'll be very happy when the capabilities work actually becomes a thing you
can use. But, it feels like we've been talking around the problem for a
very long time...
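
For illustration only, the capability-driven check described above could be
sketched like this. The GET /capabilities endpoint and the has_force_delete
key are the hypothetical example from Monty's mail, not an existing API, and
check_force_delete is a made-up helper rather than anything in tempest. The
accepted refusal codes (405 or 501) are taken from the start of this thread.

```python
def check_force_delete(capabilities, call_force_delete):
    """Validate force-delete behaviour against the advertised capability.

    `capabilities` is the parsed body of a hypothetical GET /capabilities
    response; `call_force_delete` is any callable returning an HTTP status.
    """
    advertised = capabilities.get("capabilities", {}).get(
        "has_force_delete", False)
    status = call_force_delete()
    if advertised:
        # Feature is advertised: the call must succeed.
        return 200 <= status < 300
    # Feature is not advertised: the API should refuse cleanly.
    return status in (405, 501)
```

The point of the design is that the test needs no out-of-band config flag: the
cloud itself says which branch applies, which is exactly the information an
end user would want too.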

-Matt Treinish


> 
> > In attempting to do things in the past, I've received push back based on
> > the argument that it was the latter. For this reason, in-tree tempest tests
> > were added to Cinder to give us a way to get better test coverage for our
> > own sake.
> > 
> > Now that this is all in place, I think it's working well and I would like
> > to see it continue that way. IMO, tempest proper should not have anything
> > that isn't universally applicable to real world deployments. Not just for
> > things like Ceph, but things like the manage/unmanage backend specific
> > tests that were added and broke a large majority of third party CI.
> > 
> > Backend specific things should not be part of tempest in my opinion. We
> > should cover those things through in-tree tempest plugins and our own
> > testing.
> > > 
> > > > > 2. Tempest tests can be disabled/skipped based on backend. - This is not a
> > > > > good idea as it increases config options and the overhead of setting those.
> > > > 
> > > Using regex and blacklist, any 3rd party CI can skip any test based on the
> > > test ID. Without introducing a config flag. 
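
A rough sketch of the regex approach mentioned in the quoted reply, assuming
test IDs are plain dotted strings: a third-party CI keeps a list of patterns
and drops any matching test before the run. The function name and the test
IDs in the usage note are made up for illustration; real runners have their
own options for this.

```python
import re


def filter_tests(test_ids, blacklist_patterns):
    """Drop every test ID matching any blacklist regex; keep the rest in order."""
    compiled = [re.compile(pattern) for pattern in blacklist_patterns]
    return [
        test_id for test_id in test_ids
        if not any(rx.search(test_id) for rx in compiled)
    ]
```

For example, `filter_tests(ids, [r"force_delete"])` would skip every test
whose ID mentions force_delete, with no new config option in the test suite
itself.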

Re: [openstack-dev] [qa][heat][murano][daisycloud] Removing Heat support from Tempest

2017-05-03 Thread Matthew Treinish
On Wed, May 03, 2017 at 11:51:13AM +0000, Andrea Frittoli wrote:
> On Tue, May 2, 2017 at 5:33 PM Matthew Treinish <mtrein...@kortar.org>
> wrote:
> 
> > On Tue, May 02, 2017 at 09:49:14AM +0530, Rabi Mishra wrote:
> > > On Fri, Apr 28, 2017 at 2:17 PM, Andrea Frittoli <
> > andrea.fritt...@gmail.com>
> > > wrote:
> > >
> > > >
> > > >
> > > > On Fri, Apr 28, 2017 at 10:29 AM Rabi Mishra <ramis...@redhat.com>
> > wrote:
> > > >
> > > >> On Thu, Apr 27, 2017 at 3:55 PM, Andrea Frittoli <
> > > >> andrea.fritt...@gmail.com> wrote:
> > > >>
> > > >>> Dear stackers,
> > > >>>
> > > >>> starting in the Liberty cycle Tempest has defined a set of projects
> > > >>> which are in scope for direct
> > > >>> testing in Tempest [0]. The current list includes keystone, nova,
> > > >>> glance, swift, cinder and neutron.
> > > >>> All other projects can use the same Tempest testing infrastructure
> > (or
> > > >>> parts of it) by taking advantage
> > > > >>> of the Tempest plugin and stable interfaces.
> > > >>>
> > > >>> Tempest currently hosts a set of API tests as well as a service
> > client
> > > >>> for the Heat project.
> > > >>> The Heat service client is used by the tests in Tempest, which run in
> > > >>> Heat gate as part of the grenade
> > > >>> job, as well as in the Tempest gate (check pipeline) as part of the
> > > >>> layer4 job.
> > > >>> According to code search [3] the Heat service client is also used by
> > > >>> Murano and Daisycore.
> > > >>>
> > > >>
> > > >> For the heat grenade job, I've proposed two patches.
> > > >>
> > > >> 1. To run heat tree gabbi api tests as part of grenade 'post-upgrade'
> > > >> phase
> > > >>
> > > >> https://review.openstack.org/#/c/460542/
> > > >>
> > > >> 2. To remove tempest tests from the grenade job
> > > >>
> > > >> https://review.openstack.org/#/c/460810/
> > > >>
> > > >>
> > > >>
> > > >>> I proposed a patch to Tempest to start the deprecation counter for
> > Heat
> > > >>> / orchestration related
> > > >>> configuration items in Tempest [4], and I would like to make sure
> > that
> > > >>> all tests and the service client
> > > >>> either find a new home outside of Tempest, or are removed, by the end
> > > > >>> of the Pike cycle at the latest.
> > > >>>
> > > >>> Heat has in-tree integration tests and Gabbi based API tests, but I
> > > >>> don't know if those provide
> > > >>> enough coverage to replace the tests on Tempest side.
> > > >>>
> > > >>>
> > > >> Yes, the heat gabbi api tests do not yet have the same coverage as the
> > > >> tempest tree api tests (lacks tests using nova, neutron and swift
> > > >> resources),  but I think that should not stop us from *not* running
> > the
> > > >> tempest tests in the grenade job.
> > > >>
> > > >> I also don't know if the tempest tree heat tests are used by any other
> > > >> upstream/downstream jobs. We could surely add more tests to bridge
> > the gap.
> > > >>
> > > >> Also, It's possible to run the heat integration tests (we've enough
> > > >> coverage there) with tempest plugin after doing some initial setup,
> > as we
> > > >> do in all our dsvm gate jobs.
> > > >>
> > > > >> I would propose to move tests and client to a Tempest plugin owned /
> > > >>> maintained by
> > > >>> the Heat team, so that the Heat team can have full flexibility in
> > > >>> consolidating their integration
> > > >>> tests. For Murano and Daisycloud - and any other team that may want
> > to
> > > >>> use the Heat service
> > > >>> client in their tests, even if the client is removed from Tempest, it
> > > >>> would still be available via
> > > >>> the Heat Tempest plugin. As long as the plugin implements the service
> > > 

Re: [openstack-dev] [qa][heat][murano][daisycloud] Removing Heat support from Tempest

2017-05-02 Thread Matthew Treinish
On Tue, May 02, 2017 at 09:49:14AM +0530, Rabi Mishra wrote:
> On Fri, Apr 28, 2017 at 2:17 PM, Andrea Frittoli 
> wrote:
> 
> >
> >
> > On Fri, Apr 28, 2017 at 10:29 AM Rabi Mishra  wrote:
> >
> >> On Thu, Apr 27, 2017 at 3:55 PM, Andrea Frittoli <
> >> andrea.fritt...@gmail.com> wrote:
> >>
> >>> Dear stackers,
> >>>
> >>> starting in the Liberty cycle Tempest has defined a set of projects
> >>> which are in scope for direct
> >>> testing in Tempest [0]. The current list includes keystone, nova,
> >>> glance, swift, cinder and neutron.
> >>> All other projects can use the same Tempest testing infrastructure (or
> >>> parts of it) by taking advantage
> >>> of the Tempest plugin and stable interfaces.
> >>>
> >>> Tempest currently hosts a set of API tests as well as a service client
> >>> for the Heat project.
> >>> The Heat service client is used by the tests in Tempest, which run in
> >>> Heat gate as part of the grenade
> >>> job, as well as in the Tempest gate (check pipeline) as part of the
> >>> layer4 job.
> >>> According to code search [3] the Heat service client is also used by
> >>> Murano and Daisycore.
> >>>
> >>
> >> For the heat grenade job, I've proposed two patches.
> >>
> >> 1. To run heat tree gabbi api tests as part of grenade 'post-upgrade'
> >> phase
> >>
> >> https://review.openstack.org/#/c/460542/
> >>
> >> 2. To remove tempest tests from the grenade job
> >>
> >> https://review.openstack.org/#/c/460810/
> >>
> >>
> >>
> >>> I proposed a patch to Tempest to start the deprecation counter for Heat
> >>> / orchestration related
> >>> configuration items in Tempest [4], and I would like to make sure that
> >>> all tests and the service client
> >>> either find a new home outside of Tempest, or are removed, by the end
> >>> of the Pike cycle at the latest.
> >>>
> >>> Heat has in-tree integration tests and Gabbi based API tests, but I
> >>> don't know if those provide
> >>> enough coverage to replace the tests on Tempest side.
> >>>
> >>>
> >> Yes, the heat gabbi api tests do not yet have the same coverage as the
> >> tempest tree api tests (lacks tests using nova, neutron and swift
> >> resources),  but I think that should not stop us from *not* running the
> >> tempest tests in the grenade job.
> >>
> >> I also don't know if the tempest tree heat tests are used by any other
> >> upstream/downstream jobs. We could surely add more tests to bridge the gap.
> >>
> >> Also, It's possible to run the heat integration tests (we've enough
> >> coverage there) with tempest plugin after doing some initial setup, as we
> >> do in all our dsvm gate jobs.
> >>
> >> I would propose to move tests and client to a Tempest plugin owned /
> >>> maintained by
> >>> the Heat team, so that the Heat team can have full flexibility in
> >>> consolidating their integration
> >>> tests. For Murano and Daisycloud - and any other team that may want to
> >>> use the Heat service
> >>> client in their tests, even if the client is removed from Tempest, it
> >>> would still be available via
> >>> the Heat Tempest plugin. As long as the plugin implements the service
> >>> client interface,
> >>> the Heat service client will register automatically in the service
> >>> client manager and be available
> >>> for use as today.
> >>>
> >>>
> >> if I understand correctly, you're proposing moving the existing tempest
> >> tests and service clients to a separate repo managed by heat team. Though
> >> that would be collective decision, I'm not sure that's something I would
> >> like to do. To start with we may look at adding some of the missing pieces
> >> in heat tree itself.
> >>
> >
> > I'm proposing to move tests and the service client outside of tempest to a
> > new home.
> >
> > I also suggested that the new home could be a dedicated repo, since that
> > would allow you to maintain the
> > current branchless nature of those tests. A more detailed discussion about
> > the topic can be found
> > in the corresponding proposed queens goal [5],
> >
> > Using a dedicated repo *is not* a precondition for moving tests and
> > service client out of Tempest.
> >
> >
> We probably are mixing two different things here.
> 
> 1. Moving in-tree heat tempest plugin and tests to a dedicated repo
> 
> Though we don't have any plans for it now, we may have to do it when/if
> it's accepted as a community goal.
> 
> 2.  Moving tempest tree heat tests and heat service client to a new home
> and owner.
> 
> I don't think that's something heat team would like to do given that we
> don't use these tests anywhere and would probably spend time improving the
> coverage of the gabbi api tests we already have.
> 

Ok, well if the heat team has no interest in maintaining these tests there is
no point in keeping them around anymore. I've pushed up:

https://review.openstack.org/461841

to remove the tests. As for the clients we can just move those to tempest.lib
to not break the plugins that are using them.


Re: [openstack-dev] [Openstack-operators] [nova][glance] Who needs multiple api_servers?

2017-05-01 Thread Matthew Treinish
On Mon, May 01, 2017 at 05:00:17AM -0700, Flavio Percoco wrote:
> On 28/04/17 11:19 -0500, Eric Fried wrote:
> > If it's *just* glance we're making an exception for, I prefer #1 (don't
> > deprecate/remove [glance]api_servers).  It's way less code &
> > infrastructure, and it discourages others from jumping on the
> > multiple-endpoints bandwagon.  If we provide endpoint_override_list
> > (handwave), people will think it's okay to use it.
> > 
> > Anyone aware of any other services that use multiple endpoints?
> 
> Probably a bit late but yeah, I think this makes sense. I'm not aware of other
> projects that have a list of api_servers.

I thought it was just nova too, but it turns out cinder has the same exact
option as nova: (I hit this in my devstack patch trying to get glance deployed
as a wsgi app)

https://github.com/openstack/cinder/blob/d47eda3a3ba9971330b27beeeb471e2bc94575ca/cinder/common/config.py#L51-L55

Although from what I can tell you don't have to set it and it will fall back to
using the catalog, assuming you configured the catalog info for cinder:

https://github.com/openstack/cinder/blob/19d07a1f394c905c23f109c1888c019da830b49e/cinder/image/glance.py#L117-L129
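
A small sketch of the decision described above: use the configured server
list if there is one, otherwise look the image service up in the catalog.
This is not cinder's code; the flattened catalog layout, the function name
resolve_image_endpoints and the URLs are assumptions for illustration only.

```python
# Hypothetical, flattened view of a service catalog.
EXAMPLE_CATALOG = [
    {"type": "image", "interface": "public", "url": "https://glance.example.org"},
    {"type": "image", "interface": "internal", "url": "http://glance.internal:9292"},
    {"type": "volume", "interface": "internal", "url": "http://cinder.internal:8776"},
]


def resolve_image_endpoints(configured_servers, catalog, interface="internal"):
    """Prefer an explicit server list; otherwise fall back to the catalog."""
    if configured_servers:
        # The operator set the option, so the catalog is not consulted.
        return list(configured_servers)
    urls = [
        entry["url"]
        for entry in catalog
        if entry["type"] == "image" and entry["interface"] == interface
    ]
    if not urls:
        raise LookupError("no image endpoint configured or in the catalog")
    return urls


if __name__ == "__main__":
    print(resolve_image_endpoints([], EXAMPLE_CATALOG))
    print(resolve_image_endpoints(["http://glance1:9292"], EXAMPLE_CATALOG))
```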


-Matt Treinish


> 
> > On 04/28/2017 10:46 AM, Mike Dorman wrote:
> > > Maybe we are talking about two different things here?  I’m a bit confused.
> > > 
> > > Our Glance config in nova.conf on HV’s looks like this:
> > > 
> > > [glance]
> > > api_servers=http://glance1:9292,http://glance2:9292,http://glance3:9292,http://glance4:9292
> > > glance_api_insecure=True
> > > glance_num_retries=4
> > > glance_protocol=http
> 
> 
> FWIW, this feature is being used as intended. I'm sure there are ways to 
> achieve
> this using external tools like haproxy/nginx but that adds an extra burden to
> OPs that is probably not necessary since this functionality is already there.
> 
> Flavio
> 
> > > So we do provide the full URLs, and there is SSL support.  Right?  I am 
> > > fairly certain we tested this to ensure that if one URL fails, nova goes 
> > > on to retry the next one.  That failure does not get bubbled up to the 
> > > user (which is ultimately the goal.)
> > > 
> > > I don’t disagree with you that the client side choose-a-server-at-random 
> > > is not a great load balancer.  (But isn’t this roughly the same thing 
> > > that oslo-messaging does when we give it a list of RMQ servers?)  For us 
> > > it’s more about the failure handling if one is down than it is about 
> > > actually equally distributing the load.
> > > 
> > > In my mind options One and Two are the same, since today we are already 
> > > providing full URLs and not only server names.  At the end of the day, I 
> > > don’t feel like there is a compelling argument here to remove this 
> > > functionality (that people are actively making use of.)
> > > 
> > > To be clear, I, and I think others, are fine with nova by default getting 
> > > the Glance endpoint from Keystone.  And that in Keystone there should 
> > > exist only one Glance endpoint.  What I’d like to see remain is the 
> > > ability to override that for nova-compute and to target more than one 
> > > Glance URL for purposes of fail over.
> > > 
> > > Thanks,
> > > Mike
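
The failover behaviour described in this message (try the configured URLs in
random order and only surface an error if every one fails) can be sketched
roughly as follows. This is not nova's implementation; call_with_failover,
AllServersFailed and the request callable are hypothetical names, and a real
client would also honor a retry limit such as the glance_num_retries value
shown in the config above.

```python
import random


class AllServersFailed(Exception):
    """Raised when every configured server refused the request."""


def call_with_failover(api_servers, request, rng=None):
    """Try each server once, in random order, returning the first success.

    request is any callable that takes a base URL and either returns a
    result or raises ConnectionError (a hypothetical interface).
    """
    if rng is None:
        rng = random.Random()
    servers = list(api_servers)
    rng.shuffle(servers)
    failures = []
    for url in servers:
        try:
            return request(url)
        except ConnectionError as exc:
            # Remember the failure and move on to the next server.
            failures.append((url, str(exc)))
    raise AllServersFailed(failures)


# URLs taken from the nova.conf example quoted above.
API_SERVERS = [
    "http://glance1:9292",
    "http://glance2:9292",
    "http://glance3:9292",
    "http://glance4:9292",
]


def only_glance3_is_up(url):
    """Fake request for the demo: every server except glance3 is down."""
    if url != "http://glance3:9292":
        raise ConnectionError("connection refused: " + url)
    return "image list from " + url


if __name__ == "__main__":
    print(call_with_failover(API_SERVERS, only_glance3_is_up))
```

The caller only sees an error when the whole list is exhausted, which is the
property that matters for failure handling; the shuffle spreads load only
approximately.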
> > > 
> > > 
> > > 
> > > 
> > > On 4/28/17, 8:20 AM, "Monty Taylor"  wrote:
> > > 
> > > Thank you both for your feedback - that's really helpful.
> > > 
> > > Let me say a few more words about what we're trying to accomplish here
> > > overall so that maybe we can figure out what the right way forward is.
> > > (it may be keeping the glance api servers setting, but let me at least
> > > make the case real quick)
> > > 
> > >  From a 10,000 foot view, the thing we're trying to do is to get 
> > > nova's
> > > consumption of all of the OpenStack services it uses to be less 
> > > special.
> > > 
> > > The clouds have catalogs which list information about the services -
> > > public, admin and internal endpoints and whatnot - and then we're 
> > > asking
> > > admins to not only register that information with the catalog, but to
> > > also put it into the nova.conf. That means that any updating of that
> > > info needs to be an API call to keystone and also a change to 
> > > nova.conf.
> > > If we, on the other hand, use the catalog, then nova can pick up 
> > > changes
> > > in real time as they're rolled out to the cloud - and there is 
> > > hopefully
> > > a sane set of defaults we could choose (based on operator feedback 
> > > like
> > > what you've given) so that in most cases you don't have to tell nova
> > > > where to find glance _at_all_ because the cloud already knows where it
> > > > is. (nova would know to look in the catalog for the internal interface 
> > > of
> > > the image service - for instance - there's no need to ask an operator 
> > > to
> > > add to the config "what is the service_type of the image service we
> > > 

Re: [openstack-dev] [qa][gate] tempest slow - where do we execute them in gate?

2017-04-17 Thread Matthew Treinish
On Mon, Apr 17, 2017 at 11:55:28AM -0700, Ihar Hrachyshka wrote:
> On Mon, Apr 17, 2017 at 9:35 AM, Jordan Pittier
>  wrote:
> > We don't run slow tests because the QA team think that they don't bring
> > enough value to be executed, every time and everywhere. The idea is that if
> > some specific slow tests are of some interest to some specific openstack
> > projects, those projects can change the config of their jobs to enable these
> > tests.
> 
> 
> But since it's not executed anywhere in tempest gate, even as
> non-voting (?), it's effectively dead code that may be long broken
> without anyone knowing. Of course there are consumers of the tests
> downstream, but for those consumers it's a tough call to start
> depending on the tests if they are not sanity checked by tempest
> itself. Wouldn't it make sense to have some job in tempest gate that
> would execute those tests (maybe just them to speed up such a job?
> maybe non-voting? maybe even as periodic? but there should be
> something that keeps it green in long run).
> 

In theory those tests are already supposed to be run as a periodic/experimental
job. The periodic-tempest-dsvm-all-master job is set up to run all tests, including
those tagged as slow. However, the job has been broken for some time; I didn't
even notice until I looked just now. (openstack-health didn't show it because
it fails before subunit is generated.) I've pushed:

https://review.openstack.org/#/c/457334/

to fix the job. Once that lands let's see how far things have bitrotted in there.
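
For reference, selecting tests by tag comes down to filtering on the
attributes that appear in the test ID. A minimal sketch, assuming attributes
are appended to the ID in square brackets (e.g. [id-...,slow]); the test IDs
and helper names below are invented for the example:

```python
import re

# Matches a trailing "[attr1,attr2,...]" suffix on a test ID.
_ATTRS = re.compile(r"\[([^\]]*)\]$")


def attrs_of(test_id):
    """Return the set of attributes attached to a test ID (may be empty)."""
    match = _ATTRS.search(test_id)
    if not match:
        return set()
    return set(match.group(1).split(","))


def select_by_attr(test_ids, attr):
    """Keep only the test IDs that carry the given attribute."""
    return [test_id for test_id in test_ids if attr in attrs_of(test_id)]


# Hypothetical test IDs.
EXAMPLE_IDS = [
    "tempest.scenario.test_example.TestExample.test_long_running[compute,id-0001,slow]",
    "tempest.api.compute.test_example.TestExample.test_quick[id-0002]",
    "tempest.api.volume.test_example.TestExample.test_no_attrs",
]

if __name__ == "__main__":
    for test_id in select_by_attr(EXAMPLE_IDS, "slow"):
        print(test_id)
```

Splitting on the comma instead of searching for the substring avoids false
matches on attributes that merely contain the word "slow".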

-Matt Treinish


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] [elections] Available time and top priority

2017-04-12 Thread Matthew Treinish
On Mon, Apr 10, 2017 at 11:16:57AM +0200, Thierry Carrez wrote:
> Hi everyone,
> 
> New in this TC election round, we have a few days between nominations
> and actual voting to ask questions and get to know the candidates a bit
> better. I'd like to kick off this new "campaigning period" with a basic
> question on available time and top priority.
> 
> All the candidates are top community members with a lot of
> responsibilities on their shoulders already. My experience tells me that
> it is easy to overestimate the time we can dedicate to Technical
> Committee matters, and how much we can push and get done in six months
> or one year. At the same time, our most efficient way to make progress
> is always when someone "owns" a particular initiative and pushes it
> through the governance process.
> 
> So my question is the following: if elected, how much time do you think
> you'll be able to dedicate to Technical Committee affairs (reviewing
> proposed changes and pushing your own) ? If there was ONE thing, one
> initiative, one change you will actively push in the six months between
> this election round and the next, what would it be ?

I know personally I'm planning to dedicate more time over the next cycle to TC
activities. Something for me personally that came out of doing the visioning
exercise last month was that it did give me an opportunity to reflect on what
I viewed as priorities for the next 2 years.

So I think for me, as I outlined in the candidacy email, the one initiative I'd
really like to make a push for over the next 6 months is to start working on
trying to build up the systems in place to support both part time contributors and
working on growing new leaders in the community. Realistically I don't think
either is completely accomplishable in 6 months, but getting a good start and
making initial progress is.

Thanks,

Matt Treinish




[openstack-dev] [election][tc] TC candidacy

2017-04-04 Thread Matthew Treinish
Hi Everyone,

I'd like to submit my candidacy for another term on the OpenStack Technical
Committee.

In my candidacy platform last year I outlined that if elected to the TC I wanted
to bring more technical oversight to the TC, and also to work on improving the
messaging around OpenStack, to make it clearer. I think in the past year we've
made lots of progress on both fronts, but there is still more work to do. Over
the next year I'd like to see the TC continue to make progress on both fronts.

The introduction of community wide goals was a good start in having the TC
setting a more concrete technical direction for projects. I see a lot of
potential with it and am eager to see it grow over time. It was also just a
start and long term I'd like to see the TC take on more difficult technical
discussions and make decisions. For example, the recent discussion about
being opinionated about our RDBMS is an example of the kind of discussions I'd
like to see the TC have more of in the future.

For making the messaging around OpenStack clearer I think we've also been
making improvements. Over the last year, we've added and improved a few tags,
added the OpenStack principles document, and more recently we've worked on a TC
vision for the next 2 years. Over the past year we've also started taking
a more aggressive stance towards removing projects. Moving into the future I'd
like to see more progress on clearly documenting what OpenStack is and how it
can be used. I'm optimistic some of the efforts we're working on will continue
to make this better over time.

For the next year in addition to continuing progress on the items I outlined
before the other priority I see for the TC is working on improving the health
of the contributor base in the community. It's no secret that there has been
a recent contraction in the community and a lot of long time contributors and
community leaders are no longer actively contributing. I think we need to take
more proactive steps about this if we want OpenStack to continue to thrive.

I think there are 2 key areas we'll need to address on this front. The first
is I'd like to see the TC take a more hands-on approach to building up the
mentorship pipeline, to enable growing both the contributor base and leadership
in the community. Most of the existing efforts in this area tend to be concentrated
on on-boarding new contributors which is good, but there isn't anything to
help bridge the gap into becoming a community leader. This is somewhere I can
see the TC taking a more active role to help people work towards taking
on leadership roles in the community.

The other aspect is I think we need to work towards making our community
more accessible for casual and/or part time contributors. Right now our
community process for contribution is heavily weighted towards full time and
corporate sponsored contribution. Over the next year I'd like to work towards
easing this burden and growing the number of casual and/or independent
contributors.

I think this will take 2 forms, the first is decreasing the barrier to entry on
our tooling. Things like the CLA and the multi-step process involving creating
multiple accounts just to get access to pushing proposed changes are very
off-putting, especially if you're not familiar with the systems. The other
aspect I think is more social. To a certain degree a lot of processes around
contribution or interaction assume that people are constant contributors and
always active (or connected). For example, how often do people push a patch
up for review and then bug a bunch of cores on IRC about it. That's not really
an option if you can only contribute for an hour or two on the weekends. This is
an area where I'd like to see the TC start driving more active change to improve
the situation so we can hopefully start to grow the number of casual
contributors we have to the project.

Thanks for reading, I hope this outlines where my focus and priorities would be
if I'm lucky enough to be elected for another term.

Thanks,

Matthew Treinish

IRC: mtreinish
Review history: https://review.openstack.org/#/q/reviewer:mtreinish%2540kortar.org
Commit history: https://review.openstack.org/#/q/owner:mtreinish%2540kortar.org
Blog: http://blog.kortar.org/




Re: [openstack-dev] [api][qa][tc][nova][cinder] Testing of a microversioned world

2017-03-10 Thread Matthew Treinish
On Fri, Mar 10, 2017 at 12:34:31PM -0800, Ken'ichi Ohmichi wrote:
> Hi John,
> 
> Now Tempest is testing microversions only for Nova and contains some
> testing framework for re-using for another projects.
> On this framework, we can implement necessary microversions tests as
> we want and actually many microversions of Nova are not tested by
> Tempest.
> We can see the tested microversion of Nova on
> https://github.com/openstack/tempest/blob/master/doc/source/microversion_testing.rst#microversion-tests-implemented-in-tempest
> 
> Before implementing microversion testing for Cinder, we will implement
> JSON-Schema validation for API responses for Cinder.
> The validation will be helpful for testing base microversion of Cinder
> API and we will be able to implement the microversion tests based on
> that.
> This implementation is marked as 7th priority in this Pike cycle as
> https://etherpad.openstack.org/p/pike-qa-priorities
> 
> In addition, now Cinder V3 API is not tested. So we are going to
> enable v3 tests with some restructure of Tempest in this cycle.
> The detail is described on the part of "Volume API" of
> https://etherpad.openstack.org/p/tempest-api-versions-in-pike


Umm, I don't know what you're basing that on, but there have been cinder v3
tests and cinder microversion support in Tempest since Newton. It was initially
added in this patch: https://review.openstack.org/#/c/300639/
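
For anyone unfamiliar with how a test gets tied to a microversion range, the
basic idea can be sketched in a few lines. This is an illustrative sketch,
not Tempest's implementation: parse_version and ranges_overlap are
hypothetical helpers, and the version numbers are arbitrary.

```python
# "latest" sorts after every concrete version.
LATEST = (float("inf"), float("inf"))


def parse_version(text):
    """Turn 'X.Y' (or 'latest') into a tuple that compares numerically."""
    if text == "latest":
        return LATEST
    major, minor = text.split(".")
    return (int(major), int(minor))


def ranges_overlap(test_min, test_max, cfg_min, cfg_max):
    """True if the test's version range intersects the configured range."""
    return (
        parse_version(test_min) <= parse_version(cfg_max)
        and parse_version(cfg_min) <= parse_version(test_max)
    )


if __name__ == "__main__":
    # A test written for 3.5 through 3.8 against a cloud offering 3.7 to 3.12.
    print(ranges_overlap("3.5", "3.8", "3.7", "3.12"))
    # A test needing 3.20 or newer against a cloud that stops at 3.10.
    print(ranges_overlap("3.20", "latest", "3.0", "3.10"))
```

Comparing parsed tuples rather than strings matters because, as strings,
"2.10" would sort before "2.9".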

-Matt Treinish


> 
> 2017-03-10 11:37 GMT-08:00 John Griffith :
> > Hey Everyone,
> >
> > So along the lines of an earlier thread that went out regarding testing of
> > deprecated API's and Tempest etc [1].
> >
> > Now that micro-versions are *the API versioning scheme to rule them all* one
> > question I've not been able to find an answer for is what we're going to
> > promise here for support and testing.  My understanding thus far is that the
> > "community" approach here is "nothing is ever deprecated, and everything is
> > supported forever".
> >
> > That's sort of a tall order IMO, but ok.  I've already had some questions
> > from folks about implementing an explicit Tempest test for every
> > micro-versioned implementation of an API call also.  My response has been
> > "nahh, just always test latest available".  This kinda means that we're not
> > testing/supporting the previous versions as promised though.
> >
> > Anyway; I'm certain that between Nova and the API-WG this has come up and is
> > probably addressed, just wondering if somebody can point me to some
> > documentation or policies in this respect.
> >
> > Thanks,
> > John




Re: [openstack-dev] Our New Weekly(ish) Test Status Report

2017-03-02 Thread Matthew Treinish
On Tue, Feb 28, 2017 at 11:49:53AM -0500, Matthew Treinish wrote:
> Hello,
> 
> We have a few particularly annoying bugs that have been impacting the
> reliability of gate testing recently. It would be great if we could get
> volunteers to look at these bugs to improve the reliability of our testing as 
> we
> start working on Pike.
> 
> These two issues have been identified by elastic-recheck as being our biggest
> problems:
> 
> 1. SSH Banner bug http://status.openstack.org/elastic-recheck/#1349617
> 
> This bug is a longstanding issue that comes and goes and also has lots of very
> similar (but subtly different) failure modes. Tempest attempts to ssh into the
> cirros guest and is unable to after 18 attempts over the 300 sec timeout 
> window
> and fails to login. Paramiko reports that there was an issue reading the 
> banner
> returned on port 22 from the guest. This indicates that something is likely
> responding on port 22. We're working on trying to get more details on what is
> the cause here with:
> 
> https://review.openstack.org/437128

We've been doing some more debugging on this issue and made some progress
getting to the bottom of the bug. Jens Rosenboom figured out that the banner
errors are actually being caused by tempest leaking ssh connections (via
paramiko) on auth failures. Dropbear is set to only allow 5 unauthorized
connections per IP address, which tempest would trip after 5 failed login
attempts. [1] Dropbear would just close the socket after this for login attempt
6 which would cause the banner error. We addressed this in tempest with:

https://review.openstack.org/439638 

since that has merged we haven't seen the banner failure signature anymore, but
it still hasn't solved our ssh connectivity issues. Tempest still isn't able to
login to the guest and fails with an auth error. Kevin Benton has been looking
into this with:

https://bugs.launchpad.net/nova/+bug/1668958

and we're tracking the actual failure signature now: (which only appears after
the tempest fix merged)

http://status.openstack.org/elastic-recheck/gate.html#1668958


The work here is ongoing, but we made enough progress to change the elastic
recheck signature so I figured an update was warranted.
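
The interaction described above can be modelled with a toy example. Nothing
here is paramiko or dropbear code: FakeServer, FakeClient and attempt_logins
are invented stand-ins, and the only number taken from the discussion is the
limit of 5 unauthenticated connections per address. The point is just that
closing the connection on an auth failure keeps the count from ever reaching
the limit.

```python
class AuthError(Exception):
    """Stand-in for an SSH authentication failure."""


class BannerError(Exception):
    """Stand-in for the 'error reading SSH banner' failure."""


class FakeServer:
    """Toy SSH server that limits open unauthenticated connections."""

    def __init__(self, max_unauth=5):
        self.max_unauth = max_unauth
        self.unauth_open = 0


class FakeClient:
    """Toy client whose password is always wrong."""

    def __init__(self, server):
        self.server = server
        self.is_open = False

    def connect(self):
        if self.server.unauth_open >= self.server.max_unauth:
            # The server drops the socket before sending its banner.
            raise BannerError("socket closed before the banner was sent")
        self.server.unauth_open += 1
        self.is_open = True
        raise AuthError("authentication failed")

    def close(self):
        if self.is_open:
            self.server.unauth_open -= 1
            self.is_open = False


def attempt_logins(server, attempts, close_on_failure):
    """Try to log in repeatedly and record how each attempt failed."""
    outcomes = []
    for _ in range(attempts):
        client = FakeClient(server)
        try:
            client.connect()
        except AuthError:
            outcomes.append("auth")
            if close_on_failure:
                client.close()  # release the connection instead of leaking it
        except BannerError:
            outcomes.append("banner")
    return outcomes


if __name__ == "__main__":
    print("leaky:", attempt_logins(FakeServer(), 6, close_on_failure=False))
    print("clean:", attempt_logins(FakeServer(), 6, close_on_failure=True))
```

In the leaky run the sixth attempt fails with the banner error; in the clean
run every attempt fails with the underlying auth error, which matches the
change in failure signature described above.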

Thanks,

Matt Treinish

[1] https://bugs.launchpad.net/nova/+bug/1668958/comments/4




> 
> 2. Libvirt crashes: http://status.openstack.org/elastic-recheck/#1643911 and
> http://status.openstack.org/elastic-recheck/#1646779
> 
> Libvirt is randomly crashing during the job which causes things to fail (for
> obvious reasons). To address this will likely require someone with experience
> debugging libvirt since it's most likely a bug isolated to libvirt. Tonyb has
> offered to start working on this so talk to him to coordinate efforts around
> fixing this.
> 
> The other thing to note is the oom-killer bug:
> http://status.openstack.org/elastic-recheck/gate.html#1656386 while there 
> aren't
> a lot of hits in logstash for this particular bug, it does raise an important
> issue
> about the increased memory pressure on the test nodes. It's likely that a lot 
> of
> the instability may be related to the increased load on the nodes. As a 
> starting
> point all projects should look at their memory footprint and see where they 
> can
> trim things to try and make the situation better.
> 
> As a friendly reminder we do track bug rate incidence within our testing using
> the elastic-recheck tool. You can find that data at
> http://status.openstack.org/elastic-recheck. It can be quite useful to start
> there when determining which bugs to fix based on impact. Elastic recheck also
> maintains a list of failures that occurred without a known signature:
> http://status.openstack.org/elastic-recheck/data/integrated_gate.html
> 
> We also need some people to help maintain the list of existing queries, we 
> have
> a lot of queries for closed bugs that have no hits and others which are overly
> broad and matching failures which are unrelated to the bug. This would also be
> good task for a new person to start getting involved with. Feel free to submit
> patches to:
> https://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries to
> track new issues.
> 
> Thank you,
> 
> mtreinish and clarkb




[openstack-dev] Our New Weekly(ish) Test Status Report

2017-02-28 Thread Matthew Treinish
Hello,

We have a few particularly annoying bugs that have been impacting the
reliability of gate testing recently. It would be great if we could get
volunteers to look at these bugs to improve the reliability of our testing as we
start working on Pike.

These two issues have been identified by elastic-recheck as being our biggest
problems:

1. SSH Banner bug http://status.openstack.org/elastic-recheck/#1349617

This bug is a longstanding issue that comes and goes and also has lots of very
similar (but subtly different) failure modes. Tempest attempts to ssh into the
cirros guest and is unable to after 18 attempts over the 300 sec timeout window
and fails to login. Paramiko reports that there was an issue reading the banner
returned on port 22 from the guest. This indicates that something is likely
responding on port 22. We're working on trying to get more details on what is
the cause here with:

https://review.openstack.org/437128

2. Libvirt crashes: http://status.openstack.org/elastic-recheck/#1643911 and
http://status.openstack.org/elastic-recheck/#1646779

Libvirt is randomly crashing during the job which causes things to fail (for
obvious reasons). To address this will likely require someone with experience
debugging libvirt since it's most likely a bug isolated to libvirt. Tonyb has
offered to start working on this so talk to him to coordinate efforts around
fixing this.

The other thing to note is the oom-killer bug:
http://status.openstack.org/elastic-recheck/gate.html#1656386 while there aren't
a lot of hits in logstash for this particular bug, it does raise an important issue
about the increased memory pressure on the test nodes. It's likely that a lot of
the instability may be related to the increased load on the nodes. As a starting
point all projects should look at their memory footprint and see where they can
trim things to try and make the situation better.

As a friendly reminder we do track bug rate incidence within our testing using
the elastic-recheck tool. You can find that data at
http://status.openstack.org/elastic-recheck. It can be quite useful to start
there when determining which bugs to fix based on impact. Elastic recheck also
maintains a list of failures that occurred without a known signature:
http://status.openstack.org/elastic-recheck/data/integrated_gate.html

We also need some people to help maintain the list of existing queries. We have
a lot of queries for closed bugs that have no hits, and others that are overly
broad and match failures unrelated to the bug. This would also be a
good task for a new person to start getting involved with. Feel free to submit
patches to:
https://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries to
track new issues.

Thank you,

mtreinish and clarkb


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][swg] per-project "Business only" moderated mailing lists

2017-02-27 Thread Matthew Treinish
On Mon, Feb 27, 2017 at 06:18:10PM +0100, Thierry Carrez wrote:
> > Dean Troyer wrote:
> >> On Mon, Feb 27, 2017 at 3:31 AM, Clint Byrum  wrote:
> >> This is not for users who only want to see some projects. That is a well
> >> understood space and the mailman filtering does handle it. This is for
> >> those who want to monitor the overall health of the community, address
> >> issues with cross-project specs, or participate in so many projects it
> >> makes little sense to spend time filtering.
> > 
> > Monday morning and the caffeine is just beginning to reach my brain,
> > but this seems counter-intuitive to me.  I consider myself someone who
> > _does_ want to keep in touch with the majority of the community, and
> > breaking things into N additional mailing lists makes that harder, not
> > easier.  I _do_ include core team updates, mascots, social meetings in
> > that set of things to pay a little attention to here, especially
> > around summit/PTG/Forum/etc times.

+1, I'm also someone who tries to keep an eye on a lot of projects and
cross project work and will find this a lot more difficult.

> > 
> > I've seen a couple of descriptions of who this proposal is not
> > intended to address, who exactly is expected to benefit from more
> > mailing lists?
> 
> I'm not (yet) convinced that getting rid of 10% of ML messages (the ones
> that would go to the -business lists) is worth the hassle of setting up
> 50 new lists, have people subscribe to them, and have overworked PTL
> moderate them...

I agree with this (although TBH I don't think I can be convinced). I
also don't think in practice it will even be close to 10% of the ML traffic
being routed to the per project lists.

> 
> Also from my experience moderating such a -business list (the
> openstack-tc list) I can say that it takes significant effort to avoid
> having general-interest discussions there (or to close them when they
> start from an innocent thread). Over those 50+ -business mailing-lists
> I'm pretty sure a few would diverge and use the convenience of isolated
> discussions without "outsiders" potentially chiming in. And they would
> be pretty hard to detect...
> 

Another similar counter example is the dedicated openstack-qa list, which has
been dead for a long time now. It had similar issues, although it wasn't a
moderated list. What ended up happening was that the discussions there were
siloed and no one ever noticed anything being discussed there, so things had to
be cross posted to get any attention. Discussions also ended up being
duplicated between the 2 lists (which is what ttx said he tries to avoid via
active moderation). That is why we dissolved the openstack-qa list and just
reintegrated the discussion back into openstack-dev.

-Matt Treinish




Re: [openstack-dev] [Openstack-stable-maint] Stable check of openstack/tempest failed

2017-02-20 Thread Matthew Treinish
On Sun, Feb 19, 2017 at 07:25:39AM +, A mailing list for the OpenStack 
Stable Branch test reports. wrote:
> Build failed.
> 
> - periodic-tempest-dsvm-full-ubuntu-trusty-mitaka 
> http://logs.openstack.org/periodic-stable/periodic-tempest-dsvm-full-ubuntu-trusty-mitaka/27fe759/
>  : SUCCESS in 32m 57s
> - periodic-tempest-dsvm-neutron-full-ubuntu-trusty-mitaka 
> http://logs.openstack.org/periodic-stable/periodic-tempest-dsvm-neutron-full-ubuntu-trusty-mitaka/87b6097/
>  : SUCCESS in 52m 02s
> - periodic-tempest-dsvm-full-ubuntu-xenial-newton 
> http://logs.openstack.org/periodic-stable/periodic-tempest-dsvm-full-ubuntu-xenial-newton/8eb0b4c/
>  : SUCCESS in 42m 26s
> - periodic-tempest-dsvm-neutron-full-ubuntu-xenial-newton 
> http://logs.openstack.org/periodic-stable/periodic-tempest-dsvm-neutron-full-ubuntu-xenial-newton/d18419a/
>  : SUCCESS in 1h 19m 48s
> - periodic-tempest-dsvm-full-ubuntu-xenial-ocata 
> http://logs.openstack.org/periodic-stable/periodic-tempest-dsvm-full-ubuntu-xenial-ocata/07a2329/
>  : SUCCESS in 1h 01m 27s
> - periodic-tempest-dsvm-neutron-full-ubuntu-xenial-ocata 
> http://logs.openstack.org/periodic-stable/periodic-tempest-dsvm-neutron-full-ubuntu-xenial-ocata/f5a5874/
>  : FAILURE in 1h 08m 32s


So I took a quick look at this failure and it was the oom failure we're 
tracking with https://bugs.launchpad.net/tempest/+bug/1664953

http://logs.openstack.org/periodic-stable/periodic-tempest-dsvm-neutron-full-ubuntu-xenial-ocata/f5a5874/logs/syslog.txt.gz#_Feb_19_07_13_47

I guess I don't mind playing manual elastic-recheck today.

-Matt




[openstack-dev] [QA][all] Pike PTG Tempest Plugin Cross Project Session

2017-02-15 Thread Matthew Treinish
Hi Everyone,

I just wanted to advertise that we will be hosting a 1 hr cross project session
at the PTG next week. We have the Macon discussion room at the PTG reserved [1]
on Tuesday from 9:30am to 10:30am.

The intent of this session is for anyone who is producing or consuming tempest
plugins to discuss best practices, where there are gaps currently, and anything
we need to improve in the future.

We'll also be discussing the merits of and how to move forward with the proposed
community goal to split tempest plugins into separate repos:

https://review.openstack.org/369749

There was some opposition to this during the review period and it wasn't
accepted as a Pike goal. I think a face to face discussion to address the
concerns around this (especially with everyone who has expressed issues with
it) will be a useful and productive way to move forward.

I'm looking forward to seeing everyone next week.

Thanks,

Matt Treinish


[1] https://ethercalc.openstack.org/Pike-PTG-Discussion-Rooms




Re: [openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate

2017-02-02 Thread Matthew Treinish
On Thu, Feb 02, 2017 at 11:10:22AM -0500, Matthew Treinish wrote:
> On Wed, Feb 01, 2017 at 04:24:54PM -0800, Armando M. wrote:
> > Hi,
> > 
> > [TL;DR]: OpenStack services have steadily increased their memory
> > footprints. We need a concerted way to address the oom-kills experienced in
> > the openstack gate, as we may have reached a ceiling.
> > 
> > Now the longer version:
> > 
> > 
> > We have been experiencing some instability in the gate lately due to a
> > number of reasons. When everything adds up, this means it's rather
> > difficult to merge anything and knowing we're in feature freeze, that adds
> > to stress. One culprit was identified to be [1].
> > 
> > We initially tried to increase the swappiness, but that didn't seem to
> > help. Then we have looked at the resident memory in use. When going back
> > over the past three releases we have noticed that the aggregated memory
> > footprint of some openstack projects has grown steadily. We have the
> > following:
> > 
> >- Mitaka
> >   - neutron: 1.40GB
> >   - nova: 1.70GB
> >   - swift: 640MB
> >   - cinder: 730MB
> >   - keystone: 760MB
> >   - horizon: 17MB
> >   - glance: 538MB
> >- Newton
> > >   - neutron: 1.59GB (+13%)
> >   - nova: 1.67GB (-1%)
> >   - swift: 779MB (+21%)
> >   - cinder: 878MB (+20%)
> >   - keystone: 919MB (+20%)
> >   - horizon: 21MB (+23%)
> >   - glance: 721MB (+34%)
> >- Ocata
> >   - neutron: 1.75GB (+10%)
> > >   - nova: 1.95GB (+16%)
> > >   - swift: 703MB (-9%)
> > >   - cinder: 920MB (+4%)
> >   - keystone: 903MB (-1%)
> >   - horizon: 25MB (+20%)
> >   - glance: 740MB (+2%)
> > 
> > Numbers are approximated and I only took a couple of samples, but in a
> > nutshell, the majority of the services have seen double digit growth over
> > the past two cycles in terms of the amount of RSS memory they use.
> > 
> > Since [1] is observed only since ocata [2], I imagine that's pretty
> > reasonable to assume that memory increase might as well be a determining
> > factor to the oom-kills we see in the gate.
> > 
> > Profiling and surgically reducing the memory used by each component in each
> > service is a lengthy process, but I'd rather see some gate relief right
> > away. Reducing the number of API workers helps bring the RSS memory down
> > back to mitaka levels:
> > 
> >- neutron: 1.54GB
> >- nova: 1.24GB
> >- swift: 694MB
> >- cinder: 778MB
> >- keystone: 891MB
> >- horizon: 24MB
> >- glance: 490MB
> > 
> > However, it may have other side effects, like longer execution times, or
> > increase of timeouts.
> > 
> > Where do we go from here? I am not particularly fond of stop-gap [4], but
> > it is the one fix that most widely addresses the memory increase we have
> > experienced across the board.
> 
> So I have a couple of concerns with doing this. We're only running with 2
> workers per api service now and dropping it down to 1 means we have no more
> memory headroom in the future. So this feels like we're just delaying the
> inevitable, maybe for a cycle or 2. When we first started hitting OOM issues a
> couple of years ago we dropped from nprocs to nprocs/2. [5] Back then we were
> also running more services per job; it was back in the day of the integrated
> release so all those projects were running (like ceilometer, heat, etc.). So
> in a little over 2 years the memory consumption for the 7 services has
> increased to the point where we're making up for a bunch of extra services
> that don't run in the job anymore, and we had to drop the worker count in half
> since. So if we were to do this we don't have any more room for when things
> keep growing. I think now is the time we should start seriously taking a
> stance on our memory footprint growth and see if we can get it under control.
> 
> My second concern is the same as yours: the long term effects of this change
> aren't exactly clear. With the limited sample size of the test patch [4] we
> can't really say if it'll negatively affect run time or job success rates. I
> don't think it should be too bad; tempest is only making 4 api requests at a
> time, and most of the services should be able to handle that kinda load with a
> single worker (I'd hope).
> 
> This also does bring up the question of the gate config being representative
> of how we recommend

Re: [openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate

2017-02-02 Thread Matthew Treinish
On Wed, Feb 01, 2017 at 04:24:54PM -0800, Armando M. wrote:
> Hi,
> 
> [TL;DR]: OpenStack services have steadily increased their memory
> footprints. We need a concerted way to address the oom-kills experienced in
> the openstack gate, as we may have reached a ceiling.
> 
> Now the longer version:
> 
> 
> We have been experiencing some instability in the gate lately due to a
> number of reasons. When everything adds up, this means it's rather
> difficult to merge anything and knowing we're in feature freeze, that adds
> to stress. One culprit was identified to be [1].
> 
> We initially tried to increase the swappiness, but that didn't seem to
> help. Then we have looked at the resident memory in use. When going back
> over the past three releases we have noticed that the aggregated memory
> footprint of some openstack projects has grown steadily. We have the
> following:
> 
>- Mitaka
>   - neutron: 1.40GB
>   - nova: 1.70GB
>   - swift: 640MB
>   - cinder: 730MB
>   - keystone: 760MB
>   - horizon: 17MB
>   - glance: 538MB
>- Newton
>   - neutron: 1.59GB (+13%)
>   - nova: 1.67GB (-1%)
>   - swift: 779MB (+21%)
>   - cinder: 878MB (+20%)
>   - keystone: 919MB (+20%)
>   - horizon: 21MB (+23%)
>   - glance: 721MB (+34%)
>- Ocata
>   - neutron: 1.75GB (+10%)
>   - nova: 1.95GB (+16%)
>   - swift: 703MB (-9%)
>   - cinder: 920MB (+4%)
>   - keystone: 903MB (-1%)
>   - horizon: 25MB (+20%)
>   - glance: 740MB (+2%)
> 
> Numbers are approximated and I only took a couple of samples, but in a
> nutshell, the majority of the services have seen double digit growth over
> the past two cycles in terms of the amount of RSS memory they use.
> 
> Since [1] is observed only since ocata [2], I imagine that's pretty
> reasonable to assume that memory increase might as well be a determining
> factor to the oom-kills we see in the gate.
> 
> Profiling and surgically reducing the memory used by each component in each
> service is a lengthy process, but I'd rather see some gate relief right
> away. Reducing the number of API workers helps bring the RSS memory down
> back to mitaka levels:
> 
>- neutron: 1.54GB
>- nova: 1.24GB
>- swift: 694MB
>- cinder: 778MB
>- keystone: 891MB
>- horizon: 24MB
>- glance: 490MB
> 
> However, it may have other side effects, like longer execution times, or
> increase of timeouts.
> 
> Where do we go from here? I am not particularly fond of stop-gap [4], but
> it is the one fix that most widely addresses the memory increase we have
> experienced across the board.

So I have a couple of concerns with doing this. We're only running with 2
workers per api service now and dropping it down to 1 means we have no more
memory headroom in the future. So this feels like we're just delaying the
inevitable, maybe for a cycle or 2. When we first started hitting OOM issues a
couple of years ago we dropped from nprocs to nprocs/2. [5] Back then we were also
running more services per job; it was back in the day of the integrated release
so all those projects were running (like ceilometer, heat, etc.). So in a little
over 2 years the memory consumption for the 7 services has increased to the
point where we're making up for a bunch of extra services that don't run in the
job anymore, and we had to drop the worker count in half since. So if we were to
do this we don't have any more room for when things keep growing. I think now is
the time we should start seriously taking a stance on our memory footprint
growth and see if we can get it under control.
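
For anyone who wants to sanity check the growth figures quoted above, here is a
rough sketch that recomputes them from the numbers in the thread. The GB values
are converted assuming 1 GB = 1000 MB, and the function names are made up for
illustration. The quoted figures were described as approximate, so small
differences from the percentages in the list are expected.

```python
# RSS figures in MB, copied from the list quoted above.
# Assumption: GB values are converted using 1 GB = 1000 MB.
RSS_MB = {
    "mitaka": {"neutron": 1400, "nova": 1700, "swift": 640, "cinder": 730,
               "keystone": 760, "horizon": 17, "glance": 538},
    "newton": {"neutron": 1590, "nova": 1670, "swift": 779, "cinder": 878,
               "keystone": 919, "horizon": 21, "glance": 721},
    "ocata": {"neutron": 1750, "nova": 1950, "swift": 703, "cinder": 920,
              "keystone": 903, "horizon": 25, "glance": 740},
}


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def growth(prev_release, cur_release):
    """Per-service percentage change between two releases."""
    prev, cur = RSS_MB[prev_release], RSS_MB[cur_release]
    return {svc: pct_change(prev[svc], cur[svc]) for svc in cur}


def total_mb(release):
    """Aggregate RSS of the 7 services for one release."""
    return sum(RSS_MB[release].values())


if __name__ == "__main__":
    for prev, cur in (("mitaka", "newton"), ("newton", "ocata")):
        print("%s -> %s" % (prev, cur))
        for svc, pct in sorted(growth(prev, cur).items()):
            print("  %-9s %+6.1f%%" % (svc, pct))
    for release in ("mitaka", "newton", "ocata"):
        print("%s total: %d MB" % (release, total_mb(release)))
```

The aggregate total is the number that matters for the oom-killer, since it's
the combination of services on a single node that runs out of memory.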

My second concern is the same as yours: the long term effects of this change
aren't exactly clear. With the limited sample size of the test patch [4] we can't
really say if it'll negatively affect run time or job success rates. I don't
think it should be too bad; tempest is only making 4 api requests at a time, and
most of the services should be able to handle that kinda load with a single
worker (I'd hope).

This also does bring up the question of the gate config being representative
of how we recommend running OpenStack. Like the reasons we try to use default
config values as much as possible in devstack. We definitely aren't saying
running a single worker

But, I'm not sure any of that is a blocker for moving forward with dropping down
to a single worker.

As an aside, I also just pushed up: https://review.openstack.org/#/c/428220/ to
see if that provides any useful info. I'm doubtful that it will be helpful,
because it's the combination of services running causing the issue. But it
doesn't really hurt to collect that.

-Matt Treinish

> [1] https://bugs.launchpad.net/neutron/+bug/1656386
> [2]
> http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22oom-killer%5C%22%20AND%20tags:syslog
> [3]
> http://logs.openstack.org/21/427921/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/82084c2/
> 

Re: [openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate

2017-02-02 Thread Matthew Treinish
On Thu, Feb 02, 2017 at 04:27:51AM +, Dolph Mathews wrote:
> What made most services jump +20% between mitaka and newton? Maybe there is
> a common cause that we can tackle.

Yeah, I'm curious about this too, there seems to be a big jump in Newton for
most of the projects. It might not be a single common cause between them, but
I'd be curious to know what's going on there.

> 
> I'd also be in favor of reducing the number of workers in the gate,
> assuming that doesn't also substantially increase the runtime of gate jobs.
> Does that environment variable (API_WORKERS) affect keystone and horizon?

It affects keystone in certain deploy modes (only uwsgi standalone I think,
which means not for most jobs). If it's running under apache we rely on apache
to handle things, which is why this doesn't work on horizon.

API_WORKERS was the interface we added to devstack after we started having OOM
issues the first time around (roughly 2 years ago). Back then we were running
the service defaults, which in most cases was nprocs for the number of workers.
API_WORKERS was added to have a global flag to set that to something else for
all the services. Right now it defaults to nproc/4 as long as that's >=2:

https://github.com/openstack-dev/devstack/blob/master/stackrc#L714

which basically means in the gate right now we're only running with 2 api
workers per server. It's just that a lot of 
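
As a rough sketch of the rule described above (nproc/4, but never fewer than 2),
written in Python rather than devstack's shell: the function name and the
8 CPU example are assumptions for illustration, and this is not the stackrc
code itself.

```python
import os


def default_api_workers(nproc=None):
    """Worker count as described in the thread: nproc/4, with a floor of 2.

    Hypothetical helper, not devstack code. If nproc is not given, the
    local CPU count is used.
    """
    if nproc is None:
        nproc = os.cpu_count() or 1
    return max(2, nproc // 4)


if __name__ == "__main__":
    # Assumed example: a test node with 8 CPUs ends up with 2 workers.
    for cpus in (2, 4, 8, 16):
        print("%2d cpus -> %d api workers" % (cpus, default_api_workers(cpus)))
```

With a floor of 2, any node with 8 or fewer CPUs gets 2 workers under this
reading, which is why going lower means changing the floor itself.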

-Matt Treinish

> 
> On Wed, Feb 1, 2017 at 6:39 PM Kevin Benton  wrote:
> 
> > And who said openstack wasn't growing? ;)
> >
> > I think reducing API workers is a nice quick way to bring back some
> > stability.
> >
> > I have spent a bunch of time digging into the OOM killer events and
> > haven't yet figured out why they are being triggered. There is significant
> > swap space remaining in all of the cases I have seen so it's likely some
> > memory locking issue or kernel allocations blocking swap. Until we can
> > figure out the cause, we effectively have no usable swap space on the test
> > instances so we are limited to 8GB.
> >
> > On Feb 1, 2017 17:27, "Armando M."  wrote:
> >
> > Hi,
> >
> > [TL;DR]: OpenStack services have steadily increased their memory
> > footprints. We need a concerted way to address the oom-kills experienced in
> > the openstack gate, as we may have reached a ceiling.
> >
> > Now the longer version:
> > 
> >
> > We have been experiencing some instability in the gate lately due to a
> > number of reasons. When everything adds up, this means it's rather
> > difficult to merge anything and knowing we're in feature freeze, that adds
> > to stress. One culprit was identified to be [1].
> >
> > We initially tried to increase the swappiness, but that didn't seem to
> > help. Then we have looked at the resident memory in use. When going back
> > over the past three releases we have noticed that the aggregated memory
> > footprint of some openstack projects has grown steadily. We have the
> > following:
> >
> >- Mitaka
> >   - neutron: 1.40GB
> >   - nova: 1.70GB
> >   - swift: 640MB
> >   - cinder: 730MB
> >   - keystone: 760MB
> >   - horizon: 17MB
> >   - glance: 538MB
> >- Newton
> > >   - neutron: 1.59GB (+13%)
> >   - nova: 1.67GB (-1%)
> >   - swift: 779MB (+21%)
> >   - cinder: 878MB (+20%)
> >   - keystone: 919MB (+20%)
> >   - horizon: 21MB (+23%)
> >   - glance: 721MB (+34%)
> >- Ocata
> >   - neutron: 1.75GB (+10%)
> > >   - nova: 1.95GB (+16%)
> > >   - swift: 703MB (-9%)
> > >   - cinder: 920MB (+4%)
> >   - keystone: 903MB (-1%)
> >   - horizon: 25MB (+20%)
> >   - glance: 740MB (+2%)
> >
> > Numbers are approximated and I only took a couple of samples, but in a
> > nutshell, the majority of the services have seen double digit growth over
> > the past two cycles in terms of the amount of RSS memory they use.
> >
> > Since [1] is observed only since ocata [2], I imagine that's pretty
> > reasonable to assume that memory increase might as well be a determining
> > factor to the oom-kills we see in the gate.
> >
> > Profiling and surgically reducing the memory used by each component in
> > each service is a lengthy process, but I'd rather see some gate relief
> > right away. Reducing the number of API workers helps bring the RSS memory
> > down back to mitaka levels:
> >
> >- neutron: 1.54GB
> >- nova: 1.24GB
> >- swift: 694MB
> >- cinder: 778MB
> >- keystone: 891MB
> >- horizon: 24MB
> >- glance: 490MB
> >
> > However, it may have other side effects, like longer execution times, or
> > increase of timeouts.
> >
> > Where do we go from here? I am not particularly fond of stop-gap [4], but
> > it is the one fix that most widely addresses the memory increase we have
> > experienced across the board.
> >
> > Thanks,
> > Armando
> >
> > [1] https://bugs.launchpad.net/neutron/+bug/1656386
> > [2]
> > 

Re: [openstack-dev] [elastic-recheck] Miscellaneous questions of potential Neutron interest

2017-01-31 Thread Matthew Treinish
On Tue, Jan 31, 2017 at 10:38:58AM -0800, Ihar Hrachyshka wrote:
> Hi all,
> 
> we were looking at expanding usage of elastic-recheck in Neutron, and
> several questions popped up that we would like to ask.
> 
> 1. Are all jobs eligible for coverage with queries? The reason we ask
> is that there was some disagreement on whether all job runs are
> eligible, or e.g. gate queue job runs only. For example, in Neutron,
> we have fullstack and functional tests that are in check queue but not
> in gate queue. Can we still post queries for those jobs? Will e-r bot
> match against those queries?

The elastic recheck bot listens to all jobs, and we can add queries for
any gate failure. In the past we limited it to just dsvm jobs and just projects
in the openstack/ namespace. But we haven't done either of those in a really
long time; the dsvm limitation was just for like the first month of the project.
> 
> 2. Review velocity is not stable in the project. Sometimes we get
> immediate reviews, sometimes not so much (the last one took me a month
> to land a query). It's important that new queries get timely feedback.
> Can we consider expanding core reviewer team to smoothen the process?
> If not, how can we make sure queries land in time?

Well there are really only 3 cores on the project, and if some of us aren't
working or are busy with other things the queue can get backed up and things
fall through the cracks. Although, fwiw new queries aren't a steady stream
either. We've gone months where just mriedem or me were the only people
pushing queries.

I'm totally in favor of expanding the review team, the issue here is that not
many people have stood up to start tackling reviews. The only reviews from
non-cores I normally see are people from a project team piling on to a query
for a bad gate bug they're hitting at the time. e-r queries aren't that hard
to review and there are just a few things we look for which are outlined here:
https://github.com/openstack-infra/elastic-recheck#queries
if people step up and start helping out with the review load we definitely can
expand the core team.
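
For anyone who hasn't looked at one, a query is just a small YAML file holding
a logstash search string. The sketch below renders one in Python. The file
layout (a folded "query" key, one file per bug number under queries/) is my
understanding of the repo and should be checked against the README linked
above; the helper names are made up, and the example clauses are taken from the
oom-killer logstash search used elsewhere on this list.

```python
def render_query_file(clauses):
    """Render search clauses as a YAML folded block under a 'query' key.

    Assumption: one clause per line, joined with AND.
    """
    body = " AND\n".join("  " + clause for clause in clauses)
    return "query: >\n" + body + "\n"


def query_filename(bug_number):
    """Assumed naming convention: one file per Launchpad bug number."""
    return "queries/%d.yaml" % bug_number


if __name__ == "__main__":
    clauses = ['message:"oom-killer"', 'tags:"syslog"']
    print("# " + query_filename(1656386))
    print(render_query_file(clauses), end="")
```

When reviewing one, the main thing to check is that the clauses are specific
enough that they only match the bug in question.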


> 
> 3. I see some IRC channels have elastic-recheck bot reporting about
> identified failures in the channels. How can we add the bot to our
> channel?

This is just specified in a config file:

https://github.com/openstack-infra/puppet-elastic_recheck/blob/master/files/recheckwatchbot.yaml#L1-L7

It's just that no project (besides QA) has ever chosen to subscribe to irc
notifications before. There was discussion about it back when we first
introduced the bot, but it wasn't turned on because of concerns around channel
noise. (https://review.openstack.org/#/c/79123/ )

What the bot reports to irc is also configurable. So you can have it report
failures for a particular project (or group of projects) and also only
identified or unidentified failures.

Thanks,

Matt Treinish




Re: [openstack-dev] gate jobs - papercuts

2017-01-31 Thread Matthew Treinish
On Tue, Jan 31, 2017 at 01:19:41PM -0500, Steve Martinelli wrote:
> On Tue, Jan 31, 2017 at 12:49 PM, Davanum Srinivas 
> wrote:
> 
> > Folks,
> >
> > Here's the list of job failures that failed in the gate queue.
> > captured with my script[1][2] since around 10:00 AM today. All jobs
> > failed with just one bad test.
> >
> > http://logs.openstack.org/48/423548/11/gate/gate-keystone-
> > python27-db-ubuntu-xenial/a1f55ca/
> >- keystone.tests.unit.test_v3_auth.TestMFARules
> >
> > 
> 
> 
> This was due to a race condition between token issuance and validation,
> should be fixed.

Is there a bug open for this? If so let's get an elastic-recheck query up for it
so we can track it and get it off the uncategorized page:

http://status.openstack.org/elastic-recheck/data/integrated_gate.html

Our categorization rate is quite low right now and it'll only make things harder
to debug other failures if we've got a bunch of unknown races going on.

We have a lot of tools to make debugging the gate easier and make everyone more
productive. But it feels like we haven't been utilizing them fully lately, which
makes gate backups more likely and digging out of the hole harder.

Thanks,

Matt Treinish




Re: [openstack-dev] gate jobs - papercuts

2017-01-31 Thread Matthew Treinish
On Tue, Jan 31, 2017 at 12:49:13PM -0500, Davanum Srinivas wrote:
> Folks,
> 
> Here's the list of job failures that failed in the gate queue.
> captured with my script[1][2] since around 10:00 AM today. All jobs
> failed with just one bad test.
> 
> http://logs.openstack.org/48/426448/2/gate/gate-tempest-dsvm-neutron-full-ubuntu-xenial/ecb3d0a/
>- tempest.api.compute.servers.test_servers_negative.ServersNegativeTestJSON
> http://logs.openstack.org/48/426448/2/gate/gate-tempest-dsvm-neutron-full-ssh/71f6c8c/
>  - tempest.api.compute.admin.test_servers.ServersAdminTestJSON
> http://logs.openstack.org/48/376548/8/gate/gate-tempest-dsvm-neutron-full-ubuntu-xenial/cf3028b/
>- tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON
> http://logs.openstack.org/68/417668/8/gate/gate-tempest-dsvm-neutron-full-ssh/27bda02/
>  - 
> tempest.api.compute.volumes.test_attach_volume.AttachVolumeShelveTestJSON
> http://logs.openstack.org/48/423548/11/gate/gate-keystone-python27-db-ubuntu-xenial/a1f55ca/
>- keystone.tests.unit.test_v3_auth.TestMFARules
> http://logs.openstack.org/61/424961/1/gate/gate-tempest-dsvm-cells-ubuntu-xenial/8a1f9e7/
>   - tempest.api.compute.admin.test_servers.ServersAdminTestJSON
> http://logs.openstack.org/23/426823/3/gate/gate-tempest-dsvm-neutron-full-ubuntu-xenial/0204168/
>- 
> tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps
> 
> So our gate is now 36 deep with stuff running for little more than 4
> hours repeatedly Can folks look deeper please?
> 
> Thanks,
> Dims
> 
> [1] https://gist.github.com/dims/54b391bd5964d3d208113b16766ea85e
> [2] http://paste.openstack.org/show/597071/

Just as an aside this basic view is integrated into the home page on
openstack-health:

http://status.openstack.org/openstack-health/#/

under the section "Failed Tests in Last 10 Failed Runs". It also hooks into
elastic-recheck and will point out e-r hits there too. So, people don't need
to run this script manually to see what is failing.

Thanks,

Matt Treinish




[openstack-dev] [all][QA][goals] Proposed Pike Goal Split Out Tempest Plugins

2017-01-03 Thread Matthew Treinish

Hi Everyone,

I pushed out a proposed OpenStack-wide goal for the Pike cycle to split all
tempest plugins into separate repos:

https://review.openstack.org/369749

At a high level all that is being proposed is that we split out any tempest
plugins that are bundled in a service repo into a separate standalone
project/git repository. There are several reasons for doing this which are
outlined in the review and also are in the tempest documentation
(although we probably need to clean up the wording in the tempest docs a bit):

http://docs.openstack.org/developer/tempest/plugin.html#standalone-plugin-vs-in-repo-plugin

I encourage everyone to take a look at the review and provide feedback.

Thanks,

Matt Treinish




Re: [openstack-dev] [glance][tempest][api] community images, tempest tests, and API stability

2016-12-22 Thread Matthew Treinish
On Thu, Dec 22, 2016 at 02:57:20PM -0500, Brian Rosmaita wrote:
> Something has come up with a tempest test for Glance and the community
> images implementation, and I think it could use some mailing list
> discussion, as everyone might not be aware of the patch where the
> discussion is happening now [1].
> 
> First, the Glance background, to get everyone on the same page:
> 
> As part of implementing community images [2], the 'visibility' field of
> an image is going from being 2-valued to being 4-valued.  Up to now, the
> only values have been 'private' and 'public', which meant that shared
> images were 'private', which was inaccurate and confusing (and bugs were
> filed with Glance about shared images not having visibility 'shared'
> [3a,b]).
> 
> With the new visibility enum, the Images API v2 will behave as follows:
> 
> * An image with visibility == 'private' is not shared, and is not
> shareable until its visibility is changed to 'shared'.
> 
> * An image must have visibility == 'shared' in order to do member
> operations or be accessible to image members.
> 
> * The default visibility of a freshly-created image is 'shared'.  This
> may seem weird, but a freshly-created image has no members, so it's
> effectively private, behaving exactly as a freshly-created image does,
> pre-Ocata.  It's also ready to immediately accept a member-create call,
> as freshly-created images are pre-Ocata.  So from a workflow
> perspective, this change is backward compatible.
> 
> * After much discussion [4], including discussion with operators and an
> operator's survey [5], we decided that the correct migration of
> 'visibility' values for existing images when a cloud is updated would
> be: public images stay 'public', private images with members become
> 'shared', and private images without images stay 'private'.  (Thus, if
> you have a 'private' image, you'll have to change it to 'shared' before
> you can add members.  On the other hand, now it's *really* private.)
> 
> * You can specify a visibility at the time of image-creation, as you can
> now.  But if you specify 'private', what you get is *really* private.
> This either introduces a minor backward incompatibility, or it fixes a
> bug, depending on how you look at it.  The key thing is, if you *don't*
> specify a visibility, an image with the default visibility will behave
> exactly as it does now, from the perspective of being able to make API
> calls on it (e.g., adding members).
> 
> Thanks for reading this far.  (There's a much more detailed discussion
> in the spec; see the "Other end user impact" section. [2])  Here's the
> point of this email:
> 
> The community images patch [6] is causing a recently added tempest test
> [7] to fail.  The test in question uses an image that was created by a
> request that explicitly specified visibility == private.  Eventually it
> tries to add a member to this image, and as discussed above, this
> operation will fail once we have merged Community Images (because the
> image visibility is not 'shared').  If the image had been created with
> the default visibility (that is, not explicitly specifying a visibility
> in the image-create call), this problem would not arise.  Keep in mind
> that prior to Ocata, there was no reason for end users to specify an
> image visibility explicitly upon image creation because there were only
> two possible values, one of which required special permissions in order
> to use successfully.

While you say there was no reason for a user to do it, it was still part of the
glance API and part of your contract with end users. It's something anyone
could be doing today (which is obvious, because tempest is doing it),
regardless of whether you think there is a use case for it or not. The whole
point of a stable API is that you don't break things like this. I'd really
recommend reading Sean Dague's blog post about Nova's API here:

https://dague.net/2015/06/05/the-nova-api-in-kilo-and-beyond-2/

because it does a very good job explaining typical api use cases and how to
think about api compatibility.
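
To make the break concrete, here is a minimal sketch of the visibility rules as
they are described in the quoted summary above. It is a toy model and not glance
code; all of the function names are made up for illustration.

```python
# Toy model of the proposed visibility rules; not glance code.
# All function names here are hypothetical.

def default_or_requested(requested=None):
    """Visibility an image ends up with at creation time (new behavior)."""
    return "shared" if requested is None else requested


def member_add_allowed(visibility):
    """Under the proposal, member operations require visibility == 'shared'."""
    return visibility == "shared"


def migrate_visibility(old_visibility, has_members):
    """Migration of existing images as described in the quoted summary."""
    if old_visibility == "public":
        return "public"
    return "shared" if has_members else "private"


# A client that never passes a visibility keeps working:
print(member_add_allowed(default_or_requested()))           # True
# A client that explicitly asks for 'private' (as tempest does) breaks:
print(member_add_allowed(default_or_requested("private")))  # False
```

The second call is the case that matters: a request that is valid today stops
being followed by a successful member-create.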

> Thus, we feel that the situation occurring in the
> test is not one that many end users will come across.  We have discussed
> the situation extensively with the broader OpenStack community, and the
> consensus is that this change to the API is acceptable.

The guidelines for API compatibility [1] are there for a reason, and breaking
existing users is **never** the right answer. You'll never get full coverage of
end users by just asking for feedback on the ML and IRC meetings. Heck, I hadn't
heard of any of these proposed changes until that tempest review, and I'm hardly
a disconnected user. The thing to remember is that all APIs have warts and
aspects which are less than ideal; our first priority when making improvements
should be to not randomly break our users in the process. If the goal of
OpenStack is to provide an interoperable cloud platform, ensuring we don't break
our users is kinda 

Re: [openstack-dev] [kolla][tc] Video Meetings - input requested

2016-12-12 Thread Matthew Treinish
On Tue, Dec 13, 2016 at 01:16:13AM +0800, Jeffrey Zhang wrote:
> TC,
> Some contributors in kolla have had unscheduled video meetings. This has
> resulted in complaints about inclusiveness. Some contributors can’t even make
> the meeting we have, and another scheduled video meeting might produce a
> situation in which there is no record of decisions made during the video
> meeting. At least with IRC meetings there is always a log.
> 
> One solution is to schedule these meetings and have two 1 hour meetings per
> week.
> 
> As the PTL while Michal is moving, I have trouble following these video
> meetings since English isn’t my native language. Can you offer any advice
> for our project?
> 

Well, one of our 4 opens, open community, specifically calls out having official
meetings over irc. [1] It's also a requirement for OpenStack projects to have
meetings on irc where they're logged. [2] If these video meetings are being used
to make decisions and there is no discussion of it on the ML or via an official
irc meeting then that's a problem (for the reasons you've outlined).

This basic topic was also discussed before in the thread starting here:

http://lists.openstack.org/pipermail/openstack-dev/2015-February/056551.html

As Flavio said there I don't think we can (or should?) prevent people from
having ad-hoc calls or video chats to work through an issue. They can be quite
valuable to work through a disagreement or other problem with high bandwidth
communication. But, that by itself should never be definitive discussion or
used in lieu of an open communication mechanism to make decisions in the
community. Whatever is discussed in these has to go through the normal
open communication mechanisms we use in the community before you can act upon
them.

I'm not really familiar with the full scope of these video meetings Kolla is
having (this is the first I've heard of them) but based on your description it
sounds like they are encroaching on violating the open community requirement
for projects. I think this is especially true if you're using the video
meetings as a replacement for irc meetings. But, without knowing all the
details I can't say for certain.


-Matt Treinish

[1] https://governance.openstack.org/tc/reference/opens.html#open-community
[2] https://governance.openstack.org/tc/reference/new-projects-requirements.html


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tempest] Test case for new feature failed in Jenkins check against old release

2016-09-29 Thread Matthew Treinish
On Thu, Sep 29, 2016 at 03:49:27PM +0800, Bruce Tan wrote:
> Hello everyone,
> 
> I am having a problem writing/updating a test case to verify some new
> feature (in my case, the "description" field for a network).
> 
> According to the Tempest Coding Guide[1], I am supposed to check if the
> related feature is there by @test.requires_ext() like this:
> @test.requires_ext(extension="standard-attr-description",
>                    service="network")
> 
> And according to the same doc,
> > When running a test that requires a certain "feature" in the target
> > cloud, if that feature is missing we should fail, because either the
> > test configuration is invalid, or the cloud is broken and the expected
> > "feature" is not there even if the cloud was configured with it.
> 
> 
> However, my patch[2] got a "-1" from Jenkins because one check
> ("gate-tempest-dsvm-neutron-full-ubuntu-trusty-liberty") failed. The
> reason for failing, I think, is just what I quoted above: the
> tempest.conf[3]  file is configured as
>   [network-feature-enabled]
>   api_extensions = all
> which means any api_extension is supported; but the feature I am
> testing is obviously not there in Liberty, so the API doesn't accept
> "description" field, and the test case failed.
> 
> So my question is, what did I do wrong? Is there some other way
> to skip the case for older releases? Or, maybe we shouldn't use
> "api_extensions=all" (explicitly list all extensions instead, which
> takes some effort obviously) in the configuration file?

You actually did everything correctly for making sure you skip properly
if the new feature isn't present on the tempest side. It's actually a bug in
devstack you're hitting. On stable branches we're supposed to hard code the
extension list of what api extensions are available when we branch the project.
But in the case of liberty, we neglected to do this:

http://git.openstack.org/cgit/openstack-dev/devstack/tree/lib/tempest?h=stable%2Fliberty#n430

Instead this should look something like what we did on the kilo branch:

http://git.openstack.org/cgit/openstack-dev/devstack/tree/lib/tempest?h=stable/kilo#n413

You just need to push up a patch to devstack's stable/liberty branch that hard
codes the extensions available like we did in previous branches.  Once that's up
add a Depends-On to your commit and it should work fine.
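
For reference, the reason "all" bites you here can be sketched roughly as
follows. This is a simplified illustration of an extension check, not tempest's
actual code; the function name and the example extension lists are made up.

```python
# Simplified illustration of an extension check; not tempest's real code.
# The function name and the extension lists below are hypothetical.

def extension_enabled(extension, configured):
    """Return True if a test requiring `extension` should run.

    `configured` is the list parsed from an api_extensions option.
    """
    if not configured:
        return False
    if configured[0] == "all":
        # "all" claims every extension exists, so nothing gets skipped,
        # even for extensions the deployed release never had.
        return True
    return extension in configured


wanted = "standard-attr-description"
print(extension_enabled(wanted, ["all"]))               # True: test runs, then fails
print(extension_enabled(wanted, ["router", "quotas"]))  # False: test is skipped
```

With an explicit list on the stable branch the second case applies and the new
test is skipped instead of failing.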

-Matt Treinish




Re: [openstack-dev] os-loganalyze, project log parsing, or ...

2016-09-27 Thread Matthew Treinish
On Tue, Sep 27, 2016 at 03:32:17PM -0400, Andrew Laski wrote:
> 
> 
> On Tue, Sep 27, 2016, at 02:40 PM, Matthew Treinish wrote:
> > On Tue, Sep 27, 2016 at 01:03:35PM -0400, Andrew Laski wrote:
> > > 
> > > 
> > > On Tue, Sep 27, 2016, at 12:39 PM, Matthew Treinish wrote:
> > > > On Tue, Sep 27, 2016 at 11:36:07AM -0400, Andrew Laski wrote:
> > > > > Hello all,
> > > > > 
> > > > > Recently I noticed that people would look at logs from a Zuul née
> > > > > Jenkins CI run and comment something like "there seem to be more
> > > > > warnings in here than usual." And so I thought it might be nice to
> > > > > quantify that sort of thing so we didn't have to rely on gut feelings.
> > > > > 
> > > > > So I threw together https://review.openstack.org/#/c/376531 which is a
> > > > > script that lives in the Nova tree, gets called from a devstack-gate
> > > > > post_test_hook, and outputs an n-stats.json file which can be seen at
> > > > > http://logs.openstack.org/06/375106/8/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/e103612/logs/n-stats.json.
> > > > > This provides just a simple way to compare two runs and spot large
> > > > > changes between them. Perhaps later things could get fancy and these
> > > > > stats could be tracked over time. I am also interested in adding stats
> > > > > for things that are a bit project specific like how long (max, min, med)
> > > > > it took to boot an instance, or what's probably better to track is how
> > > > > many operations that took for some definition of an operation.
> > > > > 
> > > > > I received some initial feedback that this might be a better fit in the
> > > > > os-loganalyze project so I took a look over there. So I cloned the
> > > > > project to take a look and quickly noticed
> > > > > http://git.openstack.org/cgit/openstack-infra/os-loganalyze/tree/README.rst#n13.
> > > > > That makes me think it would not be a good fit there because what I'm
> > > > > looking to do relies on parsing the full file, or potentially multiple
> > > > > files, in order to get useful data.
> > > > > 
> > > > > So my questions: does this seem like a good fit for os-loganalyze? If
> > > > > not is there another infra/QA project that this would be a good fit for?
> > > > > Or would people be okay with a lone project like Nova implementing this
> > > > > in tree for their own use?
> > > > > 
> > > > 
> > > > I think having this in os-loganalyze makes sense since we use that for
> > > > visualizing the logs already. It also means we get it for free on all
> > > > the log files. But, if it's not a good fit for a technical reason then I
> > > > think creating another small tool under QA or infra would be a good path
> > > > forward. Since there really isn't anything nova specific in that.
> > > 
> > > There's nothing Nova specific atm because I went for low hanging fruit.
> > > But if the plan is to have Nova specific, Cinder specific, Glance
> > > specific, etc... things in there do people still feel that a QA/infra
> > > tool is the right path forward. That's my only hesitation here.
> > 
> > Well I think that raises more questions, what do you envision the nova
> > specific bits would be. The only thing I could see would be something that
> > looks for specific log messages or patterns in the logs. Which feels like
> > exactly what elastic-recheck does?
> 
> I'm thinking beyond single line things. An example could be a parser
> that can calculate the timing between the first log message seen for a
> request-id and the last, or could count the number of log lines
> associated with each instance boot perhaps even broken down by log
> level. Things that require both an understanding of how to correlate
> groups of log lines with specific events(instance boot), and being able
> to calculate stats for groups of log lines(debug log line count by
> request-id).
> 
> I have only a rudimentary familiarity with elastic-recheck but my
> understanding is that doing anything that looks at multiple lines like
> that is either complex or not really possible.


Th

Re: [openstack-dev] os-loganalyze, project log parsing, or ...

2016-09-27 Thread Matthew Treinish
On Tue, Sep 27, 2016 at 01:03:35PM -0400, Andrew Laski wrote:
> 
> 
> On Tue, Sep 27, 2016, at 12:39 PM, Matthew Treinish wrote:
> > On Tue, Sep 27, 2016 at 11:36:07AM -0400, Andrew Laski wrote:
> > > Hello all,
> > > 
> > > Recently I noticed that people would look at logs from a Zuul née
> > > Jenkins CI run and comment something like "there seem to be more
> > > warnings in here than usual." And so I thought it might be nice to
> > > quantify that sort of thing so we didn't have to rely on gut feelings.
> > > 
> > > So I threw together https://review.openstack.org/#/c/376531 which is a
> > > script that lives in the Nova tree, gets called from a devstack-gate
> > > post_test_hook, and outputs an n-stats.json file which can be seen at
> > > http://logs.openstack.org/06/375106/8/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/e103612/logs/n-stats.json.
> > > This provides just a simple way to compare two runs and spot large
> > > changes between them. Perhaps later things could get fancy and these
> > > stats could be tracked over time. I am also interested in adding stats
> > > for things that are a bit project specific like how long (max, min, med)
> > > it took to boot an instance, or what's probably better to track is how
> > > many operations that took for some definition of an operation.
> > > 
> > > I received some initial feedback that this might be a better fit in the
> > > os-loganalyze project so I took a look over there. So I cloned the
> > > project to take a look and quickly noticed
> > > http://git.openstack.org/cgit/openstack-infra/os-loganalyze/tree/README.rst#n13.
> > > That makes me think it would not be a good fit there because what I'm
> > > looking to do relies on parsing the full file, or potentially multiple
> > > files, in order to get useful data.
> > > 
> > > So my questions: does this seem like a good fit for os-loganalyze? If
> > > not is there another infra/QA project that this would be a good fit for?
> > > Or would people be okay with a lone project like Nova implementing this
> > > in tree for their own use?
> > > 
> > 
> > I think having this in os-loganalyze makes sense since we use that for
> > visualizing the logs already. It also means we get it for free on all the
> > log files. But, if it's not a good fit for a technical reason then I think
> > creating another small tool under QA or infra would be a good path forward.
> > Since there really isn't anything nova specific in that.
> 
> There's nothing Nova specific atm because I went for low hanging fruit.
> But if the plan is to have Nova specific, Cinder specific, Glance
> specific, etc... things in there do people still feel that a QA/infra
> tool is the right path forward. That's my only hesitation here.

Well, I think that raises more questions: what do you envision the nova specific
bits would be? The only thing I could see would be something that looks for
specific log messages or patterns in the logs, which feels like exactly what
elastic-recheck does.

I definitely can see the value in having machine parsable log stats in our
artifacts, but I'm not sure where project specific pieces would come from. But,
given that hypothetical I would say as long as you made those pieces
configurable (like a yaml syntax to search for patterns by log file or
something) and kept a generic framework/tooling for parsing the log stats I
think it's still a good fit for a QA or Infra project. Especially if you think
whatever pattern you're planning to use is something other projects would want
to reuse.
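
As a rough idea of what I mean by configurable, something like the sketch
below. The config layout, the names, and the sample log lines are all made up
for illustration; in practice the mapping could be loaded from a yaml file
instead of being a dict in the script.

```python
import re

# Hypothetical config: log file name -> {stat name: regex}. In practice this
# could be loaded from yaml; a plain dict keeps the sketch dependency-free.
PATTERNS = {
    "n-cpu.log": {
        "warnings": r"\bWARNING\b",
        "errors": r"\bERROR\b",
    },
}


def collect_stats(log_name, lines, patterns=PATTERNS):
    """Count how many lines of one log file match each configured pattern."""
    compiled = {name: re.compile(rx)
                for name, rx in patterns.get(log_name, {}).items()}
    counts = {name: 0 for name in compiled}
    for line in lines:
        for name, rx in compiled.items():
            if rx.search(line):
                counts[name] += 1
    return counts


sample = [
    "2016-09-27 12:00:00.000 1234 WARNING nova.compute.manager [req-1] slow",
    "2016-09-27 12:00:01.000 1234 INFO nova.compute.manager [req-1] done",
    "2016-09-27 12:00:02.000 1234 ERROR nova.compute.manager [req-2] boom",
]
print(collect_stats("n-cpu.log", sample))  # {'warnings': 1, 'errors': 1}
```

The generic part is collect_stats(); anything project specific would live only
in the pattern config.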

-Matt Treinish


> 
> > 
> > I would caution against doing it as a one-off in a project repo; that
> > doesn't seem like the best path forward for something like this. We
> > actually tried to do something similar to that in the past inside the
> > tempest repo:
> > 
> > http://git.openstack.org/cgit/openstack/tempest/tree/tools/check_logs.py
> > 
> > and
> > 
> > http://git.openstack.org/cgit/openstack/tempest/tree/tools/find_stack_traces.py
> > 
> > all it did was cause confusion because no one knew where the output was
> > coming from. Although, the output from those tools was also misleading,
> > which was likely a bigger problem. So this probably won't be an issue if
> > you add a json output to the jobs.
> > 
> > I also wonder if the JSONFormatter from oslo.log:
> > 
> > http://docs.openstack.org/developer/oslo.log/api/formatters.html#oslo_log.

Re: [openstack-dev] os-loganalyze, project log parsing, or ...

2016-09-27 Thread Matthew Treinish
On Tue, Sep 27, 2016 at 11:36:07AM -0400, Andrew Laski wrote:
> Hello all,
> 
> Recently I noticed that people would look at logs from a Zuul née
> Jenkins CI run and comment something like "there seem to be more
> warnings in here than usual." And so I thought it might be nice to
> quantify that sort of thing so we didn't have to rely on gut feelings.
> 
> So I threw together https://review.openstack.org/#/c/376531 which is a
> script that lives in the Nova tree, gets called from a devstack-gate
> post_test_hook, and outputs an n-stats.json file which can be seen at
> http://logs.openstack.org/06/375106/8/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/e103612/logs/n-stats.json.
> This provides just a simple way to compare two runs and spot large
> changes between them. Perhaps later things could get fancy and these
> stats could be tracked over time. I am also interested in adding stats
> for things that are a bit project specific like how long (max, min, med)
> it took to boot an instance, or what's probably better to track is how
> many operations that took for some definition of an operation.
> 
> I received some initial feedback that this might be a better fit in the
> os-loganalyze project so I took a look over there. So I cloned the
> project to take a look and quickly noticed
> http://git.openstack.org/cgit/openstack-infra/os-loganalyze/tree/README.rst#n13.
> That makes me think it would not be a good fit there because what I'm
> looking to do relies on parsing the full file, or potentially multiple
> files, in order to get useful data.
> 
> So my questions: does this seem like a good fit for os-loganalyze? If
> not is there another infra/QA project that this would be a good fit for?
> Or would people be okay with a lone project like Nova implementing this
> in tree for their own use?
> 

I think having this in os-loganalyze makes sense since we use that for
visualizing the logs already. It also means we get it for free on all the log
files. But, if it's not a good fit for a technical reason then I think creating
another small tool under QA or infra would be a good path forward. Since there
really isn't anything nova specific in that.

I would caution against doing it as a one-off in a project repo; that doesn't
seem like the best path forward for something like this. We actually tried to do
something similar to that in the past inside the tempest repo:

http://git.openstack.org/cgit/openstack/tempest/tree/tools/check_logs.py

and

http://git.openstack.org/cgit/openstack/tempest/tree/tools/find_stack_traces.py

all it did was cause confusion because no one knew where the output was coming
from. Although, the output from those tools was also misleading, which was
likely a bigger problem. So this probably won't be an issue if you add a json
output to the jobs.

I also wonder if the JSONFormatter from oslo.log:

http://docs.openstack.org/developer/oslo.log/api/formatters.html#oslo_log.formatters.JSONFormatter

would be useful here. We can probably turn that on if it makes things easier.
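
If the logs were JSON, a stats script gets a lot simpler, roughly along these
lines. This assumes one JSON object per line with the level stored under a
"levelname" key, which is an assumption of the sketch; check what the formatter
really emits before relying on it. The sample lines are made up.

```python
import json
from collections import Counter


def count_levels(lines, level_key="levelname"):
    """Count log records per level from JSON-formatted log lines.

    Assumes one JSON object per line with the level stored under
    `level_key`; lines that are not valid JSON objects are skipped.
    """
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue
        if isinstance(record, dict) and level_key in record:
            counts[record[level_key]] += 1
    return dict(counts)


sample = [
    '{"levelname": "WARNING", "message": "disk is slow"}',
    '{"levelname": "INFO", "message": "instance booted"}',
    'not json at all',
    '{"levelname": "WARNING", "message": "retrying"}',
]
print(count_levels(sample))  # {'WARNING': 2, 'INFO': 1}
```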

-Matt Treinish




Re: [openstack-dev] [tempest]Tempest test concurrency

2016-09-21 Thread Matthew Treinish
On Wed, Sep 21, 2016 at 10:44:51AM -0400, Bob Hansen wrote:
> 
> 
> I have been looking at some of the stackviz output as I'm trying to improve
> the run time of my third-party CI. As an example:
> 
> http://logs.openstack.org/36/371836/1/check/gate-tempest-dsvm-full-ubuntu-xenial/087db0f/logs/stackviz/#/stdin/timeline
> 
> What jumps out is the amount of time that each worker is not running any
> tests. I would have expected quite a bit more concurrency between the two
> workers in the chart, e.g. more overlap. I've noticed a similar thing with
> my test runs using 4 workers.

So the gaps between tests aren't actually wait time; the workers are saturated
doing stuff during a run. Those gaps are missing data in the subunit streams
that are used as the source of the data for rendering those timelines. The gaps
are where things like setUp, setUpClass, tearDown, tearDownClass, and
addCleanups run, none of which are added to the subunit stream. It's just an
artifact of the incomplete data, not bad scheduling. This also means that testr
does not take into account any of the missing timing when it makes decisions
based on previous runs.

> 
> Can anyone explain why this is and where can I find out more information
> about the scheduler and what information it is using to decide when to
> dispatch tests? I'm already feeding my system a prior subunit stream to
> help influence the scheduler as my test run times are different due to the
> way our openstack implementation is architected. A simple round-robin
> approach is not the most efficient in my case.

If you're curious about how testr does scheduling most of that happens here:

https://github.com/testing-cabal/testrepository/blob/master/testrepository/testcommand.py

One thing to remember is that testr isn't actually a test runner, it's a test
runner runner. It partitions the tests based on time information and passes
those to (multiple) test runner workers. The actual order of execution inside
those partitions is handled by the test runner itself. (in our case subunit.run)
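
To give a feel for what partitioning on time data means, here is a toy version.
It is a simple greedy scheme written for illustration, not a copy of testr's
code; the function name and the sample timings are made up.

```python
def partition_tests(durations, untimed, workers):
    """Split test ids across `workers` partitions using prior timing data.

    durations: dict of test id -> seconds from a previous run.
    untimed: test ids with no timing data.
    """
    partitions = [[] for _ in range(workers)]
    totals = [0.0] * workers
    # Longest tests first, each to the currently least-loaded partition.
    for test_id in sorted(durations, key=durations.get, reverse=True):
        target = totals.index(min(totals))
        partitions[target].append(test_id)
        totals[target] += durations[test_id]
    # Tests with no timing data are simply dealt out round-robin.
    for i, test_id in enumerate(untimed):
        partitions[i % workers].append(test_id)
    return partitions


timings = {"test_a": 30.0, "test_b": 20.0, "test_c": 15.0, "test_d": 5.0}
print(partition_tests(timings, ["test_new"], 2))
# [['test_a', 'test_d', 'test_new'], ['test_b', 'test_c']]
```

Note that the totals only include what was recorded for each test, so time
spent in setUpClass and friends is invisible to this kind of scheduling.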

-Matt Treinish




[openstack-dev] Announcing firehose.openstack.org

2016-09-19 Thread Matthew Treinish
Hi Everyone,

I wanted to announce the addition of a new infra service we started running
recently, the firehose. Running at firehose.openstack.org, the firehose is an
MQTT based unified message bus for infra services. The concept behind it is that
there are a lot of infra services, many of which emit events, but there
wasn't a single place to go to for anything. If you have a script or tool which
is listening for events from an infra service, or has a poll loop (like anything
using gerritlib), there is now a single place to go for consuming those messages.
We also have 2 interfaces to subscribe to topics on the firehose, either via the
MQTT protocol on the default port or via websockets over port 80. The websocket
interface should enable easier consumption for people on networks with stricter
firewalls.

Right now the only things sending messages into the firehose are the gerrit event
stream (via germqtt) and launchpad bug events (via lpmqtt), but several other
efforts are in progress to add additional services to the firehose. The plan is
to expand what publishes events to include all the infra services. This way
there is a single location for anything that needs to consume events.

There is also an example on the consuming side with gerritbot, which now has 
support for subscribing to the gerrit event stream over MQTT. You can see the
patch here:

http://git.openstack.org/cgit/openstack-infra/gerritbot/commit/?id=7c6e57983d499b16b3fabb864cf3bd5cfea8ab63

For those interested the spec detailing all the pieces is here:

http://specs.openstack.org/openstack-infra/infra-specs/specs/firehose.html

and the docs are available here:

http://docs.openstack.org/infra/system-config/firehose.html

which contain details on how the services are set up and include some basic
steps and examples on how to start consuming events from the firehose.
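
As a quick taste, a minimal subscriber could look something like the sketch
below. It assumes the third-party paho-mqtt library, that payloads are JSON
with a "type" key (true for gerrit events as far as I know, but treat it as an
assumption), and it subscribes to every topic with the "#" wildcard. The helper
names are made up; see the docs above for the real topic layout.

```python
import json


def describe_message(topic, payload):
    """Turn a raw MQTT message into a short printable summary.

    Assumes payloads are usually JSON (an assumption of this sketch);
    anything else is shown as plain text.
    """
    text = payload.decode("utf-8", errors="replace")
    try:
        event = json.loads(text)
    except ValueError:
        return "%s: %s" % (topic, text)
    if isinstance(event, dict):
        return "%s: %s" % (topic, event.get("type", "unknown event"))
    return "%s: %s" % (topic, text)


def on_message(client, userdata, message):
    """Callback invoked by paho-mqtt for every received message."""
    print(describe_message(message.topic, message.payload))


def main():
    # Third-party dependency (pip install paho-mqtt), imported here so the
    # helper above can be used without it.
    import paho.mqtt.subscribe as subscribe

    # "#" is the MQTT wildcard for every topic; 1883 is the default MQTT port.
    subscribe.callback(on_message, "#",
                       hostname="firehose.openstack.org", port=1883)
```

Calling main() blocks and prints one line per event until interrupted.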

Thanks,

Matt Treinish




Re: [openstack-dev] [QA] Running Tempest tests for a customized cloud

2016-08-18 Thread Matthew Treinish
On Wed, Aug 17, 2016 at 12:27:29AM +, Elancheran Subramanian wrote:
> Hello Punal,
> We do support both V2 and V3, that’s just a example I’ve stated BTW. We do 
> have our own integration tests which are pretty much covers all our 
> integration points with openstack. But we would like to leverage the tempest 
> while doing our upstream merge for openstack components in CI.
> 
> I believe the tests support the include list, how can I exclude test? Any 
> pointer would be a great help.
> 

It depends on the test runner you're using. The tempest run command supports
several methods of excluding tests:

http://docs.openstack.org/developer/tempest/run.html#test-selection

If you use ostestr it offers the same options that tempest run offers:

http://docs.openstack.org/developer/os-testr/ostestr.html#test-selection

If you're using plain testr, it can take a regex filter and will run any tests
that match it:

http://testrepository.readthedocs.io/en/latest/MANUAL.html#running-tests

You can also use a negative lookahead in the regex to exclude something. This
is how tempest's tox jobs skip slow tests in the normal gate run jobs:

http://git.openstack.org/cgit/openstack/tempest/tree/tox.ini#n80

Other test runners have similar selection mechanisms.
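
To illustrate the negative lookahead trick, here is a small sketch. The regex is
written in the same spirit as the tox.ini one but is not copied from it, and the
test ids are made up.

```python
import re

# Hypothetical filter: select tempest API/scenario tests, but reject any test
# id whose bracketed attribute list contains the "slow" tag.
SELECT = re.compile(r"(?!.*\[.*\bslow\b.*\])(^tempest\.(api|scenario))")

test_ids = [
    "tempest.api.compute.test_servers.ServersTest.test_create[id-1]",
    "tempest.scenario.test_volumes.VolumeTest.test_boot[id-2,slow]",
    "tempest.api.network.test_ports.PortsTest.test_list[id-3,smoke]",
    "unit.tests.test_utils.UtilsTest.test_helper",
]

# Keep only the ids the filter matches: the first and third entries here.
selected = [t for t in test_ids if SELECT.search(t)]
for test_id in selected:
    print(test_id)
```

The lookahead is checked at the start of the test id, so anything tagged slow is
rejected before the positive part of the pattern gets a chance to match.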

-Matt Treinish




Re: [openstack-dev] [QA] Running Tempest tests for a customized cloud

2016-08-18 Thread Matthew Treinish
On Tue, Aug 16, 2016 at 10:40:07PM +, Elancheran Subramanian wrote:
> Hello There,
> I’m currently playing with using Tempest as our integration tests for our 
> internal and external clouds, facing some issues with api which are not 
> supported in our cloud. For ex, listing domains isn’t supported for any user, 
> due to this V3 Identity tests are failing. So I would like to know what’s the 
> best practice? Like fix those tests, and apply those fix as patch? Or just 
> exclude those tests?
> 

It really depends on the configuration of your cloud. It could be a bug in
tempest or it could be a tempest configuration issue. You also could have
configured your cloud in a way that is invalid and breaks API end user
expectations from tempest's POV. It's hard to say without knowing the
specifics of your deployment.

I'd start with filing a bug with more details. You can file a bug here:

https://bugs.launchpad.net/tempest

If it's a valid tempest bug then submitting a patch to fix the bug in tempest is
the best path forward.

-Matt Treinish




Re: [openstack-dev] Let's drop the postgresql gate job

2016-08-18 Thread Matthew Treinish
On Thu, Aug 18, 2016 at 11:33:59AM -0500, Matthew Thode wrote:
> On 08/18/2016 10:00 AM, Matt Riedemann wrote:
> > It's that time of year again to talk about killing this job, at least
> > from the integrated gate (move it to experimental for people that care
> > about postgresql, or make it gating on a smaller subset of projects like
> > oslo.db).
> > 
> > The postgresql job used to have three interesting things about it:
> > 
> > 1. It ran keystone with eventlet (which is no longer a thing).
> > 2. It runs the n-api-meta service rather than using config drive.
> > 3. It uses postgresql for the database.
> > 
> > So #1 is gone, and for #3, according to the April 2016 user survey (page
> > 40) [1], 4% of reporting deployments are using it in production.
> > 
> > I don't think we're running n-api-meta in any other integrated gate
> > jobs, but I'm pretty sure there is at least one neutron job out there
> > that's running with it that way. We could also consider making the
> > nova-net dsvm full gate job run n-api-meta, or vice-versa with the
> > neutron dsvm full gate job.
> > 
> > We also have to consider that with HP public cloud being gone as a node
> > provider and we've got fewer test nodes to run with, we have to make
> > tough decisions about which jobs we're going to run in the integrated gate.
> > 
> > I'm bringing this up again because Nova has a few more jobs it would
> > like to make voting on it's repo (neutron LB and live migration, at
> > least in the check queue) but there are concerns about adding yet more
> > jobs that each change has to get through before it's merged, which means
> > if anything goes wrong in any of those we can have a 24 hour turnaround
> > on getting an approved change back through the gate.
> > 
> > [1]
> > https://www.openstack.org/assets/survey/April-2016-User-Survey-Report.pdf
> > 
> 
> 
> I don't know about nova, but at least in keystone when I was testing
> upgrades I found an error that had to be fixed before release of Mitaka.
>  Guess I'm part of the 4% :(

That's not what we're talking about here. Your issue most likely stemmed from
keystone's lack of tests that do DB migrations with real data. The proposal here
is not talking about stopping all testing on postgres, just removing the postgres
dsvm tempest jobs from the integrated gate. Those jobs have very limited
additional value for the reasons Matt outlined. They also clearly did not catch
your upgrade issue, and most (if not all) of the other postgres issues are caught
with more targeted testing of the db layer done in the project repos.

-Matt Treinish




Re: [openstack-dev] [Cinder] [stable] [all] Changing stable policy for drivers

2016-08-10 Thread Matthew Treinish
On Wed, Aug 10, 2016 at 09:52:55AM -0700, Clay Gerrard wrote:
> On Wed, Aug 10, 2016 at 7:42 AM, Ben Swartzlander 
> wrote:
> 
> >
> > A big source of problems IMO is that tempest doesn't have stable branches.
> > We use the master branch of tempest to test stable branches of other
> > projects, and tempest regularly adds new features.
> >
> 
> How come not this +1000 just fix this?

Well, mostly because it's actually not a problem and ignores the history on why
tempest is branchless. We actually used to do this pre-icehouse and it actually
made things much worse. What was happening back then was we didn't have enough
activity to keep the stable branches working at all. So we'd go very long
periods where nothing actually could land. We also often wedged ourselves where
master tempest changed to a point where we couldn't sanely backport a fix to
the stable branch. This would often mean that up until right before a stable
release things just couldn't land until someone was actually motivated to try
and dig us out. But, what more often happened was we had to just disable tempest
on the branch, because we didn't have another option. It also turns out that
having different tests across a release boundary meant we weren't actually
validating that the OpenStack APIs were consistent and worked the same. We had
many instances where a project's API just changed between release boundaries,
which violates our API consistency and backwards compatibility guidelines.
Tempest is about verifying the API and just like any other API client it should
work against any OpenStack release.

Doing this has been a huge benefit for making things actually work on the stable
branches. (in fact just thinking back about how broken everything was all the
time back then makes me appreciate it even more) We also test every incoming
tempest change on all the stable branches, and nothing can land unless it works
on all supported branches. It means we have a consistent and stable api across
releases. We do have occasional bugs where a new test or change in tempest
triggers a new race in a project's stable branch. But, that's a real bug and
normally a fix can be backported. (which is the whole point of doing stable
branches) If it can't and the race is bad enough to actively interfere with
things, we have a mechanism to skip the test. (but that's normally a last
resort) Although, these issues tend to come up pretty infrequently in practice,
especially as we slowly ramp up the stability of things over time.

FWIW, a lot of these details are covered in the original spec for implementing
this: (although it definitely assumes a bit of prior knowledge about the
state of things going on when it was written)

http://specs.openstack.org/openstack/qa-specs/specs/tempest/implemented/branchless-tempest.html


-Matt Treinish




Re: [openstack-dev] [Cinder] [stable] [all] Changing stable policy for drivers

2016-08-10 Thread Matthew Treinish
On Wed, Aug 10, 2016 at 09:56:09AM -0700, Clay Gerrard wrote:
> On Mon, Aug 8, 2016 at 8:31 AM, Matthew Treinish <mtrein...@kortar.org>
> wrote:
> 
> > When we EOL a branch all of the infrastructure for running any ci against
> > it goes away.
> 
> 
> But... like... version control?  I mean I'm sure it's more complicated than
> that or you wouldn't have said this - but I don't understand, sorry.
> 
> Can you elaborate on this?
> 

I did in other parts of the thread. The thing is you're only thinking about the
CI system as involving a single project and repo. But, to keep the gate running
involves a lot of coordination between multiple projects that are tightly
coupled. Things like an entire extra set of job definitions in zuul, a branch on
global requirements, a devstack branch, extra devstack-gate logic, a bunch of
extra config options for skips in tempest, extra node types, etc. Keeping all
those things working together is a big part of what stable maint actually
entails. When we EOL a branch most of the mechanics involved are a matter of
cleaning up all of those pieces everywhere because we don't have the bandwidth
or resources to continue keeping it all working. That's why at the EOL we tag 
the branch tip and then delete it. Leaving the branch around advertises that
we're in a position to accept new patches to it, which we aren't after the EOL.
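
A minimal sketch of just that last step (tag the branch tip, then delete the
branch), not of the rest of the cleanup. The helper name, the remote name, and
the "<series>-eol" tag naming are assumptions for illustration and not the
actual release tooling; the function only builds the git commands, it does not
run them:

```python
def eol_commands(series, remote="origin"):
    """Return the git commands to tag a stable branch tip and then delete it.

    Assumptions (hypothetical, for illustration only): the branch is named
    stable/<series>, the tag is named <series>-eol, and the remote is "origin".
    """
    branch = "stable/%s" % series
    tag = "%s-eol" % series
    return [
        # Tag the current tip so the final state of the branch stays reachable.
        ["git", "tag", "-s", tag, "-m", "%s is EOL" % series,
         "%s/%s" % (remote, branch)],
        # Publish the tag before removing anything.
        ["git", "push", remote, tag],
        # Delete the branch so it no longer advertises that patches are accepted.
        ["git", "push", remote, "--delete", branch],
    ]


if __name__ == "__main__":
    for command in eol_commands("kilo"):
        print(" ".join(command))
```

The ordering is the point: the tag is pushed first, so nothing is lost when the
branch itself goes away.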

-Matt Treinish




Re: [openstack-dev] [Cinder] [stable] [all] Changing stable policy for drivers

2016-08-09 Thread Matthew Treinish
On Tue, Aug 09, 2016 at 09:16:02PM -0700, John Griffith wrote:
> On Tue, Aug 9, 2016 at 7:21 PM, Matthew Treinish <mtrein...@kortar.org>
> wrote:
> 
> > On Tue, Aug 09, 2016 at 05:28:52PM -0700, John Griffith wrote:
> > > On Tue, Aug 9, 2016 at 4:53 PM, Sean McGinnis <sean.mcgin...@gmx.com>
> > wrote:
> > >
> > > > .
> > > > >
> > > > > Mike, you must have left the midcycle by the time this topic came
> > > > > up. On the issue of out-of-tree drivers, I specifically offered this
> > > > > proposal (a community managed mechanism for distributing driver
> > > > > > bugfix backports) as a compromise alternative to try to address the
> > > > > needs of both camps. Everyone who was in the room at the time (plus
> > > > > DuncanT who wasn't) agreed that if we had that (a way to deal with
> > > > > backports) that they wouldn't want drivers out of the tree anymore.
> > > > >
> > > > > Your point of view wasn't represented so go ahead and explain why,
> > > > > if we did have a reasonable way for bugfixes to get backported to
> > > > > the releases customers actually run (leaving that mechanism
> > > > > unspecified for the time being), that you would still want the
> > > > > drivers out of the tree.
> > > > >
> > > > > -Ben Swartzlander
> > > >
> > > > The conversation about this started around the 30 minute point here if
> > > > anyone is interested in more of the background discussion on this:
> > > >
> > > > https://www.youtube.com/watch?v=g3MEDFp08t4
> > > >
> > > >
> > >
> > > I don't think anybody is whining at all here, we had a fairly productive
> > > discussion at the mid-cycle surrounding this topic and I do think there are
> > > some valid advantages to this approach regardless of the QA question.  Note
> > > that it's been pointed out we weren't talking about or considering
> > > advertising this *special* branch as tested by the standard means or gate
> > > CI etc.
> > >
> > > We did discuss this though mostly in the context of helping the package
> > > maintainers and distributions.  The fact is that many of us currently offer
> > > backports of fixes in our own various github accounts.  That's fine and it
> > > works well for many.  The problem we were trying to address however is that
> > > this practice is rather problematic for the distros.  For example RHEL,
> > > Helion or Mirantis are most certainly not going to run around cherry
> > > picking change sets from random github repos scattered around.
> > >
> > > The context of the discussion was that by having a long lived *driver*
> > > (emphasis on driver) branch there would be a single location and an *easy*
> > > method of contact and communication regarding fixes to drivers that may be
> > > available for stable branches that are no longer supported.  This puts the
> > > burden of QA/Testing mostly on the vendors and distros, which I think is
> > > fine.  They can either choose to work with the Vendor and verify the
> > > versions for backport on a regular basis, or they can choose to ignore them
> > > and NOT provide them to their customers.
> > >
> > > I don't think this is an awful idea, and it's very far from the "drivers
> > > out of tree" discussion.  The feedback from the distro maintainers during
> > > the week was that they would gladly welcome a model where they could pull
> > > updates from a single driver branch on a regular basis or as needed for
> > > customers that are on *unsupported* releases and for whom a fix exists.
> > > Note that support cycles are not the same for the distros as they are of
> > > the upstream community.  This is in no way proposing a change to the
> > > existing support time frames or processes we have now, and in that way it
> > > differs significantly from proposals and discussions we've had in the past.
> > >
> > > The b

Re: [openstack-dev] [Cinder] [stable] [all] Changing stable policy for drivers

2016-08-09 Thread Matthew Treinish
On Wed, Aug 10, 2016 at 01:39:55AM +, Jeremy Stanley wrote:
> On 2016-08-09 15:56:57 -0700 (-0700), Mike Perez wrote:
> > As others have said and as being a Cinder stable core myself, the status-quo
> > and this proposal itself are terrible practices because there is no testing
> > behind it, thereby it not being up to the community QA standards set.
> [...]
> 
> In fairness to Sean, this thread started because he was asking in
> #openstack-infra for help creating some long-lived driver fix
> branches because he felt it was against stable branch policy to
> backport bugfixes for drivers. Since this was an unprecedented
> request, I recommended he first raise the topic on this list to find
> out if this is a common problem across other projects and whether
> stable branch policy should be revised to permit driver fixes.
> 
> There was a brief discussion of what to do if the Cinder team wanted
> driver fixes to EOL stable series, and I still firmly believe effort
> there is better expended attempting to help extend stable branch
> support since "convenience to package maintainers" (what he said
> this plan was trying to solve) is the primary reason we provide
> those branches to begin with.
> 
> So I guess what I'm asking: If stable branches exist as a place for
> package maintainers to collaborate on a common set of backported
> fixes, and are not actually usable to that end, why do we continue
> to provide them? Should we just stop testing stable branches
> altogether since their primary value would (as is suggested) be
> served even without our testing efforts? Ceasing any attempts to
> test backports post-release would certainly free up a lot of our
> current upstream effort and resources we could redirect into other
> priorities. Or is it just stable branch changes for drivers we
> shouldn't bother testing?

Well, at a bare minimum we need the previous release around to test upgrades
which is very important. But, this exact argument has come up in the past when
we've had this exact discussion before. (it's been at least once a cycle for as
long as I can remember) I might have actually proposed having only one stable
branch at a time during one of the past summits. But, every time it's been
proposed in the past people come out of the woodwork and say there is value in
continuing them, so we've continued maintaining them. I do agree though, that it
does feel like there is a disconnect between downstream consumers and upstream
when we get proposals like this while at the same time we have had recent quite
lengthy discussions where we decided not to extend our support windows because
it's not feasible given the level of activity.

As for not testing stable changes for drivers I fundamentally disagree with any
approach that puts us in a situation where we are landing patches in an
OpenStack project that does not have any testing. This is a core part of doing
development "the OpenStack way", (to quote the governance repo) if the driver
code is part of the project then we need to be testing it.

-Matt Treinish




Re: [openstack-dev] [Cinder] [stable] [all] Changing stable policy for drivers

2016-08-09 Thread Matthew Treinish
On Tue, Aug 09, 2016 at 05:28:52PM -0700, John Griffith wrote:
> On Tue, Aug 9, 2016 at 4:53 PM, Sean McGinnis  wrote:
> 
> > .
> > >
> > > Mike, you must have left the midcycle by the time this topic came
> > > up. On the issue of out-of-tree drivers, I specifically offered this
> > > proposal (a community managed mechanism for distributing driver
> > > bugfix backports) as a compromise alternative to try to address the
> > > needs of both camps. Everyone who was in the room at the time (plus
> > > DuncanT who wasn't) agreed that if we had that (a way to deal with
> > > backports) that they wouldn't want drivers out of the tree anymore.
> > >
> > > Your point of view wasn't represented so go ahead and explain why,
> > > if we did have a reasonable way for bugfixes to get backported to
> > > the releases customers actually run (leaving that mechanism
> > > unspecified for the time being), that you would still want the
> > > drivers out of the tree.
> > >
> > > -Ben Swartzlander
> >
> > The conversation about this started around the 30 minute point here if
> > anyone is interested in more of the background discussion on this:
> >
> > https://www.youtube.com/watch?v=g3MEDFp08t4
> >
> >
> 
> I don't think anybody is whining at all here, we had a fairly productive
> discussion at the mid-cycle surrounding this topic and I do think there are
> some valid advantages to this approach regardless of the QA question.  Note
> that it's been pointed out we weren't talking about or considering
> advertising this *special* branch as tested by the standard means or gate
> CI etc.
> 
> We did discuss this though mostly in the context of helping the package
> maintainers and distributions.  The fact is that many of us currently offer
> backports of fixes in our own various github accounts.  That's fine and it
> works well for many.  The problem we were trying to address however is that
> this practice is rather problematic for the distros.  For example RHEL,
> Helion or Mirantis are most certainly not going to run around cherry
> picking change sets from random github repos scattered around.
> 
> The context of the discussion was that by having a long lived *driver*
> (emphasis on driver) branch there would be a single location and an *easy*
> method of contact and communication regarding fixes to drivers that may be
> available for stable branches that are no longer supported.  This puts the
> burden of QA/Testing mostly on the vendors and distros, which I think is
> fine.  They can either choose to work with the Vendor and verify the
> versions for backport on a regular basis, or they can choose to ignore them
> and NOT provide them to their customers.
> 
> I don't think this is an awful idea, and it's very far from the "drivers
> out of tree" discussion.  The feedback from the distro maintainers during
> the week was that they would gladly welcome a model where they could pull
> updates from a single driver branch on a regular basis or as needed for
> customers that are on *unsupported* releases and for whom a fix exists.
> Note that support cycles are not the same for the distros as they are of
> the upstream community.  This is in no way proposing a change to the
> existing support time frames or processes we have now, and in that way it
> differs significantly from proposals and discussions we've had in the past.
> 
> The basic idea here was to eliminate the proliferation of custom backport
> patches scattered all over the web, and to ease the burden for distros and
> vendors in supporting their customers.  I think there may be some concepts
> to iron out and I certainly understand some of the comments regarding being
> disingenuous regarding what we're advertising.  I think that's a
> misunderstanding of the intent however, the proposal is not to extend the
> support life of stable from an upstream or community perspective but
> instead the proposal is geared at consolidation and tracking of drivers.

I fully understood the proposal but I still think you're optimizing for the
wrong thing. We have a community process for doing backports and maintaining
released versions of OpenStack code. The fundamental problem here is actually
that the parties you've identified aren't actively involved in stable branch
maintenance. The stable maint team and policy was primarily created as a
solution to the exact problem you outlined above, that is providing a place for
vendors, distros, etc to collaborate on backports and stable branch maint
while following our community's process. Regardless of framing it as being only
for drivers it doesn't change that you're talking about the same thing. (this is
why in-tree vs 

Re: [openstack-dev] [Cinder] [stable] [all] Changing stable policy for drivers

2016-08-08 Thread Matthew Treinish
On Mon, Aug 08, 2016 at 07:40:56PM +0300, Duncan Thomas wrote:
> On 8 August 2016 at 18:31, Matthew Treinish <mtrein...@kortar.org> wrote:
> 
> >
> > This argument comes up at least once a cycle and there is a reason we don't do
> > this. When we EOL a branch all of the infrastructure for running any ci against
> > it goes away. This means devstack support, job definitions, tempest skip checks,
> > etc. Leaving the branch around advertises that you can still submit patches to
> > it which you can't anymore. As a community we've very clearly said that we don't
> > land any code without ensuring it passes tests first, and we do not maintain any
> > of the infrastructure for doing that after an EOL.
> >
> >
> Ok, to turn the question around, we (the cinder team) have recognised a
> definite and strong need to have somewhere for vendors to share patches on
> versions of Cinder older than the stable branch policy allows.
> 
> Given this need, what are our options?
> 
> 1. We could do all this outside Openstack infrastructure. There are
> significant downsides to doing so from organisational, maintenance, cost
> etc points of view. Also means that the place vendors go for these patches
> is not obvious, and the process for getting patches in is not standard.

This is probably your only viable option. As a community we've hit this boundary
many times. Everyone claims to want longer support windows but when it comes
down to it there is very little activity in making things work on stable
branches. Our support window is at its maximum viable length now, and that's
unlikely to change anytime soon. We had a discussion on this exact topic at
summit:

https://etherpad.openstack.org/p/r.ddabf5c865d6f77740bcfbc112ed391c

> 
> 2. We could have something not named 'stable' that has looser rules than
> stable branches, maybe just pep8 / unit / cinder in-tree tests. No
> devstack.

This is not an option, as I said before this isn't feasible. All the
infrastructure for running jobs on the old branches goes away. It's much more
than you realize is actually there. Including things like global requirements,
job definitions, and old node types. A lot of work goes into keeping all of
this running, and it's all interconnected. There is a reason we EOL a branch,
it's not to be vindictive, it's because keeping it running is too much work for
the small number of people who fix things. (and to a lesser extent an increased
burden on our CI resources)

Ignoring all that, this is also contrary to how we perform testing in OpenStack.
We don't turn off entire classes of testing we have so we can land patches,
that's just a recipe for disaster.

-Matt Treinish

> 
> 3. We go with the Neutron model and take drivers out of tree. This is not
> something the cinder core team are in favour of - we see significant value
> in the code review that drivers currently get - the code quality
> improvements between when a driver is submitted and when it is merged are
> sometimes very significant. Also, taking the code out of tree makes it
> difficult to get all the drivers checked out in one place to analyse e.g.
> how a certain driver call is implemented across all the drivers, when
> reasoning or making changes to core code.
> 
> Given we've identified a clear need, and have repeatedly rejected one
> solution (take drivers out of tree - it has been discussed at every summit
> and midcycle for 3+ cycles), what positive suggestions can people make?
> 




Re: [openstack-dev] [Cinder] [stable] [all] Changing stable policy for drivers

2016-08-08 Thread Matthew Treinish
On Mon, Aug 08, 2016 at 09:47:53AM -0500, Sean McGinnis wrote:
> > 
> > Unless you manage to get it approved for the global policy, I think
> > you will effectively make your stable:follows-policy tag obsolete,
> > and then it should be removed from your project. Read the
> > requirements:
> > 
> > https://governance.openstack.org/reference/tags/stable_follows-policy.html#requirements
> > 
> > Support phases are part of the stable policy, and so if you don’t
> > mostly adhere to their definitions, you should not carry the tag.
> > Which is fine with me, it’s up to Cinder team to decide whether it’s
> > worth it.
> 
> I think "currently active stable branches" is key there. These branches
> would no longer be "currently active". They would get an EOL tag when it
> reaches the end of the support phases. We just wouldn't delete the
> branch.

This argument comes up at least once a cycle and there is a reason we don't do
this. When we EOL a branch all of the infrastructure for running any ci against
it goes away. This means devstack support, job definitions, tempest skip checks,
etc. Leaving the branch around advertises that you can still submit patches to
it which you can't anymore. As a community we've very clearly said that we don't
land any code without ensuring it passes tests first, and we do not maintain any
of the infrastructure for doing that after an EOL. 

> 
> Again, this is only for driver code. We would not allow backports to the
> core Cinder codebase.

This distinction does not actually matter, you're still trying to backport code
without the ability to run tests in the gate. The fact that it's part of a
driver doesn't really change anything.

-Matt Treinish




Re: [openstack-dev] [neutron][grafana][infra] how to read grafana

2016-08-08 Thread Matthew Treinish
On Mon, Aug 08, 2016 at 02:40:31PM +0200, Ihar Hrachyshka wrote:
> Hi,
> 
> I was looking at grafana today, and spotted another weirdness.
> 
> See the periodic jobs dashboard:
> 
> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=4
> 
> Currently it shows for me 100% failure rate for py34/oslo-master job,
> starting from ~Aug 3. But when I go to openstack-health, I don’t see those
> runs at all:
> 
> http://status.openstack.org/openstack-health/#/job/periodic-neutron-py34-with-neutron-lib-master
> 
> (^ The last run is July 31.)
> 
> But then when I drill down into files, I can see more recent runs, like:
> 
> http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/?C=M;O=A
> http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/faa24e0/testr_results.html.gz
> 
> The last link points to a run from yesterday. And as you can see it is
> passing.

That run isn't actually from yesterday, it's from July 30th. The directory shows
a recent date, but the last modified dates for the individual files are older:

http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/faa24e0/
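
If you have a local copy of a run's log directory, a quick way to make the same
check is to ignore the directory's own timestamp and look at the newest file
inside it. This is only a sketch under that assumption; the function name is
made up and nothing here talks to the log server:

```python
import os
import tempfile


def newest_file_mtime(log_dir):
    """Return the most recent modification time of any file under log_dir.

    The directory's own mtime is ignored, because it can change without
    any of the files inside it being rewritten. Returns None if there are
    no files at all.
    """
    newest = None
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            mtime = os.path.getmtime(os.path.join(root, name))
            if newest is None or mtime > newest:
                newest = mtime
    return newest


if __name__ == "__main__":
    # Build a throwaway directory holding one old file to show the difference.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "testr_results.html.gz")
        with open(path, "w") as handle:
            handle.write("placeholder")
        os.utime(path, (1469836800, 1469836800))  # 2016-07-30 00:00:00 UTC
        print("directory mtime:  ", os.path.getmtime(tmp))
        print("newest file mtime:", newest_file_mtime(tmp))
```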

The openstack-health data goes up until the job started failing, this is likely
because the failures occur early enough in the test run that no subunit output
is generated for the run.

> 
> So, what’s wrong with the grafana dashboard? And why doesn’t
> openstack-health show the latest runs?
> 

On the openstack-health side it looks like you're running into an issue with
using subunit2sql as the primary data source there. If you look at an example
output from what's not in openstack-health, like:

http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/37cd5eb/console.html.gz

You'll see that the failure is occurring before any subunit output is generated.
(during the discovery phase of testr) If there is no subunit file in the log
output for the run, then there is nothing to populate the subunit2sql DB with.
The grafana/graphite data doesn't share this limitation because it gets
populated directly by zuul.
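
To make the difference between the two data sources concrete, here is a toy
model of it. The field names and the function are invented for this sketch and
are not how zuul, graphite, or subunit2sql actually store anything; the two run
ids are the ones linked in this thread:

```python
def split_by_data_source(runs):
    """Return (zuul_reported, subunit2sql_visible) lists of run ids.

    Each run is a dict with "id", "status", and "has_subunit" keys; these
    field names are made up for illustration.
    """
    # Zuul reports a result for every job it runs, so graphite sees them all.
    zuul_reported = [run["id"] for run in runs]
    # subunit2sql is filled from subunit files, so runs without one never show up.
    subunit2sql_visible = [run["id"] for run in runs if run["has_subunit"]]
    return zuul_reported, subunit2sql_visible


if __name__ == "__main__":
    runs = [
        {"id": "faa24e0", "status": "SUCCESS", "has_subunit": True},
        # Fails during test discovery, before any subunit output exists.
        {"id": "37cd5eb", "status": "FAILURE", "has_subunit": False},
    ]
    zuul_reported, subunit2sql_visible = split_by_data_source(runs)
    print("graphite/grafana sees:", zuul_reported)
    print("openstack-health sees:", subunit2sql_visible)
```

With every recent run failing in discovery, the first list keeps growing with
failures while the second list stops at the last run that produced subunit
output, which is the mismatch between the two dashboards.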

This is a known limitation with openstack-health right now, and the plan to solve it
is to add a zuul sql data store that we can query like subunit2sql for job level
information, and then use subunit2sql for more fine grained details. The work on
that currently depends on: https://review.openstack.org/#/c/22/ which adds
the datastore to zuul. Once that lands we can work on having the openstack-health
side consume that data in conjunction with subunit2sql.

-Matt Treinish




Re: [openstack-dev] [nova] Next steps for proxy API deprecation

2016-07-26 Thread Matthew Treinish
On Tue, Jul 26, 2016 at 01:21:53PM -0400, Sean Dague wrote:
> On 07/26/2016 01:14 PM, Matt Riedemann wrote:
> > On 7/26/2016 11:59 AM, Matt Riedemann wrote:
> >> Now that the 2.36 microversion change has merged [1], we can work on the
> >> python-novaclient changes for this microversion.
> >>
> >> At the midcycle we agreed [2] to also return a 404 for network APIs,
> >> including nova-network (which isn't a proxy), for consistency and
> >> further signaling that nova-network is going away.
> >>
> >> In the client, we agreed to soften the impact for network CLIs by
> >> determining if the latest microversion supported will fail (so will we
> >> send >=2.36) and rather than fail, send 2.35 instead (if the user didn't
> >> specifically specify a different version). However, we'd emit a warning
> >> saying this is deprecated and will go away in the first major client
> >> release (in Ocata? after nova-network is removed? after Ocata is
> >> released?).
> >>
> >> We should probably just deprecate any CLIs/APIs in python-novaclient
> >> today that are part of this server side API change, including network
> >> CLIs/APIs in novaclient. The baremetal and image proxies in the client
> >> are already deprecated, and the volume proxies were already removed.
> >> That leaves the network proxies in the client.
> >>
> >> From my notes, Dan Smith was going to work on the novaclient changes for
> >> 2.36 to not fail and use 2.35 - unless anyone else wants to volunteer to
> >> do that work (please speak up).
> >>
> >> We can probably do the network CLI/API deprecations in the client in
> >> parallel to the 2.36 support, but need someone to step up for that. I'll
> >> try to get it started this week if no one else does.
> >>
> >> [1] https://review.openstack.org/#/c/337005/
> >> [2] https://etherpad.openstack.org/p/nova-newton-midcycle
> >>
> > 
> > I forgot to mention Tempest. We're going to have to probably put a
> > max_microversion cap in several tests in Tempest to cap at 2.35 (or
> > change those to use Neutron?). There are also going to be some response
> > schema changes like for quota usage/limits, I'm not sure if anyone is
> > looking at this yet. We could also get it done after feature freeze on
> > 9/2, but I still need to land the get-me-a-network API change which is
> > microversion 2.37 and has its own Tempest test, although that test
> > relies on Neutron so I might be OK for the most part.
> 
> Is that strictly true? We could also just configure all the jobs for
> Nova network to set max microversion at 2.35. That would probably be
> more straight forward way of approaching this, and make it a bit more
> clear how serious we are here.
> 

Yeah, for the gate that should work. By default tempest sends the minimum
microversion based on the config and the test requirements. So we should
never send 2.36 unless the test says its minimum required microversion
is >=2.36. Setting the max at 2.35 would mean we skip those tests. My bigger
concern is for people using tempest outside of the gate. I still think we
should set a max microversion on any test classes that call nova's network
apis to make sure they're properly skipped just in case someone sets the
min microversion in the tempest config at 2.36. (assuming such a test class
exists at all, I don't actually know) Unless you think failing there is the
correct way to do it?
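
Roughly, the selection described above works like the sketch below. This is a
simplified model and not tempest's actual code; the function names are made up,
and it only handles plain "X.Y" version strings:

```python
def parse(version):
    """Turn a microversion string like '2.35' into a comparable (2, 35) tuple."""
    major, minor = version.split(".")
    return int(major), int(minor)


def request_microversion(cfg_min, cfg_max, test_min):
    """Pick the microversion a test would send, or None if it gets skipped.

    cfg_min and cfg_max stand in for the configured range; test_min is the
    minimum microversion the test says it needs.
    """
    # A test that needs more than the configured max cannot run at all.
    if parse(test_min) > parse(cfg_max):
        return None
    # Otherwise send the lowest version that satisfies both config and test.
    return max(cfg_min, test_min, key=parse)


if __name__ == "__main__":
    # With the max capped at 2.35, 2.36 is never sent on the wire.
    for test_min in ("2.10", "2.26", "2.36", "2.37"):
        print(test_min, "->", request_microversion("2.1", "2.35", test_min))
```

Versions are compared as integer tuples rather than strings or floats so that
2.10 sorts after 2.9.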

-Matt Treinish





Re: [openstack-dev] [nova] Next steps for proxy API deprecation

2016-07-26 Thread Matthew Treinish
On Tue, Jul 26, 2016 at 12:14:03PM -0500, Matt Riedemann wrote:
> On 7/26/2016 11:59 AM, Matt Riedemann wrote:
> > Now that the 2.36 microversion change has merged [1], we can work on the
> > python-novaclient changes for this microversion.
> > 
> > At the midcycle we agreed [2] to also return a 404 for network APIs,
> > including nova-network (which isn't a proxy), for consistency and
> > further signaling that nova-network is going away.
> > 
> > In the client, we agreed to soften the impact for network CLIs by
> > determining if the latest microversion supported will fail (so will we
> > send >=2.36) and rather than fail, send 2.35 instead (if the user didn't
> > specifically specify a different version). However, we'd emit a warning
> > saying this is deprecated and will go away in the first major client
> > release (in Ocata? after nova-network is removed? after Ocata is
> > released?).
> > 
> > We should probably just deprecate any CLIs/APIs in python-novaclient
> > today that are part of this server side API change, including network
> > CLIs/APIs in novaclient. The baremetal and image proxies in the client
> > are already deprecated, and the volume proxies were already removed.
> > That leaves the network proxies in the client.
> > 
> > From my notes, Dan Smith was going to work on the novaclient changes for
> > 2.36 to not fail and use 2.35 - unless anyone else wants to volunteer to
> > do that work (please speak up).
> > 
> > We can probably do the network CLI/API deprecations in the client in
> > parallel to the 2.36 support, but need someone to step up for that. I'll
> > try to get it started this week if no one else does.
> > 
> > [1] https://review.openstack.org/#/c/337005/
> > [2] https://etherpad.openstack.org/p/nova-newton-midcycle
> > 
> 
> I forgot to mention Tempest. We're going to have to probably put a
> max_microversion cap in several tests in Tempest to cap at 2.35 (or change
> those to use Neutron?). There are also going to be some response schema
> changes like for quota usage/limits, I'm not sure if anyone is looking at
> this yet. We could also get it done after feature freeze on 9/2, but I still
> need to land the get-me-a-network API change which is microversion 2.37 and
> has its own Tempest test, although that test relies on Neutron so I might
> be OK for the most part.

The only case where this will matter is for test classes that have an unbound
max microversion, which should be very few. It's only classes that specify a
higher minimum. The simple way around that is just change the max microversion
for those classes to 2.35 and it won't ever send a 2.36 request. For example:

http://git.openstack.org/cgit/openstack/tempest/tree/tempest/api/compute/volumes/test_attach_volume.py#n187

change 'latest' to 2.35 if it calls nova-network anywhere in that call path.
(as an aside to people unfamiliar with microversions in tempest that doesn't
mean send 'latest' on the wire but that all microversions are valid for this
test, ie it won't skip based on the min microversion in config)
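
A minimal sketch of the range logic described above, under stated assumptions. The function names (parse, should_skip, request_version) are made up for illustration and are not Tempest's API, and the rule that the request carries the higher of the two minimums is an assumption about how the selection works, not something confirmed in this thread.

```python
# Simplified model of how a test class's microversion range interacts with
# the configured range. All names here are hypothetical, not Tempest's API.

def parse(version):
    """Turn '2.35' into (2, 35); 'latest' sorts above every real version."""
    if version == "latest":
        return (float("inf"), float("inf"))
    major, minor = version.split(".")
    return (int(major), int(minor))


def should_skip(test_min, test_max, cfg_min, cfg_max):
    """Skip when the test's range and the configured range do not overlap."""
    return parse(test_min) > parse(cfg_max) or parse(test_max) < parse(cfg_min)


def request_version(test_min, cfg_min):
    """Assumed rule: send the higher of the two minimums, never the max."""
    return max(test_min, cfg_min, key=parse)


# max 'latest': the test still runs when the config minimum is 2.36,
# so a 2.36 request would reach a nova-network code path.
unbounded_skips = should_skip("2.20", "latest", "2.36", "latest")

# max capped at 2.35: the same configuration now skips the test instead.
capped_skips = should_skip("2.20", "2.35", "2.36", "latest")
```

With the cap in place, a default configuration (minimum 2.1) still runs the test and sends the class minimum, so nothing changes for the normal gate jobs.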

However, this does raise a bigger issue about removing nova-network next cycle.
It's still the default in a lot of places. We can't remove the tempest support
until newton eol (assuming an ocata removal). So we'll likely have to add a flag
or 2 to make sure we never use it on master. There will also be devstack,
devstack-gate, etc changes that will have to happen first too. But this is all
probably a topic better suited for another thread or even a summit discussion.

-Matt Treinish


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][qa] When do we need tests for microversions in Tempest?

2016-07-13 Thread Matthew Treinish
On Wed, Jul 13, 2016 at 09:55:33PM -0500, Matt Riedemann wrote:
> There are several changes in Tempest right now trying to add response schema
> validation for the 2.26 microversion which added server tags to the server
> GET response. This is needed for anything testing a microversion >=2.26,
> which several people are trying to add.
> 
> We have a similar issue with the 2.3 microversion which is really a bug, but
> only exposed in jobs that have run_validation=True which is only in a
> non-voting job right now.
> 
> I've mostly been debating this in this change:
> 
> https://review.openstack.org/#/c/233176/
> 
> I've added an item to the nova midcycle meetup agenda to talk about the plan
> for handling microversion testing in tempest for nova changes, specifically
> around API response validation.
> 
> I agree that nova doesn't test response schema validation in tree, so doing
> it in tempest is good.
> 
> But I'm not sure that we need a new set of tempest tests for every
> microversion change in nova, e.g. if it's only touching the API and
> database, like server tags, we can test that in nova.

I largely agree with this, we don't need 100% coverage of every microversion.
Especially if it's just an API that's just extra metadata in the DB.

> 
> It's also not great having several changes in flight at the same time to
> tempest trying to add the same 2.26 response schema because it wasn't added
> at the same time the 2.26 API merged.

I agree it's not ideal, but it's not like there is a huge burden on rebasing,
no more than for developers having to bump their microversions because another
bp landed and took the microversion they were using.


> 
> I also wonder what it means if someone configures max_microversion in
> tempest.conf to something we don't test, like say 2.11, what blows up? For
> example, we know that we don't have response validation for 2.3 so some
> tests are broken when you run with ssh validation and microversion>=2.3.

We can easily add a job that changes the min microversion config flag in tempest
to something higher than 2.1. This will ensure we send a higher microversion
everywhere and will catch these issues sooner. But, I'm not sure we want to do
that on a normal check/gate job.

> 
> So I'm thinking we should:
> 
> 1. Always add a schema change to Tempest if a microversion changes a
> response.

The problem with this is we shouldn't land a schema change by itself in tempest.
Until we have something using the schema we have no verification that it
actually works. We can and will land incorrect schemas if we do this. That's why
there is a pretty strong policy of only landing code that is run in CI somewhere
for Tempest.
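
A rough illustration of the point, not Tempest's actual validation code: the validate helper and the schema layout below are hypothetical and use only the standard library. The schema has a mistake that nobody would notice until it is run against a real response.

```python
# Hypothetical, stdlib-only stand-in for a response schema check.

def validate(body, schema):
    """Return a list of problems; an empty list means the body matches."""
    problems = []
    for key in schema["required"]:
        if key not in body:
            problems.append("missing: " + key)
    for key, value in body.items():
        expected = schema["properties"].get(key)
        if expected is None:
            # Unknown keys are only an error when the schema is strict.
            if not schema.get("additionalProperties", True):
                problems.append("unexpected: " + key)
        elif not isinstance(value, expected):
            problems.append("wrong type: " + key)
    return problems


# A schema landed on its own: 'tags' was typed as str by mistake.
untested_schema = {
    "properties": {"id": str, "tags": str},
    "required": ["id", "tags"],
    "additionalProperties": False,
}

# Only a test that feeds it a real response exposes the error.
response = {"id": "abc", "tags": ["web"]}
problems = validate(response, untested_schema)
```

Here problems comes back non-empty even though the response is correct, which is the kind of schema bug that only shows up once a test in CI actually uses the schema.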

> 
> 2. Only add tests to Tempest for a microversion change if it can't be tested
> within nova, e.g. you actually need to create a server, or use other
> services like glance/cinder/neutron.

+1

-Matt Treinish

> 
> mtreinish and sdague will be at the nova midcycle so hopefully they can
> represent for the QA team.
>




Re: [openstack-dev] [grenade] upgrades vs rootwrap

2016-07-06 Thread Matthew Treinish
On Wed, Jul 06, 2016 at 11:41:56AM -0500, Matt Riedemann wrote:
> On 7/6/2016 10:55 AM, Matthew Treinish wrote:
> > 
> > > Well, for better or worse rootwrap filters are put in /etc and treated like a
> > > config file. What you're essentially saying is that it shouldn't be config and
> > > just be in code. I completely agree with that being what we want eventually, but
> > > it's not how we advertise it today. Privsep sounds like it's our way of making
> > > this migration. But, it doesn't change the status quo where it's this hybrid
> > > config/code thing today, like policy was in nova before:
> > > 
> > > http://specs.openstack.org/openstack/nova-specs/specs/newton/approved/policy-in-code.html
> > > 
> > > (which has come up before as another tension point in the past during upgrades)
> > > I don't think we should break what we're currently enforcing today because we
> > > don't like the model we've built. We need to handle the migration to the new
> > > better thing gracefully so we don't break people who are relying on our current
> > > guarantees, regardless of how bad they are.
> > 
> > -Matt Treinish
> > 
> > 
> 
> I just wonder how many deployments are actually relying on this, since as
> noted elsewhere in this thread we don't really enforce this for all things,
> only what happens to get tested in our CI system, e.g. the virtuozzo
> rootwrap filters that don't have grenade testing.

Sure, our testing coverage here is far from perfect, that's never been in
dispute. It's always been best effort (of which there has been a limited amount
in this space). For example, I'm not aware of anything doing any upgrade testing
with virtuozzo enabled, or any of the other random ephemeral storage backends,
**cough** ceph **cough**.  But, as I said before just because we don't catch all
the issues isn't a reason to throw everything out the window.

> 
> Which is also why I'd like to get some operator perspective on this.
> 

I think what we'll find is the people who rely on this don't even realize it.
(which is kinda the point) I expect the people on the ops list are knowledgeable
enough and have enough experience to figure this kind of issue out and just
expect it during the course of an upgrade. This is more likely a trap for young
players who haven't even thought about this as being a potential issue before.
I don't think there is any disagreement we should move to something better in
this space. But, this is something we've said we would guarantee and I don't
think we should break that in the process of moving to the new better thing.

-Matt Treinish




Re: [openstack-dev] [grenade] upgrades vs rootwrap

2016-07-06 Thread Matthew Treinish
On Wed, Jul 06, 2016 at 06:20:30PM +0200, Thierry Carrez wrote:
> Matthew Treinish wrote:
> > > [...]
> > > Am I missing something else here?
> > 
> > > Well, for better or worse rootwrap filters are put in /etc and treated like a
> > > config file. What you're essentially saying is that it shouldn't be config and
> > > just be in code. I completely agree with that being what we want eventually, but
> > > it's not how we advertise it today.
> 
> Well, some (most ?) distros ship them as code rather than configuration,
> under /usr/share rather than under /etc. So one may argue that the issue is
> that devstack is installing them under /etc :)
> 

Devstack doesn't do anything special here, it just uses the project defaults.
That's what devstack strives to do wherever possible. Your
issue is with nova and pretty much everything using rootwrap then. The fact that
most distros do this is just further indication that how we have things set up
today is the wrong way to handle this.

-Matt Treinish




Re: [openstack-dev] [grenade] upgrades vs rootwrap

2016-07-06 Thread Matthew Treinish
On Wed, Jul 06, 2016 at 10:34:49AM -0500, Matt Riedemann wrote:
> On 6/27/2016 6:24 AM, Sean Dague wrote:
> > On 06/26/2016 10:02 PM, Angus Lees wrote:
> > > > On Fri, 24 Jun 2016 at 20:48 Sean Dague wrote:
> > > > 
> > > > On 06/24/2016 05:12 AM, Thierry Carrez wrote:
> > > > > I'm adding Possibility (0): change Grenade so that rootwrap filters from
> > > > > N+1 are put in place before you upgrade.
> > > 
> > > If you do that as general course what you are saying is that every
> > > installer and install process includes overwriting all of rootwrap
> > > before every upgrade. Keep in mind we do upstream upgrade as offline,
> > > > which means that we've fully shut down the cloud. This would remove the
> > > > testing requirement that rootwrap configs were even compatible between N
> > > > and N+1. And you think this is theoretical, you should see the patches
> > > > I've gotten over the years to grenade because people didn't see an issue
> > > > with that at all. :)
> > > > 
> > > > I do get that people don't like the constraints we've self imposed, but
> > > > we've done that for very good reasons. The #1 complaint from operators,
> > > > for ever, has been the pain and danger of upgrading. That's why we are
> > > > still trademarking new Juno clouds. When you upgrade Apache, you don't
> > > > have to change your config files.
> > > 
> > > 
> > > In case it got lost, I'm 100% on board with making upgrades safe and
> > > straightforward, and I understand that grenade is merely a tool to help
> > > us test ourselves against our process and not an enemy to be worked
> > > around.  I'm an ops guy proud and true and hate you all for making
> > > openstack hard to upgrade in the first place :P
> > > 
> > > Rootwrap configs need to be updated in line with new rootwrap-using code
> > > - that's just the way the rootwrap security mechanism works, since the
> > > security "trust" flows from the root-installed rootwrap config files.
> > > 
> > > I would like to clarify what our self-imposed upgrade rules are so that
> > > I can design code within those constraints, and no-one is answering my
> > > question so I'm just getting more confused as this thread progresses...
> > > 
> > > ***
> > > What are we trying to impose on ourselves for upgrades for the present
> > > and near future (ie: while rootwrap is still a thing)?
> > > ***
> > > 
> > > A. Sean says above that we do "offline" upgrades, by which I _think_ he
> > > means a host-by-host (or even global?) "turn everything (on the same
> > > host/container) off, upgrade all files on disk for that host/container,
> > > turn it all back on again".  If this is the model, then we can trivially
> > > update rootwrap files during the "upgrade" step, and I don't see any
> > > reason why we need to discuss anything further - except how we implement
> > > this in grenade.
> > > 
> > > B. We need to support a mix of old + new code running on the same
> > > host/container, running against the same config files (presumably
> > > because we're updating service-by-service, or want to minimise the
> > > service-unavailability during upgrades to literally just a process
> > > restart).  So we need to think about how and when we stage config vs
> > > code updates, and make sure that any overlap is appropriately allowed
> > > for (expand-contract, etc).
> > > 
> > > C. We would like to just never upgrade rootwrap (or other config) files
> > > ever again (implying a freeze in as_root command lines, effective ~a
> > > year ago).  Any config update is an exception dealt with through
> > > case-by-case process and release notes.
> > > 
> > > 
> > > I feel like the grenade check currently implements (B) with a 6 month
> > > lead time on config changes, but the "theory of upgrade" doc and our
> > > verbal policy might actually be (C) (see this thread, eg), and Sean
> > > above introduced the phrase "offline" which threw me completely into
> > > thinking maybe we're aiming for (A).  You can see why I'm looking for
> > > clarification  ;)
> > 
> > Ok, there is theory of what we are striving for, and there is what is
> > viable to test consistently.
> > 
> > The thing we are shooting for is making the code Continuously
> > Deployable. Which means the upgrade process should be "pip install -U
> > $foo && $foo-manage db-sync" on the API surfaces and "pip install -U
> > $foo; service restart" on everything else.
> > 
> > Logic we can put into the python install process is common logic shared
> > by all deployment tools, and we can encode it in there. So all
> > installers just get it.
> > 
> > The challenge is there is no facility for config file management in
> > python native packaging. Which means that software which *depends* on
> > config files for new or even working features now moves from the camp of
> > CDable to manual upgrade needed. What you need to do is in 

Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-17 Thread Matthew Treinish
On Fri, Jun 17, 2016 at 04:26:49PM -0700, Mike Perez wrote:
> On 15:12 Jun 14, Matthew Treinish wrote:
> > On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
> > > Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
> > > > On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> 
> 
> 
> > > We have basically three options.
> > > 
> > > > 1. Tell deployers who are trying to do the right thing for their immediate
> > > >    users that they can't use the trademark.
> > > > 
> > > > 2. Flag the related tests or remove them from the DefCore enforcement
> > > >    suite entirely.
> > > > 
> > > > 3. Be flexible about giving consumers of Tempest time to meet the
> > > >    new requirement by providing a way to disable the checks.
> > > 
> > > Option 1 goes against our own backwards compatibility policies.
> > 
> > > I don't think backwards compatibility policies really apply to what we define
> > > as the set of tests that as a community we are saying a vendor has to pass to
> > > say they're OpenStack. From my perspective as a community we either take a hard
> > > stance on this and say to be considered an interoperable cloud (and to get the
> > > trademark) you have to actually have an interoperable product. We slowly ratchet
> > > up the requirements every 6 months, there isn't any implied backwards
> > > compatibility in doing that. You passed in the past but not in the newer stricter
> > > guidelines.
> > > 
> > > Also, even if I did think it applied, we're not talking about a change which
> > > would fall into breaking that. The change was introduced a year and half ago
> > > during kilo and landed a year ago during liberty:
> > > 
> > > https://review.openstack.org/#/c/156130/
> > > 
> > > That's way longer than our normal deprecation period of 3 months and a release
> > > boundary.
> 
> 
> 
> What kind of communication happens today for these changes? There are so many
> channels/high volume mailing lists a downstream deployer is expected by the
> community to be listening in on. Some disruptive change being introduced a year or
> longer ago can still be communicated poorly.

Sure, I agree with that, but I don't think this was necessarily communicated
poorly. This has been already mentioned a few times on this thread but:

It was talked about on openstack-dev:

http://lists.openstack.org/pipermail/openstack-dev/2015-February/057613.html

On the defcore list: (which is definitely not a high volume/traffic ML)

http://lists.openstack.org/pipermail/defcore-committee/2015-June/000849.html

This was also raised as an issue for 1 vendor ~6 months ago. (which is also the
same duration of the hard deadline being discussed in this thread):

http://lists.openstack.org/pipermail/defcore-committee/2016-January/000986.html

IMHO, this was more than enough time to introduce a fix or workaround on their
end. Likely the easiest being just adding an extra nova-api endpoint with the
extensions disabled.

I don't have any links or other evidence to point to, but I know that this
exact topic has been discussed with people from the vendors having
difficulties during sessions at at least one of the 2 summits and/or 2 QA
midcycle meetups since this change landed. I really don't think this is a
communication problem or unfair surprise for anyone.

There might be more too, but I don't remember every conversation that I've had
in the community over the past year. (or where to find the links to point to)

> 
> Just like we've done with Reno in communicating better about disruptive changes
> in release notes, what tells teams like DefCore about changes with Tempest?
> (I looked in release.o.o for tempest release notes, although maybe I missed
> it?)

Yes, tempest has release notes, they are here:

http://docs.openstack.org/releasenotes/tempest/

But, the change in question predates the existence of reno and centralized
release notes for everything in openstack.

If this change were pushed today it would definitely be included in the release
notes. We also would do the same things, put it on the dev list, put it on the
defcore list. (although probably as a standalone thread this time) I also think
we'd probably ping hogepodge on irc about it too just so he could also raise it
up on the defcore side. (which we might have done back then too) Defcore and
tempest are tightly coupled so we do have pretty constant communication around
changes being made. But, I do admit we have better mechanisms in place today
to communicate this kind of change, and hopefully this would be handled better
now.

> 
> Since some members of DefCore have interest in making the market place 
> healthy

Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-16 Thread Matthew Treinish
On Thu, Jun 16, 2016 at 02:15:47PM -0400, Doug Hellmann wrote:
> Excerpts from Matthew Treinish's message of 2016-06-16 13:56:31 -0400:
> > On Thu, Jun 16, 2016 at 12:59:41PM -0400, Doug Hellmann wrote:
> > > Excerpts from Matthew Treinish's message of 2016-06-15 19:27:13 -0400:
> > > > On Wed, Jun 15, 2016 at 09:10:30AM -0400, Doug Hellmann wrote:
> > > > > Excerpts from Chris Hoge's message of 2016-06-14 16:37:06 -0700:
> > > > > > Top posting one note and direct comments inline, I’m proposing
> > > > > > this as a member of the DefCore working group, but this
> > > > > > proposal itself has not been accepted as the forward course of
> > > > > > action by the working group. These are my own views as the
> > > > > > administrator of the program and not that of the working group
> > > > > > itself, which may independently reject the idea outside of the
> > > > > > response from the upstream devs.
> > > > > > 
> > > > > > I posted a link to this thread to the DefCore mailing list to make
> > > > > > that working group aware of the outstanding issues.
> > > > > > 
> > > > > > > > On Jun 14, 2016, at 3:50 PM, Matthew Treinish <mtrein...@kortar.org> wrote:
> > > > > > > > 
> > > > > > > > On Tue, Jun 14, 2016 at 05:42:16PM -0400, Doug Hellmann wrote:
> > > > > > > >> Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
> > > > > > > >>> On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
> > > > > > > >>>> Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
> > > > > > > >>>>> On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> > > > > > > >>>>>> Last year, in response to Nova micro-versioning and extension updates[1],
> > > > > > > >>>>>> the QA team added strict API schema checking to Tempest to ensure that
> > > > > > > >>>>>> no additional properties were added to Nova API responses[2][3]. In the
> > > > > > > >>>>>> last year, at least three vendors participating in the OpenStack Powered
> > > > > > > >>>>>> Trademark program have been impacted by this change, two of which
> > > > > > > >>>>>> reported this to the DefCore Working Group mailing list earlier this year[4].
> > > > > > > >>>>>> 
> > > > > > > >>>>>> The DefCore Working Group determines guidelines for the OpenStack Powered
> > > > > > > >>>>>> program, which includes capabilities with associated functional tests
> > > > > > > >>>>>> from Tempest that must be passed, and designated sections with associated
> > > > > > > >>>>>> upstream code [5][6]. In determining these guidelines, the working group
> > > > > > > >>>>>> attempts to balance the future direction of development with lagging
> > > > > > > >>>>>> indicators of deployments and user adoption.
> > > > > > > >>>>>> 
> > > > > > > >>>>>> After a tremendous amount of consideration, I believe that the DefCore
> > > > > > > >>>>>> Working Group needs to implement a temporary waiver for the strict API
> > > > > > > >>>>>> checking requirements that were introduced last year, to give downstream
> > > > > > > >>>>>> deployers more time to 

Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-16 Thread Matthew Treinish
On Thu, Jun 16, 2016 at 12:59:41PM -0400, Doug Hellmann wrote:
> Excerpts from Matthew Treinish's message of 2016-06-15 19:27:13 -0400:
> > On Wed, Jun 15, 2016 at 09:10:30AM -0400, Doug Hellmann wrote:
> > > Excerpts from Chris Hoge's message of 2016-06-14 16:37:06 -0700:
> > > > Top posting one note and direct comments inline, I’m proposing
> > > > this as a member of the DefCore working group, but this
> > > > proposal itself has not been accepted as the forward course of
> > > > action by the working group. These are my own views as the
> > > > administrator of the program and not that of the working group
> > > > itself, which may independently reject the idea outside of the
> > > > response from the upstream devs.
> > > > 
> > > > I posted a link to this thread to the DefCore mailing list to make
> > > > that working group aware of the outstanding issues.
> > > > 
> > > > > > On Jun 14, 2016, at 3:50 PM, Matthew Treinish <mtrein...@kortar.org> wrote:
> > > > > > 
> > > > > > On Tue, Jun 14, 2016 at 05:42:16PM -0400, Doug Hellmann wrote:
> > > > > >> Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
> > > > > >>> On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
> > > > > >>>> Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
> > > > > >>>>> On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> > > > > >>>>>> Last year, in response to Nova micro-versioning and extension updates[1],
> > > > > >>>>>> the QA team added strict API schema checking to Tempest to ensure that
> > > > > >>>>>> no additional properties were added to Nova API responses[2][3]. In the
> > > > > >>>>>> last year, at least three vendors participating in the OpenStack Powered
> > > > > >>>>>> Trademark program have been impacted by this change, two of which
> > > > > >>>>>> reported this to the DefCore Working Group mailing list earlier this year[4].
> > > > > >>>>>> 
> > > > > >>>>>> The DefCore Working Group determines guidelines for the OpenStack Powered
> > > > > >>>>>> program, which includes capabilities with associated functional tests
> > > > > >>>>>> from Tempest that must be passed, and designated sections with associated
> > > > > >>>>>> upstream code [5][6]. In determining these guidelines, the working group
> > > > > >>>>>> attempts to balance the future direction of development with lagging
> > > > > >>>>>> indicators of deployments and user adoption.
> > > > > >>>>>> 
> > > > > >>>>>> After a tremendous amount of consideration, I believe that the DefCore
> > > > > >>>>>> Working Group needs to implement a temporary waiver for the strict API
> > > > > >>>>>> checking requirements that were introduced last year, to give downstream
> > > > > >>>>>> deployers more time to catch up with the strict micro-versioning
> > > > > >>>>>> requirements determined by the Nova/Compute team and enforced by the
> > > > > >>>>>> Tempest/QA team.
> > > > > >>>>> 
> > > > > >>>>> I'm very much opposed to this being done. If we're actually concerned with
> > > > > >>>>> interoperability and verify that things behave in the same manner between multiple
> > > > > >>>>> clouds then doing this would be a big step backwards. The 
> > > > >

Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-15 Thread Matthew Treinish
On Wed, Jun 15, 2016 at 09:10:30AM -0400, Doug Hellmann wrote:
> Excerpts from Chris Hoge's message of 2016-06-14 16:37:06 -0700:
> > Top posting one note and direct comments inline, I’m proposing
> > this as a member of the DefCore working group, but this
> > proposal itself has not been accepted as the forward course of
> > action by the working group. These are my own views as the
> > administrator of the program and not that of the working group
> > itself, which may independently reject the idea outside of the
> > response from the upstream devs.
> > 
> > I posted a link to this thread to the DefCore mailing list to make
> > that working group aware of the outstanding issues.
> > 
> > > > On Jun 14, 2016, at 3:50 PM, Matthew Treinish <mtrein...@kortar.org> wrote:
> > > > 
> > > > On Tue, Jun 14, 2016 at 05:42:16PM -0400, Doug Hellmann wrote:
> > > >> Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
> > > >>> On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
> > > >>>> Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
> > > >>>>> On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> > > >>>>>> Last year, in response to Nova micro-versioning and extension updates[1],
> > > >>>>>> the QA team added strict API schema checking to Tempest to ensure that
> > > >>>>>> no additional properties were added to Nova API responses[2][3]. In the
> > > >>>>>> last year, at least three vendors participating in the OpenStack Powered
> > > >>>>>> Trademark program have been impacted by this change, two of which
> > > >>>>>> reported this to the DefCore Working Group mailing list earlier this year[4].
> > > >>>>>> 
> > > >>>>>> The DefCore Working Group determines guidelines for the OpenStack Powered
> > > >>>>>> program, which includes capabilities with associated functional tests
> > > >>>>>> from Tempest that must be passed, and designated sections with associated
> > > >>>>>> upstream code [5][6]. In determining these guidelines, the working group
> > > >>>>>> attempts to balance the future direction of development with lagging
> > > >>>>>> indicators of deployments and user adoption.
> > > >>>>>> 
> > > >>>>>> After a tremendous amount of consideration, I believe that the DefCore
> > > >>>>>> Working Group needs to implement a temporary waiver for the strict API
> > > >>>>>> checking requirements that were introduced last year, to give downstream
> > > >>>>>> deployers more time to catch up with the strict micro-versioning
> > > >>>>>> requirements determined by the Nova/Compute team and enforced by the
> > > >>>>>> Tempest/QA team.
> > > >>>>> 
> > > >>>>> I'm very much opposed to this being done. If we're actually concerned with
> > > >>>>> interoperability and verify that things behave in the same manner between multiple
> > > >>>>> clouds then doing this would be a big step backwards. The fundamental disconnect
> > > >>>>> here is that the vendors who have implemented out of band extensions or were
> > > >>>>> taking advantage of previously available places to inject extra attributes
> > > >>>>> believe that doing so means they're interoperable, which is quite far from
> > > >>>>> reality. **The API is not a place for vendor differentiation.**
> > >>>> 
> > >>>> This is a temporary measure to address the fact that a large number
> > >>>> of existing tests changed their behavior, rather than having new
> > >>>> tests added to enforce this new requirement. The result is deployments
> > >>>> that previously passed these tests ma

Re: [openstack-dev] [tempest] the project specific config option not generated together with tempest.conf.sample

2016-06-15 Thread Matthew Treinish
Just a note, please don't start a new thread as a reply to an existing thread.
(well unless you remove the In-Reply-To header from the message) There are more
details on this here:

https://wiki.openstack.org/wiki/MailingListEtiquette#Threading

I almost missed this because it was part of a different thread.

On Wed, Jun 15, 2016 at 05:14:26AM +, joehuang wrote:
> Hello, 
> 
> A tempest plugin was written for the Kingbird 
> https://review.openstack.org/#/c/328683/, the plugin and test cases could be 
> discovered by tempest, and the configuration is working if we add the 
> configuration items into the tempest.conf manually, but if we run tox 
> -egenconfig in the tempest folder, these configuration items are not generated in 
> the tempest.conf.sample.
> 
> How to make the plugin customized configuration items also being generated in 
> the tempest.conf.sample ? 

This is a documented part of the tempest plugin interface:

http://docs.openstack.org/developer/tempest/plugin.html#tempest.test_discover.plugins.TempestPlugin.get_opt_lists

> 
> And for service_available group, it should be already there in the config, 
> isn't it?

Yes, but it depends on your plugin to pass the extra config options properly on
sample config generation. If you look at your tempest plugin:

http://git.openstack.org/cgit/openstack/kingbird/tree/kingbird/tests/tempest/scenario/plugin.py#n38

You're not returning the service_available option to tempest, just the KBGroup
options. You need to add a tuple with the service_available option and group
name to the output list there for it to show up in the output sample config
file.

That being said I don't actually see the service_available kingbird option being
defined anywhere in the plugin. For example see:

http://git.openstack.org/cgit/openstack/zaqar/tree/zaqar/tests/tempest_plugin/config.py#n18

and

http://git.openstack.org/cgit/openstack/zaqar/tree/zaqar/tests/tempest_plugin/plugin.py#n38
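
A minimal sketch of the shape get_opt_lists needs to return, written against stand-ins so it runs without OpenStack installed. Assumptions: the BoolOpt namedtuple stands in for oslo.config's cfg.BoolOpt, the "kingbird" group name and the example_flag option are placeholders for whatever KBGroup really contains, and sample_sections is a made-up helper that only mimics what the sample generator would see. In a real plugin this would be a method on a tempest.test_discover.plugins.TempestPlugin subclass.

```python
from collections import namedtuple

# Stand-in for oslo.config's cfg.BoolOpt (assumption: real plugins use cfg.BoolOpt).
BoolOpt = namedtuple("BoolOpt", ["name", "default", "help"])

# Hypothetical service_available option, modelled on what the zaqar plugin defines.
service_option = BoolOpt(
    "kingbird", True, "Whether or not kingbird is expected to be available")

# Placeholder for the plugin's own group; the real options live in KBGroup.
kb_group_name = "kingbird"
kb_group_opts = [BoolOpt("example_flag", False, "Placeholder plugin option")]


def get_opt_lists():
    """Return one (group name, options) tuple per config group to document."""
    return [
        (kb_group_name, kb_group_opts),
        # Without this tuple the option never reaches the sample config.
        ("service_available", [service_option]),
    ]


def sample_sections(opt_lists):
    """Hypothetical helper: map each group to the option names it would list."""
    return {group: [opt.name for opt in opts] for group, opts in opt_lists}


sections = sample_sections(get_opt_lists())
```

The same option object also has to be registered under the service_available group at runtime for the config lookup to work; the sketch only covers the sample generation side.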


-Matt Treinish




Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-14 Thread Matthew Treinish
On Tue, Jun 14, 2016 at 05:42:16PM -0400, Doug Hellmann wrote:
> Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
> > On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
> > > Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
> > > > On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> > > > > Last year, in response to Nova micro-versioning and extension updates[1],
> > > > > the QA team added strict API schema checking to Tempest to ensure that
> > > > > no additional properties were added to Nova API responses[2][3]. In the
> > > > > last year, at least three vendors participating in the OpenStack Powered
> > > > > Trademark program have been impacted by this change, two of which
> > > > > reported this to the DefCore Working Group mailing list earlier this
> > > > > year[4].
> > > > > 
> > > > > The DefCore Working Group determines guidelines for the OpenStack Powered
> > > > > program, which includes capabilities with associated functional tests
> > > > > from Tempest that must be passed, and designated sections with associated
> > > > > upstream code [5][6]. In determining these guidelines, the working group
> > > > > attempts to balance the future direction of development with lagging
> > > > > indicators of deployments and user adoption.
> > > > > 
> > > > > After a tremendous amount of consideration, I believe that the DefCore
> > > > > Working Group needs to implement a temporary waiver for the strict API
> > > > > checking requirements that were introduced last year, to give downstream
> > > > > deployers more time to catch up with the strict micro-versioning
> > > > > requirements determined by the Nova/Compute team and enforced by the
> > > > > Tempest/QA team.
> > > > 
> > > > I'm very much opposed to this being done. If we're actually concerned with
> > > > interoperability and with verifying that things behave in the same manner
> > > > between multiple clouds, then doing this would be a big step backwards. The
> > > > fundamental disconnect here is that the vendors who have implemented
> > > > out-of-band extensions or were taking advantage of previously available
> > > > places to inject extra attributes believe that doing so means they're
> > > > interoperable, which is quite far from reality. **The API is not a place
> > > > for vendor differentiation.**
> > > 
> > > This is a temporary measure to address the fact that a large number
> > > of existing tests changed their behavior, rather than having new
> > > tests added to enforce this new requirement. The result is deployments
> > > that previously passed these tests may no longer pass, and in fact
> > > we have several cases where that's true with deployers who are
> > > trying to maintain their own standard of backwards-compatibility
> > > for their end users.
> > 
> > That's not what happened though. The API hasn't changed and the tests haven't
> > really changed either. We made our enforcement on Nova's APIs a bit stricter to
> > ensure nothing unexpected appeared. For the most part these tests work on any
> > version of OpenStack. (We only test it in the gate on supported stable releases,
> > but I don't expect things to have drastically shifted on older releases.) It also
> > doesn't matter which version of the API you run, v2.0 or v2.1. Literally, the
> > only case where it ever fails is when you run something extra, not from the
> > community, either as an extension (which themselves are going away [1]) or as
> > another service that wraps nova or imitates nova. I'm personally not comfortable
> > saying those extras are ever part of the OpenStack APIs.
> >
> > > We have basically three options.
> > > 
> > > 1. Tell deployers who are trying to do the right thing for their immediate
> > >    users that they can't use the trademark.
> > > 
> > > 2. Flag the related tests or remove them from the DefCore enforcement
> > >    suite entirely.
> > > 
> > > 3. Be flexible about giving consumers of Tempest time to meet the
> > >    new requirement by providing a way to disable the checks.
> > > 
> > > Option 1 goes against our own backwards compatibility policies.
> > 
> > I don't think backwards compatibility policies really apply to what we define
> > as the set of tests that, as a community, we are saying a vendor has to pass to
> > say they're OpenStack. From my perspective, as a community we should take a hard
> > stance on this and say that to be considered an interoperable cloud (and to get
> > the trademark) you have to actually have an interoperable product. We slowly
> > ratchet up the requirements every 6 months; there isn't any implied backwards
> > compatibility in doing that. You passed in the past but not in the newer,
> > stricter guidelines.
> > 
> > Also, even if I did think it applied, we're not 

Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-14 Thread Matthew Treinish
On Tue, Jun 14, 2016 at 12:19:54PM -0700, Chris Hoge wrote:
> 
> > On Jun 14, 2016, at 11:21 AM, Matthew Treinish <mtrein...@kortar.org> wrote:
> > 
> > On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> >> Last year, in response to Nova micro-versioning and extension updates[1],
> >> the QA team added strict API schema checking to Tempest to ensure that
> >> no additional properties were added to Nova API responses[2][3]. In the
> >> last year, at least three vendors participating in the OpenStack Powered
> >> Trademark program have been impacted by this change, two of which
> >> reported this to the DefCore Working Group mailing list earlier this 
> >> year[4].
> >> 
> >> The DefCore Working Group determines guidelines for the OpenStack Powered
> >> program, which includes capabilities with associated functional tests
> >> from Tempest that must be passed, and designated sections with associated
> >> upstream code [5][6]. In determining these guidelines, the working group
> >> attempts to balance the future direction of development with lagging
> >> indicators of deployments and user adoption.
> >> 
> >> After a tremendous amount of consideration, I believe that the DefCore
> >> Working Group needs to implement a temporary waiver for the strict API
> >> checking requirements that were introduced last year, to give downstream
> >> deployers more time to catch up with the strict micro-versioning
> >> requirements determined by the Nova/Compute team and enforced by the
> >> Tempest/QA team.
> > 
> > > I'm very much opposed to this being done. If we're actually concerned with
> > > interoperability and with verifying that things behave in the same manner
> > > between multiple clouds, then doing this would be a big step backwards. The
> > > fundamental disconnect here is that the vendors who have implemented
> > > out-of-band extensions or were taking advantage of previously available
> > > places to inject extra attributes believe that doing so means they're
> > > interoperable, which is quite far from reality. **The API is not a place for
> > > vendor differentiation.**
> 
> Yes, it’s bad practice, but it’s also a reality, and I honestly believe that
> vendors have received the message and are working on changing.

They might be working on this, but this change was coming for quite some
time; it shouldn't be a surprise to anyone at this point. I mean seriously, it's
been in tempest for 1 year, and it took 6 months to land. Also, let's say we set
a hard deadline on this new option to disable the enforcement and enforce it.
Then, when we implement a similar change on keystone, are we gonna have to do the
same thing again when vendors who have custom things running there fail?
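
To make the check being argued about concrete, here is a minimal sketch of what
"no additional properties" enforcement amounts to, and what an option to
disable it would change. This is a simplified stand-in, not Tempest's code: the
schema, the validate() helper, and its strict flag are all hypothetical and only
model the idea of rejecting response keys the schema does not declare.

```python
# Hypothetical, heavily trimmed response schema for a server record.
SERVER_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
    },
    "required": ["id", "name"],
    # The strictness under discussion: undeclared keys are not allowed.
    "additionalProperties": False,
}


def unexpected_properties(response, schema):
    """Return the response keys that the schema does not declare, sorted."""
    allowed = set(schema["properties"])
    return sorted(set(response) - allowed)


def validate(response, schema, strict=True):
    """Check required keys, and reject undeclared ones when strict is set.

    Returns the list of undeclared keys that were tolerated (empty when
    the response matches the schema exactly).
    """
    missing = [key for key in schema.get("required", []) if key not in response]
    if missing:
        raise ValueError("missing required properties: %s" % missing)

    extra = unexpected_properties(response, schema)
    if strict and not schema.get("additionalProperties", True) and extra:
        raise ValueError("unexpected properties: %s" % extra)
    return extra


if __name__ == "__main__":
    upstream = {"id": "1", "name": "vm"}
    vendor = dict(upstream, **{"vendor:rack": "r1"})

    print(validate(upstream, SERVER_SCHEMA))
    print(validate(vendor, SERVER_SCHEMA, strict=False))
    try:
        validate(vendor, SERVER_SCHEMA)
    except ValueError as exc:
        print("rejected: %s" % exc)
```

With strict left on, the vendor response is rejected purely because of the extra
key; turning it off lets the same response through, which is the behavior a
waiver would permit.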

> 
> > > As a user of several clouds myself I can say that having random gorp in a
> > > response makes it much more difficult to use my code against multiple clouds.
> > > I have to determine which properties being returned are specific to that
> > > vendor's cloud, and if I actually need to depend on them for anything it makes
> > > whatever code I'm writing incompatible with any other cloud (unless I
> > > special case that block for each cloud). Sean Dague wrote a good post where a
> > > lot of this was covered a year ago when microversions were starting to pick up
> > > steam:
> > 
> > https://dague.net/2015/06/05/the-nova-api-in-kilo-and-beyond-2 
> > 
> > > I'd recommend giving it a read, he explains the user-first perspective more
> > > clearly there.
> > > 
> > > I believe Tempest in this case is doing the right thing from an
> > > interoperability perspective and ensuring that the API is actually the API,
> > > not an API with extra bits a vendor decided to add.
> 
> A few points on this, though. Right now, Nova is the only API that is
> enforcing this, and the clients. While this may change in the
> future, I don’t think it accurately represents the reality of what’s
> happening in the ecosystem.

This in itself doesn't make a difference. There is a disparity in the level of
testing across all the projects. Nova happens to be further along with regard
to API stability and testing compared to a lot of projects, so it's not
really a surprise that it's the first project where this has come up. It's only a
matter of time before other projects follow nova's example and implement similar
enforcement.

> 
> As mentioned before, we also need to balance the lagging nature of
> DefCore as an interoperabili
