Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-30 Thread John Garbutt
Hi,

Basically we should kill quota classes.

It required out-of-tree stuff that was never implemented, AFAIK.

When I checked with Kevin about this, my memory is that the idea was an
out-of-tree authorization plugin would populate context.quota_class with
something like "i_have_big_credit_limit" or "i_have_prepaid_loads_limit",
falling back to the default if blank. I don't believe anyone ever used
that system. It gives you groups of pre-defined quota limits, rather than
per-project overrides.
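
For anyone who has not looked at it, this is roughly what that "defaults
template" looks like in use (a hedged sketch using python-novaclient; the
exact argument names may differ, so check the client help):

  # show the limits stored under the "default" quota class
  nova quota-class-show default

  # change the defaults for every project that has no
  # project-specific override (PUT /os-quota-class-sets/default)
  nova quota-class-update default --instances 50 --cores 200

Any quota class name other than "default" is accepted but, as Chris points
out in the thread below, never actually used anywhere.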

Either way, it should die, and now it's keystone's problem.
I subscribe to the idea that downstream operational scripting is the
currently preferred solution.

Thanks,
johnthetubaguy

PS
Sorry, I have been busy on SKA architecture for the last month or so;
slowly getting back up to speed.
On Fri, 26 Oct 2018 at 14:55, Jay Pipes  wrote:
>
> On 10/25/2018 02:44 PM, melanie witt wrote:
> > On Thu, 25 Oct 2018 14:00:08 -0400, Jay Pipes wrote:
> >> On 10/25/2018 01:38 PM, Chris Friesen wrote:
> >>> On 10/24/2018 9:10 AM, Jay Pipes wrote:
>  Nova's API has the ability to create "quota classes", which are
>  basically limits for a set of resource types. There is something
>  called the "default quota class" which corresponds to the limits in
>  the CONF.quota section. Quota classes are basically templates of
>  limits to be applied if the calling project doesn't have any stored
>  project-specific limits.
> 
>  Has anyone ever created a quota class that is different from "default"?
> >>>
> >>> The Compute API specifically says:
> >>>
> >>> "Only ‘default’ quota class is valid and used to set the default quotas,
> >>> all other quota class would not be used anywhere."
> >>>
> >>> What this API does provide is the ability to set new default quotas for
> >>> *all* projects at once rather than individually specifying new defaults
> >>> for each project.
> >>
> >> It's a "defaults template", yes.
> >>
> >> The alternative is, you know, to just set the default values in
> >> CONF.quota, which is what I said above. Or, if you want project X to
> >> have different quota limits from those CONF-driven defaults, then set
> >> the quotas for the project to some different values via the
> >> os-quota-sets API (or better yet, just use Keystone's /limits API when
> >> we write the "limits driver" into Nova). The issue is that the
> >> os-quota-classes API is currently blocking *me* writing that "limits
> >> driver" in Nova because I don't want to port nova-specific functionality
> >> (like quota classes) to a limits driver when the Keystone /limits
> >> endpoint doesn't have that functionality and nobody I know of has ever
> >> used it.
> >
> > When you say it's blocking you from writing the "limits driver" in nova,
> > are you saying you're picking up John's unified limits spec [1]? It's
> > been in -W mode and hasn't been updated in 4 weeks. In the spec,
> > migration from quota classes => registered limits and deprecation of the
> > existing quota API and quota classes is described.
> >
> > Cheers,
> > -melanie
> >
> > [1] https://review.openstack.org/602201
>
> Actually, I wasn't familiar with John's spec. I'll review it today.
>
> I was referring to my own attempts to clean up the quota system and
> remove all the limits-related methods from the QuotaDriver class...
>
> Best,
> -jay
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-10-02 Thread John Garbutt
Back to the deprecation for a moment...

My plan was to tell folks to use Traits to influence placement
decisions, rather than capabilities.

We probably can't remove the feature until we have deploy templates,
but it seems wrong not to warn our users away from capabilities, when
80% of the use cases can be moved to traits today and get better
performance, etc.

Thoughts?

On Mon, 1 Oct 2018 at 22:42, Eric Fried  wrote:
> I do want to zoom out a bit and point out that we're talking about
> implementing a new framework of substantial size and impact when the
> original proposal - using the trait for both - would just work out of
> the box today with no changes in either API. Is it really worth it?

Yeah, I think the simpler solution deals with a lot of the cases right now.

Personally, I see using traits as about hiding complexity from the end
user (not the operator). End users are requesting a host with a given
capability (via flavor, image or otherwise), and they don't really
care if the operator has statically configured it, or Ironic
dynamically configures it. The operator still statically configures which
deploy templates are possible on which nodes (last time I read the
spec).

For the common cases, I see us adding standard traits. They would also
be useful to pick between nodes that are statically configured one way
or the other. (Although MarkG keeps telling me (in a British way) that
is probably rubbish, and he might be right...)

I am +1 on Jay's idea for the more complicated cases (a bit like what
jroll was saying). For me, the user gets bad interop and has no
visibility into what the crazy custom trait means (i.e. the LAYOUT_Y
in efried's example). A validated blob in Glare doesn't seem terrible
for that special case. But generally that seems like quite a different
use case, and it's tempting to focus on something well-typed that is
specific to disk configuration. Although, it is also tempting not to
block the simpler solution while we work out how people use this for real.

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-10-01 Thread John Garbutt
On Fri, 28 Sep 2018 at 00:46, Jay Pipes  wrote:

> On 09/27/2018 06:23 PM, Matt Riedemann wrote:
> > On 9/27/2018 3:02 PM, Jay Pipes wrote:
> >> A great example of this would be the proposed "deploy template" from
> >> [2]. This is nothing more than abusing the placement traits API in
> >> order to allow passthrough of instance configuration data from the
> >> nova flavor extra spec directly into the nodes.instance_info field in
> >> the Ironic database. It's a hack that is abusing the entire concept of
> >> the placement traits concept, IMHO.
> >>
> >> We should have a way *in Nova* of allowing instance configuration
> >> key/value information to be passed through to the virt driver's
> >> spawn() method, much the same way we provide for user_data that gets
> >> exposed after boot to the guest instance via configdrive or the
> >> metadata service API. What this deploy template thing is is just a
> >> hack to get around the fact that nova doesn't have a basic way of
> >> passing through some collated instance configuration key/value
> >> information, which is a darn shame and I'm really kind of annoyed with
> >> myself for not noticing this sooner. :(
> >
> > We talked about this in Dublin through right? We said a good thing to do
> > would be to have some kind of template/profile/config/whatever stored
> > off in glare where schema could be registered on that thing, and then
> > you pass a handle (ID reference) to that to nova when creating the
> > (baremetal) server, nova pulls it down from glare and hands it off to
> > the virt driver. It's just that no one is doing that work.
>
> No, nobody is doing that work.
>
> I will if need be if it means not hacking the placement API to serve
> this purpose (for which it wasn't intended).
>

Going back to the point Mark Goddard made, there are two things here:

1) Picking the correct resource provider
2) Telling Ironic to transform the picked node in some way

Today we allow the use of Capabilities for both.

I am suggesting we move to using Traits only for (1), leaving (2) in place
for now, while we decide what to do (i.e. future of "deploy template"
concept).
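
To make that split concrete, a hedged sketch of the flavor extra specs
involved (the trait syntax is from the Queens flavor-traits work, the
capabilities syntax is the existing ComputeCapabilitiesFilter one; the key
names are as I remember them, so double check against the docs):

  # (1) picking the resource provider: a required trait, matched by
  #     placement
  openstack flavor set bm.gold --property trait:CUSTOM_RAID5=required

  # (2) telling Ironic what to do to the node: still a capability,
  #     passed down to ironic, untouched by this proposal
  openstack flavor set bm.gold --property capabilities:boot_mode=uefi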

It feels like Ironic's plan to define the "deploy templates" in Ironic
should replace the dependency on Glare for this use case, largely because
the definition of the deploy template (in my mind) is very heavily related
to inspector and driver properties, etc. Mark is looking at moving that
forward at the moment.

Thanks,
John
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-25 Thread John Garbutt
On Thu, 20 Sep 2018 at 16:02, Matt Riedemann  wrote:

> On 9/20/2018 4:16 AM, John Garbutt wrote:
> > Following on from the PTG discussions, I wanted to bring everyone's
> > attention to Nova's plans to deprecate ComputeCapabilitiesFilter,
> > including most of the integration with Ironic Capabilities.
> >
> > To be specific, this is my proposal in code form:
> > https://review.openstack.org/#/c/603102/
> >
> > Once the code we propose to deprecate is removed we will stop using
> > capabilities pushed up from Ironic for 'scheduling', but we would still
> > pass capabilities request in the flavor down to Ironic (until we get
> > some standard traits and/or deploy templates sorted for things like
> UEFI).
> >
> > Functionally, we believe all use cases can be replaced by using the
> > simpler placement traits (this is more efficient than post placement
> > filtering done using capabilities):
> >
> https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/ironic-driver-traits.html
> >
> > Please note the recent addition of forbidden traits that helps improve
> > the usefulness of the above approach:
> >
> https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/placement-forbidden-traits.html
> >
> > For example, a flavor request for GPUs >= 2 could be replaced by a
> > custom trait that reports if a given Ironic node has
> > CUSTOM_MORE_THAN_2_GPUS. That is a bad example (longer term we don't
> > want to use traits for this, but that is a discussion for another day)
> > but it is the example that keeps being raised in discussions on this
> topic.
> >
> > The main reason for reaching out in this email is to ask if anyone has
> > needs that the ResourceClass and Traits scheme does not currently
> > address, or can think of a problem with a transition to the newer
> approach.
>
> I left a few comments in the change, but I'm assuming as part of the
> deprecation we'd remove the filter from the default enabled_filters list
> so new installs don't automatically get warnings during scheduling?
>

+1
Good point, we totally need to do that.


> Another thing is about existing flavors configured for these
> capabilities-scoped specs. Are you saying during the deprecation we'd
> continue to use those even if the filter is disabled? In the review I
> had suggested that we add a pre-upgrade check which inspects the flavors
> and if any of these are found, we report a warning meaning those flavors
> need to be updated to use traits rather than capabilities. Would that be
> reasonable?
>

I like the idea of a warning, but there are features that have not yet
moved to traits:
https://specs.openstack.org/openstack/ironic-specs/specs/juno-implemented/uefi-boot-for-ironic.html

There is a more general plan that will help, but it's not quite ready yet:
https://review.openstack.org/#/c/504952/

As such, I think we can't yet pull the plug on flavors including
capabilities and passing them to Ironic, but (after a cycle of deprecation)
I think we can now stop pushing capabilities from Ironic into Nova and
using them for placement.

Thanks,
John
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [all] Consistent policy names

2018-09-20 Thread John Garbutt
tl;dr
+1 consistent names
I would make the names mirror the API
... because the Operator setting them knows the API, not the code
Ignore the crazy names in Nova, I certainly hate them


Lance Bragstad  wrote:
> I'm curious if anyone has context on the "os-" part of the format?

My memory of the Nova policy mess...
* Nova's policy rules traditionally followed the patterns of the code
** Yes, horrible, but it happened.
* The code used to have the OpenStack API and the EC2 API, hence the "os"
* API used to expand with extensions, so the policy name is often based on
extensions
** note most of the extension code has now gone, including lots of related
policies
* Policy in code was focused on getting us to a place where we could rename
policy
** Whoop whoop by the way, it feels like we are really close to something
sensible now!

Lance Bragstad  wrote:

> Thoughts on using create, list, update, and delete as opposed to post,
> get, put, patch, and delete in the naming convention?
>

I could go either way as I think about "list servers" in the API.
But my preference is for the URL stub and POST, GET, etc.

 On Sun, Sep 16, 2018 at 9:47 PM Lance Bragstad  wrote:

> If we consider dropping "os", should we entertain dropping "api", too? Do
>> we have a good reason to keep "api"?
>> I wouldn't be opposed to simple service types (e.g "compute" or
>> "loadbalancer").
>>
>
+1
The API is known as "compute" in api-ref, so the policy should be for
"compute", etc.

From: Lance Bragstad 
> The topic of having consistent policy names has popped up a few times
this week.

I would love to have this nailed down before we go through all the policy
rules again. In my head, I hope in Nova we can go through each policy rule
and do the following:

* move to the new consistent policy name (sketch below), deprecating the
existing name
* hardcode the scope check to project, system or user
** (user, yes... keypairs, yuck, but it's how they work)
** deprecate in-rule scope checks, which are largely bogus in Nova anyway
* make the read/write/admin distinction
** therefore adding the "noop" role, among other things
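
For the naming part, something like this is what I have in my head, in
policy.yaml terms (a hedged sketch; the old name is one of today's real
Nova rules, the new name is just illustrative of the api-ref style until
we agree the exact convention):

  # today: code-derived, extension-based name
  "os_compute_api:os-flavor-manage:create": "rule:admin_api"

  # proposed: service type + resource + action, mirroring the api-ref
  "compute:flavors:create": "rule:admin_api"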

Thanks,
John
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-20 Thread John Garbutt
Hi,

Following on from the PTG discussions, I wanted to bring everyone's
attention to Nova's plans to deprecate ComputeCapabilitiesFilter, including
most of the integration with Ironic Capabilities.

To be specific, this is my proposal in code form:
https://review.openstack.org/#/c/603102/

Once the code we propose to deprecate is removed, we will stop using
capabilities pushed up from Ironic for 'scheduling', but we would still
pass capabilities requested in the flavor down to Ironic (until we get some
standard traits and/or deploy templates sorted for things like UEFI).

Functionally, we believe all use cases can be replaced by using the simpler
placement traits (this is more efficient than the post-placement filtering
done using capabilities):
https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/ironic-driver-traits.html

Please note the recent addition of forbidden traits that helps improve the
usefulness of the above approach:
https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/placement-forbidden-traits.html

For example, a flavor request for GPUs >= 2 could be replaced by a custom
trait (say CUSTOM_MORE_THAN_2_GPUS) reported on any Ironic node that has
more than two GPUs. That is a bad example (longer term we don't want to
use traits for this, but that is a discussion for another day), but it is
the example that keeps being raised in discussions on this topic.
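
In flavor extra spec terms that looks something like this (a hedged
sketch; the trait name is made up, and the required/forbidden values are
per the flavor traits support and the forbidden traits spec linked above):

  # land only on nodes tagged with the custom trait
  openstack flavor set bm.multi-gpu \
      --property trait:CUSTOM_MORE_THAN_2_GPUS=required

  # forbidden traits give you the inverse, e.g. keep everything else
  # off those nodes
  openstack flavor set bm.general \
      --property trait:CUSTOM_MORE_THAN_2_GPUS=forbidden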

The main reason for reaching out in this email is to ask if anyone has
needs that the ResourceClass and Traits scheme does not currently address,
or can think of a problem with a transition to the newer approach.

Many thanks,
John Garbutt

IRC: johnthetubaguy
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Review runways this cycle

2018-03-22 Thread John Garbutt
Hi

So I am really excited for us to try runways out, ASAP.

On 20 March 2018 at 23:44, melanie witt  wrote:

> We were thinking of starting the runways process after the spec review
> freeze (which is April 19) so that reviewers won't be split between spec
> reviews and reviews of work in runways.
>

I think spec reviews, blueprint reviews, and code review topics could all
get a runway slot.

What if we had these queues:
Backlog Queue, Blueprint Runway, Approved Queue, Code Runway

All currently approved blueprints would sit in the Approved queue.
As described, you leave the runway and go back in the queue if progress
stalls.

Basically abandon the spec freeze. Control with runways instead.

> The process and instructions are explained in detail on this etherpad,
> which will also serve as the place we queue and track blueprints for
> runways:
>
> https://etherpad.openstack.org/p/nova-runways-rocky


I like its simplicity.
If progress stalls you are booted out of a runway slot back to the queue.

Having said all that, I am basically happy with anything that gets us
trying this out ASAP.

Thanks,
johnthetubaguy
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] PTL Election Season

2018-01-23 Thread John Garbutt
On 23 January 2018 at 04:04, Ed Leafe  wrote:

> On Jan 22, 2018, at 5:09 PM, Matt Riedemann  wrote:
> > To anyone that cares, I don't plan on running for Nova PTL again for the
> Rocky release. Queens was my fourth tour and it's definitely time for
> someone else to get the opportunity to lead here. I don't plan on going
> anywhere and I'll be here to help with any transition needed assuming
> someone else (or a couple of people hopefully) will run in the election.
> It's been a great experience and I thank everyone that has had to put up
> with me and my obsessive paperwork and process disorder in the meantime.
>
> I still don't understand how anyone could do what you have done over these
> past two years and not a) had a stress-induced heart attack or b) gotten
> divorced.
>

++
Great work and amazing sticking power.
I know I hit a brick wall after two seasons.

johnthetubaguy
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [ironic] Remove in-tree policy and config?

2018-01-22 Thread John Garbutt
Hi,

While I was looking at the traits work, I noticed we still have policy and
config in tree for ironic and ironic inspector:

http://git.openstack.org/cgit/openstack/ironic/tree/etc/ironic/policy.json.sample
http://git.openstack.org/cgit/openstack/ironic/tree/etc/ironic/ironic.conf.sample
http://git.openstack.org/cgit/openstack/ironic/tree/etc/ironic/policy.json

And in a similar way:
http://git.openstack.org/cgit/openstack/ironic-inspector/tree/policy.yaml.sample
http://git.openstack.org/cgit/openstack/ironic-inspector/tree/example.conf

There is an argument that says we shouldn't force operators to build a full
environment to generate these, but this has been somewhat superseded by us
having good docs:

https://docs.openstack.org/ironic/latest/configuration/sample-config.html
https://docs.openstack.org/ironic/latest/configuration/sample-policy.html
https://docs.openstack.org/ironic-inspector/latest/configuration/sample-config.html
https://docs.openstack.org/ironic-inspector/latest/configuration/sample-policy.html
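
For anyone who does want to generate them locally, it should be roughly
this (a hedged sketch, assuming the usual tox targets and output paths in
the ironic tree; the generator config files under tools/ are the source of
truth):

  git clone https://git.openstack.org/openstack/ironic && cd ironic
  tox -e genconfig   # writes etc/ironic/ironic.conf.sample
  tox -e genpolicy   # writes etc/ironic/policy.yaml.sample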

It could look something like this (but with the tests working...):
https://review.openstack.org/#/c/536349

What do you all think?

Thanks,
johnthetubaguy
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][ironic] Concerns over rigid resource class-only ironic scheduling

2017-10-19 Thread John Garbutt
On 19 October 2017 at 15:38, Jay Pipes  wrote:

> On 10/16/2017 05:31 AM, Nisha Agarwal wrote:
>
>> Hi Matt,
>>
>> As i understand John's spec https://review.openstack.org/#/c/507052/ <
>> https://review.openstack.org/#/c/507052/>, it is actually a replacement
>> for capabilities(qualitative only) for ironic. It doesnt cover the
>> quantitative capabilities as 'traits' are meant only for qualitative
>> capabilities. Quantitative capabilities are covered by resource classes in
>> Nova. We have few (one or two) quantitative capabilities already supported
>> in ironic.
>>
>
> Hi Nisha,
>
> This may be a case of mixed terminology. We do not refer to anything
> quantitative as a "capability". Rather, we use the term "resource class"
> (or sometimes just "resource") to represent quantitative things that may be
> consumed by the instance.
>
> Traits, on the other hand, are qualitative. They represent a binary on/off
> capability that the compute host (or baremetal node in the case of Ironic)
> exposes.
>
> There's no limit on the number of traits that may be associated with a
> particular Ironic baremetal node. However, for Ironic baremetal nodes, if
> the node.resource_class attribute is set, the Nova Ironic virt driver will
> create a single inventory record for the node containing a quantity of 1
> and a resource class equal to whatever is in the node.resource_class
> attribute. This resource class is auto-created by Nova as a custom resource
> class.
>

Just to follow up on this one...

I hope my traits spec will replace the need for the non-exact filters.

Consider two flavors, Gold and Gold_Plus. Let's say Gold_Plus gives you a
slightly newer CPU, or something.

Consider this setup:

* both GOLD and GOLD_PLUS ironic nodes have Resource Class: CUSTOM_GOLD
* but some of those nodes have the trait GOLD_REGULAR and some have GOLD_PLUS

Now you can have the flavors:

* GOLD flavor requests resources:CUSTOM_GOLD=1
* GOLD_PLUS flavor also has resources:CUSTOM_GOLD=1 but also
trait:GOLD_PLUS:requires

Now eventually we could modify the GOLD flavor to say:

* resources:CUSTOM_GOLD=1 and trait:GOLD_REGULAR:prefer
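
In CLI terms, a hedged sketch of the above (the exact extra spec syntax for
traits is still being settled in the spec, so treat
trait:CUSTOM_GOLD_PLUS=required as illustrative, and remember custom traits
need the CUSTOM_ prefix):

  # the ironic nodes have their resource_class set so that nova
  # exposes them as CUSTOM_GOLD in placement
  openstack flavor set gold --property resources:CUSTOM_GOLD=1
  openstack flavor set gold_plus \
      --property resources:CUSTOM_GOLD=1 \
      --property trait:CUSTOM_GOLD_PLUS=required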

@Nisha I think that should largely allow you to construct the same behavior
you have today, or am I totally missing what you are wanting to do?

Thanks,
John
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] ironic and traits

2017-10-16 Thread John Garbutt
On 16 October 2017 at 17:55, Eric Fried  wrote:

> * Adding references to the specs: ironic side [1]; nova side [2] (which
> just merged).
>
> * Since Jay is on vacation, I'll tentatively note his vote by proxy [3]
> that ironic should be the source of truth - i.e. option (a).  I think
> the upshot is that it's easier for Ironic to track and resolve conflicts
> than for the virt driver to do so.
>

As I see it, all of these options have Ironic as the source of truth for
Nova.

Driver here is about the Ironic drivers, not the Nova virt driver.

> > The downside is obvious - with a lot of deploy templates
> > available it can be a lot of manual work.
>
> * How does option (b) help with this?
>

The operator defines the configuration templates. The driver could then
report traits for any configuration templates that it knows a given node
can support.

But I suspect a node would have to boot up an image to check if a given set
of RAID or BIOS parameters is valid. Is that correct? I am sure there are
ways to cache things that could help somewhat.


> * I suggested a way to maintain the "source" of a trait (operator,
> inspector, etc.) [4] which would help with resolving conflicts.
> However, I agree it would be better to avoid this extra complexity if
> possible.
>

That is basically (b.2).


>
> * This is slightly off topic, but it's related and will eventually need
> to be considered: How are you going to know whether a
> UEFI-capable-but-not-enabled node should have its UEFI mode turned on?
> Are you going to parse the traits specified in the flavor?  (This might
> work for Ironic, but will be tough in the general case.)
>
> [1] https://review.openstack.org/504531


Also the other ironic spec: https://review.openstack.org/#/c/504952


> [2] https://review.openstack.org/507052
> [3]
> https://review.openstack.org/#/c/507052/4/specs/queens/appro
> ved/ironic-traits.rst@88
> [4]
> https://review.openstack.org/#/c/504531/4/specs/approved/nod
> e-traits.rst@196
>
> On 10/16/2017 11:24 AM, Dmitry Tantsur wrote:
> > Hi all,
> >
> > I promised John to dump my thoughts on traits to the ML, so here we go :)
> >
> > I see two roles of traits (or kinds of traits) for bare metal:
> > 1. traits that say what the node can do already (e.g. "the node is
> > doing UEFI boot")
> > 2. traits that say what the node can be *configured* to do (e.g. "the
> node can
> > boot in UEFI mode")
> >
> > This seems confusing, but it's actually very useful. Say, I have a
> flavor that
> > requests UEFI boot via a trait. It will match both the nodes that are
> already in
> > UEFI mode, as well as nodes that can be put in UEFI mode.
> >
> > This idea goes further with deploy templates (new concept we've been
> thinking
> > about). A flavor can request something like CUSTOM_RAID_5, and it will
> match the
> > nodes that already have RAID 5, or, more interestingly, the nodes on
> which we
> > can build RAID 5 before deployment. The UEFI example above can be
> treated in a
> > similar way.
> >
> > This ends up with two sources of knowledge about traits in ironic:
> > 1. Operators setting something they know about hardware ("this node is
> in UEFI
> > mode"),
> > 2. Ironic drivers reporting something they
> >   2.1. know about hardware ("this node is in UEFI mode" - again)
> >   2.2. can do about hardware ("I can put this node in UEFI mode")
> >
> > For case #1 we are planning on a new CRUD API to set/unset traits for a
> node.
> > Case #2 is more interesting. We have two options, I think:
> >
> > a) Operators still set traits on nodes, drivers are simply validating
> them. E.g.
> > an operators sets CUSTOM_RAID_5, and the node's RAID interface checks if
> it is
> > possible to do. The downside is obvious - with a lot of deploy templates
> > available it can be a lot of manual work.
> >
> > b) Drivers report the traits, and they get somehow added to the traits
> provided
> > by an operator. Technically, there are sub-cases again:
> >   b.1) The new traits API returns a union of operator-provided and
> > driver-provided traits
> >   b.2) The new traits API returns only operator-provided traits;
> driver-provided
> > traits are returned e.g. via a new field (node.driver_traits). Then nova
> will
> > have to merge the lists itself.
>

As an alternative, we could enable a configuration template per Resource
Class. That way it's explicit, but you don't have to set it on every node?

I think we would then need a version of (b.1) to report that extra trait up
to Nova, based on the given Resource Class.


> > My personal favorite is the last option: I'd like a clear distinction
> between
> > different "sources" of traits, but I'd also like to reduce manual work
> for
> > operators.
>

I am all for making operators' lives easier, but personally I lean
towards explicitly enabling things, hence my current preference for (a).

I would be tempted to add (b.2) as a second step, after we get (a) working
and tested.

> A valid 

Re: [openstack-dev] [doc][ptls][all] Documentation publishing future

2017-05-26 Thread John Garbutt
On Mon, 22 May 2017 at 10:43, Alexandra Settle  wrote:
> We could also view option 1 as the longer-term goal,
> and option 2 as an incremental step toward it

+1 doing option 2 then option 1.
It just seems a good way to split up the work.

Thanks,
johnthetubaguy
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [keystone][nova][cinder][glance][neutron][horizon][policy] defining admin-ness

2017-05-26 Thread John Garbutt
+1 on not forcing Operators to transition to something new twice, even if
we did go for option 3.

Do we have an agreed non-disruptive upgrade path mapped out yet (for any
of the options)? We spoke about fallback rules that pass but emit a warning,
to give us a smoother transition. I think that's my main objection to the
existing patches: having to tell all admins to get their token for a
different project, and give them roles in that project, all before being
able to upgrade.

Thanks,
johnthetubaguy

On Fri, 26 May 2017 at 08:09, Belmiro Moreira <
moreira.belmiro.email.li...@gmail.com> wrote:

> Hi,
> thanks for bringing this into discussion in the Operators list.
>
> Option 1 and 2 and not complementary but complety different.
> So, considering "Option 2" and the goal to target it for Queens I would
> prefer not going into a migration path in
> Pike and then again in Queens.
>
> Belmiro
>
> On Fri, May 26, 2017 at 2:52 AM, joehuang  wrote:
>
>> I think a option 2 is better.
>>
>> Best Regards
>> Chaoyi Huang (joehuang)
>> --
>> *From:* Lance Bragstad [lbrags...@gmail.com]
>> *Sent:* 25 May 2017 3:47
>> *To:* OpenStack Development Mailing List (not for usage questions);
>> openstack-operat...@lists.openstack.org
>> *Subject:* Re: [openstack-dev]
>> [keystone][nova][cinder][glance][neutron][horizon][policy] defining
>> admin-ness
>>
>> I'd like to fill in a little more context here. I see three options with
>> the current two proposals.
>>
>> *Option 1*
>>
>> Use a special admin project to denote elevated privileges. For those
>> unfamiliar with the approach, it would rely on every deployment having an
>> "admin" project defined in configuration [0].
>>
>> *How it works:*
>>
>> Role assignments on this project represent global scope which is denoted
>> by a boolean attribute in the token response. A user with an 'admin' role
>> assignment on this project is equivalent to the global or cloud
>> administrator. Ideally, if a user has a 'reader' role assignment on the
>> admin project, they could have access to list everything within the
>> deployment, pending all the proper changes are made across the various
>> services. The workflow requires a special project for any sort of elevated
>> privilege.
>>
>> Pros:
>> - Almost all the work is done to make keystone understand the admin
>> project, there are already several patches in review to other projects to
>> consume this
>> - Operators can create roles and assign them to the admin_project as
>> needed after the upgrade to give proper global scope to their users
>>
>> Cons:
>> - All global assignments are linked back to a single project
>> - Describing the flow is confusing because in order to give someone
>> global access you have to give them a role assignment on a very specific
>> project, which seems like an anti-pattern
>> - We currently don't allow some things to exist in the global sense (i.e.
>> I can't launch instances without tenancy), the admin project could own
>> resources
>> - What happens if the admin project disappears?
>> - Tooling or scripts will be written around the admin project, instead of
>> treating all projects equally
>>
>> *Option 2*
>>
>> Implement global role assignments in keystone.
>>
>> *How it works:*
>>
>> Role assignments in keystone can be scoped to global context. Users can
>> then ask for a globally scoped token
>>
>> Pros:
>> - This approach represents a more accurate long term vision for role
>> assignments (at least how we understand it today)
>> - Operators can create global roles and assign them as needed after the
>> upgrade to give proper global scope to their users
>> - It's easier to explain global scope using global role assignments
>> instead of a special project
>> - token.is_global = True and token.role = 'reader' is easier to
>> understand than token.is_admin_project = True and token.role = 'reader'
>> - A global token can't be associated to a project, making it harder for
>> operations that require a project to consume a global token (i.e. I
>> shouldn't be able to launch an instance with a globally scoped token)
>>
>> Cons:
>> - We need to start from scratch implementing global scope in keystone,
>> steps for this are detailed in the spec
>>
>> *Option 3*
>>
>> We do option one and then follow it up with option two.
>>
>> *How it works:*
>>
>> We implement option one and continue solving the admin-ness issues in
>> Pike by helping projects consume and enforce it. We then target the
>> implementation of global roles for Queens.
>>
>> Pros:
>> - If we make the interface in oslo.context for global roles consistent,
>> then consuming projects shouldn't know the difference between using the
>> admin_project or a global role assignment
>>
>> Cons:
>> - It's more work and we're already strapped for resources
>> - We've told operators that the admin_project is a thing but after Queens
>> they will be able to do real global role assignments, so they should now
>> 

[openstack-dev] [forum] Writing applications for the VM and Baremetal Platform

2017-05-19 Thread John Garbutt
A quick summary of what happened in the writing applications for the
VM and Baremetal forum session. The etherpad is available here:
https://etherpad.openstack.org/p/BOS-forum-using-vm-and-baremetal

We had a good number of API users and API developers talking together
about the issues facing API users. It would be nice to have involved a
more diverse set of API users, but we have a reasonable starting
place.

There was general agreement on the need for API keys for applications
to access OpenStack APIs, rather than forcing the use of passwords,
etc. Plan A was to have a voting exercise on the most important
problems facing writing applications for the VM and Baremetal
platform. This was abandoned because there was a clear winner in
Keystone API keys. For example, LDAP passwords give you access to more
things than OpenStack, so you probably don't want to hand those out.
Currently service configuration files have lots of service user
passwords in them; API keys for each node feel like a much better
solution, etc, etc.

Sadly the people previously working on this feature are no longer
working on OpenStack. Lance has been asking for help in this email
thread, where the conversation is now continuing:
http://lists.openstack.org/pipermail/openstack-dev/2017-May/116596.html

We agreed that a clear next step, once API keys are implemented, is
working out how to limit the access granted to a particular API key.
Discussion around this was deferred to the forum session called
"Cloud-Aware Application support"; more details here:
https://etherpad.openstack.org/p/pike-forum-cloud-applications

Many thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [forum] [vm] Action required to help improve Ops <-> Dev feedback loops

2017-05-19 Thread John Garbutt
Hi,

On the ops list I have started a thread about the forum session summary
and the actions needed to keep things going. Please do join in over
there:
http://lists.openstack.org/pipermail/openstack-operators/2017-May/013448.html

Many thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)

2017-05-19 Thread John Garbutt
On 19 May 2017 at 10:03, Sylvain Bauza <sba...@redhat.com> wrote:
>
>
> On 19/05/2017 10:02, Sylvain Bauza wrote:
>>
>>
>> On 19/05/2017 02:55, Matt Riedemann wrote:
>>> The etherpad for this session is here [1]. The goal for this session was
>>> to inform operators and get feedback on the plan for what we're doing
>>> with moving claims from the computes to the control layer (scheduler or
>>> conductor).
>>>
>>> We mostly talked about retries, which also came up in the cells v2
>>> session that Dan Smith led [2] and will recap later.
>>>
>>> Without getting into too many details, in the cells v2 session we came
>>> to a compromise on build retries and said that we could pass hosts down
>>> to the cell so that the cell-level conductor could retry if needed (even
>>> though we expect doing claims at the top will fix the majority of
>>> reasons you'd have a reschedule in the first place).
>>>
>>
>> And during that session, we said that given cell-local conductors (when
>> there is a reschedule) can't upcall the global (for all cells)
>> schedulers, that's why we agreed to use the conductor to be calling
>> Placement API for allocations.
>>
>>
>>> During the claims in the scheduler session, a new wrinkle came up which
>>> is the hosts that the scheduler returns to the top-level conductor may
>>> be in different cells. So if we have two cells, A and B, with hosts x
>>> and y in cell A and host z in cell B, we can't send z to A for retries,
>>> or x or y to B for retries. So we need some kind of post-filter/weigher
>>> filtering such that hosts are grouped by cell and then they can be sent
>>> to the cells for retries as necessary.
>>>
>>
>> That's already proposed for reviews in
>> https://review.openstack.org/#/c/465175/
>>
>>
>>> There was also some side discussion asking if we somehow regressed
>>> pack-first strategies by using Placement in Ocata. John Garbutt and Dan
>>> Smith have the context on this (I think) so I'm hoping they can clarify
>>> if we really need to fix something in Ocata at this point, or is this
>>> more of a case of closing a loop-hole?
>>>
>>
>> The problem is that the scheduler doesn't verify the cells when trying
>> to find a destination for an instance, it's just using weights for packing.
>>
>> So, for example, say I have N hosts and 2 cells, the first weighting
>> host could be in cell1 while the second could be in cell2. Then, even if
>> the operator uses the weighers for packing, for example a RequestSpec
>> with num_instances=2 could push one instance in cell1 and the other in
>> cell2.
>>
>> From a scheduler point of view, I think we could possibly add a
>> CellWeigher that would help to pack instances within the same cell.
>> Anyway, that's not related to the claims series, so we could possibly
>> backport it for Ocata hopefully.
>>
>
> Melanie actually made a good point about the current logic based on the
> `host_subset_size` config option. If you're leaving it defaulted to 1, in
> theory all instances coming along the scheduler would get a sorted list
> of hosts by weights and only pick the first one (ie. packing all the
> instances onto the same host) which is good for that (except of course
> some user request that fits all the space of the host and where a spread
> could be better by shuffling between multiple hosts).
>
> So, while I began deprecating that option because I thought the race
> condition would be fixed by conductor claims, I think we should keep it
> for the time being until we clearly identify whether it's still necessary.
>
> All what I said earlier above remains valid tho. In a world where 2
> hosts are given as the less weighed ones, we could send instances from
> the same user request onto different cells, but that only ties the
> problem to a multi-instance boot problem, which is far less impactful.

FWIW, I think we need to keep this.

If you have *lots* of contention when picking your host, increasing
host_subset_size should help reduce that contention (and maybe help
increase the throughput). I haven't written a simulator to test it
out, but it feels like we will still need to keep the fuzzy select.
That might just be a different way to say the same thing mel was
saying, not sure.
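
i.e. something like this (a hedged sketch; in recent releases the option
lives under [filter_scheduler], older releases call it
scheduler_host_subset_size under [DEFAULT]):

  [filter_scheduler]
  # pick randomly from the best N hosts rather than always the top
  # weighted one, trading a little packing for less contention
  host_subset_size = 5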

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc][swg] Updates on the TC Vision for 2019

2017-05-18 Thread John Garbutt
On 17 May 2017 at 20:02, Dean Troyer  wrote:
> On Wed, May 17, 2017 at 1:47 PM, Doug Hellmann  wrote:
>> The timeline depends on who signed up to do the next revision. Did
>> we get someone to do that, yet, or are we still looking for a
>> volunteer?  (Note that I am not volunteering here, just asking for
>> status.)
>
> I believe John (johnthetubaguy),Chris (cdent) and I (dtroyer) are the
> ones identified to drive the next steps.  Timing-wise, having this
> wrapped up by 2nd week of June suits me great as I am planning some
> time off about then.  I see that as having a solid 'final' proposal by
> then, not necessarily having it approved.

Yep, I am hoping to help.

I am away the week before you, but some kind of tag team should be fine.

I hope to read through the feedback and start digesting it properly
tomorrow, with any luck.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc][all] Do we need a #openstack-tc IRC channel

2017-05-17 Thread John Garbutt
On 16 May 2017 at 16:08, Doug Hellmann  wrote:
> Excerpts from Sean Dague's message of 2017-05-16 10:49:54 -0400:
>> On 05/16/2017 09:38 AM, Davanum Srinivas wrote:
>> > Folks,
>> >
>> > See $TITLE :)
>> >
>> > Thanks,
>> > Dims
>>
>> I'd rather avoid #openstack-tc and just use #openstack-dev.
>> #openstack-dev is pretty low used environment (compared to like
>> #openstack-infra or #openstack-nova). I've personally been trying to
>> make it my go to way to hit up members of other teams whenever instead
>> of diving into project specific channels, because typically it means we
>> can get a broader conversation around the item in question.
>>
>> Our fragmentation of shared understanding on many issues is definitely
>> exacerbated by many project channels, and the assumption that people
>> need to watch 20+ different channels, with different context, to stay up
>> on things.
>>
>> I would love us to have the problem that too many interesting topics are
>> being discussed in #openstack-dev that we feel the need to parallelize
>> them with a different channel. But I would say we should wait until
>> that's actually a problem.
>>
>> -Sean
>
> +1, let's start with just the -dev channel and see if volume becomes
> an issue.

+1 my preference is to just start with the -dev channel, and see how we go.

+1 to all the comments about the history, and the discussion needing to be
summarised via the ML/gerrit anyway. We can link to the logs of the -dev
channel for the "raw" discussion.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Compute Node restart

2017-04-28 Thread John Garbutt
On 27 April 2017 at 06:45, Ajay Kalambur (akalambu)  wrote:
> I am just issuing a reboot command on the compute node
>
> Not a reboot -f
>
> From: Mark Mielke 
> Date: Wednesday, April 26, 2017 at 8:42 PM
> To: "OpenStack Development Mailing List (not for usage questions)"
> , Ajay Kalambur 
> Subject: Re: [openstack-dev] [nova] Compute Node restart
>
> On Apr 25, 2017 2:45 AM, "Ajay Kalambur (akalambu)" 
> wrote:
>
> I see that when a host is gracefully rebooted nova-compute receives a
> lifecycle event to shutdown the instance and it updates the database with
> the state set to SHUTOFF.
> Now when compute node reboot and libvirt brings the VM back up nova checks
> its database and issues stop api and hence shuts down the VM
>
> So even though resume guest state on reboot is set it does not work. Why
> does nova compute ignore the lifecycle event from libvirt saying bring up
> the VM and reconciles with database
>
>
> How are you defining "graceful reboot"?
>
> I am thinking that you may be defining it in a way that implies prior safe
> shutdown of the guests including setting their state to "power off", in
> which case restoring them to the prior state before reboot will correctly
> leave them "power off". I believe this feature is only intended to work with
> a shutdown of the hypervisor such as occurs when you "shutdown -r now" on
> the hypervisor without first shutting down the guests.

It sounds like the libvirt service is stopping (and stopping all the VMs)
before the nova-compute service is stopped.

I would have expected the nova-compute service to stop running before the
VMs are stopped.

Is the service ordering incorrect somehow? I suspect this is
package/distro specific; what are you running?
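
If it is a systemd based distro, a hedged sketch of the ordering I would
expect (unit names vary by distro, so treat these as illustrative):

  # e.g. /etc/systemd/system/openstack-nova-compute.service.d/order.conf
  [Unit]
  # "After=libvirtd.service" also means nova-compute is stopped
  # *before* libvirtd on shutdown/reboot
  After=libvirtd.service

And for completeness, the option being referred to above is:

  [DEFAULT]
  resume_guests_state_on_host_boot = True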

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Owners needed for approved but orphaned OSIC blueprints

2017-04-25 Thread John Garbutt
Hi,

Matt, thanks for pulling that list together.

I have context on most of those, if people would like to take them on.
I hope to review most of them and clarify where everything is at.

I am going to take a look at where we are at with policy stuff first.

Thanks,
johnthetubaguy

On 24 April 2017 at 22:45, Matt Riedemann  wrote:
>
> Hi everyone,
>
> With the recent unfortunate news about OSIC ending abruptly and several Nova 
> developers no longer being able to work on the project, I wanted to go over 
> the list of blueprints which were approved for Pike and which now no longer 
> have an owner. I am starting this thread to look for owners for each 
> blueprint. If you have questions about picking one up but you're maybe too 
> shy to ask questions about it in the mailing list, feel free to email me 
> directly or send me a private message in IRC. The list is in no particular 
> order.
>
> * 
> https://blueprints.launchpad.net/nova/+spec/live-migration-per-instance-timeout
>
> This is not started but there is a spec with the design.
>
> * https://blueprints.launchpad.net/nova/+spec/remove-discoverable-policy-rules
>
> This is not started but is a specless blueprint that is straight-forward.
>
> * https://blueprints.launchpad.net/nova/+spec/remove-nova-cert
>
> This is not started but should be pretty straightforward, and has a spec with 
> the details.
>
> * https://blueprints.launchpad.net/nova/+spec/ironic-rescue-mode
>
> The nova side of this is dependent on Ironic changes which were owned by OSIC 
> developers. So the nova blueprint is effectively blocked now.
>
> * https://blueprints.launchpad.net/nova/+spec/live-migrate-rescued-instances
>
> This is started, and was picked up from some work started by HPE in previous 
> releases. I'm not sure what state it is in however.
>
> * https://blueprints.launchpad.net/nova/+spec/policy-docs
>
> Most of this is done. There might be a few more non-discoverable policy rules 
> that need documentation, but I haven't gone through what's left in detail yet.
>
> * 
> https://blueprints.launchpad.net/nova/+spec/prep-for-network-aware-scheduling-pike
>
> I'm not sure if John is going to continue trying to work on this or not yet, 
> but I think it would probably be good to get a helping hand here regardless 
> as it's stalled since Newton (really due to lack of reviews).
>
> * https://blueprints.launchpad.net/nova/+spec/use-service-tokens-pike
>
> This was started and made progress in Ocata but needs someone to take over 
> the remaining changes. This overlaps somewhat with 
> https://blueprints.launchpad.net/nova/+spec/use-service-catalog-for-endpoints.
>
> * https://blueprints.launchpad.net/nova/+spec/centralize-config-options-pike
>
> This is an ongoing multi-contributor effort. Stephen Finucane would be a good 
> organizer here if no one else is interested in formally "owning" it.
>
> * 
> https://blueprints.launchpad.net/nova/+spec/live-migration-force-after-timeout
>
> This has been started but needs work. There is also a spec for the design.
>
> --
>
> If you are interested in owning one of these, please reply so I and others 
> know, or contact me privately.
>
> --
>
> Thanks,
>
> Matt
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc][election] Questions for Candidates

2017-04-13 Thread John Garbutt
On 12 April 2017 at 17:19, Kendall Nelson  wrote:
> -What is one trait you have that makes it difficult to work in groups like
> the TC and how do you counteract it?

My inability to say no to more work (and being too interested in everything).

I think I am slowly getting better at focusing, delegating and saying
no. (Although folks who were in the Nova PTG room might be laughing
and rolling around on the floor right now.)

> - What do you see as the biggest roadblock in the upcoming releases for the
> TC?

In terms of the TC making progress on the vision, getting distracted
by minutiae and by the past are probably the two big traps.

For overall progress across the whole of OpenStack, I think it's all about
retaining contributors and making it easier to get deeply involved.
The need for a more diverse set of sponsoring companies for the top
5-10% of contributors is possibly just a symptom, but it's very
related.

> -What is your favorite thing about OpenStack?

It's the people, their passion, and the general spirit of collaboration.
(I was trying hard to say something new... but it's totally the people.)

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc][elections]questions about one platform vision

2017-04-12 Thread John Garbutt
On 12 April 2017 at 12:04, Davanum Srinivas <dava...@gmail.com> wrote:
> On Wed, Apr 12, 2017 at 6:51 AM, John Garbutt <j...@johngarbutt.com> wrote:
>> On 12 April 2017 at 03:54, joehuang <joehu...@huawei.com> wrote:
>>> What's the one platform will be in your own words? What's your proposal and
>>> your focus to help one platform vision being achieved?
>>
>> The more I think about this, the less I like the phrase "one platform".
>>
>> I like to think of OpenStack as a group of constellations. Those
>> constellations are groups of projects that are built to be used
>> together around a shared set of use cases and users. Note that many
>> (all?) of those constellations involve open source projects that were
>> born and live outside of OpenStack.
>
> +1 for us to publish sets of projects that work together for specific
> scenarios. I heard this idea first from Allison Randall and it
> immediately struck a chord. To be fair folks like Jay Pipes have
> always said (paraphrasing) "OpenStack is a toolbox". So it's the next
> step i guess. Lauren Sell was mentioning yesterday about hearing
> confusion around "Big Tent", i do feel that when we put forth a set of
> constellations we can start deprecating the "Big Tent" terminology if
> appropriate.

Right, I have tools for hanging pictures, and tools for putting
together flat-pack furniture. They all live in my toolbox, and it
turns out I use the hammer for almost everything. (I don't generally
use the hammer when repairing my Tuba, and my wife prefers those
sticky hook things for hanging up pictures, but that's all fine.) ... I
maybe stretched that a bit :)

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] [elections] Available time and top priority

2017-04-12 Thread John Garbutt
On 12 April 2017 at 10:07, Thierry Carrez  wrote:
> Ildiko Vancsa wrote:
>>> On 2017. Apr 12., at 3:18, Monty Taylor wrote:
>>> [...]
>>> Email allows someone to compose an actual structured narrative, and
>>> for replies to do the same. Some of us are loquatious and I imagine
>>> can be hard to follow even with time to read.
>>>
>>> IRC allows someone to respond quickly, and for someone to be like "yo,
>>> totes sorry, I didn't mean that at all LOL" and to walk things back
>>> before a pile of people become mortally insulted.
>>>
>>> Like now - hopefully you'll give me a smiley in IRC ... but you might
>>> not, and I'm stuck worrying that my tone came across wrong. Then if
>>> you just don't respond because ZOMG-EMAIL, I might start day drinking.
>>
>> Big +1 on balance.
>>
>> I agree in general that we need to revisit how to be more inclusive and
>> how to provide as equal conditions to people all around the globe as
>> possible.
>>
>> That said I still would like to keep the ability to have allocated time
>> for synchronous communication and allow the TC to be more of a team as
>> opposed to a group of people driving their own and some shared missions.
>> I think it helps with seeing maybe different parts but still the same
>> big picture and making the group more efficient with decision making and
>> bringing the community forward.
>> [...]
>
> Agree with you Ildiko and Monty, we still need sync communication to get
> a better feel of everyone's feelings on a particular issue, especially
> on complex issues. At the same time, a unique weekly meeting is actively
> preventing people from participating. It is also very active and noisy,
> the timebox can be painful, and its weekly cadence makes a good reason
> for procrastinating in reviews until the topic is raised in meeting,
> where final decision is made. Creating multiple official meetings
> spreads the pain instead of eliminating it. It makes it easier for more
> people to join, but more difficult for any given member to participate
> to every meeting. Our ability to quickly process changes might be affected.
>
> One idea I've been considering is eliminating the one and only sacred
> one-hour weekly TC meeting, and encouraging ad-hoc discussions (on
> #openstack-dev for example) between change proposers and smaller groups
> of TC members present in the same timezone. That would give us a good
> feel of what everyone thinks, reduce noise, and happen at various times
> during the day on a public forum, giving an opportunity for more people
> to jump in the discussion. The informal nature of those discussions
> would make the governance reviews the clear focal point for coordination
> and final decision.

+1 on eliminating the meeting.

+1 on the need for synchronous discussions that are documented
and linked back to the gerrit review.

One idea that came up when talking with the SWG was a wiki page with a
weekly status update that gets emailed out each week by the TC
chairperson. The chair would have to chase folks who don't update it,
ping folks for their vote on patches they are ignoring, etc. Maybe it
would end up looking like the API-WG emails that cdent sends out, which
link to the reviews that are the current focus as possible merge
candidates.

We often only merge things in the meeting, but if we moved more
towards expecting a +1 from all members, then as soon as all the
required +1s are present the chairperson would be free to merge (with
timeouts perhaps handled via the weekly email, API-WG style).

I strongly believe we need to try not having the meeting, and be more
globally inclusive in the TC's activities. I see the TC as a core team,
rather than the only people working on governance-type activities. We
all need to work together to maintain and improve the vibrant
community we have. I think success is when we have people from all
around the globe involved in TC-related activities and reviews.

Many thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] [elections] Available time and top priority

2017-04-12 Thread John Garbutt
On 11 April 2017 at 09:58, Thierry Carrez  wrote:
> Matt Riedemann wrote:
>> Lots of projects have alternating meeting times to accommodate
>> contributors in different time zones, especially Europe and Asia.
>>
>> The weekly TC meeting, however, does not.
>>
>> I have to assume this has come up before and if so, why hasn't the TC
>> adopted an alternating meeting schedule?
>>
>> For example, it's 4am in Beijing when the TC meeting happens. It's
>> already hard to get people from Asia into leadership roles within
>> projects and especially across the community, in large part because of
>> the timezone barrier.
>>
>> How will the TC grow a diverse membership if it's not even held, at
>> least every other week, in a timezone where the other half of the world
>> can attend?
>
> The current meeting time is more a consequence of the current membership
> composition than a hard rule. There is, however (as you point out) much
> chicken-and-egg effect at play here -- it's easier to get involved in
> the TC if you can regularly attend meetings, so we can't really wait
> until someone is elected to change the time.

+1 on the chicken-and-egg problem here.

If elected, I plan on not attending the meetings to help with this; please see:
https://git.openstack.org/cgit/openstack/election/tree/candidates/pike/TC/johnthetubaguy.txt

> Alternating meeting times would certainly improve the situation, but I'm
> not sure they are the best solution. Personally I would rather try to
> decrease our dependency on meetings.

+1 this.

I think the way forward here is finding better ways of working that
don't need synchronous meetings, which naturally exclude someone. We
have the tooling for most of this already; we just need to find better
patterns to keep making progress.

IRC meetings also have lots of the issues mentioned in this thread. I
have been working with the SWG to bring ideas together here:
https://review.openstack.org/#/c/441923/

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc][elections]questions about one platform vision

2017-04-12 Thread John Garbutt
On 12 April 2017 at 03:54, joehuang  wrote:
> What's the one platform will be in your own words? What's your proposal and
> your focus to help one platform vision being achieved?

The more I think about this, the less I like the phrase "one platform".

I like to think of OpenStack as a group of constellations. Those
constellations are groups of projects that are built to be used
together around a shared set of use cases and users. Note that many
(all?) of those constellations involve open source projects that were
born and live outside of OpenStack.

I am trying to kick-start the "VM and baremetal" working group to get
feedback on a specific constellation as a group of projects. Here I am
thinking about running Nova, Cinder, Neutron, Keystone, etc. to give
you (in some sense) a Software Defined Data Center. Many applications
and services, like Heat, Trove and Magnum, need to consume and
integrate with that platform so they can get access to the compute,
networking and storage they need to execute their workloads, such as
containers. It's like the next generation of consolidation, getting us
to the next level of utilization/efficiency. If you look at this
constellation, the database and message queue are important
non-OpenStack components of it. Maybe this is a false constellation,
and there is a different set of things that people use together.
That's some of the feedback I hope we get at the forum.

The work ttx mentions is important. I hope the project maps will help
communicate how users can meet their needs by running various
combinations of OpenStack and non-OpenStack projects together.

To be clear, I am not claiming to have the answers here; this is just
my current thinking. I look forward to all the debate and discussions
around this topic, and all the interesting things I will learn about
along that journey, things that will likely make me change my mind.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic][nova] Suggestion required on pci_device inventory addition to ironic and its subsequent changes in nova

2017-04-10 Thread John Garbutt
On 10 April 2017 at 11:31,   wrote:
> On Mon, 2017-04-10 at 11:50 +0530, Nisha Agarwal wrote:
>> Hi team,
>>
>> Please could you pour in your suggestions on the mail?
>>
>> I raised a blueprint in Nova for this https://blueprints.launchpad.ne
>> t/nova/+spec/pci-passthorugh-for-ironic and two RFEs at ironic side h
>> ttps://bugs.launchpad.net/ironic/+bug/1680780 and https://bugs.launch
>> pad.net/ironic/+bug/1681320 for the discussion topic.
>
> If I understand you correctly, you want to be able to filter ironic
> hosts by available PCI device, correct? Barring any possibility that
> resource providers could do this for you yet, extending the nova ironic
> driver to use the PCI passthrough filter sounds like the way to go.

With ironic I thought everything was "passed through" by default,
because there is no virtualization in the way. (I am possibly
incorrectly assuming there are no BIOS tricks to turn off or re-assign
PCI devices dynamically.)

So I am assuming this is purely a scheduling concern. If so, why are
the new custom resource classes not good enough? "ironic_blue" could
mean two GPUs and two 10Gb nics, "ironic_yellow" could mean one GPU
and one 1Gb nic, etc.
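
For illustration (a rough sketch only: this assumes the "resources"
flavor extra spec syntax being worked on for Pike, and the class and
flavor names here are made up), that pairing could look something like:

  # tag each node with a custom resource class
  openstack baremetal node set --resource-class ironic_blue <node-uuid>

  # have the matching flavor request exactly one unit of that class
  openstack flavor set --property resources:CUSTOM_IRONIC_BLUE=1 bm.blue

The scheduler then only picks nodes reporting that class, without
having to model the individual PCI devices at all.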

Or is there something else that needs addressing here? Trying to
describe what you get with each flavor to end users? Are you needing
to aggregate similar hardware in a different way to the above
resource class approach?

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] [elections] Available time and top priority

2017-04-10 Thread John Garbutt
On 10 April 2017 at 10:16, Thierry Carrez  wrote:
> All the candidates are top community members with a lot of
> responsibilities on their shoulders already. My experience tells me that
> it is easy to overestimate the time we can dedicate to Technical
> Committee matters, and how much we can push and get done in six months
> or one year. At the same time, our most efficient way to make progress
> is always when someone "owns" a particular initiative and pushes it
> through the governance process.

I am not productive during the TC meeting; it's too late for me to
think properly. Similarly, much of the travel I have had to skip
(partly due to the personal cost, more recently due to budget
constraints) feels like it has limited my involvement with the TC over
the past year. I want to help make us a more globally welcoming group.

I have found it very hard to pick TC-related efforts, and to work out
what others are doing and what is urgent vs important. I believe the TC vision
will help with that massively.

My recent focus has been around the TC vision, and SWG efforts. More
generally, I have been looking at Nova reaching out to related
projects, like Keystone, Cinder and Neutron, and helping to break down
the silos a little. That has led me to look at creating the VM and
Baremetal working group, getting groups of projects to gather feedback
together. If it works, it might be a pattern other constellations of
projects could copy. While that is just "Nova" work, I hope to share
successes and failures to help other collaboration efforts.

> So my question is the following: if elected, how much time do you think
> you'll be able to dedicate to Technical Committee affairs (reviewing
> proposed changes and pushing your own) ?

I hope to spend a minimum of 10% of my work time on TC related
efforts, regardless of being elected or not.

If elected, I hope to increase that to at least 20%.

> If there was ONE thing, one
> initiative, one change you will actively push in the six months between
> this election round and the next, what would it be ?

The TC vision.

Let's reach out to get feedback to help refine and revise the vision.
Then let's get it agreed, and start the work to make it a reality.

After we have that agreed, I think my focus will turn to the feedback
loops in our community. By that I mean helping to break down the silos
between the different groups of developers in our community, between
developers and users, and so on. The VM & Baremetal group is my latest
push in that general direction. The long-term aim is improving the
user experience for all the different users of all the various
constellations in OpenStack.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] allow_instance_snapshots config option is not used consistently

2017-04-10 Thread John Garbutt
On 10 April 2017 at 01:56, Matt Riedemann  wrote:
> I found a fun little legacy nugget of compute API non-interoperability joy
> tonight.
>
> The "allow_instance_snapshots" config option disables the createImage server
> action API. Completely config driven and therefore not discoverable.
>
> What intrigues me is that this isn't applied to the createBackup or shelve
> APIs, which also create a snapshot of the instance. Is this by design? I'm
> guessing probably not. In fact, this predates the use of Gerrit [1] so this
> was probably just something hacked in so long ago it makes zero sense now.
> The way to disable any of these APIs now is via policy.
>
> Unless anyone has an issue with this, I'm going to deprecate it for removal.
>
> [1]
> https://github.com/openstack/nova/commit/9633e9877c7836c18c30b51c8494abfb025e64ca

+1 for deprecating it.

That looks like something we did before we had policy.
Not that policy is that discoverable yet either, but it seems better.
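
For example (just a sketch; the rule name is from the current nova
policy defaults and I have not tested this), an operator wanting the
old lockdown behaviour could deny the createImage action in
policy.json:

  {
      "os_compute_api:servers:create_image": "!"
  }

Similar rules exist for the backup and shelve actions, so all three
can be disabled consistently, which the config option never did.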

Thanks,
john

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] remove-mox-pike blueprint

2017-03-27 Thread John Garbutt
Hi,

I added some notes on the blueprint:
https://blueprints.launchpad.net/nova/+spec/remove-mox-pike

I have seen quite a few patches trying to remove the use of
"self.stub_out". While possibly interesting in the future, I think
this should be out of scope for the mox removal blueprint. The aim of
that method is to help us easily stop calling the mox related
"self.stubs.Set" in a way that is really easy to review (and hard to
get wrong).
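
As a rough illustration of that distinction (not a real test; the
patched function is just an example):

  def test_example(self):
      fake = lambda *args, **kwargs: None

      # mox-era call the blueprint is about removing:
      self.stubs.Set(nova.utils, 'get_image_from_system_metadata', fake)

      # the mox-free helper, deliberately trivial to review:
      self.stub_out('nova.utils.get_image_from_system_metadata', fake)

Swapping the first form for the second is in scope here; removing
"self.stub_out" itself is a separate discussion.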

I think the current focus should be on emptying this list. I know we
have had quite a few patches up around related tests already:
https://github.com/openstack/nova/blob/master/tests-py3.txt

Just wanting to double check we are all agreed on the direction there.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][devstack][all] ZooKeeper vs etcd for Tooz/DLM

2017-03-15 Thread John Garbutt
On 15 March 2017 at 12:33, Sean Dague <s...@dague.net> wrote:
> On 03/15/2017 08:10 AM, John Garbutt wrote:
>> On 15 March 2017 at 11:58, Jay Pipes <jaypi...@gmail.com> wrote:
>>> On 03/15/2017 07:44 AM, Sean Dague wrote:
>>>>
>>>> On 03/14/2017 11:00 PM, Monty Taylor wrote:
>>>> 
>>>>>
>>>>> a) awesome. when the rest of this dips momentarily into words that might
>>>>> sound negative, please hear it all wrapped in an "awesome" and know that
>>>>> my personal desire is to see the thing you're working on be successful
>>>>> without undue burden...
>>>>>
>>>>> b) In Tokyo, we had the big discussion about DLMs (where at least my
>>>>> intent going in to the room was to get us to pick one and only one).
>>>>> There were three camps in the room who were all vocal:
>>>>>
>>>>> 1) YES! Let's just pick one, I don't care which one
>>>>> 2) I hate Java I don't want to run Zookeeper, so we can't pick that
>>>>> 3) I hate go/don't trust coreos I don't want to run etcd so we can't
>>>>> pick that
>>>>>
>>>>> Because of 2 and 3 the group represented by 1 lost and we ended up with:
>>>>> "crap, we have to use an abstraction library"
>>>>>
>>>>> I'd argue that unless something has changed significantly, having Nova
>>>>> grow a direct depend on etcd when the DLM discussion brought us to "the
>>>>> operators in the room have expressed a need for a pluggable choice
>>>>> between at least zk and etcd" should be pretty much a non-starter.
>>>>>
>>>>> Now, being that I was personally in group 1, I'd be THRILLED if we
>>>>> could, as a community, decide to pick one and skip having an abstraction
>>>>> library. I still don't care which one - and you know I love
>>>>> gRPC/protobuf.
>>>>>
>>>>> But I do think that given the anti-etcd sentiment that was expressed was
>>>>> equally as vehement as the anti-zk sentiment, that we need to circle
>>>>> back and make a legit call on this topic.
>>>>>
>>>>> If we can pick one, I think having special-purpose libraries like
>>>>> os-lively for specific purposes would be neat.
>>>>>
>>>>> If we still can't pick one, then I think adding the liveness check you
>>>>> implemented for os-lively as a new feature in tooz and also implementing
>>>>> the same thing in the zk driver would be necessary. (of course, that'll
>>>>> probably depend on getting etcd3 support added to tooz and making sure
>>>>> there is a good functional test for etcd3.
>>>>
>>>>
>>>> We should also make it clear that:
>>>>
>>>> 1) Tokyo was nearly 1.5 years ago.
>>>> 2) Many stake holders in openstack with people in that room may no
>>>> longer be part of our community
>>>> 3) Alignment with Kubernetes has become something important at many
>>>> levels inside of OpenStack (which puts some real weight on the etcd front)
>>>
>>>
>>> Yes, and even more so for etcd3 vs. etcd2, since a) k8s now uses etcd3 and
>>> b) etcd2 is no longer being worked on.
>>>
>>>> 4) The containers ecosystem, which etcd came out of, has matured
>>>> dramatically
>>
>> +1 for working towards etcd3 a "base service", based on operator acceptance.
>> +1 for liveness checks not causing silly DB churn.
>>
>> While we might not need/want an abstraction layer to hide the
>> differences between different backends, but a library (tooz and/or
>> os-lively) so we all consistently use the tool seems to make sense.
>>
>> Maybe that means get tooz using etcd3 (Julian or Jay, or both maybe
>> seemed keen?)
>> Maybe the tooz API adds bits from the os-lively POC?
>
> I do have a concern where we immediately jump to a generic abstraction,
> instead of using the underlying technology to the best of our ability.
> It's really hard to break down and optimize the abstractions later.
> We've got all sorts of cruft (an inefficiencies) in our DB access layer
> because of this (UUIDs stored as UTF8 strings being a good example).
>
> I'd definitely be more interested in etcd3 as a defined base service,
> people can use it directly. See what kind of patterns people come up
> with. Abstract late once the patterns are there.

Good point.

+1 to collecting the patterns.
That's the bit I didn't want to throw away.
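
For example, the sort of direct-etcd3 liveness pattern I would like us
to collect might be as simple as this sketch (using the python-etcd3
client; the key layout is made up):

  import etcd3

  client = etcd3.client()

  # register: the key only lives while the lease keeps being refreshed
  lease = client.lease(ttl=30)
  client.put('/services/nova-compute/host1', 'up', lease=lease)

  # heartbeat: refresh periodically from the service
  lease.refresh()

  # liveness check: the key simply vanishes once the lease expires
  value, _metadata = client.get('/services/nova-compute/host1')
  is_up = value is not None

If a few projects end up writing that same code, that is the point at
which an abstraction earns its keep.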

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [notification] BlockDeviceMapping in InstancePayload

2017-03-15 Thread John Garbutt
On 13 March 2017 at 17:14, Balazs Gibizer  wrote:
> Hi,
>
> As part of the Searchlight integration we need to extend our instance
> notifications with BDM data [1]. As far as I understand the main goal is to
> provide enough data about the instance to Searchlight so that Nova can use
> Searchlight to generate the response of the GET /servers/{server_id}
> requests based on the data stored in Searchlight.
>
> I checked the server API response and I found one field that needs BDM
> related data: os-extended-volumes:volumes_attached. Only the uuid of the
> volume and the value of delete_on_terminate is provided in the API response.
>
> I have two options about what to add to the InstancePayload and I want to
> get some opinions about which direction we should go with the
> implementation.
>
> Option A: Add only the minimum required information from the BDM to the
> InstancePayload
>
>  additional InstancePayload field:
>  block_devices: ListOfObjectsField(BlockDevicePayload)
>
>  class BlockDevicePayload(base.NotificationPayloadBase):
>fields = {
>'delete_on_termination': fields.BooleanField(default=False),
>'volume_id': fields.StringField(nullable=True),
>}
>
> This payload would be generated from the BDMs connected to the instance
> where the BDM.destination_type == 'volume'.
>
>
> Option B: Provide a comprehensive set of BDM attributes
>
>  class BlockDevicePayload(base.NotificationPayloadBase):
>fields = {
>'source_type': fields.BlockDeviceSourceTypeField(nullable=True),
>'destination_type': fields.BlockDeviceDestinationTypeField(
>nullable=True),
>'guest_format': fields.StringField(nullable=True),
>'device_type': fields.BlockDeviceTypeField(nullable=True),
>'disk_bus': fields.StringField(nullable=True),
>'boot_index': fields.IntegerField(nullable=True),
>'device_name': fields.StringField(nullable=True),
>'delete_on_termination': fields.BooleanField(default=False),
>'snapshot_id': fields.StringField(nullable=True),
>'volume_id': fields.StringField(nullable=True),
>'volume_size': fields.IntegerField(nullable=True),
>'image_id': fields.StringField(nullable=True),
>'no_device': fields.BooleanField(default=False),
>'tag': fields.StringField(nullable=True)
>}
>
> In this case Nova would provide every BDM attached to the instance not just
> the volume ones.
>
> I intentionally left out connection_info and the db id as those seems really
> system internal.
> I also left out the instance related references as this BlockDevicePayload
> would be part of an InstancePayload which has an the instance uuid already.

+1 leaving those out.

> What do you think, which direction we should go?

There are discussions around extending the info we give out about BDMs
in the API.

What about something in between: list all types of BDMs, but include
a touch more info so you can tell which ones are volumes for sure.

  class BlockDevicePayload(base.NotificationPayloadBase):
      fields = {
          # maybe just called "type"?
          'destination_type': fields.BlockDeviceDestinationTypeField(
              nullable=True),
          'boot_index': fields.IntegerField(nullable=True),
          # do we ignore device_name now?
          'device_name': fields.StringField(nullable=True),
          'delete_on_termination': fields.BooleanField(default=False),
          'volume_id': fields.StringField(nullable=True),
          'tag': fields.StringField(nullable=True),
      }

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc][appcat] The future of the App Catalog

2017-03-15 Thread John Garbutt
On 13 March 2017 at 21:10, Zane Bitter  wrote:
> Yes. this is a problem with the default policy - if you have *any* role in a
> project then you get write access to everything in that project. I don't
> know how I can even call this role-based, since everybody has access to
> everything regardless of their roles.
>
> Keystone folks are working on a new global default policy. The new policy
> will require specific reader/writer roles on a project to access any of that
> project's data (I attended the design session and insisted on it). That will
> free up services to create their own limited-scope roles without the
> consequence of opening up full access to every other OpenStack API. e.g.
> it's easy to imagine a magnum-tenant role that has permissions to move
> Neutron ports around but nothing else.
>
> We ultimately need finer-grained authorisation than that - we'll want users
> to be able to specify permissions for particular resources, and since most
> users are not OpenStack projects we'll need them to be able to do it for
> roles (or specific user accounts) that are not predefined in policy.json.
> With the other stuff in place that's at least do-able in individual projects
> though, and if a few projects can agree on a common approach then it could
> easily turn into e.g. an Oslo library, even if it never turns into a
> centralised authorisation service.

I would love feedback on these three Nova specs currently reworking
our default policy:
https://review.openstack.org/#/c/427872/

It clearly doesn't get us all the way there, but I think it lays the
foundations to build what you suggest.

On a related note, there is an old idea I am trying to write up to
address Trove/Magnum concerns (now that we have proper service token
support in keystoneauth and keystonemiddleware):
https://review.openstack.org/#/c/438134/

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] experimenting with extracting placement

2017-03-15 Thread John Garbutt
On 13 March 2017 at 15:17, Jay Pipes  wrote:
> On 03/13/2017 11:13 AM, Dan Smith wrote:
>>
>> Interestingly, we just had a meeting about cells and the scheduler,
>> which had quite a bit of overlap on this topic.
>>
>>> That said, as mentioned in the previous email, the priorities for Pike
>>> (and likely Queens) will continue to be, in order: traits, ironic,
>>> shared resource pools, and nested providers.
>>
>>
>> Given that the CachingScheduler is still a thing until we get claims in
>> the scheduler, and given that CachingScheduler doesn't use placement
>> like the FilterScheduler does, I think we need to prioritize the claims
>> part of the above list.
>>
>> Based on the discussion several of us just had, the priority list
>> actually needs to be this:
>>
>> 1. Traits
>> 2. Ironic
>> 3. Claims in the scheduler
>> 4. Shared resources
>> 5. Nested resources
>>
>> Claims in the scheduler is not likely to be a thing for Pike, but should
>> be something we do as much prep for as possible, and land early in Queens.
>>
>> Personally, I think getting to the point of claiming in the scheduler
>> will be easier if we have placement in tree, and anything we break in
>> that process will be easier to backport if they're in the same tree.
>> However, I'd say that after that goal is met, splitting placement should
>> be good to go.
> ++

+1 from me, a bit late I know.

John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [swg][tc] Moving Stewardship Working Group meeting

2017-03-15 Thread John Garbutt
On 15 March 2017 at 09:50, Thierry Carrez  wrote:
> Colette Alexander wrote:
>> Currently the Stewardship Working Group meetings every other Thursday at
>> 1400 UTC.
>>
>> We've had a couple of pings from folks who are interested in joining us
>> for meetings that live in US Pacific Time, and that Thursday time isn't
>> terribly conducive to them being able to make meetings. So - the
>> question is when to move it to, if we can.
>>
>> A quick glance at the rest of the Thursday schedule shows the 1500 and
>> 1600 time slots available (in #openstack-meeting I believe). I'm
>> hesitant to go beyond that in the daytime because we also need to
>> accommodate attendees in Western Europe.
>>
>> Thoughts on whether either of those works from SWG members and anyone
>> who might like to drop in? We can also look into having meetings once a
>> week, and potentially alternating times between the two to help
>> accommodate the spread of people.
>>
>> Let me know what everyone thinks - and for this week I'll see anyone who
>> can make it at 1400 UTC on Thursday.
>
> Alternatively, we could try to come up with ways to avoid regular
> meetings altogether. That would certainly be a bit experimental, but the
> SWG sounds like a nice place to experiment with more inclusive ways of
> coordination.
>
> IMHO meetings serve three purposes. The first is to provide a regular
> rhythm and force people to make progress on stated objectives. You give
> status updates, lay down actions, make sure nothing is stuck. The second
> is to provide quick progress on specific topics -- by having multiple
> people around at the same time you can quickly iterate through ideas and
> options. The third is to expose an entry point to new contributors: if
> they are interested they will look for a meeting to get the temperature
> on a workgroup and potentially jump in.
>
> I'm certainly guilty of being involved in too many things, so purpose
> (1) is definitely helpful to force me to make regular progress, but it
> also feels like something a good status board could do better, and async.
>
> The second purpose is definitely helpful, but I'd say that ad-hoc
> meetings (or discussions in a IRC channel) are a better way to achieve
> the result. You just need to come up with a one-time meeting point where
> all the interested parties will be around, and that's usually easier
> than to pick a weekly time that will work for everyone all the time. We
> just need to invent tooling that would facilitate organizing and
> tracking those.
>
> For the third, I think using IRC channels as the on-boarding mechanism
> is more efficient -- meetings are noisy, busy and not so great for
> newcomers. If we ramped up channel activity (and generally made IRC
> channels more discoverable), I don't think any newcomer would ever use
> meetings to "tune in".
>
> Am I missing something that only meetings could ever provide ? If not it
> feels like the SWG could experiment with meeting-less coordination by
> replacing it with better async status coordination / reminder tools,
> some framework to facilitate ad-hoc discussions, and ramping up activity
> in IRC channel. If that ends up being successful, we could promote our
> techniques to the rest of OpenStack.

+1 for trying out a meeting-less group ourselves.

In the absence of tooling, could we replace the meeting with a weekly
email reporting current work streams and what's planned next? That
would include fixing any problems we face trying to work well
together.

John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][devstack][all] ZooKeeper vs etcd for Tooz/DLM

2017-03-15 Thread John Garbutt
On 15 March 2017 at 11:58, Jay Pipes  wrote:
> On 03/15/2017 07:44 AM, Sean Dague wrote:
>>
>> On 03/14/2017 11:00 PM, Monty Taylor wrote:
>> 
>>>
>>> a) awesome. when the rest of this dips momentarily into words that might
>>> sound negative, please hear it all wrapped in an "awesome" and know that
>>> my personal desire is to see the thing you're working on be successful
>>> without undue burden...
>>>
>>> b) In Tokyo, we had the big discussion about DLMs (where at least my
>>> intent going in to the room was to get us to pick one and only one).
>>> There were three camps in the room who were all vocal:
>>>
>>> 1) YES! Let's just pick one, I don't care which one
>>> 2) I hate Java I don't want to run Zookeeper, so we can't pick that
>>> 3) I hate go/don't trust coreos I don't want to run etcd so we can't
>>> pick that
>>>
>>> Because of 2 and 3 the group represented by 1 lost and we ended up with:
>>> "crap, we have to use an abstraction library"
>>>
>>> I'd argue that unless something has changed significantly, having Nova
>>> grow a direct depend on etcd when the DLM discussion brought us to "the
>>> operators in the room have expressed a need for a pluggable choice
>>> between at least zk and etcd" should be pretty much a non-starter.
>>>
>>> Now, being that I was personally in group 1, I'd be THRILLED if we
>>> could, as a community, decide to pick one and skip having an abstraction
>>> library. I still don't care which one - and you know I love
>>> gRPC/protobuf.
>>>
>>> But I do think that given the anti-etcd sentiment that was expressed was
>>> equally as vehement as the anti-zk sentiment, that we need to circle
>>> back and make a legit call on this topic.
>>>
>>> If we can pick one, I think having special-purpose libraries like
>>> os-lively for specific purposes would be neat.
>>>
>>> If we still can't pick one, then I think adding the liveness check you
>>> implemented for os-lively as a new feature in tooz and also implementing
>>> the same thing in the zk driver would be necessary. (of course, that'll
>>> probably depend on getting etcd3 support added to tooz and making sure
>>> there is a good functional test for etcd3.
>>
>>
>> We should also make it clear that:
>>
>> 1) Tokyo was nearly 1.5 years ago.
>> 2) Many stake holders in openstack with people in that room may no
>> longer be part of our community
>> 3) Alignment with Kubernetes has become something important at many
>> levels inside of OpenStack (which puts some real weight on the etcd front)
>
>
> Yes, and even more so for etcd3 vs. etcd2, since a) k8s now uses etcd3 and
> b) etcd2 is no longer being worked on.
>
>> 4) The containers ecosystem, which etcd came out of, has matured
>> dramatically

+1 for working towards making etcd3 a "base service", based on operator acceptance.
+1 for liveness checks not causing silly DB churn.

While we might not need/want an abstraction layer to hide the
differences between different backends, a library (tooz and/or
os-lively) that means we all use the tool consistently seems to make
sense.

Maybe that means getting tooz using etcd3 (Julien or Jay, or maybe
both, seemed keen?). Maybe the tooz API adds bits from the os-lively
POC?
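
To make that concrete, the sort of tooz usage I have in mind is
roughly this sketch (it assumes an etcd3 tooz driver exists, so the
URL and group name here are hypothetical):

  from tooz import coordination

  coord = coordination.get_coordinator(
      'etcd3://127.0.0.1:2379', b'nova-compute-host1')
  coord.start(start_heart=True)

  # join a group so other services can see we are alive
  try:
      coord.create_group(b'nova-compute').get()
  except coordination.GroupAlreadyExist:
      pass
  coord.join_group(b'nova-compute').get()

  # a liveness check is then just reading the group membership
  members = coord.get_members(b'nova-compute').get()

That keeps the usage pattern shared without hiding which backend is
actually underneath.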

John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][kolla][openstack-helm][tripleo][all] Storing configuration options in etcd(?)

2017-03-15 Thread John Garbutt
On 15 March 2017 at 03:23, Monty Taylor  wrote:
> On 03/15/2017 12:05 AM, Joshua Harlow wrote:
>> So just fyi, this has been talked about before (but prob in context of
>> zookeeper or various other pluggable config backends).
>>
>> Some links:
>>
>> - https://review.openstack.org/#/c/243114/
>> - https://review.openstack.org/#/c/243182/
>> - https://blueprints.launchpad.net/oslo.config/+spec/oslo-config-db
>> - https://review.openstack.org/#/c/130047/
>>
>> I think the general questions that seem to reappear are around the
>> following:
>>
>> * How does reloading work (does it)?
>>
>> * What's the operational experience (editing a ini file is about the
>> lowest bar we can possible get to, for better and/or worse).
>
> As a person who operates many softwares (but who does not necessarily
> operate OpenStack specifically) I will say that services that store
> their config in a service that do not have an injest/update facility
> from file are a GIANT PITA to deal with. Config management is great at
> laying down config files. It _can_ put things into services, but that's
> almost always more work.
>
> Which is my way of saying - neat, but please please please whoever
> writes this make a simple facility that will let someone plop config
> into a file on disk and get that noticed and slurped into the config
> service. A one-liner command line tool that one runs on the config file
> to splat into the config service would be fine.

+1 for keeping the simple "use a config file" case working well.

(+1 for trying other things too, if they don't break the simple way)
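
Something like this rough sketch is probably all that one-liner needs
to be (it assumes the python-etcd3 client and a made-up
/config/<service>/<section>/<option> key layout, neither of which is
decided):

  #!/usr/bin/env python
  import sys

  import etcd3
  from six.moves import configparser


  def main(service, config_file):
      parser = configparser.ConfigParser()
      parser.read(config_file)
      client = etcd3.client()  # a real tool would take host/port options
      for section in parser.sections():
          for option, value in parser.items(section):
              key = '/config/%s/%s/%s' % (service, section, option)
              client.put(key, value)


  if __name__ == '__main__':
      main(sys.argv[1], sys.argv[2])

Config management then just runs something like "python config2etcd.py
nova /etc/nova/nova.conf" after laying down the file, and keeps its
existing workflow.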

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][docs] Broken nova instructions

2017-03-09 Thread John Garbutt
On 9 March 2017 at 11:05, Alexandra Settle  wrote:
> I have attached the error logs that our tester, Brian Moss, sent to me 
> earlier this morning.
> He has noted that some people have been able to get the instructions to work, 
> but many others haven't.
> Also, noted that some are failing on Ubuntu and RDO, so it isn't a distro 
> specific problem either.

The current compute logs seem to point at this bug:
https://github.com/mseknibilel/OpenStack-Grizzly-Install-Guide/issues/71

The bug says that if you restart libvirt the problem goes away. Seems
like a packaging issue, maybe? I am not sure.

I notice some 404s for flavors. You need to create your own flavors
before you can boot an instance. Not sure if that's covered in the
install docs.
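
For example, something along these lines (sizes picked arbitrarily) is
needed before the boot step:

  openstack flavor create --ram 512 --disk 1 --vcpus 1 m1.tiny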

I found the order unclear in the currently proposed docs, so I
revised them so that the cells and placement setup is more integrated
into the existing steps in the guide.

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][keystone] Pike PTG recap - quotas

2017-03-01 Thread John Garbutt
On 27 February 2017 at 21:18, Matt Riedemann  wrote:
> We talked about a few things related to quotas at the PTG, some in
> cross-project sessions earlier in the week and then some on Wednesday
> morning in the Nova room. The full etherpad is here [1].
>
> Counting quotas
> ---
>
> Melanie hit a problem with the counting quotas work in Ocata with respect to
> how to handle quotas when the cell that an instance is running in is down.
> The proposed solution is to track project/user ID information in the
> "allocations" table in the Placement service so that we can get allocation
> information for quota usage from Placement rather than the cell. That should
> be a relatively simple change to move this forward and hopefully get the
> counting quotas patches merged by p-1 so we have plenty of burn-in time for
> the new quotas code.
>
> Centralizing limits in Keystone
> ---
>
> This actually came up mostly during the hierarchical quotas discussion on
> Tuesday which was a cross-project session. The etherpad for that is here
> [2]. The idea here is that Keystone already knows about the project
> hierarchy and can be a central location for resource limits so that the
> various projects, like nova and cinder, don't have to have a similar data
> model and API for limits, we can just make that common in Keystone. The
> other projects would still track resource usage and calculate when a request
> is over the limit, but the hope is that the calculation and enforcement can
> be generalized so we don't have to implement the same thing in all of the
> projects for calculating when something is over quota.
>
> There is quite a bit of detail in the nova etherpad [1] about overbooking
> and enforcement modes, which will need to be brought up as options in a spec
> and then projects can sort out what makes the most sense (there might be
> multiple enforcement models available).
>
> We still have to figure out the data migration plan to get limits data from
> each project into Keystone, and what the API in Keystone is going to look
> like, including what this looks like when you have multiple compute
> endpoints in the service catalog, or regions, for example.
>
> Sean Dague was going to start working on the spec for this.
>
> Hierarchical quota support
> --
>
> The notes on hierarchical quota support are already in [1] and [2]. We
> agreed to not try and support hierarchical quotas in Nova until we were
> using limits from Keystone so that we can avoid the complexity of both
> systems (limits from Nova and limits from Keystone) in the same API code. We
> also agreed to not block the counting quotas work that melwitt is doing
> since that's already valuable on its own. It's also fair to say that
> hierarchical quota support in Nova is a Queens item at the earliest given we
> have to get limits stored in Keystone in Pike first.
>
> Dealing with the os-qouta-class-sets API
> 
>
> I had a spec [3] proposing to cleanup some issues with the
> os-quota-class-sets API in Nova. We agreed that rather than spend time
> fixing the latent issues in that API, we'd just invest that time in storing
> and getting limits from Keystone, after which we'll revisit deprecating the
> quota classes API in Nova.
>
> [1] https://etherpad.openstack.org/p/nova-ptg-pike-quotas
> [2] https://etherpad.openstack.org/p/ptg-hierarchical-quotas
> [3] https://review.openstack.org/#/c/411035/

I started a quota backlog spec before the PTG to collect my thoughts here:
https://review.openstack.org/#/c/429678

I have updated that, post-PTG, to include more details on hierarchy
(line 134) when using keystone to store the limits. This mostly came
from some side discussions in the API-WG room with morgan and melwitt.

It includes a small discussion on how the idea behind quota-class-sets
could be turned into something usable, although that is now a problem
for keystone's limits API.

There were some side discussions around the move to placement meaning
ironic quotas move from vCPU and RAM to custom resource classes. It's
worth noting this largely supersedes the ideas we discussed here in
flavor classes:
http://specs.openstack.org/openstack/nova-specs/specs/backlog/approved/flavor-class.html

I don't currently plan on taking that backlog spec further, as sdague
is going to take on moving this all forward.

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Final mascot

2017-02-03 Thread John Garbutt
On 3 February 2017 at 02:59, Matt Riedemann  wrote:
> The Foundation wants to have the mascots finalized before the PTG. This is
> just an opportunity for people to raise issues with it if they have any.

Honestly, it looks a bit aggressive / sharp / pointy.
Maybe fewer spikes on the outside? I dunno.

But asking for a friendlier looking supernova seems a bit... unscientific.

John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Latest and greatest on trying to get n-sch to require placement

2017-01-26 Thread John Garbutt
On 26 January 2017 at 14:14, Ed Leafe  wrote:
> On Jan 26, 2017, at 7:50 AM, Sylvain Bauza  wrote:
>>
>> That's where I think we have another problem, which is bigger than the
>> corner case you mentioned above : when upgrading from Newton to Ocata,
>> we said that all Newton computes have be upgraded to the latest point
>> release. Great. But we forgot to identify that it would also require to
>> *modify* their nova.conf so they would be able to call the placement API.
>>
>> That looks to me more than just a rolling upgrade mechanism. In theory,
>> a rolling upgrade process accepts that N-1 versioned computes can talk
>> to N versioned other services. That doesn't imply a necessary
>> configuration change (except the upgrade_levels flag) on the computes to
>> achieve that, right?
>>
>> http://docs.openstack.org/developer/nova/upgrade.html
>
> Reading that page: "At this point, you must also ensure you update the 
> configuration, to stop using any deprecated features or options, and perform 
> any required work to transition to alternative features.”
>
> So yes, "updating your configuration” is an expected action. I’m not sure why 
> this is so alarming.

We did make this promise:
https://governance.openstack.org/tc/reference/tags/assert_supports-upgrade.html#requirements

It's bending that configuration requirement a little bit.
That requirement was originally added at the direct request of operators.

Now there is a need to tidy up your configuration after completing the
upgrade to N+1 before upgrading to N+2, but I believe that was assumed
to happen at the end of the N+1 upgrade, using the N+1 release notes.
The idea being that warning messages in the logs, etc. would help that
all get fixed before attempting the next upgrade. But I agree that's
not what the docs are currently saying.

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Latest and greatest on trying to get n-sch to require placement

2017-01-26 Thread John Garbutt
On 26 January 2017 at 13:50, Sylvain Bauza  wrote:
> Le 26/01/2017 05:42, Matt Riedemann a écrit :
>> This is my public hand off to Sylvain for the work done tonight.
>>
>
> Thanks Matt for your help yesterday, was awesome to count you in even
> you're personally away.
>
>
>> Starting with the multinode grenade failure in the nova patch to
>> integrate placement with the filter scheduler:
>>
>> https://review.openstack.org/#/c/417961/
>>
>> The test_schedule_to_all_nodes tempest test was failing in there because
>> that test explicitly forces hosts using AZs to build two instances.
>> Because we didn't have nova.conf on the Newton subnode in the multinode
>> grenade job configured to talk to placement, there was no resource
>> provider for that Newton subnode when we started running smoke tests
>> after the upgrade to Ocata, so that test failed since the request to the
>> subnode had a NoValidHost (because no resource provider was checking in
>> from the Newton node).
>>
>
> That's where I think the current implementation is weird : if you force
> the scheduler to return you a destination (without even calling the
> filters) by just verifying if the corresponding service is up, then why
> are you needing to get the full list of computes before that ?
>
> To the placement extend, if you just *force* the scheduler to return you
> a destination, then why should we verify if the resources are happy ?
> FWIW, we now have a fully different semantics that replaces the
> "force_hosts" thing that I hate : it's called
> RequestSpec.requested_destination and it actually verifies the filters
> only for that destination. No straight bypass of the filters like
> force_hosts does.

That's just a symptom though, as I understand it?

It seems the real problem is that placement isn't configured on the
old node, which by accident is what most deployers are likely to hit
if they didn't set up placement when upgrading last cycle.
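
For reference, the sort of thing missing from the Newton subnode's
nova.conf is roughly the below (option names as in the Ocata install
guide; the values are obviously deployment specific):

  [placement]
  os_region_name = RegionOne
  auth_type = password
  auth_url = http://controller:35357/v3
  project_name = service
  project_domain_name = Default
  username = placement
  user_domain_name = Default
  password = PLACEMENT_PASS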

>> Grenade is not topology aware so it doesn't know anything about the
>> subnode. When the subnode is stacked, it does so via a post-stack hook
>> script that devstack-gate writes into the grenade run, so after stacking
>> the primary Newton node, it then uses Ansible to ssh into the subnode
>> and stack Newton there too:
>>
>> https://github.com/openstack-infra/devstack-gate/blob/master/devstack-vm-gate.sh#L629
>>
>>
>> logs.openstack.org/61/417961/26/check/gate-grenade-dsvm-neutron-multinode-ubuntu-xenial/15545e4/logs/grenade.sh.txt.gz#_2017-01-26_00_26_59_296
>>
>>
>> And placement was optional in Newton so, you know, problems.
>>
>
> That's where I think we have another problem, which is bigger than the
> corner case you mentioned above : when upgrading from Newton to Ocata,
> we said that all Newton computes have be upgraded to the latest point
> release. Great. But we forgot to identify that it would also require to
> *modify* their nova.conf so they would be able to call the placement API.
>
> That looks to me more than just a rolling upgrade mechanism. In theory,
> a rolling upgrade process accepts that N-1 versioned computes can talk
> to N versioned other services. That doesn't imply a necessary
> configuration change (except the upgrade_levels flag) on the computes to
> achieve that, right?
>
> http://docs.openstack.org/developer/nova/upgrade.html

We normally say the config that worked last cycle should be fine.

We probably should have said placement was required last cycle; then
this wouldn't have been an issue.

>> Some options came to mind:
>>
>> 1. Change the test to not be a smoke test which would exclude it from
>> running during grenade. QA would barf on this.
>>
>> 2. Hack some kind of pre-upgrade callback from d-g into grenade just for
>> configuring placement on the compute subnode. This would probably
>> require adding a script to devstack just so d-g has something to call so
>> we could keep branch logic out of d-g, like what we did for the
>> discover_hosts stuff for cells v2. This is more complicated than what I
>> wanted to deal with tonight with limited time on my hands.
>>
>> 3. Change the nova filter scheduler patch to fallback to get all compute
>> nodes if there are no resource providers. We've already talked about
>> this a few times already in other threads and I consider it a safety net
>> we'd like to avoid if all else fails. If we did this, we could
>> potentially restrict it to just the forced-host case...
>>
>> 4. Setup the Newton subnode in the grenade run to configure placement,
>> which I think we can do from d-g using the features yaml file. That's
>> what I opted to go with and the patch is here:
>>
>> https://review.openstack.org/#/c/425524/
>>
>> I've made the nova patch dependent on that *and* the other grenade patch
>> to install and configure placement on the primary node when upgrading
>> from Newton to Ocata.
>>
>> --
>>
>> That's where we're at right now. If #4 fails, I think we are stuck with
>> adding a workaround for 

Re: [openstack-dev] [swg][tc] An attempt to clarify/discuss: what exactly is the Stewardship Working Group, anyhow?

2017-01-03 Thread John Garbutt
On 8 December 2016 at 15:33, Thierry Carrez <thie...@openstack.org> wrote:
> Colette Alexander wrote:
>> [...]
>> What are the sorts of things you'd like to see tackled?
>
> John Garbutt recently proposed that the TC works on defining visions for
> itself[1] and OpenStack in general[2] and started laying out the base
> objectives and requirements around that.
>
> Defining a vision is a significant effort, and the SWG is a great venue
> to collaborate on this, especially with the tooling some of us learned
> about in the training. So we could encourage whoever is interested in
> participating in that effort to join the #openstack-swg IRC channel and
> meetings[3].
>
> We could work on the TC vision first, with an (ambitious) objective to
> have it ready for formal submission to the TC after the PTG in February ?
>
> [1] https://review.openstack.org/401225
> [2] https://review.openstack.org/401226
> [3]
> http://eavesdrop.openstack.org/#OpenStack_Stewardship_Working_Group_Meeting

I have moved my patch for the TC vision to:
https://etherpad.openstack.org/p/AtlantaPTG-SWG-TCVision

I have added some ideas and questions based on the last SWG meeting
log and the current review comments on the patch, but please do add
more!

While we probably can't make real progress on this until the TC vision
is sorted, I have moved the OpenStack vision here:
https://etherpad.openstack.org/p/AtlantaPTG-SWG-OpenStackVision

If we can get all current members of the TC behind a vision for the
TC, I believe it will really help us work together. Once it's clear
what everyone wants to happen in the future, it's certainly easier to
step up and help with things that get us closer to the shared vision.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Nominating Stephen Finucane for nova-core

2016-12-06 Thread John Garbutt
+1 from me.

He will be a great add. I really enjoyed working with him as part of OSIC
early on in his Nova journey, and I trust him to do a great job as part of
the core team.

John

On Sun, 4 Dec 2016 at 00:00, Michael Still  wrote:
>
> +1, I'd value him on the team.
>
> Michael
>
> On Sat, Dec 3, 2016 at 2:22 AM, Matt Riedemann  wrote:
>
> > I'm proposing that we add Stephen Finucane to the nova-core team. Stephen
> > has been involved with nova for at least around a year now, maybe longer,
> > my ability to tell time in nova has gotten fuzzy over the years.
> > Regardless, he's always been eager to contribute and over the last several
> > months has done a lot of reviews, as can be seen here:
> >
> > https://review.openstack.org/#/q/reviewer:sfinucan%2540redhat.com
> >
> > http://stackalytics.com/report/contribution/nova/180
> >
> > Stephen has been a main contributor and mover for the config option
> > cleanup series the last few cycles, and he's a go-to person for a lot of
> > the NFV/performance features in Nova like NUMA, CPU pinning, huge pages,
> > etc.
> >
> > I think Stephen does quality reviews, leaves thoughtful comments, knows
> > when to hold a +1 for a patch that needs work, and when to hold a -1 from
> > a patch that just has some nits, and helps others in the project move
> > their changes forward, which are all qualities I look for in a nova-core
> > member.
> >
> > I'd like to see Stephen get a bit more vocal / visible, but we all handle
> > that differently and I think it's something Stephen can grow into the
> > more involved he is.
> >
> > So with all that said, I need a vote from the core team on this
> > nomination. I honestly don't care to look up the rules too much on number
> > of votes or timeline, I think it's pretty obvious once the replies roll
> > in which way this goes.
> >
> > --
> >
> > Thanks,
> >
> > Matt Riedemann
> >
> > __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> --
> Rackspace Australia
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [telemetry] PostgreSQL gate broken

2016-11-25 Thread John Garbutt
On 24 November 2016 at 13:52, Chris Friesen  wrote:
> On 11/24/2016 05:57 AM, Julien Danjou wrote:
>>
>> Hi,
>>
>> It seems Nova broke its PostgreSQL support recently, and that impacts
>> Telemetry as we do gate on PostgreSQL. I opened a bug:
>>
>>https://bugs.launchpad.net/nova/+bug/1644513
>>
>> Could somebody from Nova indicates if this is something you want to fix
>> quickly or if we should just stop caring about Nova+PostgreSQL in our
>> integration test gate?
>
>
> As someone using Nova+PostgreSQL in production, I sure hope that we don't
> stop caring about it!

So Nova did stop gating on PostgreSQL a little while back:
http://lists.openstack.org/pipermail/openstack-dev/2016-August/101892.html

We probably need help to turn that back on, and I believe the job
needs some tidying up first. Any takers to look into tidying that job
up and getting it non-voting on the check queue?

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Some thoughts on API microversions

2016-08-04 Thread John Garbutt
On 4 August 2016 at 14:18, Andrew Laski  wrote:
> On Thu, Aug 4, 2016, at 08:20 AM, Sean Dague wrote:
>> On 08/03/2016 08:54 PM, Andrew Laski wrote:
>> > I've brought some of these thoughts up a few times in conversations
>> > where the Nova team is trying to decide if a particular change warrants
>> > a microversion. I'm sure I've annoyed some people by this point because
>> > it wasn't germane to those discussions. So I'll lay this out in it's own
>> > thread.
>> >
>> > I am a fan of microversions. I think they work wonderfully to express
>> > when a resource representation changes, or when different data is
>> > required in a request. This allows clients to make the same request
>> > across multiple clouds and expect the exact same response format,
>> > assuming those clouds support that particular microversion. I also think
>> > they work well to express that a new resource is available. However I do
>> > think think they have some shortcomings in expressing that a resource
>> > has been removed. But in short I think microversions work great for
>> > expressing that there have been changes to the structure and format of
>> > the API.
>> >
>> > I think microversions are being overused as a signal for other types of
>> > changes in the API because they are the only tool we have available. The
>> > most recent example is a proposal to allow the revert_resize API call to
>> > work when a resizing instance ends up in an error state. I consider
>> > microversions to be problematic for changes like that because we end up
>> > in one of two situations:
>> >
>> > 1. The microversion is a signal that the API now supports this action,
>> > but users can perform the action at any microversion. What this really
>> > indicates is that the deployment being queried has upgraded to a certain
>> > point and has a new capability. The structure and format of the API have
>> > not changed so an API microversion is the wrong tool here. And the
>> > expected use of a microversion, in my opinion, is to demarcate that the
>> > API is now different at this particular point.
>> >
>> > 2. The microversion is a signal that the API now supports this action,
>> > and users are restricted to using it only on or after that microversion.
>> > In many cases this is an artificial constraint placed just to satisfy
>> > the expectation that the API does not change before the microversion.
>> > But the reality is that if the API change was exposed to every
>> > microversion it does not affect the ability I lauded above of a client
>> > being able to send the same request and receive the same response from
>> > disparate clouds. In other words exposing the new action for all
>> > microversions does not affect the interoperability story of Nova which
>> > is the real use case for microversions. I do recognize that the
>> > situation may be more nuanced and constraining the action to specific
>> > microversions may be necessary, but that's not always true.
>> >
>> > In case 1 above I think we could find a better way to do this. And I
>> > don't think we should do case 2, though there may be special cases that
>> > warrant it.
>> >
>> > As possible alternate signalling methods I would like to propose the
>> > following for consideration:
>> >
>> > Exposing capabilities that a user is allowed to use. This has been
>> > discussed before and there is general agreement that this is something
>> > we would like in Nova. Capabilities will programmatically inform users
>> > that a new action has been added or an existing action can be performed
>> > in more cases, like revert_resize. With that in place we can avoid the
>> > ambiguous use of microversions to do that. In the meantime I would like
>> > the team to consider not using microversions for this case. We have
>> > enough of them being added that I think for now we could just wait for
>> > the next microversion after a capability is added and document the new
>> > capability there.
>>
>> The problem with this approach is that the capability add isn't on a
>> microversion boundary. As long as we continue to believe that we want to
>> support CD deployments, this means people can deploy code with the
>> behavior change that's not documented or signaled in any way.

+1

I do wonder if we want to relax our support of CD, to some extent, but
that's a different thread.

> The fact that the capability add isn't on a microversion boundary is
> exactly my point. There's no need for it to be in many cases. But it
> would only apply for capability adds which don't affect the
> interoperability of multiple deployments.
>
> The signaling would come from the ability to query the capabilities
> listing. A change in what that listing returns indicates a behavior
> change.
>
> Another reason I like the above mechanism is that it handles differences
> in policy better as well. As much as we say that two clouds with the
> same microversions available should accept the same requests and return
> the 

Re: [openstack-dev] [Nova] Some thoughts on API microversions

2016-08-04 Thread John Garbutt
On 4 August 2016 at 16:28, Edward Leafe  wrote:
> On Aug 4, 2016, at 8:18 AM, Andrew Laski  wrote:
>
>> This gets to the point I'm trying to make. We don't guarantee old
>> behavior in all cases at which point users can no longer rely on
>> microversions to signal non breaking changes. And where we do guarantee
>> old behavior sometimes we do it artificially because the only signal we
>> have is microversions and that's the contract we're trying to adhere to.
>
> I've always understood microversions to be a way to prevent breaking an 
> automated tool when we change either the input or output of our API. Its 
> benefit was less clear for the case of adding a new API, since there is no 
> chance of breaking something that would never call it. We also accept that a 
> bug fix doesn't require a microversion bump, as users should *never* be 
> expecting a 5xx response, so not only does fixing that not need a bump, but 
> such fixes can be backported to affect all microversions.
>
> The idea that specifying a distinct microversion would somehow guarantee 
> an immutable behavior, though, is simply not the case. We discussed this at 
> length at the midcycle regarding the dropping of the nova-network code; once 
> that's dropped, there won't be any way to get that behavior no matter what 
> microversion you specify. It's gone. We signal this with deprecation notices, 
> release notes, etc., and it's up to individuals to move away from using that 
> behavior during this deprecation period. A new microversion will never help 
> anyone who doesn't follow these signals.
>
> In the case that triggered this thread [0], the change was completely on the 
> server side of things; no change to either the request or response of the 
> API. It simply allowed a failed resize to be recovered more easily. That's a 
> behavior change, not an API change, and frankly, I can't imagine anyone who 
> would ever *want* the old behavior of leaving an instance in an error state. 
> To me, that's not very different than fixing a 5xx response, as it is 
> correcting an error on the server side.
>

The problem I was thinking about is: how do you know if a cloud
supports that new behaviour? For me, a microversion does help to
advertise that. It's probably a good example of a case where it's not
important enough to add a new capability to tell people it's possible.

That triggers the follow-up question of whether that matters in this
case: could you just make the call and see if it works?

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] [nova] [neutron] get_all_bw_counters in the Ironic virt driver

2016-08-02 Thread John Garbutt
On 29 July 2016 at 19:58, Sean Dague  wrote:
> On 07/29/2016 02:29 PM, Jay Pipes wrote:
>> On 07/28/2016 09:02 PM, Devananda van der Veen wrote:
>>> On 07/28/2016 05:40 PM, Brad Morgan wrote:
 I'd like to solicit some advice about potentially implementing
 get_all_bw_counters() in the Ironic virt driver.

 https://github.com/openstack/nova/blob/master/nova/virt/driver.py#L438
 Example Implementation:
 https://github.com/openstack/nova/blob/master/nova/virt/xenapi/driver.py#L320


 I'm ignoring the obvious question about how this data will actually be
 collected/fetched as that's probably its own topic (involving
 neutron), but I
 have a few questions about the Nova -> Ironic interaction:

 Nova
 * Is get_all_bw_counters() going to stick around for the foreseeable
 future? If
 not, what (if anything) is the replacement?
>>
>> I don't think Nova should be in the business of monitoring *any*
>> transient metrics at all.
>>
>> There are many tools out there -- Nagios, collectd, HEKA, Snap, gnocchi,
>> monasca just to name a few -- that can do this work.
>>
>> What action is taken if some threshold is reached is entirely
>> deployment-dependent and not something that Nova should care about. Nova
>> should just expose an API for other services to use to control the guest
>> instances under its management, nothing more.
>
> More importantly... *only* xenapi driver implements this, and it's not
> exposed over the API. In reality that part of the virt driver layer
> should probably be removed.

AFAIK, it is only exposed via notifications:
https://github.com/openstack/nova/blob/562a1fe9996189ddd9cc5c47ab070a498cfce258/nova/notifications/base.py#L276

I think it's emitted here:
https://github.com/openstack/nova/blob/562a1fe9996189ddd9cc5c47ab070a498cfce258/nova/compute/manager.py#L5886

Agreed with not adding to the legacy code, and with not encouraging new users of this.

Long term, it feels like removing this from Nova is the correct thing
to do, but I do worry that we don't yet have an obvious direct
replacement, or a transition plan. (This also feeds back into being
able to list deleted instances in the API, and DB soft_delete.) It's
not trivial.

> Like jay said, there are better tools for collecting this than Nova.

I am out of touch with what folks should use to get this data and
build a billing system. Ceilometer + something?

It feels like that story has to be solid before we delete this
support. Maybe that's already the case, and I just don't know what it
is yet?

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic][neutron][nova] Sync port state changes.

2016-07-26 Thread John Garbutt
On 26 July 2016 at 10:52, Sam Betts (sambetts) <sambe...@cisco.com> wrote:
> On 26/07/2016 09:32, "John Garbutt" <j...@johngarbutt.com> wrote:
>
>>On 22 July 2016 at 11:51, Vasyl Saienko <vsaie...@mirantis.com> wrote:
>>> Kevin, thanks for reply,
>>>
>>> On Fri, Jul 22, 2016 at 11:50 AM, Kevin Benton <ke...@benton.pub> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Once you solve the issue of getting the baremetal ports to transition
>>>>to
>>>> the ACTIVE state, a notification will automatically be emitted to Nova
>>>>of
>>>> 'network-vif-plugged' with the port ID. Will ironic not have access to
>>>>that
>>>> event via Nova?
>>>>
>>> To solve issues of getting the baremetal ports to transition to the
>>>ACTIVE
>>> state we should do the following:
>>>
>>> Use FLAT network instead of VXLAN for Ironic gate jobs [3].
>>> On Nova side set vnic_type to baremetal for Ironic hypervisor [0].
>>> On Neutron side, perform fake 'baremetal' port binding [2] in case of
>>>FLAT
>>> network.
>>>
>>> We need to receive direct notifications from Neutron to Ironic, because
>>> Ironic creates ports in provisioning network by his own.
>>> Nova doesn't know anything about provisioning ports.
>>>
>>>> If not, Ironic could develop a service plugin that just listens for
>>>>port
>>>> update events and relays them to Ironic.
>>>>
>>>
>>> I already prepared PoC [4] to Neutron that allows to send notifications
>>>to
>>> Ironic on port_update event.
>>>
>>> Reference:
>>> [0] https://review.openstack.org/339143
>>> [1] https://review.openstack.org/339129
>>> [3] https://review.openstack.org/340695
>>> [4] https://review.openstack.org/345211
>>
>>I prefer Kevin's idea of Nova always getting the vif events.
>>
>>Can't the ironic virt driver pass on the event to ironic, if required?
>>
>>In the libvirt case, for example, the virt driver waits to start the
>>instance until the networking has been set up. It feels like we might
>>need that in the Ironic case as well?
>>
>>Or is that likely to all end up too messy?
>>
>>Thanks,
>>John
>
> This is potentially possible for the nova user's network ports, but Ironic
> creates its own neutron ports in the Ironic provisioning and cleaning
> networks to perform provisioning and cleaning for a node. Nova is never
> aware of these ports as they have nothing to do with the tenant's
> requested connectivity.

Ah, OK. I see why you want those to go to ironic.

I clearly need to go and re-read the plan we have for all that.

Thanks,
johnthetubaguy

>>
>>>>
>>>> On Tue, Jul 12, 2016 at 4:07 AM, Vasyl Saienko <vsaie...@mirantis.com>
>>>> wrote:
>>>>>
>>>>> Hello Community,
>>>>>
>>>>> I'm working to make Ironic be aware about  Neutron port state changes
>>>>> [0].
>>>>> The issue consists of two parts:
>>>>>
>>>>> Neutron ports for baremetal instances remain in DOWN state [1]. The
>>>>>issue
>>>>> occurs because there is no mechanism driver that binds ports. To
>>>>>solve it we
>>>>> need to create port with  vnic_type='baremetal' in Nova [2], and bind
>>>>>in
>>>>> Neutron. New mechanism driver that supports baremetal vnic_type is
>>>>>needed
>>>>> [3].
>>>>>
>>>>> Sync Neutron events with Ironic. According to Neutron architecture [4]
>>>>> mechanism drivers work synchronously. When the port is bound by ml2
>>>>> mechanism driver it becomes ACTIVE. While updating dhcp information
>>>>>Neutron
>>>>> uses dhcp agent, which is asynchronous call. I'm confused here, since
>>>>>ACTIVE
>>>>> port status doesn't mean that it operates (dhcp agent may fail to
>>>>>setup
>>>>> port). The issue was solved by [5]. So starting from [5] when ML2
>>>>>uses new
>>>>> port status update flow, port update is always asynchronous
>>>>>operation. And
>>>>> the most efficient way is to implement callback mechanism between
>>>>>Neutron
>>>>> and Ironic is like it's done for Neutron/Nova.
>>>>>
>>>>> Neutron/Nova/Ironic teams let me 

Re: [openstack-dev] [ironic][neutron][nova] Sync port state changes.

2016-07-26 Thread John Garbutt
On 22 July 2016 at 11:51, Vasyl Saienko  wrote:
> Kevin, thanks for reply,
>
> On Fri, Jul 22, 2016 at 11:50 AM, Kevin Benton  wrote:
>>
>> Hi,
>>
>> Once you solve the issue of getting the baremetal ports to transition to
>> the ACTIVE state, a notification will automatically be emitted to Nova of
>> 'network-vif-plugged' with the port ID. Will ironic not have access to that
>> event via Nova?
>>
> To solve issues of getting the baremetal ports to transition to the ACTIVE
> state we should do the following:
>
> Use FLAT network instead of VXLAN for Ironic gate jobs [3].
> On Nova side set vnic_type to baremetal for Ironic hypervisor [0].
> On Neutron side, perform fake 'baremetal' port binding [2] in case of FLAT
> network.
>
> We need to receive direct notifications from Neutron to Ironic, because
> Ironic creates ports in provisioning network by his own.
> Nova doesn't know anything about provisioning ports.
>
>> If not, Ironic could develop a service plugin that just listens for port
>> update events and relays them to Ironic.
>>
>
> I already prepared PoC [4] to Neutron that allows to send notifications to
> Ironic on port_update event.
>
> Reference:
> [0] https://review.openstack.org/339143
> [1] https://review.openstack.org/339129
> [3] https://review.openstack.org/340695
> [4] https://review.openstack.org/345211

I prefer Kevin's idea of Nova always getting the vif events.

Can't the ironic virt driver pass on the event to ironic, if required?

In the libvirt case, for example, the virt driver waits to start the
instance until the networking has been set up. It feels like we might
need that in the Ironic case as well?

Or is that likely to all end up too messy?

Thanks,
John

>>
>> On Tue, Jul 12, 2016 at 4:07 AM, Vasyl Saienko 
>> wrote:
>>>
>>> Hello Community,
>>>
>>> I'm working to make Ironic be aware about  Neutron port state changes
>>> [0].
>>> The issue consists of two parts:
>>>
>>> Neutron ports for baremetal instances remain in DOWN state [1]. The issue
>>> occurs because there is no mechanism driver that binds ports. To solve it we
>>> need to create port with  vnic_type='baremetal' in Nova [2], and bind in
>>> Neutron. New mechanism driver that supports baremetal vnic_type is needed
>>> [3].
>>>
>>> Sync Neutron events with Ironic. According to Neutron architecture [4]
>>> mechanism drivers work synchronously. When the port is bound by ml2
>>> mechanism driver it becomes ACTIVE. While updating dhcp information Neutron
>>> uses dhcp agent, which is asynchronous call. I'm confused here, since ACTIVE
>>> port status doesn't mean that it operates (dhcp agent may fail to setup
>>> port). The issue was solved by [5]. So starting from [5] when ML2 uses new
>>> port status update flow, port update is always asynchronous operation. And
>>> the most efficient way is to implement callback mechanism between Neutron
>>> and Ironic is like it's done for Neutron/Nova.
>>>
>>> Neutron/Nova/Ironic teams let me know your thoughts on this.
>>>
>>>
>>> Reference:
>>> [0] https://bugs.launchpad.net/ironic/+bug/1304673
>>> [1] https://bugs.launchpad.net/neutron/+bug/1599836
>>> [2] https://review.openstack.org/339143
>>> [3] https://review.openstack.org/#/c/339129/
>>> [4]
>>> https://www.packtpub.com/sites/default/files/Article-Images/B04751_01.png
>>> [5]
>>> https://github.com/openstack/neutron/commit/b672c26cb42ad3d9a17ed049b506b5622601e891
>>>
>>>
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] neutron port duplication

2016-07-25 Thread John Garbutt
On 22 July 2016 at 14:54, Andrey Volkov  wrote:
> Hi, nova and neutron teams,
>
> While booting new instance nova requests port for that instance in the
> neutron.
> It's possible to have a situation where neutron doesn't respond due to a timeout
> or connection break and nova retries port creation. This definitely results in
> duplicate ports for the instance [1].
>
> To solve this issue different methods can be applied:
> - Transactional port creating in neutron (when it's possible to rollback if
> the client doesn't accept the answer).
> - Idempotent port creation (when the client provides some id and server does
> get_or_create on this id).
> - Getting port on the client before next retry attempt (idempotent port
> creation on the client side).
>
> Questions to community:
> - Am I right with my thoughts? Does the problem exist? Maybe there is already
> a tool that can solve it?
> - Which method is better to apply to solve the problem if it exists?
>
> [1] https://bugs.launchpad.net/nova/+bug/1603909

So I am currently taking a close look at Nova and Neutron interactions
to eliminate these kinds of things.

I hope to work with Neutron to evolve our APIs to try and eliminate
this kind of thing, more systematically. I have promised to work with
Nova and Neutron to get a plan together for the next summit.

I am super happy you have tracked down one of the failure modes here.

If we hit a port create timeout, it is possible Neutron has created the
port, but Nova never gets the port uuid, so it doesn't delete that port
during its cleanup.

I think we probably need to list the ports in neutron when working out
what to delete, to make sure we don't leak ports due to timeouts.
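
As a strawman, something along these lines is the idea (just a sketch,
not the actual Nova cleanup code): ask Neutron for every port whose
device_id is the instance UUID, rather than relying only on the port
UUIDs Nova remembers from the create call.

    # 'neutron' is assumed to be an already-authenticated
    # python-neutronclient (v2) Client. Nova sets device_id to the
    # instance UUID on the ports it creates, so we can filter on it.

    def find_instance_ports(neutron, instance_uuid):
        # List every port Neutron knows about for this instance.
        return neutron.list_ports(device_id=instance_uuid)['ports']

    def cleanup_leaked_ports(neutron, instance_uuid, known_port_ids):
        # Delete any port Neutron has that Nova doesn't know about,
        # e.g. one created during a request that timed out.
        for port in find_instance_ports(neutron, instance_uuid):
            if port['id'] not in known_port_ids:
                neutron.delete_port(port['id'])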

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Remove duplicate code using Data Driven Tests (DDT)

2016-07-25 Thread John Garbutt
On 25 July 2016 at 13:56, Bhor, Dinesh  wrote:
>
>
> -Original Message-
> From: Sean Dague [mailto:s...@dague.net]
> Sent: Monday, July 25, 2016 5:53 PM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [Nova] Remove duplicate code using Data Driven 
> Tests (DDT)
>
> On 07/25/2016 08:05 AM, Daniel P. Berrange wrote:
>> On Mon, Jul 25, 2016 at 07:57:08AM -0400, Sean Dague wrote:
>>> On 07/22/2016 11:30 AM, Daniel P. Berrange wrote:
 On Thu, Jul 21, 2016 at 07:03:53AM -0700, Matt Riedemann wrote:
> On 7/21/2016 2:03 AM, Bhor, Dinesh wrote:
>
> I agree that it's not a bug. I also agree that it helps in some
> specific types of tests which are doing some kind of input
> validation (like the patch you've proposed) or are simply iterating
> over some list of values (status values on a server instance for example).
>
> Using DDT in Nova has come up before and one of the concerns was
> hiding details in how the tests are run with a library, and if
> there would be a learning curve. Depending on the usage, I
> personally don't have a problem with it. When I used it in manila
> it took a little getting used to but I was basically just looking
> at existing tests and figuring out what they were doing when adding new 
> ones.

 I don't think there's significant learning curve there - the way it
 lets you annotate the test methods is pretty easy to understand and
 the ddt docs spell it out clearly for newbies. We've far worse
 things in our code that create a hard learning curve which people
 will hit first :-)

 People have essentially been re-inventing ddt in nova tests already
 by defining one helper method and them having multiple tests methods
 all calling the same helper with a different dataset. So ddt is just
 formalizing what we're already doing in many places, with less code
 and greater clarity.

> I definitely think DDT is easier to use/understand than something
> like testscenarios, which we're already using in Nova.

 Yeah, testscenarios feels little over-engineered for what we want
 most of the time.
>>>
>>> Except, DDT is way less clear (and deterministic) about what's going
>>> on with the test name munging. Which means failures are harder to
>>> track back to individual tests and data load. So debugging the failures is 
>>> harder.
>>
>> I'm not sure what you think is unclear - given an annotated test:
>>
>>@ddt.data({"foo": "test", "availability_zone": "nova1"},
>>   {"name": "  test  ", "availability_zone": "nova1"},
>>   {"name": "", "availability_zone": "nova1"},
>>   {"name": "x" * 256, "availability_zone": "nova1"},
>>   {"name": "test", "availability_zone": "x" * 256},
>>   {"name": "test", "availability_zone": "  nova1  "},
>>   {"name": "test", "availability_zone": ""},
>>   {"name": "test", "availability_zone": "nova1", "foo": "bar"})
>> def test_create_invalid_create_aggregate_data(self, value):
>>
>> It is generated one test for each data item:
>>
>>  test_create_invalid_create_aggregate_data_1
>>  test_create_invalid_create_aggregate_data_2
>>  test_create_invalid_create_aggregate_data_3
>>  test_create_invalid_create_aggregate_data_4
>>  test_create_invalid_create_aggregate_data_5
>>  test_create_invalid_create_aggregate_data_6
>>  test_create_invalid_create_aggregate_data_7
>>  test_create_invalid_create_aggregate_data_8
>>
>> This seems about as obvious as you can possibly get
>
> At least when this was attempted to be introduced into Tempest, the naming 
> was a lot less clear, maybe it got better. But I still think milestone 3 
> isn't the time to start a thing like this.
>
> -Sean
>
> Hi Sean,
>
> IMO it is possible to have a descriptive name to test cases using DDT.
>
> For ex.,
>
> @ddt.data(annotated('missing_name', {"foo": "test", "availability_zone": 
> "nova1"}),
>   annotated('name_greater_than_255_characters', {"name": "x" * 256, 
> "availability_zone": "nova1"}))
> def test_create_invalid_aggregate_data(self, value):
>
>
> it generates following test names:
>
> test_create_invalid_aggregate_data_2_name_greater_than_255_characters
> test_create_invalid_aggregate_data_1_missing_name
>
> I think with this it is easy to identify which test scenario has failed.
>
> Same is implemented in openstack/zaqar.
> https://github.com/openstack/zaqar/blob/master/zaqar/tests/functional/wsgi/v1/test_queues.py#L87

That descriptive name does help with some of my fears (although I wish
it were simpler, maybe just using kwargs in data).
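
For anyone following along, here is a rough sketch of how that kind of
helper works with ddt (the helper and test data below are illustrative,
modelled on the zaqar example linked above, not actual Nova code):

    import unittest

    import ddt

    def annotated(test_name, test_input):
        # ddt uses value.__name__ (when present) to build the generated
        # test name, so wrap the data in a dict subclass carrying it.
        class _AnnotatedDict(dict):
            pass
        data = _AnnotatedDict(test_input)
        data.__name__ = test_name
        return data

    @ddt.ddt
    class AggregateValidationTest(unittest.TestCase):

        @ddt.data(annotated('missing_name',
                            {"foo": "test", "availability_zone": "nova1"}),
                  annotated('name_too_long',
                            {"name": "x" * 256, "availability_zone": "nova1"}))
        def test_create_invalid_aggregate_data(self, value):
            # Stand-in assertion; a real test would call the API and
            # expect a 400 response for each invalid payload.
            self.assertIsInstance(value, dict)

That generates names like
test_create_invalid_aggregate_data_1_missing_name, which is what makes
the failures readable.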

I am +1 on sdague's "not now". I know we normally allow unit test
changes, but we have such a huge backlog, full of really useful changes
we should merge instead, that I would rather we all looked at those.

Thanks,

Re: [openstack-dev] [nova][scheduler] A simple solution for better scheduler performance

2016-07-15 Thread John Garbutt
On 15 July 2016 at 09:26, Cheng, Yingxin  wrote:
> It is easy to understand that scheduling in nova-scheduler service consists 
> of 2 major phases:
> A. Cache refresh, in code [1].
> B. Filtering and weighing, in code [2].
>
> Couple of previous experiments [3] [4] shows that “cache-refresh” is the 
> major bottleneck of nova scheduler. For example, the 15th page of 
> presentation [3] says the time cost of “cache-refresh” takes 98.5% of time of 
> the entire `_schedule` function [6], when there are 200-1000 nodes and 50+ 
> concurrent requests. The latest experiments [5] in China Mobile’s 1000-node 
> environment also prove the same conclusion, and it’s even 99.7% when there’re 
> 40+ concurrent requests.
>
> Here’re some existing solutions for the “cache-refresh” bottleneck:
> I. Caching scheduler.
> II. Scheduler filters in DB [7].
> III. Eventually consistent scheduler host state [8].
>
> I can discuss their merits and drawbacks in a separate thread, but here I 
> want to show a simplest solution based on my findings during the experiments 
> [5]. I wrapped the expensive function [1] and tried to see the behavior of 
> cache-refresh under pressure. It is very interesting to see a single 
> cache-refresh only costs about 0.3 seconds. And when there’re concurrent 
> cache-refresh operations, this cost can be suddenly increased to 8 seconds. 
> I’ve seen it even reached 60 seconds for one cache-refresh under higher 
> pressure. See the below section for details.

I am curious which DB driver you are using?
Using PyMySQL should remove a lot of those issues.
This is the driver we use in the gate now, but it didn't use to be the default.

If you use the C-based MySQL driver, you will find it locks the whole
process when making a DB call, then eventlet schedules the next DB
call, etc, etc, and then it loops back and allows the python code to
process the first DB call, and so on. In extreme cases you will find the
code processing the DB query considers some of the hosts to be down,
since it's so long since the DB call was returned.

Switching the driver should dramatically increase the performance of (II).
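
For reference, switching is just a change to the SQLAlchemy URL in
nova.conf (hostnames and credentials below are placeholders):

    [database]
    # mysql+pymysql:// selects the pure-python PyMySQL driver; a plain
    # mysql:// URL typically picks the C-based MySQLdb driver instead.
    connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova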

> It raises a question in the current implementation: Do we really need a 
> cache-refresh operation [1] for *every* requests? If those concurrent 
> operations are replaced by one database query, the scheduler is still happy 
> with the latest resource view from database. Scheduler is even happier 
> because those expensive cache-refresh operations are minimized and much 
> faster (0.3 seconds). I believe it is the simplest optimization to scheduler 
> performance, which doesn’t make any changes in filter scheduler. Minor 
> improvements inside host manager is enough.

So it depends on the usage patterns in your cloud.

The caching scheduler is one way to avoid the cache-refresh operation
on every request, but it has an upper limit on throughput because the
caching forces you into having a single active nova-scheduler process.

By contrast, (II) allows you to run multiple nova-scheduler workers
to increase concurrency.

> [1] 
> https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L104
> [2] 
> https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L112-L123
> [3] 
> https://www.openstack.org/assets/presentation-media/7129-Dive-into-nova-scheduler-performance-summit.pdf
> [4] http://lists.openstack.org/pipermail/openstack-dev/2016-June/098202.html
> [5] Please refer to Barcelona summit session ID 15334 later: “A tool to test 
> and tune your OpenStack Cloud? Sharing our 1000 node China Mobile experience.”
> [6] 
> https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L53
> [7] https://review.openstack.org/#/c/300178/
> [8] https://review.openstack.org/#/c/306844/
>
>
> ** Here is the discovery from latest experiments [5] **
> https://docs.google.com/document/d/1N_ZENg-jmFabyE0kLMBgIjBGXfL517QftX3DW7RVCzU/edit?usp=sharing
>
> The figure 1 illustrates the concurrent cache-refresh operations in a nova 
> scheduler service. There’re at most 23 requests waiting for the cache-refresh 
> operations at time 43s.
>
> The figure 2 illustrates the time cost of every requests in the same 
> experiment. It shows that the cost is increased with the growth of 
> concurrency. It proves the vicious circle that a request will wait longer for 
> the database when there’re more waiting requests.
>
> The figure 3/4 illustrate a worse case when the cache-refresh operation costs 
> reach 60 seconds because of excessive cache-refresh operations.

Sorry, it's not clear to me if this was using I, II, or III? It seems
like it's just using the default system?

This looks like the problems I have seen when you don't use PyMySQL
for your DB driver.

Thanks,
John

__
OpenStack Development Mailing 

Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-17 Thread John Garbutt
On 15 June 2016 at 02:37, Sean Dague  wrote:
> On 06/14/2016 07:28 PM, Monty Taylor wrote:
>>
>> On 06/14/2016 05:42 PM, Doug Hellmann wrote:
>
> 
>>
>> I think this is the most important thing to me as it relates to this.
>> I'm obviously a huge proponent of clouds behaving more samely. But I
>> also think that, as Doug nicely describes above, we've sort of backed in
>> to removing something without a deprecation window ... largely because
>> of the complexities involved with the system here - and I'd like to make
>> sure that when we are being clear about behavior changes that we give
>> the warning period so that people can adapt.

+1

> I also think that "pass" to "pass *"  is useful social incentive. While I
> think communication of this new direction has happened pretty broadly,
> organizations are complex places, and it didn't filter everywhere it needed
> to with the urgency that was probably needed.
>
> "pass *"  * - with a greylist which goes away in 6 months
>
> Will hopefully be a reasonable enough push to get the behavior we want,
> which is everyone publishing the same interface.

+1

I know I am going back in time, but +1 all the same.

I like how this pushes forward the interoperability cause.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Using image metadata to sanity check supplied authentication data at nova 'create' or 'recreate' time?

2016-06-17 Thread John Garbutt
On 7 June 2016 at 17:41, Jim Rollenhagen  wrote:
> On Tue, Jun 07, 2016 at 03:10:24PM +0100, Daniel P. Berrange wrote:
>> On Tue, Jun 07, 2016 at 09:37:25AM -0400, Jim Rollenhagen wrote:
>> > Right, so that's a third case. How I'd see this working is maybe an
>> > image property called "auth_requires" that could be one of ["none",
>> > "ssh_key", "x509_cert", "password"]. Or maybe it could be multiple
>> > values that are OR'd, so for example an image could require an ssh key
>> > or an x509 cert. If the "auth_requires" property isn't found, default to
>> > "none" to maintain compatibility, I guess.

That sounds reasonable to me.
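
To make that concrete, the proposal would mean an image owner setting
something like this (auth_requires is the proposed property name from
this thread, not an existing contract):

    $ openstack image set --property auth_requires=ssh_key <image-uuid>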

>> NB, even if you have an image that requires an SSH key to be provided in
>> order to enable login, it is sometimes valid to not provide one. Not least
>> during development, I'm often testing images which would ordinarily require
>> an SSH key, but I don't actually need the ability to login, so I don't bother
>> to provide one.
>>
>> So if we provided this ability to tag images as needing an ssh key, and then
>> enforced that, we would then also need to extend the API to provide a way to
>> tell nova to explicitly ignore this and not bother enforcing it, despite what
>> the image metadata says.
>>
>> I'm not particularly convinced the original problem is serious enough to
>> warrant building such a solution. It feels like the kind of mistake that
>> people would do once, and then learn their mistake thereafter. IOW the
>> consequences of the mistake don't seem particularly severe really.

So the problem this is trying to resolve is reducing support calls /
reducing user frustration.

Doing that by making it harder to use our API in the "wrong" way seems
fine to me. It seems similar to the checks we do on neutron ports, to
make sure you have something that looks useful, to help avoid user
confusion.

By default, an image would have no such metadata, so there would be no
restriction, and I don't think it should impact most people's testing.
Once we get this spec done, the override should be simple:
http://specs.openstack.org/openstack/nova-specs/specs/newton/approved/virt-image-props-boot-override.html

>> > The bigger question here is around hitting the images API syncronously
>> > during a boot request, and where/how/if to cache the metadata that's
>> > returned so we don't have to do it so often. I don't have a good answer
>> > for that, though.
>>
>> Nova already uses image metadata for countless things during the VM boot
>> request, so there's nothin new in this respect. We only query glance
>> once, thereafter the image metadata is cached by Nova in the DB on a per
>> instance basis, because we need to be isolated from later changes to the
>> metadata in glance after the VM boots.

+1

> This is beyond the API though, right? The purpose of the spec here is to
> reject the request if there isn't enough information to boot the
> machine.

It's normal to access it anywhere you need it, from the instance object.
Random example:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1585

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][tc] Require a level playing field for OpenStack projects

2016-06-17 Thread John Garbutt
On 16 June 2016 at 09:58, Thierry Carrez  wrote:
> Project team requirements are just guidelines, which are interpreted by
> humans. In the end, the TC members vote and use human judgment rather than
> blind 'rules'. I just want (1) to state that a level playing field is an
> essential part of what we call "open collaboration", and (2) to have TC
> members *consider* whether the project presents a fair playing field or not,
> as part of how they judge future project teams.

FWIW, I think this is what wins me over.
These are just guidelines to be considered by humans.

> There is a grey area that requires human judgment here. In your example
> above, if the open implementation was unusable open core bait to lure people
> into using the one and only proprietary driver, it would be a problem. If
> the open implementation was fully functional and nothing prevented adding
> additional proprietary drivers, there would be no problem.

This answers a lot of the questions I had after reading the idea,
along with the fact that this is only about official projects.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Let me know if you have an approved spec but unapproved blueprint

2016-06-14 Thread John Garbutt
Hi,

I just fixed up Timofei's BP.
I went through all the specs and spotted another 5 or 6 that were out of sync.
There may well be others I didn't spot.

Thanks,
johnthetubaguy

On 14 June 2016 at 09:17, Timofei Durakov  wrote:
> Hi
>
> https://blueprints.launchpad.net/nova/+spec/remove-compute-compute-communication
> - bp is not approved, but the spec is.
>
> Timofey
>
> On Tue, Jun 14, 2016 at 3:47 AM, joehuang  wrote:
>>
>> Hi, Matt,
>>
>> Thank you for the clarification.
>>
>> Best Regards
>> Chaoyi Huang ( Joe Huang )
>>
>>
>> -Original Message-
>> From: Matt Riedemann [mailto:mrie...@linux.vnet.ibm.com]
>> Sent: Monday, June 13, 2016 9:07 AM
>> To: openstack-dev@lists.openstack.org
>> Subject: Re: [openstack-dev] [nova] Let me know if you have an approved
>> spec but unapproved blueprint
>>
>> On 6/12/2016 7:48 PM, joehuang wrote:
>> > Hello,
>> >
>> > This spec is not approved yet:
>> > https://review.openstack.org/#/c/295595/
>> >
>> > But the BP is approved:
>> > https://blueprints.launchpad.net/nova/+spec/expose-quiesce-unquiesce-a
>> > pi
>> >
>> > Don't know how to deal with the spec now. Is this spec killed? Should
>> > Nova support application level consistency snapshot for disaster recovery
>> > purpose or not?
>> >
>> > Best Regards
>> > Chaoyi Huang ( Joe Huang )
>> >
>> > -Original Message-
>> > From: Matt Riedemann [mailto:mrie...@linux.vnet.ibm.com]
>> > Sent: Sunday, June 12, 2016 9:08 PM
>> > To: OpenStack Development Mailing List (not for usage questions)
>> > Subject: [openstack-dev] [nova] Let me know if you have an approved
>> > spec but unapproved blueprint
>> >
>> > I've come across several changes up for review that are tied to Newton
>> > blueprints which have specs approved but the blueprints in launchpad are 
>> > not
>> > yet approved.
>> >
>> > If you have a spec that was approved for Newton but your blueprint in
>> > launchpad isn't approved yet, please ping me (mriedem) in IRC or reply to
>> > this thread to get it approved and tracked for the Newton release.
>> > It's important (at least to me) that we have an accurate representation
>> > of how much work we're trying to get done this release, especially with
>> > non-priority feature freeze coming up in three weeks.
>> >
>>
>> Neither the spec nor the blueprint is approved. The blueprint was
>> previously approved in mitaka but is not for newton, with reasons in the
>> spec review for newton.
>>
>> At this point we're past non-priority spec approval freeze so this isn't
>> going to get in for newton. There are a lot of concerns about this one so
>> it's going to be tabled for at least this release, we can revisit in ocata,
>> but it adds a lot of complexity and it's more than we're willing to take on
>> right now given everything else planned for this release.
>>
>> --
>>
>> Thanks,
>>
>> Matt Riedemann
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Centralize Configuration: ignore service list for newton

2016-06-07 Thread John Garbutt
On 27 May 2016 at 17:15, Markus Zoeller <mzoel...@linux.vnet.ibm.com> wrote:
> On 20.05.2016 11:33, John Garbutt wrote:
>> Hi,
>>
>> The current config template includes a list of "Services which consume this":
>> http://specs.openstack.org/openstack/nova-specs/specs/mitaka/implemented/centralize-config-options.html#quality-view
>>
>> I propose we drop this list from the template.
>>
>> I am worried this is going to be hard to maintain, and hard to review
>> / check. As such, its of limited use to most deployers in its current
>> form.
>>
>> I have been thinking about a possible future replacement. Two separate
>> sample configuration files, one for the Compute node, and one for
>> non-compute nodes (i.e. "controller" nodes). The reason for this
>> split, is our move towards removing sensitive credentials from compute
>> nodes, etc. Over time, we could prove the split in gate testing, where
>> we look for conf options accessed by computes that shouldn't be, and
>> v.v.
>>
>>
>> Having said that, for newton, I propose we concentrate on:
>> * completing the move of all the conf options (almost there)
>> * (skip tidy up of deprecated options)
>> * tidying up the main description of each conf option
>> * tidy up the Opt group and Opt types, i.e. int min/max, str choices, etc
>> ** move options to use stevedore, where needed
>> * deprecating ones that are dumb / unused
>> * identifying "required" options (those you have to set)
>> * add config group descriptions
>> * note any surprising dependencies or value meanings (-1 vs 0 etc)
>> * ensure the docs and sample files are complete and correct
>>
>> I am thinking we could copy API ref and add a comment at the top of
>> each file (expecting a separate patch for each step):
>> * fix_opt_registration_consistency (see sfinucan's thread)
>> * fix_opt_description_indentation
>> * check_deprecation_status
>> * check_opt_group_and_type
>> * fix_opt_description
>
>
> I pushed [1] which introduced the flags from above. I reordered them
> from most to least important, which is IMO:
>
> # needs:fix_opt_description
> # needs:check_deprecation_status
> # needs:check_opt_group_and_type
> # needs:fix_opt_description_indentation
> # needs:fix_opt_registration_consistency

This looks good to me:
https://review.openstack.org/#/c/322255/1

sneti (Sujitha) has put together a wiki page to help describe what
each step means:
https://wiki.openstack.org/wiki/ConfigOptionsConsistency

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][oslo_config] Improving Config Option Help Texts

2016-05-24 Thread John Garbutt
On 24 May 2016 at 19:03, Ian Cordasco  wrote:
> -Original Message-
> From: Erno Kuvaja 
> Reply: OpenStack Development Mailing List (not for usage questions)
> 
> Date: May 24, 2016 at 06:06:14
> To: OpenStack Development Mailing List (not for usage questions)
> 
> Subject:  [openstack-dev] [all][oslo_config] Improving Config Option Help 
> Texts
>
>> Hi all,
>>
>> Based on the not yet merged spec of categorized config options [0] some
>> project seems to have started improving the config option help texts. This
>> is great but I noticed scary trend on clutter to be added on these
>> sections. Now looking individual changes it does not look that bad at all
>> in the code 20 lines well structured templating. Until you start comparing
>> it to the example config files. Lots of this data is redundant to what is
>> generated to the example configs already and then the maths struck me.
>>
>> In Glance only we have ~120 config options (this does not include
>> glance_store nor any other dependencies we pull in for our configs like
>> Keystone auth. Those +20 lines of templating just became over 2000 lines of
>> clutter in the example configs and if all projects does that we can
>> multiply the issue. I think no-one with good intention can say that it's
>> beneficial for our deployers and admins who are already struggling with the
>> configs.
>>
>> So I beg you when you do these changes to the config option help fields
>> keep them short and compact. We have the Configuration Docs for extended
>> descriptions and cutely formatted repetitive fields, but lets keep those
>> off from the generated (Example) config files. At least I would like to be
>> able to fit more than 3 options on the screen at the time when reading
>> configs.
>>
>> [0] https://review.openstack.org/#/c/295543/
>
> Hey Erno,
>
> So here's where I have to very strongly disagree with you. That spec
> was caused by operator feedback, specifically for projects that
> provide multiple services that may or may not have separated config
> files which and which already have "short and compact" descriptions
> that are not very helpful to oeprators.

+1

The feedback at operator sessions in Manchester and Austin seemed to
back up the need for better descriptions.

More precisely, operators should not need to read the code to
understand how to use a configuration option.

Now often that means they are longer. But they shouldn't be too long.
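
As a rough illustration of the style I mean (this is a made-up option
name, purely for the example, not a real Nova option):

    from oslo_config import cfg

    example_opts = [
        cfg.IntOpt('example_task_interval',
                   default=300,
                   min=-1,
                   help="""
    Number of seconds between runs of the example periodic task.

    Possible values:

    * 0: run the task at the default periodic task interval.
    * -1: disable the task completely.
    * Any other positive integer: run the task every that many seconds.
    """),
    ]

    def register_opts(conf):
        # Register the illustrative option with the given ConfigOpts.
        conf.register_opts(example_opts)

Enough detail that an operator does not need to read the code, without
turning into a full docs page.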

> The *example* config files will have a lot more detail in them. Last I
> saw (I've stopped driving that specification) there was going to be a
> way to generate config files without all of the descriptions. That
> means that for operators who don't care about that can ignore it when
> they generate configuration files. Maybe the functionality doesn't
> work right this instant, but I do believe that's a goal and it will be
> implemented.

Different modes of the config generator should help us cater for
multiple use cases.

I am leaving that as a discussion in oslo specs for the moment.

> Beyond that, I don't think example/sample configuration files should
> be treated differently from documentation, nor do I think that our
> documentation team couldn't make use of the improved documentation
> we're adding to each option. In short, I think this effort will
> benefit many different groups of people in and around OpenStack.
> Simply arguing that this is going to make the sample config files have
> more lines of code is not a good argument against this. Please do
> reconsider.

Now I have been discussing a change in Nova's approach to reduce the
size of some of them, but that was really for different reasons:
http://lists.openstack.org/pipermail/openstack-dev/2016-May/095538.html

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [nova] Is verification of images in the image cache necessary?

2016-05-24 Thread John Garbutt
On 24 May 2016 at 10:16, Matthew Booth  wrote:
> During its periodic task, ImageCacheManager does a checksum of every image
> in the cache. It verifies this checksum against a previously stored value,
> or creates that value if it doesn't already exist.[1] Based on this
> information it generates a log message if the image is corrupt, but
> otherwise takes no action. Going by git, this has been the case since 2012.
>
> The commit which added it was associated with 'blueprint
> nova-image-cache-management phase 1'. I can't find this blueprint, but I did
> find this page: https://wiki.openstack.org/wiki/Nova-image-cache-management
> . This talks about 'detecting images which are corrupt'. It doesn't explain
> why we would want to do that, though. It also doesn't seem to have been
> followed through in the last 4 years, suggesting that nobody's really that
> bothered.
>
> I understand that corruption of bits on disks is a thing, but it's a thing
> for more than just the image cache. I feel that this is a problem much
> better solved at other layers, prime candidates being the block and
> filesystem layers. There are existing robust solutions to bitrot at both of
> these layers which would cover all aspects of data corruption, not just this
> randomly selected slice.

+1

That might mean improved docs on the need to configure such a thing.

> As it stands, I think this code is regularly running a pretty expensive task
> looking for something which will very rarely happen, only to generate a log
> message which nobody is looking for. And it could be solved better in other
> ways. Would anybody be sad if I deleted it?

For completeness, we need to deprecate it using the usual cycles:
https://governance.openstack.org/reference/tags/assert_follows-standard-deprecation.html

I like the idea of checking the md5 matches before each boot, as it
mirrors the check we do after downloading from glance. It's possible
that's very unlikely to spot anything that shouldn't already be covered
by something else. It may just be my love of symmetry that makes
me like that idea?
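
Something along these lines is the sort of check I mean (just a
sketch of the idea, not the existing ImageCacheManager code):

    import hashlib

    def image_matches_checksum(image_path, expected_md5, chunk_size=64 * 1024):
        # Hash the cached image in chunks and compare it against the
        # checksum recorded when the image was first downloaded.
        md5 = hashlib.md5()
        with open(image_path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                md5.update(chunk)
        return md5.hexdigest() == expected_md5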

Thanks,
johnthetubaguy


> [1] Incidentally, there also seems to be a bug in this implementation, in
> that it doesn't hold the lock on the image itself at any point during the
> hashing process, meaning that it cannot guarantee that the image has
> finished downloading yet.
> --
> Matthew Booth
> Red Hat Engineering, Virtualisation Team
>
> Phone: +442070094448 (UK)
>
>
> ___
> OpenStack-operators mailing list
> openstack-operat...@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Centralize Configuration: ignore service list for newton

2016-05-20 Thread John Garbutt
Hi,

The current config template includes a list of "Services which consume this":
http://specs.openstack.org/openstack/nova-specs/specs/mitaka/implemented/centralize-config-options.html#quality-view

I propose we drop this list from the template.

I am worried this is going to be hard to maintain, and hard to review
/ check. As such, it's of limited use to most deployers in its current
form.

I have been thinking about a possible future replacement. Two separate
sample configuration files, one for the Compute node, and one for
non-compute nodes (i.e. "controller" nodes). The reason for this
split is our move towards removing sensitive credentials from compute
nodes, etc. Over time, we could prove the split in gate testing, where
we look for conf options accessed by computes that shouldn't be, and
vice versa.


Having said that, for newton, I propose we concentrate on:
* completing the move of all the conf options (almost there)
* (skip tidy up of deprecated options)
* tidying up the main description of each conf option
* tidy up the Opt group and Opt types, i.e. int min/max, str choices, etc
** move options to use stevedore, where needed
* deprecating ones that are dumb / unused
* identifying "required" options (those you have to set)
* add config group descriptions
* note any surprising dependencies or value meanings (-1 vs 0 etc)
* ensure the docs and sample files are complete and correct

I am thinking we could copy API ref and add a comment at the top of
each file (expecting a separate patch for each step):
* fix_opt_registration_consistency (see sfinucan's thread)
* fix_opt_description_indentation
* check_deprecation_status
* check_opt_group_and_type
* fix_opt_description

Does that sound like a good plan? If so, I can write this up in a wiki page.


Thanks,
John

PS
I also have concerns around the related config options bits and
possible values bit, but that's a different thread. Let's focus on the
main body of the description for now.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] [all] [glance] On operating a high throughput or otherwise team

2016-05-19 Thread John Garbutt
On 17 May 2016 at 09:56, Thierry Carrez <thie...@openstack.org> wrote:
> John Garbutt wrote:
>>
>> [...]
>> Agreed that with a shared language, the ML is more effective.
>> [...]
>> I think some IRC meeting work, in a standup like way, for those with a
>> previously established shared context.
>
>
> Actually shared context / shared understanding / common culture is a
> prerequisite for any form of communication. The ML discussions are more
> effective, the IRC meetings can be effective, the reviews are more effective
> etc.

+1

> This shared understanding was simpler to generate in the early days of
> OpenStack where developers were a smaller group. We assumed that most of
> this shared understanding would naturally transmit to newcomers, so we
> overlooked documenting it and did not actively rebuild it as we went. We
> diluted the Design Summit into the gigantic Summit event, further preventing
> this cross-project culture to emerge in our group.
>
> Over the past cycle(s) we worked on the project team guide to document the
> shared culture. But it's not finished, and that's not enough. We also need
> time (as a cultural group) to discuss and reach this common culture, without
> distractions and without people external to the group disrupting the
> discussion (yes you see where I'm going).

+1

A great example of tribal knowledge -> written down common language

>> [...]
>> Synchronous vs Asynchronous (and in-between), high vs low bandwidth
>> communication tools all have their place. None of those replace having
>> curated content for new/returning folks to gain the current shared
>> context
>
> +1000 -- this is not about choosing between MLs vs. face-to-face meetings.
> You can't have a global community and rely only on meetings without
> excluding someone. You can't build the shared understanding and make quick
> progress on specific issues using only MLs.
>
> Global and virtual communities face three challenges: confusion, isolation,
> and fragmentation. They need to make use of the full spectrum of
> synchronous/asynchronous and simple-collaboration/complex-collaboration
> communication tools to address those challenges and actively generate
> transparency (fighting confusion), engagement (fighting isolation) and
> cohesion (fighting fragmentation).

+1
Love that summary of the issues.

John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Nova API is run on controller node instead of Compute node

2016-05-16 Thread John Garbutt
On 16 May 2016 at 09:58, Tarun  wrote:
> Hi All,
>
> I have set up the OpenStack controller and compute node on 2 VMs on my Windows
> 8 laptop.
>
> It is running.
>
> I am starting development for the NOVA compute APIs.
>
> To kick off, I am trying to call a compute API, for example:
>
> $nova hypervisor-list
> Response is OK.
> It should call the 'show' function of hypervisors.py in the nova.
> Response is coming from nova code at controller node.
>
> [Problem]
> There is a hypervisors.py file present on both the controller and compute node.
> The path is
> /usr/lib/python2.7/dist-packages/nova/api/openstack/compute/contrib/hypervisors.py
>
> I have put logs in the 'show' function in the hypervisors.py file present on
> both the controller and compute node.
> But the 'show' function is called on the controller side only. It is not invoking
> functions of hypervisors.py on the compute node.
>
> Please let me know whether there are conf file parameter gaps in the OpenStack
> modules (nova), or any other gap, which would allow my setup to call compute
> API code from the compute node instead of the controller when a compute API is
> run from the controller node.
>
> I am not able to work out the exact flow from nova on the controller node to
> nova on the compute node when a compute API is called from the controller node.
>
> Looking forward to your valuable inputs.

It sounds like everything is working correctly.
That API call just reads from the database.

Does this document help you?
http://docs.openstack.org/developer/nova/architecture.html

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] [all] [glance] On operating a high throughput or otherwise team

2016-05-16 Thread John Garbutt
Hi,

tl;dr
We need everyone agreeing and understanding the "why" behind what we are doing.
A shared language/understanding of the context is an important part of that.
Writing things down, and giving people links to that, really helps.

On 15 May 2016 at 13:07, Chris Dent  wrote:
> The fundamental problem is that we don't have shared understanding
> and we don't do what is needed to build share understanding. In fact
> we actively work against building it.

+1

> TL;DR: When we are apart we need to write more;
> when we are together we need to talk more but with less purpose.

They happen, but we should recognise the importance of those conversations.

> Most people in the OpenStack don't have that shared understanding
> and without that shared understanding the assumption problem you
> describe compounds. People make a list of goals that they believe
> are mutually shared but they are actually just a list of words for
> which everyone has a different interpretation.
...
> * Developing shared language and share understanding is a prerequisite
>   for
> * Developing shared goals which are a prerequisite for
> * Share actions

This is exactly why in Liberty Nova had "DevRef Updates" as a priority:
http://specs.openstack.org/openstack/nova-specs/priorities/liberty-priorities.html
We created things like this:
http://docs.openstack.org/developer/nova/project_scope.html

We didn't achieve all my aims, but we made some progress, and raised
some awareness of the need. Some of the blog posts (and ML posts)
outside the devref have probably been more successful. Many of those
have been peer reviewed prior to publication. (I think we should bring
the essence of those back into the central docs, where we can.)

On a micro level, the spec process has helped get reviewers and submitters
to use the same language. Frequently we used to come away from
discussions with everyone "agreeing" something slightly different. But
that only works in the context of a wider alignment.

> This myth-building happens to some extent already but there is not
> a strong culture in OpenStack of pointing people at the history.
> It's more common as a newbie to rock up in IRC with a question and
> get a brief synchronous answer when it would be better to be pointed
> at a written artifact that provides contextualizing background.

+1
When we have something already, we should totally do that.
Then follow up with a discussion to clarify the doc as required.
If we don't have it written down, it's a good pointer towards the need.

> We've become trapped into the idea of using f2f or synchronous time
> for "working out the details of a single spec" because we haven't
> got the shared language that makes talking about details effective.
> If we get the shared language the details become so much easier to
> deal with, in any medium.

Frankly, the best summit and midcycle sessions for me are where we
spend time establishing a shared language. Coming out of a session
with a shared understanding of the problem and context is (almost)
always my key aim.

Now, the problem you correctly mention is that the current conversations
are often additive to three or more years of previous discussions.

At the last few summits, Nova has pushed to provide "pre-reading" for
every summit session. Do we always succeed? No. But if it's unclear, we
need folks to ask questions and help us do better.

>> I think people prefer to use ML a lot and I am not a great fan of the
>> same. It is a multi-cast way of communication and it has assumptions
>> around time, space, intent of the audience & intent to actually read
>> them. Same is for gerrit/etherpad.
>
> I'm one of those people who prefers ML.

Agreed that with a shared language, the ML is more effective.

I do find video discussions are useful to resolve sticking points
between a small number of folks. It's important to create some artifact
to record the debate (and I don't mean publishing the video; that doesn't
work).

> IRC meetings are just chaos.
>
> (IRC is good for two things:
>
> * Shooting the breeze and turning strangers into colleagues.
> * Dealing with emergent situations ("the gate's on fire, HALP!").

I think some IRC meetings work, in a standup-like way, for those with a
previously established shared context.

> If your participants have somehow managed to achieve some shared
> understanding they can use that understanding when creating and
> discussing goals. When they get together in IRC for a weekly
> catchup and it is time to discuss "how's it going with the frobnitz"
> rather then everyone squirming because everyone has a different idea
> of what the frobnitz is (let alone how to fix it) people, although
> they may not agree on the solution, will at least agree on the
> problem domain so the scope of the discussion and the degree of what
> you're calling "disruption" (and I would call "discomfort") goes down,
> making room, finally, for shared action.

If that says what I think it says, I am 

[openstack-dev] [nova] Prepping Mitaka RC2

2016-03-21 Thread John Garbutt
Hi,

We have created the Nova milestone for RC2:
https://launchpad.net/nova/+milestone/mitaka-rc2

To make bugs appear there, you need to add the mitaka series and
target RC2. Annoying LP permissions mean this can only be done by a
nova-driver, AFAIK.

We will keep using the mitaka-rc-potential tag to see what should get
backported. Anything merged after we released RC1 needs backporting to
stable/mitaka to get into RC2. As we mark merged fixes as Fix Released
as they merge on master, it's a bit messy trying to look at the list.
You will need a custom filter, something like this one:
http://bit.ly/1o1yAJM

My current thinking is to aim to cut RC2 on Thursday. That gives us
time to find any other stragglers, and get them backported to
stable/mitaka. We will later need to cut RC3 to add any extra
translations before the expected release date.

Thanks,
johnthetubaguy



Re: [openstack-dev] [cross-project] [all] Quotas -- service vs. library

2016-03-19 Thread John Garbutt
On 16 March 2016 at 10:09, Sean Dague  wrote:
> On 03/16/2016 05:46 AM, Duncan Thomas wrote:
>> On 16 March 2016 at 09:15, Tim Bell wrote:
>>
>> Then, there were major reservations from the PTLs at the impacts in
>> terms of
>> latency, ability to reconcile and loss of control (transactions are
>> difficult, transactions
>> across services more so).
>>
>>
>> Not just PTLs :-)
>>
>>
>> 
>> I would favor a library, at least initially. If we cannot agree on a
>> library, it
>> is unlikely that we can get a service adopted (even if it is desirable).
>>
>> A library (along the lines of 1 or 2 above) would allow consistent
>> implementation
>> of nested quotas and user quotas. Nested quotas is currently only
>> implemented
>> in Cinder and user quota implementations vary between projects which is
>> confusing.
>>
>>
>> It is worth noting that the cinder implementation has been found rather
>> lacking in correctness, atomicity requirements and testing - I wouldn't
>> suggest taking it as anything other than a PoC to be honest. Certainly
>> it should not be cargo-culted into another project in its present state.
>
> I think a library approach should probably start from scratch, with
> lessons learned from Cinder, but not really copied code, for just that
> reason.
>
> This is hard code to get right, which is why it's various degrees of
> wrong in every project in OpenStack.

+1

> A common library with it's own db tables and migration train is the only
> way I can imagine this every getting accomplished given the atomicity
> and two phase commit constraints of getting quota on long lived, async
> created resources, with sub resources that also have quota. Definitely
> think that's the nearest term path to victory.

As an aside, I believe Neutron have had some recent success with a
quota re-write, although I have not dug into that.

In Nova we have been discussing a new approach to the DB storage of
the quota, loosely similar to the resource provider claims system we
are currently adding into the scheduler:
https://review.openstack.org/#/c/182445/4/specs/backlog/approved/quotas-reimagined.rst

We also discussed the idea of dropping a lot of the quota reservation
complexity. Rather than worrying about whether the async server create
succeeds, we could just commit the quota as soon as the server is
recorded in the db. This means a failed build consumes quota until the
instance is deleted, but that seems like a good trade off. Similarly,
resize up could just consume quota (until delete or revert resize),
rather than waiting for the resize confirm.
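
To make that trade off concrete, here is a rough sketch of the "commit
immediately" flow (illustrative only, with made-up helpers and in-memory
state, not a proposed patch):

quota_used = {'cores': 0, 'ram_mb': 0}
QUOTA_LIMIT = {'cores': 20, 'ram_mb': 51200}
instances = {}


class OverQuota(Exception):
    pass


def consume_quota(cores, ram_mb):
    # In real life this check-and-update would be one DB transaction.
    if (quota_used['cores'] + cores > QUOTA_LIMIT['cores'] or
            quota_used['ram_mb'] + ram_mb > QUOTA_LIMIT['ram_mb']):
        raise OverQuota()
    quota_used['cores'] += cores
    quota_used['ram_mb'] += ram_mb


def create_server(name, vcpus, ram_mb):
    # Commit the quota as soon as the instance record exists; the async
    # build may still fail later, but the quota stays held until delete.
    consume_quota(vcpus, ram_mb)
    instances[name] = {'vcpus': vcpus, 'ram_mb': ram_mb}
    # ...cast the actual build to a compute node asynchronously here...


def delete_server(name):
    # Quota is only handed back on delete (or revert resize), which keeps
    # the bookkeeping simple at the cost of failed builds holding quota.
    inst = instances.pop(name)
    quota_used['cores'] -= inst['vcpus']
    quota_used['ram_mb'] -= inst['ram_mb']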

Thanks,
johnthetubaguy



Re: [openstack-dev] [Nova] Mitaka RC1 available

2016-03-18 Thread John Garbutt
On 16 March 2016 at 18:17, Thierry Carrez  wrote:
> Hello everyone,
>
> Nova is next to produce a release candidate for the end of the Mitaka cycle!
> Congratulations to all the Nova devs. You can find the RC1 source code
> tarball at:
>
> https://tarballs.openstack.org/nova/nova-13.0.0.0rc1.tar.gz
>
> Unless release-critical issues are found that warrant a release candidate
> respin, this RC1 will be formally released as the final Mitaka Nova release
> on April 7th. You are therefore strongly encouraged to test and validate
> this tarball !

So I am expecting to cut RC2 to include the translations that happen
between now and release week.

To help with this, we have now hit Hard String Freeze on the stable/mitaka branch.
https://wiki.openstack.org/wiki/StringFreeze#Hard_String_Freeze

> Alternatively, you can directly test the stable/mitaka release branch at:
>
> http://git.openstack.org/cgit/openstack/nova/log/?h=stable/mitaka
>
> If you find an issue that could be considered release-critical, please file
> it at:
>
> https://bugs.launchpad.net/nova/+filebug
>
> and tag it *mitaka-rc-potential* to bring it to the Nova release crew's
> attention.
>
> Note that the "master" branch of Nova is now open for Newton development,
> and feature freeze restrictions no longer apply there !

Although we have a few upgrade-related DB changes, including:
https://review.openstack.org/#/c/289449/4
More on that from dansmith in a sec.

Thanks,
johnthetubaguy



[openstack-dev] [nova] Nova Design Summit Ideas

2016-03-14 Thread John Garbutt
Hi,

To kick off the discussion about what sessions we need at the summit
we now have:
https://etherpad.openstack.org/p/newton-nova-summit-ideas

Thanks,
johnthetubaguy

PS
We tried a Google form last time, but that excluded too many folks
from submitting ideas, so we are defaulting back to the etherpad, with
instructions that match the Google form.



[openstack-dev] [nova] Nova PTL for Newton

2016-03-11 Thread John Garbutt
Hi,

It has been greatly rewarding serving you all as Nova PTL over the
Liberty and Mitaka cycles. I thank you all for the support,
encouragement and advice I have been given over the last year. I
really appreciate that. (That's British speak for "I love you all", or
something like that.)

I don't plan on standing for the Newton cycle. I think it's a good time
for someone to bring fresh energy, ideas, and drive to help keep Nova
moving forward. I have enjoyed my time as PTL and, as such, I would
consider standing again in the future. We have a good number of folks
stepping up to lead different parts of the project. I hope we can grow
that effort, and I hope to continue to be part of that.

I aim to continue contributing to Nova (I hope to be in Austin, and I
hope to write some code again soon). I will certainly make time to
ensure a smooth transition to the next PTL.

Many thanks,
johnthetubaguy



Re: [openstack-dev] [release][all][ptl] preparing to create stable/mitaka branches for libraries

2016-03-10 Thread John Garbutt
On 9 March 2016 at 17:26, Doug Hellmann  wrote:
> I will process each repository as I hear from the owning team.
>
> openstack/python-novaclient 3.3.0

+1 from me.

Thanks,
johnthetubaguy



Re: [openstack-dev] [nova] patches that improve code quality

2016-03-10 Thread John Garbutt
On 10 March 2016 at 09:35, Markus Zoeller  wrote:
> Radomir Dopieralski  wrote on 03/09/2016 01:22:56
> PM:
>
>> From: Radomir Dopieralski 
>> To: openstack-dev@lists.openstack.org
>> Date: 03/09/2016 01:24 PM
>> Subject: [openstack-dev] [nova] patches that improve code quality
>>
>> [...]
>>
>> And now, finally, I can get to the point of this e-mail. I'm relatively
>> new to this project, but I found no way to direct the (precious)
>> attention of core reviewers to such patches. They are not bugs, neither
>> they are parts of any new feature, and so there is no list in Nova that
>> core reviewers look at where we could add such a patch. Admittedly, such
>
>> patches are not urgent -- the code works the same (or almost the same)
>> way without them -- but it would be nice to have some way of merging
>> them eventually, because they do make our lives easier in the long run.
>>
>> So here's my question: what is the correct way to have such a patch
>> merged in Nova, and -- if there isn't one -- should we think about
>> making it easier?
>> --
>> Radomir Dopieralski
>
> Agreed, bug reports are *not* a way to document technical debt. They
> should be used for documenting faulty behavior which can hit downstream
> consumers.
>
> I also have the feeling that refactorings get less attention than
> bug fixes and features. Matt already pointed to the numbers of open
> reviews which compete for attention. One way to make those refactoring
> patches more visible and queryable could be to use a topic "refactoring"
> for them. Reviewers can then search for them if they decide to switch
> their focus to resolving technical debt.

Agreed with good points from Markus and Matt here.

For some general tips, please see:
http://docs.openstack.org/developer/nova/process.html#how-can-i-get-my-code-merged-faster

It doesn't really cover refactoring. From memory, the most successful
(larger) refactoring efforts have been tracked as blueprints. At the
other end of the scale, the little patches you do before a bug fix
patch, so it's a cleaner bug fix, often get reasonable amounts of
attention. Discussing your work with folks on IRC often helps raise
awareness of what is happening.

It's worth noting that a large part of the current priority efforts are
"refactoring" efforts, so that we can keep maintaining critical
sections that have become unmaintainable in their current form. It's
worth seeing how your efforts align with what other folks are
currently working on and/or worried about.

The bigger issue here is that we don't have enough folks doing great
reviews compared to the number of patches being proposed. I always
point folks towards this doc that encourages everyone to help with
more code reviews:
http://docs.openstack.org/developer/nova/how_to_get_involved.html#why-plus1

I hope that helps,
johnthetubaguy



Re: [openstack-dev] [nova] Identifying Mitaka release blockers

2016-03-08 Thread John Garbutt
On 3 March 2016 at 13:10, Markus Zoeller  wrote:
> We have now 11-15 days left [1] until it is planned to release the first
> release candidate. To provide a stable release, we need to identify the
> potential release blockers. Bugs which potentially block the release
> should be tagged with "mitaka-rc-potential". They can then be queried in
> Launchpad with [2].
>
> Agreed on blockers will get the "mitaka-rc1" milestone in Launchpad and
> are queryable with [3]. Because the next Nova meeting at March 10th
> could be to close to RC1 to decide actionable items, bring those
> potentials to the attention on the ML or in #openstack-nova.
>
> Open patches for blocker bugs can be queried with the script
> "query_rc_blockers.py" from [4]. The tags will be kept on the bug reports
> in case we will release additional RCs.
>
> When the final release is done we can remove the "mitaka-rc-potential"
> tag from the open bug reports. Probably it makes sense to increase their
> importance to "high" after that to put some focus on them for the
> Newton cycle. Open patches for high|critical bugs can be queried with
> "query_high_prio_bug_fixes.py" from [4].
>
> I recommend to the subteams to go through the bug reports of their area
> and double-check them if there are blockers. Please also spent some
> effort to go through the list of "new" bug reports [5]. There weren't
> enough volunteers for the bug skimming duty in the last months to get
> this list to zero.
>
> References:
> [1] http://releases.openstack.org/mitaka/schedule.html#m-rc1
> [2] https://bugs.launchpad.net/nova/+bugs?field.tag=mitaka-rc-potential
> [3] https://launchpad.net/nova/+milestone/mitaka-rc1
> [4]
> https://github.com/markuszoeller/openstack/tree/master/scripts/launchpad
> [5] https://bugs.launchpad.net/nova/+bugs?search=Search=New

Point of clarification...

In nova we are only using "mitaka-rc-potential" to track possible bugs
that might block/delay RC1. You are not required to add this tag before
merging a bug fix.

Having said that, the normal rules apply for this point in the cycle.
Reviewers should reject patches that are likely to damage the quality
of the release, due to a big regression risk, breaking the soft string
freeze, etc.

As always, any questions, please ask,
johnthetubaguy



Re: [openstack-dev] [Nova] Grant FFE to "Host-state level locking" BP

2016-03-04 Thread John Garbutt
tl;dr
As on IRC, I don't think this should get an FFE this cycle.

On 4 March 2016 at 10:56, Nikola Đipanov  wrote:
> Hi,
>
> The actual BP that links to the approved spec is here: [1] and 2
> outstanding patches are [2][3].
>
> Apart from the usual empathy-inspired reasons to allow this (code's been
> up for a while, yet only had real review on the last day etc.) which are
> not related to the technical merit of the work, there is also the fact
> that two initial patches that add locking around updates of the
> in-memory host map ([4] and [5]) have already been merged.
>
> They add the overhead of locking to the scheduler, but without the final
> work they don't provide any benefits (races will not be detected,
> without [2]).

We could land a patch to drop the synchronized decorators, but it
seemed like they might still help with the (possibly theoretical?)
issue of two greenlets competing to decrement the same resource counts.
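
For anyone wondering what that race looks like, here is a toy sketch (not
Nova code; the numbers and structure are made up): the lost update only
bites when a greenlet yields part way through the read-modify-write, which
is exactly what the lock guards against.

import eventlet

host_state = {'free_ram_mb': 2048}


def consume(ram_mb):
    # Wrapping this read-modify-write in an eventlet.semaphore.Semaphore
    # is what the synchronized decorators buy us. Without it, the yield
    # below lets the other greenlet read the same stale value.
    free = host_state['free_ram_mb']            # read
    eventlet.sleep(0)                           # greenlet yields, e.g. for I/O
    host_state['free_ram_mb'] = free - ram_mb   # write


pool = eventlet.GreenPool()
pool.spawn(consume, 512)
pool.spawn(consume, 512)
pool.waitall()
print(host_state)  # {'free_ram_mb': 1536}, not the expected 1024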

> I don't have any numbers on this but the result is likely that we made
> things worse, for the sake of adhering to random and made-up dates.

For details on the reasons behind our process, please see:
http://docs.openstack.org/developer/nova/process.html

>With
> this in mind I think it only makes sense to do our best to merge the 2
> outstanding patches.

Looking at the feature freeze exception criteria:
https://wiki.openstack.org/wiki/FeatureFreeze

The code is not ready to merge right now, so it's hard to assess the
risk of merging it, and hard to assess how long it will take to merge.
It seems medium-ish risk, given the existing patches.

We have had 2 FFEs, just for things that were +W'd but not yet merged
when we cut mitaka-3. They are all merged now.

Time is much tighter this cycle than usual. We also seem to have fewer
reviewers doing reviews than normal for this point in the cycle, and a
much bigger backlog of bug fixes to review. We only have about 7 more
working days between now and tagging RC1, at which point master opens
for Newton, and these patches are free to merge again.

While this is useful, it's not a regression fix. It would help us detect
races in the scheduler sooner. It does not feel release critical.

As such, I don't think it should get an exception. Let's keep
focus on the lower risk, high value bug fixes sitting in our review
backlog.

Thanks,
johnthetubaguy



Re: [openstack-dev] [nova] nova hooks - document & test or deprecate?

2016-03-01 Thread John Garbutt
On 29 February 2016 at 18:49, Andrew Laski  wrote:
> On Mon, Feb 29, 2016, at 01:18 PM, Dan Smith wrote:
>> > Forgive my ignorance or for playing devil's advocate, but wouldn't the
>> > main difference between notifications and hooks be that notifications
>> > are asynchronous and hooks aren't?
>>
>> The main difference is that notifications are external and intended to
>> be stable (especially with the versioned notifications effort). The
>> hooks are internal and depend wholly on internal data structures.
>>
>> > In the case of how Rdo was using it,
>> > they are adding things to the injected_files list before the instance is
>> > created in the compute API.  You couldn't do that with notifications as
>> > far as I know.
>>
>> Nope, definitely not, but I see that as a good thing. Injecting files
>> like that is likely to be very fragile and I think mostly regarded as
>> substantially less desirable than the alternatives, regardless of how it
>> happens.
>>
>> I think that Laski's point was that the most useful and least dangerous
>> thing that hooks can be used for is the use case that is much better
>> served by notifications.
>
> Yep. My experience with them was things like updating an external cache
> on create/delete or calling out to a DNS provider to remove a reverse
> DNS entry on instance delete. Things that could easily be handled with
> notifications, and use cases that I think we should continue to support
> by improving notifications if necessary.
>
>
>>
>> So, if file injection (and any other internals-mangling that other
>> people may use them for) is not a reason to keep hooks, and if
>> notifications are the proper way to trigger on things happening, then
>> there's no reason to keep hooks.

+1 on the deprecation of hooks.

There will not be a single replacement for the hooks.

Deprecation is the best thing we can do to trigger good conversations
about how we replace hooks with supportable alternatives that will not
break horribly across upgrades.

This is an important part of our push towards improving API
interoperability, see:
http://docs.openstack.org/developer/nova/project_scope.html#api-scope

The OpenStack Compute API should behave the same across all
deployments (i.e. any differences should be discoverable).

Thanks,
johnthetubaguy

PS
Around DNS integrations, we have just added this:
https://blueprints.launchpad.net/nova/+spec/neutron-hostname-dns

Injecting files fights interoperability in a nasty way, but do take a
look at this:
https://github.com/openstack/nova/blob/master/nova/api/metadata/vendordata_json.py

The work around versioned notifications should make it possible to
build a async system from our notification stream:
http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/versioned-notification-api.html
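
For example, a rough sketch of an external consumer built on
oslo.messaging (the topic, event type and handler below are illustrative,
and the transport would need pointing at your deployment's message bus):

from oslo_config import cfg
import oslo_messaging


class InstanceEventEndpoint(object):
    # oslo.messaging dispatches to methods named after the priority.
    def info(self, ctxt, publisher_id, event_type, payload, metadata):
        if event_type == 'instance.delete.end':
            # e.g. clean up DNS records or an external cache here
            print('instance deleted: %s' % payload)


transport = oslo_messaging.get_notification_transport(cfg.CONF)
targets = [oslo_messaging.Target(topic='notifications')]
listener = oslo_messaging.get_notification_listener(
    transport, targets, [InstanceEventEndpoint()])
listener.start()
listener.wait()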



Re: [openstack-dev] [nova] nova hooks - document & test or deprecate?

2016-03-01 Thread John Garbutt
On 1 March 2016 at 10:10, Daniel P. Berrange  wrote:
> On Mon, Feb 29, 2016 at 12:36:03PM -0700, Rich Megginson wrote:
>> On 02/29/2016 12:19 PM, Chris Friesen wrote:
>> >On 02/29/2016 12:22 PM, Daniel P. Berrange wrote:
>> >
>> >>There's three core scenarios for hooks
>> >>
>> >>  1. Modifying some aspect of the Nova operation
>> >>  2. Triggering an external action synchronously to some Nova operation
>> >>  3. Triggering an external action asynchronously to some Nova operation
>> >>
>> >>The Rdo example is falling in scenario 1 since it is modifying the
>> >>injected files. I think this is is absolutely the kind of thing
>> >>we should explicitly *never* support. When external code can arbitrarily
>> >>modify some aspect of Nova operation we're in totally unchartered
>> >>territory as to the behaviour of Nova. To support that we'd need to
>> >>provide a stable internal API which is just not something we want to
>> >>tie ourselves into. I don't know just what the Rdo example is trying
>> >>to achieve, but whatever it is, it should be via some supportable API
>> >>and not a hook.,
>> >>
>> >>Scenaris 2 and 3 are both valid to consider. Using the notifications
>> >>system gets as an asynchronous trigger mechanism, which is probably
>> >>fine for many scenarios.  The big question is whether there's a
>> >>compelling need for scenario two, where the external action blocks
>> >>execution of the Nova operation until it has completed its hook.
>> >
>> >Even in the case of scenario two it is possible in some cases to use a
>> >proxy to intercept the HTTP request, take action, and then forward it or
>> >reject it as appropriate.
>> >
>> >I think the real question is whether there's a need to trigger an external
>> >action synchronously from down in the guts of the nova code.
>>
>> The hooks do the following: 
>> https://github.com/rcritten/rdo-vm-factory/blob/use-centos/rdo-ipa-nova/novahooks.py#L271
>>
>> We need to generate a token (ipaotp) and call ipa host-add with that token
>> _before_ the new machine has a chance to call ipa-client-install.  We have
>> to guarantee that the client cannot call ipa-client-install until we get
>> back the response from ipa that the host has been added with the token.
>> Additionally, we store the token in an injected_file in the new machine, so
>> the file can be removed as soon as possible.  We tried storing the token in
>> the VM metadata, but there is apparently no way to delete it?  Can the
>> machine do
>>
>> curl -XDELETE http://169.254.169.254/openstack/latest/metadata?key=value ?
>>
>> Using the build_instance.pre hook in Nova makes this simple and
>> straightforward.  It's also relatively painless to use the network_info.post
>> hook to handle the floating ip address assignment.
>>
>> Is it possible to do the above using notifications without jumping through
>> too many hoops?
>
> So from a high level POV, you are trying to generate a security token
> which will be provided to the guest OS before it is booted.
>
> I think that is a pretty clearly useful feature, and something that
> should really be officially integrated into Nova as a concept rather
> than done behind nova's back as a hook.

We did discuss the road to creating a very similar mechanism at the
last design summit, although the notes from that session are a little
cryptic:
https://etherpad.openstack.org/p/mitaka-nova-service-users

Roughly, it was a per-instance keystone token that would then give you
access to Barbican, or similar.

Thanks,
johnthetubaguy



Re: [openstack-dev] [nova] A prototype implementation towards the "shared state scheduler"

2016-03-01 Thread John Garbutt
On 1 March 2016 at 08:34, Cheng, Yingxin  wrote:
> Hi,
>
> I have simulated the distributed resource management with the incremental 
> update model based on Jay's benchmarking framework: 
> https://github.com/cyx1231st/placement-bench/tree/shared-state-demonstration. 
> The complete result lies at http://paste.openstack.org/show/488677/. It's ran 
> by a VM with 4 cores and 4GB RAM, and the mysql service is using the default 
> settings with the "innodb_buffer_pool_size" setting to "2G". The number of 
> simulated compute nodes are set to "300".
>
> [...]
>
> Second, here's what I've found in the centralized db claim design(i.e. rows 
> that "do claim in compute?" = No):
> 1. The speed of legacy python filtering is not slow(see rows that "Filter 
> strategy" = python): "Placement total query time" records the cost of all 
> query time including fetching states from db and filtering using python. The 
> actual cost of python filtering is (Placement_total_query_time - 
> Placement_total_db_query_time), and that's only about 1/4 of total cost or 
> even less. It also means python in-memory filtering is much faster than db 
> filtering in this experiment. See http://paste.openstack.org/show/488710/
> 2. The speed of `db filter strategy` and the legacy `python filter strategy` 
> are in the same order of magnitude, not a very huge improvement. See the 
> comparison of column "Placement total query time". Note that the extra cost 
> of `python filter strategy` mainly comes from "Placement total db query 
> time"(i.e. fetching states from db). See 
> http://paste.openstack.org/show/488709/

I think it might be time to run this in a nova-scheduler-like
environment: eventlet threads responding to rabbit, using the pymysql
backend, etc. Note we should get quite a bit of concurrency within a
single nova-scheduler process with the db approach.

I suspect clouds that are largely full of pets, pack/fill first, with
a smaller percentage of cattle on top, will benefit the most, as that
initial DB filter will bring back a small list of hosts.

> Third, my major concern of "centralized db claim" design is: Putting too much 
> scheduling works into the centralized db, and it is not scalable by simply 
> adding conductors and schedulers.
> 1. The filtering works are majorly done inside db by executing complex sqls. 
> If the filtering logic is much more complex(currently only CPU and RAM are 
> accounted in the experiment), the db overhead will be considerable.

So, to clarify, only resources we have claims for in the DB will be
filtered in the DB. All other filters will still occur in python.

The good news is that if that turns out to be the wrong trade off,
it's easy to revert back to doing all the filtering in python, with
zero impact on the DB schema.

> 2. The racing of centralized claims are resolved by rolling back transactions 
> and by checking the generations(see the design of "compare and update" 
> strategy in https://review.openstack.org/#/c/283253/), it also causes 
> additional overhead to db.

It's worth noting this pattern is designed to work well with a Galera
DB cluster, including one that has writes going to all the nodes.

> 3. The db overhead of filtering operation can be relaxed by moving them to 
> schedulers, that will be 38 times faster and can be executed in parallel by 
> schedulers according to the column "Placement avg query time". See 
> http://paste.openstack.org/show/488715/
> 4. The "compare and update" overhead can be partially relaxed by using 
> distributed resource claims in resource trackers. There is no need to roll 
> back transactions in updating inventories of compute local resources in order 
> to be accurate. It is confirmed by checking the db records at the end of each 
> run of eventually consistent scheduler state design.
> 5. If a) all the filtering operations are done inside schedulers,
> b) schedulers do not need to refresh caches from db because of 
> incremental updates,
> c) it is no need to do "compare and update" to compute-local 
> resources(i.e. none-shared resources),
>  then here is the performance comparison using 1 scheduler instances: 
> http://paste.openstack.org/show/488717/

The other side of the coin here is sharding.

For example, we could have a dedicated DB cluster for just the
scheduler data (need to add code to support that, but should be
possible now, I believe).

Consider if you have three types of hosts, that map directly to
specific flavors. You can shard your scheduler and DB clusters into
those groups (i.e. compute node inventory lives only in one of the
shards). When the request comes in you just route appropriate build
requests to each of the scheduler clusters.
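
As a sketch of that kind of routing (illustrative only: the flavor names,
shard names and mapping are made up), which also covers the modulo
fallback mentioned below:

import zlib

SHARD_FOR_FLAVOR = {
    'gpu.large': 'gpu-shard',
    'highmem.xlarge': 'highmem-shard',
}
GENERAL_SHARDS = ['shard-0', 'shard-1', 'shard-2']


def pick_shard(flavor_name, request_id):
    # Flavor-mapped host types get their own scheduler/DB shard; anything
    # else is spread across the general shards by a modulo on the request id.
    shard = SHARD_FOR_FLAVOR.get(flavor_name)
    if shard is None:
        index = zlib.crc32(request_id.encode('utf-8')) % len(GENERAL_SHARDS)
        shard = GENERAL_SHARDS[index]
    return shard


def schedule_build(request_id, flavor_name):
    # In real life this would be an RPC cast onto the chosen shard's
    # scheduler topic; here we just return the routing decision.
    return pick_shard(flavor_name, request_id)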

If you have a large enough deployment, you can shard your hosts across
several DB clusters, and use a modulo or random sharding strategy to
pick which cluster the request lands on. There are issues around
ensuring you do capacity planning that 

Re: [openstack-dev] [os-brick][nova][cinder] os-brick/privsep change is done and awaiting your review

2016-02-25 Thread John Garbutt
On 25 February 2016 at 05:32, Angus Lees  wrote:
> (Reposting my reply to your gerrit comment here as well - conversation will
> probably be easier here than in gerrit)
>
> On Thu, 25 Feb 2016 at 00:07 Duncan Thomas  wrote:
>>
>> My (negative) feedback is on the review - I'm really not sure that this
>> matches what I understood the vision of privsep to be at all.
>>
>> - If this is the vision / the new vision then I think it is majorly
>> flawed.
>>
>> - If it is skipping the vision in the name of expediency of
>> implementation, then I think it has gone too far in that direction and we've
>> better off holding off one more cycle and putting it in next cycle instead
>> with a touch more purity of vision.
>>
>> Apologies since you've clearly put work into it, and I should have
>> provided such feedback earlier.
>
>
> Yes, I agree with your concerns 100% and I'm actually quite glad that you
> also understand that this isn't a very satisfying use of privsep.
>
> Better uses of privsep would look more like
> https://review.openstack.org/#/c/258252/ - but they're slow to write since
> they typically involve quite a bit of refactoring of code in order to move
> the trust boundary to a useful point up the call stack.  For os-brick in
> particular, I found it quite difficult/risky performing such code refactors
> without an easy way to actually test the affected low-level device
> manipulation.
>
> At the nova mid-cycle (and again in the nova-cinder VC conversation you were
> part of), it was decided that the difficulties in merging the os-brick
> rootwrap filters into nova (and I presume cinder) were too urgent to wait
> for such a slow os-brick refactoring process.  The conclusion we reached was
> that we would do a quick+dirty rootwrap drop-in replacement that effectively
> just ran commands as root for Mitaka, and then we would come back and
> refactor various codepaths away from that mechanism over time.  This is that
> first quick+dirty change.
> I tried to capture that in the commit description, but evidently did a poor
> job - if the above makes it any clearer to you, I'd welcome any suggested
> rewording for the commit description.
>
> TL;DR: what this buys us is the ability to use new/different command lines
> in os-brick without having to go through a rootwrap filters merge in
> downstream projects (and it is also that first baby step towards a
> technology that will allow something better in the future).

Agreed with the concerns here, but I thought these are the same ones
we raised at the midcycle.

My understanding of what came out of the midcycle was:
* current rootwrap system horribly breaks upgrade
* adopting privsep in this "sudo"-like form fixes upgrade
* this approach is much lower risk than a full conversion at this
point in the release
* security-wise it's terrible, but then the current rules don't buy us
much anyhow
* makes it easier to slowly transition to better privsep integration
(see the sketch below)
* all seems better than reverting os-brick integration to fix upgrade issues
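
To illustrate that last point, "better privsep integration" means moving
the trust boundary up into a privileged python function, rather than
effectively running commands as root. Roughly, following the oslo.privsep
pattern (a sketch, not os-brick's actual code; the names are illustrative):

from oslo_privsep import capabilities
from oslo_privsep import priv_context

# Functions decorated with this context run in a helper process that
# holds only CAP_SYS_ADMIN, rather than shelling out via sudo/rootwrap.
sys_admin = priv_context.PrivContext(
    'mypkg',                        # illustrative package name
    cfg_section='mypkg_privsep',
    pypath=__name__ + '.sys_admin',
    capabilities=[capabilities.CAP_SYS_ADMIN],
)


@sys_admin.entrypoint
def remove_scsi_device(device_name):
    # Runs inside the privileged helper; callers just call the python
    # function, so there is no command-line filter file to keep in sync.
    with open('/sys/block/%s/device/delete' % device_name, 'w') as f:
        f.write('1')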

Now at this point, we are way closer to release, but I want to check
we are making the correct tradeoff here.

Maybe the upgrade problem is not too bad this release, as the hard bit
was done with the last upgrade? Or is that total nonsense?

Thanks,
johnthetubaguy

>> On 24 February 2016 at 14:59, Michał Dulko  wrote:
>>>
>>> On 02/24/2016 04:51 AM, Angus Lees wrote:
>>> > Re: https://review.openstack.org/#/c/277224
>>> >
>>> > Most of the various required changes have flushed out by now, and this
>>> > change now passes the dsvm-full integration tests(*).
>>> >
>>> > (*) well, the experimental job anyway.  It still relies on a
>>> > merged-but-not-yet-released change in oslo.privsep so gate + 3rd party
>>> > won't pass until that happens.
>>> >
>>> > What?
>>> > This change replaces os-brick's use of rootwrap with a quick+dirty
>>> > privsep-based drop-in replacement.  Privsep doesn't actually provide
>>> > much security isolation when used in this way, but it *does* now run
>>> > commands with CAP_SYS_ADMIN (still uid=0/gid=0) rather than full root
>>> > superpowers.  The big win from a practical point of view is that it
>>> > also means os-brick's rootwrap filters file is essentially deleted and
>>> > no longer has to be manually merged with downstream projects.
>>> >
>>> > Code changes required in nova/cinder:
>>> > There is one change each to nova+cinder to add the relevant
>>> > privsep-helper command to rootwrap filters, and a devstack change to
>>> > add a nova.conf/cinder.conf setting.  That's it - this is otherwise a
>>> > backwards/forwards compatible change for nova+cinder.
>>> >
>>> > Deployment changes required in nova/cinder:
>>> > A new "privsep_rootwrap.helper_command" needs to be defined in
>>> > nova/cinder.conf (default is something sensible using sudo), and
>>> > rootwrap filters or sudoers updated depending on the exact command
>>> > chosen.  Be aware 

Re: [openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

2016-02-23 Thread John Garbutt
On 22 February 2016 at 22:08, Walter A. Boring IV <walter.bor...@hpe.com> wrote:
> On 02/22/2016 11:24 AM, John Garbutt wrote:
>>
>> Hi,
>>
>> Just came up on IRC, when nova-compute gets killed half way through a
>> volume attach (i.e. no graceful shutdown), things get stuck in a bad
>> state, like volumes stuck in the attaching state.
>>
>> This looks like a new addition to this conversation:
>>
>> http://lists.openstack.org/pipermail/openstack-dev/2015-December/082683.html
>> And brings us back to this discussion:
>> https://blueprints.launchpad.net/nova/+spec/add-force-detach-to-nova
>>
>> What if we move our attention towards automatically recovering from
>> the above issue? I am wondering if we can look at making our usual
>> recovery code deal with the above situation:
>>
>> https://github.com/openstack/nova/blob/834b5a9e3a4f8c6ee2e3387845fc24c79f4bf615/nova/compute/manager.py#L934
>>
>> Did we get the Cinder APIs in place that enable the force-detach? I
>> think we did and it was this one?
>>
>> https://blueprints.launchpad.net/python-cinderclient/+spec/nova-force-detach-needs-cinderclient-api
>>
>> I think diablo_rojo might be able to help dig for any bugs we have
>> related to this. I just wanted to get this idea out there before I
>> head out.
>>
>> Thanks,
>> John
>>
>>
> The problem is a little more complicated.
>
> In order for cinder backends to be able to do a force detach correctly, the
> Cinder driver needs to have the correct 'connector' dictionary passed in to
> terminate_connection.  That connector dictionary is the collection of
> initiator side information which is gleaned here:
> https://github.com/openstack/os-brick/blob/master/os_brick/initiator/connector.py#L99-L144
>
> The plan was to save that connector information in the Cinder
> volume_attachment table.  When a force detach is called, Cinder has the
> existing connector saved if Nova doesn't have it.  The problem was live
> migration.  When you migrate to the destination n-cpu host, the connector
> that Cinder had is now out of date.  There is no API in Cinder today to
> allow updating an existing attachment.
>
> So, the plan at the Mitaka summit was to add this new API, but it required
> microversions to land, which we still don't have in Cinder's API today.

Ah, OK.

We do keep looping back to that core issue.

Thanks,
johnthetubaguy



[openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

2016-02-22 Thread John Garbutt
Hi,

Just came up on IRC, when nova-compute gets killed half way through a
volume attach (i.e. no graceful shutdown), things get stuck in a bad
state, like volumes stuck in the attaching state.

This looks like a new addition to this conversation:
http://lists.openstack.org/pipermail/openstack-dev/2015-December/082683.html
And brings us back to this discussion:
https://blueprints.launchpad.net/nova/+spec/add-force-detach-to-nova

What if we move our attention towards automatically recovering from
the above issue? I am wondering if we can look at making our usual
recovery code deal with the above situation:
https://github.com/openstack/nova/blob/834b5a9e3a4f8c6ee2e3387845fc24c79f4bf615/nova/compute/manager.py#L934
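
To make that idea a bit more concrete, the recovery path could do
something along these lines on startup (a very rough sketch; the function
shape and parameters are made up, and the exact Cinder calls would need
checking):

def rollback_half_finished_attachments(context, host_bdms, volume_api):
    # Sketch only: host_bdms stands in for this host's block device
    # mappings, and volume_api for nova.volume.cinder.API.
    for bdm in host_bdms:
        volume = volume_api.get(context, bdm.volume_id)
        if volume['status'] == 'attaching':
            # The attach never completed (nova-compute died part way
            # through), so release the reservation in Cinder and leave
            # the instance without that volume attached.
            volume_api.unreserve_volume(context, bdm.volume_id)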

Did we get the Cinder APIs in place that enable the force-detach? I
think we did and it was this one?
https://blueprints.launchpad.net/python-cinderclient/+spec/nova-force-detach-needs-cinderclient-api

I think diablo_rojo might be able to help dig for any bugs we have
related to this. I just wanted to get this idea out there before I
head out.

Thanks,
John



Re: [openstack-dev] [api][all] api variation release by release

2016-02-22 Thread John Garbutt
On 13 January 2016 at 14:28, Matt Riedemann  wrote:
> On 1/13/2016 12:11 AM, joehuang wrote:
>>
>> Thanks for the information, it's good to know the documentation. The
>> further question is whether there is any XML format like document will be
>> published for each release and all core projects, so that other cloud
>> management software can read the changes, and deal with the fields
>> variation.
>>
>> For example, each project will maintain one XML file in its repository to
>> record all API update in each release.
>>
>> Best Regards
>> Chaoyi Huang ( Joe Huang )
>>
>>
>> -Original Message-
>> From: Matt Riedemann [mailto:mrie...@linux.vnet.ibm.com]
>> Sent: Wednesday, January 13, 2016 10:56 AM
>> To: openstack-dev@lists.openstack.org
>> Subject: Re: [openstack-dev] [api][all] api variation release by release
>>
>>
>>
>> On 1/12/2016 7:27 PM, joehuang wrote:
>>>
>>> Hello,
>>>
>>> As more and more OpenStack release are deployed in the production
>>> cloud, multiple releases of OpenStack co-located in a cloud is a very
>>> common situation. For example, "Juno" and "Liberty" releases co-exist
>>> in the same cloud.
>>>
>>> Then the cloud management software has to be aware of the API
>>> variation of different releases, and deal with the different field of
>>> object in the request / response. For example, in "Juno", no
>>> "multiattach" field in the "volume" object, but the field presents in
>>> "Liberty".
>>>
>>> Each releases will bring some API changes, it will be very useful that
>>> the API variation will also be publish after each release is
>>> delivered, so that the cloud management software can read and changes
>>> and react accordingly.
>>>
>>> Best Regards
>>>
>>> Chaoyi Huang ( Joe Huang )
>>>
>>>
>>>
>>> __
>>>  OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>> Have you heard of this effort going on in multiple projects called
>> microversions? For example, in Nova:
>>
>> http://docs.openstack.org/developer/nova/api_microversion_history.html
>>
>> Nova and Ironic already support microversioned APIs. Cinder and Neutron
>> are working on it I think, and there could be others.
>>
>
> No, there is nothing like that, at least that I've heard of. I don't know
> how you'd model what's changing in the microversions in a language like XML.

You will probably find this summary really interesting:
https://dague.net/2015/06/05/the-nova-api-in-kilo-and-beyond-2/

The idea is people can deploy any commit of Nova in production.
So to work out what is supported you just look at this API:
http://developer.openstack.org/api-ref-compute-v2.1.html#listVersionsv2.1

You can consult the docs to see what that specific version gives you.
That's still a work in progress; for now we have this:
* http://docs.openstack.org/developer/nova/api_microversion_history.html
* http://developer.openstack.org/api-guide/compute/

There is talk of using JSON Home to make some details machine readable.
But it's hard to express the semantic changes in a machine-readable form.
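
For example, a client can discover the supported range and then opt in to
a specific microversion per request. A sketch using python-requests (the
endpoint URL and token are placeholders; treat the details as illustrative):

import requests

COMPUTE = 'http://cloud.example.com:8774'

# The unversioned document advertises the supported microversion range.
versions = requests.get(COMPUTE + '/').json()['versions']
v21 = next(v for v in versions if v['id'] == 'v2.1')
print(v21['min_version'], v21['version'])  # e.g. '2.1' and '2.12'

# Each request then states the microversion it was written against.
headers = {
    'X-Auth-Token': 'a-real-token-goes-here',
    'X-OpenStack-Nova-API-Version': '2.12',
}
servers = requests.get(COMPUTE + '/v2.1/servers', headers=headers)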

Does that help?

Thanks,
johnthetubaguy



Re: [openstack-dev] [Nova][Cinder] Multi-attach, determining when to call os-brick's connector.disconnect_volume

2016-02-22 Thread John Garbutt
So just attempting to read through this thread, I think I hear:

Problems:

1. multi-attach breaks the assumption that made detach work
2. live-migrate already breaks with some drivers, due to not fully
understanding the side effects of all API calls.
3. evacuate and shelve issues also related


Solution ideas:

1. New export/target for every volume connection
* pro: simple
* con: that doesn't work for all drivers (?)

2. Nova works out when to disconnect volume on host
* pro: no cinder API changes (i.e. no upgrade issue)
* con: adds complexity in Nova
* con: requires all nodes to run fixed code before multi-attach is safe
* con: doesn't help with the live-migrate and evacuate issues anyways?

3. Give Cinder all the info, so it knows what has to happen
* pro: seems to give cinder the info to stop API users doing bad things
* pro: more robust API particularly useful with multiple nova, and
with baremetal, etc
* con: Need cinder micro-versions to do this API change and work across upgrade


So from where I am sat:
1: doesn't work for everyone
2: doesn't fix all the problems we need to fix
3: will take a long time

If so, it feels like we need solution 3, regardless, to solve wider issues.
We only need solution 2 if solution 3 will block multi-attach for too long.

Am I missing something in that summary?
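
For reference, here is roughly the check that solution 2 implies on the
Nova side before calling os-brick's disconnect_volume (a sketch with
made-up data structures, just to show the shape of it):

def safe_to_disconnect_volume(connection_info, other_attachments_on_host):
    # Solution 2 sketch: only call os-brick's disconnect_volume when no
    # other attachment on this host shares the same connection/target.
    # other_attachments_on_host stands in for the attachments Nova (or
    # Cinder) would have to look up for this host.
    for other in other_attachments_on_host:
        if other['connection_info'] == connection_info:
            # Another instance on this host still uses the same export,
            # so tearing down the local device would break it.
            return False
    return True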

Thanks,
johnthetubaguy

On 12 February 2016 at 20:26, Ildikó Váncsa  wrote:
> Hi Walt,
>
> Thanks for describing the bigger picture.
>
> In my opinion when we will have microversion support available in Cinder that 
> will give us a bit of a freedom and also possibility to handle these 
> difficulties.
>
> Regarding terminate_connection we will have issues with live_migration as it 
> is today. We need to figure out what information would be best to feed back 
> to Cinder from Nova, so we should figure out what API we would need after we 
> are able to introduce it in a safe way. I still see benefit in storing the 
> connection_info for the attachments.
>
> Also I think the multiattach support should be disabled for the problematic
> drivers like lvm, until we have a solution for proper detach on the
> whole call chain.
>
> Best Regards,
> Ildikó
>
>> -Original Message-
>> From: Walter A. Boring IV [mailto:walter.bor...@hpe.com]
>> Sent: February 11, 2016 18:31
>> To: openstack-dev@lists.openstack.org
>> Subject: Re: [openstack-dev] [Nova][Cinder] Multi-attach, determining when 
>> to call os-brick's connector.disconnect_volume
>>
>> There seems to be a few discussions going on here wrt to detaches.   One
>> is what to do on the Nova side with calling os-brick's disconnect_volume, 
>> and also when to or not to call Cinder's
>> terminate_connection and detach.
>>
>> My original post was simply to discuss a mechanism to try and figure out the 
>> first problem.  When should nova call brick to remove the
>> local volume, prior to calling Cinder to do something.
>>
>> Nova needs to know if it's safe to call disconnect_volume or not. Cinder 
>> already tracks each attachment, and it can return the
>> connection_info
>> for each attachment with a call to initialize_connection.   If 2 of
>> those connection_info dicts are the same, it's a shared volume/target.
>> Don't call disconnect_volume if there are any more of those left.
>>
>> On the Cinder side of things, if terminate_connection, detach is called, the 
>> volume manager can find the list of attachments for a
>> volume, and compare that to the attachments on a host.  The problem is, 
>> Cinder doesn't track the host along with the instance_uuid in
>> the attachments table.  I plan on allowing that as an API change after 
>> microversions lands, so we know how many times a volume is
>> attached/used on a particular host.  The driver can decide what to do with 
>> it at
>> terminate_connection, detach time. This helps account for
>> the differences in each of the Cinder backends, which we will never get all 
>> aligned to the same model.  Each array/backend handles
>> attachments different and only the driver knows if it's safe to remove the 
>> target or not, depending on how many attachments/usages
>> it has
>> on the host itself.   This is the same thing as a reference counter,
>> which we don't need, because we have the count in the attachments table, 
>> once we allow setting the host and the instance_uuid at
>> the same time.
>>
>> Walt
>> > On Tue, Feb 09, 2016 at 11:49:33AM -0800, Walter A. Boring IV wrote:
>> >> Hey folks,
>> >> One of the challenges we have faced with the ability to attach a
>> >> single volume to multiple instances, is how to correctly detach that
>> >> volume.  The issue is a bit complex, but I'll try and explain the
>> >> problem, and then describe one approach to solving one part of the detach 
>> >> puzzle.
>> >>
>> >> Problem:
>> >>When a volume is attached to multiple instances on the same host.
>> >> There are 2 scenarios here.
>> >>
>> >>1) Some Cinder drivers 

Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread John Garbutt
On 22 February 2016 at 17:38, Sean Dague  wrote:
> On 02/22/2016 12:20 PM, Daniel P. Berrange wrote:
>> On Mon, Feb 22, 2016 at 12:07:37PM -0500, Sean Dague wrote:
>>> On 02/22/2016 10:43 AM, Chris Friesen wrote:
 Hi all,

 We've recently run into some interesting behaviour that I thought I
 should bring up to see if we want to do anything about it.

 Basically the problem seems to be that nova-compute is doing disk I/O
 from the main thread, and if it blocks then it can block all of
 nova-compute (since all eventlets will be blocked).  Examples that we've
 found include glance image download, file renaming, instance directory
 creation, opening the instance xml file, etc.  We've seen nova-compute
 block for upwards of 50 seconds.

 Now the specific case where we hit this is not a production
 environment.  It's only got one spinning disk shared by all the guests,
 the guests were hammering on the disk pretty hard, the IO scheduler for
 the instance disk was CFQ which seems to be buggy in our kernel.

 But the fact remains that nova-compute is doing disk I/O from the main
 thread, and if the guests push that disk hard enough then nova-compute
 is going to suffer.

 Given the above...would it make sense to use eventlet.tpool or similar
 to perform all disk access in a separate OS thread?  There'd likely be a
 bit of a performance hit, but at least it would isolate the main thread
 from IO blocking.
>>>
>>> Making nova-compute more robust is fine, though the reality is once you
>>> IO starve a system, a lot of stuff is going to fall over weird.
>>>
>>> So there has to be a tradeoff of the complexity of any new code vs. what
>>> it gains. I think individual patches should be evaluated as such, or a
>>> spec if this is going to get really invasive.
>>
>> There are OS level mechanisms (eg cgroups blkio controller) for doing
>> I/O priorization that you could use to give Nova higher priority over
>> the VMs, to reduce (if not eliminate) the possibility that a busy VM
>> can inflict a denial of service on the mgmt layer.  Of course figuring
>> out how to use that mechanism correctly is not entirely trivial.
>>
>> I think it is probably worth focusing effort in that area, before jumping
>> into making all the I/O related code in Nova more complicated. eg have
>> someone investigate & write up recommendation in Nova docs for how to
>> configure the host OS & Nova such that VMs cannot inflict an I/O denial
>> of service attack on the mgmt service.
>
> +1 that would be much nicer.
>
> We've got some set of bugs in the tracker right now which are basically
> "after the compute node being at loadavg of 11 for an hour, nova-compute
> starts failing". Having some basic methodology to use Linux
> prioritization on the worker process would mitigate those quite a bit,
> and could be used by all users immediately, vs. complex nova-compute
> changes which would only apply to new / upgraded deploys.
>

+1

Does that turn into improved deployment docs that cover how you do
that on various platforms?

Maybe some tools to help with that also go in here?
http://git.openstack.org/cgit/openstack/osops-tools-generic/

Thanks,
John

PS
FWIW, how xenapi runs nova-compute in a VM has a similar outcome, albeit
in a more heavy-handed way.
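
PPS
For reference, the eventlet.tpool idea mentioned up-thread looks roughly
like this (a sketch, not Nova code): the blocking call runs in a native
OS thread, so the main eventlet loop keeps servicing other greenlets.

import shutil

from eventlet import tpool


def copy_image_without_blocking_the_loop(src, dst):
    # shutil.copyfile blocks on disk I/O; tpool.execute runs it in
    # eventlet's native thread pool and only this greenlet waits for it.
    return tpool.execute(shutil.copyfile, src, dst)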



Re: [openstack-dev] [all] A proposal to separate the design summit

2016-02-22 Thread John Garbutt
On 22 February 2016 at 15:31, Monty Taylor  wrote:
> On 02/22/2016 07:24 AM, Russell Bryant wrote:
>> On Mon, Feb 22, 2016 at 10:14 AM, Thierry Carrez wrote:
>>> Hi everyone,
>>> TL;DR: Let's split the events, starting after Barcelona.
>> This proposal sounds fantastic.  Thank you very much to those that help
>> put it together.
> Totally agree. I think it's an excellent way to address the concerns and
> balance all of the diverse needs we have.

tl;dr
+1
Awesome work ttx.
Thank you!

Cheaper cities & venues should make it easier for more contributors to
attend. That's a big deal. This also feels like enough notice to plan
for that.

I think this means the summit talk proposal deadline is both after the
previous release and after the contributor event for the next
release? That should help keep proposals concrete (less guesswork
when submitting). Nice.

Dev wise, it seems equally good timing. Initially I was worried about
the event distracting from RC bugs, but actually I can see this
helping.

I am sure there are more questions that will pop up. Like I assume
this means there is no ATC free pass to the summit? And I guess a
small nominal fee for the contributor meetup (like the recent ops
meetup, to help predict numbers more accurately)? I guess that helps
level the playing field for contributors who don't put git commits in
the repo (I am thinking vocal operators that don't contribute code).
But I probably shouldn't go into all that just yet.

Thanks,
johnthetubaguy



Re: [openstack-dev] [nova][neutron] How would nova microversion get-me-a-network in the API?

2016-02-22 Thread John Garbutt
On 22 February 2016 at 15:14, John Garbutt <j...@johngarbutt.com> wrote:
> On 22 February 2016 at 12:01, Jay Pipes <jaypi...@gmail.com> wrote:
>> On 02/22/2016 06:56 AM, Sean Dague wrote:
>>> On 02/19/2016 12:49 PM, John Garbutt wrote:
>>> 
>>>> Consider a user that uses these four clouds:
>>>> * nova-network flat DHCP
>>>> * nova-network VLAN manager
>>>> * neutron with a single provider network setup
>>>> * neutron where user needs to create their own network
>>>>
>>>> For the first three, the user specifies no network, and they just get
>>>> a single NIC with some semi-sensible IP address, likely with a gateway
>>>> to the internet.
>>>>
>>>> For the last one, the user ends up with a network with zero NICs. If
>>>> they then go and configure a network in neutron (and they can now use
>>>> the new easy one shot give-me-a-network CLI), they start to get VMs
>>>> just like they would have with nova-network VLAN manager.
>>>>
>>>> We all agree the status quo is broken. For me, this is a bug in the
>>>> API where we need to fix the consistency. Because its a change in the
>>>> behaviour, it needs to be gated by a micro version.
>>>>
>>>> Now, if we step back and created this again, I would agree that
>>>> --nic=auto is a good idea, so it's explicit. However, all our users are
>>>> used to automatic being the default, albeit a very patchy default.
>>>> So I think the best evolution here is to fix the inconsistency by
>>>> making a VM with no network being the explicit option (--no-nic or
>>>> something?), and failing the build if we are unable to get a nic using
>>>> an "automatic guess" route. So now the default is more consistent, and
>>>> those that what a VM with no NIC have a way to get their special case
>>>> sorted.
>>>>
>>>> I think this means I like "option 2" in the summary mail on the ops list.
>>>
>>>
>>> Thinking through this over the weekend.
>>>
>>>  From the API I think I agree with Laski now. An API shouldn't doesn't
>>> typically need default behavior, it's ok to make folks be explicit. So
>>> making nic a required parameter is fine.
>>>
>>> "nic": "auto"
>>> "nic": "none"
>>> "nic": "$name"
>>>
>>> nic is now jsonschema enforced, 400 if not provided.
>>>
>>> that being said... I think the behavior of CLI tools should default to
>>> nic auto being implied. The user experience there is different. You use
>>> cli tools for one off boots of things, so should be as easy as possible.
>>>
>>> I think this is one of the places where the UX needs of the API and the
>>> CLI are definitely different.
>>
>> I'd be cool with this, too.
>
> +1 I am OK with this.
>
> Its an explicit API, its can be the same for nova-network and neutron,
> and supports the CLI behaviour folks want.
>
> Just to clarify with nic=auto, if there is no way to create a nic
> automatically (not allowed to do give-me-a-network, and no networks
> available, or the user has more than one network available, etc), we
> should fail the build of the server. Today I think we (sometimes?) end
> up falling back to nic=none, which seems totally counter to the
> explicit nature of the API. Its possible we could fail before the API
> returns, as least of the most likely reasons for failure (might need
> more neutron APIs - policy discovery - to do a more complete job of
> that).

Oh, I forgot to mention Horizon...

Talking with folks, since Horizon requires a network to be selected in the
create server flow, it's likely they will use the new neutron API to
help populate their network drop-down, rather than use these changes
in the Nova API. Although I guess they might want to expose the new
nic=none option, that's not related to the key "first boot in a new
tenant" case that we are trying to improve.

I think that means restricting our thoughts to just the Nova CLI and
direct API users is OK. Although I am curious if that fits with
everyone's thinking about this topic?

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] How would nova microversion get-me-a-network in the API?

2016-02-22 Thread John Garbutt
On 22 February 2016 at 12:01, Jay Pipes <jaypi...@gmail.com> wrote:
> On 02/22/2016 06:56 AM, Sean Dague wrote:
>> On 02/19/2016 12:49 PM, John Garbutt wrote:
>> 
>>> Consider a user that uses these four clouds:
>>> * nova-network flat DHCP
>>> * nova-network VLAN manager
>>> * neutron with a single provider network setup
>>> * neutron where user needs to create their own network
>>>
>>> For the first three, the user specifies no network, and they just get
>>> a single NIC with some semi-sensible IP address, likely with a gateway
>>> to the internet.
>>>
>>> For the last one, the user ends up with a network with zero NICs. If
>>> they then go and configure a network in neutron (and they can now use
>>> the new easy one shot give-me-a-network CLI), they start to get VMs
>>> just like they would have with nova-network VLAN manager.
>>>
>>> We all agree the status quo is broken. For me, this is a bug in the
>>> API where we need to fix the consistency. Because its a change in the
>>> behaviour, it needs to be gated by a micro version.
>>>
>>> Now, if we step back and created this again, I would agree that
>>> --nic=auto is a good idea, so its explicit. However, all our users are
>>> used to automatic being the default, all be it a very patchy default.
>>> So I think the best evolution here is to fix the inconsistency by
>>> making a VM with no network being the explicit option (--no-nic or
>>> something?), and failing the build if we are unable to get a nic using
>>> an "automatic guess" route. So now the default is more consistent, and
>>> those that what a VM with no NIC have a way to get their special case
>>> sorted.
>>>
>>> I think this means I like "option 2" in the summary mail on the ops list.
>>
>>
>> Thinking through this over the weekend.
>>
>>  From the API I think I agree with Laski now. An API shouldn't doesn't
>> typically need default behavior, it's ok to make folks be explicit. So
>> making nic a required parameter is fine.
>>
>> "nic": "auto"
>> "nic": "none"
>> "nic": "$name"
>>
>> nic is now jsonschema enforced, 400 if not provided.
>>
>> that being said... I think the behavior of CLI tools should default to
>> nic auto being implied. The user experience there is different. You use
>> cli tools for one off boots of things, so should be as easy as possible.
>>
>> I think this is one of the places where the UX needs of the API and the
>> CLI are definitely different.
>
> I'd be cool with this, too.

+1, I am OK with this.

It's an explicit API, it can be the same for nova-network and neutron,
and it supports the CLI behaviour folks want.

Just to clarify, with nic=auto, if there is no way to create a NIC
automatically (not allowed to do give-me-a-network, no networks
available, or the user has more than one network available, etc.), we
should fail the build of the server. Today I think we (sometimes?) end
up falling back to nic=none, which seems totally counter to the
explicit nature of the API. It's possible we could fail before the API
returns, at least for the most likely reasons for failure (we might
need more neutron APIs - policy discovery - to do a more complete job
of that).
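
As a rough sketch of what that explicit contract could look like
(illustrative only, this is not the actual Nova schema), the create
server validation becomes something like:

    # A sketch only -- not the real Nova schema, just the shape of the
    # explicit contract discussed above (required and validated):
    server_create = {
        'type': 'object',
        'properties': {
            'nic': {
                'type': 'string',
                'minLength': 1,
                # 'auto', 'none', or a specific network name/uuid
            },
            # ... plus all the existing create-server properties ...
        },
        'required': ['nic'],  # jsonschema enforced, 400 if not provided
    }

The build then either wires up a NIC, attaches none, or fails loudly,
rather than silently falling back.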

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] A prototype implementation towards the "shared state scheduler"

2016-02-22 Thread John Garbutt
On 21 February 2016 at 13:51, Cheng, Yingxin <yingxin.ch...@intel.com> wrote:
> On 19 February 2016 at 5:58, John Garbutt wrote:
>> On 17 February 2016 at 17:52, Clint Byrum <cl...@fewbar.com> wrote:
>> > Excerpts from Cheng, Yingxin's message of 2016-02-14 21:21:28 -0800:
>> Long term, I see a world where there are multiple scheduler Nova is able to 
>> use,
>> depending on the deployment scenario.
>
> Technically, what I've implemented is a new type of scheduler host manager
> `shared_state_manager.SharedHostManager`[1] with the ability to synchronize 
> host
> states directly from resource trackers.

That's fine. You just get to re-use more code.

Maybe I should say multiple scheduling strategies, or something like that.

>> So a big question for me is, does the new scheduler interface work if you 
>> look at
>> slotting in your prototype scheduler?
>>
>> Specifically I am thinking about this interface:
>> https://github.com/openstack/nova/blob/master/nova/scheduler/client/__init_
>> _.py

I am still curious whether this interface is OK for your needs.

Making this work across both types of scheduler might be tricky, but I
think it is worthwhile.

>> > This mostly agrees with recent tests I've been doing simulating 1000
>> > compute nodes with the fake virt driver.
>>
>> Overall this agrees with what I saw in production before moving us to the
>> caching scheduler driver.
>>
>> I would love a nova functional test that does that test. It will help us 
>> compare
>> these different schedulers and find the strengths and weaknesses.
>
> I'm also working on implementing the functional tests of nova scheduler, there
> is a patch showing my latest progress: 
> https://review.openstack.org/#/c/281825/
>
> IMO scheduler functional tests are not good at testing real performance of
> different schedulers, because all of the services are running as green threads
> instead of real processes. I think the better way to analysis the real 
> performance
> and the strengths and weaknesses is to start services in different processes 
> with
> fake virt driver(i.e. Clint Byrum's work) or Jay Pipe's work in emulating 
> different
> designs.

Having an option to run multiple processes seems OK, if it's needed,
although starting with a greenlet version that works in the gate seems best.

Let's try a few things, and see what best predicts the results in real
environments.

>> I am really interested how your prototype and the caching scheduler compare?
>> It looks like single node scheduler will perform in a very similar way, but 
>> multiple
>> schedulers are less likely to race each other, although there are quite a few
>> races?
>
> I think the major weakness of caching scheduler comes from its host state 
> update
> model, i.e. updating host states from db every ` 
> CONF.scheduler_driver_task_period`
> seconds.

The trade-off is that consecutive scheduler decisions don't race each
other at all. Say you have a burst of 1000 instance builds and you
want to avoid build failures (but accept sub-optimal placement, and
you are using fill-first): that's a very good trade-off.

Consider a burst of 1000 deletes: it may take you 60 seconds to notice
they are all deleted and you have lots more free space, but that
doesn't cause build failures the way excessive races for the same
resources will, at least under the usual conditions where you are not
yet totally full (i.e. non-HPC use cases).

I was shocked at how well the caching_scheduler works in practice. I
assumed it would be terrible, but when we tried it, it worked well.
It's a million miles from perfect, but handy for many deployment
scenarios.
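
For anyone who wants to try it, enabling it is just configuration.
Roughly this, for mitaka-era option names (please double check against
your release):

    [DEFAULT]
    # full class path here; newer releases may accept the short
    # entry point name "caching_scheduler" instead
    scheduler_driver = nova.scheduler.caching_scheduler.CachingScheduler
    # how often (in seconds) the cached host state is refreshed from the DB
    scheduler_driver_task_period = 60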

Thanks,
johnthetubaguy

PS
If you need a 1000-node test cluster to play with, it's worth applying
to use this one:
http://go.rackspace.com/developercloud
I am happy to recommend these efforts get some time with that hardware.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] How would nova microversion get-me-a-network in the API?

2016-02-19 Thread John Garbutt
On 19 February 2016 at 16:28, Andrew Laski  wrote:
> On Fri, Feb 19, 2016, at 11:14 AM, Sean Dague wrote:
>> On 02/19/2016 09:30 AM, Andrew Laski wrote:
>> >
>> >
>> > On Thu, Feb 18, 2016, at 05:34 PM, melanie witt wrote:
>> >> On Feb 12, 2016, at 14:49, Jay Pipes  wrote:
>> >>
>> >>> This would be my preference as well, even though it's technically a 
>> >>> backwards-incompatible API change.
>> >>>
>> >>> The idea behind get-me-a-network was specifically to remove the current 
>> >>> required complexity of the nova boot command with regards to networking 
>> >>> options and allow a return to the nova-net model where an admin could 
>> >>> auto-create a bunch of unassigned networks and the first time a user 
>> >>> booted an instance and did not specify any network configuration (the 
>> >>> default, sane behaviour in nova-net), one of those unassigned networks 
>> >>> would be grabbed for the troject, I mean prenant, sorry.
>> >>>
>> >>> So yeah, the "opt-in to having no networking at all with a 
>> >>> --no-networking or --no-nics option" would be my preference.
>> >>
>> >> +1 to this, especially opting in to have no network at all. It seems most
>> >> friendly to me to have the network allocation automatically happen if
>> >> nothing special is specified.
>> >>
>> >> This is something where it seems like we need a "reset" to a default
>> >> behavior that is user-friendly. And microversions is the way we have to
>> >> "fix" an undesirable current default behavior.
>> >
>> > The question I would still like to see addressed is why do we need to
>> > have a default behavior here? The get-me-a-network effort is motivated
>> > by the current complexity of setting up a network for an instance
>> > between Nova and Neutron and wants to get back to a simpler time of
>> > being able to just boot an instance and get a network. But it still
>> > isn't clear to me why requiring something like "--nic auto" wouldn't
>> > work here, and eliminate the confusion of changing a default behavior.
>>
>> The point was the default behavior was a major concern to people. It's
>> not like this was always the behavior. If you were (or are) on nova net,
>> you don't need that option at all.
>
> Which is why I would prefer to shy away from default behaviors.
>
>>
>> The major reason we implemented API microversions was so that we could
>> make the base API experience better for people, some day. One day, we'll
>> have an API we love, hopefully. Doing so means that we do need to make
>> changes to defaults. Deprecate some weird and unmaintained bits.
>>
>> The principle of least surprise to me is that you don't need that
>> attribute at all. Do the right thing with the least amount of work.
>> Instead of making the majority of clients and users do extra work
>> because once upon a time when we brought in neutron a thing happen.
>
> The principal of least surprise to me is that a user explicitly asks for
> something rather than relying on a default that changes based on network
> service and/or microversion. This is the only area in the API where
> something did, and would, happen without explicitly being requested by a
> user. I just don't understand why it's special compared to
> flavor/image/volume which we require to be explicit. But I think we just
> need to agree to disagree here.

Consider a user that uses these four clouds:
* nova-network flat DHCP
* nova-network VLAN manager
* neutron with a single provider network setup
* neutron where user needs to create their own network

For the first three, the user specifies no network, and they just get
a single NIC with some semi-sensible IP address, likely with a gateway
to the internet.

For the last one, the user ends up with a VM with zero NICs. If
they then go and configure a network in neutron (and they can now use
the new easy one-shot give-me-a-network CLI), they start to get VMs
just like they would have with nova-network VLAN manager.

We all agree the status quo is broken. For me, this is a bug in the
API where we need to fix the consistency. Because it's a change in the
behaviour, it needs to be gated by a microversion.

Now, if we stepped back and created this again, I would agree that
--nic=auto is a good idea, so it's explicit. However, all our users are
used to automatic being the default, albeit a very patchy default.
So I think the best evolution here is to fix the inconsistency by
making a VM with no network the explicit option (--no-nic or
something?), and failing the build if we are unable to get a NIC via
an "automatic guess" route. So now the default is more consistent, and
those that want a VM with no NIC have a way to get their special case
sorted.

I think this means I like "option 2" in the summary mail on the ops list.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: 

Re: [openstack-dev] [Nova][Glance]Glance v2 api support in Nova

2016-02-19 Thread John Garbutt
On 19 February 2016 at 11:45, Sean Dague <s...@dague.net> wrote:
> On 02/15/2016 06:00 PM, Flavio Percoco wrote:
>> On 12/02/16 18:24 +0300, Mikhail Fedosin wrote:
>>> Hello!
>>>
>>> In late December I wrote several messages about glance v2 support in
>>> Nova and
>>> Nova's xen plugin. Many things have been done after that and now I'm
>>> happy to
>>> announce that there we have a set of commits that makes Nova fully v2
>>> compatible (xen plugin works too)!
>>>
>>> Here's the link to the top commit
>>> https://review.openstack.org/#/c/259097/
>>> Here's the link to approved spec for Mitaka https://github.com/openstack/
>>> nova-specs/blob/master/specs/mitaka/approved/use-glance-v2-api.rst
>>>
>>> I think it'll be a big step for OpenStack, because api v2 is much more
>>> stable
>>> and RESTful than v1.  We would very much like to deprecate v1 at some
>>> point. v2
>>> is 'Current' since Juno, and after that there we've had a lot of
>>> attempts to
>>> adopt it in Nova, and every time it was postponed to next release cycle.
>>>
>>> Unfortunately, it may not happen this time - this work was marked as
>>> 'non-priority' when the related patches had been done. I think it's a big
>>> omission, because this work is essential for all OpenStack, and it
>>> will be a
>>> shame if we won't be able to land it in Mitaka.
>>> As far as I know, Feature Freeze will be announced on March, 3rd, and
>>> we still
>>> have enough time and people to test it before. All patches are split
>>> into small
>>> commits (100 LOC max), so they should be relatively easy to review.
>>>
>>> I wonder if Nova community members may change their decision and
>>> unblock this
>>> patches? Thanks in advance!
>>
>> A couple of weeks ago, I had a chat with Sean Dague and John Garbutt and we
>> agreed that it was probably better to wait until Newton. After that
>> chat, we
>> held a Glance virtual mid-cycle where Mikhail mentioned that he would
>> rather
>> sprint on getting Nova on v2 than waiting for Newton. The terms and code
>> Mikhail
>> worked on aligns with what has been discussed throughout the cycle in
>> numerous
>> chats, patch sets, etc.
>>
>> After all the effort that has been put on this (including getting a py24
>> environment ready to test the xenplugin) it'd be a real shame to have
>> this work
>> pushed to Newton. The Glance team *needs* to be able to deprecate v1 and
>> the
>> team has been working on this ever since Kilo, when this effort of
>> moving Nova
>> to v2 started.
>>
>> I believe it has to be an OpenStack priority to make this happen or, at
>> the very
>> least, a cross-project effort that involves all services relying on
>> Glance. Nova
>> is the last service in the list, AFAICT, and the Glance team has been very
>> active on this front. This is not to imply the Nova team hasn't help, in
>> fact,
>> there's been lots of support/feedback from the nova team during Mitaka.
>> It is
>> because of that that I believe we should grant this patches an exception
>> and let
>> them in.
>>
>> Part of the feedback the Nova team has provided is that some of that
>> code that
>> has been proposed should live in glanceclient. The Glance team is ready
>> to react
>> and merge that code, release glanceclient, and get Nova on v2.
>
> Right, I think this was the crux of the problem. It took a while to get
> consensus on that point, and now we're deep into the priority part of
> the Nova cycle, and the runway is gone. I'm happy to help review early
> during the Newton cycle.
>
> I also think as prep work for that we should probably get either glance
> folks or citrix folks to enhance the testing around the xenserver /
> glance paths in Nova. That will make reviews go faster in Newton because
> we can be a lot more sure that patches aren't breaking anything.

+1 Sean's points here.

We should totally make time to get the Nova and Glance teams together
during the design summit. I keep thinking we understand each other,
and every time we dig a little deeper on this topic we find more
differences in opinion.

I think we all agree with the long term view:
* Nova's image API calls maintain their existing contract (glance v1 like)
* Nova can talk to glance v2 for everything, zero dependency on glance v1
* Nova will need to support both v1 and v2 glance for at least one cycle

It's how we get to that point that we don't all agree on. It
feels like an in-depth face-to-face discussion will be the best way to
resolve that.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] A prototype implementation towards the "shared state scheduler"

2016-02-19 Thread John Garbutt
On 17 February 2016 at 17:52, Clint Byrum  wrote:
> Excerpts from Cheng, Yingxin's message of 2016-02-14 21:21:28 -0800:
>> Hi,
>>
>> I've uploaded a prototype https://review.openstack.org/#/c/280047/ to 
>> testify its design goals in accuracy, performance, reliability and 
>> compatibility improvements. It will also be an Austin Summit Session if 
>> elected: 
>> https://www.openstack.org/summit/austin-2016/vote-for-speakers/Presentation/7316

Long term, I see a world where there are multiple schedulers Nova is
able to use, depending on the deployment scenario.

We have tried to stop any more schedulers going in-tree (like the
solver scheduler) while we get the interface between the
nova-scheduler and the rest of Nova straightened out, to make that
much easier.

So a big question for me is, does the new scheduler interface work if
you look at slotting in your prototype scheduler?

Specifically I am thinking about this interface:
https://github.com/openstack/nova/blob/master/nova/scheduler/client/__init__.py
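
(For reference, the rough shape of that interface, from memory -- the
method names and signatures here are illustrative only, the link above
is the real thing:)

    # Illustrative sketch only -- see the link above for the real code.
    class SchedulerClient(object):

        def select_destinations(self, context, request_spec, filter_properties):
            """Ask the scheduler to pick target hosts for a build request."""

        def update_instance_info(self, context, host_name, instance_info):
            """Push updated instance info for a host into the scheduler."""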

>> I want to gather opinions about this idea:
>> 1. Is this feature possible to be accepted in the Newton release?
>> 2. Suggestions to improve its design and compatibility.
>> 3. Possibilities to integrate with resource-provider bp series: I know 
>> resource-provider is the major direction of Nova scheduler, and there will 
>> be fundamental changes in the future, especially according to the bp 
>> https://review.openstack.org/#/c/271823/1/specs/mitaka/approved/resource-providers-scheduler.rst.
>>  However, this prototype proposes a much faster and compatible way to make 
>> schedule decisions based on scheduler caches. The in-memory decisions are 
>> made at the same speed with the caching scheduler, but the caches are kept 
>> consistent with compute nodes as quickly as possible without db refreshing.
>>
>> Here is the detailed design of the mentioned prototype:
>>
>> >>
>> Background:
>> The host state cache maintained by host manager is the scheduler resource 
>> view during schedule decision making. It is updated whenever a request is 
>> received[1], and all the compute node records are retrieved from db every 
>> time. There are several problems in this update model, proven in 
>> experiments[3]:
>> 1. Performance: The scheduler performance is largely affected by db access 
>> in retrieving compute node records. The db block time of a single request is 
>> 355ms in average in the deployment of 3 compute nodes, compared with only 
>> 3ms in in-memory decision-making. Imagine there could be at most 1k nodes, 
>> even 10k nodes in the future.
>> 2. Race conditions: This is not only a parallel-scheduler problem, but also 
>> a problem using only one scheduler. The detailed analysis of 
>> one-scheduler-problem is located in bug analysis[2]. In short, there is a 
>> gap between the scheduler makes a decision in host state cache and the
>> compute node updates its in-db resource record according to that decision in 
>> resource tracker. A recent scheduler resource consumption in cache can be 
>> lost and overwritten by compute node data because of it, result in cache 
>> inconsistency and unexpected retries. In a one-scheduler experiment using 
>> 3-node deployment, there are 7 retries out of 31 concurrent schedule 
>> requests recorded, results in 22.6% extra performance overhead.
>> 3. Parallel scheduler support: The design of filter scheduler leads to an 
>> "even worse" performance result using parallel schedulers. In the same 
>> experiment with 4 schedulers on separate machines, the average db block time 
>> is increased to 697ms per request and there are 16 retries out of 31 
>> schedule requests, namely 51.6% extra overhead.
>
> This mostly agrees with recent tests I've been doing simulating 1000
> compute nodes with the fake virt driver.

Overall this agrees with what I saw in production before moving us to
the caching scheduler driver.

I would love a nova functional test that covers that. It would help
us compare these different schedulers and find their strengths and
weaknesses.

> My retry rate is much lower,
> because there's less window for race conditions since there is no latency
> for the time between nova-compute getting the message that the VM is
> scheduled to it, and responding with a host update. Note that your
> database latency numbers seem much higher, we see about 200ms, and I
> wonder if you are running in a very resource constrained database
> instance.

Just to double-check: are you using pymysql rather than MySQL-python
as the sqlalchemy backend?

If you use a driver that doesn't work well with eventlet, things can
get very bad, very quickly, particularly because of the way the
scheduler hands back the results of the DB call. You can get some
benefits by shrinking the db and greenlet pools to reduce the
concurrency.
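
For anyone trying to reproduce those numbers, the relevant knobs are
roughly these (a sketch; exact option names can vary a little between
releases):

    [database]
    # use the pure-python driver so eventlet can actually yield
    connection = mysql+pymysql://nova:password@dbhost/nova
    # shrinking the pool caps how much concurrency hits the DB
    max_pool_size = 10
    max_overflow = 10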

>> Improvements:
>> This prototype solved the mentioned issues above by implementing a new 

Re: [openstack-dev] [oslo] upgrade implications of lots of content in paste.ini

2016-02-18 Thread John Garbutt
On 18 February 2016 at 17:58, Sean Dague  wrote:
> On 02/18/2016 12:17 PM, Michael Krotscheck wrote:
>> Clarifying:
>>
>> On Thu, Feb 18, 2016 at 2:32 AM Sean Dague > > wrote:
>>
>> Ok, to make sure we all ended up on the same page at the end of this
>> discussion, this is what I think I heard.
>>
>> 1) oslo.config is about to release with a feature that will make adding
>> config to paste.ini not needed (i.e.
>> https://review.openstack.org/#/c/265415/ is no longer needed).
>>
>>
>> I will need help to do this. More below.
>>
>>
>> 2) ideally the cors middleware will have sane defaults for that set of
>> headers in oslo.config.
>>
>>
>> I'd like to make sure we agree on what "Sane defaults" means here. By
>> design, the CORS middleware is generic, and only explicitly includes the
>> headers prescribed in the w3c spec.  It should not include any
>> additional headers, for reasons of downstream non-openstack consumers.
>>
>>
>> 3) projects should be able to apply new defaults for these options in
>> their codebase through a default override process (that is now nicely
>> documented somewhere... url?)
>>
>>
>> New sample defaults for the generated configuration files, they should
>> not live anywhere else. The CORS middleware should, if we go this path,
>> be little more than a true-to-spec implementation, with config files
>> that extend it for the appropriate API.
>>
>> The big question I now have is: What do we do with respect to the mitaka
>> freeze?
>>
>> Option 1: Implement as is, keep things consistent, fix them in Newton.
>
> The problem with Option 1 is that it's not fixable in Newton. It
> requires fixing for the next 3 releases as you have to deprecate out
> bits in paste.ini, make middleware warn for removal first soft, then
> hard, explain the config migration. Once this lands in the wild the
> unwind is very long and involved.
>
> Which is why I -1ed the patch. Because the fix in newton isn't a revert.

+1 on the upgrade impact being a blocker.
Certainly for all folks meeting these:
https://governance.openstack.org/reference/tags/assert_supports-upgrade.html#requirements

This will require lots of folks to pitch in and help, and bend the
process a touch.
But that seems way more reasonable than dragging our users through
that headache.
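
For what it's worth, the end state we are after is deployers only ever
touching oslo.config, not paste.ini. Roughly something like this, using
the oslo.middleware [cors] option names (a sketch -- check the
generated sample config for your release):

    [cors]
    # deployment-specific value, set by the operator
    allowed_origin = https://dashboard.example.com
    # service-specific extra headers should ship as defaults set in
    # code / sample configs, rather than being baked into paste.ini
    allow_headers = X-Auth-Token,X-OpenStack-Request-ID
    expose_headers = X-OpenStack-Request-ID
    allow_methods = GET,PUT,POST,DELETE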

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] How would nova microversion get-me-a-network in the API?

2016-02-12 Thread John Garbutt
On 12 February 2016 at 18:17, Andrew Laski  wrote:
>
>
> On Fri, Feb 12, 2016, at 12:15 PM, Matt Riedemann wrote:
>> Forgive me for thinking out loud, but I'm trying to sort out how nova
>> would use a microversion in the nova API for the get-me-a-network
>> feature recently added to neutron [1] and planned to be leveraged in
>> nova (there isn't a spec yet for nova, I'm trying to sort this out for a
>> draft).
>>
>> Originally I was thinking that a network is required for nova boot, so
>> we'd simply check for a microversion and allow not specifying a network,
>> easy peasy.
>>
>> Turns out you can boot an instance in nova (with neutron as the network
>> backend) without a network. All you get is a measly debug log message in
>> the compute logs [2]. That's kind of useless though and seems silly.
>>
>> I haven't tested this out yet to confirm, but I suspect that if you
>> create a nova instance w/o a network, you can latter try to attach a
>> network using the os-attach-interfaces API as long as you either provide
>> a network ID *or* there is a public shared network or the tenant has a
>> network at that point (nova looks those up if a specific network ID
>> isn't provided).
>>
>> The high-level plan for get-me-a-network in nova was simply going to be
>> if the user tries to boot an instance and doesn't provide a network, and
>> there isn't a tenant network or public shared network to default to,
>> then nova would call neutron's new auto-allocated-topology API to get a
>> network. This, however, is a behavior change.
>>
>> So I guess the question now is how do we handle that behavior change in
>> the nova API?
>>
>> We could add an auto-create-net boolean to the boot server request which
>> would only be available in a microversion, then we could check that
>> boolean in the compute API when we're doing network validation.
>>
>
> I think a flag like this is the right approach. If it's currently valid
> to boot an instance without a network than there needs to be something
> to distinguish a request that wants a network created vs. a request that
> doesn't want a network.
>
> This is still hugely useful if all that's required from a user is to
> indicate that they would like a network, they still don't need to
> understand/provide details of the network.

I was thinking sort of the opposite. Would this work?

We add a new microversion that does this:
* nothing specified: do the best we can to get a port created
(get-me-a-network, etc.), or fail if that is not possible
* a --no-nics option (or similar) that says "please don't give me any NICs"

This means folks that don't want a network reliably have a way to do
that. For everyone else, we do the same thing whether using
neutron or nova-network VLAN manager.
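
To make that concrete (the flag name is purely illustrative, as above):

    # hypothetical CLI behaviour under the proposed microversion
    nova boot --image cirros --flavor m1.tiny vm1
    #   -> best effort to create a port (get-me-a-network, etc.),
    #      or the boot fails if that is not possible
    nova boot --image cirros --flavor m1.tiny --no-nics vm1
    #   -> explicitly boot with no NICs at all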

Thanks,
johnthetubaguy

PS
I think we should focus on the horizon experience, CLI experience, and
API experience separately, for a moment, to make sure each of those
cases actually works out OK.

>> Today if you don't specify a network and don't have a network available,
>> then the validation in the API is basically just quota checking that you
>> can get at least one port in your tenant [3]. With a flag on a
>> microversion, we could also validate some other things about
>> auto-creating a network (if we know that's going to be the case once we
>> hit the compute).
>>
>> Anyway, this is mostly me getting thoughts out of my head before the
>> weekend so I don't forget it and am looking for other ideas here or
>> things I might be missing.
>>
>> [1] https://blueprints.launchpad.net/neutron/+spec/get-me-a-network
>> [2]
>> https://github.com/openstack/nova/blob/30ba0c5eb19a9c9628957ac8e617ae78c0c1fa84/nova/network/neutronv2/api.py#L594-L595
>> [3]
>> https://github.com/openstack/nova/blob/30ba0c5eb19a9c9628957ac8e617ae78c0c1fa84/nova/network/neutronv2/api.py#L1107
>>
>> --
>>
>> Thanks,
>>
>> Matt Riedemann
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] do not account compute resource of instances in state SHELVED_OFFLOADED

2016-02-03 Thread John Garbutt
On 2 February 2016 at 14:11, Sascha Vogt <sascha.v...@gmail.com> wrote:
> Hi,
>
> Am 31.01.2016 um 18:57 schrieb John Garbutt:
>> We need to make sure we don't have configuration values that change
>> the semantic of our API.
>> Such things, at a minimum, need to be discoverable, but are best avoided.
> I totally agree on that. I
>
>>> I think an off-loaded / shelved resource should still count against the
>>> quota being used (instance, allocated floating IPs, disk space etc) just
>>> not the resources which are no longer consumed (CPU and RAM)
>>
>> OK, but that does mean unshelve can fail due to qutoa. Maybe thats OK?
> For me that would be ok, just like a boot could fail. Even now I think
> an unshelve can fail, because a new scheduling run is triggered and
> depending on various things you could get a "no valid host" (e.g. we
> have properties on Windows instances to only run them on a host with a
> datacenter license. If that host is full (we only have one at the
> moment), unshelve shouldn't work, should it?).
>
>> The quota really should live with the project that owns the resource.
>> i.e. nova has the "ephemeral" disk quota, but glance should have the
>> glance quota.
> Oh sure, I didn't mean to have that quota in Nova just to have them in
> general "somewhere". When I first started playing around with OpenStack,
> I was surprised that there are no quotas for images and ephemeral disks.
>
> What is the general feeling about this? Should I ask on "operators" if
> there is someone else who would like to have this added?

I think the best next step is to write up a nova-spec for newton:
http://docs.openstack.org/developer/nova/process.html#how-do-i-get-my-code-merged

But from a wider project view the quota system is very fragile, and is
proving hard to evolve. There are some suggested approaches to fix
that, but no one has had the time to take on that work. There is a bit
of a backlog of quota features right now.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Nova Midcycle Summary (i.e. mid mitaka progress report)

2016-02-02 Thread John Garbutt
Hi,

For all the details see this etherpad:
https://etherpad.openstack.org/p/mitaka-nova-midcycle

Here I am attempting a brief summary, picking out some highlights.
Feel free to reply and add your own details / corrections.

**Process**

Non-priority FFE deadline is this Friday (5th Feb).
Now open for Newton specs.
Please move any proposed Mitaka specs to Newton.

**Priorities**

Cells v2:
It is moving forward, see alaski's great summary:
http://lists.openstack.org/pipermail/openstack-dev/2016-January/084545.html
Mitaka aim is around the new create instance flow.
This will make the cell zero and the API database required.
Need to define a list of instance info that is "valid" in the API
before the instance has been built.

v2.1 API:
API docs updates moving forward, as is the removal of project-ids and
related work to support the scheduler. Discussed policy discovery for
newton, in relation to the live-resize blueprint, alaski to follow up
with keystone folks.

Live-Migrate:
Lots of code to review (see usual etherpad for priority order). Some
details around storage pools need agreeing, but the general approach
seems to have reached consensus. CI is making good progress, as it's
finding bugs. Folks signed up for manual testing.
Spoke about the need to look into the token expiry fix discussed at the summit.

Scheduler:
Discussed jay's blueprints. For mitaka we agreed to focus on:
http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/resource-classes.html,
http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/resource-providers.html,
and possibly https://review.openstack.org/253187. The latter is likely
to require a new Scheduler API endpoint within Nova.
Overall there seemed to be general agreement on the approach jaypipes
proposed, and happiness that it's almost all written down now in
spec changes.
Discussed the new scheduler plan in relation to IP scheduling for
neutron's routed networks with armax and carl_baldwin. Made a lot of
progress towards better understanding each others requirements
(https://review.openstack.org/#/c/263898/)

priv-sep:
We must have priv-sep in os-brick for mitaka to avoid more upgrade problems.
We will go back and do a better job after we fix the burning upgrade issue.

os-vif:
Work continues.
Decided it doesn't have to wait for priv-sep.
Agreed base os-vif lib to include ovs, ovs hybrid, and linux-bridge

**Testing**

* Got folks to help get a bleeding edge libvirt test working
* Agreement on the need to improve ironic driver testing
* Agreed on the intent to move forward with Feature Classification
* Reminder about the new CI related review guideline

**Cross Project**

Neutron:
We had armax and carl_baldwin in the room.
Discussed routed networks and the above scheduler impacts.
Spoke about API changes so we have less downtime during live-migrate
when using DVR (or similar tech).
The get-me-a-network Neutron API needs to be idempotent. Still need help
with the patch on the Nova side, jaypipes to find someone. Agreed
overall direction.

Cinder:
Joined Cinder meetup via hangout.
Got a heads up around the issues they are having with nested quotas.
A patch broke backwards compatibility with older cinders, so the
patch has been reverted.
Spoke about priv-sep and os-brick, agreed above plan for the brute
force conversion.
Agreed multi-attach should wait till Newton. We have merged the DB
fixes that we need to avoid data corruption. Spoke about using service
version reporting to stop the API allowing multi-attach until the
upgrade has completed. To make remove_export not race, spoke about the
need for every volume attachment having its own separate host
attachment, rather than trying to share connections. Still questions
around upgrade.

**Other**

Spoke about the need for policy discovery via the API, before we add
something like the live-resize blueprint.

Spoke about the architectural aim to not have computes communicate
with each other, and instead have the conductor send messages between
computes. This was in relation to tdurakov's proposal to refactor the
live-migrate workflow.

**Thank You**

Many thanks to Paul Murray and others at HP for hosting us during our
time in Bristol, UK.

Also many thanks to all who made the long trip to Bristol to help
discuss all these upcoming efforts, and start to build consensus
ahead of the Newton design summit in Austin.

Thanks for reading,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Non-priority Feature Freeze update (including deadlines)

2016-02-01 Thread John Garbutt
On 31 January 2016 at 15:53, John Garbutt <j...@johngarbutt.com> wrote:
> Hi,
>
> We have recently past the deadline for the non-priority Feature Freeze:
> http://docs.openstack.org/releases/schedules/mitaka.html#m-nova-npff
>
> We do this to make sure we prioritise review and developer bandwidth
> for Bug Fixes and our agreed release priorities:
> http://specs.openstack.org/openstack/nova-specs/priorities/mitaka-priorities.html
>
> Many blueprints that were not in a position to merge and/or looked to
> have had little review, have already been deferred, and a -2 applied.
>
> If you want a FFE, please justify why on this etherpad:
> https://etherpad.openstack.org/p/mitaka-nova-non-priority-ff-tracking
>
> Core reviews, please +2 patches you think deserve a FFE. Let me know
> if that means I need to remove a -2 I may have applied.
>
> There are some blueprints I have left in limbo, while I iterate again
> through the list of all approved blueprints until will get to a list
> that is a sensible size for this point in the release.
>
> Any questions, please do ask (on IRC, or whatever).

So I mentioned deadlines in the subject, and then failed to include them.

We want to have a hard stop on approving non-priority features this
Friday (5th February).

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Neutron] Scheduling with routed networks

2016-01-31 Thread John Garbutt
On 29 January 2016 at 16:09, Armando M. <arma...@gmail.com> wrote
> On 29 January 2016 at 12:16, Jay Pipes <jaypi...@gmail.com> wrote:
>> On 01/28/2016 09:15 PM, Carl Baldwin wrote:
>>>
>>> Hi Nova and Neutron,
>>> It was a pleasure to attend the Nova mid-cycle in Bristol this week.
>> Indeed, I thought the mid-cycle was super-productive.

+1

Many thanks to you both for making the journey to be with us.
That was super helpful towards getting a plan for Newton.

> Yup, I always thought that Nova folks had tails, horns and pitchforks...it
> turns out I was wrong!

:-P

>>> I think we made a lot of progress.  I spent a little time capturing
>>> the highlights of what we discussed about scheduling and routed
>>> networks in a new revision to the backlog spec [1] that I created a
>>> couple of weeks ago.
>>>
>>> I also captured my understanding of the discussion we had this
>>> afternoon as things were winding down.  I remember Jay Pipes, Andrew
>>> Laskey, Dan Smith, John Garbutt, and Armando Migliaccio actively
>>> participating in that discussion with me.  I would appreciate it if
>>> you could visit this spec and record any thoughts or conclusions that
>>> I might have missed or mis-understood.
>>
>>
>> Will do for sure. I'll also keep you updated on the progress of the
>> generic-resource-pools work which intersects with the routed networks
>> features.
>
> It's my intention of going over the spec whilst my memory is fresh...I need
> some _light reading_ for my journey back anyway :)

Just traveling back from FOSDEM, but that looks good.

The only major thing left is the niggle around how Neutron updates
claims and the reservation count, but that looks close. I added
comments on the review.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] do not account compute resource of instances in state SHELVED_OFFLOADED

2016-01-31 Thread John Garbutt
On 27 January 2016 at 16:59, Sascha Vogt  wrote:
> Hi Andrew,
>
> Am 27.01.2016 um 10:38 schrieb Andrew Laski:
>> 1. This allows for a poor experience where a user would not be able to
>> turn on and use an instance that they already have due to overquota.
>> This is a change from the current behavior where they just can't create
>> resources, now something they have is unusable.
> That is a valid point, though I think if it's configurable it is up to
> the operator to use that feature.

We need to make sure we don't have configuration values that change
the semantics of our API.
Such things, at a minimum, need to be discoverable, but are best avoided.

>
>> 2. I anticipate a further ask for a separate quota for the number of
>> offloaded resources being used to prevent just continually spinning up
>> and shelving instances with no limit.  Because while disk/ram/cpu
>> resources are not being consumed by an offloaded instance network and
>> volume resources remain consumed and storage is required is Glance for
>> the offloaded disk.  And adding this additional quota adds more
>> complexity to shelving which is already overly complex and not well
>> understood.
> I think an off-loaded / shelved resource should still count against the
> quota being used (instance, allocated floating IPs, disk space etc) just
> not the resources which are no longer consumed (CPU and RAM)

OK, but that does mean unshelve can fail due to quota. Maybe that's OK?

> In addition I think it would make sense to introduce a quota for Glance
> and ephemeral disk size and a shelved instance could (should) still
> count against those quotas.

The quota really should live with the project that owns the resource.
i.e. nova has the "ephemeral" disk quota, but glance should have the
glance quota.

Thanks,
John

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Non-priority Feature Freeze update (including deadlines)

2016-01-31 Thread John Garbutt
Hi,

We have recently passed the deadline for the non-priority Feature Freeze:
http://docs.openstack.org/releases/schedules/mitaka.html#m-nova-npff

We do this to make sure we prioritise review and developer bandwidth
for Bug Fixes and our agreed release priorities:
http://specs.openstack.org/openstack/nova-specs/priorities/mitaka-priorities.html

Many blueprints that were not in a position to merge and/or looked to
have had little review have already been deferred, and a -2 applied.

If you want a FFE, please justify why on this etherpad:
https://etherpad.openstack.org/p/mitaka-nova-non-priority-ff-tracking

Core reviewers, please +2 patches you think deserve a FFE. Let me know
if that means I need to remove a -2 I may have applied.

There are some blueprints I have left in limbo while I iterate again
through the list of all approved blueprints, until we get to a list
that is a sensible size for this point in the release.

Any questions, please do ask (on IRC, or whatever).

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][tc] Stabilization cycles: Elaborating on the idea to move it forward

2016-01-22 Thread John Garbutt
On 22 January 2016 at 02:38, Robert Collins  wrote:
> On 21 January 2016 at 07:38, Ian Cordasco  wrote:
>>
>> I think this is a solid proposal but I'm not sure what (if anything) the TC 
>> needs to do about this. This is something most non-corporate open source 
>> projects do (and even some corporate open source projects). It's the natural 
>> life-cycle of any software project (that we ship a bunch of things and then 
>> focus on stability). Granted, I haven't seen much of a focus on it in 
>> OpenStack but that's a different story.
>>
>> That said, I'd like to see a different release cadence for cycles that are 
>> "stabilization cycles". We, as a community, are not using minor version 
>> numbers. During a stabilization cycle, I would like to see master be 
>> released around the 3 milestones as X.1.0, X.2.0, X.3.0. If we work that 
>> way, then we'll be able to avoid having to backport a lot of work to the X.0 
>> series and while we could support X.0 series with specific backports, it 
>> would avoid stressing our already small stable teams. My release strategy, 
>> however, may cause more stress for downstream packages though. It'll cause 
>> them to have to decide what and when to package and to be far more aware of 
>> each project's current development cycle. I'm not sure that's positive.
>
> So the reason this was on my todo this cycle - and I'm so glad Flavio
> has picked it up (point 9 of
> https://rbtcollins.wordpress.com/2015/11/02/openstack-mitaka-debrief/)
> - was that during the Tokyo summit, in multiple sessions, folk were
> saying that they wanted space from features, to consolidate already
> added things, and to cleanup accrued debt, and that without TC
> support, they couldn't sell it back to their companies.
>
> Essentially, if the TC provides some leadership here: maybe as little as:
>  - its ok to do it [we think it will benefit our users]
>  - sets some basic expectations
>
> And then individual projects decide to do it (whether thats a PTL
> call, a vote, core consensus, whatever) - then developers have a
> platform to say to their organisation that the focus is X, don't
> expect features to land - and that they are *expected* to help with
> the cycle.
>
> Without some framework, we're leaving those developers out in the cold
> trying to explain what-and-why-and-how all by themselves.

+1 on the need to fix the "big issues" facing projects.
By "big issues" I mean problems that affect almost all users.
Stability and Technical Debt are just two "big issues".
For me, doing bug triage, code reviews, and fixing the gate are all on that list.

In Nova we are trying to dedicate time in every cycle to the big
issues facing the project. We do that by picking priorities and using
the non-priority feature freeze. There is a very rough write-up of
that process from liberty here:
http://docs.openstack.org/developer/nova/process.html#non-priority-feature-freeze

It has allowed us to get rolling upgrades working and evolve our API
to do microversioning. It has also led to a big backlog of other
code people want to push, lots of it already in production. There are
other issues here, and lots of ideas on the table, but let's not go
down that rabbit hole over email.

+1 for having a TC statement to set expectations.
That should help developers unable to get permission.
Hopefully developers will also get asked to work on these things.

+1 documenting common patterns for projects to adopt.

-1 for making projects adopt a "stability" cycle.
Feels better as a suggested pattern.
(Although it feels a little like an anti-pattern)

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][bugs] nova-bugs-team IRC meeting

2016-01-22 Thread John Garbutt
On 22 January 2016 at 10:08, Markus Zoeller  wrote:
> The dates and times are final now [1]. They differ from the previously
> dates in this thread! The first and next meetings will be:
>
> Tuesday   2016-02-09   18:00 UTC   #openstack-meeting-4
> Tuesday   2016-02-23   18:00 UTC   #openstack-meeting-4
> Tuesday   2016-03-01   10:00 UTC   #openstack-meeting-4
>
> I have to cancel the meeting of 2016-02-16 in advance, as I have a
> medical appointment a day before which will knock me out for the week.
>
> The dates make it possible to attend either the "nova-bugs-team"
> or the "nova-team" of the same week as they take turns in "early"
> and "late" time of day.
>
> The agenda can be found in the wiki [2].
>
> See you there!
>
> References:
> [1] http://eavesdrop.openstack.org/#Nova_Bugs_Team_Meeting
> [2] https://wiki.openstack.org/wiki/Meetings/Nova/BugsTeam
>
> Regard, Markus Zoeller (markus_z)

A big thank you for pushing on this.

As we head into Mitaka-3, post non-priority feature freeze, it's a
great time to push on reviewing bug fixes and fixing important bugs.

Thanks,
johnthetubaguy

>
> Markus Zoeller/Germany/IBM@IBMDE wrote on 01/20/2016 05:14:14 PM:
>
>> From: Markus Zoeller/Germany/IBM@IBMDE
>> To: "OpenStack Development Mailing List \(not for usage questions\)"
>> 
>> Date: 01/20/2016 05:27 PM
>> Subject: Re: [openstack-dev] [nova][bugs] nova-bugs-team IRC meeting
>>
>> Due to other meetings which merged since the announcement,
>> the IRC meeting patch [1] I pushed proposes now:
>> Tuesday 1000 UTC #openstack-meeting beweekly-odd
>> Tuesday 1700 UTC #openstack-meeting beweekly-even
>>
>> February the 9th at 1000 UTC will be the first kickoff meeting.
>> I'll have an agenda ready until then [2]. Feel free to ping me in IRC
>> or here on the ML when you have questions.
>>
>> References:
>> [1] https://review.openstack.org/#/c/270281/
>> [2] https://wiki.openstack.org/wiki/Meetings/Nova/BugsTeam
>>
>> Regards, Markus Zoeller (markus_z)
>>
>>
>> Markus Zoeller/Germany/IBM@IBMDE wrote on 01/13/2016 01:24:06 PM:
>> > From: Markus Zoeller/Germany/IBM@IBMDE
>> > To: "OpenStack Development Mailing List"
>> 
>> > Date: 01/13/2016 01:25 PM
>> > Subject: [openstack-dev] [nova][bugs] nova-bugs-team IRC meeting
>> >
>> > Hey folks,
>> >
>> > I'd like to revive the nova-bugs-team IRC meeting. As I want to chair
>> > those meetings in my "bug czar" role, the timeslots are bound to my
>> > timezone (UTC+1). The two bi-weekly alternating slots I have in mind
>> > are:
>> > * Tuesdays, 10:00 UTC biweekly-odd  (to get folks to the east of me)
>> > * Tuesdays, 16:00 UTC biweekly-even (to get folks to the west of me)
>> > By choosing these slots, the concluded action items of this meeting
> can
>> > be finished until the next nova meeting of the same week on Thursday.
>> > The "early" and "late" timeslot is diametrical to the slots of the
> nova
>> > meeting to allow you to attend one of those meetings for your
> timezone.
>> >
>> >Day   odd week   even week
>> > -    -  -
>> > nova meeting   Thursday  21:00 UTC  14:00 UTC
>> > nova bugs meeting  Tuesday   10:00 UTC  16:00 UTC
>> >
>> > Let me know if you think these slots are not reasonable. My goal is
>> > to have the first kick-off meeting at the 9th of February at 10:00
> UTC.
>> >
>> > The scope of the team meeting:
>> > * discuss and set the report-priority of bugs since the last meeting
>> >   if not yet done.
>> > * decide action items for bug reports which need further attention
>> > * expire bugs which are hit by the expiration-policy unless someone
>> >   disagrees.
>> > * get one or more volunteers for the rotating bug-skimming-duty.
>> >   Get feedback from the volunteers from the previous week if there
>> >   are noteworthy items.
>> > * check if new problem areas are emerging
>> > * discuss process adjustments/changes if necessary
>> > * spreading knowledge and discuss open points
>> >
>> > Review [1] contains my (WIP) proposal of the process I have in mind.
>> > This still needs concensus, but this is not the focus in this post.
>> > The first meeting(s) can be used as a kick-off to discuss items in
>> > that proposal.
>> >
>> > References:
>> > [1] Nova docs: "Bug Process": https://review.openstack.org/#/c/266453
>> >
>> > Regards, Markus Zoeller (markus_z)
>> >
>> >
>> >
>>
> __
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >
>>
>>
>>
>>
> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
> 

[openstack-dev] [nova] Heads up about Mitaka-2 and Non-priority Feature Freeze

2016-01-19 Thread John Garbutt
Hi,

So at the end of Thursday we hit the non-priority Feature Freeze
(that's roughly all the low-priority blueprints). I will forward more
details about the exception process once we know the scale of what is
required.

I have already postponed low-priority blueprints that had no code up
for review as of late last week, so we can give more review time (this
week) to those that already have code uploaded.

As usual we track the reviews to focus on here:
https://etherpad.openstack.org/p/mitaka-nova-priorities-tracking

Please note, the mitaka-2 release that happens this week is
technically independent of the non-priority feature freeze. They just
happen in the same week so there are fewer dates to remember.

Any questions, as usual, just let me know.

Thanks,
johnthetubaguy

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][stable] Proposal to add Tony Breeds to nova-stable-maint

2016-01-15 Thread John Garbutt
On 14 January 2016 at 10:45, Michael Still  wrote:
> I think Tony would be a valuable addition to the team.

+1

> +1

+1

John

> On 14 Jan 2016 7:59 AM, "Matt Riedemann"  wrote:
>>
>> I'm formally proposing that the nova-stable-maint team [1] adds Tony
>> Breeds to the core team.
>>
>> I don't have a way to track review status on stable branches, but there
>> are review numbers from gerrit for stable/liberty [2] and stable/kilo [3].
>>
>> I know that Tony does a lot of stable branch reviews and knows the
>> backport policy well, and he's also helped out numerous times over the last
>> year or so with fixing stable branch QA / CI issues (think gate wedge
>> failures in stable/juno over the last 6 months). So I think Tony would be a
>> great addition to the team.
>>
>> So for those on the team already, please reply with a +1 or -1 vote.
>>
>> [1] https://review.openstack.org/#/admin/groups/540,members
>> [2]
>> https://review.openstack.org/#/q/reviewer:%22Tony+Breeds%22+branch:stable/liberty+project:openstack/nova
>> [3]
>> https://review.openstack.org/#/q/reviewer:%22Tony+Breeds%22+branch:stable/kilo+project:openstack/nova
>>
>> --
>>
>> Thanks,
>>
>> Matt Riedemann
>>
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

