Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-22 Thread Sean Dague
On 05/22/2017 02:45 PM, James Penick wrote:

>  
> 
> I recognize that large Ironic users expressed their concerns about
> IPMI/BMC communication being unreliable and not wanting to have
> users manually retry a baremetal instance launch. But, on this
> particular point, I'm of the opinion that Nova just do one thing and
> do it well. Nova isn't an orchestrator, nor is it intending to be a
> "just continually try to get me to this eventual state" system like
> Kubernetes.
> 
> 
> Kubernetes is a larger orchestration platform that provides autoscale. I
> don't expect Nova to provide autoscale, but 
> 
> I agree that Nova should do one thing and do it really well, and in my
> mind that thing is reliable provisioning of compute resources.
> Kubernetes does autoscale among other things. I'm not asking for Nova to
> provide Autoscale, I -AM- asking OpenStack's compute platform to
> provision a discrete compute resource reliably. This means overcoming
> common and simple error cases. As a deployer of OpenStack I'm trying to
> build a cloud that wraps the chaos of infrastructure, and present a
> reliable facade. When my users issue a boot request, I want to see it
> fulfilled. I don't expect it to be a 100% guarantee across any possible
> failure, but I expect (and my users demand) that my "Infrastructure as a
> service" API make reasonable accommodation to overcome common failures. 

Right, I think this hits my major queasiness with throwing the baby out
with the bathwater here. I feel like Nova's job is to give me a compute
when I ask for a compute. Yes, like malloc, things could fail. But
honestly, if Nova can recover from that scenario, it should try to. The
baremetal and affinity cases are pretty good instances where Nova can
catch and recover, rather than just export that complexity up.

It would make me sad to just export that complexity to users, and
instead of handling those cases internally make every SDK, app, and
simple script build their own retry loop.
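
For illustration, the loop below is roughly what each of those consumers
would end up reimplementing. It is a sketch against a hypothetical client
object, not any particular SDK's API:

    import time

    def boot_with_retry(client, name, image, flavor, attempts=3, poll=5):
        # Hypothetical client methods; real SDKs differ in naming and shape.
        for _ in range(attempts):
            server = client.create_server(name=name, image=image, flavor=flavor)
            while server.status == 'BUILD':      # wait for a terminal state
                time.sleep(poll)
                server = client.get_server(server.id)
            if server.status == 'ACTIVE':
                return server
            client.delete_server(server.id)      # clean up the failed build
        raise RuntimeError('server failed to build after %d attempts' % attempts)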

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] RFC - Global Request Ids

2017-05-17 Thread Sean Dague
On 05/16/2017 12:01 PM, Sean Dague wrote:
> After the forum session on logging, we came up with what we think is an
> approach here for global request ids -
> https://review.openstack.org/#/c/464746/ - it would be great if
> interested operators would confirm this solves their concerns.
> 
> There is also an open question. A long standing concern was "trusting"
> the request-id, though I don't really know how that could be exploited
> for anything really bad, and this puts in place a system for using service
> users as a signal for trust.
> 
> But the whole system is a lot easier, and comes together quicker, if
> we don't have that. For especially public cloud users, are there any
> concerns that you have in letting users set Request-Id (assuming you'll
> also still have a 2nd request-id that's service local and acts like
> request-id today)?

FYI, right now CERN and GoDaddy have said that they don't need strong
trust validation on these ids (as long as they are validated to look
like a uuid, so there are no injection concerns). We've had no one
provide a rationale for the original fears around trusting them.
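
To make that assumption concrete, the check being described is roughly
the sketch below; the header handling and the req- prefix handling are
illustrative, not the spec's final wording:

    import uuid

    def sanitize_inbound_global_request_id(value):
        # Accept the inbound id only if it parses as a uuid, so it cannot
        # be used to inject arbitrary strings into logs.
        if value is None:
            return None
        candidate = value.strip()
        if candidate.lower().startswith('req-'):   # common "req-<uuid>" form
            candidate = candidate[4:]
        try:
            uuid.UUID(candidate)
        except ValueError:
            return None                            # discard anything non-uuid
        return value.strip()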

So unless I hear something in the next 24 hours we'll update the spec to
drop that part.

-Sean

-- 
Sean Dague
http://dague.net



[Openstack-operators] RFC - Global Request Ids

2017-05-16 Thread Sean Dague
After the forum session on logging, we came up with what we think is an
approach here for global request ids -
https://review.openstack.org/#/c/464746/ - it would be great if
interested operators would confirm this solves their concerns.

There is also an open question. A long standing concern was "trusting"
the request-id, though I don't really know how that could be exploited
for anything really bad, and this puts in place a system for using service
users as a signal for trust.

But the whole system is a lot easier, and comes together quicker, if
we don't have that. For especially public cloud users, are there any
concerns that you have in letting users set Request-Id (assuming you'll
also still have a 2nd request-id that's service local and acts like
request-id today)?

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] [openstack-dev] [nova][glance] Who needs multiple api_servers?

2017-04-28 Thread Sean Dague
On 04/28/2017 12:50 AM, Blair Bethwaite wrote:
> We at Nectar are in the same boat as Mike. Our use-case is a little
> bit more about geo-distributed operations though - our Cells are in
> different States around the country, so the local glance-apis are
> particularly important for caching popular images close to the
> nova-computes. We consider these glance-apis as part of the underlying
> cloud infra rather than user-facing, so I think we'd prefer not to see
> them in the service-catalog returned to users either... is there going
> to be a (standard) way to hide them?

In a situation like this, where Cells are geographically bounded, is
there also a Region for that Cell/Glance?

    -Sean

-- 
Sean Dague
http://dague.net



[Openstack-operators] [all] [quotas] Unified Limits Conceptual Spec RFC

2017-03-17 Thread Sean Dague
Background:

At the Atlanta PTG there was yet another attempt to get hierarchical
quotas more generally addressed in OpenStack. A proposal was put forward
that considered storing the limit information in Keystone
(https://review.openstack.org/#/c/363765/). While there were some
concerns on details that emerged out of that spec, the concept of the
move to Keystone was actually really well received in that room by a
wide range of parties, and it seemed to solve some interesting questions
around project hierarchy validation. We were perilously close to having
a path forward for a community request that's had a hard time making
progress over the last couple of years.

Let's keep that flame alive!


Here is the proposal for the Unified Limits in Keystone approach -
https://review.openstack.org/#/c/440815/. It is intentionally a high
level spec that largely lays out where the conceptual levels of control
will be. It intentionally does not talk about specific quota models
(there is a follow-on spec doing some of that, under the assumption
that the exact model(s) supported will take a while, and that the
keystone interfaces are probably not going to substantially change based
on the model).

We're shooting for a 2 week comment cycle here to then decide if we can
merge and move forward during this cycle or not. So please
comment/question now (either in the spec or here on the mailing list).

It is especially important that we get feedback from teams that have
limits implementations internally, as well as any that have started on
hierarchical limits/quotas (which I believe Cinder is the only one).

Thanks for your time, and I look forward to seeing comments on this.

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] RFC - hierarchical quota models

2017-03-08 Thread Sean Dague

On 03/08/2017 07:12 AM, Tim Bell wrote:



> On 7 Mar 2017, at 11:52, Sean Dague <s...@dague.net> wrote:
>
>> One of the things that came out of the PTG was perhaps a new path
>> forward on hierarchical limits that involves storing limits in
>> keystone and doing counting on the projects. Members of the developer
>> community are working through all that right now; that's not really
>> what this is about.
>>
>> As a related issue, it seemed that every time that we talk about this
>> space, people jump into describing how they think the counting /
>> enforcement would work. It became clear that people were overusing the
>> word "overbooking" to the point where it didn't have a lot of meaning.
>>
>> https://review.openstack.org/#/c/441203/ is a reference document where
>> I started writing out every model I thought I heard people talk about,
>> the rules that go with each, and started to walk through the kind of
>> algorithm needed to update limits, as well as check quota, for the
>> ones that seem like we might move forward with.
>>
>> It is full of blockdiag markup, which makes the rendered HTML the best
>> way to view it -
>> http://docs-draft.openstack.org/03/441203/11/check/gate-keystone-specs-docs-ubuntu-xenial/c3fc2b3//doc/build/html/specs/keystone/backlog/hierarchical-quota-scenarios.html
>>
>> There are specific questions for the operator community here:
>>
>> Are there other models you believe are not represented that should be
>> considered? If so, what are their rules, so I can add them to the
>> document?
>
> Thanks. In the interest of completeness, I'll add one more scenario to
> the mix, but I would not look for this as part of the functionality of
> the 1st release.
>
> One item we have encountered in the past is how to reduce quota for
> projects. If a child project quota is to be reduced but it is running
> the maximum number of VMs, the parent project administrator has to wait
> for the child to do the deletion before they can reduce the quota.
> Being able to do this would mean that new resource creation would be
> blocked but that existing resources would continue to run (until the
> child project admin gets round to choosing the priorities for deletion
> out of the many VMs he has running).
>
> However, this does bring in significant additional complexity, so
> unless there is an easy way of modelling it, I'd suggest this for
> nested quotas v2 at the earliest.


Actually, if we unify limit definitions in keystone, that's going to
be the default behavior. A change in limits *will not* validate against
existing usage. That will get called out more specifically in the next
version of the unified limits spec.
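
Concretely, the enforcement sketch below (an illustration, not the
spec's algorithm) shows why that falls out naturally: limits are read at
request time and only new allocations are checked against them, so a
limit reduced below current usage blocks new creates while leaving
running resources alone.

    def can_allocate(current_usage, requested, limit):
        # Only new allocations are validated against the limit.
        return current_usage + requested <= limit

    # Parent admin lowers the child's limit from 20 to 10 while 15 VMs run:
    assert can_allocate(current_usage=15, requested=1, limit=10) is False
    # Existing VMs keep running; once usage drops under the limit, creates resume:
    assert can_allocate(current_usage=9, requested=1, limit=10) is True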


    -Sean

--
Sean Dague
http://dague.net



[Openstack-operators] RFC - hierarchical quota models

2017-03-07 Thread Sean Dague
One of the things that came out of the PTG was perhaps a new path
forward on hierarchical limits that involves storing limits in keystone
and doing counting on the projects. Members of the developer community
are working through all that right now; that's not really what this is
about.

As a related issue, it seemed that every time that we talk about this
space, people jump into describing how they think the counting /
enforcement would work. It became clear that people were overusing the
word "overbooking" to the point where it didn't have a lot of meaning.

https://review.openstack.org/#/c/441203/ is a reference document where I
started writing out every model I thought I heard people talk about, the
rules that go with each, and started to walk through the kind of
algorithm needed to update limits, as well as check quota, for the ones
that seem like we might move forward with.

It is full of blockdiag markup, which makes the rendered HTML the best
way to view it -
http://docs-draft.openstack.org/03/441203/11/check/gate-keystone-specs-docs-ubuntu-xenial/c3fc2b3//doc/build/html/specs/keystone/backlog/hierarchical-quota-scenarios.html


There are specific questions for the operator community here:


Are there other models you believe are not represented that should be
considered? If so, what are their rules, so I can add them to the
document?


Would love to try to model everything under consideration here. It seems
like the conversations go around in circles a bit because everyone is
trying to keep everything in working memory, and paging out parts.
Diagrams hopefully ensure we all are talking about the same things.

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] What would you like in Pike?

2017-01-17 Thread Sean Dague
On 01/16/2017 04:02 PM, Jonathan Bryce wrote:
> 
>> On Jan 16, 2017, at 11:03 AM, Matt Riedemann
>> <mrie...@linux.vnet.ibm.com <mailto:mrie...@linux.vnet.ibm.com>> wrote:
>>
>> On 1/12/2017 7:30 PM, Melvin Hillsman wrote:
>>> Hey everyone,
>>>
>>> I am hoping to get a dialogue started to gain some insight around things
>>> Operators, Application Developers, and End Users would like to see
>>> happen in Pike. If you had a dedicated environment, dedicated team, and
>>> freedom to choose how you deployed, new features, older features,
>>> enhancements, etc, and were not required to deal with customer/client
>>> tickets, calls, and maintenances, could keep a good feedback loop
>>> between your team and the upstream community of any project, what would
>>> you like to make happen or work on, hoping the next release of OpenStack
>>> had/included/changed/enhanced/removed…?
>>>
>>
>> I just wanted to say thanks for starting this thread. I often wonder
>> what people would like to see the Nova team prioritize because we
>> don't get input from the product work group or Foundation really on
>> big ticket items so we're generally left to prioritizing work on our
>> own each release.
> 
> I agree; thanks Melvin! Great input so far.
> 
> Matt, on the input on big ticket items, I’d love to get your feedback on
> what is missing or you’d like to see more of in the Foundation reports
> or Product Working Group roadmaps to make them more useful for these
> kinds of items. Is this thread more consumable because specific
> functionality is identified over themes? Is it the way it’s scoped to a
> release? We could possibly even add in a similar question (“What would
> you like to see in the next release?”) to the user survey if this is
> info you’ve been looking for.

One of the challenges thus far with PWG input is that it has often not
been very grounded in reality. It often misses key details about what's
hard, possible, or easy when it comes to the code and communities in
question. The operator list feedback typically comes from teams that are
much more steeped in the day-to-day of running OpenStack, have sifted
through chunks of the code, and have chatted more with community
members. It very often starts from a place of much more common ground
and understanding.

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] How are people dealing with API rate limiting?

2016-06-14 Thread Sean Dague
On 06/14/2016 11:02 AM, Matt Riedemann wrote:
> A question came up in the nova IRC channel this morning about the
> api_rate_limit config option in nova which was only for the v2 API.
> 
> Sean Dague explained that it never really worked because it was per API
> server so if you had more than one API server it was busted. There is no
> in-tree replacement in nova.
> 
> So the open question here is, what are people doing as an alternative?

Just as a clarification. The toy implementation that used to live in
Nova was even worse than described above. The counters were kept per
process. So if you had > 1 API worker (the default is one worker per
CPU) it would be inconsistent.

This is why it was defaulted to false in Havana, and never carried
forward to the new API backend.
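
For what it's worth, the usual alternative is to keep the counters
outside the API processes entirely, in a front-end proxy or a shared
store, so every worker on every host sees the same count. A minimal
fixed-window sketch, assuming the redis-py client and a reachable Redis
instance (not anything Nova ships), looks something like:

    import time

    import redis

    r = redis.Redis(host='localhost', port=6379)

    def over_limit(user_id, limit=100, window=60):
        # One counter per user per time window, shared by all API workers.
        key = 'ratelimit:%s:%d' % (user_id, int(time.time() // window))
        count = r.incr(key)            # atomic increment across processes
        if count == 1:
            r.expire(key, window)      # first hit in the window sets the TTL
        return count > limit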

    -Sean

-- 
Sean Dague
http://dague.net



[Openstack-operators] Nova API extensions removal plan

2016-06-14 Thread Sean Dague
Nova is getting towards the final phases of the long-term arc to really
standardize the API, which includes removing the API extensions
facility. This is a long arc that started in Atlanta and has been talked
about in a lot of channels, but some interactions this past week made us
realize that some folks might not have realized it is happening.

So we've now got an overarching spec about how and why we're removing
the API extensions facility from Nova, and alternatives that exist for
folks -
https://specs.openstack.org/openstack/nova-specs/specs/newton/approved/api-no-more-extensions.html

This is informative for folks; please take a look if you think this will
impact you.

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] Nova 2.1 and user permissions in the policy file

2016-05-27 Thread Sean Dague
On 05/27/2016 04:23 AM, Dario Vianello wrote:
> 
>> On 25 May 2016, at 17:31, Tim Bell <tim.b...@cern.ch
>> <mailto:tim.b...@cern.ch>> wrote:
>>
>>
>> On 25/05/16 17:36, "Sean Dague" <s...@dague.net
>> <mailto:s...@dague.net>> wrote:
>>
>>> On 05/23/2016 10:24 AM, Tim Bell wrote:
>>>>
>>>>
>>>> Quick warning for those who are dependent on the "user_id:%(user_id)s"
>>>> syntax for limiting actions by user. According to 
>>>> https://bugs.launchpad.net/nova/+bug/1539351, this behavior was
>>>> apparently not intended according to the bug report feedback. The
>>>> behavior has changed from v2 to v2.1 and the old syntax no longer works.
>>>>
>>>>
>>>>
>>>> There can be security implications also so I’d recommend those using
>>>> this current v2 feature to review the bug to understand the potential
>>>> impacts as clouds enable v2.1.
>>>
>>> The Nova team is currently lacking information about the minimum set
>>> of policy points that need to support user_id, because supporting
>>> user_id everywhere is definitely not going to be an option.
>>>
>>> We really need very detailed lists of which actions are required, and
>>> why. And for all server actions why "lock" action is not sufficient. And
>>> we need all of that by N1, which is in a week. With that we can evaluate
>>> what can be added to the API stack. Especially because this all needs
>>> tests so it doesn't regress. So if we can keep it at a small number of
>>> operations, it is way more likely to happen. If this grows to
>>> "everything", it definitely won't.
>>>
>>> It would honestly be great if people affected by this could also
>>> prioritize top to bottom what operations are most important. Detailed
>>> use case and priority is really needed to figure out what can be done.
>>>
>>
>> Thanks for looking into this. The current set of activities that our
>> developers want to do for their VMs (and do not want others doing to
>> their instances ☺) are
>>
>> - power off/power on/restart
>> - VNC console (since this also allows the above with appropriate SysRq)
>> - delete VM
>>
>> I think in the longer term, we can work together to find a way to
>> do this with nested projects and some kind of automatic project
>> creation but without nested quotas and image sharing in the hierarchy
>> being priorities, these are not yet at functional parity compared to
>> the current Nova V2 implementation.
>>
>> Tim
> 
> Here at EMBL-EBI we are touched by this possible change in several ways:
> - to properly federate our OpenStack to the EGI FedCloud, which requires
> this feature. More on this coming, I hope somebody from the EGI will
> post here soon.
> - to support the same activities mentioned by Tim (power off/on/restart,
> console, VM deletion) in our about-to-come dev platform
> - to effectively support the long tail of science scenario, where a
> single tenancy is shared by different independent researchers to do
> their computation.
> 
> I do agree that all this can be tackled by leveraging nested
> projects, but especially for the FedCloud we are committed to delivering
> soon. Stripping this feature out of Nova 2.1 would thus be an issue for us.

Ok, but this is not the level of detail that we need to be actionable.
We realistically need a list of every policy rule you have changed,
ordered by how much each one matters.

What is listed above is way too high level to understand any of that.
From Tim we got a small list of actions; we can probably work with that.

And to be clear, this feature already doesn't exist in master, as it went
away in the default implementation 2 releases ago. We're talking about
new feature development here, which is real work. And this feature will
be marked as deprecated if and when it goes in, so other alternatives are
going to need to be considered here.

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] Nova 2.1 and user permissions in the policy file

2016-05-25 Thread Sean Dague
On 05/23/2016 10:24 AM, Tim Bell wrote:
>  
> 
> Quick warning for those who are dependent on the "user_id:%(user_id)s"
> syntax for limiting actions by user. According to 
> https://bugs.launchpad.net/nova/+bug/1539351, this behavior was
> apparently not intended according to the bug report feedback. The
> behavior has changed from v2 to v2.1 and the old syntax no longer works.
> 
>  
> 
> There can be security implications also so I’d recommend those using
> this current v2 feature to review the bug to understand the potential
> impacts as clouds enable v2.1.

The Nova team is currently lacking information about the minimum set
of policy points that need to support user_id, because supporting
user_id everywhere is definitely not going to be an option.

We really need very detailed lists of which actions are required, and
why. And for all server actions why "lock" action is not sufficient. And
we need all of that by N1, which is in a week. With that we can evaluate
what can be added to the API stack. Especially because this all needs
tests so it doesn't regress. So if we can keep it at a small number of
operations, it is way more likely to happen. If this grows to
"everything", it definitely won't.

It would honestly be great if people affected by this could also
prioritize top to bottom what operations are most important. Detailed
use case and priority is really needed to figure out what can be done.

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] Nova 2.1 and user permissions in the policy file

2016-05-24 Thread Sean Dague
On 05/23/2016 11:56 AM, Tim Bell wrote:
> On 23/05/16 17:02, "Sean Dague" <s...@dague.net> wrote:
> 
>> On 05/23/2016 10:24 AM, Tim Bell wrote:
>>>  
>>>
>>> Quick warning for those who are dependent on the "user_id:%(user_id)s"
>>> syntax for limiting actions by user. According to 
>>> https://bugs.launchpad.net/nova/+bug/1539351, this behavior was
>>> apparently not intended according to the bug report feedback. The
>>> behavior has changed from v2 to v2.1 and the old syntax no longer works.
>>
>> Well, the behavior changes with the backend code base. By mitaka the
>> default backend code for both is the same. And the legacy code base is
>> about to be removed.
>>
>> This feature (policy enforcement by user_id) was 100% untested, which is
>> why it never ended up in the new API stack. Being untested setting
>> owner: 'user_id: %(user_id)s' might have some really unexpected results
>> because not everything has a user_id.
>>
> 
> There are several hints given in the documentation regarding this sort of 
> feature. 
> 
> Examples are such as http://docs.openstack.org/developer/oslo.policy/api.html 
> and 
> http://docs.openstack.org/mitaka/config-reference/policy-json-file.html#examples

Ok, a follow-on question.

Is the concern that within a large tenant you do not want user A to
accidentally reboot user B's server? Would the "lock" construct be
sufficient here for users that have servers in critical states?

Are all the failure domains these kinds of failures? Or what is the
detailed list of bad interactions that you are concerned about?

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] Nova 2.1 and user permissions in the policy file

2016-05-24 Thread Sean Dague
On 05/24/2016 02:22 AM, Jerome Pansanel wrote:
> Hi,
> 
> Le 23/05/2016 18:23, Sean Dague a écrit :
>> On 05/23/2016 11:56 AM, Tim Bell wrote:
>>> On 23/05/16 17:02, "Sean Dague" <s...@dague.net> wrote:
>>>
>>>> On 05/23/2016 10:24 AM, Tim Bell wrote:
>>>>>  
>>>>>
> [...]
>>>>> There can be security implications also so I’d recommend those using
>>>>> this current v2 feature to review the bug to understand the potential
>>>>> impacts as clouds enable v2.1.
>>>>
>>>> While I understand from the bug report what your use case is now, I'm
>>>> kind of wondering what the shared resources / actions of these 150
>>>> people are in this project. Are they all in the same project for other
>>>> reasons?
>>>
>>> The resource pool (i.e. quota) is shared between all of the developers.
>>> A smaller team is responsible for maintaining the image set for the project
>>> and also providing 2nd line support (such as reboot/problem diagnosis…).
>>
>> Ok, so Bob can take up all the instances and go on vacation, and it's a
>> 2nd line support call to handle shutting them down? It definitely
>> creates some weird situations where you can all pull from the pool, and
>> once pulled only you can give back.
>>
>> What's the current policy patch look like? (i.e. which operations are
>> you changing to user_id).
>>
>>> I do not know the EMBL-EBI use case or the EGI Federated Cloud scenarios
>>> which are also mentioned in the review.
> 
> The EGI Federated Cloud scenario is almost the same. We have tenants
> for several projects and a "catch-all" tenant for small projects (1 or 2
> people per project). Therefore, it is important to be sure that a user
> from one project does not interact with VMs from another one.

Ok, but the catch-all project is just to have fewer "projects" allocated
in keystone, right? Are these users using shared resources? I get that
there is a convenience in no one having to allocate projects... this just
falls back to AD and people get in dynamically, but it seems like we
could solve project-on-demand differently.

I think part of the challenge here is that, fundamentally, the project is
intended to be the unit of sharing. Because policy is so open ended,
people did a bunch of things which effectively made nested projects.
These work in some cases, but might have some really odd edge conditions
based on what actions are enforced where. It also really breaks the
construct of

GET /servers

being the list of servers you can do things to. Which... is a pretty
fundamental contract point in Nova. And confusing that point across
clouds makes it really hard for people to build software against the API
that your users can just use.

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] Nova 2.1 and user permissions in the policy file

2016-05-23 Thread Sean Dague
On 05/23/2016 11:56 AM, Tim Bell wrote:
> On 23/05/16 17:02, "Sean Dague" <s...@dague.net> wrote:
> 
>> On 05/23/2016 10:24 AM, Tim Bell wrote:
>>>  
>>>
>>> Quick warning for those who are dependent on the "user_id:%(user_id)s"
>>> syntax for limiting actions by user. According to 
>>> https://bugs.launchpad.net/nova/+bug/1539351, this behavior was
>>> apparently not intended according to the bug report feedback. The
>>> behavior has changed from v2 to v2.1 and the old syntax no longer works.
>>
>> Well, the behavior changes with the backend code base. By mitaka the
>> default backend code for both is the same. And the legacy code base is
>> about to be removed.
>>
>> This feature (policy enforcement by user_id) was 100% untested, which is
>> why it never ended up in the new API stack. Being untested setting
>> owner: 'user_id: %(user_id)s' might have some really unexpected results
>> because not everything has a user_id.
>>
> 
> There are several hints given in the documentation regarding this sort of 
> feature. 
> 
> Examples are such as http://docs.openstack.org/developer/oslo.policy/api.html 
> and 
> http://docs.openstack.org/mitaka/config-reference/policy-json-file.html#examples

Ok, so those are good points of documentation to bring back in line with
whatever reality we feel is the one to land on. Keeping user_id support
in Nova policy is going to require someone to write a lot of tests to
verify it, because it has never been in the current stack, again because
it was never really implemented, because there were never any tests for
this scenario anywhere.

>>> There can be security implications also so I’d recommend those using
>>> this current v2 feature to review the bug to understand the potential
>>> impacts as clouds enable v2.1.
>>
>> While I understand from the bug report what your use case is now, I'm
>> kind of wondering what the shared resources / actions of these 150
>> people are in this project. Are they all in the same project for other
>> reasons?
> 
> The resource pool (i.e. quota) is shared between all of the developers.
> A smaller team is responsible for maintaining the image set for the project
> and also providing 2nd line support (such as reboot/problem diagnosis…).

Ok, so Bob can take up all the instances and go on vacation, and it's a
2nd line support call to handle shutting them down? It definitely
creates some weird situations where you can all pull from the pool, and
once pulled only you can give back.

What's the current policy patch look like? (i.e. which operations are
you changing to user_id).

> I do not know the EMBL-EBI use case or the EGI Federated Cloud scenarios
> which are also mentioned in the review.

Those would be good. I honestly think we need someone to start capturing
these in a spec, because a huge part of the disconnect here was that this
was a backdoor feature that no one on the development side really
understood existed, that was never tested, and that no one thought was
the way things were supposed to be working. And if we are bringing it
back we really need to capture the use cases a lot more clearly so that
in 5 years we don't do the same thing again.

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] Nova 2.1 and user permissions in the policy file

2016-05-23 Thread Sean Dague
On 05/23/2016 10:24 AM, Tim Bell wrote:
>  
> 
> Quick warning for those who are dependent on the "user_id:%(user_id)s"
> syntax for limiting actions by user. According to 
> https://bugs.launchpad.net/nova/+bug/1539351, this behavior was
> apparently not intended according to the bug report feedback. The
> behavior has changed from v2 to v2.1 and the old syntax no longer works.

Well, the behavior changes with the backend code base. By Mitaka the
default backend code for both is the same. And the legacy code base is
about to be removed.

This feature (policy enforcement by user_id) was 100% untested, which is
why it never ended up in the new API stack. Being untested, setting
owner: 'user_id: %(user_id)s' might have some really unexpected results,
because not everything has a user_id.
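
To make that failure mode concrete, here is a much-simplified
illustration of how a "user_id:%(user_id)s" check resolves. This is not
oslo.policy itself, just a sketch of why a target that carries no
user_id behaves surprisingly under such a rule:

    # Simplified sketch, not oslo.policy: the right-hand side of the rule
    # is substituted from the target, so a target with no user_id has
    # nothing to compare the caller's credentials against.
    def check_user_id_rule(creds, target):
        expected = target.get('user_id')
        if expected is None:
            return False               # nothing to match; request is refused
        return creds.get('user_id') == expected

    creds = {'user_id': 'alice', 'project_id': 'demo'}
    print(check_user_id_rule(creds, {'user_id': 'alice'}))    # True
    print(check_user_id_rule(creds, {'project_id': 'demo'}))  # False: no user_id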

> There can be security implications also so I’d recommend those using
> this current v2 feature to review the bug to understand the potential
> impacts as clouds enable v2.1.

While I understand from the bug report what your use case is now, I'm
kind of wondering what the shared resources / actions of these 150
people are in this project. Are they all in the same project for other
reasons?

-Sean

-- 
Sean Dague
http://dague.net



[Openstack-operators] disabling deprecated APIs by config?

2016-05-18 Thread Sean Dague
nova-net is now deprecated - https://review.openstack.org/#/c/310539/

And we're in the process in Nova of doing some spring cleaning and
deprecating the proxies to other services -
https://review.openstack.org/#/c/312209/

At some point in the future after deprecation the proxy code is going to
stop working. Either accidentally, because we're not going to test or
fix this forever (and we aren't going to track upstream API changes to
the proxy targets), or intentionally when we decide to delete it to make
it easier to address core features and bugs that everyone wants addressed.

However, the world moves forward slowly. Consider the following scenario.

We delete nova-net & the network proxy entirely in Peru (a not entirely
unrealistic idea). At that release there are a bunch of people just
getting around to Newton. Their deployments allow all these things to
happen which are going to 100% break when they upgrade, and people are
writing more and more OpenStack software every cycle.

How do we signal to users this kind of deprecation? Can we give sites
tools to help prevent new software being written to deprecated (and
scheduled for deletion) APIs?

One idea was a "big red switch" in the form of a config option
``disable_deprecated_apis=True`` (defaulting to False), which would set
all deprecated APIs to 404 routes.
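
Mechanically, that switch would amount to a filter in front of the
routing layer. The sketch below is only an illustration; the option
name, the middleware, and the path prefixes are placeholders, not a
committed interface:

    # Rough sketch of the "big red switch": WSGI middleware that answers 404
    # for deprecated routes when the (hypothetical) option is turned on.
    DEPRECATED_PREFIXES = ('/os-networks', '/images', '/os-volumes')

    class DisableDeprecatedAPIs(object):
        def __init__(self, app, disable_deprecated_apis=False):
            self.app = app
            self.disabled = disable_deprecated_apis

        def __call__(self, environ, start_response):
            path = environ.get('PATH_INFO', '')
            if self.disabled and path.startswith(DEPRECATED_PREFIXES):
                start_response('404 Not Found',
                               [('Content-Type', 'text/plain')])
                return [b'This API is deprecated and disabled here.\n']
            return self.app(environ, start_response)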

One of the nice ideas here is that this would allow some API servers to
have this set, and others not. So users could point at the "clean" API
server and figure out whether they will break, while the default API
server would still support these deprecated APIs. Or, conversely, the
default could be the clean API server, and a legacy API server endpoint
that still included these deprecated things could be provided, for now,
for projects that really need it. Either way it would allow some
site-assisted transition, something like the -Werror flag in gcc.

In the Nova case the kinds of things ending up in this bucket are going
to be interfaces that people *really* shouldn't be using any more. Many
of them date back to when OpenStack was only 2 projects, and the concept
of splitting out function wasn't really thought about (note: we're
getting ahead of this one for the 'placement' REST API, so it won't have
any of these issues). At some point this house cleaning was going to
have to happen, and now seems to be the time to get it rolling.

Feedback on this idea would be welcomed. We're going to deprecate the
proxy APIs regardless; however, disable_deprecated_apis is its own idea
with its own consequences, and we really want feedback before pushing
forward on it.

    -Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] nova snapshots should dump all RAM to hypervisor disk ?

2016-04-25 Thread Sean Dague
On 04/24/2016 10:15 AM, Matt Riedemann wrote:
>>
> 
> To clarify, live snapshots aren't disabled by default because they don't
> work, it's because at least with libvirt 1.2.2 and QEMU 2.0 (which is
> what we test against in the gate), we'd hit a lot of failures (about 25%
> failure rate with the devstack/tempest jobs) when running live snapshot,
> so we suspect there are concurrency issues when running live snapshot on
> a compute along with other operations at the same time (the CI jobs run
> 4 tests concurrently on a single-node devstack).
> 
> This might not be an issue on newer libvirt/QEMU, we'll see when we
> start testing with Ubuntu 16.04 nodes.

Correct. I tried to add a clarifying comment to the config option here
(as I realized it wasn't described all that well in the docs) -
https://review.openstack.org/#/c/309629/

-Sean

-- 
Sean Dague
http://dague.net





Re: [Openstack-operators] Anyone else use vendordata_driver in nova.conf?

2016-04-18 Thread Sean Dague
On 04/18/2016 10:13 AM, Ned Rhudy (BLOOMBERG/ 731 LEX) wrote:
> Requiring users to remember to pass specific userdata through to their
> instance at every launch in order to replace functionality that
> currently works invisible to them would be a step backwards. It's an
> alternative, yes, but it's an alternative that adds burden to our users
> and is not one we would pursue.
> 
> What is the rationale for desiring to remove this functionality?

The Nova team would like to remove every config option that specifies an
arbitrary out-of-tree class file at a function point. This has been the
sentiment for a while, and we did a wave of deprecations at the end of
Mitaka to signal this more broadly, because with an arbitrary class
loader it is completely impossible to even understand who might be using
it and how.

These interfaces are not considered stable or contractual, so exposing
them via a raw class loader is something that we want to stop doing, as
we're going to horribly break people at some point. It's fine if there
are multiple implementations for these things; however, those should all
be upstream, and selected by a symbolic-name CONF option.
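
To illustrate the difference (the names below are made up for the
example, not Nova's actual loader), the two styles look roughly like
this:

    import importlib

    # Arbitrary class loader: operators can point at any out-of-tree class,
    # so the project can't know what interface contracts are being relied on.
    def load_arbitrary(dotted_path):
        module_name, class_name = dotted_path.rsplit('.', 1)
        return getattr(importlib.import_module(module_name), class_name)

    # Symbolic-name option: only named, in-tree implementations are
    # selectable. The registry below is hypothetical, purely for illustration.
    VENDORDATA_BACKENDS = {
        'static_json': 'nova_sketch.vendordata.StaticJSON',
        'dynamic_http': 'nova_sketch.vendordata.DynamicHTTP',
    }

    def load_by_name(name):
        try:
            return load_arbitrary(VENDORDATA_BACKENDS[name])
        except KeyError:
            raise ValueError('unknown vendordata backend: %s' % name)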

One of the alternatives is to propose your solution upstream.

    -Sean

-- 
Sean Dague
http://dague.net



[Openstack-operators] question on project_id formats

2016-01-11 Thread Sean Dague
It has come up, in looking at ways to remove project_ids from API urls,
that the format of a project_id is actually quite poorly specified in
OpenStack. This makes dual stacking (supporting urls with and without
the project_id at the same time) a little challenging, because the
matching to figure out which controller is going to get called goes a
little nuts.

Anyone using upstream Keystone will get a uuid (no dashes) unless they
work really hard at changing it.

RAX currently doesn't use upstream Keystone; they use ints as their
project ids.

What I'm curious about is whether anyone else is doing something
different in specifying project_ids in their cloud, as this is going to
impact the solution going forward.

We're currently pondering a default enforcement of uuid form if you are
using the project id in the url, with a config override to allow a
different scheme (as we know there is at least one cloud in the wild
that is different).
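
As an illustration of what that could look like (the pattern and the
override handling are placeholders, not a committed design):

    import re

    # Keystone-style project id: 32 hex characters, no dashes.
    DEFAULT_PROJECT_ID_REGEX = r'[0-9a-f]{32}'

    def compile_servers_route(project_id_regex=DEFAULT_PROJECT_ID_REGEX):
        return re.compile(r'^/v2\.1/(?P<project_id>%s)/servers'
                          % project_id_regex)

    route = compile_servers_route()
    print(bool(route.match('/v2.1/2f1b4b47f1f44e4db291ad28c7b8e2f5/servers')))  # True
    print(bool(route.match('/v2.1/123456/servers')))                            # False

    # A cloud with integer project ids would override the pattern:
    legacy = compile_servers_route(r'\d+')
    print(bool(legacy.match('/v2.1/123456/servers')))                           # True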

Comments welcomed.

-Sean

-- 
Sean Dague
http://dague.net



[Openstack-operators] Service Catalog TNG urls

2015-12-03 Thread Sean Dague
For folks that don't know, we've got an effort under way to look at some
of what's happened with the service catalog, how it's organically grown,
and do some pruning and tuning to make sure it's going to support what
we want to do with OpenStack for the next 5 years (wiki page to dive
deeper here - https://wiki.openstack.org/wiki/ServiceCatalogTNG).

One of the early Open Questions is about urls. Today there is a
completely free form field to specify urls, and there are conventions
about having publicURL, internalURL, adminURL. These are, however, only
conventions.

The only project that's ever really used adminURL has been Keystone, so
that's something we feel we can phase out in new representations.

The real question / concern is around public vs. internal, and it is
something we'd love feedback from people on.

When this was brought up in Tokyo the answer we got was that internal
URL was important because:

* users trusted it to mean "I won't get charged for bandwidth"
* it is often http instead of https, which provides a 20% performance
gain for transferring large amounts of data (e.g. glance images)

The question is, how hard would it be for sites to be configured so that
internal routing is used whenever possible? Or is this a concept we need
to formalize, so that user applications always have to make the decision
about which interface they should access?
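
For reference, the decision a client application has to make today when
both interfaces are published looks roughly like this (the catalog
structure is schematic, not the exact shape of a keystone token):

    CATALOG = [
        {'type': 'image',
         'endpoints': [
             {'interface': 'public',   'url': 'https://glance.example.com:9292'},
             {'interface': 'internal', 'url': 'http://glance.internal:9292'},
         ]},
    ]

    def get_endpoint(catalog, service_type, interface='public'):
        # Pick the endpoint for a service by interface name.
        for service in catalog:
            if service['type'] != service_type:
                continue
            for ep in service['endpoints']:
                if ep['interface'] == interface:
                    return ep['url']
        raise LookupError('no %s endpoint for %s' % (interface, service_type))

    # A client that wants the cheaper/faster path has to ask for it explicitly:
    print(get_endpoint(CATALOG, 'image', interface='internal'))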

-Sean

-- 
Sean Dague
http://dague.net



[Openstack-operators] [gate] devstack / grenade branching day - minor snafu

2015-09-30 Thread Sean Dague
FYI, we've got a bit of a gate snafu this morning in cutting the
devstack / grenade branches for stable/liberty, as devstack-gate did not
yet have the support for stable/liberty in it.

That patch is up at - https://review.openstack.org/#/c/229363/

With any luck this all gets resolved in the next couple of hours. But
things are going to be a bit rocky until that bit lands.

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] [nova] Backlog Specs: a way to send requirements to the developer community

2015-05-15 Thread Sean Dague
On 05/15/2015 07:20 AM, Boris Pavlovic wrote:
> John,
>
>
>> So, you can have all details in the spec, or you can have only the
>> problem statement complete. It's up to you as a submitter how much
>> detail you want to provide. I would recommend adding rough ideas into
>> the alternatives section, and leaving everything else blank except the
>> problem statement.
>
>
>> We are trying to use a single process for parked developer ideas and
>> operator ideas, so they are on an equal footing.
>> The reason it's not just a bug or similar is so we are able to
>> review the idea with the submitter, to ensure we have a good enough
>> problem description, so a developer can pick up and run with the idea,
>> and turn it into a more complete spec targeted at a specific release.
>> In addition, we are ensuring that the problem description is in scope.
>
>
> Feature requests are the same process as specs. And I fully agree
> that bug reports are not a good idea for this.
>
> The only difference is the template. I believe you won't find end users
> who would like to spend time reading all these steps:
> http://specs.openstack.org/openstack/nova-specs/specs/kilo/template.html
> and providing the required info.
>
> As an end user I would like to spend a maximum of 5 minutes providing
> info about:
> - use case
> - problem description
> - possible solution [optional]
>
>
> What about making the nova template simpler?
> And actually doing this across all projects?

So... maximum 5 minutes is where I think that idea goes horribly off
the rails, because it sets up the *wrong* expectations of how we make
progress as a community, and is only going to cause anger and
frustration for all parties.

Adding most features to most projects requires a reasonable conversation
about how that impacts other parts of that project, and other projects
in OpenStack, and how it impacts existing deploys of OpenStack, and
compatibility between OpenStack implementations in the field, especially
as we try to get more serious about interoperability.

I get that everyone wants an easy button, and for magical elves to do
stuff. However I think Rally is the anomaly here, as it doesn't need to
address many of these concerns for where it lives in the tools stack.

An easy process for submitting that ends up with too little information
or engagement to be useful is worse than none at all, because then there
is just a black hole in the middle where everything is going to get lost.

We make progress as a community by building strong communication ties
across different points of views, reaching out across processes, and
being willing, on all parts, to invest time to move things forward
together. It's one of the reasons I think the Ops meetups have been very
successful, as they provide really great forums to do that.

We don't do it with a max 5 minute submit process.

-Sean

-- 
Sean Dague
http://dague.net



Re: [Openstack-operators] OpenStack services and ca certificate config entries

2015-03-26 Thread Sean Dague


-- 
Sean Dague
http://dague.net





[Openstack-operators] Deprecation of in tree EC2 API in Nova for Kilo release

2015-01-28 Thread Sean Dague
The following review for Kilo deprecates the EC2 API in Nova -
https://review.openstack.org/#/c/150929/

There are a number of reasons for this. The EC2 API has been slowly
rotting in the Nova tree, was never highly tested, implements a
substantially older version of what AWS has, and currently can't work
with any recent releases of the boto library (due to implementing an
extremely old version of auth). This has given the misunderstanding that
it's a first-class supported feature in OpenStack, which it hasn't been
in quite some time. Deprecating honestly communicates where we stand.

There is a new stackforge project which is getting some activity now -
https://github.com/stackforge/ec2-api. The intent and hope is that this is
the path forward for the portion of the community that wants this
feature, and that efforts will be focused there.

Comments are welcomed, but we've attempted to get more people engaged to
address these issues over the last 18 months, and never really had
anyone step up. Without some real maintainers of this code in Nova (and
tests somewhere in the community) it's really no longer viable.

-Sean

-- 
Sean Dague
http://dague.net


