Re: [openstack-dev] [Nova] Scheduler split wrt Extensible Resource Tracking
-Original Message- From: Nikola Đipanov [mailto:ndipa...@redhat.com] Sent: 19 August 2014 17:50 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Nova] Scheduler split wrt Extensible Resource Tracking On 08/19/2014 06:39 PM, Sylvain Bauza wrote: On the other hand, ERT discussion is decoupled from the scheduler split discussion and will be delayed until Extensible Resource Tracker owner (Paul Murray) is back from vacation. In the meantime, we're considering new patches using ERT as non-acceptable, at least until a decision is made about ERT. Even though this was not officially agreed I think this is the least we can do under the circumstances. A reminder that a revert proposal is up for review still, and I consider it fair game to approve, although it would be great if we could hear from Paul first: https://review.openstack.org/115218 Given the general consensus seemed to be to wait some before deciding what to do here, isn't putting the revert patch up for approval a tad premature ? The RT may not be able to cope with all of the new and more complex resource types we're now trying to schedule, and so it's not surprising that the ERT can't fix that. It does however address some specific use cases that the current RT can't cope with, the spec had a pretty thorough review under the new process, and was discussed during the last 2 design summits. It worries me that we're continually failing to make even small and useful progress in this area. Sylvain's approach of leaving the ERT in place so it can be used for the use cases it was designed for while holding back on doing some of the more complex things that might need either further work in the ERT, or some more fundamental work in the RT (which feels like L or M timescales based on current progress) seemed pretty pragmatic to me. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Scheduler split wrt Extensible Resource Tracking
-Original Message- From: Daniel P. Berrange [mailto:berra...@redhat.com] Sent: 20 August 2014 14:13 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Nova] Scheduler split wrt Extensible Resource Tracking On Wed, Aug 20, 2014 at 08:33:31AM -0400, Jay Pipes wrote: On 08/20/2014 04:48 AM, Nikola Đipanov wrote: On 08/20/2014 08:27 AM, Joe Gordon wrote: On Aug 19, 2014 10:45 AM, Day, Phil philip@hp.com mailto:philip@hp.com wrote: -Original Message- From: Nikola Đipanov [mailto:ndipa...@redhat.com mailto:ndipa...@redhat.com] Sent: 19 August 2014 17:50 To: openstack-dev@lists.openstack.org mailto:openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Nova] Scheduler split wrt Extensible Resource Tracking On 08/19/2014 06:39 PM, Sylvain Bauza wrote: On the other hand, ERT discussion is decoupled from the scheduler split discussion and will be delayed until Extensible Resource Tracker owner (Paul Murray) is back from vacation. In the mean time, we're considering new patches using ERT as non-acceptable, at least until a decision is made about ERT. Even though this was not officially agreed I think this is the least we can do under the circumstances. A reminder that a revert proposal is up for review still, and I consider it fair game to approve, although it would be great if we could hear from Paul first: https://review.openstack.org/115218 Given the general consensus seemed to be to wait some before deciding what to do here, isn't putting the revert patch up for approval a tad premature ? There was a recent discussion about reverting patches, and from that (but not only) my understanding is that we should revert whenever in doubt. Right. http://lists.openstack.org/pipermail/openstack-dev/2014-August/042728. html Putting the patch back in is easy, and if proven wrong I'd be the first to +2 it. As scary as they sound - I don't think reverts are a big deal. Neither do I. I think it's more appropriate to revert quickly and then add it back after any discussions, per the above revert policy. The RT may be not able to cope with all of the new and more complex resource types we're now trying to schedule, and so it's not surprising that the ERT can't fix that. It does however address some specific use cases that the current RT can't cope with, the spec had a pretty through review under the new process, and was discussed during the last 2 design summits. It worries me that we're continually failing to make even small and useful progress in this area. Sylvain's approach of leaving the ERT in place so it can be used for the use cases it was designed for while holding back on doing some of the more complex things than might need either further work in the ERT, or some more fundamental work in the RT (which feels like as L or M timescales based on current progress) seemed pretty pragmatic to me. ++, I really don't like the idea of rushing the revert of a feature ++that went through significant design discussion especially when the author is away and cannot defend it. Fair enough - I will WIP the revert until Phil is back. It's the right thing to do seeing that he is away. Well, it's as much (or more?) Paul Murray and Andrea Rosa :) However - I don't agree with using the length of discussion around the feature as a valid argument against reverting. Neither do I. I've supplied several technical arguments on the original thread to why I think we should revert it, and would expect a discussion that either refutes them, or provides alternative ways forward. 
Saying 'but we talked about it at length' is the ultimate appeal to imaginary authority and frankly not helping at all. Agreed. Perhaps it's just my provocative nature, but I hear a lot of we've already decided/discussed this talk especially around the scheduler and RT stuff, and I don't think the argument holds much water. We should all be willing to reconsider design decisions and discussions when appropriate, and in the case of the RT, this discussion is timely and appropriate due to the push to split the scheduler out of Nova (prematurely IMO). Yes, this is absolutely right. Even if we have approved a spec / blueprint we *always* reserve the right to change our minds at a later date if new information or points of view come to light. Hopefully this will be fairly infrequent and we won't do it lightly, but it is a key thing we accept as a possible outcome of the process we follow. My point was more that reverting a patch that does meet the use cases it was designed to cover, even if there is something more fundamental that needs to be looked at to cover some new use cases that weren't considered at the time is the route to stagnation. It seems (unless
Re: [openstack-dev] [nova] Is the BP approval process broken?
Adding more bureaucracy (specs) in such a case is not the best way to resolve team throughput issues... I'd argue that if fundamental design disagreements can be surfaced and debated at the design stage rather than first emerging on patch set XXX of an implementation, and be used to then prioritize what needs to be implemented, then they do have a useful role to play. Phil From: Boris Pavlovic [mailto:bpavlo...@mirantis.com] Sent: 28 August 2014 23:13 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Is the BP approval process broken? Joe, This is a resource problem, the nova team simply does not have enough people doing enough reviews to make this possible. Adding more bureaucracy (specs) in such a case is not the best way to resolve team throughput issues... my 2cents Best regards, Boris Pavlovic On Fri, Aug 29, 2014 at 2:01 AM, Joe Gordon joe.gord...@gmail.com wrote: On Thu, Aug 28, 2014 at 2:43 PM, Alan Kavanagh alan.kavan...@ericsson.com wrote: I share Donald's points here, I believe what would help is to clearly describe in the Wiki the process and workflow for the BP approval process and build in this process how to deal with discrepancies/disagreements and build timeframes for each stage and process of appeal etc. The current process would benefit from some fine tuning and helping to build safe guards and time limits/deadlines so folks can expect responses within a reasonable time and not be left waiting in the cold. This is a resource problem, the nova team simply does not have enough people doing enough reviews to make this possible. My 2cents! /Alan -Original Message- From: Dugger, Donald D [mailto:donald.d.dug...@intel.com] Sent: August-28-14 10:43 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Is the BP approval process broken? I would contend that that right there is an indication that there's a problem with the process. You submit a BP and then you have no idea of what is happening and no way of addressing any issues. If the priority is wrong I can explain why I think the priority should be higher, getting stonewalled leaves me with no idea what's wrong and no way to address any problems. I think, in general, almost everyone is more than willing to adjust proposals based upon feedback. Tell me what you think is wrong and I'll either explain why the proposal is correct or I'll change it to address the concerns. Trying to deal with silence is really hard and really frustrating. Especially given that we're not supposed to spam the mailing list, it's really hard to know what to do. I don't know the solution but we need to do something. More core team members would help, maybe something like an automatic timeout where BPs/patches with no negative scores and no activity for a week get flagged for special handling. I feel we need to change the process somehow. -- Don Dugger Censeo Toto nos in Kansa esse decisse. - D. Gale Ph: 303/443-3786 -Original Message- From: Jay Pipes [mailto:jaypi...@gmail.com] Sent: Thursday, August 28, 2014 1:44 PM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] Is the BP approval process broken? On 08/27/2014 09:04 PM, Dugger, Donald D wrote: I'll try and not whine about my pet project but I do think there is a problem here.
For the Gantt project to split out the scheduler there is a crucial BP that needs to be implemented ( https://review.openstack.org/#/c/89893/ ) and, unfortunately, the BP has been rejected and we'll have to try again for Kilo. My question is: did we do something wrong, or is the process broken? Note that we originally proposed the BP on 4/23/14, went through 10 iterations to the final version on 7/25/14 and the final version got three +1s and a +2 by 8/5. Unfortunately, even after reaching out to specific people, we didn't get the second +2, hence the rejection. I understand that reviews are a burden and very hard but it seems wrong that a BP with multiple positive reviews and no negative reviews is dropped because of what looks like indifference. I would posit that this is not actually indifference. The reason that there may not have been 1 +2 from a core team member may very well have been that the core team members did not feel that the blueprint's priority was high enough to put before other work, or that the core team members did not have the time to comment on the spec (due to them not feeling the blueprint had the priority to justify the time to do a full review). Note that I'm not a core drivers team member. Best, -jay ___ OpenStack-dev mailing list
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
Needing 3 out of 19 instead of 3 out of 20 isn't an order of magnitude according to my calculator. It's much closer/fairer than making it 2/19 vs 3/20. If a change is borderline in that it can only get 2 other cores, maybe it doesn't have a strong enough case for an exception. Phil Sent from Samsung Mobile Original message From: Nikola Đipanov Date: 02/09/2014 19:41 (GMT+00:00) To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno On 09/02/2014 08:16 PM, Michael Still wrote: Hi. We're soon to hit feature freeze, as discussed in Thierry's recent email. I'd like to outline the process for requesting a freeze exception: * your code must already be up for review * your blueprint must have an approved spec * you need three (3) sponsoring cores for an exception to be granted Can core reviewers who have features up for review have this number lowered to two (2) sponsoring cores, as they in reality then need four (4) cores (since they themselves are one (1) core but cannot really vote), making it an order of magnitude more difficult for them to hit this checkbox? Thanks, N. * exceptions must be granted before midnight, Friday this week (September 5) UTC * the exception is valid until midnight Friday next week (September 12) UTC when all exceptions expire For reference, our rc1 drops on approximately 25 September, so the exception period needs to be short to maximise stabilization time. John Garbutt and I will both be granting exceptions, to maximise our timezone coverage. We will grant exceptions as they come in and gather the required number of cores, although I have also carved some time out in the nova IRC meeting this week for people to discuss specific exception requests. Michael ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
-Original Message- From: Nikola Đipanov [mailto:ndipa...@redhat.com] Sent: 03 September 2014 10:50 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno snip I will follow up with a more detailed email about what I believe we are missing, once the FF settles and I have applied some soothing creme to my burnout wounds, but currently my sentiment is: Contributing features to Nova nowadays SUCKS!!1 (even as a core reviewer) We _have_ to change that! N. While agreeing with your overall sentiment, what worries me a tad is the implied perception that contributing as a core should somehow be easier than as a mortal. While I might expect cores to produce better initial code, I thought the process and standards were intended to be a level playing field. Has anyone looked at the review bandwidth issue from the perspective of whether there has been a change in the amount of time cores now spend contributing vs reviewing ? Maybe there's an opportunity to get cores to mentor non-cores to do the code production, freeing up review cycles ? Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
One final note: the specs referenced above didn't get approved until Spec Freeze, which seemed to leave me with less time to implement things. In fact, it seemed that a lot of specs didn't get approved until spec freeze. Perhaps if we had more staggered approval of specs, we'd have more staggered submission of patches, and thus less of a sudden influx of patches in the couple weeks before feature proposal freeze. Yeah I think the specs were getting approved too late into the cycle, I was actually surprised at how far out the schedules were going in allowing things in and then allowing exceptions after that. Hopefully the ideas around priorities/slots/runways will help stagger some of this also. I think there is a problem with the pattern that seemed to emerge in June where the J.1 period was taken up with spec review (a lot of good reviews happened early in that period, but the approvals kind of came in a lump at the end) meaning that the implementation work itself only seemed to really kick in during J.2 - and, not surprisingly given the complexity of some of the changes, ran late into J.3. We also, as previously noted, didn't do any prioritization between those specs that were approved - so it was always going to be a race to who managed to get code up for review first. It kind of feels to me as if the ideal model would be if we were doing spec review for K now (i.e. during the FF / stabilization period) so that we hit Paris with a lot of the input already registered and a clear idea of the range of things folks want to do. We shouldn't really have to ask for session suggestions for the summit - they should be something that can be extracted from the proposed specs (maybe we do voting across the specs or something like that). In that way the summit would be able to confirm the list of specs for K and the priority order. With the current state of the review queue maybe we can't quite hit this pattern for K, but would be worth aspiring to for L ? Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Hi Daniel, Thanks for putting together such a thoughtful piece - I probably need to re-read it a few times to take in everything you're saying, but a couple of thoughts that did occur to me: - I can see how this could help where a change is fully contained within a virt driver, but I wonder how many of those there really are ? Of the things that I've seen go through recently nearly all also seem to touch the compute manager in some way, and a lot (like the Numa changes) also have impacts into the scheduler. Isn't it going to make it harder to get any of those changes in if they have to be co-ordinated across two or more repos ? - I think you hit the nail on the head in terms of the scope of Nova and how few people probably really understand all of it, but given the amount of trust that goes with being a core wouldn't it also be possible to make people cores on the understanding that they will only approve code in the areas they are expert in ? It kind of feels that this happens to a large extent already, for example I don't see Chris or Ken'ichi taking on work outside of the API layer. It kind of feels as if given a small amount of trust we could have additional core reviewers focused on specific parts of the system without having to split up the code base if that's where the problem is. Phil -Original Message- From: Daniel P. Berrange [mailto:berra...@redhat.com] Sent: 04 September 2014 11:24 To: OpenStack Development Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers Position statement == Over the past year I've increasingly come to the conclusion that Nova is heading for (or probably already at) a major crisis. If steps are not taken to avert this, the project is likely to lose a non-trivial amount of talent, both regular code contributors and core team members. That includes myself. This is not good for Nova's long term health and so should be of concern to anyone involved in Nova and OpenStack. For those who don't want to read the whole mail, the executive summary is that the nova-core team is an unfixable bottleneck in our development process with our current project structure. The only way I see to remove the bottleneck is to split the virt drivers out of tree and let them all have their own core teams in their area of code, leaving current nova core to focus on all the common code outside the virt driver impls. I none the less urge people to read the whole mail. Background information == I see many factors coming together to form the crisis - Burn out of core team members from overwork - Difficulty bringing new talent into the core team - Long delay in getting code reviewed and merged - Marginalization of code areas which aren't popular - Increasing size of nova code through new drivers - Exclusion of developers without corporate backing Each item on its own may not seem too bad, but combined they add up to a big problem. Core team burn out -- Having been involved in Nova for several dev cycles now, it is clear that the backlog of code up for review never goes away. Even intensive code review efforts at various points in the dev cycle make only a small impact on the backlog. This has a pretty significant impact on core team members, as their work is never done. At best, the dial is sometimes set to 10, instead of 11. Many people, myself included, have built tools to help deal with the reviews in a more efficient manner than plain gerrit allows for.
These certainly help, but they can't ever solve the problem on their own - just make it slightly more bearable. And this is not even considering that core team members might have useful contributions to make in ways beyond just code review. Ultimately the workload is just too high to sustain the levels of review required, so core team members will eventually burn out (as they have done many times already). Even if one person attempts to take the initiative to heavily invest in review of certain features it is often to no avail. Unless a second dedicated core reviewer can be found to 'tag team' it is hard for one person to make a difference. The end result is that a patch is +2d and then sits idle for weeks or more until a merge conflict requires it to be reposted at which point even that one +2 is lost. This is a pretty demotivating outcome for both reviewers the patch contributor. New core team talent It can't escape attention that the Nova core team does not grow in size very often. When Nova was younger and its code base was smaller, it was easier for contributors to get onto core because the base level of knowledge required was that much smaller. To get onto core today requires a major investment in learning Nova over a year or more. Even people who potentially have the latent skills may not
[openstack-dev] [nova] FFE server-group-quotas
Hi, I'd like to ask for a FFE for the 3 patchsets that implement quotas for server groups. Server groups (which landed in Icehouse) provide a really useful anti-affinity filter for scheduling that a lot of customers would like to use, but without some form of quota control to limit the amount of anti-affinity it's impossible to enable it as a feature in a public cloud. The code itself is pretty simple - the number of files touched is a side-effect of having three V2 APIs that report quota information and the need to protect the change in V2 via yet another extension. https://review.openstack.org/#/c/104957/ https://review.openstack.org/#/c/116073/ https://review.openstack.org/#/c/116079/ Phil -Original Message- From: Sahid Orentino Ferdjaoui [mailto:sahid.ferdja...@redhat.com] Sent: 04 September 2014 13:42 To: openstack-dev@lists.openstack.org Subject: [openstack-dev] [nova] FFE request serial-ports Hello, I would like to request a FFE for 4 changesets to complete the blueprint serial-ports. Topic on gerrit: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/serial-ports,n,z Blueprint on launchpad.net: https://blueprints.launchpad.net/nova/+spec/serial-ports They have already been approved but didn't get enough time to be merged by the gate. Sponsored by: Daniel Berrange Nikola Dipanov s. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
-Original Message- From: Sean Dague [mailto:s...@dague.net] Sent: 05 September 2014 11:49 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers On 09/05/2014 03:02 AM, Sylvain Bauza wrote: Ahem, IIRC, there is a third proposal for Kilo : - create subteams of half-cores responsible for reviewing patch iterations and sending approval requests to cores once they consider the patch stable enough. As I explained, it would free up reviewing time for cores without losing control over what is being merged. I don't really understand how the half core idea works outside of a math equation, because the point of core is to have trust over the judgement of your fellow core members so that they can land code when you aren't looking. I'm not sure how I manage to build up half trust in someone any quicker. -Sean You seem to be looking at a model Sean where trust is purely binary - you're either trusted to know about all of Nova or not trusted at all. What Sylvain is proposing (I think) is something more akin to having folks that are trusted in some areas of the system and/or trusted to be right enough of the time that their reviewing skills take a significant part of the burden off the core reviewers. That kind of incremental development of trust feels like a fairly natural model to me. It's some way between the full divide and rule approach of splitting out various components (which doesn't feel like a short term solution) and the blanket approach of adding more cores. Making it easier to incrementally grant trust, and having the processes and will to remove it if it's seen to be misused, feels to me like it has to be part of the solution to breaking out of the 'we need more people we trust, but we don't feel comfortable trusting more than N people at any one time' problem. Sometimes you have to give people a chance in small, well defined and controlled steps. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] FFE server-group-quotas
The corresponding Tempest change is also ready to roll (thanks to Ken'ichi): https://review.openstack.org/#/c/112474/1 so it's kind of just a question of getting the sequence right. Phil -Original Message- From: Sean Dague [mailto:s...@dague.net] Sent: 05 September 2014 17:05 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] FFE server-group-quotas On 09/05/2014 11:28 AM, Ken'ichi Ohmichi wrote: 2014-09-05 21:56 GMT+09:00 Day, Phil philip@hp.com: Hi, I'd like to ask for a FFE for the 3 patchsets that implement quotas for server groups. Server groups (which landed in Icehouse) provide a really useful anti-affinity filter for scheduling that a lot of customers would like to use, but without some form of quota control to limit the amount of anti-affinity it's impossible to enable it as a feature in a public cloud. The code itself is pretty simple - the number of files touched is a side-effect of having three V2 APIs that report quota information and the need to protect the change in V2 via yet another extension. https://review.openstack.org/#/c/104957/ https://review.openstack.org/#/c/116073/ https://review.openstack.org/#/c/116079/ I am happy to sponsor this work. Thanks Ken'ichi Ohmichi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev These look like they are also all blocked by Tempest because it's changing return chunks. How does one propose to resolve that, as I don't think there is an agreed path up there for how to get this into a passing state from my reading of the reviews. -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Expand resource name allowed characters
-Original Message- From: Jay Pipes [mailto:jaypi...@gmail.com] Sent: 12 September 2014 19:37 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] Expand resource name allowed characters Had to laugh about the PILE OF POO character :) Comments inline... Can we get support for that in gerrit ? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] are we going to remove the novaclient v3 shell or what?
-Original Message- From: Kenichi Oomichi [mailto:oomi...@mxs.nes.nec.co.jp] Sent: 18 September 2014 02:44 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] are we going to remove the novaclient v3 shell or what? -Original Message- From: Matt Riedemann [mailto:mrie...@linux.vnet.ibm.com] Sent: Wednesday, September 17, 2014 11:59 PM To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [nova] are we going to remove the novaclient v3 shell or what? This has come up a couple of times in IRC now but the people that probably know the answer aren't available. There are python-novaclient patches that are adding new CLIs to the v2 (v1_1) and v3 shells, but now that we have the v2.1 API (v2 on v3) why do we still have a v3 shell in the client? Are there plans to remove that? I don't really care either way, but need to know for code reviews. One example: [1] [1] https://review.openstack.org/#/c/108942/ Sorry for the slightly late response, I think we don't need new v3 features in novaclient anymore. For example, the v3 part of the above[1] was not necessary because the new server-group quota feature is provided as v2 and v2.1, not v3. That would be true if there was a version of the client that supported v2.1 today, but currently the V2.1 API is still presented as V3 and doesn't include the tenant_id - making the V3 client the only simple way to test new V2.1 features in devstack as far as I can see. How about this as a plan: 1) We add support to the client for --os-compute-api-version=v2.1 which maps into the client with the URL set to include v2.1 (this won't be usable until we do step 2) 2) We change Nova to present the v2.1 API as 'http://X.X.X.X:8774/v2.1/tenant_id/' - At this point we will have a working client for all of the stuff that's been moved back from V3 to V2.1, but will lose access to any V3 stuff not yet moved (which is the opposite of the current state where the v3 client can only be used for things that haven't been refactored to V2.1) 3) We remove V3 from the client. Until we get 1 and 2 done, to me it still makes sense to allow small changes into the v3 client, so that we keep it usable with the V2.1 API ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] are we going to remove the novaclient v3 shell or what?
DevStack doesn't register the v2.1 endpoint to keystone now, but we can use it by calling it directly. It is true that it is difficult to use the v2.1 API now and we can check its behavior via the v3 API instead. I posted a patch[1] for registering the v2.1 endpoint to keystone, and I confirmed the --service-type option of the current nova command works for it. Ah - I'd misunderstood where we'd got to with the v2.1 endpoint, thanks for putting me straight. So with this in place then yes I agree we could stop fixing the v3 client. Since it's actually broken for even operations like boot, do we merge in the changes I pushed this week so it can still do basic functions, or just go straight to removing v3 from the client ? Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
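To make the outcome of that plan concrete, here is a minimal sketch of driving the v2.1 API through the existing v2 client once the endpoint is registered in keystone under its own service type; the 'computev21' service type name and the credentials are assumptions for illustration, not something fixed by this thread.

    from novaclient import client

    # Placeholder credentials - substitute the values for your own devstack.
    USERNAME, PASSWORD, TENANT_NAME = 'demo', 'secret', 'demo'
    AUTH_URL = 'http://127.0.0.1:5000/v2.0'

    # service_type selects the separately registered v2.1 endpoint from the
    # keystone catalogue; 'computev21' is an assumed name for this example.
    nova = client.Client('2', USERNAME, PASSWORD, TENANT_NAME, AUTH_URL,
                         service_type='computev21')
    print(nova.servers.list())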
[openstack-dev] [Nova] - do we need .start and .end notifications in all cases ?
Hi Folks, I'd like to get some opinions on the use of pairs of notification messages for simple events. I get that for complex operations on an instance (create, rebuild, etc) a start and end message are useful to help instrument progress and how long the operations took. However we also use this pattern for things like aggregate creation, which is just a single DB operation - and it strikes me as kind of overkill and probably not all that useful to any external system compared to a single .create event after the DB operation. There is a change up for review to add notifications for service groups which is following this pattern (https://review.openstack.org/#/c/107954/) - the author isn't doing anything wrong in that they're just following that pattern, but it made me wonder if we shouldn't have some better guidance on when to use a single notification rather than a .start/.end pair. Does anyone else have thoughts on this, or know of external systems that would break if we restricted .start and .end usage to long-lived instance operations ? Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] - do we need .start and .end notifications in all cases ?
Hi Daniel, -Original Message- From: Daniel P. Berrange [mailto:berra...@redhat.com] Sent: 22 September 2014 12:24 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Nova] - do we need .start and .end notifications in all cases ? On Mon, Sep 22, 2014 at 11:03:02AM +, Day, Phil wrote: Hi Folks, I'd like to get some opinions on the use of pairs of notification messages for simple events. I get that for complex operations on an instance (create, rebuild, etc) a start and end message are useful to help instrument progress and how long the operations took. However we also use this pattern for things like aggregate creation, which is just a single DB operation - and it strikes me as kind of overkill and probably not all that useful to any external system compared to a single event .create event after the DB operation. A start + end pair is not solely useful for timing, but also potentially detecting if it completed successfully. eg if you receive an end event notification you know it has completed. That said, if this is a use case we want to target, then ideally we'd have a third notification for this failure case, so consumers don't have to wait timeout to detect error. I'm just a tad worried that this sounds like its starting to use notification as a replacement for logging.If we did this for every CRUD operation on an object don't we risk flooding the notification system. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] - do we need .start and .end notifications in all cases ?
I think we should aim to /always/ have 3 notifications using a pattern of try: ...notify start... ...do the work... ...notify end... except: ...notify abort... Precisely my viewpoint as well. Unless we standardize on the above, our notifications are less than useful, since they will be open to interpretation by the consumer as to what precisely they mean (and the consumer will need to go looking into the source code to determine when an event actually occurred...) Smells like a blueprint to me. Anyone have objections to me writing one up for Kilo? Best, -jay Hi Jay, So just to be clear, are you saying that we should generate 2 notification messages on Rabbit for every DB update ? That feels like big overkill to me. If I follow that logic then the current state transition notifications should also be changed to 'starting to update task state' / 'finished updating task state' - which seems just daft and confusing logging with notifications. Sandy's answer where start/end are used if there is a significant amount of work between the two and/or the transaction spans multiple hosts makes a lot more sense to me. Sending just a single notification on success to show that something changed, rather than bracketing a single DB call with two notification messages, would seem to me to be much more in keeping with the concept of notifying on key events. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
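To make the pattern under discussion concrete, here is a minimal sketch of the three-notification wrapper Jay describes, applied to the aggregate-create example from earlier in the thread. The function name, payload and notifier object are assumptions for illustration - the notifier mirrors oslo.messaging's Notifier interface rather than Nova's actual code.

    def create_aggregate(context, notifier, db_api, values):
        # .start before the work, .end on success, .abort on failure, so a
        # consumer can tell completed operations from failed or in-flight ones.
        payload = {'aggregate_name': values.get('name')}
        notifier.info(context, 'aggregate.create.start', payload)
        try:
            aggregate = db_api.aggregate_create(context, values)
        except Exception:
            notifier.error(context, 'aggregate.create.abort', payload)
            raise
        payload['aggregate_id'] = aggregate['id']
        notifier.info(context, 'aggregate.create.end', payload)
        return aggregate

For a single DB call the .start and .abort messages carry little beyond what logging already records, which is essentially the trade-off being debated in this thread.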
[openstack-dev] [Nova] All I want for Christmas is one more +2 ...
Hi Cores, The Stop, Rescue, and Delete should give guest a chance to shutdown change https://review.openstack.org/#/c/35303/ was approved a couple of days ago, but failed to merge because the RPC version had moved on. Its rebased and sitting there with one +2 and a bunch of +1s -would be really nice if it could land before it needs another rebase please ? Thanks Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] All I want for Christmas is one more +2 ...
I've had to rebase this a couple of times since to keep ahead of RPC version numbers - sitting there with a +1 from Jenkins at the moment: https://review.openstack.org/#/c/35303/ Phil -Original Message- From: Russell Bryant [mailto:rbry...@redhat.com] Sent: 12 December 2013 14:38 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Nova] All I want for Christmas is one more +2 ... On 12/12/2013 09:22 AM, Day, Phil wrote: Hi Cores, The Stop, Rescue, and Delete should give guest a chance to shutdown change https://review.openstack.org/#/c/35303/ was approved a couple of days ago, but failed to merge because the RPC version had moved on. Its rebased and sitting there with one +2 and a bunch of +1s -would be really nice if it could land before it needs another rebase please ? Approved. FWIW, I'm fine with folks approving with a single +2 for cases where a patch is approved but needed a simple rebase. This happens pretty often. We even have a script that generates a list of patches still open that were previously approved: http://russellbryant.net/openstack-stats/nova-openapproved.txt -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Future meeting times
+1, I would make the 14:00 meeting. I often have good intention of making the 21:00 meeting, but it's tough to work in around family life Sent from Samsung Mobile Original message From: Joe Gordon joe.gord...@gmail.com Date: To: OpenStack Development Mailing List openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Nova] Future meeting times On Dec 18, 2013 6:38 AM, Russell Bryant rbry...@redhat.commailto:rbry...@redhat.com wrote: Greetings, The weekly Nova meeting [1] has been held on Thursdays at 2100 UTC. I've been getting some requests to offer an alternative meeting time. I'd like to try out alternating the meeting time between two different times to allow more people in our global development team to attend meetings and engage in some real-time discussion. I propose the alternate meeting time as 1400 UTC. I realize that doesn't help *everyone*, but it should be an improvement for some, especially for those in Europe. If we proceed with this, we would meet at 2100 UTC on January 2nd, 1400 UTC on January 9th, and alternate from there. Note that we will not be meeting at all on December 26th as a break for the holidays. If you can't attend either of these times, please note that the meetings are intended to be supplementary to the openstack-dev mailing list. In the meetings, we check in on status, raise awareness of important issues, and progress some discussions with real-time debate, but the most important discussions and decisions will always be brought to the openstack-dev mailing list, as well. With that said, active Nova contributors are always encouraged to attend and participate if they are able. Comments welcome, especially some acknowledgement that there are people that would attend the alternate meeting time. :-) I am fine with this, but I will never be attending the 1400 UTC meetings, as I live in utc-8 Thanks, [1] https://wiki.openstack.org/wiki/Meetings/Nova -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.orgmailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility
Hi Folks, I know it may seem odd to be arguing for slowing down a part of the review process, but I'd like to float the idea that there should be a minimum review period for patches that change existing functionality in a way that isn't backwards compatible. The specific change that got me thinking about this is https://review.openstack.org/#/c/63209/ which changes the default fs type from ext3 to ext4. I agree with the comments in the commit message that ext4 is a much better filesystem, and it probably does make sense to move to that as the new default at some point, however there are some old OS's that may still be in use that don't support ext4. By making this change to the default without any significant notification period this change has the potential to break existing images and snapshots. It was already possible to use ext4 via existing configuration values, so there was no urgency to this change (and no urgency implied in the commit messages, which is neither a bug nor a blueprint). I'm not trying to single out the folks involved in this change in particular, it just happened to serve as a good and convenient example of something that I think we need to be more aware of and think about having some specific policy around. On the plus side the reviewers did say they would wait 24 hours to see if anyone objected, and the actual review went over 4 days - but I'd suggest that is still far too quick even in a non-holiday period for something which is low priority (the functionality could already be achieved via existing configuration options) and which is a change in default behaviour. (In the period around a major holiday there probably needs to be an even longer wait). I know there are those that don't want to see blueprints for every minor functional change to the system, but maybe this is a case where a blueprint being proposed and reviewed may have caught the impact of the change. With a number of people now using a continuous deployment approach any change in default behaviour needs to be considered not just for the benefits it brings but also for what it might break. The advantage we have as a community is that there are a lot of different perspectives that can be brought to bear on the impact of functional changes, but we equally have to make sure there is sufficient time for those perspectives to emerge. Somehow it feels that we're getting the priorities on reviews wrong when a low-priority change like this can go through in a matter of days, while bug fixes such as https://review.openstack.org/#/c/57708/ have been sitting for over a month with a number of +1s and don't seem to be making any progress. Cheers, Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
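For reference, the existing configuration override alluded to above looks like this in nova.conf (default_ephemeral_format is the option named later in this thread); pinning it explicitly is a sketch of how a deployment can keep its current behaviour regardless of what the in-tree default changes to.

    [DEFAULT]
    # Keep new ephemeral disks on ext3 even if the compiled-in default moves on.
    default_ephemeral_format = ext3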
Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility
-Original Message- From: Robert Collins [mailto:robe...@robertcollins.net] Sent: 29 December 2013 05:36 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility On 29 December 2013 05:15, John Griffith john.griff...@solidfire.com wrote: I think Sean made some good recommendations in the review (waiting 24 hours as well as suggesting ML etc). It seems that cases like this don't necessarily need mandated time requirements for review but just need good core reviewers to say hey, this is a big deal... we should probably get some feedback here etc. One thing I am curious about however, Gary made a good point about using the default_ephemeral_format= config setting to make this pretty easy and straight forward. I didn't see any other responses to that, and it looks like the patch still uses a default of none. Quick look at the code it seems like this would be a clean way to go about things, any reason why this wasn't discussed further? We make a point of running defaults in TripleO: if the defaults aren't generally production suitable, they aren't suitable defaults. If/when we find a place where there is no sane default, we'll push for having no default and forcing a choice to be made. ext3 wasn't a sane default :). ext3 may no longer be the best choice of a default, but that the fact that is already established as the default means that we have to plan any changes carefully. In fact, for CD environments, the ability to set ext3 via config options means this change is easy to convert into an arbitrary-time warning period to users, if a cloud needs to. IMO that puts the emphasis in the wrong place - yes given sufficient notice a CD user can make changes to their existing images to protect them from this change, but that requires them to have sufficient notification to make and test that change. The responsibility should be on reviewers to not allow though changes that break backwards compatibility without some form of notice / deprecation period - not on the operators to have to monitor for and react to changes as they come through. Phil -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility
-Original Message- From: Robert Collins [mailto:robe...@robertcollins.net] Sent: 29 December 2013 06:50 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility On 29 December 2013 04:21, Day, Phil philip@hp.com wrote: Hi Folks, I know it may seem odd to be arguing for slowing down a part of the review process, but I'd like to float the idea that there should be a minimum review period for patches that change existing functionality in a way that isn't backwards compatible. What is the minimum review period intended to accomplish? I mean: everyone that reviewed this *knew* it changed a default, and that guest OS's that did support ext3 but don't support ext4 would be broken. My point is that for some type of non-urgent change (i.e. those that change existing behaviour) there needs to be a longer period to make sure that more views and opinions can surface and be taken into account. Maybe all the reviewers in this case did realise the full impact of this change, but that's still not the same thing as getting a wide range of input. This is a change which has some significant impact, and there was no prior discussion as far as I know in the form of a BP or thread in the mailing list. There was also no real urgency in getting the change merged. Would you like to have seen a different judgement call - e.g. 'Because this is a backward breaking change, it has to go through one release of deprecation warning, and *then* can be made' ? Yep, I think that would be appropriate in this case. There is an impact beyond just the GuestOS support that occurred to me since, but I don't want to get this thread bogged down in this specific change so I'll start a new thread for that. My point is that where changes are proposed that affect the behaviour of the system, and especially where they are not urgent (i.e not high priority bug fixes) then we need to slow down the reviews and not assume that all possible views / impacts will surface in a few days. As I said, there seems to me to be something wrong with the priority around changes when non urgent but behaviour changes go though in a few days but we have bug fixes sitting with many +1's for over a month. One possible reason to want a different judgment call is that the logic about impacted OS's was wrong - I claimed (correctly) that every OS has support for ext4, but neglected to consider the 13 year lifespan of RHEL... https://access.redhat.com/site/support/policy/updates/errata/ shows that RHEL 3 and 4 are both still supported, and neither support ext4. So folk that are running apps in those legacy environments indeed cannot move. Yep - that's part of my concern for this specific change. Its an example of the kind of detail that I think would have emerged from a longer review cycle (at least I know I would have flagged it if I'd had the chance to ;-) Another possible reason is that we should have a strict no-exceptions-by- default approach to backwards incompatible changes, even when there are config settings to override them. Whatever the nub is - lets surface that and target it. Yep - I think we should have a very clear policy around how and when we make changes to default behaviour. That's really the point I'm trying to surface. Basically, I'm not sure what problem you're trying to solve - lets tease that out, and then talk about how to solve it. 
Backwards incompatible change landed might be the problem - but since every reviewer knew it, having a longer review period is clearly not connected to solving the problem :). That assumes that a longer review period wouldn't of allowed more reviewers to provide input - and I'm arguing the opposite. I also think that some clear guidelines might have led to the core reviewers holding this up for longer. As I said in my original post, the intent to get more input was clear in the reviews, but the period wasn't IMO long enough to make sure all the folks who may have something to contribute could. I'd rather see some established guidelines than have to be constantly on the lookout for changes every day or so and hoping to catch them in time. The specific change that got me thinking about this is https://review.openstack.org/#/c/63209/ which changes the default fs type from ext3 to ext4.I agree with the comments in the commit message that ext4 is a much better filesystem, and it probably does make sense to move to that as the new default at some point, however there are some old OS's that may still be in use that don't support ext4. By making this change to the Per above, these seem to be solely RHEL3 and RHEL4. And SLES. It also causes inconsistent behaviour in the system, as any existing default backing files will have ext3 in them, so a VM will now get
[openstack-dev] [nova] - Revert change of default ephemeral fs to ext4
Hi Folks, As highlighted in the thread 'minimum review period for functional changes', I'd like to propose that change https://review.openstack.org/#/c/63209/ is reverted because: - It causes inconsistent behaviour in the system, as any existing default backing files will have ext3 in them, so a VM will now get either ext3 or ext4 depending on whether the host they get created on already has a backing file of the required size or not. I don't think the existing design ever considered the default FS changing - maybe we shouldn't have files that include default as the file system type if it can change over time, and the name should always reflect the FS type. - It introduces a new requirement for GuestOS's to have to support ext4 in order to work with the default configuration. I think that's a significant enough change that it needs to be flagged, discussed, and planned. I'm about to go off line for a few days and won't have anything other than patchy e-mail access, otherwise I'd submit the change myself ;-) Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility
Hi Sean, I'm not convinced the comparison to my clean shut down change is valid here. For sure that proved that beyond a certain point (in that case months) there is no additional value in extending the review period, and no amount of review will catch all problems, but that's not the same as saying that there is no value in a minimum period. In this particular case if the review had been open for say three weeks then imo the issue would have been caught, as I spotted it as soon as I saw the merge. As it wasn't an urgent bug fix I don't see a major gain from not waiting even if there wasn't a problem. I'm all for continually improving the gate tests, but in this case they would need to be testing against a system that had been running before the change, to test specifically that new vms got the new fs, so there would have needed to be a matching test added to grenade as part of the same commit. Not quite sure where the number of open changes comes in either, just because there are a lot of proposed changes doesn't to me mean we need to rush the non-urgent ones, it means we maybe need better prioritisation. There are plenty of long-lived bug fixes sitting with many +1s. Phil Sent from Samsung Mobile Original message From: Sean Dague s...@dague.net Date: To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility On 12/29/2013 03:06 AM, Day, Phil wrote: snip Basically, I'm not sure what problem you're trying to solve - lets tease that out, and then talk about how to solve it. Backwards incompatible change landed might be the problem - but since every reviewer knew it, having a longer review period is clearly not connected to solving the problem :). That assumes that a longer review period wouldn't of allowed more reviewers to provide input - and I'm arguing the opposite. I also think that some clear guidelines might have led to the core reviewers holding this up for longer. As I said in my original post, the intent to get more input was clear in the reviews, but the period wasn't IMO long enough to make sure all the folks who may have something to contribute could. I'd rather see some established guidelines than have to be constantly on the lookout for changes every day or so and hoping to catch them in time. Honestly, there are currently 397 open reviews in Nova. I am not convinced that waiting on this one would have come up with a better decision. I'll give an alternative point of view on the graceful shutdown patch, where we sat on that for months, had many iterations, landed it, it added 25 minutes to all the test runs (which had been hinted at sometime in month 2 of the review, but got lost in the mists of time), and we had to revert it. I'm not convinced more time brings more wisdom. We did take it to the list, and there were no objections. I did tell Robert to wait because I wanted to get those points of view. But they didn't show up. Because it was holidays, could we have waited longer? Sure. I'll take a bad on that in feeling that Dec 19th wasn't really holidays yet because I was still working. :) But, honestly, given no negative feedback on the thread in question and no -1 on the review, the fact that folks like google skipped ext3 entirely, means this review was probably landing regardless. Every time we need to do a revert, we need to figure out how to catch it the next time. 'Humans be better' is really not a solution.
So this sounds like we need a guest compatibility test where we boot a ton of different guests on each commit and make sure they all work. I'd wholeheartedly support getting that in if someone wants to champion that. That's really going to be the only way we have of knowing systematically when we break SLES in the future. So the net is: we're all human, and sometimes make mistakes. I don't think we're going to fix this with review policy changes, but we could with actual CI enhancements. -Sean -- Sean Dague Samsung Research America s...@dague.net / sean.da...@samsung.com http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] - Revert change of default ephemeral fs to ext4
Sent from Samsung Mobile Original message From: Pádraig Brady p...@draigbrady.com Date: To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Cc: Day, Phil philip@hp.com Subject: Re: [openstack-dev] [nova] - Revert change of default ephemeral fs to ext4 - It causes inconsistent behaviour in the system, as any existing default backing files will have ext3 in them, so a VM will now get either ext3 or ext4 depending on whether the host they get created on already has a backing file of the required size or not. I don't think the existing design ever considered the default FS changing - maybe we shouldn't have files that include default as the file system type if it can change over time, and the name should always reflect the FS type. I'm not sure this is a practical issue since ephemeral storage is built up from blank by each instance Maybe it varies by hypervisor; my understanding is that at least in libvirt the ephemeral disks are cow layers on a shared common backing file that has the file system in it. The naming convention includes the size and fs type or default and is only created once per size-fs combination. So this is a real issue - I think that maybe the eventual change to ext4 needs to be combined with moving away from default in the file name. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] - Revert change of default ephemeral fs to ext4
Hi, so it seems we were saying the same thing - new vms get a shared blank (empty) file system, not a blank disk. How big a problem it is that in many cases this will be the already created ext3 disk and not ext4 depends I guess on how important consistency is to you (to me it's pretty important). Either way the change as it stands won't give all new vms an ext4 fs as intended, so it's flawed in that regard. Like you I was thinking that we may have to move away from default being in the file name to fix this. I don't think the cache clean up code ever removes the ephemeral backing files though at the moment. Phil Sent from Samsung Mobile Original message From: Pádraig Brady p...@draigbrady.com Date: To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org,Day, Phil philip@hp.com Subject: Re: [openstack-dev] [nova] - Revert change of default ephemeral fs to ext4 On 12/30/2013 12:39 AM, Pádraig Brady wrote: On 12/29/2013 08:12 AM, Day, Phil wrote: Hi Folks, As highlighted in the thread "minimal review period for functional changes" I'd like to propose that change https://review.openstack.org/#/c/63209/ is reverted because: - It causes inconsistent behaviour in the system, as any existing default backing files will have ext3 in them, so a VM will now get either ext3 or ext4 depending on whether the host they get created on already has a backing file of the required size or not. I don't think the existing design ever considered the default FS changing - maybe we shouldn't have files that include default as the file system type if it can change over time, and the name should always reflect the FS type. I'm not sure this is a practical issue since ephemeral storage is built up from blank by each instance Phil Maybe it varies by hypervisor; my understanding is that at least in libvirt the ephemeral disks are cow layers on a shared common backing file that has the file system in it. Phil The naming convention includes the size and fs type or default and is only created once per size-fs combination. Phil So this is a real issue - I think that maybe the eventual change to ext4 needs to be combined with moving away from default in the file name. Right, what I meant by each instance building ephemeral up from blank each time, is that each instance will go from either a blank ext3 or blank ext4 each time, so if they support ext4 then there should be no practical issue. Now agreed there is a consistency issue, which could have performance consistency issues for example, so it's not ideal. To be complete, for the libvirt case we don't even use these persistent backing files for ephemeral disks if use_cow_images=False or with LVM backing. To be most general and avoid this consistency issue, I suppose we could change the name format for these cached CoW base images from ephemeral_10_default, ephemeral_20_linux etc. to 'ephemeral_20_' + md5(mkfs_command)[:7] That would impose a performance hit at first boot with this new logic, and we'd have to double check that the cache cleaning logic would handle removing unused older format images. Alternatively we might just document this issue and put the onus on users to clean these cached ephemeral disk on upgrade (the commit is already tagged as DocImpact). thanks, Pádraig. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
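To make the naming suggestion above a bit more concrete, here is a minimal sketch of a cache key derived from the mkfs command rather than the 'default' label. This is purely illustrative (the function name and the exact mkfs command string are made up, not the actual Nova image cache code):

    import hashlib

    def ephemeral_backing_name(size_gb, mkfs_command):
        # Hash the actual mkfs command rather than using the 'default'
        # label, so a change of default filesystem (ext3 -> ext4) yields
        # a new backing file instead of silently reusing the old one.
        suffix = hashlib.md5(mkfs_command.encode('utf-8')).hexdigest()[:7]
        return 'ephemeral_%d_%s' % (size_gb, suffix)

    # e.g. ephemeral_backing_name(20, 'mkfs.ext4 -F -L ephemeral0 %(target)s')
    # -> 'ephemeral_20_' plus a 7-character md5 prefix

As noted in the thread, the cache cleanup logic would still need to be checked to ensure it removes backing files in the old naming format.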
Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility
I wonder if at least part of the problem is that whilst we have prioritisation for bugs (via severity) and blueprints (via approval and target release) that doesn't obviously carry through into gerrit. If it was easier to see which were high and low priority changes it might be easier to decide which need attention and which can / should wait for more input ? At the moment it does feel that a change's chance of getting merged is somewhat random, and we must be able to do better than that. Of course we'd still need to work out how to prioritise changes which have neither a bug nor a bp attached ( or maybe this is part of the argument for not having such changes) Phil Sent from Mobile - spelling will be even worse than normal Original message From: Robert Collins robe...@robertcollins.net Date: To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility On 29 December 2013 21:06, Day, Phil philip@hp.com wrote: What is the minimum review period intended to accomplish? I mean: everyone that reviewed this *knew* it changed a default, and that guest OS's that did support ext3 but don't support ext4 would be broken. My point is that for some type of non-urgent change (i.e. those that change existing behaviour) there needs to be a longer period to make sure that more views and opinions can surface and be taken into account. Maybe all the reviewers in this case did realise the full impact of this change, but that's still not the same thing as getting a wide range of input. This is a change which has some significant impact, and there was no prior discussion as far as I know in the form of a BP or thread in the mailing list. There was also no real urgency in getting the change merged. I disagree that 'longer period' implies 'more views and opinions'. From the nova open reviews stats: 3rd quartile wait time: 17 days, 7 hours, 14 minutes 25% of *all* open nova reviews have had no review at all in 17 days. 3rd quartile wait time: 23 days, 10 hours, 44 minutes 25% of all open nova reviews have had no -1 or -2 in 23 days. I'm not debating the merits of more views and opinions - Sean has pointed out already that automation is better than us having to guess at when things will or won't work. But even if you accept that more views and opinions will help, there are over 100 reviews up with *no* such opinions added already. Let's say that something like the patch that triggered this went up for review, and that we established a one month minimum review period for such patches. There's a 25% chance it would hit 3 weeks with no input at all. The *effective* time then that a one month minimum period would set for it would be a week. Once the volume of reviews needed exceeds a single reviewer's capacity, by definition some reviewers will not see some patches *at all*. At that point it doesn't matter how long a patch waits, it will never hit the front of the line for some reviewers unless we have super strict - and careful - ordering on who reviews what. Which we don't have, and can't get trivially. But even if we do: - time is not a good proxy for attention, care, detail or pretty much any other metric when operating a scaled out human process. What would make a good proxy metric for 'more views and opinions'? I think asking for more cores to +2 such changes would do it. E.g. 
ask for 4 +2's for backward incompatible changes unless they've gone through a release cycle of being deprecated/warned. Would you like to have seen a different judgement call - e.g. 'Because this is a backward breaking change, it has to go through one release of deprecation warning, and *then* can be made' ? Yep, I think that would be appropriate in this case. There is an impact beyond just the GuestOS support that occurred to me since, but I don't want to get this thread bogged down in this specific change so I'll start a new thread for that. My point is that where changes are proposed that affect the behaviour of the system, and especially where they are not urgent (i.e not high priority bug fixes) then we need to slow down the reviews and not assume that all possible views / impacts will surface in a few days. Again, I really disagree with 'need to slow down'. We need to achieve something *different*. As I said, there seems to me to be something wrong with the priority around changes when non-urgent behaviour changes go through in a few days but we have bug fixes sitting with many +1's for over a month. Another possible reason is that we should have a strict no-exceptions-by-default approach to backwards incompatible changes, even when there are config settings to override them. Whatever the nub is - let's surface that and target it. Yep - I think we should have a very clear policy around how
Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility
Hi Sean, and Happy New Year :-) -Original Message- From: Sean Dague [mailto:s...@dague.net] Sent: 30 December 2013 22:05 To: Day, Phil; OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility On 12/29/2013 07:58 PM, Day, Phil wrote: Hi Sean, I'm not convinced the comparison to my clean shut down change is valid here. For sure that proved that beyond a certain point (in that case months) there is no additional value in extending the review period, and no amount of review will catch all problems, but that's not the same as saying that there is no value in a minimum period. The reason I brought up graceful shutdown was that I actually think that review went the opposite direction, and got worse over time. Borrowing from another part of Robert's thread - https://review.openstack.org/#/q/status:open+-Verified-1+- CodeReview%2B2+-CodeReview%2B1+-CodeReview-1+-CodeReview- 2+(project:openstack/nova+OR+project:openstack/python- novaclient)+branch:master,n,z Minimum review time only works if all patches eventually see review, which they don't. I agree that the controlled shutdown didn't improve over time, although I don't think it got worse. It's ironic in a way that the controlled shutdown issue is one which adversely affected the gate tests but wouldn't I think have affected any real workloads (the problem is if you stop an instance immediately after it's been created but before the GuestOS is running then it doesn't see the shutdown signal and so waits for the full 2 minute period) whereas the ext4 change improves the gate tests but breaks some production systems. However whilst I agree that too long is bad, I don't think that's inconsistent with too short also being bad - it seems to me that there is probably some sweet spot between these two extremes that probably also depends on the type of change being proposed. In this particular case if the review had been open for say three weeks then imo the issue would have been caught, as I spotted it as soon as I saw the merge. As it wasn't an urgent bug fix I don't see a major gain from not waiting even if there wasn't a problem. So... then why didn't you spot it on the submit? And would you have found the review on your own if it hadn't been for the merge commit email? If I'd been working during the four days that the patch was open for review I'm confident that I would have spotted it - I make a point of looking out for any changes which might break our production system. It wasn't the merge commit e-mail that made me notice it BTW, I made a point of looking at the recently merged changes in gerrit to see if there was anything significant that I'd missed. But this thread wasn't created to get a review cadence that takes my holiday plans into account, it was more to open a discussion about whether there are different types of review strategies we should adopt to take into account the different types of changes that we now see. There was a time when Nova was a lot simpler, there were very few production deployments, there were a lot less active reviewers, and probably only the cores knew all aspects of the system, and so two +2s and some +1s were enough to ensure a thorough review. I'd suggest that things are different now and it would be worthwhile to identify a few characteristics and see if there isn't scope for a few different types of merge criteria. 
For example (and I'd really like to hear other ideas and suggestions here) perhaps changes could be characterised / prioritised by one or more of the following:
- High priority bug fixes: obvious why these need to be reviewed and merged quickly - two +2s is sufficient
- Approved BPs targeted for a specific release: By being approved the design has had a level of scrutiny, and it's important to consumers of nova that the release roadmap is as reliable as possible, so these do need to get attention. However a lot of design detail often only emerges in the code, so these should be open to more scrutiny. Maybe there should be an additional soft limit of at least 4 or 5 +1's
- Normal bug fixes: Need to keep these ticking through, but should pause to make sure there is a representative range of reviews, for example a soft limit of at least 3 +1's
- Changes in default behaviour: Always likely to affect existing systems in some way. Maybe we should have an additional type of review vote that comes from people who are recognised as representing large production deployments ?
It was urgent from a tripleo perspective, enough so that they were carrying an out of tree patch for it until it merged. Remember, we're trying to get tripleo, eventually, gating. And 45 mins deploy times was a big fix to move that ball forward. That's why I prioritized that. So while it was a low priority change
Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility
Hi Thierry, Thanks for a great summary. I don't really share your view that there is an us vs them attitude emerging between operators and developers (but as someone with a foot in both camps maybe I'm just thinking that because otherwise I'd become even more bi-polar :-) I would suggest though that the criteria for core reviewers is maybe more slanted towards developers than operators, and that it would be worth considering if there is some way to recognise and incorporate the different perspective that operators can provide into the review process. Regards, Phil -Original Message- From: Thierry Carrez [mailto:thie...@openstack.org] Sent: 02 January 2014 09:53 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility Tim Bell wrote: - Changes in default behaviour: Always likely to affect existing systems in some way. Maybe we should have an additional type of review vote that comes from people who are recognised as representing large production deployments ? This is my biggest worry... there are changes which may be technically valid but have a significant disruptive impact on those people who are running clouds. Asking the people who are running production OpenStack clouds to review every patch to understand the risks and assess the migration impact is asking a lot. IMHO there are a few takeaways from this thread... When a proposed patch is known to change default behavior, or break backward compatibility, or cause an upgrade headache, we should definitely be more careful before finally approving the change. We should also have a mechanism to engage with users and operators so that they can weigh in. In the worst case scenario where there is no good solution, at least they are informed that the pain is coming. One remaining question would be... what is that mechanism ? Mail to the general list ? the operators list ? (should those really be two separate lists ?) Some impact tag that upgrade-minded operators can subscribe to ? For the cases where we underestimate the impact of a change, there is no magic bullet. So, like Sean said, we need to continue improving the number of things we cover by automated testing. We also need to continue encouraging a devops culture. When most developers are also people running clouds, we are better at estimating operational impact. I've seen a bit of us vs. them between operators and developers recently, and this is a dangerous trend. A sysadmin-friendly programming language was picked for OpenStack for a reason: to make sure that operation-minded developers and development-minded operators could all be part of the same game. If we create two separate groups, tension will only get worse. -- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility
Would be nice in this specific example though if the actual upgrade impact was explicitly called out in the commit message. From the DocImpact it looks as if some Neutron config options are changing names - in which case the impact would seem to be that running systems have until the end of this cycle to change the names in their config files. (Is that the point at which the change would need to be made - i.e. if someone is planning an upgrade from H to I they need to make sure they have the new config names in place before the update ?) Looking at the changes highlighted in nova.conf.sample it looks as if a lot more has changed - but I'm guessing this is an artifact of the way the file is generated rather than actual wholesale changes to config options. Either way I'm not sure anyone trying to plan around the upgrade impact should be expected to have to dig into the diff's of the changed files to work out what they need to do, and what time period they have to do it in. So it looks as if UpgradeImpact is really a warning of some change that needs to be considered at some point, but doesn't break a running system just by incorporating this change (since the deprecated names are still supported) - but the subsequent change that will eventually remove the deprecated names is the thing that is the actual upgrade impact (in that once that change is incorporated the system will be broken if some extra action isn't taken). Would both of those changes be tagged as UpgradeImpact ? Should we make some distinction between these two cases ? Phil From: Thierry Carrez [thie...@openstack.org] Sent: 07 January 2014 10:04 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility Matt Riedemann wrote: There is discussion in this thread about wouldn't it be nice to have a tag on commits for changes that impact upgrades?. There is. http://lists.openstack.org/pipermail/openstack-dev/2013-October/016619.html https://wiki.openstack.org/wiki/GitCommitMessages#Including_external_references Here is an example of a patch going through the gate now with UpgradeImpact: https://review.openstack.org/#/c/62815/ The good thing about UpgradeImpact is that it's less subjective than OpsImpact, and I think it catches what matters: backward-incompatible changes, upgrades needing manual intervention (or smart workarounds in packaging), etc. Additional benefit is that it's relevant for more than just the ops population: packagers and the release notes writers also need to track those. -- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
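For illustration, a commit message carrying the tag being discussed might look something like the following (the change described here is entirely made up - only the placement of the UpgradeImpact tag matters; the hook that builds release notes and notifies the lists just greps for the tag in the commit message body):

    Rename the foo_bar option to bar_baz and deprecate the old name

    The old foo_bar setting continues to work for one release and logs a
    deprecation warning; it will be removed in the following release, at
    which point operators must have updated their config files.

    UpgradeImpact
    DocImpact
    Change-Id: I0000000000000000000000000000000000000000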
Re: [openstack-dev] [nova] where to expose network quota
-Original Message- From: Robert Collins [mailto:robe...@robertcollins.net] Sent: 10 January 2014 08:54 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] where to expose network quota On 8 January 2014 03:01, Christopher Yeoh cbky...@gmail.com wrote: On Mon, Jan 6, 2014 at 4:47 PM, Yaguang Tang yaguang.t...@canonical.com ... For the V3 API clients should access neutron directly for quota information. The V3 API will no longer proxy quota related information for neutron. Also novaclient will not get the quota information from neutron, but users should use neutronclient or python-openstackclient instead. The V3 API mode for novaclient will only be accessing Nova - with one big exception for querying glance so images can be specified by name. And longer term I think we need to think about how we share client code amongst clients because I think there will be more cases where its useful to access other servers so things can be specified by name rather than UUID but we don't want to duplicate code in the clients. Also I think we shouldn't change v2 for this. -Rob If you mean we shouldn't fix the V2 API to report Neutron quotas (rather that we shouldn't change the V2 api to remove network quotas) then I disagree - currently the V2 API contains information on network quotas, and can be used on systems configured for either nova-network or Neutron. It should provide the same consistent information regardless of the network backend configured - so it's a bug that the V2 API doesn't provide network quotas when using neutron. I know we want to deprecate the V2 API but it will still be around for a while - and in the meantime if people want to put the effort into working on bug fixes then that should still be allowed. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] Next steps for Whole Host allocation / Pclouds
Hi Folks, The original (and fairly simple) driver behind whole-host-allocation (https://wiki.openstack.org/wiki/WholeHostAllocation) was to enable users to get guaranteed isolation for their instances. This then grew somewhat along the lines of If they have in effect a dedicated host then wouldn't it be great if the user could also control some aspect of the scheduling, access for other users, etc. The Proof of Concept I presented at the Icehouse Design summit provided this with API extensions to in effect manipulate an aggregate and scheduler filters used with that aggregate. https://etherpad.openstack.org/p/NovaIcehousePclouds Based on the discussion and feedback from the design summit session it became clear that this approach was kind of headed into a difficult middle ground between a very simple approach for users who just wanted the isolation for their instances, and a fully delegated admin model which would allow any admin operation to be scoped to a specific set of servers/flavours/instances. I've spent some time since mulling over what it would take to add some kind of scoped admin capability into Nova, and my current thinking is that it would be a pretty big change because there isn't really a concept of ownership once you get beyond instances and a few related objects. Also with TripleO it's becoming easier to set up new copies of a Nova stack to control a specific set of hosts, and that in effect provides the same degree of scoped admin in a much more direct way. The sort of model I'm thinking of here is a system where services such as Glance/Cinder and maybe Neutron are shared by a number of Nova services. There are still a couple of things needed to make this work, such as limiting tenant access to regions on Keystone, but that feels like a better layer to try and address this kind of issue. In terms of the original driver of just guaranteeing instance isolation then we could (as suggested by Alex Gilkson and others) implement this just as a new instance property with an appropriate scheduler filter (i.e. for this type of instance only allow scheduling to hosts that are either empty or running only instances for the same tenant). The attribute would then be passed through in notification messages, etc for the billing system to process. This would be pretty much the peer of AWS dedicated instances. The host_state object already has the required num_instances_by_project data required by the scheduler filter, and the stats field in the compute manager resource tracker also has this information - so both the new filter and additional limits check on the compute manager look like they would be fairly straightforward to implement. It's kind of beyond the scope of Nova, but the resulting billing model in this case is more complex - as the user isn't telling you explicitly how many dedicated hosts they are going to consume. AWS just charge a flat rate per region for having any number of dedicated instances - if you wanted to charge per dedicated host then it'd be difficult to warn the user before they create a new instance that they are about to branch onto a new host. Would welcome thoughts on the above, Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
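To make the "new instance property plus scheduler filter" idea a little more concrete, here is a rough sketch of what such a filter could look like. This is purely illustrative, not existing Nova code: the filter name and the 'dedicated_instance' flag in the request spec are assumptions, and it relies on the num_instances_by_project data that the mail above describes as already being tracked on host_state:

    from nova.scheduler import filters


    class TenantExclusiveHostFilter(filters.BaseHostFilter):
        """Sketch: only pass hosts that are empty or already dedicated
        to the requesting tenant when the instance asks for isolation.
        """

        def host_passes(self, host_state, filter_properties):
            spec = filter_properties.get('request_spec', {})
            props = spec.get('instance_properties', {})
            if not props.get('dedicated_instance'):
                # Instance didn't ask for isolation, so any host will do.
                return True
            project_id = props.get('project_id')
            # Per-tenant instance counts on this host (assumed present,
            # as described in the thread above).
            counts = getattr(host_state, 'num_instances_by_project', {}) or {}
            # Pass if the host is empty or only runs this tenant's instances.
            return all(p == project_id or n == 0 for p, n in counts.items())

A matching check in the compute manager's claim/limits path would be needed to close the race between the scheduler decision and instance creation, as with other scheduler-enforced limits.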
Re: [openstack-dev] Next steps for Whole Host allocation / Pclouds
So, I actually don't think the two concepts (reservations and isolated instances) are competing ideas. Isolated instances are actually not reserved. They are simply instances that have a condition placed on their assignment to a particular compute node that the node must only be hosting other instances of one or more specified projects (tenants). I got your idea. This filter [1] already does most of the work, although it relies on aggregates and requires admin management. The main issue with isolated instances is that it requires kind of capacity planning for making sure you can cope with the load, that's why we placed the idea of having such a placement scheduler. [1] : https://github.com/openstack/nova/blob/master/nova/scheduler/filters/aggregate_multitenancy_isolation.py Right, the difference between that and my proposed solution would be there would be no dependency on any aggregate at all. I do understand your point about capacity planning in light of such scheduling functionality -- due to the higher likelihood that compute nodes would be unable to service a more general workload from other tenants. But I believe that the two concerns can be tackled separately. Exactly - that's why I wanted to start this debate about the way forward for the Pcloud Blueprint, which was heading into some kind of middle ground. As per my original post, and it sounds like the three of us are at least aligned, I'm proposing to split this into two streams:
i) A new BP that introduces the equivalent of AWS dedicated instances.
User - Only has to specify at boot time that the instance must be on a host used exclusively by that tenant.
Scheduler - either finds a host which matches this constraint or it doesn't. No linkage to aggregates (other than that from other filters), no need for the aggregate to have been pre-configured.
Compute Manager - has to check the constraint (as with any other scheduler limit) and add the info that this is a dedicated instance to notification messages.
Operator - has to manage capacity as they do for any other such constraint (it is a significant capacity mgmt issue, but no worse in my mind than having flavors that can consume most of a host), and work out how they want to charge for such a model (flat rate additional charge for first such instance, charge each time a new host is used, etc).
I think there is clear water between this and the existing aggregate based isolation. I also think this is a different use case from reservations. It's *mostly* like a new scheduler hint, but because it has billing impacts I think it needs to be more than just that - for example the ability to request a dedicated instance is something that should be controlled by a specific role.
ii) Leave the concept of private clouds within a cloud to something that can be handled at the region level. I think there are valid use cases here, but it doesn't make sense to try and get this kind of granularity within Nova. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Climate - Was: Next steps for Whole Host allocation / Pclouds
Hi Phil and Jay, Phil, maybe you remember I discussed with you about the possibility of using pclouds with Climate, but we finally ended up using Nova aggregates and a dedicated filter. That works pretty fine. We don't use instance_properties but rather aggregate metadata but the idea remains the same for isolation. Sure do, and I had a question around that which has been buzzing in my head for a while now. I can see how you can use an aggregate as a way of isolating the capacity of some specific hosts (Pclouds was pretty much doing the same thing - it was in effect an abstraction layer to surface aggregates to users), and I can see that you can then plan how to use that capacity against a list of reservations. It does though seem that you're confined to working on some subset of the physical hosts, which I'd have thought could become quite restrictive in some cases and hard to optimize for capacity. (if for example a user wants to combine reservations with anti-affinity then you'd need to have a larger pool of hosts to work with). It sort of feels to me that a significant missing piece in having a reservation system for Nova is that there is no matching concept within Nova of the opposite of a reservation - a spot instance (i.e an instance which the user gets for a lower price in return for knowing it can be deleted by the system if the capacity is needed for another higher-priority request - e.g. a reservation). If we had a concept of spot instances in Nova, and the corresponding process to remove them, then the capacity demands of reservations could be balanced by the amount of spot-instance usage in the system (and this would seem a good role for an external controller). I'm wondering if managing spot instances and reservations across the whole of a Nova system wouldn't be a more general use case than having to manage this within a specific aggregate - or am I missing something ? Cheers, Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Next steps for Whole Host allocation / Pclouds
I think there is clear water between this and the existing aggregate based isolation. I also think this is a different use case from reservations. It's *mostly* like a new scheduler hint, but because it has billing impacts I think it needs to be more than just that - for example the ability to request a dedicated instance is something that should be controlled by a specific role. I agree with that, that's another scheduler filter with extra scheduler hint, plus a notification message on the AMQP queue thanks to an handler. That's not role of Nova but Ceilometer to handle billable items and meters consolidation. That said, AWS dedicated instances are backed by VPC, so that's not fairly identical from what you propose here. Here the proposal is more likely making me thinking of AWS reserved instances without a contractual period. IMHO, this model is interesting but hard to use for operators, because they don't have visibility on the capacity. Anyway, if Nova would provide this feature, Climate would be glad using it (and on a personal note, I would be glad contributing to it). I think VPC in AWS is more of a network construct that anything to do with specific hosts - (i.e the difference between running nova-net or neutron in flat mode vs vlan or vxlan mode). I agree that it is hard for an operator to work out how to charge for this - in effect it comes down to some sort of statistical model to work out what additional premium you need to charge to recover the cost of the capacity that is now not usable by other tenants. This kind of thing becomes easier at scale than it is for very small systems. So some/many operators may decide that they don't want to offer it (which is why it needs to be a specific feature in my mind, even if that does mean a minor bump to the API version in V3 and a new extension to provide the new option in V2 - sorry Jay). It probably even merits its own specific quota value (max number of dedicated instances). I'm not sure that its really that much harder to manage than any other capacity issue though - for example if you define a flavor that occupies ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Next steps for Whole Host allocation / Pclouds
-Original Message- From: Khanh-Toan Tran [mailto:khanh-toan.t...@cloudwatt.com] Sent: 21 January 2014 14:21 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] Next steps for Whole Host allocation / Pclouds Exactly - that's why I wanted to start this debate about the way forward for the Pcloud Blueprint, which was heading into some kind of middle ground. As per my original post, and it sounds like the three of us are at least aligned, I'm proposing to split this into two streams: i) A new BP that introduces the equivalent of AWS dedicated instances. Why do you want to transform pCloud into AWS dedicated instances? As I see it, pCloud is for requesting physical hosts (HostFlovors as in pcloud wiki) on which users can create their own instances (theoretically in unlimited number). Therefore it should be charged per physical server (HostFlavor), not by instances. It is completely different from AWS dedicated instances which is charged per instance. IMO, pcloud resembles Godrid Dedicated Server, not AWS Dedicated Instance. If you want to provide AWS dedicated instances typed service, then it would not be Pcloud, nor it is a continuation of the WholeHostAllocation blueprint, which, IMO, is damned well designed. Thank you ;-) I probably didn't explain it very well, but I wasn't trying to say that dedicated instances were a complete replacement for pClouds - more that as a simpler concept they would provide one of the use cases that originally drove pClouds in a much simpler form. Based on the feedback I got the problem with the more general pClouds scope as it currently stands is that it's somewhere between a VM isolation model and fully delegated control of a specific set of hosts, and as such that doesn't really feel like a tenable place to end up. As a simple VM isolation model (which is where I made the comparison with dedicated instances) it's more complex than it needs to be. As a way of allowing a user to manage some set of hosts its fine for allocation/deallocation and scheduling - and if that was the full set of operations that were ever going to be needed then maybe it would be fine. But as soon as you start to look at the other operations that folks want to really deliver a cloud within a cloud type concept (specific scheduler config, control placement, define and manage flavors, etc) I think you'd end up replicating large parts of the existing code. An alternative is to extend the roles model within Nova somehow so that roles can be scoped to a specific aggregate or set of aggregates, but that's a pretty big change from where we are and would only ever cover Nova. So I came round to thinking that the better way to have that kind of delegated control is to actually set up separate Nova's each covering the hosts that you want to delegate and sharing other services like Glance, Cinder, and Neutron - esp as the promise of TripleO is that it's going to make this much easier to do. If there's value in just keeping pClouds as a host allocation feature, and not trying to go any further into the delegated admin model than the few simple features already included in the PoC then that's also useful feedback. It'll be just another scheduler job. Well, I did not say that it's not worth pursuing; I just say that WholeHostAllocation is worth being kept pcloud. User - Only has to specify at boot time that the instance must be on a host used exclusively by that tenant. Scheduler - either finds a host which matches this constraint or it doesn't. 
No linkage to aggregates (other than that from other filters), no need for the aggregate to have been pre-configured Compute Manager - has to check the constraint (as with any other scheduler limit) and add the info that this is a dedicated instance to notification messages Operator - has to manage capacity as they do for any other such constraint (it is a significant capacity mgmt issue, but no worse in my mind that having flavors that can consume most of a host) , and work out how they want to charge for such a model (flat rate additional charge for first such instance, charge each time a new host is used, etc). How about using migration for releasing compute hosts for new allocation? In standard configuration, admin would use LoadBalancing for his computes. Thus if we don't have a dedicated resources pool (this comes back to aggregate configuration), then all hosts would be used, which leaves no host empty for hosting dedicated instances. In either case the cloud operator has to do a degree of capacity management. Dedicated instances (as a simple scheduler feature) are unlikely to work with spreading configuration. On the other hand with pClouds the operator also has to maintain an explicit free pool of hosts, and again with a spread allocator they're
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. I think in most cases transfer cost is worth it to know you're deploying what you tested. Also it is pretty easy to just do this optimization but still be rsyncing the contents of the image. Instead of downloading the whole thing we could have a box expose the mounted image via rsync and then all of the machines can just rsync changes. Also rsync has a batch mode where if you know for sure the end-state of machines you can pre-calculate that rsync and just ship that. Lots of optimization possible that will work fine in your just- update-one-file scenario. But really, how much does downtime cost? How much do 10Gb NICs and switches cost? It's not as simple as just saying buy better hardware (although I do have a vested interest in that approach ;-) - on a compute node the Network and Disk bandwidth is already doing useful work for paying customers. The more overhead you put into that for updates, the more disruptive it becomes. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] our update story: can people live with it?
On 01/22/2014 12:17 PM, Dan Prince wrote: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? FWIW, I agree that this is going to be considered unacceptable by most people. Hopefully everyone is on the same page with that. It sounds like that's the case so far in this thread, at least... If you have to reboot the compute node, ideally you also have support for live migrating all running VMs on that compute node elsewhere before doing so. That's not something you want to have to do for *every* little change to *every* compute node. Yep, my reading is the same as yours Russell, everyone agreed that there needs to be an update that avoids the reboot where possible (other parts of the thread seem to be focused on how much further the update can be optimized). What's not clear to me is when the plan is to have that support in TripleO - I tried looking for a matching Blueprint to see if it was targeted for Icehouse but can't match it against the five listed. Perhaps Rob or Clint can clarify ? Feels to me that this is a must have before anyone will really be able to use TripleO beyond a PoC for initial deployment. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Why Nova should fail to boot if there are only one private network and one public network ?
Hi Sylvain, The change only makes the user have to supply a network ID if there is more than one private network available (and the issue there is that otherwise the assignment order in the Guest is random, which normally leads to all sorts of routing problems). I'm running a standard Devstack with Neutron (built from trunk a couple of days ago), can see both a private and public network, and can boot VMs without having to supply any network info:
$ neutron net-list
+--------------------------------------+---------+---------------------------------------------------+
| id                                   | name    | subnets                                           |
+--------------------------------------+---------+---------------------------------------------------+
| 16f659a8-6953-4ead-bba5-abf8081529a5 | public  | a94c6a9d-bebe-461b-b056-fed281063bc0              |
| 335113bf-f92f-4249-8341-45cdc9d781bf | private | 51b97cde-d06a-4265-95aa-d9165b7becd0 10.0.0.0/24  |
+--------------------------------------+---------+---------------------------------------------------+
$ nova boot --image cirros-0.3.1-x86_64-uec --flavor m1.tiny phil
+--------------------------------------+----------------------------------------------------------------+
| Property                             | Value                                                          |
+--------------------------------------+----------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                         |
| OS-EXT-AZ:availability_zone          | nova                                                           |
| OS-EXT-STS:power_state               | 0                                                              |
| OS-EXT-STS:task_state                | scheduling                                                     |
| OS-EXT-STS:vm_state                  | building                                                       |
| OS-SRV-USG:launched_at               | -                                                              |
| OS-SRV-USG:terminated_at             | -                                                              |
| accessIPv4                           |                                                                |
| accessIPv6                           |                                                                |
| adminPass                            | DaX2mcPnEK9U                                                   |
| config_drive                         |                                                                |
| created                              | 2014-01-24T13:11:30Z                                           |
| flavor                               | m1.tiny (1)                                                    |
| hostId                               |                                                                |
| id                                   | 34210c19-7a4f-4438-b376-6e65722b4bd6                           |
| image                                | cirros-0.3.1-x86_64-uec (8ee8f7af-1327-4e28-a0bd-1701e04a6ba7) |
| key_name                             | -                                                              |
| metadata                             | {}                                                             |
| name                                 | phil                                                           |
| os-extended-volumes:volumes_attached | []                                                             |
| progress                             | 0                                                              |
| security_groups                      | default                                                        |
| status                               | BUILD                                                          |
| tenant_id                            | cc6258c6a4f34bd1b79e90f41bec4726                               |
| updated                              | 2014-01-24T13:11:30Z                                           |
| user_id                              | 3a497f5e004145d494f80c0c9a81567c                               |
+--------------------------------------+----------------------------------------------------------------+
$ nova list
+--------------------------------------+------+--------+------------+-------------+------------------+
| ID                                   | Name | Status | Task State | Power State | Networks         |
+--------------------------------------+------+--------+------------+-------------+------------------+
| 34210c19-7a4f-4438-b376-6e65722b4bd6 | phil | ACTIVE | -          | Running     | private=10.0.0.5 |
+--------------------------------------+------+--------+------------+-------------+------------------+
From: Sylvain Bauza [mailto:sylvain.ba...@bull.net] Sent: 23 January 2014 09:58 To:
Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service
Hi Justin, I can see the value of this, but I'm a bit wary of the metadata service extending into a general API - for example I can see this extending into a debate about what information needs to be made available about the instances (would you always want all instances exposed, all details, etc) - if not we'd end up starting to implement policy restrictions in the metadata service and starting to replicate parts of the API itself. Just seeing instances launched before me doesn't really help if they've been deleted (but are still in the cached values) does it ? Since there is some external agent creating these instances, why can't that just provide the details directly as user defined metadata ? Phil From: Justin Santa Barbara [mailto:jus...@fathomdb.com] Sent: 23 January 2014 16:29 To: OpenStack Development Mailing List Subject: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service Would appreciate feedback / opinions on this blueprint: https://blueprints.launchpad.net/nova/+spec/first-discover-your-peers The idea is: clustered services typically run some sort of gossip protocol, but need to find (just) one peer to connect to. In the physical environment, this was done using multicast. On the cloud, that isn't a great solution. Instead, I propose exposing a list of instances in the same project, through the metadata service. In particular, I'd like to know if anyone has other use cases for instance discovery. For peer-discovery, we can cache the instance list for the lifetime of the instance, because it suffices merely to see instances that were launched before me. (peer1 might not join to peer2, but peer2 will join to peer1). Other use cases are likely much less forgiving! Justin ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova]Why not allow to create a vm directly with two VIF in the same network
I agree it's oddly inconsistent (you'll get used to that over time ;-) - but to me it feels more like the validation is missing on the attach than that the create should allow two VIFs on the same network. Since these are both virtualised (i.e share the same bandwidth, don't provide any additional resilience, etc) I'm curious about why you'd want two VIFs in this configuration ? From: shihanzhang [mailto:ayshihanzh...@126.com] Sent: 24 January 2014 03:22 To: openstack-dev@lists.openstack.org Subject: [openstack-dev] [nova]Why not allow to create a vm directly with two VIF in the same network I am a beginner with nova, and there is a problem which has confused me: in the latest version it is not allowed to create a vm directly with two VIFs in the same network, but it is allowed to attach a VIF whose network is the same as an existing VIF's network. So there is a use case for a vm with two VIFs in the same network - why not allow the vm to be created directly with two VIFs in the same network? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
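The kind of validation I have in mind on the attach path is very small - roughly something like the following sketch (illustrative only; the function and field names are made up and this is not the actual Nova attach_interface code):

    def check_duplicate_network(existing_ports, requested_network_id):
        # existing_ports: the instance's current ports (e.g. as returned
        # by the network API); each is assumed to carry a 'network_id'.
        attached = set(port['network_id'] for port in existing_ports)
        if requested_network_id in attached:
            raise ValueError("Instance already has an interface on "
                             "network %s" % requested_network_id)

i.e. make attach reject a second VIF on a network the instance is already connected to, so it matches the behaviour of the create path.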
Re: [openstack-dev] [Nova] Why Nova should fail to boot if there are only one private network and one public network ?
Hi Sylvain, Thanks for the clarification, I'd missed that it was where the public network belonged to the same tenant (it's not a use case we run with). So I can see that option [1] would make the validation work by (presumably) not including the shared network in the list of networks, but looking further into the code allocate_for_instance() uses the same call to decide which networks it needs to create ports for, and from what I can see it would attach the instance to both networks. https://github.com/openstack/nova/blob/master/nova/network/neutronv2/api.py#L244 However that feels like the same problem that the patch was originally trying to fix, in that the network order isn't controlled by the user, and many Guest OS's will only configure the first NIC they are presented with. The idea was that in this case the user needs to explicitly specify the networks in the order that they want them to be attached to. Am I still missing something ? Cheers, Phil From: Sylvain Bauza [mailto:sylvain.ba...@bull.net] Sent: 24 January 2014 14:02 To: OpenStack Development Mailing List (not for usage questions) Cc: Day, Phil Subject: Re: [openstack-dev] [Nova] Why Nova should fail to boot if there are only one private network and one public network ? Hi Phil, On 24/01/2014 14:13, Day, Phil wrote: Hi Sylvain, The change only makes the user have to supply a network ID if there is more than one private network available (and the issue there is that otherwise the assignment order in the Guest is random, which normally leads to all sorts of routing problems). I'm sorry, but the query also includes shared (so, public) networks from the same tenant. See [1]. I'm running a standard Devstack with Neutron (built from trunk a couple of days ago), can see both a private and public network, and can boot VMs without having to supply any network info: Indeed, that does work because Devstack is smart enough for creating the two networks with distinct tenant_ids. See [2] as a proof :-) If someone is building a private and a public network *on the same tenant*, it will fail to boot. Apologies if I was unclear. So, the question is: what shall I do for changing this ? There are 2 options for me: 1. Add an extra param to _get_available_networks : shared=True and only return shared networks if the param is set to True (so we keep compatibility with all the calls) 2. Parse the nets dict here [3] to expurge the shared networks when len(nets) > 1. That's simple but potentially a performance issue, as it's O(N). I would personally vote for #1 and I'm ready to patch. By the way, the test case needs also to be updated [4]. -Sylvain [1] https://github.com/openstack/nova/blob/master/nova/network/neutronv2/api.py#L127 [2] : http://paste.openstack.org/show/61819/ [3] : https://github.com/openstack/nova/blob/master/nova/network/neutronv2/api.py#L528 [4] : https://github.com/openstack/nova/blob/master/nova/tests/network/test_neutronv2.py#L1028 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
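For reference, a rough sketch of what option #1 above might look like - purely illustrative, not the actual patch, and the real _get_available_networks signature and Neutron client plumbing may well differ:

    from nova.network import neutronv2


    def get_available_networks(context, project_id, net_ids=None, shared=True):
        client = neutronv2.get_client(context)
        # Networks owned by the tenant itself.
        nets = client.list_networks(
            tenant_id=project_id, shared=False).get('networks', [])
        if shared:
            # Shared (e.g. public) networks are only added when the caller
            # asks for them, so the "more than one network" check at boot
            # time could be made to count private networks only.
            nets += client.list_networks(shared=True).get('networks', [])
        if net_ids:
            nets = [n for n in nets if n['id'] in net_ids]
        return nets

The open question raised above still applies: allocate_for_instance() would need to keep asking for shared networks (shared=True) or it would stop attaching the instance to the public network at all, which just moves the ordering problem around.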
Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service
systems, but it does enable e.g. just launching N instances in one API call, or just using an auto-scaling group. I suspect the configuration management systems would prefer this to having to implement this themselves. (Example JSON below) Justin --- Example JSON:
[
  {
    "availability_zone": "nova",
    "network_info": [
      {
        "id": "e60bbbaf-1d2e-474e-bbd2-864db7205b60",
        "network": {
          "id": "f2940cd1-f382-4163-a18f-c8f937c99157",
          "label": "private",
          "subnets": [
            {
              "cidr": "10.11.12.0/24",
              "ips": [
                {"address": "10.11.12.4", "type": "fixed", "version": 4}
              ],
              "version": 4
            },
            {"cidr": null, "ips": [], "version": null}
          ]
        }
      }
    ],
    "reservation_id": "r-44li8lxt",
    "security_groups": [{"name": "default"}],
    "uuid": "2adcdda2-561b-494b-a8f6-378b07ac47a4"
  },
  ... (the above is repeated for every instance) ...
]
On Fri, Jan 24, 2014 at 8:43 AM, Day, Phil philip@hp.com wrote: Hi Justin, I can see the value of this, but I'm a bit wary of the metadata service extending into a general API - for example I can see this extending into a debate about what information needs to be made available about the instances (would you always want all instances exposed, all details, etc) - if not we'd end up starting to implement policy restrictions in the metadata service and starting to replicate parts of the API itself. Just seeing instances launched before me doesn't really help if they've been deleted (but are still in the cached values) does it ? Since there is some external agent creating these instances, why can't that just provide the details directly as user defined metadata ? Phil From: Justin Santa Barbara [mailto:jus...@fathomdb.com] Sent: 23 January 2014 16:29 To: OpenStack Development Mailing List Subject: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service Would appreciate feedback / opinions on this blueprint: https://blueprints.launchpad.net/nova/+spec/first-discover-your-peers The idea is: clustered services typically run some sort of gossip protocol, but need to find (just) one peer to connect to. In the physical environment, this was done using multicast. On the cloud, that isn't a great solution. Instead, I propose exposing a list of instances in the same project, through the metadata service. In particular, I'd like to know if anyone has other use cases for instance discovery. For peer-discovery, we can cache the instance list for the lifetime of the instance, because it suffices merely to see instances that were launched before me. (peer1 might not join to peer2, but peer2 will join to peer1). Other use cases are likely much less forgiving! Justin ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service
Good points - thank you. For arbitrary operations, I agree that it would be better to expose a token in the metadata service, rather than allowing the metadata service to expose unbounded amounts of API functionality. We should therefore also have a per-instance token in the metadata, though I don't see Keystone getting the prerequisite IAM-level functionality for two+ releases (?). I can also see that in Neutron not all instances have access to the API servers, so I'm not against having something in metadata provided it's well-focused. ... In terms of information exposed: An alternative would be to try to connect to every IP in the subnet we are assigned; this blueprint can be seen as an optimization on that (to avoid DDOS-ing the public clouds). Well if you're on a Neutron private network then you'd only be DDOS-ing yourself. In fact I think Neutron allows broadcast and multicast on private networks, and as nova-net is going to be deprecated at some point I wonder if this is reducing to a corner case ? So I've tried to expose only the information that enables directed scanning: availability zone, reservation id, security groups, network ids, labels, cidrs, IPs [example below]. A naive implementation will just try every peer; a smarter implementation might check the security groups to try to filter it, or the zone information to try to connect to nearby peers first. Note that I don't expose e.g. the instance state: if you want to know whether a node is up, you have to try connecting to it. I don't believe any of this information is at all sensitive, particularly not to instances in the same project. Does it really need all of that - it seems that the IP address would really be enough and the agents or whatever in the instance could take it from there ? What worried me most, I think, is that if we make this part of the standard metadata then everyone would get it, and that raises a couple of concerns: - Users with lots of instances (say 1000's) but who weren't trying to run any form of discovery would start getting a lot more metadata returned, which might cause performance issues - Some users might be running instances on behalf of customers (consider say a PaaS type service where the user gets access into an instance but not to the Nova API). In that case I wouldn't want one instance to be able to discover these kinds of details about other instances. So it kind of feels to me that this should be some other specific set of metadata that instances can ask for, and that instances have to explicitly opt into. We already have a mechanism now where an instance can push metadata as a way of Windows instances sharing their passwords - so maybe this could build on that somehow - for example each instance pushes the data it's willing to share with other instances owned by the same tenant ? On external agents doing the configuration: yes, they could put this into user defined metadata, but then we're tied to a configuration system. We have to get 20 configuration systems to agree on a common format (Heat, Puppet, Chef, Ansible, SaltStack, Vagrant, Fabric, all the home-grown systems!) It also makes it hard to launch instances concurrently (because you want node #2 to have the metadata for node #1, so you have to wait for node #1 to get an IP). Well you've kind of got to agree on a common format anyway haven't you if the information is going to come from metadata ? But I get your other points. 
More generally though, I have in mind a different model, which I call 'configuration from within' (as in 'truth comes from within'). I don't want a big imperialistic configuration system that comes and enforces its view of the world onto primitive machines. I want a smart machine that comes into existence, discovers other machines and cooperates with them. This is the Netflix pre-baked AMI concept, rather than the configuration management approach. The blueprint does not exclude 'imperialistic' configuration systems, but it does enable e.g. just launching N instances in one API call, or just using an auto-scaling group. I suspect the configuration management systems would prefer this to having to implement this themselves. Yep, I get the concept, and metadata does seem like the best existing mechanism to do this as its already available to all instances regardless of where they are on the network, and it's a controlled interface. I'd just like to see it separate from the existing metadata blob, and on an opt-in basis. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service
What worried me most, I think, is that if we make this part of the standard metadata then everyone would get it, and that raises a couple of concerns: - Users with lots of instances (say 1000's) but who weren't trying to run any form of discovery would start getting a lot more metadata returned, which might cause performance issues The list of peers is only returned if the request comes in for peers.json, so there's no growth in the returned data unless it is requested. Because of the very clear instructions in the comment to always pre-fetch data, it is always pre-fetched, even though it would make more sense to me to fetch it lazily when it was requested! Easy to fix, but I'm obeying the comment because it was phrased in the form of a grammatically valid sentence :-) Ok, thanks for the clarification - I'd missed that this was a new json object, I thought you were just adding the data onto the existing object. - Some users might be running instances on behalf of customers (consider say a PaaS type service where the user gets access into an instance but not to the Nova API. In that case I wouldn't want one instance to be able to discover these kinds of details about other instances. Yes, I do think this is a valid concern. But, there is likely to be _much_ more sensitive information in the metadata service, so anyone doing this is hopefully blocking the metadata service anyway. On EC2 with IAM, or if we use trusts, there will be auth token in there. And not just for security, but also because if the PaaS program is auto-detecting EC2/OpenStack by looking for the metadata service, that will cause the program to be very confused if it sees the metadata for its host! Currently the metadata service only returns information for the instance that is requesting it (the Neutron proxy validates the source address and project), so the concern around sensitive information is already mitigated.But if we're now going to return information about other instances that changes the picture somewhat. We already have a mechanism now where an instance can push metadata as a way of Windows instances sharing their passwords - so maybe this could build on that somehow - for example each instance pushes the data its willing to share with other instances owned by the same tenant ? I do like that and think it would be very cool, but it is much more complex to implement I think. I don't think its that complicated - just needs one extra attribute stored per instance (for example into instance_system_metadata) which allows the instance to be included in the list It also starts to become a different problem: I do think we need a state-store, like Swift or etcd or Zookeeper that is easily accessibly to the instances. Indeed, one of the things I'd like to build using this blueprint is a distributed key-value store which would offer that functionality. But I think that having peer discovery is a much more tightly defined blueprint, whereas some form of shared read-write data-store is probably top-level project complexity. Isn't the metadata already in effect that state-store ? I'd just like to see it separate from the existing metadata blob, and on an opt-in basis Separate: is peers.json enough? I'm not sure I'm understanding you here. Yep - that ticks the separate box. Opt-in: IMHO, the danger of our OpenStack everything-is-optional-and- configurable approach is that we end up in a scenario where nothing is consistent and so nothing works out of the box. 
I'd much rather hash-out an agreement about what is safe to share, even if that is just IPs, and then get to the point where it is globally enabled. Would you be OK with it if it was just a list of IPs? I still think that would cause problems for PaaS services that abstract the users away from direct control of the instance (i.e. the PaaS service is the Nova tenant, and creates instances in that tenant that are then made available to individual users). At the moment the only data such a user can see even from metadata are details of their own instance. Extending that to allow discovery of other instances in the same tenant still feels to me to be something that needs to be controllable. The number of instances that want / need to be able to discover each other is a subset of all instances, so making those explicitly declare themselves to the metadata service (when they already have to have the logic to get peers.json) doesn't sound like a major additional complication to me. Cheers, Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service
-Original Message- From: Justin Santa Barbara [mailto:jus...@fathomdb.com] Sent: 28 January 2014 20:17 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service Thanks John - combining with the existing effort seems like the right thing to do (I've reached out to Claxton to coordinate). Great to see that the larger issues around quotas / write-once have already been agreed. So I propose that sharing will work in the same way, but some values are visible across all instances in the project. I do not think it would be appropriate for all entries to be shared this way. A few options: 1) A separate endpoint for shared values 2) Keys are shared iff e.g. they start with a prefix, like 'peers_XXX' 3) Keys are set the same way, but a 'shared' parameter can be passed, either as a query parameter or in the JSON. I like option #3 the best, but feedback is welcome. I think I will have to store the value using a system_metadata entry per shared key. I think this avoids issues with concurrent writes, and also makes it easier to have more advanced sharing policies (e.g. when we have hierarchical projects) Thank you to everyone for helping me get to what IMHO is a much better solution than the one I started with! Justin I think #1 or #3 would be fine. I don't really like #2 - doing this kind of thing through naming conventions always leads to problems IMO. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service
) +else: +result[instance['key_name']] = [line] +return result + def get_metadata(self, ip): i = self.get_instance_by_ip(ip) +mpi = self._get_mpi_data(i['project_id']) if i is None: return None if i['key_name']: @@ -135,7 +148,8 @@ class CloudController(object): 'public-keys' : keys, 'ramdisk-id': i.get('ramdisk_id', ''), 'reservation-id': i['reservation_id'], -'security-groups': i.get('groups', '') +'security-groups': i.get('groups', ''), +'mpi': mpi } } if False: # TODO: store ancestor ids On Tue, Jan 28, 2014 at 4:38 AM, John Garbutt j...@johngarbutt.com wrote: On 27 January 2014 14:52, Justin Santa Barbara jus...@fathomdb.com wrote: Day, Phil wrote: We already have a mechanism now where an instance can push metadata as a way of Windows instances sharing their passwords - so maybe this could build on that somehow - for example each instance pushes the data its willing to share with other instances owned by the same tenant ? I do like that and think it would be very cool, but it is much more complex to implement I think. I don't think its that complicated - just needs one extra attribute stored per instance (for example into instance_system_metadata) which allows the instance to be included in the list Ah - OK, I think I better understand what you're proposing, and I do like it. The hardest bit of having the metadata store be full read/write would be defining what is and is not allowed (rate-limits, size-limits, etc). I worry that you end up with a new key-value store, and with per-instance credentials. That would be a separate discussion: this blueprint is trying to provide a focused replacement for multicast discovery for the cloud. But: thank you for reminding me about the Windows password though... It may provide a reasonable model: We would have a new endpoint, say 'discovery'. An instance can POST a single string value to the endpoint. A GET on the endpoint will return any values posted by all instances in the same project. One key only; name not publicly exposed ('discovery_datum'?); 255 bytes of value only. I expect most instances will just post their IPs, but I expect other uses will be found. If I provided a patch that worked in this way, would you/others be on- board? I like that idea. Seems like a good compromise. I have added my review comments to the blueprint. We have this related blueprints going on, setting metadata on a particular server, rather than a group: https://blueprints.launchpad.net/nova/+spec/metadata-service-callback s It is limiting things using the existing Quota on metadata updates. It would be good to agree a similar format between the two. John ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
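For illustration, the POST-one-value / GET-everyone's-values model converged on above might be consumed from inside a guest roughly like this (a sketch only; the 'discovery' path, payload limits and response format are all still proposals at this point, not a shipped interface):

import json
import urllib2

DISCOVERY_URL = 'http://169.254.169.254/openstack/latest/discovery'  # assumed path

def publish(value):
    # POST a single short string (e.g. our IP); the proposal caps this at
    # one key and roughly 255 bytes per instance.
    urllib2.urlopen(urllib2.Request(DISCOVERY_URL, data=value))

def peers():
    # GET returns whatever the other instances in the same project published.
    return json.load(urllib2.urlopen(DISCOVERY_URL))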
Re: [openstack-dev] about the bp cpu-entitlement
Hi, There were a few related blueprints which were looking to add various additional types of resource to the scheduler - all of which will now be implemented on top of a new generic mechanism covered by: https://blueprints.launchpad.net/nova/+spec/extensible-resource-tracking -Original Message- From: sahid [mailto:sahid.ferdja...@cloudwatt.com] Sent: 04 February 2014 09:24 To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [nova] about the bp cpu-entitlement Greetings, I saw a really interesting blueprint about cpu entitlement, it will be targeted for icehouse-3 and I would like to get some details about the progress?. Does the developer need help? I can give a part of my time on it. https://blueprints.launchpad.net/nova/+spec/cpu-entitlement Thanks a lot, s. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] How do I mark one option as deprecating another one ?
Hi Folks, I could do with some pointers on config value deprecation. All of the examples in the code and documentation seem to deal with the case of old_opt being replaced by new_opt but still returning the same value Here using deprecated_name and / or deprecated_opts in the definition of new_opt lets me still get the value (and log a warning) if the config still uses old_opt However my use case is different because while I want deprecate old-opt, new_opt doesn't take the same value and I need to different things depending on which is specified, i.e. If old_opt is specified and new_opt isn't I still want to do some processing specific to old_opt and log a deprecation warning. Clearly I can code this up as a special case at the point where I look for the options - but I was wondering if there is some clever magic in oslo.config that lets me declare this as part of the option definition ? As a second point, I thought that using a deprecated option automatically logged a warning, but in the latest Devstack wait_soft_reboot_seconds is defined as: cfg.IntOpt('wait_soft_reboot_seconds', default=120, help='Number of seconds to wait for instance to shut down after' ' soft reboot request is made. We fall back to hard reboot' ' if instance does not shutdown within this window.', deprecated_name='libvirt_wait_soft_reboot_seconds', deprecated_group='DEFAULT'), but if I include the following in nova.conf libvirt_wait_soft_reboot_seconds = 20 I can see the new value of 20 being used, but there is no warning logged that I'm using a deprecated name ? Thanks Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] How do I mark one option as deprecating another one ?
Hi Denis, Thanks for the pointer, but I looked at that and I my understanding is that it only allows me to retrieve a value by an old name, but doesn't let me know that the old name has been used. So If all I wanted to do was change the name/group of the config value it would be fine. But in my case I need to be able to implement: If new_value_defined: do_something else if old_value_defined: warn_about_deprectaion do_something_else Specifically I want to replace tenant_name based authentication with tenant_id - so I need to know which has been specified. Phil From: Denis Makogon [mailto:dmako...@mirantis.com] Sent: 26 February 2014 14:31 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] How do I mark one option as deprecating another one ? Here what oslo.config documentation says. Represents a Deprecated option. Here's how you can use it oldopts = [cfg.DeprecatedOpt('oldfoo', group='oldgroup'), cfg.DeprecatedOpt('oldfoo2', group='oldgroup2')] cfg.CONF.register_group(cfg.OptGroup('blaa')) cfg.CONF.register_opt(cfg.StrOpt('foo', deprecated_opts=oldopts), group='blaa') Multi-value options will return all new and deprecated options. For single options, if the new option is present ([blaa]/foo above) it will override any deprecated options present. If the new option is not present and multiple deprecated options are present, the option corresponding to the first element of deprecated_opts will be chosen. I hope that it'll help you. Best regards, Denis Makogon. On Wed, Feb 26, 2014 at 4:17 PM, Day, Phil philip@hp.commailto:philip@hp.com wrote: Hi Folks, I could do with some pointers on config value deprecation. All of the examples in the code and documentation seem to deal with the case of old_opt being replaced by new_opt but still returning the same value Here using deprecated_name and / or deprecated_opts in the definition of new_opt lets me still get the value (and log a warning) if the config still uses old_opt However my use case is different because while I want deprecate old-opt, new_opt doesn't take the same value and I need to different things depending on which is specified, i.e. If old_opt is specified and new_opt isn't I still want to do some processing specific to old_opt and log a deprecation warning. Clearly I can code this up as a special case at the point where I look for the options - but I was wondering if there is some clever magic in oslo.config that lets me declare this as part of the option definition ? As a second point, I thought that using a deprecated option automatically logged a warning, but in the latest Devstack wait_soft_reboot_seconds is defined as: cfg.IntOpt('wait_soft_reboot_seconds', default=120, help='Number of seconds to wait for instance to shut down after' ' soft reboot request is made. We fall back to hard reboot' ' if instance does not shutdown within this window.', deprecated_name='libvirt_wait_soft_reboot_seconds', deprecated_group='DEFAULT'), but if I include the following in nova.conf libvirt_wait_soft_reboot_seconds = 20 I can see the new value of 20 being used, but there is no warning logged that I'm using a deprecated name ? Thanks Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.orgmailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
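Absent anything cleverer in oslo.config, the "special case at the point where I look for the options" approach might look roughly like the following sketch (tenant_id / tenant_name are taken from the example above; the surrounding service code, option groups and config-file parsing are assumed to exist elsewhere):

import logging

from oslo.config import cfg

LOG = logging.getLogger(__name__)
CONF = cfg.CONF

CONF.register_opts([
    cfg.StrOpt('tenant_id', help='Scope authentication by tenant ID'),
    cfg.StrOpt('tenant_name',
               help='DEPRECATED: scope authentication by tenant name, '
                    'use tenant_id instead'),
])

def get_auth_scope():
    # Prefer the new option; fall back to the deprecated one with a warning.
    if CONF.tenant_id:
        return {'tenant_id': CONF.tenant_id}
    if CONF.tenant_name:
        LOG.warning('tenant_name is deprecated and will be removed; '
                    'please set tenant_id instead')
        return {'tenant_name': CONF.tenant_name}
    return {}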
Re: [openstack-dev] [nova] Future of the Nova API
-Original Message- From: Chris Behrens [mailto:cbehr...@codestud.com] Sent: 26 February 2014 22:05 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Future of the Nova API This thread is many messages deep now and I'm busy with a conference this week, but I wanted to carry over my opinion from the other v3 API in Icehouse thread and add a little to it. Bumping versions is painful. v2 is going to need to live for a long time to create the least amount of pain. I would think that at least anyone running a decent sized Public Cloud would agree, if not anyone just running any sort of decent sized cloud. I don't think there's a compelling enough reason to deprecate v2 and cause havoc with what we currently have in v3. I'd like us to spend more time on the proposed tasks changes. And I think we need more time to figure out if we're doing versioning in the correct way. If we've got it wrong, a v3 doesn't fix the problem and we'll just be causing more havoc with a v4. - Chris Like Chris I'm struggling to keep up with this thread, but of all the various messages I've read this is the one that resonates most with me. My perception of the V3 API improvements (in order to importance to me): i) The ability to version individual extensions Crazy that small improvements can't be introduced without having to create a new extension, when often the extension really does nothing more that indicate that some other part of the API code has changed. ii) The opportunity to get the proper separation between Compute and Network APIs Being (I think) one of the few clouds that provides both the Nova and Neutron API this is a major source of confusion and hence support calls. iii) The introduction of the task model I like the idea of tasks, and think it will be a much easier way for users to interact with the system. Not convinced that it couldn't co-exist in V2 thought rather than having to co-exist as V2 and V3 iv)Clean-up of a whole bunch of minor irritations / inconsistencies There are lots of things that are really messy (inconsistent error codes, aspects of core that are linked to just Xen, etc, etc). They annoy people the first time they hit them, then the code around them and move on.Probably I've had more hate mail from people writing language bindings than application developers (who tend to be abstracted from this by the clients) Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Future of the Nova API
-Original Message- From: Jay Pipes [mailto:jaypi...@gmail.com] Sent: 24 February 2014 23:49 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] Future of the Nova API Similarly with a Xen vs KVM situation I don't think its an extension related issue. In V2 we have features in *core* which are only supported by some virt backends. It perhaps comes down to not being willing to say either that we will force all virt backends to support all features in the API or they don't get in the tree. Or alternatively be willing to say no to any feature in the API which can not be currently implemented in all virt backends. The former greatly increases the barrier to getting a hypervisor included, the latter restricts Nova development to the speed of the slowest developing and least mature hypervisor supported. Actually, the problem is not feature parity. The problem lies where two drivers implement the same or similar functionality, but the public API for a user to call the functionality is slightly different depending on which driver is used by the deployer. There's nothing wrong at all (IMO) in having feature disparity amongst drivers. I agree with the rest of your posy Jay, but I think there are some feature parity issues - for example having rescue always return a generated admin password when only some (one ?) Hypervisor supports actually setting the password is an issue. For some calls (create , rebuild) this can be suppressed by a Conf value (enable_instance_password) but when I tried to get that extended to Rescue in V2 it was blocked as a would break compatibility - either add an extension or only do it in V3 change. So clients have to be able to cope with an optional attribute in the response to create/rebuild (because they can't inspect the API to see if the conf value is set), but can't be expected to cope with in the response from rescue apparently ;-( Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Future of the Nova API
The current set of reviews on this change seems relevant to this debate: https://review.openstack.org/#/c/43822/ In effect a fully working and tested change which makes the nova-net / neutron compatibility via the V2 API that little bit closer to being complete is being blocked because it's thought that by not having it people will be quicker to move to V3 instead. Folks this is just madness - no one is going to jump to using V3 just because we don't fix minor things like this in V2, they're just as likely to start jumping to something completely different because that Openstack stuff is just too hard to work with. User's don't think like developers, and you can't force them into a new API by deliberately keeping the old one bad - at least not if you want to keep them as users in the long term. I can see an argument (maybe) for not adding lots of completely new features into V2 if V3 was already available in a stable form - but V2 already provides a nearly complete support for nova-net features on top of Neutron.I fail to see what is wrong with continuing to improve that. Phil -Original Message- From: Day, Phil Sent: 28 February 2014 11:07 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Future of the Nova API -Original Message- From: Chris Behrens [mailto:cbehr...@codestud.com] Sent: 26 February 2014 22:05 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Future of the Nova API This thread is many messages deep now and I'm busy with a conference this week, but I wanted to carry over my opinion from the other v3 API in Icehouse thread and add a little to it. Bumping versions is painful. v2 is going to need to live for a long time to create the least amount of pain. I would think that at least anyone running a decent sized Public Cloud would agree, if not anyone just running any sort of decent sized cloud. I don't think there's a compelling enough reason to deprecate v2 and cause havoc with what we currently have in v3. I'd like us to spend more time on the proposed tasks changes. And I think we need more time to figure out if we're doing versioning in the correct way. If we've got it wrong, a v3 doesn't fix the problem and we'll just be causing more havoc with a v4. - Chris Like Chris I'm struggling to keep up with this thread, but of all the various messages I've read this is the one that resonates most with me. My perception of the V3 API improvements (in order to importance to me): i) The ability to version individual extensions Crazy that small improvements can't be introduced without having to create a new extension, when often the extension really does nothing more that indicate that some other part of the API code has changed. ii) The opportunity to get the proper separation between Compute and Network APIs Being (I think) one of the few clouds that provides both the Nova and Neutron API this is a major source of confusion and hence support calls. iii) The introduction of the task model I like the idea of tasks, and think it will be a much easier way for users to interact with the system. Not convinced that it couldn't co-exist in V2 thought rather than having to co-exist as V2 and V3 iv)Clean-up of a whole bunch of minor irritations / inconsistencies There are lots of things that are really messy (inconsistent error codes, aspects of core that are linked to just Xen, etc, etc). 
They annoy people the first time they hit them, then the code around them and move on.Probably I've had more hate mail from people writing language bindings than application developers (who tend to be abstracted from this by the clients) Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..
Sorry if I'm coming late to this thread, but why would you define AZs to cover othognal zones ? AZs are a very specific form of aggregate - they provide a particular isolation schematic between the hosts (i.e. physical hosts are never in more than one AZ) - hence the availability in the name. AZs are built on aggregates, and yes aggregates can overlap and aggreagtes are used for scheduling. So if you want to schedule on features as well as (or instead of) physical isolation, then you can already: - Create an aggregate that contains hosts with fast CPUs - Create another aggregate that includes hosts with SSDs - Write (or configure in some cases) schedule filters that look at something in the request (such as schedule hint, an image property, or a flavor extra_spec) so that the scheduler can filter on those aggregates nova boot --availability-zone az1 --scheduler-hint want-fast-cpu --scheduler-hint want-ssd ... nova boot --availability-zone az1 --flavor 1000 (where flavor 1000 has extra spec that says it needs fast cpu and ssd) But there is no need that I can see to make AZs overlapping just to so the same thing - that would break what everyone (including folks used to working with AWS) expects from an AZ -Original Message- From: Chris Friesen [mailto:chris.frie...@windriver.com] Sent: 27 March 2014 13:18 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates.. On 03/27/2014 05:03 AM, Khanh-Toan Tran wrote: Well, perhaps I didn't make it clearly enough. What I intended to say is that user should be able to select a set of AZs in his request, something like : nova boot --flavor 2 --image ubuntu --availability-zone Z1 --availability-zone AZ2 vm1 I think it would make more sense to make the availability-zone argument take a comma-separated list of zones. nova boot --flavor 2 --image ubuntu --availability-zone AZ1,AZ2 vm1 Just to clarify, in a case like this we're talking about using the intersection of the two zones, right? That's the only way that makes sense when using orthogonal zones like hosts with fast CPUs and hosts with SSDs. Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
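As a self-contained toy (not Nova's actual filter classes), the aggregate-based matching described above amounts to checking the requested capabilities against the union of the metadata of the aggregates a host belongs to - which also shows why ordinary aggregates can overlap quite happily:

def host_passes(host_aggregates_metadata, requested):
    # host_aggregates_metadata: one metadata dict per aggregate the host
    # belongs to; requested: capability -> value taken from a flavor
    # extra_spec or a scheduler hint.
    merged = {}
    for metadata in host_aggregates_metadata:
        merged.update(metadata)
    return all(merged.get(key) == value for key, value in requested.items())

# A host in both the fast-cpu and SSD aggregates satisfies a flavor that
# asks for both; a host in only one of them does not.
print(host_passes([{'fastcpu': 'true'}, {'ssd': 'true'}],
                  {'fastcpu': 'true', 'ssd': 'true'}))   # True
print(host_passes([{'ssd': 'true'}],
                  {'fastcpu': 'true', 'ssd': 'true'}))   # False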
Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..
The need arises when you need a way to use both of the zones for scheduling when no specific zone is specified. The only way to do that is either to have an AZ which is a superset of the two AZs, or alternatively for default_schedule_zone to take a list of zones instead of just one. If you don't configure a default_schedule_zone, and don't specify an availability_zone in the request - then I thought that would make the AZ filter in effect ignore AZs for that request. Isn't that what you need ? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..
-Original Message- From: Vishvananda Ishaya [mailto:vishvana...@gmail.com] Sent: 26 March 2014 20:33 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates.. On Mar 26, 2014, at 11:40 AM, Jay Pipes jaypi...@gmail.com wrote: On Wed, 2014-03-26 at 09:47 -0700, Vishvananda Ishaya wrote: Personally I view this as a bug. There is no reason why we shouldn't support arbitrary grouping of zones. I know there is at least one problem with zones that overlap regarding displaying them properly: https://bugs.launchpad.net/nova/+bug/1277230 There is probably a related issue that is causing the error you see below. IMO both of these should be fixed. I also think adding a compute node to two different aggregates with azs should be allowed. It also might be nice to support specifying multiple zones in the launch command in these models. This would allow you to limit booting to an intersection of two overlapping zones. A few examples where these ideas would be useful: 1. You have 3 racks of servers and half of the nodes from each rack plugged into a different switch. You want to be able to specify to spread across racks or switches via an AZ. In this model you could have a zone for each switch and a zone for each rack. 2. A single cloud has 5 racks in one room in the datacenter and 5 racks in a second room. You'd like to give control to the user to choose the room or choose the rack. In this model you would have one zone for each room, and smaller zones for each rack. 3. You have a small 3 rack cloud and would like to ensure that your production workloads don't run on the same machines as your dev workloads, but you also want to use zones spread workloads across the three racks. Similarly to 1., you could split your racks in half via dev and prod zones. Each one of these zones would overlap with a rack zone. You can achieve similar results in these situations by making small zones (switch1-rack1 switch1-rack2 switch1-rack3 switch2-rack1 switch2-rack2 switch2-rack3) but that removes the ability to decide to launch something with less granularity. I.e. you can't just specify 'switch1' or 'rack1' or 'anywhere' I'd like to see all of the following work nova boot ... (boot anywhere) nova boot -availability-zone switch1 ... (boot it switch1 zone) nova boot -availability-zone rack1 ... (boot in rack1 zone) nova boot -availability-zone switch1,rack1 ... (boot Personally, I feel it is a mistake to continue to use the Amazon concept of an availability zone in OpenStack, as it brings with it the connotation from AWS EC2 that each zone is an independent failure domain. This characteristic of EC2 availability zones is not enforced in OpenStack Nova or Cinder, and therefore creates a false expectation for Nova users. In addition to the above problem with incongruent expectations, the other problem with Nova's use of the EC2 availability zone concept is that availability zones are not hierarchical -- due to the fact that EC2 AZs are independent failure domains. Not having the possibility of structuring AZs hierarchically limits the ways in which Nova may be deployed -- just see the cells API for the manifestation of this problem. I would love it if the next version of the Nova and Cinder APIs would drop the concept of an EC2 availability zone and introduce the concept of a generic region structure that can be infinitely hierarchical in nature. This would enable all of Vish's nova boot commands above in an even simpler fashion. 
For example: Assume a simple region hierarchy like so: regionA / \ regionBregionC # User wants to boot in region B nova boot --region regionB # User wants to boot in either region B or region C nova boot --region regionA I think the overlapping zones allows for this and also enables additional use cases as mentioned in my earlier email. Hierarchical doesn't work for the rack/switch model. I'm definitely +1 on breaking from the amazon usage of availability zones but I'm a bit leery to add another parameter to the create request. It is also unfortunate that region already has a meaning in the amazon world which will add confusion. Vish Ok, got far enough back down my stack to understand the drive here, and I kind of understand the use case, but I think what's missing is that currently we only allow for one group of availability zones. I can see why you would want them to overlap in a certain way - i.e. a rack based zone could overlap with a switch based zone - but I still don't want any overlap within the set of switch based zones, or any overlap within the set of rack based zones. Maybe the issue is that when we converted / mapped AZs onto aggregates we only ever considered that there
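A purely illustrative way to picture the intersection semantics being discussed in this thread (host and zone names invented for the example): treat each host's zone memberships as a set, and a multi-zone request selects only the hosts whose set contains everything asked for:

HOST_ZONES = {
    'host1': set(['rack1', 'switch1']),
    'host2': set(['rack1', 'switch2']),
    'host3': set(['rack2', 'switch1']),
}

def candidate_hosts(requested_zones):
    # A host qualifies only if it is in every zone named in the request.
    wanted = set(requested_zones)
    return sorted(h for h, zones in HOST_ZONES.items() if wanted <= zones)

print(candidate_hosts(['rack1']))             # ['host1', 'host2']
print(candidate_hosts(['switch1', 'rack1']))  # ['host1']
print(candidate_hosts([]))                    # all hosts - no restriction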
Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..
-Original Message- From: Chris Friesen [mailto:chris.frie...@windriver.com] Sent: 27 March 2014 18:15 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates.. On 03/27/2014 11:48 AM, Day, Phil wrote: Sorry if I'm coming late to this thread, but why would you define AZs to cover othognal zones ? See Vish's first message. AZs are a very specific form of aggregate - they provide a particular isolation schematic between the hosts (i.e. physical hosts are never in more than one AZ) - hence the availability in the name. That's why I specified orthogonal. If you're looking at different resources then it makes sense to have one host be in different AZs because the AZs are essentially in different namespaces. So you could have hosts in server room A vs hosts in server room B. Or hosts on network switch A vs hosts on network switch B. Or hosts with SSDs vs hosts with disks. Then you could specify you want to boot an instance in server room A, on switch B, on a host with SSDs. AZs are built on aggregates, and yes aggregates can overlap and aggreagtes are used for scheduling. So if you want to schedule on features as well as (or instead of) physical isolation, then you can already: - Create an aggregate that contains hosts with fast CPUs - Create another aggregate that includes hosts with SSDs - Write (or configure in some cases) schedule filters that look at something in the request (such as schedule hint, an image property, or a flavor extra_spec) so that the scheduler can filter on those aggregates nova boot --availability-zone az1 --scheduler-hint want-fast-cpu --scheduler-hint want-ssd ... Does this actually work? The docs only describe setting the metadata on the flavor, not as part of the boot command. If you want to be able to pass it in as explicit hints then you need to write a filter to cope with that hint- I was using it as an example of the kind of relationship between hints and aggregate filtering The more realistic example for this kind of attribute is to make it part of the flavor and use the aggregate_instance_extra_spec filter - which does exactly this kind of filtering (for overlapping aggregates) nova boot --availability-zone az1 --flavor 1000 (where flavor 1000 has extra spec that says it needs fast cpu and ssd) But there is no need that I can see to make AZs overlapping just to so the same thing - that would break what everyone (including folks used to working with AWS) expects from an AZ As an admin user you can create arbitrary host aggregates, assign metadata, and have flavors with extra specs to look for that metadata. But as far as I know there is no way to match host aggregate information on a per-instance basis. Matching aggregate information on a per-instance basis is what the scheduler filters do. Well yes it is down to the admin to decide what groups are going to be available, how to map them into aggregates, how to map that into flavors (which are often the link to a charging mechanism) - but once they've done that then the user can work within those bounds by choosing the correct flavor, image, etc. Also, unless things have changed since I looked at it last as a regular user you can't create new flavors so the only way to associate an instance with a host aggregate is via an availability zone. 
Well it depends on the roles you want to assign to your users really and how you set up your policy file, but in general you don't want users defining flavors, the cloud admin defines the flavors based on what makes sense from their environment. Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..
Personally, I feel it is a mistake to continue to use the Amazon concept of an availability zone in OpenStack, as it brings with it the connotation from AWS EC2 that each zone is an independent failure domain. This characteristic of EC2 availability zones is not enforced in OpenStack Nova or Cinder, and therefore creates a false expectation for Nova users. I think this is backwards training, personally. I think azs as separate failure domains were done like that for a reason by amazon, and make good sense. What we've done is overload that with cells, aggregates etc which should have a better interface and are a different concept. Redefining well understood terms because they don't suite your current implementation is a slippery slope, and overloading terms that already have a meaning in the industry in just annoying. +1 I don't think there is anything wrong with identifying new use cases and working out how to cope with them: - First we generalized Aggregates - Then we mapped AZs onto aggregates as a special mutually exclusive group - Now we're recognizing that maybe we need to make those changes to support AZs more generic so we can create additional groups of mutually exclusive aggregates That all feels like good evolution. But I don't see why that means we have to fit that in under the existing concept of AZs - why can't we keep AZs as they are and have a better thing called Zones that is just an OSAPI concept and is better that AZs ? Arguments around not wanting to add new options to create server seem a bit weak to me - for sure we don't want to add them in an uncontrolled way, but if we have a new, richer, concept we should be able to express that separately. I'm still not personally convinced by the need use cases of racks having orthogonal power failure domains and switch failure domains - that seems to me from a practical perspective that it becomes really hard to work out where to separate VMs so that they don't share a failure mode.Every physical DC design I've been involved with tries to get the different failure domains to align. However if it the use case makes sense to someone then I'm not against extending aggregates to support multiple mutually exclusive groups. I think I see a Design Summit session emerging here Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Hosts within two Availability Zones : possible or not ?
Hi Sylvain, There was a similar thread on this recently - which might be worth reviewing: http://lists.openstack.org/pipermail/openstack-dev/2014-March/031006.html Some interesting use cases were posted, and a I don't think a conclusion was reached, which seems to suggest this might be a good case for a session in Atlanta. Personally I'm not sure that selecting more than one AZ really makes a lot of sense - they are generally objects which are few in number and large in scale, so if for example there are 3 AZs and you want to create two servers in different AZs, does it really help if you can do the sequence: - Create a server in any AZ - Find the AZ the server is in - Create a new server in any of the two remaining AZs Rather than just picking two from the list to start with ? If you envisage a system with many AZs, and thereby allow users some pretty find grained choices about where to place their instances, then I think you'll end up with capacity management issues. If the use case is more to get some form of server isolation, then server-groups might be worth looking at, as these are dynamic and per user. I can see a case for allowing more than one set of mutually exclusive host aggregates - at the moment that's a property implemented just for the set of aggregates that are designated as AZs, and generalizing that concept so that there can be other sets (where host overlap is allowed between sets, but not within a set) might be useful. Phil From: Murray, Paul (HP Cloud Services) Sent: 03 April 2014 16:34 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Nova] Hosts within two Availability Zones : possible or not ? Hi Sylvain, I would go with keeping AZs exclusive. It is a well-established concept even if it is up to providers to implement what it actually means in terms of isolation. Some good use cases have been presented on this topic recently, but for me they suggest we should develop a better concept rather than bend the meaning of the old one. We certainly don't have hosts in more than one AZ in HP Cloud and I think some of our users would be very surprised if we changed that. Paul. From: Khanh-Toan Tran [mailto:khanh-toan.t...@cloudwatt.com] Sent: 03 April 2014 15:53 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Nova] Hosts within two Availability Zones : possible or not ? +1 for AZs not sharing hosts. Because it's the only mechanism that allows us to segment the datacenter. Otherwise we cannot provide redundancy to client except using Region which is dedicated infrastructure and networked separated and anti-affinity filter which IMO is not pragmatic as it has tendency of abusive usage. Why sacrificing this power so that users can select the types of his desired physical hosts ? The latter can be exposed using flavor metadata, which is a lot safer and more controllable than using AZs. If someone insists that we really need to let users choose the types of physical hosts, then I suggest creating a new hint, and use aggregates with it. Don't sacrifice AZ exclusivity! Btw, there is a datacenter design called dual-room [1] which I think best fit for AZs to make your cloud redundant even with one datacenter. Best regards, Toan [1] IBM and Cisco: Together for a World Class Data Center, Page 141. 
http://books.google.fr/books?id=DHjJAgAAQBAJpg=PA141#v=onepageqf=false De : Sylvain Bauza [mailto:sylvain.ba...@gmail.com] Envoyé : jeudi 3 avril 2014 15:52 À : OpenStack Development Mailing List (not for usage questions) Objet : [openstack-dev] [Nova] Hosts within two Availability Zones : possible or not ? Hi, I'm currently trying to reproduce [1]. This bug requires to have the same host on two different aggregates, each one having an AZ. IIRC, Nova API prevents hosts of being part of two distinct AZs [2], so IMHO this request should not be possible. That said, there are two flaws where I can identify that no validation is done : - when specifying an AZ in nova.conf, the host is overriding the existing AZ by its own - when adding an host to an aggregate without AZ defined, and afterwards update the aggregate to add an AZ So, I need direction. Either we consider it is not possible to share 2 AZs for the same host and then we need to fix the two above scenarios, or we say it's nice to have 2 AZs for the same host and then we both remove the validation check in the API and we fix the output issue reported in the original bug [1]. Your comments are welcome. Thanks, -Sylvain [1] : https://bugs.launchpad.net/nova/+bug/1277230 [2] : https://github.com/openstack/nova/blob/9d45e9cef624a4a972c24c47c7abd57a72d74432/nova/compute/api.py#L3378 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] Server Groups are not an optional element, bug or feature ?
Hi Folks, Generally the scheduler's capabilities that are exposed via hints can be enabled or disabled in a Nova install by choosing the set of filters that are configured. However the server group feature doesn't fit that pattern - even if the affinity filter isn't configured the anti-affinity check on the server will still impose the anti-affinity behavior via throwing the request back to the scheduler. I appreciate that you can always disable the server-groups API extension, in which case users can't create a group (and so the server create will fail if one is specified), but that seems kind of at odds with other types of scheduling that have to be specifically configured in rather than out of a base system. In particular having the API extension in by default but the ServerGroup Affinity and AntiAffinity filters not in by default seems an odd combination (it kind of works, but only by a retry from the host, and that's limited to a number of retries). Given that the server group work isn't complete yet (for example the list of instances in a group isn't tidied up when an instance is deleted) I feel a tad worried that the current default configuration exposes this rather than keeping it as something that has to be explicitly enabled - what do others think ? Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
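For reference, enabling the filters explicitly is just a matter of adding them to the filter list in nova.conf; the exact list below is illustrative only, since the rest of it depends on what a deployment already has configured:

[DEFAULT]
# Illustrative - keep whatever other filters you already rely on.
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAffinityFilter,ServerGroupAntiAffinityFilter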
Re: [openstack-dev] [TripleO] config options, defaults, oh my!
-Original Message- From: Robert Collins [mailto:robe...@robertcollins.net] Sent: 07 April 2014 21:01 To: OpenStack Development Mailing List Subject: [openstack-dev] [TripleO] config options, defaults, oh my! So one interesting thing from the influx of new reviews is lots of patches exposing all the various plumbing bits of OpenStack. This is good in some ways (yay, we can configure more stuff), but in some ways its kindof odd - like - its not clear when https://review.openstack.org/#/c/83122/ is needed. I'm keen to expose things that are really needed, but i'm not sure that /all/ options are needed - what do folk think? I'm very wary of trying to make the decision in TripleO of what should and shouldn't be configurable in some other project.For sure the number of config options in Nova is a problem, and one that's been discussed many times at summits. However I think you could also make the case/assumption for any service that the debate about having a config option has already been held within that service as part of the review that merged that option in the code - re-running the debate about whether something should be configurable via TripleO feels like some sort of policing function on configurability above and beyond what the experts in that service have already considered, and that doesn't feel right to me. Right now TripleO has a very limited view of what can be configured, based as I understand on primarily what's needed for its CI job. As more folks who have real deployments start to look at using TripleO its inevitable that they are going to want to enable the settings that are important to them to be configured. I can't imagine that anyone is going to add a configuration value for the sake of it, so can't we start with the perspective that we are slowly exposing the set of values that do need to be configured ? Also, some things really should be higher order operations - like the neutron callback to nova right - that should be either set to timeout in nova configured in neutron, *or* set in both sides appropriately, never one-half or the other. I think we need to sort out our approach here to be systematic quite quickly to deal with these reviews. Here's an attempt to do so - this could become a developers guide patch. Config options in TripleO == Non-API driven configuration falls into four categories: A - fixed at buildtime (e.g. ld.so path) B - cluster state derived C - local machine derived D - deployer choices For A, it should be entirely done within the elements concerned. For B, the heat template should accept parameters to choose the desired config (e.g. the Neutron-Nova example able) but then express the config in basic primitives in the instance metadata. For C, elements should introspect the machine (e.g. memory size to determine mysql memory footprint) inside os-refresh-config scripts; longer term we should make this an input layer to os-collect-config. For D, we need a sensible parameter in the heat template and probably direct mapping down to instance metadata. I understand the split, but all of the reviews in question seem to be in D, so I'm not sure this helps much. But we have a broader question - when should something be configurable at all? In my mind we have these scenarios: 1) There is a single right answer 2) There are many right answers An example of 1) would be any test-only option like failure injection - the production value is always 'off'. 
For 2), hypervisor driver is a great example - anything other than qemu is a valid production value :) But, it seems to me that these cases actually subdivide further - 1a) single right answer, and the default is the right answer 1b) single right answer and it is not the default 2a) many right answers, and the default is the most/nearly most common one 2b) many right answers, and the default is either not one of them or is a corner case So my proposal here - what I'd like to do as we add all these config options to TripleO is to take the care to identify which of A/B/C/D they are and code them appropriately, and if the option is one of 1b) or 2b) make sure there is a bug in the relevant project about the fact that we're having to override a default. If the option is really a case of 1a) I'm not sure we want it configurable at all. I'm not convinced that anyone is in a position to judge that there is a single right answer - I know the values that are right for my deployments, but I'm not arrogant enough to say that they universally applicable.You only have to see the wide range of Openstack Deployments presented at every summit to know that that there a lot of different use cases out there. My worry is that if we try to have that debate in the context of a TripleO review, then we'll just spin between opinions rather than make the rapid progress towards getting the needed
[openstack-dev] Enabling ServerGroup filters by default (was RE: [nova] Server Groups are not an optional element, bug or feature ?)
https://bugs.launchpad.net/nova/+bug/1303983 -- Russell Bryant Wow - was there really a need to get that change merged within 12 hours and before others had a chance to review and comment on it ? I see someone has already queried (post the merge) if there isn't a performance impact. I've raised this point before - but apart from non-urgent security fixes shouldn't there be a minimum review period to make sure that all relevant feedback can be given ? Phil -Original Message- From: Russell Bryant [mailto:rbry...@redhat.com] Sent: 07 April 2014 20:38 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] Server Groups are not an optional element, bug or feature ? On 04/07/2014 02:12 PM, Russell Bryant wrote: On 04/07/2014 01:43 PM, Day, Phil wrote: Generally the scheduler's capabilities that are exposed via hints can be enabled or disabled in a Nova install by choosing the set of filters that are configured. However the server group feature doesn't fit that pattern - even if the affinity filter isn't configured the anti-affinity check on the server will still impose the anti-affinity behavior via throwing the request back to the scheduler. I appreciate that you can always disable the server-groups API extension, in which case users can't create a group (and so the server create will fail if one is specified), but that seems kind of at odds with other type of scheduling that has to be specifically configured in rather than out of a base system.In particular having the API extension in by default but the ServerGroup Affinity and AntiAffinity filters not in by default seems an odd combination (it kind of works, but only by a retry from the host and that's limited to a number of retries). Given that the server group work isn't complete yet (for example the list of instances in a group isn't tided up when an instance is deleted) I feel a tad worried that the current default configuration exposed this rather than keeping it as something that has to be explicitly enabled - what do others think ? I consider it a complete working feature. It makes sense to enable the filters by default. It's harmless when the API isn't used. That was just an oversight. The list of instances in a group through the API only shows non-deleted instances. There are some implementation details that could be improved (the check on the server is the big one). https://bugs.launchpad.net/nova/+bug/1303983 -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][Trove] Managed Instances Feature
Its more than just non-admin, it also allows a user to lock an instance so that they don’t accidentally perform some operation on a VM. At one point it was (by default) an admin only operation on the OSAPI, but its always been open to all users in EC2. Recently it was changed so that admin and non-admin locks are considered as separate things. From: Chen CH Ji [mailto:jiche...@cn.ibm.com] Sent: 08 April 2014 07:13 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Nova][Trove] Managed Instances Feature the instance lock is a mechanism that prevent non-admin user to operate on the instance (resize, etc, looks to me snapshot is not currently included) the permission is a wider concept that major in API layer to allow or prevent user in using the API , guess instance lock might be enough for prevent instance actions . Best Regards! Kevin (Chen) Ji 纪 晨 Engineer, zVM Development, CSTL Notes: Chen CH Ji/China/IBM@IBMCN Internet: jiche...@cn.ibm.commailto:jiche...@cn.ibm.com Phone: +86-10-82454158 Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian District, Beijing 100193, PRC [Inactive hide details for Hopper, Justin ---04/08/2014 02:05:02 PM---Phil, I am reviewing the existing “check_instance_lock]Hopper, Justin ---04/08/2014 02:05:02 PM---Phil, I am reviewing the existing “check_instance_lock” implementation to see From: Hopper, Justin justin.hop...@hp.commailto:justin.hop...@hp.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.orgmailto:openstack-dev@lists.openstack.org, Date: 04/08/2014 02:05 PM Subject: Re: [openstack-dev] [Nova][Trove] Managed Instances Feature Phil, I am reviewing the existing “check_instance_lock” implementation to see how it might be leveraged. Off the cuff, it looks pretty much what we need. I need to look into the permissions to better understand how one can “lock” and instance. Thanks for the guidance. Justin Hopper Software Engineer - DBaaS irc: juice | gpg: EA238CF3 | twt: @justinhopper On 4/7/14, 10:01, Day, Phil philip@hp.commailto:philip@hp.com wrote: I can see the case for Trove being to create an instance within a customer's tenant (if nothing else it would make adding it onto their Neutron network a lot easier), but I'm wondering why it really needs to be hidden from them ? If the instances have a name that makes it pretty obvious that Trove created them, and the user presumably knows that did this from Trove, why hide them ?I'd of thought that would lead to a whole bunch of confusion and support calls when they try to work out why they are out of quota and can only see subset of the instances being counted by the system. If the need is to stop the users doing something with those instances then maybe we need an extension to the lock mechanism such that a lock can be made by a specific user (so the trove user in the same tenant could lock the instance so that a non-trove user in that tenant couldn’t unlock ). We already have this to an extent, in that an instance locked by an admin can' t be unlocked by the owner, so I don’t think it would be too hard to build on that. Feels like that would be a lot more transparent than trying to obfuscate the instances themselves. -Original Message- From: Hopper, Justin Sent: 06 April 2014 01:37 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Nova][Trove] Managed Instances Feature Russell, Thanks for the quick reply. 
If I understand what you are suggesting it is that there would be one Trove-Service Tenant/User that owns all instances from the perspective of Nova. This was one option proposed during our discussions. However, what we thought would be best is to continue to use the user credentials so that Nova has the correct association. We wanted a more substantial and deliberate relationship between Nova and a dependent service. In this relationship, Nova would acknowledge which instances are being managed by which Services and while ownership would still remain with the User, management/manipulation of said Instance would be solely done by the Service. At this point the guard that Nova needs to provide around the instance does not need to be complex. It would even suffice to keep those instances hidden from such operations as "nova list" when invoked directly by the user. Thanks, Justin Hopper Software Engineer - DBaaS irc: juice | gpg: EA238CF3 | twt: @justinhopper On 4/5/14, 14:20, Russell Bryant rbry...@redhat.com wrote: On 04/04/2014 08:12 PM, Hopper, Justin wrote: Greetings, I am trying to address an issue from certain perspectives and I think some support from Nova may be needed. _Problem_ Services like Trove run in Nova Compute Instances. These Services try to provide an integrated
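For reference, the existing lock guard Justin mentions boils down to a decorator on the compute API methods. A minimal sketch of the idea follows; the exception name and the attribute access are illustrative only, not the exact Nova implementation:

    import functools

    class InstanceIsLocked(Exception):
        pass

    def check_instance_lock(function):
        # Refuse to run the wrapped compute-API method if the instance is
        # locked and the caller is not an admin.
        @functools.wraps(function)
        def inner(self, context, instance, *args, **kwargs):
            if instance.get('locked') and not context.is_admin:
                raise InstanceIsLocked(instance.get('uuid'))
            return function(self, context, instance, *args, **kwargs)
        return inner

A per-user lock of the kind suggested above would only need to record who set the lock and compare that against the caller here, rather than checking is_admin alone.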
Re: [openstack-dev] [nova] Server Groups are not an optional element, bug or feature ?
-Original Message- From: Russell Bryant [mailto:rbry...@redhat.com] Sent: 07 April 2014 19:12 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] Server Groups are not an optional element, bug or feature ? ... I consider it a complete working feature. It makes sense to enable the filters by default. It's harmless when the API isn't used. That was just an oversight. The list of instances in a group through the API only shows non-deleted instances. True, but the lack of even a soft delete on the rows in the instance_group_member table worries me - it's not clear why that wasn't fixed rather than just hiding the deleted instances. I'd have expected the full DB lifecycle to be implemented before something was considered a complete working feature. There are some implementation details that could be improved (the check on the server is the big one). -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Scheduler meeting and Icehouse Summit
Hi Folks, In the weekly scheduler meeting we've been trying to pull together a consolidated list of Summit sessions so that we can find logical groupings and make a more structured set of sessions for the limited time available at the summit. https://etherpad.openstack.org/p/IceHouse-Nova-Scheduler-Sessions With the deadline for sessions being this Thursday 17th, tomorrow's IRC meeting is the last chance to decide which sessions we want to combine / prioritize. Russell has indicated that a starting assumption of three scheduler sessions is reasonable, with any extras depending on what else is submitted. I've matched the list on the Etherpad to submitted sessions below, and added links to any other proposed sessions that look like they are related. 1) Instance Group Model and API Session Proposal: http://summit.openstack.org/cfp/details/190 2) Smart Resource Placement: Session Proposal: http://summit.openstack.org/cfp/details/33 Possibly related sessions: Resource optimization service for nova (http://summit.openstack.org/cfp/details/201) 3) Heat and Scheduling and Software, Oh My!: Session Proposal: http://summit.openstack.org/cfp/details/113 4) Generic Scheduler Metrics and Ceilometer: Session Proposal: http://summit.openstack.org/cfp/details/218 Possibly related sessions: Making Ceilometer and Nova play nice http://summit.openstack.org/cfp/details/73 5) Image Properties and Host Capabilities Session Proposal: NONE 6) Scheduler Performance: Session Proposal: NONE Possibly related Sessions: Rethinking Scheduler Design http://summit.openstack.org/cfp/details/34 7) Scheduling Across Services: Session Proposal: NONE 8) Private Clouds: Session Proposal: http://summit.openstack.org/cfp/details/228 9) Multiple Scheduler Policies: Session Proposal: NONE The proposal from last week's meeting was to use the three slots for: - Instance Group Model and API (1) - Smart Resource Placement (2) - Performance (6) However, at the moment there doesn't seem to be a session proposed to cover the performance work ? It also seems to me that the Group Model and Smart Placement are pretty closely linked along with (3) (which says it wants to combine 1 & 2 into the same topic), so if we only have three slots available then these look like logical candidates for consolidating into a single session. That would free up a session to cover the generic metrics (4) and Ceilometer - where a lot of work in Havana stalled because we couldn't get a consensus on the way forward. The third slot would be kept for performance - which based on the lively debate in the scheduler meetings I'm assuming will still be submitted as a session. Private Clouds isn't really a scheduler topic, so I suggest it takes its chances as a general session. Hence my revised proposal for the three slots is: i) Group Scheduling / Smart Placement / Heat and Scheduling (1), (2), (3), (7) - How do you schedule something more complex than a single VM ? ii) Generalized scheduling metrics / Ceilometer integration (4) - How do we extend the set of resources a scheduler can use to make its decisions ? - How do we make this work with / compatible with Ceilometer ? iii) Scheduler Performance (6) In that way we will at least give airtime to all of the topics. If a 4th scheduler slot becomes available then we could break up the first session into two parts. Thoughts welcome here or in tomorrow's IRC meeting. Cheers, Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Scheduler meeting and Icehouse Summit
Hi Debo, I was wondering if we are shooting for too much to discuss by clubbing. i) Group Scheduling / Smart Placement / Heat and Scheduling (1), (2), (3), (7) - How do you schedule something more complex than a single VM ? I agree it's a lot to get through, but we're working to a budget of only 3 slots for scheduler sessions. If we split this across two slots as originally suggested by Gary, then we don't get to discuss one of generalized metrics/ceilometer or performance at all - which would seem an even worse compromise to me. We're never going to be able to avoid sessions which have more content than can comfortably fit into a single slot - that's kind of just a way of life from the time constraints of the summit - what we're trying to do is make sure that we can plan those sessions ahead of time rather than the morning before at the Hotel ;-) I did add a rider that if Russell can give a 4th session to scheduling then this is the one that would most benefit from being split. Phil -Original Message- From: Debojyoti Dutta [mailto:ddu...@gmail.com] Sent: 15 October 2013 04:31 To: OpenStack Development Mailing List Subject: Re: [openstack-dev] Scheduler meeting and Icehouse Summit Hi Phil Good summary ... I was wondering if we are shooting for too much to discuss by clubbing i) Group Scheduling / Smart Placement / Heat and Scheduling (1), (2), (3), (7) - How do you schedule something more complex than a single VM ? I think specifying something more complex than a single VM is a great theme. But I don't know if we can do justice to it in 1 session. I think maybe a simple nova scheduling API with groups/bundles of resources itself would be a lot for 1 session. In fact in order to specify what you want in your resources bundle, you would need to think about policies. So maybe just the simple Nova API and policies might be useful. Also we might have a session correlating the different models of how more than 1 VM can be requested - you could start from nova and then generalize to cross services or you could start from heat workload models and drill down. There are passionate people on both sides and maybe that debate needs a session. I think the smart resource placement is very interesting and might need at least 1/2 a slot since one can show how it can be done today in nova and how it can handle cross services scenarios. See you tomorrow on IRC debo On Mon, Oct 14, 2013 at 10:56 AM, Alex Glikson glik...@il.ibm.com wrote: IMO, the three themes make sense, but I would suggest waiting until the submission deadline and discuss at the following IRC meeting on the 22nd. Maybe there will be more relevant proposals to consider. Regards, Alex P.S. I plan to submit a proposal regarding scheduling policies, and maybe one more related to theme #1 below From: Day, Phil philip@hp.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 14/10/2013 06:50 PM Subject: Re: [openstack-dev] Scheduler meeting and Icehouse Summit Hi Folks, In the weekly scheduler meeting we've been trying to pull together a consolidated list of Summit sessions so that we can find logical groupings and make a more structured set of sessions for the limited time available at the summit.
https://etherpad.openstack.org/p/IceHouse-Nova-Scheduler-Sessions With the deadline for sessions being this Thursday 17th, tomorrows IRC meeting is the last chance to decide which sessions we want to combine / prioritize.Russell has indicated that a starting assumption of three scheduler sessions is reasonable, with any extras depending on what else is submitted. I've matched the list on the Either pad to submitted sessions below, and added links to any other proposed sessions that look like they are related. 1) Instance Group Model and API Session Proposal: http://summit.openstack.org/cfp/details/190 2) Smart Resource Placement: Session Proposal: http://summit.openstack.org/cfp/details/33 Possibly related sessions: Resource optimization service for nova (http://summit.openstack.org/cfp/details/201) 3) Heat and Scheduling and Software, Oh My!: Session Proposal: http://summit.openstack.org/cfp/details/113 4) Generic Scheduler Metrics and Celiometer: Session Proposal: http://summit.openstack.org/cfp/details/218 Possibly related sessions: Making Ceilometer and Nova play nice http://summit.openstack.org/cfp/details/73 5) Image Properties and Host Capabilities Session Proposal: NONE 6) Scheduler Performance: Session Proposal: NONE Possibly related Sessions: Rethinking Scheduler Design http://summit.openstack.org/cfp/details/34 7) Scheduling Across Services
Re: [openstack-dev] Scheduler meeting and Icehouse Summit
Hi Alex, My understanding is that the 17th is the deadline and that Russell needs to be planning the sessions from that point onwards. If we delay in giving him our suggestions until the 22nd I think it would be too late. We've had weeks if not months now of discussing possible scheduler sessions, I really don't see why we can't deliver a recommendation on how best to fit into the 3 committed slots on or before the 17th. Phil On Mon, Oct 14, 2013 at 10:56 AM, Alex Glikson glik...@il.ibm.com wrote: IMO, the three themes make sense, but I would suggest waiting until the submission deadline and discuss at the following IRC meeting on the 22nd. Maybe there will be more relevant proposals to consider. Regards, Alex P.S. I plan to submit a proposal regarding scheduling policies, and maybe one more related to theme #1 below From: Day, Phil philip@hp.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 14/10/2013 06:50 PM Subject: Re: [openstack-dev] Scheduler meeting and Icehouse Summit Hi Folks, In the weekly scheduler meeting we've been trying to pull together a consolidated list of Summit sessions so that we can find logical groupings and make a more structured set of sessions for the limited time available at the summit. https://etherpad.openstack.org/p/IceHouse-Nova-Scheduler-Sessions With the deadline for sessions being this Thursday 17th, tomorrow's IRC meeting is the last chance to decide which sessions we want to combine / prioritize. Russell has indicated that a starting assumption of three scheduler sessions is reasonable, with any extras depending on what else is submitted. I've matched the list on the Etherpad to submitted sessions below, and added links to any other proposed sessions that look like they are related. 1) Instance Group Model and API Session Proposal: http://summit.openstack.org/cfp/details/190 2) Smart Resource Placement: Session Proposal: http://summit.openstack.org/cfp/details/33 Possibly related sessions: Resource optimization service for nova (http://summit.openstack.org/cfp/details/201) 3) Heat and Scheduling and Software, Oh My!: Session Proposal: http://summit.openstack.org/cfp/details/113 4) Generic Scheduler Metrics and Ceilometer: Session Proposal: http://summit.openstack.org/cfp/details/218 Possibly related sessions: Making Ceilometer and Nova play nice http://summit.openstack.org/cfp/details/73 5) Image Properties and Host Capabilities Session Proposal: NONE 6) Scheduler Performance: Session Proposal: NONE Possibly related Sessions: Rethinking Scheduler Design http://summit.openstack.org/cfp/details/34 7) Scheduling Across Services: Session Proposal: NONE 8) Private Clouds: Session Proposal: http://summit.openstack.org/cfp/details/228 9) Multiple Scheduler Policies: Session Proposal: NONE The proposal from last week's meeting was to use the three slots for: - Instance Group Model and API (1) - Smart Resource Placement (2) - Performance (6) However, at the moment there doesn't seem to be a session proposed to cover the performance work ? It also seems to me that the Group Model and Smart Placement are pretty closely linked along with (3) (which says it wants to combine 1 & 2 into the same topic), so if we only have three slots available then these look like logical candidates for consolidating into a single session. That would free up a session to cover the generic metrics (4) and Ceilometer - where a lot of work in Havana stalled because we couldn't get a consensus on the way forward.
The third slot would be kept for performance - which based on the lively debate in the scheduler meetings I'm assuming will still be submitted as a session.Private Clouds isn't really a scheduler topic, so I suggest it takes its chances as a general session. Hence my revised proposal for the three slots is: i) Group Scheduling / Smart Placement / Heat and Scheduling (1), (2), (3), (7) - How do you schedule something more complex that a single VM ? ii) Generalized scheduling metrics / celiometer integration (4) - How do we extend the set of resources a scheduler can use to make its decisions ? - How do we make this work with / compatible with Celiometer ? iii) Scheduler Performance (6) In that way we will at least give airtime to all of the topics. If a 4th scheduler slot becomes available then we could break up the first session into two parts. Thoughts welcome here or in tomorrows IRC meeting
Re: [openstack-dev] Disable async network allocation
Yep, that was the feature I was referring to. As I said I don't have anything definite that shows this to be not working (and the code looks fine) - just wanted to try and simplify the world a bit for a while. -Original Message- From: Melanie Witt [mailto:melw...@yahoo-inc.com] Sent: 24 October 2013 02:48 To: OpenStack Development Mailing List Subject: Re: [openstack-dev] Disable async network allocation On Oct 23, 2013, at 5:56 PM, Aaron Rosen aro...@nicira.com wrote: I believe he's referring to: https://github.com/openstack/nova/blob/master/nova/network/model.py#L335 https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1211 I found some more background on the feature (not configurable) which might help in trying to revert it for testing. https://blueprints.launchpad.net/nova/+spec/async-network-alloc There was also the addition of the config option 'network_allocate_retries' which defaults to 0: https://review.openstack.org/#/c/34473/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Disable async network allocation
This is quite an interesting finding. So if we use httplib, this won't happen? That's my understanding. It also looks like you might be able to configure the retries in later versions of httplib2 -Original Message- From: Nachi Ueno [mailto:na...@ntti3.com] Sent: 24 October 2013 00:38 To: OpenStack Development Mailing List Subject: Re: [openstack-dev] Disable async network allocation Hi Phil 2013/10/21 Day, Phil philip@hp.com: Hi Folks, I'm trying to track down a couple of obscure issues in network port creation where it would be really useful if I could disable the async network allocation so that everything happens in the context of a single eventlet rather than two (and also rule out if there is some obscure eventlet threading issue in here). I thought it was configurable - but I don't see anything obvious in the code to go back to the old (slower) approach of doing network allocation in-line in the main create thread ? May I ask the meaning of async network allocation ? One of the issues I'm trying to track is Neutron occasionally creating more than one port - I suspect a retry mechanism in httplib2 is sending the port create request multiple times if Neutron is slow to reply, resulting in Neutron processing it multiple times. Looks like only the Neutron client has chosen to use httplib2 rather than httplib - anyone got any insight here ? This is quite an interesting finding. So if we use httplib, this won't happen? Sometimes of course the Neutron timeout results in the create request being re-scheduled onto another node (which can in turn generate its own set of port create requests). It's the thread behavior around how the timeout exception is handled that I'm slightly nervous of (some of the retries seem to occur after the original network thread should have terminated). I agree. The kind of unintentional retry causes issues. Thanks Phil Best Nachi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
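To make the httplib2 point concrete, a quick experiment to rule the client-side retry in or out might look like the sketch below. This assumes the installed httplib2 is new enough to expose the module-level RETRIES knob mentioned above (hence the guard), and the endpoint and request body are of course just placeholders:

    import httplib2

    # Assumption: later httplib2 releases expose a module-level RETRIES
    # attribute; older releases may not, so guard the assignment.
    if hasattr(httplib2, 'RETRIES'):
        httplib2.RETRIES = 1  # send each request exactly once, never retry

    http = httplib2.Http(timeout=30)
    # Placeholder endpoint/body - the point is only that a slow POST should
    # now surface as a timeout rather than being silently re-sent.
    resp, body = http.request('http://127.0.0.1:9696/v2.0/ports',
                              method='POST',
                              body='{"port": {"network_id": "..."}}',
                              headers={'Content-Type': 'application/json'})

If the duplicate ports stop appearing once retries are forced down to one, that points pretty squarely at the client rather than at Neutron itself.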
[openstack-dev] [nova] Thoughts please on how to address a problem with multiple deletes leading to a nova-compute thread pool problem
Hi Folks, We're very occasionally seeing problems where a thread processing a create hangs (and we've seen it when talking to Cinder and Glance). Whilst those issues need to be hunted down in their own rights, they do show up what seems to me to be a weakness in the processing of delete requests that I'd like to get some feedback on. Delete is the one operation that is allowed regardless of the Instance state (since it's a one-way operation, and users should always be able to free up their quota). However when we get a create thread hung in one of these states, the delete requests when they hit the manager will also block as they are synchronized on the uuid. Because the user making the delete request doesn't see anything happen they tend to submit more delete requests. The Service is still up, so these go to the compute manager as well, and eventually all of the threads will be waiting for the lock, and the compute manager will stop consuming new messages. The problem isn't limited to deletes - although in most cases the change of state in the API means that you have to keep making different calls to get past the state checker logic to do it with an instance stuck in another state. Users also seem to be more impatient with deletes, as they are trying to free up quota for other things. So while I know that we should never get a thread into a hung state in the first place, I was wondering about one of the following approaches to address just the delete case: i) Change the delete call on the manager so it doesn't wait for the uuid lock. Deletes should be coded so that they work regardless of the state of the VM, and other actions should be able to cope with a delete being performed from under them. There is of course no guarantee that the delete itself won't block as well. ii) Record in the API server that a delete has been started (maybe enough to use the task state being set to DELETING in the API if we're sure this doesn't get cleared), and add a periodic task in the compute manager to check for and delete instances that are in a DELETING state for more than some timeout. Then the API, knowing that the delete will be processed eventually, can just no-op any further delete requests. iii) Add some hook into the ServiceGroup API so that the timer could depend on getting a free thread from the compute manager pool (i.e. run some no-op task) - so that if there are no free threads then the service becomes down. That would (eventually) stop the scheduler from sending new requests to it, and make deletes be processed in the API server but won't of course help with commands for other instances on the same host. iv) Move away from having a general topic and thread pool for all requests, and start a listener on an instance specific topic for each running instance on a host (leaving the general topic and pool just for creates and other non-instance calls like the hypervisor API). Then a blocked task would only affect requests for a specific instance. I'm tending towards ii) as a simple and pragmatic solution in the near term, although I like both iii) and iv) as generally good enhancements - but iv) in particular feels like a pretty seismic change. Thoughts please, Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
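To make option ii) a touch more concrete, the periodic task would essentially be doing something like the toy stand-alone sketch below. The field names mirror what the API already records, but the function and the timeout are illustrative only, not existing Nova code:

    from datetime import datetime, timedelta

    DELETE_TIMEOUT = timedelta(minutes=10)  # would really be a config option

    def find_stuck_deletes(instances, now=None):
        # 'instances' stands in for what the periodic task would pull from
        # the DB: dicts with 'task_state' and 'updated_at' (a datetime).
        # Anything the API marked as deleting but which never progressed
        # within the timeout becomes a candidate for a direct local delete.
        now = now or datetime.utcnow()
        return [inst for inst in instances
                if inst.get('task_state') == 'deleting'
                and now - inst['updated_at'] > DELETE_TIMEOUT]

The API, having set the task state, can then safely no-op any repeat delete requests because it knows one of the compute manager's periodic runs will eventually pick the instance up.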
Re: [openstack-dev] [nova] Thoughts please on how to address a problem with multiple deletes leading to a nova-compute thread pool problem
There may be multiple API servers; global state in an API server seems fraught with issues. No, the state would be in the DB (it would either be a task_state of DELETING or some new delete_started_at timestamp). I agree that i) is nice and simple - it just has the minor risks that the delete itself could hang, and/or that we might find some other issues with bits of the code that can't cope at the moment with the instance being deleted from underneath them -Original Message- From: Robert Collins [mailto:robe...@robertcollins.net] Sent: 25 October 2013 12:21 To: OpenStack Development Mailing List Subject: Re: [openstack-dev] [nova] Thoughts please on how to address a problem with multiple deletes leading to a nova-compute thread pool problem On 25 October 2013 23:46, Day, Phil philip@hp.com wrote: Hi Folks, We're very occasionally seeing problems where a thread processing a create hangs (and we've seen it when talking to Cinder and Glance). Whilst those issues need to be hunted down in their own rights, they do show up what seems to me to be a weakness in the processing of delete requests that I'd like to get some feedback on. Delete is the one operation that is allowed regardless of the Instance state (since it's a one-way operation, and users should always be able to free up their quota). However when we get a create thread hung in one of these states, the delete requests when they hit the manager will also block as they are synchronized on the uuid. Because the user making the delete request doesn't see anything happen they tend to submit more delete requests. The Service is still up, so these go to the compute manager as well, and eventually all of the threads will be waiting for the lock, and the compute manager will stop consuming new messages. The problem isn't limited to deletes - although in most cases the change of state in the API means that you have to keep making different calls to get past the state checker logic to do it with an instance stuck in another state. Users also seem to be more impatient with deletes, as they are trying to free up quota for other things. So while I know that we should never get a thread into a hung state in the first place, I was wondering about one of the following approaches to address just the delete case: i) Change the delete call on the manager so it doesn't wait for the uuid lock. Deletes should be coded so that they work regardless of the state of the VM, and other actions should be able to cope with a delete being performed from under them. There is of course no guarantee that the delete itself won't block as well. I like this. ii) Record in the API server that a delete has been started (maybe enough to use the task state being set to DELETING in the API if we're sure this doesn't get cleared), and add a periodic task in the compute manager to check for and delete instances that are in a DELETING state for more than some timeout. Then the API, knowing that the delete will be processed eventually, can just no-op any further delete requests. There may be multiple API servers; global state in an API server seems fraught with issues. iii) Add some hook into the ServiceGroup API so that the timer could depend on getting a free thread from the compute manager pool (i.e. run some no-op task) - so that if there are no free threads then the service becomes down.
That would (eventually) stop the scheduler from sending new requests to it, and make deletes be processed in the API server but won't of course help with commands for other instances on the same host. This seems a little kludgy to me. iv) Move away from having a general topic and thread pool for all requests, and start a listener on an instance specific topic for each running instance on a host (leaving the general topic and pool just for creates and other non-instance calls like the hypervisor API). Then a blocked task would only affect requests for a specific instance. That seems to suggest instance # topics? Aieee. I don't think that solves the problem anyway, because either a) you end up with a tonne of threads, or b) you have a multiplexing thread with the same potential issue. You could more simply just have a dedicated thread pool for deletes, and have no thread limit on the pool. Of course, this will fail when you OOM :). You could do a dict with instance -> thread for deletes instead, without creating lots of queues. I'm tending towards ii) as a simple and pragmatic solution in the near term, although I like both iii) and iv) as generally good enhancements - but iv) in particular feels like a pretty seismic change. My inclination would be (i) - make deletes nonblocking & idempotent with lazy cleanup if resources take a while to tear down. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
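For what it's worth, the kind of liveness probe option iii) implies is only a few lines. It is sketched here with the stdlib futures module purely for illustration - nova-compute's worker pool is really an eventlet greenpool, so the real thing would hook into that and into the ServiceGroup heartbeat:

    from concurrent.futures import ThreadPoolExecutor, TimeoutError

    def pool_is_responsive(executor, timeout=5.0):
        # Submit a trivial no-op to the worker pool; if it can't even run
        # that within the timeout, every worker is stuck and the service
        # should report itself as down so the scheduler stops using it.
        future = executor.submit(lambda: True)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            return False

    # Example: a pool whose workers are all blocked would fail the probe.
    pool = ThreadPoolExecutor(max_workers=4)
    print(pool_is_responsive(pool))  # True while workers are free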
Re: [openstack-dev] [nova] Thoughts please on how to address a problem with multiple deletes leading to a nova-compute thread pool problem
-Original Message- From: Clint Byrum [mailto:cl...@fewbar.com] Sent: 25 October 2013 17:05 To: openstack-dev Subject: Re: [openstack-dev] [nova] Thoughts please on how to address a problem with multiple deletes leading to a nova-compute thread pool problem Excerpts from Day, Phil's message of 2013-10-25 03:46:01 -0700: Hi Folks, We're very occasionally seeing problems where a thread processing a create hangs (and we've seen it when talking to Cinder and Glance). Whilst those issues need to be hunted down in their own rights, they do show up what seems to me to be a weakness in the processing of delete requests that I'd like to get some feedback on. Delete is the one operation that is allowed regardless of the Instance state (since it's a one-way operation, and users should always be able to free up their quota). However when we get a create thread hung in one of these states, the delete requests when they hit the manager will also block as they are synchronized on the uuid. Because the user making the delete request doesn't see anything happen they tend to submit more delete requests. The Service is still up, so these go to the compute manager as well, and eventually all of the threads will be waiting for the lock, and the compute manager will stop consuming new messages. The problem isn't limited to deletes - although in most cases the change of state in the API means that you have to keep making different calls to get past the state checker logic to do it with an instance stuck in another state. Users also seem to be more impatient with deletes, as they are trying to free up quota for other things. So while I know that we should never get a thread into a hung state in the first place, I was wondering about one of the following approaches to address just the delete case: i) Change the delete call on the manager so it doesn't wait for the uuid lock. Deletes should be coded so that they work regardless of the state of the VM, and other actions should be able to cope with a delete being performed from under them. There is of course no guarantee that the delete itself won't block as well. Almost anything unexpected that isn't starting the creation results in just marking an instance as an ERROR, right? So this approach is actually pretty straightforward to implement. You don't really have to make other operations any more intelligent than they already should be in cleaning up half-done operations when they encounter an error. It might be helpful to suppress or de-prioritize logging of these errors when it is obvious that this result was intended. ii) Record in the API server that a delete has been started (maybe enough to use the task state being set to DELETING in the API if we're sure this doesn't get cleared), and add a periodic task in the compute manager to check for and delete instances that are in a DELETING state for more than some timeout. Then the API, knowing that the delete will be processed eventually, can just no-op any further delete requests. s/API server/database/ right? I like the coalescing approach where you no longer take up more resources for repeated requests. Yep, the state is saved in the DB, but it's set by the API server - that's what I meant. So it's not dependent on the manager getting the delete. I don't like the garbage collection aspect of this plan though. Garbage collection is a trade-off of user experience for resources. If your GC thread gets too far behind your resources will be exhausted. If you make it too active, it wastes resources doing the actual GC.
Add in that you have a timeout before things can be garbage collected and I think this becomes a very tricky thing to tune, and it may not be obvious it needs to be tuned until you have a user who does a lot of rapid create/delete cycles. The GC is just a backstop here - you always let the first delete message through so normally things work as they do now. It's only if the delete message doesn't get processed for some reason that the GC would kick in. There are already examples of this kind of clean-up in other periodic tasks. iii) Add some hook into the ServiceGroup API so that the timer could depend on getting a free thread from the compute manager pool (i.e. run some no-op task) - so that if there are no free threads then the service becomes down. That would (eventually) stop the scheduler from sending new requests to it, and make deletes be processed in the API server but won't of course help with commands for other instances on the same host. I'm not sure I understand this one. At the moment the liveness of a service is determined by a separate thread in the ServiceGroup class - all it really shows is that something in the manager is still running. What I was thinking of is extending that so that it shows that the manager is still capable of doing something useful. Doing some
Re: [openstack-dev] extending nova boot
Hi Drew, Generally you need to create a new API extension and make some changes in the main servers.py. The scheduler-hints API extension does this kind of thing, so if you look at: api/openstack/compute/contrib/scheduler_hints.py for how the extension is defined, and look in api/openstack/compute/servers.py code for scheduler_hints (e.g. _extract_scheduler_hints() ) then that should point you in the right direction. Hope that helps, Phil -Original Message- From: Drew Fisher [mailto:drew.fis...@oracle.com] Sent: 25 October 2013 16:34 To: openstack-dev@lists.openstack.org Subject: [openstack-dev] extending nova boot Good morning! I am looking at extending nova boot with a few new flags. I've found enough examples online that I have a working extension to novaclient (I can see the new flags in `nova help boot` and if I run with the --debug flag I can see the curl requests to the API have the data). What I can't seem to figure out is how nova-api processes these extra arguments. With stable/grizzly bits, in nova/api/openstack/compute/servers.py, I can see where that data is processed (in Controller.create()) but it doesn't appear to me that any leftover flags are handled. What do I need to do to get these new flags to nova boot from novaclient into nova-api and ultimately my compute driver? Thanks for any help! -Drew Fisher ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
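As a rough illustration of the pattern I mean (heavily simplified - the real code in servers.py and the contrib extension goes through extension loading and authorization, and the exact key aliases are worth checking in the tree):

    def _extract_scheduler_hints(body):
        # The request body carries the extension data as a sibling of the
        # 'server' dict, keyed by the extension alias; servers.py pulls it
        # out and passes it through to the compute API as a create() kwarg.
        return (body.get('os:scheduler_hints')
                or body.get('OS-SCH-HNT:scheduler_hints')
                or {})

    # e.g. for a boot request body like:
    body = {'server': {'name': 'test', 'imageRef': '...', 'flavorRef': '1'},
            'os:scheduler_hints': {'my_new_flag': 'value'}}
    print(_extract_scheduler_hints(body))  # {'my_new_flag': 'value'}

Your own extension would do the same thing with whatever alias you register, and then thread the value through compute.API.create() down to the scheduler or the virt driver as needed.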
Re: [openstack-dev] [nova] Thoughts please on how to address a problem with multiple deletes leading to a nova-compute thread pool problem
I'd disagree with that – from a user perspective they should always be able to delete an Instance regardless of its state, and the delete should always work (or at least always appear to work to the user so that it no longer counts against their quota, and they are no longer charged for it) From: Abhishek Lahiri [mailto:aviost...@gmail.com] Sent: 26 October 2013 17:10 To: OpenStack Development Mailing List Cc: OpenStack Development Mailing List Subject: Re: [openstack-dev] [nova] Thoughts please on how to address a problem with multiple deletes leading to a nova-compute thread pool problem Deletes should only be allowed when the vm is in a power off state. This will allow consistent state transition. Thanks Al On Oct 26, 2013, at 8:55 AM, Joshua Harlow harlo...@yahoo-inc.com wrote: I think I will try to have an unconference at the HK summit about ideas the cinder developers (and the taskflow developers, since it's not a concept that is unique/applicable to just cinder) are having about said state machine (and its potential usage). So look out for that, be interesting to have some nova folks involved there also :-) Sent from my really tiny device... On Oct 26, 2013, at 3:14 AM, Alex Glikson glik...@il.ibm.com wrote: +1 Regards, Alex Joshua Harlow harlo...@yahoo-inc.com wrote on 26/10/2013 09:29:03 AM: An idea that others and I are having for a similar use case in cinder (or it appears to be similar). If there was a well defined state machine/s in nova with well defined and managed transitions between states then it seems like this state machine could resume on failure as well as be interrupted when a dueling or preemptable operation arrives (a delete while being created for example). This way not only would it be very clear the set of states and transitions but it would also be clear how preemption occurs (and under what cases). Right now in nova there is a distributed and ad-hoc state machine which, if it was more formalized, could inherit some of the described useful capabilities. It would also be much more resilient to these types of locking problems that u described. IMHO that's the only way these types of problems will be fully fixed, not by more queues or more periodic tasks, but by solidifying & formalizing the state machines that compose the work nova does. Sent from my really tiny device... On Oct 25, 2013, at 3:52 AM, Day, Phil philip@hp.com wrote: Hi Folks, We're very occasionally seeing problems where a thread processing a create hangs (and we've seen it when talking to Cinder and Glance). Whilst those issues need to be hunted down in their own rights, they do show up what seems to me to be a weakness in the processing of delete requests that I'd like to get some feedback on. Delete is the one operation that is allowed regardless of the Instance state (since it's a one-way operation, and users should always be able to free up their quota). However when we get a create thread hung in one of these states, the delete requests when they hit the manager will also block as they are synchronized on the uuid. Because the user making the delete request doesn't see anything happen they tend to submit more delete requests. The Service is still up, so these go to the compute manager as well, and eventually all of the threads will be waiting for the lock, and the compute manager will stop consuming new messages.
The problem isn't limited to deletes - although in most cases the change of state in the API means that you have to keep making different calls to get past the state checker logic to do it with an instance stuck in another state. Users also seem to be more impatient with deletes, as they are trying to free up quota for other things. So while I know that we should never get a thread into a hung state into the first place, I was wondering about one of the following approaches to address just the delete case: i) Change the delete call on the manager so it doesn't wait for the uuid lock. Deletes should be coded so that they work regardless of the state of the VM, and other actions should be able to cope with a delete being performed from under them. There is of course no guarantee that the delete itself won't block as well. ii) Record in the API server that a delete has been started (maybe enough to use the task state being set to DELETEING in the API if we're sure this doesn't get cleared), and add a periodic task in the compute manager to check for and delete instances that are in a DELETING state for more than some timeout. Then the API, knowing that the delete will be processes eventually can just no-op any further delete requests. iii) Add some hook into the ServiceGroup API so that the timer could depend on getting a free thread from the compute manager pool (ie run
Re: [openstack-dev] [Nova] Preserving ephemeral block device on rebuild?
Hi Rob, I think it looks like a good option - but I'd like to see it exposed to the user as an option rather than as a change in the default behavior. I.e. rebuild --keep-ephemeral=True Phil -Original Message- From: Robert Collins [mailto:robe...@robertcollins.net] Sent: 28 October 2013 02:00 To: OpenStack Development Mailing List; Russell Bryant; Joe Gordon Subject: [openstack-dev] [Nova] Preserving ephemeral block device on rebuild? For context, in TripleO we have precious state to preserve when we push updates out to a cluster: nova instance volumes (obviously), swift data stores, mysql db's etc. We have a long term plan to have a volume model and interface that with Cinder, but that's an Ironic planned feature, and somewhat down the track : in the short term we'd like to use the ephemeral volume for such storage: it seems like 'nova rebuild' could easily be extended to preserve the ephemeral block device. From a nova bm perspective, all that needs to happen is for us to /not/ format the volume - simples - and we can do that in the current rebuild code path where destroy + spawn is called, as long as we end up on the same host. However, we'd like to support this for libvirt too, because that lets us test workflows in virt rather than on purely baremetal (or emulated baremetal). For that, it looks to me like we need to push rebuild down a layer to the virt driver : so rather than destroy(); spawn(); have a rebuild() method that takes the same data spawn would, and will be able to preserve data as needed. Thoughts on this - both the use of ephemeral in this way, and the sketched out code change - are sought! Thanks, Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
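On the API side, what I have in mind is no more than an extra boolean on the rebuild action, which through python-novaclient would end up being used something like the sketch below. This is entirely hypothetical until such a flag actually exists - the parameter name is just my suggested spelling, and the credentials and IDs are placeholders:

    from novaclient.v1_1 import client

    # Placeholder credentials and endpoint.
    nova = client.Client('user', 'password', 'tenant',
                         'http://keystone:5000/v2.0')

    server = nova.servers.get('INSTANCE_UUID')
    # Hypothetical flag: rebuild to a new image but leave the ephemeral
    # disk untouched instead of reformatting it.
    nova.servers.rebuild(server, 'IMAGE_UUID', keep_ephemeral=True)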
Re: [openstack-dev] Bad review patterns
-Original Message- From: Robert Collins [mailto:robe...@robertcollins.net] Sent: 06 November 2013 22:08 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] Bad review patterns On 6 November 2013 21:34, Radomir Dopieralski openst...@sheep.art.pl wrote: Hello, Secondly, this: Leaving a mark. === You review a change and see that it is mostly fine, but you feel that since you did so much work reviewing it, you should at least find *something* wrong. So you find some nitpick and -1 the change just so that they know you reviewed it. That's indeed not cool. Perhaps a 0 with the nitpick. On the other hand, perhaps the nitpick actually matters. If it's a nitpick it's going to be what - a couple minutes to fix and push ? ... I think the real concern is that by pushing it up again you go to the back of the queue for reviews, so maybe we should talk about that instead. We don't want backpressure on folk polishing a patch on request because of review latency. This is quite obvious. Just don't do it. It's OK to spend an hour reviewing something, and then leaving no comments on it, because it's simply fine, or because we had no means to test something (see the first pattern). Core reviewers look for the /comments/ from people, not just the votes. A +1 from someone that isn't core is meaningless unless they are known to be a thoughtful code reviewer. A -1 with no comment is also bad, because it doesn't help the reviewee get whatever the issue is fixed. It's very much not OK to spend an hour reviewing something and then +1 with no comment: if I, and I think any +2er across the project see a patch that needs an hour of review, with a commentless +1, we'll likely discount the +1 as being meaningful. I don't really get what you're saying here Rob. It seems to me an almost inevitable part of the review process that useful comments are going to be mostly negative. If someone has invested that amount of effort because the patch is complex, or it took them a while to work their way back into that part of the system, etc, but having given the code careful consideration they decide it's good - do you want comments in there saying I really like your code, Well done on fixing such a complex problem or some such ? I just don't see how you can use a lack or presence of positive feedback in a +1 as any sort of indication of the quality of that +1. Unless you start asking reviewers to précis the change in their own words to show that they understood it I don't really see how additional positive comments help in most cases. Perhaps if you have some specific examples of this it would help to clarify. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] What key strings can we set in scheduler_hints param when boot an instance?
The hints are coded into the various scheduler filters, so the set supported on any install depends on what filters have been configured. I have a change under way (I need to just find the time to go back and fix the last wave of review comments) to expose what is supported via an API call: https://review.openstack.org/#/c/34291/ From: openstack learner [mailto:openstacklea...@gmail.com] Sent: 06 November 2013 20:01 To: openstack-dev@lists.openstack.org; openst...@lists.openstack.org Subject: [openstack-dev] What key strings can we set in scheduler_hints param when boot an instance? Hi all, I am using the nova python api and recently I need to use the filter scheduler hint when I boot up an instance. In the novaclient.v1_1.client.Client.servers.create() method, there is a :param scheduler_hints: (optional extension) arbitrary key-value pairs specified by the client to help boot an instance, with which we can specify the key-value pairs to help boot an instance. However, I don't know what key strings I can specify in my key-value pairs. I searched online but did not find any information about that. Is there any document that lists all the key strings we can specify in the scheduler_hints? I would like to have a list of all the keys that we can specify in the scheduler_hints. Thanks a lot xin ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
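As an example of what those key-value pairs look like in practice, the anti-affinity hint consumed by DifferentHostFilter can be passed straight through servers.create() - assuming that filter is enabled in scheduler_default_filters on your install. The credentials and UUIDs below are placeholders:

    from novaclient.v1_1 import client

    # Placeholder credentials and endpoint.
    nova = client.Client('user', 'password', 'tenant',
                         'http://keystone:5000/v2.0')

    # Ask the scheduler not to co-locate this instance with two existing
    # ones; only honoured if DifferentHostFilter is configured.
    nova.servers.create(
        name='web-3',
        image='IMAGE_UUID',
        flavor='FLAVOR_ID',
        scheduler_hints={'different_host': ['UUID_1', 'UUID_2']})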
[openstack-dev] [Nova] Recent change breaks manual control of service enabled / disabled status - suggest it is backed out and re-worked
Hi Folks, I'd like to get some eyes on a bug I just filed: https://bugs.launchpad.net/nova/+bug/1250049 A recent change (https://review.openstack.org/#/c/52189/9 ) introduced the automatic disable / re-enable of nova-compute when the connection to libvirt is lost and recovered. The problem is that it doesn't take any account of the fact that a cloud administrator may have other reasons for disabling a service, and always puts nova-compute back into an enabled state. The impact of this is pretty big for us - at any point in time we have a number of servers disabled for various operational reasons, and there are times when we need to restart libvirt as part of a deployment. With this change in place all of those hosts are returned to an enabled state, and the reason that they were disabled is lost. While I like the concept that an error condition like this should disable the host from a scheduling perspective, I think it needs to be implemented as an additional form of disablement (i.e. a separate value kept in the ServiceGroup API), not an override of the current one. I'd like to propose that the current change is reverted as a priority, and a new approach then submitted as a second step that works alongside the current enable/disable reason. Sorry for not catching this in the review stage - I didn't notice this one at all. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
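To illustrate the operator workflow that gets clobbered: hosts are typically taken out of scheduling with a recorded reason, along the lines of the sketch below (assuming a python-novaclient recent enough to expose disable_log_reason; the host name, reason and credentials are placeholders):

    from novaclient.v1_1 import client

    # Placeholder admin credentials and endpoint.
    nova = client.Client('admin', 'password', 'admin',
                         'http://keystone:5000/v2.0')

    # Take a compute node out of scheduling and record why.
    nova.services.disable_log_reason('compute-17', 'nova-compute',
                                     'awaiting disk replacement')

With the change above in place, a libvirt restart on that host silently re-enables it and throws the recorded reason away - which is exactly the behaviour I'd like to see split out into a separate flag.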
[openstack-dev] [nova] Maintaining backwards compatibility for RPC calls
Hi Folks, I'm a bit confused about the expectations of a manager class to be able to receive and process messages from a previous RPC version. I thought the objective was to always make changes such that the manager can process any previous version of the call that could come from the last release. For example, Icehouse code should be able to receive any version that could be generated by a version of Havana. Generally of course that means new parameters have to have a default value. I'm kind of struggling then to see why we've now removed the default values from, for example, terminate_instance() as part of moving the RPC version to 3.0: def terminate_instance(self, context, instance, bdms=None, reservations=None): def terminate_instance(self, context, instance, bdms, reservations): https://review.openstack.org/#/c/54493/ Doesn't this mean that you can't deploy Icehouse (3.0) code into a Havana system but leave the RPC version pinned at Havana until all of the code has been updated ? Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
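i.e. the pattern I'd have expected to stay in place is the usual one where the new arguments keep defaults so an older caller can simply omit them. A toy illustration - none of this is the real manager code, just the shape of it:

    # Toy illustration of why defaults matter for RPC compatibility.
    def terminate_instance(context, instance, bdms=None, reservations=None):
        if bdms is None:
            # A newer manager can recover what an older (2.x) caller
            # didn't send, e.g. by looking it up itself.
            bdms = instance.get('block_device_mapping', [])
        return ('terminate', instance['uuid'], bdms, reservations)

    # A Havana-era caller that doesn't know about the new arguments can
    # still invoke the method without them:
    print(terminate_instance({}, {'uuid': 'abc-123'}))

Dropping the defaults means the 3.0 method can only ever be called by a sender that already passes everything, which is what makes the pinned-at-Havana upgrade path awkward.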
Re: [openstack-dev] [Nova] Blueprint: standard specification of guest CPU topology
Hi, I like the concept of allowing users to request a cpu topology, but have a few questions / concerns: The host is exposing info about vCPU count it is able to support and the scheduler picks on that basis. The guest image is just declaring upper limits on topology it can support. So if the host is able to support the guest's vCPU count, then the CPU topology decision should never cause any boot failure. As such CPU topology has no bearing on scheduling, which is good because I think it would significantly complicate the problem. i) Is that always true ? Some configurations (like ours) currently ignore vcpu count altogether because what we're actually creating are VMs that are n vcpus wide (as defined by the flavour) but each vcpu is only some subset of the processing capacity of a physical core (There was a summit session on this: http://summit.openstack.org/cfp/details/218). So if vcpu count isn't being used for scheduling, can you still guarantee that all topology selections can always be met ? ii) Even if you are counting vcpus and mapping them 1:1 against cores, are there not some topologies that are either more inefficient in terms of overall host usage and/or incompatible with other topologies (i.e. leave some (spare) resource un-used in a way that it can't be used for a specific topology that would otherwise fit) ? As a provider I don't want users to be able to determine how efficiently (even indirectly) the hosts are utilised. There may be some topologies that I'm willing to allow (because they always pack efficiently) and others I would never allow. Putting this into the control of the users via image metadata feels wrong in that case. Maybe flavour extra-spec (which is in the control of the cloud provider) would be a more logical fit for this kind of property ? iii) I can see the logic of associating a topology with an image - but don't really understand how that would fit with the image being used with different flavours. What happens if a topology in the image just can't be implemented within the constraints of a selected flavour ? It kind of feels as if we either need a way to constrain images to specific flavours, or perhaps allow an image to express a preferred flavour / topology, but allow the user to override these as part of the create request. Cheers, Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Splitting up V3 API admin-actions plugin
+1 from me - would much prefer to be able to pick this on an individual basis. Could kind of see a case for keeping reset_network and inject_network_info together - but don't have a strong feeling about it (as we don't use them) -Original Message- From: Andrew Laski [mailto:andrew.la...@rackspace.com] Sent: 02 December 2013 14:59 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Nova] Splitting up V3 API admin-actions plugin On 12/02/13 at 08:38am, Russell Bryant wrote: On 12/01/2013 08:39 AM, Christopher Yeoh wrote: Hi, At the summit we agreed to split lock/unlock, pause/unpause, suspend/unsuspend functionality out of the V3 version of admin actions into separate extensions to make it easier for deployers to only have loaded the functionality that they want. Remaining in admin_actions we have: migrate live_migrate reset_network inject_network_info create_backup reset_state I think it makes sense to separate out migrate and live_migrate into a migrate plugin as well. What do people think about the others? There is no real overhead of having them in separate plugins and totally removing admin_actions. Does anyone have any objections to this being done? Also in terms of grouping I don't think any of the others remaining above really belong together, but welcome any suggestions. +1 to removing admin_actions and splitting everything out. +1 from me as well. -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Blueprint: standard specification of guest CPU topology
Hi Daniel, I spent some more time reading your write up on the wiki (and it is a great write up BTW), and had a couple of further questions (I think my original ones are also still valid, but do let me know if / where I'm missing the point): iv) In the worked example where do the preferred_topology and mandatory_topology come from ? (For example are these per host configuration values) v) You give an example where it's possible to get the situation where the combination of image_hw_cpu_topology and flavour means the instance can't be created (vcpus=2048) but that looks more like a flavour misconfiguration (unless there is some node that does have that many vcpus). The case that worries me more is where, for example, an image says it needs max-sockets=1 and the flavour says it needs more vcpus than can be provided from a single socket. In this case the flavour is still valid, just not with this particular image - and that feels like a case that should fail validation at the API layer, not down on the compute node where the only option is to reschedule or go into an Error state. Phil -Original Message- From: Day, Phil Sent: 03 December 2013 12:03 To: 'Daniel P. Berrange'; OpenStack Development Mailing List (not for usage questions) Subject: RE: [openstack-dev] [Nova] Blueprint: standard specification of guest CPU topology Hi, I like the concept of allowing users to request a cpu topology, but have a few questions / concerns: The host is exposing info about vCPU count it is able to support and the scheduler picks on that basis. The guest image is just declaring upper limits on topology it can support. So if the host is able to support the guest's vCPU count, then the CPU topology decision should never cause any boot failure. As such CPU topology has no bearing on scheduling, which is good because I think it would significantly complicate the problem. i) Is that always true ? Some configurations (like ours) currently ignore vcpu count altogether because what we're actually creating are VMs that are n vcpus wide (as defined by the flavour) but each vcpu is only some subset of the processing capacity of a physical core (There was a summit session on this: http://summit.openstack.org/cfp/details/218). So if vcpu count isn't being used for scheduling, can you still guarantee that all topology selections can always be met ? ii) Even if you are counting vcpus and mapping them 1:1 against cores, are there not some topologies that are either more inefficient in terms of overall host usage and/or incompatible with other topologies (i.e. leave some (spare) resource un-used in a way that it can't be used for a specific topology that would otherwise fit) ? As a provider I don't want users to be able to determine how efficiently (even indirectly) the hosts are utilised. There may be some topologies that I'm willing to allow (because they always pack efficiently) and others I would never allow. Putting this into the control of the users via image metadata feels wrong in that case. Maybe flavour extra-spec (which is in the control of the cloud provider) would be a more logical fit for this kind of property ? iii) I can see the logic of associating a topology with an image - but don't really understand how that would fit with the image being used with different flavours.
What happens if a topology in the image just can't be implemented within the constraints of a selected flavour ? It kind of feels as if we either need a way to constrain images to specific flavours, or perhaps allow an image to express a preferred flavour / topology, but allow the user to override these as part of the create request. Cheers, Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
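For what it's worth, the flavour-controlled variant I'm suggesting would just be an extra spec set by the cloud provider, something like the sketch below. The property names are borrowed from the proposal and may well change before anything merges, and the credentials and flavour ID are placeholders:

    from novaclient.v1_1 import client

    # Placeholder admin credentials and endpoint.
    nova = client.Client('admin', 'password', 'admin',
                         'http://keystone:5000/v2.0')

    # Provider-controlled topology limits expressed as flavour extra specs,
    # rather than user-controlled image metadata.
    flavor = nova.flavors.get('FLAVOR_ID')
    flavor.set_keys({'hw:cpu_max_sockets': '2',
                     'hw:cpu_max_threads': '1'})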
[openstack-dev] [Nova] Core sponsors wanted for BP user defined shutdown
Hi Nova cores, As per the discussion at the Summit I need two (or more) nova cores to sponsor the BP that allows Guests a chance to shutdown cleanly rather than just yanking the virtual power cord out - which is approved and targeted for I2 https://review.openstack.org/#/c/35303/ The non-API aspect of this has been kicking around for a while now (on patch set 30), and is passing all of the tests etc. (The change in timing was upsetting some of the long running Tempest tests but this has now been fixed) - and as far as I know there are no outstanding issues to be addressed. Would be really nice to get this landed now before it needs another rebase. The API aspect is also under development and on target to be available for review soon. Any takers ? Cheers Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Proposal: Move CPU and memory allocation ratio out of scheduler
-Original Message- From: Jay Pipes [mailto:jaypi...@gmail.com] Sent: 09 June 2014 19:03 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] Proposal: Move CPU and memory allocation ratio out of scheduler On 06/09/2014 12:32 PM, Chris Friesen wrote: On 06/09/2014 07:59 AM, Jay Pipes wrote: On 06/06/2014 08:07 AM, Murray, Paul (HP Cloud) wrote: Forcing an instance to a specific host is very useful for the operator - it fulfills a valid use case for monitoring and testing purposes. Pray tell, what is that valid use case? I find it useful for setting up specific test cases when trying to validate things: put *this* instance on *this* host, put *those* instances on *those* hosts, now pull the power plug on *this* host... etc. So, violating the main design tenet of cloud computing: thou shalt not care what physical machine your virtual machine lives on. :) I wouldn't expect the typical openstack end-user to need it though. Me either :) But the full set of system capabilities isn't only about things that an end-user needs - there are also admin features we need to include. Another use case for this is to place a probe instance on specific hosts to help monitor specific aspects of the system performance from a VM perspective. I will point out, though, that it is indeed possible to achieve the same use case using host aggregates that would not break the main design tenet of cloud computing... just make two host aggregates, one for each compute node involved in your testing, and then simply supply scheduler hints that would only match one aggregate or the other. Even I wouldn't argue that aggregates are a great solution here ;-) Not only does having single-node aggregates for every host you want to force to seem a tad overkill, but the logic for this admin feature also includes bypassing the normal scheduler filters. Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
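For reference, the aggregate-based workaround Jay describes can be scripted in a few lines of python-novaclient. This is a rough sketch using admin credentials; the aggregate name, host name and the pinned_host metadata key are made up for the example, and matching on the flavour extra spec relies on AggregateInstanceExtraSpecsFilter being enabled in the scheduler:

    from novaclient import client

    # Credentials and endpoint are placeholders.
    nova = client.Client('2', 'admin', 'secret', 'admin',
                         'http://keystone.example.com:5000/v2.0')

    # One single-host aggregate per compute node involved in the test.
    agg = nova.aggregates.create('probe-compute-1', None)
    nova.aggregates.add_host(agg, 'compute-1')
    nova.aggregates.set_metadata(agg, {'pinned_host': 'compute-1'})

    # A flavour whose extra spec only matches that aggregate.
    flavor = nova.flavors.create('probe.compute-1', ram=512, vcpus=1, disk=1)
    flavor.set_keys({'aggregate_instance_extra_specs:pinned_host': 'compute-1'})

Which rather underlines the point above: that is a lot of machinery for "put this one instance on that one host".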
Re: [openstack-dev] [Nova] Review guidelines for API patches
Hi Chris, The documentation is NOT the canonical source for the behaviour of the API, currently the code should be seen as the reference. We've run into issues before where people have tried to align code to the fit the documentation and made backwards incompatible changes (although this is not one). I’ve never seen this defined before – is this published as official Openstack or Nova policy ? Personally I think we should be putting as much effort into reviewing the API docs as we do API code so that we can say that the API docs are the canonical source for behavior.Not being able to fix bugs in say input validation that escape code reviews because they break backwards compatibility seems to be a weakness to me. Phil From: Christopher Yeoh [mailto:cbky...@gmail.com] Sent: 13 June 2014 04:00 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Nova] Review guidelines for API patches On Fri, Jun 13, 2014 at 11:28 AM, Matt Riedemann mrie...@linux.vnet.ibm.commailto:mrie...@linux.vnet.ibm.com wrote: On 6/12/2014 5:58 PM, Christopher Yeoh wrote: On Fri, Jun 13, 2014 at 8:06 AM, Michael Still mi...@stillhq.commailto:mi...@stillhq.com mailto:mi...@stillhq.commailto:mi...@stillhq.com wrote: In light of the recent excitement around quota classes and the floating ip pollster, I think we should have a conversation about the review guidelines we'd like to see for API changes proposed against nova. My initial proposal is: - API changes should have an associated spec +1 - API changes should not be merged until there is a tempest change to test them queued for review in the tempest repo +1 Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.orgmailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev We do have some API change guidelines here [1]. I don't want to go overboard on every change and require a spec if it's not necessary, i.e. if it falls into the 'generally ok' list in that wiki. But if it's something that's not documented as a supported API (so it's completely new) and is pervasive (going into novaclient so it can be used in some other service), then I think that warrants some spec consideration so we don't miss something. To compare, this [2] is an example of something that is updating an existing API but I don't think warrants a blueprint since I think it falls into the 'generally ok' section of the API change guidelines. So really I see this a new feature, not a bug fix. Someone thought that detail was supported when writing the documentation but it never was. The documentation is NOT the canonical source for the behaviour of the API, currently the code should be seen as the reference. We've run into issues before where people have tried to align code to the fit the documentation and made backwards incompatible changes (although this is not one). Perhaps we need a streamlined queue for very simple API changes, but I do think API changes should get more than the usual review because we have to live with them for so long (short of an emergency revert if we catch it in time). 
[1] https://wiki.openstack.org/wiki/APIChangeGuidelines [2] https://review.openstack.org/#/c/99443/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.orgmailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Review guidelines for API patches
I agree that we need to keep a tight focus on all API changes. However was the problem with the floating IP change just to do with the implementation in Nova, or the frequency with which Ceilometer was calling it ? Whatever guidelines we follow on API changes themselves, it's pretty hard to protect against the impact of a system with admin creds putting a large load onto the system. -Original Message- From: Michael Still [mailto:mi...@stillhq.com] Sent: 12 June 2014 23:36 To: OpenStack Development Mailing List Subject: [openstack-dev] [Nova] Review guidelines for API patches In light of the recent excitement around quota classes and the floating ip pollster, I think we should have a conversation about the review guidelines we'd like to see for API changes proposed against nova. My initial proposal is: - API changes should have an associated spec - API changes should not be merged until there is a tempest change to test them queued for review in the tempest repo Thoughts? Michael -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?
Hi Folks, I was looking at the resize code in libvirt, and it has checks which raise an exception if the target root or ephemeral disks are smaller than the current ones - which seems fair enough I guess (you can't drop arbitrary disk content on resize), except that because the check is in the virt driver the effect is to just ignore the request (the instance remains active rather than going to resize-verify). It made me wonder if there were any hypervisors that actually allow this, and if not wouldn't it be better to move the check to the API layer so that the request can be failed rather than silently ignored ? As far as I can see:
baremetal: Doesn't support resize
hyperv: Checks only for root disk (https://github.com/openstack/nova/blob/master/nova/virt/hyperv/migrationops.py#L99-L108)
libvirt: Fails for a reduction of either root or ephemeral (https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4918-L4923)
vmware: Doesn't seem to check at all ?
xen: Allows resize down for root but not for ephemeral (https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L1015-L1032)
It feels kind of clumsy to have such a wide variation of behavior across the drivers, and to have the check performed only in the driver ? Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
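For context, the driver checks referenced above all boil down to roughly the following shape (a simplified, self-contained sketch rather than the actual driver code):

    class ResizeError(Exception):
        pass

    def check_disk_resize(old_flavor, new_flavor):
        # What the libvirt driver effectively does today: refuse to shrink
        # either disk. Because it runs inside the driver, the API has
        # already accepted the request by the time this fires.
        if new_flavor['root_gb'] < old_flavor['root_gb']:
            raise ResizeError('Unable to resize root disk down')
        if new_flavor['ephemeral_gb'] < old_flavor['ephemeral_gb']:
            raise ResizeError('Unable to resize ephemeral disk down')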
Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?
Theoretically impossible to reduce disk unless you have some really nasty guest additions. That’s what I thought – but many of the drivers seem to at least partially support it based on the code, hence the question on here to find out of that is really supported and works – or is just inconsistent error checking across drivers. From: Aryeh Friedman [mailto:aryeh.fried...@gmail.com] Sent: 13 June 2014 11:12 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ? Theoretically impossible to reduce disk unless you have some really nasty guest additions. On Fri, Jun 13, 2014 at 6:02 AM, Day, Phil philip@hp.commailto:philip@hp.com wrote: Hi Folks, I was looking at the resize code in libvirt, and it has checks which raise an exception if the target root or ephemeral disks are smaller than the current ones – which seems fair enough I guess (you can’t drop arbitary disk content on resize), except that the because the check is in the virt driver the effect is to just ignore the request (the instance remains active rather than going to resize-verify). It made me wonder if there were any hypervisors that actually allow this, and if not wouldn’t it be better to move the check to the API layer so that the request can be failed rather than silently ignored ? As far as I can see: baremetal: Doesn’t support resize hyperv: Checks only for root disk (https://github.com/openstack/nova/blob/master/nova/virt/hyperv/migrationops.py#L99-L108 ) libvirt: fails for a reduction of either root or ephemeral (https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4918-L4923 ) vmware: doesn’t seem to check at all ? xen: Allows resize down for root but not for ephemeral (https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L1015-L1032 ) It feels kind of clumsy to have such a wide variation of behavior across the drivers, and to have the check performed only in the driver ? Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.orgmailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?
I guess the question I'm really asking here is: Since we know resize down won't work in all cases, and the failure if it does occur will be hard for the user to detect, should we just block it at the API layer and be consistent across all Hypervisors ? From: Andrew Laski [mailto:andrew.la...@rackspace.com] Sent: 13 June 2014 13:57 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ? On 06/13/2014 08:03 AM, Day, Phil wrote: Theoretically impossible to reduce disk unless you have some really nasty guest additions. That's what I thought - but many of the drivers seem to at least partially support it based on the code, hence the question on here to find out of that is really supported and works - or is just inconsistent error checking across drivers. My grumpy dev answer is that what works is not resizing down. I'm familiar with the xen driver resize operation and will say that it does work when the guest filesystem and partition sizes are accommodating, but there's no good way to know whether or not it will succeed without actually trying it. So when it fails it's after someone was waiting on a resize that seemed like it was working and then suddenly didn't. If we want to aim for what's going to work consistently across drivers, it's probably going to end up being not resizing disks down. From: Aryeh Friedman [mailto:aryeh.fried...@gmail.com] Sent: 13 June 2014 11:12 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ? Theoretically impossible to reduce disk unless you have some really nasty guest additions. On Fri, Jun 13, 2014 at 6:02 AM, Day, Phil philip@hp.commailto:philip@hp.com wrote: Hi Folks, I was looking at the resize code in libvirt, and it has checks which raise an exception if the target root or ephemeral disks are smaller than the current ones - which seems fair enough I guess (you can't drop arbitary disk content on resize), except that the because the check is in the virt driver the effect is to just ignore the request (the instance remains active rather than going to resize-verify). It made me wonder if there were any hypervisors that actually allow this, and if not wouldn't it be better to move the check to the API layer so that the request can be failed rather than silently ignored ? As far as I can see: baremetal: Doesn't support resize hyperv: Checks only for root disk (https://github.com/openstack/nova/blob/master/nova/virt/hyperv/migrationops.py#L99-L108 ) libvirt: fails for a reduction of either root or ephemeral (https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4918-L4923 ) vmware: doesn't seem to check at all ? xen: Allows resize down for root but not for ephemeral (https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L1015-L1032 ) It feels kind of clumsy to have such a wide variation of behavior across the drivers, and to have the check performed only in the driver ? Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.orgmailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Aryeh M. 
Friedman, Lead Developer, http://www.PetiteCloud.org ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.orgmailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
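If the consensus is to block this at the API layer, the same comparison just needs to run in the resize request path before anything is cast to a compute node - along these lines (a sketch; where exactly it would plug into nova.compute.api and which exception maps to a 400 are left out):

    def validate_resize_flavors(current_flavor, new_flavor):
        # Runs while the API request is still synchronous, so the user gets
        # an immediate error instead of a resize that is silently ignored,
        # or an instance left in an error state part-way through.
        for field in ('root_gb', 'ephemeral_gb'):
            if new_flavor[field] < current_flavor[field]:
                raise ValueError(
                    '%s cannot be reduced from %dGB to %dGB on resize'
                    % (field, current_flavor[field], new_flavor[field]))

That would also give every hypervisor driver the same, predictable behaviour for resize-down requests.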
[openstack-dev] [nova][ironic] what to do with unit test failures from ironic api contract
Hi Folks, A recent change introduced a unit test to warn/notify developers when they make a change which will break the out-of-tree Ironic virt driver: https://review.openstack.org/#/c/98201 Ok - so my change (https://review.openstack.org/#/c/68942) broke it as it adds some extra parameters to the virt driver power_off() method - and so I now feel suitably warned and notified - but am not really clear what I'm meant to do next. So far I've: - Modified the unit test in my Nova patch so it now works - Submitted an Ironic patch to add the extra parameters (https://review.openstack.org/#/c/99932/) As far as I can see there's no way to create a direct dependency from the Ironic change to my patch - so I guess it's down to the Ironic folks to wait and accept it in the correct sequence ? Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
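For reference, the shape of the change that trips the contract test is simply new defaulted arguments on the driver method (a sketch - the timeout/retry_interval names follow the clean-shutdown work, but the exact Ironic driver code may differ):

    class SomeVirtDriver(object):
        # Previously: def power_off(self, instance)
        def power_off(self, instance, timeout=0, retry_interval=0):
            # Defaults of 0 preserve the old "pull the plug now" behaviour,
            # so existing callers - and the out-of-tree Ironic driver - keep
            # working until both sides of the change have merged.
            pass

Because the defaults preserve the old behaviour, either side of the change can merge first without breaking the other at runtime; only the contract unit test forces the ordering question.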
Re: [openstack-dev] [nova][ironic] what to do with unit test failures from ironic api contract
From: David Shrewsbury [mailto:shrewsbury.d...@gmail.com] Sent: 14 June 2014 02:10 To: OpenStack Development Mailing List (not for usage questions) Cc: Shrewsbury, David; Van Der Veen, Devananda Subject: Re: [openstack-dev] [nova][ironic] what to do with unit test failures from ironic api contract Hi! On Fri, Jun 13, 2014 at 9:30 AM, Day, Phil philip@hp.commailto:philip@hp.com wrote: Hi Folks, A recent change introduced a unit test to “warn/notify developers” when they make a change which will break the out of tree Ironic virt driver: https://review.openstack.org/#/c/98201 Ok – so my change (https://review.openstack.org/#/c/68942) broke it as it adds some extra parameters to the virt drive power_off() method – and so I now feel suitable warned and notified – but am not really clear what I’m meant to do next. So far I’ve: - Modified the unit test in my Nova patch so it now works - Submitted an Ironic patch to add the extra parameters (https://review.openstack.org/#/c/99932/) As far as I can see there’s no way to create a direct dependency from the Ironic change to my patch – so I guess its down to the Ironic folks to wait and accept it in the correct sequence ? Thanks for bringing up this question. 98201 was added at the suggestion of Sean Dague during a conversation in #openstack-infra to help prevent terrible breakages that affect the gate. What wasn't discussed, however, is how we should coordinate these changes going forward. As for your change, I think what you've done is exactly what we had hoped would be done. In your particular case, I don't see any need for Nova dev's to not go ahead and approve 68942 *before* 99932 since defaults are added to the arguments. The question is, how do we coordinate such changes if a change DOES actually break ironic? One suggestion is that if test_ironic_api_contracts.pyhttps://review.openstack.org/#/c/68942/15/nova/tests/virt/test_ironic_api_contracts.py is ever changed, Nova require the Ironic PTL (or a core dev) to vote before approving. That seems sensible to me. There might be an easier way of coordinating that I'm overlooking, though. -Dave -- David Shrewsbury (Shrews) Hi Dave, I agree that co-ordination is the key here – if the Ironic change is approved first then Nova and Ironic will continue to work, but there is a risk that the Nova change gets blocked / modified after the Ironic commit which would be painful. If the Nova change is committed first then Ironic will of course be broken until its change is committed. I’ll add a pointer and a note to the corresponding change in each of the patches. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?
-Original Message- From: Russell Bryant [mailto:rbry...@redhat.com] Sent: 17 June 2014 15:57 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ? On 06/17/2014 10:43 AM, Richard W.M. Jones wrote: On Fri, Jun 13, 2014 at 06:12:16AM -0400, Aryeh Friedman wrote: Theoretically impossible to reduce disk unless you have some really nasty guest additions. True for live resizing. For dead resizing, libguestfs + virt-resize can do it. Although I wouldn't necessarily recommend it. In almost all cases where someone wants to shrink a disk, IMHO it is better to sparsify it instead (ie. virt-sparsify). FWIW, the resize operation in OpenStack is a dead one. Dead as in not supported in V3 ? How does that map into the plans to implement V2.1 on top of V3 ? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] locked instances and snaphot
-Original Message- From: Ahmed RAHAL [mailto:ara...@iweb.com] Sent: 18 June 2014 01:21 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] locked instances and snaphot Hi there, On 2014-06-16 15:28, melanie witt wrote: Hi all, [...] During the patch review, a reviewer raised a concern about the purpose of instance locking and whether prevention of snapshot while an instance is locked is appropriate. From what we understand, instance lock is meant to prevent unwanted modification of an instance. Is snapshotting considered a logical modification of an instance? That is, if an instance is locked to a user, they take a snapshot, create another instance using that snapshot, and modify the instance, have they essentially modified the original locked instance? I wanted to get input from the ML on whether it makes sense to disallow snapshot while an instance is locked. Beyond 'preventing accidental change to the instance', locking could be seen as 'preventing any operation' to the instance. If I, as a user, lock an instance, it certainly only prevents me from accidentally deleting the VM. As I can unlock whenever I need to, there seems to be no other use case (chmod-like). It blocks any operation that would change the state of the instance: delete, stop, start, reboot, rebuild, resize, shelve, pause, resume, etc. In keeping with that I don't see why it should block a snapshot, and having to unlock it to take a snapshot doesn't feel good either. If I, as an admin, lock an instance, I am preventing operations on a VM and am preventing an ordinary user from overriding the lock. The driver for doing this as an admin is slightly different - it's to stop the user from changing the state of an instance rather than being a protection. A couple of use cases: - if you want to migrate a VM and the user is running a continual sequence of, say, reboot commands at it, putting an admin lock in place gives you a way to break into that cycle. - There are a few security cases where we need to take over control of an instance, and make sure it doesn't get deleted by the user. This is a form of authority enforcing that maybe should prevent even snapshots from being taken of that VM. The thing is that enforcing this beyond the limits of nova is AFAIK not there, so cloning/snapshotting cinder volumes will still be feasible. Enforcing it only in nova as a kind of 'security feature' may become misleading. The more I think about it, the more I get to think that locking is just there to avoid mistakes, not voluntary misbehaviour. -- Ahmed ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
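For what it's worth, the lock check in the compute API is applied per-method with a decorator, so whether snapshot honours the lock really is just a policy decision about which methods get decorated - roughly this pattern (a simplified sketch, not the actual nova.compute.api code):

    import functools

    class InstanceIsLocked(Exception):
        pass

    def check_instance_lock(function):
        @functools.wraps(function)
        def inner(self, context, instance, *args, **kwargs):
            # An admin context bypasses the lock; everyone else is refused
            # state-changing operations on a locked instance.
            if instance['locked'] and not context.is_admin:
                raise InstanceIsLocked(instance['uuid'])
            return function(self, context, instance, *args, **kwargs)
        return inner

    class ComputeAPI(object):
        @check_instance_lock
        def delete(self, context, instance):
            pass

        # Whether snapshot() gets the decorator is exactly the question
        # being debated in this thread.
        def snapshot(self, context, instance, name):
            pass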
Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?
-Original Message- From: Richard W.M. Jones [mailto:rjo...@redhat.com] Sent: 18 June 2014 12:32 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ? On Wed, Jun 18, 2014 at 11:05:01AM +, Day, Phil wrote: -Original Message- From: Russell Bryant [mailto:rbry...@redhat.com] Sent: 17 June 2014 15:57 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ? On 06/17/2014 10:43 AM, Richard W.M. Jones wrote: On Fri, Jun 13, 2014 at 06:12:16AM -0400, Aryeh Friedman wrote: Theoretically impossible to reduce disk unless you have some really nasty guest additions. True for live resizing. For dead resizing, libguestfs + virt-resize can do it. Although I wouldn't necessarily recommend it. In almost all cases where someone wants to shrink a disk, IMHO it is better to sparsify it instead (ie. virt-sparsify). FWIW, the resize operation in OpenStack is a dead one. Dead as in not supported in V3 ? dead as in not live resizing, ie. it happens only on offline disk images. Rich. Ah, thanks. I was thinking of dead as in it is an ex-operation, it has ceased to be, ... ;-) There seems to be a consensus towards this being treated as an error - so I'll raise a spec. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] bp: nova-ecu-support
The basic framework for supporting this kind of resource scheduling is the extensible-resource-tracker: https://blueprints.launchpad.net/nova/+spec/extensible-resource-tracking https://review.openstack.org/#/c/86050/ https://review.openstack.org/#/c/71557/ Once that lands, being able to schedule on arbitrary resources (such as an ECU) becomes a lot easier to implement. Phil -Original Message- From: Kenichi Oomichi [mailto:oomi...@mxs.nes.nec.co.jp] Sent: 03 February 2014 09:37 To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [Nova] bp: nova-ecu-support Hi, There is a blueprint ECU[1], and that is an interesting idea for me, so I'd like to hear comments about the ECU idea. After production environments start, the operators will need to add compute nodes before exhausting the capacity. In that scenario, they'd like to add cost-efficient machines as compute nodes at the time. So the production environments will consist of compute nodes with different performance. Also they hope to provide virtual machines with the same performance on nodes of different performance when specifying the same flavor. Now nova contains flavor_extraspecs[2] which can customize the cpu bandwidth for each flavor: # nova flavor-key m1.low_cpu set quota:cpu_quota=1 # nova flavor-key m1.low_cpu set quota:cpu_period=2 However, this feature can not provide the same vm performance on nodes of different performance, because it arranges the vm performance with the same ratio (cpu_quota/cpu_period) only, even if the compute node performances are different. So it is necessary to arrange a different ratio based on each compute node's performance. Amazon EC2 has ECU[3] already for implementing this, and the blueprint [1] is also for it. Any thoughts? Thanks Ken'ichi Ohmichi --- [1]: https://blueprints.launchpad.net/nova/+spec/nova-ecu-support [2]: http://docs.openstack.org/admin-guide-cloud/content/ch_introduction-to-openstack-compute.html#customize-flavors [3]: http://aws.amazon.com/ec2/faqs/ Q: What is an EC2 Compute Unit and why did you introduce it? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
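To illustrate what the ECU idea implies, the quota written to the guest would have to be scaled by a per-host performance rating instead of coming straight from the flavour. The following is purely illustrative - neither the ecu_per_core rating nor any option name below exists in Nova today:

    # Hypothetical per-host rating, e.g. set on each compute node:
    #   fast hardware:  ecu_per_core = 4
    #   older hardware: ecu_per_core = 2

    def cpu_quota_for(flavor_ecus, host_ecu_per_core, cpu_period=100000):
        # Give the guest the same absolute CPU throughput wherever it lands:
        # a flavour asking for 2 ECUs gets half of a 4-ECU core but a whole
        # 2-ECU core.
        fraction = float(flavor_ecus) / host_ecu_per_core
        return int(cpu_period * min(fraction, 1.0))

The extensible resource tracker would then be the natural place to report each host's remaining ECU capacity so the scheduler can place instances accordingly.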
Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?
-Original Message- From: John Garbutt [mailto:j...@johngarbutt.com] Sent: 23 June 2014 10:35 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ? On 18 June 2014 21:57, Jay Pipes jaypi...@gmail.com wrote: On 06/17/2014 05:42 PM, Daniel P. Berrange wrote: On Tue, Jun 17, 2014 at 04:32:36PM +0100, Pádraig Brady wrote: On 06/13/2014 02:22 PM, Day, Phil wrote: I guess the question I’m really asking here is: “Since we know resize down won’t work in all cases, and the failure if it does occur will be hard for the user to detect, should we just block it at the API layer and be consistent across all Hypervisors ?” +1 There is an existing libvirt blueprint: https://blueprints.launchpad.net/nova/+spec/libvirt-resize-disk-down which I've never been in favor of: https://bugs.launchpad.net/nova/+bug/1270238/comments/1 All of the functionality around resizing VMs to match a different flavour seem to be a recipe for unleashing a torrent of unfixable bugs, whether resizing disks, adding CPUs, RAM or any other aspect. +1 I'm of the opinion that we should plan to rip resize functionality out of (the next major version of) the Compute API and have a *single*, *consistent* API for migrating resources. No more API extension X for migrating this kind of thing, and API extension Y for this kind of thing, and API extension Z for migrating /live/ this type of thing. There should be One move API to Rule Them All, IMHO. +1 for one move API, the two evolved independently, in different drivers, its time to unify them! That plan got stuck behind the refactoring of live-migrate and migrate to the conductor, to help unify the code paths. But it kinda got stalled (I must rebase those patches...). Just to be clear, I am against removing resize down from v2 without a deprecation cycle. But I am pro starting that deprecation cycle. John I'm not sure Daniel and Jay are arguing for the same thing here John: I *think* Daniel is saying drop resize altogether and Jay is saying unify it with migration - so I'm a tad confused which of those you're agreeing with. Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Timeline for the rest of the Juno release
Hi Michael, Not sure I understand the need for a gap between Juno Spec approval freeze (Jul 10th) and K opens for spec proposals (Sep 4th). I can understand that K specs won't get approved in that period, and may not get much feedback from the cores - but I don't see the harm in letting specs be submitted to the K directory for early review / feedback during that period ? Phil -Original Message- From: Michael Still [mailto:mi...@stillhq.com] Sent: 24 June 2014 09:59 To: OpenStack Development Mailing List Subject: [openstack-dev] [Nova] Timeline for the rest of the Juno release Hi, this came up in the weekly release sync with ttx, and I think it's worth documenting as clearly as possible. Here is our proposed timeline for the rest of the Juno release. This is important for people with spec proposals either out for review, or intending to be sent for review soon. (The numbers in brackets are weeks before the feature freeze).
Jun 12 (-12): Juno-1
Jun 25 (-10): Spec review day (https://etherpad.openstack.org/p/nova-juno-spec-priorities)
Jul 3 (-9): Spec proposal freeze
Jul 10 (-8): Spec approval freeze
Jul 24 (-6): Juno-2
Jul 28 (-5): Nova mid cycle meetup (https://wiki.openstack.org/wiki/Sprints/BeavertonJunoSprint)
Aug 21 (-2): Feature proposal freeze
Sep 4 (0): Juno-3 / Feature freeze - merged J specs with no code proposed get deleted from nova-specs repo; K opens for spec proposals, unmerged J spec proposals must rebase
Sep 25 (+3): RC 1 build expected; K spec review approvals start
Oct 16 (+6): Release! (https://wiki.openstack.org/wiki/Juno_Release_Schedule)
Oct 30: K summit spec proposal freeze
Nov 6: K design summit
Cheers, Michael -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Timeline for the rest of the Juno release
-Original Message- From: Russell Bryant [mailto:rbry...@redhat.com] Sent: 24 June 2014 13:08 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Nova] Timeline for the rest of the Juno release On 06/24/2014 07:35 AM, Michael Still wrote: Phil -- I really want people to focus their efforts on fixing bugs in that period was the main thing. The theory was if we encouraged people to work on specs for the next release, then they'd be distracted from fixing the bugs we need fixed in J. Cheers, Michael On Tue, Jun 24, 2014 at 9:08 PM, Day, Phil philip@hp.com wrote: Hi Michael, Not sure I understand the need for a gap between Juno Spec approval freeze (Jul 10th) and K opens for spec proposals (Sep 4th).I can understand that K specs won't get approved in that period, and may not get much feedback from the cores - but I don't see the harm in letting specs be submitted to the K directory for early review / feedback during that period ? I agree with both of you. Priorities need to be finishing up J, but I don't see any reason not to let people post K specs whenever. Expectations just need to be set appropriately that it may be a while before they get reviewed/approved. Exactly - I think it's reasonable to set the expectation that the focus of those that can produce/review code will be elsewhere - but that shouldn't stop some small effort going into knocking the rough corners off the specs at the same time ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Timeline for the rest of the Juno release
Discussing at the meet-up is fine with me -Original Message- From: Michael Still [mailto:mi...@stillhq.com] Sent: 25 June 2014 00:48 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Nova] Timeline for the rest of the Juno release Your comments are fair. I think perhaps at this point we should defer discussion of the further away deadlines until the mid cycle meetup -- that will give us a chance to whiteboard the flow for that period of the release. Or do you really want to lock this down now? Michael On Wed, Jun 25, 2014 at 12:53 AM, Day, Phil philip@hp.com wrote: -Original Message- From: Russell Bryant [mailto:rbry...@redhat.com] Sent: 24 June 2014 13:08 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Nova] Timeline for the rest of the Juno release On 06/24/2014 07:35 AM, Michael Still wrote: Phil -- I really want people to focus their efforts on fixing bugs in that period was the main thing. The theory was if we encouraged people to work on specs for the next release, then they'd be distracted from fixing the bugs we need fixed in J. Cheers, Michael On Tue, Jun 24, 2014 at 9:08 PM, Day, Phil philip@hp.com wrote: Hi Michael, Not sure I understand the need for a gap between Juno Spec approval freeze (Jul 10th) and K opens for spec proposals (Sep 4th). I can understand that K specs won't get approved in that period, and may not get much feedback from the cores - but I don't see the harm in letting specs be submitted to the K directory for early review / feedback during that period ? I agree with both of you. Priorities need to be finishing up J, but I don't see any reason not to let people post K specs whenever. Expectations just need to be set appropriately that it may be a while before they get reviewed/approved. Exactly - I think it's reasonable to set the expectation that the focus of those that can produce/review code will be elsewhere - but that shouldn't stop some small effort going into knocking the rough corners off the specs at the same time ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Why is there a 'None' task_state between 'SCHEDULING' 'BLOCK_DEVICE_MAPPING'?
Hi WingWJ, I agree that we shouldn't have a task state of None while an operation is in progress. I'm pretty sure back in the day this didn't use to be the case and task_state stayed as Scheduling until it went to Networking (now of course networking and BDM happen in parallel, so you have to be very quick to see the Networking state). Personally I would like to see the extra granularity of knowing that a request has been started on the compute manager (and knowing that the request was started, rather than is still sitting on the queue, makes the decision to put it into an error state when the manager is re-started more robust). Maybe a task state of "STARTING_BUILD" for this case ? BTW I don't think _start_building() is called anymore now that we've switched to conductor calling build_and_run_instance() - but the same task_state issue exists in there as well. From: wu jiang [mailto:win...@gmail.com] Sent: 25 June 2014 08:19 To: OpenStack Development Mailing List Subject: [openstack-dev] [nova] Why is there a 'None' task_state between 'SCHEDULING' 'BLOCK_DEVICE_MAPPING'? Hi all, Recently, some of my instances were stuck in task_state 'None' during VM creation in my environment. So I checked and found there's a 'None' task_state between 'SCHEDULING' and 'BLOCK_DEVICE_MAPPING'. The related code is implemented like this:
# def _start_building():
#     self._instance_update(context, instance['uuid'],
#                           vm_state=vm_states.BUILDING,
#                           task_state=None,
#                           expected_task_state=(task_states.SCHEDULING,
#                                                None))
So if a compute node is rebooted after that point, all building VMs on it will stay in the 'None' task_state forever. And it's not helpful for locating problems. Why not a new task_state for this step? WingWJ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
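To make the suggestion concrete, the fix would be to substitute a real task state for None at that point - something like the following, where STARTING_BUILD is the hypothetical new value being proposed in this thread, not an existing member of task_states:

# Hypothetical addition to nova/compute/task_states.py:
#     STARTING_BUILD = 'starting_build'
#
# def _start_building():
#     self._instance_update(context, instance['uuid'],
#                           vm_state=vm_states.BUILDING,
#                           task_state=task_states.STARTING_BUILD,
#                           expected_task_state=(task_states.SCHEDULING,
#                                                None))

A compute manager restart could then safely move anything still in STARTING_BUILD (or SCHEDULING) into an error state, rather than having to guess what a None task_state means.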