Re: [openstack-dev] In loving memory of Chris Yeoh

2015-04-08 Thread Day, Phil
Thanks for letting us know, Michael, and thanks for doing it in such a moving
way. Sad news indeed.

Phil


From: Michael Still [mailto:mi...@stillhq.com]
Sent: 08 April 2015 05:49
To: OpenStack Development Mailing List
Subject: [openstack-dev] In loving memory of Chris Yeoh


It is my sad duty to inform the community that Chris Yeoh passed away this 
morning. Chris leaves behind a daughter Alyssa, aged 6, who I hope will 
remember Chris as the clever and caring person that I will remember him as. I 
haven’t had a chance to confirm with the family if they want flowers or a 
donation to a charity. As soon as I know those details I will reply to this 
email.

Chris worked on open source for a very long time, with OpenStack being just the 
most recent in a long chain of contributions. He worked tirelessly on his 
contributions to Nova, including mentoring other developers. He was dedicated 
to the cause, with a strong vision of what OpenStack could become. He even 
named his cat after the project.

Chris might be the only person to have ever sent an email to his coworkers 
explaining what his code review strategy would be after brain surgery. It takes 
phenomenal strength to carry on in the face of that kind of adversity, but 
somehow he did. Frankly, I think I would have just sat on the beach.

Chris was also a contributor to the Linux Standards Base (LSB), where he helped 
improve the consistency and interoperability between Linux distributions. He 
ran the ‘Hackfest’ programming contests for a number of years at Australia’s 
open source conference -- linux.conf.au (http://linux.conf.au). He supported 
local Linux user groups in South Australia and Canberra, including involvement 
at installfests and speaking at local meetups. He competed in a programming 
challenge called Loki Hack, and beat out the world to win the event[1].

Alyssa’s memories of her dad need to last her a long time, so we’ve decided to 
try and collect some fond memories of Chris to help her along the way. If you 
feel comfortable doing so, please contribute a memory or two at 
https://docs.google.com/forms/d/1kX-ePqAO7Cuudppwqz1cqgBXAsJx27GkdM-eCZ0c1V8/viewform

Chris was humble, helpful and honest. The OpenStack and broader Open Source 
communities are poorer for his passing.

Michael

[1] http://www.lokigames.com/hack/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] novaclient support for V2.1 micro versions

2015-01-23 Thread Day, Phil
Hi Folks,

Is there any support yet in novaclient for requesting a specific microversion?
(I'm looking at the final leg of extending clean-shutdown to the API, and
wondering how to test this in devstack via novaclient.)

Phil




Re: [openstack-dev] [all] [tc] Multi-clouds integration by OpenStack cascading

2014-10-23 Thread Day, Phil
Hi,

 -Original Message-
 From: joehuang [mailto:joehu...@huawei.com]
 Sent: 23 October 2014 09:59
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [all] [tc] Multi-clouds integration by OpenStack
 cascading
 
 Hi,
 
 I have not been able to find a meeting room for a deep dive on OpenStack
 cascading before the design summit, so you are welcome to have a f2f
 conversation about cascading before then. I plan to stay in Paris from
 Oct. 30 to Nov. 8; if you have any doubts or questions, please feel free to
 contact me. All conversations are for clarification / idea exchange only,
 not for any secret agreement. This is necessary before the design summit
 because the session is only 40 minutes: if all 40 minutes are spent on
 basic questions and clarification, no valuable conclusion can be drawn in
 the meeting. So I want to work in client-server mode: anyone who is
 interested in talking about cascading with me, just tell me when you will
 come to the hotel where I am staying in Paris, and we can have a chat to
 reduce misunderstanding, get a clearer picture, and focus on what needs to
 be discussed and agreed during the design summit session.
 
Sure, I'll certainly try and find some time to meet and talk.


 It kind of feels to me that if we just concentrated on the part of this that
 is working out how to distribute/federate Neutron then we'd have a solution
 that could be mapped as easily to cells and/or regions - and I wonder then
 why we really need yet another aggregation concept?
 
 My answer is that it seems to be feasible but cannot meet the multi-site
 cloud demand (that's the driving force for cascading):
 1) Large cloud operators ask multiple vendors to build the distributed but
 unified multi-site cloud together, and each vendor has its own OpenStack-based
 solution. If shared Nova/Cinder with federated Neutron is used, cross data
 center integration through RPC messages for multi-vendor infrastructure is
 very difficult, and there is no clear responsibility boundary, which leads to
 difficulty with troubleshooting, upgrades, etc.

So if the scope of what you're doing is to provide a single API across
multiple clouds that are being built and operated independently then I'm not
sure how you can impose enough consistency to guarantee any operations. What
if one of those clouds has Nova AZs configured, and you're using (from what I
understand) AZs to try and route to a specific cloud? How do you get image
and flavor consistency across the clouds?

I picked up on the Network aspect because that seems to be something you've 
covered in some depth here 
https://docs.google.com/presentation/d/1wIqWgbZBS_EotaERV18xYYA99CXeAa4tv6v_3VlD2ik/edit?pli=1#slide=id.g390a1cf23_2_149
 so I'd assumed it was an intrinsic part of your proposal.  Now I'm even less 
clear on the scope of what you're trying to achieve ;-( 

If this is a federation layer for in effect arbitrary OpenStack clouds then it
kind of feels like it can't be anything other than an aggregator of queries
(list the VMs in all of the clouds you know about, and show the results in one
output). If you have to make API calls into many clouds (when only one of
them may have any results) then that feels like it would be a performance
issue. If you're going to cache the results somehow then in effect you need
the Cells approach for propagating up results, which means the sub-clouds have
to be co-operating.
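
A minimal sketch of the "aggregator of queries" concern, assuming each cloud is reachable only through its own list call. The names `list_all_servers`, `make_cloud`, and the cloud labels are hypothetical and not part of any OpenStack client; the point is only that every aggregated query costs one call per cloud, even when most clouds return nothing.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for "call the list-servers API of one cloud".
ListServers = Callable[[], List[str]]

calls: List[str] = []  # records which clouds were actually queried


def make_cloud(name: str, servers: List[str]) -> ListServers:
    """Build a fake per-cloud list call that logs each invocation."""
    def list_servers() -> List[str]:
        calls.append(name)
        return list(servers)
    return list_servers


def list_all_servers(clouds: Dict[str, ListServers]) -> List[str]:
    """Query every known cloud and merge the results into one list."""
    merged: List[str] = []
    for name, list_servers in sorted(clouds.items()):
        for server in list_servers():
            merged.append(f"{name}/{server}")
    return merged


clouds = {
    "cloud-a": make_cloud("cloud-a", ["vm1", "vm2"]),
    "cloud-b": make_cloud("cloud-b", []),
    "cloud-c": make_cloud("cloud-c", []),
}
all_servers = list_all_servers(clouds)
```

Only one of the three clouds holds any servers here, yet all three are called. Avoiding that fan-out needs some form of cached index, which is where the co-operation requirement mentioned above comes from.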

Maybe I missed it somewhere, but is there a clear write-up of the restrictions 
/ expectations of sub-clouds to work in this model ?

Kind Regards
Phil

 2) A RESTful API/CLI is required for each site to make the cloud always
 workable and manageable. If shared Nova/Cinder with federated Neutron is
 used, then some data centers are not able to expose a RESTful API/CLI for
 management purposes.
 3) The unified cloud needs to expose an open and standard API. If shared
 Nova/Cinder with federated Neutron is used, this point can be achieved.
 
 Best Regards
 
 Chaoyi Huang ( joehuang )
 
 -Original Message-
 From: henry hly [mailto:henry4...@gmail.com]
 Sent: Thursday, October 23, 2014 3:13 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [all] [tc] Multi-clouds integration by OpenStack
 cascading
 
 Hi Phil,
 
 Thanks for your feedback, and patience of this long history reading :) See
 comments inline.
 
 On Wed, Oct 22, 2014 at 5:59 PM, Day, Phil philip@hp.com wrote:
  -Original Message-
  From: henry hly [mailto:henry4...@gmail.com]
  Sent: 08 October 2014 09:16
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [all] [tc] Multi-clouds integration by
  OpenStack cascading
 
  Hi,
 
  Good questions: why not just keeping multiple endpoints, and leaving
  orchestration effort in the client side?
 
  From feedback of some large data center operators, they want the
  cloud

Re: [openstack-dev] [all] [tc] Multi-clouds integration by OpenStack cascading

2014-10-22 Thread Day, Phil
 -Original Message-
 From: henry hly [mailto:henry4...@gmail.com]
 Sent: 08 October 2014 09:16
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [all] [tc] Multi-clouds integration by OpenStack
 cascading
 
 Hi,
 
 Good questions: why not just keeping multiple endpoints, and leaving
 orchestration effort in the client side?
 
 From feedback of some large data center operators, they want the cloud
 exposed to tenant as a single region with multiple AZs, while each AZ may be
 distributed in different/same locations, very similar with AZ concept of AWS.
 And the OpenStack API is indispensable for the cloud for eco-system
 friendly.
 
 The cascading is mainly doing one thing: map each standalone child
 Openstack to AZs in the parent Openstack, hide separated child endpoints,
 thus converge them into a single standard OS-API endpoint.
 
 One of the obvious benefit doing so is the networking: we can create a single
 Router/LB, with subnet/port member from different child, just like in a single
 OpenStack instance. Without the parent OpenStack working as the
 aggregation layer, it is not so easy to do so. Explicit VPN endpoint may be
 required in each child.

I've read through the thread and the various links, and to me this still sounds 
an awful lot like having multiple regions in Keystone.

First of all I think we're in danger of getting badly mixed up in terminology
here around AZs, which is an awfully overloaded term - esp when we make
comparisons to AWS AZs. Whether or not we like the current OpenStack usage of
these terms, let's at least stick to how they are currently defined and used in
OpenStack:

AZs - A scheduling concept in Nova and Cinder. Simply provides some
isolation semantic about a compute host or storage server. Nothing to do
with explicit physical or geographical location, although some degree of that
(separate racks, power, etc) is usually implied.

Regions - A keystone concept for a collection of Openstack Endpoints.   They 
may be distinct (a completely isolated set of Openstack service) or overlap 
(some shared services).  Openstack clients support explicit user selection of a 
region.

Cells - A scalability / fault-isolation concept within Nova.  Because Cells 
aspires to provide all Nova features transparently across cells this kind of 
acts like multiple regions where only the Nova service is distinct (Networking 
has to be common, Glance has to be common or at least federated in a 
transparent way, etc).   The difference from regions is that the user doesn’t 
have to make an explicit region choice - they get a single Nova URL for all 
cells.   From what I remember Cells originally started out also using the 
existing APIs as the way to connect the Cells together, but had to move away 
from that because of the performance overhead of going through multiple layers.



Now with Cascading it seems that we're pretty much building on the Regions 
concept, wrapping it behind a single set of endpoints for user convenience, 
overloading the term AZ to re-expose those sets of services to allow the user 
to choose between them (doesn't this kind of negate the advantage of not having 
to specify the region in the client - is that really such a big deal for users 
?) , and doing something to provide a sort of federated Neutron service - 
because as we all know the hard part in all of this is how you handle the 
Networking.

It kind of feels to me that if we just concentrated on the part of this that is
working out how to distribute/federate Neutron then we'd have a solution that
could be mapped as easily to cells and/or regions - and I wonder then why we
really need yet another aggregation concept?

Phil
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] - do we need .start and .end notifications in all cases ?

2014-09-25 Thread Day, Phil
  Hi Jay,
 
  So just to be clear, are you saying that we should generate 2
  notification messages on Rabbit for every DB update?   That feels
  like a big overkill for me.   If I follow that logic then the current
  state transition notifications should also be changed to "Starting to
  update task state" / "finished updating task state" - which seems just
  daft and confusing logging with notifications.
  Sandy's answer where start /end are used if there is a significant
  amount of work between the two and/or the transaction spans multiple
  hosts makes a lot more sense to me.   Bracketing a single DB call
  with two notification messages rather than just a single one on
  success to show that something changed would seem to me to be much
  more in keeping with the concept of notifying on key events.
 
 I can see your point, Phil. But what about when the set of DB calls takes a
 not-insignificant amount of time? Would the event be considered significant
 then? If so, sending only the "I completed creating this thing" notification
 message might mask the fact that the total amount of time spent creating
 the thing was significant.

Sure, I think there's a judgment call to be made on a case by case basis on 
this.   In general though I'd say it's tasks that do more than just update the 
database that need to provide this kind of timing data.   Simple object 
creation / db table inserts don't really feel like they need to be individually 
timed by pairs of messages - if there is value in providing the creation time 
that could just be part of the payload of the single message, rather than 
doubling up on messages.
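
A minimal sketch of the single-message alternative described above, assuming a simple in-memory list stands in for the message bus. The `notify` helper, the `aggregate.create` event name, and the `elapsed_seconds` payload key are hypothetical and are not Nova's actual notification API.

```python
import time

sent = []  # stand-in for the message bus


def notify(event_type, payload):
    """Record one notification; a real system would publish it instead."""
    sent.append((event_type, payload))


def create_aggregate(name):
    """Do a single DB-style insert and emit exactly one notification."""
    started = time.monotonic()
    aggregate = {"name": name}  # stand-in for the single DB insert
    elapsed = time.monotonic() - started
    # The creation time travels in the payload of the one message,
    # so no separate .start message is needed to measure it.
    notify("aggregate.create", {"name": name, "elapsed_seconds": elapsed})
    return aggregate


create_aggregate("rack-1")
```

A consumer still learns how long the insert took, at half the message volume of a .start/.end pair.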
 
 
 That's why I think it's safer to always wrap tasks -- a series of actions that
 *do* one or more things -- with start/end/abort context managers that send
 the appropriate notification messages.
 
 Some notifications are for events that aren't tasks, and I don't think those
 need to follow start/end/abort semantics. Your example of an instance state
 change is not a task, and therefore would not need a start/end/abort
 notification manager. However, the user action of, say, "Reboot this server"
 *would* have a start/end/abort wrapper for the REBOOT_SERVER event.
 In between the start and end notifications for this REBOOT_SERVER event,
 there may indeed be multiple SERVER_STATE_CHANGED notification
 messages sent, but those would not have start/end/abort wrappers around
 them.
 
 Make a bit more sense?
 -jay
 
Sure - it sounds like we're agreed in principle then that not all operations 
need start/end/abort messages, only those that are a series of operations.

So in that context the server group operations to me still look like they fall 
into the first group.
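
A minimal sketch of the start/end/abort wrapper discussed in this thread, written as a Python context manager. The `notify_task` and `emit` names and the event strings are hypothetical illustrations, not code from Nova.

```python
from contextlib import contextmanager

events = []  # stand-in for the notification bus


def emit(event_type):
    """Record one notification event."""
    events.append(event_type)


@contextmanager
def notify_task(name):
    """Wrap a multi-step task with start/end/abort notifications."""
    emit(f"{name}.start")
    try:
        yield
    except Exception:
        emit(f"{name}.abort")
        raise
    else:
        emit(f"{name}.end")


# A task: wrapped, with plain state-change events emitted inside it.
with notify_task("reboot_server"):
    emit("server_state_changed")
    emit("server_state_changed")

# A failing task: the wrapper emits .abort and re-raises the error.
try:
    with notify_task("rebuild_server"):
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

Under this reading, a simple operation such as a single DB insert would call `emit` once on success and would not be wrapped at all.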

Phil





Re: [openstack-dev] [NOVA] security group fails to attach to an instance if port-id is specified during boot.

2014-09-25 Thread Day, Phil
I think the expectation is that if a user is already interacting with Neutron
to create ports then they should do the security group assignment in Neutron as
well.

The trouble I see with supporting this way of assigning security groups is what
should the correct behavior be if the user passes more than one port into the
Nova boot command? In the case where Nova is creating the ports it kind of
feels (just) OK to assign the security groups to all the ports. In the case
where the ports have already been created then it doesn’t feel right to me that
Nova modifies them.
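
A minimal sketch of the policy argued for above, assuming each port record carries a flag saying whether Nova created it. The `ports_to_update` function and the `created_by_nova` field are hypothetical and do not reflect Nova's or Neutron's real data model.

```python
def ports_to_update(ports, security_groups):
    """Return {port_id: groups} only for ports that Nova created itself.

    Ports the user created in Neutron beforehand are left untouched, on the
    assumption that their security groups were already set there.
    """
    if not security_groups:
        return {}
    return {
        port["id"]: list(security_groups)
        for port in ports
        if port["created_by_nova"]
    }


ports = [
    {"id": "port-1", "created_by_nova": True},   # created from a net-id
    {"id": "port-2", "created_by_nova": False},  # passed in as a port-id
]
updates = ports_to_update(ports, ["sec_grp"])
```

With more than one pre-created port, this rule sidesteps the question of which port the groups should apply to, because none of them is modified.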






From: Oleg Bondarev [mailto:obonda...@mirantis.com]
Sent: 25 September 2014 08:19
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [NOVA] security group fails to attach to an 
instance if port-id is specified during boot.

Hi Parikshit,

Looks like a bug. Currently if a port is specified its security groups are not
updated; it should be fixed.
I've reported https://bugs.launchpad.net/nova/+bug/1373774 to track this.
Thanks for reporting!

Thanks,
Oleg

On Thu, Sep 25, 2014 at 10:15 AM, Parikshit Manur
<parikshit.ma...@citrix.com> wrote:
Hi All,
Creation of server with command  ‘nova boot  --image image 
--flavor m1.medium --nic port-id=port-id --security-groups  sec_grp name’ 
fails to attach the security group to the port/instance. The response payload 
has the security group added but only default security group is attached to the 
instance.  Separate action has to be performed on the instance to add sec_grp, 
and it is successful. Supplying the same with ‘--nic net-id=net-id’ works as 
expected.

Is this the expected behaviour, or are there any other options which need to be
specified to add the security group when a port-id needs to be attached during
boot?

Thanks,
Parikshit Manur




Re: [openstack-dev] [Nova] - do we need .start and .end notifications in all cases ?

2014-09-24 Thread Day, Phil
 
  I think we should aim to /always/ have 3 notifications using a pattern
  of
 
  try:
      ...notify start...
 
      ...do the work...
 
      ...notify end...
  except:
      ...notify abort...
 
 Precisely my viewpoint as well. Unless we standardize on the above, our
 notifications are less than useful, since they will be open to interpretation
 by the consumer as to what precisely they mean (and the consumer will need to
 go looking into the source code to determine when an event actually
 occurred...)
 
 Smells like a blueprint to me. Anyone have objections to me writing one up
 for Kilo?
 
 Best,
 -jay
 
Hi Jay,

So just to be clear, are you saying that we should generate 2 notification
messages on Rabbit for every DB update? That feels like a big overkill for
me. If I follow that logic then the current state transition notifications
should also be changed to "Starting to update task state" / "finished updating
task state" - which seems just daft and confusing logging with notifications.

Sandy's answer where start /end are used if there is a significant amount of 
work between the two and/or the transaction spans multiple hosts makes a lot 
more sense to me.   Bracketing a single DB call with two notification messages 
rather than just a single one on success to show that something changed would 
seem to me to be much more in keeping with the concept of notifying on key 
events.

Phil




[openstack-dev] [Nova] - do we need .start and .end notifications in all cases ?

2014-09-22 Thread Day, Phil
Hi Folks,

I'd like to get some opinions on the use of pairs of notification messages for
simple events. I get that for complex operations on an instance (create,
rebuild, etc) a start and end message are useful to help instrument progress
and how long the operations took. However we also use this pattern for
things like aggregate creation, which is just a single DB operation - and it
strikes me as kind of overkill and probably not all that useful to any external
system compared to a single .create event after the DB operation.

There is a change up for review to add notifications for service groups which
is following this pattern (https://review.openstack.org/#/c/107954/) - the
author isn't doing anything wrong in that they're just following that pattern,
but it made me wonder if we shouldn't have some better guidance on when to use
a single notification rather than a .start/.end pair.

Does anyone else have thoughts on this, or know of external systems that would
break if we restricted .start and .end usage to long-lived instance operations?

Phil



Re: [openstack-dev] [Nova] - do we need .start and .end notifications in all cases ?

2014-09-22 Thread Day, Phil
Hi Daniel,


 -Original Message-
 From: Daniel P. Berrange [mailto:berra...@redhat.com]
 Sent: 22 September 2014 12:24
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Nova] - do we need .start and .end
 notifications in all cases ?
 
 On Mon, Sep 22, 2014 at 11:03:02AM +, Day, Phil wrote:
  Hi Folks,
 
  I'd like to get some opinions on the use of pairs of notification
  messages for simple events.   I get that for complex operations on
  an instance (create, rebuild, etc) a start and end message are useful
  to help instrument progress and how long the operations took. However
  we also use this pattern for things like aggregate creation, which is
  just a single DB operation - and it strikes me as kind of overkill and
  probably not all that useful to any external system compared to a
  single event .create event after the DB operation.
 
 A start + end pair is not solely useful for timing, but also potentially
 detecting if it completed successfully. eg if you receive an end event
 notification you know it has completed. That said, if this is a use case we
 want to target, then ideally we'd have a third notification for this failure
 case, so consumers don't have to wait & timeout to detect error.
 

I'm just a tad worried that this sounds like it's starting to use notification
as a replacement for logging. If we did this for every CRUD operation on an
object don't we risk flooding the notification system?




Re: [openstack-dev] [nova] are we going to remove the novaclient v3 shell or what?

2014-09-19 Thread Day, Phil

 
  DevStack doesn't register v2.1 endpoint to keystone now, but we can use
  it with calling it directly.
  It is true that it is difficult to use v2.1 API now and we can check
  its behavior via v3 API instead.
 
 I posted a patch[1] for registering v2.1 endpoint to keystone, and I confirmed
 --service-type option of current nova command works for it.

Ah - I'd misunderstood where we'd got to with the v2.1 endpoint, thanks for 
putting me straight.

So with this in place then yes I agree we could stop fixing the v3 client.   

Since it's actually broken for even operations like boot, do we merge in the
changes I pushed this week so it can still do basic functions, or just go
straight to removing v3 from the client?
 
Phil
 

 



Re: [openstack-dev] [nova] are we going to remove the novaclient v3 shell or what?

2014-09-18 Thread Day, Phil
 -Original Message-
 From: Kenichi Oomichi [mailto:oomi...@mxs.nes.nec.co.jp]
 Sent: 18 September 2014 02:44
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] are we going to remove the novaclient
 v3 shell or what?
 
 
  -Original Message-
  From: Matt Riedemann [mailto:mrie...@linux.vnet.ibm.com]
  Sent: Wednesday, September 17, 2014 11:59 PM
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: [openstack-dev] [nova] are we going to remove the novaclient v3
 shell or what?
 
  This has come up a couple of times in IRC now but the people that
  probably know the answer aren't available.
 
  There are python-novaclient patches that are adding new CLIs to the v2
  (v1_1) and v3 shells, but now that we have the v2.1 API (v2 on v3) why
  do we still have a v3 shell in the client?  Are there plans to remove that?
 
  I don't really care either way, but need to know for code reviews.
 
  One example: [1]
 
  [1] https://review.openstack.org/#/c/108942/
 
 Sorry for a little late response,
 I think we don't need new features of v3 into novaclient anymore.
 For example, the v3 part of the above[1] was not necessary because a new
 feature server-group quota is provided as v2 and v2.1, not v3.

That would be true if there was a version of the client that supported v2.1
today, but the V2.1 API is still presented as V3 and doesn't include the
tenant_id, which makes the V3 client the only simple way to test new V2.1
features in devstack as far as I can see.


How about this as a plan:

1) We add support to the client for --os-compute-api-version=v2.1, which
maps into the client with the URL set to include v2.1 (this won't be usable
until we do step 2).

2) We change Nova to present the v2.1 API as
'http://X.X.X.X:8774/v2.1/tenant_id/'.
 - At this point we will have a working client for all of the stuff that's been 
moved back from V3 to V2.1, but will lose access to any V3 stuff not yet moved 
(which is the opposite of the current state where the v3 client can only be 
used for things that haven't been refactored to V2.1)

3) We remove V3 from the client.


Until we get 1 & 2 done, to me it still makes sense to allow small changes into
the v3 client, so that we keep it usable with the V2.1 API.
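
A minimal sketch of the URL difference behind steps 1 and 2, assuming the endpoint is simply host, port, version, and an optional tenant id. The `compute_endpoint` helper, the example address, and the tenant name are hypothetical; only port 8774 and the `/v2.1/tenant_id/` shape come from the plan above.

```python
def compute_endpoint(host, api_version, tenant_id="", port=8774):
    """Build a compute endpoint URL, with or without a tenant id segment."""
    base = f"http://{host}:{port}/{api_version}"
    return f"{base}/{tenant_id}/" if tenant_id else f"{base}/"


# Today, per the message above: V2.1 is presented as V3, with no tenant id.
current = compute_endpoint("192.0.2.10", "v3")

# After step 2: V2.1 is presented under its own path, including the tenant id.
proposed = compute_endpoint("192.0.2.10", "v2.1", tenant_id="demo-tenant")
```

A client that maps `--os-compute-api-version=v2.1` onto the second form cannot reach anything until Nova actually serves that path, which is why step 1 is unusable before step 2.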





Re: [openstack-dev] [nova] Expand resource name allowed characters

2014-09-17 Thread Day, Phil
 -Original Message-
 From: Jay Pipes [mailto:jaypi...@gmail.com]
 Sent: 12 September 2014 19:37
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Expand resource name allowed
 characters
 
 Had to laugh about the PILE OF POO character :) Comments inline...

Can we get support for that in gerrit ?
 



[openstack-dev] [nova] FFE server-group-quotas

2014-09-05 Thread Day, Phil
Hi,

I'd like to ask for a FFE for the 3 patchsets that implement quotas for server 
groups.

Server groups (which landed in Icehouse) provide a really useful anti-affinity
filter for scheduling that a lot of customers would like to use, but without
some form of quota control to limit the amount of anti-affinity it's impossible
to enable it as a feature in a public cloud.

The code itself is pretty simple - the number of files touched is a side-effect 
of having three V2 APIs that report quota information and the need to protect 
the change in V2 via yet another extension.

https://review.openstack.org/#/c/104957/
https://review.openstack.org/#/c/116073/
https://review.openstack.org/#/c/116079/

Phil

 -Original Message-
 From: Sahid Orentino Ferdjaoui [mailto:sahid.ferdja...@redhat.com]
 Sent: 04 September 2014 13:42
 To: openstack-dev@lists.openstack.org
 Subject: [openstack-dev] [nova] FFE request serial-ports
 
 Hello,
 
 I would like to request a FFE for 4 changesets to complete the blueprint
 serial-ports.
 
 Topic on gerrit:
 
 https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/serial-ports,n,z
 
 Blueprint on launchpad.net:
   https://blueprints.launchpad.net/nova/+spec/serial-ports
 
 They have already been approved but didn't get enough time to be merged
 by the gate.
 
 Sponsored by:
 Daniel Berrange
 Nikola Dipanov
 
 s.
 



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Day, Phil


 -Original Message-
 From: Sean Dague [mailto:s...@dague.net]
 Sent: 05 September 2014 11:49
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out
 virt drivers
 
 On 09/05/2014 03:02 AM, Sylvain Bauza wrote:
 
 
  Ahem, IIRC, there is a third proposal for Kilo :
   - create subteam's half-cores responsible for reviewing patch's
  iterations and send to cores approvals requests once they consider the
  patch stable enough.
 
  As I explained, it would allow to free up reviewing time for cores
  without losing the control over what is being merged.
 
 I don't really understand how the half core idea works outside of a math
 equation, because the point of core is to have trust over the judgement of
 your fellow core members so that they can land code when you aren't
 looking. I'm not sure how I manage to build up half trust in someone any
 quicker.
 
   -Sean
 
You seem to be looking at a model, Sean, where trust is purely binary - you’re
either trusted to know about all of Nova or not trusted at all.

What Sylvain is proposing (I think) is something more akin to having folks that
are trusted in some areas of the system and/or trusted to be right enough of
the time that their reviewing skills take a significant part of the burden off
the core reviewers. That kind of incremental development of trust feels like
a fairly natural model to me. It's somewhere between the full "divide and rule"
approach of splitting out various components (which doesn't feel like a short
term solution) and the blanket approach of adding more cores.

Making it easier to incrementally grant trust, and having the processes and
will to remove it if it's seen to be misused, feels to me like it has to be part
of the solution to breaking out of the "we need more people we trust, but we
don’t feel comfortable trusting more than N people at any one time" trap.
Sometimes you have to give people a chance in small, well defined and
controlled steps.

Phil



Re: [openstack-dev] [nova] FFE server-group-quotas

2014-09-05 Thread Day, Phil
The corresponding Tempest change is also ready to roll (thanks to Ken'ichi):
https://review.openstack.org/#/c/112474/1 so it's kind of just a question of
getting the sequence right.

Phil


 -Original Message-
 From: Sean Dague [mailto:s...@dague.net]
 Sent: 05 September 2014 17:05
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] FFE server-group-quotas
 
 On 09/05/2014 11:28 AM, Ken'ichi Ohmichi wrote:
  2014-09-05 21:56 GMT+09:00 Day, Phil philip@hp.com:
  Hi,
 
  I'd like to ask for a FFE for the 3 patchsets that implement quotas for
 server groups.
 
  Server groups (which landed in Icehouse) provide a really useful
 anti-affinity filter for scheduling that a lot of customers would like to
 use, but without some form of quota control to limit the amount of
 anti-affinity it's impossible to enable it as a feature in a public cloud.
 
  The code itself is pretty simple - the number of files touched is a side-
 effect of having three V2 APIs that report quota information and the need to
 protect the change in V2 via yet another extension.
 
  https://review.openstack.org/#/c/104957/
  https://review.openstack.org/#/c/116073/
  https://review.openstack.org/#/c/116079/
 
  I am happy to sponsor this work.
 
  Thanks
  Ken'ichi ohmichi
 
 
 These look like they are also all blocked by Tempest because it's changing
 return chunks. How does one propose to resolve that, as I don't think there
 is an agreed path up there for to get this into a passing state from my
 reading of the reviews.
 
   -Sean
 
 --
 Sean Dague
 http://dague.net
 


Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno

2014-09-04 Thread Day, Phil


 -----Original Message-----
 From: Nikola Đipanov [mailto:ndipa...@redhat.com]
 Sent: 03 September 2014 10:50
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Nova] Feature Freeze Exception process for
 Juno

snip

 
 I will follow up with a more detailed email about what I believe we are
 missing, once the FF settles and I have applied some soothing creme to my
 burnout wounds, but currently my sentiment is:
 
 Contributing features to Nova nowadays SUCKS!!1 (even as a core
 reviewer) We _have_ to change that!
 
 N.
 
While agreeing with your overall sentiment, what worries me a tad is the 
implied perception that contributing as a core should somehow be easier than 
as a mortal.  While I might expect cores to produce better initial code, I 
thought the process and standards were intended to be a level playing field.

Has anyone looked at the review bandwidth issue from the perspective of whether 
there has been a change in the amount of time cores now spend contributing vs 
reviewing ?
Maybe there's an opportunity to get cores to mentor non-cores to do the code 
production, freeing up review cycles ?

Phil



Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno

2014-09-04 Thread Day, Phil
 
  One final note: the specs referenced above didn't get approved until
  Spec Freeze, which seemed to leave me with less time to implement
  things.  In fact, it seemed that a lot of specs didn't get approved
  until spec freeze.  Perhaps if we had more staggered approval of
  specs, we'd have more staggered submission of patches, and thus less of a
 sudden influx of patches in the couple weeks before feature proposal
 freeze.
 
 Yeah I think the specs were getting approved too late into the cycle, I was
 actually surprised at how far out the schedules were going in allowing things
 in and then allowing exceptions after that.
 
 Hopefully the ideas around priorities/slots/runways will help stagger some of
 this also.
 
I think there is a problem with the pattern that seemed to emerge in June where 
the J.1 period was taken up with spec review  (a lot of good reviews happened 
early in that period, but the approvals kind of came in a lump at the end)  
meaning that the implementation work itself only seemed to really kick in 
during J.2 - and not surprisingly given the complexity of some of the changes 
ran late into J.3.   

We also, as previously noted, didn't do any prioritization between those specs 
that were approved - so it was always going to be a race to see who managed to 
get code up for review first.

It kind of feels to me as if the ideal model would be if we were doing spec 
review for K now (i.e. during the FF / stabilization period) so that we hit 
Paris with a lot of the input already registered and a clear idea of the range 
of things folks want to do.  We shouldn't really have to ask for session 
suggestions for the summit - they should be something that can be extracted 
from the proposed specs (maybe we do voting across the specs or something like 
that).  In that way the summit would be able to confirm the list of specs for 
K and the priority order.

With the current state of the review queue maybe we can't quite hit this 
pattern for K, but it would be worth aspiring to for L?

Phil



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Day, Phil
Hi Daniel,

Thanks for putting together such a thoughtful piece - I probably need to 
re-read it a few times to take in everything you're saying, but a couple of 
thoughts that did occur to me:

- I can see how this could help where a change is fully contained within a virt 
driver, but I wonder how many of those there really are?  Of the things that 
I've seen go through recently nearly all also seem to touch the compute manager 
in some way, and a lot (like the NUMA changes) also have impacts on the 
scheduler.  Isn't it going to make it harder to get any of those changes in 
if they have to be co-ordinated across two or more repos?

- I think you hit the nail on the head in terms of the scope of Nova and how 
few people probably really understand all of it, but given the amount of trust 
that goes with being a core wouldn't it also be possible to make people cores 
on the understanding that they will only approve code in the areas they are 
expert in?  It kind of feels that this happens to a large extent already, for 
example I don't see Chris or Ken'ichi taking on work outside of the API layer.  
It kind of feels as if given a small amount of trust we could have 
additional core reviewers focused on specific parts of the system without 
having to split up the code base if that's where the problem is.

Phil




 -----Original Message-----
 From: Daniel P. Berrange [mailto:berra...@redhat.com]
 Sent: 04 September 2014 11:24
 To: OpenStack Development
 Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt
 drivers
 
 Position statement
 ==================
 
 Over the past year I've increasingly come to the conclusion that Nova is
 heading for (or probably already at) a major crisis. If steps are not taken to
 avert this, the project is likely to lose a non-trivial amount of talent,
 both regular code contributors and core team members. That includes myself.
 This is not good for Nova's long term health and so should be of concern to
 anyone involved in Nova and OpenStack.
 
 For those who don't want to read the whole mail, the executive summary is
 that the nova-core team is an unfixable bottleneck in our development
 process with our current project structure.
 The only way I see to remove the bottleneck is to split the virt drivers out
 of tree and let them all have their own core teams in their area of code,
 leaving current nova core to focus on all the common code outside the virt
 driver impls. I nonetheless urge people to read the whole mail.
 
 
 Background information
 ======================
 
 I see many factors coming together to form the crisis
 
  - Burn out of core team members from over work
  - Difficulty bringing new talent into the core team
  - Long delay in getting code reviewed & merged
  - Marginalization of code areas which aren't popular
  - Increasing size of nova code through new drivers
  - Exclusion of developers without corporate backing
 
 Each item on its own may not seem too bad, but combined they add up to
 a big problem.
 
 Core team burn out
 ------------------
 
 Having been involved in Nova for several dev cycles now, it is clear that the
 backlog of code up for review never goes away. Even intensive code review
 efforts at various points in the dev cycle make only a small impact on the
 backlog. This has a pretty significant impact on core team members, as their
 work is never done. At best, the dial is sometimes set to 10, instead of 11.
 
 Many people, myself included, have built tools to help deal with the reviews
 in a more efficient manner than plain gerrit allows for. These certainly help,
 but they can't ever solve the problem on their own - just make it slightly
 more bearable. And this is not even considering that core team members
 might have useful contributions to make in ways beyond just code review.
 Ultimately the workload is just too high to sustain the levels of review
 required, so core team members will eventually burn out (as they have done
 many times already).
 
 Even if one person attempts to take the initiative to heavily invest in review
 of certain features it is often to no avail.
 Unless a second dedicated core reviewer can be found to 'tag team' it is hard
 for one person to make a difference. The end result is that a patch is +2d and
 then sits idle for weeks or more until a merge conflict requires it to be
 reposted at which point even that one +2 is lost. This is a pretty
 demotivating outcome for both reviewers & the patch contributor.
 
 
 New core team talent
 --------------------
 
 It can't escape attention that the Nova core team does not grow in size very
 often. When Nova was younger and its code base was smaller, it was easier
 for contributors to get onto core because the base level of knowledge
 required was that much smaller. To get onto core today requires a major
 investment in learning Nova over a year or more. Even people who
 potentially have the latent skills may not 

Re: [openstack-dev] [nova] Is the BP approval process broken?

2014-09-02 Thread Day, Phil
Adding in such case more bureaucracy (specs) is not the best way to resolve 
team throughput issues...

I'd argue that if fundamental design disagreements can be surfaced and debated 
at the design stage rather than first emerging on patch set XXX of an 
implementation, and be used to then prioritize what needs to be implemented, 
then specs do have a useful role to play.

Phil


From: Boris Pavlovic [mailto:bpavlo...@mirantis.com]
Sent: 28 August 2014 23:13
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Is the BP approval process broken?

Joe,


This is a resource problem, the nova team simply does not have enough people 
doing enough reviews to make this possible.

Adding in such case more bureaucracy (specs) is not the best way to resolve 
team throughput issues...

my 2cents


Best regards,
Boris Pavlovic

On Fri, Aug 29, 2014 at 2:01 AM, Joe Gordon joe.gord...@gmail.com wrote:


On Thu, Aug 28, 2014 at 2:43 PM, Alan Kavanagh 
alan.kavan...@ericsson.com wrote:
I share Donald's points here, I believe what would help is to clearly describe 
in the Wiki the process and workflow for the BP approval process and build in 
this process how to deal with discrepancies/disagreements and build timeframes 
for each stage and process of appeal etc.
The current process would benefit from some fine tuning and helping to build 
safe guards and time limits/deadlines so folks can expect responses within a 
reasonable time and not be left waiting in the cold.


This is a resource problem, the nova team simply does not have enough people 
doing enough reviews to make this possible.

My 2cents!
/Alan

-----Original Message-----
From: Dugger, Donald D [mailto:donald.d.dug...@intel.com]
Sent: August-28-14 10:43 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Is the BP approval process broken?

I would contend that that right there is an indication that there's a problem 
with the process.  You submit a BP and then you have no idea of what is 
happening and no way of addressing any issues.  If the priority is wrong I can 
explain why I think the priority should be higher, getting stonewalled leaves 
me with no idea what's wrong and no way to address any problems.

I think, in general, almost everyone is more than willing to adjust proposals 
based upon feedback.  Tell me what you think is wrong and I'll either explain 
why the proposal is correct or I'll change it to address the concerns.

Trying to deal with silence is really hard and really frustrating.  Especially 
given that we're not supposed to spam the mailing list it's really hard to know what 
to do.  I don't know the solution but we need to do something.  More core team 
members would help, maybe something like an automatic timeout where BPs/patches 
with no negative scores and no activity for a week get flagged for special 
handling.
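
The timeout rule suggested above could be sketched roughly as follows. This is only an illustration: the function name, the one-week threshold constant and the plain list of integer review scores are assumptions made for the example, not anything that exists in Gerrit or in the Nova process.

```python
from datetime import datetime, timedelta

# Assumed threshold: "no activity for a week", as suggested in the mail.
STALE_AFTER = timedelta(days=7)


def needs_special_handling(scores, last_activity, now):
    """Return True if a BP/patch should be flagged (hypothetical rule).

    scores        -- review votes so far, e.g. [1, 1, 2]
    last_activity -- datetime of the most recent comment or upload
    now           -- current datetime, passed in so the rule is testable
    """
    no_negative_scores = all(score >= 0 for score in scores)
    idle_too_long = (now - last_activity) >= STALE_AFTER
    return no_negative_scores and idle_too_long
```

A periodic job could run a check like this over the open reviews and hand the flagged ones to whoever is doing triage that week.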

I feel we need to change the process somehow.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Jay Pipes [mailto:jaypi...@gmail.com]
Sent: Thursday, August 28, 2014 1:44 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [nova] Is the BP approval process broken?

On 08/27/2014 09:04 PM, Dugger, Donald D wrote:
 I'll try and not whine about my pet project but I do think there is a
 problem here.  For the Gantt project to split out the scheduler there
 is a crucial BP that needs to be implemented (
 https://review.openstack.org/#/c/89893/ ) and, unfortunately, the BP
 has been rejected and we'll have to try again for Kilo.  My question
 is did we do something wrong or is the process broken?

 Note that we originally proposed the BP on 4/23/14, went through 10
 iterations to the final version on 7/25/14 and the final version got
 three +1s and a +2 by 8/5.  Unfortunately, even after reaching out to
 specific people, we didn't get the second +2, hence the rejection.

 I understand that reviews are a burden and very hard but it seems
 wrong that a BP with multiple positive reviews and no negative reviews
 is dropped because of what looks like indifference.

I would posit that this is not actually indifference. The reason that there may 
not have been >1 +2 from a core team member may very well have been that the 
core team members did not feel that the blueprint's priority was high enough to 
put before other work, or that the core team members did not have the time to 
comment on the spec (due to them not feeling the blueprint had the priority to 
justify the time to do a full review).

Note that I'm not a core drivers team member.

Best,
-jay



Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno

2014-09-02 Thread Day, Phil
Needing 3 out of 19 instead of 3 out of 20 isn't an order of magnitude 
according to my calculator.   It's much closer/fairer than making it 2/19 vs 
3/20.
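
As a quick check of that arithmetic, here is a small sketch. It assumes a core team of 20, so a core proposing a change has 19 other cores to ask; the names are made up for the example.

```python
from fractions import Fraction

TOTAL_CORES = 20  # assumed team size, taken from the "3 out of 20" above


def share_needed(sponsors, pool):
    """Fraction of the available pool that has to agree to sponsor."""
    return Fraction(sponsors, pool)


non_core = share_needed(3, TOTAL_CORES)             # 3 of 20 cores
core_same_rule = share_needed(3, TOTAL_CORES - 1)   # 3 of the other 19
core_lowered = share_needed(2, TOTAL_CORES - 1)     # proposed: 2 of 19

# How much harder is it for a core under the same rule?
difficulty_ratio = core_same_rule / non_core        # 20/19, about 1.05

print(float(non_core), float(core_same_rule), float(core_lowered))
print(float(difficulty_ratio))
```

Under these assumptions a core needs about 5% more of the pool than anyone else, whereas lowering the bar to two sponsors would put the core well below the 3/20 everyone else has to reach.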

If a change is borderline in that it can only get 2 other cores maybe it 
doesn't have a strong enough case for an exception.

Phil


Sent from Samsung Mobile


-------- Original message --------
From: Nikola Đipanov
Date: 02/09/2014 19:41 (GMT+00:00)
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno

On 09/02/2014 08:16 PM, Michael Still wrote:
 Hi.

 We're soon to hit feature freeze, as discussed in Thierry's recent
 email. I'd like to outline the process for requesting a freeze
 exception:

 * your code must already be up for review
 * your blueprint must have an approved spec
 * you need three (3) sponsoring cores for an exception to be granted

Can core reviewers who have features up for review have this number
lowered to two (2) sponsoring cores, as they in reality then need four
(4) cores (since they themselves are one (1) core but cannot really
vote) making it an order of magnitude more difficult for them to hit
this checkbox?

Thanks,
N.

 * exceptions must be granted before midnight, Friday this week
 (September 5) UTC
 * the exception is valid until midnight Friday next week
 (September 12) UTC when all exceptions expire

 For reference, our rc1 drops on approximately 25 September, so the
 exception period needs to be short to maximise stabilization time.

 John Garbutt and I will both be granting exceptions, to maximise our
 timezone coverage. We will grant exceptions as they come in and gather
 the required number of cores, although I have also carved some time
 out in the nova IRC meeting this week for people to discuss specific
 exception requests.

 Michael





Re: [openstack-dev] [Nova] Scheduler split wrt Extensible Resource Tracking

2014-08-20 Thread Day, Phil


 -----Original Message-----
 From: Daniel P. Berrange [mailto:berra...@redhat.com]
 Sent: 20 August 2014 14:13
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Nova] Scheduler split wrt Extensible Resource
 Tracking
 
 On Wed, Aug 20, 2014 at 08:33:31AM -0400, Jay Pipes wrote:
  On 08/20/2014 04:48 AM, Nikola Đipanov wrote:
  On 08/20/2014 08:27 AM, Joe Gordon wrote:
  On Aug 19, 2014 10:45 AM, Day, Phil philip@hp.com
  mailto:philip@hp.com wrote:
  
  -Original Message-
  From: Nikola Đipanov [mailto:ndipa...@redhat.com
  mailto:ndipa...@redhat.com]
  Sent: 19 August 2014 17:50
  To: openstack-dev@lists.openstack.org
  mailto:openstack-dev@lists.openstack.org
  Subject: Re: [openstack-dev] [Nova] Scheduler split wrt Extensible
  Resource
  Tracking
  
  On 08/19/2014 06:39 PM, Sylvain Bauza wrote:
  On the other hand, ERT discussion is decoupled from the scheduler
  split discussion and will be delayed until Extensible Resource
  Tracker owner (Paul Murray) is back from vacation.
  In the mean time, we're considering new patches using ERT as
  non-acceptable, at least until a decision is made about ERT.
  
  
  Even though this was not officially agreed I think this is the
  least we can do under the circumstances.
  
  A reminder that a revert proposal is up for review still, and I
  consider it fair game to approve, although it would be great if we
  could hear from Paul first:
  
 https://review.openstack.org/115218
  
  Given the general consensus seemed to be to wait some before
  deciding
  what to do here, isn't putting the revert patch up for approval a
  tad premature ?
  
  There was a recent discussion about reverting patches, and from that
  (but not only) my understanding is that we should revert whenever in
 doubt.
 
  Right.
 
  http://lists.openstack.org/pipermail/openstack-dev/2014-August/042728.html
 
  Putting the patch back in is easy, and if proven wrong I'd be the
  first to +2 it. As scary as they sound - I don't think reverts are a big 
  deal.
 
  Neither do I. I think it's more appropriate to revert quickly and then
  add it back after any discussions, per the above revert policy.
 
  
  The RT may not be able to cope with all of the new and more complex
  resource types we're now trying to schedule, and so it's not
  surprising that the ERT can't fix that.  It does however address
  some specific use cases that the current RT can't cope with, the
  spec had a pretty thorough review under the new process, and was
  discussed during the last 2 design summits.  It worries me that
  we're continually failing to make even small and useful progress in
  this area.
  
  Sylvain's approach of leaving the ERT in place so it can be used
  for the use cases it was designed for while holding back on doing
  some of the more complex things that might need either further work
  in the ERT, or some more fundamental work in the RT (which feels
  like L or M timescales based on current progress) seemed pretty
  pragmatic to me.
  
  ++, I really don't like the idea of rushing the revert of a feature
  that went through significant design discussion especially when the
  author is away and cannot defend it.
  
  Fair enough - I will WIP the revert until Phil is back. It's the
  right thing to do seeing that he is away.
 
  Well, it's as much (or more?) Paul Murray and Andrea Rosa :)
 
  However - I don't agree with using the length of discussion around
  the feature as a valid argument against reverting.
 
  Neither do I.
 
  I've supplied several technical arguments on the original thread to
  why I think we should revert it, and would expect a discussion that
  either refutes them, or provides alternative ways forward.
  
  Saying 'but we talked about it at length' is the ultimate appeal to
  imaginary authority and frankly not helping at all.
 
  Agreed. Perhaps it's just my provocative nature, but I hear a lot of
  we've already decided/discussed this talk especially around the
  scheduler and RT stuff, and I don't think the argument holds much
  water. We should all be willing to reconsider design decisions and
  discussions when appropriate, and in the case of the RT, this
  discussion is timely and appropriate due to the push to split the scheduler
 out of Nova (prematurely IMO).
 
 Yes, this is absolutely right. Even if we have approved a spec / blueprint we
 *always* reserve the right to change our minds at a later date if new
 information or points of view come to light. Hopefully this will be fairly
 infrequent and we won't do it lightly, but it is a key thing we accept as a
 possible outcome of the process we follow.
 
My point was more that reverting a patch that does meet the use cases it was 
designed to cover, even if there is something more fundamental that needs to be 
looked at to cover some new use cases that weren't considered at the time is 
the route to stagnation.   

It seems (unless

Re: [openstack-dev] [Nova] Scheduler split wrt Extensible Resource Tracking

2014-08-19 Thread Day, Phil
 -----Original Message-----
 From: Nikola Đipanov [mailto:ndipa...@redhat.com]
 Sent: 19 August 2014 17:50
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Nova] Scheduler split wrt Extensible Resource
 Tracking
 
 On 08/19/2014 06:39 PM, Sylvain Bauza wrote:
  On the other hand, ERT discussion is decoupled from the scheduler
  split discussion and will be delayed until Extensible Resource Tracker
  owner (Paul Murray) is back from vacation.
  In the mean time, we're considering new patches using ERT as
  non-acceptable, at least until a decision is made about ERT.
 
 
 Even though this was not officially agreed I think this is the least we can do
 under the circumstances.
 
 A reminder that a revert proposal is up for review still, and I consider it
 fair game to approve, although it would be great if we could hear from Paul
 first:
 
   https://review.openstack.org/115218

Given the general consensus seemed to be to wait some before deciding what to 
do here, isn't putting the revert patch up for approval a tad premature ?
 
The RT may not be able to cope with all of the new and more complex resource 
types we're now trying to schedule, and so it's not surprising that the ERT 
can't fix that.  It does however address some specific use cases that the 
current RT can't cope with, the spec had a pretty thorough review under the 
new process, and was discussed during the last 2 design summits.  It worries 
me that we're continually failing to make even small and useful progress in 
this area.

Sylvain's approach of leaving the ERT in place so it can be used for the use 
cases it was designed for while holding back on doing some of the more complex 
things that might need either further work in the ERT, or some more fundamental 
work in the RT (which feels like L or M timescales based on current 
progress) seemed pretty pragmatic to me.

Phil



Re: [openstack-dev] [Nova][Spec freeze exception] Controlled shutdown of GuestOS

2014-07-24 Thread Day, Phil
According to https://etherpad.openstack.org/p/nova-juno-spec-priorities 
alaski has also signed up for this if I drop the point of contention - which 
I've done.

 -----Original Message-----
 From: Michael Still [mailto:mi...@stillhq.com]
 Sent: 24 July 2014 00:50
 To: Daniel P. Berrange; OpenStack Development Mailing List (not for usage
 questions)
 Subject: Re: [openstack-dev] [Nova][Spec freeze exception] Controlled
 shutdown of GuestOS
 
 Another core sponsor would be nice on this one. Any takers?
 
 Michael
 
 On Thu, Jul 24, 2014 at 4:14 AM, Daniel P. Berrange berra...@redhat.com
 wrote:
  On Wed, Jul 23, 2014 at 06:08:52PM +, Day, Phil wrote:
  Hi Folks,
 
  I'd like to propose the following as an exception to the spec freeze, on
  the basis that it addresses a potential data corruption issue in the Guest.
 
  https://review.openstack.org/#/c/89650
 
  We were pretty close to getting acceptance on this before, apart from a
 debate over whether one additional config value could be allowed to be set
 via image metadata - so I've given in for now on wanting that feature from a
 deployer perspective, and said that we'll hard code it as requested.
 
  Initial parts of the implementation are here:
  https://review.openstack.org/#/c/68942/
  https://review.openstack.org/#/c/99916/
 
  Per my comments already, I think this is important for Juno and will
  sponsor it.
 
  Regards,
  Daniel
  --
  |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
  |: http://libvirt.org              -o-             http://virt-manager.org :|
  |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
  |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
 
 
 
 
 --
 Rackspace Australia
 


[openstack-dev] [Nova][Spec freeze exception] Controlled shutdown of GuestOS

2014-07-23 Thread Day, Phil
Hi Folks,

I'd like to propose the following as an exception to the spec freeze, on the 
basis that it addresses a potential data corruption issue in the Guest.

https://review.openstack.org/#/c/89650

We were pretty close to getting acceptance on this before, apart from a debate 
over whether one additional config value could be allowed to be set via image 
metadata - so I've given in for now on wanting that feature from a deployer 
perspective, and said that we'll hard code it as requested.

Initial parts of the implementation are here:
https://review.openstack.org/#/c/68942/
https://review.openstack.org/#/c/99916/


Phil


Re: [openstack-dev] [nova] Server groups specified by name

2014-07-08 Thread Day, Phil

 Sorry, forgot to put this in my previous message.  I've been advocating the 
 ability to use names instead of UUIDs for server groups pretty much since I 
 saw them last year.

 I'd like to just enforce that server group names must be unique within a 
 tenant, and then allow names to be used anywhere we currently have UUIDs 
 (the way we currently do for instances). 
 If there is ambiguity (like from admin doing an operation where there are 
multiple groups with the same name in different tenants) then we can have it 
fail with an appropriate error message

The question here is not just about server group names, but all names. Having 
one name be unique and not another (instance names) is a recipe for a poor 
user experience. Unless there is a strong reason why our current model 
(non-unique names) is bad, I don't think this type of change is worth the 
impact on users.

I think in general we've moved away from using names at the API layer and 
pushed name to UUID translation into the clients for a better command line 
experience, which seems like the right thing to do.  The only reason the legacy 
groups are based on names is because the creation was an implicit side effect 
of the first call to use the group.  Since we're presumably going to deprecate 
support for that at some stage, keeping the new API UUID based seems the 
right direction to me.

If we now try to enforce unique names even in the new groups API that would be 
a change that would probably need its own extension as it's a break in 
behavior, so what I'm proposing is that any group processing based on name in 
a hint explicitly means a group with that name and a policy of "legacy", so the 
code would need to change so that when dealing with name based hints:
- The group is created if there is no existing group with a policy of "legacy" 
(at the moment it wouldn't be created if a new group exists of the same name)
- The filter scheduler should only find groups with a policy of "legacy" when 
looking for them by name

Looking at the current implementation I think this could be done inside the 
get_by_hint() and get_by_name() methods of the Instance Group Object - does 
that work for people ?
(It looks to me that these were only introduced in order to support the legacy 
groups - I'm just not sure if it's OK to embed this "legacy only" behavior 
inside those calls)
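
To make the proposal concrete, here is a rough in-memory sketch of the two rules for name based hints. It is not Nova code: the Group class, the helper names and the plain list standing in for the database are all invented for the illustration, and only the "legacy" policy name comes from the discussion above.

```python
from dataclasses import dataclass, field
from uuid import uuid4

LEGACY = "legacy"  # policy carried by implicitly created groups


@dataclass
class Group:
    """Stand-in for a server group record (hypothetical)."""
    name: str
    policies: list
    uuid: str = field(default_factory=lambda: str(uuid4()))


def get_legacy_by_name(groups, name):
    """Only ever match groups with the legacy policy when looking up by name."""
    for group in groups:
        if group.name == name and LEGACY in group.policies:
            return group
    return None


def get_or_create_by_hint(groups, name):
    """Create a legacy group unless a legacy group of that name exists.

    A new-style group that happens to share the name is ignored.
    """
    group = get_legacy_by_name(groups, name)
    if group is None:
        group = Group(name=name, policies=[LEGACY])
        groups.append(group)
    return group
```

With a new-style "web" group already present, a hint of "web" would create a separate legacy group rather than silently picking up the anti-affinity one, and later hints with the same name would resolve to that legacy group.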

Phil


[openstack-dev] Server groups specified by name

2014-07-07 Thread Day, Phil
Hi Folks,

I noticed a couple of changes that have just merged to allow the server group 
hints to be specified by name (some legacy behavior around automatically 
creating groups).

https://review.openstack.org/#/c/83589/
https://review.openstack.org/#/c/86582/

But group names aren't constrained to be unique, and the method called to get 
the group, instance_group_obj.InstanceGroup.get_by_name(), will just return the 
first group it finds with that name (which could be either the legacy group or 
some new group, in which case the behavior is going to be different from the 
legacy behavior, I think?)

I'm thinking that there may need to be some additional logic here, so that 
group hints passed by name will fail if there is an existing group with a 
policy that isn't "legacy" - and equally perhaps group creation needs to fail 
if a legacy group exists with the same name?
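
The two checks suggested above might look something like this. It is a sketch only: groups are modelled as plain dicts in a list, and the exception and function names are hypothetical rather than anything in the Nova tree.

```python
class GroupNameConflict(Exception):
    """Raised when a group name cannot be used safely (hypothetical)."""


def resolve_name_hint(groups, name):
    """Look up a group for a name based hint, failing on non-legacy matches."""
    matches = [g for g in groups if g["name"] == name]
    if any("legacy" not in g["policies"] for g in matches):
        raise GroupNameConflict(
            "group %r exists with a non-legacy policy" % name)
    # Either a legacy group, or None so the caller can create one.
    return matches[0] if matches else None


def check_can_create(groups, name):
    """Refuse explicit creation if a legacy group already uses the name."""
    for g in groups:
        if g["name"] == name and "legacy" in g["policies"]:
            raise GroupNameConflict(
                "a legacy group named %r already exists" % name)
```

The trade-off is that users would start seeing errors where today they get whichever group happens to be found first.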

Thoughts ?

(Sorry I missed this on the reviews)
Phil


Re: [openstack-dev] [nova][libvirt] why use domain destroy instead of shutdown?

2014-07-04 Thread Day, Phil
Hi Melanie,

I have a BP (https://review.openstack.org/#/c/89650) and the first couple of 
bits of implementation (https://review.openstack.org/#/c/68942/ and 
https://review.openstack.org/#/c/99916/) out for review on this very topic ;-)

Phil

 -----Original Message-----
 From: melanie witt [mailto:melw...@outlook.com]
 Sent: 04 July 2014 03:13
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: [openstack-dev] [nova][libvirt] why use domain destroy instead of
 shutdown?
 
 Hi all,
 
 I noticed in nova/virt/libvirt/driver.py we use domain destroy instead of
 domain shutdown in most cases (except for soft reboot). Is there a special
 reason we don't use shutdown to do a graceful shutdown of the guest for
 the stop, shelve, migrate, etc functions? Using destroy can corrupt the guest
 file system.
 
 Thanks,
 Melanie



[openstack-dev] [nova] Limits API and project-user-quotas

2014-07-04 Thread Day, Phil
Hi Folks,

Working on the server groups quotas I hit an issue with the limits API which I 
wanted to get feedback on.

Currently this always shows just the project level quotas and usage, which can 
be confusing if there is a lower user specific quota.  For example:

Project Quota = 10
User Quota = 1
User Usage = 1
Other User Usage = 2

If we show just the overall project usage and quota we get (used=3, quota=10) - 
which suggests that the quota is not fully used, and I can go ahead and create 
something.

However if we show the user quotas we get (used=1, quota=1), which shows 
correctly that I would get a quota error on creation.


But if we do switch to returning the user view of quotas and usage we can get a 
different problem:

Project Quota = 10
User Quota = 5
User Usage = 1
Other User Usage = 9

Now if we show just the user quotas we get (used=1, quota=5), which suggests 
that there is capacity when in fact there isn't.

Whereas if we just return the overall project usage and quota (current 
behavior) we get (used=10, quota=10) - which shows that the project quota is 
fully used.


It kind of feels as if we really need to return both the project and per user 
values if the results are going to be reliable in the face of 
project-user-quotas, but that led me to thinking whether a user that has been 
given a specific quota is meant to be able to see the corresponding overall 
project level quota ?
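
To make the two examples above concrete, here is a minimal sketch of why both 
views are needed.  It is not Nova code: the function name and arguments are 
made up for illustration, and it simply takes the tighter of the project-level 
and user-level headroom.

```python
def remaining_quota(project_quota, user_quota, user_usage, other_usage):
    """Return how many more resources this user can actually create.

    Hypothetical helper, not part of Nova: the real headroom is the
    smaller of what is left in the project and what is left for the user.
    """
    project_remaining = project_quota - (user_usage + other_usage)
    user_remaining = user_quota - user_usage
    return max(0, min(project_remaining, user_remaining))


# First example: the per-user quota is the binding limit.
print(remaining_quota(10, 1, 1, 2))
# Second example: the project quota is the binding limit.
print(remaining_quota(10, 5, 1, 9))
```

Neither view on its own gives the right answer in both cases, which is the 
argument for returning both.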

The quota API itself allows a user to get either the project level quota or any 
per-user quota within that project - which does make all of the information 
available even if it is a tad odd that the default (no user specified) is to 
see the overall quota rather than the one that applies to the user making the 
request.  They can't, however, find out project level usage via the quotas API.

Thoughts on what the correct model is here ?

Phil
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Why is there a 'None' task_state between 'SCHEDULING' 'BLOCK_DEVICE_MAPPING'?

2014-06-26 Thread Day, Phil
What do others think – do we want a spec to add an additional task_state value 
that will be set in a well defined place?  Kind of feels like overkill to me in 
terms of the review effort that would take compared to just reviewing the code 
- it's not as if there are going to be lots of alternatives to consider here.

From: wu jiang [mailto:win...@gmail.com]
Sent: 26 June 2014 09:19
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Why is there a 'None' task_state between 
'SCHEDULING'  'BLOCK_DEVICE_MAPPING'?

 Hi Phil,

thanks for your reply. So should I need to submit a patch/spec to add it now?

On Wed, Jun 25, 2014 at 5:53 PM, Day, Phil 
philip@hp.com wrote:
Looking at this a bit deeper, the comment in _start_building() says that it's 
doing this to “Save the host and launched_on fields and log appropriately”.
But as far as I can see those don’t actually get set until the claim is made 
against the resource tracker a bit later in the process, so this whole update 
might just not be needed – although I still like the idea of a state to show 
that the request has been taken off the queue by the compute manager.

From: Day, Phil
Sent: 25 June 2014 10:35

To: OpenStack Development Mailing List
Subject: RE: [openstack-dev] [nova] Why is there a 'None' task_state between 
'SCHEDULING'  'BLOCK_DEVICE_MAPPING'?

Hi WingWJ,

I agree that we shouldn’t have a task state of None while an operation is in 
progress.  I’m pretty sure back in the day this didn’t use to be the case and 
task_state stayed as Scheduling until it went to Networking  (now of course 
networking and BDM happen in parallel, so you have to be very quick to see the 
Networking state).

Personally I would like to see the extra granularity of knowing that a request 
has been started on the compute manager (and knowing that the request was 
started rather than is still sitting on the queue makes the decision to put it 
into an error state when the manager is re-started more robust).

Maybe a task state of “STARTING_BUILD” for this case ?

BTW I don’t think _start_building() is called anymore now that we’ve switched 
to conductor calling build_and_run_instance() – but the same task_state issue 
exists in there as well.

From: wu jiang [mailto:win...@gmail.com]
Sent: 25 June 2014 08:19
To: OpenStack Development Mailing List
Subject: [openstack-dev] [nova] Why is there a 'None' task_state between 
'SCHEDULING'  'BLOCK_DEVICE_MAPPING'?

Hi all,

Recently, some of my instances were stuck in task_state 'None' during VM 
creation in my environment.

So I checked and found there's a 'None' task_state between 'SCHEDULING' and 
'BLOCK_DEVICE_MAPPING'.

The related code is implemented like this:

#def _start_building():
#    self._instance_update(context, instance['uuid'],
#                          vm_state=vm_states.BUILDING,
#                          task_state=None,
#                          expected_task_state=(task_states.SCHEDULING,
#                                               None))

So if the compute node is rebooted after that step, all building VMs on it 
will always stay in 'None' task_state. That's useless and not convenient for 
locating problems.

Why not a new task_state for this step?


WingWJ

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-26 Thread Day, Phil
 -Original Message-
 From: Ahmed RAHAL [mailto:ara...@iweb.com]
 Sent: 25 June 2014 20:25
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] should we have a stale data indication in
 nova list/show?
 
 Le 2014-06-25 14:26, Day, Phil a écrit :
  -Original Message-
  From: Sean Dague [mailto:s...@dague.net]
  Sent: 25 June 2014 11:49
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [nova] should we have a stale data
  indication in nova list/show?
 
 
  +1 that the state shouldn't be changed.
 
  What about if we exposed the last updated time to users and allowed
 them to decide if its significant or not ?
 
 
 This would just indicate the last operation's time stamp.
 There already is a field in nova show called 'updated' that has some kind of
 indication. I honestly do not know who updates that field, but if anything,
 this existing field could/should be used.
 
 
Doh ! - yes that is the updated_at value in the DB.

I'd missed the last bit of my train of thought on this, which was that we could 
make the periodic task which checks (and corrects) the instance state update 
the updated_at timestamp even if the state is unchanged.

However that does add another DB update per instance every 60 seconds, and I'm 
with Joe that I'm really not convinced this is taking the Nova view of Status 
in the right direction.   Better to educate / document the limitations of status 
as it stands than to try and change it I think.

Phil

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Timeline for the rest of the Juno release

2014-06-25 Thread Day, Phil
Discussing at the meet-up is fine with me.

 -Original Message-
 From: Michael Still [mailto:mi...@stillhq.com]
 Sent: 25 June 2014 00:48
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Nova] Timeline for the rest of the Juno release
 
 Your comments are fair. I think perhaps at this point we should defer
 discussion of the further away deadlines until the mid cycle meetup -- that
 will give us a chance to whiteboard the flow for that period of the release.
 
 Or do you really want to lock this down now?
 
 Michael
 
 On Wed, Jun 25, 2014 at 12:53 AM, Day, Phil philip@hp.com wrote:
  -Original Message-
  From: Russell Bryant [mailto:rbry...@redhat.com]
  Sent: 24 June 2014 13:08
  To: openstack-dev@lists.openstack.org
  Subject: Re: [openstack-dev] [Nova] Timeline for the rest of the Juno
  release
 
  On 06/24/2014 07:35 AM, Michael Still wrote:
   Phil -- I really want people to focus their efforts on fixing bugs
   in that period was the main thing. The theory was if we encouraged
   people to work on specs for the next release, then they'd be
   distracted from fixing the bugs we need fixed in J.
  
   Cheers,
   Michael
  
   On Tue, Jun 24, 2014 at 9:08 PM, Day, Phil philip@hp.com wrote:
   Hi Michael,
  
   Not sure I understand the need for a gap between Juno Spec
   approval
  freeze (Jul 10th) and K opens for spec proposals (Sep 4th).I can
  understand that K specs won't get approved in that period, and may
  not get much feedback from the cores - but I don't see the harm in
  letting specs be submitted to the K directory for early review / feedback
 during that period ?
 
  I agree with both of you.  Priorities need to be finishing up J, but
  I don't see any reason not to let people post K specs whenever.
  Expectations just need to be set appropriately that it may be a while
  before they get reviewed/approved.
 
  Exactly - I think it's reasonable to set the expectation that the
  focus of those that can produce/review code will be elsewhere - but
  that shouldn't stop some small effort going into knocking the rough
  corners off the specs at the same time
 
 
 
 
 
 --
 Rackspace Australia
 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Why is there a 'None' task_state between 'SCHEDULING' 'BLOCK_DEVICE_MAPPING'?

2014-06-25 Thread Day, Phil
Hi WingWJ,

I agree that we shouldn’t have a task state of None while an operation is in 
progress.  I’m pretty sure back in the day this didn’t use to be the case and 
task_state stayed as Scheduling until it went to Networking  (now of course 
networking and BDM happen in parallel, so you have to be very quick to see the 
Networking state).

Personally I would like to see the extra granularity of knowing that a request 
has been started on the compute manager (and knowing that the request was 
started rather than is still sitting on the queue makes the decision to put it 
into an error state when the manager is re-started more robust).

Maybe a task state of “STARTING_BUILD” for this case ?

BTW I don’t think _start_building() is called anymore now that we’ve switched 
to conductor calling build_and_run_instance() – but the same task_state issue 
exists in there as well.
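
As a rough illustration of why the extra state helps on restart, here is a 
small sketch.  It is not the compute manager's code: STARTING_BUILD is only the 
value suggested above, and the function name and returned action strings are 
assumptions made up for this example.

```python
# Task states; STARTING_BUILD is the hypothetical new value suggested above.
SCHEDULING = "scheduling"
STARTING_BUILD = "starting_build"
BLOCK_DEVICE_MAPPING = "block_device_mapping"


def action_on_manager_restart(task_state):
    """Pick a recovery action for an instance found in vm_state BUILDING."""
    if task_state == SCHEDULING:
        # The request may still be sitting on the queue, so leave it alone.
        return "leave"
    if task_state is None:
        # Today's behaviour: we cannot tell whether the build ever started.
        return "ambiguous"
    # STARTING_BUILD or later: the build began on this host and was
    # interrupted, so putting the instance into an error state is safe.
    return "set_error"


for state in (SCHEDULING, None, STARTING_BUILD, BLOCK_DEVICE_MAPPING):
    print(state, "->", action_on_manager_restart(state))
```

With a distinct state there is no "ambiguous" branch left to worry about.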

From: wu jiang [mailto:win...@gmail.com]
Sent: 25 June 2014 08:19
To: OpenStack Development Mailing List
Subject: [openstack-dev] [nova] Why is there a 'None' task_state between 
'SCHEDULING'  'BLOCK_DEVICE_MAPPING'?

Hi all,

Recently, some of my instances were stuck in task_state 'None' during VM 
creation in my environment.

So I checked and found there's a 'None' task_state between 'SCHEDULING' and 
'BLOCK_DEVICE_MAPPING'.

The related code is implemented like this:

#def _start_building():
#    self._instance_update(context, instance['uuid'],
#                          vm_state=vm_states.BUILDING,
#                          task_state=None,
#                          expected_task_state=(task_states.SCHEDULING,
#                                               None))

So if the compute node is rebooted after that step, all building VMs on it 
will always stay in 'None' task_state. That's useless and not convenient for 
locating problems.

Why not a new task_state for this step?


WingWJ
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Why is there a 'None' task_state between 'SCHEDULING' 'BLOCK_DEVICE_MAPPING'?

2014-06-25 Thread Day, Phil
Looking at this a bit deeper, the comment in _start_building() says that it's 
doing this to “Save the host and launched_on fields and log appropriately”.
But as far as I can see those don’t actually get set until the claim is made 
against the resource tracker a bit later in the process, so this whole update 
might just not be needed – although I still like the idea of a state to show 
that the request has been taken off the queue by the compute manager.

From: Day, Phil
Sent: 25 June 2014 10:35
To: OpenStack Development Mailing List
Subject: RE: [openstack-dev] [nova] Why is there a 'None' task_state between 
'SCHEDULING'  'BLOCK_DEVICE_MAPPING'?

Hi WingWJ,

I agree that we shouldn’t have a task state of None while an operation is in 
progress.  I’m pretty sure back in the day this didn’t use to be the case and 
task_state stayed as Scheduling until it went to Networking  (now of course 
networking and BDM happen in parallel, so you have to be very quick to see the 
Networking state).

Personally I would like to see the extra granularity of knowing that a request 
has been started on the compute manager (and knowing that the request was 
started rather than is still sitting on the queue makes the decision to put it 
into an error state when the manager is re-started more robust).

Maybe a task state of “STARTING_BUILD” for this case ?

BTW I don’t think _start_building() is called anymore now that we’ve switched 
to conductor calling build_and_run_instance() – but the same task_state issue 
exists in there as well.

From: wu jiang [mailto:win...@gmail.com]
Sent: 25 June 2014 08:19
To: OpenStack Development Mailing List
Subject: [openstack-dev] [nova] Why is there a 'None' task_state between 
'SCHEDULING'  'BLOCK_DEVICE_MAPPING'?

Hi all,

Recently, some of my instances were stuck in task_state 'None' during VM 
creation in my environment.

So I checked and found there's a 'None' task_state between 'SCHEDULING' and 
'BLOCK_DEVICE_MAPPING'.

The related code is implemented like this:

#def _start_building():
#    self._instance_update(context, instance['uuid'],
#                          vm_state=vm_states.BUILDING,
#                          task_state=None,
#                          expected_task_state=(task_states.SCHEDULING,
#                                               None))

So if the compute node is rebooted after that step, all building VMs on it 
will always stay in 'None' task_state. That's useless and not convenient for 
locating problems.

Why not a new task_state for this step?


WingWJ
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [ironic]nova scheduler and ironic

2014-06-25 Thread Day, Phil
I think there’s a bit more to it than just having an aggregate:


-  Ironic provides its own version of the Host manager class for the 
scheduler, I’m not sure if that is fully compatible with the non-ironic case.  
Even in the BP for merging the Ironic driver back into Nova it still looks like 
this will stay as a sub-class (would be good if they can just be merged IMO)


-  You’d need to decide how you want to use the aggregate – extra specs 
in the flavor matching against the aggregate metadata is one way, you could 
also do it by matching image metadata (as the ironic images are going to be 
different from KVM ones)
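
For the first of those options, the matching could look roughly like the sketch 
below.  This is a simplified stand-in, not the scheduler's actual filter code: 
the function name and the "hypervisor" metadata key are assumptions chosen for 
illustration.

```python
def host_passes(flavor_extra_specs, aggregate_metadata):
    """Return True if every flavor extra spec matches the host's aggregate.

    Simplified illustration: a real filter would also handle scoped keys,
    operators and hosts that belong to several aggregates.
    """
    for key, wanted in flavor_extra_specs.items():
        if aggregate_metadata.get(key) != wanted:
            return False
    return True


# Hypothetical metadata set by the operator on two aggregates.
ironic_aggregate = {"hypervisor": "ironic"}
kvm_aggregate = {"hypervisor": "kvm"}

# Hypothetical extra spec on a bare metal flavor.
baremetal_flavor = {"hypervisor": "ironic"}

print(host_passes(baremetal_flavor, ironic_aggregate))
print(host_passes(baremetal_flavor, kvm_aggregate))
```

A flavor with no extra specs passes every host in this sketch, so the KVM 
flavors would need their own extra spec to be kept off the bare metal nodes.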


From: Joe Gordon [mailto:joe.gord...@gmail.com]
Sent: 25 June 2014 05:53
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] [Nova] [ironic]nova scheduler and ironic


On Jun 24, 2014 7:02 PM, Jander lu 
lhcxx0...@gmail.com wrote:

 hi, guys, I have two confused issue when reading source code.

 1) can we have ironic driver and KVM driver both exist in the cloud? for 
 example, I have 8 compute nodes, I make 4 of them with compute_driver = 
 libvirt and remaining 4 nodes with 
 compute_driver=nova.virt.ironic.IronicDriver ?

 2) if it works, how does nova scheduler work to choose the right node in this 
 case if I want boot a VM or a physical node ?

You can use host aggregates to make certain flavors bare metal and others KVM



 thx all.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-25 Thread Day, Phil
 -Original Message-
 From: Sean Dague [mailto:s...@dague.net]
 Sent: 25 June 2014 11:49
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] should we have a stale data indication in
 nova list/show?
 
 On 06/25/2014 04:28 AM, Belmiro Moreira wrote:
  I like the current behavior of not changing the VM state if
  nova-compute goes down.
 
  The cloud operators can identify the issue in the compute node and try
  to fix it without users noticing. Depending in the problem I can
  inform users if instances are affected and change the state if necessary.
 
  I wouldn't like is to expose any failure in nova-compute to users and
  be contacted because VM state changed.
 
 Agreed. Plus in the perfectly normal case of an upgrade of a compute node,
 it's expected that nova-compute is going to be down for some period of
 time, and it's 100% expected that the VMs remain up and ACTIVE over that
 period.
 
 Setting VMs to ERROR would totally gum that up.
 
+1 that the state shouldn't be changed.

What about if we exposed the last updated time to users and allowed them to 
decide if it's significant or not ?
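
A minimal sketch of what that client-side decision might look like, assuming 
the user has the instance's last updated time as a datetime.  The function name 
and the five minute default are made up for illustration.

```python
from datetime import datetime, timedelta


def looks_stale(updated_at, now, max_age=timedelta(minutes=5)):
    """Return True if the record has not been updated within max_age.

    The threshold is the user's choice; nothing here changes the
    instance state itself.
    """
    return now - updated_at > max_age


now = datetime(2014, 6, 25, 12, 0, 0)
print(looks_stale(datetime(2014, 6, 25, 11, 58, 0), now))
print(looks_stale(datetime(2014, 6, 25, 11, 0, 0), now))
```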
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] bp: nova-ecu-support

2014-06-24 Thread Day, Phil
The basic framework for supporting this kind of resource scheduling is the 
extensible-resource-tracker:

https://blueprints.launchpad.net/nova/+spec/extensible-resource-tracking
https://review.openstack.org/#/c/86050/
https://review.openstack.org/#/c/71557/

Once that lands, being able to schedule on arbitrary resources (such as an ECU) 
becomes a lot easier to implement.

Phil

 -Original Message-
 From: Kenichi Oomichi [mailto:oomi...@mxs.nes.nec.co.jp]
 Sent: 03 February 2014 09:37
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: [openstack-dev] [Nova] bp: nova-ecu-support
 
 Hi,
 
 There is a blueprint ECU[1], and that is an interesting idea for me.
 so I'd like to know the comments about ECU idea.
 
 After production environments start, the operators will need to add
 compute nodes before exhausting the capacity.
 On the scenario, they'd like to add cost-efficient machines as the compute
 node at the time. So the production environments will consist of different
 performance compute nodes. Also they hope to provide the same
 performance virtual machines on different performance nodes if specifying
 the same flavor.
 
 Now nova contains flavor_extraspecs[2] which can customize the cpu
 bandwidth for each flavor:
  # nova flavor-key m1.low_cpu set quota:cpu_quota=1
  # nova flavor-key m1.low_cpu set quota:cpu_period=2
 
 However, this feature can not provide the same vm performance on
 different performance node, because this arranges the vm performance
 with the same ratio(cpu_quota/cpu_period) only even if the compute node
 performances are different. So it is necessary to arrange the different ratio
 based on each compute node performance.
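
 To illustrate the point about needing a different ratio per node, here is a
 small sketch of the arithmetic.  It is not taken from the blueprint: the
 function name, the notion of "ECUs per core" for a host, and the period of
 20000 are all assumptions for the example.

```python
def cpu_quota_for(ecus_requested, host_ecus_per_core, cpu_period=20000):
    """Return the cpu_quota that gives `ecus_requested` on this host.

    cpu_quota / cpu_period is the fraction of a core the guest may use,
    so a faster host (more ECUs per core) needs a smaller fraction to
    deliver the same guest performance.
    """
    ratio = ecus_requested / host_ecus_per_core
    return int(round(ratio * cpu_period))


# Same flavor (1 ECU) on a slower and a faster compute node.
print(cpu_quota_for(1, 2))
print(cpu_quota_for(1, 4))
```

 A fixed cpu_quota/cpu_period pair in the flavor cannot express this, since the
 quota has to be derived per compute node.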
 
 Amazon EC2 has ECU[3] already for implementing this, and the blueprint [1]
 is also for it.
 
 Any thoughts?
 
 
 Thanks
 Ken'ichi Ohmichi
 
 ---
 [1]: https://blueprints.launchpad.net/nova/+spec/nova-ecu-support
 [2]: http://docs.openstack.org/admin-guide-cloud/content/ch_introduction-to-openstack-compute.html#customize-flavors
 [3]: http://aws.amazon.com/ec2/faqs/  Q: What is a EC2 Compute Unit
 and why did you introduce it?
 
 
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?

2014-06-24 Thread Day, Phil
 -Original Message-
 From: John Garbutt [mailto:j...@johngarbutt.com]
 Sent: 23 June 2014 10:35
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction
 as part of resize ?
 
 On 18 June 2014 21:57, Jay Pipes jaypi...@gmail.com wrote:
  On 06/17/2014 05:42 PM, Daniel P. Berrange wrote:
 
  On Tue, Jun 17, 2014 at 04:32:36PM +0100, Pádraig Brady wrote:
 
  On 06/13/2014 02:22 PM, Day, Phil wrote:
 
  I guess the question I’m really asking here is:  “Since we know
  resize down won’t work in all cases, and the failure if it does
  occur will be hard for the user to detect, should we just block it
  at the API layer and be consistent across all Hypervisors ?”
 
 
  +1
 
  There is an existing libvirt blueprint:
 
  https://blueprints.launchpad.net/nova/+spec/libvirt-resize-disk-down
  which I've never been in favor of:
 https://bugs.launchpad.net/nova/+bug/1270238/comments/1
 
 
  All of the functionality around resizing VMs to match a different
  flavour seem to be a recipe for unleashing a torrent of unfixable
  bugs, whether resizing disks, adding CPUs, RAM or any other aspect.
 
 
  +1
 
  I'm of the opinion that we should plan to rip resize functionality out
  of (the next major version of) the Compute API and have a *single*,
  *consistent* API for migrating resources. No more API extension X for
  migrating this kind of thing, and API extension Y for this kind of
  thing, and API extension Z for migrating /live/ this type of thing.
 
  There should be One move API to Rule Them All, IMHO.
 
 +1 for one move API, the two evolved independently, in different
 drivers, its time to unify them!
 
 That plan got stuck behind the refactoring of live-migrate and migrate to the
 conductor, to help unify the code paths. But it kinda got stalled (I must
 rebase those patches...).
 
 Just to be clear, I am against removing resize down from v2 without a
 deprecation cycle. But I am pro starting that deprecation cycle.
 
 John
 
I'm not sure Daniel and Jay are arguing for the same thing here John:  I 
*think*  Daniel is saying drop resize altogether and Jay is saying unify it 
with migration - so I'm a tad confused which of those you're agreeing with.

Phil
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Timeline for the rest of the Juno release

2014-06-24 Thread Day, Phil
Hi Michael,

Not sure I understand the need for a gap between Juno Spec approval freeze 
(Jul 10th) and K opening for spec proposals (Sep 4th).  I can understand that 
K specs won't get approved in that period, and may not get much feedback from 
the cores - but I don't see the harm in letting specs be submitted to the K 
directory for early review / feedback during that period ?

Phil

 -Original Message-
 From: Michael Still [mailto:mi...@stillhq.com]
 Sent: 24 June 2014 09:59
 To: OpenStack Development Mailing List
 Subject: [openstack-dev] [Nova] Timeline for the rest of the Juno release
 
 Hi, this came up in the weekly release sync with ttx, and I think it's worth
 documenting as clearly as possible.
 
 Here is our proposed timeline for the rest of the Juno release. This is
 important for people with spec proposals either out for review, or intending
 to be sent for review soon.
 
 (The numbers in brackets are weeks before the feature freeze).
 
 Jun 12 (-12): Juno-1
 Jun 25 (-10): Spec review day
 (https://etherpad.openstack.org/p/nova-juno-spec-priorities)
 
 Jul  3 (-9): Spec proposal freeze
 Jul 10 (-8): Spec approval freeze
 Jul 24 (-6): Juno-2
 Jul 28 (-5): Nova mid cycle meetup
 (https://wiki.openstack.org/wiki/Sprints/BeavertonJunoSprint)
 
 Aug 21 (-2): Feature proposal freeze
 
 Sep  4 ( 0): Juno-3
  Feature freeze
  Merged J specs with no code proposed get deleted from nova-specs
 repo
  K opens for spec proposals, unmerged J spec proposals must rebase
 Sep 25 (+3): RC 1 build expected
  K spec review approvals start
 
 Oct 16 (+6): Release!
 (https://wiki.openstack.org/wiki/Juno_Release_Schedule)
 Oct 30: K summit spec proposal freeze
 
 Nov  6: K design summit
 
 Cheers,
 Michael
 
 --
 Rackspace Australia
 
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Timeline for the rest of the Juno release

2014-06-24 Thread Day, Phil
 -Original Message-
 From: Russell Bryant [mailto:rbry...@redhat.com]
 Sent: 24 June 2014 13:08
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Nova] Timeline for the rest of the Juno release
 
 On 06/24/2014 07:35 AM, Michael Still wrote:
  Phil -- I really want people to focus their efforts on fixing bugs in
  that period was the main thing. The theory was if we encouraged people
  to work on specs for the next release, then they'd be distracted from
  fixing the bugs we need fixed in J.
 
  Cheers,
  Michael
 
  On Tue, Jun 24, 2014 at 9:08 PM, Day, Phil philip@hp.com wrote:
  Hi Michael,
 
  Not sure I understand the need for a gap between Juno Spec approval
 freeze (Jul 10th) and K opens for spec proposals (Sep 4th).I can
 understand that K specs won't get approved in that period, and may not get
 much feedback from the cores - but I don't see the harm in letting specs be
 submitted to the K directory for early review / feedback during that period ?
 
 I agree with both of you.  Priorities need to be finishing up J, but I don't
 see any reason not to let people post K specs whenever.
 Expectations just need to be set appropriately that it may be a while before
 they get reviewed/approved.
 
Exactly - I think it's reasonable to set the expectation that the focus of 
those that can produce/review code will be elsewhere - but that shouldn't stop 
some small effort going into knocking the rough corners off the specs at the 
same time


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?

2014-06-18 Thread Day, Phil
 -Original Message-
 From: Russell Bryant [mailto:rbry...@redhat.com]
 Sent: 17 June 2014 15:57
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction
 as part of resize ?
 
 On 06/17/2014 10:43 AM, Richard W.M. Jones wrote:
  On Fri, Jun 13, 2014 at 06:12:16AM -0400, Aryeh Friedman wrote:
  Theoretically impossible to reduce disk unless you have some really
  nasty guest additions.
 
  True for live resizing.
 
  For dead resizing, libguestfs + virt-resize can do it.  Although I
  wouldn't necessarily recommend it.  In almost all cases where someone
  wants to shrink a disk, IMHO it is better to sparsify it instead (ie.
  virt-sparsify).
 
 FWIW, the resize operation in OpenStack is a dead one.
 
Dead as in not supported in V3 ?

How does that map into the plans to implement V2.1 on top of V3 ?

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] locked instances and snaphot

2014-06-18 Thread Day, Phil
 -Original Message-
 From: Ahmed RAHAL [mailto:ara...@iweb.com]
 Sent: 18 June 2014 01:21
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] locked instances and snaphot
 
 Hi there,
 
 Le 2014-06-16 15:28, melanie witt a écrit :
  Hi all,
 
 [...]
 
  During the patch review, a reviewer raised a concern about the purpose
  of instance locking and whether prevention of snapshot while an
  instance is locked is appropriate. From what we understand, instance
  lock is meant to prevent unwanted modification of an instance. Is
  snapshotting considered a logical modification of an instance? That
  is, if an instance is locked to a user, they take a snapshot, create
  another instance using that snapshot, and modify the instance, have
  they essentially modified the original locked instance?
 
  I wanted to get input from the ML on whether it makes sense to
  disallow snapshot when an instance is locked.
 
 Beyond 'preventing accidental change to the instance', locking could be seen
 as 'preventing any operation' to the instance.
 If I, as a user, lock an instance, it certainly only prevents me from
 accidentally deleting the VM. As I can unlock whenever I need to, there seems
 to be no other use case (chmod-like).

It blocks any operation that would change the state of the instance:  
Delete, stop, start, reboot, rebuild, resize, shelve, pause, resume, etc

In keeping with that I don't see why it should block a snapshot, and having to 
unlock it to take a snapshot doesn't feel good either. 
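
Sketched as code, the behaviour I'm arguing for would be something like the 
following.  This is only an illustration of the rule, not Nova's lock check: 
the names are made up, and treating an admin caller as exempt from the lock is 
an assumption of the sketch.

```python
# Operations that change the state of the instance (from the list above).
STATE_CHANGING = frozenset([
    "delete", "stop", "start", "reboot", "rebuild",
    "resize", "shelve", "pause", "resume",
])


def is_blocked(action, locked, caller_is_admin=False):
    """Return True if the lock should stop this action."""
    if not locked or caller_is_admin:
        return False
    return action in STATE_CHANGING


print(is_blocked("reboot", locked=True))
print(is_blocked("snapshot", locked=True))
```

Snapshot is simply not in the set, so it goes through without an unlock.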


 If I, as an admin, lock an instance, I am preventing operations on a VM and
 am preventing an ordinary user from overriding the lock.

The driver for doing this as an admin is slightly different - it's to stop the 
user from changing the state of an instance rather than a protection.   A 
couple of use cases:
- if you want to migrate a VM and the user is running a continual 
sequence of say reboot commands at it putting an admin lock in place gives you 
a way to break into that cycle.
- There are a few security cases where we need to take over control of 
an instance, and make sure it doesn't get deleted by the user

 
 This is a form of authority enforcing that maybe should prevent even
 snapshots to be taken off that VM. The thing is that enforcing this beyond
 the limits of nova is AFAIK not there, so cloning/snapshotting cinder volumes
 will still be feasible.
 Enforcing it only in nova as a kind of 'security feature' may become
 misleading.
 
 The more I think about it, the more I get to think that locking is just there
 to avoid mistakes, not voluntary misbehaviour.
 
 --
 
 Ahmed
 
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?

2014-06-18 Thread Day, Phil
 -Original Message-
 From: Richard W.M. Jones [mailto:rjo...@redhat.com]
 Sent: 18 June 2014 12:32
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction
 as part of resize ?
 
 On Wed, Jun 18, 2014 at 11:05:01AM +, Day, Phil wrote:
   -Original Message-
   From: Russell Bryant [mailto:rbry...@redhat.com]
   Sent: 17 June 2014 15:57
   To: OpenStack Development Mailing List (not for usage questions)
   Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk
   reduction as part of resize ?
  
   On 06/17/2014 10:43 AM, Richard W.M. Jones wrote:
On Fri, Jun 13, 2014 at 06:12:16AM -0400, Aryeh Friedman wrote:
Theoretically impossible to reduce disk unless you have some
really nasty guest additions.
   
True for live resizing.
   
For dead resizing, libguestfs + virt-resize can do it.  Although
I wouldn't necessarily recommend it.  In almost all cases where
someone wants to shrink a disk, IMHO it is better to sparsify it instead
(ie. virt-sparsify).
  
   FWIW, the resize operation in OpenStack is a dead one.
  
  Dead as in not supported in V3 ?
 
 dead as in not live resizing, ie. it happens only on offline disk images.
 
 Rich.
 
Ah, thanks.  I was thinking of dead as in it is an ex-operation, it has 
ceased to be, ... ;-)

There seems to be a consensus towards this being treated as an error - so I'll 
raise a spec.




Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?

2014-06-16 Thread Day, Phil
Beyond what is and isn’t technically possible at the file system level there is 
always the problem that the user may have more data than can fit into the 
reduced disk.

I don’t want to take away useful functionality from folks if there are cases 
where it already works – mostly I just want to improve the user experience, and 
 to me the biggest problem here is the current failure mode where the user 
can’t tell if the request has been tried and failed, or just not happened at 
all for some other reason.

What if we introduced a new state of “Resize_failed” from which the only 
allowed operations are “resize_revert” and delete – so the user can at least 
get some feedback on the cases that can’t be supported ?
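A rough sketch of how such a state could restrict the allowed operations, purely as an illustration: the state and action names below are hypothetical and do not correspond to Nova's actual vm_state or task_state values.

```python
# Hypothetical sketch: state and action names are illustrative only.
ALLOWED_ACTIONS = {
    "active": {"resize", "delete"},
    "resize_verify": {"resize_confirm", "resize_revert", "delete"},
    # The proposed state: the user can only back out or delete.
    "resize_failed": {"resize_revert", "delete"},
}


class InvalidAction(Exception):
    """Raised when an action is not permitted in the current state."""


def check_action(state, action):
    """Return True if `action` is allowed from `state`, else raise."""
    if action not in ALLOWED_ACTIONS.get(state, set()):
        raise InvalidAction("%s not allowed in state %s" % (action, state))
    return True
```

The point of the extra state is that a failed resize becomes visible to the user instead of looking like a request that never happened.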

From: Aryeh Friedman [mailto:aryeh.fried...@gmail.com]
Sent: 13 June 2014 18:15
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as 
part of resize ?

Also, ZFS needs to know what is on the guest. For example, bhyve (the only working 
hv for BSD currently [vbox kind of also works]) stores the backing store (unless 
bare metal) as a single block file. It is impossible to make that non-opaque to 
the outside world unless you can run commands on the instance.

On Fri, Jun 13, 2014 at 11:53 AM, Darren J Moffat 
darren.mof...@oracle.com wrote:


On 06/13/14 16:37, Daniel P. Berrange wrote:
The xenapi implementation only works on ext[234] filesystems. That rules
out *BSD, Windows and Linux distributions that don't use ext[234]. RHEL7
defaults to XFS for instance.
Presumably it'll have a hard time if the guest uses LVM for its image
or does luks encryption, or anything else that's more complex than just
a plain FS in a partition.

For example ZFS, which doesn't currently support device removal (except for 
mirror detach) or device size shrink (but does support device grow).  ZFS does 
support file system resize but file systems are just logical things within a 
storage pool (made up of 1 or more devices) so that has nothing to do with the 
block device size.

--
Darren J Moffat





--
Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org


Re: [openstack-dev] [nova][ironic] what to do with unit test failures from ironic api contract

2014-06-16 Thread Day, Phil

From: David Shrewsbury [mailto:shrewsbury.d...@gmail.com]
Sent: 14 June 2014 02:10
To: OpenStack Development Mailing List (not for usage questions)
Cc: Shrewsbury, David; Van Der Veen, Devananda
Subject: Re: [openstack-dev] [nova][ironic] what to do with unit test failures 
from ironic api contract

Hi!

On Fri, Jun 13, 2014 at 9:30 AM, Day, Phil 
philip@hp.com wrote:
Hi Folks,

A recent change introduced a unit test to “warn/notify developers” when they 
make a change which will break the out of tree Ironic virt driver:   
https://review.openstack.org/#/c/98201

Ok – so my change (https://review.openstack.org/#/c/68942) broke it as it adds 
some extra parameters to the virt driver power_off() method – and so I now feel 
suitably warned and notified – but am not really clear what I’m meant to do 
next.

So far I’ve:

-  Modified the unit test in my Nova patch so it now works

-  Submitted an Ironic patch to add the extra parameters 
(https://review.openstack.org/#/c/99932/)

As far as I can see there’s no way to create a direct dependency from the 
Ironic change to my patch – so I guess it’s down to the Ironic folks to wait and 
accept it in the correct sequence ?

Thanks for bringing up this question.

98201 was added at the suggestion of Sean Dague during a conversation
in #openstack-infra to help prevent terrible breakages that affect the gate.
What wasn't discussed, however, is how we should coordinate these changes
going forward.

As for your change, I think what you've done is exactly what we had hoped would
be done. In your particular case, I don't see any need for Nova devs to not go
ahead and approve 68942 *before* 99932 since defaults are added to the arguments.
The question is, how do we coordinate such changes if a change DOES actually
break ironic?
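To illustrate why adding defaulted arguments is safe for an out-of-tree caller, here is a minimal sketch of a signature compatibility check using the standard `inspect` module. The parameter names `timeout` and `retry_interval` are assumptions made for the example, and this is not the actual contract test from change 98201. It ignores `*args`, `**kwargs` and keyword-only parameters.

```python
import inspect


def old_power_off(self, instance):
    """Signature an out-of-tree driver might have been written against."""


def new_power_off(self, instance, timeout=0, retry_interval=0):
    """Signature after the change; the two new names are assumptions."""


def is_backward_compatible(old, new):
    """True if every positional call valid for `old` is still valid for `new`."""
    old_params = list(inspect.signature(old).parameters.values())
    new_params = list(inspect.signature(new).parameters.values())
    # Existing parameters must keep their names and positions.
    kept = [p.name for p in new_params[:len(old_params)]]
    if kept != [p.name for p in old_params]:
        return False
    # Any parameter added at the end must carry a default value.
    return all(p.default is not inspect.Parameter.empty
               for p in new_params[len(old_params):])
```

A change that adds a required argument would fail this check, which is the case where the ordering of the two commits really matters.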

One suggestion is that if test_ironic_api_contracts.py
(https://review.openstack.org/#/c/68942/15/nova/tests/virt/test_ironic_api_contracts.py)
is ever changed, Nova require the Ironic PTL (or a core dev) to vote before
approving. That seems sensible to me.
There might be an easier way of coordinating that I'm overlooking, though.

-Dave
--
David Shrewsbury (Shrews)



Hi Dave,

I agree that co-ordination is the key here – if the Ironic change is approved 
first then Nova and Ironic will continue to work, but there is a risk that the 
Nova change gets blocked / modified after the Ironic commit which would be 
painful.

If the Nova change is committed first then Ironic will of course be broken 
until its change is committed.

I’ll add a pointer and a note to the corresponding change in each of the 
patches.

Phil


Re: [openstack-dev] [Nova] Review guidelines for API patches

2014-06-13 Thread Day, Phil
Hi Chris,

The documentation is NOT the canonical source for the behaviour of the API, 
currently the code should be seen as the reference. We've run into issues 
before where people have tried to align code to fit the documentation and 
made backwards incompatible changes (although this is not one).

I’ve never seen this defined before – is this published as official OpenStack 
or Nova policy ?

Personally I think we should be putting as much effort into reviewing the API 
docs as we do API code so that we can say that the API docs are the canonical 
source for behavior. Not being able to fix bugs in, say, input validation that 
escape code reviews because they break backwards compatibility seems to be a 
weakness to me.


Phil



From: Christopher Yeoh [mailto:cbky...@gmail.com]
Sent: 13 June 2014 04:00
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova] Review guidelines for API patches

On Fri, Jun 13, 2014 at 11:28 AM, Matt Riedemann 
mrie...@linux.vnet.ibm.com wrote:


On 6/12/2014 5:58 PM, Christopher Yeoh wrote:
On Fri, Jun 13, 2014 at 8:06 AM, Michael Still 
mi...@stillhq.com wrote:

In light of the recent excitement around quota classes and the
floating ip pollster, I think we should have a conversation about the
review guidelines we'd like to see for API changes proposed against
nova. My initial proposal is:

  - API changes should have an associated spec


+1

  - API changes should not be merged until there is a tempest change to
test them queued for review in the tempest repo


+1

Chris


We do have some API change guidelines here [1].  I don't want to go overboard 
on every change and require a spec if it's not necessary, i.e. if it falls into 
the 'generally ok' list in that wiki.  But if it's something that's not 
documented as a supported API (so it's completely new) and is pervasive (going 
into novaclient so it can be used in some other service), then I think that 
warrants some spec consideration so we don't miss something.

To compare, this [2] is an example of something that is updating an existing 
API but I don't think warrants a blueprint since I think it falls into the 
'generally ok' section of the API change guidelines.

So really I see this as a new feature, not a bug fix. Someone thought that detail 
was supported when writing the documentation but it never was. The 
documentation is NOT the canonical source for the behaviour of the API, 
currently the code should be seen as the reference. We've run into issues 
before where people have tried to align code to fit the documentation and 
made backwards incompatible changes (although this is not one).

Perhaps we need a streamlined queue for very simple API changes, but I do think 
API changes should get more than the usual review because we have to live with 
them for so long (short of an emergency revert if we catch it in time).

[1] https://wiki.openstack.org/wiki/APIChangeGuidelines
[2] https://review.openstack.org/#/c/99443/

--

Thanks,

Matt Riedemann





Re: [openstack-dev] [Nova] Review guidelines for API patches

2014-06-13 Thread Day, Phil
I agree that we need to keep a tight focus on all API changes.

However was the problem with the floating IP change just to do with the 
implementation in Nova or the frequency with which Ceilometer was calling it ? 
Whatever guidelines we follow on API changes themselves it's pretty hard to 
protect against the impact of a system with admin creds putting a large load 
onto the system.

 -Original Message-
 From: Michael Still [mailto:mi...@stillhq.com]
 Sent: 12 June 2014 23:36
 To: OpenStack Development Mailing List
 Subject: [openstack-dev] [Nova] Review guidelines for API patches
 
 In light of the recent excitement around quota classes and the floating ip
 pollster, I think we should have a conversation about the review guidelines
 we'd like to see for API changes proposed against nova. My initial proposal 
 is:
 
  - API changes should have an associated spec
 
  - API changes should not be merged until there is a tempest change to test
 them queued for review in the tempest repo
 
 Thoughts?
 
 Michael
 
 --
 Rackspace Australia
 


[openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?

2014-06-13 Thread Day, Phil
Hi Folks,

I was looking at the resize code in libvirt, and it has checks which raise an 
exception if the target root or ephemeral disks are smaller than the current 
ones - which seems fair enough I guess (you can't drop arbitrary disk content on 
resize), except that because the check is in the virt driver the effect is 
to just ignore the request (the instance remains active rather than going to 
resize-verify).

It made me wonder if there were any hypervisors that actually allow this, and 
if not wouldn't it be better to move the check to the API layer so that the 
request can be failed rather than silently ignored ?
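As a rough illustration of what a check at the API layer could look like, here is a minimal sketch. It assumes flavors are plain dicts with `root_gb` and `ephemeral_gb` keys, and the exception name is hypothetical; it is not taken from the Nova code base.

```python
class ResizeDownNotSupported(Exception):
    """Hypothetical error the API layer would return to the caller."""


def validate_resize(current_flavor, new_flavor):
    """Reject a resize whose target disks are smaller than the current ones.

    Flavors are plain dicts here; the key names are illustrative.
    """
    for key in ("root_gb", "ephemeral_gb"):
        if new_flavor[key] < current_flavor[key]:
            raise ResizeDownNotSupported(
                "%s would shrink from %d to %d GB"
                % (key, current_flavor[key], new_flavor[key]))
```

Failing at this point means the caller gets an immediate error, rather than an instance that silently stays active.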

As far as I can see:

baremetal: Doesn't support resize

hyperv: Checks only for root disk 
(https://github.com/openstack/nova/blob/master/nova/virt/hyperv/migrationops.py#L99-L108)

libvirt: fails for a reduction of either root or ephemeral 
(https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4918-L4923)

vmware: doesn't seem to check at all ?

xen: Allows resize down for root but not for ephemeral 
(https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L1015-L1032)


It feels kind of clumsy to have such a wide variation of behavior across the 
drivers, and to have the check performed only in the driver ?

Phil



Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?

2014-06-13 Thread Day, Phil
Theoretically impossible to reduce disk unless you have some really nasty 
guest additions.

That’s what I thought – but many of the drivers seem to at least partially 
support it based on the code, hence the question on here to find out if that is 
really supported and works – or is just inconsistent error checking across 
drivers.

From: Aryeh Friedman [mailto:aryeh.fried...@gmail.com]
Sent: 13 June 2014 11:12
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as 
part of resize ?

Theoretically impossible to reduce disk unless you have some really nasty guest 
additions.

On Fri, Jun 13, 2014 at 6:02 AM, Day, Phil 
philip@hp.com wrote:
Hi Folks,

I was looking at the resize code in libvirt, and it has checks which raise an 
exception if the target root or ephemeral disks are smaller than the current 
ones – which seems fair enough I guess (you can’t drop arbitrary disk content on 
resize), except that because the check is in the virt driver the effect is 
to just ignore the request (the instance remains active rather than going to 
resize-verify).

It made me wonder if there were any hypervisors that actually allow this, and 
if not wouldn’t it be better to move the check to the API layer so that the 
request can be failed rather than silently ignored ?

As far as I can see:

baremetal: Doesn’t support resize

hyperv: Checks only for root disk 
(https://github.com/openstack/nova/blob/master/nova/virt/hyperv/migrationops.py#L99-L108
  )

libvirt: fails for a reduction of either root or ephemeral  
(https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4918-L4923
 )

vmware:   doesn’t seem to check at all ?

xen: Allows resize down for root but not for ephemeral 
(https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L1015-L1032
 )


It feels kind of clumsy to have such a wide variation of behavior across the 
drivers, and to have the check performed only in the driver ?

Phil





--
Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org


Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?

2014-06-13 Thread Day, Phil
I guess the question I'm really asking here is:  Since we know resize down 
won't work in all cases, and the failure if it does occur will be hard for the 
user to detect, should we just block it at the API layer and be consistent 
across all Hypervisors ?

From: Andrew Laski [mailto:andrew.la...@rackspace.com]
Sent: 13 June 2014 13:57
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as 
part of resize ?


On 06/13/2014 08:03 AM, Day, Phil wrote:
Theoretically impossible to reduce disk unless you have some really nasty 
guest additions.

That's what I thought - but many of the drivers seem to at least partially 
support it based on the code, hence the question on here to find out if that is 
really supported and works - or is just inconsistent error checking across 
drivers.

My grumpy dev answer is that what works is not resizing down.  I'm familiar 
with the xen driver resize operation and will say that it does work when the 
guest filesystem and partition sizes are accommodating, but there's no good way 
to know whether or not it will succeed without actually trying it.  So when it 
fails it's after someone was waiting on a resize that seemed like it was 
working and then suddenly didn't.

If we want to aim for what's going to work consistently across drivers, it's 
probably going to end up being not resizing disks down.



From: Aryeh Friedman [mailto:aryeh.fried...@gmail.com]
Sent: 13 June 2014 11:12
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as 
part of resize ?

Theoretically impossible to reduce disk unless you have some really nasty guest 
additions.

On Fri, Jun 13, 2014 at 6:02 AM, Day, Phil 
philip@hp.com wrote:
Hi Folks,

I was looking at the resize code in libvirt, and it has checks which raise an 
exception if the target root or ephemeral disks are smaller than the current 
ones - which seems fair enough I guess (you can't drop arbitrary disk content on 
resize), except that because the check is in the virt driver the effect is 
to just ignore the request (the instance remains active rather than going to 
resize-verify).

It made me wonder if there were any hypervisors that actually allow this, and 
if not wouldn't it be better to move the check to the API layer so that the 
request can be failed rather than silently ignored ?

As far as I can see:

baremetal: Doesn't support resize

hyperv: Checks only for root disk 
(https://github.com/openstack/nova/blob/master/nova/virt/hyperv/migrationops.py#L99-L108
  )

libvirt: fails for a reduction of either root or ephemeral  
(https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4918-L4923
 )

vmware:   doesn't seem to check at all ?

xen: Allows resize down for root but not for ephemeral 
(https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L1015-L1032
 )


It feels kind of clumsy to have such a wide variation of behavior across the 
drivers, and to have the check performed only in the driver ?

Phil





--
Aryeh M. Friedman, Lead Developer, http://www.PetiteCloud.org






[openstack-dev] [nova][ironic] what to do with unit test failures from ironic api contract

2014-06-13 Thread Day, Phil
Hi Folks,

A recent change introduced a unit test to warn/notify developers when they 
make a change which will break the out of tree Ironic virt driver:   
https://review.openstack.org/#/c/98201

Ok - so my change (https://review.openstack.org/#/c/68942) broke it as it adds 
some extra parameters to the virt driver power_off() method - and so I now feel 
suitably warned and notified - but am not really clear what I'm meant to do 
next.

So far I've:

-  Modified the unit test in my Nova patch so it now works

-  Submitted an Ironic patch to add the extra parameters 
(https://review.openstack.org/#/c/99932/)

As far as I can see there's no way to create a direct dependency from the 
Ironic change to my patch - so I guess it's down to the Ironic folks to wait and 
accept it in the correct sequence ?

Phil


Re: [openstack-dev] [nova] Proposal: Move CPU and memory allocation ratio out of scheduler

2014-06-12 Thread Day, Phil


 -Original Message-
 From: Jay Pipes [mailto:jaypi...@gmail.com]
 Sent: 09 June 2014 19:03
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Proposal: Move CPU and memory
 allocation ratio out of scheduler
 
 On 06/09/2014 12:32 PM, Chris Friesen wrote:
  On 06/09/2014 07:59 AM, Jay Pipes wrote:
  On 06/06/2014 08:07 AM, Murray, Paul (HP Cloud) wrote:
  Forcing an instance to a specific host is very useful for the
  operator - it fulfills a valid use case for monitoring and testing
  purposes.
 
  Pray tell, what is that valid use case?
 
  I find it useful for setting up specific testcases when trying to
  validate things...put *this* instance on *this* host, put *those*
  instances on *those* hosts, now pull the power plug on *this* host...etc.
 
 So, violating the main design tenet of cloud computing: thou shalt not care
 what physical machine your virtual machine lives on. :)
 
  I wouldn't expect the typical openstack end-user to need it though.
 
 Me either :)

But the full set of system capabilities isn't only about things that an 
end-user needs - there are also admin features we need to include.

Another use case for this is to place a probe instance on specific hosts to 
help monitor specific aspects of the system performance from a VM perspective.


 
 I will point out, though, that it is indeed possible to achieve the same use
 case using host aggregates that would not break the main design tenet of
 cloud computing... just make two host aggregates, one for each compute
 node involved in your testing, and then simply supply scheduler hints that
 would only match one aggregate or the other.
 

Even I wouldn't argue that aggregates are a great solution here ;-)   Not only 
does having single node aggregates for every host you want to force to seem a 
tad overkill, the logic for this admin feature includes by-passing the normal 
scheduler filters, 


 Best,
 -jay
 
 


Re: [openstack-dev] [nova] Arbitrary extra specs for compute nodes?

2014-06-09 Thread Day, Phil
Hi Joe,

Can you give some examples of what that data would be used for ?

It sounds on the face of it that what you’re looking for is pretty similar to 
what Extensible Resource Tracker sets out to do 
(https://review.openstack.org/#/c/86050   
https://review.openstack.org/#/c/71557)

Phil

From: Joe Cropper [mailto:cropper@gmail.com]
Sent: 07 June 2014 07:30
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] Arbitrary extra specs for compute nodes?

Hi Folks,
I was wondering if there was any such mechanism in the compute node structure 
to hold arbitrary key-value pairs, similar to flavors' extra_specs concept?
It appears there are entries for things like pci_stats, stats and recently 
added extra_resources -- but these all tend to have more specific usages vs. 
just arbitrary data that may want to be maintained about the compute node over 
the course of its lifetime.
Unless I'm overlooking an existing construct for this, would this be something 
that folks would welcome a Juno blueprint for--i.e., adding extra_specs style 
column with a JSON-formatted string that could be loaded as a dict of key-value 
pairs?
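For illustration, loading such a column could be as simple as the sketch below. The function names are hypothetical, and nothing here reflects an existing compute node schema; it only shows the JSON-string-to-dict round trip described above.

```python
import json


def load_extra_specs(raw):
    """Parse a JSON-formatted column value into a dict of key-value pairs."""
    if not raw:
        # Treat NULL or empty string as "no extra specs".
        return {}
    specs = json.loads(raw)
    if not isinstance(specs, dict):
        raise ValueError("extra_specs must be a JSON object")
    return specs


def dump_extra_specs(specs):
    """Serialise the dict back into the string stored in the column."""
    return json.dumps(specs, sort_keys=True)
```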

Thoughts?
Thanks,
Joe


Re: [openstack-dev] [nova] Proposal: Move CPU and memory allocation ratio out of scheduler

2014-06-06 Thread Day, Phil

From: Scott Devoid [mailto:dev...@anl.gov]
Sent: 04 June 2014 17:36
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Proposal: Move CPU and memory allocation 
ratio out of scheduler

Not only live upgrades but also dynamic reconfiguration.

Overcommitting affects the quality of service delivered to the cloud user.  In 
this situation in particular, as in many situations in general, I think we want 
to enable the service provider to offer multiple qualities of service.  That 
is, enable the cloud provider to offer a selectable level of overcommit.  A 
given instance would be placed in a pool that is dedicated to the relevant 
level of overcommit (or, possibly, a better pool if the selected one is 
currently full).  Ideally the pool sizes would be dynamic.  That's the dynamic 
reconfiguration I mentioned preparing for.

+1 This is exactly the situation I'm in as an operator. You can do different 
levels of overcommit with host-aggregates and different flavors, but this has 
several drawbacks:

  1.  The nature of this is slightly exposed to the end-user, through 
extra-specs and the fact that two flavors cannot have the same name. One 
scenario we have is that we want to be able to document our flavor names--what 
each name means, but we want to provide different QoS standards for different 
projects. Since flavor names must be unique, we have to create different 
flavors for different levels of service. Sometimes you do want to lie to your 
users!
[Day, Phil] I agree that there is a problem with having every new option we add 
in extra_specs leading to a new set of flavors. There are a number of 
changes up for review to expose more hypervisor capabilities via extra_specs 
that also have this potential problem. What I’d really like to be able to 
ask for as a user is something like “a medium instance with a side order of 
overcommit”, rather than have to choose from a long list of variations. I 
did spend some time trying to think of a more elegant solution – but as the 
user wants to know what combinations are available it pretty much comes down to 
needing that full list of combinations somewhere. So maybe the problem isn’t 
having the flavors so much, but in how the user currently has to specify an 
exact match from that list.
If the user could say “I want a flavor with these attributes” and then the 
system would find a “best match” based on criteria set by the cloud admin (for 
example I might or might not want to allow a request for an overcommitted 
instance to use my not-overcommitted flavor depending on the roles of the 
tenant) then would that be a more user friendly solution ?
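A very rough sketch of what such a "best match" lookup might look like. Everything here is an assumption made for illustration: the flavor attributes, the `allow_upgrade` policy flag and the ranking rule are invented, and none of it is an existing Nova API.

```python
def best_match(flavors, wanted, allow_upgrade=True):
    """Return the name of the flavor that best fits `wanted`, or None.

    flavors: list of dicts with "name", "vcpus", "ram_mb", "overcommit"
    wanted:  dict with "vcpus", "ram_mb", "overcommit"
    allow_upgrade: admin-set policy; whether a request for an overcommitted
        instance may be satisfied by a non-overcommitted flavor
    """
    candidates = []
    for f in flavors:
        if f["vcpus"] < wanted["vcpus"] or f["ram_mb"] < wanted["ram_mb"]:
            continue
        if f["overcommit"] != wanted["overcommit"]:
            # Only overcommitted -> non-overcommitted counts as an upgrade.
            upgrade = wanted["overcommit"] and not f["overcommit"]
            if not (allow_upgrade and upgrade):
                continue
        # Prefer an exact overcommit match, then the smallest flavor that fits.
        rank = (f["overcommit"] != wanted["overcommit"], f["ram_mb"], f["vcpus"])
        candidates.append((rank, f["name"]))
    return min(candidates)[1] if candidates else None
```

In a real system the `allow_upgrade` decision would presumably depend on the roles of the tenant, as described above, rather than being a single flag.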


  2.  If I have two pools of nova-compute HVs with different overcommit 
settings, I have to manage the pool sizes manually. Even if I use puppet to 
change the config and flip an instance into a different pool, that requires me 
to restart nova-compute. Not an ideal situation.
[Day, Phil] If the pools are aggregates, and the overcommit is defined by 
aggregate meta-data then I don’t see why you need to restart nova-compute.
3.  If I want to do anything complicated, like 3 overcommit tiers with 
good, better, best performance and allow the scheduler to pick better 
for a good instance if the good pool is full, this is very hard and 
complicated to do with the current system.
[Day, Phil]  Yep, a combination of filters and weighting functions would allow 
you to do this – it’s not really tied to whether the overcommit is defined in 
the scheduler or the host though as far as I can see.

I'm looking forward to seeing this in nova-specs!
~ Scott


Re: [openstack-dev] [nova] Proposal: Move CPU and memory allocation ratio out of scheduler

2014-06-06 Thread Day, Phil


From: Scott Devoid [mailto:dev...@anl.gov]
Sent: 04 June 2014 17:36
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Proposal: Move CPU and memory allocation 
ratio out of scheduler

Not only live upgrades but also dynamic reconfiguration.

Overcommitting affects the quality of service delivered to the cloud user.  In 
this situation in particular, as in many situations in general, I think we want 
to enable the service provider to offer multiple qualities of service.  That 
is, enable the cloud provider to offer a selectable level of overcommit.  A 
given instance would be placed in a pool that is dedicated to the relevant 
level of overcommit (or, possibly, a better pool if the selected one is 
currently full).  Ideally the pool sizes would be dynamic.  That's the dynamic 
reconfiguration I mentioned preparing for.

+1 This is exactly the situation I'm in as an operator. You can do different 
levels of overcommit with host-aggregates and different flavors, but this has 
several drawbacks:

  1.  The nature of this is slightly exposed to the end-user, through 
extra-specs and the fact that two flavors cannot have the same name. One 
scenario we have is that we want to be able to document our flavor names--what 
each name means, but we want to provide different QoS standards for different 
projects. Since flavor names must be unique, we have to create different 
flavors for different levels of service. Sometimes you do want to lie to your 
users!
[Day, Phil] BTW you might be able to (nearly) do this already if you define 
aggregates for the two QoS pools, and limit which projects can be scheduled 
into those pools using the AggregateMultiTenancyIsolation filter. I say 
nearly because as pointed out by this spec that filter currently only excludes 
tenants from each aggregate – it doesn’t actually constrain them to only be in 
a specific aggregate.
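The "nearly" can be seen in a simplified model of that filter's behaviour. This is a sketch of the logic as described above, not the filter's actual source; the `filter_tenant_id` metadata key is what I believe the filter reads, but treat it as an assumption here.

```python
def host_passes(aggregate_metadata_list, tenant_id):
    """Simplified model of a tenant isolation check for one host.

    aggregate_metadata_list: one metadata dict per aggregate the host is in.
    A host with no tenant restriction accepts every tenant.
    """
    allowed = set()
    for metadata in aggregate_metadata_list:
        if "filter_tenant_id" in metadata:
            allowed.add(metadata["filter_tenant_id"])
    # Restricted hosts exclude other tenants; unrestricted hosts exclude nobody.
    return not allowed or tenant_id in allowed
```

So a tenant named on a "gold" aggregate is kept away from nobody else's hosts: it can still land on any unrestricted host, which is the gap the spec points out.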


  2.  If I have two pools of nova-compute HVs with different overcommit 
settings, I have to manage the pool sizes manually. Even if I use puppet to 
change the config and flip an instance into a different pool, that requires me 
to restart nova-compute. Not an ideal situation.
  3.  If I want to do anything complicated, like 3 overcommit tiers with 
good, better, best performance and allow the scheduler to pick better 
for a good instance if the good pool is full, this is very hard and 
complicated to do with the current system.

I'm looking forward to seeing this in nova-specs!
~ Scott


Re: [openstack-dev] [nova] Proposal: Move CPU and memory allocation ratio out of scheduler

2014-06-05 Thread Day, Phil


 -Original Message-
 From: Jay Pipes [mailto:jaypi...@gmail.com]
 Sent: 04 June 2014 19:23
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Proposal: Move CPU and memory
 allocation ratio out of scheduler
 
 On 06/04/2014 11:56 AM, Day, Phil wrote:
  Hi Jay,
 
  * Host aggregates may also have a separate allocation ratio that
  overrides any configuration setting that a particular host may have
 
  So with your proposal would the resource tracker be responsible for
  picking and using override values defined as part of an aggregate that
  includes the host ?
 
 Not quite sure what you're asking, but I *think* you are asking whether I am
 proposing that the host aggregate's allocation ratio that a compute node
 might be in would override any allocation ratio that might be set on the
 compute node? I would say that no, the idea would be that the compute
 node's allocation ratio would override any host aggregate it might belong to.
 

I'm not sure why you would want it that way round - aggregates let me 
set/change the value of a number of hosts, and change the set of hosts that the 
values apply to. That in general seems a much better model for operators 
than having to manage things on a per host basis.

Why not keep the current model where an aggregate setting overrides the 
default - that will now come from the host config rather than scheduler 
config ?
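The precedence being argued for can be sketched as below. The function name is hypothetical, and taking the minimum when a host is in several aggregates is an assumption chosen as the conservative option, not something stated in this thread.

```python
def effective_allocation_ratio(host_default, aggregate_ratios):
    """Pick the allocation ratio to apply to a host.

    host_default:     ratio from the host's own config
    aggregate_ratios: ratios set on the aggregates the host belongs to
                      (empty if none of them set one)
    """
    if aggregate_ratios:
        # Aggregate setting overrides the host default; with several
        # aggregates, use the most conservative value (an assumption).
        return min(aggregate_ratios)
    return host_default
```

With this ordering an operator changes one aggregate value to affect a whole set of hosts, and the per-host config only matters for hosts outside any such aggregate.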

Cheers,
Phil



Re: [openstack-dev] [Nova] Need help with a gnarly Object Version issue

2014-06-05 Thread Day, Phil
Hi Dan,

 
  On a compute manager that is still running the old version of the code
  (i.e using the previous object version), if a method that hasn't yet
  been converted to objects gets a dict created from the new version of
  the object (e.g. rescue, get_console_output), then object_compat()
  decorator will call the _from_db_object() method in
  objects.Instance. Because this is the old version of the object
  code, it expects user_data to be a field in dict, and throws a key error.
 
 Yeah, so the versioning rules are that for a minor version, you can only add
 things to the object, not remove them.

Ah - Ok.  That probably explains why it doesn't work then ;-(

 
  1)  Rather than removing the user_data field from the object just
  set it to a null value if it's not requested.
 
 Objects have a notion of unset which is what you'd want here. Fields that
 are not set can be lazy-loaded when touched, which might be a reasonable
 way out of the box here if user_data is really only used in one place. It
 would mean that older clients would lazy-load it when needed, and going forward
 we'd be specific about asking for it when we want.
 
 However, the problem is that instance defines the fields it's willing to lazy-
 load, and user_data isn't one of them. That'd mean that we need to backport
 a change to allow it to be lazy-loaded, which means we should probably just
 backport the thing that requests user_data when needed instead.
 
Not quite sure I follow.  The list of fields that can be lazy-loaded is defined 
by INSTANCE_OPTIONAL_ATTRS, right ?   I moved user_data into that set of fields as 
part of my patch, but the problem I have is with a mix of objects and 
non-objects, such as the sequence where:

Client:   Gets an Object (of the new version)
RPCAPI:   Converts Object to a Dict (because the specific RPC method hasn't been 
converted to take an Object yet)
Manager:  Converts dict to an Object (of the old version) via the 
@object_compat decorator

The last step fails because _from_db_object() runs in the 
not-yet-updated manager, and hence hits a key error.

I don't think lazy loading helps here, because the code that fails is trying to 
create the object from a dict, not trying to access into an Object - or am I 
missing something ? 
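
The failing sequence can be reproduced in miniature. This is only a sketch under stated assumptions: `OldInstance`, `from_primitive_dict` and the field list are hypothetical stand-ins for the old object code, not the real objects.Instance or its _from_db_object() method.

```python
OLD_FIELDS = ("uuid", "host", "user_data")  # fields the old object code expects


class OldInstance:
    """Stand-in for the not-yet-upgraded Instance object (hypothetical)."""

    @classmethod
    def from_primitive_dict(cls, data):
        inst = cls()
        for name in OLD_FIELDS:
            # The old code indexes every field unconditionally.
            setattr(inst, name, data[name])
        return inst


def new_object_to_dict(uuid, host):
    """What the RPC API layer sends once the new object omits user_data."""
    return {"uuid": uuid, "host": host}


def manager_receives(data):
    """Mimic the manager-side conversion; report the failure, don't raise."""
    try:
        return OldInstance.from_primitive_dict(data)
    except KeyError as exc:
        return "KeyError: %s" % exc.args[0]
```

The failure happens while building the object from the dict, before any attribute access that lazy loading could intercept.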


  2)  Add object versioning in the client side of the RPC layer for
  those methods that don't take objects.
 
 I'm not sure what you mean here.
 
In terms of the above scenario I was thinking that the RPCAPI layer could make 
sure the object was the right version before it converts it to a dict.



  I'm open to other ideas, and general guidance around how deletion of
  fields from Objects is meant to be handled ?
 
 It's meant to be handled by rev-ing the major version, since removing
 something isn't a compatible operation.
 
 Note that *conductor* has knowledge of the client-side version of an object
 on which the remotable_classmethod is being called, but that is not exposed
 to the actual object implementation in any way. We could, perhaps, figure
 out a sneaky way to expose that, which would let you honor the old behavior
 if we know the object is old, or the new behavior otherwise.
 
I think the problem is that I don't have an object at the point where I get the 
failure, I have a dict that is trying to be mapped into an object, so it 
doesn't call back into conductor.

I'm thinking now that as the problem is the size and/or data in user_data - and 
that is only needed in a very few specific places - I could just set the 
user_data content in the instance Object to None or X if it's not requested when 
the object is created.  (Setting it to X would probably make it easier to debug 
if something that does want it gets missed)
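
A minimal sketch of that placeholder idea, assuming a dict-shaped instance purely for illustration. The helper name and the constant are hypothetical; the point is that the key stays present so older code that indexes it does not hit a key error.

```python
USER_DATA_NOT_LOADED = "X"  # placeholder value; the constant name is hypothetical


def instance_to_dict(instance, include_user_data=False):
    """Copy the instance fields, masking user_data unless it was requested."""
    data = dict(instance)
    if not include_user_data:
        # Keep the key so old consumers still find it, but drop the payload.
        data["user_data"] = USER_DATA_NOT_LOADED
    return data
```

A recognisable marker rather than None makes it easier to spot a caller that needed the real value but did not ask for it.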

Phil



Re: [openstack-dev] [Nova] Different tenants can assign the same hostname to different machines without an error

2014-06-04 Thread Day, Phil

 The patch [2] proposes changing the default DNS driver from
 'nova.network.noop_dns_driver.NoopDNSDriver' to another that verifies whether
 DNS entries already exist before adding them, such as
 'nova.network.minidns.MiniDNS'.

Changing a default setting in a way that isn't backwards compatible when the 
cloud admin can already make that config change if they want it doesn't seem 
like the right thing to do. I think you need to do this in two stages:
1) In Juno deprecate the existing default value first (i.e add a warning for one 
cycle that says that the default will change) 
2) In K change the default
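
Stage 1 could look roughly like the sketch below, which warns only when the operator is relying on the default. The function name and message text are hypothetical, and this uses Python's standard warnings module rather than whatever deprecation mechanism Nova's config code actually provides.

```python
import warnings

OLD_DEFAULT = "nova.network.noop_dns_driver.NoopDNSDriver"


def resolve_dns_driver(configured=None):
    """Return the driver path to load; warn if relying on the old default."""
    if configured is None:
        warnings.warn(
            "The default DNS driver will change in a future release; "
            "set it explicitly to keep the current behaviour.",
            DeprecationWarning,
            stacklevel=2,
        )
        return OLD_DEFAULT
    # An explicit setting is honoured silently.
    return configured
```

Anyone who has set the option explicitly sees no change in either stage.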

 

 -Original Message-
 From: samuel [mailto:sam...@lsd.ufcg.edu.br]
 Sent: 04 June 2014 15:01
 To: openstack-dev@lists.openstack.org
 Subject: [openstack-dev] [Nova] Different tenants can assign the same
 hostname to different machines without an error
 
 Hi everyone,
 
 Concerning the bug described at [1], where n different machines may have
 the same hostname and then n different DNS entries with that hostname are
 written; some points have to be discussed with the nova community.
 
 On the bug review [2], Andrew Laski pointed out that we should discuss
 whether users should get errors because of the display name they choose.
 He thinks that name should be used purely for aesthetic purposes so I think a
 better approach to this problem would be to decouple display name and DNS
 entries. And display name has never needed to be globally unique before.
 
 The patch [2] proposes changing the default DNS driver from
 'nova.network.noop_dns_driver.NoopDNSDriver' to another that verifies whether
 DNS entries already exist before adding them, such as
 'nova.network.minidns.MiniDNS'.
 
 What are your thoughts up there?
 
 Sincerely,
 Samuel Queiroz
 
 [1] https://bugs.launchpad.net/nova/+bug/1283538
 [2] https://review.openstack.org/#/c/94252/
 


[openstack-dev] [Nova] Need help with a gnarly Object Version issue

2014-06-04 Thread Day, Phil
Hi Folks,

I've been working on a change to make the user_data field an optional part of 
the Instance object, because passing it around everywhere seems a bad idea:

-  It can be huge

-  It's only used when getting metadata

-  It can contain user sensitive data

https://review.openstack.org/#/c/92623/9

I've included the object version changes, and that all works fine - but I'm 
left with one issue that I'm not sure how to proceed with:

On a compute manager that is still running the old version of the code (i.e 
using the previous object version), if a method that hasn't yet been converted 
to objects gets a dict created from the new version of the object (e.g. 
rescue, get_console_output), then object_compat() decorator will call the 
_from_db_object() method in objects.Instance. Because this is the old 
version of the object code, it expects user_data to be a field in dict, and 
throws a key error.

I can think of a number of possible fixes - but I'm not sure any of them are 
very elegant (and of course they have to fix the problem before the data is 
sent to the compute manager):


1)  Rather than removing the user_data field from the object just set it to 
a null value if it's not requested.


2)  Add object versioning in the client side of the RPC layer for those 
methods that don't take objects.

I'm open to other ideas, and general guidance around how deletion of fields 
from Objects is meant to be handled ?

Phil




Re: [openstack-dev] [Nova] [Neutron] heal_instance_info_cache_interval - Can we kill it?

2014-05-29 Thread Day, Phil
Could we replace the refresh from the periodic task with a timestamp in the 
network cache of when it was last updated so that we refresh it only when it’s 
accessed if older than X ?
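
A rough sketch of that refresh-on-access idea follows. The class, its `fetch` callable (standing in for the query to Neutron) and the `max_age` parameter (the X above) are all hypothetical names, not existing Nova code.

```python
import time


class TimestampedNetworkCache:
    """Cache that refreshes an entry on access once it is older than max_age."""

    def __init__(self, fetch, max_age, clock=time.monotonic):
        self._fetch = fetch      # callable: instance_id -> network info
        self._max_age = max_age  # seconds; the X in the text
        self._clock = clock      # injectable so the behaviour can be tested
        self._entries = {}       # instance_id -> (info, updated_at)

    def get(self, instance_id):
        now = self._clock()
        entry = self._entries.get(instance_id)
        if entry is None or now - entry[1] > self._max_age:
            # Missing or stale: query the source and record when we did so.
            entry = (self._fetch(instance_id), now)
            self._entries[instance_id] = entry
        return entry[0]
```

Instances whose network info is never read generate no queries at all, which is the main difference from a periodic task.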

From: Aaron Rosen [mailto:aaronoro...@gmail.com]
Sent: 29 May 2014 01:47
To: Assaf Muller
Cc: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova] [Neutron] heal_instance_info_cache_interval 
- Can we kill it?



On Wed, May 28, 2014 at 7:39 AM, Assaf Muller 
amul...@redhat.com wrote:


- Original Message -
 Hi,

 Sorry somehow I missed this email. I don't think you want to disable it,
 though we can definitely have it run less often. The issue with disabling it
 is if one of the notifications from neutron-nova never gets sent
 successfully to nova (neutron-server is restarted before the event is sent
 or some other internal failure). Nova will never update its cache if the
 heal_instance_info_cache_interval is set to 0.
The thing is, this periodic healing doesn't imply correctness either.
In the case where you lose a notification and the compute node hosting
the VM is hosting a non-trivial amount of VMs it can take (With the default
of 60 seconds) dozens of minutes to update the cache, since you only
update a VM a minute. I could understand the use of a sanity check if it
was performed much more often, but as it is now it seems useless to me
since you can't really rely on it.
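
The arithmetic behind that estimate, as a small sketch. It assumes exactly one VM is refreshed per interval, as described in this thread; the function name is hypothetical.

```python
def worst_case_refresh_seconds(vm_count, interval_seconds=60):
    """Time until every VM on a node has been refreshed once."""
    # One VM is healed per interval, so the last one waits vm_count intervals.
    return vm_count * interval_seconds
```

For a node hosting 20 VMs with the 60 second default this gives 1200 seconds, i.e. 20 minutes.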

I agree with you. That's why we implemented the event callback so that the 
cache would be more up to date. In honesty you can probably safely disable the  
heal_instance_info_cache_interval and things will probably be fine as we 
haven't seen many failures where events from neutron fail to send. If we find 
out this is the case we can definitely make the event sending notification 
logic in neutron much more robust by persisting events to the db and 
implementing retry logic on failure there to help ensure nova gets the 
notification.

What I'm trying to say is that with the inefficiency of the implementation,
coupled with Neutron's default plugin inability to cope with a large
amount of API calls, I feel like the disadvantages outweigh the
advantages when it comes to the cache healing.

Right, the current heal_instance implementation has scaling issues as every 
compute node runs this task querying neutron. The more compute nodes you have 
the more querying. Hopefully the nova v3 api should solve this issue though as 
the networking information will no longer have to live in nova as well. So 
someone interested in this network data can query neutron directly and we 
can avoid these types of caching issues altogether :)

How would you feel about disabling it, optimizing the implementation
(For example by introducing a new networking_for_instance API verb to Neutron?)
then enabling it again?

I think this is a good idea; we should definitely implement something like this 
so nova can interface with fewer api calls.

 The neutron-nova events help
 to ensure that the nova info_cache is up to date sooner by having neutron
 inform nova whenever a port's data has changed (@Joe Gordon - this happens
 regardless of virt driver).

 If you're using the libvirt virt driver the neutron-nova events will also be
 used to ensure that the networking is 'ready' before the instance is powered
 on.

 Best,

 Aaron

 P.S: we're working on making the heal_network call to neutron a lot less
 expensive as well in the future.




 On Tue, May 27, 2014 at 7:25 PM, Joe Gordon  
 joe.gord...@gmail.com wrote:






 On Wed, May 21, 2014 at 6:21 AM, Assaf Muller  
 amul...@redhat.com wrote:


 Dear Nova aficionados,

 Please make sure I understand this correctly:
 Each nova compute instance selects a single VM out of all of the VMs
 that it hosts, and every heal_instance_info_cache_interval seconds
 queries Neutron for all of its networking information, then updates
 Nova's DB.

 If the information above is correct, then I fail to see how that
 is in any way useful. For example, for a compute node hosting 20 VMs,
 it would take 20 minutes to update the last one. Seems unacceptable
 to me.

 Considering Icehouse's Neutron to Nova notifications, my question
 is if we can change the default to 0 (Disable the feature), deprecate
 it, then delete it in the K cycle. Is there a good reason not to do this?

 Based on the patch that introduced this function [0] you may be on to
 something, but AFAIK unfortunately the neutron to nova notifications only
 work in libvirt right now [1], so I don't think we can fully deprecate this
 periodic task. That being said turning it off by default may be an option.
 Have you tried disabling this feature and seeing what happens (in the gate
 and/or in production)?

We've disabled it in a scale lab and didn't observe any black holes forming
or other catastrophes.


 [0] https://review.openstack.org/#/c/4269/
 [1] 

Re: [openstack-dev] [nova] nova default quotas

2014-05-29 Thread Day, Phil


From: Kieran Spear [mailto:kisp...@gmail.com]
Sent: 28 May 2014 06:05
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] nova default quotas

Hi Joe,

On 28/05/2014, at 11:21 AM, Joe Gordon 
joe.gord...@gmail.com wrote:




On Tue, May 27, 2014 at 1:30 PM, Kieran Spear 
kisp...@gmail.com wrote:


On 28/05/2014, at 6:11 AM, Vishvananda Ishaya 
vishvana...@gmail.com wrote:

 Phil,

 You are correct and this seems to be an error. I don't think in the earlier 
 ML thread[1] that anyone remembered that the quota classes were being used 
 for default quotas. IMO we need to revert this removal as we (accidentally) 
 removed a Havana feature with no notification to the community. I've 
 reactivated a bug[2] and marked it critical.

+1.

We rely on this to set the default quotas in our cloud.

Hi Kieran,

Can you elaborate on this point? Do you actually use the full quota-class 
functionality that allows for quota classes, if so what provides the quota 
classes? If you only use this for setting the default quotas, why do you prefer 
the API and not setting the config file?

We just need the defaults. My comment was more to indicate that yes, this is 
being used by people. I'm sure we could switch to using the config file, and 
generally I prefer to keep configuration in code, but finding out about this 
half way through a release cycle isn't ideal.

I notice that only the API has been removed in Icehouse, so I'm assuming the 
impact is limited to *changing* the defaults, which we don't do often. I was 
initially worried that after upgrading to Icehouse we'd be left with either no 
quotas or whatever the config file defaults are, but it looks like this isn't 
the case.

Unfortunately the API removal in Nova was followed by similar changes in 
novaclient and Horizon, so fixing Icehouse at this point is probably going to 
be difficult.

[Day, Phil]  I think we should revert the changes in all three systems then.   
We have the rules about not breaking API compatibility in place for a reason; 
if we want to be taken seriously as a stable API then we need to be prepared to 
roll back if we goof up.

Joe - was there a nova-specs BP for the change ?  I'm wondering how this one 
slipped through


Cheers,
Kieran




Kieran


 Vish

 [1] 
 http://lists.openstack.org/pipermail/openstack-dev/2014-February/027574.html
 [2] https://bugs.launchpad.net/nova/+bug/1299517

 On May 27, 2014, at 12:19 PM, Day, Phil 
 philip@hp.com wrote:

 Hi Vish,

 I think quota classes have been removed from Nova now.

 Phil


 Sent from Samsung Mobile


  Original message 
 From: Vishvananda Ishaya
 Date:27/05/2014 19:24 (GMT+00:00)
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] nova default quotas

 Are you aware that there is already a way to do this through the cli using 
 quota-class-update?

 http://docs.openstack.org/user-guide-admin/content/cli_set_quotas.html (near 
 the bottom)

 Are you suggesting that we also add the ability to use just regular 
 quota-update? I'm not sure I see the need for both.

 Vish

 On May 20, 2014, at 9:52 AM, Cazzolato, Sergio J 
 sergio.j.cazzol...@intel.com wrote:

 I would like to hear your thoughts about an idea to add a way to manage the 
 default quota values through the API.

 The idea is to use the current quota api, but sending 'default' instead of 
 the tenant_id. This change would apply to quota-show and quota-update 
 methods.

 This approach will help to simplify the implementation of another blueprint 
 named per-flavor-quotas

 Feedback? Suggestions?


 Sergio Juan Cazzolato
 Intel Software Argentina


Re: [openstack-dev] [nova] nova default quotas

2014-05-27 Thread Day, Phil
Hi Vish,

I think quota classes have been removed from Nova now.

Phil


Sent from Samsung Mobile


 Original message 
From: Vishvananda Ishaya
Date:27/05/2014 19:24 (GMT+00:00)
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] nova default quotas

Are you aware that there is already a way to do this through the cli using 
quota-class-update?

http://docs.openstack.org/user-guide-admin/content/cli_set_quotas.html (near 
the bottom)

Are you suggesting that we also add the ability to use just regular 
quota-update? I’m not sure I see the need for both.

Vish

On May 20, 2014, at 9:52 AM, Cazzolato, Sergio J sergio.j.cazzol...@intel.com 
wrote:

 I would like to hear your thoughts about an idea to add a way to manage the 
 default quota values through the API.

 The idea is to use the current quota api, but sending 'default' instead of 
 the tenant_id. This change would apply to quota-show and quota-update methods.

 This approach will help to simplify the implementation of another blueprint 
 named per-flavor-quotas

 Feedback? Suggestions?


 Sergio Juan Cazzolato
 Intel Software Argentina



Re: [openstack-dev] [Cinder] Confusion about the respective use cases for volume's admin_metadata, metadata and glance_image_metadata

2014-05-21 Thread Day, Phil
 -Original Message-
 From: Tripp, Travis S
 Sent: 07 May 2014 18:06
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Cinder] Confusion about the respective use
 cases for volume's admin_metadata, metadata and glance_image_metadata
 
  We're suffering from a total overload of the term 'metadata' here, and
  there are
  3 totally separate things that are somehow becoming mangled
 
 Thanks for the summary. The term metadata definitely gets overloaded.
 I've been experimenting with the metadata to see what happens with all
 of it.
 
OK I won't even try to bring Nova's three types of metadata into the discussion 
then.

 Glance image properties == ALL properties are copied to 
 volume_image_metadata in Cinder 

Let's just limit this thread to this one, since that's the one that is partly 
mutable in Glance and becomes immutable in Cinder

 
 Regarding the property protections in Glance, it looks to use RBAC.  It seems
 to me that if a volume is being uploaded to glance with protected properties
 and the user doing the copying doesn't have the right roles to create those
 properties that Glance should reject the upload request.
 
 Based on the etherpads, the primary motivation for property protections
 was for an image marketplace, which doesn't seem like there would be the
 same need for volumes. 
No it is still needed.   Consider the case where there is a licensed image in 
Glance.   That license key, which will be passed through to the billing system, 
has to be immutable and has to be available to Nova for any instance that is 
running a copy of that image.  Create a snapshot in Glance, the key needs to be 
there.  Create a bootable volume in Cinder, the key needs to be there, etc, 
etc.  So both Nova and Cinder have to copy the Glance Image properties 
whenever they create a copy of an image.

The full set of paths where the image properties need to be copied is:

- When Cinder creates a bootable volume from an Image on Glance
- When Cinder creates a snapshot or copy of a bootable volume
- When Nova creates a snapshot in Glance from a running instance (So Nova has 
to have a copy of the properties of the image the instance was booted from - 
the image in Glance can be deleted while the instance is running)

The issue is that the set of Glance Image Properties that are copied are a 
combination of mutable and immutable values - but that distinction is lost 
when they are copied into Cinder.  I'm not even sure if you can query Glance to 
find out if a property is mutable or not.

So to make Cinder and Glance consistent I think we would need:

1) A way to find out from Glance if a property is mutable or not
2) A way in Cinder to mark a property as mutable or immutable

I don't think Nova needs to know the difference, since it only ever creates 
snapshots in Glance - and Glance already knows what can and can't be changed.
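
To illustrate the two requirements, here is a sketch in which the mutability of each property is recorded at copy time and then enforced on update. The function names, the dict layout and the `protected_names` input are assumptions for illustration; they are not an existing Glance or Cinder API.

```python
def copy_image_properties(image_properties, protected_names):
    """Copy image properties, recording whether each may later be changed."""
    return {
        name: {"value": value, "mutable": name not in protected_names}
        for name, value in image_properties.items()
    }


def update_volume_property(volume_properties, name, value):
    """Change or add a property on the volume copy, honouring immutability."""
    entry = volume_properties.get(name)
    if entry is not None and not entry["mutable"]:
        raise PermissionError("property %r is immutable" % name)
    # New or mutable properties can be set freely.
    volume_properties[name] = {"value": value, "mutable": True}
```

A license key copied as immutable then stays fixed on the volume, while ordinary descriptive properties remain editable.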

Phil

 
  -Original Message-
  From: Duncan Thomas [mailto:duncan.tho...@gmail.com]
  Sent: Wednesday, May 07, 2014 7:57 AM
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [Cinder] Confusion about the respective
  use cases for volume's admin_metadata, metadata and
  glance_image_metadata
 
  On 7 May 2014 09:36, Trump.Zhang zhangleiqi...@gmail.com wrote:
   @Tripp, Thanks for your reply and info.
  
   I am also thinking if it is proper to add support for updating the
   volume's glance_image_metadata to reflect the newest status of
 volume.
  
   However, there may be alternative ways to achieve it:
   1. Using the volume's metadata
   2. Using the volume's admin_metadata
  
   So I am wondering which is the most proper method.
 
 
  We're suffering from a total overload of the term 'metadata' here, and
  there are
  3 totally separate things that are somehow becoming mangled:
 
  1. Volume metadata - this is for the tenant's own use. Cinder and nova
  don't assign meaning to it, other than treating it as stuff the tenant
  can set. It is entirely unrelated to glance_metadata
  2. admin_metadata - this is an internal implementation detail for cinder
  to avoid every extension having to alter the core volume db model. It is
  not the same thing as glance metadata or volume_metadata.
 
  An interface to modify volume_glance_metadata sounds reasonable,
  however it is *unrelated* to the other two types of metadata. They are
  different things, not replacements or anything like that.
 
  Glance protected properties need to be tied into the modification API
  somehow, or else it becomes a trivial way of bypassing protected
  properties. Hopefully a glance expert can pop up and suggest a way of
 achieving this integration.
 

Re: [openstack-dev] [nova] Proposal: remove the server groups feature

2014-05-01 Thread Day, Phil
 
  In the original API there was a way to remove members from the group.
  This didn't make it into the code that was submitted.
 
 Well, it didn't make it in because it was broken. If you add an instance to a
 group after it's running, a migration may need to take place in order to keep
 the semantics of the group. That means that for a while the policy will be
 being violated, and if we can't migrate the instance somewhere to satisfy the
 policy then we need to either drop it back out, or be in violation. Either
 some additional states (such as being queued for inclusion in a group, etc)
 may be required, or some additional footnotes on what it means to be in a
 group might have to be made.
 
 It was for the above reasons, IIRC, that we decided to leave that bit out
 since the semantics and consequences clearly hadn't been fully thought-out.
 Obviously they can be addressed, but I fear the result will be ... ugly. I
 think there's a definite possibility that leaving out those dynamic functions
 will look more desirable than an actual implementation.
 
If we look at a server group as a general container of servers, that may have 
an attribute that expresses scheduling policy, then it doesn't seem too ugly to 
restrict the conditions on which an add is allowed to only those that don't 
break the (optional) policy.  Wouldn't even have to go to the scheduler to 
work this out.
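
Such a restriction could be checked locally, along the lines of this sketch. The function and its arguments are hypothetical; it only assumes the current host of each member is already known, so no scheduler call is needed.

```python
def can_add_to_group(policy, member_hosts, candidate_host):
    """Return True if adding a server on candidate_host keeps the policy."""
    if policy is None:
        # A group without a scheduling policy accepts any server.
        return True
    if policy == "anti-affinity":
        return candidate_host not in member_hosts
    if policy == "affinity":
        return not member_hosts or set(member_hosts) == {candidate_host}
    raise ValueError("unknown policy: %r" % policy)
```

An add that would violate the policy is simply refused, so the group is never in a violating state and no migration is needed.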


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][service group]improve host state detection

2014-05-01 Thread Day, Phil
Nova can now detect that a host is unreachable, but it cannot distinguish 
between host isolation, a dead host and the nova compute service being down. 
When host unreachable is reported, users have to find out the exact state 
themselves and then take the appropriate measure to recover. Therefore we'd 
like to improve the host detection for nova.

I guess this depends on the service group driver that you use.  For example if 
you use the DB driver, then there is a thread running on the compute manager 
that periodically updates the alive status - which includes both a liveness 
check (to the extent that the thread is still running) of the compute manager 
and that it can contact the DB.  If the compute manager is using conductor 
then it also implicitly includes a check that the compute manager can talk to 
MQ (a nice side effect of conductor - as before a node could be Up because it 
could talk to the DB but not able to process any messages)

So to me the DB driver kind of already covers "send network heartbeat to the 
central agent and writes timestamp in shared storage periodically" - so maybe 
this is more of a specific ServiceGroup Driver issue rather than a generic 
ServiceGroup change ?
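
The DB-driver style of check described above boils down to comparing the last reported timestamp against a threshold. This is a sketch only: the function and parameter names are assumptions, times are plain seconds, and the 60 second threshold is an illustrative value.

```python
def is_up(last_report, now, service_down_time=60):
    """Treat a service as up if its last report is recent enough.

    last_report and now are timestamps in seconds; service_down_time is
    the longest silence tolerated before the service counts as down.
    """
    return (now - last_report) <= service_down_time
```

Because the report is written by a thread inside the compute manager, a stale timestamp covers a dead process, a lost DB connection and (with conductor) a lost MQ connection alike, without telling them apart.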

Phil



From: Jiangying (Jenny) [mailto:jenny.jiangy...@huawei.com]
Sent: 28 April 2014 13:31
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [nova][service group]improve host state detection

Nova can now detect that a host is unreachable, but it cannot distinguish 
between host isolation, a dead host and the nova compute service being down. 
When host unreachable is reported, users have to find out the exact state 
themselves and then take the appropriate measure to recover. Therefore we'd 
like to improve the host detection for nova.

Currently the service group API factors out the host detection and makes it a 
set of abstract internal APIs with a pluggable backend implementation. The 
backend we designed is as follows:

A detection central agent is introduced. When a member joins into the service 
group, the member host starts to send network heartbeat to the central agent 
and writes timestamp in shared storage periodically. When the central agent 
stops receiving the network heartbeats from a member, it pings the member and 
checks the storage heartbeat before declaring the host to have failed.


network heartbeat | network ping | storage heartbeat | state               | reason
------------------|--------------|-------------------|---------------------|-----------------------------------------
OK                | -            | -                 | Running             | -
Not OK            | Not OK       | Not OK            | Dead                | hardware failure/abnormal host shut down
Not OK            | OK           | Not OK            | Service unreachable | service process crashed
Not OK            | Not OK       | OK                | Isolated            | network unreachable

Based on the state recognition table, nova can discern the exact host state and 
assign the reasons.
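
The state recognition table can be written as a small lookup, sketched below. The function name and return shape are hypothetical. The table does not cover the case where the ping and the storage heartbeat are both OK while the network heartbeat is not, so the sketch reports that as "Unknown", which is an assumption rather than part of the proposal.

```python
def classify_host(network_heartbeat, network_ping, storage_heartbeat):
    """Map the three health signals (True means OK) to (state, reason)."""
    if network_heartbeat:
        # Ping and storage are not consulted while heartbeats keep arriving.
        return ("Running", None)
    table = {
        # (network_ping, storage_heartbeat): (state, reason)
        (False, False): ("Dead", "hardware failure/abnormal host shut down"),
        (True, False): ("Service unreachable", "service process crashed"),
        (False, True): ("Isolated", "network unreachable"),
    }
    return table.get(
        (network_ping, storage_heartbeat),
        ("Unknown", "combination not covered by the table"),
    )
```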

Thoughts?

Jenny



Re: [openstack-dev] [nova] Proposal: remove the server groups feature

2014-04-28 Thread Day, Phil
 -Original Message-
 From: Jay Pipes [mailto:jaypi...@gmail.com]
 Sent: 25 April 2014 23:29
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Proposal: remove the server groups
 feature
 
 On Fri, 2014-04-25 at 22:00 +, Day, Phil wrote:
  Hi Jay,
 
  I'm going to disagree with you on this one, because:
 
 No worries, Phil, I expected some dissension and I completely appreciate
 your feedback and perspective :)

Always happy to meet your expectations ;-)

Seems that where we mainly disagree is on the usability of the current Server 
Group API - and maybe we just need a wider range of views to see where the 
majority feeling is on that.

The folks we've got who have been looking for affinity/anti-affinity scheduling 
(we've been holding off from the previous filters because they include DB 
lookups) don't seem to find the Server Groups semantics confusing - you know 
you want to do something with a group of servers so you create a group, define 
the properties for that group, and add servers to it as you create them.

I agree there are a number of things needed to round this out (such as 
add/remove server, and some form of quota on the maximum size of a group), but 
I just don't see the  basic approach as broken in the way that you do - and I 
am worried that we end up spinning on much needed functionality if we start to 
rework it now.

The tagging approach  (if I understand it correctly) seems like it would start 
to introduce system semantics to values that are currently just user defined 
free text - which I think might lead to more confusion  / name-space clashes 
around which tags are now in effect reserved names and which are still user 
defined.   I think I prefer the clearer separation.

Phil
 
  i) This is a feature that was discussed in at least one if not two Design
 Summits and went through a long review period, it wasn't one of those
 changes that merged in 24 hours before people could take a good look at it.
 
 Completely understood. That still doesn't mean we can't propose to get rid
 of it early instead of letting it sit around when an alternate implementation
 would be better for the user of OpenStack.
 
Whatever you feel about the implementation,  it is now in the API and we
 should assume that people have started coding against it.
 
 Sure, maybe. AFAIK, it's only in the v2 API, though, not in the v3 API (sorry,
 I made a mistake about that in my original email). Is there a reason it wasn't
 added to the v3 API?
 
I don't think it gives any credibility to Openstack as a platform if we yank
 features back out just after they've landed.
 
 Perhaps not, though I think we have less credibility if we don't recognize
 when a feature isn't implemented with users in mind and leave it in the code
 base to the detriment and confusion of users. We absolutely must, IMO, as a
 community, be able to say this isn't right
 and have a path for changing or removing something.
 
 If that path is deprecation vs outright removal, so be it, I'd be cool with
 that. I'd just like to nip this anti-feature in the bud early so that it
 doesn't become the next feature like file-injection to persist in Nova well
 after its time has come and passed.
 
  ii) Server Group - It's a way of defining a group of servers, and the initial
 thing (only thing right now) you can define for such a group is the affinity
 or anti-affinity for scheduling.
 
 We already had ways of defining groups of servers. This new feature
 doesn't actually define a group of servers. It defines a policy, which is not
 particularly useful, as it's something that is better specified at the time of
 launching.
 
Maybe in time we'll add other group properties or operations - like
 delete all the servers in a group (I know some QA folks that would love to
 have that feature).
 
 We already have the ability to define a group of servers using key=value tags.
 Deleting all servers in a group is a three-line bash script that loops over
 the results of a nova list command and calls nova delete.
 Trust me, I've done group deletes in this way many times.
 
I don't see why it shouldn't be possible to have a server group that doesn't
 have a scheduling policy associated to it.
 
 I don't think the grouping of servers should have *anything* to do with
 scheduling :) That's the point of my proposal. Servers can and should be
 grouped using simple tags or key=value pair tags.
 
 The grouping of servers together doesn't have anything of substance to do
 with scheduling policies.
 
 I don't see any cognitive dissonance here - I think you're just assuming that
 the only reason for being able to group servers is for scheduling.
 
 Again, I don't think scheduling and grouping of servers has anything to do
 with each other, thus my proposal to remove the relationship between
 groups of servers and scheduling policies, which is what the existing server
 group API and implementation does

Re: [openstack-dev] [nova] Proposal: remove the server groups feature

2014-04-25 Thread Day, Phil
Hi Jay,

I'm going to disagree with you on this one, because:

i) This is a feature that was discussed in at least one if not two Design 
Summits and went through a long review period, it wasn't one of those changes 
that merged in 24 hours before people could take a good look at it.  Whatever 
you feel about the implementation,  it is now in the API and we should assume 
that people have started coding against it.  I don't think it gives any 
credibility to OpenStack as a platform if we yank features back out just after 
they've landed.

ii) Server Group - It's a way of defining a group of servers, and the initial 
thing (only thing right now) you can define for such a group is the affinity or 
anti-affinity for scheduling.  Maybe in time we'll add other group properties 
or operations - like delete all the servers in a group (I know some QA folks 
that would love to have that feature).  I don't see why it shouldn't be 
possible to have a server group that doesn't have a scheduling policy 
associated to it.   I don't see any cognitive dissonance here - I think you're 
just assuming that the only reason for being able to group servers is for 
scheduling.

iii) If the issue is that you can't add or remove servers from a group, then 
why don't we add those operations to the API (you could add a server to a group 
provided that doing so doesn't break any policy that might be associated with 
the group).   Seems like a useful addition to me.

iv) Since the user created the group, and chose a name for it that is 
presumably meaningful, then I don't understand why you think --group XXX 
isn't going to be meaningful to that same user ?

So I think there are a bunch of API operations missing, but I don't see any 
advantage in throwing away what's now in place and  replacing it with a tag 
mechanism that basically says everything with this tag is in a sort of group.

Cheers,
Phil


PS: Congrats on the TC election


 -Original Message-
 From: Jay Pipes [mailto:jaypi...@gmail.com]
 Sent: 25 April 2014 22:16
 To: OpenStack Development Mailing List
 Subject: [openstack-dev] [nova] Proposal: remove the server groups feature
 
 Hi Stackers,
 
 When recently digging in to the new server group v3 API extension
 introduced in Icehouse, I was struck with a bit of cognitive dissonance that I
 can't seem to shake. While I understand and support the idea behind the
 feature (affinity and anti-affinity scheduling hints), I can't help but feel
 the implementation is half-baked and results in a very awkward user
 experience.
 
 The use case here is very simple:
 
 Alice wants to launch an instance and make sure that the instance does not
 land on a compute host that contains other instances of that type.
 
 The current user experience is that the user creates a server group
 like so:
 
 nova server-group-create $GROUP_NAME --policy=anti-affinity
 
 and then, when the user wishes to launch an instance and make sure it
 doesn't land on a host with another of that instance type, the user does the
 following:
 
 nova boot --group $GROUP_UUID ...
 
 There are myriad problems with the above user experience and
 implementation. Let me explain them.
 
 1. The user isn't creating a server group when they issue a nova
 server-group-create call. They are creating a policy and calling it a group.
 Cognitive dissonance results from this mismatch.
 
 2. There's no way to add an existing server to this group. What this means
 is that the user needs to effectively have pre-considered their environment
 and policy before ever launching a VM. To realize why this is a problem,
 consider the following:
 
  - User creates three VMs that consume high I/O utilization
  - User then wants to launch three more VMs of the same kind and make
 sure they don't end up on the same hosts as the others
 
 No can do, since the first three VMs weren't started using a --group
 scheduler hint.
 
 3. There's no way to remove members from the group
 
 4. There's no way to manually add members to the server group
 
 5. The act of telling the scheduler to place instances near or away from some
 other instances has been hidden behind the server group API, which means
 that users doing a nova help boot will see a --group option that doesn't make
 much sense, as it doesn't describe the scheduling policy activity.
 
 Proposal
 
 
 I propose to scrap the server groups API entirely and replace it with a
 simpler way to accomplish the same basic thing.
 
 Create two new options to nova boot:
 
  --near-tag TAG
 and
  --not-near-tag TAG
 
 The first would tell the scheduler to place the new VM near other VMs
 having a particular tag. The latter would tell the scheduler to place the
 new VM *not* near other VMs with a particular tag.
 
 What is a tag? Well, currently, since the Compute API doesn't have a
 concept of a single string tag, the tag could be a key=value pair that would
 be matched against the server extra properties.
 
 Once a real 

Re: [openstack-dev] [nova][neutron] Changing to external events

2014-04-16 Thread Day, Phil
 
  Is that right, and any reason why the default for
  vif_plugging_is_fatal shouldn't be False instead of True to make this
  sequence less dependent on matching config changes ?
 
 Yes, because the right approach to a new deployment is to have this
 enabled. If it was disabled by default, most deployments would never turn it
 on.
 
Difficult to get the balance here, as I think we also have a duty to not break 
existing deployments.   I would have preferred the initial set of defaults to 
be backwards compatible, with a subsequent change (say at Juno.1) to change 
them to be on by default.

Phil


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] nova-specs

2014-04-16 Thread Day, Phil
 On 04/15/2014 11:01 AM, Brian Elliott wrote:
  * specs review. The new blueprint process is a work of genius, and I
  think it's already working better than what we've had in previous
  releases. However, there are a lot of blueprints there in review, and
  we need to focus on making sure these get looked at sooner rather
  than later. I'd especially like to encourage operators to take a look
  at blueprints relevant to their interests. Phil Day from HP has been
  doing a really good job at this, and I'd like to see more of it.
 
  I have mixed feelings about the nova-specs repo.  I dig the open
 collaboration of the blueprints process, but I also think there is a danger of
 getting too process-oriented here.  Are these design documents expected to
 call out every detail of a feature?  Ideally, I'd like to see only very high
 level documentation in the specs repo.  Basically, the spec could include just
 enough detail for people to agree that they think a feature is worth
 inclusion. More detailed discussion could remain on the code reviews since
 they are the actual end work product.
 
They are intended to be high level designs rather than low level designs, so 
no, they don't have to include all of the implementation details.

On the other hand they should provide not only the info required to decide 
that the feature is worth implementing, but also enough details so that the 
reviewers can agree on the overall design approach (to avoid churn late in the 
implementation review) and cover a number of other areas that can and should be 
considered before the implementation starts but too often seem to get 
overlooked and are quite hard to dig back out from the code (like what the 
impact is going to be on a system that's already running).   The template is 
specifically set up to try and prompt the submitter to think about these 
issues, and I think that brings a huge amount of value to this stage.  At the 
moment I'm seeing a number of sections being answered as "None" when really 
this seems to mean "don't know" or "didn't think about that" - and I'm thinking 
that we should ask for a simple one-line justification of why there is no 
impact.
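
The "None needs a justification" check could be automated along these lines. 
This is only a sketch, assuming a spec is plain text with RST-style section 
titles underlined by a row of -, = or ~ characters; the function name and the 
sample spec are made up for illustration and are not part of any nova-specs 
tooling.

```python
import re

# An RST-style heading: a title line followed by an underline made of
# '=', '-' or '~' characters.
_HEADING = re.compile(
    r"^(?P<title>\S.*)\n(?P<rule>[=\-~]{3,})[ \t]*$", re.MULTILINE)


def unjustified_none_sections(spec_text):
    """Return the titles of sections whose entire body is a bare 'None'."""
    headings = list(_HEADING.finditer(spec_text))
    flagged = []
    for i, match in enumerate(headings):
        # A section body runs from the end of its underline to the next heading.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(spec_text)
        body = spec_text[match.end():end].strip()
        if body.rstrip(".").strip().lower() == "none":
            flagged.append(match.group("title").strip())
    return flagged


# Hypothetical two-section spec: one bare "None", one justified "None".
SAMPLE_SPEC = """\
Performance impact
------------------
None

Deployer impact
---------------
None - the change only adds an optional API field.
"""

print(unjustified_none_sections(SAMPLE_SPEC))
```

Only the first section is reported; the second passes because it says why 
there is no impact, which is all the one-line justification asks for.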

Don't think of it as becoming process-orientated; if we get to the stage where 
BPs are submitted just as a formality then we've lost the plot again.   Think 
of it as a way to bridge the gap between hard-core developers and other 
stakeholders (Deployers, Operators, Architects, etc), and a way to increase 
the chances of not getting a -2 after patch set 98 because someone has just 
decided they don't like the design.

I think having this approach is a great sign of the growing maturity of 
OpenStack.

Phil




[openstack-dev] TC Candidacy

2014-04-16 Thread Day, Phil
I would like to announce my TC candidacy.

I work full time for HP where I am the architect and technical lead for the 
core OpenStack Engineering team, with responsibility for the architecture and 
deployment of the OpenStack Infrastructure projects  (Nova, Neutron, Cinder, 
Glance, Swift) across a range of  HP Products including our Public Cloud, which 
is one of the largest public deployments and has been tracking upstream trunk 
for the last 18 months.

I have been working with OpenStack since Diablo, and am continually in awe at 
what we, as a community, have managed to create over the last few years.  I 
believe that OpenStack has now reached the point of maturity where Operator and 
Deployer perspectives need to have a strong voice in how the services evolve. 
My position in HP as the architect for the Infrastructure service of HP's 
Public Cloud, as well as being an active contributor and reviewer to Nova, puts 
me in a strong position to provide those perspectives.

As an active developer in Nova I have been a strong advocate of bringing 
improved formality to the Blueprint review process, both to allow other 
perspectives to be incorporated and to address as early as possible some of the 
issues that have in the past been overlooked until the final stages of 
implementation.  Although still in its early days, that new process has had 
very positive feedback, and is an example of the kind of areas where I think 
the TC should be seeking to make a difference across all projects.

While respecting and valuing the rights of each project to have a degree of 
autonomy, I believe that the TC needs to take a strong role in joining the 
dots and making sure that a change required by one project (such as for 
example the introduction of the Keystone V3 API) has a supporting plan of how 
it can be incorporated into the roadmap of other projects - not through 
dictating to the projects but by working with the PTLs to ensure that there is 
a smooth and agreed transition plan, and by resolving conflicts in priorities. 
Each project has its own highly skilled, and to a degree specialized, 
community and a list of features that they want to introduce, and I want to see 
the TC play a role in monitoring those changes to look for, and seek to 
resolve, conflicts and overlaps.  This is a role I already play with HP, and 
I would welcome the opportunity to bring that experience to OpenStack.

Thanks
Phil Day


[openstack-dev] [nova][neutron] Changing to external events

2014-04-15 Thread Day, Phil
Hi Folks,

Sorry for being a tad slow on the uptake here, but I'm trying to understand the 
sequence of updates required to move from a system that doesn't have external 
events configured between Neutron and Nova to one that does.  (The new 
nova-specs repo would have captured this as part of the BP ;-)

So assuming I start from the model where neither service is using events, as I 
understand it:


-  As soon as I deploy the modified Nova code the compute manager will 
start waiting for events when plugging vifs. By default it will wait for 600 
seconds and then fail (because Neutron can't deliver the event).   Because 
vif_plugging_is_fatal=True by default this will mean all instance creates will 
fail - so that doesn't seem like a good first move (and maybe not the best set 
of defaults).


-  If I modify Neutron first so that it now sends the events, but Nova 
isn't yet updated to expose the new API extension, then Neutron will fail 
instead because it can't send the event (I can't find the corresponding Neutron 
patch references in the BP page 
https://blueprints.launchpad.net/nova/+spec/admin-event-callback-api - so if 
someone could point me at that it would be helpful).  So unless Neutron can 
cope with this that doesn't feel like a good first move either.



It feels like the right sequence is:

-  Deploy the new code in Nova and at the same time set 
vif_plugging_is_fatal=False, so that Nova will wait for Neutron, but will still 
continue if the event never turns up (which is kind of like the code was 
before, but with a wait)

-  At the same time enable the new API extension in Nova so that Nova 
can consume events

-  Then update Neutron (with presumably some additional config) so that 
it starts sending events

Is that right, and any reason why the default for vif_plugging_is_fatal 
shouldn't be False instead of True to make this sequence less dependent on 
matching config changes ?
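
The wait-then-continue behaviour in the sequence above can be modelled in a 
few lines. This is only a sketch of the semantics as I've described them in 
this mail, not Nova's code: wait_for_vif_plugged and VifPluggingTimeout are 
made-up names, and the is_fatal argument stands in for the 
vif_plugging_is_fatal option.

```python
import threading


class VifPluggingTimeout(Exception):
    """Raised when the event never arrives and the failure is fatal."""


def wait_for_vif_plugged(event, timeout, is_fatal):
    """Model the wait described above.

    event    -- threading.Event that is set when Neutron's event arrives
    timeout  -- seconds to wait (the mail quotes a 600 second default)
    is_fatal -- stands in for the vif_plugging_is_fatal option
    """
    if event.wait(timeout):
        return "event received"
    if is_fatal:
        raise VifPluggingTimeout("no event within %s seconds" % timeout)
    return "timed out, continuing anyway"


# First step of the sequence: Nova upgraded with is_fatal=False while
# Neutron is not yet sending events, so the create carries on after the wait.
never_sent = threading.Event()
print(wait_for_vif_plugged(never_sent, timeout=0.01, is_fatal=False))

# Last step: Neutron now sends the event, so there is no timeout at all.
sent = threading.Event()
sent.set()
print(wait_for_vif_plugged(sent, timeout=0.01, is_fatal=True))
```

With is_fatal=True and no event the same call raises instead, which is the 
"all instance creates will fail" case in the first bullet.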

Phil



Re: [openstack-dev] [nova] Server Groups are not an optional element, bug or feature ?

2014-04-09 Thread Day, Phil
 -Original Message-
 From: Russell Bryant [mailto:rbry...@redhat.com]
 Sent: 08 April 2014 13:13
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Server Groups are not an optional
 element, bug or feature ?
 
 On 04/08/2014 06:29 AM, Day, Phil wrote:
 
 
  -Original Message-
  From: Russell Bryant [mailto:rbry...@redhat.com]
  Sent: 07 April 2014 19:12
  To: openstack-dev@lists.openstack.org
  Subject: Re: [openstack-dev] [nova] Server Groups are not an optional
  element, bug or feature ?
 
 
  ...
 
  I consider it a complete working feature.  It makes sense to enable
  the filters by default.  It's harmless when the API isn't used.  That was
  just an oversight.
 
  The list of instances in a group through the API only shows
  non-deleted instances.
 
  True, but the lack of even a soft delete on the rows in the
 instance_group_member worries me - it's not clear why that wasn't fixed
 rather than just hiding the deleted instances.  I'd have expected the full DB
 lifecycle to be implemented before something was considered as a complete
 working feature.
 
 We were thinking that there may be a use for being able to query a full list
 of instances (including the deleted ones) for a group.  The API just hasn't
 made it that far yet.  Just hiding them for now leaves room to iterate and
 doesn't prevent either option (exposing the deleted instances, or changing to
 auto-delete them from the group).
 
Maybe it's just me, but I have a natural aversion to anything that grows 
forever in the database - over time and at scale this becomes a real problem.

We really need a consistent view / policy on what does and doesn't need to be 
possible for deleted instances - the trend of discussion seems to have been 
heading towards moving away from soft delete to more of a "once it's gone it's 
gone" approach, which would get my vote.  If we need an archive service 
let's make that something explicit in the architecture (maybe it can be 
something that consumes notifications like stacktach).

Seems that at the moment server groups is heading in a new direction of its 
own here - not just using soft delete and read-deleted to get at old records, 
but actually keeping them in the DB and then hiding them in the API layer - 
which feels really inconsistent.
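
To make the "once it's gone it's gone, plus an explicit archive" idea 
concrete, here is a small sketch. Everything in it is hypothetical: the 
GroupMembers class is an in-memory stand-in for the group member table, and 
the "group.member.delete" event name is invented for illustration rather than 
taken from Nova's notifications.

```python
class GroupMembers:
    """In-memory stand-in for the instance group member table."""

    def __init__(self, notify):
        self._rows = {}        # group id -> set of instance ids
        self._notify = notify  # callable(event_type, payload)

    def add(self, group_id, instance_id):
        self._rows.setdefault(group_id, set()).add(instance_id)

    def remove(self, group_id, instance_id):
        # Hard delete: the row goes away when the instance is deleted ...
        self._rows.get(group_id, set()).discard(instance_id)
        # ... and the history is handed to whoever consumes notifications.
        self._notify("group.member.delete",
                     {"group": group_id, "instance": instance_id})

    def list(self, group_id):
        return sorted(self._rows.get(group_id, set()))


archive = []  # a separate archive service would persist these


def record(event_type, payload):
    archive.append((event_type, payload))


members = GroupMembers(notify=record)
members.add("g1", "vm-a")
members.add("g1", "vm-b")
members.remove("g1", "vm-a")

print(members.list("g1"))  # only live members remain in the table
print(archive)             # the deleted member lives on in the archive
```

The table never grows beyond the live members, and nothing has to be hidden 
in the API layer because there is nothing left to hide.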


 



Re: [openstack-dev] [Nova] Hosts within two Availability Zones : possible or not ?

2014-04-09 Thread Day, Phil
 -Original Message-
 From: Jay Pipes [mailto:jaypi...@gmail.com]
 Sent: 08 April 2014 14:25
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Nova] Hosts within two Availability Zones :
 possible or not ?
 
 On Tue, 2014-04-08 at 10:49 +, Day, Phil wrote:
  On a large cloud you’re protected against this to some extent if the
  number of servers is  number of instances in the quota.
 
  However it does feel that there are a couple of things missing to
  really provide some better protection:
 
  - A quota value on the maximum size of a server group
  - A policy setting so that the ability to use service-groups
  can be controlled on a per project basis
 
 Alternately, we could just have the affinity filters serve as weighting
 filters instead of returning NoValidHosts.
 
 That way, a request containing an affinity hint would cause the scheduler to
 prefer placing the new VM near (or not-near) other instances in the server
 group, but if no hosts exist that meet that criteria, the filter simply finds
 a host with the most (or fewest, in case of anti-affinity) instances that meet
 the affinity criteria.
 

I agree that "hint" would be more consistent with weighting than filtering 
("constraint" would be a better word for that) - but how does the user get 
feedback on whether the hint has been honoured or not ?

In the case of anti-affinity they would need to:
- Create a VM
- Check the host hash value is different
- if they really care delete the VM and try again

... which is pretty much the same cycle they can do without the hint (the 
filter/weigher just gives it a better chance of working first time on a small 
system)
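
The difference between the two behaviours, and the feedback problem, can be 
shown with a toy example. This is a sketch only: the function names are made 
up and the host lists are invented, and it is not how the Nova scheduler's 
filters or weighers are actually written.

```python
def anti_affinity_filter(hosts, group_hosts):
    """Hard constraint: keep only hosts holding no member of the group."""
    return [h for h in hosts if group_hosts.count(h) == 0]


def anti_affinity_weigher(hosts, group_hosts):
    """Soft hint: order hosts by how few group members they already hold."""
    return sorted(hosts, key=lambda h: group_hosts.count(h))


def hint_honoured(chosen_host, group_hosts):
    """The check the user is left to do for themselves after the boot."""
    return chosen_host not in group_hosts


hosts = ["host1", "host2"]
group_hosts = ["host1", "host2", "host2"]  # where existing members run

# As a filter there are no candidates, so the request fails visibly.
print(anti_affinity_filter(hosts, group_hosts))

# As a weigher the least-bad host is picked and the boot succeeds ...
best = anti_affinity_weigher(hosts, group_hosts)[0]
# ... but only a separate check reveals that the hint was not met.
print(best, hint_honoured(best, group_hosts))
```

With the filter the failure is explicit; with the weigher the boot succeeds 
and the user only finds out by comparing hosts afterwards, which is the cycle 
listed above.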

I would guess that affinity is more likely to be a soft requirement than 
anti-affinity,  in that I can see some services just not meeting their HA goals 
without anti-affinity but I'm struggling to think of a use case why affinity is 
a must for the service.



Re: [openstack-dev] [Nova] Hosts within two Availability Zones : possible or not ?

2014-04-09 Thread Day, Phil
 -Original Message-
 From: Chris Friesen [mailto:chris.frie...@windriver.com]
 Sent: 08 April 2014 15:19
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Nova] Hosts within two Availability Zones :
 possible or not ?
 
 On 04/08/2014 07:25 AM, Jay Pipes wrote:
  On Tue, 2014-04-08 at 10:49 +, Day, Phil wrote:
  On a large cloud you're protected against this to some extent if the
  number of servers is  number of instances in the quota.
 
  However it does feel that there are a couple of things missing to
  really provide some better protection:
 
  - A quota value on the maximum size of a server group
  - A policy setting so that the ability to use service-groups
  can be controlled on a per project basis
 
  Alternately, we could just have the affinity filters serve as
  weighting filters instead of returning NoValidHosts.
 
  That way, a request containing an affinity hint would cause the
  scheduler to prefer placing the new VM near (or not-near) other
  instances in the server group, but if no hosts exist that meet that
  criteria, the filter simply finds a host with the most (or fewest, in
  case of anti-affinity) instances that meet the affinity criteria.
 
 I'd be in favor of this.   I've actually been playing with an internal
 patch to do both of these things, though in my case I was just doing it via
 metadata on the group and a couple hacks in the scheduler and the compute
 node.
 
 Basically I added a group_size metadata field and a best_effort flag to
 indicate whether we should error out or continue on if the policy can't be
 properly met.
 
I like the idea of the user being able to say if the affinity should be treated 
as a filter or weight.

In terms of group_size I'd want to be able to impose a limit on that as an 
operator, not just have it in the control of the user (hence the quota idea).
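
Roughly what I have in mind for the limit, as a sketch: the user can ask for 
a group_size, but an operator quota caps it. The function and exception names 
here are hypothetical and there is no such quota in Nova today.

```python
class GroupSizeExceeded(Exception):
    """Raised when adding a member would take a group past its limit."""


def effective_group_limit(user_group_size, operator_quota):
    """The operator's quota caps whatever size the user asked for.

    Either value may be None, meaning "not set".
    """
    limits = [v for v in (user_group_size, operator_quota) if v is not None]
    return min(limits) if limits else None


def check_add_member(current_size, user_group_size, operator_quota):
    """Return the new group size, or raise if the limit would be exceeded."""
    limit = effective_group_limit(user_group_size, operator_quota)
    if limit is not None and current_size + 1 > limit:
        raise GroupSizeExceeded("group limited to %d members" % limit)
    return current_size + 1


# The user asked for up to 10 members but the operator allows only 5.
print(effective_group_limit(user_group_size=10, operator_quota=5))
print(check_add_member(current_size=4, user_group_size=10, operator_quota=5))
```

Taking the minimum means a user can still choose a smaller group than the 
quota allows, but can never choose a larger one.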




Re: [openstack-dev] [Nova] Hosts within two Availability Zones : possible or not ?

2014-04-09 Thread Day, Phil
 -Original Message-
 From: Chris Friesen [mailto:chris.frie...@windriver.com]
 Sent: 09 April 2014 15:37
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Nova] Hosts within two Availability Zones :
 possible or not ?
 
 On 04/09/2014 03:55 AM, Day, Phil wrote:
 
  I would guess that affinity is more likely to be a soft requirement
  than anti-affinity,  in that I can see some services just not meeting
  their HA goals without anti-affinity but I'm struggling to think of a
  use case why affinity is a must for the service.
 
 Maybe something related to latency?  Put a database server and several
 public-facing servers all on the same host and they can talk to each other
 with less latency than if they had to go over the wire to another host?
 
I can see that as a high-want, but would you actually rather not start the 
service if you couldn't get it ?  I suspect not, as there are many other 
factors that could affect performance.  On the other hand I could imagine a 
case where I declare it's not worth having a second VM at all if I can't get it 
on a separate server.   Hence affinity feels more "soft" and anti-affinity 
"hard" in terms of requirements.



Re: [openstack-dev] [TripleO] config options, defaults, oh my!

2014-04-08 Thread Day, Phil
 -Original Message-
 From: Robert Collins [mailto:robe...@robertcollins.net]
 Sent: 07 April 2014 21:01
 To: OpenStack Development Mailing List
 Subject: [openstack-dev] [TripleO] config options, defaults, oh my!
 
 So one interesting thing from the influx of new reviews is lots of patches
 exposing all the various plumbing bits of OpenStack. This is good in some
 ways (yay, we can configure more stuff), but in some ways it's kind of odd -
 like - it's not clear when https://review.openstack.org/#/c/83122/ is needed.
 
 I'm keen to expose things that are really needed, but i'm not sure that /all/
 options are needed - what do folk think? 

I'm very wary of trying to make the decision in TripleO of what should and 
shouldn't be configurable in some other project.  For sure the number of 
config options in Nova is a problem, and one that's been discussed many times 
at summits.   However I think you could also make the case/assumption for any 
service that the debate about having a config option has already been held 
within that service as part of the review that merged that option in the code - 
re-running the debate about whether something should be configurable via 
TripleO feels like some sort of policing function on configurability above and 
beyond what the experts in that service have already considered, and that 
doesn't feel right to me.

Right now TripleO has a very limited view of what can be configured, based, as 
I understand it, primarily on what's needed for its CI job.  As more folks who 
have real deployments start to look at using TripleO it's inevitable that they 
are going to want to enable the settings that are important to them to be 
configured.  I can't imagine that anyone is going to add a configuration value 
for the sake of it, so can't we start with the perspective that we are slowly 
exposing the set of values that do need to be configured ?


Also, some things really should be higher order operations - like the neutron 
callback to nova right - that should
 be either set to timeout in nova  configured in neutron, *or* set in both
 sides appropriately, never one-half or the other.
 
 I think we need to sort out our approach here to be systematic quite quickly
 to deal with these reviews.
 
 Here's an attempt to do so - this could become a developers guide patch.
 
 Config options in TripleO
 =========================
 
 Non-API driven configuration falls into four categories:
 A - fixed at buildtime (e.g. ld.so path)
 B - cluster state derived
 C - local machine derived
 D - deployer choices
 
 For A, it should be entirely done within the elements concerned.
 
 For B, the heat template should accept parameters to choose the desired
 config (e.g. the Neutron-Nova example above) but then express the config in
 basic primitives in the instance metadata.
 
 For C, elements should introspect the machine (e.g. memory size to
 determine mysql memory footprint) inside os-refresh-config scripts; longer
 term we should make this an input layer to os-collect-config.
 
 For D, we need a sensible parameter in the heat template and probably
 direct mapping down to instance metadata.
 
I understand the split, but all of the reviews in question seem to be in D, so 
I'm not sure this helps much.  


 But we have a broader question - when should something be configurable at
 all?
 
 In my mind we have these scenarios:
 1) There is a single right answer
 2) There are many right answers
 
 An example of 1) would be any test-only option like failure injection
 - the production value is always 'off'. For 2), hypervisor driver is a great
 example - anything other than qemu is a valid production value
 :)
 
 But, it seems to me that these cases actually subdivide further -
 1a) single right answer, and the default is the right answer
 1b) single right answer and it is not the default
 2a) many right answers, and the default is the most/nearly most common
 one
 2b) many right answers, and the default is either not one of them or is a
 corner case
 
 So my proposal here - what I'd like to do as we add all these config options
 to TripleO is to take the care to identify which of A/B/C/D they are and code
 them appropriately, and if the option is one of 1b) or 2b) make sure there is
 a bug in the relevant project about the fact that we're having to override a
 default. If the option is really a case of 1a) I'm not sure we want it
 configurable at all.
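
The proposal above boils down to a small decision table, sketched here in 
Python. The enum, the function names and the wording of the actions are all 
made up for illustration; only the A/B/C/D categories and the 1a/1b/2a/2b 
cases come from the text.

```python
from enum import Enum


class Source(Enum):
    BUILD_TIME = "A"       # fixed at build time
    CLUSTER_STATE = "B"    # derived from cluster state
    LOCAL_MACHINE = "C"    # derived from the local machine
    DEPLOYER_CHOICE = "D"  # chosen by the deployer


def answer_case(single_right_answer, default_is_good):
    """Map the two questions onto the 1a/1b/2a/2b cases."""
    number = "1" if single_right_answer else "2"
    letter = "a" if default_is_good else "b"
    return number + letter


def review_actions(source, single_right_answer, default_is_good):
    """Return the case and the follow-ups the proposal asks for."""
    case = answer_case(single_right_answer, default_is_good)
    actions = ["code as category %s" % source.value]
    if case in ("1b", "2b"):
        actions.append("file a bug upstream about overriding the default")
    if case == "1a":
        actions.append("question whether to expose it at all")
    return case, actions


# The hypervisor driver example: a deployer choice with many right answers
# where the default is not one of them.
print(review_actions(Source.DEPLOYER_CHOICE,
                     single_right_answer=False,
                     default_is_good=False))
```

The two boolean inputs are exactly the judgement calls a reviewer would have 
to make for every option, and the table itself does nothing to make those 
calls easier.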
 

I'm not convinced that anyone is in a position to judge that there is a single 
right answer - I know the values that are right for my deployments, but I'm not 
arrogant enough to say that they are universally applicable.  You only have to 
see the wide range of OpenStack deployments presented at every summit to know 
that there are a lot of different use cases out there.   My worry is that if 
we try to have that debate in the context of a TripleO review, then we'll just 
spin between opinions rather than make the rapid progress towards getting the 
needed 

[openstack-dev] Enabling ServerGroup filters by default (was RE: [nova] Server Groups are not an optional element, bug or feature ?)

2014-04-08 Thread Day, Phil
 https://bugs.launchpad.net/nova/+bug/1303983
 
 --
 Russell Bryant

Wow - was there really a need to get that change merged within 12 hours and 
before others had a chance to review and comment on it ?

I see someone has already queried (post the merge) if there isn't a performance 
impact.

I've raised this point before - but apart from non-urgent security fixes 
shouldn't there be a minimum review period to make sure that all relevant 
feedback can be given ?

Phil 

 -Original Message-
 From: Russell Bryant [mailto:rbry...@redhat.com]
 Sent: 07 April 2014 20:38
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Server Groups are not an optional
 element, bug or feature ?
 
 On 04/07/2014 02:12 PM, Russell Bryant wrote:
  On 04/07/2014 01:43 PM, Day, Phil wrote:
  Generally the scheduler's capabilities that are exposed via hints can
  be enabled or disabled in a Nova install by choosing the set of filters
  that are configured. However the server group feature doesn't fit
  that pattern - even if the affinity filter isn't configured the
  anti-affinity check on the server will still impose the anti-affinity
  behavior via throwing the request back to the scheduler.
 
  I appreciate that you can always disable the server-groups API
  extension, in which case users can't create a group (and so the
  server create will fail if one is specified), but that seems kind of
  at odds with other types of scheduling that have to be specifically
  configured in rather than out of a base system.  In particular having
  the API extension in by default but the ServerGroup Affinity and
  AntiAffinity filters not in by default seems an odd combination (it kind
  of works, but only by a retry from the host and that's limited to a
  number of retries).
 
  Given that the server group work isn't complete yet (for example the
  list of instances in a group isn't tidied up when an instance is
  deleted) I feel a tad worried that the current default configuration
  exposed this rather than keeping it as something that has to be
  explicitly enabled - what do others think ?
 
  I consider it a complete working feature.  It makes sense to enable
  the filters by default.  It's harmless when the API isn't used.  That
  was just an oversight.
 
  The list of instances in a group through the API only shows
  non-deleted instances.
 
  There are some implementation details that could be improved (the
  check on the server is the big one).
 
 
 https://bugs.launchpad.net/nova/+bug/1303983
 
 --
 Russell Bryant
 



Re: [openstack-dev] [Nova][Trove] Managed Instances Feature

2014-04-08 Thread Day, Phil
It's more than just non-admin, it also allows a user to lock an instance so 
that they don’t accidentally perform some operation on a VM.

At one point it was (by default) an admin only operation on the OSAPI, but it's 
always been open to all users in EC2.   Recently it was changed so that admin 
and non-admin locks are considered as separate things.

From: Chen CH Ji [mailto:jiche...@cn.ibm.com]
Sent: 08 April 2014 07:13
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova][Trove] Managed Instances Feature


The instance lock is a mechanism that prevents non-admin users from operating on
the instance (resize, etc.; it looks to me like snapshot is not currently included).
Permissions are a wider concept, mainly in the API layer, that allow or prevent
users from using the API. I guess the instance lock might be enough to prevent
instance actions.


Best Regards!

Kevin (Chen) Ji 纪 晨

Engineer, zVM Development, CSTL
Notes: Chen CH Ji/China/IBM@IBMCN   Internet: jiche...@cn.ibm.com
Phone: +86-10-82454158
Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian District, 
Beijing 100193, PRC


From: Hopper, Justin justin.hop...@hp.com
To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Date: 04/08/2014 02:05 PM
Subject: Re: [openstack-dev] [Nova][Trove] Managed Instances Feature





Phil,

I am reviewing the existing “check_instance_lock” implementation to see
how it might be leveraged.  Off the cuff, it looks like pretty much what we
need.  I need to look into the permissions to better understand how one
can “lock” an instance.

Thanks for the guidance.


Justin Hopper
Software Engineer - DBaaS
irc: juice | gpg: EA238CF3 | twt: @justinhopper




On 4/7/14, 10:01, Day, Phil philip@hp.com wrote:

I can see the case for Trove being to create an instance within a
customer's tenant (if nothing else it would make adding it onto their
Neutron network a lot easier), but I'm wondering why it really needs to
be hidden from them ?

If the instances have a name that makes it pretty obvious that Trove
created them, and the user presumably knows that they did this from Trove, why
hide them?  I'd have thought that would lead to a whole bunch of
confusion and support calls when they try to work out why they are out
of quota and can only see a subset of the instances being counted by the
system.

If the need is to stop the users doing something with those instances
then maybe we need an extension to the lock mechanism such that a lock
can be made by a specific user (so the trove user in the same tenant
could lock the instance so that a non-trove user in that tenant couldn’t
unlock).  We already have this to an extent, in that an instance locked
by an admin can't be unlocked by the owner, so I don’t think it would be
too hard to build on that.  Feels like that would be a lot more
transparent than trying to obfuscate the instances themselves.

 -Original Message-
 From: Hopper, Justin
 Sent: 06 April 2014 01:37
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Nova][Trove] Managed Instances Feature

 Russell,

 Thanks for the quick reply. If I understand what you are suggesting, it
 is that there would be one Trove-Service Tenant/User that owns all
 instances from the perspective of Nova.  This was one option proposed
 during our discussions.  However, what we thought would be best is to
 continue to use the user credentials so that Nova has the correct
 association.  We wanted a more substantial and deliberate relationship
 between Nova and a dependent service.  In this relationship, Nova would
 acknowledge which instances are being managed by which Services, and
 while ownership would still belong to the User, management/manipulation
 of said Instance would be done solely by the Service.

 At this point the guard that Nova needs to provide around the instance
 does not need to be complex.  It would even suffice to keep those
 instances hidden from such operations as “nova list” when invoked
 directly by the user.

 Thanks,

 Justin Hopper
 Software Engineer - DBaaS
 irc: juice | gpg: EA238CF3 | twt: @justinhopper




 On 4/5/14, 14:20, Russell Bryant rbry...@redhat.com wrote:

 On 04/04/2014 08:12 PM, Hopper, Justin wrote:
  Greetings,
 
  I am trying to address an issue from certain perspectives and I think
  some support from Nova may be needed.
 
  _Problem_
  Services like Trove run in Nova Compute Instances.  These
  Services try to provide an integrated

Re: [openstack-dev] [nova] Server Groups are not an optional element, bug or feature ?

2014-04-08 Thread Day, Phil


 -Original Message-
 From: Russell Bryant [mailto:rbry...@redhat.com]
 Sent: 07 April 2014 19:12
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Server Groups are not an optional
 element, bug or feature ?
 

...
 
 I consider it a complete working feature.  It makes sense to enable the
 filters by default.  It's harmless when the API isn't used.  That was just
 an oversight.

 The list of instances in a group through the API only shows non-deleted
 instances.

True, but the lack of even a soft delete on the rows in
instance_group_member worries me - it's not clear why that wasn't fixed rather
than just hiding the deleted instances.  I'd have expected the full DB
lifecycle to be implemented before something was considered a complete working
feature.

 
 There are some implementation details that could be improved (the check
 on the server is the big one).
 
 --
 Russell Bryant
 



Re: [openstack-dev] [Nova][Trove] Managed Instances Feature

2014-04-08 Thread Day, Phil
Hi Justin,

Glad you like the idea of using lock ;-) 

I still think you need some more granularity than user or admin - currently for
Trove to lock the user's VMs as admin it would need an account that has admin
rights across the board in Nova, and I don't think folks would want to delegate
that much power to Trove.

Also the folks who genuinely need to enforce an admin level lock on a VM 
(normally if there is some security issue with the VM) wouldn’t want Trove to 
be able to unlock it.

So I think we're on the right lines, but it needs more thinking about how to get a
bit more granularity - I'm thinking of some other variant of lock that fits
somewhere between the current user and admin locks, and is controlled via
policy by a specific role, so you have something like:

- User without AppLock role - can apply/remove user lock to instance.  Cannot
  perform operations if any lock is set on the instance
- User with AppLock role - can apply/remove application lock to instance.
  Cannot perform operations on the instance if the admin lock is set
- User with Admin role - can apply/remove admin lock.  Can perform any
  operations on the instance
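
To make the three tiers above a bit more concrete, here is a rough Python sketch. It is not Nova code: the Lock levels, the role names ("user", "applock", "admin") and both helper functions are made up for illustration, and it assumes the narrowest reading of the list, i.e. each role can only apply or remove its own type of lock.

```python
from enum import IntEnum


class Lock(IntEnum):
    """Hypothetical lock levels, ordered from weakest to strongest."""
    NONE = 0
    USER = 1
    APP = 2
    ADMIN = 3


# Hypothetical role names; "applock" stands in for the AppLock role.
ROLE_LOCK = {"user": Lock.USER, "applock": Lock.APP, "admin": Lock.ADMIN}


def can_operate(role, current_lock):
    """Return True if `role` may perform operations on the instance."""
    if role == "admin":
        return True                         # admin: any operation
    if role == "applock":
        return current_lock != Lock.ADMIN   # blocked only by the admin lock
    return current_lock == Lock.NONE        # plain user: blocked by any lock


def can_set_or_clear(role, lock):
    """Return True if `role` may apply or remove a lock of this level."""
    return lock != Lock.NONE and ROLE_LOCK[role] == lock


# Trove (holding the AppLock role) locks the instance; the owner is blocked.
print(can_set_or_clear("applock", Lock.APP))
print(can_operate("user", Lock.APP))
```

Whether an admin should also be able to clear the weaker locks is left open here, since the list above doesn't say either way.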

Phil

 -Original Message-
 From: Hopper, Justin
 Sent: 07 April 2014 19:01
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Nova][Trove] Managed Instances Feature
 
 Phil,
 
 I think your “lock” concept is more along the lines of what we are looking for.
 Hiding them is not a requirement.  Preventing the user from using Nova
 directly on those Instances is.  So locking it with an “Admin” user so that
 they could not snapshot or resize it directly in Nova would be great.  When
 they use the Trove API, Trove, as Admin, could “unlock” those Instances, make
 the modification and then “lock” them after it is complete.
 
 Thanks,
 
 Justin Hopper
 Software Engineer - DBaaS
 irc: juice | gpg: EA238CF3 | twt: @justinhopper
 
 
 
 
 On 4/7/14, 10:01, Day, Phil philip@hp.com wrote:
 
 I can see the case for Trove being to create an instance within a
 customer's tenant (if nothing else it would make adding it onto their
 Neutron network a lot easier), but I'm wondering why it really needs to
 be hidden from them ?
 
 If the instances have a name that makes it pretty obvious that Trove
 created them, and the user presumably knows that they did this from Trove,
 why hide them?  I'd have thought that would lead to a whole bunch of
 confusion and support calls when they try to work out why they are out
 of quota and can only see a subset of the instances being counted by the
 system.
 
 If the need is to stop the users doing something with those instances
 then maybe we need an extension to the lock mechanism such that a lock
 can be made by a specific user (so the trove user in the same tenant
 could lock the instance so that a non-trove user in that tenant
 couldn’t unlock).  We already have this to an extent, in that an
 instance locked by an admin can't be unlocked by the owner, so I don’t
 think it would be too hard to build on that.  Feels like that would be
 a lot more
 transparent than trying to obfuscate the instances themselves.
 
  -Original Message-
  From: Hopper, Justin
  Sent: 06 April 2014 01:37
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [Nova][Trove] Managed Instances Feature
 
  Russell,
 
  Thanks for the quick reply. If I understand what you are suggesting, it
  is that there would be one Trove-Service Tenant/User that owns all
  instances from the perspective of Nova.  This was one option proposed
  during our discussions.  However, what we thought would be best is to
  continue to use the user credentials so that Nova has the correct
  association.  We wanted a more substantial and deliberate relationship
  between Nova and a dependent service.  In this relationship, Nova would
  acknowledge which instances are being managed by which Services, and
  while ownership would still belong to the User, management/manipulation
  of said Instance would be done solely by the Service.
 
  At this point the guard that Nova needs to provide around the instance
  does not need to be complex.  It would even suffice to keep those
  instances hidden from such operations as “nova list” when invoked
  directly by the user.
 
  Thanks,
 
  Justin Hopper
  Software Engineer - DBaaS
  irc: juice | gpg: EA238CF3 | twt: @justinhopper
 
 
 
 
  On 4/5/14, 14:20, Russell Bryant rbry...@redhat.com wrote:
 
  On 04/04/2014 08:12 PM, Hopper, Justin wrote:
   Greetings,
  
   I am trying to address an issue from certain perspectives and I
   think some support from Nova may be needed.
  
   _Problem_
   Services like Trove run in Nova Compute Instances.  These
   Services try to provide an integrated and stable platform on
   which the “service” can run in a predictable manner.  Such
   elements include configuration of the service, networking, installed

Re: [openstack-dev] [Nova] Hosts within two Availability Zones : possible or not ?

2014-04-08 Thread Day, Phil
On a large cloud you're protected against this to some extent if the number of
servers is greater than the number of instances in the quota.

However it does feel that there are a couple of things missing to really 
provide some better protection:


-  A quota value on the maximum size of a server group

-  A policy setting so that the ability to use server groups can be
controlled on a per project basis
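
As a rough illustration of how those two checks could sit together, here is a small Python sketch. Nothing in it is existing Nova behaviour: group_request_error, allowed_projects and max_group_size are hypothetical names standing in for the per-project policy setting and the group-size quota suggested above.

```python
def group_request_error(project, group_members, allowed_projects, max_group_size):
    """Return None if one more instance may join the group, else a reason.

    allowed_projects and max_group_size are hypothetical settings standing in
    for the per-project policy and the group-size quota described above.
    """
    if project not in allowed_projects:
        return "server groups are not enabled for this project"
    if len(group_members) + 1 > max_group_size:
        return "server group would exceed its maximum size"
    return None


# Example: only project "p1" may use groups, and a group holds at most 2 members.
print(group_request_error("p1", ["vm1"], {"p1"}, 2))          # allowed
print(group_request_error("p1", ["vm1", "vm2"], {"p1"}, 2))   # group too large
print(group_request_error("p2", [], {"p1"}, 2))               # project not permitted
```

With a cap like this, a single anti-affinity group can only ever pin down max_group_size hosts, however large the user's instance quota is.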

From: Khanh-Toan Tran [mailto:khanh-toan.t...@cloudwatt.com]
Sent: 08 April 2014 11:32
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova] Hosts within two Availability Zones : 
possible or not ?

Abusive usage: if users can request anti-affinity VMs, then why wouldn't they
use it? This will result in users constantly requesting that all their VMs be in
the same anti-affinity group, which makes the scheduler choose one physical host per
VM. This will quickly flood the infrastructure and interfere with the objectives
of the admin (e.g. consolidation, which regroups VMs instead of spreading them, spare
hosts, etc.); at some point it will be reported back that there is no host
available, which is a bad experience for the user.


From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Tuesday 8 April 2014 01:02
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova] Hosts within two Availability Zones :
possible or not ?

On 3 April 2014 08:21, Khanh-Toan Tran khanh-toan.t...@cloudwatt.com wrote:
Otherwise we cannot provide redundancy to clients except by using Regions, which
are dedicated infrastructure and network-separated, or the anti-affinity
filter, which IMO is not pragmatic as it has a tendency towards abusive usage.

I'm sorry, could you explain what you mean here by 'abusive usage'?
--
Ian.


Re: [openstack-dev] [Nova] Hosts within two Availability Zones : possible or not ?

2014-04-07 Thread Day, Phil
Hi Sylvain,

There was a similar thread on this recently - which might be worth reviewing:   
http://lists.openstack.org/pipermail/openstack-dev/2014-March/031006.html

Some interesting use cases were posted, and I don't think a conclusion was
reached, which seems to suggest this might be a good case for a session in 
Atlanta.

Personally I'm not sure that selecting more than one AZ really makes a lot of 
sense - they are generally objects which are few in number and large in scale, 
so if for example there are 3 AZs and you want to create two servers in 
different AZs, does it really help if you can do the sequence:


-  Create a server in any AZ

-  Find the AZ the server is in

-  Create a new server in any of the two remaining AZs

Rather than just picking two from the list to start with ?

If you envisage a system with many AZs, and thereby allow users some pretty 
fine-grained choices about where to place their instances, then I think you'll
end up with capacity management issues.

If the use case is more to get some form of server isolation, then 
server-groups might be worth looking at, as these are dynamic and per user.

I can see a case for allowing more than one set of mutually exclusive host 
aggregates - at the moment that's a property implemented just for the set of 
aggregates that are designated as AZs, and generalizing that concept so that 
there can be other sets (where host overlap is allowed between sets, but not 
within a set) might be useful.
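
To show what that rule would mean in practice, here is a minimal Python sketch of a validity check. The data layout and the find_conflicts name are assumptions made for illustration, not anything in Nova: a host may appear in several sets (say, an AZ set and a switch set), but at most once within any one set.

```python
def find_conflicts(aggregate_sets):
    """Return (set_name, host, aggregates) for hosts in >1 aggregate of a set.

    aggregate_sets maps a set name to {aggregate name: set of hosts}.
    Overlap between different sets is allowed and is not reported.
    """
    conflicts = []
    for set_name, aggregates in aggregate_sets.items():
        seen = {}  # host -> aggregates of this set that contain it
        for agg_name, hosts in aggregates.items():
            for host in hosts:
                seen.setdefault(host, []).append(agg_name)
        for host, aggs in sorted(seen.items()):
            if len(aggs) > 1:
                conflicts.append((set_name, host, sorted(aggs)))
    return conflicts


# h1 is in both an AZ and a switch aggregate: fine, they are different sets.
layout = {
    "az": {"az1": {"h1", "h2"}, "az2": {"h3", "h4"}},
    "switch": {"sw1": {"h1", "h3"}, "sw2": {"h2", "h4"}},
}
print(find_conflicts(layout))

# Putting h1 into az2 as well breaks exclusivity within the "az" set.
layout["az"]["az2"].add("h1")
print(find_conflicts(layout))
```

Today's AZ behaviour corresponds to having exactly one such set; the generalization is simply allowing more than one.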

Phil

From: Murray, Paul (HP Cloud Services)
Sent: 03 April 2014 16:34
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova] Hosts within two Availability Zones : 
possible or not ?

Hi Sylvain,

I would go with keeping AZs exclusive. It is a well-established concept even if 
it is up to providers to implement what it actually means in terms of 
isolation. Some good use cases have been presented on this topic recently, but 
for me they suggest we should develop a better concept rather than bend the 
meaning of the old one. We certainly don't have hosts in more than one AZ in HP 
Cloud and I think some of our users would be very surprised if we changed that.

Paul.

From: Khanh-Toan Tran [mailto:khanh-toan.t...@cloudwatt.com]
Sent: 03 April 2014 15:53
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova] Hosts within two Availability Zones : 
possible or not ?

+1 for AZs not sharing hosts.

Because it's the only mechanism that allows us to segment the datacenter.
Otherwise we cannot provide redundancy to clients except by using Regions, which are
dedicated infrastructure and network-separated, or the anti-affinity filter, which
IMO is not pragmatic as it has a tendency towards abusive usage.  Why sacrifice this
power so that users can select the type of their desired physical hosts? The
latter can be exposed using flavor metadata, which is a lot safer and more
controllable than using AZs. If someone insists that we really need to let
users choose the types of physical hosts, then I suggest creating a new hint,
and using aggregates with it. Don't sacrifice AZ exclusivity!

Btw, there is a datacenter design called dual-room [1] which I think is the best fit
for AZs to make your cloud redundant even with one datacenter.

Best regards,

Toan

[1] IBM and Cisco: Together for a World Class Data Center, Page 141. 
http://books.google.fr/books?id=DHjJAgAAQBAJ&pg=PA141#v=onepage&q&f=false



From: Sylvain Bauza [mailto:sylvain.ba...@gmail.com]
Sent: Thursday 3 April 2014 15:52
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [Nova] Hosts within two Availability Zones : possible
or not ?

Hi,

I'm currently trying to reproduce [1]. Reproducing this bug requires having the same
host in two different aggregates, each one having an AZ.

IIRC, the Nova API prevents hosts from being part of two distinct AZs [2], so IMHO
this request should not be possible.
That said, I can identify two flaws where no validation is done:
 - when specifying an AZ in nova.conf, the host overrides the existing AZ
with its own
 - when adding a host to an aggregate without an AZ defined, and afterwards
updating the aggregate to add an AZ


So, I need direction. Either we consider that a host cannot belong to 2 AZs,
and then we need to fix the two scenarios above, or we say it's
nice to have 2 AZs for the same host, and then we both remove the validation
check in the API and fix the output issue reported in the original bug [1].


Your comments are welcome.
Thanks,
-Sylvain


[1] : https://bugs.launchpad.net/nova/+bug/1277230

[2] : 
https://github.com/openstack/nova/blob/9d45e9cef624a4a972c24c47c7abd57a72d74432/nova/compute/api.py#L3378


[openstack-dev] [nova] Server Groups are not an optional element, bug or feature ?

2014-04-07 Thread Day, Phil
Hi Folks,

Generally the scheduler's capabilities that are exposed via hints can be 
enabled or disabled in a Nova install by choosing the set of filters that are 
configured. However the server group feature doesn't fit that pattern - 
even if the affinity filter isn't configured the anti-affinity check on the 
server will still impose the anti-affinity behavior via throwing the request 
back to the scheduler.

I appreciate that you can always disable the server-groups API extension, in
which case users can't create a group (and so the server create will fail if
one is specified), but that seems kind of at odds with other types of scheduling
that have to be specifically configured in rather than out of a base system.
In particular having the API extension in by default but the ServerGroup
Affinity and AntiAffinity filters not in by default seems an odd combination
(it kind of works, but only by a retry from the host and that's limited to a
number of retries).
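
For anyone who hasn't looked at that path, here is a toy Python model of the "works only by retry" behaviour. It is a simplification under stated assumptions, not the real code: place_with_retries is a hypothetical helper, the scheduler is reduced to walking a host list with no anti-affinity filter, and max_attempts=3 is just an illustrative retry limit.

```python
def place_with_retries(candidate_hosts, group_hosts, max_attempts=3):
    """Try hosts in order; the late check rejects hosts already in the group.

    Models a scheduler with no anti-affinity filter configured, so it may
    pick a host the group already uses. Returns (host, attempts), or
    (None, attempts) when the retry budget runs out.
    """
    attempts = 0
    for host in candidate_hosts:
        if attempts >= max_attempts:
            break
        attempts += 1
        if host not in group_hosts:   # late check on the compute host
            return host, attempts
        # rejected: the request goes back to the scheduler for another try
    return None, attempts


# One group member already on h1: the second attempt lands on h2.
print(place_with_retries(["h1", "h2", "h3"], {"h1"}))

# Three busy hosts come up first: the budget is spent before h4 is tried.
print(place_with_retries(["h1", "h2", "h3", "h4"], {"h1", "h2", "h3"}))
```

The second call is the failure mode I'm worried about: a valid host exists, but the request fails because the retries ran out first.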

Given that the server group work isn't complete yet (for example the list of
instances in a group isn't tidied up when an instance is deleted) I feel a tad
worried that the current default configuration exposes this rather than keeping
it as something that has to be explicitly enabled - what do others think?

Phil




Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-28 Thread Day, Phil
 Personally, I feel it is a mistake to continue to use the Amazon concept
 of an availability zone in OpenStack, as it brings with it the
 connotation from AWS EC2 that each zone is an independent failure
 domain. This characteristic of EC2 availability zones is not enforced in
 OpenStack Nova or Cinder, and therefore creates a false expectation for
 Nova users.

I think this is backwards training, personally. I think AZs as separate failure
domains were done like that for a reason by Amazon, and make good sense.
What we've done is overload that with cells, aggregates etc., which should
have a better interface and are a different concept. Redefining well-understood
terms because they don't suit your current implementation is a slippery slope,
and overloading terms that already have a meaning in the industry is just
annoying.

+1
I don't think there is anything wrong with identifying new use cases and 
working out how to cope with them:

- First we generalized Aggregates
- Then we mapped AZs onto aggregates as a special mutually exclusive group
- Now we're recognizing that maybe we need to make the changes that support AZs
more generic so we can create additional groups of mutually exclusive aggregates

That all feels like good evolution.

But I don't see why that means we have to fit that in under the existing
concept of AZs - why can't we keep AZs as they are and have a better thing
called Zones that is just an OSAPI concept and is better than AZs?
Arguments around not wanting to add new options to create server seem a bit 
weak to me - for sure we don't want to add them in an uncontrolled way, but if 
we have a new, richer, concept we should be able to express that separately.

I'm still not personally convinced by the use cases of racks having
orthogonal power failure domains and switch failure domains - it seems to me
that from a practical perspective it becomes really hard to work out where to
separate VMs so that they don't share a failure mode.  Every physical DC
design I've been involved with tries to get the different failure domains to
align.  However if the use case makes sense to someone then I'm not against
extending aggregates to support multiple mutually exclusive groups.

I think I see a Design Summit session emerging here

Phil
 



Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-27 Thread Day, Phil
Sorry if I'm coming late to this thread, but why would you define AZs to cover
orthogonal zones?

AZs are a very specific form of aggregate - they provide a particular isolation
semantic between the hosts (i.e. physical hosts are never in more than one AZ)
- hence the "availability" in the name.

AZs are built on aggregates, and yes aggregates can overlap and aggregates are
used for scheduling.

So if you want to schedule on features as well as (or instead of) physical 
isolation, then you can already:

- Create an aggregate that contains hosts with fast CPUs
- Create another aggregate that includes hosts with SSDs
- Write (or configure in some cases) scheduler filters that look at something in
the request (such as a scheduler hint, an image property, or a flavor extra_spec)
so that the scheduler can filter on those aggregates

nova boot --availability-zone az1 --scheduler-hint want-fast-cpu 
--scheduler-hint want-ssd  ...

nova boot --availability-zone az1 --flavor 1000
(where flavor 1000 has extra spec that says it needs fast cpu and ssd)

But there is no need that I can see to make AZs overlap just to do the same
thing - that would break what everyone (including folks used to working with
AWS) expects from an AZ.
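
To illustrate the flavor route, here is a simplified Python sketch of matching flavor extra specs against aggregate metadata. It is only a model of the idea, not the actual Nova filter: hosts_matching_flavor and the metadata keys fast_cpu and ssd are made up for the example.

```python
def hosts_matching_flavor(extra_specs, aggregates):
    """Return hosts whose aggregates satisfy every flavor extra spec.

    aggregates is a list of (hosts, metadata) pairs. A host passes when, for
    each key in extra_specs, at least one of its aggregates has that value.
    """
    host_meta = {}  # host -> {key: set of values} merged over its aggregates
    for hosts, metadata in aggregates:
        for host in hosts:
            merged = host_meta.setdefault(host, {})
            for key, value in metadata.items():
                merged.setdefault(key, set()).add(value)
    return sorted(
        host for host, meta in host_meta.items()
        if all(value in meta.get(key, set()) for key, value in extra_specs.items())
    )


# Two overlapping aggregates: fast CPUs on h1/h2, SSDs on h2/h3.
aggregates = [
    ({"h1", "h2"}, {"fast_cpu": "true"}),
    ({"h2", "h3"}, {"ssd": "true"}),
]
# A flavor asking for both only matches the host that is in both aggregates.
print(hosts_matching_flavor({"fast_cpu": "true", "ssd": "true"}, aggregates))
```

The overlap lives entirely in the aggregates, so the AZ can still be chosen independently with --availability-zone.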




 -Original Message-
 From: Chris Friesen [mailto:chris.frie...@windriver.com]
 Sent: 27 March 2014 13:18
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova][scheduler] Availability Zones and Host
 aggregates..
 
 On 03/27/2014 05:03 AM, Khanh-Toan Tran wrote:
 
  Well, perhaps I didn't make it clearly enough. What I intended to say
  is that user should be able to select a set of AZs in his request,
  something like :
 
   nova boot --flavor 2 --image ubuntu --availability-zone AZ1
  --availability-zone AZ2 vm1
 
 I think it would make more sense to make the availability-zone argument
 take a comma-separated list of zones.
 
 nova boot --flavor 2 --image ubuntu --availability-zone AZ1,AZ2 vm1
 
 
 Just to clarify, in a case like this we're talking about using the
 intersection of the two zones, right?  That's the only way that makes sense
 when using orthogonal zones like hosts with fast CPUs and hosts with SSDs.
 
 Chris
 



Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-27 Thread Day, Phil
 
 The need arises when you want both zones to be used for
 scheduling when no specific zone is specified. The only way to do that is
 either to have an AZ which is a superset of the two AZs, or the other way could be
 for default_schedule_zone to take a list of zones instead of just 1.

If you don't configure a default_schedule_zone, and don't specify an
availability_zone in the request - then I thought that would make the AZ
filter in effect ignore AZs for that request.  Isn't that what you need?
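
In other words, the behaviour I have in mind is roughly the following. This Python sketch is my understanding rather than the actual filter code, and az_filter_passes is a hypothetical name; only default_schedule_zone and the requested availability zone come from the discussion above.

```python
def az_filter_passes(host_az, requested_az=None, default_schedule_zone=None):
    """Return True if a host passes a simplified availability-zone check."""
    wanted = requested_az or default_schedule_zone
    if wanted is None:
        return True            # nothing requested or configured: AZs are ignored
    return host_az == wanted


# No AZ in the request and no default configured: hosts in any AZ pass.
print(az_filter_passes("az1"), az_filter_passes("az2"))

# An explicit request restricts scheduling to that AZ.
print(az_filter_passes("az1", requested_az="az2"))
```

If that matches what the real filter does, both zones are already usable when nothing is specified, without needing a superset AZ.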



Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-27 Thread Day, Phil
 -Original Message-
 From: Vishvananda Ishaya [mailto:vishvana...@gmail.com]
 Sent: 26 March 2014 20:33
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova][scheduler] Availability Zones and Host
 aggregates..
 
 
 On Mar 26, 2014, at 11:40 AM, Jay Pipes jaypi...@gmail.com wrote:
 
  On Wed, 2014-03-26 at 09:47 -0700, Vishvananda Ishaya wrote:
  Personally I view this as a bug. There is no reason why we shouldn't
  support arbitrary grouping of zones. I know there is at least one
  problem with zones that overlap regarding displaying them properly:
 
  https://bugs.launchpad.net/nova/+bug/1277230
 
  There is probably a related issue that is causing the error you see
  below. IMO both of these should be fixed. I also think adding a
  compute node to two different aggregates with azs should be allowed.
 
  It also might be nice to support specifying multiple zones in the
  launch command in these models. This would allow you to limit booting
  to an intersection of two overlapping zones.
 
  A few examples where these ideas would be useful:
 
  1. You have 3 racks of servers and half of the nodes from each rack
  plugged into a different switch. You want to be able to specify to
  spread across racks or switches via an AZ. In this model you could
  have a zone for each switch and a zone for each rack.
 
  2. A single cloud has 5 racks in one room in the datacenter and 5
  racks in a second room. You'd like to give control to the user to
  choose the room or choose the rack. In this model you would have one
  zone for each room, and smaller zones for each rack.
 
  3. You have a small 3 rack cloud and would like to ensure that your
  production workloads don't run on the same machines as your dev
  workloads, but you also want to use zones to spread workloads across the
  three racks. Similarly to 1., you could split your racks in half via
  dev and prod zones. Each one of these zones would overlap with a rack
  zone.
 
  You can achieve similar results in these situations by making small
  zones (switch1-rack1 switch1-rack2 switch1-rack3 switch2-rack1
  switch2-rack2 switch2-rack3) but that removes the ability to decide
  to launch something with less granularity. I.e. you can't just
  specify 'switch1' or 'rack1' or 'anywhere'
 
  I'd like to see all of the following work:
  nova boot ... (boot anywhere)
  nova boot --availability-zone switch1 ... (boot in switch1 zone)
  nova boot --availability-zone rack1 ... (boot in rack1 zone)
  nova boot --availability-zone switch1,rack1 ... (boot
 
  Personally, I feel it is a mistake to continue to use the Amazon
  concept of an availability zone in OpenStack, as it brings with it the
  connotation from AWS EC2 that each zone is an independent failure
  domain. This characteristic of EC2 availability zones is not enforced
  in OpenStack Nova or Cinder, and therefore creates a false expectation
  for Nova users.
 
  In addition to the above problem with incongruent expectations, the
  other problem with Nova's use of the EC2 availability zone concept is
  that availability zones are not hierarchical -- due to the fact that
  EC2 AZs are independent failure domains. Not having the possibility of
  structuring AZs hierarchically limits the ways in which Nova may be
  deployed -- just see the cells API for the manifestation of this
  problem.
 
  I would love it if the next version of the Nova and Cinder APIs would
  drop the concept of an EC2 availability zone and introduce the concept
  of a generic region structure that can be infinitely hierarchical in
  nature. This would enable all of Vish's nova boot commands above in an
  even simpler fashion. For example:
 
  Assume a simple region hierarchy like so:
 
        regionA
        /     \
   regionB   regionC
 
  # User wants to boot in region B
  nova boot --region regionB
  # User wants to boot in either region B or region C
  nova boot --region regionA
 
 I think the overlapping zones allows for this and also enables additional use
 cases as mentioned in my earlier email. Hierarchical doesn't work for the
 rack/switch model. I'm definitely +1 on breaking from the amazon usage of
 availability zones but I'm a bit leery to add another parameter to the create
 request. It is also unfortunate that region already has a meaning in the
 amazon world which will add confusion.
 
 Vish

Ok, got far enough back down my stack to understand the drive here, and I kind 
of understand the use case, but I think what's missing is that currently we 
only allow for one group of availability zones.

I can see why you would want them to overlap in a certain way - i.e. a rack 
based zone could overlap with a switch based zone - but I still don't want 
any overlap within the set of switch based zones, or any overlap within the 
set of rack based zones.

Maybe the issue is that when we converted / mapped  AZs onto aggregates we only 
ever considered that there 

Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-27 Thread Day, Phil
 -Original Message-
 From: Chris Friesen [mailto:chris.frie...@windriver.com]
 Sent: 27 March 2014 18:15
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova][scheduler] Availability Zones and Host
 aggregates..
 
 On 03/27/2014 11:48 AM, Day, Phil wrote:
  Sorry if I'm coming late to this thread, but why would you define AZs
  to cover orthogonal zones?
 
 See Vish's first message.
 
  AZs are a very specific form of aggregate - they provide a particular
  isolation semantic between the hosts (i.e. physical hosts are never
  in more than one AZ) - hence the "availability" in the name.
 
 That's why I specified orthogonal.  If you're looking at different resources
 then it makes sense to have one host be in different AZs because the AZs are
 essentially in different namespaces.
 
 So you could have hosts in server room A vs hosts in server room B.
 Or hosts on network switch A vs hosts on network switch B.  Or hosts
 with SSDs vs hosts with disks.  Then you could specify you want to boot an
 instance in server room A, on switch B, on a host with SSDs.
 
  AZs are built on aggregates, and yes aggregates can overlap and
  aggreagtes are used for scheduling.
 
  So if you want to schedule on features as well as (or instead of)
  physical isolation, then you can already:
 
  - Create an aggregate that contains hosts with fast CPUs
  - Create another aggregate that includes hosts with SSDs
  - Write (or configure in some cases) schedule filters that look at
    something in the request (such as schedule hint, an image property,
    or a flavor extra_spec) so that the scheduler can filter on those
    aggregates
 
  nova boot --availability-zone az1 --scheduler-hint want-fast-cpu
  --scheduler-hint want-ssd  ...
 
 Does this actually work?  The docs only describe setting the metadata on the
 flavor, not as part of the boot command.
 
If you want to be able to pass it in as explicit hints then you need to write a 
filter to cope with that hint. I was using it as an example of the kind of 
relationship between hints and aggregate filtering.
The more realistic example for this kind of attribute is to make it part of the 
flavor and use the aggregate_instance_extra_spec filter - which does exactly 
this kind of filtering (for overlapping aggregates).
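
As a rough illustration of that kind of filtering, here is a simplified model in plain Python. It is not Nova's filter code; the function name `host_passes` and the metadata keys `cpu` and `disk` are assumptions made up for the example.

```python
def host_passes(host_aggregates, flavor_extra_specs):
    """True if every required key/value is offered by some aggregate the host is in."""
    for key, wanted in flavor_extra_specs.items():
        if not any(agg.get(key) == wanted for agg in host_aggregates):
            return False
    return True


# Each host lists the metadata of the (possibly overlapping) aggregates it belongs to.
hosts = {
    "host-a": [{"availability_zone": "az1"}, {"cpu": "fast"}, {"disk": "ssd"}],
    "host-b": [{"availability_zone": "az1"}, {"disk": "ssd"}],
}
# Hypothetical extra specs for "flavor 1000": needs fast CPU and SSD.
flavor_1000 = {"cpu": "fast", "disk": "ssd"}

candidates = sorted(h for h, aggs in hosts.items() if host_passes(aggs, flavor_1000))
```

Both hosts sit in the same AZ, but only the one that is also in the fast-CPU and SSD aggregates survives the filter.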


  nova boot --availability-zone az1 --flavor 1000 (where flavor 1000 has
  extra spec that says it needs fast cpu and ssd)
 
  But there is no need that I can see to make AZs overlapping just to do
  the same thing - that would break what everyone (including folks used
  to working with AWS) expects from an AZ
 
 
 As an admin user you can create arbitrary host aggregates, assign metadata,
 and have flavors with extra specs to look for that metadata.
 
 But as far as I know there is no way to match host aggregate information on a
 per-instance basis.

Matching aggregate information on a per-instance basis is what the scheduler 
filters do.

Well yes, it is down to the admin to decide what groups are going to be 
available, how to map them into aggregates, how to map that into flavors (which 
are often the link to a charging mechanism) - but once they've done that then 
the user can work within those bounds by choosing the correct flavor, image, 
etc.
 
 Also, unless things have changed since I looked at it last, as a regular user
 you can't create new flavors so the only way to associate an instance with a
 host aggregate is via an availability zone.

Well it depends on the roles you want to assign to your users really and how 
you set up your policy file, but in general you don't want users defining 
flavors, the cloud admin defines the flavors based on what makes sense from 
their environment.

 
 Chris
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [nova] [bug?] live migration fails with boot-from-volume

2014-03-08 Thread Day, Phil
I guess that would depend on whether the flavour has any ephemeral storage in 
addition to the boot volume.

Block migration should work in this case; have you tried that?




 Original message 
From: Chris Friesen
Date:08/03/2014 06:16 (GMT+00:00)
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [nova] [bug?] live migration fails with 
boot-from-volume

Hi,

I was just testing the current icehouse code and came across some
behaviour that looked suspicious.

I have two nodes, an all-in-one and a compute node.  I was not using
shared instance storage.

I created a volume from an image and then booted an instance from the
volume.  Once the image was up and running I tried to do a nova
live-migration instance and got the following error:


cfriesen@controller:/opt/stack/nova/nova/compute$ nova live-migration fromvol
ERROR: controller is not on shared storage: Live migration can not be used without shared storage. (HTTP 400) (Request-ID: req-0d8da5e4-b0ec-401d-be95-d9c4f9f7e062)


Shouldn't booting from volume count as a form of shared storage?

Chris



Re: [openstack-dev] [nova] Future of the Nova API

2014-02-28 Thread Day, Phil
 -Original Message-
 From: Chris Behrens [mailto:cbehr...@codestud.com]
 Sent: 26 February 2014 22:05
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] Future of the Nova API
 
 
 This thread is many messages deep now and I'm busy with a conference this
 week, but I wanted to carry over my opinion from the other v3 API in
 Icehouse thread and add a little to it.
 
 Bumping versions is painful. v2 is going to need to live for a long time to
 create the least amount of pain. I would think that at least anyone running a
 decent sized Public Cloud would agree, if not anyone just running any sort of
 decent sized cloud. I don't think there's a compelling enough reason to
 deprecate v2 and cause havoc with what we currently have in v3. I'd like us
 to spend more time on the proposed tasks changes. And I think we need
 more time to figure out if we're doing versioning in the correct way. If we've
 got it wrong, a v3 doesn't fix the problem and we'll just be causing more
 havoc with a v4.
 
 - Chris
 
Like Chris I'm struggling to keep up with this thread,  but of all the various 
messages I've read this is the one that resonates most with me.

My perception of the V3 API improvements (in order of importance to me):
i) The ability to version individual extensions
Crazy that small improvements can't be introduced without having to create a 
new extension, when often the extension really does nothing more than indicate 
that some other part of the API code has changed.

ii) The opportunity to get the proper separation between Compute and Network 
APIs
Being (I think) one of the few clouds that provides both the Nova and Neutron 
API this is a major source of confusion and hence support calls.

iii) The introduction of the task model
I like the idea of tasks, and think it will be a much easier way for users to 
interact with the system.   Not convinced that it couldn't co-exist in V2 
though, rather than having to co-exist as V2 and V3.

iv) Clean-up of a whole bunch of minor irritations / inconsistencies
There are lots of things that are really messy (inconsistent error codes, 
aspects of core that are linked to just Xen, etc, etc).  They annoy people the 
first time they hit them, then they code around them and move on.  Probably 
I've had more hate mail from people writing language bindings than application 
developers (who tend to be abstracted from this by the clients).


 Phil






Re: [openstack-dev] [nova] Future of the Nova API

2014-02-28 Thread Day, Phil
 -Original Message-
 From: Jay Pipes [mailto:jaypi...@gmail.com]
 Sent: 24 February 2014 23:49
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Future of the Nova API
 
 
  Similarly with a Xen vs KVM situation I don't think its an extension
  related issue. In V2 we have features in *core* which are only
  supported by some virt backends. It perhaps comes down to not being
  willing to say either that we will force all virt backends to support
  all features in the API or they don't get in the tree. Or
  alternatively be willing to say no to any feature in the API which can
  not be currently implemented in all virt backends. The former greatly
  increases the barrier to getting a hypervisor included, the latter
  restricts Nova development to the speed of the slowest developing and
  least mature hypervisor supported.
 
 Actually, the problem is not feature parity. The problem lies where two
 drivers implement the same or similar functionality, but the public API for a
 user to call the functionality is slightly different depending on which
 driver is used by the deployer.
 
 There's nothing wrong at all (IMO) in having feature disparity amongst
 drivers.

I agree with the rest of your post Jay, but I think there are some feature 
parity issues - for example having rescue always return a generated admin 
password when only some (one?) hypervisor supports actually setting the 
password is an issue. 

For some calls (create, rebuild) this can be suppressed by a conf value 
(enable_instance_password) but when I tried to get that extended to rescue in 
V2 it was blocked as it would break compatibility - either add an extension or 
only do it in V3.   So clients have to be able to cope with an optional 
attribute in the response to create/rebuild (because they can't inspect the API 
to see if the conf value is set), but can't be expected to cope with it in the 
response from rescue apparently ;-(
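
For illustration, a client that tolerates the optional attribute might look like the sketch below. The attribute name `adminPass` and the two payload shapes are assumptions for the example, not a statement of what any particular API version returns.

```python
def admin_password(payload):
    """Return the generated password if the response carries one, else None."""
    # Accept either a body wrapped in "server" or a bare body (both assumed shapes).
    body = payload.get("server", payload)
    return body.get("adminPass")


# Password generation enabled on the server side.
with_password = admin_password({"server": {"id": "abc", "adminPass": "s3cret"}})
# Password suppressed by configuration: the attribute is simply absent.
without_password = admin_password({"server": {"id": "abc"}})
```

A client written this way behaves the same whether or not the deployer has suppressed the password, which is the property the create/rebuild case already relies on.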

 Phil



Re: [openstack-dev] [nova] Future of the Nova API

2014-02-28 Thread Day, Phil
The current set of reviews on this change seems relevant to this debate:  
https://review.openstack.org/#/c/43822/

In effect a fully working and tested change which makes the nova-net / neutron 
compatibility via the V2 API that little bit closer to being complete is being 
blocked because it's thought that by not having it people will be quicker to 
move to V3 instead.

Folks this is just madness - no one is going to jump to using V3 just because 
we don't fix minor things like this in V2,  they're just as likely to start 
jumping to something completely different because that OpenStack stuff is just 
too hard to work with. Users don't think like developers, and you can't 
force them into a new API by deliberately keeping the old one bad - at least 
not if you want to keep them as users in the long term.

I can see an argument (maybe) for not adding lots of completely new features 
into V2 if V3 was already available in a stable form - but V2 already provides 
nearly complete support for nova-net features on top of Neutron.  I fail to 
see what is wrong with continuing to improve that.

Phil



Re: [openstack-dev] How do I mark one option as deprecating another one ?

2014-02-27 Thread Day, Phil
Hi Denis,

Thanks for the pointer, but I looked at that and my understanding is that it 
only allows me to retrieve a value by an old name, but doesn't let me know that 
the old name has been used.  So if all I wanted to do was change the name/group 
of the config value it would be fine.  But in my case I need to be able to 
implement:
if new_value_defined:
    do_something()
elif old_value_defined:
    warn_about_deprecation()
    do_something_else()

Specifically I want to replace tenant_name based authentication with tenant_id 
- so I need to know which has been specified.
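
A hedged sketch of that special-casing, written against a plain dict rather than oslo.config (the function name `select_tenant_auth` and the dict-based config are assumptions for the example; with oslo.config the same check could presumably be made on two separately registered options that default to None):

```python
import warnings


def select_tenant_auth(conf):
    """Pick the auth identifier, warning when only the deprecated option is set."""
    tenant_id = conf.get("tenant_id")
    tenant_name = conf.get("tenant_name")
    if tenant_id is not None:
        # New option wins whenever it is present.
        return ("tenant_id", tenant_id)
    if tenant_name is not None:
        # Old option only: warn, then take the old code path.
        warnings.warn("tenant_name is deprecated; use tenant_id",
                      DeprecationWarning, stacklevel=2)
        return ("tenant_name", tenant_name)
    raise ValueError("neither tenant_id nor tenant_name is set")
```

The caller gets back which option was used as well as its value, so it can branch on the two kinds of processing.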

Phil


From: Denis Makogon [mailto:dmako...@mirantis.com]
Sent: 26 February 2014 14:31
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] How do I mark one option as deprecating another 
one ?

Here is what the oslo.config documentation says.

Represents a Deprecated option. Here's how you can use it:

oldopts = [cfg.DeprecatedOpt('oldfoo', group='oldgroup'),
           cfg.DeprecatedOpt('oldfoo2', group='oldgroup2')]
cfg.CONF.register_group(cfg.OptGroup('blaa'))
cfg.CONF.register_opt(cfg.StrOpt('foo', deprecated_opts=oldopts),
                      group='blaa')

Multi-value options will return all new and deprecated
options.  For single options, if the new option is present
([blaa]/foo above) it will override any deprecated options
present.  If the new option is not present and multiple
deprecated options are present, the option corresponding to
the first element of deprecated_opts will be chosen.
I hope that it'll help you.

Best regards,
Denis Makogon.



[openstack-dev] How do I mark one option as deprecating another one ?

2014-02-26 Thread Day, Phil
Hi Folks,

I could do with some pointers on config value deprecation.

All of the examples in the code and documentation seem to deal with the case 
of old_opt being replaced by new_opt but still returning the same value.
Here using deprecated_name and / or deprecated_opts in the definition of 
new_opt lets me still get the value (and log a warning) if the config still 
uses old_opt.

However my use case is different because while I want to deprecate old_opt, 
new_opt doesn't take the same value and I need to do different things depending 
on which is specified, i.e. if old_opt is specified and new_opt isn't I still 
want to do some processing specific to old_opt and log a deprecation warning.

Clearly I can code this up as a special case at the point where I look for the 
options - but I was wondering if there is some clever magic in oslo.config that 
lets me declare this as part of the option definition ?



As a second point,  I thought that using a deprecated option automatically 
logged a warning, but in the latest Devstack wait_soft_reboot_seconds is 
defined as:

cfg.IntOpt('wait_soft_reboot_seconds',
           default=120,
           help='Number of seconds to wait for instance to shut down after'
                ' soft reboot request is made. We fall back to hard reboot'
                ' if instance does not shutdown within this window.',
           deprecated_name='libvirt_wait_soft_reboot_seconds',
           deprecated_group='DEFAULT'),



but if I include the following in nova.conf

libvirt_wait_soft_reboot_seconds = 20


I can see the new value of 20 being used, but there is no warning logged that 
I'm using a deprecated name ?

Thanks
Phil



Re: [openstack-dev] about the bp cpu-entitlement

2014-02-04 Thread Day, Phil
Hi,

There were a few related blueprints which were looking to add various 
additional types of resource to the scheduler - all of which will now be 
implemented on top of a new generic mechanism covered by:

https://blueprints.launchpad.net/nova/+spec/extensible-resource-tracking

 -Original Message-
 From: sahid [mailto:sahid.ferdja...@cloudwatt.com]
 Sent: 04 February 2014 09:24
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: [openstack-dev] [nova] about the bp cpu-entitlement
 
 Greetings,
 
   I saw a really interesting blueprint about cpu entitlement. It will be
 targeted for icehouse-3 and I would like to get some details about the
 progress. Does the developer need help? I can give a part of my time on it.
 
 https://blueprints.launchpad.net/nova/+spec/cpu-entitlement
 
 Thanks a lot,
 s.
 


Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service

2014-01-29 Thread Day, Phil

 -Original Message-
 From: Justin Santa Barbara [mailto:jus...@fathomdb.com]
 Sent: 28 January 2014 20:17
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances
 through metadata service
 
 Thanks John - combining with the existing effort seems like the right thing to
 do (I've reached out to Claxton to coordinate).  Great to see that the larger
 issues around quotas / write-once have already been agreed.
 
 So I propose that sharing will work in the same way, but some values are
 visible across all instances in the project.  I do not think it would be
 appropriate for all entries to be shared this way.  A few
 options:
 
 1) A separate endpoint for shared values
 2) Keys are shared iff  e.g. they start with a prefix, like 'peers_XXX'
 3) Keys are set the same way, but a 'shared' parameter can be passed, either
 as a query parameter or in the JSON.
 
 I like option #3 the best, but feedback is welcome.
 
 I think I will have to store the value using a system_metadata entry per
 shared key.  I think this avoids issues with concurrent writes, and also makes
 it easier to have more advanced sharing policies (e.g.
 when we have hierarchical projects)
 
 Thank you to everyone for helping me get to what IMHO is a much better
 solution than the one I started with!
 
 Justin
 
 
I think #1 or #3 would be fine.   I don't really like #2 - doing this kind of 
thing through naming conventions always leads to problems IMO.
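
To make option #3 concrete, here is a small in-memory sketch in which each entry carries an explicit `shared` flag instead of relying on a key prefix. The class `ProjectMetadata` and its methods are hypothetical, and the store models a single project only.

```python
class ProjectMetadata:
    """Per-project metadata where each entry is private unless marked shared."""

    def __init__(self):
        self._entries = {}  # (instance_id, key) -> (value, shared)

    def put(self, instance_id, key, value, shared=False):
        self._entries[(instance_id, key)] = (value, shared)

    def visible_to(self, instance_id):
        """Own entries plus entries other instances marked as shared."""
        view = {}
        for (owner, key), (value, shared) in self._entries.items():
            if owner == instance_id or shared:
                view.setdefault(owner, {})[key] = value
        return view


store = ProjectMetadata()
store.put("i-1", "ip", "10.0.0.1", shared=True)
store.put("i-1", "secret", "x")  # stays private to i-1
store.put("i-2", "ip", "10.0.0.2", shared=True)
seen_by_i2 = store.visible_to("i-2")
```

Because visibility is a property of the entry, renaming a key can never change who is able to read it.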

Phil





Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service

2014-01-29 Thread Day, Phil
)
 +else:
 +result[instance['key_name']] = [line]
 +return result
 +
  def get_metadata(self, ip):
  i = self.get_instance_by_ip(ip)
 +mpi = self._get_mpi_data(i['project_id'])
  if i is None:
  return None
  if i['key_name']:
 @@ -135,7 +148,8 @@ class CloudController(object):
  'public-keys' : keys,
  'ramdisk-id': i.get('ramdisk_id', ''),
  'reservation-id': i['reservation_id'],
 -'security-groups': i.get('groups', '')
 +'security-groups': i.get('groups', ''),
 +'mpi': mpi
  }
  }
  if False: # TODO: store ancestor ids
 
 
 
 
 
  On Tue, Jan 28, 2014 at 4:38 AM, John Garbutt j...@johngarbutt.com
 wrote:
  On 27 January 2014 14:52, Justin Santa Barbara jus...@fathomdb.com
 wrote:
  Day, Phil wrote:
 
 
  We already have a mechanism now where an instance can push
  metadata as a way of Windows instances sharing their passwords -
  so maybe this could build on that somehow - for example each
  instance pushes the data its willing to share with other
  instances owned by the same tenant ?
 
  I do like that and think it would be very cool, but it is much
  more complex to implement I think.
 
  I don't think its that complicated - just needs one extra attribute
  stored per instance (for example into instance_system_metadata)
  which allows the instance to be included in the list
 
 
  Ah - OK, I think I better understand what you're proposing, and I do
  like it.  The hardest bit of having the metadata store be full
  read/write would be defining what is and is not allowed
  (rate-limits, size-limits, etc).  I worry that you end up with a new
  key-value store, and with per-instance credentials.  That would be a
  separate discussion: this blueprint is trying to provide a focused
 replacement for multicast discovery for the cloud.
 
  But: thank you for reminding me about the Windows password though...
  It may provide a reasonable model:
 
  We would have a new endpoint, say 'discovery'.  An instance can POST
  a single string value to the endpoint.  A GET on the endpoint will
  return any values posted by all instances in the same project.
 
  One key only; name not publicly exposed ('discovery_datum'?); 255
  bytes of value only.
 
  I expect most instances will just post their IPs, but I expect other
  uses will be found.
 
  If I provided a patch that worked in this way, would you/others be
  on-board?
 
  I like that idea. Seems like a good compromise. I have added my
  review comments to the blueprint.
 
  We have this related blueprint going on, setting metadata on a
  particular server, rather than a group:
  https://blueprints.launchpad.net/nova/+spec/metadata-service-callbacks
 
  It is limiting things using the existing Quota on metadata updates.
 
  It would be good to agree a similar format between the two.
 
  John
 


Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service

2014-01-27 Thread Day, Phil
 
 What worried me most, I think, is that if we make this part of the standard
 metadata then everyone would get it, and that raises a couple of concerns:
 
 - Users with lots of instances (say 1000's) but who weren't trying to run any
 form of discovery would start getting a lot more metadata returned, which
 might cause performance issues

 
 The list of peers is only returned if the request comes in for peers.json, so
 there's no growth in the returned data unless it is requested.  Because of the
 very clear instructions in the comment to always pre-fetch data, it is always
 pre-fetched, even though it would make more sense to me to fetch it lazily
 when it was requested!  Easy to fix, but I'm obeying the comment because it
 was phrased in the form of a grammatically valid sentence :-)
 
Ok, thanks for the clarification - I'd missed that this was a new json object, 
I thought you were just adding the data onto the existing object.

 
 - Some users might be running instances on behalf of customers (consider
 say a PaaS type service where the user gets access into an instance but not
 to the Nova API).  In that case I wouldn't want one instance to be able to
 discover these kinds of details about other instances.


 Yes, I do think this is a valid concern.  But, there is likely to be _much_
 more sensitive information in the metadata service, so anyone doing this is
 hopefully blocking the metadata service anyway.  On EC2 with IAM, or if we
 use trusts, there will be auth token in there.  And not just for security, but
 also because if the PaaS program is auto-detecting EC2/OpenStack by looking
 for the metadata service, that will cause the program to be very confused if
 it sees the metadata for its host!

Currently the metadata service only returns information for the instance that 
is requesting it (the Neutron proxy validates the source address and project), 
so the concern around sensitive information is already mitigated.  But if 
we're now going to return information about other instances, that changes the 
picture somewhat. 


 
 We already have a mechanism now where an instance can push metadata as
 a way of Windows instances sharing their passwords - so maybe this could
 build on that somehow - for example each instance pushes the data it's
 willing to share with other instances owned by the same tenant ?
 
 I do like that and think it would be very cool, but it is much more complex to
 implement I think.

I don't think it's that complicated - it just needs one extra attribute stored 
per instance (for example in instance_system_metadata) which allows the 
instance to be included in the list.


  It also starts to become a different problem: I do think we
 need a state-store, like Swift or etcd or Zookeeper that is easily accessible
 to the instances.  Indeed, one of the things I'd like to build using this
 blueprint is a distributed key-value store which would offer that
 functionality.  But I think that having peer discovery is a much more tightly
 defined blueprint, whereas some form of shared read-write data-store is
 probably top-level project complexity.
 
Isn't the metadata already in effect that state-store ? 

  I'd just like to
 see it separate from the existing metadata blob, and on an opt-in basis
 
 Separate: is peers.json enough?  I'm not sure I'm understanding you here.

Yep - that ticks the separate box. 

 
 Opt-in:   IMHO, the danger of our OpenStack
 everything-is-optional-and-configurable approach is that we end up in a
 scenario where nothing is consistent and so nothing works out of the box.
 I'd much rather hash-out an agreement about what is safe to share, even if
 that is just IPs, and then get to the point where it is globally enabled.
 Would you be OK with it if it was just a list of IPs?

I still think that would cause problems for PaaS services that abstract the 
users away from direct control of the instance (i.e. the PaaS service is the 
Nova tenant, and creates instances in that tenant that are then made available 
to individual users).  At the moment the only data such a user can see, even 
from metadata, are details of their own instance.  Extending that to allow 
discovery of other instances in the same tenant still feels to me to be 
something that needs to be controllable.  The number of instances that 
want / need to be able to discover each other is a subset of all instances, so 
making those explicitly declare themselves to the metadata service (when they 
already have to have the logic to get peers.json) doesn't sound like a major 
additional complication to me.
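
A minimal sketch of the opt-in behaviour argued for above, assuming one boolean attribute per instance (the `Instance` type, the `discoverable` field and `peers_for` are hypothetical names, not Nova code):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    project_id: str
    ip: str
    discoverable: bool = False  # the one extra per-instance attribute


def peers_for(requester, instances):
    """IPs of other instances in the same project that opted in to discovery."""
    return sorted(
        i.ip for i in instances
        if i.project_id == requester.project_id
        and i.discoverable
        and i.instance_id != requester.instance_id
    )


instances = [
    Instance("i-1", "p1", "10.0.0.1", discoverable=True),
    Instance("i-2", "p1", "10.0.0.2", discoverable=True),
    Instance("i-3", "p1", "10.0.0.3"),                     # e.g. a PaaS customer instance
    Instance("i-4", "p2", "10.0.0.4", discoverable=True),  # different project
]
peers = peers_for(instances[0], instances)
```

Instances that never declare themselves stay invisible to their peers, which covers the PaaS case without any deployment-wide switch.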

Cheers,
Phil







Re: [openstack-dev] [TripleO] our update story: can people live with it?

2014-01-24 Thread Day, Phil
 
  Cool. I like this a good bit better as it avoids the reboot. Still, this is
  a rather large amount of data to copy around if I'm only changing a single
  file in Nova.
 
 
 I think in most cases transfer cost is worth it to know you're deploying what
 you tested. Also it is pretty easy to just do this optimization but still be
 rsyncing the contents of the image. Instead of downloading the whole thing
 we could have a box expose the mounted image via rsync and then all of the
 machines can just rsync changes. Also rsync has a batch mode where if you
 know for sure the end-state of machines you can pre-calculate that rsync and
 just ship that. Lots of optimization possible that will work fine in your
 just-update-one-file scenario.
 
 But really, how much does downtime cost? How much do 10Gb NICs and
 switches cost?
 

It's not as simple as just saying buy better hardware (although I do have a 
vested interest in that approach ;-)  - on a compute node the Network and Disk 
bandwidth is already doing useful work for paying customers.   The more 
overhead you put into that for updates, the more disruptive it becomes.

Phil 



Re: [openstack-dev] [TripleO] our update story: can people live with it?

2014-01-24 Thread Day, Phil
 On 01/22/2014 12:17 PM, Dan Prince wrote:
  I've been thinking a bit more about how TripleO updates are developing
 specifically with regards to compute nodes. What is commonly called the
 update story I think.
 
  As I understand it we expect people to actually have to reboot a compute
 node in the cluster in order to deploy an update. This really worries me
 because it seems like way overkill for such a simple operation. Let's say all I
 need to deploy is a simple change to Nova's libvirt driver. And I need to
 deploy it to *all* my compute instances. Do we really expect people to
 actually have to reboot every single compute node in their cluster for such a
 thing? And then do this again and again for each update they deploy?
 
 FWIW, I agree that this is going to be considered unacceptable by most
 people.  Hopefully everyone is on the same page with that.  It sounds like
 that's the case so far in this thread, at least...
 
 If you have to reboot the compute node, ideally you also have support for
 live migrating all running VMs on that compute node elsewhere before doing
 so.  That's not something you want to have to do for *every* little change to
 *every* compute node.


Yep, my reading is the same as yours Russell: everyone agreed that there needs 
to be an update that avoids the reboot where possible (other parts of the 
thread seem to be focused on how much further the update can be optimized).

What's not clear to me is when the plan is to have that support in TripleO - I 
tried looking for a matching Blueprint to see if it was targeted for Icehouse 
but can't match it against the five listed. Perhaps Rob or Clint can clarify?
Feels to me that this is a must-have before anyone will really be able to use 
TripleO beyond a PoC for initial deployment.








Re: [openstack-dev] [Nova] Why Nova should fail to boot if there are only one private network and one public network ?

2014-01-24 Thread Day, Phil
Hi Sylvain,

The change only makes the user have to supply a network ID if there is more 
than one private network available (and the issue there is that otherwise the 
assignment order in the Guest is random, which normally leads to all sorts of 
routing problems).
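
As a rough illustration of the rule described above (this is not the actual
Nova code; the function and exception names are hypothetical):

```python
class AmbiguousNetworkError(Exception):
    """Raised when the caller has to name a network explicitly."""


def networks_for_boot(private_network_ids, requested_ids=()):
    """Return the network IDs a new instance should be attached to."""
    if requested_ids:
        # The user was explicit, so the attachment order is theirs to decide.
        return list(requested_ids)
    if len(private_network_ids) > 1:
        # With several candidates the order in the guest would be arbitrary,
        # so refuse to guess and ask for a network ID instead.
        raise AmbiguousNetworkError(
            "more than one private network available; supply a network ID")
    # Zero or one private network: nothing ambiguous to resolve.
    return list(private_network_ids)


if __name__ == "__main__":
    # Single private network, no network info supplied: boot proceeds.
    print(networks_for_boot(["335113bf-f92f-4249-8341-45cdc9d781bf"]))
```

With a single private network the call succeeds without any network argument,
which matches the Devstack behaviour shown in this message.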

I'm running a standard Devstack with Neutron (built from trunk a couple of days 
ago), can see both a private and public network, and can boot VMs without 
having to supply any network info:

$ neutron net-list
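+--------------------------------------+---------+--------------------------------------------------+
| id                                   | name    | subnets                                          |
+--------------------------------------+---------+--------------------------------------------------+
| 16f659a8-6953-4ead-bba5-abf8081529a5 | public  | a94c6a9d-bebe-461b-b056-fed281063bc0             |
| 335113bf-f92f-4249-8341-45cdc9d781bf | private | 51b97cde-d06a-4265-95aa-d9165b7becd0 10.0.0.0/24 |
+--------------------------------------+---------+--------------------------------------------------+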

$ nova boot --image  cirros-0.3.1-x86_64-uec --flavor m1.tiny phil
+--------------------------------------+----------------------------------------------------------------+
| Property                             | Value                                                          |
+--------------------------------------+----------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                         |
| OS-EXT-AZ:availability_zone          | nova                                                           |
| OS-EXT-STS:power_state               | 0                                                              |
| OS-EXT-STS:task_state                | scheduling                                                     |
| OS-EXT-STS:vm_state                  | building                                                       |
| OS-SRV-USG:launched_at               | -                                                              |
| OS-SRV-USG:terminated_at             | -                                                              |
| accessIPv4                           |                                                                |
| accessIPv6                           |                                                                |
| adminPass                            | DaX2mcPnEK9U                                                   |
| config_drive                         |                                                                |
| created                              | 2014-01-24T13:11:30Z                                           |
| flavor                               | m1.tiny (1)                                                    |
| hostId                               |                                                                |
| id                                   | 34210c19-7a4f-4438-b376-6e65722b4bd6                           |
| image                                | cirros-0.3.1-x86_64-uec (8ee8f7af-1327-4e28-a0bd-1701e04a6ba7) |
| key_name                             | -                                                              |
| metadata                             | {}                                                             |
| name                                 | phil                                                           |
| os-extended-volumes:volumes_attached | []                                                             |
| progress                             | 0                                                              |
| security_groups                      | default                                                        |
| status                               | BUILD                                                          |
| tenant_id                            | cc6258c6a4f34bd1b79e90f41bec4726                               |
| updated                              | 2014-01-24T13:11:30Z                                           |
| user_id                              | 3a497f5e004145d494f80c0c9a81567c                               |
+--------------------------------------+----------------------------------------------------------------+

$ nova list
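+--------------------------------------+------+--------+------------+-------------+------------------+
| ID                                   | Name | Status | Task State | Power State | Networks         |
+--------------------------------------+------+--------+------------+-------------+------------------+
| 34210c19-7a4f-4438-b376-6e65722b4bd6 | phil | ACTIVE | -          | Running     | private=10.0.0.5 |
+--------------------------------------+------+--------+------------+-------------+------------------+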



From: Sylvain Bauza [mailto:sylvain.ba...@bull.net]
Sent: 23 January 2014 09:58
To: 
