Re: [Openstack] nova client support for restore from soft delete ?

2013-01-30 Thread Day, Phil
Hi Vish,

Sorry, I wasn't very clear in my original post.   I have 
reclaim_instance_interval set, and the instance does go to SOFT_DELETED.  I 
can see that the api extension adds a restore verb to the list of actions on 
an instance.

What I was trying to find out was if that additional action was available from 
the nova client, e.g. is there a nova restore xxx command ?  Looking 
through the client code I can't see one, but thought I might be missing 
something.
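
For reference, while client support is missing the action can be driven directly 
against the API - a rough sketch, assuming the restore action exposed by the 
deferred-delete extension (endpoint, token and IDs are placeholders):

    import json
    import requests

    # Placeholders - substitute real values.
    token, tenant_id, server_id = "<auth-token>", "<tenant-id>", "<server-uuid>"

    # POST the extension's "restore" action to the server's action resource.
    resp = requests.post(
        "http://nova-api:8774/v2/%s/servers/%s/action" % (tenant_id, server_id),
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        data=json.dumps({"restore": None}))
    print(resp.status_code)  # expect 202 while the instance is SOFT_DELETED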

Thanks
Phil

From: Vishvananda Ishaya [mailto:vishvana...@gmail.com]
Sent: 30 January 2013 00:32
To: Day, Phil
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] nova client support for restore from soft delete ?


On Jan 29, 2013, at 8:55 AM, Day, Phil philip@hp.com wrote:


Hi Folks,

Does the nova client provide support to restore a soft deleted instance  (and 
if not, what is the process for pulling an instance back from the brink) ?

If you have reclaim_instance_interval set then you can restore instances via an 
admin api command. If not then you are not going to have much luck reclaiming 
the instance because the drive will be deleted. If by some chance you still have 
the backing files, then you should be able to fix the db and do a hard reboot 
on the instance to get it to come back up. Fixing the db is mostly about 
setting deleted=False, but keep in mind that you will also have to manually 
restore the vif and reassociate the fixed ip, which hopefully hasn't been 
associated to a new instance.

Vish
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] nova client support for restore from soft delete ?

2013-01-29 Thread Day, Phil
Hi Folks,

Does the nova client provide support to restore a soft deleted instance  (and 
if not, what is the process for pulling an instance back from the brink) ?

Cheers,
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Using durable queues

2013-01-28 Thread Day, Phil
Hi Folks,

I'm trying to understand the configuration required to create a durable 
notification queue for billing with RabbitMQ.   As I understand it for messages 
to be durable there need to be three things:

-  The exchange has to be created as durable

-  The queue needs to be created as durable

-  Messages themselves need to be marked as durable when they are 
published.

I can see that setting rabbit_durable_queues=True would create the exchange and 
notification publisher with the options required for me to create a durable 
queue for a billing system, but I thought there was a concern in having durable 
queues in general within Nova because if the consumer fails and recovers it may 
receive the same message multiple times  - so is there some magic in the system 
which prevents all of the other topic queues from becoming durable when this 
option is set (or is there no problem with durable queues in general) ?
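
Outside of the Nova RPC code, the three pieces look roughly like this with kombu 
(a sketch only - broker URL, exchange and topic names are illustrative):

    from kombu import Connection, Exchange, Queue

    # 1) durable exchange, 2) durable queue bound to the notification topic
    exchange = Exchange("nova", type="topic", durable=True)
    queue = Queue("notifications.info", exchange,
                  routing_key="notifications.info", durable=True)

    with Connection("amqp://guest:guest@rabbit-host//") as conn:
        producer = conn.Producer()
        # 3) persistent message (delivery_mode=2) so it survives a broker restart
        producer.publish({"event_type": "compute.instance.exists"},
                         exchange=exchange, routing_key="notifications.info",
                         declare=[queue], delivery_mode=2)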

Cheers,
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Is nova-client thread safe ?

2013-01-21 Thread Day, Phil
Hi Folks,

Does anyone know if the nova-client python binding is written to be thread 
safe ?

We saw some odd behaviour when using it with multiple threads, and before 
digging deeper just thought I'd check if there were known issues, etc.
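
The obvious defensive pattern, if it turns out not to be thread safe, is to give 
each thread its own client rather than sharing one instance - a sketch 
(credentials and endpoint are placeholders):

    import threading
    from novaclient.v1_1 import client

    def worker():
        # Each thread builds its own client instead of sharing one.
        nc = client.Client("<user>", "<password>", "<tenant>",
                           "http://keystone:5000/v2.0/")
        nc.servers.list()

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()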

Thanks
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Question about flavor_access extension

2013-01-16 Thread Day, Phil
Hi Folks,

Can anyone point me to some examples of using the flavor_access extension 
please ?

I've been playing around with this in Devstack, and so far I've found that 
whilst I can create a non-public flavor and add access to a specific tenant 
(which I take to show that the extension is loaded properly):


-  It doesn't show up in the result of flavor-list, even for tenants 
that have access to it, or if the user has the admin role



-  Providing I know the ID of the flavor I can get details of it with 
nova flavor-show even if the tenant hasn't been added for access



-  Providing I know the ID of the flavor I can create an instance to 
use it even if the tenant hasn't been added for access


Looking in the code I can't see any support in the API to ever list private 
flavors, or to validate the access to a flavor, but maybe I'm looking in the 
wrong place.

Has anyone else been using this extension ?
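
For anyone wanting to reproduce the above, the extension can also be driven 
directly against the API - a rough sketch (the action and resource names are 
taken from the extension code, and the IDs and token are placeholders):

    import json
    import requests

    token, admin_tenant, flavor_id, tenant_id = (
        "<auth-token>", "<admin-tenant-id>", "<flavor-id>", "<tenant-id>")
    base = "http://nova-api:8774/v2/%s" % admin_tenant
    hdrs = {"X-Auth-Token": token, "Content-Type": "application/json"}

    # Grant a tenant access to a non-public flavor.
    requests.post(base + "/flavors/%s/action" % flavor_id, headers=hdrs,
                  data=json.dumps({"addTenantAccess": {"tenant": tenant_id}}))

    # List which tenants currently have access.
    print(requests.get(base + "/flavors/%s/os-flavor-access" % flavor_id,
                       headers=hdrs).text)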

Cheers,
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] How to create vm instance to specific compute node?

2013-01-03 Thread Day, Phil
 Note this is an admin-only ability by default and can oversubscribe the 
 compute node the instance goes on.

It is now controlled by a policy (create:forced_host) - so if you want to 
extend it to other users you can, for example, set up the policy file to 
control this via a Keystone role
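
For example, a policy.json entry along these lines (the key name is from the 
current tree and the role name is purely illustrative):

    "compute:create:forced_host": "is_admin:True or role:host_placement",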

Phil

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Jay Pipes
Sent: 27 December 2012 22:39
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] How to create vm instance to specific compute node?

No.

Use nova boot --availability_zone=nova:hostname where nova: is your 
availability zone and hostname is the hostname of the compute node you wish to 
put the instance on.

Note this is an admin-only ability by default and can oversubscribe the compute 
node the instance goes on.

Best,
-jay

On 12/27/2012 02:45 PM, Rick Jones wrote:
 Does the convention of adding --onhost--computenodename to the instance 
 name being created still work?
 
 rick jones
 
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Potential filter scheduler enhancement

2013-01-03 Thread Day, Phil
Hi Folks, and Happy New Year.

In working with the Filter Scheduler I'm considering an enhancement to make the 
final host selection stage configurable.  Whilst its sometimes fine to just 
pick the first host from the list of weighted hosts, the more general case is 
that I'd like to be able to have the scheduler pick one of the first N hosts on 
the weighted list.The specific use cases that I have in mind are:


-  On a large system there is very rarely a single ideal / optimal host 
for a particular instance to be placed on.  In practice any of the N most 
suitable hosts would be fine and allowing the scheduler to randomly pick one of 
these would add some spread for multiple requests that come in at the same 
time.  (I know we now have the retry mechanism if a particular host can't in 
fact handle a specific request - this is a complement to that rather than an 
alternative).  Of course anyone who wants to schedule to a host in strict 
weighted order would be able to configure N to be 1 (or we could keep the 
current host selection method as a separate default)


-  When creating M instances in one request we could just put each onto 
one of the first M hosts in the list (since these have all been filtered as 
being suitable) instead of having to iterate through the filter / weighting 
functions for each successive instance.

Based on this I'm thinking that such a host_selection function would replace 
the whole of the for loop at the end of the _schedule() method in 
filter_scheduler.py, and take as input the number of instances.  The default 
function would of course be the current behaviour. Before going any further 
with this thinking I wanted to get input on:


i)Do others recognise these use cases as being useful, and 
are there other similar use cases to be considered at the same time ?



ii)   Is it reasonable to make the filter scheduler 
configurable in this way, rather than creating a separate scheduler ?   (My 
feeling  is that because it would only be replacing ~10% of the current 
filter_scheduler code it would be better to not create a new scheduler)



iii) Should the configuration logic for this new function be in 
the filter_scheduler itself, or in the host_manager (which is where the filter 
and weighting functions are configured) ?
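
To make the idea a bit more concrete, here is a minimal sketch of the sort of 
selection function I have in mind (illustrative only - the name and signature 
aren't taken from the current code):

    import random

    def select_hosts(weighed_hosts, num_instances, selection_window=1):
        """Pick one host per instance from the top of the weighted list.

        With selection_window=1 this matches the current behaviour of always
        taking the best host; with a larger window each instance lands on a
        randomly chosen host from the top N, spreading concurrent requests.
        """
        selected = []
        for _ in range(num_instances):
            window = weighed_hosts[:selection_window] or weighed_hosts
            selected.append(random.choice(window))
        return selected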


Cheers,
Phil




___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Expected behaviour of --meta and --file with --config_drive=True

2012-10-22 Thread Day, Phil
Hi All,

Can someone tell me what is expected to happen for metadata and file injection 
when also specifying a config drive -  For example is the metadata file 
creation  (/meta.js) and file injection meant to still work, or get re-directed 
to the config drive (is that part of config-drive 2.0) ?

I'm trying to understand the impact of moving to a "no injection allowed" 
config - but on my current system combining the two options just results in the 
VM going to an error state (although I think that may be my issue rather than 
the code)

Thanks,
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] looking for a Nova scheduler filter plugin to boot nodes on named hosts

2012-10-16 Thread Day, Phil
Hi Christian,

For a more general solution you might want to look at the code that supports 
passing in --availability_zone=az:host (look for forced_host in 
compute/api.py).  Currently this is limited to admin, but I think that should 
be changed to be a specific action that can be controlled by policy (we have a 
change in preparation for this).

Cheers,
Phil

From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
GMI M
Sent: 16 October 2012 15:13
To: Christian Parpart
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] looking for a Nova scheduler filter plugin to boot 
nodes on named hosts

Hi Christian,

I think you might be able to use the existing filters in Essex.
For example, you can add the following lines in the nova.conf of the controller 
host (or where nova-scheduler runs) and restart nova-scheduler:

isolated_hosts=nova7
isolated_images=sadsd1e35dfe63

This will allow you to run the image with ID sadsd1e35dfe63 only on the 
compute host nova7.
You can also pass a list of compute servers in the isolated_hosts, if you have 
the need.

I certainly see the use-case for this feature, for example when you want to run 
Windows based instances and you don't want to buy a Windows datacenter license 
for each nova-compute host, but only for a few that will run Windows instances.

I hope this helps you.




On Mon, Oct 15, 2012 at 7:45 PM, Christian Parpart tra...@gmail.com wrote:
Hi all,

I am looking for a (Essex) Nova scheduler plugin that parses the 
scheduler_hints to get a hostname of the
hypervisor to spawn the actual VM on, rejecting any other node.

This allows us to explicitly spawn a VM on a certain host (yes, there are 
really use cases where you want that). :-)

I was trying to build my own and searching around since I couldn't believe I 
was the only one, but didn't find one yet.

Does anyone of you maybe have the skills to actually write that simple plugin, 
or even maybe knows where such
a plugin has already been developed?

Many thanks in advance,
Christian Parpart.

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] looking for a Nova scheduler filter plugin to boot nodes on named hosts

2012-10-16 Thread Day, Phil
Hi Christian,

I'm not sure it's a great use case for a scheduler filter, as the filters are 
cumulative (so each filter only gets the hosts that the other filters in the 
stack have agreed are ok), and even before you get to the filters the host 
manager will take out from the list any of the hosts which are marked as 
disabled (and one of our use cases is to be able to schedule to a disable host).

The current forcing via the az name seems to me to be in the right place 
architecturally - so if we can change that from being admin only to gated on a 
specific keystone role (after all we still want to control who can do this) 
then it seems to me that all the rest of the capability is already there.

If you want to schedule to a specific subset of hosts then take a look at host 
aggregates - these were generalised in Folsom specifically to support this, and 
the scheduler is already aggregate aware.
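
That said, if you do want to experiment with a filter the interface is small - a 
bare-bones sketch along the lines of the existing filters (class and method names 
as per the Folsom tree, so treat it as illustrative and check it against your 
release):

    from nova.scheduler import filters

    class HostnameFilter(filters.BaseHostFilter):
        """Only pass hosts named in the request's scheduler hints."""

        def host_passes(self, host_state, filter_properties):
            hints = filter_properties.get('scheduler_hints') or {}
            wanted = hints.get('hostnames')
            if not wanted:
                return True          # no hint given, don't restrict anything
            if not isinstance(wanted, list):
                wanted = [wanted]
            return host_state.host in wanted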

Cheers,
Phil

From: Christian Parpart [mailto:tra...@gmail.com]
Sent: 16 October 2012 15:35
To: Day, Phil
Cc: GMI M; openstack@lists.launchpad.net
Subject: Re: [Openstack] looking for a Nova scheduler filter plugin to boot 
nodes on named hosts

Hey all,

many thanks for your replies so far.

In general, I must say that there really is an absolute need for explicit 
provisioning, that is, deciding by the admin what single host to prefer (but 
still to reject when there are just no resources left, of course).

- Filter like SameHost filter only works when there is already a host, and then 
you've to look up and built up the correlation first (not a big problem, but 
doesn't feel comfortable).
- IsolatedHosts filter doesn't make that much sense, as we are using one 
general template-%{TIMESTAMP} to bootstrap some node and then set up 
everything else inside, and we usually still may have more than one VM on that 
compute node (e.g. a memcached VM and a postfix VM).
- availability zones, so I got told, are deprecated already (dunno) and I can't 
give every compute node a different availability zone, as - tbh - that's what I 
have hostnames for :-)

Philip, I'd really like to dive into developing such a plugin, let's call it 
HostnameFilter-plugin, where the operator can pass one (or a set of) hostname(s) 
that the VM is allowed to spawn on.
However, I just wrote Python once, and even dislike the syntax a bit, not 
saying I hate it, but still :-)

Is there any guide (/tutorial) for reference (or hello_world nova scheduler 
plugin) I can look at to learn on how to write such a plugin?

Many thanks for your replies so far,
Christian Parpart.
On Tue, Oct 16, 2012 at 4:22 PM, Day, Phil philip@hp.com wrote:
Hi Christian,

For a more general solution you might want to look at the code that supports 
passing in --availability_zone=az:host (look for forced_host in 
compute/api.py).  Currently this is limited to admin, but I think that should 
be changed to be a specific action that can be controlled by policy (we have a 
change in preparation for this).

Cheers,
Phil

From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of GMI M
Sent: 16 October 2012 15:13
To: Christian Parpart
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] looking for a Nova scheduler filter plugin to boot 
nodes on named hosts

Hi Christian,

I think you might be able to use the existing filters in Essex.
For example, you can add the following lines in the nova.conf of the controller 
host (or where nova-scheduler runs) and restart nova-scheduler:

isolated_hosts=nova7
isolated_images=sadsd1e35dfe63

This will allow you to run the image with ID sadsd1e35dfe63 only on the 
compute host nova7.
You can also pass a list of compute servers in the isolated_hosts, if you have 
the need.

I certainly see the use-case for this feature, for example when you want to run 
Windows based instances and you don't want to buy a Windows datacenter license 
for each nova-compute host, but only for a few that will run Windows instances.

I hope this helps you.



On Mon, Oct 15, 2012 at 7:45 PM, Christian Parpart tra...@gmail.com wrote:
Hi all,

I am looking for a (Essex) Nova scheduler plugin that parses the 
scheduler_hints to get a hostname of the
hypervisor to spawn the actual VM on, rejecting any other node.

This allows us to explicitly spawn a VM on a certain host (yes, there are 
really use cases where you want that). :-)

I was trying to build my own and searching around since I couldn't believe I 
was the only one, but didn't find one yet.

Does anyone of you maybe have the skills to actually write that simple plugin, 
or even maybe knows where such
a plugin has already been developed?

Many thanks in advance,
Christian Parpart

Re: [Openstack] Versioning for notification messages

2012-10-10 Thread Day, Phil
Thanks Doug,  I'll make sure I get to that session then.

Phil

From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Doug Hellmann
Sent: 09 October 2012 23:03
To: Eric Windisch
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Versioning for notification messages


On Tue, Oct 9, 2012 at 4:12 PM, Eric Windisch e...@cloudscaling.com wrote:



On Tuesday, October 9, 2012 at 15:58 PM, David Ripton wrote:

 On 10/09/2012 01:07 PM, Day, Phil wrote:

  What do people think about adding a version number to the notification
  systems, so that consumers of notification messages are protected to
  some extent from changes in the message contents ?

Right now, there is no appropriate or acceptable way to consume notifications. 
Plainly, while notifications exist, they shouldn't be considered stable or even 
usable until this exists.

Message formats and versioning should be a consideration in the effort to 
create a reusable consumption pattern.

This will be part of what is covered during the "Using the message bus for 
messages" session Tuesday afternoon at the summit.

http://summit.openstack.org/cfp/details/117

Doug
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Versioning for notification messages

2012-10-10 Thread Day, Phil
Hi All,

I guess I may have mis-stated the problem a tad in talking about version 
numbering.  The notification system is an outbound interface, and my interest 
is in being able to write consumers with some guarantee that they won't be 
broken as the notification message format evolves.   

Having a version number gives the client a way to know that it may now be 
broken, but that's not really the same as having an interface with some degree 
of guaranteed compatibility,

Phil

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
David Ripton
Sent: 09 October 2012 20:59
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] Versioning for notification messages

On 10/09/2012 01:07 PM, Day, Phil wrote:

 What do people think about adding a version number to the notification 
 systems, so that consumers of notification messages are protected to 
 some extent from changes in the message contents ?

 For example, would it be enough to add a version number to the 
 messages - or should we have the version number as part of the topic 
 itself (so that the notification system can provide both a 1.0 and 1.1 feed), 
 etc ?

Putting a version number in the messages is easy, and should work fine. 
  Of course it only really helps if someone writes clients that can deal with 
multiple versions, or at least give helpful error messages when they get an 
unexpected version.

I think using separate topics for each version would be inefficient and 
error-prone.

Inefficient because you'd have to send out multiples of each message, some of 
which would probably never be read.  Obviously, if you're sending out N copies 
of each message then you expect only 1/N the queue performance.  Worse, if 
you're sending out N copies of each message but only 1 of them is being 
consumed, your queue server is using a lot more memory than it needs to, to 
hold onto old messages that nobody needs. 
(If you properly configure a high-water mark or timeout, then the old messages 
will eventually be thrown away.  If you don't, then your queue server will 
eventually consume way too much memory and start swapping, your cloud will 
break, and someone will get paged at 2 a.m.)

Error-prone because someone would end up reusing the notification queue code 
for less idempotent/safe uses of queues, like internal API calls. 
And then client A would pick up the message from topic_v1, and client B would 
pick up the same message from topic_v2, and they'd both perform the same API 
operation, resulting in wasted resources in the best case and data corruption 
in the worst case.

-- 
David Ripton   Red Hat   drip...@redhat.com

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Folsom nova-scheduler race condition?

2012-10-10 Thread Day, Phil

 Per my understanding, this shouldn't happen no matter how (fast) you create 
 instances since the requests are
 queued and scheduler updates resource information after it processes each 
 request.  The only possibility may cause 
the problem you met that I can think of is there are more than 1 scheduler 
doing scheduling.

I think the new retry logic is meant to be safe even if there is more than one 
scheduler, as the requests are effectively serialised when they get to the 
compute manager, which can then reject any that break its actual resource 
limits ?

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Huang Zhiteng
Sent: 10 October 2012 04:28
To: Jonathan Proulx
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Folsom nova-scheduler race condition?

On Tue, Oct 9, 2012 at 10:52 PM, Jonathan Proulx j...@jonproulx.com wrote:
 Hi All,

 Looking for a sanity test before I file a bug.  I very recently 
 upgraded my install to Folsom (on top of Ubuntu 12.04/kvm).  My 
 scheduler settings in nova.conf are:

 scheduler_available_filters=nova.scheduler.filters.standard_filters
 scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,
 ComputeFilter 
 least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost
 _fn
 compute_fill_first_cost_fn_weight=1.0
 cpu_allocation_ratio=1.0

 This had been working to fill systems based on available RAM and to 
 not exceed 1:1 allocation ration of CPU resources with Essex.  With 
 Folsom, if I specify a moderately large number of instances to boot or 
 spin up single instances in a tight shell loop they will all get 
 schedule on the same compute node well in excess of the number of 
 available vCPUs . If I start them one at a time (using --poll in a 
 shell loop so each instance is started before the next launches) then 
 I get the expected allocation behaviour.

Per my understanding, this shouldn't happen no matter how (fast) you create 
instances since the requests are queued and scheduler updates resource 
information after it processes each request.  The only possibility may cause 
the problem you met that I can think of is there are more than
 1 scheduler doing scheduling.
 I see https://bugs.launchpad.net/nova/+bug/1011852 which seems to 
 attempt to address this issue but as I read it that fix is based on 
 retrying failures.  Since KVM is capable of over committing both CPU 
 and Memory I don't seem to get retryable failure, just really bad 
 performance.

 Am I missing something with this fix or perhaps there's a reported bug 
 I didn't find in my search, or is this really a bug no one has 
 reported?

 Thanks,
 -Jon

 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp



--
Regards
Huang Zhiteng

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Versioning for notification messages

2012-10-10 Thread Day, Phil
Whilst a version number would allow a consumer to detect that something has 
changed, it doesn't really help in providing any kind of backward compatibility.
Consider the following scenario:  There are a bunch of systems external to Nova 
developed to consume notification messages, and someone introduces a change to 
the notification system that changes the message content.   They do the right 
thing in updating the version number, but all of those external systems now 
need to change as well.   The new version number lets them fail explicitly, but 
it could still have a significant impact on a production system.

Where I'd like to get to is making the notification system a formal external 
interface, with the same degree of stability, version control, and rigor 
around changes that the inbound API has.
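
By way of illustration only (this isn't the current message format), the sort of 
thing a consumer would want to be able to rely on is a version field alongside 
the payload:

    # Illustrative shape only - field names are not from the current code.
    notification = {
        'message_id': '<uuid>',
        'event_type': 'compute.instance.create.end',
        'payload_version': '1.1',   # bumped whenever the payload changes
        'payload': {
            'instance_id': '<uuid>',
            'memory_mb': 512,
        },
    }

    # A consumer can then refuse (or adapt) anything it doesn't understand.
    if notification['payload_version'].split('.')[0] != '1':
        raise ValueError('unsupported notification version')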

I would guess that Ceilometer will have some requirement around this ?

Phil

From: Diego Parrilla Santamaría [mailto:diego.parrilla.santama...@gmail.com]
Sent: 10 October 2012 09:18
To: Day, Phil
Cc: David Ripton; openstack@lists.launchpad.net
Subject: Re: [Openstack] Versioning for notification messages

If we want to have a notification system that could handle messages with 
different payloads and different versions, we have two options:

1) detect the version of the payload in the notification message
2) add a version number in the notification message

Option 1 sounds to me like something hard to maintain. Option 2 seems to be 
correct way to do it in the long term.

+1 for a version number in the notification message

Cheers
Diego
 --
Diego Parrilla
CEO
www.stackops.com | diego.parri...@stackops.com | +34 649 94 43 29 | skype:diegoparrilla




On Wed, Oct 10, 2012 at 9:27 AM, Day, Phil philip@hp.com wrote:
Hi All,

I guess I may have mis-stated the problem a tad in talking about version 
numbering.  The notification system is an outbound interface, and my interest 
is in being able to write consumers with some guarantee that they won't be 
broken as the notification message format evolves.

Having a version number gives the client a way to know that it may now be 
broken, but that's not really the same as having an interface with some degree 
of guaranteed compatibility,

Phil

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of David Ripton
Sent: 09 October 2012 20:59
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] Versioning for notification messages
On 10/09/2012 01:07 PM, Day, Phil wrote:

 What do people think about adding a version number to the notification
 systems, so that consumers of notification messages are protected to
 some extent from changes in the message contents ?

 For example, would it be enough to add a version number to the
 messages - or should we have the version number as part of the topic
 itself (so that the notification system can provide both a 1.0 and 1.1 feed), 
 etc ?

Putting a version number in the messages is easy, and should work fine.
  Of course it only really helps if someone writes clients that can deal with 
multiple versions, or at least give helpful error messages when they get an 
unexpected version.

I think using separate topics for each version would be inefficient and 
error-prone.

Inefficient because you'd have to send out multiples of each message, some of 
which would probably never be read.  Obviously, if you're sending out N copies 
of each message then you expect only 1/N the queue performance.  Worse, if 
you're sending out N copies of each message but only 1 of them is being 
consumed, your queue server is using a lot more memory than it needs to, to 
hold onto old messages that nobody needs.
(If you properly configure a high-water mark or timeout, then the old messages 
will eventually be thrown away.  If you don't, then your queue server will 
eventually consume way too much memory and start swapping, your cloud will 
break, and someone will get paged at 2 a.m.)

Error-prone because someone would end up reusing the notification queue code 
for less idempotent/safe uses of queues, like internal API calls.
And then client A would pick up the message from topic_v1, and client B would 
pick up the same message from topic_v2, and they'd both perform the same API 
operation, resulting in wasted resources in the best case and data corruption 
in the worst case.

--
David Ripton   Red Hat   drip...@redhat.com

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

Re: [Openstack] Discussion / proposal: Ability to reset tenant's quotas to default

2012-10-09 Thread Day, Phil
Hi Glynn,

My point was that if a user is currently configured to have a quota of 50 VMs, 
and the default is currently configured to be 20 VMs then there is a difference 
between configuring the user to have a quota of 20 and configuring a user to 
have the default quota.The first is just a subsequent update to give a 
user a different but still specific quota value, whereas the second undoes any 
quota value that has been specifically assigned.   And the intent here is the 
second case.

Perhaps quota delete would be a more appropriate description (and the right way 
to implement this in the API) ?

Cheers,
Phil 



-Original Message-
From: Eoghan Glynn [mailto:egl...@redhat.com] 
Sent: 09 October 2012 17:32
To: Day, Phil
Cc: openstack@lists.launchpad.net; Vijaya Erukala
Subject: Re: [Openstack] Discussion / proposal: Ability to reset tenant's 
quotas to default



 Isn't that just changing one custom limit with another ?
 
 A true reset to the defaults would see the user stay in step with any 
 changes to the default values.

Do you mean configured changes to the defaults?

AFAIK 'nova quota-defaults' returns the current set of defaults, which seems to 
be the logical point to reset against.


  HI All,
  
  
  
  I would like to open a discussion on a topic: users should have an 
  option to reset the tenant's quotas (to the default).
 
 
 Hi Vijaya,
 
 I don't think a new nova command is needed for this use-case, just
 add a simple custom script:
 
  nova quota-update `nova quota-defaults $1 | tail -n +4 | tr '_' '-' | awk '/\|/ {printf(" --%s %s", $2, $4)}'` $1
 
 then call with the tenant ID as command line arg.
 
 Cheers,
 Eoghan
   
 
  
  
  
  At present nova client has following commands for the quota
  operation.
  
  
  
  $nova --help | grep quota
  
  quota-defaults List the default quotas for a tenant.
  
  quota-show List the quotas for a tenant.
  
  quota-update Update the quotas for a tenant.
  
  
  
  It will be very helpful to have a command to reset quota values to
  defaults .
  
  For ex: User who wants to do huge tests on the system and rollback
  once the test is done.
  
  So my proposal is to add a new command quota-reset to the nova
  client
  which reverts the quota value supplied for the tenant ,to the
  default.
  
  Something similar to nova quota-reset(<tenant-id> <key>)
  
  Let me know your suggestion/thoughts on the same.
  
  
  
  Thanks,
  
  Vijaya
  
  
  
  DISCLAIMER == This e-mail may contain privileged and
  confidential information which is the property of Persistent
  Systems
  Ltd. It is intended only for the use of the individual or entity to
  which it is addressed. If you are not the intended recipient, you
  are
  not authorized to read, retain, copy, print, distribute or use this
  message. If you have received this communication in error, please
  notify the sender and delete all copies of this message.
  Persistent Systems Ltd. does not accept any liability for virus
  infected mails.
  ___
  Mailing list: https://launchpad.net/~openstack
  Post to : openstack@lists.launchpad.net
  Unsubscribe : https://launchpad.net/~openstack
  More help   : https://help.launchpad.net/ListHelp
 
 
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Folsom nova-scheduler race condition?

2012-10-09 Thread Day, Phil
Hi Jon,

I believe the retry is meant to occur not just if the spawn fails, but also if 
a host receives a request which it can't honour because it already has too many 
VMs running or in progress of being launched.   

Maybe try reducing your filters down a bit (standard_filters means all 
filters I think) in case there is some odd interaction between that full set ?

Phil


-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Jonathan Proulx
Sent: 09 October 2012 15:53
To: openstack@lists.launchpad.net
Subject: [Openstack] Folsom nova-scheduler race condition?

Hi All,

Looking for a sanity test before I file a bug.  I very recently upgraded my 
install to Folsom (on top of Ubuntu 12.04/kvm).  My scheduler settings in 
nova.conf are:

scheduler_available_filters=nova.scheduler.filters.standard_filters
scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
compute_fill_first_cost_fn_weight=1.0
cpu_allocation_ratio=1.0

This had been working to fill systems based on available RAM and to not exceed 
1:1 allocation ration of CPU resources with Essex.  With Folsom, if I specify a 
moderately large number of instances to boot or spin up single instances in a 
tight shell loop they will all get schedule on the same compute node well in 
excess of the number of available vCPUs . If I start them one at a time (using 
--poll in a shell loop so each instance is started before the next launches) 
then I get the expected allocation behaviour.

I see https://bugs.launchpad.net/nova/+bug/1011852 which seems to attempt to 
address this issue but as I read it that fix is based on retrying failures.  
Since KVM is capable of over committing both CPU and Memory I don't seem to get 
retryable failure, just really bad performance.

Am I missing something with this fix, or perhaps there's a reported bug I didn't 
find in my search, or is this really a bug no one has reported?

Thanks,
-Jon

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Discussion / proposal: deleted column marker

2012-10-03 Thread Day, Phil
I *think* deleted flavours used to be needed as there could still be instances 
running against them and the flavour definition was used by the quota 
calculations.  Not sure if this is still the case, or if the data now comes 
straight from the instances table.Some aspects of a flavour (e.g. 
rxtx_factor) could be useful to a scheduler, and that data currently isn't 
saved into the instance.

I guess the usage audit type functionality (i.e. tell me about all instances 
that have run sometime in this audit period) may be another case where deleted 
instances are required at the moment.



-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Pitucha, Stanislaw Izaak
Sent: 03 October 2012 13:09
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] Discussion / proposal: deleted column marker

Hi Johannes,
I know the names collide here, but since this technique is known as 
soft-deletes... We need more namespaces :)

Thanks for the idea of grepping for read_deleted. Fortunately I think the 
situation isn't as bad as it would seem. Let me group the places which change 
read_deleted in the code (many results from grep are comments).
Reading only deleted entries, or all:

- xenserver (instance) - cleanup tool - I don't do xen, so I'm not sure how 
useful is it. Anyone?
- tests - can be ignored - if there is no functionality, tests can be killed
- sqlalchemy api (instance) - fixed ip can reference a deleted instance (tricky 
situation; from the commit message: It adds a test  to verify that the code 
works with a duplicate deleted floating_ip - this seems very
wrong...)
- sqlalchemy api (iscsi) - getting deleted iscsi targets which are still 
referenced by volume
- sqlalchemy api (various) - instance migration, s3image, bandwidth, storage 
manager, flavors (only available from nova-manage)
- compute manager (instance) - reaping deleted instances - I can't see why the 
same logic wouldn't apply if the rows are actually missing (needs analysis, 
maybe there's a reason)
- compute instance_types (flavour) - apparently flavours are usually read even 
if they're deleted
- network manager (instance) - making sure that ips/networks can be removed 
even if the instance is already deleted

So here's what I can see: pretty much all the usage is about deleting instances 
or making sure parts connected to instances go away if the instance is deleted 
earlier. It doesn't seem right, but could be progressively fixed. It looks like 
another state of the instance, which could be integrated into the other state 
fields.

Nothing else uses the deleted column explicitly (unless I missed something - 
possible). Ips, networks, keys, anything that actually goes away permanently 
(and doesn't involve a big chain of cleanup events) is never read back once 
it's marked as deleted.
So maybe a better approach is not to remove the deleted column completely, but 
to start stripping it from places where it's not needed (fixed, floating ips, 
networks, ssh keys being good candidates). This could be done by creating a new 
layer over NovaBase and removing the deleted marker from NovaBase itself. Or 
maybe even by creating a SoftDeleteMixin and applying it to all current models, 
then removing it where unnecessary? That would keep the current behaviour where 
it's currently needed, but at the same time it would provide a known migration 
path to get rid of it.
We could start stripping the new layer then table by table and adding unique 
constraints where they make sense, before trying to tackle the really tricky 
parts (for instances table, maybe the marker actually makes sense? maybe not? - 
it's definitely not going to be an easy decision/fix)
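
For what it's worth, the mixin idea is only a few lines - a rough sketch 
(illustrative, not existing Nova code):

    import datetime

    from sqlalchemy import Boolean, Column, DateTime

    class SoftDeleteMixin(object):
        """Carries the soft-delete columns, so they can be mixed into the
        models that still need the marker and dropped everywhere else."""
        deleted = Column(Boolean, default=False)
        deleted_at = Column(DateTime)

        def soft_delete(self):
            # Mark the row instead of removing it.
            self.deleted = True
            self.deleted_at = datetime.datetime.utcnow()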

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard


-Original Message-
From: openstack-bounces+stanislaw.pitucha=hp@lists.launchpad.net
[mailto:openstack-bounces+stanislaw.pitucha=hp@lists.launchpad.net] On
Behalf Of Johannes Erdfelt
Sent: Tuesday, October 02, 2012 6:12 PM
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] Discussion / proposal: deleted column marker

On Tue, Oct 02, 2012, Pitucha, Stanislaw Izaak stanislaw.pitu...@hp.com
wrote:
 Does anyone know why soft-delete is still in place?
 Are there any reasons it can't / shouldn't be removed at this time?
 If it's possible to remove it, would you miss it?

I'm certainly not a fan of the database soft-delete for many of the same
reasons you've described, but there are some places where removing it would
require code changes.

Off the top of my head would be pretty much anywhere a context sets
read_deleted to 'yes' or 'only', which is a surprising number of places now
that I've done a grep.

I suspect at least a handful of those cases don't need the functionality and
others probably use it as a crutch around other problems.

JE


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

[Openstack] Project specific flavors

2012-10-02 Thread Day, Phil
Hi Folks,

Can someone point me to where Nova uses the instance_type_projects information 
to decide which flavors are and aren't available to a project please ?

I can see how the flavor_access API extension sets up entries in the table, 
but I don't see anything which seems to take that into account when listing 
flavors ?

Thanks
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Generalised host aggregates in Folsom

2012-09-19 Thread Day, Phil
Hi Folks,

Trying to catch-up  (I'm thinking of changing my middle name to catch-up :)  ) 
with the generalisation of host aggregates - and looking at the code it looks 
to me as if the chain for adding a host to an aggregate still ends up calling 
the virt layer

api/openstack/compute/contrib/aggregates/AggregateController/action()
compute/api/AggregateAPI/add_host_to_aggregate()
RPC
compute/manager/add_aggregate_host()
virt/add_to_aggregate()

I thought the change was to be able to create aggregates that can be linked to 
a hypervisor concept, but could also just be a way of tagging hosts into 
pools for other scheduler reasons - am I missing something ?

Thanks,
Phil



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Generalised host aggregates in Folsom

2012-09-19 Thread Day, Phil
Thanks Joe,

I was anticipating something more complex to be able to say when an aggregate 
should or shouldn't be linked to the hypevisor and overlooked the obvious.

So just to make sure I've  got it - on libvirt systems an aggregate can be used 
for anything (because of the NoOp in the driver), but on xen systems it's still 
linked to the hypervisor pools ?

Thanks
Phil

From: Joe Gordon [mailto:j...@cloudscaling.com]
Sent: 19 September 2012 19:02
To: Day, Phil
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Generalised host aggregates in Folsom


On Wed, Sep 19, 2012 at 10:18 AM, Day, Phil philip@hp.com wrote:
Hi Folks,

Trying to catch-up  (I'm thinking of changing my middle name to catch-up :)  ) 
with the generalisation of host aggregates - and looking at the code it looks 
to me as if the chain for adding a host to an aggregate still ends up calling 
the virt layer

api/openstack/compute/contrib/aggregates/AggregateController/action()
compute/api/AggregateAPI/add_host_to_aggregate()
RPC
compute/manager/add_aggregate_host()
virt/add_to_aggregate()

I thought the change was to be able to create aggregates that can be linked to 
a hypervisor concept, but could also just be a way of tagging hosts into 
pools for other scheduler reasons - am I missing something ?

The RPC component is there to ensure XenAPI still works.  In the libvirt 
driver, add_to_aggregate() is a noop.

So you can create an aggregate that can be linked to a hypervisor but also as a 
way to tag hosts
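
For example (IDs and names are illustrative - check "nova help aggregate-create" 
for the exact syntax on your release):

    nova aggregate-create fast-pool nova       # aggregate name + availability zone
    nova aggregate-set-metadata 1 ssd=true     # tag the pool for the scheduler
    nova aggregate-add-host 1 compute-01       # on libvirt this is just a DB update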


Thanks,
Phil




___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Which volume API in Nova ?

2012-09-11 Thread Day, Phil
Thanks Vish,

So are both maintained at present - for example if there was a bug fix to 
volume creation would it be applied to the VolumeController in both places ?

I'm just trying to work out how best to provide compatibility as we roll 
forwards - seems like for some period we may need to have both the compute 
extension and the Volume API server running.

Phil

From: Vishvananda Ishaya [mailto:vishvana...@gmail.com]
Sent: 10 September 2012 18:08
To: Day, Phil
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Which volume API in Nova ?

Prior to creating our own endpoint for volumes inside of nova (one of the first 
steps in the transition that happened right before the essex release), volume 
control was done by compute extensions. We left these extensions in case anyone 
was using them. They should be roughly functionally equivalent, but the compute 
extension is located at:
http://host:8774/os-volumes (host and port of the compute endpoint)
and the volume api is at:
http://host:8776/volumes (host and port of the volume endpoint)

Vish

On Sep 10, 2012, at 8:34 AM, Day, Phil philip@hp.com wrote:


Hi Folks,

I know things are in transition right now from Nova to Cinder, but can someone 
shed light on the difference between api.openstack.compute.contrib.volumes 
and api.openstack.volume ?

Thanks
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Which volume API in Nova ?

2012-09-10 Thread Day, Phil
Hi Folks,

I know things are in transition right now from Nova to Cinder, but can someone 
shed light on the difference between api.openstack.compute.contrib.volumes 
and api.openstack.volume ?

Thanks
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] display_name, hostname, and multiple instances

2012-08-30 Thread Day, Phil
Hi Folks,

I'm trying to understand the difference between display_name and hostname 
in the instances table, and struggling a bit to track their use through the 
code.   It looks to me as if:

display_name is always the name specified by the user

hostname is a sanitized version of this (always converted to 
lowercase, etc)
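
(By sanitized I mean something roughly like the following - illustrative only, 
not the actual Nova code:)

    import re

    def sanitize_hostname(display_name):
        # Lowercase, turn spaces/underscores into dashes, strip anything
        # else that isn't valid in a hostname, and trim the length.
        hostname = display_name.lower()
        hostname = re.sub('[ _]', '-', hostname)
        hostname = re.sub('[^a-z0-9.-]', '', hostname)
        return hostname.strip('.-')[:63]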


Is that correct, and if so:


-  Is there anything which makes sure that the sanitized hostname is 
always unique for a particular customer (or are they meant to understand and 
anticipate the consequences of this sanitization)


-  How is the hostname set when creating more than one instance at the 
same time (I thought I'd remembered seeing something in the code sometime back 
to add an integer to the hostname, but I can't seem to see anything in the 
current code that does this)



Thanks
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Default rules for the 'default' security group

2012-08-29 Thread Day, Phil
The HPCS portal does this for you via the Nova API when the account is created 
– we haven’t implemented it as a specific Nova feature.
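
Roughly speaking the provisioning step just adds rules to the tenant's 'default' 
group via the python bindings - a sketch (credentials, endpoint and the exact 
rule set are illustrative):

    from novaclient.v1_1 import client

    nc = client.Client("<user>", "<password>", "<tenant>",
                       "http://keystone:5000/v2.0/")
    default = [g for g in nc.security_groups.list() if g.name == "default"][0]

    # Open ssh, http, https and ping to the world for the default group.
    for proto, low, high in [("tcp", 22, 22), ("tcp", 80, 80),
                             ("tcp", 443, 443), ("icmp", -1, -1)]:
        nc.security_group_rules.create(default.id, ip_protocol=proto,
                                       from_port=low, to_port=high,
                                       cidr="0.0.0.0/0")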

From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Shake Chen
Sent: 24 August 2012 01:54
To: Gabriel Hurley
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Default rules for the 'default' security group

Now in HPcloud, have this feature.

all the new user, the default security group would open 80,22, 443 and icmp.

On Fri, Aug 24, 2012 at 2:02 AM, Gabriel Hurley gabriel.hur...@nebula.com wrote:
I traced this through the code at one point looking for the same thing. As it 
stands, right now there is *not* a mechanism for customizing the default 
security group’s rules. It’s created programmatically the first time the rules 
for a project are retrieved with no hook to add or change its characteristics.

I’d love to see this be possible, but it’s definitely a feature request.


-  Gabriel

From: openstack-bounces+gabriel.hurley=nebula@lists.launchpad.net 
[mailto:openstack-bounces+gabriel.hurley=nebula@lists.launchpad.net] On Behalf Of Boris-Michel Deschenes
Sent: Thursday, August 23, 2012 7:59 AM
To: Yufang Zhang; openstack@lists.launchpad.net
Subject: Re: [Openstack] Default rules for the 'default' security group

I’m very interested in this, we run essex and have a very bad workaround for 
this currently, but it would be great to be able to do this (set default rules 
for the default security group).

Boris

From: openstack-bounces+boris-michel.deschenes=ubisoft@lists.launchpad.net 
[mailto:openstack-bounces+boris-michel.deschenes=ubisoft@lists.launchpad.net] On Behalf Of Yufang Zhang
Sent: 23 August 2012 08:43
To: openstack@lists.launchpad.net
Subject: [Openstack] Default rules for the 'default' security group

Hi all,

Could I ask how to set the default rules for the 'default' security group for 
all the users in openstack? Currently, the 'default' security group has no rule 
by default, thus newly created instances could only be accessed by instances 
from the same group.

Is there any method to set default rules (such as ssh or icmp) for the 'default' 
security group for all users in openstack, so that I don't have to remind 
new users to modify the security group settings the first time they log into 
openstack and create instances?  I have tried the HP cloud, which is built on 
openstack, and they permit ssh or ping to the instances in the 'default' security 
group.

Best Regards.

Yufang

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp



--
Shake Chen

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Default policy for flavorextraspecs and flavorextradata

2012-08-01 Thread Day, Phil
Hi Folks,

Looking at the current policy.json file both flavorextraspecs and 
flavorextradata are set to [], whereas flavormanage is [[rule:admin_api]]

Seems to me that all three of these should be admin_api - or am I missing 
something ?
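
i.e. I'd expect all three to look something like the following (using the 
existing list-of-lists rule format; the exact key names should be checked 
against the shipped policy.json):

    "compute_extension:flavormanage": [["rule:admin_api"]],
    "compute_extension:flavorextradata": [["rule:admin_api"]],
    "compute_extension:flavorextraspecs": [["rule:admin_api"]],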

Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Instance stuck in deleting state with error

2012-07-31 Thread Day, Phil
Sorry for a dumb question, but can someone point me to where the authorization 
is configured to determine who does and doesn't get access to these actions 
please ?

Thanks,
Phil

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Karajgi, Rohit
Sent: 31 July 2012 15:17
To: Wolfgang Hennerbichler; openstack@lists.launchpad.net
Subject: Re: [Openstack] Instance stuck in deleting state with error

Hi Wolfgang, 

Have you updated your python-novaclient? The 'nova reset-state server 
--active' command is pretty much there. It is an admin action in Nova 
extensions.
$ nova help | grep reset
reset-state Reset the state of an instance


Regards,
Rohit Karajgi | Lead Engineer | NTT Data Global Technology Services Private Ltd 
| w. +91.20.6604.1500 x 378 |  m. +91 992.242.9639 | rohit.kara...@nttdata.com



-Original Message-
From: openstack-bounces+rohit.karajgi=nttdata@lists.launchpad.net 
[mailto:openstack-bounces+rohit.karajgi=nttdata@lists.launchpad.net] On 
Behalf Of Wolfgang Hennerbichler
Sent: Tuesday, July 31, 2012 10:45 AM
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] Instance stuck in deleting state with error

On 07/30/2012 09:35 PM, Kevin L. Mitchell wrote:

 That said, be aware that there is a reset-state command to 
 novaclient, so that you can do Chris's recommended reset without 
 having to muck around with the database directly.

where?
nova help | grep reset
yields nothing.
I think this is one of openstack's worst weaknesses: if an instance ends up in 
the error state, one has to wade through a couple of logfiles 
(scheduler, nova-network, nova-compute) in order to find out what really 
happened. It would be superior if the error itself were reported back to the 
database.

Wolfgang

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

__
Disclaimer:This email and any attachments are sent in strictest confidence for 
the sole use of the addressee and may contain legally privileged, confidential, 
and proprietary data.  If you are not the intended recipient, please advise the 
sender by replying promptly to this email and then delete and destroy this 
email and any attachments without any further use, copying or forwarding

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Capacity based scheduling: What updated free_ram_mb in Folsom

2012-07-13 Thread Day, Phil
Hi Folks,

I was reviewing a code change to add generic retries for build failures ( 
https://review.openstack.org/#/c/9540/2 ), and wanted to be sure that it 
wouldn't invalidate the capacity accounting used by the scheduler.


However I've been sitting here for a while working through the Folsom scheduler 
code trying to understand how the capacity based scheduling now works, and I'm 
sure I'm missing something obvious but I just can't work out where the 
free_ram_mb value in the compute_node table gets updated.



I can see the database API method to update the values, 
compute_node_utilization_update(), but it doesn't look as if anything in the 
code ever calls it.



From when I last looked at this / various discussions here and at the design 
summits I thought the approach was that:

-  The scheduler would make a call (rather than a cast) to the compute 
manager, which would then do some verification work, update the DB table whilst 
in the context of that call, and then start a thread to complete the spawn.  
The need to go all the way to the compute node as a call was to avoid race 
conditions from multiple schedulers.  (the change I'm looking at is part of a 
blueprint to avoid such a race, so maybe I imagined the change from cast to 
call ?)



-  On a delete, the capacity_notifier (which had to be configured into 
the list_notifier) would detect the delete message, and decrement the database 
values.



But now I look through the code it looks as if the scheduler is still doing a 
cast (scheduler/driver),  and although I can see the database api call to 
update the values, compute_node_utilization_update(),  it doesn't look as if 
anything in the code ever calls that ?



The ram_filter scheduler seems to use the free_ram_mb value, and that value 
seems to come from the host_manager in the scheduler which is read from the 
Database,  but I can't for the life of me work out where these values are 
updated in the Database.
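
For reference, the consumption side is clear enough - the filter is in essence 
doing something like the sketch below (illustrative only, not the exact Folsom 
code) - so the question is purely about what writes free_ram_mb in the first 
place:

    # Sketch of a RAM filter: pass hosts with enough free RAM for the flavour.
    class RamFilter(object):
        def host_passes(self, host_state, filter_properties):
            requested_ram = filter_properties['instance_type']['memory_mb']
            # host_state.free_ram_mb ultimately comes from the compute_node table
            return host_state.free_ram_mb >= requested_ram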



The capacity_notifier, which used to decrement values on a VM deletion only 
(according to the comments the increment was done in the scheduler) seems to 
have now disappeared altogether in the move of the notifier to openstack/common 
?



So I'm sure I'm missing some other even more cunning plan on how to keep the 
values current, but I can't for the life of me work out what it is - can 
someone fill me in please ?



Thanks,

Phil

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Nova Cells

2012-07-13 Thread Day, Phil
Hi Chris,

What happens to notifications to other compute servers that are generated as 
a side effect of VM creation as a result of using the iptables firewall driver ?
Are they somehow propagated to other Cells, or is there something that keeps 
all VMs in a particular security group within a Cell ?

I looked in the doc and code, but I couldn't see anything that seems to 
indicate either way.

Thanks
Phil


From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Chris Behrens
Sent: 12 July 2012 20:13
To: Michael J Fork
Cc: openstack@lists.launchpad.net; Chris Behrens
Subject: Re: [Openstack] Nova Cells

Partially developed.  This probably isn't much use, but I'll throw it out 
there: http://comstud.com/cells.pdf

ATM the messy code speaks for itself here:

https://github.com/comstud/nova/tree/cells_service

The basic architecture is:

Top level cell with API service has DB, rabbit, and the nova-cells service.  
API's compute_api_class is overridden to use a new class that shoves every 
action on an instance into the nova-cells service, telling it which cell to 
route the request to based on instance['cell_name'].  The nova-cells service 
routes the request to correct cell as requested... 1 hop at a time to the 
nova-cells service in each child.

(Each child runs this new nova-cells service also)

If nova-cells service gets a message destined for itself, it'll call the 
appropriate compute_api call in the child.
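
(Purely to illustrate the hop-at-a-time routing described here - not the actual 
cells code - assuming cell names are expressed as a '!'-separated path:)

    def next_hop(route, current_cell):
        """Return the next cell to forward a message to, or None if this cell
        is the target.  E.g. next_hop('top!region1!cellA', 'top') -> 'region1'.
        """
        hops = route.split('!')
        if current_cell not in hops or hops[-1] == current_cell:
            return None                     # message is for this cell
        return hops[hops.index(current_cell) + 1]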

DB updates are hooked in the child and pushed up to parent cells.

New instance creation is slightly different.  API will create the DB entry up 
front... and pass the uuid and all of the same data to the nova-cells service, 
which will pick a cell for the instance.  When it is decided to use the 
'current cell' in some child, it will create the DB entry there as well... push 
a notification upward... and cast the message over to the host scheduler 
(current scheduler).  And the build continues as normal from there (host is 
picked, and message is casted to the host, etc).

There's some code to sync instances in case of lost DB updates.. but there's 
improvements to make yet..

Sorry... that's very quick.  I'm going to be AFK for a couple days..

- Chris


On Jul 12, 2012, at 10:39 AM, Michael J Fork wrote:



Outside of the Etherpad (http://etherpad.openstack.org/FolsomComputeCells) and 
presentation referenced there (http://comstud.com/FolsomCells.pdf), are there 
additional details available on the architecture / implementation of Cells?

Thanks.

Michael

-
Michael Fork
Cloud Architect, Emerging Solutions
IBM Systems  Technology Group
___
Mailing list: https://launchpad.net/~openstack
Post to : 
openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] bw_usage counters

2012-07-11 Thread Day, Phil
Hi All,

I'm looking at the network bandwidth code with a view to how the current 
framework could be made to work with libvirt, and I have a couple of 
questions that hopefully someone familiar with the Xen implementation can 
answer:


-  Do the Xen counters get reset after they are read, or are the values 
always cumulative ?   (I'm guessing the latter as they seem to be just 
overwritten by the periodic task).



-  It looks as if the table is intended to provide a set of values per 
instance_uuid/mac combination (presumably to have counters per NIC), but the 
code which updates the entries looks like it always just updates the first 
entry it finds for a particular uuid:

    bwusage = model_query(context, models.BandwidthUsage,
                          session=session, read_deleted="yes").\
                          filter_by(start_period=start_period).\
                          filter_by(uuid=uuid).\
                          first()

    if not bwusage:
        ...
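
If the intent really is one row per instance/mac combination, I'd have expected 
the lookup to key on the MAC as well - a hedged sketch, reusing the helpers 
quoted above:

    bwusage = model_query(context, models.BandwidthUsage,
                          session=session, read_deleted="yes").\
                          filter_by(start_period=start_period).\
                          filter_by(uuid=uuid).\
                          filter_by(mac=mac).\
                          first()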

Thanks,
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Setting VM passwords when not running on Xen

2012-07-05 Thread Day, Phil
 -Original Message-
 From: openstack-bounces+john.garbutt=eu.citrix@lists.launchpad.net
 [mailto:openstack-bounces+john.garbutt=eu.citrix.com@lists.launchpad.n
 et]
 On Behalf Of Thierry Carrez
 Sent: Wednesday, July 4, 2012 10:33 AM
 To: openstack@lists.launchpad.net
 Subject: Re: [Openstack] Setting VM passwords when not running on Xen
 
 Scott Moser wrote:
  Is it for some reason not possible to have code that runs on first 
  instance boot that reads the metadata service (or config drive) and 
  sets the password appropriately?
 
 I see no reason why you could not. Windows scripting supported both 
 running scripts at boot and setting user passwords last time I looked 
 :)
 

From a security perspective we want to keep the un-encrypted password (or an 
encrypted password and the means to decrypt it) out of Nova - hence generating 
it inside the VM and encrypting with the public key during boot seems stronger.

   

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Time for a UK Openstack User Group meeting ?

2012-07-04 Thread Day, Phil
Hi All,

I'm thinking it's about time we had an OpenStack User Group meeting in the UK, 
and would be interested in hearing from anyone interested in attending, 
presenting, helping to organise, etc.

London would seem the obvious choice, but we could also host here in HP Bristol 
if that works for people.

Reply here or e-mail me directly (phil@hp.com), and if there's enough 
interest I'll pull something together.

Phil Day
Compute Tech Lead
HP Cloud Services


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Setting VM passwords when not running on Xen

2012-07-03 Thread Day, Phil
Hi Folks,

Is anyone else looking at how to support images that need a password rather 
than an ssh key (windows) on hypervisors that don't support set_admin_password 
(e.g. libvirt) ?

Thanks
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Setting VM passwords when not running on Xen

2012-07-03 Thread Day, Phil
Thanks John,

One approach we were wondering about is to have an agent in Windows which:


o   Generates a random password and sets it for the admin account

o   Gets the public ssh key from the metadata service

o   Encrypts the password with the public key

o   Pushes the encrypted password back to the metadata server (requires the 
metadata server to support push)

The user can then get the encrypted password from the API and decrypt it with 
their private key

The advantage would be that the clear text password never leaves the VM, so 
there are fewer security concerns about Nova having access to clear text 
passwords.

It would also seem to be a small change in the metadata service and no change 
in the API layer - not sure if there are concerns about what a VM could break 
if it updates its own metadata, but I guess we could also limit what values can 
be set.
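
To make the flow concrete, here is a rough sketch of such an agent (purely 
illustrative: the password-push URL is hypothetical, since that is exactly the 
metadata-service change being proposed, and the crypto/library choices are just 
one possible option):

    import base64
    import secrets
    import string
    import subprocess
    import urllib.request

    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    METADATA = 'http://169.254.169.254/latest/meta-data'
    PASSWORD_URL = 'http://169.254.169.254/openstack/password'   # hypothetical

    def make_password(length=16):
        alphabet = string.ascii_letters + string.digits
        return ''.join(secrets.choice(alphabet) for _ in range(length))

    def set_admin_password(password):
        # stock Windows tool, run inside the guest
        subprocess.check_call(['net', 'user', 'Administrator', password])

    def fetch_public_key():
        with urllib.request.urlopen(METADATA + '/public-keys/0/openssh-key') as r:
            return r.read()

    def encrypt_for_user(ssh_pub_key, password):
        pub = serialization.load_ssh_public_key(ssh_pub_key)
        # PKCS#1 v1.5 so the user can decrypt with their private key and openssl
        return pub.encrypt(password.encode(), padding.PKCS1v15())

    if __name__ == '__main__':
        pw = make_password()
        set_admin_password(pw)
        blob = encrypt_for_user(fetch_public_key(), pw)
        urllib.request.urlopen(urllib.request.Request(
            PASSWORD_URL, data=base64.b64encode(blob), method='POST'))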

Thoughts ?

Phil



From: John Garbutt [mailto:john.garb...@citrix.com]
Sent: 03 July 2012 16:41
To: Day, Phil; openstack@lists.launchpad.net (openstack@lists.launchpad.net) 
(openstack@lists.launchpad.net)
Subject: RE: Setting VM passwords when not running on Xen

This seemed to crop up quite a lot in different sessions at the Design summit. 
I am certainly interested in a standard way to inject information into VMs.

What I think we need is a cross hypervisor two-way guest communication channel 
that is fairly transparent to the user of that VM (i.e. ideally not a network 
connection).

If I understand things correctly, we currently have these setup ideas:

* Config Drive (not supported by XenAPI, and not a two-way transport)

* Cloud-Init / Metadata service (depends on DHCP(?), and not a two-way 
transport)

But to set the password, we ideally want two way communication. We currently 
have these:

* XenAPI guest plugin (XenServer specific, uses XenStore, but two way, 
no networking assumed )

* Serial port (used by http://wiki.libvirt.org/page/Qemu_guest_agent 
but not supported on XenServer)

I like the idea of building a common interface (maybe write out to a known file 
system location) for the above two hypervisor specific mechanisms. The agent 
should be able to pick which mechanism works. Then on top of that, we could 
write a common agent that can be shared for all the different hypervisors. You 
could also fallback to the metadata service and config drive when no two way 
communication is available.

I would love this Guest Agent to be an OpenStack project that can then be up 
streamed into many Linux distribution cloud images.

Sadly, I don't have any time to work on this right now, but hopefully that will 
change in the near future.

Cheers,
John

From: openstack-bounces+john.garbutt=eu.citrix@lists.launchpad.net 
[mailto:openstack-bounces+john.garbutt=eu.citrix@lists.launchpad.net] On Behalf Of Day, Phil
Sent: 03 July 2012 3:07
To: openstack@lists.launchpad.net (openstack@lists.launchpad.net) 
(openstack@lists.launchpad.net)
Subject: [Openstack] Setting VM passwords when not running on Xen

Hi Folks,

Is anyone else looking at how to support images that need a password rather 
than an ssh key (windows) on hypervisors that don't support set_admin_password 
(e.g. libvirt) ?

Thanks
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Nova and asynchronous instance launching

2012-07-02 Thread Day, Phil
Hi Chris,

Thanks for the pointer on the new notification on state change stuff, I'd 
missed that change.

Is there a blueprint or some such which describes the change ?   

 In particular I'm trying to understand how the bandwidth_usage values fit in 
here.  It seems that during a VM creation there would normally be a number of 
fairly rapid state changes, so re-calculating the bandwidth_usage figures might 
be quiet expensive jut to log a change in task_state from say Networking to 
Block Device Mapping. I was kind of expecting that to be more part of the 
compute.exists messages than the update.

Do we have something that catalogues the various notification messages and 
their payloads ?

Thanks,
Phil



-Original Message-
From: Chris Behrens [mailto:cbehr...@codestud.com] 
Sent: 02 July 2012 00:14
To: Day, Phil
Cc: Jay Pipes; Huang Zhiteng; openstack@lists.launchpad.net
Subject: Re: [Openstack] Nova and asynchronous instance launching



On Jul 1, 2012, at 3:04 PM, Day, Phil philip@hp.com wrote:

 Rather than adding debug statements could we please add additional 
 notification events (for example a notification event whenever task_state 
 changes)
 

This has been in trunk for a month or maybe a little longer.

FYI

- Chris

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Nova and asynchronous instance launching

2012-07-01 Thread Day, Phil
Rather than adding debug statements could we please add additional notification 
events (for example a notification event whenever task_state changes)

Anyone that wants log file entries could then use the log_notifier, but those 
that want to get information like this back into a central system can then use 
rabbit_notifier.

Maybe we need some way of configuring filters on the notifier stream for those 
that want to decide which events should be logged, sent to MQ, or ignored 
altogether.
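
For concreteness, the kind of event I have in mind would be emitted something 
like this (an illustrative sketch - the exact notifier import path and 
signature have moved around between releases):

    from nova.openstack.common.notifier import api as notifier

    def notify_task_state_change(context, instance, old_state, new_state):
        payload = {'instance_id': instance['uuid'],
                   'old_task_state': old_state,
                   'new_task_state': new_state}
        notifier.notify(context,
                        notifier.publisher_id('compute'),
                        'compute.instance.update',
                        notifier.INFO,
                        payload)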

Phil   

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Jay Pipes
Sent: 29 June 2012 18:47
To: Huang Zhiteng
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Nova and asynchronous instance launching

On 06/29/2012 04:25 AM, Huang Zhiteng wrote:
 Sound like a performance issue.  I think this symptom can be much 
 eased if we spend sometime fixing whatever bottleneck causing this 
 (slow AMQP, scheduler, or network)?  Now that Nova API has got 
 multprocess enabled, we'd move to next bottleneck in long path of 
 'launching instance'.
 Devin, is it possible that you provide more details about this issue 
 so that someone else can reproduce it?

Actually, Vish, David Kranz and I had a discussion about similar stuff on IRC 
yesterday. I think that an easy win for this would be to add much more 
fine-grained DEBUG logging statements in the various nova service pieces -- 
nova-compute, nova-network, etc. Right now, there are areas that seem to look 
like performance or locking culprits (iptables save/restore for example), but 
because there isn't very fine-grained logging statements, it's tough to say 
whether:

a) A process (or greenthread) has simply yielded to another while it waits for 
something

b) A process is doing something that is blocking

or

c) A process is doing some other work but no log statements are being logged 
about that work, which makes it seem like some other work is taking much longer 
than it really is

This would be a really easy win for a beginner developer or someone looking for 
something to assist with -- simply add informative
LOG.debug() statements at various points in the API call pipelines

Best,
-jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Nova and asynchronous instance launching

2012-06-29 Thread Day, Phil
However, considering the unhappy-path for a second, is there a place
for surfacing some more context as to why the new instance unexpectedly
went into the ERROR state?


I assume the philosophy is that the API has validated the request as far as it 
can, and returned any meaningful error messages, etc.   Anything that fails 
past that point is something going wrong from the cloud provider and there is 
nothing the user could have done to avoid the error, so any additional 
information won't help them.

However on the basis that up-front validation is seldom perfect, and things can 
change while a request is in flight I think that being able to tell a user 
that, for example, their request failed because the image was deleted before it 
could be downloaded would be useful.

One approach might be to make the task_state more granular and use that to 
qualify the error.   In general our users have found having the state shown as 
vm_state (task_state) was useful as it shows progress during things like 
building.

Phil



From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Doug Davis
Sent: 29 June 2012 12:45
To: Eoghan Glynn
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Nova and asynchronous instance launching


Right - examining the current state isn't a good way to determine what happened 
with one particular request.  This is exactly one of the reasons some providers 
create Jobs for all actions.  Checking the resource later to see why 
something bad happened is fragile since other operations might have happened 
since then, erasing any error message type of state info.  And relying on 
event/error logs is hard since correlating one particular action with a flood 
of events is tricky - especially in a multi-user environment where several 
actions could be underway at once.  If each action resulted in a Job URI being 
returned then the client can check that Job resource when it's convenient for 
them - and this could be quite useful in both happy and unhappy situations.

And to be clear, a Job doesn't necessarily need to be a full new resource, it 
could (under the covers) map to a grouping of event log entries, but the point 
is that from a client's perspective they have an easy mechanism (e.g. issue a 
GET to a single URI) that returns all of the info needed to determine what 
happened with one particular operation.

thanks
-Doug
__
STSM |  Standards Architect  |  IBM Software Group
(919) 254-6905  |  IBM 444-6905  |  d...@us.ibm.com
The more I'm around some people, the more I like my dog.

Eoghan Glynn <egl...@redhat.com>

06/29/2012 06:00 AM

To: Doug Davis/Raleigh/IBM@IBMUS
Cc: openstack@lists.launchpad.net, Jay Pipes <jaypi...@gmail.com>
Subject: Re: [Openstack] Nova and asynchronous instance launching

 Note that I do distinguish between a 'real' async op (where you
 really return little more than a 202) and one that returns a
 skeleton of the resource being created - like instance.create() does
 now.

So the latter approach at least provides a way to poll on the resource
status, so as to figure out if and when it becomes usable.

In the happy-path, eventually the instance status transitions to
ACTIVE and away we go.

However, considering the unhappy-path for a second, is there a place
for surfacing some more context as to why the new instance unexpectedly
went into the ERROR state?

For example even just an indication that failure occurred in the scheduler
(e.g. resource starvation) or on the target compute node. Is the thought
that such information may be operationally sensitive, or just TMI for a
typical cloud user?

Cheers,
Eoghan

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] compute_rpcapi ?

2012-06-28 Thread Day, Phil
Hi All,

At the risk of sounding badly behind the curve once again, can someone point me 
to the Blueprint that describes why we now have the compute/rpcapi layer 
between compute/api and compute/manager please ?    I'm guessing that it's 
something to do with API versioning, but a simple overview would be really 
helpful.

Thanks,
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Nova API Specification

2012-05-30 Thread Day, Phil
Hi Folks,

I was looking for the full definition of the API requests, and I'm a tad 
confused by what I find here:

http://api.openstack.org/

Specifically for Server Create there is both a Server - Create and a Server 
- Extended Create, although as far as I can see the extended create isn't 
actually an extension as such (the additional parameters are supported in the 
core servers module).

Also there seem to be a number of parameter values that aren't specified in 
either interface entry, such as:

min_count
max_count
networks
key_name

So is the API document intended to be:


-  A formal specification of the Interface

-  A set of examples  (but if you want the details you need to read the 
code)

Are there plans to define the validation semantics of interface parameters ?

I have another specific question on what seems to be an inconsistency between 
the XML and JSON output of get server details:

The XML response defines the names of networks as values within the addresses 
section:

<addresses>
    <network id="public">
        <ip version="4" addr="67.23.10.132"/>

But in the JSON response it looks as if the network name is a structural 
element of the response:
"addresses": {
    "public": [
        {
            "version": 4,
            "addr": "67.23.10.132"
        },

i.e. depending on the value of the label field in the networks table of the 
nova database the structure of the JSON response seems to change.  (I may not be 
expressing that very well; my point is that "addresses" is fixed by the API 
definition, but "public" is defined per implementation.)

Is this a known issue ?

Cheers,
Phil




___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [Compute] nova-compute does not show up as :-) in nova-manage service list

2012-05-16 Thread Day, Phil
So the things to check are:


-  Is the nova-compute service running ?  If not, the nova-compute.log 
should show why it's failing


-  If it is running then it probably means that it's getting stuck on 
some long-running issue (e.g. downloading an image, problems talking to 
libvirt, slow response from the DB, etc). XXX in this case means that it's 
slow in updating its services entry rather than failed as such


Phil

From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Leander Bessa Beernaert
Sent: 16 May 2012 12:19
To: Gurjar, Unmesh
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] [Compute] nova-compute does not show up as :-) in 
nova-manage service list

Shouldn't nova-network and nova-scheduler also fail then, since they share the 
same config file? If so, that's not what's happening with me, only the compute 
service is listed as XXX.
On Wed, May 16, 2012 at 12:10 PM, Gurjar, Unmesh 
unmesh.gur...@nttdata.com wrote:
Hi Leander,

The issue is the Compute is not updating its heartbeat (services table in the 
nova database), which is causing this.
You probably need to check the database connection string in the nova.conf on the 
Compute host and that the db connection from the Compute host is working.

Thanks & Regards,
Unmesh Gurjar | Lead Engineer | Vertex Software Private Ltd. | w. +91.20.6604.1500 x 379 | m. +91.982.324.7631 | unmesh.gur...@nttdata.com | Follow us on Twitter @NTTDATAAmericas

From: openstack-bounces+unmesh.gurjar=nttdata@lists.launchpad.net 
[mailto:openstack-bounces+unmesh.gurjar=nttdata@lists.launchpad.net] On Behalf Of Leander Bessa Beernaert
Sent: Wednesday, May 16, 2012 4:30 PM
To: openstack@lists.launchpad.net
Subject: [Openstack] [Compute] nova-compute does not show up as :-) in 
nova-manage service list

Hello,


I can't get nova-compute to show up as :-) under 'nova-manage service list'. 
I've checked the logs and can't find any error or warning.

I'm using the default packages shipped with ubuntu 12.04 and have installed 
everything in a virtual machine.

Regards,

Leander

__
Disclaimer:This email and any attachments are sent in strictest confidence for 
the sole use of the addressee and may contain legally privileged, confidential, 
and proprietary data. If you are not the intended recipient, please advise the 
sender by replying promptly to this email and then delete and destroy this 
email and any attachments without any further use, copying or forwarding

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [nova] why does notification use a topic exchange instead of fanout?

2012-05-09 Thread Day, Phil
Hi Doug,

 I think you missed my main point, which was that a topic exchange does
 not impose a limitation that only one client can consume a given
 notification.  That's only true if each client is consuming from the
 same queue bound to the exchange.

So just to be clear, if I understand you correctly within the nova service/rpc 
abstraction layers the code is set up so that all services do bind to the same 
queue, and hence we get the round-robin delivery.
But, if someone wanted to write a separate notification consumer so that they 
didn't block anyone else from seeing the same messages then they (the consumer) 
should create a new queue on the existing topic exchange.

Is that correct - and is there any worked example of doing this ?

I thought within the nova code both the exchange and topic queues were set up 
by the consumer (so for example all compute_managers try to create the 
compute exchange and topic queue, but it's only created by the first one and 
the others connect to the same queue).   In that context I'm finding it hard to 
see how to change this model to have multiple notify.info topic queues into 
the same exchange ?
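
My best guess at what such a consumer would look like is below (a hedged sketch 
with kombu, untested, assuming the default 'nova' control exchange and the 
notifications.info routing key):

    from kombu import BrokerConnection, Exchange, Queue

    nova_exchange = Exchange('nova', type='topic', durable=False)
    my_queue = Queue('billing_notifications',        # our own queue name
                     exchange=nova_exchange,
                     routing_key='notifications.info',
                     durable=False)

    def on_message(body, message):
        print('%s %s' % (body.get('event_type'), body.get('payload')))
        message.ack()

    with BrokerConnection('amqp://guest:guest@localhost//') as conn:
        with conn.Consumer(my_queue, callbacks=[on_message]):
            while True:
                conn.drain_events()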

Cheers,
Phil




From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Doug Hellmann
Sent: 08 May 2012 23:34
To: Russell Bryant
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] [nova] why does notification use a topic exchange 
instead of fanout?


On Tue, May 8, 2012 at 6:04 PM, Russell Bryant 
rbry...@redhat.com wrote:
On 05/08/2012 05:59 PM, Doug Hellmann wrote:
 Here is a relevant section pulled out of the amqp 0-9-1 spec:

3.1.3.3 The Topic Exchange Type

The topic exchange type works as follows:

1. A message queue binds to the exchange using a routing
   pattern, P.
2. A publisher sends the exchange a message with the routing
   key R.
3. The message is passed to the message queue if R matches P.

The routing key used for a topic exchange MUST consist of zero or
more words delimited by dots. Each word may contain the letters A-Z
and a-z and digits 0-9.

The routing pattern follows the same rules as the routing key with
the addition that * matches a single word, and # matches zero or
more words. Thus the routing pattern *.stock.# matches the routing
keys usd.stock and eur.stock.db but not stock.nasdaq.

 In nova, for a given topic such as 'scheduler', all of the consumers are
 binding to the same queue on the topic exchange, resulting in
 round-robin delivery to each of the consumers.  If instead you make a
 new queue, you can get your own copy of each message.

 There is an additional benefit of using a topic exchange here.  The
 topic used for notifications is 'notifications.priority'.  That means
 that when you create your queue, you can set it up to receive all
 notifications, or only notifications of a certain priority.


 Topic exchanges make a lot of sense for messages that should only be
 consumed once, such as tasks. Notifications are different. Lots of
 different clients might want to know that some event happened in the
 system. The way things are in Nova today, they can't. The first client
 who consumes a notification message will prevent all of the other
 clients from seeing that message at all.
I think you missed my main point, which was that a topic exchange does
not impose a limitation that only one client can consume a given
notification.  That's only true if each client is consuming from the
same queue bound to the exchange.

Yes, that wasn't obvious from any of the kombu documentation I've seen so far. 
I'll keep looking.

Thanks,
Doug


 I can change Nova's notification system to use a fanout exchange (in
 impl_kombu.py changing the exchange type used by NotifyPublisher), but
 before I submit a patch I want to make sure the current implementation
 using a topic exchange wasn't selected deliberately for some reason.
I think using a fanout exchange would be a downgrade.  As I mentioned
before, a topic exchange allows you to create a queue to get all
notifications or only notifications of a specific priority.  If the
exchange type is changed to fanout, it's everybody gets everything, and
that's it.

--
Russell Bryant

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [nova] why does notification use a topic exchange instead of fanout?

2012-05-09 Thread Day, Phil
OK, get that so far - so both consumers need to declare and use the same 
exchange.

But if I understand the next step right, to get multiple consumers of 'info' 
notification messages they would all need to create separate 
notifications.info queues into that exchange.And isn't that exactly what 
Nova currently does to create a shared queue ?

Phil

From: Kiall Mac Innes [mailto:ki...@managedit.ie]
Sent: 09 May 2012 10:51
To: Day, Phil
Cc: openstack@lists.launchpad.net; Russell Bryant; Doug Hellmann
Subject: Re: [Openstack] [nova] why does notification use a topic exchange 
instead of fanout?


Your own queue listener should attempt to declare the exchange, using the same 
settings as Nova does.

If the exchange exists, it's a noop. Otherwise it's created for you.

After that, if you start up Nova, it will do the same and reuse your exchange.

Obviously this works both ways, and either nova or your code can declare the 
exchange.

AMQP is designed to be a configuration-less protocol, where resources are 
configured by the first consumer attempting to use them.

Thanks,
Kiall

Sent from my phone.
On May 9, 2012 9:52 a.m., Day, Phil 
philip@hp.com wrote:
Hi Doug,

 I think you missed my main point, which was that a topic exchange does
 not impose a limitation that only one client can consume a given
 notification.  That's only true if each client is consuming from the
 same queue bound to the exchange.

So just to be clear, if I understand you correctly within the nova service/rpc 
abstraction layers the code is set up so that all services do bind to the same 
queue, and hence we get the round-robin delivery.
But, if someone wanted to write a separate notification consumer so that they 
didn't block anyone else from seeing the same messages then they (the consumer) 
should create a new queue on the existing topic exchange.

Is that correct - and is there any worked example of doing this ?

I thought within the nova code both the exchange and topic queues were set up 
by the consumer (so for example all compute_managers try to create the 
compute exchange and topic queue, but its only created by the first one and 
the others connect to the same queue).   In that context I'm finding it hard to 
see how to change this model to have multiple notify.info 
topic queues into the same exchange ?

Cheers,
Phil




From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of Doug Hellmann
Sent: 08 May 2012 23:34
To: Russell Bryant
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] [nova] why does notification use a topic exchange 
instead of fanout?


On Tue, May 8, 2012 at 6:04 PM, Russell Bryant 
rbry...@redhat.com wrote:
On 05/08/2012 05:59 PM, Doug Hellmann wrote:
 Here is a relevant section pulled out of the amqp 0-9-1 spec:

3.1.3.3 The Topic Exchange Type

The topic exchange type works as follows:

1. A message queue binds to the exchange using a routing
   pattern, P.
2. A publisher sends the exchange a message with the routing
   key R.
3. The message is passed to the message queue if R matches P.

The routing key used for a topic exchange MUST consist of zero or
more words delimited by dots. Each word may contain the letters A-Z
and a-z and digits 0-9.

The routing pattern follows the same rules as the routing key with
the addition that * matches a single word, and # matches zero or
more words. Thus the routing pattern *.stock.# matches the routing
keys usd.stock and eur.stock.db but not stock.nasdaq.

 In nova, for a given topic such as 'scheduler', all of the consumers are
 binding to the same queue on the topic exchange, resulting in
 round-robin delivery to each of the consumers.  If instead you make a
 new queue, you can get your own copy of each message.

 There is an additional benefit of using a topic exchange here.  The
 topic used for notifications is 'notifications.priority'.  That means
 that when you create your queue, you can set it up to receive all
 notifications, or only notifications of a certain priority.


 Topic exchanges make a lot of sense for messages that should only be
 consumed once, such as tasks. Notifications are different. Lots of
 different clients might want to know that some event happened in the
 system. The way things are in Nova today, they can't. The first client
 who consumes a notification message will prevent all of the other
 clients from seeing that message at all.
I think you missed my main point, which was that a topic exchange does
not impose a limitation that only one client can consume a given

[Openstack] Periodic clean-up of fixed_ip addresses in multi-host DHCP mode

2012-04-27 Thread Day, Phil
Hi Folks,

In multi-host mode the host field of a network never seems to get set (as 
only IPs are allocated, not networks)

However the periodic recovery task in NetworkManager uses the host field to 
filter what addresses it should consider cleaning up (to catch the case where 
the message from dnsmasq is either never sent or not delivered for some reason)

if self.timeout_fixed_ips:
now = utils.utcnow()
timeout = FLAGS.fixed_ip_disassociate_timeout
time = now - datetime.timedelta(seconds=timeout)
num = self.db.fixed_ip_disassociate_all_by_timeout(context,
   self.host,
   time)
if num:
LOG.debug(_('Dissassociated %s stale fixed ip(s)'), num)


Where db.fixed_ip_disassociate_all_by_timeout   is:

def fixed_ip_disassociate_all_by_timeout(_context, host, time):
session = get_session()
inner_q = session.query(models.Network.id).\
  filter_by(host=host).\
  subquery()
result = session.query(models.FixedIp).\
 filter(models.FixedIp.network_id.in_(inner_q)).\
 filter(models.FixedIp.updated_at < time).\
 filter(models.FixedIp.instance_id != None).\
 filter_by(allocated=False).\
 update({'instance_id': None,
 'leased': False,
 'updated_at': utils.utcnow()},
 synchronize_session='fetch')
return result


So what this seems to do to me is:

-  Find all of the fixed_ips which are:

o   on networks assigned to this host

o   Were last updated more than Timeout seconds ago

o   Are associated to an instance

o   Are not allocated

Because in multi-host mode the network host field is always Null, this query 
does nothing apart from give the DB a good work out every 10 seconds - so there 
could be a slow leakage of IP addresses.

Has anyone else spotted this - and if so do you have a good strategy for 
dealing with it ?

It seems that running this on every network_manager every 10 seconds is 
excessive - so what about still running it on all network_managers but using a 
long random sleep between runs in multi-host mode ?
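
One possible shape for that, sketched against the code quoted above (reusing 
its names, so illustrative only rather than a patch):

    import random

    def _maybe_disassociate_stale_fixed_ips(self, context, one_in_n=60):
        # Only do the expensive query on roughly 1 in N periodic ticks per
        # host, so the load is spread rather than hitting the DB every run.
        if not self.timeout_fixed_ips or random.randrange(one_in_n):
            return
        timeout = FLAGS.fixed_ip_disassociate_timeout
        time = utils.utcnow() - datetime.timedelta(seconds=timeout)
        num = self.db.fixed_ip_disassociate_all_by_timeout(context,
                                                           self.host,
                                                           time)
        if num:
            LOG.debug(_('Disassociated %s stale fixed ip(s)'), num)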

Thoughts ?

Cheers,
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Metadata and File Injection (code summit session?)

2012-04-10 Thread Day, Phil
+1

I was looking at this area earlier in terms of how the system works out what 
partition to inject keys/files/etc into, which feels like it should be 
specified by image metadata but currently defaults to partition 1.

Made me wonder if we really need so many different ways for instances to get 
their metadata ?

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Scott Moser
Sent: 10 April 2012 16:52
To: andrewbog...@gmail.com
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Metadata and File Injection (code summit session?)

On Tue, 10 Apr 2012, Andrew Bogott wrote:

 I'm reviving this ancient thread to ask:  Will there be a code summit 
 session about this?  And/or are there plans to start developing a 
 standard set of guest agents for Folsom?

http://summit.openstack.org/sessions/view/100


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Limit flavors to specific hosts

2012-04-03 Thread Day, Phil
Yes - It's more generic than hypervisor capabilities – my main problem with 
Host Aggregates is that it limits it to some specific 1:1 groupings based on 
hypervisor functionality.

Use cases I want to be able to cover include:


-  Rolling new hardware through an existing cluster, and limiting some 
new flavors (which might for example provide higher network bandwidth) to just 
those servers



-  Providing a range of flavours that are dependent on specific 
hardware features (GPU)


-  There may be a further group that couples flavour and/or images to 
host groups – for example it’s possible to imagine a scenario where an image is 
only licensed to some specific subset of servers, or where a subset of nodes 
are running LXC (in which case the image is in effect pre-defined). In this 
case the image metadata could, for example, specify the flavors that it can be 
used with, and those flavors are in turn limited to specific hosts.   I don’t 
really like this model of linking Glance objects (images) to Nova Objects 
(flavors), but I’m not sure what an alternative would be.

On the config file vs REST API for configuration debate (maybe this needs to be 
a Design Summit subject in its own right), I agree that we should make a 
distinction between items that are deploy time configuration (which hypervisor 
to use, network driver, etc) and items that could change whilst the system is 
running (rate limits is a good example). I don’t however see this as being 
a REST API vs config file issue - more a configuration repository issue.   
I’d also add that anything which is going to be configured via a REST API needs 
to also provide a command line tool to drive that interface – so that out of 
the box the system can be installed and configured via the tools and scripts 
shipped with it.

Phil



From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Jan Drake
Sent: 03 April 2012 02:23
To: Lorin Hochstein
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Limit flavors to specific hosts

If I understand this correctly, the motivation is to be able to provide a hint 
to schedulers on host-level appropriateness based on information external to 
that found in the hypervisor.

Right/Wrong/Close?

It would help to have a real-world example of where basic host resource  
evaluation for scheduling would cause a situation requiring the host-level 
hard-coding of what is essentially a flavor-constraint.

I'll hold further thoughts for downstream.


Jan

On Apr 2, 2012, at 6:06 PM, Lorin Hochstein 
lo...@nimbisservices.com wrote:
Just created a blueprint for this:

https://blueprints.launchpad.net/nova/+spec/host-capabilities-api


Take care,

Lorin
--
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com




On Apr 2, 2012, at 3:29 PM, Jay Pipes wrote:


Can I add a feature request to the below thoughtstream? Can we make it so that 
the management of these things can be done outside of config files? i.e. via a 
REST API with some simple middleware exposing the particular scheduler nodes' 
understanding of which capabilities/filters it is using to apply its scheduling 
algorithm?

Making changes to configuration files works OK for simple uses and testing, not 
so much for on-demand operations :) I say this after grumbling about similar 
configuration obstacles with ratelimits.

Best,
-jay

On 04/02/2012 02:37 PM, Chris Behrens wrote:

I have some plans for being able to set arbitrary capabilities for
hosts via nova.conf that you can use to build scheduler filters.

Right now, there are capabilities, but I believe we're only creating
these from hypervisor stats. You can filter on those today. What I'm
planning on adding is a way to specify additional keyname/value pairs in
nova.conf to supplement the capabilities we build from hypervisor stats.
You could set things like this in your nova.conf:

--host_capabilities=instance_type_ids=1,2,3;keyX;keyY=something

etc. Since capabilities are already passed to scheduler rules, you could
add some basic filters that do:

if ('instance_type_ids' in capabilities and
        instance_type.id not in capabilities['instance_type_ids']):
    return False

Since host_capabilities are just arbitrary keyname/value pairs, you can
pretty much add anything you want to --host_capabilities and then write
some matching scheduler filter rules.

That's the basic idea, anyway. The exact same behavior will apply to
'cells' and the cells scheduler as well. (Except you'll have
cells_capabilities= somewhere (prob nova.conf for the cells service).

- Chris


On Apr 2, 2012, at 10:36 AM, Day, Phil wrote:

Hi Folks,
I’m looking for a capability to limit some flavours to some hosts. I
want the mapping to be as flexible as possible, and work within a
zone/cell (I don’t want to add zones just to get

[Openstack] Limit flavors to specific hosts

2012-04-02 Thread Day, Phil
Hi Folks,

I'm looking for a capability to limit some flavours to some hosts.  I want the 
mapping to be as flexible as possible, and work within a zone/cell  (I don't 
want to add zones just to get this mapping).For example I want to express 
something like:

Host_1 supports flavours A, C
Host_2 supports flavours A, B
Host_3 supports flavours A, B, C
Host_4 supports flavours D

Ideally there would be some form of grouping to sets of flavours:

Flavour_A  is part of Flavour_Sets 1, 2, 3
Flavour_B is part of Flavour_Sets 2, 3
Flavour_C is part of Flavour_Sets 1, 3, 4

Host_1 supports flavour Set 1
Host_2 supports flavour Set 2
Host_3 supports flavour Set 3
Host_4 supports flavour Set 4


From the Essex design summit I thought that host aggregates was going to give 
this sort of capability, but having looked through the code that seems to be 
quite tightly coupled with specific hypervisor functionality, whereas this is 
purely a scheduler feature.

I can see that I could define flavour group membership through the 
instance_type_extra_specs, but not how to then associate these with specific 
hosts.
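
To make the sort of mapping I'm after concrete, a sketch of the kind of 
scheduler filter I have in mind (purely illustrative - it assumes flavour-set 
membership lives in the flavour's extra_specs and that each host advertises a 
matching 'flavor_sets' capability, neither of which exists today):

    class FlavorSetFilter(object):
        def host_passes(self, host_state, filter_properties):
            wanted = filter_properties['instance_type'].get(
                'extra_specs', {}).get('flavor_sets')
            if not wanted:
                return True                          # unrestricted flavour
            offered = host_state.capabilities.get('flavor_sets', '')
            # pass if the flavour and the host share at least one set
            return bool(set(wanted.split(',')) & set(str(offered).split(',')))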

I know I'm a tad behind some of the recent changes - so before suggesting a 
design summit session on this I thought I'd ask - is there something that 
already does this type of mapping ?

Cheers,
Phil



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Distributed rate-limiting

2012-03-30 Thread Day, Phil
Yep - good point Chris & Kevin.  I hadn't thought of it that way.

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Chris Behrens
Sent: 30 March 2012 00:30
To: Kevin L. Mitchell
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Distributed rate-limiting

My issue with using the URL is someone could easily DoS any tenant.  Maybe you 
said that below.  I only have a brief moment to scan email ATM. :)

On Mar 29, 2012, at 5:26 PM, Kevin L. Mitchell kevin.mitch...@rackspace.com 
wrote:

 On Thu, 2012-03-29 at 22:58 +0100, Day, Phil wrote:
 - As you get the tenant id from the context I assume this module has
 to come after the authentication in the pipeline.   
 
 Yes, I have made that assumption.  It seems reasonable, given that the 
 existing rate-limit middleware is right after authentication as well.
 
 Have you thought about using the tenant_id in the URL instead ?   (I'm
 thinking of the case where you want rate limit requests into the 
 authentication system as well as Nova itself).
 
 No, I haven't.  I don't trust the user, which is where the tenant_id 
 in the URL comes from.  I do trust the auth system, which is why I 
 want to use the tenant ID from the context.  (And yes, you could argue 
 that authz would prevent their access to other tenants anyway, but why 
 make nova have to check authz if rate limiting would stop them in the 
 first
 place?)
 
 As for rate limiting requests into the authentication system, I'd 
 suggest using a Limit subclass which uses the remote IP address in 
 place of a tenant ID, at least for the user endpoint.  I don't think 
 we want any rate limiting at all on the service side of Keystone; our 
 current architecture means that Keystone is going to be hit a *lot*: 
 at least once for each request that hits Nova, and more in certain 
 cases (i.e., instance boot, where we'll have to hit quantum and glance as 
 well).
 
 - Does this work for EC2 as well as OSAPI ?
 
 Actually, it didn't occur to me to test, given that I don't really use 
 the EC2 API.  I don't think there's anything in the basic architecture 
 which would be incompatible with EC2; the only possible sticking point 
 that occurs to me is the URL construction in
 nova_limits:NovaClassLimit.route(): if the URL specified is prefixed 
 with '/v1.1/' or '/v2/', the version identifier is dropped (otherwise 
 the route wouldn't match).  That would be easy to work around; simply 
 extend NovaClassLimit and override route() to do the appropriate 
 transformation for EC2.  Any EC2 experts want to weigh in?
 --
 Kevin L. Mitchell kevin.mitch...@rackspace.com
 
 
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Validation of floating IP operations in Essex codebase ?

2012-03-29 Thread Day, Phil
Thanks Vish, that makes it clearer.  I guess the validation can be handled by 
whichever manager picks up the call rather than having to be validated on the 
manager of a specific host (assuming multi-host of course), which should mean 
it's still reasonably responsive.

Just looking through the code it looks to me that there are a few things that might 
still need clearing up to make this separation work.  For example:

_add_floating_ip calls into compute.api (makes sense now) - which could do 
whatever validation makes sense at the instance level and then passes on to 
network.api.  But _remove_floating_ip calls directly to network_api, so even if 
the instance wanted to do some validation it can't.   Shouldn't both pass 
through compute.api in this new model ?

There are also a few other casts left in the Network API layer:
release_floating_ip
deallocate_for_instance
add_fixed_ip_to_instance
remove_fixed_ip_from_instance
add_network_to_project

If the network manager is now the only thing that can perform validation 
shouldn't all of these be turned into calls as well ?
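
(To illustrate what I mean by turning these into calls - sketch only, with the 
argument names simplified:)

    # current fire-and-forget style: errors in the manager never reach the caller
    rpc.cast(context, FLAGS.network_topic,
             {'method': 'release_floating_ip',
              'args': {'address': address}})

    # blocking style: the manager's validation errors propagate back to the API
    rpc.call(context, FLAGS.network_topic,
             {'method': 'release_floating_ip',
              'args': {'address': address}})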

Cheers,
Phil

From: Vishvananda Ishaya [mailto:vishvana...@gmail.com]
Sent: 28 March 2012 23:26
To: Day, Phil
Cc: openstack@lists.launchpad.net (openstack@lists.launchpad.net) 
(openstack@lists.launchpad.net)
Subject: Re: [Openstack] Validation of floating IP operations in Essex codebase 
?


On Mar 28, 2012, at 10:04 AM, Day, Phil wrote:


Hi Folks,

At the risk of looking lazy in my first question by following up with a second:

So I tracked this down in the code and can see that the validation has moved 
into network/manager.py, and what was a validation/cast in network/api.py has 
been replaced with a call - but that seems to make the system more tightly 
coupled across components (i.e. if my there is a problem getting the message to 
the Network Manager then even an invalid request will be blocked until the call 
returns or times out).

This is a side effect of trying to decouple compute and network, see the 
explanation below.



It also looks as if the validation for disassociate_floating_ip has also been 
moved to the manager, but this is still a cast from the api layer - so those 
error messages never get back to the user.

Good point.  This probably needs to be a call with the current model.



Coming from Diablo it all feels kind of odd to me - I thought we were trying to 
validate what we could of requests in the API server and return immediate 
errors at that stage and then cast into the system (so that only internal 
errors can stop something from working at this stage). Was there a 
deliberate design policy around this at some stage ?

There are a few things going on here.

First we have spent a lot of time decoupling network and compute.  Ultimately 
network will be an external service, so we can't depend on having access to the 
network database on the compute api side. We can do a some checks in 
compute_api to make sure that it isn't attached to another instance that we 
know about, but ultimately the network service has to be responsible for saying 
what can happen with the ip address.

So the second part is about why it is happening in network_manager vs 
network_api.  This is a side-effect of the decision to plug in 
quantum/melange/etc. at the manager layer instead of the api layer.  The api 
layer is therefore being very dumb, just passing requests on to the manager.

So that explains where we are.  Here is the plan (as I understand) for the 
future:

a) move the quantum plugin to the api layer
(At this point we could move validation into the api if necessary.)

b) define a more complete network api which includes all of the necessary 
features that are currently compute extensions

c) make a client to talk to the api

d) make compute talk through the client to the api instead of using rabbit 
messages
(this decouples network completely, allowing us to deploy and run network as a 
completely separate service if need be.  At this point the quantum-api-plugin 
could be part of quantum or a new shared NaaS project.  More to decide at the 
summit here)

In general, we are hoping to switch to quantum as the default by Folsom, and 
not have to touch the legacy network code very much.  If there are serious 
performance issues we could make some optimizations by doing checks in 
network-api, but these will quickly become moot if we are moving towards using 
a client and talking through a rest interface.

So Looks like the following could be done in the meantime:

a) switch disassociate from a cast to a call - I would consider this one a 
bug and would appreciate someone verifying that it fails and reporting it

b) add some validation in compute api - I'm not sure what we can assert here.  
Perhaps we could use the network_info cache and check for duplicates etc.

c) if we have serious performance issues, we could add another layer of checks 
in the compute_api, but we may have to make sure that we make sure

Re: [Openstack] Distributed rate-limiting

2012-03-29 Thread Day, Phil
Hi Kevin,

A couple of quick questions:

- As you get the tenant id from the context I assume this module has to come 
after the authentication in the pipeline.   Have you thought about using the 
tenant_id in the URL instead ?   (I'm thinking of the case where you want rate 
limit requests into the authentication system as well as Nova itself).

- Does this work for EC2 as well as OSAPI ?

Cheers,
Phil 

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Kevin L. Mitchell
Sent: 16 March 2012 21:45
To: openstack@lists.launchpad.net
Subject: [Openstack] Distributed rate-limiting

Howdy, folks.  I've been working on a replacement for nova's rate-limiting 
middleware that will handle the multiple-node case, and I've developed a fairly 
generic rate-limiting package, along with a second package that adapts it to 
nova.  (This means you could also use this rate-limiting setup with, say, 
glance, or with any other project that uses Python middleware.)  Here is some 
information:

* Turnstile
Turnstile is a piece of WSGI middleware that performs true distributed
rate-limiting.  System administrators can run an API on multiple
nodes, then place this middleware in the pipeline prior to the
application.  Turnstile uses a Redis database to track the rate at
which users are hitting the API, and can then apply configured rate
limits, even if each request was made against a different API node.

- https://github.com/klmitch/turnstile
- http://pypi.python.org/pypi/turnstile

* nova_limits
This package provides the ``nova_limits`` Python module, which
contains the ``nova_preprocess()`` preprocessor, the
``NovaClassLimit`` limit class, and the ``NovaTurnstileMiddleware``
replacement middleware class, all for use with Turnstile.  These
pieces work together to provide class-based rate limiting integration
with nova.

- https://github.com/klmitch/nova_limits
- http://pypi.python.org/pypi/nova_limits

Both packages should be fairly well documented (start with README.rst), and 
please feel free to log issues or make pull requests.
--
Kevin L. Mitchell kevin.mitch...@rackspace.com


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Validation of floating IP operations in Essex codebase ?

2012-03-28 Thread Day, Phil
Hi Stackers,

In Diablo there is a bunch of validation that goes on in the network/api layer, 
for example when associating an IP to an instance there are checks for:


-  Is the address allocated

-  Is it allocated to this project

-  Is it already assigned to an instance, and if so dis-associate it 
first.
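
Roughly speaking those checks amounted to something like the following (a 
reconstruction from memory, not a verbatim copy of the Diablo code - the helper 
name and error strings are mine):

from nova import db
from nova import exception


def check_floating_ip_can_be_associated(context, floating_address):
    # run in the api layer, before the rpc call, so the caller gets a
    # meaningful error straight back
    try:
        floating_ip = db.floating_ip_get_by_address(context,
                                                    floating_address)
    except exception.NotFound:
        raise exception.ApiError('address %s is not allocated'
                                 % floating_address)
    if floating_ip['project_id'] != context.project_id:
        raise exception.ApiError('address %s is not allocated to your '
                                 'project' % floating_address)
    # if it is already associated somewhere the api layer can disassociate
    # it first rather than letting the manager fail later on
    return floating_ip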

However looking at the same code in Essex I just see a simple call to the 
Manager:

def associate_floating_ip(self, context, floating_address, fixed_address,
                          affect_auto_assigned=False):
    """Associates a floating ip with a fixed ip.

    ensures floating ip is allocated to the project in context
    """
    rpc.call(context,
             FLAGS.network_topic,
             {'method': 'associate_floating_ip',
              'args': {'floating_address': floating_address,
                       'fixed_address': fixed_address,
                       'affect_auto_assigned': affect_auto_assigned}})


True there is some validation in the manager side  to prevent association if 
the address is already in use (which was also in Diablo), but by then it's too 
late to return a meaningful error to the user.

I can't see where the other checks have been moved to (they don't appear to be 
in the API extension or the compute/api layer, which the request passes through).   
Can someone point me to where this sort of validation is handled now please ?

I agree that the api code looks a lot cleaner in Essex without all of that 
validation code in it ;-)  - but surely we haven't removed those checks 
altogether ?

Thanks
Phil

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Auto Assigned Floating IPs take a long time to associate

2012-03-28 Thread Day, Phil
Are you sure that it's Nova that is taking the time to associate the IP, and not 
an ARP issue in your network ?

I've seen this behaviour when quickly reusing floating IP addresses - Nova does 
the assignment and sends out an unsolicited ARP response (assuming you have the 
send_arp_for_ha flag set) - this is in network/linux_net.bind_floating_ip().  
 However sometimes an unsolicited ARP response can get dropped in the network, 
and so if the switch doesn't see this message and it already has a previous ARP 
mapping in its cache then it will continue to try and send traffic to the 
previous user of that address until the cache times out.

Note that some network failover systems send multiple requests to get more 
certainty around this (for example I've seen a VPN solution use 6 messages).

There are a couple of things you could try:


-  Add a flag to be able to increase the number of arp_responses sent (roughly 
as in the sketch below)

-  Change the allocation of floating_ips so that instead of picking the 
first free one in the DB you pick the one which has been unused for the longest 
time (reduces the risk of reusing an address before the switch times out the 
entry in its cache).

Phil

From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Kevin Jackson
Sent: 28 March 2012 12:49
To: openstack@lists.launchpad.net
Subject: [Openstack] Auto Assigned Floating IPs take a long time to associate

Hi all,
I've got the following set in my nova.conf:

--auto_assign_floating_ip

and I fire up an instance.
Everything works, a private IP is assigned... the instance is running... but it 
can take an inordinate amount of time (anywhere upwards of 2 mins, sometimes a 
lot longer) to associate a floating IP automatically.

Anybody else experienced this?  Any clues on what I can do to troubleshoot 
this?  What is the condition when a Floating IP is assigned?  I've seen it 
assign Floating IPs very quickly when it is still Booting, not Active, say.
Is it dependent on anything within the Instance itself?

Cheers,

Kev
--
Kevin Jackson
@itarchitectkev
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Can't delete instances with error status.

2012-03-23 Thread Day, Phil
Here’s the way we’ve approached this:


-  A user can always send a delete request for a VM in any state (this 
is the only action that is always allowed).

-  Once a VM has a task_state of “Deleting” (set in the API server) the 
only action they can perform is delete

o   Hence at this point we can stop billing for it, and the user shouldn’t have 
it counted in their quota

-  A common reason for VMs getting stuck in Deleting is that the 
compute manager is restarted (or fails) �C so we have added code to the 
computer manager start-up to check for instances with a task_state of deleting 
and delete them (this needs to be able to cope with various exceptions if the 
delete was part way thought).   Since the manage is restarting we can be sure 
that the eventlet that was handling the delete isn’t doing it anymore ;-)

So from the user perspective we honour their request to delete VMs, make sure 
they can’t change their mind, and try to cleanup eventually as part of 
compute_manager restart.


-  By the same logic we also reset the “Image_Snapshot” and 
“Image_backup” task_state, as we know they aren’t true anymore.

-  It would be possible to also handle other task_states such as 
“rebuilding” and “rebooting”, but we haven’t tried that.
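
For reference, the start-up check itself is only a few lines - a simplified 
sketch of our Diablo change (not the exact patch; the db call and the terminate 
signature are approximate), run as a method from the compute manager's start-up:

from nova.compute import task_states


def _cleanup_pending_deletes(self, context):
    instances = self.db.instance_get_all_by_host(context, self.host)
    for instance in instances:
        if instance['task_state'] == task_states.DELETING:
            # the eventlet that was doing this delete died with the old
            # process, so it is safe to re-drive the delete here; the
            # terminate path has to tolerate partially-deleted state
            self.terminate_instance(context, instance['id'])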

Phil

From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Guilherme Birk
Sent: 22 March 2012 13:24
To: Openstack Mail List
Subject: Re: [Openstack] Can't delete instances with error status.

Gabe, responding to your question "Do you know how to reliably reproduce an 
instance in ERROR state that cannot be deleted?":
In my case, I'm updating the status of the VM to error directly on the 
database. This is just for testing. But even when my VM is running and working 
fine, when I update it to error in the database I can't delete it.
 From: gabe.westm...@rackspace.com
 To: cp16...@gmail.com
 Date: Thu, 22 Mar 2012 06:49:15 +
 CC: openstack@lists.launchpad.net
 Subject: Re: [Openstack] Can't delete instances with error status.

 There are definitely lots of cases where deleting an instance in error state 
 works fine, and I’d like to know about the cases where it doesn’t. They do 
 count against quota as well, so that’s a problem!

 I can see value in keeping an instance around - if the operations team has 
 configured it to do so. However, it seems like the end user has asked for it 
 to be deleted and to them it should appear to be deleted. Johannes had an 
 idea a while ago to allow the operations team to specify a project ID that 
 deleted servers that match certain criteria should be moved to. If the delete 
 finishes up fine, then it's no problem, the delete is done, the customer is 
 happy and the ops account is empty. If it fails for some reason, there is 
 manual cleanup to be done, but that should be on the operator of the 
 deployment, not the user. I think it's critical for anything like this to be 
 configurable, as public clouds and private clouds have different privacy and 
 retention concerns, I would guess.

 Do we have cases where we can reliably reproduce this issue? If it's happening 
 every time on some deployments there is a very serious problem!

 Gabe


 From: Craig Vyvial [mailto:cp16...@gmail.com]
 Sent: Thursday, March 22, 2012 2:20 AM
 To: Gabe Westmaas
 Cc: Yong Sheng Gong; Openstack Mail List
 Subject: Re: [Openstack] Can't delete instances with error status.

 My understanding is that you would not always want a user to delete an 
 instance in an error state, so an operations person can figure out what went 
 wrong. I think the instances that are in error state do not count against the 
 quota but I agree that they clutter up the API calls to list servers.

 I have noticed this with my team and written code around this case to force 
 the instance into an 'active' state before sending nova the delete call if 
 the instance was in an 'error' or 'suspended' state.

 -Craig

 On Thu, Mar 22, 2012 at 1:02 AM, Gabe Westmaas 
 gabe.westm...@rackspace.com wrote:
 Instances in deleted status can normally be deleted, but there is definitely 
 a bug to file here somewhere - possibly more than one.  A common reason I 
 have seen is that the node the instance lives on is no longer operating 
 correctly, so the compute manager never gets the delete request, so it 
 doesn't finish.  If we can narrow the cases where this happens, we can file 
 bugs and decide how to resolve them - although there may be some additional 
 work beyond just a developer picking up the bug and working on it to decide 
 what should happen!

 Do you know how to reliably reproduce an instance in ERROR state that cannot 
 be deleted?

 Gabe

 From: 
 

Re: [Openstack] Quota classes

2012-03-19 Thread Day, Phil
+1

And make the whole combined quota/limits module pluggable - so that all of 
these per-user configuration items can be managed in a central system (e.g. 
keystone)  

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Jay Pipes
Sent: 17 March 2012 16:25
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] Quota classes

On 03/16/2012 07:02 PM, Jesse Andrews wrote:
 There is the concept of limits that are very similar.  Should we 
 align quotas & limits?

Oh, yes please! :)

And make it configurable via a REST API, since editing config files ain't the 
most admin-friendly thang ;)

/me waits for Jorge to bring up Repose...

best,
-jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Quota classes

2012-03-19 Thread Day, Phil
On 03/19/2012 10:42 AM, Mark Washenberger wrote:
 Out of curiosity, why prefer keystone for centrally managing quota groups 
 rather than an admin api in nova? From my perspective, a nova admin api 
 would save a data migration and preserve nova-manage backwards compatibility.

Because more services than Nova can/should have Quotas/limits. Glance would 
like to piggy back on some common quota code if possible, instead of inventing 
something new :)

And more than one Nova instance can be using the same central user management 
system.   For example if I have a number of separate Nova instances I'd like 
to not have to manage the quota settings for a user separately in each one.

 Also, since quota clearly isn't an auth-n thing, is keystone way more auth-z 
 than I realized?

RBAC and other functionality planned for Keystone is all about auth-z.

But, that said, I would not be opposed to having the quota/limits stuff 
outside of Keystone. I think Kevin's Turnstile is a pretty good solution that 
offers middleware that does distributed ratelimiting in a flexible 
architecture and has some nice advantages over the Swift ratelimit middleware, 
including having a control thread that allows admins to reconfigure the 
ratelimit middleware without restarting the service that houses the middleware 
-- just send a 
message to the control daemon's pubsub channel...

I agree it doesn't have to be Keystone - what I meant was that it should be 
possible to have a system outside of Nova manage these per-user settings, given 
that with Keystone users/projects are in effect foreign keys to entities whose 
life cycle is managed elsewhere.

Phil

 Day, Phil philip@hp.com said:

 +1

 And make the whole combine quota/limits module pluggable -  so that 
 all of these per-user configuration items can be managed in a 
 central system (e.g keystone)

 -Original Message-
 From: openstack-bounces+philip.day=hp@lists.launchpad.net
 [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On 
 Behalf Of Jay Pipes
 Sent: 17 March 2012 16:25
 To: openstack@lists.launchpad.net
 Subject: Re: [Openstack] Quota classes

 On 03/16/2012 07:02 PM, Jesse Andrews wrote:
 There is the concept of limits that are very similar.  Should we
 align quotas & limits?

 Oh, yes please! :)

 And make it configurable via a REST API, since editing config files 
 ain't the most admin-friendly thang ;)

 /me waits for Jorge to bring up Repose...

 best,
 -jay

 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp

 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp




 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [NOVA] Possible causes for hung VMs (Diablo)

2012-03-12 Thread Day, Phil
Compute/api sets the task state to deleting at the start of delete() but 
without updating the vm_state,   so if these were VMs that failed to build, or 
were deleted during the build, then you could get that combination.



-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Jay Pipes
Sent: 12 March 2012 16:27
To: openstack@lists.launchpad.net
Subject: [Openstack] [NOVA] Possible causes for hung VMs (Diablo)

Hey Stackers,

We've noticed while administering the TryStack site that VMs tend to get into a 
stuck 'building' VM state, but we haven't been able to track down exactly what 
might be causing the problems. Hoping I can get some insight from folks running 
Diablo-based clouds.

Here is what the Nova database has recorded for VMs in the building or error VM 
states:

mysql> select vm_state, task_state, count(*) from instances where
vm_state in ('building', 'error') group by vm_state, task_state order by
count(*) desc;
+----------+------------+----------+
| vm_state | task_state | count(*) |
+----------+------------+----------+
| building | deleting   |      128 |
| building | networking |       40 |
| building | scheduling |       26 |
| error    | spawning   |       10 |
| building | spawning   |        1 |
+----------+------------+----------+
5 rows in set (0.01 sec)


As you can see, the majority of stuck VMs are in a building vm_state but with 
a deleting task_state.

Could someone elaborate how something is in a deleting task state during a 
build process?

Thanks in advance for any hints!
-jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-05 Thread Day, Phil
Hi Yun,

The point of the sleep(0) is to explicitly yield from a long running eventlet 
so that other eventlets aren't blocked for a long period.   Depending on how 
you look at it, that either means we're making an explicit judgement on 
priority, or trying to provide a more equal sharing of run-time across eventlets.
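
In code terms it's nothing more than this (a minimal sketch of the pattern, not 
the actual _sync_power_states change; check_power_state() is just a stand-in 
for the real per-instance work):

from eventlet import greenthread


def check_power_state(instance):
    pass  # stand-in for the real per-instance work


def sync_power_states(instances):
    for instance in instances:
        check_power_state(instance)
        # explicitly give other eventlets (API requests, service
        # heartbeats, ...) a chance to run before the next instance
        greenthread.sleep(0)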

It's not that things are CPU bound as such - more just that eventlets have 
very few pre-emption points.    Even an IO bound activity like creating a 
snapshot won't cause an eventlet switch.

So in terms of priority we're trying to get to the state where:
 - Important periodic events (such as service status) run when expected  (if 
these take a long time we're stuffed anyway)
 - User initiated actions don't get blocked by background system eventlets 
(such as refreshing power-state)
- Slow actions from one user don't block actions from other users (the first 
user will expect their snapshot to take X seconds, the second one won't expect 
their VM creation to take X + Y seconds).

It almost feels like the right level of concurrency would be to have a 
task/process running for each VM, so that there is concurrency across 
un-related VMs, but serialisation for each VM.

Phil 

-Original Message-
From: Yun Mao [mailto:yun...@gmail.com] 
Sent: 02 March 2012 20:32
To: Day, Phil
Cc: Chris Behrens; Joshua Harlow; openstack
Subject: Re: [Openstack] eventlet weirdness

Hi Phil, I'm a little confused. To what extend does sleep(0) help?

It only gives the greenlet scheduler a chance to switch to another green 
thread. If we are having a CPU bound issue, sleep(0) won't give us access to 
any more CPU cores. So the total time to finish should be the same no matter 
what. It may improve the fairness among different green threads but shouldn't 
help the throughput. I think the only apparent gain to me is a situation where 
there is 1 green thread with long CPU time and many other green threads 
with small CPU time.
The total finish time will be the same with or without sleep(0), but with sleep 
in the first threads, the others should be much more responsive.

However, it's unclear to me which part of Nova is very CPU intensive.
It seems that most work here is IO bound, including the snapshot. Do we have 
other blocking calls besides mysql access? I feel like I'm missing something 
but couldn't figure out what.

Thanks,

Yun


On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil philip@hp.com wrote:
 I didn't say it was pretty - Given the choice I'd much rather have a 
 threading model that really did concurrency and pre-emption in all the right 
 places, and it would be really cool if something managed the threads that 
 were started so that if a second conflicting request was received it did some 
 proper tidy up or blocking rather than just leaving the race condition to 
 work itself out (then we wouldn't have to try and control it by checking 
 vm_state).

 However ...   In the current code base where we only have user space based 
 eventlets, with no pre-emption, and some activities that need to be 
 prioritised, then forcing pre-emption with a sleep(0) seems a pretty small bit 
 of untidiness.   And it works now without a major code refactor.

 Always open to other approaches ...

 Phil


 -Original Message-
 From: openstack-bounces+philip.day=hp@lists.launchpad.net 
 [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On 
 Behalf Of Chris Behrens
 Sent: 02 March 2012 19:00
 To: Joshua Harlow
 Cc: openstack; Chris Behrens
 Subject: Re: [Openstack] eventlet weirdness

 It's not just you


 On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:

 Does anyone else feel that the following seems really dirty, or is it just 
 me.

 adding a few sleep(0) calls in various places in the Nova codebase 
 (as was recently added in the _sync_power_states() periodic task) is 
 an easy and simple win with pretty much no ill side-effects. :)

 Dirty in that it feels like there is something wrong from a design point of 
 view.
 Sprinkling sleep(0) seems like its a band-aid on a larger problem imho.
 But that's just my gut feeling.

 :-(

 On 3/2/12 8:26 AM, Armando Migliaccio armando.migliac...@eu.citrix.com 
 wrote:

 I knew you'd say that :P

 There you go: https://bugs.launchpad.net/nova/+bug/944145

 Cheers,
 Armando

  -Original Message-
  From: Jay Pipes [mailto:jaypi...@gmail.com]
  Sent: 02 March 2012 16:22
  To: Armando Migliaccio
  Cc: openstack@lists.launchpad.net
  Subject: Re: [Openstack] eventlet weirdness
 
  On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
   I'd be cautious to say that no ill side-effects were introduced. 
   I found a
  race condition right in the middle of sync_power_states, which I 
  assume was exposed by breaking the task deliberately.
 
  Such a party-pooper! ;)
 
  Got a link to the bug report for me?
 
  Thanks!
  -jay

 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net

Re: [Openstack] eventlet weirdness

2012-03-05 Thread Day, Phil
 However I'd like to point out that the math below is misleading (the average 
 time for the non-blocking case is also miscalculated but 
 it's not my point). The number that matters more in real life is throughput. 
 For the blocking case it's 3/30 = 0.1 request per second.

I think it depends on whether you are trying to characterise system performance 
(processing time) or perceived user experience (queuing time + processing 
time).   My users are kind of selfish in that they don't care how many 
transactions per second I can get through,  just how long it takes for them to 
get a response from when they submit the request.

Making the DB calls non-blocking does help a very small bit in driving up API 
server utilisation  - but my point was that time spent in the DB is such a 
small part of the total time in the API server that it's not the thing that 
needs to be optimised first. 

Any queuing system will explode when its utilisation approaches 100%, blocking 
or not.   Moving to non-blocking just means that you can hit 100% utilisation 
in the API server with 2 concurrent requests instead of *only* being able to 
hit 90+% with one transaction.   That's not a great leap forward in my 
perception.

Phil

-Original Message-
From: Yun Mao [mailto:yun...@gmail.com] 
Sent: 03 March 2012 01:11
To: Day, Phil
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness

First I agree that having blocking DB calls is no big deal given the way Nova 
uses mysql and reasonably powerful db server hardware.

However I'd like to point out that the math below is misleading (the average 
time for the nonblocking case is also miscalculated but it's not my point). The 
number that matters more in real life is throughput. For the blocking case it's 
3/30 = 0.1 request per second.
For the non-blocking case it's 3/27=0.11 requests per second. That means if 
there is a request coming in every 9 seconds constantly, the blocking system 
will eventually explode but the nonblocking system can still handle it. 
Therefore, the non-blocking one should be preferred.
Thanks,

Yun


 For example in the API server (before we made it properly multi-threaded) 
 with blocking db calls the server was essentially a serial processing queue - 
 each request was fully processed before the next.  With non-blocking db calls 
 we got a lot more apparent concurrency but only at the expense of making all 
 of the requests equally bad.

 Consider a request takes 10 seconds, where after 5 seconds there is a call to 
 the DB which takes 1 second, and three are started at the same time:

 Blocking:
 0 - Request 1 starts
 10 - Request 1 completes, request 2 starts
 20 - Request 2 completes, request 3 starts
 30 - Request 3 competes
 Request 1 completes in 10 seconds
 Request 2 completes in 20 seconds
 Request 3 completes in 30 seconds
 Ave time: 20 sec


 Non-blocking
 0 - Request 1 Starts
 5 - Request 1 gets to db call, request 2 starts
 10 - Request 2 gets to db call, request 3 starts
 15 - Request 3 gets to db call, request 1 resumes
 19 - Request 1 completes, request 2 resumes
 23 - Request 2 completes,  request 3 resumes
 27 - Request 3 completes

 Request 1 completes in 19 seconds (+ 9 seconds)
 Request 2 completes in 24 seconds (+ 4 seconds)
 Request 3 completes in 27 seconds (- 3 seconds)
 Ave time: 20 sec

 So instead of worrying about making db calls non-blocking we've been working 
 to make certain eventlets non-blocking - i.e. add sleep(0) calls to long 
 running iteration loops - which IMO has a much bigger impact on the 
 performance of the apparent latency of the system.

Thanks for the explanation. Let me see if I understand this.

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Day, Phil
 By properly multi-threaded are you instead referring to making the nova-api 
 server multi-*processed* with eventlet greenthread pools in each process? 
 i.e. The way Swift (and now Glance) works? Or are you referring to a 
 different approach entirely?

Yep - following your posting in here pointing to the glance changes we 
back-ported that into the Diablo API server.   We're now running each API 
server with 20 OS processes and 20 EC2 processes, and the world looks a lot 
happier.  The same changes were being done in parallel into Essex by someone in 
the community I thought ?
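
For anyone wanting to try the same thing, the overall shape is roughly this - a 
minimal standalone sketch of the multi-process / greenthread-pool pattern, not 
the actual glance or nova code:

import os

import eventlet
import eventlet.wsgi


def app(environ, start_response):
    # stand-in for the real paste pipeline / API application
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['hello\n']


def serve(sock):
    # each worker runs its own eventlet WSGI server with its own pool
    eventlet.wsgi.server(sock, app, custom_pool=eventlet.GreenPool(1000))


if __name__ == '__main__':
    # one shared listening socket, N forked workers all accepting on it
    sock = eventlet.listen(('0.0.0.0', 8774), backlog=128)
    for _ in range(20):
        if os.fork() == 0:
            serve(sock)
            os._exit(0)
    os.waitpid(-1, 0)   # parent just waits on the children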

 Curious... do you have a list of all the places where sleep(0) calls were 
 inserted in the HP Nova code? I can turn that into a bug report and get to 
 work on adding them... 

So far the only two cases we've done this are in the _sync_power_state and  in 
the security group refresh handling 
(libvirt/firewall/do_refresh_security_group_rules) - which we modified to only 
refresh for instances in the group and added a sleep in the loop (I need to 
finish writing the bug report for this one).

I have contemplated doing something similar in the image code when reading 
chunks from glance - but am slightly worried that in this case the only thing 
that currently stops two creates for the same image from making separate 
requests to glance might be that one gets queued behind the other.  It would be 
nice to do the same thing on snapshot (as this can also be a real hog), but 
there the transfer is handled completely within the glance client.   A more 
radical approach would be to split out the image handling code from compute 
manager into a separate (co-hosted) image_manager so at least only commands 
which need interaction with glance will block each other.

Phil




-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Jay Pipes
Sent: 02 March 2012 15:17
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness

On 03/02/2012 05:34 AM, Day, Phil wrote:
 In our experience (running clusters of several hundred nodes) the DB 
 performance is not generally the significant factor, so making its calls 
 non-blocking  gives only a very small increase in processing capacity and 
 creates other side effects in terms of slowing all eventlets down as they 
 wait for their turn to run.

Yes, I believe I said that this was the case at the last design summit
-- or rather, I believe I said is there any evidence that the database is a 
performance or scalability problem at all?

 That shouldn't really be surprising given that the Nova DB is pretty small 
 and MySQL is a pretty good DB - throw reasonable hardware at the DB server 
 and give it a bit of TLC from a DBA (remove deleted entries from the DB, add 
 indexes where the slow query log tells you to, etc) and it shouldn't be the 
 bottleneck in the system for performance or scalability.

++

 We use the python driver and have experimented with allowing the eventlet 
 code to make the db calls non-blocking (its not the default setting), and it 
 works, but didn't give us any significant advantage.

Yep, identical results to the work that Mark Washenberger did on the same 
subject.

 For example in the API server (before we made it properly 
 multi-threaded)

By properly multi-threaded are you instead referring to making the nova-api 
server multi-*processed* with eventlet greenthread pools in each process? i.e. 
The way Swift (and now Glance) works? Or are you referring to a different 
approach entirely?

  with blocking db calls the server was essentially a serial processing queue 
  - each request was fully processed before the next.  With non-blocking db 
  calls we got a lot more apparent concurrency but only at the expense of 
  making all of the requests equally bad.

Yep, not surprising.

 Consider a request takes 10 seconds, where after 5 seconds there is a call to 
 the DB which takes 1 second, and three are started at the same time:

 Blocking:
 0 - Request 1 starts
 10 - Request 1 completes, request 2 starts
 20 - Request 2 completes, request 3 starts
 30 - Request 3 competes
 Request 1 completes in 10 seconds
 Request 2 completes in 20 seconds
 Request 3 completes in 30 seconds
 Ave time: 20 sec

 Non-blocking
 0 - Request 1 Starts
 5 - Request 1 gets to db call, request 2 starts
 10 - Request 2 gets to db call, request 3 starts
 15 - Request 3 gets to db call, request 1 resumes
 19 - Request 1 completes, request 2 resumes
 23 - Request 2 completes,  request 3 resumes
 27 - Request 3 completes

 Request 1 completes in 19 seconds (+ 9 seconds)
 Request 2 completes in 24 seconds (+ 4 seconds)
 Request 3 completes in 27 seconds (- 3 seconds)
 Ave time: 20 sec

 So instead of worrying about making db calls non-blocking we've been working 
 to make certain eventlets non-blocking - i.e. add sleep(0) calls to long 
 running iteration loops - which IMO

Re: [Openstack] eventlet weirdness

2012-03-02 Thread Day, Phil
That sounds a bit over complicated to me - Having a string of tasks sounds like 
you still have to think about what the concurrency is within each step.

There is already a good abstraction around the context of each operation - they 
just (I know - big just) need to be running in something that maps to kernel 
threads rather than user space ones.

All I really want is to allow more than one action to run at the same time.  
So if I have two requests to create a snapshot, why can't they both run at the 
same time and still allow other things to happen ? I have all these cores 
sitting in my compute node that could be used, but I'm still having 
to think like a punch-card programmer submitting batch jobs to the mainframe ;-)

Right now creating snapshots is pretty close to a DoS attack on a compute node.


From: Joshua Harlow [mailto:harlo...@yahoo-inc.com]
Sent: 02 March 2012 19:23
To: Day, Phil; Chris Behrens
Cc: openstack
Subject: Re: [Openstack] eventlet weirdness

So a thought I had was that say if the design of a component forces as part of 
its design the ability to be run with threads or with eventlet or with 
processes.

Say if u break everything up into tasks (where a task would produce some 
output/result/side-effect).
A set of tasks could complete some action (ie, create a vm).
Subtasks could be the following:
0. Validate credentials
1. Get the image
2. Call into libvirt
3. ...

These tasks, if constructed in a way that makes them stateless, could then 
be chained together to form an action; that action could be given say to a 
threaded engine that would know how to execute those tasks with threads, or it 
could be given to an eventlet engine that would do the same with eventlet 
pools/greenthreads/coroutines, or with processes (and so on). This 
could be one way the design of your code abstracts that kind of execution 
(where eventlet is abstracted away from the actual work being done, instead of 
popping up in calls to sleep(0), ie the leaky abstraction).
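
To make that concrete, something along these lines (purely illustrative - not 
code from Nova or from any real task library):

import eventlet


def validate_credentials(ctx):
    ctx['validated'] = True
    return ctx


def get_image(ctx):
    ctx['image'] = 'some-image-ref'
    return ctx


def call_into_libvirt(ctx):
    ctx['domain'] = 'defined'
    return ctx


# an "action" is just an ordered list of stateless tasks
CREATE_VM = [validate_credentials, get_image, call_into_libvirt]


class GreenEngine(object):
    """Runs each action on a greenthread pool; a thread- or process-based
    engine could expose exactly the same interface."""

    def __init__(self, size=100):
        self.pool = eventlet.GreenPool(size)

    def run(self, action, ctx):
        def _run(ctx):
            for task in action:
                ctx = task(ctx)
            return ctx
        return self.pool.spawn(_run, ctx)


result = GreenEngine().run(CREATE_VM, {'request_id': 'req-1'}).wait()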

On 3/2/12 11:08 AM, Day, Phil philip@hp.com wrote:
I didn't say it was pretty - Given the choice I'd much rather have a threading 
model that really did concurrency and pre-emption in all the right places, and 
it would be really cool if something managed the threads that were started so 
that if a second conflicting request was received it did some proper tidy up or 
blocking rather than just leaving the race condition to work itself out (then 
we wouldn't have to try and control it by checking vm_state).

However ...   In the current code base where we only have user space based 
eventlets, with no pre-emption, and some activities that need to be prioritised, 
then forcing pre-emption with a sleep(0) seems a pretty small bit of untidiness.   
And it works now without a major code refactor.

Always open to other approaches ...

Phil


-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Chris Behrens
Sent: 02 March 2012 19:00
To: Joshua Harlow
Cc: openstack; Chris Behrens
Subject: Re: [Openstack] eventlet weirdness

It's not just you


On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:

 Does anyone else feel that the following seems really dirty, or is it just 
 me.

 adding a few sleep(0) calls in various places in the Nova codebase
 (as was recently added in the _sync_power_states() periodic task) is
 an easy and simple win with pretty much no ill side-effects. :)

 Dirty in that it feels like there is something wrong from a design point of 
 view.
 Sprinkling sleep(0) seems like its a band-aid on a larger problem imho.
 But that's just my gut feeling.

 :-(

 On 3/2/12 8:26 AM, Armando Migliaccio armando.migliac...@eu.citrix.com 
 wrote:

 I knew you'd say that :P

 There you go: https://bugs.launchpad.net/nova/+bug/944145

 Cheers,
 Armando

  -Original Message-
  From: Jay Pipes [mailto:jaypi...@gmail.com]
  Sent: 02 March 2012 16:22
  To: Armando Migliaccio
  Cc: openstack@lists.launchpad.net
  Subject: Re: [Openstack] eventlet weirdness
 
  On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
   I'd be cautious to say that no ill side-effects were introduced. I
   found a
  race condition right in the middle of sync_power_states, which I
  assume was exposed by breaking the task deliberately.
 
  Such a party-pooper! ;)
 
  Got a link to the bug report for me?
 
  Thanks!
  -jay

 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp

 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp

Re: [Openstack] Memory leaks from greenthreads

2012-03-01 Thread Day, Phil
 Has there been any thinking around only using eventlet/greenlet for webserver 
 endpoints and using something like multiprocessing for everything else?

I was already beginning to think that this would be a good blueprint/discussion 
topic for the design summit ;-)

We've seen a number of issues with the eventlet approach in the compute and 
network managers where a long running activity (such as updating all security 
groups, creating and uploading a snapshot) will block any other activities.   
Whilst it's possible to work round the first of these types of issues by 
planting sleep(0) statements in the loop, snapshot upload is still a problem.

Of course making everything thread safe isn't going to be trivial; although 
there is lock code in place for things like iptables, I suspect that there are a 
whole bunch of other timing / concurrency issues that we'll find once we move 
to a full threaded model.

Phil



From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Joshua Harlow
Sent: 29 February 2012 21:26
To: Vishvananda Ishaya; openstack
Subject: Re: [Openstack] Memory leaks from greenthreads

Cool.

Just a thought I was having, that others might want to chime in on.

Has there been any thinking around only using eventlet/greenlet for webserver 
endpoints and using something like multiprocessing for everything else?

I know it's a fundamental change, but it would force people to think about how 
to break up their code into something that would work with a message passing 
architecture (this is already happening with nova + rabbitmq). Nova is a good 
example, but my thought was to go even further and have anything that needs to 
run for a long time (ie an equivalent of a nova manager) that is shared inside 
an application also be a separate process with a queue for message passing. Then 
maybe eventlet/greenlet isn't needed at all? This would force good interfaces, 
and we wouldn't have to worry about missing a monkey patch. Maybe the python 
people plan for multiprocess to replace eventlet/greenlet in the end anyway???
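
Something along these lines, say (a minimal illustrative sketch of the "manager 
as a separate process with a queue" idea, not a proposal for the actual 
implementation):

import multiprocessing


def manager_loop(queue):
    # long running work lives in its own OS process, so it can never
    # starve the webserver's eventlets (and gets real CPU concurrency)
    while True:
        msg = queue.get()
        if msg is None:
            break
        print 'handling %r' % (msg,)   # stand-in for the real work


if __name__ == '__main__':
    q = multiprocessing.Queue()
    worker = multiprocessing.Process(target=manager_loop, args=(q,))
    worker.start()
    q.put({'method': 'snapshot_instance', 'instance_id': 42})
    q.put(None)                        # shut the worker down
    worker.join()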

Thoughts?

On 2/29/12 12:48 PM, Vishvananda Ishaya vishvana...@gmail.com wrote:
Hello Everyone,

We have had a memory leak due to an interaction with eventlet for a while that 
Johannes has just made a fix for.

bug:
https://bugs.launchpad.net/nova/+bug/903199

fix:
https://bitbucket.org/which_linden/eventlet/pull-request/10/monkey-patch-threadingcurrent_thread-as

Unfortunately, I don't think we have a decent workaround for nova while that 
patch is upstreamed.  I wanted to make sure that all of the distros are aware 
of it in case they want to carry an eventlet patch to prevent the slow memory 
leak.

Vish
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Security Group Rule Refresh

2012-02-23 Thread Day, Phil
OK - I'll put a description into launchpad along with our notes on how we're 
proposing to fix this on our Diablo branch (as there is a performance related 
change in here as well)

As with the previous performance change it will take us some time to get an 
Essex compatible fix - but if I provide all the details perhaps someone else 
can pick this up in parallel.

Phil

From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Vishvananda Ishaya
Sent: 22 February 2012 22:00
To: McNally, Dave (HP Cloud Services)
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Security Group Rule Refresh

Maybe soren has a comment on this, but as far as I can tell it looks like a 
bug.  It seems getting a list of instances that are in that group and 
refreshing those would be the right approach.

Vish

On Feb 22, 2012, at 9:13 AM, McNally, Dave (HP Cloud Services) wrote:


Hi all,

Currently I'm trying to track how a refresh of the security groups is handled 
(upon creation or deletion of a vm). Following through the code I get to 
'do_refresh_security_group_rules' in libvirt/firewall.py. Up to this point the 
security group in question has been carried through however it seems to be 
discarded here and rather than filtering the instances to refresh the rules for 
based on this group it looks to me like all instances on the current host are 
iterated through and then there is an attempt to update the rules for all these 
instances.

Is this full refresh necessary/intentional? If so can anyone tell me why it's 
required?

Thanks,

Dave
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Security Group Rule Refresh

2012-02-23 Thread Day, Phil
Hi Soren,

Thanks for the insight, a few questions / comments:


 1 deal with the situation where a refresh call to one of the compute
   nodes got lost. If that happened, at least it would all get sorted
   out on the next refresh.
Can see the advantage of this, but on an active system this can be quite an 
overhead compared to a periodic refresh.

 2 the routine that turned the rules from the database into iptables
   rules was complex enough as it was. Making it remove only rules for a
   single security group or a single instance or whatever would make it
   even worse.
I wonder if we're talking about the same driver - the code we're looking at is 
in the IptablesFirewallDriver  in libvirt/firewall.py (which I think is moved 
up to virt/firewall.py in Essex).  That seems to create a chain per Instance 
and do the update on a per instance basis, so I'm  not quite sure I understand 
your point ?

 3 The difference in terms of efficiency is miniscule. iptables replaces
   full tables at a time anyway, and while the relative amount of data
   needed to be fetched from the database might be much larger than with
   a more selective refresh, the absolute amount of data is still pretty
   small.
It may be that we're hitting a particular case - but we have a test system with 
10's of VMs per host, on not many hosts, and some groups with 70+ VMs and a 
rule set that references the security group itself.  So every VM in that group 
that gets refreshed (and there are many on each host) has to rebuild rules for 
each VM in the group.   The impact of this overhead on every VM create and 
delete in un-related groups is killing the system - esp. as the update code 
doesn't yield, so other tasks on the compute node (such as the create itself) 
are blocked.
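
The shape of our change is roughly the following (a sketch rather than the 
exact patch - the helper that finds the group's instances on this host and the 
per-instance refresh call are approximate names):

from eventlet import greenthread


def do_refresh_security_group_rules(self, security_group):
    # only touch instances that are actually members of the group,
    # instead of every instance on the host ...
    for instance in self._instances_in_group_on_host(security_group):
        self.refresh_filters_for_instance(instance)
        # ... and yield so other work on the compute node (the creates
        # and deletes themselves) isn't starved while a big group is
        # being rebuilt
        greenthread.sleep(0)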

"Point 2 should be more palatable now that the simpler implementation has 
proven itself."
Could you clarify which simpler implementation you're referring to - I've seen 
the NWFilterFirewall class and its associated comment block, but it wasn't 
clear to me under what circumstances it would be worth switching to this ?

Thanks,
Phil

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Soren Hansen
Sent: 23 February 2012 12:53
To: McNally, Dave (HP Cloud Services)
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Security Group Rule Refresh

2012/2/22 McNally, Dave (HP Cloud Services) dave.mcna...@hp.com:
 Currently I’m trying to track how a refresh of the security groups is 
 handled (upon creation or deletion of a vm). Following through the 
 code I get to ‘do_refresh_security_group_rules’ in 
 libvirt/firewall.py. Up to this point the security group in question 
 has been carried through however it seems to be discarded here and 
 rather than filtering the instances to refresh the rules for based on 
 this group it looks to me like all instances on the current host are 
 iterated through and then there is an attempt to update the rules for 
 all these instances.

 Is this full refresh necessary/intentional? If so can anyone tell me 
 why it’s required?

I forget the exact history here (i.e. why some of the method calls include it 
and why some don't), but there are three reasons I decided to do a full refresh:

 1 deal with the situation where a refresh call to one of the compute
   nodes got lost. If that happened, at least it would all get sorted
   out on the next refresh.
 2 the routine that turned the rules from the database into iptables
   rules was complex enough as it was. Making it remove only rules for a
   single security group or a single instance or whatever would make it
   even worse.
 3 The difference in terms of efficiency is miniscule. iptables replaces
   full tables at a time anyway, and while the relative amount of data
   needed to be fetched from the database might be much larger than with
   a more selective refresh, the absolute amount of data is still pretty
   small.


Point 1 could be addressed now by a periodical refresh of the rules, if one was 
so inclined.

Point 2 should be more palatable now that the simpler implementation has proven 
itself.

Point 3 might be less true now. In the beginning, there were separate chains 
for each security group, now it's just one big list, IIRC. That may change 
things.

--
Soren Hansen             | http://linux2go.dk/
Senior Software Engineer | http://www.cisco.com/
Ubuntu Developer         | http://www.ubuntu.com/
OpenStack Developer      | http://www.openstack.org/

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : 

[Openstack] Is there a security issue with qcow2 images ?

2012-01-25 Thread Day, Phil
Hi Folks,

I have a half remembered conversation from the Boston summit where someone said 
that there was a security issue in using qcow2 as a format for creating 
snapshot images.

Does that ring any bells with anyone, and if so can you expand on the potential 
issue please ?

I think it was something to do with why base images have to be expanded after 
download, but I might be wrong on that.   I'm particularly interested in using 
qcow2 as an upload format for snapshots.

Thanks
Phil

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] F release naming poll - Cast your vote !

2012-01-12 Thread Day, Phil
Am I the only one that sees a mispronunciation issue with Fawnskin ? 

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Thierry Carrez
Sent: 12 January 2012 09:35
To: openstack@lists.launchpad.net
Subject: [Openstack] F release naming poll - Cast your vote !

Fawnskin, Felton, Fillmore, Flournoy, Folsom, Fortuna, Fowler...
How should the F version of OpenStack, due Fall 2012, be named ?

Please participate to the F naming poll at:
https://launchpad.net/~openstack/+poll/f-release-naming/+vote

Pick your choice among the 28 options we have ! This is open to all
members of the Launchpad OpenStack team (which is an open team).

The poll closes next Tuesday, January 17, at 21:30 UTC.
Cheers,

-- 
Thierry Carrez (ttx)
Release Manager, OpenStack

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Configure Rate limits on OS API

2011-12-19 Thread Day, Phil
Hi Folks,

Is there a file that can be used to configure the API rate limits for the OS 
API on a per user basis ?

I can see where the default values are set in the code, but it looks as if 
there should be a less brutal configuration mechanism to go along with this ?

Thanks
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Nova Attribute Groups Blueprint ?

2011-11-18 Thread Day, Phil
Hi Folks,

I seem to remember a discussion in Boston that identified a need to be able to 
group compute servers within a zone by attributes that would be significant 
to the scheduler - for example a group of servers which share a storage pool, 
etc.

But I can't see any blueprint following on from this (or find the etherpad from 
Boston)- anyone know what happened to this, or was I having another senior 
moment ?

Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Is there a reason Nova doesn't use scoped sessions in sqlalchemy ?

2011-11-01 Thread Day, Phil
Hi Vish,

I probably wasn't careful enough with my wording - the API server may not be 
threaded as such, but the use of eventlets gives effectively the same 
concurrency issues that point towards needing to use scoped sessions.

Our basis for concluding that this is some form of concurrency issue is that we 
can easily reproduce the issue by running concurrent requests into an API 
server, and we have seen the problem disappear if we reduce the eventlet pool 
to 1 or change to scoped sessions.   Whilst the symptom is that the session has 
terminated by the time the lazy load is requested, as far as we can see the 
eventlet handling the query hasn't itself terminated the session - although it 
does seem likely that another eventlet using the same shared session could 
have. This seems to be specifically the type of issue that scoped sessions are 
intended to address.

http://www.sqlalchemy.org/docs/orm/session.html#contextual-thread-local-sessions
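
For reference, in sqlalchemy terms the change is essentially just this (a 
minimal sketch; the connection URL is illustrative):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('mysql://user:password@dbhost/nova')
Session = scoped_session(sessionmaker(bind=engine))

# each thread (or, with eventlet's monkey patching, each greenthread) that
# calls Session() gets its own session, so a lazy load issued while handling
# one request can't be tripped up by another request closing the session
# they would otherwise share
session = Session()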

All of this is based on a limited understanding of how sqlalchemy is used in 
Nova - I'd be more than happy to be corrected by others with more experience, 
hence the question to the mailing list.

I fully understand the drive to clean up the database layer, and I'm not 
knocking the fix to 855660 - it's clearly a good template for the way the DB 
needs to go in Essex.   My concern is that as shown by 855660 these changes 
have a pretty wide scope, and by the time that's been expanded to all of the 
current joinedloads it feels like it would be such a large set of changes that 
I'd be concerned about them coming back into Diablo.Stable.

Hence instead we were looking for a much smaller change that can address the 
whole class of problem of joinedloads in Diablo for now ahead of the DB 
refactoring in Essex - and from our testing scoped sessions seem to address 
that.  However as changing to scoped session breaks the migrate code in unit 
tests, and not really understanding why this is or the intricacies of the DB 
unit tests I wanted to see if we were heading down a path that had already been 
examined and discarded before we spend too much time on it.

I'd be really interested in hearing from anyone with experience of 
scoped_sessions, and/or willing to help us understand the issues we're seeing 
in the Unit Tests.

And of course I'd like to know what the community's feeling is towards a 
simpler approach to fixing the issue in Diablo.Final vs the backport of DB 
simplification changes from Essex - which I'm assuming will take some time yet 
to work through all of the joinedloads.

Phil

From: Vishvananda Ishaya [mailto:vishvana...@gmail.com]
Sent: 31 October 2011 19:50
To: Day, Phil
Cc: openstack@lists.launchpad.net (openstack@lists.launchpad.net); Johnson, 
Andrew Gordon (HP Cloud Services); Hassan, Ahmad; Haynes, David; 
nova-datab...@lists.launchpad.net
Subject: Re: [Openstack] Is there a reason Nova doesn't use scoped sessions in 
sqlalchemy ?

All of the workers are single-threaded, so I'm not sure that scoped sessions 
are really necessary.

We did however decide that objects from the db layer are supposed to be simple 
dictionaries.  We currently allow nested dictionaries to optimize joined 
objects. Unfortunately we never switched to sanitizing data from sqlalchemy, 
and instead we make the sqlalchemy objects provide a dictionary-like interface 
and pass the object itself.

The issue that you're seeing is because network wasn't properly 'joinedload'ed 
in the initial query, and because the data is not sanitized, sqlalchemy tries 
to joinedload, but the session has been terminated.  If we had sanitized data, 
we would get a more useful error like a key error when network is accessed. The 
current solution is to add the proper joinedload.

One of the goals of the nova-database team is to do the necessary data 
sanitization and to remove as many of the joinedloads as possible (hopefully 
all of them).

Vish

On Oct 31, 2011, at 12:25 PM, Day, Phil wrote:


Hi Folks,

We've been looking into a problem which looks a lot like:

https://bugs.launchpad.net/nova/+bug/855660



2011-10-21 14:13:31,035 ERROR nova.api [5bd52130-d46f-4702-b06b-9ca5045473d7 
smokeuser smokeproject] Unexpected error raised: Parent instance FixedIp at 
0x4e74490 is not bound to a Session; lazy load operation of attribute 
'network' cannot proceed
(nova.api): TRACE: Traceback (most recent call last):
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/__init__.py, line 363, in 
__call__
(nova.api): TRACE: result = api_request.invoke(context)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/apirequest.py, line 90, in 
invoke
(nova.api): TRACE: result = method(context, **args)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1195, in 
describe_instances
(nova.api): TRACE: instance_id=instance_id)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1204

[Openstack] Is there a reason Nova doesn't use scoped sessions in sqlalchemy ?

2011-10-31 Thread Day, Phil
Hi Folks,

We've been looking into a problem which looks a lot like:

https://bugs.launchpad.net/nova/+bug/855660



2011-10-21 14:13:31,035 ERROR nova.api [5bd52130-d46f-4702-b06b-9ca5045473d7 
smokeuser smokeproject] Unexpected error raised: Parent instance FixedIp at 
0x4e74490 is not bound to a Session; lazy load operation of attribute 
'network' cannot proceed
(nova.api): TRACE: Traceback (most recent call last):
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/__init__.py, line 363, in 
__call__
(nova.api): TRACE: result = api_request.invoke(context)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/apirequest.py, line 90, in 
invoke
(nova.api): TRACE: result = method(context, **args)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1195, in 
describe_instances
(nova.api): TRACE: instance_id=instance_id)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1204, in 
_format_describe_instances
(nova.api): TRACE: return {'reservationSet': self._format_instances(context, 
**kwargs)}
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1309, in 
_format_instances
(nova.api): TRACE: if fixed['network'] and use_v6:
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/models.py, line 76, in 
__getitem__
(nova.api): TRACE: return getattr(self, key)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py, line 163, in 
__get__
(nova.api): TRACE: instance_dict(instance))
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py, line 383, in 
get
(nova.api): TRACE: value = callable_(passive=passive)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/sqlalchemy/orm/strategies.py, line 595, in 
__call__
(nova.api): TRACE: (mapperutil.state_str(state), self.key)
(nova.api): TRACE: DetachedInstanceError: Parent instance FixedIp at 
0x4e74490 is not bound to a Session; lazy load operation of attribute 
'network' cannot proceed
(nova.api): TRACE:


As far as we can see the problem seems to be related to some conflict between 
multiple threads in the same API server instance and lazy loading of some part 
of the object.

Looking at the sqlalchemy documentation it seems to strongly suggest that when 
used from multi-threaded WSGI applications scoped_sessions should be used 
(I'm not clear on the details but it seems that this effectively makes lazy 
load operations thread safe).   However, whilst this fixes the problem it has a 
bad effect on the unit tests - in particular it seems to upset all of the DB 
migration code used in the unit tests.

So does anyone know if there was an explicit decision / reason not to use 
scoped_sessions in Nova ?

Thanks,
Phil

PS:  The other possible fix we've found is to change sqlalchemy/models.py so 
that the associations are explicitly set to eager load - which also seems to 
fix the problem, but feels like a clumsier way to go about it.  Any thoughts 
on that would also be appreciated ?
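
For concreteness, the eager-load variant amounts to something like this (a 
minimal, self-contained sketch only - the real nova models obviously have many 
more columns and use nova's own declarative base and join conditions):

    from sqlalchemy import Column, ForeignKey, Integer
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relationship

    Base = declarative_base()

    class Network(Base):
        __tablename__ = 'networks'
        id = Column(Integer, primary_key=True)

    class FixedIp(Base):
        __tablename__ = 'fixed_ips'
        id = Column(Integer, primary_key=True)
        network_id = Column(Integer, ForeignKey('networks.id'))
        # lazy='joined' pulls the network in with a JOIN at query time, so
        # fixed_ip['network'] never triggers a lazy load later against a
        # session that has already gone away.
        network = relationship(Network, lazy='joined')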



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] OSAPI equivalent of euca-get-console-output ?

2011-10-21 Thread Day, Phil
Hi Folks,

The title says it all really - is there an OSAPI / nova-client equivalent to 
the EC2 command to get the console output of a VM ? (I can't see anything in 
the code or extensions which calls the relevant compute.api method)


If there's nothing at the moment, are there any plans for adding it (it seems 
like it should be a core server action rather than an extension) ?

Thanks,
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] OSAPI equivalent of euca-get-console-output ?

2011-10-21 Thread Day, Phil
Right - I was looking for the simple equivalent to euca-get-console-output, 
i.e. something which calls compute_api.get_console_output().  

It seems to me that would be basic enough to qualify for 
/server/{id}/console-text or some such.  Xenapi is not a problem for me - the 
current KVM support is fine ;-)
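
For what it's worth, the sort of thing I have in mind is a trivial controller 
along these lines (a sketch only - the URL, controller name and the extension 
wiring are made up for illustration; the only call it leans on is 
compute_api.get_console_output(), and even its exact signature may differ 
between releases):

    from nova import compute

    class ConsoleTextController(object):
        """Hypothetical handler for GET /servers/{id}/console-text."""

        def __init__(self):
            self.compute_api = compute.API()

        def show(self, req, server_id):
            context = req.environ['nova.context']
            # Assumption: get_console_output() takes the instance id here;
            # some versions may expect the instance dict instead.
            output = self.compute_api.get_console_output(context, server_id)
            return {'console-text': output}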

-Original Message-
From: Chris Behrens [mailto:chris.behr...@rackspace.com] 
Sent: 21 October 2011 20:42
To: Jake Dahn
Cc: Chris Behrens; Day, Phil; openstack@lists.launchpad.net
Subject: Re: [Openstack] OSAPI equivalent of euca-get-console-output ?


Ah, I see.  consoles.py is for getting info about VNC (or similar) 
console...ie: create console, get info on how to connect to the vnc console, 
etc.  It doesn't appear compute_api.get_console_output() is exposed in OS API 
right now, which is what you want.  Those compute_api console methods are also 
currently not implemented for 'xenapi'.

- Chris


On Oct 21, 2011, at 12:35 PM, Jake Dahn wrote:

 Chris,
 
 What is the output of the detailed info call? I'm actually working on an 
 extension to get tailable console output, and I didn't see the request you 
 mentioned anywhere in the api code. 
 
 Correct me if I'm wrong, but - consoles.py talks to Consoles.API; I think that 
 to get the actual output of a console (and not just info about it) we need to 
 talk to Compute.API.
 
 
 On Oct 21, 2011, at 11:25 AM, Chris Behrens wrote:
 
 For OSAPI:
 
 There's POST /version/project/servers/server_id/consoles  to create a 
 console
 Use GET to get consoles for that server_id
 Then you can use:  GET 
 /version/project/servers/server_id/consoles/console_id  to get the 
 detailed info.
 
 I don't think there's support for this in nova-client.
 
 (Look at nova/api/openstack/consoles.py)
 
 - Chris
 
 
 On Oct 21, 2011, at 9:30 AM, Day, Phil wrote:
 
 Hi Folks,
 
 The title says it all really - is there an OSAPI / nova-client equivalent 
 to the EC2 command to get the console output of a VM ?(I can't see 
 anything in the code or extensions which calls the relevant compute.api 
 method)
 
 
 If there's nothing at the moment, are there any plans for adding it (it seems 
 like it should be a core server action rather than an extension) ?
 
 Thanks,
 Phil
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 
 
 
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] API for Configuration Drive ?

2011-09-26 Thread Day, Phil
Hi Folks,

Can anyone point me towards some documentation on how to use the Configuration 
Drive feature - in particular does it have its own API, or is it passed as 
attributes to Create Server ?
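
To give an idea of what I'm imagining (purely a guess on my part - the 
config_drive argument shown here is an assumption about how the attribute 
might be exposed through python-novaclient, not something I've confirmed 
exists):

    from novaclient.v1_1 import client

    # Credentials, endpoint, image and flavor values are placeholders.
    nova = client.Client('username', 'api-key', 'project',
                         'http://keystone.example.com:5000/v2.0/')

    # Assumption: config drive is requested as an attribute on server create
    # rather than through a separate API.
    server = nova.servers.create(name='test-vm',
                                 image='IMAGE_UUID',
                                 flavor='1',
                                 config_drive=True)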

Thanks,
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Anyone using multi_host networking with VLANs ?

2011-09-16 Thread Day, Phil
HI Folks,

Looking through the code it looks as if all of the changes to support the 
multi_host Network model have been made in the NetworkManager base class - so 
although I've only seen a description of this being used in Flat networks I 
wondered if anyone has tried it in VLAN mode (or knows of any gotchas that 
would stop it working with VLANs) ?

Cheers,
Phil
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] OpenStack nova data model

2011-09-15 Thread Day, Phil
 Looks pretty good. It's worth mentioning that some (many?) of the
 tables that deal with Identity information are being removed because
 they duplicate data in Keystone. So, tables like User, AuthToken,
 Project, UserRoleAssociation, UserProjectAssociation and
 UserProjectRoleAssociation may change significantly in the near
 future.

 Cheers!
 jay

Hi Jay,

I'd understood that it was still possible to run Diablo without keystone - but 
your comment that some of the tables needed to support that are being removed 
has got me worried.  

Can someone confirm if Diablo can be run without keystone please (i.e. in a 
backwards compatible mode) ?

Thanks,
Phil


-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Jay Pipes
Sent: 14 September 2011 16:43
To: Takahiro Shida
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] OpenStack nova data model

On Wed, Sep 14, 2011 at 11:24 AM, Takahiro Shida
shida.takah...@gmail.com wrote:
 Hi Stackers,

 I'm interested in the openstack nova design and architecture, with a view to
 extending and improving openstack.
 I wanted to understand openstack in depth, so I searched for the openstack
 database schema, and I found it.

 http://wiki.openstack.org/NovaDatabaseSchema

 But it looks out of date compared to recent trunk, so I'm starting to create a
 new data model document based on trunk.
 Please take a look at the following:

 https://docs.google.com/spreadsheet/ccc?key=0AsUHVTZg__ridEE3cjdrTWZaRGxtSXd0dVRUT0ZsdlEhl=en_US#gid=0

Looks pretty good. It's worth mentioning that some (many?) of the
tables that deal with Identity information are being removed because
they duplicate data in Keystone. So, tables like User, AuthToken,
Project, UserRoleAssociation, UserProjectAssociation and
UserProjectRoleAssociation may change significantly in the near
future.

Cheers!
jay

 I'm unfamiliar with the newer features such as Quantum and Virtual Server Array,
 so I may have some misunderstandings about these.

 Best Regards
  Takahiro Shida

 ___
 Mailing list: https://launchpad.net/~openstack
 Post to     : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp