Re: [openstack-dev] [all] The future of the integrated release

2014-08-07 Thread Chris Friesen
On 08/07/2014 12:32 PM, Eoghan Glynn wrote: If we try to limit the number of WIP slots, then surely aspiring contributors will simply work around that restriction by preparing the code they're interested in on their own private branches, or in their github forks? OK, some pragmatic

Re: [openstack-dev] [git-review] Supporting development in local branches

2014-08-07 Thread Chris Friesen
On 08/07/2014 04:52 PM, Yuriy Taraday wrote: I hope you don't think that this thread was about rebases vs merges. It's about keeping track of your changes without impact on review process. But if you rebase, what is stopping you from keeping whatever private history you want and then rebase

[openstack-dev] testing performance/latency of various components?

2014-08-08 Thread Chris Friesen
Is there a straightforward way to determine where the time is going when I run a command from novaclient? For instance, if I run nova list, that's going to run novaclient, which will send a message to nova-api, which wakes up and does some processing and sends a message to nova-conductor,

Re: [openstack-dev] [all] The future of the integrated release

2014-08-20 Thread Chris Friesen
On 08/20/2014 07:21 AM, Jay Pipes wrote: Hi Thierry, thanks for the reply. Comments inline. :) On 08/20/2014 06:32 AM, Thierry Carrez wrote: If we want to follow your model, we probably would have to dissolve programs as they stand right now, and have blessed categories on one side, and teams

Re: [openstack-dev] [all] The future of the integrated release

2014-08-21 Thread Chris Friesen
On 08/20/2014 09:54 PM, Clint Byrum wrote: Excerpts from Jay Pipes's message of 2014-08-20 14:53:22 -0700: On 08/20/2014 05:06 PM, Chris Friesen wrote: On 08/20/2014 07:21 AM, Jay Pipes wrote: Hi Thierry, thanks for the reply. Comments inline. :) On 08/20/2014 06:32 AM, Thierry Carrez wrote

Re: [openstack-dev] [nova] Server Groups - remove VM from group?

2014-08-26 Thread Chris Friesen
On 08/25/2014 11:25 AM, Joe Cropper wrote: I was thinking something simple such as only allowing the add operation to succeed IFF no policies are found to be in violation... and then nova wouldn't need to get into all the complexities you mention? Personally I would be in favour of
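A minimal sketch of the "add succeeds only if no policy is violated" idea quoted above. The function name and its arguments are hypothetical and are not Nova's API; only the policy names "affinity" and "anti-affinity" are taken from Nova server groups.

```python
def can_add_to_group(policies, member_hosts, candidate_host):
    """Return True only if placing a new member on candidate_host breaks no policy.

    policies:       iterable of policy names attached to the group
    member_hosts:   hosts of the instances already in the group
    candidate_host: host of the instance we want to add
    All three parameters are illustrative assumptions.
    """
    hosts = set(member_hosts)
    # anti-affinity: no two members may share a host
    if "anti-affinity" in policies and candidate_host in hosts:
        return False
    # affinity: all members must live on one host
    if "affinity" in policies and hosts and hosts != {candidate_host}:
        return False
    return True
```

With a check like this, an add request on a running instance is rejected up front, so the group never has to repair an existing violation.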

Re: [openstack-dev] [nova] Is the BP approval process broken?

2014-08-28 Thread Chris Friesen
On 08/28/2014 01:44 PM, Jay Pipes wrote: On 08/27/2014 09:04 PM, Dugger, Donald D wrote: I understand that reviews are a burden and very hard but it seems wrong that a BP with multiple positive reviews and no negative reviews is dropped because of what looks like indifference. I would posit

Re: [openstack-dev] [nova] Is the BP approval process broken?

2014-08-28 Thread Chris Friesen
On 08/28/2014 02:25 PM, Jay Pipes wrote: On 08/28/2014 04:05 PM, Chris Friesen wrote: On 08/28/2014 01:44 PM, Jay Pipes wrote: On 08/27/2014 09:04 PM, Dugger, Donald D wrote: I understand that reviews are a burden and very hard but it seems wrong that a BP with multiple positive reviews

Re: [openstack-dev] [nova] Is the BP approval process broken?

2014-08-28 Thread Chris Friesen
On 08/28/2014 03:02 PM, Jay Pipes wrote: I understand your frustration about the silence, but the silence from core team members may actually be a loud statement about where their priorities are. Or it could be that they haven't looked at it, aren't aware of it, or haven't been paying

Re: [openstack-dev] [nova] Is the BP approval process broken?

2014-08-28 Thread Chris Friesen
On 08/28/2014 04:01 PM, Joe Gordon wrote: On Thu, Aug 28, 2014 at 2:43 PM, Alan Kavanagh alan.kavan...@ericsson.com wrote: I share Donald's points here, I believe what would help is to clearly describe in the Wiki the process and workflow for the BP

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Chris Friesen
On 09/05/2014 03:52 AM, Daniel P. Berrange wrote: So my biggest fear with a model where each team had their own full Nova tree and did large pull requests, is that we'd suffer major pain during the merging of large pull requests, especially if any of the merges touched common code. It could

[openstack-dev] anyone using RabbitMQ with active/active mirrored queues?

2014-09-10 Thread Chris Friesen
Hi, I see that the OpenStack high availability guide is still recommending the active/standby method of configuring RabbitMQ. Has anyone tried using active/active with mirrored queues as recommended by the RabbitMQ developers? If so, what problems did you run into? Thanks, Chris

[openstack-dev] [nova] expected behaviour of _report_state() on rabbitmq failover

2014-09-10 Thread Chris Friesen
Hi, I'm running Havana and I'm seeing some less-than-ideal behaviour on rabbitmq failover. I'd like to figure out if this is expected behaviour or if something is going wrong. We're running rabbitmq in active/standby mode with DRBD storage. On the controller the timeline looks like this:

Re: [openstack-dev] [nova] expected behaviour of _report_state() on rabbitmq failover

2014-09-10 Thread Chris Friesen
On 09/10/2014 02:13 PM, Chris Friesen wrote: As it stands, it seems that waiting for the RPC call to time out blocks _report_state() from running again in report_interval seconds, which delays the service update until the RPC timeout period expires. Just noticed something... In the case

Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers

2014-09-10 Thread Chris Friesen
On 09/10/2014 02:44 AM, Daniel P. Berrange wrote: On Tue, Sep 09, 2014 at 05:14:43PM -0700, Stefano Maffulli wrote: I have the impression this idea has been circling around for a while but for some reason or another (like lack of capabilities in gerrit and other reasons) we never tried to

Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers

2014-09-10 Thread Chris Friesen
On 09/10/2014 04:11 PM, Jay Pipes wrote: On 09/10/2014 05:55 PM, Chris Friesen wrote: If each hypervisor team mostly only modifies their own code, why would there be conflict? As I see it, the only causes for conflict would be in the shared code, and you'd still need to sort out the issues

Re: [openstack-dev] [nova] Server Groups - remove VM from group?

2014-09-10 Thread Chris Friesen
On 09/10/2014 04:16 PM, Russell Bryant wrote: On Sep 10, 2014, at 2:03 PM, Joe Cropper cropper@gmail.com wrote: I would like to craft up a blueprint proposal for Kilo to add two simple extensions to the existing server group APIs that I believe will make them infinitely more usable in

Re: [openstack-dev] anyone using RabbitMQ with active/active mirrored queues?

2014-09-11 Thread Chris Friesen
On 09/11/2014 12:50 AM, Jesse Pretorius wrote: On 10 September 2014 17:20, Chris Friesen chris.frie...@windriver.com wrote: I see that the OpenStack high availability guide is still recommending the active/standby method of configuring RabbitMQ

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-11 Thread Chris Friesen
On 09/11/2014 12:02 PM, Dan Prince wrote: Maybe I'm impatient (I totally am!) but I see much of the review slowdown as a result of the feedback loop times increasing over the years. OpenStack has some really great CI and testing but I think our focus on not breaking things actually has us

Re: [openstack-dev] [nova] Server Groups - remove VM from group?

2014-09-11 Thread Chris Friesen
On 09/11/2014 03:01 PM, Jay Pipes wrote: On 09/11/2014 04:51 PM, Matt Riedemann wrote: On 9/10/2014 6:00 PM, Russell Bryant wrote: On 09/10/2014 06:46 PM, Joe Cropper wrote: Hmm, not sure I follow the concern, Russell. How is that any different from putting a VM into the group when it’s

[openstack-dev] how to debug RPC timeout issues?

2014-09-16 Thread Chris Friesen
Hi, I'm running Havana, and I just tried a testcase involving doing six simultaneous live-migrations. It appears that the migrations succeeded, but two of the instances got stuck with a status of MIGRATING because of RPC timeouts: 2014-09-16 20:35:07.376 12493 INFO nova.notifier [-]

Re: [openstack-dev] [Nova] Some ideas for micro-version implementation

2014-09-23 Thread Chris Friesen
On 09/23/2014 12:19 PM, Sean Dague wrote: On 09/23/2014 02:10 PM, Kevin L. Mitchell wrote: On Tue, 2014-09-23 at 18:39 +0100, Louis Taylor wrote: On Tue, Sep 23, 2014 at 01:32:50PM -0400, Andrew Laski wrote: I've been thinking along very similar lines, but I don't think each current API needs

[openstack-dev] anyone aware of networking issues with grizzly live migration of kvm instances?

2013-12-09 Thread Chris Friesen
Hi, We've got a grizzly setup using quantum networking and libvirt/kvm with VIR_MIGRATE_LIVE set. I was live-migrating an instance back and forth between a couple of compute nodes. It worked fine for maybe half a dozen migrations and then after a migration I could no longer ping it. It

Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?

2013-12-12 Thread Chris Friesen
On 12/12/2013 11:02 AM, Clint Byrum wrote: So I'm asking, is there a standard way to determine whether or not a nova-compute is definitely ready to have things scheduled on it? This can be via an API, or even by observing something on the nova-compute host itself. I just need a definitive

[openstack-dev] why don't we deal with claims when live migrating an instance?

2014-01-15 Thread Chris Friesen
When we create a new instance via _build_instance() or _build_and_run_instance(), in both cases we call instance_claim() to reserve and test for resources. During a cold migration I see us calling prep_resize() which calls resize_claim(). How come we don't need to do something like this

Re: [openstack-dev] Proposal for dd disk i/o performance blueprint of cinder.

2014-01-15 Thread Chris Friesen
On 12/26/2013 01:56 AM, cosmos cosmos wrote: Hello. My name is Rucia for Samsung SDS. I had trouble with volume deleting. I am developing for supporting big data storage such as hadoop in lvm. it use as a full disk io for deleting of cinder lvm volume because of dd the high disk I/O

Re: [openstack-dev] Proposal for dd disk i/o performance blueprint of cinder.

2014-01-15 Thread Chris Friesen
On 01/15/2014 06:00 PM, Fox, Kevin M wrote: What about a configuration option on the volume for delete type? I can see some possible options: * None - Don't clear on delete. Its junk data for testing and I don't want to wait. * Zero - Return zero's from subsequent reads either by zeroing on

Re: [openstack-dev] Proposal for dd disk i/o performance blueprint of cinder.

2014-01-15 Thread Chris Friesen
On 01/15/2014 06:30 PM, Jay S Bryant wrote: There is already an option that can be set in cinder.conf using 'volume_clear=none' Is there a reason that that option is not sufficient? That option would be for the cloud operator and since it would apply to all volumes on that cinder node. My

Re: [openstack-dev] Proposal for dd disk i/o performance blueprint of cinder.

2014-01-16 Thread Chris Friesen
On 01/15/2014 11:25 PM, Clint Byrum wrote: Excerpts from Alan Kavanagh's message of 2014-01-15 19:11:03 -0800: Hi Paul I posted a query to Ironic which is related to this discussion. My thinking was I want to ensure the case you note here (1) a tenant can not read another tenants disk..

[openstack-dev] [nova] how is resource tracking supposed to work for live migration and evacuation?

2014-01-16 Thread Chris Friesen
Hi, I'm trying to figure out how resource tracking is intended to work for live migration and evacuation. For a while I thought that maybe we were relying on the call to ComputeManager._instance_update() in ComputeManager.post_live_migration_at_destination(). However, in

Re: [openstack-dev] Proposal for dd disk i/o performance blueprint of cinder.

2014-01-16 Thread Chris Friesen
On 01/16/2014 04:22 PM, Clint Byrum wrote: Excerpts from Fox, Kevin M's message of 2014-01-16 09:29:14 -0800: Yeah, I think the evil firmware issue is separate and should be solved separately. Ideally, there should be a mode you can set the bare metal server into where firmware updates are

Re: [openstack-dev] Evil Firmware

2014-01-16 Thread Chris Friesen
On 01/16/2014 05:12 PM, CARVER, PAUL wrote: Jumping back to an earlier part of the discussion, it occurs to me that this has broader implications. There's some discussion going on under the heading of Neutron with regard to PCI passthrough. I imagine it's under Neutron because of a desire to

Re: [openstack-dev] [ironic] Disk Eraser

2014-01-17 Thread Chris Friesen
On 01/17/2014 04:20 PM, Devananda van der Veen wrote: tl;dr, We should not be recycling bare metal nodes between untrusted tenants at this time. There's a broader discussion about firmware security going on, which, I think, will take a while for the hardware vendors to really address. What

Re: [openstack-dev] [nova]Why not allow to create a vm directly with two VIF in the same network

2014-01-24 Thread Chris Friesen
On 01/24/2014 08:33 AM, CARVER, PAUL wrote: I agree that I’d like to see a set of use cases for this. This is the second time in as many days that I’ve heard about a desire to have such a thing but I still don’t think I understand any use cases adequately. In the physical world it makes perfect

Re: [openstack-dev] [nova][neutron] PCI pass-through SRIOV

2014-01-28 Thread Chris Friesen
On 01/28/2014 10:55 AM, Jani, Nrupal wrote: While technically it is possible, we as a team can decide about the final recommendation :) Given that VFs are going to be used for the high-performance VMs, mixing VMs with virtio VFs may not be a good option. Initially we can use PF interface for the

[openstack-dev] transactions in openstack REST API?

2014-02-03 Thread Chris Friesen
Has anyone ever considered adding the concept of transaction IDs to the openstack REST API? I'm envisioning a way to handle long-running transactions more cleanly. For example: 1) A user sends a request to live-migrate an instance 2) Openstack acks the request and includes a transaction

Re: [openstack-dev] [Nova][Scheduler] Policy Based Scheduler and Solver Scheduler

2014-02-03 Thread Chris Friesen
On 02/03/2014 12:28 PM, Khanh-Toan Tran wrote: Another thought would be the need for Instance Group API [1]. Currently users can only request multiple instances of the same flavors. These requests do not need LP to solve, just placing instances one by one is sufficient. Therefore we need this

Re: [openstack-dev] transactions in openstack REST API?

2014-02-03 Thread Chris Friesen
On 02/03/2014 01:31 PM, Andrew Laski wrote: On 02/03/14 at 01:10pm, Chris Friesen wrote: Has anyone ever considered adding the concept of transaction IDs to the openstack REST API? I'm envisioning a way to handle long-running transactions more cleanly. For example: 1) A user sends a request

Re: [openstack-dev] [Nova][Scheduler] Policy Based Scheduler and Solver Scheduler

2014-02-11 Thread Chris Friesen
On 02/11/2014 03:21 AM, Khanh-Toan Tran wrote: Second, there is nothing wrong with booting the instances (or instantiating other resources) as separate commands as long as we support some kind of reservation token. I'm not sure what reservation token would do, is it some kind of informing

Re: [openstack-dev] [nova][libvirt] Is there anything blocking the libvirt driver from implementing the host_maintenance_mode API?

2014-02-24 Thread Chris Friesen
On 02/20/2014 11:38 AM, Matt Riedemann wrote: On 2/19/2014 4:05 PM, Matt Riedemann wrote: The os-hosts OS API extension [1] showed up before I was working on the project and I see that only the VMware and XenAPI drivers implement it, but was wondering why the libvirt driver doesn't - either

[openstack-dev] [nova] why doesn't _rollback_live_migration() always call rollback_live_migration_at_destination()?

2014-02-24 Thread Chris Friesen
I'm looking at the live migration rollback code and I'm a bit confused. When setting up a live migration we unconditionally run ComputeManager.pre_live_migration() on the destination host to do various things including setting up networks on the host. If something goes wrong with the live

Re: [openstack-dev] [nova] Future of the Nova API

2014-02-24 Thread Chris Friesen
On 02/24/2014 04:01 PM, Morgan Fainberg wrote: TL;DR, “don’t break the contract”. If we are seriously making incompatible changes (and we will be regardless of the direction) the only reasonable option is a new major version. Agreed. I don't think we can possibly consider making

Re: [openstack-dev] [nova] Future of the Nova API

2014-02-24 Thread Chris Friesen
On 02/24/2014 04:59 PM, Sean Dague wrote: So, that begs a new approach. Because I think at this point even if we did put out Nova v3, there can never be a v4. It's too much, too big, and doesn't fit in the incremental nature of the project. Does it necessarily need to be that way though?

Re: [openstack-dev] [nova] Future of the Nova API

2014-02-24 Thread Chris Friesen
On 02/24/2014 05:17 PM, Sean Dague wrote: On 02/24/2014 06:13 PM, Chris Friesen wrote: On 02/24/2014 04:59 PM, Sean Dague wrote: So, that begs a new approach. Because I think at this point even if we did put out Nova v3, there can never be a v4. It's too much, too big, and doesn't fit

Re: [openstack-dev] [nova] why doesn't _rollback_live_migration() always call rollback_live_migration_at_destination()?

2014-02-25 Thread Chris Friesen
On 02/25/2014 05:15 AM, John Garbutt wrote: On 24 February 2014 22:14, Chris Friesen chris.frie...@windriver.com wrote: What happens if we have a shared-storage instance that we try to migrate and fail and end up rolling back? Are we going to end up with messed-up networking

[openstack-dev] need advice on how to supply automated testing with bugfix patch

2014-02-25 Thread Chris Friesen
I'm in the process of putting together a bug report and a patch for properly handling resource tracking on live migration. The change involves code that will run on the destination compute node in order to properly account for the resources that the instance to be migrated will consume.

Re: [openstack-dev] [nova] Future of the Nova API

2014-02-26 Thread Chris Friesen
On 02/26/2014 04:50 PM, Dan Smith wrote: So if we make backwards incompatible changes we really need a major version bump. Minor versions don't cut it, because the expectation is you have API stability within a major version. I disagree. If the client declares support for it, I think we can

Re: [openstack-dev] [nova] Future of the Nova API

2014-02-27 Thread Chris Friesen
On 02/27/2014 08:43 AM, Dan Smith wrote: So I think once we start returning different response codes, or completely different structures (such as the tasks change will be), it doesn't matter if we make the change in effect by invoking /v2 prefix or /v3 prefix or we look for a header. Its a major

Re: [openstack-dev] [nova] Future of the Nova API

2014-02-27 Thread Chris Friesen
On 02/27/2014 06:00 PM, Alex Xu wrote: Does mean our code looks like as below? if client_version 2: elif client_version 3 ... elif client_version 4: ... elif client_version 5: ... elif client_version 6: .. And we need test each version... That looks bad... I don't

[openstack-dev] inconsistent naming? node vs host vs vs hypervisor_hostname vs OS-EXT-SRV-ATTR:host

2014-02-28 Thread Chris Friesen
Hi, I've been working with OpenStack for a while now but I'm still a bit fuzzy on the precise meaning of some of the terminology. It seems reasonably clear that a node is a computer running at least one component of an Openstack system. However, nova service-list talks about the host that

Re: [openstack-dev] inconsistent naming? node vs host vs vs hypervisor_hostname vs OS-EXT-SRV-ATTR:host

2014-02-28 Thread Chris Friesen
On 02/28/2014 11:38 AM, Jiang, Yunhong wrote: One reason of the confusion is, in some virt driver (maybe xenapi or vmwareapi), one compute service manages multiple node. Okay, so in the scenario above, is the nova-compute service running on a node or a host? (And if it's a host, then what is

Re: [openstack-dev] [nova] Future of the Nova API

2014-03-03 Thread Chris Friesen
On 03/03/2014 08:14 AM, Steve Gordon wrote: I would be interested in your opinion on the impact of a V2 version release which had backwards incompatibility in only one area - and that is input validation. So only apps/SDKs which are currently misusing the API (I think the most common problem

[openstack-dev] [nova] [bug?] live migration fails with boot-from-volume

2014-03-07 Thread Chris Friesen
Hi, I was just testing the current icehouse code and came across some behaviour that looked suspicious. I have two nodes, an all-in-one and a compute node. I was not using shared instance storage. I created a volume from an image and then booted an instance from the volume. Once the

[openstack-dev] UTF-8 required charset/encoding for openstack database?

2014-03-10 Thread Chris Friesen
Hi, I'm using havana and recently we ran into an issue with heat related to character sets. In heat/db/sqlalchemy/api.py in user_creds_get() we call _decrypt() on an encrypted password stored in the database and then try to convert the result to unicode. Today we hit a case where this

Re: [openstack-dev] [nova] [bug?] live migration fails with boot-from-volume

2014-03-10 Thread Chris Friesen
On 03/08/2014 02:23 AM, ChangBo Guo wrote: Are you using libvirt driver ? As I remember, the way to check if compute nodes with shared storage is : create a temporary file from source node , then check the file from dest node , by accessing file system from operating system level. And
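The shared-storage check described in the quoted reply (create a temporary file on the source, then look for it from the destination) can be sketched roughly as follows. This is an illustration only, not the libvirt driver's code: the function name is hypothetical, and both "nodes" are represented here by directory paths visible to one process.

```python
import os
import tempfile


def looks_like_shared_storage(source_dir, dest_dir):
    """Create a marker file under source_dir and report whether dest_dir sees it.

    source_dir / dest_dir stand in for the instance directory as mounted on
    the source and destination compute nodes (an assumption for this sketch).
    """
    fd, marker_path = tempfile.mkstemp(dir=source_dir)
    os.close(fd)
    try:
        marker_name = os.path.basename(marker_path)
        # If the same file shows up under dest_dir, both paths are
        # backed by the same storage.
        return os.path.exists(os.path.join(dest_dir, marker_name))
    finally:
        os.remove(marker_path)  # always clean up the marker
```

In a real deployment the existence check would run on the destination host, typically via an RPC call, rather than in the same process.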

Re: [openstack-dev] UTF-8 required charset/encoding for openstack database?

2014-03-10 Thread Chris Friesen
On 03/10/2014 02:02 PM, Ben Nemec wrote: We just had a discussion about this in #openstack-oslo too. See the discussion starting at 2014-03-10T16:32:26 http://eavesdrop.openstack.org/irclogs/%23openstack-oslo/%23openstack-oslo.2014-03-10.log In that discussion dhellmann said, I wonder if we

Re: [openstack-dev] [nova] a question about instance snapshot

2014-03-10 Thread Chris Friesen
On 03/10/2014 02:58 PM, Jay Pipes wrote: On Mon, 2014-03-10 at 16:30 -0400, Shawn Hartsock wrote: While I understand the general argument about pets versus cattle. The question is, would you be willing to poke a few holes in the strict cattle abstraction for the sake of pragmatism. Few shops

Re: [openstack-dev] UTF-8 required charset/encoding for openstack database?

2014-03-12 Thread Chris Friesen
On 03/11/2014 05:50 PM, Clint Byrum wrote: But MySQL can't possibly know what you _meant_ when you were inserting data. So, if you _assumed_ that the database was UTF-8, and inserted UTF-8 with all of those things accidentally set for latin1, then you will have UTF-8 in your db, but MySQL will

[openstack-dev] any recommendations for live debugging of openstack services?

2014-03-12 Thread Chris Friesen
Are there any tools that people can recommend for live debugging of openstack services? I'm looking for a mechanism where I could take a running system that isn't behaving the way I expect and somehow poke around inside the program while it keeps running. (Sort of like tracepoints in gdb.)

[openstack-dev] [nova] [bug?] possible postgres/mysql incompatibility in InstanceGroup.get_hosts()

2014-03-15 Thread Chris Friesen
Hi, I'm trying to run InstanceGroup.get_hosts() on a havana installation that uses postgres. When I run the code, I get the following error: RemoteError: Remote error: ProgrammingError (ProgrammingError) operator does not exist: timestamp without time zone ~ unknown 2014-03-14 09:58:57.193

[openstack-dev] [nova] question about e41fb84 fix anti-affinity race condition on boot

2014-03-15 Thread Chris Friesen
Hi, I'm curious why the specified git commit chose to fix the anti-affinity race condition by aborting the boot and triggering a reschedule. It seems to me that it would have been more elegant for the scheduler to do a database transaction that would atomically check that the chosen host

Re: [openstack-dev] [nova] [bug?] possible postgres/mysql incompatibility in InstanceGroup.get_hosts()

2014-03-15 Thread Chris Friesen
On 03/15/2014 04:29 AM, Sean Dague wrote: On 03/15/2014 02:49 AM, Chris Friesen wrote: Hi, I'm trying to run InstanceGroup.get_hosts() on a havana installation that uses postgres. When I run the code, I get the following error: RemoteError: Remote error: ProgrammingError (ProgrammingError

Re: [openstack-dev] [nova] question about e41fb84 fix anti-affinity race condition on boot

2014-03-17 Thread Chris Friesen
On 03/17/2014 11:59 AM, John Garbutt wrote: On 17 March 2014 17:54, John Garbutt j...@johngarbutt.com wrote: Given the scheduler split, writing that value into the nova db from the scheduler would be a step backwards, and it probably breaks lots of code that assumes the host is not set until

Re: [openstack-dev] [nova] question about e41fb84 fix anti-affinity race condition on boot

2014-03-17 Thread Chris Friesen
On 03/17/2014 01:29 PM, Andrew Laski wrote: On 03/17/14 at 01:11pm, Chris Friesen wrote: On 03/17/2014 11:59 AM, John Garbutt wrote: On 17 March 2014 17:54, John Garbutt j...@johngarbutt.com wrote: Given the scheduler split, writing that value into the nova db from the scheduler would

[openstack-dev] [nova] need help with unit test framework, trying to fix bug 1292963

2014-03-17 Thread Chris Friesen
I've submitted code for review at https://review.openstack.org/80808; but it seems to break the unit tests. Where do the deleted and deleted_at fields for the instance get created for unit tests? Where is the database stored for unit tests, and is there a way to look at it directly? Here

Re: [openstack-dev] [nova] question about e41fb84 fix anti-affinity race condition on boot

2014-03-17 Thread Chris Friesen
On 03/17/2014 02:30 PM, Sylvain Bauza wrote: There is a global concern here about how an holistic scheduler can perform decisions, and from which key metrics. The current effort is leading to having the Gantt DB updated thanks to resource tracker for scheduling appropriately the hosts. If we

Re: [openstack-dev] [nova] need help with unit test framework, trying to fix bug 1292963

2014-03-17 Thread Chris Friesen
On 03/17/2014 04:04 PM, Joe Gordon wrote: On Mon, Mar 17, 2014 at 2:16 PM, Chris Friesen chris.frie...@windriver.com wrote: The original code looks like this: filters = {'uuid': filter_uuids, 'deleted_at': None} instances

Re: [openstack-dev] [nova] question about e41fb84 fix anti-affinity race condition on boot

2014-03-17 Thread Chris Friesen
On 03/17/2014 05:01 PM, Sylvain Bauza wrote: There are 2 distinct cases : 1. there are multiple schedulers involved in the decision 2. there is one single scheduler but there is a race condition on it About 1., I agree we need to see how the scheduler (and later on Gantt) could address

Re: [openstack-dev] [nova] need help with unit test framework, trying to fix bug 1292963

2014-03-18 Thread Chris Friesen
On 03/17/2014 04:28 PM, Chris Friesen wrote: The second one filters out all of the objects and returns nothing. (Pdb) query_prefix.filter(models.Instance.vm_state != vm_states.SOFT_DELETED).all() [] I think I've found another problem. (The rabbit hole continues...) It appears

Re: [openstack-dev] [Nova][Heat] How to reliably detect VM failures?

2014-03-19 Thread Chris Friesen
On 03/18/2014 11:18 AM, Zane Bitter wrote: On 18/03/14 12:42, Steven Dake wrote: You should be able to use the HARestarter resource and functionality to do healthchecking of a vm. HARestarter is actually pretty problematic, both in a causes major architectural headaches for Heat and will

Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

2014-03-19 Thread Chris Friesen
On 03/19/2014 02:24 PM, Fox, Kevin M wrote: Can someone please give more detail into why MongoDB being AGPL is a problem? The drivers that Marconi uses are Apache2 licensed, MongoDB is separated by the network stack and MongoDB is not exposed to the Marconi users so I don't think the 'A' part of

Re: [openstack-dev] [Nova][Heat] How to reliably detect VM failures?

2014-03-19 Thread Chris Friesen
On 03/19/2014 08:38 PM, Qiming Teng wrote: I don't think it a good idea to rely on some external monitoring systems to do a VM failure detection. It means additional steps to set up, additional software to upgrade, additional chapter in the Operator's Guide, etc. We are evaluating whether

[openstack-dev] [nova] instances stuck with task_state of REBOOTING

2014-03-20 Thread Chris Friesen
I'm running a havana install, and during some testing I've managed to get the system into a state where two instances are up and running but are reporting a task_state of REBOOTING. I can see the nova-api logs showing the soft-reboot request. I don't see a corresponding nova-compute log

Re: [openstack-dev] [nova] instances stuck with task_state of REBOOTING

2014-03-20 Thread Chris Friesen
On 03/20/2014 12:06 PM, Solly Ross wrote: Hi Chris, Are you in the position to determine whether or not this happens with the latest master code? Either way, it definitely looks like a bug. Unfortunately not right now, working towards a deadline. If you could give more specific reproduction

Re: [openstack-dev] [nova] instances stuck with task_state of REBOOTING

2014-03-20 Thread Chris Friesen
On 03/20/2014 12:29 PM, Chris Friesen wrote: The fact that there are no success or error logs in nova-compute.log makes me wonder if we somehow got stuck in self.driver.reboot(). Also, I'm kind of wondering what would happen if nova-compute was running reboot_instance() and we rebooted

Re: [openstack-dev] [nova] Backwards incompatible API changes

2014-03-21 Thread Chris Friesen
This is sort of off on a tangent, but one of the things that resulted in this being a problem was the fact that if someone creates a private flavor and then tries to add access second flavor access call will fail because the tenant already is on the access list. Something I was

Re: [openstack-dev] [nova] instances stuck with task_state of REBOOTING

2014-03-21 Thread Chris Friesen
On 03/21/2014 08:41 AM, Solly Ross wrote: Well, if messages are getting dropped on the floor due to communication issues, that's not a good thing. If you have time, could you determine why the messages are getting dropped on the floor? We shouldn't be doing things that require both the

[openstack-dev] auto-delete in amqp reply_* queues in OpenStack

2014-03-23 Thread Chris Friesen
Hi, If I run rabbitmqadmin list queues on my controller node I see 28 queues with names of the form reply_uuid. From what I've been reading, these queues are supposed to be used for the replies to rpc calls, they're not 'durable', and they all have auto_delete set to True. Given the above,

Re: [openstack-dev] auto-delete in amqp reply_* queues in OpenStack

2014-03-24 Thread Chris Friesen
On 03/24/2014 02:59 AM, Dmitry Mescheryakov wrote: Chris, In oslo.messaging a single reply queue is used to gather results from all the calls. It is created lazily on the first call and is used until the process is killed. I did a quick look at oslo.rpc from oslo-incubator and it seems like it

[openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
We've been stress-testing our system doing controlled switchover of the controller. Normally this works okay, but we've run into a situation that seems to show a flaw in the reconnection logic. On the compute node, nova-compute has managed to get into a state where it shows as down in nova

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
On 03/24/2014 10:41 AM, Chris Friesen wrote: We've been stress-testing our system doing controlled switchover of the controller. Normally this works okay, but we've run into a situation that seems to show a flaw in the reconnection logic. On the compute node, nova-compute has managed to get

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
On 03/24/2014 10:59 AM, Dan Smith wrote: Any ideas on what might be going on would be appreciated. This looks like something that should be filed as a bug. I don't have any ideas off hand, but I will note that the reconnection logic works fine for us in the upstream upgrade tests. That

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
On 03/24/2014 11:31 AM, Chris Friesen wrote: It looks like we're raising RecoverableConnectionError: connection already closed down in /usr/lib64/python2.7/site-packages/amqp/abstract_channel.py, but nothing handles it. It looks like the most likely place that should be handling

Re: [openstack-dev] auto-delete in amqp reply_* queues in OpenStack

2014-03-24 Thread Chris Friesen
On 03/24/2014 01:27 PM, Dmitry Mescheryakov wrote: I see two possible explanations for these 5 remaining queues: * They were indeed recreated by 'compute' services. I.e. a controller service sent some command over rpc and then it was shut down. Its reply queue was automatically deleted, since

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
On 03/24/2014 07:45 PM, Chris Behrens wrote: Do you have some sort of network device like a firewall between your compute and rabbit or you failed from one rabbit over to another? There are two controllers (active/standby) and two computes all hooked up to the same switch. We definitely did

Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover

2014-03-24 Thread Chris Friesen
On 03/24/2014 09:24 PM, Chris Friesen wrote: The problem is that the RPC code in Havana doesn't handle connection error exceptions. The oslo.messaging code used in Icehouse does. If we ported https://github.com/openstack/oslo.messaging/commit/0400cbf4f83cf8d58076c7e65e08a156ec3508a8

[openstack-dev] [nova] should there be an audit to clear the REBOOTING task_state?

2014-03-25 Thread Chris Friesen
I've reported a bug (https://bugs.launchpad.net/nova/+bug/1296967) where we got stuck with a task_state of REBOOTING due to what seem to be RPC issues. Regardless of how we got there, currently there is no audit that will clear the task_state if it gets stuck. Because of this, once we got

Re: [openstack-dev] [All][Keystone] Deprecation of the v2 API

2014-03-25 Thread Chris Friesen
On 03/25/2014 04:50 PM, Russell Bryant wrote: We discussed the deprecation of the v2 keystone API in the cross-project meeting today [1]. This thread is to recap and bring that discussion to some consensus. snip In summary, until we have completed v3 support within OpenStack itself, it's

Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-26 Thread Chris Friesen
On 03/25/2014 02:50 PM, Sangeeta Singh wrote: What I am trying to achieve is have two AZ that the user can select during the boot but then have a default AZ which has the HV from both AZ1 AND AZ2 so that when the user does not specify any AZ in the boot command I scatter my VM on both the AZ

Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-26 Thread Chris Friesen
On 03/26/2014 10:47 AM, Vishvananda Ishaya wrote: Personally I view this as a bug. There is no reason why we shouldn’t support arbitrary grouping of zones. I know there is at least one problem with zones that overlap regarding displaying them properly:

Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-26 Thread Chris Friesen
On 03/26/2014 11:17 AM, Khanh-Toan Tran wrote: I don't know why you need a compute node that belongs to 2 different availability-zones. Maybe I'm wrong but for me it's logical that availability-zones do not share the same compute nodes. The availability-zones have the role of partitioning your

Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-27 Thread Chris Friesen
On 03/27/2014 11:48 AM, Day, Phil wrote: Sorry if I'm coming late to this thread, but why would you define AZs to cover orthogonal zones? See Vish's first message. AZs are a very specific form of aggregate - they provide a particular isolation schematic between the hosts (i.e. physical hosts

Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-27 Thread Chris Friesen
On 03/27/2014 12:28 PM, Day, Phil wrote: Personally I'm a bit worried about users having too fine a granularity over where they place a server - AZs are generally few and big so you can afford to allow this and not have capacity issues, but if I had to expose 40 different rack based zones it

Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-27 Thread Chris Friesen
On 03/27/2014 12:49 PM, Day, Phil wrote: -Original Message- From: Chris Friesen [mailto:chris.frie...@windriver.com] On 03/27/2014 11:48 AM, Day, Phil wrote: nova boot --availability-zone az1 --scheduler-hint want-fast-cpu --scheduler-hint want-ssd ... Does this actually work

[openstack-dev] [nova] [bug] nova server-group-list doesn't show any members

2014-03-27 Thread Chris Friesen
I've filed this as a bug (https://bugs.launchpad.net/nova/+bug/1298494) but I thought I'd post it here as well to make sure it got visibility. If I create a server group, then boot a server as part of the group, then run nova server-group-list it doesn't show the server as being a member of

Re: [openstack-dev] [nova] [bug] nova server-group-list doesn't show any members

2014-03-27 Thread Chris Friesen
On 03/27/2014 03:57 PM, Chris Friesen wrote: If I change the filter to use 'deleted': False instead of 'deleted_at': None then it works as expected. This leads to a couple of questions: 1) There is a column deleted_at in the database table, why can't we filter on it? I wonder if maybe

Re: [openstack-dev] [nova] [bug] nova server-group-list doesn't show any members

2014-03-27 Thread Chris Friesen
On 03/27/2014 03:57 PM, Chris Friesen wrote: This leads to a couple of questions: 1) There is a column deleted_at in the database table, why can't we filter on it? 2) How did this get submitted when it doesn't work? I've updated to the current codebase in devstack and I'm still seeing

Re: [openstack-dev] [nova] [bug] nova server-group-list doesn't show any members

2014-03-27 Thread Chris Friesen
On 03/27/2014 04:47 PM, Chris Friesen wrote: Interestingly, unit test nova.tests.api.openstack.compute.contrib.test_server_groups.ServerGroupTest.test_display_members passes just fine, and it seems to be running the same sqlalchemy code. Is this a case where sqlite behaves differently from

Re: [openstack-dev] [nova][scheduler] Availability Zones and Host aggregates..

2014-03-28 Thread Chris Friesen
On 03/28/2014 05:01 AM, Jesse Pretorius wrote: On 27 March 2014 20:52, Chris Friesen chris.frie...@windriver.com wrote: It'd be nice to be able to do a heat template where you could specify things like put these three servers on separate hosts from

[openstack-dev] [nova] [bug] unit tests sqlite regexp() function doesn't behave like mysql

2014-03-31 Thread Chris Friesen
I mentioned this last week in another thread but I suspect it got lost. I recently came across a situation where the code failed when running it under devstack but passed the unit tests. It turns out that the unit tests' regexp() behaves differently than the built-in one in mysql. Down in
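To illustrate how a divergence of this kind can arise in general: SQLite ships no regexp() implementation of its own, so the REGEXP operator only works once the application registers a function, and its behaviour is whatever that function does. MySQL's REGEXP, by contrast, matches anywhere in the value. The sketch below is not the nova test code; the table, data and helper names are hypothetical. It registers two different Python functions and shows that the same query returns different rows.

```python
import re
import sqlite3


def regexp_search(pattern, value):
    """Unanchored: true if the pattern matches anywhere in the value."""
    return value is not None and re.search(pattern, value) is not None


def regexp_match(pattern, value):
    """Anchored at the start: true only if the value begins with a match."""
    return value is not None and re.match(pattern, value) is not None


def matching_names(regexp_impl, pattern):
    """Run the same REGEXP query with a chosen regexp() implementation."""
    conn = sqlite3.connect(":memory:")
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the registered
    # function receives (pattern, value) in that order.
    conn.create_function("regexp", 2, regexp_impl)
    conn.execute("CREATE TABLE instances (name TEXT)")
    conn.executemany(
        "INSERT INTO instances VALUES (?)",
        [("web-1",), ("db-web",), ("cache",)],
    )
    rows = conn.execute(
        "SELECT name FROM instances WHERE name REGEXP ? ORDER BY name",
        (pattern,),
    )
    names = [row[0] for row in rows]
    conn.close()
    return names


print(matching_names(regexp_search, "web"))  # unanchored semantics
print(matching_names(regexp_match, "web"))   # start-anchored semantics
```

A test suite that registers a function with different semantics from the production database can therefore pass while the same query behaves differently in deployment.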
